DNA structure and function

DNA is the information molecule. It stores instructions for making other large molecules, called proteins. These instructions are stored inside each of your cells, distributed among 46 long structures called chromosomes. These chromosomes are made up of thousands of shorter segments of DNA, called genes. Each gene stores the directions for making protein fragments, whole proteins, or multiple specific proteins.
DNA is well-suited to perform this biological function because of its molecular structure, and because of the development of a series of high-performance enzymes that are fine-tuned to interact with this molecular structure in specific ways. The match between DNA structure and the activities of these enzymes is so effective and well-refined that DNA has become, over evolutionary time, the universal information-storage molecule for all forms of life. Nature has yet to find a better solution than DNA for storing, expressing, and passing along instructions for making proteins.

The molecular structure of DNA

In order to understand the biological function of DNA, you first need to understand its molecular structure. This requires learning the vocabulary for talking about the building blocks of DNA, and how these building blocks are assembled to make DNA molecules.

DNA molecules are polymers

Polymers are large molecules that are built up by repeatedly linking together smaller molecules, called monomers. Think of how a freight train is built by linking lots of individual boxcars together, or how this sentence is built by sticking together a specific sequence of individual letters (plus spaces and punctuation). In all three cases, the large structure (a train, a sentence, a DNA molecule) is composed of smaller structures (boxcars, letters, and, in the biological case, DNA monomers) that are linked together in non-random sequences.

DNA monomers are called nucleotides

Just like a sentence “polymer” is composed of letter “monomers,” a DNA polymer is composed of monomers called nucleotides. A molecule of DNA is a bunch of nucleotide monomers, joined one after another into a very long chain.

There are four nucleotide monomers

The English language has a 26-letter alphabet. In contrast, the DNA “alphabet” has only four “letters,” the four nucleotide monomers. They have short, easy-to-remember names: A, C, T, G. Each nucleotide monomer is built from three simple molecular parts: a sugar, a phosphate group, and a nucleobase. (Don’t confuse this use of “base” with the other one, which refers to a molecule that raises the pH of a solution; they’re two different things.)

The sugar and phosphate in all four monomers are the same

All four nucleotides (A, T, G and C) are made by sticking a phosphate group and a nucleobase to a sugar. The sugar in all four nucleotides is called deoxyribose. It’s a cyclic molecule: most of its atoms are arranged in a ring structure. The ring contains one oxygen and four carbons. A fifth carbon atom is attached to the fourth carbon of the ring. Deoxyribose also contains a hydroxyl group (-OH) attached to the third carbon in the ring.
A diagram showing the three main components of a nucleotide: the phosphate group, the deoxyribose sugar, and the nitrogenous base.
The phosphate group is a phosphorus atom with four oxygen atoms bonded to it. The phosphorus atom in phosphate has a marked tendency to bond to other oxygen atoms (for instance, the oxygen atom sticking off the deoxyribose sugar of another nucleotide).

The four nucleotide monomers are distinguished by their bases

Each type of nucleotide has a different nucleobase stuck to its deoxyribose sugar.
  • A nucleotide contains adenine
  • T nucleotide contains thymine
  • G nucleotide contains guanine
  • C nucleotide contains cytosine
All four of these nucleobases are relatively complex molecules, with the unifying feature that they all tend to have multiple nitrogen atoms in their structures. For this reason, nucleobases are often also called nitrogenous bases.

Phosphodiester bonds in DNA polymers connect the 5’ carbon of one nucleotide to the 3’ carbon of another nucleotide

The nucleotide monomers in a DNA polymer are connected by strong electromagnetic attractions called phosphodiester bonds. Phosphodiester bonds are part of a larger class of electromagnetic attractions between atoms that chemists refer to as covalent bonds.
In order to keep things organized, biochemists have developed a numbering system for talking about the molecular structure of nucleotides. These numbers are applied to the carbon atoms in the sugar, starting at the carbon immediately to the right of the oxygen in the deoxyribose ring, and continuing in a clockwise fashion: the numbers range from 1’ (“one prime”), identifying the carbon immediately to the right of the oxygen, all the way to 5’ (“five prime”), identifying the carbon that sticks off the fourth and final carbon in the deoxyribose ring.
A diagram showing the carbons on the deoxyribose ring numbered. The phosphate group is attached to the 5' carbon. The -OH group is attached to the 3' carbon and the base is attached to the 1' carbon.
The phosphodiester bonds that join one DNA nucleotide to another always link the 3’ carbon of the first nucleotide to the 5’ carbon of the second nucleotide. This forms a covalent bond between the oxygen sticking off the 3’ carbon of the first nucleotide, and the phosphorus atom in the phosphate group that sticks off the 5’ carbon of the second nucleotide. These bonds are called 3’-5’ phosphodiester bonds. Each time nucleotides are bound together, a water molecule is removed (or “lost”) through a process called dehydration synthesis. Many other polymers are also assembled by dehydration synthesis.
A diagram showing how dehydration synthesis is used to make a string of DNA.

Chromosomes are made of two DNA polymers that stick together via non-covalent hydrogen bonds

Chromosomal DNA consists of two DNA polymers that make up a 3-dimensional (3D) structure called a double helix. In a double helix structure, the strands of DNA run antiparallel, meaning the 5’ end of one DNA strand lies alongside the 3’ end of the other DNA strand.
Diagram showing how the two strands of double-stranded DNA run antiparallel to each other. One strand runs in a 3' to 5' direction while the other runs in a 5' to 3' direction.
The two DNA strands are held together by noncovalent bonds, called hydrogen bonds, that form between the bases of one strand and the bases of the other. Considered individually, hydrogen bonds are much weaker than a single covalent bond, such as a phosphodiester bond. But there are so many of them that the two DNA polymers are very strongly connected to each other.
The hydrogen bonds that join DNA polymers happen between certain hydrogen atoms on one base (called hydrogen bond donors) and certain oxygen or nitrogen atoms on the base across from it (called hydrogen bond acceptors). Adenine (“A”) and Thymine (“T”) each have one donor and one acceptor, whereas Cytosine (“C”) has one donor and two acceptors, and Guanine (“G”) has one acceptor and two donors.
The A nucleotides are always hydrogen bonded to T nucleotides, and C nucleotides are always hydrogen bonded to G nucleotides. This selective binding is called complementary base pairing, and creates consistency in the nucleotide sequences of the two DNA polymers that join together to make a chromosome. This was first observed by Erwin Chargaff, who developed methods for counting nucleotides in DNA samples, and found that the percent of A nucleotides always equaled the percent of T nucleotides, and the percent of G nucleotides always equaled the percent of C nucleotides (within a margin of error). Now, we know that complementary base pairing can be explained by reference to hydrogen bonding between the donors and acceptors on the bases of each nucleotide: A nucleotides and T nucleotides have a match (one donor and one acceptor each), and C nucleotides and G nucleotides have a match (the former has one donor and two acceptors, while the latter has one acceptor and two donors).
Diagram showing how adenine and thymine base pair while guanine and cytosine base pair. Adenine and thymine are bound to one another via two hydrogen bonds while guanine and cytosine are bound to one another via three hydrogen bonds.
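The pairing rules above can be sketched in a few lines of Python. This is only an illustration: the function names `complement_strand` and `base_percentages` and the example sequence are made up for this sketch, not taken from any particular library.

```python
# Pairing rules described above: A pairs with T, and C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement_strand(strand_5_to_3: str) -> str:
    """Return the partner strand, also written 5' to 3'.

    The two strands are antiparallel, so the partner is read in the
    opposite direction: walk the strand backwards and pair each base.
    """
    return "".join(PAIRS[base] for base in reversed(strand_5_to_3))


def base_percentages(*strands: str) -> dict:
    """Percentage of each base across all the strands given."""
    combined = "".join(strands)
    return {base: 100 * combined.count(base) / len(combined) for base in "ATGC"}


top = "ATGCCGTAAG"  # hypothetical 10-nucleotide strand, written 5' to 3'
bottom = complement_strand(top)
print(bottom)                         # CTTACGGCAT
print(base_percentages(top, bottom))  # A equals T, and G equals C
```

Because every A on one strand faces a T on the other, and every G faces a C, the totals for the double-stranded molecule come out equal. That is the pattern Chargaff measured.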

The biological function of DNA

DNA polymers direct the production of other polymers called proteins

A protein is one or more polymers of monomers called amino acids. Proteins are the workhorse molecules in your cells. They act as enzymes, structural support, hormones, and a whole host of other functional molecules. All traits derive from the interactions of proteins with each other and the surrounding environments.

A chromosome consists of smaller segments called genes

Chromosomes are very long structures consisting of two DNA polymers, joined together by hydrogen bonds connecting complementary base pairs. A chromosome is divided into segments of double-stranded DNA called genes.
Image showing how a chromosome is made up of DNA which contains genes.

Each gene is further divided into three-nucleotide subsegments called codons

A codon is a segment (or piece) of double-stranded DNA that is three nucleotides long. A gene can be thought of as many three-nucleotide codons strung together.
Image showing how each gene is made up of codons (aka the A, T, G, and C bases).

Understanding DNA structure and function

Earlier, we compared a DNA polymer to a sentence, and the nucleotide monomers that make up a polymer to the letters of the alphabet that are used to write sentences down. Now that we know what genes are, and what codons are, we can extend this analogy a bit further, and begin to get an insight into how DNA stores biological information.
If nucleotides are like letters, then codons are like words. Unlike English, where we use 26 letters to make words of all different lengths and meanings, your cells use the four DNA nucleotide monomers to make “words”—codons—of just one length: three nucleotides long. If you do the math, you’ll see that this means that there are just 64 possible “words” in the DNA language—64 different ways of arranging the four DNA nucleotides into three-nucleotide-long combinations.
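If you'd rather let a computer "do the math," here is a minimal Python sketch that lists every possible three-letter combination of the four nucleotides. The variable names are just for illustration.

```python
from itertools import product

NUCLEOTIDES = "ACGT"

# Every ordered choice of three nucleotides, with repeats allowed.
codons = ["".join(triplet) for triplet in product(NUCLEOTIDES, repeat=3)]

print(len(codons))  # 4 * 4 * 4 = 64
print(codons[:5])   # ['AAA', 'AAC', 'AAG', 'AAT', 'ACA']
```

There are four choices for each of the three positions, so the count is 4 × 4 × 4 = 64.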
Just like in English, where each word is associated with a dictionary definition, the codons of the DNA language are each associated with specific amino acids. During translation on the ribosomes, each codon from the original DNA gene is matched with its corresponding amino acid (with the help of tRNA molecules). Just like a human reader puts the definitions of words together to arrive at the meaning of a sentence, a ribosome puts the amino acids referred to by each codon in a gene together, creating covalent bonds between them to make a protein.
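The dictionary analogy can be made concrete with a short Python sketch. The codon-to-amino-acid pairs below are a small excerpt of the standard genetic code, written with DNA letters. The gene sequence and the function name `translate` are hypothetical, and the sketch skips the intermediate steps a real cell goes through (such as copying the gene into RNA).

```python
# A small excerpt of the standard genetic code, written with DNA letters.
# The full "dictionary" has 64 entries; only a few are listed here.
CODON_TABLE = {
    "ATG": "Met",
    "GCT": "Ala",
    "AAA": "Lys",
    "TGG": "Trp",
    "TTT": "Phe",
    "GGC": "Gly",
}


def translate(gene: str) -> list:
    """Read a gene three nucleotides at a time and look up each codon."""
    protein = []
    for start in range(0, len(gene) - len(gene) % 3, 3):
        codon = gene[start:start + 3]
        protein.append(CODON_TABLE[codon])
    return protein


toy_gene = "ATGGCTAAATGG"  # hypothetical gene made of four codons
print(translate(toy_gene))  # ['Met', 'Ala', 'Lys', 'Trp']
```

Each "word" is looked up in turn, and the resulting list of amino acids stands in for the chain that a ribosome would link together.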

A simplified example

Imagine a basic sort of organism that only makes four proteins, each of which consists of four amino acid monomers. The traits of such an organism (how it eats, what it looks like, how it moves, etc.) are fully determined by the actions of these proteins.
The genes that specify how to make each of the four proteins are split across two chromosomes. This means that each chromosome consists of two genes. Since the proteins specified by the genes all have four amino acid monomers, each gene must have four codons. And, since a codon always consists of three nucleotides, each gene contains 12 nucleotide monomers, and, therefore, each chromosome is 24 nucleotides long.
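The arithmetic for this imaginary organism can be written out step by step. The sketch below uses only the numbers given above; the variable names are made up for illustration.

```python
# Numbers taken from the toy organism described above.
proteins = 4
amino_acids_per_protein = 4
chromosomes = 2
nucleotides_per_codon = 3

genes = proteins                             # one gene per protein
genes_per_chromosome = genes // chromosomes  # 4 genes over 2 chromosomes
codons_per_gene = amino_acids_per_protein    # one codon per amino acid
nucleotides_per_gene = codons_per_gene * nucleotides_per_codon
nucleotides_per_chromosome = genes_per_chromosome * nucleotides_per_gene

print(nucleotides_per_gene)        # 12
print(nucleotides_per_chromosome)  # 24
```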

Consider the following: junk DNA and junk DNA reporting

The genome of the organism in our simplified example, purely imaginary as it is, doesn’t match up perfectly to how actual genomes are structured. Chromosomes do consist of only nucleotides (plus some proteins). But, they don’t consist of just genes, and genes don’t consist of only codons.
In fact, under ten percent of the nucleotides in your chromosomes are part of genes. The rest are filler nucleotides between genes, as if a bunch of random letters had been inserted between the sentences in this paragraph. Scientists call these non-gene nucleotides junk DNA.
Junk DNA exists not just between your genes, but also inside of them. Codons are often separated by regions of nucleotides called introns, which don’t code for amino acids, like the blank space used to separate each paragraph on a page. Cells have developed a fascinating type of enzyme, called a spliceosome, that is able to locate and remove introns from the RNA copies of genes before they reach the ribosomes. You can learn more about this process in our articles on DNA transcription and translation.
DNA gibberish, in the form of junk DNA and introns, didn’t need to be mentioned when we were coming to terms with how DNA is structured and how it works inside cells because it isn’t essential to the story. It’s just an interesting complication to the more fundamental concepts. Unfortunately, there is another sort of complication, one that scientists and the media often leave out of the discussion, that is essential to the story of how DNA works.
If you’ve read or heard someone say that the gene “for” a particular human trait has just been discovered, then you are already familiar with the complication I am talking about. Just like your DNA is filled with junk nucleotides, many newspapers, TV news broadcasts, and websites are filled with junk reporting on DNA and its role in producing human traits.
What genes are “for” is proteins: as we’ve seen, they provide the instructions a ribosome needs to assemble the amino acid monomers that protein polymers consist of. Traits (everything from eye color, on the simple side, to complex things like autism or the ability to run extremely fast) arise from complicated interactions between proteins, the cells that make them, and the surrounding environment. These interactions vary across time and space, and are further modulated by non-biological factors ranging from diet (relatively simple to understand) to socio-economic status and parenting (extremely hard to quantify and map on to biological processes). It’s almost always a gross simplification to refer to any one gene as being “for” a particular human trait.
Next time you see or hear a news report of an exciting and potentially lucrative discovery of a gene “for” a particular trait, try translating it into something like this: “a purely statistical scientific study has produced some evidence that a correlation may exist between the presence or absence of a gene and the appearance of a particular trait in a limited sample of a larger population of humans, but more research is needed.” I’ll bet that this way of framing the story will get you closer to the truth, which is always a better place to be!

Want to join the conversation?

  • ff142
    In the "nucleotide structure" diagram, deoxyribose is drawn but it says "ribose".

    It says "The phosphodiester bonds that join one DNA nucleotide to another always link the 5’ carbon of the first nucleotide to the 3’ carbon of the second nucleotide... These bonds are called 5’-3’ phosphodiester bonds." but DNA is synthesized from 5' to 3' direction according to the "How DNA is replicated" video where the 5'C of the second nucleotide is always linked to the 3'C of the first nucleotide. So shouldn't it say "The phosphodiester bonds that join one DNA nucleotide to another always link the 3’ carbon of the first nucleotide to the 5’ carbon of the second nucleotide... These bonds are called 3’-5’ phosphodiester bonds"?

    Ribose instead of deoxyribose is drawn in the dehydration synthesis diagram.

    It says "The nucleotides forming each DNA strand are connected by noncovalent bonds, called hydrogen bonds." but those are covalent phosphodiester bonds. It should say the bonds holding the two DNA strands together are noncovalent hydrogen bonds.

    Are introns also considered junk DNA?
    (8 votes)
  • linda
    So, are nucleobases literally basic?
    (3 votes)
  • Katelyn
    Why are there two chains of DNA instead of three or some other number?
    (7 votes)
  • Serena Lim
    What would happen if Adenine (A) could pair with all three other DNA bases – Cytosine (C), Guanine (G), and Thymine (T)? What would the implications for DNA replication and structure be?
    (3 votes)
    • shahabrazavi01
      I can think of one consequence: In DNA replication, you could get two completely different strands of DNA than what you started with. So, as your cells divide, they would have a different DNA. For example, say you had a portion of your genome that read: 3' ATC 5'. The complementary strand would be 5' TAG 3'. But after replication, you would end up with 3' ATC 5' and 5' GAG 3' for the first strand, and 3' ACC 5' and 5' TAG 3' for the other. Notice how you retain the two original strands, but you now have two new complementary strands that don't match the original complementary strands. So DNA replication would not be reliable.
      (3 votes)
  • Veronique Bijou
    Just out of curiosity, why does the second double-bonded oxygen on thymine not function as an acceptor? Are the oxygens considered as one acceptor because they are essentially the same atom and their bonding to the main molecule is the same (unlike NH and NH2, which are considered 2 different donors)?
    (3 votes)
  • moizzazahid01
    Why does DNA have 3 phosphates, and when it loses them, why does it lose 2 phosphates instead of 1?
    (1 vote)
    • tyersome
      Are you asking about why the nucleotides (nucleoside triphosphates) lose two phosphates when they are incorporated into a nucleic acid like DNA?

      If so, this is because loss of two phosphates releases more energy and creates more entropy than losing a single phosphate.

      This makes this reaction essentially irreversible, which is very important for efficiently synthesizing nucleic acids.

      Does that help?
      (4 votes)
  • nmarimbita
    What are the functions of DNA?
    (2 votes)
  • Nelson
    This lesson seems to refer to A, T, G, and C as both a nucleotide as well as a nucleobase. For example, adenine is a nucleotide that contains a sugar, phosphate, and an adenine nucleobase but adenine is also used to refer to only the nucleobase.

    Is this correct?
    (1 vote)
  • ijames337
    Explain how errors in Meiosis can lead to genetic variation
    (1 vote)
  • Serena Lim
    How does the fact that DNA is made up of a backbone, to which a variety of bases are attached, enable a mechanism of DNA synthesis that is simple (i.e. repeats the same step over and over again) while, at the same time, allowing for a great diversity of the actual DNA sequence?
    (1 vote)