Intro to gene expression (central dogma)

How genes in DNA can provide instructions for proteins. The central dogma of molecular biology: DNA → RNA → protein.

Overview: Gene expression

DNA is the genetic material of all organisms on Earth. When DNA is transmitted from parents to children, it can determine some of the children's characteristics (such as their eye color or hair color). But how does the sequence of a DNA molecule actually affect a human or other organism's features? For example, how did the sequence of nucleotides (As, Ts, Cs, and Gs) in the DNA of Mendel's pea plants determine the color of their flowers?

Genes specify functional products (such as proteins)

A DNA molecule isn't just a long, boring string of nucleotides. Instead, it's divided up into functional units called genes. Each gene provides instructions for a functional product, that is, a molecule needed to perform a job in the cell. In many cases, the functional product of a gene is a protein. For example, Mendel's flower color gene provides instructions for a protein that helps make colored molecules (pigments) in flower petals.
Diagram of how a gene can dictate a phenotype (observable feature) of an organism. The flower color gene that Mendel studied consists of a stretch of DNA found on a chromosome. The DNA has a particular sequence; part of it, shown in this diagram, is 5'-GTAAATCG-3' (upper strand), paired with the complementary sequence 3'-CATTTAGC-5' (lower strand). The DNA of the gene specifies production of a protein that helps make pigments. When the protein is present and functional, pigments are produced, and the flowers of a plant have a purple color.
Image based on experimental data reported by Hellens et al.¹ and on a similar figure in Reece et al.²
The functional products of most known genes are proteins, or, more accurately, polypeptides. Polypeptide is just another word for a chain of amino acids. Although many proteins consist of a single polypeptide, some are made up of multiple polypeptides. Genes that specify polypeptides are called protein-coding genes.
Not all genes specify polypeptides. Instead, some provide instructions to build functional RNA molecules, such as the transfer RNAs and ribosomal RNAs that play roles in translation.

How does the DNA sequence of a gene specify a particular protein?

Many genes provide instructions for building polypeptides. How, exactly, does DNA direct the construction of a polypeptide? This process involves two major steps: transcription and translation.
  • In transcription, the DNA sequence of a gene is copied to make an RNA molecule. This step is called transcription because it involves rewriting, or transcribing, the DNA sequence in a similar RNA "alphabet." In eukaryotes, the RNA molecule must undergo processing to become a mature messenger RNA (mRNA).
  • In translation, the sequence of the mRNA is decoded to specify the amino acid sequence of a polypeptide. The name translation reflects that the nucleotide sequence of the mRNA must be translated into the completely different "language" of amino acids.
Simplified schematic of central dogma, showing the sequences of the molecules involved.
The two strands of DNA have the following sequences:
5'-ATGATCTCGTAA-3'
3'-TACTAGAGCATT-5'
Transcription of one of the strands of DNA produces an mRNA that nearly matches the other strand of DNA in sequence. However, due to a biochemical difference between DNA and RNA, the Ts of DNA are replaced with Us in the mRNA. The mRNA sequence is:
5'-AUGAUCUCGUAA-3'
Translation involves reading the mRNA nucleotides in groups of three; each group specifies an amino acid (or provides a stop signal indicating that translation is finished).
5'-AUG AUC UCG UAA-3'
  • AUG: Methionine
  • AUC: Isoleucine
  • UCG: Serine
  • UAA: "Stop"
Polypeptide sequence: (N-terminus) Methionine-Isoleucine-Serine (C-terminus)
Thus, during expression of a protein-coding gene, information flows from DNA → RNA → protein. This directional flow of information is known as the central dogma of molecular biology. Non-protein-coding genes (genes that specify functional RNAs) are still transcribed to produce an RNA, but this RNA is not translated into a polypeptide. For either type of gene, the process of going from DNA to a functional product is known as gene expression.

Transcription

In transcription, one strand of the DNA that makes up a gene, called the template strand (or non-coding strand), acts as a template for the synthesis of a complementary RNA strand by an enzyme called RNA polymerase. This RNA strand is the primary transcript.
The two strands of DNA have the following sequences:
5'-ATGATCTCGTAA-3'
3'-TACTAGAGCATT-5'
The DNA opens up to form a bubble, and the lower strand serves as a template for the synthesis of a complementary RNA strand. This DNA strand is called the template strand. Transcription of the template strand produces an mRNA that nearly matches the other strand (coding strand) of DNA in sequence. However, due to a biochemical difference between DNA and RNA, the Ts of DNA are replaced with Us in the mRNA. The mRNA sequence is:
5'-AUGAUCUCGUAA-3'
The primary transcript carries the same sequence information as the non-transcribed strand of DNA, sometimes called the coding strand. However, the primary transcript and the coding strand of DNA are not identical, thanks to some biochemical differences between DNA and RNA. One important difference is that RNA molecules do not include the base thymine (T). Instead, they have the similar base uracil (U). Like thymine, uracil pairs with adenine.
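To make the pairing rule concrete, here is a minimal Python sketch of transcription using the example strands from the figure. The function name `transcribe` and the dictionary `DNA_TO_RNA_PAIR` are made up for this illustration, and the sketch ignores everything about how RNA polymerase actually works; it only applies the base-pairing rule described above.

```python
# Base-pairing rules for building RNA on a DNA template:
# A in the template pairs with U (not T) in the RNA; T pairs with A; G with C; C with G.
DNA_TO_RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3to5: str) -> str:
    """Return the RNA (written 5'->3') built on a template strand written 3'->5'."""
    return "".join(DNA_TO_RNA_PAIR[base] for base in template_3to5)


coding_strand = "ATGATCTCGTAA"    # upper strand in the figure, written 5'->3'
template_strand = "TACTAGAGCATT"  # lower strand in the figure, written 3'->5'

mrna = transcribe(template_strand)
print(mrna)  # AUGAUCUCGUAA

# The transcript carries the same sequence as the coding strand, with U in place of T.
print(mrna == coding_strand.replace("T", "U"))  # True
```

Because the template is written 3' to 5', pairing base by base gives the RNA in the usual 5' to 3' orientation, which is why no reversal step is needed here.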

Transcription and RNA processing: Eukaryotes vs. bacteria

In bacteria, the primary RNA transcript can directly serve as a messenger RNA, or mRNA. Messenger RNAs get their name because they act as messengers between DNA and ribosomes. Ribosomes are RNA-and-protein structures in the cytosol where proteins are actually made.
In eukaryotes (such as humans), a primary transcript has to go through some extra processing steps in order to become a mature mRNA. During processing, a cap is added to the 5' end of the RNA and a poly-A tail to the 3' end, and some pieces of the RNA may be carefully removed in a process called splicing. These steps do not happen in bacteria.
Eukaryotic cell: Transcription takes place in the nucleus. The primary transcript also undergoes processing steps in the nucleus in order to become a mature mRNA. It is then exported to the cytosol, where it can associate with a ribosome and direct synthesis of a polypeptide in the process of translation.
Bacterium: Transcription takes place in the cytosol. Because of this, the mRNA doesn't have to travel anywhere before it can be translated by a ribosome. In fact, a ribosome may begin translating an mRNA before it is even fully transcribed (while transcription is still going on).
The location of transcription is also different between prokaryotes and eukaryotes. Eukaryotic transcription takes place in the nucleus, where the DNA is stored, while protein synthesis takes place in the cytosol. Because of this, a eukaryotic mRNA must be exported from the nucleus before it can be translated into a polypeptide. Prokaryotic cells, on the other hand, don't have a nucleus, so they carry out both transcription and translation in the cytosol.

Translation

After transcription (and, in eukaryotes, after processing), an mRNA molecule is ready to direct protein synthesis. The process of using information in an mRNA to build a polypeptide is called translation.

The genetic code

During translation, the nucleotide sequence of an mRNA is translated into the amino acid sequence of a polypeptide. Specifically, the nucleotides of the mRNA are read in triplets (groups of three) called codons. Of the 64 possible codons, 61 specify amino acids. One of these, AUG, is a "start" codon that indicates where to start translation. The start codon specifies the amino acid methionine, so most polypeptides begin with this amino acid. The remaining three codons are "stop" codons that signal the end of a polypeptide. These relationships between codons and amino acids are called the genetic code.
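The codon counts can be checked with a short Python sketch. It simply enumerates every possible triplet of the four RNA bases and sets aside the stop codons. Note that the text only shows UAA as a stop codon; UAG and UGA are filled in here from the standard genetic code, and the variable names are just for illustration.

```python
from itertools import product

BASES = "UCAG"
# The three stop codons of the standard genetic code.
# (Only UAA appears in the example above; UAG and UGA are added here.)
STOP_CODONS = {"UAA", "UAG", "UGA"}

# Every possible triplet of four bases: 4 * 4 * 4 combinations.
all_codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
amino_acid_codons = [codon for codon in all_codons if codon not in STOP_CODONS]

print(len(all_codons))             # 64
print(len(amino_acid_codons))      # 61
print("AUG" in amino_acid_codons)  # True: the start codon also specifies methionine
```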
The mRNA sequence is:
5'-AUGAUCUCGUAA-3'
Translation involves reading the mRNA nucleotides in groups of three; each group specifies an amino acid (or provides a stop signal indicating that translation is finished).
5'-AUG AUC UCG UAA-3'
  • AUG: Methionine
  • AUC: Isoleucine
  • UCG: Serine
  • UAA: "Stop"
Polypeptide sequence: (N-terminus) Methionine-Isoleucine-Serine (C-terminus)
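The codon-by-codon reading in this example can be sketched in Python as follows. This is only an illustration: `CODON_TABLE` holds just the four codons used above rather than the full genetic code, and the function name `translate` is a hypothetical choice.

```python
# Partial codon table: only the four codons from the example above.
# Any other codon would raise a KeyError in this sketch.
CODON_TABLE = {
    "AUG": "Methionine",
    "AUC": "Isoleucine",
    "UCG": "Serine",
    "UAA": "Stop",
}


def translate(mrna: str) -> list[str]:
    """Read an mRNA (5'->3') in codons from the first AUG until a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return []  # no start codon, so nothing is translated
    polypeptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "Stop":
            break  # stop codon: translation is finished
        polypeptide.append(amino_acid)
    return polypeptide


print("-".join(translate("AUGAUCUCGUAA")))  # Methionine-Isoleucine-Serine
```

The list is built in the order the codons are read, so its first entry corresponds to the N-terminus and its last entry to the C-terminus.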

Steps of translation

Translation takes place inside of structures known as ribosomes. Ribosomes are molecular machines whose job is to build polypeptides. Once a ribosome latches on to an mRNA and finds the "start" codon, it will travel rapidly down the mRNA, one codon at a time. As it goes, it will gradually build a chain of amino acids that exactly mirrors the sequence of codons in the mRNA.
How does the ribosome "know" which amino acid to add for each codon? As it turns out, this matching is not done by the ribosome itself. Instead, it depends on a group of specialized RNA molecules called transfer RNAs (tRNAs). Each tRNA has three nucleotides sticking out at one end, which can recognize (base-pair with) just one or a few particular codons. At the other end, the tRNA carries an amino acid – specifically, the amino acid that matches those codons.
Translation occurring in a ribosome. The mRNA is bound to the ribosome, where it can interact with tRNA molecules.
In this image, the mRNA has a sequence of:
5'-...AUG UAC AUC UCG GAU...-3'
A tRNA bound to the third codon (5'-AUC-3') has a complementary sequence of 3'-UAG-5'. It bears a polypeptide chain consisting of methionine and isoleucine, which is attached to the tRNA by the isoleucine. To the right of this tRNA, another tRNA is binding to the next codon (5'-UCG-3'). This tRNA again has a complementary sequence of nucleotides (3'-AGC-5') and bears the amino acid serine, which is the amino acid specified by the mRNA codon. The serine carried by this tRNA will be added to the growing polypeptide chain.
Other tRNAs carrying other amino acids are floating around in the background. One carries Glu (glutamic acid) and has a sequence of nucleotides at its end that reads 3'-CUU-5'. The other carries Asp (aspartic acid) and has a sequence of nucleotides at its end that reads 3'-CUA-5'.
There are many tRNAs floating around in a cell, but only a tRNA that matches (base-pairs with) the codon that's currently being read can bind and deliver its amino acid cargo. Once a tRNA is snugly bound to its matching codon in the ribosome, its amino acid will be added to the end of the polypeptide chain.
  1. Matching tRNA binds to exposed codon in rightmost slot of ribosome.
  2. Chain of amino acids is transferred from tRNA in middle slot of ribosome onto the amino acid of the tRNA in the rightmost slot. This has the effect of adding the amino acid to the end of the amino acid chain.
  3. The ribosome shifts one codon over. The tRNA formerly in the middle slot moves to the leftmost slot and exits the ribosome. The tRNA formerly in the right slot moves into the middle slot and continues to hold the amino acid chain. A new codon is exposed in the rightmost slot for a new tRNA to bind to.
This process repeats many times, with the ribosome moving down the mRNA one codon at a time. The chain is built up one amino acid at a time, with an amino acid sequence that matches the sequence of codons found in the mRNA. Translation ends when the ribosome reaches a stop codon and releases the polypeptide.
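The matching between codons and tRNAs can be sketched in Python as a simple lookup by base pairing. The tRNA "pool" below uses the anticodons mentioned in the figure description (for Ile, Ser, Glu, and Asp); the methionine tRNA anticodon is an assumption derived by pairing with AUG, and the stop codons UAG and UGA come from the standard genetic code rather than from the text. The sketch also simplifies in two ways: each tRNA pairs with exactly one codon (the text notes a tRNA may recognize a few), and the ribosome's three slots are collapsed into a single loop step.

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

# tRNA pool: anticodon (written 3'->5') -> amino acid carried.
# Note: the key "UAG" is an anticodon read 3'->5', not the stop codon 5'-UAG-3'.
TRNA_POOL = {
    "UAC": "Met",  # assumed: derived by base-pairing with the start codon AUG
    "UAG": "Ile",
    "AGC": "Ser",
    "CUU": "Glu",
    "CUA": "Asp",
}
STOP_CODONS = {"UAA", "UAG", "UGA"}  # codons written 5'->3'


def anticodon_for(codon: str) -> str:
    """Anticodon (3'->5') that base-pairs with a codon written 5'->3'."""
    return "".join(PAIR[base] for base in codon)


def run_ribosome(mrna: str) -> list[str]:
    """Move along the mRNA one codon at a time, adding one amino acid per codon."""
    chain = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            break  # stop codon reached: the polypeptide is released
        # Only the tRNA whose anticodon pairs with this codon can bind.
        amino_acid = TRNA_POOL[anticodon_for(codon)]
        chain.append(amino_acid)  # the new amino acid joins the end of the chain
    return chain


print(run_ribosome("AUGAUCUCGUAA"))  # ['Met', 'Ile', 'Ser']
```

The Glu and Asp tRNAs are in the pool but never used for this mRNA, which mirrors the idea that tRNAs floating in the background cannot bind unless their anticodon matches the codon being read.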

What happens next?

Once the polypeptide is finished, it may be processed or modified, combine with other polypeptides, or be shipped to a specific destination inside or outside the cell. Ultimately, it will perform a specific job needed by the cell or organism – perhaps as a signaling molecule, structural element, or enzyme!

Summary:

  • DNA is divided up into functional units called genes, which may specify polypeptides (proteins and protein subunits) or functional RNAs (such as tRNAs and rRNAs).
  • Information from a gene is used to build a functional product in a process called gene expression.
  • A gene that encodes a polypeptide is expressed in two steps. In this process, information flows from DNA → RNA → protein, a directional relationship known as the central dogma of molecular biology.
    • Transcription: One strand of the gene's DNA is copied into RNA. In eukaryotes, the RNA transcript must undergo additional processing steps in order to become a mature messenger RNA (mRNA).
    • Translation: The nucleotide sequence of the mRNA is decoded to specify the amino acid sequence of a polypeptide. This process occurs inside a ribosome and requires adapter molecules called tRNAs.
  • During translation, the nucleotides of the mRNA are read in groups of three called codons. Each codon specifies a particular amino acid or a stop signal. This set of relationships is known as the genetic code.

Want to join the conversation?

  • afzal.siddique99
    Why are there 61 codons? Why not 64?
    (42 votes)
  • SRINIDHI MEDARI
    What happens to the mRNA after the translation process, i.e., after proteins are produced?
    (33 votes)
    • Jonathan
      Hi Srinidhi,

      After mRNA is translated, it is either stored for later translation or degraded. The eventual fate of every mRNA molecule is to be degraded. The process of degrading mRNA molecules happens at a relatively fixed rate.

      Hope this helps!

      Jonathan Myung
      (8 votes)
  • andrewshemon189
    I'm still confused on two things. One, what is a TATA box? And two, what are the poly-A tails and 5' caps?
    (14 votes)
    • aiwen.joy.lim
      The TATA box tells where a gene begins so that it can be transcribed. The Poly-A tail is a string of (mostly) adenines on the 3' end of the mRNA that gets eaten away by hydrolytic enzymes. It is there so that the coding section of the mRNA doesn't get eaten. (The hydrolytic enzymes themselves are there to protect from viruses.) It is also recognized by the nuclear pore and allows the mRNA to leave the nucleus. The 5' cap tells the ribosome where to begin translating.
      (26 votes)
  • Joyrell Thompson
    What happens if an mRNA breaks? Will part of the protein be produced from the broken piece?
    (5 votes)
    • DeepankarRoy
      Yes, most likely. If the mRNA fragment fits with the translational machinery (this applies only to the part of the mRNA with the initiation codon; the part without the initiation codon would not be translated), it might produce a truncated protein in which the N-terminal part is present but the C-terminal part (with respect to the original full-length protein) is missing. However, most of these truncated proteins are recognized by the cellular repair machinery as abnormal and are recycled. Sometimes, though, such proteins can linger and may even participate in cellular functions (in a positive or detrimental way). The most likely source of truncated proteins is DNA rearrangement, though, and mRNA breakage would not likely have a major effect (it might, depending upon the need for the original protein), as there would be other full-length mRNAs that would be translated into the protein of interest. Hope this helps.
      (15 votes)
  • stephen4934
    What is tRNA using to create these amino acids? Do they come from the nutrition of what you eat?
    (4 votes)
  • Lunalgaleo
    What happens in a mutation where the stop codon is removed or altered? What does the cell do then? Does it perform apoptosis?
    (4 votes)
    • Ivana - Science trainee
      There are repair mechanisms. The one that applies here is called nonstop decay. That mechanism is able to detect mRNA that has no stop codon. It has to detach the mRNA from the ribosome so that the ribosome can translate the next mRNA sequence.

      Nonstop decay is the mechanism of identifying and disposing of aberrant transcripts that lack in-frame stop codons. It is hypothesized that these transcripts are identified during translation when the ribosome arrives at the 3′ end of the mRNA and stalls. Presumably the ribosome stalling recruits additional cofactors, Ski7 and the exosome complex. The exosome degrades the transcript using either one of its ribonucleolytic activities, and the ribosome and the peptide are both released. Additional precautionary measures by the nonstop decay pathway may include translational repression of the nonstop transcript after translation, and proteolysis of the released peptide by the proteasome.

      https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3638749/
      (4 votes)
  • Priyanka
    It is mentioned in The Genetic Code, that,
    One codon is a "start" codon that indicates where to start translation. The start codon specifies the amino acid methionine, so most polypeptides begin with this amino acid.

    AUG codes for methionine, which contains sulfur. In the Hershey-Chase experiment, they made use of the fact that all proteins contain sulfur (because of the presence of methionine, I guess)
    Are there proteins which do not begin with methionine?
    (4 votes)
    • tyersome
      There are, but this is (usually) due to removal or modification of the amino-terminal (start) methionine. For example enzymes called "methionine amino-peptidases" cut off this amino acid from the beginning of some proteins — this is an example of what is known as a "post-translational modification".

      It is also quite common for the first part of a protein (including the starting methionine) to be removed during processing — an example is secreted proteins that have their signal sequences removed during secretion or membrane insertion.

      Methionines can also be oxidized to form chemically related residues.
      (4 votes)
  • malekkazar
    Can a DNA strand end at the 3' end with a phosphate as the last group? Why not?
    (5 votes)
  • ForgottenUser
    Why do the number of A's on the poly-A tail vary?
    (3 votes)
    • leonardodebo
      Each time an mRNA is read, an "A" of the poly-A tail is cut off; when there are no more "A"s in the tail, the mRNA can be degraded. One mRNA (let's call it mRNA 1) can have more "A"s in its tail than another (mRNA 2), depending on how much the cell needs product 1 compared with product 2.
      (5 votes)
  • Jenna Kim
    I don't understand: difference between non-coding strand, primary transcript, and coding strand?

    Is the non-coding strand the strand of DNA that isn't used as the template, and the coding strand the strand of DNA that is used as the template? But don't both strands of DNA become "templates" for the RNA, so both of them are coding strands and non-coding strands, depending on which one you're looking at?

    Thanks in advance.
    (3 votes)
    • Shane McGookey
      Consider the directionality of the strands to help distinguish between them. One strand runs in the 3' to 5' direction, while the other strand runs in the 5' to 3' direction. This begets the antiparallel structure of the DNA.

      We refer to the strand that runs in the 3' to 5' direction as the template strand (the template strand is used to transcribe mRNA by matching complementary RNA nucleotides with the nucleotides of the template strand). The strand that runs in the 5' to 3' direction is referred to as the coding strand. Because the 5' to 3' strand is referred to as the coding strand, the template strand (the 3' to 5' strand) can also be referred to as the non-coding strand.

      The nomenclature is a bit confusing, but hopefully this adds some clarity to the naming.
      (5 votes)