The Medical Implications of Complete Mitochondrial DNA Sequencing

 

Ian Logan

 

 

Until very recently, mitochondrial DNA sequencing for the genealogical community covered only parts of the genome considered to be medically unimportant; however, anyone can now have a test that provides the complete mitochondrial DNA (mtDNA) sequence.  The results of this test may have medical implications, and this article discusses the factors that show this. The subjects of mitochondrial science, mutations, mutation lists, protein coding and transfer RNA are introduced to provide a background to understanding the origin of various mitochondrial diseases.  The condition of Leber's Hereditary Optic Neuropathy and the diseases associated with mutations in the tRNA for the amino acid isoleucine are used as examples.  Two complete mtDNA sequences are analysed to show how a report on their mutations can be produced, and the risk to a subject of getting a report listing potentially harmful mutations is discussed. Mention is made of various instances in which complete mitochondrial sequencing may be requested under the guise of genealogy when it is really being done for medical reasons.  Complete mitochondrial DNA sequencing should not be undertaken without careful consideration.

 

 

Received:  October 2, 2005, accepted November 2, 2005

 

Address for correspondence:   Ian Logan, ilbg18230@blueyonder.co.uk

 

 



1. Introduction

 

The sequencing of mitochondrial DNA (mtDNA) has become very popular with both amateur and professional genealogists over the last few years.  This subject has been made particularly interesting by Professor Bryan Sykes's book, The Seven Daughters of Eve (Sykes, 2001) in which he discusses the concept that all people of European descent have as their female ancestors just a few women who lived many thousands of years ago.  His ideas are part of the general theory that all of mankind is descended from a “Mitochondrial Eve” who lived in Africa about 180,000 years ago.

 

However, the mtDNA sequencing done until recently covered only parts of the genome considered to be medically unimportant, meaning that there was no proven connection between the results obtained and known illnesses.  Sequencing of mtDNA was therefore considered harmless to the person being tested, and there was no reason for anyone to hesitate over being tested.  However, it is now possible to sequence the whole of the mtDNA genome, rather than just 4-8%, and the medical implications are beginning to give cause for concern.

 

Many medical conditions are already known to be caused by changes in the mtDNA and the whole area of knowledge continues to expand rapidly.  So now it is perhaps not a question of having abnormal mtDNA as opposed to normal, but rather a matter of what harm is being caused to a person's health by the particular mtDNA they have inherited.

 

It is therefore no longer possible to say that sequencing the mtDNA genome is a harmless procedure, and in this article the factors that show this are discussed. However, because the underlying science may be new to readers who do not have a scientific background, a number of basic scientific details have to be explained first.

 

 

2. Mitochondrial Science

 

Each mature nucleated cell of the body contains several hundred mitochondria (Taylor 2005).  These are small structures involved in energy production within the cell.  A mitochondrion may be considered as a sort of power plant, with fuel being taken in and energy produced. In each of the mitochondria there are circular strands of genetic material made up of deoxyribonucleic acid.  This is the mitochondrial DNA, or mtDNA.  The structure of one of these strands was first determined in 1981 at the University of Cambridge (Anderson 1981). This structure can be called the mtDNA genome, or the mtDNA sequence.  Figure 1 shows a simplified view of mtDNA.

 

 

 

Figure 1  A Simplified View of mtDNA.

(Adapted from: http://www.mitomap.org/)  The circular strand of mtDNA is shown with nucleotide base 1 at the 12 o'clock position. By convention, the numbering of the nucleotide bases goes in an anti-clockwise direction.  The coding areas for the genes are shown in red, with their corresponding labels outside (see Table 6 for the full names).  The tRNAs are shown in blue with the three-letter abbreviations for the amino acids inside (see Table 7 for the full names).

 

 

 

The original subject was a person of European descent and her sequence now forms the Cambridge Reference Sequence (CRS).  In this sequence the nucleotide bases are numbered from 1 to 16,569; and subsequently, all other mtDNA sequences have been compared to the CRS.   Differences between sequences are normally given as a simple list.

 

 The purpose of the mtDNA is to carry encoded instructions for certain products in the cell.  In it, there are coding regions for two ribosomal RNA genes, 13 genes for the production of proteins involved in oxidative phosphorylation of fatty acids and carbohydrates (Taylor 2005), and, perhaps most interestingly in the present context, the coding for 22 transfer RNA (tRNA) genes for the 20 amino acids that are the building blocks of proteins.  There are also parts of the mtDNA that appear to have no function, other than perhaps to ensure the correct shape to the overall circular form of the mtDNA. The largest of these non-functional areas contains the hypervariable regions, HVR-1 and HVR-2.

 

The actions of mitochondria are mainly concerned with the energy processes within cells; and whilst a single mitochondrion is very small and has an insignificant influence on any particular cell, the cumulative effect of mitochondria on health and illness in a person can become clear when one considers that about 2 billion mitochondria are made every second throughout a person’s life.  This estimate is obtained by considering the human body as having 75 trillion cells, each with 250 mitochondria and having an average life of around 100 days.
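
The arithmetic behind this estimate can be sketched as follows. The figures are those quoted above; the variable names, and the simplifying assumption that every mitochondrion is replaced once per cell lifetime, are illustrative only.

```python
# Figures quoted in the text; the names are illustrative, not standard terms.
cells_in_body = 75e12            # 75 trillion cells
mitochondria_per_cell = 250
cell_lifetime_days = 100         # average life of a cell
seconds_per_day = 24 * 60 * 60

total_mitochondria = cells_in_body * mitochondria_per_cell
lifetime_seconds = cell_lifetime_days * seconds_per_day

# Assumption: the whole population is renewed once per cell lifetime, so this
# many mitochondria must be made each second to keep the total constant.
made_per_second = total_mitochondria / lifetime_seconds

print(f"{made_per_second:.2e} mitochondria made per second")
```

The result is a little over 2 billion per second, which is the order of magnitude given in the text.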

 

Until recently sequencing of the mtDNA for genealogical purposes was restricted to the hypervariable regions; and many thousands of subjects have already been tested.  But now it is possible to sequence the whole mitochondrial genome for any individual at a reasonable cost; in addition, already about 2,500 complete mtDNA sequences have been published from scientific studies.  It is the data from these published complete mtDNA sequences that provide the material for this article.

         

 

3. Mitochondrial Nucleotides

 

The mtDNA genome is considered as a circular strand of DNA consisting of 16,569 nucleotide bases and for historical reasons the numbering of the bases begins at an arbitrary place—the start of what is termed “the hypervariable HVR-2 region.”

 

As in other DNA, there are four nucleotide bases, Adenine, Guanine, Cytosine and Thymine; from now on these will be referred to by their initial letters, i.e. A, G, C and T.  The actual mtDNA is made up of a double strand, but for simplicity only one strand is normally shown.  The second strand is complementary, by which is meant that a 'T' links with an 'A', an 'A' with a 'T', a 'C' with a 'G' and a 'G' with a 'C'. The pairings of A-T and C-G are known as “Watson-Crick base pairings” after the discoverers of the structure of DNA.

 

It is a feature of mtDNA that some of the coding is read on one strand going forward in one direction, whilst other coding is read from the complementary strand in the opposite direction.
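
As a minimal sketch of these two ideas, the following Python derives the complementary strand from the strand that is normally shown, and then reads it in the opposite direction. The function names are illustrative; the input is the first ten bases of the CRS as given in Table 1.

```python
# Watson-Crick pairings: A with T, C with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand):
    """Return the complementary strand, base by base, in the same order."""
    return "".join(COMPLEMENT[base] for base in strand.upper())


def reverse_complement(strand):
    """Return the complementary strand as read in the opposite direction."""
    return complement(strand)[::-1]


first_ten = "gatcacaggt"   # bases 1 to 10 of the CRS (Table 1)
print(complement(first_ten))
print(reverse_complement(first_ten))
```

A gene marked "(Complement)" in a location table is read from this second strand, so its coding appears in the reverse-complemented form of the printed sequence.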

 

It is not practical to print the full sequence of 16,569 bases that makes up the Cambridge Reference Sequence (CRS) in this article, but Table 1 is included to show the first 70 nucleotide bases.  The entire sequence may be found at:

 

http://www.mitomap.org/mitoseq.htm

 

 

4.  Mitochondrial Mutations

 

mtDNA sequences from persons who are maternally related to each other are normally identical; but it is found that about once every 2,000 years in a given lineage, one of the 16,569 nucleotide bases alters.  This is a random process.  Such an alteration is termed a mutation.

 

Table 1

First 70 Bases of the Cambridge Reference Sequence.

 

The bases are numbered from 1 to 70 and form the first part of the hypervariable region HVR-2. The sequence is often split into groups of 10 bases to aid counting.

 

gatcacaggt   ctatcaccct   attaaccact  cacgggagct  ctccatgcat  ttggtatttt  cgtctggggg

1            11           21          31          41          51          61

 


In most instances the change is of a “C” being replaced by a “T”, or vice-versa, or an “A” being replaced by a “G”, or vice-versa; the remaining possibilities occur only uncommonly.  It is also possible for nucleotide bases to be added (these are called insertions) or to be lost (these are called deletions).
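
The distinction between the common and the uncommon substitutions can be expressed in a few lines of Python. This is only a sketch: the labels "transition" and "transversion" are the usual genetic terms for the two classes and are not used elsewhere in this article, and the function name is illustrative.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(old_base, new_base):
    """Classify a single-base change as the common or the uncommon kind."""
    old_base, new_base = old_base.upper(), new_base.upper()
    valid = PURINES | PYRIMIDINES
    if old_base not in valid or new_base not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if old_base == new_base:
        raise ValueError("no change: both bases are the same")
    pair = {old_base, new_base}
    # C<->T and A<->G stay within one chemical class of base.
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"      # the common case described above
    return "transversion"        # the uncommon remaining possibilities


print(substitution_type("C", "T"))
print(substitution_type("A", "G"))
print(substitution_type("C", "A"))
```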

 

Mutations cause the mtDNA sequences from persons who are not closely related to be different.  It follows that the number and type of mutations between two persons give a measure of their relatedness. Consequently, two persons who are both European, for example, will have fewer mutational differences between them than two persons who are from very different parts of the world.

 

To illustrate this point, Table 2 shows a typical list of differences from the CRS.  The results come from an Italian subject (Achilli 2004).  The list here is short as the subject is being compared against the CRS individual, who is considered to come from a more northerly part of the same continent, Europe.  The length of the list suggests a common matrilineal ancestor perhaps 20,000-30,000 years ago.

Table 2

A Typical "Mutation" List for a European

 

The mtDNA from an Italian subject shows differences from the CRS at the following places:

 

   T152C   A263G   315.C   A750G  A1438G  G3591A  A4310G  A4769G  A8860G

  T9148C T13020C A15326G C16168T T16519C

 

Note: The T152C shows that the 'T' at position 152 in the CRS is replaced by 'C' and the 315.C shows there is an insertion of a 'C' between 315 & 316.
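
The notation explained in the note is regular enough to be read by a short program. The sketch below, with illustrative names throughout, parses the list from Table 2 into position, original base, new base and kind of change. It also accepts entries written without the original base, which is an assumption about how such entries should be read.

```python
import re
from typing import NamedTuple, Optional


class Difference(NamedTuple):
    position: int
    old_base: Optional[str]   # base in the CRS, if the entry states it
    new_base: str             # base (or inserted bases) in the subject
    kind: str                 # "substitution" or "insertion"


# Optional CRS base, position, optional dot (marks an insertion), new base(s).
_ENTRY = re.compile(r"^([ACGT]?)(\d+)(\.?)([ACGT]+)$")


def parse_difference(entry):
    match = _ENTRY.match(entry.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised entry: {entry!r}")
    old_base, position, dot, new_base = match.groups()
    if dot:
        kind = "insertion"
    elif len(new_base) == 1:
        kind = "substitution"
    else:
        raise ValueError(f"unrecognised entry: {entry!r}")
    return Difference(int(position), old_base or None, new_base, kind)


# The Italian subject's list, transcribed from Table 2.
TABLE_2 = """
T152C A263G 315.C A750G A1438G G3591A A4310G A4769G A8860G
T9148C T13020C A15326G C16168T T16519C
"""

differences = [parse_difference(entry) for entry in TABLE_2.split()]
insertions = [d for d in differences if d.kind == "insertion"]

print(len(differences), "differences, of which", len(insertions), "insertion(s)")
```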

 


 

Table 3 shows the list of differences from the CRS for a person from Uganda (Macaulay 2005).  In this case the mutation list is very much longer, and supports the theory that the earliest African lineage, Haplogroup L0, and Europeans, as typified by the CRS, have been separate for approximately 180,000 years.  Kivisild (2005) gives the figure of 160,000 (range 138,000-182,000) years, based on a study of 277 complete mtDNA sequences.  But in the author's view the higher figure is to be preferred, and is supported by other studies (Macaulay 2005).

 

 

 

Table 3

A More Complicated "Mutation" List

 

The mtDNA from a Ugandan subject, from Haplogroup L0, shows differences from the CRS at the following places:

 

     G143A    T146C    T152C     G185A     A189G     G247A     A263G     315.C     A750G

     G769A     825A    A978G    G1018A    C1048T    A1438G    G1719A    A2245G    A2706G

    G2758A   C2772T   C2789T    T2885C    C3107T    C3516A    C3594T    C3852T    A4104G

    C4194T   C4312T   A4562G    T4586C    A4769G    C4964T    C5321T    T5442C    C5603T

    T6185C   A6359G   C7028T    A7146G    T7148C    C7256T    G7521A    C8468T    C8655T

    A8701G   A8860G   C9042T    A9347G    T9540C    T9581C    C9620T    C9818T   G10143A

   A10398G  G10589A  C10664T   G10688A   T10790C   T10810C   T10873C   T10915C   T11287C

   T11299C  A11641G  G11719A   G11914A   G12007A   C12705T   A13105G   A13276G   A13470G

   C13506T  C13650T  C13680T   G13708A   G13928C   C14109T   C14620T   C14766T   C15136T

   A15326G  G15431A  T15852C   G16129A   C16169T   T16172C   C16173T   C16187T   T16189C

   C16223T  A16230G   16239T   C16278T   T16311C   C16327T   T16368C   T16519C

 

 

 

 

5. Phylogenetic trees

 

By studying the mutations that are found in different sequences it is possible to draw a tree to show how different populations have branched away from the line of mutations that leads from 'mitochondrial Eve' to the European person whose sequence has been chosen as the CRS.

 

Each of these populations has been given a label called a haplogroup. The CRS sequence belongs to Haplogroup H2, the Italian sequence above to Haplogroup H9, and the African sequence above to Haplogroup L0.

 

Table 4 shows the probable mutations from “mitochondrial Eve” (mtEve) to the CRS. Almost all the mutation sites are to be found in the Ugandan’s mutation list, given above, as Haplogroup L0 is the first population that branches off; a site that has changed twice along the line, such as 15301, is a necessary exception.  However, the Ugandan subject’s list of differences from the CRS has about twice as many mutations as appear in Table 4, because his list must include not only his mutations from mtEve, but also the reverse of the mutations between mtEve and the CRS.

 

 

Table 4

Mutation Sites from “Mitochondrial Eve” to the CRS

 

(Omitting hypervariable areas)

 

Line from Branch Point
to Branch Point*              Mutations Along this Line at these Locations

 

From mtEve to BP-to-L1        4312   6185   9755  10589  11914  12007

From BP-to-L1 to BP-to-L5     2758   2885   7146   8468

From BP-to-L5 to BP-to-L2      825   8655  10688  10810  13105   13506   15301

From BP-to-L2 to BP-to-L4     3594   4104   7256   7521  13650

From BP-to-L4 to L3            769   1018    

From L3 to N                  8701   9540  10398  10873  15301

From N to R                  12705

From R to HV                 11719  14766

From HV to CRS                 750   1438   2706   4769   7028   8860  15326 

 

*  “Branch Point” or “BP” means a point on the line in the mtDNA phylogenetic tree between mtEve and CRS, where a branch leads to another named haplogroup.  These branch points are shown in Figure 1 in the article by Kivisild (2005).  A detailed discussion of the reasons for including each of the mutations given above will be the subject of a forthcoming article in JoGG.

 

Note: 15301 has changed twice.
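
The relationship between Table 3 and Table 4 can be checked mechanically. The following sketch, with illustrative variable names, uses positions transcribed from the two tables and reports which Table 4 sites do not appear in the Ugandan subject's list, and which sites appear more than once in Table 4. A site that changed twice is expected to be absent; any other absent site would need its own explanation, for example a later change at the same position in that lineage.

```python
import re
from collections import Counter

# Differences from the CRS for the Ugandan subject, transcribed from Table 3.
TABLE_3 = """
G143A T146C T152C G185A A189G G247A A263G 315.C A750G
G769A 825A A978G G1018A C1048T A1438G G1719A A2245G A2706G
G2758A C2772T C2789T T2885C C3107T C3516A C3594T C3852T A4104G
C4194T C4312T A4562G T4586C A4769G C4964T C5321T T5442C C5603T
T6185C A6359G C7028T A7146G T7148C C7256T G7521A C8468T C8655T
A8701G A8860G C9042T A9347G T9540C T9581C C9620T C9818T G10143A
A10398G G10589A C10664T G10688A T10790C T10810C T10873C T10915C T11287C
T11299C A11641G G11719A G11914A G12007A C12705T A13105G A13276G A13470G
C13506T C13650T C13680T G13708A G13928C C14109T C14620T C14766T C15136T
A15326G G15431A T15852C G16129A C16169T T16172C C16173T C16187T T16189C
C16223T A16230G 16239T C16278T T16311C C16327T T16368C T16519C
"""

# Mutation sites on the line from mtEve to the CRS, transcribed from Table 4.
TABLE_4 = {
    "mtEve to BP-to-L1": [4312, 6185, 9755, 10589, 11914, 12007],
    "BP-to-L1 to BP-to-L5": [2758, 2885, 7146, 8468],
    "BP-to-L5 to BP-to-L2": [825, 8655, 10688, 10810, 13105, 13506, 15301],
    "BP-to-L2 to BP-to-L4": [3594, 4104, 7256, 7521, 13650],
    "BP-to-L4 to L3": [769, 1018],
    "L3 to N": [8701, 9540, 10398, 10873, 15301],
    "N to R": [12705],
    "R to HV": [11719, 14766],
    "HV to CRS": [750, 1438, 2706, 4769, 7028, 8860, 15326],
}

# Keep only the position number from each Table 3 entry.
ugandan_positions = {int(p) for p in re.findall(r"\d+", TABLE_3)}
site_counts = Counter(p for sites in TABLE_4.values() for p in sites)

absent = sorted(p for p in site_counts if p not in ugandan_positions)
repeated = sorted(p for p, n in site_counts.items() if n > 1)

print("Distinct sites in Table 4:", len(site_counts))
print("Not in the Ugandan list:", absent)
print("Changed more than once:", repeated)
```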

 

 

 

It is important, however, to appreciate that if the African sequence given above had been chosen as the CRS, European sequences would be seen as very different from the reference sequence. The choice is just a matter of history, and with hindsight the use of a hypothetical reference sequence for 'mitochondrial Eve' would perhaps have been best from all points of view. However, in the remainder of this article the normal convention, representing a sequence as a set of differences from the CRS, will be followed. Using this convention, the word "mutation" will mean a difference from the CRS.

 

 

6. The Genes Encoded in the mtDNA

 

The mtDNA genome has 15 areas of coding for genes. There are two genes which are involved in ribosomal function (ribosomes are other small structures in a cell) and 13 genes concerned with the biochemical process of oxidative phosphorylation.

 

The two ribosomal genes code for ribosomal RNA. In this article the function of the genes will not be discussed further as there does not appear to be any definite medical condition associated with any of the many known mutations in these areas of coding.  Presumably these mutations do not affect overall  ribosomal function to any significant degree.  The mutations are, however, useful when constructing phylogenetic trees.  Table 5 gives the details of these genes.

 

 

Table 5  The Names and Locations of the Ribosomal Genes Found in mtDNA.

Adapted from: http://www.mitomap.org/

 

Name:                                    Location

 

12S ribosomal RNA                        648 – 1601

16S ribosomal RNA                       1671 – 3229

 


 

 

However, the 13 genes that encode for proteins involved in oxidative phosphorylation are very important as they are involved with many medical conditions. These genes are listed in Table 6.

 

Table 6

The Names and Locations of the Oxidative Phosphorylation Enzyme Genes Found in mtDNA

(Adapted from: http://www.mitomap.org/)

 

Name:                                             Location:

 

NADH dehydrogenase subunit 1 (ND1)         3307  -  4263

NADH dehydrogenase subunit 2 (ND2)         4470  -  5513

NADH dehydrogenase subunit 3 (ND3)        10059  - 10404

NADH dehydrogenase subunit 4L (ND4L)      10470  - 10766

NADH dehydrogenase subunit 4 (ND4)        10760  - 12137

NADH dehydrogenase subunit 5 (ND5)        12337  - 14148

NADH dehydrogenase subunit 6 (ND6)        14149  - 14673  (Complement)

cytochrome c oxidase subunit I (COX1)      5904  -  7445

cytochrome c oxidase subunit II (COX2)     7586  -  8269

cytochrome c oxidase subunit III (COX3)    9207  -  9987

ATP synthase F0 subunit 6 (ATP6)           8527  -  9207

ATP synthase F0 subunit 8 (ATP8)           8366  -  8572

cytochrome b (CYTB)                       14747  - 15886

 

Note:  Because of post-transcription processing of the last few bases in the RNA copy of the DNA, the number of locations is not always a multiple of three (see also Note 2 to Table 7).
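
The point made in the note can be checked directly from the locations in Table 6. This sketch, with illustrative names, counts the bases in each gene (both end positions included) and lists the genes whose length is not a multiple of three.

```python
# Start and end locations of the 13 protein-coding genes, from Table 6.
GENE_LOCATIONS = {
    "ND1": (3307, 4263),
    "ND2": (4470, 5513),
    "ND3": (10059, 10404),
    "ND4L": (10470, 10766),
    "ND4": (10760, 12137),
    "ND5": (12337, 14148),
    "ND6": (14149, 14673),   # read from the complementary strand
    "COX1": (5904, 7445),
    "COX2": (7586, 8269),
    "COX3": (9207, 9987),
    "ATP6": (8527, 9207),
    "ATP8": (8366, 8572),
    "CYTB": (14747, 15886),
}


def gene_length(start, end):
    """Number of bases from start to end, counting both end positions."""
    return end - start + 1


not_whole_triplets = [
    name
    for name, (start, end) in GENE_LOCATIONS.items()
    if gene_length(start, end) % 3 != 0
]

for name, (start, end) in GENE_LOCATIONS.items():
    length = gene_length(start, end)
    print(f"{name:5} {length:5} bases, remainder {length % 3}")

print("Not a multiple of three:", not_whole_triplets)
```

With the locations as printed in Table 6, the genes reported are ND3, ND4 and COX3, each with one base left over after the last complete triplet.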

 


 

 

These genes all code for protein using a triplet-codon system, which means that a group of 3 consecutive nucleotides identifies an amino acid for the protein.

 

To illustrate this, 18 nucleotide bases from near the start of the gene ND1 (beginning at 3331) are shown with the corresponding amino acid coded by each triplet of bases:

 

 

CTC   ATT   GTA   CCC   ATT   CTA   ….

 

Leu   Ile    Val    Pro    Ile    Leu   ….
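
The reading of this fragment can be reproduced with a short program. The sketch below is deliberately partial: its codon table holds only the entries needed for these 18 bases, taken from the worked example above (ATT read as Ile) and from the rows of Table 7 in which any third base gives the same amino acid (CT*, GT*, CC*). The names are illustrative.

```python
# Partial codon table: just enough entries for the ND1 fragment.
CODON_TO_AMINO_ACID = {}
for first_two, amino_acid in (("CT", "Leu"), ("GT", "Val"), ("CC", "Pro")):
    for third_base in "ACGT":          # '*' in Table 7 means any third base
        CODON_TO_AMINO_ACID[first_two + third_base] = amino_acid
CODON_TO_AMINO_ACID["ATC"] = "Ile"
CODON_TO_AMINO_ACID["ATT"] = "Ile"     # as in the worked example above


def split_into_codons(sequence):
    """Split a sequence into consecutive groups of three bases."""
    sequence = sequence.replace(" ", "").upper()
    if len(sequence) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return [sequence[i:i + 3] for i in range(0, len(sequence), 3)]


def translate(sequence):
    """Return the amino acid for each triplet; unknown codons raise KeyError."""
    return [CODON_TO_AMINO_ACID[codon] for codon in split_into_codons(sequence)]


nd1_fragment = "CTC ATT GTA CCC ATT CTA"   # ND1, beginning at 3331
print(" ".join(split_into_codons(nd1_fragment)))
print(" ".join(translate(nd1_fragment)))
```

Because the table is partial, a codon outside these few entries raises an error rather than being guessed.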

 

 

Table 7 shows the full details of the triplet-codon system.

 

Table 7

The Triplet-Codon System Used in Genes to Code for Amino Acids

 

Amino Acid           Triplet-codons     Amino Acid         Triplet-codons

 

Alanine(Ala)         GC*                Leucine(Leu)       TTA   TTG   CT*

Arginine(Arg)        AGA   AGG   CG*    Lysine(Lys)        AAA   AAG

Asparagine(Asn)      AAC   AAT          Methionine(Met)    ATA   ATG   ATT

Aspartic Acid(Asp)   GAC   GAT          Phenylalanine(Phe) TTC   TTT

Cysteine(Cys)        TGC   TGT          Proline(Pro)       CC*

Glutamic acid(Glu)   GAA   GAG          Serine(Ser)        AGC   AGT   TC*

Glutamine(Gln)       CAA   CAG          Threonine(Thr)     AC*

Glycine(Gly)         GG*                Tryptophan(Trp)    TGG

Histidine(His)       CAC   CAT          Tyrosine(Tyr)      TAC   TAT

Isoleucine(Ile)      ATC   ATT          Valine(Val)        GT*

 

And the beginning and end of coding areas usually have one of the triplet-codons:

 

START                ATA   ATG   ATT    STOP               TAA   TAG   TGA