The Medical Implications of Complete Mitochondrial DNA Sequencing
Ian Logan
Until very recently, the mitochondrial DNA sequencing offered to the genealogical community covered only parts of the genome considered to be medically unimportant; however, anyone can now have a test that provides the complete mitochondrial DNA (mtDNA) sequence. The results of this test may have medical implications, and this article discusses the factors showing this. The subjects of mitochondrial science, mutations, mutation lists, protein coding and transfer RNA are introduced to provide a background to understanding the origin of various mitochondrial diseases. The condition of Leber's Hereditary Optic Neuropathy and the diseases associated with mutations in the tRNA for the amino acid isoleucine are used as examples. Two complete mtDNA sequences are analysed to show how a report on their mutations can be produced, and the risk to a subject of getting a report listing potentially harmful mutations is discussed. Mention is made of various instances when complete mitochondrial sequencing may be requested under the guise of genealogy when it is really being done for medical reasons. Complete mitochondrial DNA sequencing should not be undertaken without careful consideration.
Received:
Address for correspondence: Ian Logan, ilbg18230@blueyonder.co.uk
1. Introduction
The sequencing of mitochondrial DNA (mtDNA) has become very popular with both amateur and professional genealogists over the last few years. The subject has been made particularly interesting by Professor Bryan Sykes's book, The Seven Daughters of Eve (Sykes, 2001), in which he discusses the concept that all people of European descent have as their female ancestors just a few women who lived many thousands of years ago. His ideas are part of the general theory that
all of mankind is descended from a “Mitochondrial
Eve” who lived in
However, the mtDNA sequencing done until recently covered only parts of the genome considered to be medically unimportant, meaning that there was no proven connection between the results obtained and known illnesses. Sequencing of mtDNA was therefore considered harmless to the person being tested, and there was no reason for anyone to hesitate over being tested. It is now possible, though, to sequence the whole of the mtDNA genome rather than just 4-8% of it, and the medical implications are beginning to give concern.
Many medical conditions
are already known to be caused by changes in the mtDNA and the whole area of
knowledge continues to expand rapidly.
So now it is perhaps not a question of having abnormal mtDNA as opposed to normal,
but rather a matter of what harm is
being caused to a person's health by the particular mtDNA they have inherited.
It is therefore no longer possible to say that sequencing the mtDNA genome is a harmless procedure, and this article discusses the factors that show this. However, because the underlying science may be new to many readers who do not have a scientific background, a number of basic scientific details have to be explained first.
2. Mitochondrial Science
Each mature nucleated cell of the body contains several hundred mitochondria (Taylor 2005). These are small structures involved in energy production within the cell. A mitochondrion may be considered as a sort of power plant, with fuel being taken in and energy produced. In each of the mitochondria there are circular strands
of genetic material made up of deoxyribonucleic acid. This is the mitochondrial DNA, or mtDNA. The structure of one of these strands was
first determined in 1981 at the

Figure 1 A Simplified View of mtDNA.
(Adapted
from: http://www.mitomap.org/) The circular strand of mtDNA is shown with
nucleotide base 1 at the
The original subject was a
person of European descent and her sequence now forms the Cambridge Reference Sequence (CRS).
In this sequence the nucleotide bases are numbered from 1 to 16,569; and
subsequently, all other mtDNA sequences have been compared to the CRS. Differences between sequences are normally
given as a simple list.
The purpose of the mtDNA is to carry encoded
instructions for certain products in the cell.
In it, there are coding regions for two ribosomal RNA genes, 13 genes
for the production of proteins involved in oxidative
phosphorylation of fatty acids and carbohydrates (Taylor 2005), and, perhaps
most interestingly in the present context, the coding for 22 transfer RNA
(tRNA) genes for the 20 amino acids that are the building blocks of
proteins. There are also parts of the
mtDNA that appear to have no function, other than perhaps to ensure the correct
shape to the overall circular form of the mtDNA. The largest of these
non-functional areas contains the hypervariable
regions, HVR-1 and HVR-2.
The actions of
mitochondria are mainly concerned with the energy processes within cells; and
whilst a single mitochondrion is very small and has an insignificant influence
on any particular cell, the cumulative effect of mitochondria on health and
illness in a person can become clear when one considers that about 2 billion
mitochondria are made every second throughout a person’s life. This estimate is obtained by considering the
human body as having 75 trillion cells, each with 250 mitochondria and having
an average life of around 100 days.
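The arithmetic behind this estimate can be checked in a few lines. The sketch below uses only the three figures quoted above; the variable names are illustrative, and it assumes that every mitochondrion is replaced once per cell lifetime.

```python
# Back-of-envelope check of the "about 2 billion per second" estimate,
# using only the figures quoted in the text.
CELLS_IN_BODY = 75e12           # 75 trillion cells
MITOCHONDRIA_PER_CELL = 250
CELL_LIFETIME_DAYS = 100        # average life of a cell
SECONDS_PER_DAY = 24 * 60 * 60

total_mitochondria = CELLS_IN_BODY * MITOCHONDRIA_PER_CELL
lifetime_seconds = CELL_LIFETIME_DAYS * SECONDS_PER_DAY

# Assumption: each mitochondrion is replaced once per cell lifetime,
# so this many must be made every second.
made_per_second = total_mitochondria / lifetime_seconds
print(f"{made_per_second:.2e} mitochondria per second")
```

The result is a little over 2 billion per second, consistent with the rounded figure given in the text.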
Until recently, sequencing of the mtDNA for genealogical purposes was restricted to the hypervariable regions, and many thousands of subjects have already been tested. But now it is possible to sequence the whole mitochondrial genome for any individual at a reasonable cost; in addition, about 2,500 complete mtDNA sequences have already been published from scientific studies. It is the data from these published complete mtDNA sequences that provide the material for this article.
3. Mitochondrial Nucleotides
The mtDNA genome is
considered as a circular strand of DNA consisting of 16,569 nucleotide bases
and for historical reasons the numbering of the bases begins at an arbitrary
place—the start of what is termed “the hypervariable HVR-2 region.”
As in other DNA, there are four nucleotide bases: adenine, guanine, cytosine and thymine; from now on these will be referred to by their initial letters, i.e. A, G, C and T. The actual mtDNA is made up of a double strand, but for simplicity only one strand is normally shown. The second strand is complementary, by which is meant that a 'T' links with an 'A', an 'A' with a 'T', a 'C' with a 'G' and a 'G' with a 'C'. The pairings of A-T and C-G are known as “Watson-Crick base pairings” after the discoverers of the structure of DNA.
It is a feature of mtDNA
that some of the coding is read on one strand going forward in one direction,
whilst other coding is read from the complementary
strand in the opposite direction.
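The pairing rule, and the idea of reading the second strand in the opposite direction, can be illustrated with a short sketch. The function names are illustrative only; the example input is the first ten bases of the CRS.

```python
# Watson-Crick pairings as described above: A-T and C-G.
PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand: str) -> str:
    """Return the base-by-base complement of a strand."""
    return "".join(PAIRING[base] for base in strand.upper())

def reverse_complement(strand: str) -> str:
    """Return the complementary strand as read in the opposite direction."""
    return complement(strand)[::-1]

first_ten = "GATCACAGGT"              # bases 1-10 of the CRS
print(complement(first_ten))          # CTAGTGTCCA
print(reverse_complement(first_ten))  # ACCTGTGATC
```

Coding that is read from the complementary strand corresponds to the second form: the paired bases, taken in reverse order.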
It is not practical to
print the full sequence of 16,569 bases that makes up the Cambridge Reference
Sequence (CRS) in this article, but Table
1 is included to show the first 70 nucleotide bases. The entire sequence may be found at:
http://www.mitomap.org/mitoseq.htm
4. Mitochondrial Mutations
mtDNA sequences from persons who are maternally related to each other are normally identical; but it is found that about once every 2,000 years in a given lineage, one of the 16,569 nucleotide bases alters. This is a
ra
Table 1 First 70 Bases of the CRS
The bases are numbered from 1 to 70 and form the first part of the hypervariable region HVR-2. The sequence is often split into groups of 10 bases to aid counting.

gatcacaggt ctatcaccct attaaccact cacgggagct ctccatgcat ttggtatttt cgtctggggg
1          11         21         31         41         51         61
In most instances the change is of a “C” being replaced by a “T”, or vice-versa, or an “A” being replaced by a “G”, or vice-versa; the remaining possibilities occur only uncommonly. It is also possible for nucleotide bases to be added (these are called insertions) or for bases to be lost (these are called deletions).
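The two common kinds of change (C with T, and A with G) are conventionally called transitions, and the uncommon remaining possibilities transversions. These terms are standard but are not used in this article, and the function below is only an illustrative sketch of the distinction.

```python
# A and G are purines; C and T are pyrimidines.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def classify_substitution(original: str, replacement: str) -> str:
    """Label a single-base change as a transition or a transversion."""
    pair = {original.upper(), replacement.upper()}
    if len(pair) != 2 or not pair <= PURINES | PYRIMIDINES:
        raise ValueError("expected two different bases from A, C, G, T")
    # A change within the same chemical class is a transition.
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"

print(classify_substitution("C", "T"))  # transition
print(classify_substitution("A", "G"))  # transition
print(classify_substitution("G", "C"))  # transversion
```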
Mutations cause the mtDNA sequences from persons who are not closely related to be different. It follows that the number and type of mutations between two persons give a measure of their relatedness. Consequently, two persons who are both European, for example, will have fewer mutational differences than two persons who are from very different parts of the world.
To illustrate this point, Table 2 shows a typical list of
differences from the CRS. The results
come from an Italian (Achilli 2004). The
list here is short as the subject is being compared against the CRS individual
who is considered to come from a more northerly part of the same continent,
Table 2 A Typical "Mutation" List for a European
The mtDNA from an Italian subject shows differences from the CRS at the following places:

T152C    A263G    315.C    A750G    A1438G   G3591A   A4310G
A4769G   A8860G   T9148C   T13020C  A15326G  C16168T  T16519C

Note: The T152C shows that the 'T' at position 152 in the CRS is replaced by 'C' and the 315.C shows there is an insertion of a 'C' between 315 & 316.
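The notation explained in the note to Table 2 is regular enough to be read by a short program. The sketch below is a hypothetical parser, not part of any published tool: the names `Difference` and `parse_difference` are invented for illustration, it handles only substitutions and insertions of the two forms shown, and it tolerates entries that omit the CRS base (such as the 825A that appears in Table 3). No form for deletions is handled because none is shown here.

```python
import re
from typing import NamedTuple, Optional

class Difference(NamedTuple):
    position: int
    kind: str                  # "substitution" or "insertion"
    crs_base: Optional[str]    # None when the entry does not give it
    new_base: str

# e.g. "T152C", or "825A" when the CRS base is omitted
_SUBSTITUTION = re.compile(r"^([ACGT]?)(\d+)([ACGT])$")
# e.g. "315.C": an insertion after position 315
_INSERTION = re.compile(r"^(\d+)\.([ACGT]+)$")

def parse_difference(entry: str) -> Difference:
    """Parse one entry of a list of differences from the CRS."""
    match = _INSERTION.match(entry)
    if match:
        return Difference(int(match.group(1)), "insertion", None, match.group(2))
    match = _SUBSTITUTION.match(entry)
    if match:
        return Difference(int(match.group(2)), "substitution",
                          match.group(1) or None, match.group(3))
    raise ValueError(f"unrecognised entry: {entry!r}")

# The Italian subject's list from Table 2.
ITALIAN = ("T152C A263G 315.C A750G A1438G G3591A A4310G A4769G "
           "A8860G T9148C T13020C A15326G C16168T T16519C").split()

parsed = [parse_difference(entry) for entry in ITALIAN]
substitutions = [d for d in parsed if d.kind == "substitution"]
print(len(parsed), "differences, of which", len(substitutions), "are substitutions")
```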
Table 3 shows the list
of differences from CRS for a person from
Table 3 A More Complicated "Mutation" List
The mtDNA from a Ugandan subject, from Haplogroup L0, shows differences from the CRS at the following places:

G143A    T146C    T152C    G185A    A189G    G247A    A263G
315.C    A750G    G769A    825A     A978G    G1018A   C1048T
A1438G   G1719A   A2245G   A2706G   G2758A   C2772T   C2789T
T2885C   C3107T   C3516A   C3594T   C3852T   A4104G   C4194T
C4312T   A4562G   T4586C   A4769G   C4964T   C5321T   T5442C
C5603T   T6185C   A6359G   C7028T   A7146G   T7148C   C7256T
G7521A   C8468T   C8655T   A8701G   A8860G   C9042T   A9347G
T9540C   T9581C   C9620T   C9818T   G10143A  A10398G  G10589A
C10664T  G10688A  T10790C  T10810C  T10873C  T10915C  T11287C
T11299C  A11641G  G11719A  G11914A  G12007A  C12705T  A13105G
A13276G  A13470G  C13506T  C13650T  C13680T  G13708A  G13928C
C14109T  C14620T  C14766T  C15136T  A15326G  G15431A  T15852C
G16129A  C16169T  T16172C  C16173T  C16187T  T16189C  C16223T
A16230G  16239T   C16278T  T16311C  C16327T  T16368C  T16519C
5. Phylogenetic Trees
By studying the mutations
that are found in different sequences it is possible to draw a tree to show how different populations
have branched away from the line of mutations that leads from 'mitochondrial Eve' to the European person
whose sequence has been chosen as the CRS.
Each of these populations
has been given a label called a haplogroup; and the CRS sequence belongs
to Haplogroup H2, the Italian sequence, above, to Haplogroup H9 and the African
sequence, above, to Haplogroup L0.
Table 4
shows the probable mutations from “mitochondrial
Eve” (mtEve) to the CRS. All the mutation sites are to be found in the
Ugandan’s mutation list, given above, as Haplogroup L0 is the first population
that branches off. However, the Ugandan subject’s
list of differences from CRS has twice as many mutations as appear in Table 4
because his list must include not only his mutations from mtEve, but also the
reverse of the mutations between mtEve and CRS.
Table 4 Mutation Sites from “Mitochondrial Eve” to the CRS (Omitting hypervariable areas)

Line from Branch Point to Branch Point*   Mutations Along this Line at these Locations
From mtEve to BP-to-L1                    4312 6185 9755 10589 11914 12007
From BP-to-L1 to BP-to-L5                 2758 2885 7146 8468
From BP-to-L5 to BP-to-L2                 825 8655 10688 10810 13105 13506 15301
From BP-to-L2 to BP-to-L4                 3594 4104 7256 7521 13650
From BP-to-L4 to L3                       769 1018
From L3 to N                              8701 9540 10398 10873 15301
From N to R                               12705
From R to HV                              11719 14766
From HV to CRS                            750 1438 2706 4769 7028 8860 15326

* “Branch Point” or “BP” means a point on the line in the mtDNA phylogenetic tree between mtEve and CRS, where a branch leads to another named haplogroup. These branch points are shown in Figure 1 in the article by Kivisild (2005). A detailed discussion of the reasons for including each of the mutations given above will be the subject of a forthcoming article in JoGG.
Note: 15301 has changed twice.
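The bookkeeping in Table 4 can be made explicit with a short sketch. It counts how often each site changes along the line from mtEve to the CRS and, under the assumption (made here for illustration, not stated in the table) that a second change at a site reverses the first, lists the sites at which the two sequences would finally differ.

```python
from collections import Counter

# Mutation sites for each segment of the line from mtEve to the CRS (Table 4).
TABLE_4 = {
    "mtEve -> BP-to-L1":    [4312, 6185, 9755, 10589, 11914, 12007],
    "BP-to-L1 -> BP-to-L5": [2758, 2885, 7146, 8468],
    "BP-to-L5 -> BP-to-L2": [825, 8655, 10688, 10810, 13105, 13506, 15301],
    "BP-to-L2 -> BP-to-L4": [3594, 4104, 7256, 7521, 13650],
    "BP-to-L4 -> L3":       [769, 1018],
    "L3 -> N":              [8701, 9540, 10398, 10873, 15301],
    "N -> R":               [12705],
    "R -> HV":              [11719, 14766],
    "HV -> CRS":            [750, 1438, 2706, 4769, 7028, 8860, 15326],
}

changes_per_site = Counter(site for sites in TABLE_4.values() for site in sites)
total_changes = sum(changes_per_site.values())
changed_twice = sorted(s for s, n in changes_per_site.items() if n == 2)

# Assumption: a second change at the same site undoes the first, so only
# sites changed an odd number of times differ between mtEve and the CRS.
net_sites = sorted(s for s, n in changes_per_site.items() if n % 2 == 1)

print(total_changes, "changes listed;", len(net_sites), "sites differ in the end")
print("changed twice:", changed_twice)
```

With the figures in Table 4 this gives 39 listed changes, one site (15301) that changes twice, and 37 sites at which mtEve and the CRS would differ outside the hypervariable areas.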
It is important, however, to appreciate that if the sequence chosen as the CRS had been the African sequence given above, European sequences would be seen as very different from the reference sequence. The choice is just a matter of history, and with hindsight perhaps the use of the hypothetical reference sequence for 'mitochondrial Eve' would have been best from all points of view. However, in the remainder of this article the normal convention, representing a sequence as a set of differences from the CRS, will be followed. Using this convention, the word "mutation" will mean a difference from the CRS.
6. The Genes Encoded in the mtDNA
The mtDNA genome has 15 areas of coding for genes. There are two genes which are involved in ribosomal function (ribosomes are other small structures in a cell) and 13 genes concerned with the biochemical process of oxidative phosphorylation.
The two ribosomal genes
code for ribosomal RNA. In this
article the function of the genes will not be discussed further as there does
not appear to be any definite medical condition associated with any of the many
known mutations in these areas of coding.
Presumably these mutations do not affect overall ribosomal function to any significant
degree. The mutations are, however,
useful when constructing phylogenetic trees. Table
5 gives the details of these genes.
Table 5 The Names and Locations of the Ribosomal Genes Found in mtDNA (Adapted from: http://www.mitomap.org/)

Name                 Location
12S ribosomal RNA    648 – 1601
16S ribosomal RNA    1671 – 3229
However, the 13 genes that encode proteins involved in oxidative phosphorylation are very important, as they are involved in many medical conditions. These genes are listed in Table 6.
Table 6 The Names and Locations of the Oxidative Phosphorylation Enzyme Genes Found in mtDNA (Adapted from: http://www.mitomap.org/)

Name                                        Location
NADH dehydrogenase subunit 1 (ND1)          3307 - 4263
NADH dehydrogenase subunit 2 (ND2)          4470 - 5513
NADH dehydrogenase subunit 3 (ND3)          10059 - 10404
NADH dehydrogenase subunit 4L (ND4L)        10470 - 10766
NADH dehydrogenase subunit 4 (ND4)          10760 - 12137
NADH dehydrogenase subunit 5 (ND5)          12337 - 14148
NADH dehydrogenase subunit 6 (ND6)          14149 - 14673 (Complement)
cytochrome c oxidase subunit I (COX1)       5904 - 7445
cytochrome c oxidase subunit II (COX2)      7586 - 8269
cytochrome c oxidase subunit III (COX3)     9207 - 9987
ATP synthase F0 subunit 6 (ATP6)            8527 - 9207
ATP synthase F0 subunit 8 (ATP8)            8366 - 8572
cytochrome b (CYTB)                         14747 - 15886

Note: Because of post-transcription processing of the last few bases in the RNA copy of the DNA, the number of locations is not always a multiple of three (see also Note 2 to Table 7).
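The note to Table 6 can be checked directly from the listed locations. The sketch below copies the start and end positions from the table, computes each gene's length, and reports which lengths are not a multiple of three; the helper `genes_at` (an illustrative name) also shows which gene or genes contain a given position.

```python
# Gene locations as listed in Table 6: (start, end), both inclusive.
GENES = {
    "ND1": (3307, 4263),   "ND2": (4470, 5513),   "ND3": (10059, 10404),
    "ND4L": (10470, 10766), "ND4": (10760, 12137), "ND5": (12337, 14148),
    "ND6": (14149, 14673), "COX1": (5904, 7445),  "COX2": (7586, 8269),
    "COX3": (9207, 9987),  "ATP6": (8527, 9207),  "ATP8": (8366, 8572),
    "CYTB": (14747, 15886),
}

def gene_length(name: str) -> int:
    """Number of nucleotide positions covered by a gene."""
    start, end = GENES[name]
    return end - start + 1

def genes_at(position: int) -> list:
    """Names of the protein-coding genes whose location includes a position."""
    return [name for name, (start, end) in GENES.items()
            if start <= position <= end]

not_multiple_of_three = [name for name in GENES if gene_length(name) % 3 != 0]

print(not_multiple_of_three)  # ['ND3', 'ND4', 'COX3']
print(genes_at(3591))         # ['ND1']
print(genes_at(8550))         # ['ATP6', 'ATP8']
```

With the locations as printed, three genes have lengths that leave a remainder when divided by three. The lookup for position 3591, the site of G3591A in Table 2, falls in ND1, and position 8550 shows that the listed ranges for ATP6 and ATP8 overlap.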
These genes all code for protein using a triplet-codon system, which means that a group of 3 consecutive nucleotides identifies an amino acid for the protein.
To illustrate this, 18
nucleotide bases from near the start of the gene ND1 (beginning at 3331) are
shown with the corresponding amino acid coded by each triplet of bases:
CTC ATT GTA CCC ATT CTA ….
Leu Ile Val Pro Ile Leu ….
Table 7 shows the full details of the triplet-codon system.
Table 7 The Triplet-Codon System Used in Genes to Code for Amino Acids

Amino Acid             Triplet-codons    Amino Acid             Triplet-codons
Alanine (Ala)          GC*               Leucine (Leu)          TTA TTG CT*
Arginine (Arg)         AGA AGG CG*       Lysine (Lys)           AAA AAG
Asparagine (Asn)       AAC AAT           Methionine (Met)       ATA ATG
Aspartic Acid (Asp)    GAC GAT           Phenylalanine (Phe)    TTC TTT
Cysteine (Cys)         TGC TGT           Proline (Pro)          CC*
Glutamic acid (Glu)    GAA GAG           Serine (Ser)           AGC AGT TC*
Glutamine (Gln)        CAA CAG           Threonine (Thr)        AC*
Glycine (Gly)          GG*               Tryptophan (Trp)       TGG
Histidine (His)        CAC CAT           Tyrosine (Tyr)         TAC TAT
Isoleucine (Ile)       ATC ATT           Valine (Val)           GT*

And the beginning and end of coding areas usually have one of the triplet-codons:

START    ATA ATG ATT
STOP     TAA TAG TGA
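Reading a gene three bases at a time can be sketched as follows. The table in the code is deliberately partial: it holds only a handful of entries from Table 7, it assumes that the asterisk stands for any of the four bases, and it reads ATT as isoleucine, as in the ND1 example above. The names `expand` and `translate` are illustrative and do not refer to any particular library.

```python
BASES = "ACGT"

# A partial codon table with a few entries taken from Table 7.
# Assumption: "*" means that any base may occupy the third position.
PARTIAL_TABLE_7 = {
    "Ala": ["GC*"],
    "Gly": ["GG*"],
    "Ile": ["ATC", "ATT"],   # ATT read as Ile, as in the ND1 example
    "Leu": ["TTA", "TTG", "CT*"],
    "Met": ["ATA", "ATG"],
    "Pro": ["CC*"],
    "Thr": ["AC*"],
    "Val": ["GT*"],
}

def expand(pattern: str) -> list:
    """Expand a pattern such as 'GC*' into its four concrete codons."""
    if pattern.endswith("*"):
        return [pattern[:2] + base for base in BASES]
    return [pattern]

CODON_TO_AMINO_ACID = {
    codon: amino_acid
    for amino_acid, patterns in PARTIAL_TABLE_7.items()
    for pattern in patterns
    for codon in expand(pattern)
}

def translate(sequence: str) -> list:
    """Split a sequence into triplets and look each one up.

    Codons missing from the partial table are reported as '???';
    trailing bases that do not make a full triplet are ignored.
    """
    sequence = sequence.upper()
    usable = len(sequence) - len(sequence) % 3
    codons = [sequence[i:i + 3] for i in range(0, usable, 3)]
    return [CODON_TO_AMINO_ACID.get(codon, "???") for codon in codons]

# The 18 bases of ND1 beginning at position 3331.
print(translate("CTCATTGTACCCATTCTA"))
```

For the 18 bases quoted from ND1 this reproduces the sequence Leu Ile Val Pro Ile Leu given in the text.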