The Continuing Hunt for Nuclear Mitochondrial
Abstract
Address for correspondence:
Received:
Introduction
The 46 chromosomes in the human genome contain many
hundreds of short sequences of bases that match sections of the
NUMTs
are found in the chromosomes of most species (Richly, 2004), and a wide variety
of species have been the subject of articles describing their NUMTs, including the domestic cat (Lopez, 1994; Antunes, 2007), and the ant (Martins, 2007).
A NUMT is formed by the incorporation of a fragment of
the mtDNA into a chromosome. This type
of event is very rare, but over a period of millions of years the number of
times this has happened has become appreciable. The formation of a NUMT is essentially a random event
and the fragment of mtDNA involved can be of any length, from just a few bases
to many thousands of bases, and any of the chromosomes can be involved. In many ways NUMTs
are considered to be “fossils” preserving the mtDNA sequence as it used to be
at various times in our evolutionary past.
After formation a NUMT becomes an ordinary part of the
chromosome and the integrity of its
As a result of these processes, the sequences of most NUMTs differ considerably from the sequence of modern mtDNA, and
identifying NUMTs can be considered to be a bit
of a “treasure hunt.” This has led
to different researchers, unsurprisingly, coming to differing conclusions as to
whether a particular part of a chromosome represents a NUMT; and, if so, just
where that NUMT begins and ends.
By comparing the sequence of bases in each NUMT against the sequence of modern mtDNA and counting the
number of differences between the sequences, it is possible to suggest an order for the
formation of NUMTs.
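The counting described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure used in the study: it assumes the sequences have already been aligned to the same length, it ignores insertions and deletions, and the sequences and names (`modern`, `numt_a`, and so on) are made up for the example. Fewer differences are taken to suggest a more recent origin, which in turn assumes broadly similar rates of change across NUMTs.

```python
def count_differences(numt: str, mtdna: str) -> int:
    """Count mismatching positions between two pre-aligned, equal-length sequences."""
    if len(numt) != len(mtdna):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(numt.upper(), mtdna.upper()) if a != b)


def order_by_divergence(numts: dict[str, str], mtdna: str) -> list[tuple[str, float]]:
    """Return (name, fraction of differing bases), least divergent first."""
    scored = [
        (name, count_differences(seq, mtdna) / len(mtdna))
        for name, seq in numts.items()
    ]
    return sorted(scored, key=lambda item: item[1])


# Toy, made-up sequences (not real mtDNA or NUMT data).
modern = "ACGTACGTACGTACGTACGT"
candidates = {
    "numt_c": "ATGTACCTACGTTCGAACGA",  # 5 differences from `modern`
    "numt_a": "ACGTACGTACGTACGTACGT",  # identical to `modern`
    "numt_b": "ACGTACCTACGTACGAACGT",  # 2 differences from `modern`
}

ranking = order_by_divergence(candidates, modern)
for name, fraction in ranking:
    print(f"{name}: {fraction:.0%} of bases differ")
```

The ranking lists the candidate with the fewest differences first, which under the stated assumptions would be the most recently formed.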
So when a sequence matches well against modern
The identification of NUMT sequences is of importance
to the study of genetic genealogy for two reasons. Firstly, it allows for suggestions to be made
as to which mutations might have occurred in the human mtDNA before the time
of “Mitochondrial Eve”, and secondly,
during the sequencing of human mtDNA, laboratories need to take care so as not
to amplify NUMT sequences and mistake them for mitochondrial
The study undertaken for this paper is not primarily
concerned with the number of NUMTs and their
positions in the human genome, something previously considered in detail by Mourier (2001), Tourmen (2002), Woischnik (2002), Hazkani-Covo
(2003), Bensasson (2003), Mishmar
(2004), Ricchetti (2004), Hazkani-Covo
(2007), and most recently by Lascaro (2008). Instead, this study concentrates on what
can be learnt from looking at the sequences themselves.
In particular, the study concentrates on the NUMT
sequences that match the coding sequences for the 22
transfer RNAs (tRNAs) found in modern mtDNA. In the mtDNA there is one tRNA sequence for each of 18 amino acids and two tRNA sequences for each of the amino acids leucine and serine.
Each of the tRNAs
can be represented as having a two-dimensional “cloverleaf” structure with
stems and loops. Figure 1 shows
the suggested structures for two of the tRNAs. All of the tRNAs
have a similar structure, but the sequences are sufficiently different from
each other that they are easily distinguished.

Figure 1. The two-dimensional structures for the tRNAs for isoleucine and cysteine.
Methods
Early studies of NUMTs
relied on the actual sequencing of chromosomal sequences (for an example of
this method, see Herrnstadt, 1999). But with the publication of the Human Genome,
and the genomes of several other species, it is now possible to identify NUMTs using computer search programs.
The genome sequences for the human (Homo sapiens sapiens), the chimpanzee (Pan troglodytes),
and the rhesus monkey (Macaca mulatta) can be found on the web site http://www.ncbi.nlm.nih.gov/mapview/.
For this study the genome sequences were examined for NUMTs using the Basic Local Alignment Search Tool, or
BLAST, and in particular the “BLASTN: Compare Nucleotide Sequences” program (Altschul, 1990).
In most instances the searches were made on the reference
only sequences as they are the sequences that have been shown to be common
to the various assemblies and can be assigned to the different chromosomes.
At present reference only sequences are available for:
Homo sapiens sapiens – build 36.3 – 368 sequences, covering 2,870,843,926 bases,
Pan troglodytes – build 2.1 – 32,296 sequences, covering 3,010,437,433 bases, and
Macaca mulatta – build 1.1 – 124,049 sequences, covering 3,011,952,279 bases.
The program BLASTN was used to compare nucleotide
sequences. Initially the program was
used with its default values. However,
the default Expect value of 0.01 limits the program to reporting
only close matches, while using an Expect value of 10 can allow
chromosomal sequences that match less well to be reported.
In the Advanced options it is also possible to
change the Word Size, which alters the sensitivity of the matching
algorithm. The default value is W11,
but using the parameter at its limit of W4 makes the search more sensitive and can be useful, although this
does make the program take a much longer time for each comparison.
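The effect of the Word Size can be illustrated with a toy example. BLAST-style programs begin by looking for short exact “words” shared by the query and the target, and only then try to extend them into longer alignments. The sketch below shows only that first step and is not a reimplementation of BLAST; the function name `shared_words` and both sequences are made up for the illustration.

```python
def shared_words(query: str, subject: str, word_size: int) -> set[str]:
    """Return every exact word of length `word_size` present in both sequences."""

    def words(seq: str) -> set[str]:
        return {seq[i:i + word_size] for i in range(len(seq) - word_size + 1)}

    return words(query) & words(subject)


# Made-up sequences: `subject` is `query` with three scattered base changes,
# so the longest run of unchanged bases is only six long.
query = "ACGTTGCAAGGCTTAACCGGATCG"
subject = "ACGTAGCAAGCCTTAACTGGATCG"

for w in (11, 4):
    print(f"word size {w}: {len(shared_words(query, subject, w))} shared words")
```

With a word size of 11 the two sequences share no exact word, so a search seeded this way would never start an alignment between them. With a word size of 4 several seeds are found, at the cost of many more candidate positions to examine, which is consistent with the longer run times noted above.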
Initially, the search string used with BLASTN was the
whole sequence of the Cambridge Reference Sequence (
Table 1 gives the
names of the amino acids, the locations of their corresponding tRNAs in the
Table 1
The 22 tRNA Coding Sequences in the

Results
The results of the present study are given here in
three sections.
NUMTs that match tRNA sequences
NUMTs of “recent origin”
NUMTs of “distant origin”
NUMTs that Match tRNA Sequences
For each tRNA sequence in
the
As an example, Table 2 shows the results of
searching the human genome for NUMTs that match the
sequence for the tRNA for the amino acid alanine. The table
identifies 32 NUMTs that satisfy the search
criteria. The NUMTs
vary from having part of their sequence matching exactly, to having a sequence in which
about a fifth of the bases have changed.
The table contains only those NUMTs with a
sequence that covers the whole of the tRNA
sequence. There are other NUMT sequences
which match partially, but for the purpose of this paper they have been
excluded.
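The selection rule described above, keeping only matches that span the whole of the tRNA sequence, can be expressed as a simple filter. The sketch assumes that each match is available with the first and last positions of the query that it covers (1-based, as search programs usually report them); the `Hit` class, its field names, and the example values are hypothetical and are not taken from the tables in this paper.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    name: str         # hypothetical label for the chromosomal match
    query_start: int  # 1-based first position of the tRNA query covered
    query_end: int    # 1-based last position of the tRNA query covered


def full_length_hits(hits: list[Hit], query_length: int) -> list[Hit]:
    """Keep only hits whose alignment spans the entire query sequence."""
    kept = []
    for hit in hits:
        # Reverse-strand matches may be reported with start > end.
        low, high = sorted((hit.query_start, hit.query_end))
        if low == 1 and high == query_length:
            kept.append(hit)
    return kept


query_length = 69  # illustrative length only
hits = [
    Hit("match_a", 1, 69),   # covers the whole query
    Hit("match_b", 5, 69),   # partial match, excluded
    Hit("match_c", 69, 1),   # whole query, reverse orientation
]
complete = full_length_hits(hits, query_length)
print([hit.name for hit in complete])
```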
Table 2
NUMTs That Match the tRNA Sequence in

It was found that the BLAST program did not produce
the complete set of matches in a single run when the modern mtDNA sequence was
used as a search string. However, when
these matches were in turn used as search strings, it was possible to find further
matches. This procedure was then repeated
until no more sequences were found.
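The repeated searching described above amounts to iterating until no new matches appear. A minimal sketch of that loop follows, with the search itself passed in as a function. In the study the search was a BLASTN run; here it is replaced by a toy stand-in (`toy_search`, which reports database entries differing from the query at no more than two positions) so that the example is self-contained. All names and sequences are made up.

```python
from typing import Callable, Iterable


def iterative_search(seed: str, search: Callable[[str], Iterable[str]]) -> set[str]:
    """Search with the seed, then with each new match, until nothing new is found."""
    found: set[str] = set()
    pending = [seed]
    already_searched: set[str] = set()
    while pending:
        query = pending.pop()
        if query in already_searched:
            continue
        already_searched.add(query)
        for hit in search(query):
            if hit not in found:
                found.add(hit)
                pending.append(hit)  # use the new match as a later query
    return found


# Toy stand-in for a sequence search (NOT BLAST): made-up database entries.
DATABASE = ["AAAAAAAACC", "AAAAAACCCC", "AAAACCCCCC", "GGGGGGGGGG"]


def toy_search(query: str) -> list[str]:
    """Report database entries with at most two mismatches against the query."""
    return [
        entry for entry in DATABASE
        if sum(a != b for a, b in zip(query, entry)) <= 2
    ]


seed = "AAAAAAAAAA"
single_pass = set(toy_search(seed))
all_matches = iterative_search(seed, toy_search)
print(len(single_pass), len(all_matches))
```

In this toy case a single search from the seed finds one entry, while the iterated search reaches two more that are too divergent from the seed to be found directly. Because each round can admit sequences further from the original query, any real use of this approach would need the matches to be checked individually.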
For the tRNA for alanine there are 2 NUMTs with
sequences that do not show any variation from the
Table 3 shows that a
similar pattern of NUMTs was found for the amino
acid arginine.
In this instance 27 NUMTs have
been identified, but none is of “recent origin.”
Table 3
NUMTs That Match the tRNA Sequence in

NUMTs of “Recent Origin”
In the human genome there is only one large NUMT of
“recent origin” and this was first identified by Herrnstadt
(1999). The NUMT was presumably formed
after the split with the chimpanzee as it is only to be found in the human genome, and is
not in the genomes of either the chimpanzee or the rhesus monkey. The hominid in whom
this occurred lived prior to “Mitochondrial Eve,” since this NUMT is more
divergent from

Figure 2. Formation of the “Herrnstadt” NUMT. Initially, the mtDNA was found only in
mitochondria, but the partial destruction of a mtDNA ring led to the passage of
a fragment into the nucleus where it became incorporated into chromosome 1.
Table 4
shows there are 85 differences between this NUMT and the


On chromosome 14 there is a second, but much smaller,
NUMT of “recent origin.” This NUMT is 1,021 bases in length and matches against
the


The recent paper by Hazkani-Covo
and Covo (2008) gives a list of NUMTs
of “recent origin”, most of which are very
short and do not match against a complete tRNA
sequence. For reasons that are not
entirely clear, the two NUMTs discussed above are not
on the list.
NUMTs of “Distant Origin”
The sequence of bases in a NUMT of “recent origin”
matches the
Tables 2 and 3 show the
details of NUMTs with sequences that match against
the tRNAs of alanine and arginine; and it is
possible to prepare a detailed analysis for any individual NUMT. However, there are some NUMTs
of particular interest as it has been possible to show that there are NUMTs that can be found in the genome of Homo sapiens
The best example of this type of NUMT that is common to
the human, chimpanzee and rhesus monkey has been found on chromosome 21. This NUMT, 1,851 bases in length, corresponds to
the part of the mtDNA containing the tRNAs
for tryptophan, alanine, asparagine, cysteine and
tyrosine. In the chimpanzee, Pan troglodytes, the whole of the NUMT is also found on chromosome
21. However, in the rhesus monkey, Macaca mulatta, where there is no chromosome 21, it is found on
chromosome 3.
The sequence from the genome of Homo sapiens shows a
considerable number of differences from the
The details of this NUMT are shown in Table 6.
Table 6
A NUMT of “Distant Origin” on Human Chromosome 21




Whereas the NUMT on chromosome 21 has been found to be
the largest NUMT that is common to the human, chimpanzee and rhesus monkey,
there are several other smaller NUMTs of this type.
Table 7
gives the details of a further 5 NUMTs that are found
on the Human chromosomes 3, 4, 8, and X.

Discussion
This paper has concentrated on identifying NUMTs in the human genome by using the BLAST program to
find matches against tRNA sequences in modern
mtDNA. This technique has led to the
identification of several NUMTs which are common to
the genomes of the Human, the Chimpanzee and the Rhesus Monkey. But developing these ideas has only been
possible by considering the published findings in various papers that have
appeared over the last few years. Actual
quotations from the papers are shown in italics.
The early researchers used a laboratory system which
involved bacterial clones, specially prepared primers and direct
sequencing. This method was very
laborious but was nevertheless quite successful.
For example, Nomiyama (1985)
used this system to identify 2 NUMTs, subsequently
shown to be located on chromosome 3 (GenBank numbers X02226, M12298); and even
then it was clear that NUMTs were old, as the author
suggested these 2 NUMTs “were transferred from mitochondria into nuclei about 12 and 15 millions of
years ago, respectively.”
Later, Herrnstadt (1999) used a
similar method to identify a NUMT on chromosome 1 (GenBank number
AF134583). This NUMT was shown to have a
length of 5,841 nucleotide bases. The
authors were able to link the NUMT to “a very distal portion of Chromosome 1” and in their
discussion they recognised that their NUMT was of very recent origin, saying “it is estimated that this sequence was transferred to the
nucleus during evolution long after the divergence of humans from other
nonhuman primates.” Although only a single NUMT was identified
in the study, the
paper did suggest the possibility of
there being other “hitherto unidentified numtDNA sequences.”
By 2001, the method of identifying NUMTs
by searching Human
In 2002 a paper from France (Tourmen,
2002) suggested there were 286 NUMTs and stressed that “Some pseudogenes
[NUMTs] appeared highly modified, containing inversions,
deletions, duplications, and displaced sequences.”
Later, a paper from the
In 2003, a paper from
In a paper from the
Mishmar (2004) was able to identify 247 NUMTs
and discussed how it is possible, by looking for selected mutations, to determine
if one NUMT is more ancient than another.
The author suggested that “nuclear mtDNA pseudogenes are genetic fossils
that reflect our past.”
Later, Ricchetti (2004) was
able to identify 211 NUMTs. The paper is also interesting as the author
made the suggestion that “NUMT integrations preferentially target coding or regulatory sequences.”
The paper of Schmitz, et al. (2005) is rather different
to the earlier papers as it discusses “the evolutionary pathway of a pseudogene
which separated from the corresponding mitochondrial gene more than 40 mya [million years ago].” Their study
concentrated on the larger of the “Nomiyama” NUMTs (GenBank number X02226). The authors suggest that “numt sequences provide a much more
reliable base for dating” [than] “molecular dating based on primate mtDNA.”
More recently, Hazkani-Covo
(2007) produced a survey of NUMTs
common to both the human and the chimpanzee.
However, the researchers did not report any NUMTs
that are also found in the rhesus monkey.
Lascaro (2008) has produced a compilation of the 90 longest NUMTs found in the human genome. In the present author’s opinion, however, the
actual figures given for the start and end points of the NUMTs are still inaccurate.
In particular, the data from Lascaro has not
taken note of the parts of NUMT sequences that match tRNA
sequences, and this has resulted in many of the NUMTs
being reported as having lengths which are much less than they really are. Nevertheless, Lascaro’s
compilation is far more accurate than earlier attempts.
Finally, Covo (2008) discusses
just how NUMTs might be formed by the inclusion of
mtDNA material following breaks in chromosomal
The present study reports the result of carefully
matching the respective parts of NUMT sequences against the coding area of tRNA sequences in modern mtDNA.
This has shown that there are a few NUMTs
of “recent origin”, that is, NUMTs formed since the branching
off of the human evolutionary line from the rhesus monkey and the chimpanzee.
But more importantly the study has shown that there is a
small number of NUMTs that are common to the genomes
of the human, chimpanzee and the rhesus monkey.
These NUMTs have a date of formation which
predates the branching of these primates from the human evolutionary line.
The study also shows that there is not as yet a consensus
view as to which parts of the human genome are NUMTs,
and thereby have an origin in the mitochondrial
However, the search for NUMTs
continues and the results presented in this paper are based on an analysis of
the genomes that are currently available.
There is a lot more yet to be discovered about NUMTs
in the human genome.
References
Richly E, Leister D (2004) NUMTs in sequenced eukaryotic genomes. Mol Biol Evol, 21:1081-1084.