The Continuing Hunt for Nuclear Mitochondrial
The 46 chromosomes in the Human genome contain many
hundreds of short sequences of bases that match sections of the
NUMTs are found in the chromosomes of most species (Richly, 2004), and a wide variety of species have been the subject of articles describing their NUMTs, including the domestic cat (Lopez, 1994; Antunes, 2007), and the ant (Martins, 2007).
A NUMT is formed by the incorporation of a fragment of the mtDNA into a chromosome. This type of event is very rare, but over a period of millions of years the number of times it has happened has become appreciable. The formation of a NUMT is essentially a random event: the fragment of mtDNA involved can be of any length, from just a few bases to many thousands of bases, and any of the chromosomes can be involved. In many ways NUMTs are considered to be “fossils” preserving the mtDNA sequence as it used to be at various times in our evolutionary past.
After formation a NUMT becomes an ordinary part of the
chromosome and the integrity of its
As a result of these processes, the sequences of most NUMTs differ considerably from the sequence of modern mtDNA, and identifying NUMTs can be considered something of a “treasure hunt.” Unsurprisingly, this has led different researchers to differing conclusions as to whether a particular part of a chromosome represents a NUMT and, if so, just where that NUMT begins and ends.
By comparing the sequence of bases in each NUMT against the sequence of modern mtDNA and counting the number of differences, it is possible to suggest an order for the formation of NUMTs.
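The counting idea can be illustrated with a minimal Python sketch. It assumes the NUMT and mtDNA sequences have already been aligned to the same length, and it uses short invented sequences rather than real data; the function names are hypothetical and are not taken from any tool used in the study. A lower count only suggests a more recent origin and is not a date in itself.

```python
def count_differences(numt: str, reference: str) -> int:
    """Count positions at which two pre-aligned, equal-length sequences differ."""
    if len(numt) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(numt.upper(), reference.upper()) if a != b)


def order_by_divergence(numts: dict[str, str], reference: str) -> list[tuple[str, int]]:
    """Return (name, differences) pairs, fewest differences first."""
    scored = [(name, count_differences(seq, reference)) for name, seq in numts.items()]
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical toy sequences, not real mtDNA or NUMT data.
reference = "ACGTACGTACGT"
numts = {
    "numt_c": "TCGAACCTACTT",  # four substitutions
    "numt_a": "ACGTACGTACGT",  # identical to the reference
    "numt_b": "ACGAACGTACTT",  # two substitutions
}
ranking = order_by_divergence(numts, reference)
```

Here `ranking` lists the toy NUMTs from least to most diverged, which is the suggested order from most recent to most ancient under the assumption that differences accumulate steadily.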
So when a sequence matches well against modern
The identification of NUMT sequences is of importance
to the study of genetic genealogy for two reasons. Firstly, it allows for suggestions to be made
as to which mutations might have occurred in the human mtDNA before the time
of “Mitochondrial Eve”, and secondly,
during the sequencing of human mtDNA laboratories need to take care so as not
to amplify NUMT sequences and mistake them for mitochondrial
The study undertaken for this paper is not primarily concerned with the number of NUMTs and their positions in the human genome, something previously considered in detail by Mourier (2001), Tourmen (2002), Woischnik (2002), Hazkani-Covo (2003), Bensasson (2003), Mishmar (2004), Ricchetti (2004), Hazkani-Covo (2007), and most recently by Lascaro (2008). Instead, this study concentrates on what can be learnt from looking at the sequences themselves.
In particular, the study concentrates on the NUMT sequences that match the coding sequences for the 22 transfer RNAs (tRNAs) found in modern mtDNA. In the mtDNA there is one tRNA sequence for each of 18 amino acids and two tRNA sequences for each of the amino acids leucine and serine.
Each of the tRNAs can be represented as having a two-dimensional “cloverleaf” structure with stems and loops. Figure 1 shows the suggested structures for two of the tRNAs. All of the tRNAs have a similar structure, but the sequences are sufficiently different from each other that they are easily distinguished.
Figure 1. The two-dimensional structures of the tRNAs for isoleucine and cysteine.
Early studies of NUMTs relied on the actual sequencing of chromosomal sequences (for an example of this method, see Herrnstadt, 1999). But with the publication of the Human Genome, and the genomes of several other species, it is now possible to identify NUMTs using computer search programs.
The genome sequences for the human (Homo sapiens sapiens), the chimpanzee (Pan troglodytes), and the rhesus monkey (Macaca mulatta) can be found on the web site http://www.ncbi.nlm.nih.gov/mapview/.
For this study the genome sequences were examined for NUMTs using the Basic Local Alignment Search Tool, or BLAST, and in particular the “BLASTN: Compare Nucleotide Sequences” program (Altschul, 1990).
In most instances the searches were made on the reference only sequences as they are the sequences that have been shown to be common to the various assemblies and can be assigned to the different chromosomes.
At present reference only sequences are available for:
Homo sapiens sapiens – build 36.3 – 368 sequences, covering 2,870,843,926 bases,
Pan troglodytes – build 2.1 – 32,296 sequences, covering 3,010,437,433 bases, and
Macaca mulatta – build 1.1 – 124,049 sequences, covering 3,011,952,279 bases.
Initially the BLASTN program was used with its default values. However, the default Expect value of 0.01 limits the program to reporting only close matches, while raising the Expect value to 10 allows chromosomal sequences that match less well to be reported.
In the Advanced options it is also possible to change the Word Size. Reducing it lets the matching algorithm start from shorter exact matches, so that more divergent sequences can be found. The default value is W11, and using the parameter at its limit of W4 can be useful, although this does make the program take much longer for each comparison.
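The effect of the Word Size setting can be illustrated with a toy Python sketch. It is not BLAST's actual implementation; it only assumes that a comparison starts from short exact "words" shared by the query and the chromosomal sequence. The function name and both sequences are invented for illustration.

```python
def shared_words(query: str, subject: str, word_size: int) -> int:
    """Count distinct words of length word_size that occur in both sequences."""
    def words(seq: str) -> set[str]:
        return {seq[i:i + word_size] for i in range(len(seq) - word_size + 1)}
    return len(words(query) & words(subject))


# Hypothetical toy sequences: the subject differs from the query at two positions.
query = "ACGTTGCAAGCTTAGGCATCG"
subject = "ACGTTGCTAGCTTAGCCATCG"

seeds_w11 = shared_words(query, subject, 11)  # no exact run of 11 bases survives
seeds_w4 = shared_words(query, subject, 4)    # short words still match
```

With a word size of 11 the two toy sequences share no word at all, so a search seeded on 11-base words would have nothing to extend, whereas a word size of 4 still finds common words. The price is that many more short words match by chance, which is consistent with the longer running times noted above.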
Initially, the search string used with BLASTN was the
whole sequence of the Cambridge Reference Sequence (
Table 1 gives the
names of the amino acids, the locations of their corresponding tRNAs in the
The 22 tRNA
Coding Sequences in the
The results of the present study are given here in three sections.
NUMTs that match tRNA sequences
NUMTs of “recent origin”
NUMTs of “distant origin”
NUMTs that Match tRNA Sequences
For each tRNA sequence in
As an example, Table 2 shows the results of searching the human genome for NUMTs that match the sequence for the tRNA for the amino acid alanine. The table identifies 32 NUMTs that satisfy the search criteria. The NUMTs vary from having part of their sequence matching exactly, to having a sequence in which about a fifth of the bases have changed. The table contains only those NUMTs with a sequence that covers the whole of the tRNA sequence. There are other NUMT sequences which match partially, but for the purpose of this paper they have been excluded.
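The selection rule described here, keeping only matches that cover the whole tRNA sequence, can be sketched in Python as follows. The `Hit` record, its field names, the query length of 70 bases and the example hits are all hypothetical; they are not taken from Table 2 or from BLAST's output format.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    name: str         # hypothetical label for a chromosomal match
    query_start: int  # 1-based first matched position in the tRNA query
    query_end: int    # 1-based last matched position in the tRNA query
    mismatches: int   # bases that differ from the tRNA query


def full_length_hits(hits: list[Hit], query_length: int) -> list[Hit]:
    """Keep only hits that span the query from its first to its last base."""
    return [h for h in hits if h.query_start == 1 and h.query_end == query_length]


def fraction_changed(hit: Hit, query_length: int) -> float:
    """Fraction of query bases that differ in the hit."""
    return hit.mismatches / query_length


QUERY_LENGTH = 70  # illustrative value only, not the length of a real tRNA
hits = [
    Hit("hit_exact", 1, 70, 0),     # covers the whole query, no changes
    Hit("hit_partial", 12, 70, 3),  # partial match, excluded
    Hit("hit_diverged", 1, 70, 14), # covers the whole query, a fifth changed
]
kept = full_length_hits(hits, QUERY_LENGTH)
kept_names = [h.name for h in kept]
```

In this toy data `hit_partial` is dropped because it starts at position 12 of the query, while `hit_diverged` is kept even though a fifth of its bases have changed.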
NUMTs That Match the tRNA
It was found that the BLAST program did not produce the complete set of matches in a single run when the modern mtDNA sequence was used as a search string. However, when these matches were in turn used as search strings it was possible to find further matches. This procedure was repeated until no more sequences were found.
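The repeated search can be thought of as a loop that runs until no new sequence turns up. The Python sketch below assumes a generic `search` function standing in for a BLAST run; `toy_search`, `DATABASE` and the four-base sequences are invented stand-ins, and the "at most one difference" rule is only a simple proxy for "matches well enough to be reported".

```python
from typing import Callable, Iterable


def iterative_search(seed: str, search: Callable[[str], Iterable[str]]) -> set[str]:
    """Use each newly found match as a query until nothing new is found."""
    found: set[str] = set()
    pending = [seed]
    while pending:
        query = pending.pop()
        for match in search(query):
            if match not in found:
                found.add(match)
                pending.append(match)
    return found


# Hypothetical stand-in for a genome: each entry is a toy "chromosomal" sequence.
DATABASE = ["AAAT", "AATT", "ATTT", "GGGG"]


def toy_search(query: str) -> list[str]:
    """Report database entries with at most one difference from the query."""
    return [s for s in DATABASE if sum(a != b for a, b in zip(query, s)) <= 1]


seed = "AAAA"  # plays the role of the modern mtDNA search string
single_run = set(toy_search(seed))
all_matches = iterative_search(seed, toy_search)
```

A single run from the seed reports only `AAAT`, but feeding each match back in as a query reaches `AATT` and then `ATTT`, which are too diverged from the seed to be found directly. The unrelated entry `GGGG` is never reported.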
For the tRNA for alanine there are 2 NUMTs with
sequences that do not show any variation from the
Table 3 shows that a similar pattern of NUMTs was produced for the tRNA for the amino acid arginine. In this instance 27 NUMTs have been identified, but none is of “recent origin.”
NUMTs That Match the tRNA
NUMTs of “Recent Origin”
In the human genome there is only one large NUMT of
“recent origin” and this was first identified by Herrnstadt
(1999). The NUMT was presumably formed
after the split with the chimpanzee as it is only to be found in the human genome, and is
not in the genomes of either the chimpanzee or the rhesus monkey. The hominid in whom
this occurred lived prior to “Mitochondrial Eve,” since this NUMT is more
Figure 2. Formation of the “Herrnstadt” NUMT. Initially, the mtDNA was only found in mitochondria, but the partial destruction of a mtDNA ring led to the passage of a fragment into the nucleus where it became incorporated into chromosome 1.
shows there are 85 differences between this NUMT and the
On chromosome 14 there is a second, but much smaller,
NUMT of “recent origin.” This NUMT is 1,021 bases in length and matches against
The recent paper by Hazkani-Covo and Covo (2008) gives a list of NUMTs of “recent origin”, most of which are very short and do not match against a complete tRNA sequence. For reasons that are not totally clear, however, the two NUMTs discussed above are not on the list.
NUMTs of “Distant Origin”
The sequence of bases in a NUMT of “recent origin”
Tables 2 and 3 show the
details of NUMTs with sequences that match against
the tRNAs of alanine and arginine; and it is
possible to prepare a detailed analysis for any individual NUMT. However, there are some NUMTs
of particular interest as it has been possible to show that there are NUMTs that can be found in the genome of Homo sapiens
The best example of a NUMT that is common to the human, chimpanzee and rhesus monkey has been found on chromosome 21. This NUMT, 1,851 bases in length, corresponds to the part of the mtDNA containing the tRNAs for tryptophan, alanine, asparagine, cysteine and tyrosine. In the chimpanzee, Pan troglodytes, the whole of the NUMT is also found on chromosome 21. However, in the rhesus monkey, Macaca mulatta, which has no chromosome 21, it is found on chromosome 3.
The sequence from the genome of Homo sapiens shows a
considerable number of differences from the
The details of this NUMT are shown in Table 6.
A NUMT of “Distant Origin” on Human Chromosome 21
Although the NUMT on chromosome 21 is the largest NUMT found to be common to the human, chimpanzee and rhesus monkey, there are several other smaller NUMTs of this type.
Table 7 gives the details of a further 5 NUMTs that are found on the human chromosomes 3, 4, 8, and X.
This paper has concentrated on identifying NUMTs in the human genome by using the BLAST program to find matches against tRNA sequences in modern mtDNA. This technique has led to the identification of several NUMTs which are common to the genomes of the Human, the Chimpanzee and the Rhesus Monkey. But developing these ideas has only been possible by considering the published findings in various papers that have appeared over the last few years. Actual quotations from the papers are shown in italics.
The early researchers used a laboratory system which involved using bacterial clones, specially prepared primers and direct sequencing. This method was very laborious, but nevertheless, was quite successful.
For example, Nomiyama (1985) used this system to identify 2 NUMTs, subsequently shown to be located on chromosome 3 (GenBank numbers X02226, M12298). Even then it was clear that NUMTs were old, as the author suggested these 2 NUMTs “were transferred from mitochondria into nuclei about 12 and 15 millions of years ago, respectively.”
Later, Herrnstadt (1999) used a similar method to identify a NUMT on chromosome 1 (GenBank number AF134583). This NUMT was shown to have a length of 5,841 nucleotide bases. The authors were able to link the NUMT to “a very distal portion of Chromosome 1”, and in their discussion they recognised that their NUMT was of very recent origin, saying “it is estimated that this sequence was transferred to the nucleus during evolution long after the divergence of humans from other nonhuman primates.” Although only a single NUMT was identified in the study, the paper did suggest the possibility of there being other “hitherto unidentified numtDNA sequences.”
By 2001, the method of identifying NUMTs
by searching Human
In 2002 a paper from France (Tourmen, 2002) suggested there were 286 NUMTs and stressed that “Some pseudogenes [NUMTs] appeared highly modified, containing inversions, deletions, duplications, and displaced sequences.”
Later, a paper from the
In 2003, a paper from
In a paper from the
Mishmar (2004) was able to identify 247 NUMTs and discussed how it is possible, by looking for selected mutations, to determine whether one NUMT is more ancient than another. The author suggests “nuclear mtDNA pseudogenes are genetic fossils that reflect our past.”
Later, Ricchetti (2004) was able to identify 211 NUMTs. The paper is also interesting because the author suggested that “NUMT integrations preferentially target coding or regulatory sequences.”
The paper of Schmitz, et al. (2005) is rather different to the earlier papers as it discusses “the evolutionary pathway of a pseudogene which separated from the corresponding mitochondrial gene more than 40 mya [million years ago].” Their study concentrated on the larger of the “Nomiyama” NUMTs (GenBank number X02226). The authors suggest that “numt sequences provide a much more reliable base for dating” [than] “molecular dating based on primate mtDNA.”
More recently, Hazkani-Covo (2007) produced a survey of NUMTs common to both the human and the chimpanzee. However, the researchers did not report any NUMTs that are also found in the rhesus monkey.
Lascaro (2008) has produced a compilation of the 90 longest NUMTs found in the human genome. But in the present author’s opinion the actual figures given for the start and finishing points for the NUMTs are still inaccurate. In particular, the data from Lascaro has not taken note of the parts of NUMT sequences that match to tRNA sequences and this has resulted in many of the NUMTs being reported as having lengths which are much less than they really are. Nevertheless, Lascaro’s compilation is far more accurate than earlier attempts.
Finally, Covo (2008) discusses
just how NUMTs might be formed by the inclusion of
mtDNA material following breaks in chromosomal
The present study reports the result of carefully matching the respective parts of NUMT sequences against the coding area of tRNA sequences in modern mtDNA.
This has shown that there are a few NUMTs of “recent origin”, that is, NUMTs formed since the human evolutionary line branched off from those of the rhesus monkey and the chimpanzee.
But more importantly the study has shown that there is a small number of NUMTs that are common to the genomes of the human, chimpanzee and the rhesus monkey. These NUMTs have a date of formation which predates the branching of these primates from the human evolutionary line.
The study also shows that there is not as yet a consensus
view as to which parts of the human genome are NUMTs,
and thereby have an origin in the mitochondrial
However, the search for NUMTs continues and the results presented in this paper are based on an analysis of the genomes that are currently available. There is a lot more yet to be discovered about NUMTs in the human genome.
Hazkani-Covo E, Sorek R, Graur D (2003) Evolutionary dynamics of large Numts in the human genome: Rarity of independent insertions and abundance of post-insertion duplications. J Mol Evol, 56:169-174.
Lopez JV, Yuhki N, Masuda R, Modi W, O'Brien SJ (1994) Numt, a recent transfer and tandem amplification of mitochondrial DNA to the nuclear genome of the domestic cat. J Mol Evol, 39:174-190. Erratum in: J Mol Evol, 39:544.
Martins J Jr, Solomon SE, Mikheyev AS, Mueller UG, Ortiz A, Bacci M Jr (2007) Nuclear mitochondrial-like sequences in ants: evidence from Atta cephalotes (Formicidae: Attini). Insect Mol Biol, 16:777-784.
Mishmar D, Ruiz-Pesini E, Brandon M, Wallace DC (2004) Mitochondrial DNA-like sequences in the nucleus (NUMTs): insights into our African origins and the mechanism of foreign DNA integration. Hum Mutat, 23:125-133.