A Comprehensive Analysis of mtDNA Haplogroup J

 

Jim Logan

 

 

Abstract

 

In the furtherance of a better understanding of human genetic origins and migration history, the Federal GenBank database was mined for all Haplogroup J full-genome mtDNA sequences plus additional sequences that are complete for the coding region.  These data were used to develop a phylogeny for Haplogroup J using a matrix developed to show polymorphisms for each sequence organized within clades of the haplogroup.  The diversity within clades was then used to compute estimates of the age of each clade.  In the process, polymorphisms were analyzed to show their relationship to various genes as well as their relationship to selected medical conditions as reported in the literature.  Finally, the literature was reviewed for relevant phylogeographic data toward the ultimate development of a comprehensive history for human mtDNA Haplogroup J.

 

 

 

 

Address for correspondence:  Jim Logan, jjlnv@comcast.net

 

Received:  August 19, 2008; accepted:  September 27, 2008

 

 

 

Introduction

 

The analysis of mitochondrial DNA (mtDNA) has made significant contributions to the understanding of human evolution and migration.  Using restriction enzyme analysis techniques, i.e., comparing restriction fragment length polymorphisms (RFLP), it has been shown that a natural clustering of test results (Torroni et al., 1992) could be used to group the DNA samples into what were later called haplogroups, and thus infer broad genetic backgrounds (Richards et al., 1996).  As techniques were refined and it became feasible to do direct sequencing of significant segments of the mtDNA molecule, this clustering was refined and haplogroup classification motifs were developed (Richards et al., 1998).  As the database grew the classification structure was extended and relationships between haplogroups were better defined.  See Logan (2008) for a historical review of this process as it specifically relates to mtDNA Haplogroup J.  One of the purposes of this paper is to present a refined classification structure for Haplogroup J.

 

Until very recently, sequencing has typically been limited to the control region (displacement loop) of the mtDNA genome, which contains two hypervariable regions (i.e., regions of significantly higher mutation rates than the coding region) which provided relatively more information for a given length of sequence.  It was soon discovered, however, that these hyper variable regions have significantly higher instances of back mutations and homoplasies, (i.e., the occurrence of a given polymorphism in more that one haplogroup or even clades of the same haplogroup), thus leading to ambiguities and uncertainties for haplogroup assignment.  Some studies then turned to the use of selected markers from the coding region for the broad classification into haplogroups and then used results of sequencing the hypervariable regions to develop the clade structure within the haplogroup.  This has been successful for many purposes, but can lead to errors in specific haplogroups.  For example, although polymorphisms at nucleotide positions (nps) 16126 and 16069 are adequate for identifying Haplogroup J, and sequencing the complete control region can provide some substructure for J1, there are no polymorphisms within the hypervariable region 1 (HVR1) to cleanly differentiate the J2 clades from those in J1 (Logan, 2008).  Whether the purpose was to study a geographic region, a specific disease, or some other purpose, scientists have now published a sufficient number of full-genome sequences to permit a multi-level phylogeny for Haplogroup J and develop estimates of ages of the various clades.

 

Methods

 

All mtDNA sequence data used in the current analysis was extracted from the GenBank database maintained by the National Center for Biotechnology Information (NCBI) of the National Institutes of Health. (See Benson et al., 2007 for a description).  The broadest available representation of current worldwide population of haplogroup J was achieved by selecting every full-genome sequence (FGS) plus those sequences that were complete except for the control region.  Identification of the sequences used in this study is given in Table 1.   The Greasemonkey utility (Logan, 2007) was used to identify the sequences to extract and to list the mutations exhibited by each of these sequences.  Note that the term mutation is generally limited to the difference between a single sequence and some reference; when multiple sequences are analyzed and differences are found relative to a reference site, polymorphism is generally the preferred term.

 

 

Table 1

Studies Cited and the Geographic Locations of the Haplogroup J Sequences Used in the Present Study

 

 

 

Of the 156 sequences selected, 111 are the same sequences used in the previous study (Logan, 2008), seven represent full sequences added to GenBank since that study, and 38 are sequences omitted from the earlier study because there was missing data within the control region thought to be critical in developing the initial J phylogeny.  Verification criteria for membership in Haplogroup J include the following:  All 156 sequences contained the A12612G and G13708A polymorphisms and all except three contained A10398G.  In addition all 118 truly full-genome sequences had C295T, C16069T, and T16126C in the control region and all except one sequence had a T489C mutation (Logan, 2008).

 

Each of these sequences was parsed and a matrix was developed to include a column for each sequence and a row for each polymorphism identified.  This matrix is the reference for both a detailed analysis of the polymorphisms (including a survey of medical relationships) and the refinement of the Haplogroup J phylogeny.  These data were used to compute the average number of polymorphisms in each branch of the phylogeny and to estimate the age of the clades in the phylogeny.

 

However, certain limitations of this matrix and its origins should be noted.  First, although the ethnic origins of donors were generally from European populations or from those located in the western or southern regions of Asia, they do not represent a formally stratified sample.  Although it is assumed that the dataset is adequate for development of a phylogenetic tree and an initial estimation of the age of the clades, any conclusions from the geographic distributions calculated from this data should be used with caution.

 

Second, there was no uniformity in the DNA collection process used by the various researchers.  Many of the early studies extracted blood from the participants and then extracted DNA from various components of that blood (Torroni et al., 1992).  In other samples, especially those from studies looking for relationships between DNA and disease, the DNA was extracted from biopsies of muscle tissue or even brain tissue (Zsurka, 2004).  Some of the latest studies extracted DNA using buccal swabs or mouth washes as collection processes.  Although any of these processes can be expected to include mutations passed down through the germ line, they will vary relative to the presence of somatic mutations and heteroplasmies, especially in testing of older donors.

 

Third, there is no data about either the age or gender of the donor.  As shown in the section on Medical Implications below, both factors are significant in analysis of certain diseases and in longevity studies.

 

Analysis of the Polymorphisms

 

The development of a phylogeographic analysis is dependent on both good geographic data and characterization of the DNA haplotypes occurring in each region.  However, the polymorphisms observed include both mutations passed down from generation to generation, as well as those that occur within a given organism.  Furthermore, a DNA sample extracted from a given tissue involves multiple cells, each of which has multiple mitochondria, which in turn have multiple DNA molecules (Jobling et al., 2004).  Since these mitochondria reproduce independently, they also mutate independently producing heteroplasmies, which may or may not be reported in testing or may be of too low level to be detected.  Finally, the frequency and type of these mutations are influenced by the type of tissue (blood, muscle, brain, etc.) being used as the DNA source and the age of the organism.  See Rand (2001) for a more detailed discussion. Thus, depending on these factors as well as the technology being used for extracting and sequencing the DNA, and reporting standards, there can be significant differences in test results, even from the same individual.  Therefore, the statistical data about polymorphisms given in this paper and their use in developing the Haplogroup J phylogeny and age estimates must be considered as initial findings.  These findings should to be refined as further data sets are developed with proper stratification for tissue tested, geographic origin, and both gender and age of the participants.  A complete list of all observed polymorphisms, their location within various genes, and whether or not they are synonymous, is presented in the supplementary material in conjunction with the data organized for development of the phylogeny.

 

Analysis of 156 sequences of Haplogroup J identified 411 distinct polymorphisms, of which 106 were observed three or more times.  The 243 singletons and 62 doubletons, representing almost three-fourths of the total polymorphisms observed, are apparently rare within J.  Although these rare polymorphisms are not significant in defining the basal phylogeny, they are useful for inferring the ages of the clades of that phylogeny, and as additional DNA samples are accumulated and matched with genealogical and archaeological data, they may be significant in pinpointing geographic origins and migrations of specific families.

 

The 16569 base-pair length of the Cambridge Reference Sequence (CRS) encodes 13 genes important to cell metabolism, two ribosomal RNA genes important to transcription and translation of the mitochondrial genome, and 22 transfer RNA genes important in assembling specific amino acids into the products of the mitochondrial genome (Anderson 1981).  It also contains one major non-coding region (i.e., the control region) and several very small non-coding sequences within the coding region.  Any small variation in the mtDNA sequence can have biological significance depending on several factors:  (1) the type of polymorphism (e.g., a simple nucleotide substitution versus an insertion or deletion), (2) its position within the transcription sequence (e.g., its position within a codon that codes for a given amino acid within the protein to which the gene translates), and (3) the specifics of the change, such the substitution of a T for a C.  In order to assess these factors (and others) the set of 411 polymorphisms were subjected to a detailed analysis.

 

Polymorphisms can be arranged into two major categories–those that involve point substitution (transitions, transversions, and heteroplasmies) and those that affect the length of the sequence (insertions and deletions, or indels).  If a mutation of the first category occurs within a gene, that mutation has the potential for making a change in the protein for which the gene encodes and thus affecting the phenotype.  However, since there is redundancy in the genetic code, many of these mutations (referred to as synonymous mutations) do not result in an amino acid substitution and, thus no change in the protein for which the gene codes.  A mutation of the second category occurring within a gene results in a shift in the reading frame which can cause a complete failure of the production of the prospective protein.  On the other hand, with the possible exception of interfering with replication of the DNA itself, polymorphisms occurring in one of the non-coding regions have no known effect.  The effects of mutations on the genes that code for ribosomal RNA, have not been researched here.

 

A catalog of polymorphisms was developed from the results of comparing each sequence in the reference database with the revised Cambridge Reference Sequence (Andrews, 1999).  The sequences in the reference database, as identified in Table 1, were extracted from GenBank (Benson et al., 2007) and the polymorphisms were identified through the use of Greasemonkey (Logan, 2007).  The mtDB database (Ingman et al., 2006) was then searched for each polymorphism to determine the functional locus within the mtDNA genome, and where appropriate, the codon affected, the position on that codon and any resulting change in the encoded protein.  For those polymorphisms not cataloged in mtDB, that database was nevertheless helpful as a general guide in identifying the appropriate codon in the reference rCRS (Genbank sequence AC_000021); the Human Mitochondrial Genetic Code from Table 1 found in Anderson et al. (1981) was then used to determine the change in the amino acid.  The catalog includes a complete list of polymorphisms, their locations on the respective gene or RNA, implied change in the resulting amino acid residues on associated protein, and, where appropriate, the point on the phylogeny where it is most significant.

 

A summary of the type of polymorphism versus its locus class is provided in Table 2.  Note, however, that polymorphisms in the control region are probably under-reported since some of the sequences in the reference database were not complete in that region.  Of the 411 polymorphisms detected, only 20 (4%) were insertions or deletions (indels) and these occurred primarily in the non-coding region with a few occurring within the region that codes for the ribosomal RNA.  The fact that none occurred in either the genes or in the transfer RNAs is probably due to the deleterious effects that would result and thus would not be passed along in the germ line.

 

 

Table 2

Distributions of Types of Polymorphisms Across the Mitochondrial Genome

 

 

 

Of the 20 indels detected, one deletion and three insertions occurred within regions defining ribosomal RNA.  Each of these is associated with a successive repeat sequence within that RNA and thus impact would be expected to be minimal.  For example at positions 2141 through 2149 of the revised Cambridge Reference Sequence (rCRS) there is a pattern of four AG repeats.  The insertion shown as 2149.1A and 2149.2G simply extends the length of this repeat sequence to five repeats.  All remaining indels occurred in non-coding regions and all but three of these are also associated with repeat sequences.  For example, at locations 514 through 523 of the rCRS there is a pattern of five CA repeats, CACACACACA.  There are eight instances of C522 and A523 deletes, reducing the length to four, but there are also two instances of a 523.1C and 523.2A, extending the length to six, and one instance of a 523.1C and 523.2C  (See Hurst (2007) for further discussion on length heteroplasmies).

 

Most of the insertions observed were associated with repeats of a single nucleotide type – most commonly a C.  For example the 309.1C insertion was observed 48 times in the sample set of 118 full genome sequences.  This insertion relates to the well known sequence from 303 through 315 of the rCRS which consists of a sequence of seven C repeats followed by a T and this followed by five C repeats, CCCCCCCTCCCCC.  The 309.1C indicates that there was the insertion of a C after position 309–that is, insertion of a C somewhere before the T in the above sequence.  Associated with the same sequence there were also insertions 309.2C, 310.1T and 315.1C.  

 

Of the substitutions, the vast majority (89 % of the total) were simple transitions where a purine was substituted for a purine or a pyrimidine was substituted for a pyrimidine.  A little over 4% of the substitutions, however, were transversions (mostly singletons) where a purine was substituted for a pyrimidine or visa versa.  Less than 2% were heteroplasmies – a polymorphism within a single organism where the state at a given locus in some DNA molecules was different from the corresponding state in other molecules.  Six of the heteroplasmies occurred in the non-coding region and two in the regions that encode for a ribosomal RNA.  It should be noted that heteroplasmies are typically unbalanced with one variant dominating the other.  It is thus likely that other heteroplasmies were present in the test sequences but went undetected.  For males, their heteroplasmies cannot be passed on.  For females, there is potential for them to be passed to offspring and descendants either subsequently reverting back to “wild” state or stabilizing to a new state.  The significance of such heteroplasmies is thus gender dependent, but such data not available from GenBank.

 

Table 3 shows how each of these polymorphism types were distributed throughout the various segments of the mitochondrial genome.  Note that due to several small overlaps in segment definitions, the lengths of the segments add to slightly greater that the 16569 base pair length of the rCRS genome.  As an indication of variability of polymorphisms across the genome, the table also shows the polymorphism density defined as the ratio of the number of polymorphisms within a gene or region divided by the length of that sequence.  Note that considering the small numbers involved, the density of polymorphisms throughout the genes encoding for proteins is fairly uniform with an average of 2.1% compared to the 8.6% for the control regions.  This four-to-one ratio is no doubt low because of the incompleteness of some of the available sequences as described above.  The frequency of polymorphism in the genes for ribosomal RNA is somewhat lower at 1.0%.  The control region, which accounts for less than 7% of the mtDNA genome, produced over 23% of the polymorphisms.

 

 

 

Table 3

Statistical Distribution of Polymorphisms for Various Regions of the Mitochondrial Genome

 

 

 

Medical Implications

 

A single nucleotide change within a sequence can cause deleterious or advantageous changes in the performance of mitochondrial-coded products (e.g., proteins).  Such changes can be inherited through the gene line from mother to child or they may occur somatically within selected tissues of the individual.  Several recent studies have shown correlation between the frequency of selective mutations and a variety of diseases and longevity itself.  Such correlation, however, does not necessarily imply a cause and effect relationship.  There are very complex relationships between the workings of mitochondrial DNA and nuclear DNA that are not well understood (Carelli, 2003).  In the concluding remarks of their paper Santoro et al. (2006) stated that

 

Aging and longevity, as complex traits having a significant genetic component, likely depend on many nuclear gene variants interacting with mtDNA variability, both inherited and somatic.  We also surmise that what we hypothesize for aging and longevity could have more general relevance and be extended to other complex trains, such as age-related diseases like cardiovascular diseases and diabetes . . .

 

and both Alzheimer’s Disease and Parkinson’s Disease.  The description of such nuclear and mitochondrial DNA interactions is beyond the scope of this paper.  This section simply describes a few major medical conditions that have been found at elevated (or reduced) frequency within the mtDNA Haplogroup J population.  The only polymorphisms considered here are the ones that actually appeared in the reference database; they are summarized in Table 4.  The disease associations were those available from MitoMap (Ruiz-Pesini et al., 2007).

 

 

 

Table 4

Polymorphisms Observed in the Haplogroup J Reference Database that Have Been Reported as Associated with mtDNA-Related Diseases

 

 

 

 

In a study of the relationships between mtDNA polymorphisms and aging, De Benedictis et al. (1999), found that 23% a group of centurions in northern Italy were Haplogroup J, whereas only 2% of a control group of younger persons were Haplogroup J.  This contrasted with the results of Haplogroup U that showed centurions were about 2% versus 23.5% for the control group.  A subsequent study (Dato et al., 2004) concluded that this effect was population specific since comparable statistics were not found in southern Italy.  Ongoing research will likely show that the Haplogroup J population of northern Italy has a higher percentage of J2 than that of southern Italy.  As shown in the chart, the polymorphism found to be most significantly related to longevity within J is C150T and as shown below, that polymorphism value is also an indicator for J2.  This differentiation likely resulted over many years of separation as one group migrated from the Near East through central Europe and ultimately into northern Italy (with a significant percentage of J2) and the other migrated through the coastal areas of the Mediterranean with a lower percentage of J2.  It should be pointed out that the C150T itself (and similarly A10398G) may not have any effect on longevity but rather are markers that are simply statistically correlated.

 

In a similar study of an Irish population (Ross et al., 2001), Haplogroup J was singled out for special study of longevity.  No significant association was found when considering that haplogroup as a whole.  However, when they separated the samples into two categories based on restriction fragment analysis, they found that one category had a much higher frequency of centenarians than that the control group whereas the other had a much lower frequency.  Then, in a later paper (Ross et al., 2003), and using the same population, they looked specifically at Parkinson’s disease.  They found of the 12% of the population that was diseased, 2% were in one J group whereas 10%% were in the other J group.  They called the first group J1 and the second J2 but unfortunately, their subdivision cannot be correlated with the subclades of J found in the present study since the polymorphic restriction sites have not been identified or to correspond to any polymorphisms found in the reference database.

 

In a related study of the control region only, Zhang et al. (2003) looked at 207 subjects from Northern, Central, and Southern Italy and found that centenarians and twins had a significantly higher percentage of C150T transitions compared to controls.  Based on analysis of multiple tissue types and comparison of twins, as well as longitudinal studies, they concluded the C150T transition can be inherited but it can also occur somatically with age.  In considering the possible impact of the C150T transition they noted its proximity to the secondary origin of the replication of the heavy strand of the mtDNA molecule.  They found T152C to be fairly common occurrence along with the C150T and also a few T146C in proximity. Further analysis suggested that

 

The somatic event(s) at or near position 150 transition may be part of a general remodeling of the mtDNA replication machinery, probably nuclearly controlled.  This remodeling could accelerate mtDNA replication and compensate for the oxidative damage of mtDNA and its functional deterioration occurring in old age.

 

The current study found that T150C occurred exclusively in the J2 subclade of Haplogroup J and is thus a strong indicator of that subclade, although not definitive.  The reason for this phenomenon has not been determined.

 

The latest available study to look at the relationship between longevity and Haplogroup J found no significance in the Ashkenazi Jewish centenarians relative to their control group (Shlush, et al, 2008).  Although they referenced the study by Zhang (2003), who pointed out the possible significance of the polymorphism 150C, they missed an opportunity for follow-up testing in their well defined and well understood study population.  Unfortunately, 150 is not within the narrow range of the control region they sequenced (16024-16300).  Similarly, they would be required to acquire additional test data to permit them to assess the possible borader relationship between longevity and the J2 clade for which 150C is an indicator.

 

The disease most commonly associated with mtDNA Haplogroup J is Leber’s Hereditary Optic Neuropathy (LHON), also known as Leber Optic Atrophy (LOA).  This disease occurs about five times more frequently in Haplogroup J than it does in the general population (Torroni et al., 1997).  LHON is a maternally inherited disease that presents itself in adolescence or adulthood and can lead to partial or total blindness (Wallace 1988).  Although some twenty-five mtDNA variants have been observed to be related, the primary mutations are G3460A, G11778A, and T14484C (Brown et al., 2002).  One or another of these mutations is found in ninety percent of the families with LHON, although they rarely occur together (John Hopkins, 2008).  Of the 156 sequences in the reference database, G11778A occurred four times (twice in J1c4 and twice in J1d), T14484C occurred twice (once in J1d and once in J2b1), and G3460A occurred once in J1c5.  MitoMap (Ruiz-Pesini et al., 2007) also listed two reports of progressive dystonia as associated with LHON and specifically with G11778A.  The insulin resistance associated with T4216C may just be due to that position being a point mutation for the super-haplogroup JT.

 

Within the Haplogroup J population, the polymorphism most commonly associated with either Parkinson’s or Alzheimer’s disease is G5460A, which, incidentally, is one of the two definitive coding region markers that define subclade J1b1.  In addition both Parkinson’s and Alzheimer’s are highly correlated with deterioration of mitochondrial performance, brought on by increasing frequency of polymorphisms, many, or most of which are in heteroplasmic form.

 

MitoMap showed a relationship between the T11084C polymorphism and the disease MELAS (mitochondrial myopathy, encephalopathy, lactic acidosis, and stroke-like episodes).  A search of the associated bibliography showed only a weak statistical association and that the most common polymorphism for the disease is at position 3243, which was not observed in the reference database.  Finally, T16189C has been reported as being associated with various diseases including type 2 diabetes, cardiomyopathy, and endometrial cancer.  No bibliographic references were provided to support these reports.

 

There is a major study currently underway in Europe which is intended to clarify these relationships and identify others (Franceschi et al., 2007).  This “5-year European EU-Integrated Project” is entitled “Genetics of Healthy Aging (GEHA)” and constituted by 25 partner organizations “to identify genes involved in healthy aging and longevity, which allow individuals to survive to advanced old age in good cognitive and physical function and in the absence of major age-related diseases.”  By agreement of the participating partners, it is scheduled to end April 30, 2009.  Results should be forthcoming soon.

 

A Refined Phylogeny

 

An initial phylogeny for mtDNA haplogroup J was presented in an earlier paper (Logan, 2008).  A slightly refined phylogeny is presented here and includes results of analyzing seven full genome sequences added to GenBank since the earlier analysis plus 34 sequences that are complete for the coding region but not complete in the control region.  The inclusion of these last 45 sequences does not cause any changes in the primary structure but does permit identification of some detail at the extremities.

 

As described in the earlier paper, this phylogeny was developed using a maximum parsimony approach ignoring insertions and deletions (see Analysis of The Polymorphisms above).  In addition, the polymorphisms located at sites 16311 and 16519 were excluded from the analysis as being too variable to be useful.  However, Hagelberg (2003) has suggested that 16311, and possibly 16519, could be the result of ancient recombination.  No recent study has been found to support this hypothesis.  Future research may ultimately show utility of these polymorphisms.

 

The refined phylogeny is present in graphic form in Figure 1.  The supporting data is available in the supplementary files.  Note that this chart includes polymorphisms that are in parentheses or are underlined to indicate special conditions.  For example the 185 and 228 shown as markers for J1d are both in parentheses because they appear to be subject to back mutations with neither of them appearing in all samples for the J1c clade, nor either of them defining a proper subclade of J1c.  However, of the 74 full-genome sequences that are classified as J1c, all but two include one or both of these markers and there is only one occurrence outside the J1c subclades.  Similarly the polymorphisms at 152 and 16193, shown in conjunction with subclades J1c, appear to have originated more than once within the haplogroup. These and similar special markers are included to be used as classification aids for cases that are not full genome sequences, but do have sequences from the control region.

 

Age of The Clades

 

One of the first uses of molecular biology to determine the age of the human species was just over 40 years ago.  Sarich and Wilson (1967) looked at the variations of serum albumins (a blood protein) in humans and non-human primates and concluded that the split between homo, chimpanzee, and gorilla was approximately 5 to 8 million years ago.  For calibration, they used the assumption that hominoids in general separated from the old world monkeys 30 million years ago.  Within a decade of that study, techniques were sufficiently developed to analyze the DNA itself.  Using restriction fragment techniques to analyze samples from baboon, macaques, guenon, and human samples, Brown (1979) estimated that the average mutation rate of mtDNA was about 2% per site per million years.

 

Before another decade was complete, excitement was aroused in the press and anthropology community when Cann et al. (1987) used mtDNA variations to propose that the current human population “stems from one woman who is postulated to have lived about 200,000 years ago, probably in Africa."  This woman is commonly referred to as “Mitochondrial Eve."  The important concept here is that of a molecular clock.  Since that time, there have been numerous studies that apply this concept to estimate the age of various selective populations (e.g., Stoneking et al., 1986; Wills, 1995).  Other studies have looked at the variations of mutation rate for selective regions of the mtDNA molecule (e.g., control region vs coding region) and relative to the effect on corresponding coding region (e.g., synonymous versus non synonymous mutations).  Ingman et al. (2000) concluded that the control region “has not evolved at a constant rate across all human lineages and is consequently not suitable for dating evolutionary events."  Restricting their analysis to the coding region and using a divergence time between humans and chimpanzees of 5 Myr, they proposed a mutation rate of 1.70 x 10-8 substitutions per site.  In a follow-on study, Ingman and Gyllenstein (2001) analyzed the mutation rate for individual genes.  The average of these mutation rates is 1.26 x 10-8.  It is not clear why the average computed by data in the second paper is different from that in the first.  Both studies were based on 53 samples chosen to be representative of 14 major linguistic phyla in an attempt to avoid bias inherent in selecting individuals based on current population size and geographic location.

 

Subsequently, Mishmar et al. (2003) used the 53 sequences of Ingman and Gullenstein, but added 48 from African, Asian, European, Siberian and North American populations, to conclude that there are significant differences between geographic populations caused by natural selection brought on by differences in climate and diet.  Comparing the ratio of non-synonymous to synonymous mutations within the various genes, they found significant differences between tropical, temperate, and arctic-based populations.  Based on estimated coalescence dates for various haplogroups, they estimated the mtDNA evolution rate to be 1.26 x 10-8 substitutions per nucleotide per year.

 

An alternate basis for calibration of substitution rates was demonstrated by Stoneking et al. (1992; 2005) by capitalizing on a founding event to analyze the population of Papua New Guinea.  Their analysis showed that this population had a well defined start date that could be estimated and, further, that population had developed into the current population in relative isolation.  They arrived at a “rate of human mtDNA evolution” that was in good agreement with the 2-4% per million previously proposed.  Atkinson et al. (2008) built on this idea, assumed a founding date of 45,000 years ago, and developed a substitution rate of 1.691 x 10-8 substitutions/site/year for the coding region.

 

The studies described above estimated mutation rates based on evolutionary models with calibration typically based on assumed date of separation between humans and chimpanzees.  Attempts have also been made to compute mutation rates directly from pedigree data.  Early divergence estimates were typically obtained using family data developed for disease studies and consisting of very small sample sizes relative to the rates being estimated.  Nevertheless, the general conclusion was that divergence rates for pedigree data were approximately an order of magnitude higher that evolutionary rates (e.g., Howell et al., 2003.)  However, as described by Rand (2001), there are many factors that should be considered in stating a final substitution rates.  Taking into considerations the gender of the donor, whether the polymorphism appeared to be germ line vs somatic, whether or not the polymorphisms had become fixed, Santos et al. (2008) showed that evolutionary substitution rates and pedigree substitution rates could be reconciled.

 

This is a good point to note the imprecision of terminology between mutation rates and substitution rates.  Mutation rate has to do with the actual change in a DNA molecule with time, whereas substitution rate, as used here, has to do with the observable difference with respect to some reference.  Many mutations are never observable in testing done for purposes of population genetics.  On the other hand, testing of specific tissues may reveal mutations (including heteroplasmies) that have developed somatically as the organism ages.

 

The problem of calibration and the variability of mutation rates across the mitochondrial genome have been studied in some detail by Endicott and Ho (2008).   Eventually we will be able to account for more of the variability in our analysis.  In the meantime, the present work takes a very straightforward but simplified approach for computing the ages of clades of mtDNA Haplogroup J.  A substitution rate of 1.7 x 10-8 substitutions/site/year for the coding region was chosen as representative of the literature.  Using 15447 for the number of base-pairs in the coding region, this converts to 3808 years per substitution.  For each clade the mean length of the branches (i.e., the average number of substitutions observed back to the defining polymorphisms) is multiplied by this factor.  The result is an estimate of the coalescence time, or Time to the Most Recent Common Ancestor (TMRCA) of the members of that clade.  The result of these computations is given in Table 5 and shown on a time-scaled phylogeny in Figure 2.  It should be noted, the standard deviation of length, and subsequently the range of ages estimates, is related to the variability of the data; it is not a confidence interval relative to the estimated age.

 

 

 

Table 5

Estimated ages of the clades of mtDNA Haplogroup J

 

 

 

 

 

Figure 2.  Estimated ages of the clades of mtDNA Haplogroup J

 

 

 

These ages should be taken as indicating the approximate relative ages of the clades.  The astute reader will notice anomalies within these ages.  For example, mechanistic computations produced an age for J2 and J2a that are somewhat older than J as the complete clade.  This is an artifact of the randomness of mutations and the relative small number of polymorphisms available for defining subclades. It should also be remembered that an overall average substitution rate cannot take into account the fact that mutations do not occur uniformly across the genome, nor can they take into account the fact that clades experienced different migration patterns and thus are subject to different mutation pressures from different climates and diets.  Furthermore, since the database used in the analysis was drawn from GenBank, and is thus opportunistic, it most certainly does not represent a random sampling of the current population of the haplogroup.

 

After describing caveats in their extensive review of status of mutation rates, Bandelt et al. (2006) concluded that the

 

. . . extreme form of weighting that only accepts the coding region but rejects the entire control region is at best provisional and certainly not recommended in the long run.  An informed strategy would use rules to decode on a site-by-site basis and contrast synonymous with non-synonymous mutations.

 

The technology and data should be available to do such a study in the next few years.  For example, data collected in association with the Genographic Project has been used to develop substitution rates for a few selected polymorphisms within the coding region (Rosset et al., 2008).

 

Origins and Migrations

 

There is general agreement that there have been three major movements in the peopling of Europe and numerous smaller ones.  The first major one, of course, was the initial entry into Europe of anatomically modern humans during the Paleolithic period.  Although there is ongoing debate about exact paths through the Near East, there is growing agreement that the ultimate origin of this first set of migrations and initial colonization of Europe was from Africa.  The second major set of migrations were from glacial refugia back into the northern regions of Europe where population had been decimated by the Last Glacial Maximum.  The third is the inclusion of at least a limited number of migrants associated with the waves of advance of culture, such as agriculture, that occurred during the Neolithic period.  The current challenge is to use genetics to develop details about these major movements, to progressively identify and describe the many lesser movements, and to integrate these results with results from other disciplines such as archaeology and linguistics.

 

One approach to develop such details is the use of genetics and founder analysis to identify populations, date them through using substitution rates for calibration, and analyze the associated geographic data (Stoneking et al., 1992).  Phylogeographic analysis, that is the geographic profile of clusters of haplotypes, can provide the basis for inferring geographic origins of selected populations, and probably migration paths.  Such inferences take on additional importance in anthropology and population genetics when they are supported by studies from archaeology, climatology, ecology, and linguistics.

 

One of the earliest uses of the founder analysis approach was the work of Torroni et al. (1992), which concluded that the Amerind and Nadene populations Native Americans were primarily from two independent migrations that probably occurred several thousand years apart.  However, using the modern technique of Bayesian skyline plot analysis (Drummond et al., 2005), Mulligan et al. (2008) have developed a three-stage model for the peopling of the Americas; this was one long migration sequence that included three identifiable stages: (1) divergence of Amerind ancestor from the Asian gene pool, (2) a prolonged period of isolation, and (3) rapid expansion into the Americas with a large population increase.

 

Comas et al. (1997) demonstrated the potential of mtDNA founder analysis when they analyzed data from nine distinct European and West Asian populations and performed analyses to identify statistical similarities between them.  Each population came from published samples from a different research team that focused on a specific geographic area, including a Basque, British, Sardinian, Swiss, Tuscan, Bulgarian, two different Turkish, and a Middle Eastern region.  Although differences appeared to be quite low when compared to other world populations (e.g., Africa, Asia, Polynesian, and Native America), a neighbor-joining network developed from the differences turned out to be robust.  That network showed “an approximately east-west gradient, with the Middle East at one extreme, followed by the two Turkish samples, most European populations, and, at the other end, the Basque sample."

 

A large-scale phylogeographic study of mtDNA in Western Europe by Richards et al. (1998) proposed that approximately 85% of 757 individuals tested had their origins in the European Upper Paleolithic whereas the ancestors of the other 15% had arrived more recently from the Near East–predominantly from the haplogroup cluster JT.  In an earlier paper (Richards, 1996) they provided the observation that their cluster 2B (now known as Haplogroup T) was relatively uniform throughout Europe with a concentration of about 8%, but their cluster 2A (now know as Haplogroup J) varied widely with a range of about 2% in the Basques up to 22% in Cornwall.  They made the observation that it is “in the Middle East, but not elsewhere in the world, we find two missing ancestral haplotypes that link the western and central European clusters” and that the ancestral haplotype is found “in the Middle East but in none of the European samples or elsewhere."  Thus, they suggest that Haplogroup J originated in the Middle East and that several different lineages migrated into Europe, splitting into the western and central European clusters but having little impact on the Iberian Peninsula, especially the Basque country."  They also developed phylogenetic network for each of several haplotype clusters, including Haplogroups J and T and from these concluded the overall age of Haplogroup J was approximately 28,000 years old, originating in the Middle East and arriving in Europe “when the Neolithic economy spread into Europe starting about 10,000 years ago,” with components evolving between 8,000 and 6,500 years ago.  Although they provided additional detail for subclusters J1 and J2, it has since been determined that their use of HVR1 data for classification is inadequate (Palanichamy et al., 2004; Logan, 2008) and thus their conclusions are not accurate.  In a later review, Richards (2003) expressed the opinion that the main Neolithic  founders were “likely to have been members of Haplogroups J and T1, but that the “contribution of the Neolithic Near Eastern lineages to the gene pool of modern Europeans was around a quarter or less."

 

Using a much expanded study group, Richards et al. (2000) “formalized the procedure for founder analysis, investigated the extent of confounding recurrent gene flow between the putative source and derived populations, and developed criteria that take into account the effects of both gene flow and recurrent mutations."  Among their results, they refined the overall age of Haplogroup J to 42,400-53,700 years as determined from the Near East samples and to 23,000-27,400 years as determined from European samples.  The corresponding ages for Haplogroup T are 41,900-52,000 and 33,100-40,200 respectively.  Although these two clades were apparently contemporary in the Near East, they clearly had different migration patterns into Europe.  Their origins within the Middle East have not been established.

 

In an attempt to identify and describe the effects on mtDNA of “demographic phenomena dating back to the Paleolithic, the Mesolithic, or the Neolithic” periods, Simoni et al. (2000) collected 2619 mtDNA sequences for HVR1 distributed over 36 regions of Europe.  Although the sample size was relatively small in some regions, they developed an overall table of frequencies for the major haplogroups in each of the regions.  No occurrences of Haplogroup J were identified is several regions such as Norway and Saami.  However, the highs of 17% and 15.9% found in Iceland and Cornwall, respectively, are similar to the frequency of 16.7% in the Near East--its presumed origin.  Similarly, Haplogroup T was not observed in Norway or Saami, with the highs of 25.5% occurring in Georgia and 21.7% in the Italian Alps.  The Near East frequency was 11.9%.

 

There is not yet available a comprehensive founder analysis for Haplogroups J or T throughout Europe.  However, some inferences can be made about a few very specific geographic areas.  One such localized study is the one by Alfanso-Sanchez et al. (2006) that studied the Swanetia region of Georgia.  Georgia is in the Caucasus region between Asia and Europe and is considered on the major migration routes from the Near East, yet it is interesting that Haplogroup J did not make the top ten but her sister, Haplogroup T, was one of four that appeared greater than 10% in the population.  This could be indicative that the two haplogroups took significantly different migration paths from the Near East into Europe.  It is also likely that this indicates that there was a considerable geographic separation of the origin of these two haplogroups, thus causing the different routes.

 

The origin of Haplogroups J and T in the Middle East and their Neolithic expansion into Europe are well known but the exact origins and specific migration patterns have yet to be established.  There is little doubt that the Jordan area was a significant element in the story.  In a study of 101 samples from Amman, Jordan and 44 samples from the area of the Dead Sea in the Jordan Valley, Gonzalez et al. (2008) analyzed haplogroup frequencies for these locations and compared them with 23 other Middle Eastern populations.  The Amman population was found to have typical haplogroup diversity with the frequencies of Haplogroups J and T at about 6% and 10% respectively, whereas the diversity of the Dead Sea population was found to be very limited–it was devoid of Haplogroup T and had only a few J1 samples.  The final conclusion of the authors of the study was that “although the Levant is a proven crossroad of bi-directional migrations between Africa and West Asia, some geographic areas, such as the Dead Sea area, and social isolates, such as the Druze, have generally resisted that human traffic.”

 

Malyarchuk and his associates did a series of studies of Eastern European populations relating to the origin of the Slavs:  Russians and Ukrainians (Malyarchuk and Derenko, 2001), Poles and Russians (Malyarchuk et al., 2002), Bosnians and Slovenians (Malyarchuk et al., 2003), and Czechs (Malyarchuk et al., 2006).  In each of these studies they found that most of the mtDNA found belonged to western haplogroups (H, HV, J, T, U, N1, W, and X).  Within this broad similarity, they did find heterogeneity between regions with a very broad north-south correlation between their test populations and the corresponding regions to the west.  The overall frequencies of Haplogroups J and T found in each region are shown in Table 6.

 

 

 

Table 6

Frequency of Haplogroups J and T within Eastern Europe

 

 

 

Malyarchuk and his associates also investigated the origin of the Roma (Gypsies) in Poland (Malyarchuk et al., 2005) and Slovakia (Malyarchuk et al., 2008).  The most interesting result of these two studies relative to Haplogroups J and T was the complete absence of T in the Polish population, but 13 of the 69 samples (18.8%) were Haplogroup J.  All 13 J samples had the same HVR1/HVR2 signature–an obvious founder effect.  The Slovak percentages were more typical with 9.2% concentration of Haplogroup J in the Roma population compared to 9.6% in the general population, but for Haplogroup T, the frequency of 10.6% was nearly twice that of the general population. 

 

Technology of extraction and analysis of mtDNA has progressed to the point where studies of ancient DNA (aDNA) are increasingly reported.  Iazgirre and de la Rue (1999) reported on the extraction and coding-region RFLP classification of mtDNA from 121 dental samples from four prehistoric Basque sites.  Radiocarbon dating places these samples between approximately 3400 and 5000 years before present.  Ignoring the site with a very small sample size, they found that one site had no Haplogroup J or T but the other two sites had a frequency of 15.9% and 16.7% for Haplogroup J and 4.8% and 16.7% for a Haplogroup T-X combination.  Some years later, an expanded team (Alzualde et al., 2005) then looked at a different site in the Basque area, dated to the sixth and seventh century AD.  They were able to sequence the HVR1 for 48 of the 67 that they classified using RFLP analysis.  They found frequencies of 16.7 % for Haplogroup J and 10.8% for the Haplogroup T-X combination.  These were comparable to the values for the previous aDNA but significantly higher than the 2.4% and 6% that had been previously reported for the extant population.  They suggest that based on lineage J as a mark of migration of Neolithic population from the Near East, that this heterogeneity within the Basque regions shows that “adoption of Neolithic culture followed different paths within the same region,” and that certain inferences based solely on the frequencies in present day populations do not appear to be correct."  From further analysis of the DNA and associated archaeology, Azuslde (2006) concluded by stressing “the importance of ancient NDA data for reconstructing the biological history of human populations, rendering it possible to verify certain hypotheses based solely on current population data."  Further they questioned “the generally accepted belief that, since ancient times, the influence of other human groups has been very scarce in the Basque Country." 

 

However, a recent study was conduced to provide “a more complete characterization of the mitochondiral genome variability of the Basques” (Alfonso-Sanchez et al., 2008).  They sequenced HVR1 and HVR2 of 55 healthy men selected to be non-related based on a three-generation pedigree charts.  The most interesting result from that study was the high frequency of J, especially J1c and J2a with frequencies of 10.9% and 3.6% respectively.  This 14.5% total J is in sharp contrast to the 2.4% commonly referenced for the Basques.  On the other hand, it is in line both with the results from ancient DNA and what would be expected when compared with other local regions from the north of Iberia.  The complex pattern of spatial heterogeneity is likely to be the result of “restricted gene flow, and accordingly, population fragmentation and reproductive isolation.”

 

Richards et al. (2000) were cited above as the team that formalized founder analysis of populations using mtDNA data.  Thirty-five team members were represented as co-authors of that paper and the supplementary data they produced deserves a more detailed review.  Their database (Macaulay, 2001) includes results of HVR1 analysis of 4100 samples from 24 widely distributed regions of the Near East and Europe.  The team found 1451 different haplotypes and assigned them to haplogroups using the then existing mtDNA motif classification system.  The results included 384 samples of Haplogroup J, or 9.37% of the total, and 363 samples for Haplogroup T, or 8.85%.

 

For the present study sample sizes and counts for Haplogroups J and T were extracted from the Macaulay database for each geographical region, and the frequencies of the corresponding haplogroups were computed.  The results are shown in Table 7.  The average length for each of the haplogroups is also shown.

 

 

 

 

Table 7

Summary of Geographic Analysis of Haplogroup Data Extracted from Macaulay (2000)

 

 

 

Maps showing the frequency by regions for Haplogroups J and T are presented in Figures 3 and 4, respectively.  However, caution must be exercised to avoid reading too much into these maps.  In addition to dealing with relative small numbers for some regions, many other factors must be taken into consideration before actual migration paths can be drawn.  More specifically, the current frequency in any given region is affected by many factors: population movements do not necessarily produce smooth gradients, but may instead represent movements for relative long distances in an irregular manner; there are back migrations; a population may be decimated by natural disasters or diseases; etc.  For example, a casual glance at the map for J might suggest that it originated in what is now Yemen and expanded through the Balkans and on to northwest Europe with spurs on both sides of this line.  However, a look at the length of the corresponding haplotype branches suggests that the oldest populations are in the Bedouin, Iranian, and Kurdish regions, supporting the idea of a Middle East origin, but further north.

 

 

 

 

Figure 3.  Relative frequency (in percent) of Haplogroup J as derived from (Macaulay 2001)

 

 

 

 

 

 

Figure 4.  Relative frequency (in percent) of Haplogroup T as derived from (Macaulay 2001)

 

 

 

 

This review of the literature concerning the origins of the clades is representative but is certainly not exhaustive.  More work is required to integrate results, but more importantly, new research is required to provide more data and more complete data.  There are several reasons for this.

 

First, studies have not kept up with the technology.  For some geographical areas, the only results available are from RFLP analysis.  In other studies the sequence data was limited to HVR1, sometimes complemented with RFLP typing and selective sequencing.  Very few results are available for the entire mitochondrial genome.

 

Second, knowledge of the general phylogeny of Haplogroup J is still evolving and consensus has not yet been reached.  The HVR1 motifs used by most of the available studies are not adequate for high-resolution classification of Haplogroup J, only for identifying a haplogroup as a whole.  Most studies are not even distinguishing J2 from J1.  Furthermore, errors have been identified, but not all later studies have recognized these errors, or at least have not taken them into consideration consistently.

 

Third, most basic research is geographically very limited in scope, but then comparisons are made with data from studies of other geographic areas--studies that may be inconsistent in purpose.

 

Fourth, global databases (such as GenBank) are a great asset for comparing sequences, but are not structured to capture context data beyond literature citations.  Founder analysis, for example, requires location.   Supplementary databases are needed to cross reference each DNA sample to geographic location of source, any associated archaeological context (e.g., dating of skeletal remains), significant pedigree data (including location and dates), as available, etc.

 

With time, the improvements will be made, but of course, the technology will have moved on.  Nevertheless, the author, for one, expects to continue to review the literature for data relative to a better understanding of Haplogroup J and performing analyses toward that understanding, including refinement of the classification structure, development of expanded databases, and integration of pieces into a global anthropology.

 

Conclusions

 

The work described in this paper is a work-in-progress.  It provides a broad review of available data concerning mtDNA Haplogroup J and tries to contribute to the evolving knowledge by developing a phylogeny and associated age estimates.  It must be noted, however, that the quality of the product is limited by the techniques employed.  For example, as stated above, a single mutation rate cannot adequately represent the entire genome.  Future analyses should consider both the differences across the various types of gene (e.g., coding for protein versus RNA) and even specific genes.  For genes that encode proteins, the analysis should differentiate those polymorphisms that affect amino acid sequences from those that do not.  Currently, neither the size of the database, nor knowledge of various mutation rates, were adequate to take these issues into consideration.

 

Furthermore, as illustrated by the discussion of Origins and Migrations, just the tip of the iceberg has been addressed.  Much work is needed to bring together and integrate the many ongoing relevant studies.  For example, no attempt has yet been made to analyze population size growth for Haplogroup J.  The potential for such analysis can be seen in the study by Atkinson et al. (2008).  They employed the Bayesian skyline plot (BSP) with simulation (Drummond et al., 2005) to “simultaneously estimate a posterior probability distribution for the ancestral genealogy, branch lengths, substitution model parameters, and population parameters through time.  Such analyses can then be integrated with the archaeological record, legend, and recorded history to develop a more complete story of Haplogroup J.

 

New studies are required, with the data needs to be developed and integrated.  A single project that is both focused on Haplogroup J (and T) and of broad geographic scope may not be feasible at this time.  However, it is hoped that a consortium might develop to permit multiple researchers to contribute to an appropriately designed comprehensive project.  The author is currently administrating a public discussion group and associated file exchange to further the cause.  Interested persons may join through the link to the mtDNA Haplogroup J Project shown under Web Resources, below.

 

Supplementary Material

 

Supplementary data is available at:

 

http://www.jogg.info/42/logansuppl.xls

 

Web Resources

 

http://tech.groups.yahoo.com/group/J-mtDNA/

mtDNA Haplogroup J Project

http://www.mitomap.org

Human Mitochondrial Genome Database

 

http://www.genpat.uu.se/mtDB/

MtDB: Human Mitochondrial Genome Database

 

References

 

Alfanso-Sanchez MA, Martinez-Bouzas C, Castro A, Pena JA, Fernandez-Fernandez I, Herrera RJ, de Pancorbo MM (2006)  Sequence polymorphisms in the mtDNA control region in a human isolate: the Georgians from Swanetia.  J Hum Genet, 51:429-439.

 

Alfanso-Sanchez MA, Cardoso S, Martinez-Bouzas C, Pena JA, Herrera RJ, Castro A, Fernandez-Fernandez I, de Pancorbo MM (2008)  Mitochondrial DNA haplogroup diversity in Basques: a reassessment based on HVI and HVII polymorphisms.  Am J Hum Bio, 20:154-164.

 

Alzualde A, Izagirre N, Alonso S, Alonso A, de la Rua C (2005)  Temporal Mitochondrial DNA Variation in the Basque Country: Influence of Post-Neolithic Events.  Ann Hum Genet, 69:665-679.

 

Alzualde A, Izagirre N, Alonse S, Alonso A, Albarran C, Azkarate A,  de la Rua C (2006)  Insights Into the isolation of the Basques: mtDNA lineages from the historical site of Aldaieta. Am J Phys Anthropol, 130:394-404.

 

Anderson S, Bankier AT, Barrell BG, de Bruijn MHL, Coulson AR, Drouin J, Eperon IC, Nierlich DP, Roe BA, Sanger F, Schreier PH, Smith AJH, Staden R, Young IG (1981)  Sequence and organization of the human mitochondrial genome. Nature,  290:457-465.

 

Andrews RM, Hubacka I, Chinnery PF, Lightowlers RN, Turnbull DM, Howell N (1999)  Reanalysis and revision of the Cambridge reference sequence for human mitochondrial DNA.  Nature Genet, 23:147.

 

Annunen-Basilla J, Finnila S, Mykkanen K, Moilanen JS, Veijola J, Poyhonen M, Viitanen M, Kalimo H, Majamaa K (2006)  Mitochondrial DNA sequence variation and mutation rate in patients with CADASIL.  Neurogenetics, 7:185-194.

 

Atkinson QD., Gray RD, Drummond AJ (2008)  mtDNA variation predicts population size in humans and reveals a major southern Asian chapter in human prehistory.  Mol Biol Evol, 252:468-474.

 

Bandelt HJ, Kong QP, Richards M, Macaulay V (2006)  Estimates of mutation rates and coalescence times.  In: Bandelt HJ, Macaulay V, Richards M (Eds.)  Nucleic Acids and Molecular Biology, Vol. 18, Springer-Verlag.

 

de Benedictis G, Rose G, Carreiri G, de Luca M, Falcone E, Passarino G, Bonafe M, Monti D, Baffio G, Bertolini S, Mari D, Mattace R, Franceschi C (1999)  Mitochondrial DNA inherited variants are associated with successful aging and longevity in humans.  FESEB J, 13:1532-1536.

 

Benson DA, Karsch-Mizrachi I, Lipman DJ , Ostell J, Wheeler DL (2007)  GenBank.  Nuc Acids Res, 35:D21-D25  (Database Issue).   The database is available at the following URL:

http://www.ncbi.nlm.nih.gov/sites/entrez?db=nuccore.

 

Brown MD, Starikovskaya E, Derbeneva O, Seyed Hosseini S, Allen JC, Mikhailovskaya IE, Sukernik RI, Wallace DC (2002)  The role of mtDNA background in disease expression: a new primary LHON mutation associated with Western Eurasian Haplogroup J.  Hum Genet, 110:130-138.

 

Brown WM, George M, Wilson AC (1979)  Rapid evolution of animal mitochondrial DNA.  Proc Nat Acad Sci (USA), 764:1967-1971.

 

Cann RL, Stoneking M, Wilson AC (1987)  Mitochondrial DNA and human evolution.  Nature, 325:31-36. 

 

Carelli V, Giordano C, d’Amati G (2003)  Pathogenic expression of homoplasmic mtDNA mutations needs a complex nuclear-mitochondrial interaction.  Trends Genet, 19:257-262. 

 

Carelli V, Achilli A, Valentino ML, Rengo C, Semino O, Pala M, Olivieri A, Mattiazzi M, Pallotti F, Carrara F, Zeviani M, Leuzzi V, Carducci C, Valle G, Simionati B, Mendieta L, Salomao S, Belfort R, Sadun AA, Torroni A (2006)  Haplogroup effects and recombination of mitochondrial DNA: Novel clues from the analysis of Leber hereditary optic neuropathy pedigrees.  Am J Hum Genet, 78:564-574.

 

Coble MD, Just RS, O’Callaghan JE, Letmanyl IH, Peterson CT, Irwin JA, Parsons TJ (2004)  Single nucleotide polymorphisms over the entire mtDNA genome that increase the power of forensic testing in Caucasians.  Int J Legal Med, 118:137-146.

 

Comas D, Calafell F, Mateu E, Perez-Lezaun A, Bosch E, Bertranpetit J (1997)  Mitochondiral DNA variation and the origin of the Europeans.  Hum Genet, 99:443-449.

 

Dato S, Passarino G, Rose G, Altomare K, Bellizzi D, Mari V, Feraco E, Franceschi C,  de Benedictis G (2004)  Association of mitochondrial DNA haplogroup J with longevity is population specific.  Eur J Hum Gen, 12:1080-1082.

 

Detjen K. A., S. Tinschert, D, Kaufmann, B. Algermissen, P. Nurnberg, and M. Schuelke (2006)  Identical mitochondrial DNA between monozygous twins with discordant neurofibromatosis type 1 phenotype, unpublished.

 

Drummond AJ, Rambaut F, Shapior B, Pybus OG (2005) Bayesian Coalescent Inference of Past Population Dynamics from Molecular Sequences.  Mol Biol Evol, 22(5):1185-1192.

 

Elson JL, Majamaa K, Howell N, Chinnery PF (2007)  Associating mitochondrial DNA variation with complex traits.  Am J Hum Genet, 80:378-381.

 

Endicott P, Ho SYW (2008)  A Bayesian evaluation of human mitochondrial substitution rates.  Am J Hum Genet, 82:895-902.

 

Franceschi C, and 23 coauthors (2007)  Genetics of healthy aging in Europe: the EU-integrated project GEHA.  Ann NY Acad Sci, 1100:21-45.

 

Fraumene C, Belle EMS, Castri L, Sanna S, Mancosu G, Cosso M, Marras F, Barbujani G, Pirastu M, Angius A (2006)  High resolution analysis and phylogenetic network construction using complete mtDNA sequences in Sardinian genetic isolates.  Mol Biol Evol, 23:2101-2111.

 

Gasparre G, Porcelli AM, Bonora E, Pennisi LF, Toller M, Iommarini L, Ghelli A, Moretti M, Betts CM, Martinelli GN, Ceroni AR, Curcio F, Carelli V, Rugolo M, Tallini G, Romeo G (2007)  Disruptive mitochondrial DNA mutations in complex I subunits are markers of oncocytic phenotype in thyroid tumors, Proc Nat Acad Sci (USA), 104(21):9001-9008.

 

Gonder MK, Mortesen HM, Reed FA, de Sousa A, Tiskhoff SA (2007)  Whole-mtDNA genome sequence analysis of ancient African lineages.  Mol Biol Evol, 24:757-768.

 

Gonzalez AM, Karadsheh N, Maca-Meyer N, Glores G, Cabrera VM, Larruga HM (2008)  Mitochondrial DNA variation in Jordanians and their genetic relationship to other Middle East populations.  Ann Hum Bio, 35:212-231.

 

Greenspan B (2007)  Direct submission of Family Tree DNA full sequence mtDNA test results to GenBank.

 

Hagelberg E (2003)  Recombination or mutation rate heterogeneity?  Implications for Mitochondrial Eve.  Trends in Genetics, 19:84-90.

 

Hartmann A., M. Thieme, L. K. Nanduri, T. Stempfl, C. Moehle, T. Kivisild, Oefner PJ (2008)  Validation of microarray-based sequencing of 93 worldwide mitochondrial genomes. Unpublished.

 

Herrnstadt C, Elson JL, Fahy E, Preston G, Turnbull DM, Anderson C, Ghosh SS, Olefsky JM, Beal MF, Davis RE, Howell N (2007)  Reduced-median-network analysis of complete mitochondrial DNA coding-region sequences for the major African, Asian, and European haplogroups.  Am J Hum Genet, 70:1152-1171.  See also Elson (2007) for an update of the phylogeny.

 

Howell N, Smejkal CB, Mackey DA, Chinnery PF, Turnbull DM, Herrnstadt C (2003)  The pedigree rate of sequence divergence in the human mitochondrial genome: there is a difference between phylogenetic and pedigree rates.  Am J Hum Genet, 72:659-670.

 

Hurst WR (2007)  Mitochondrial DNA control-region mutations at position 514-524 in Haplogroup K and beyond.  J Genet Geneal, 3:47-62.

 

Ingman M, Kaessmann H, Paabo S, Gyllensten U (2000)  Mitochondrial genome variation and the origin of modern humans.  Nature, 408:708-713.

 

Ingman M, Gyllensten U (2001)  Analysis of the complete human mtDNA genome: Methodology and inferences for human evolution.  J Heredity, 92):454-461.

 

Ingman M, Gyllensten U (2006)  MtDB: Human Mitochondrial Genome Database, a resource for population genetics and medical sciences.  Nucleic Acids Res, 34:D749-D751.  The database is available at http://www.genpat.uu.se/mtDB/.

 

Ingman H Gyllensten U (2007)  Rate variation between mitochondrial domains and adaptive evolution in humans.  Hum Mol Genet, 16:2281-2287.

 

Izagirre N, de la Rua C (1999)  An mtDNA analysis in ancient Basque populations: implications for Haplogroup V as a marker for the major Paleolithic expansion from southwestern Europe.  Am J Hum Genet, 65:199-207.

 

John Hopkins University (2008)  Leber Hereditary Optic Neuropathy, LHON. Online Mendelian Inheritance in Man.

 

Jobling MA, Hurles ME, Tyler-Smith C (2004)  Human Evolutionary Genetics, Garland Publishing, New York and Oxford.

 

Kivisild T, Shen P, Wall DP, Do B, Sung R, Davis K, Passarino G, Underhill PA, Scharfe C, Torroni A, Scozzari R, Modiano D, Coppa A, de Knijff P, Feldman M, Cavalli-Sforza LL, Oefner PJ (2006)  The role of selection in the evolution of human mitochondrial genomes.  Genetics, 172:272-287.

 

Logan Ian (2007)  Mitochondrial DNA (mtDNA), website at http://www.ianlogan.co.uk/mtDNA.htm.

 

Logan J (2008)  The subclades of mtDNA Haplogroup J and proposed motifs for assigning control-region sequences into these clades.  J Genet Geneol, 4:12-26.

 

Maca-Meyer N, Gonzalez AM, Larruga JM, Flores C, Cabrera VM (2001)  Major genetic mitochondrial lineages delineate early human expansions. BMC Genetics, 2:13

 

Macaulay V (2001) “Supplementary data from Richards et al. (2000),” available at http://www.stats.gla.ac.uk/~vincent/founder2000/index.html.

 

Malyarchuk BA, Derenko MV (2001) Mitochondrial DNA variability in Russian and Ukrainians: implication to the origin of the Eastern Slavs.  Ann Hum Genet, 65:63-78.

 

Malyarchuk BA, Grzybowski T, Derenko MV, Czarny J, Wozniak M, Miscicka-Sliwka D (2002)  Mithchondrial DNA variability in Poles and Russians.  Ann Hum Genet, 66:261-283.

 

Malyarchuk BA, Grzybowski T, Derenko MV, Czarny J, Drobnic K, Miscicka-Sliwka D (2003)  Mitochondrial DNA variability in Bosnians and Slovenians.  Ann Hum Genet, 67:412-427.

 

Malyarchuk BA, Grzybowski T, Derenko MV, Czarny J, Miscicka-Sliwka D (2005)  Mitochondrial DNA diversity in the Polish Roma.  Ann Hum Genet, 70:195-206.

 

Malyarchuk BA, Perkova MA, Derenko MV, Vanecek T, Lazur J, Gomolcak P (2008)  Mithchondrial DNA variability in Slovaks, with application to the Roma origin.  Ann Hum Genet, 72:228-240.

 

Malyarchuk BA, Venecek T, Perkova MA, Derenko MV, Sip M (2006)  Mitochondrial DNA variability in the Czech population, with application to the ethnic history of Slavs.  Hum Biol, 78:581-696.

 

Mishmar D, Ruiz-Pesini E, Golick P, Macaulay V, Clark AG, Hosseini S, Brandon M, Easley MK, Chen E, Brown MD, Sukernik RI, Oickers A, Wallace DM (2003)  Natural selection shaped regional mtDNA variations in humans.  Proc Nat Acad Sci (USA), 100:171-176.

 

Mitomap – A human mitochondrial genome database (2008), http://www.mitomap.org/

 

Moilanen JS, Finnila A, Majamaa K (2003)  Lineage-specific selection in human mtDNA: Lack of polymorphisms in a segment of MTDN5 gene in Haplogroup J.  Mol Biol Evol, 20:2132-2142.

 

Mulligan CJ, Kitchen A, Miyamoto MM (2008)  Updated three-stage model for peopling of the Americas.  PLOS One, 3:e3199.

 

Palanichamy MG, Sun C, Agrawal S, Bandelt HJ, Kong QP, Khan F, Wang CY, Chaudhuri TK, Palla V, Zhang YP (2004)  Phylogeny of mitochondrial DNA macrohaplogroup N in India, based on complete sequencing: Implications for the peopling of South Asia.  Am J Hum Genet, 75:966-975.

 

Parsons TJ (2005)  Singular nucleotide polymorphisms over the entire mtDNA genome that increase the forensic discrimination of common HV1/HV2 types in ‘Hispanics.’  Unpublished.

 

Pereira L, Goncalves J, Franco-Duarte R, Silva J, Rocha T, Arnold C, Richards M, Macaulay V (2006)  No evidence for a mtDNA role in sperm motility: data from complete sequencing of asthenozoospermic males.  Mol Biol Evol, 24:868-874.

 

Rand DM. (2001) The units of selection of mitochondrial DNA.  Ann Rev Ecol Syst, 32:415-448.

 

Richards M, Corte-Real H, Forster P, Macaulay V, Wilkinson-Herbots H, Demaine A, Papiha S, Hedges R, Bandelt HJ, Sykes B (1996)  Paleolithic and Neolithic lineages in the European mitochondrial gene pool.  Am J Hum Genet, 59:185-203.  See also the critique by L L. Cavalli-Sforza and E. Minch (1997)  in 61:247-251 and the authors’ reply in 61:251-254.

 

Richards MB, Macaulay VA, Bandelt HJ,  Sykes BC (1998)  Phylogeography of mitochondrial DNA in western Europe.  Ann Hum Genet, 62:241-260.

 

Richards M, Macaulay V, and 35 others (2000)  Tracing European founder lineages in the Near Eastern mtDNA pool.  Am J Hum Genet, 67:1251-1276.

 

Richards M (2003)  The Neolithic invasion of Europe.  Annu Rev Anthropol, 32:135-162.

 

Ross OA, McCormack R, Curran MD, Duquid RA, Barnett YA, Rea IM, Middleton D (2001)  Mitochondrial DNA polymorphism: the role in longevity of the Irish population.  Exp Gerentol, 36:1161-1178.

 

Ross OA, McCormack R, Maxwell LD, Dugrud RA, Quinn DJ, Barnett YA, Rea IM. El-Agnaf OMA, Gibson JM, Wallace A, Middleton D, Curran MD (2003) mt4216C variant in linkage with the mtDNA TJ cluster may confer a susceptibility to mitochondrial dysfunction resulting in an increased risk of Parkinson’s disease in the Irish.  Exp Gerentol, 38:397-405.

 

Rosset S, Wells RS, Soria-Hernanz DF, Tyler-Smith C, Royyuru AK, Behar DM, Genographic Consortium (2008)  Maximum likelihood estimation of site-specific mutation rates in human mitochondrial DNA from partial phylogenetic classification.  Genetics, E-Published Articles Ahead of Print. doi:10.1534/genetics.108.091116 (Sep 14, 2008).

 

Ruiz-Pesini E, Lott MT, Procaccio V, Poole JC, Brandon MC, Mishmar D, Yi C, Kreuziger J, Baldi P, Wallace DC (2007)  An enhanced MITOMAP with a global mtDNA mutational phylogeny.  Nucl. Acids Res, 35:D823-D828.

 

Santoro A, Salvioli S, Raule N, Carpi M, Sevina F, Valensin S, Monti D, Bellizzi D, Passarino G, Rose G, Benedictic GD, Franceschi C (2006)  Mitochondrial DNA involvement in human longevity.  Biochimica et Biophysica Acta, 1757:1388-1399.

 

Santos C, Montiel R, Arruda A, Alverez L, Aluja MP, Lima M (2008) Mutation patterns of mtDNA: empirical inferences for the coding region.  BMC Evol Biol, 8:167.

 

Sarich VM, Wilson AC (1967) Inummunological time scale for hominid evolution.  Science 158:1200-1203.

 

Shlush LI, Atzmon G, Weisshof R, Behar D, Yudkovsky G, Barzilai N, Skorecki K (2008)  Ashkenazi Jewish Centenarians Do Not Demonstrate Enrichment in Mitochondrial Haplogroup J.  PlosOne, 3(10): e3425.

 

Simoni L, Calafell F, Pettener D, Bertranpetit J, Barbujani G (2000)  Geographic patters of mtDNA diversity in Europe.  Am J Hum Genet, 66:262-278. and Erratum. Am J Hum Genet, 66:1785.  See also the comments on the article, along with the authors’ response, inTorroni, et al. (2000)  Letter to the editor.  Am J Hum Genet 66:1173-1179.

 

Stoneking M,Bharia K, Wilson AC (1986)  Rate of sequence divergence estimated from restriction maps of mitochondrial DNAs from Papua New Guinea.  Cold Spring Harbor Symposia on Quantitative Biology,  Vol LI.  (On-line access available through JSTOR for members or through member organizations.)

 

Stoneking M, Sherry ST, Reed AJ, Vigilant L (1992)  New approaches to dating suggest a recent age for the human mtDNA ancestor.  Phil Trans R Soc. Lond, 337:167-175.

 

Torroni A, Schurr TG, Yang CC, Szathmary EJE, Williams RC, Schanfield MS, Troup GA, Knowler WC, Lawrence DN, Weiss KM, Wallace DC (1992)  Native American mitochondrial DNA analysis indicates that the Amerind and Nadine populations were founded by two independent migrations.  Genetics, 130:153-162.

 

Wallace DC, Singh G, Lott MT, Hodge JA, Shurr TG, Lezza AMS, Elsas LJ, Nikoskelainen EK (1988)  Mitochondrial DNA mutations associated with Leber’s hereditary optic neueropathy.  Science, 242:1427-1430.

 

Wills C (1995)  When did Eve live?  An evolutionary detective story.  Evolution, 49:593-607.

 

Zhang J, Asin-Cayuela J, Fish J, Michikawa Y, Bonafe M, Olivieri F, Passarine G, Benedictis GD, Franceschi C, Atardi G (2003) Strikingly higher frequency in centenarians and twins of mtDNA mutation causing remodeling of replication origin in leukocytes.  Proc Nat Acad Sci (USA), 100:1116-1121.

 

Zsurka G, Schroder R, Hornblum C, Rudolph J, Wiesner RJ, Elger CE, Krunz WS (2004)  Tissue dependent co-segregation of the novel pathogenic G12276A mitochondrial tRNALeu(CUN) mutation with the A185G D-loop polymorphism.  J Med Genet, 41:e124.