A Refined Phylogeny for mtDNA Haplogroup J
Abstract
This short report presents an
updated version of the phylogeny for mtDNA Haplogroup J based on 253 full
genome sequences plus 38 sequences that are complete in the coding region but
incomplete in the control region.
Introduction
A
preliminary phylogeny for Haplogroup J was presented in “The Subclades of mtDNA
Haplogroup J” (
Having established an initial structure for Haplogroup J based on 111 full
genome sequences, the next step was to develop a broader perspective on the
haplogroup. The results of that work were presented in the article
“A Comprehensive Analysis of mtDNA Haplogroup J” (
The availability of mtDNA sequences continues to grow, with over 5400 human
mitochondrial "full genome sequences" currently available in GenBank,
of which over 200 are Haplogroup J.
In addition, a number of full genome sequences were made available through
the Haplogroup J testing program at Family Tree
Methods
Analysis of sequences and
development of a matrix from which the phylogeny was inferred are as described
in previous reports (
Conclusions
A formal definition of the clades is given in Table 1. Each clade name is
shown in a box that is indented from the left to show the hierarchical level
of that clade. It is followed by a list of the polymorphisms relative to the
revised Cambridge Reference Sequence (Andrews et al., 1999; MitoMap, 2008).
For the convenience of those who wish to use this table to estimate the clade
of a sequence from control region test results, the control region
polymorphisms are shown in bold and those from the HVR2 region are further
italicized. Each polymorphism is shown as a numeric position preceded by a
letter indicating the reference sequence allele and followed by a letter
indicating the observed allele. The exceptions are insertions, for which
there is no reference value, and deletions, for which a "d" suffix indicates
the absence of a nucleotide at that position. For back mutations the position
number is followed by an "@" rather than repeating the reference value.
Underscored polymorphisms have a significant homoplasic presence in other
clades of Haplogroup J. Similarly, parentheses indicate that a given
polymorphism is absent in a significant number of sequences, for example
because of back mutations. For the convenience of those who may wish to
evaluate the support for these definitions, the number of times that the
indicated set of polymorphisms occurred in the database is shown near the
first column.
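The notation described above can be read mechanically. The following is a minimal Python sketch of such a reader, not the tooling used for this report. The substitution form follows the text (for example G513A). The insertion form "310.1T" (position, a dot, an index, then the inserted bases) and the optional leading letter on deletions and back mutations are assumptions made for illustration, as are the names `Polymorphism` and `parse_polymorphism`. Formatting cues such as bold, italics, underscores and parentheses are not handled.

```python
import re
from typing import NamedTuple, Optional


class Polymorphism(NamedTuple):
    kind: str            # "substitution", "deletion", "back_mutation" or "insertion"
    position: int        # position in the reference sequence
    ref: Optional[str]   # reference allele, if the token states one
    obs: Optional[str]   # observed allele or inserted bases, if any


# Token shapes. Only the substitution form is taken directly from the text;
# the insertion form and the optional leading letters are ASSUMPTIONS.
_PATTERNS = [
    ("substitution", re.compile(r"^([ACGT])(\d+)([ACGT])$")),
    ("deletion", re.compile(r"^([ACGT]?)(\d+)(d)$")),
    ("back_mutation", re.compile(r"^([ACGT]?)(\d+)(@)$")),
    ("insertion", re.compile(r"^()(\d+)\.\d+([ACGT]+)$")),
]


def parse_polymorphism(token: str) -> Polymorphism:
    """Classify one polymorphism token and split it into its parts."""
    token = token.strip()
    for kind, pattern in _PATTERNS:
        match = pattern.match(token)
        if match is None:
            continue
        ref, position, obs = match.groups()
        if kind in ("deletion", "back_mutation"):
            obs = None  # the "d" and "@" suffixes are markers, not alleles
        return Polymorphism(kind, int(position), ref or None, obs)
    raise ValueError(f"unrecognized polymorphism: {token!r}")


if __name__ == "__main__":
    for example in ("G513A", "T1850C", "2706@", "310.1T"):
        print(example, parse_polymorphism(example))
```

Keeping the token kinds in an ordered list of patterns makes it easy to adjust a single form if the table uses a different convention for insertions or back mutations.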
Table 1
A Phylogeny for mtDNA Haplogroup J


This data is also presented here in the form of a two-part graphic. Figure 1
shows the overall structure of mtDNA Haplogroup J and details of the various
subclades, except for the J1c subclade, which is detailed within its context
in Figure 2.

Figure 1 The phylogeny of
mtDNA Haplogroup J in tree format.

Figure 2 Details of mtDNA
subclade J1c, shown in context of overall Haplogroup J.
A matrix of the aligned and analyzed haplotypes used in the development of
this phylogeny is available in the supplementary material. Note that selected
columns of the matrix are lightly shaded to indicate those sequences that are
complete in the coding region but not in the control region. Thus, empty cells
that are shaded and correspond to control region polymorphisms should be
considered as "not known" rather than "no polymorphism." This matrix, along
with the table and figures, will be periodically updated in the supplementary
data files as new information becomes available.
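The reading rule for shaded cells amounts to a three-state lookup. The sketch below is one way to express it in Python and is not taken from the supplementary files. It assumes the conventional rCRS control region boundaries (positions 16024 to 16569 and 1 to 576); the names `Cell`, `in_control_region` and `cell_state` are hypothetical.

```python
from enum import Enum
from typing import Set

RCRS_LENGTH = 16569  # length of the revised Cambridge Reference Sequence


class Cell(Enum):
    PRESENT = "polymorphism"
    ABSENT = "no polymorphism"
    UNKNOWN = "not known"


def in_control_region(position: int) -> bool:
    """True if the position lies in the control region (16024-16569, 1-576)."""
    if not 1 <= position <= RCRS_LENGTH:
        raise ValueError(f"position out of range: {position}")
    return position <= 576 or position >= 16024


def cell_state(position: int, observed: Set[int],
               control_region_complete: bool) -> Cell:
    """Interpret one matrix cell for one sequence.

    observed holds the positions at which the sequence shows a polymorphism.
    An empty control region cell in a shaded column (control region not
    fully sequenced) is reported as UNKNOWN rather than ABSENT.
    """
    if position in observed:
        return Cell.PRESENT
    if in_control_region(position) and not control_region_complete:
        return Cell.UNKNOWN
    return Cell.ABSENT


if __name__ == "__main__":
    coding_only = {3447}  # hypothetical sequence lacking control region data
    print(cell_state(310, coding_only, control_region_complete=False).value)
    print(cell_state(1850, coding_only, control_region_complete=False).value)
```

Treating "not known" as its own state, rather than folding it into "no polymorphism", keeps incomplete sequences from counting as evidence against a control region marker.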
Discussion
The purpose of this brief report
is to make the updated phylogeny presented here freely available to all
interested parties. However, the results
must be considered a work in progress and further refinements may take place as
new data is acquired and analyzed. In
particular, some of the clade definitions at the end of limbs and branches must
be considered tentative because they are based on small sample sizes and will
be confirmed or restructured with further analysis incorporating additional
data. Furthermore, the nomenclature is
subject to change as a result of harmonization with other researchers. For example, an active effort is underway to
harmonize this work with the updates to the tree of van Oven and Kayser (2009).
As of this writing there are three clusters that have been flagged for
possible future definition as subclades. Each of these is clearly identifiable
in the supplementary matrix and all three are marked on the graphic version of
the phylogeny. They are not, however, included in Table 1, which gives the
definitions of the subclades.
The first issue is the apparent
further subdivision of clade J1c8 that is clearly visible in the supplementary
matrix. Upon closer examination it was
determined that some sequences were reported to have heteroplasmic results at
T16092, whereas others reported simple substitutions. It is probable that the difference is caused
by different testing and/or reporting standards. For this release, this
polymorphism has simply been ignored. This is of little overall significance
because the polymorphism occurs at the extreme of the phylogeny.
The second indication of possible refinement is the potential addition of a
clade J1c10, based on a 16188 insertion and comprising three sequences. Closer
examination shows that two of these three sequences are identical and the
third differs from them at a single nucleotide position. Since they all came
from the same study, it is possible that they are all from the same family and
thus are not independent samples suitable for defining a clade. This clade is
therefore held in abeyance pending confirmation from additional data.
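A check of this kind can be sketched as a pairwise comparison of haplotypes, each held as a set of polymorphism tokens. The Python below is an illustration only; the sequence names and tokens are invented placeholders, not the three sequences discussed above, and `pairwise_differences` is a hypothetical helper.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set, Tuple


def pairwise_differences(
    haplotypes: Dict[str, Set[str]],
) -> Dict[Tuple[str, str], FrozenSet[str]]:
    """Return, for every pair of sequences, the polymorphisms found in one
    sequence of the pair but not in the other (the symmetric difference)."""
    return {
        (a, b): frozenset(haplotypes[a] ^ haplotypes[b])
        for a, b in combinations(sorted(haplotypes), 2)
    }


if __name__ == "__main__":
    # Hypothetical placeholder haplotypes, for illustration only.
    haplotypes = {
        "seqA": {"m1", "m2", "m3"},
        "seqB": {"m1", "m2", "m3"},
        "seqC": {"m1", "m2", "m3", "m4"},
    }
    for pair, diff in pairwise_differences(haplotypes).items():
        print(pair, len(diff), sorted(diff))
```

Pairs with zero or one difference that also share a source study are the ones worth flagging as possibly non-independent before they are used to define a clade.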
The third is an unresolved reticulation at J2a1a. It appears that G513A and
A3447G could be used to define a new branch, but so could T1850C together with
the T insertion at position 310. However, these two potential definitions
overlap substantially, making a clear definition impossible. This situation
also occurs at the extreme of the tree and will likely resolve itself as
additional data is gathered.
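One way to make such a reticulation explicit is a pairwise compatibility test: two candidate definitions fit the same tree only if the sets of sequences carrying them are disjoint or one is nested inside the other. The Python sketch below illustrates that test on invented toy data; it is not the procedure used to build this phylogeny. The sequence names are placeholders, the token "310.1T" for the T insertion at position 310 is an assumed spelling, and `carriers` and `relationship` are hypothetical helpers.

```python
from typing import Dict, Set


def carriers(sequences: Dict[str, Set[str]], markers: Set[str]) -> Set[str]:
    """Names of the sequences that carry every marker in the definition."""
    return {name for name, polys in sequences.items() if markers <= polys}


def relationship(sequences: Dict[str, Set[str]],
                 markers_a: Set[str], markers_b: Set[str]) -> str:
    """Classify two candidate clade definitions.

    "disjoint" and "nested" are both compatible with a single tree.
    "conflict" means the carrier sets partly overlap, so the two
    definitions cannot both mark branches of the same tree.
    """
    a = carriers(sequences, markers_a)
    b = carriers(sequences, markers_b)
    if not (a & b):
        return "disjoint"
    if a <= b or b <= a:
        return "nested"
    return "conflict"


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    toy = {
        "seq1": {"G513A", "A3447G"},
        "seq2": {"G513A", "A3447G", "T1850C", "310.1T"},
        "seq3": {"T1850C", "310.1T"},
    }
    print(relationship(toy, {"G513A", "A3447G"}, {"T1850C", "310.1T"}))
```

In the toy data the two definitions share one carrier while each also has a carrier the other lacks, which is the partial overlap pattern that prevents a clean branch definition.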
Acknowledgements
I wish to thank Mannis van Oven for his thorough review of various forms of
the phylogeny presented in this paper and pointing out the back mutation at
A2706 that I had missed. Not only have
his comments improved this paper but also his collaboration has brought our
respective work into basic harmonization and established a firm basis for
developing a worldwide consensus for the phylogeny of Haplogroup J.
Note: Corrections added 31 May 2009 and 15 July 2009.
Supplementary Material
Supplementary data is available
at:
http://www.jogg.info/51/files/logansuppl.htm
Web Resources
Home page for
Family Tree
http://www.familytreedna.com/public/J-mtDNA/
Public access
page for J-mtDNA Project at Family Tree
Website that includes description
of Greasemonkey scripts used to extract polymorphisms
associated with full genome mtDNA sequences.
Home page for
the
http://www.mitomap.org/mitoseq.html
Reference page for the revised
Cambridge Reference Sequence provided by the MitoMap
organization.
http://www.ncbi.nlm.nih.gov/sites/entrez?db=nucleotide
Search page for retrieving mtDNA
sequences from GenBank.
A web page that is
periodically updated to show the entire mtDNA phylogeny as it is
developed. It is maintained by Mannis van Oven at
References
Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL (2007) GenBank.
Nucleic Acids Res, 35:D21-D25 (Database Issue). The
database is available at the following URL:
http://www.ncbi.nlm.nih.gov/sites/entrez?db=nuccore.
FTDNA (2008) Family Tree
GenBank (2009).
A database that contains publicly available
ISOGG (2008)
J-mtDNA Project (2008) The J-mtDNA Project at Family Tree
Logan JJ (2008b) A comprehensive analysis of mtDNA Haplogroup J.
J Genet Geneal, 4:104-124.
Logan Ian (2009) Ian Logan website. See Web Resources.
MitoMap (2008) Revised Cambridge Reference Sequence (rCRS) of the Human Mitochondrial
van Oven M, Kayser M (2009) Updated comprehensive phylogenetic tree of global
human mitochondrial DNA variation. Human Mutation, 30:E386-E394. See also
Phylotree.org under Web Resources.