Editor’s Corner
Mutation Rates – Who’s Got the Right Values?
A discussion on Y-
Most of the interest has been in
the rates derived from father-son pairs, as that seems most applicable to Y-chromosome
surname projects. Shortly
after Family Tree
To provide an independent check on mutation
rates, Charles Kerchner
started his “mutation log” in 2005, where data from surname projects can be
deposited. The overall goals and methods
of the project are essentially identical to those of the FTDNA study, but this
one is in the public domain where the numbers behind the averages may be
seen. To submit data to the
mutation log, the project administrator must know the genealogy of the
participants and must have reconstructed the ancestral haplotype, so that
mutations from that haplotype can be counted accurately. A mutation inherited by more than one participant
is counted only once. Sometimes the genealogy is not known well enough
to show which mutations were inherited from an ancestor and
which occurred independently; Kerchner asks
that, in case of uncertainty, the data be left out.
Kerchner’s study has been somewhat
successful, but the community of surname projects very likely holds many more
projects with useful data than have been submitted. It is unfortunate that not every
administrator has taken the trouble to submit his or her data, because the
results could provide a very important check on the FTDNA study.
At the time of this writing, there
have been 45 submissions from various surname projects. The numbers of transmissions
and mutations differ for each panel; for example, on markers 1-37 overall,
75258 marker transmissions have been reported and 309 mutations observed,
for an average mutation rate per marker on the 37-marker panel of 0.0041 ±
0.0002 (one standard deviation). The
corresponding mutation rates calculated from similar data for the panels 1-12,
13-25, 26-37, and 1-25 are 0.0024, 0.0029, 0.0071, and 0.0027. Data on FTDNA markers 38-67 are just starting
to be submitted, but it already appears that this panel will have the lowest
average rate of the four panels.
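The arithmetic behind the 1-37 panel figure can be checked with a short sketch. It assumes that mutations behave as rare, independent events, so that the one-standard-deviation uncertainty on the count is roughly its square root (a Poisson approximation). That assumption and the function name are illustrative and are not taken from the Kerchner log itself.

```python
import math

def rate_with_uncertainty(mutations, transmissions):
    """Return (rate, one-sigma uncertainty) per marker per transmission.

    Assumption: Poisson counting statistics, sigma(count) ~ sqrt(count).
    """
    rate = mutations / transmissions
    sigma = math.sqrt(mutations) / transmissions
    return rate, sigma

# Figures quoted in the text for markers 1-37.
rate, sigma = rate_with_uncertainty(309, 75258)
print(f"{rate:.4f} +/- {sigma:.4f}")  # 0.0041 +/- 0.0002
```

Under this approximation the quoted uncertainty depends only on the number of mutations observed, which is why additional submissions narrow the error bars.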
There are some significant
differences between the average mutation rates from the FTDNA study and those from the Kerchner study. FTDNA’s rates are 40-60% higher and do not fall within the
error bars of the Kerchner rates (FTDNA has not
divulged its own error bars). Selection bias can arise
when data are voluntarily self-reported rather than
collected according to a predetermined sampling procedure. However, this problem apparently applies to both
the FTDNA and Kerchner studies, though each may be
affected in a different way.
Another approach to calculating
mutation rates was published in the Fall 2006 issue of
this journal by John Chandler (2006).
The average mutation rates for the
1-12, 1-25, and 1-37 panels were found by Chandler (for the father-son
calibration) to be 0.00187±0.00028, 0.00278±0.00042, and 0.00492±0.00074. In this case the 95% confidence intervals for
the Kerchner rates and the
It is very important that surname
project administrators submit their data on their known genealogies to the Kerchner project so that the uncertainties in his rates may
be further reduced. This should be done
without regard to the number of mutations (or lack of mutations) that have
occurred in those projects. There are
far more projects having useful data than have been submitted to Kerchner’s log so far.
For those who have difficulty in understanding how to submit the data,
Charles is willing to help. This is
another area where our community of “amateurs” is demonstrating that we can
make a significant contribution to genetics as applied to genealogy and
anthropology.
In a study that uses a known
genealogy, there is usually no guesswork necessary in calculating the mutation
rate. The number of father-to-son
transmissions of the marker set is known, and it is usually possible to
reconstruct the haplotype for the common ancestor. Then it becomes a simple matter of counting
the mutations observed in the genealogical tree that leads to the present-day
participants and dividing by the number of marker transmissions.
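As a minimal sketch of that counting procedure, the example below uses a made-up six-link genealogy and a single made-up mutation. The names `FATHER_OF`, `MUTATION_EVENTS`, and `father_son_rate`, the marker label, and the tree itself are hypothetical and serve only to show the bookkeeping.

```python
# Hypothetical genealogy: each key is a son, each value is his father.
# "A" is the common ancestor whose haplotype has been reconstructed.
FATHER_OF = {
    "B": "A", "C": "A",
    "D": "B", "E": "B", "F": "B",
    "G": "C",
}

# Hypothetical mutation events, each recorded once on the father-son link
# where it arose, as (son who first carried it, marker label).
MUTATION_EVENTS = [("B", "marker_1")]

MARKERS_TESTED = 37  # size of the marker panel

def father_son_rate(father_of, events, markers):
    """Mutations per marker per father-son transmission."""
    links = len(father_of)            # one transmission event per son
    transmissions = links * markers   # marker transmissions
    return len(events) / transmissions

rate = father_son_rate(FATHER_OF, MUTATION_EVENTS, MARKERS_TESTED)
print(f"{rate:.4f}")  # 1 mutation / (6 links * 37 markers)
```

Because each mutation is attached to the link where it arose, a mutation carried by several descendants is still counted only once.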
However, in many surname projects
and in all population studies, the genealogy is not known. This has led to discussions about how to correct
for the unknown genealogy, unknown population (or family) dynamics, and the
unknown sampling bias that may have been at work in producing the pool of
available descendants and the selection of the actual participants.
The number of mutations showing in
a group of participants who are all descended from a common ancestor will generally
be higher than the actual number of mutations that has occurred in the
genealogical history of this group. That
is because some of the mutations presently showing will have been
inherited by two or more participants from a common ancestor in
whom the mutation first appeared. If one
simply counts the number of present-day mutations, the derived mutation rate
will be too high. If an independent rate
is assumed and the TMRCA is calculated, the excess apparent mutations will
cause the TMRCA to be too large.
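The overcounting can be illustrated with a small sketch. The family tree, the participant list, and the single mutation below are all hypothetical; the point is only that one mutation event high in the tree shows up in every tested descendant on that branch.

```python
# Hypothetical family: son -> father. "A" is the common ancestor.
FATHER_OF = {"B": "A", "C": "A", "D": "B", "E": "B", "F": "B", "G": "C"}

# One hypothetical mutation, arising in the transmission from A to B.
EVENT_SONS = ["B"]

# Present-day participants who were tested.
PARTICIPANTS = ["D", "E", "F", "G"]

def line_of_descent(person, father_of):
    """Everyone on the path from the ancestor's sons down to `person`."""
    path = set()
    while person in father_of:
        path.add(person)
        person = father_of[person]
    return path

def apparent_mutations(participants, father_of, event_sons):
    """Count a mutation once for every participant who carries it."""
    return sum(
        1
        for p in participants
        for son in event_sons
        if son in line_of_descent(p, father_of)
    )

apparent = apparent_mutations(PARTICIPANTS, FATHER_OF, EVENT_SONS)
actual = len(EVENT_SONS)
print(apparent, actual)  # 3 1
```

Here three participants show the mutation although it occurred only once, so a rate based on the apparent count would be three times too high for this event.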
Where the genealogy is not known,
there will also be unknown factors of population dynamics at work—some lines
from the ancestor will be more prolific than others, biasing the overall
results toward the mutation experience of the prolific branch. Other lines may have become extinct. These factors usually have the effect of
reducing diversity and causing the calculated TMRCA to be too small. The best way to handle population dynamics is
still controversial and the issue is usually ignored.
When FTDNA calculates the TMRCA
for a pair of individuals, these issues of genealogy and population dynamics do
not apply because the lines from a pair of participants to their most recent
common ancestor are (by definition) direct lines with no ambiguities. In this case the father-son mutation rates,
rather than the “effective rates,” are obviously the appropriate rates to
use. However, the results of this
calculation will only be as good as the father-son rates that are employed.
Zhivotovsky (2004) published a
paper in which he attempted to get around these difficulties by calculating an
“effective mutation rate” that is empirically derived from a set of descendants
of an ancestor who lived at a known time in the past. All of the unknown factors, such as the
genealogy or the population dynamics, are simply
averaged out in calculating the effective mutation rate, assuming 25 years per
generation (which may be too small).
This can work well if there are a number of such case studies that can
be analyzed and the resulting average rates can be averaged (Zhivotovsky
averaged the rates from three population groups), and if the cases that are
included are representative of the situation to which the derived rate is to be
applied. In practice, it is not so easy
to guess whether the case studies have the necessary characteristics to be
appropriate.
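The sensitivity to the assumed generation length can be seen in a rough sketch. It is not a reproduction of Zhivotovsky's procedure: it simply assumes that some measure of per-marker divergence accumulates linearly with the number of generations, and the divergence and time span used are placeholder numbers chosen only for the arithmetic.

```python
def effective_rate(observed_divergence, years_elapsed, years_per_generation):
    """Per-generation rate implied by a divergence built up over a known time.

    Assumption: divergence grows linearly with the number of generations.
    """
    generations = years_elapsed / years_per_generation
    return observed_divergence / generations

# Placeholder inputs (hypothetical, not from any published dataset).
DIVERGENCE = 0.02   # per-marker divergence from the founder
YEARS = 700         # time since the known ancestor

rate_25 = effective_rate(DIVERGENCE, YEARS, 25)
rate_30 = effective_rate(DIVERGENCE, YEARS, 30)
print(f"25 y/gen: {rate_25:.6f}  30 y/gen: {rate_30:.6f}")
print(f"ratio: {rate_30 / rate_25:.2f}")
```

Under this linear assumption the per-generation rate scales directly with the generation length, so moving from 25 to 30 years raises the derived rate by a factor of 1.2 whatever the data.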
Zhivotovsky’s “effective” mutation rate is
averaged over just a few traditionally measured markers. However,
However, in using three different
datasets and averaging the result from each, Zhivotovsky seems to have
introduced a small problem: the markers used in the different datasets were not
exactly the same, especially for the third dataset, so he was averaging rates over
different markers. Even with unlimited
sample size, the rates from the three groups should not be the same. Zhivotovsky averaged them anyway.
However, we can illustrate the
approach to recalibrating
It remains rather important that
we have an independent check on the mutation rates of
References
Chandler J (2006) Estimating Per-Locus Mutation Rates. J Genet Geneal, 2:27-33.
Kerchner (2007) Y-