Diagnostic Y-
Phillip G. Goff and T. Whit Athey
Abstract
Y-Chromosome Haplogroup G reaches
its highest frequency in the
______________________________________________________________________________________
Received
Address for correspondence: philgoff@comcast.net
___________________________________________________________________________
Introduction
Interest in Y-chromosome testing for genealogical research into paternal
ancestry has steadily increased since 2000.[1] As of December 2005, about 40,000
genealogically relevant haplotypes are available through various online
databases.[2] Many who have been tested for their Y-
Members of Y Haplogroup G
have repeat values on several Y-
The study of
Information on the markers
Nomenclature
Some Y-
Oxford Ancestors (OA) uses
a non-ISFG/NIST-standard nomenclature for the marker
The repeat value for
There are apparently no
differences in nomenclature on
Normally,
only the overall
Methods
To test the diagnostic
value of
To determine the degree of
correlation between
Y-Search and Y-Base
estimate that 1.6% and 1.0% of their records, respectively, were in
Haplogroup G as of November 2005. If the OA database
contains the same proportion of Haplogroup G results, it would be expected to
hold between 41 and 68 Haplogroup G records. The OA database was interrogated with
SNP-tested Haplogroup G 9-marker haplotypes (
Until recently, the
markers
Table 1 Allele Frequencies for
| Repeats | E3a  | E3b   | F*   | G     | G1a  | G2    | H    | I1a   | I-P37 (pka I1b) |
| 10      |      |       |      |       |      |       |      |       | 0.500           |
| 11      |      |       |      |       |      |       |      |       |                 |
| 12      | 1.00 | 0.028 | 1.00 | 0.115 | 1.00 | 0.072 |      | 0.974 |                 |
| 13      |      |       |      |       |      | 0.139 | 1.00 |       |                 |
| 14      |      |       |      | 0.885 |      | 0.841 |      | 0.007 |                 |
| Missing |      | 0.972 |      |       |      | 0.058 |      | 0.020 | 0.500           |
| N       | 7    | 36    | 2    | 26    | 1    | 69    | 1    | 151   | 2               |

| Repeats | I-M223 (pka I1c) | J     | J2   | K    |      | N    | Q     | R1a  | R1b   |
| 10      |                  | 0.100 |      |      |      |      |       |      |       |
| 11      |                  |       |      |      |      |      |       |      | 0.002 |
| 12      | 0.875            | 0.900 | 1.00 | 1.00 | 1.00 | 1.00 | 0.812 | 1.00 | 0.981 |
| 13      |                  |       |      |      |      |      |       |      | 0.013 |
| 14      |                  |       |      |      |      |      |       |      |       |
| Missing | 0.125            |       |      |      |      |      | 0.187 |      | 0.004 |
| N       | 8                | 10    | 1    | 1    | 6    | 3    | 16    | 15   | 474   |
Candidate Haplogroup G
haplotypes were extracted from the SMGF database using somewhat different
search criteria[4]
from those used for the OA database. Candidate
haplotypes were tested using the Haplogroup Predictor Program (Athey 2005), and
only those with a score exceeding 50 for Haplogroup G were used. Where several
haplotypes listed the same surname, only one haplotype per surname was retained
(except where the haplotypes were clearly unrelated).
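The selection step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Candidate` record, the `select_haplotypes` function, and the sample values are hypothetical, the Haplogroup G scores are taken as already computed by the predictor program, and the manual exception for clearly unrelated haplotypes sharing a surname is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    surname: str
    haplotype: tuple  # marker repeat values in a fixed marker order (hypothetical)
    g_score: float    # Haplogroup G score, assumed to be computed elsewhere


def select_haplotypes(candidates, min_score=50):
    """Keep candidates scoring above min_score, one per surname (first seen)."""
    kept = {}
    for c in candidates:
        if c.g_score > min_score and c.surname not in kept:
            kept[c.surname] = c
    return list(kept.values())


# Hypothetical records for illustration only; not data from the study.
sample = [
    Candidate("Smith", (14, 22, 15), 72.0),
    Candidate("Smith", (14, 22, 15), 68.0),  # same surname: dropped
    Candidate("Jones", (14, 23, 15), 41.0),  # score not above 50: dropped
    Candidate("Brown", (15, 22, 16), 55.0),
]
selected = select_haplotypes(sample)
print([c.surname for c in selected])
```

Keeping the first record seen per surname is an arbitrary choice made for the sketch; the text does not say which haplotype was retained.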
The marker DYF399S1 is
available only from DNAFP, and none of the public databases (except DNAFP’s own Y-Match) currently accepts data on this
marker. Therefore, all of the DYF399S1 data for
members of Haplogroup G were sent to the authors in private communications
(n=5), commissioned for the present study (n=1), or found in Y-Match
(n=1, although some of the results received in private communications are now
also in Y-Match).
Results
In Table 1, the
haplotypes in the columns labeled G1a and G2 were confirmed by SNP testing.
The haplotypes in the column labeled simply
G lacked SNP information but were predicted to belong to Haplogroup G by the
Haplogroup Predictor program.
The testing of one G2-P15 subject for
DYF371 was carried out to estimate whether or not the value of 14 on
Therefore, it appears that the two repeats
were added in a G2 individual at some early time after the founding of G2. Therefore, we would not normally expect to
find
. . . . . GGTGTTCTGATGAGGATAATT/TATAC/TATAC/TGTAC/TGTAC/TATAC/TATAC/TATAC/TATAC/TATAC/TATAC/TATAC/TATAC/TATAC/TATAC/TATAC/TATAC/TATAC/TATAC/CATAC/TATAC/CATAC/TATAC/TATAC/TATAC/CATAC/CATAC/TATAC/TATAC/TATAC/CATAC/TATAC/TATAC/TATAC/AACCAATTAATTAGCTGAGTATAATAA . . . . .
From the sequence, we see
that this example has the following repeat structure (Redd
2002):
(TATAC)2(TGTAC)2(TATAC)14(CATAC)1(TATAC)1(CATAC)1(TATAC)3(CATAC)2
(TATAC)3(CATAC)1(TATAC)3
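The compact notation above is a run-length encoding of the slash-delimited sequence. A minimal Python sketch of that conversion follows; the function names `repeat_structure` and `compact` are illustrative, and a short fragment stands in for the full sequence.

```python
from itertools import groupby


def repeat_structure(motifs):
    """Collapse consecutive identical motifs into (motif, count) pairs."""
    return [(motif, sum(1 for _ in run)) for motif, run in groupby(motifs)]


def compact(structure):
    """Render (motif, count) pairs in the (MOTIF)n notation used in the text."""
    return "".join(f"({motif}){count}" for motif, count in structure)


# A short illustrative fragment; the repeat region of the full sequence
# (without the flanking bases) can be passed in the same way.
fragment = "TATAC/TATAC/TGTAC/TGTAC/TATAC/CATAC/TATAC"
print(compact(repeat_structure(fragment.split("/"))))
# prints (TATAC)2(TGTAC)2(TATAC)1(CATAC)1(TATAC)1
```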
Some commercial labs (e.g.,
By fortunate coincidence, one of the
sequences for
(TATAC)2(TGTAC)2(TATAC)14(CATAC)1(TATAC)1
(CATAC)1(TATAC)3……….(TATAC)1(CATAC)1(TATAC)3
Here we see that 20 bases
of the form
(CATAC)2(TATAC)2
have been deleted. Since the deletion occurred in a normally
invariant part of the marker, it should be considered a Unique Event
Polymorphism (UEP). Interestingly,
companies that report only what they believe to be the main repeat section of
this marker would report a value of 10 for
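The 20-base figure can be checked directly from the two repeat structures given above. The following Python sketch parses the compact notation and compares total lengths; the helper names are illustrative, and the row of dots marking the deletion is simply omitted from the shorter structure.

```python
import re


def parse_structure(notation):
    """Parse '(MOTIF)n(MOTIF)n...' into a list of (motif, count) pairs."""
    return [(m, int(n)) for m, n in re.findall(r"\(([ACGT]+)\)(\d+)", notation)]


def total_bases(structure):
    """Total length in bases of a repeat structure."""
    return sum(len(motif) * count for motif, count in structure)


FULL = ("(TATAC)2(TGTAC)2(TATAC)14(CATAC)1(TATAC)1(CATAC)1(TATAC)3"
        "(CATAC)2(TATAC)3(CATAC)1(TATAC)3")
WITH_DELETION = ("(TATAC)2(TGTAC)2(TATAC)14(CATAC)1(TATAC)1(CATAC)1(TATAC)3"
                 "(TATAC)1(CATAC)1(TATAC)3")
DELETED = "(CATAC)2(TATAC)2"

full_len = total_bases(parse_structure(FULL))
short_len = total_bases(parse_structure(WITH_DELETION))
print(full_len, short_len, full_len - short_len)  # prints 165 145 20
print(total_bases(parse_structure(DELETED)))      # prints 20
```

The difference corresponds to four five-base repeats, which agrees with the deleted segment (CATAC)2(TATAC)2.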
Allele frequencies on
Table 2 Allele Frequency Distribution for
| Repeats | G2    | GxG2 | E3a   | E3b  | I1a   | I-P37 | I-M223 | J1    | J2    | N     | Q     | R1a   | R1b   |
| 25      | 0.115 |      |       |      |       |       |        |       |       |       |       |       |       |
| 26      | 0.541 |      |       |      |       |       |        |       |       |       |       |       |       |
| 27      | 0.331 |      |       |      |       |       |        |       |       |       |       |       |       |
| 28      | 0.010 |      |       |      |       | 0.001 |        |       | 0.052 |       |       | 0.042 |       |
| 29      |       |      | 0.088 |      |       | 0.034 |        | 0.780 | 0.139 |       | 0.029 | 0.011 | 0.068 |
| 30      |       |      | 0.647 | 0.15 | 0.003 | 0.138 | 0.081  | 0.195 | 0.671 | 0.069 | 0.286 | 0.800 | 0.808 |
| 31      |       | 1.0  | 0.176 | 0.81 | 0.952 | 0.724 | 0.459  | 0.024 | 0.134 | 0.897 | 0.314 | 0.147 | 0.110 |
| 32      |       |      | 0.088 | 0.04 | 0.043 | 0.103 | 0.378  |       | 0.004 | 0.034 | 0.286 |       |       |
| 33      |       |      |       |      | 0.002 | 0.005 | 0.081  |       |       |       | 0.086 |       |       |
| 34      |       |      |       |      |       |       |        |       |       |       |       |       |       |
| n       | 148   | 4    | 34    | 27   | 588   | 29    | 37     | 41    | 231   | 29    | 35    | 95    | 73    |
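One way to read Table 2 is as a lookup: for a given repeat value, which haplogroups show that value at all? The Python sketch below illustrates this with a subset of four columns copied from Table 2. The function name `haplogroups_observed` is illustrative, and because only four columns are included, the result for a value such as 28 would differ from the full table.

```python
# Subset of Table 2 (columns G2, GxG2, J1, R1b); blank cells are omitted.
TABLE2_SUBSET = {
    "G2":   {25: 0.115, 26: 0.541, 27: 0.331, 28: 0.010},
    "GxG2": {31: 1.0},
    "J1":   {29: 0.780, 30: 0.195, 31: 0.024},
    "R1b":  {29: 0.068, 30: 0.808, 31: 0.110},
}


def haplogroups_observed(repeats, table=TABLE2_SUBSET):
    """Return the haplogroups (sorted) with a nonzero frequency at this value."""
    return sorted(hg for hg, freqs in table.items() if freqs.get(repeats, 0) > 0)


for value in (26, 30, 31):
    print(value, haplogroups_observed(value))
# prints:
# 26 ['G2']
# 30 ['J1', 'R1b']
# 31 ['GxG2', 'J1', 'R1b']
```

Within this subset, values of 25 to 27 occur only in G2, which is the pattern that makes the marker useful as an indicator of G2 membership.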
The limited data for GxG2 suggest that
the deletion event in
The structure of
24bp (TCTCT)13 214bp
The structure for
24bp (TCTCT)16 214bp
This shows that the relatively high number
of repeats (16 in this case, though 16 is actually low for G2) is simply a
result of extra repeats of the usual type.
The allele frequency distribution for
Allele frequencies on
Table 3 Allele Frequency Distributions for
| Repeats | G2 388=12 | G2 388=13 | GxG2 | E3a   | E3b  | I1a   | I-P37 | I-M223 | J1    | J2    | N     | Q     | R1a   | R1b   |
| 8       |           |           |      |       |      |       |       |        |       |       |       |       |       |       |
| 9       |           |           |      |       |      | 0     |       |        |       | 0.007 |       |       |       |       |
| 10      |           |           |      |       |      | 0.001 |       | 0.135  |       | 0.049 |       |       |       |       |
| 11      |           |           |      | 0.025 | 0.02 | 0.023 | 0.158 | 0.712  |       | 0.285 |       |       | 0.035 | 0.009 |
| 12      |           |           |      | 0.146 | 0.58 | 0.089 | 0.079 | 0.115  | 0.024 | 0.251 |       | 0.051 | 0.617 | 0.070 |
| 13      |           |           |      | 0.462 | 0.22 | 0.642 | 0.658 | 0.019  | 0.238 | 0.135 |       | 0.538 | 0.348 | 0.683 |
| 14      | 0.032     |           |      | 0.196 | 0.12 | 0.205 | 0.105 | 0.019  | 0.476 | 0.225 | 0.188 | 0.359 |       | 0.189 |
| 15      | 0.113     |           |      | 0.089 | 0.03 | 0.036 |       |        | 0.214 | 0.049 | 0.406 | 0.026 |       | 0.048 |
| 16      | 0.226     | 0.046     |      | 0.044 |      | 0.004 |       |        | 0.048 |       | 0.344 | 0.026 |       |       |
| 17      | 0.371     | 0.103     | 0.2  | 0.025 |      |       |       |        |       |       | 0.063 |       |       |       |
| 18      | 0.145     | 0.322     | 0.8  | 0.013 |      |       |       |        |       |       |       |       |       |       |
| 19      | 0.081     | 0.310     |      |       |      |       |       |        |       |       |       |       |       |       |
| 20      | 0.016     | 0.126     |      |       |      |       |       |        |       |       |       |       |       |       |
| 21      | 0.016     | 0.080     |      |       |      |       |       |        |       |       |       |       |       |       |
| 22      |           | 0.011     |      |       |      |       |       |        |       |       |       |       |       |       |
| N       | 62        | 87        | 5    | 157   | 60   | 687   | 38    | 52     | 42    | 267   | 32    | 39    | 115   | 227   |
DYF399S1
One of the alleles of DYF399S1 (probably the shortest allele in most
people) has the structure:
. . . . . ggttttcaccagtttgcataggtagagggaggccaaaagcccaacagg
AAA/aaat/A/aaag/aaag/aaag/AA/aaag/A/aaag/aaag/aaag/aaag/aaag/aaag/
aaag/aaag/aaag/aaag/aaag/aaag/aaag/aaag/aaag/aaag/aaag/aaag/
ttttacccttttgacagcatatgagactt . . . .
The main part of this allele (the central section above) can be written
more compactly as:
AAA(aaat)A(aaag)3AA(aaag)A(aaag)18
where the lower case letters are part of a
countable repeat motif and the upper case letters are “extra” bases (10 of them
in the above example). This example
would be scored as 18.10 (or 18.0 in the short notation).
There are as yet only a few results
available for DYF399S1, but those available for Haplogroup G (all but one are
G2 and the remaining one is GxG2) are shown in Table 4. Besides the odd structure of the shortest
allele in Haplogroup G2, the whole number of repeats in the shortest allele is
lower than in Haplogroups R1b, I, and J, although one example in Haplogroup I
is the lowest so far found (16). Each Haplogroup G
individual represented in Table 4 has one allele with the “half” repeat (a .2 following
the main number), one allele with a .1 following the main number, and one
allele with a whole number. Because each allele carries a different suffix,
the three alleles can be distinguished in members of Haplogroup G even though
they are reported in numerical order.
One person in the table (a member of G2) had four allele values,
apparently representing a duplication of the allele with the whole number of
repeats.
Table 4 Allele Frequencies for DYF399S1 in Haplogroup G
| Repeats | Count | Freq. | Repeats   | Count | Freq. | Repeats | Count | Freq. | Repeats   | Count | Freq. |
| 17.2    | 4     | 0.571 | (missing) | 1     | 0.143 | 21      | 2     | 0.286 | (missing) | 5     | 0.714 |
| 18.2    | 2     | 0.286 | 20.1      | 5     | 0.714 | 22      | 3     | 0.429 | 24        | 1     | 0.143 |