28-Jan-08
Peter Gwozdz
pete2g2@comcast.net
These notes are my reminder on using Yhrd to study Polish R1a1 haplotypes.
http://www.yhrd.org/index.html
15 Jan 2008: Release 23 is out with 54,863 haplotypes in 477 populations. Of these, 53,075 haplotypes are completely typed for 9 loci (MinHt) and 25,606 for 11 loci (ExtHt).
Separable by city.
25 Nov 2008 Update: http://www.yhrd.org/ new URL; new web page format; 65,165 haplotypes
MinHt has the same 9 markers; the extended haplotype now adds 8 markers to those 9, giving the same 17 used by Pawlowski in Gdansk (Pawl 19 also has YCA I & II)
Release 25: link does not work; front page says “35 new population samples with 6,203 haplotypes”
“Search Haplotypes” has a note:
Loci   Haplotypes
7      65,165
9      63,369
11     37,163
12     13,751
17     10,243
Click “Search database” then “GeoSearch” then “Haplotype + Continent”. Then pick the STR values. Restrict to Europe.
The (Polish R1a1) haplotypes of interest to me are:
P   17  13  17  25  10  11  13  10  14  11  10
N   16  13  16  25  10  11  13  11  14  11  11
K   16  13  17  25  10  11  13  11  14  11  10
P, N, and K are the codes I have been using.
A search result gives the total count of that haplotype. For a search with fewer than 11 loci, it also lists the completed haplotypes and how many of each. The result also gives: how many of that haplotype per city; a map with red dots for cities that have any and blue dots for cities without, with a blow-up map on click; a list of count / sample size per city; and a list of all 1-step related haplotypes with the total of each.
The list by city is needed to ignore cities with small sample sizes, or to group cities by country manually.
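The manual grouping by country can be sketched in a few lines. This is only an illustration, not something Yhrd provides: the function name `pool_by_country`, the `min_city_sample` cutoff, and the row layout are my own assumptions, and the "City A" row is a made-up placeholder. Only the Lviv row (4/105) comes from these notes.

```python
from collections import defaultdict

def pool_by_country(rows, min_city_sample=0):
    """Pool per-city hits into per-country totals.

    rows: iterable of (city, country, hits, sample_size), typed in by hand
    from the Yhrd "count / sample size" list (hypothetical layout).
    Cities with sample_size below min_city_sample are skipped.
    Returns {country: (hits, sample_size, percent)}.
    """
    totals = defaultdict(lambda: [0, 0])
    for city, country, hits, n in rows:
        if n < min_city_sample:
            continue  # ignore cities with too few samples
        totals[country][0] += hits
        totals[country][1] += n
    return {c: (h, n, 100.0 * h / n) for c, (h, n) in totals.items() if n > 0}

rows = [
    ("Lviv", "Ukraine", 4, 105),      # from these notes
    ("City A", "Country X", 1, 20),   # placeholder row, not real data
]
pooled = pool_by_country(rows, min_city_sample=50)
for country, (hits, n, pct) in pooled.items():
    print(f"{country}: {hits}/{n} = {pct:.1f}%")
```

Setting `min_city_sample=0` keeps every city, which is the choice to make when pooling many small cities into one country total.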
Searches can alternately be done worldwide, or by metapopulation such as Eurasian MP. I tried them all and it makes little difference for the PNK haplotypes of interest to me, because they primarily show up in East Europe and Central Europe. The list of cities with that haplotype is about the same with worldwide vs European search vs Eurasian MP search. Unfortunately the list of cities with zero of the searched haplotype is always worldwide, so the zero list is too long. The table always says “worldwide” for totals, but I verified that is only a label; on restricted searches it means totals for the continent or metapopulation.
European data is mentioned: 26,395 samples and all have at least 7 loci; 26,305 have 9 loci, 9,713 have 11 loci.
The maps are better using the first 9 loci instead of all 11, because more cities with more samples per city are included vs 11 loci. Although the first 7 loci database is not significantly larger for Europe, it’s good to also do a 7 loci map because the statistics are better with more haplotype hits per city. P & N each differ from K by only one locus at 7 loci, but it actually works very nicely with only 7. I tried 7, 9, and 11 for all.
The Yhrd 11 loci are essentially the same as the FamilyTree 12 loci for the purpose of studying the PNK haplotypes, because all 3 PNK types are identical at the 3 loci that differ between Yhrd & FamilyTree, and those 3 loci do not vary much in the databases.
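The step differences among P, N, and K at 7, 9, and 11 loci can be checked with a short script. This is a sketch under one assumption: that the 7 and 9 loci sets are the first 7 and first 9 columns of the P, N, K table in these notes. The names `HAPLOTYPES` and `step_distance` are mine.

```python
from itertools import combinations

# Values copied from the P, N, K table in these notes (11 Yhrd loci, in table order).
HAPLOTYPES = {
    "P": [17, 13, 17, 25, 10, 11, 13, 10, 14, 11, 10],
    "N": [16, 13, 16, 25, 10, 11, 13, 11, 14, 11, 11],
    "K": [16, 13, 17, 25, 10, 11, 13, 11, 14, 11, 10],
}

def step_distance(a, b, n_loci):
    """Sum of absolute repeat differences over the first n_loci markers.

    Assumes the smaller loci sets are prefixes of the 11-loci table order.
    """
    return sum(abs(x - y) for x, y in zip(a[:n_loci], b[:n_loci]))

for n_loci in (7, 9, 11):
    for c1, c2 in combinations("PNK", 2):
        d = step_distance(HAPLOTYPES[c1], HAPLOTYPES[c2], n_loci)
        print(f"{n_loci} loci: {c1} vs {c2} = {d} step(s)")
```

Every difference among these three haplotypes is a single repeat, so here the step distance equals the number of loci that differ.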
Also available are graphs by city, showing the top 20 haplotypes in that city: click “Search Database” then “population”, and select a city from the pull-down menu. This graph service only uses 9 loci per haplotype. I don’t see the sample sizes, but the graphs are obviously step graphs for 1, 2, 3,… samples, so I can figure the sample size as the reciprocal of the % for the first step. (Remember, the Search gave a list of count / sample size by city; the sample size is the same for a given city.) It’s good to ignore small numbers of samples, because even 1 or 2 samples get graphed. Some cities do not have enough data for any significance, but some cities have lots of data.
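The reciprocal trick for reading sample size off a city graph is simple arithmetic. A minimal sketch, assuming the lowest step on the graph corresponds to exactly 1 sample and that the percent read off the graph is rounded; the function names are mine.

```python
def estimate_sample_size(first_step_percent):
    """Estimate city sample size from the lowest step on the graph.

    Assumes the lowest step is exactly 1 sample, so n is about 100 / percent.
    The percent read from the graph is rounded, so the result is approximate.
    """
    if first_step_percent <= 0:
        raise ValueError("percent must be positive")
    return round(100.0 / first_step_percent)

def estimate_count(percent, first_step_percent):
    """Estimate how many samples a graphed haplotype represents."""
    return round(percent / first_step_percent)

# Example from these notes: a graph where every haplotype sits at 14%.
print(estimate_sample_size(14))   # 7
print(estimate_count(14, 14))     # 1
```

For large cities the first step is a small percent, so rounding on the graph matters more; the count / sample size list from the Search is the better source when it is available.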
Another service is calculating frequency of a typed-in haplotype for 3 metapopulations: Eastern European vs Western European vs South-eastern European. I typed in all 3 PNK and verified that all 3 are more than 10 times as common in Eastern Europe vs West & Southeast. Overall frequency at 11 loci in Eastern Europe: P = 2.3%, N = 2.6%, K = 2.5%.
By searching a blank haplotype in Europe, I got a list of the top 50 haplotypes in Europe. (Can’t do it for just East Europe.) PNK are there, at lower percent. Only one other R1a1 haplotype is there, same as K except 391=11; this one is more common in Russia and Ukraine, less common in Poland, explaining why it did not come up in my Polish Project study.
My hypothesis: P type is a relatively young clade that went through a large, rapid population expansion; relatively younger clades have a higher percent of the modal haplotype. As a large clade ages (and mutates), the frequency of the modal haplotype decreases and the relative frequency of the 1-step and 2-step mutated haplotypes increases. (With very old age the 1-step decreases & 3+ steps increase.) The Yhrd data nicely confirm my hypothesis: P is the youngest and K is the oldest of the 3 PNK, judging by the ratio of modal to 1-step. I’ll write that up as a separate report.
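The qualitative behavior described above can be illustrated with a simple Poisson model. This is a textbook simplification, not the method of the separate report. It assumes a star-like expansion from one founder haplotype, mutations accumulating independently with an expected number m per lineage (mutation rate times age), each mutation being one step, and no back or parallel mutations. The function names are mine.

```python
import math

def step_fractions(m, max_steps=3):
    """Poisson fractions of lineages carrying 0..max_steps mutations.

    m is the expected number of mutations per lineage (rate x age).
    Index 0 is the modal haplotype fraction, index 1 the 1-step fraction, etc.
    """
    return [math.exp(-m) * m**k / math.factorial(k) for k in range(max_steps + 1)]

def modal_to_one_step_ratio(m):
    """Ratio of modal to 1-step fraction; under this model it equals 1 / m."""
    modal, one_step = step_fractions(m, max_steps=1)
    return modal / one_step

for m in (0.25, 0.5, 1.0, 2.0):
    fr = step_fractions(m)
    print(f"m={m}: modal={fr[0]:.3f} 1-step={fr[1]:.3f} "
          f"2-step={fr[2]:.3f} ratio={modal_to_one_step_ratio(m):.2f}")
```

Under these assumptions a higher modal to 1-step ratio means a smaller m, so a younger clade; the 1-step fraction peaks at m = 1 and falls after that, which matches the remark about very old age.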
P type Discussion:
Yhrd verifies that P is Polish.
Graphs at 9 loci, where P differs from K by 2 steps and from N by 3 steps:
10 Polish cities have decent statistics and each of them has P as the first or second most common haplotype, with frequency 3% to 8%. Bialystok in the east has its population separated into 5 types, and P is 3.8% of the “Belarusian” type. (The 3 cities with less than 3% P have very low statistics - only a few of each haplotype. Zakopane shows 14% P type but that’s a silly graph, with only 7 haplotypes total, one each at 14%.)
Ukraine: 3 cities with low statistics: 6 total P samples. P is first in Lviv, but that is only 4/105 samples.
Germany to the west has low P with excellent data for many cities: Berlin P is 3rd at 2%. Leipzig P is 5th at 0.9%. Munich P is 8th at 1%.
Belarus, just to the east of Poland, has very little data for 6 cities, but taken as a whole there is only one P type sample. Russian cities have low P.
South Slavs: Serbia, Bosnia, Slovakia (no Czech cities): 1 city with good statistics and 5 with low sample sizes: only 2 P samples total.
North: Lithuania, Estonia, Latvia: 3 cities with low data. Combined: 2 P out of 492, less than 1%.