Geographic
Patterns of R1b in the
Kevin D. Campbell
Abstract
Stephen Oppenheimer’s book, The Origins of the British: A Genetic Detective Story, references a clan nomenclature that is neither explicitly defined in the text nor linked to the underlying data. This paper attempts to understand Oppenheimer’s analysis while incorporating results from subsequent clan testing to hypothesize the haplotype definitions for Oppenheimer’s R1b sub-clans.
Address for correspondence: Campbell@alum.mit.edu.
Received: July 12, 2007;
accepted: August 30, 2007
Introduction
The essence of genetic genealogy is to understand where we have come from. However, many recent papers and books have missed opportunities to link genetics firmly to specific regions. Specifically, two recent
books by
In the case of Sykes’ work, Blood of the Isles was targeted to the general population and written in the manner of a popular work of non-fiction. In contrast, Stephen Oppenheimer’s book The Origins of the British synthesizes historical, anthropological, archaeological, linguistic, and genetic evidence into a cohesive set of conclusions.
While Oppenheimer chose to feature genetics in the title of his book, it makes up only part of his overall analysis. However, by leading with genetics, Oppenheimer owes the reader enough traceability to allow a thorough and detailed review of his analysis.
To his credit, Sykes has made the samples underlying Blood of the Isles available for examination, but he did not provide a detailed analysis of those samples for the reader. For example, Sykes does not describe his “clan” system in adequate detail. A “clan” is a group of individuals with
closely matching Y-
My
analysis of Sykes’ clans, which included the determination of the probable clan
definitions, was published in the last issue of this journal (
In this article I attempt to achieve the same general result for Oppenheimer’s book; in particular, I attempt to provide some insight into the probable definitions of Oppenheimer’s clans.
Methods
Sykes’ and Oppenheimer’s analyses have both similarities and differences that affect how one might approach reverse-engineering each. Both authors chose to coin “clan names” as shorthand monikers for the genetic groups derived from their analyses. While these clan names provide convenient shorthand for mass-market books, serious researchers want to see the genetic definitions of these groups.
Sykes’
work is based upon original data collection (primarily via blood samples), and
he has published his full dataset on his web site for use by other researchers.[1]
Oppenheimer’s study is not based upon new genetic data, but rather upon a re-analysis of previously published information. Oppenheimer uses five key sources for his British data: the studies of Capelli (2003), Wilson et al. (2001), Weale et al. (2002), and Hill et al. (2000), and data provided by D. Faux and J. Wilson related to the Orkney and Shetland Islands. One of these sources (Capelli’s) is available on the Internet.[2] From these sources, Oppenheimer compiled a composite dataset containing 3,084 samples, “though by far the
largest body of data in the composite
Since the
vast majority of Oppenheimer’s data came from Capelli, the first step in the
method was to understand the nature and limitations of Capelli’s dataset.
A second part of my method resulted in the identification
of the 16 R1b clans that Oppenheimer uses in his analysis. Like Sykes, Oppenheimer does not completely
disclose the Y-
Results
Reviewing Capelli (2003)
Though
Capelli’s study, “A Y chromosome census of the
Capelli collected 1,772 Y-chromosome samples from 25 predominantly small urban
locations in the

Figure 1.
Locations Used for Capelli’s Original Data Collection
To
understand Capelli’s published dataset, the 1,772
The
haplogroup mapping results from the calculator and key counts of Capelli’s data
used in Oppenheimer’s study are shown in Table 1. The haplogroup results are shown as rows
while the column counts are derived from the individual datasets.
Table 1.
Summary of Capelli and Oppenheimer Datasets

It can be seen from this table that the vast majority of Oppenheimer’s data is from the Capelli dataset. 71% of his overall data
and 85% of that from the
can be traced back directly to Capelli. Oppenheimer has acknowledged this heavy reliance in his book.[6]
Given this reliance, it is clear that Oppenheimer’s genetic analysis is based upon the six microsatellites included in the Capelli data. Sixty-seven unique R1b haplotypes were extracted from the dataset and are included in Appendix A. These 67 haplotypes subsume the 1,301 R1b Capelli samples shown in Table 1, and these samples represent 76% of all the data used by Oppenheimer in his analysis of R1b migration patterns.
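The collapse of individual samples into unique haplotypes described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not the procedure actually used for Appendix A: the function name `unique_haplotypes` is hypothetical, and the repeat counts are invented rather than taken from Capelli’s dataset.

```python
from collections import Counter

# Hypothetical samples: each tuple holds six microsatellite repeat counts
# in a fixed marker order. The values are invented for illustration and
# are NOT taken from Capelli's dataset.
samples = [
    (12, 13, 13, 14, 24, 11),
    (12, 13, 13, 14, 24, 11),
    (12, 13, 13, 14, 23, 11),
    (12, 13, 13, 14, 24, 10),
    (12, 13, 13, 14, 24, 11),
]


def unique_haplotypes(rows):
    """Collapse samples into unique haplotypes with their sample counts."""
    return Counter(rows)


haplotype_counts = unique_haplotypes(samples)

# The unique haplotypes subsume every sample: the counts sum to the total.
assert sum(haplotype_counts.values()) == len(samples)

# List haplotypes from most to least frequent.
for haplotype, count in haplotype_counts.most_common():
    print(haplotype, count)
```

Applied to a full dataset, the number of keys in `haplotype_counts` would correspond to the number of unique haplotypes, and the summed counts to the number of samples they subsume.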
Oppenheimer Clans
Following
the same approach as for Sykes’ data in
The
missing piece in Oppenheimer’s study is the definition of these clusters in
terms of the underlying microsatellites.
Nowhere in his book are these clusters fully specified.
Though
one approach to determining the cluster/clan definitions could be a bottom-up
analysis of the data listed in Appendix A, essentially
reverse-engineering Oppenheimer’s work, another strategy was selected. Simply put, since Oppenheimer’s genetic clans
are apparently based primarily on six microsatellites, it was decided to look
at how he typed specific participants to attempt to deduce the R1b cluster
definitions from their results.
While
attempting to collect information on the Oppenheimer clan definitions, another
researcher,
Analysis
of the results of Oppenheimer’s genotyping has been illuminating. When Oppenheimer Clans are viewed on
Table
2 shows the
Oppenheimer Clan results from
Table 2.
Oppenheimer Genotypes with Associated Capelli Microsatellites
and Intra-Clan Differences Highlighted

It should
be noted that because there are fewer Oppenheimer clusters than haplotypes in
the Capelli dataset (and fewer than the possible number of combinations of six
markers, by necessity), an Oppenheimer cluster must span more than one unique
combination of six markers. Or stated
another way, since Oppenheimer partitions R1b into only 16 groups, some groups
must contain more than one of the 67 haplotypes listed in Appendix A.
When
looking at Oppenheimer’s empiric results in light of Capelli’s underlying
markers, several conclusions are evident from Table 2.
First, no six-marker haplotype is split among two or more Clans; i.e., each haplotype maps into one and only one Clan designation. This supports the hypothesis that Clan designations are based primarily on these six markers.[8]
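The consistency check behind this first observation can be expressed as a short Python sketch. It assumes the ex post facto results are available as (haplotype, clan) pairs; the function name `split_haplotypes`, the repeat counts, and the pairing of clan labels with haplotypes are all hypothetical. The last two rows deliberately mimic the kind of anomaly described in footnote 8.

```python
from collections import defaultdict

# Hypothetical (haplotype, clan) observations. Repeat counts and the
# pairing of clan labels with haplotypes are invented for illustration.
observations = [
    ((12, 13, 13, 14, 24, 11), "R1b-10"),
    ((12, 13, 13, 14, 24, 11), "R1b-10"),
    ((12, 13, 13, 14, 23, 11), "R1b-9"),
    ((12, 13, 13, 14, 24, 10), "R1b-14a"),
    ((12, 13, 13, 14, 24, 10), "R1b-14c"),  # deliberately conflicting pair
]


def split_haplotypes(obs):
    """Return the haplotypes that were typed into more than one clan."""
    clans_by_haplotype = defaultdict(set)
    for haplotype, clan in obs:
        clans_by_haplotype[haplotype].add(clan)
    return {
        haplotype: sorted(clans)
        for haplotype, clans in clans_by_haplotype.items()
        if len(clans) > 1
    }


conflicts = split_haplotypes(observations)
print(conflicts)
```

An empty result would mean that every haplotype maps into exactly one clan; any entry returned is a haplotype whose typing results disagree and deserves a closer look.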
Second,
Oppenheimer clan families (e.g., R1b-8 and 8a;
R1b-14 a/b/c, R1b-15 a/b/c, etc.) seem to be generally separated by a
single step mutation of a single marker.
For example, R1b-8 and R1b-8a seem to be differentiated by
Third,
even with the small sample size collected by
A summary of Oppenheimer’s R1b Clan Tree is reprinted as Figure 2. In this figure, the estimated time of branching, the standard deviation, and the number of samples of each clan present in his dataset are extracted from various footnotes throughout Oppenheimer’s book. The corresponding six-marker haplotypes are also included where possible.
Figure 2. Oppenheimer’s Clan Structure with Empiric Results
In
Oppenheimer’s Analysis, the Atlantic Modal Haplotype (i.e., Ruy or R1b-10)
splits from R1b-9 (i.e.,
When
viewed in the context of the aforementioned microsatellites, one can also see
how Oppenheimer might draw this conclusion.
Table 3 shows this specific progression of R1b Clans proposed by
Oppenheimer in his book.
Table 3.
Haplotype Progression Suggested by Oppenheimer’s Analysis

The
haplotype progression shown in Table 3 further reinforces the conclusion
that Oppenheimer’s analysis used these microsatellites. The haplotype sequence shown in Table 3
is logical and follows Oppenheimer’s sequence.
The haplotype sequence does not support other progressions such as R1b-9 → R1b-8 → R1b-10 or R1b-10 → R1b-9 → R1b-8 that would contradict Oppenheimer’s conclusions.
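One way to compare candidate progressions is to count mutation steps along each path. The Python sketch below assumes a simple single-step mutation model, in which the distance between two haplotypes is the sum of the absolute repeat-count differences across the six markers; this model, the labels A, B and C, and the repeat counts are assumptions made for illustration and are not the values in Table 3.

```python
def step_distance(a, b):
    """Sum of absolute repeat-count differences across all markers."""
    return sum(abs(x - y) for x, y in zip(a, b))


def path_length(path, haplotypes):
    """Total number of mutation steps along a sequence of haplotype labels."""
    return sum(
        step_distance(haplotypes[p], haplotypes[q])
        for p, q in zip(path, path[1:])
    )


# Hypothetical haplotypes labeled A, B and C; the repeat counts are
# invented and do not reproduce Table 3.
haplotypes = {
    "A": (12, 13, 13, 14, 22, 11),
    "B": (12, 13, 13, 14, 23, 11),
    "C": (12, 13, 13, 14, 24, 11),
}

# Compare three orderings of the same haplotypes.
for path in (["A", "B", "C"], ["B", "A", "C"], ["C", "B", "A"]):
    print(" -> ".join(path), path_length(path, haplotypes))
```

Under these assumptions the ordering A, B, C needs two steps while B, A, C needs three, so the former is the more parsimonious sequence. Step counts alone cannot distinguish a path from its reverse (C, B, A also needs two steps), so the direction of a progression has to rest on other evidence, such as estimated branching times.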
As a final check, the author attempted to recreate several of Oppenheimer’s Clan maps included in his book. In Figures 3a and 3b, the data for the
hypothesized clans R1b-15c and R1b-9 were plotted on Capelli’s map of the

Figure 3a.
Comparison of Capelli Samples with Oppenheimer Clan R1b-15c
(Size indicates the relative number of observed
samples in the Capelli dataset)

Figure 3b.
Comparison of Capelli Samples with Oppenheimer Clan R1b-9
(Numbers indicate number of observed samples in the
Capelli dataset)
There are some limitations to this analysis. For example, (1) the number of known ex post facto clan samples shown in Table 2 is very small, (2) not all of Capelli’s known haplotypes have been genotyped into Oppenheimer Clans, and (3) while significant, only 85% of Oppenheimer’s R1b British Isles data is attributable to Capelli’s underlying dataset in the first place.
These caveats notwithstanding, the author believes that Figure 3 further reinforces the assertion that insights into Oppenheimer’s clan nomenclature can be deduced when ex post facto results are compared to Capelli’s dataset.
Conclusions
The analysis presented in this paper tends to
confirm the hypothesis that Oppenheimer's Clan system is based upon the six
microsatellites presented in the data of
Electronic Data Base Information
Capelli C, et al. (2003) dataset:
http://freepages.genealogy.rootsweb.com/~gallgaedhil/Capelli.htm
Oxford Genetic Atlas Project (OGAP), data from Sykes (2006):
http://www.bloodoftheisles.net/results.html
http://www.geocities.com/mcewanjc/p3modal.htm
http://home.comcast.net/~hapest5.html
References
Oppenheimer,
Stephen. The Origins of the British—A
Genetic Detective Story. Constable
and
Sykes B
(2006) Blood of the Isles: Exploring
The Genetic Roots of Our Tribal History.
Bantam,
Appendix A
R1b Haplotypes
Present in Capelli’s Study

[2] Oppenheimer (2006), Chapter 3, footnote 41.
[4] Capelli (2003) dataset located at: http://freepages.genealogy.rootsweb.com/~gallgaedhil/Capelli.htm
[5]
[6] Oppenheimer (2006), p. 123.
[8] While this statement was true for a long time, a recent empiric posting provides one contradiction among the 48 observations included in Table 2: Samples #36 and #40 are typed as R1b-14a and R1b-14c, respectively, but contain the same six-marker haplotype. The author suspects that this discontinuity is due to lab error, but the discrepancy is noted so that readers can weigh the anomaly accordingly.
[9] 1,642 samples are accounted for out of the full R1b dataset of 1,947 samples (i.e., 1,511 plus 436; see Table 1).