The Advantages of a Dual DNA/Documentary Approach to Reconstruct the Family Trees of
a Surname
Chris Pomery
Abstract
Family history research has
traditionally been done primarily through a combination of oral and documentary
research. Recently, the development of
Y-chromosome DNA tests has made it possible
for men who share a surname to verify whether their previous
documentary research has assigned them to the correct tree, or to direct their
future research towards documenting it.
This article shows how integrating a collection of surname-defined
Y-chromosome DNA results produces advantages for
a traditional global surname documentary research project and outlines a
methodology for managing a dual approach project that combines DNA tests and documentary research
where the primary aim is to reveal the origin(s) of the name. While the discussion that follows focuses on
the use of national records relating to England and Wales within the context of
a surname originating in one of those countries, the methodology outlined and
conclusions reached are broadly applicable to any country where comparable
records are available.
Address
for correspondence: Chris Pomery,
DNAresearch@pobox.com
Received: September 1, 2009; accepted October
8, 2009
Introduction
Family
historians who expand their researches beyond the discovery of their immediate
family are commonly stimulated by an interest in a particular surname which
they seek to place within a wider geographical and temporal context than its
connection within their personal family tree.
Whole surname documentary studies have, however, typically been confined
to surnames that are either of low frequency in the general population or
identified with a specific geographical area.
The
thinking behind this paper is that a comprehensive documentary reconstruction
of most surnames back to the mid-nineteenth century is now an achievable goal
in those countries where the key national datasets, such as national censuses
and civil registrations, are indexed and online. At the same time, further areas of historical
study are opening up as (a) the comparative body of Y-chromosome results grows
and (b) the number of fully reconstructed surnames increases.
It
will clearly be easier to undertake a documentary reconstruction process for
those countries that have published online indexes of their key national
records (such as England, Wales and Scotland) and more difficult in countries
which do not provide complete online access (such as both parts of Ireland),
where there is no centralised record system (such as France), or where the
system of surname transmission is more complex (such as Portugal).
Reviewing
the situation with regard to UK-origin surnames, I estimate that there are
around 6,000 document-led surname reconstruction projects underway worldwide of
which around 2,200 are registered with the Guild of One-Name Studies (“the
Guild”).1
Despite its title, the Guild does not set methodological
standards for its members’ research, requiring them simply to make a loose
commitment to collect data at their own pace and to respond to enquiries from
other researchers. I estimate that
perhaps only 40% of its members are actively reconstructing all their trees
using data from the key national datasets now online.
Taking
the Family Tree DNA
(FTDNA) figure for registered surnames as a baseline, I estimate that there are
around 5,000 unique Y-chromosome surname projects underway worldwide, of which
roughly 90% are administered wholly or partly through FTDNA. A quick review of the websites associated
with FTDNA-registered projects suggests that few have any active documentary
research component or corpus of active researchers. The vast majority of projects are structured
simply to aggregate same-surname results to the end of identifying ‘genetic
families,’ i.e. those name bearers who share an identical, or near identical,
genetic signature along the male line of descent. They generally take a passive approach
towards the documentary reconstruction process, allowing DNA
testees to upload their documentary research but
without undertaking any centralised or consistent updating, coordinating or
checking of it. A large proportion of
projects focus on reconstructing the trees primarily within an
emigrant-receiving country such as the US, or to a
lesser extent within an emigrant-donating country such as the UK, but rarely
integrating the two.
This
observation is not intended as a criticism of those project managers. A neutral way of putting it is that they have
out-sourced the function of documentary research to the DNA
testees individually and are relying on them to
update the DNA
project as and when they expand their research.
Framing the discussion within this terminology, one function of this
paper is to suggest the benefits to DNA
project managers firstly of taking an active role in coordinating the updating
of this documentary research, and secondly of leading that research process
actively themselves at the hub of a dual-approach project.
It
is a common feature in both documentary and DNA
projects to include additional surnames within their remit. In the case of documentary projects these are
generally thought of as lexical ‘variants’ of a core surname, the hypothesis
being that collectively they constitute a single ‘name.’ The same thinking can be seen among
Y-chromosome DNA
projects, though in a more idiosyncratic form and with a wider purpose. Given the higher relative prevalence of
US-based project administrators, some DNA
projects include foreign equivalents, i.e. surnames from emigrant-donating
countries that are suspected to have been naturalised to their Anglo-Saxon
equivalent or nearest homophone.
One
cannot compare the headline numbers of documentary and DNA
surname projects as one is not comparing like with like. It is in the interest of a DNA
project to capture results from an extended pool of surnames and there is no
penalty for doing so, while the contrary is true for documentary projects where
the workload of data collection and reconstruction is greatly increased when
additional variant names are added. One
trend is clear: the number of DNA-based
surname projects is rising fast while the number of documentary-based projects
is barely rising at all.2
In
summary, it is plausible that the 6,000 documentary projects might include
around 15,000 unique and viable surnames (i.e. they are found in the present
day population) while the 5,000 Y-chromosome projects include a somewhat larger
number.
At
the intersection of the two types of project lie an estimated 250 dual-approach
projects where the active documentary reconstruction work and the collation of DNA
results are being conducted in parallel.
Figures from the Guild of One-Name Studies suggest that around 200
surname DNA
projects are led by Guild members. The
majority of these projects are hosted at FTDNA and are tagged with the Guild’s
logo within the site’s surname listings as shown in Table 1. At present perhaps 20-25% of these 250
dual-approach projects have collected significant numbers of Y-chromosome
results, are actively collecting data from available national datasets, and are
actively reconstructing all the trees within their surname(s) on a centralised
basis.3
Of
the estimated 50-60 active global dual-approach projects only
a handful have reported their results in any format. I am not aware that any dual DNA/documentary
project has yet been written up and published under conditions of peer
review. The vast majority of write-ups
of Y-chromosome DNA
projects exist on websites, generally created by the project
administrator. Guild members have access
through the Guild website to summaries of project reports designed for
documentary project managers for two advanced projects (Pomery, 2008; Creer, 2008).
In
addition, interim reports are available for the Cave, Carden,
and Pike projects (Cave, 2008; Carden, 2008; Pike,
2008). Many articles have been written
on the topic of dual projects, including a three-part series by Susan Meates in
the Journal of One-Name Studies (Meates, 2006a, 2006b, 2006c), and another
on linking a DNA
project to an historical study by John Plant published in Nomina
(Plant, 2005).
The
guiding purpose of this paper is to show the benefits of the dual approach
being pioneered by these early projects, to demonstrate that it is a feasible
route for project managers of both DNA
and documentary projects to take, and to describe the future benefits for the
family history community as a whole as the results from this kind of dual
project start being published.
Given
that this paper is predominantly addressed to an audience of DNA
project managers, the following discussion is written from the point of view of
a DNA project
manager working towards setting up a complementary documentary project. Interestingly, there is no comparable
publication within the documentary arena where the reverse structure, tailoring
the discussion from the point of view of a traditional documentary project
manager, can be similarly written up at length.
The
following methodology is not suggested as a formal orthodoxy that should be
adopted by all dual-approach projects or by all DNA
project managers. There is plenty of
scope for varying goals among dual-approach projects, for example, in terms of
geographical focus and documentary rigour.
The ‘fishing trip’ approach to DNA
result collection, adopted by many DNA
project managers, is certainly a viable route to take at the outset of a
project.
My
aim here is simply to describe a methodology that I created ad hoc as new
resources became available to me during the past decade and to offer it as a
model for adaptation and refinement by others.
I am also interested to find out from other researchers whether their
projects have followed the same broad trajectory as my own and whether they
have reached the same conclusions that I have about how to organise them.
Documentary
Project Methods
The
goal of a documentary-led surname project is to reconstruct all its constituent
family trees back from the present day to their point of origin. At completion it will reveal the geographical
origin(s) of the surname and whether it is headed by a single individual. A fully documented project will also account
for illegitimate births and for any other non-standard transmissions of the
surname from one generation to the next.
Indirectly, such a project can also generate statistical data, including
time series data, on a variety of historical and social topics, including the
span of ‘a generation’ and the prevalence of illegitimate births over
time. Where the social histories of the
individuals have also been reconstructed, this data can support an associated
economic and social analysis of change and other historical features.
Documentary
projects tend by their nature to focus on the origin of a name. The question they seek to answer is “where
did the name-bearers come from?” In the context of an English-origin surname,
they tend to focus on researching the origin of the name in England. While documenting the subsequent history of
emigrants is of interest, the priority is more often to document the origin of
the surname in the home country.
A
second feature is that its focus will tend towards prioritising the histories
of the male name bearers, partly because there are fewer of them and specifically
because their life histories have such an impact on the transmission and
development of the surname.
The
standard methodology behind a documentary reconstruction project is to work
back in time, generation by generation, from the present day. Such studies often start with oral and
personal sources, focusing on the researcher’s immediate family, before
expanding to include local and national documentary records.
An
initial problem for any documentary project is that there is no single source
that provides an accurate list of living name bearers. In the UK context,
the best list to work from is the last edition of the national electoral roll
prior to 2002, the date when it became possible to opt one’s details out of
the published version. No pre-2002 list
is viewable online, though commercial versions on CD can sometimes be purchased
second-hand. Subsequent editions of the
public and edited electoral roll, and other up-to-date sources such as the British
Telecom list of land-line phone numbers,4
perform markedly less well to create a present-day baseline.
Any
baseline list can be checked against the total number of records found in the
Office for National Statistics database list in order to estimate the number of
present-day name bearers in England and Wales.5
The
most important set of documentary records in a reconstruction process are the
national records of births, marriages and deaths from 1837 to the present day
(“civil registration records”). The
comprehensiveness of these records is not uniform as only in 1874 did it become
a legal requirement for individuals to report events. Common sense suggests that early
under-reporting of births was higher than for deaths or marriages.
The
public index to these BMD registration records has always supplied additional
data that greatly supports the process of linking individual event records
together to form the profile of specific persons. For birth records the maiden name of the
mother is given; for marriages, the spouse’s surname; and for deaths, the date
of birth or the age at death. However,
this additional data is not currently available across all years back to 1837,
its inclusion dating respectively from 1911, 1912 and 1866.
A
government-backed project to create an enhanced online index was intended to extend the coverage of this additional
data back to 1837, though the project stalled
in 2008 and no date to complete and publish it has subsequently been
announced. Independent mass indexing
projects, both those done by volunteers (such as FreeBMD)
and those done by commercial firms, have already created online versions of the
original unenhanced government index.
Once
the civil registration data back to 1837 has been collected for a surname, a
quick linkage process can be undertaken to link sets of records together.
· A death record can be
linked to a specific birth record based upon the date of birth or age at death.
· A marriage record can be
linked to a birth/death composite record, though with less confidence because
the inference is made primarily through the forename match. This process is easier to do using twentieth
century records as it became more common for people to have multiple middle
names.
· A birth record can be
linked to a marriage record in the previous generation by matching the maiden
name of the mother to the former name of the spouse given in the marriage
index. However, these are technically
two different data items. For example, a
mother may on a birth certificate indicate her previous married name rather
than her surname by birth just as a woman re-marrying may be recorded in the
marriage index under her previous married surname rather than her maiden name.
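The linkage rules above can be sketched in code. The following is only an illustration under stated assumptions, not the procedure used in any particular project: the record classes, field names, the one-year tolerance and the 25-year window are all hypothetical choices, and every match returned is a candidate link that still needs corroboration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Birth:
    forenames: str
    year: int
    mother_maiden_name: Optional[str] = None  # in the index from 1911


@dataclass
class Death:
    forenames: str
    year: int
    age_at_death: Optional[int] = None  # in the index from 1866


@dataclass
class Marriage:
    groom_forenames: str
    year: int
    spouse_surname: Optional[str] = None  # in the index from 1912


def link_death_to_birth(death, births, tolerance=1):
    """Return candidate births whose year fits the age given at death.

    The tolerance (an assumption) allows for the birthday falling before
    or after the date of death within the year.
    """
    if death.age_at_death is None:
        return []
    implied_year = death.year - death.age_at_death
    return [
        b for b in births
        if b.forenames == death.forenames
        and abs(b.year - implied_year) <= tolerance
    ]


def link_birth_to_marriage(birth, marriages, max_gap_years=25):
    """Return candidate parental marriages for a birth.

    The mother's maiden name is compared with the spouse's surname in the
    marriage index. These are technically different data items, so each
    match is a hypothesis to be checked, not a proven link.
    """
    if birth.mother_maiden_name is None:
        return []
    return [
        m for m in marriages
        if m.spouse_surname == birth.mother_maiden_name
        and 0 <= birth.year - m.year <= max_gap_years
    ]
```

Exact string matching on forenames is deliberately naive here; a real project would need to allow for abbreviations, transcription errors and variant spellings.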
While
recognising that there is no agreed standard for linking record data, this is
not the place to discuss problems inherent in assessing and sorting data or how
one can objectively choose between alternative event linkage options (which
mostly can only be resolved by acquiring further data). While the above linkage process will generate
a number of unlinked records, individuals and families, it is robust enough to
recreate the majority of the members of the majority of trees in the period
from the present day back to 1837. The
linkage process can be made much more secure by cross-referencing the
individual profiles created using the civil registration records with data from
the eight national censuses between 1841 and 1911, all
but one of which are now online in their entirety from multiple vendors.
To
summarise, by cross-referencing the two sets of primary data, the civil
registration and the national census records, it is generally possible to
recreate most of the detail in most trees within a surname back to 1841. Present-day data from partial electoral rolls
and telephone directories can broadly be mapped onto this historical data,
though with many gaps. There will
certainly be a number of unlinked records, e.g. individuals who die or marry
but who appear to have no birth record.
And increasingly there are a number of records that appear orphaned due
to the inability of the indexes to map the complexities of modern society, e.g.
births where the mother has not married into the surname (so no corresponding
marriage record can be found), or births where the mother was born a name bearer
and where her child has not taken the surname of its father.
While
very few documentary surname reconstruction projects using all these data have
been written up or published in any form, it is possible now to conclude that:
· it is technically feasible
to perform the linkage process described above;
· while the linkage process
works more efficiently with low and medium-frequency surnames, it is still
broadly effective (though correspondingly more time-consuming to perform) even for
some high-frequency surnames;
· the
cost of corroborating the basic linkages suggested by the civil registration
indexes starts as low as the subscription fee to a single online provider of UK census
data.
There
are two outputs from a surname linkage process useful within a parallel DNA
project.
1. The total number of male individuals
available as potential DNA
testees is already grouped into a finite number of
family trees with a point of origin prior to 1837 (albeit with a rump of
unlinked male individuals).
2. A list of potential emigrants has been
created, namely those individuals where a birth is recorded among the
nineteenth century records but no death record has been found.
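The second output can be illustrated with a short sketch. It assumes, hypothetically, that each birth record carries an `id` and that the linkage step has already produced the set of birth ids matched to a death record; the cut-off year is likewise an assumption. The result is only a list of candidates, since a missing death record may have other explanations than emigration.

```python
def potential_emigrants(births, birth_ids_with_death, latest_birth_year=1900):
    """Return nineteenth century births that have no linked death record.

    `births` is a list of dicts with hypothetical keys "id", "forenames"
    and "year"; `birth_ids_with_death` is the set of birth ids that the
    linkage process has matched to a death.
    """
    return [
        b for b in births
        if b["year"] <= latest_birth_year
        and b["id"] not in birth_ids_with_death
    ]


# Invented illustrative records, not data from any real project.
births = [
    {"id": 1, "forenames": "Samuel", "year": 1852},
    {"id": 2, "forenames": "Edwin", "year": 1860},
    {"id": 3, "forenames": "Arthur", "year": 1923},
]
candidates = potential_emigrants(births, birth_ids_with_death={1})
print([b["forenames"] for b in candidates])
```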
The
linkage process broadly moves the baseline for the surname back from the
present day to the middle of the nineteenth century, a chronological distance
of some 170 years or, depending on each tree’s history, four to six
generations.
The
new baseline created for further documentary research reduces the numbers of
name-bearers whose ancestry is being tracked back in time as the number of
name-bearers for any given surname circa 1840 in the UK will be of the order of
one-third the present-day level. While
research in the period prior to 1840 will focus on parish records, and benefit
from the lack of mobility of families compared to the present day, the number
of different ways that a surname is spelled tends to increase. One huge advantage of a parallel DNA
project is that it has the power to uncover genetic connections among surnames not
included in the original documentary research programme, potentially expanding
the list of recognisable variants associated with the core surname as it exists
in the modern era.
Use
Within A DNA
Project
A
standard surname DNA
project, working without significant documentary inputs, is broadly speaking a
net ready to collect the result of any male name bearer who wishes to pay for a
DNA
test. Under this approach, the results
collected within the project will be biased in two ways relating to the trees
within the surname(s) under study: firstly, towards residents of countries that
are more receptive to the benefits of DNA
testing (principally the USA), and
secondly towards members of already documented trees. The latter observation may seem counter-intuitive;
after all, one might expect men who’ve done no family history research to
realise that they have the most to gain by taking a DNA
test. Experience from my own project
suggests that those men whose family members have already done some research,
or who have some inkling that they belong to a particular tree, are the first
to pay for a test, thus weighting the results for the surname as a whole
towards those trees that have already been documented.
Many
DNA projects
are set up to confirm the hypothesis that one or more previously researched
trees link together as a single tree under a common surname-bearing
ancestor. Though this is an excellent
starting point, my thesis in this paper is that to discover the origins and
structure of a surname the only option is to identify every single tree,
gradually reducing their number through repeated iterations of documentary
research and DNA
testing.
A
DNA project
possessing the information from a documentary linkage process described above
is now in the position to take a targeted approach towards DNA
testing its name bearers. Instead of
testing anyone ready to pay, or men resident in or associated with defined
geographical locations, the project manager can now set out to systematically
test one male from each documented tree.
Approaches
to potential testees can be organised in different
ways. For example, one might prioritise
those that appear to have origins outside of the hypothesised area of the
surname’s origin (on the grounds that the DNA
result might link them to a tree within that area), or those that appear to
have their origins within the area of origin (to determine whether those which
stem from that area have the same DNA
signature), or those who appear in the largest trees currently documented (of
interest potentially to the largest number of living name bearers).
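A targeted testing list of this kind is straightforward to derive once the trees are documented. The sketch below is a minimal illustration, assuming a hypothetical record per tree with an `id`, a count of historical members and a list of living males; it orders untested trees by size, which is only one of the priorities mentioned above.

```python
def next_trees_to_test(trees, tested_tree_ids):
    """List documented trees that still lack a DNA result, largest first.

    Trees with no known living male candidate are skipped, since there is
    nobody to approach. The sort key is an assumption: other priorities,
    such as geographical origin, could be substituted.
    """
    untested = [
        t for t in trees
        if t["id"] not in tested_tree_ids and t["living_males"]
    ]
    return sorted(untested, key=lambda t: t["historical_members"], reverse=True)


# Invented illustrative data.
trees = [
    {"id": "T1", "historical_members": 420, "living_males": ["A. Example"]},
    {"id": "T2", "historical_members": 35, "living_males": ["B. Example"]},
    {"id": "T3", "historical_members": 90, "living_males": []},
    {"id": "T4", "historical_members": 150, "living_males": ["C. Example"]},
]
for tree in next_trees_to_test(trees, tested_tree_ids={"T1"}):
    print(tree["id"], tree["living_males"][0])
```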
While
one can readily create intermediate goals within a project designed ultimately
to reconstruct the trees of an entire surname, the goal of a DNA
project for a surname with an origin in, say, England is to build up a matrix
of results of individuals living in England and linked to trees with origins in
England. This is a vital caveat for any
surname study using its DNA
results to hypothesise about its origins.
Just as the most commonly found haplotype does not necessarily signal
the DNA
signature of the oldest ancestor, even a calculation based upon the number of
historical individuals within trees cannot do so unless these calculations are
built exclusively upon data gathered within the country of origin.
The
simple reason for this is that two populations, for example in England and the USA, have not
faced the same conditions and will have reproduced at different rates. The disparity in genetic makeup between
different same-surname populations is the result of differences in their
reproductive success (brought about by local conditions) and the operation of
the founder effect (whereby a new population starts out as less genetically
diverse than the original population).
No DNA
project I am aware of has formally reported on these effects but anecdotal
evidence from several surname projects which I have seen confirms both their
presence and influence within surname-led Y-chromosome projects.
An
initial aim for a dual documentary/DNA
project is to test one man per documented tree.
While this might seem a huge task, any active documentary research
conducted in parallel will over time consolidate and reduce the total number of
trees in the combined project. This
process happens most quickly where there is a genuinely iterative research
process combining inputs from both the DNA
and documentary projects as they arise.
When two men are found to have the same DNA
signature, this directs the documentary research activity towards finding the
common ancestor signalled by the DNA
result. In many cases this ancestor will
have existed after 1841 or a generation or two prior to it. A regular pattern emerges: over time several small
trees coalesce into a single larger tree, which by then will have several DNA
tests associated with it through its living descendants, all of whom report a
consistently held Y-chromosome result.
A
common feature of dual projects is that while some documented trees appear to
have no living descendants in the country of origin, the trees do hold men who
have emigrated and who have living descendants overseas. In these cases one has no option but to DNA
test a male member living outside of the country of origin. Figures from the Pomeroy project suggest this
scenario covers around 10-20% of trees in an ongoing reconstruction project.
As
projects develop over time and the number of distinct trees is reduced, a new
priority will assert itself within the DNA
project: to test a minimum of two men per tree.
This cross-referencing removes the possibility that the solo DNA
haplotype associated with any particular tree does not, in fact, reveal the DNA
of the entire tree but merely of that individual’s personal line which has been
contaminated by external DNA.
With
two or more men tested per tree, the single DNA
signature identified is not then associated with the individuals who have been
tested but with the common male ancestor they share. At this point, one can say that the DNA
result is that of a specific historical figure who is located both in time and
space. In the case of trees where,
because there are no living descendants in the home country, the men being
tested live outside it, that specific historical figure can have lived no
earlier than the original emigrant.
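Identifying the historical figure to whom a shared result can be attributed amounts to finding the most recent common ancestor of the tested men in the documented male line. The following sketch assumes a hypothetical `father_of` mapping built from the documentary research; the names are invented, and the code says nothing about whether the documented links are themselves correct.

```python
def male_line(person, father_of):
    """Return [person, father, grandfather, ...] following documented links."""
    line = [person]
    while person in father_of:
        person = father_of[person]
        line.append(person)
    return line


def most_recent_common_ancestor(a, b, father_of):
    """Return the nearest documented male-line ancestor shared by a and b.

    Returns None when the two men are not documented within the same tree.
    """
    ancestors_of_a = set(male_line(a, father_of))
    for candidate in male_line(b, father_of):
        if candidate in ancestors_of_a:
            return candidate
    return None


# Invented illustrative pedigree: child -> father.
father_of = {
    "Alan": "George",
    "George": "William",
    "Brian": "Henry",
    "Henry": "William",
    "William": "Thomas",
}
print(most_recent_common_ancestor("Alan", "Brian", father_of))
```

If Alan and Brian return the same signature, it can be attributed to William and, by inference, to the lines running between William and each of them; it cannot be assumed to reach further back to Thomas without a further test on another branch.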
Any
wide scale Y-chromosome surname testing programme will throw up inconsistencies
that have to be explained, of which the main one will be the presence of DNA
within a tree which is different from the DNA
of trees geographically close to it or in the rest of that documented tree.
Particularly
in the early stages of a dual project, descendants of different trees that
appear to originate in the same geographical area may return markedly different
DNA
haplotypes. The question then posed is
which scenario is more likely: could they one day be documented within the same
tree but one of them hold a different DNA
result due to an earlier non-paternity event, or do the different DNA
results point towards them being two long-established trees of different and
unrelated origins? In this context, non-paternity event includes not only
marital infidelity, but also social means of introducing different Y
chromosomes into a surname line, e.g. adoption of unrelated children or
re-adoption of the surname along the female line. One way to resolve this kind of question is
to look at the haplotypes associated with other trees originating in the same
geographical area. Do they show a
dominant DNA
signature, or not? The direct way is to take a second test within each of the
apparently conflicting trees. This
method will answer the question of whether the initial DNA
result should be associated with the individual or the tree. Indeed, repeated use of this method can
potentially pinpoint the individual who first carried the different DNA
within the tree.
By
adopting the above approach, the matrix of DNA
results collected will reveal patterns of linkages between trees, and specific
historical individuals within those trees, rather than between present-day
descendants. The historical content and
value of the pool of DNA
results is made much richer, and much more useful, by developing it in a
coordinated manner.
A
key point to note is that as a DNA
testing programme develops, the need to use contextual
data to help refine the hypotheses associated with the results increases. In summary: the DNA
results point to potential linkages, the contextual evidence either supports or
conflicts with those hypotheses, but only the documentary evidence demonstrates
how the tree is actually put together.
I
suspect that dual-approach surname studies will find three broad haplotype
frequency patterns – a single haplotype dominant, a pair of roughly equal
frequency haplotypes, or no dominant haplotype.
As yet we have no data on this issue as so few dual-approach studies
have reached the point of completion and are able to report their findings on
this point, so at this stage it is moot whether they will reveal a consistent
pattern across all surname types, frequencies and geographical regions of
origin.
Under
the method described above, even though the DNA
results can now be interpreted on the level of trees rather than individual
living males, one still cannot assert that the modal haplotype – the most
frequently found result – indicates the DNA
signature of the surname’s original founder.
The contextual information that is useful at this point is the
historical number of name bearers in each tree, not the number alive today. Some trees, it turns out, are old and thin,
i.e. they have very few living descendants but have old origins, while others
may be fat and young, i.e. they have a large number of descendants within a
tree that can be traced no further back than a couple of generations. By counting the total number of historical
individuals in the trees associated with each DNA
signature, a more balanced picture can emerge when hypothesising about the
modal DNA
signature.
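The difference between counting trees and counting historical individuals can be shown with a small sketch. The figures below are invented for illustration and the field names are hypothetical; the point is only that the two tallies can disagree about which signature is most common.

```python
from collections import Counter


def haplotype_tallies(trees):
    """Tally each haplotype by number of trees and by historical members."""
    by_trees = Counter()
    by_people = Counter()
    for tree in trees:
        by_trees[tree["haplotype"]] += 1
        by_people[tree["haplotype"]] += tree["historical_members"]
    return by_trees, by_people


# Invented figures: three small trees share signature "A",
# one large tree carries signature "B".
trees = [
    {"haplotype": "A", "historical_members": 40},
    {"haplotype": "A", "historical_members": 30},
    {"haplotype": "A", "historical_members": 20},
    {"haplotype": "B", "historical_members": 400},
]
by_trees, by_people = haplotype_tallies(trees)
print(by_trees.most_common(1))   # most common signature counted by tree
print(by_people.most_common(1))  # most common signature counted by person
```

Neither tally identifies the founder's signature on its own; weighting by historical individuals simply gives a more balanced starting point for a hypothesis than a count of living testees or of trees.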
Pomeroy:
Worked Example
Details
from an actual project will help to illustrate the points made above.
When
the Pomeroy DNA
project was launched back in 2000, a documentary project had already been
underway for many years and much data, including civil registration index
entries, collected. Its findings were,
however, patchy. A few trees had been
researched back to the 1600s, but most existed as fragments. In many cases, a review of submitted research
revealed that for some trees one original piece of research had been passed to
us by several different people, and while each had added some additional data
none had gone back to check the original material. The unchallenged acceptance and re-use of
existing material is a feature of family history research that has increased
since the arrival of the web.
The
decision to test one man per tree was taken primarily to maximise the
opportunity offered within a large scale Y-chromosome testing programme
conducted by Professor Bryan Sykes’ lab at the University of Oxford, and partly
to avoid mass mailing the pool of 825 men whose home addresses we had taken
from a complete edition of the electoral roll.
Slightly more than 300 trees were identified by using the civil
registration data (no on-line census data were available at that time), of
which individuals from 51 of the largest trees known at that time were DNA
tested. The results revealed two strong DNA
signatures, a pattern that remains true after nearly ten years’ further
research.
After
the first results came through in 2001, the targeted approach to identifying DNA
testees was extended, and by 2007 broadly speaking we were able to claim that every significant tree of UK origin
had at least one DNA
result associated with it. By late 2009
the number of documented historic trees has been consolidated down to 49, plus
an additional 8 where there are no living descendants in the UK but which may
have living descendants outside of it in former colonies. All but one of these 49 trees is documented
to an origin prior to 1841. Put simply,
we are now quite sure that there are no post-1837 trees, evidenced by UK records, that we do not know about.
The
drop in the number of trees from about 300 to 49 appears dramatic, but it was
largely achieved by patient documentary research. Back in 2001 we were still waiting for the
first of the national censuses to go online.
FreeBMD, the most readily available online
source of civil registration records for England and Wales, had
barely started (as of September 2009 it hosted more than 173 million unique
records). As new datasets became
available we incorporated them into our documentary research, to the point
where we have now cross-referenced census indexes in all but one case from more
than one supplier per census. As it
stands, only several hundred civil registration events out of some fifteen
thousand post-1837 records remain to be linked to a specific individual, and
we do not anticipate that these will be resolved by the arrival of the eventual
enhanced index to the civil registration records.
Critical
to developing the dual project approach has been the availability of funds to
support simultaneously both mechanisms for resolving doubt or unravelling
inconsistencies: further DNA
testing and the purchase of documentary data or access to it. Our dual project is backed by a family
association willing to subsidise DNA
tests where needed. Our documentary
research is led from the centre and recorded in a single offline database,
though other more distributed or online models of collaboration could work at
least as well.
Pomeroy Project Results’ Analysis
As
things stand, we have now DNA
tested the oldest common descendant that we can in almost every tree associated
with our main surname and three originally defined variants. Put simply, we’ve run out of men to DNA
test, except in a few specific instances which we consider to be low
priorities.
What
strikes me is that the pattern of the DNA
results is broadly the same as we discovered in 2001. What has changed is my interpretation of it.
The
two strong DNA
signature haplotypes still stand out, and together are associated with 14 of
the 49 trees; these 14 trees contain more than 40% of all historical name
bearers born in the UK. A further 9 trees are grouped within three
other haplotypes, 20 have an associated DNA
result recorded only once within the project and 6 small trees are presently untestable due to the lack of a willing testee.
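As a quick arithmetic check on this breakdown, the four categories can be tallied to confirm that they account for all 49 trees. The sketch below is illustrative only: the category labels are shorthand introduced here, and the counts are simply those quoted in the paragraph above.

```python
# Tree counts by DNA status, as quoted in the paragraph above.
# The dictionary keys are shorthand labels chosen for this sketch.
tree_counts = {
    "two strong signature haplotypes": 14,
    "three other shared haplotypes": 9,
    "singleton haplotype": 20,
    "untested (no willing testee)": 6,
}

total_trees = sum(tree_counts.values())
tested_trees = total_trees - tree_counts["untested (no willing testee)"]

for status, count in tree_counts.items():
    print(f"{status}: {count} trees ({count / total_trees:.0%} of trees)")
print(f"total: {total_trees}; with at least one DNA result: {tested_trees}")
```

Note that these are shares of trees, not of name bearers: the trees carrying the two signature haplotypes are larger than average, which is why they account for more than 40% of historical name bearers.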
Some
of the 20 singleton results, where the haplotype has been found in only one
tree in the project, fall in trees with a documented origin back three or more
centuries, while others fall within trees where only one individual has been
tested to date and could thus turn out to be individual haplotype anomalies
caused by a non-paternity event.
Back
in 2001 Professor Sykes confidently told me that the pattern of results we’d
produced indicated a multiple origin surname, i.e. that different, unrelated
men took on the name and founded trees at different times and places. I was quite happy with that result at the
time as it fitted with my own expectation that most surnames will have multiple
origins. Indeed, the reason I first
contacted Bryan Sykes was to query his result for his own surname, which is of
higher frequency in the UK
than Pomeroy by a factor of four or five, and which he had declared to be of
single ancestral origin.
Nearly
a decade later I look at the combined and expanded DNA
and documentary data and wonder if the pattern revealed might suggest a single
origin after all. Under this scenario
all the differences in the DNA
results found in trees other than the two strong modal results could be
accounted for by non-paternity events that occurred before each tree's
currently documented origin but after surnames had become settled, the
majority in the period 1400-1650.
Even more tantalising is the possibility that the oldest trees heading
the pair of strong modal haplotypes could both turn out to contain old non-paternity
events as part of a single tree. This
might potentially link back to the Norman noble family based in Berry Pomeroy Castle, Devon.
Our
surname project has reached the limits of what DNA
testing, in its current form, can reveal to help us identify and cluster the
trees in our project, but there remains substantial documentary research to be
done within the period of parish records, roughly 1550-1840, and prior to it,
in order to reveal the full picture of the surname's origins and all its true
variants.
Ten Key Learning Points
The following are ten key points for dual-approach projects; the first six
relate to management and the final four to the analysis of results.
1. The running of a dual-approach surname
project is different from a standard all-comers DNA
project. It requires a focus on the
country of origin of the surname, both in terms of documentary research and in
terms of identifying individuals to take the DNA
test.
2. While DNA
results of name bearers outside the originating country are of interest within a
sub-project covering a defined geographical area, they are of no use in
identifying the origin of the surname.
There will, however, be occasions where no living descendants of a
UK-origin tree can be found except in sub-lines headed by emigrants. In these cases the emigrant’s descendants are
the only living bearers of the haplotype associated with an historical member
of the tree.
3. The project manager of a growing
dual-approach project will increasingly wish to subsidise the cost of the DNA
tests of new participants as the search for the right man to take the test
becomes more and more specific.
Sometimes the hunt will lead around the country, or indeed the world,
for the one man whose result will add more meaning to the existing matrix of
data.
4. Advancing the documentary reconstruction
side also requires funds. Alternative
individual profiles suggested by the documentary data will wait to be resolved
until funds can be applied to solve them, primarily through purchasing the
underlying data behind the civil registration event indicated in the online
indexes. A project manager also has to
reserve time to liaise with other researchers and to help correct mistakes in
their submitted research (which you can be sure that they will have made).
5. There are distinct marketing benefits
when running a defined tree-based DNA
testing programme; it should be easier to raise money and persuade specific
people to take part when you can show them that their personal result fits into
a wider plan.
6. The project needs a regularly updated
report of its findings both as a record of its status and as a marketing tool.
7. The full benefits of a dual documentary/DNA
project are only realised when contextual data is included. DNA
results can be sorted into ‘genetic families’ based upon the raw DNA
results alone, but thereafter the analysis of the trees and how they link
together is driven by contextual data and documentary evidence. The overall aim of a dual-approach project is
to produce a set of combined data that is internally and externally consistent.
8. Early non-paternity events can create a
significant DNA
haplotype within the pool of surname-wide DNA
results. The oldest non-paternity event
so far documented in a Pomeroy tree falls in the early 1600s, and the
associated haplotype is clearly different from any other found within the
project. Well distributed haplotypes
within a surname, even where documented in an old tree, can plausibly be
hypothesised as very old non-paternity events.
In many cases researchers are faced with a choice of origins for
individual trees that can be documented back to an early origin but apparently
no further: is the individual at the head of the tree its original ancestor,
and if so did they take the surname as a conscious choice or were they
bequeathed a different genetic heritage through a non-paternity event?
9. The dual project approach gradually
removes the need for project managers to use the Time to Most Recent Common
Ancestor (TMRCA) calculation. As the
project develops you will find that you are not asking whether two DNA
results are related or not, which the TMRCA calculation is used to adjudicate,
but whether two trees can be linked documentarily or not.
10. The rate of female surname transmission
(through unmarried mothers) is rising fast in the UK: the
current annual rate found in the civil registration records within my surname
group is greater than 30%. Barring any
further advances in DNA
testing technology, this only increases the need for a complementary documentary
and oral research approach within a DNA
testing programme.
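Point 7 notes that DNA results can be sorted into ‘genetic families’ from the raw results alone, before the documentary evidence is brought in. A minimal sketch of that first sorting step follows. Everything in it is an assumption made for illustration: the tree labels and marker values are invented, the distance measure is a simple stepwise count, and the matching threshold is arbitrary. It is not the method used in the Pomeroy project.

```python
from itertools import combinations

def genetic_distance(a, b):
    """Sum of absolute repeat-count differences across markers (simple stepwise count)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def genetic_families(results, max_distance=1):
    """Single-linkage grouping: results within max_distance share a family."""
    parent = {tree: tree for tree in results}

    def find(tree):
        # Follow parent links to the family representative, shortening the path.
        while parent[tree] != tree:
            parent[tree] = parent[parent[tree]]
            tree = parent[tree]
        return tree

    for a, b in combinations(results, 2):
        if genetic_distance(results[a], results[b]) <= max_distance:
            parent[find(a)] = find(b)

    families = {}
    for tree in results:
        families.setdefault(find(tree), []).append(tree)
    # Largest families first, then alphabetical, so the output is stable.
    return sorted((sorted(m) for m in families.values()), key=lambda m: (-len(m), m))

# Hypothetical six-marker haplotypes, one tested man per tree (invented values).
results = {
    "tree_A": (13, 24, 14, 11, 11, 14),
    "tree_B": (13, 24, 14, 11, 11, 15),
    "tree_C": (13, 24, 14, 11, 11, 14),
    "tree_D": (14, 22, 15, 10, 13, 14),
    "tree_E": (14, 22, 15, 10, 13, 14),
    "tree_F": (12, 25, 16, 10, 12, 13),
}

families = genetic_families(results)
singletons = [members[0] for members in families if len(members) == 1]
print(families)
print("singletons:", singletons)
```

A grouping of this kind only says which results resemble one another. Whether the trees in a family can actually be linked, and whether a singleton reflects a separate origin or a non-paternity event, remains a question for the contextual and documentary evidence, as points 7 to 9 argue.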
Conclusions
1. A dual DNA/documentary
approach will always produce greater benefits for researchers than a single
approach of either type.
2. A surname reconstruction project will
combine three methods: oral history, genetic history, and documentary
history. The overall aim of a dual
documentary/DNA
approach is to build a surname project that has a consistent set of DNA
results that are explained by the documentary evidence.
3. Documentary reconstruction back to the
1840s for surnames of English or Welsh origin is feasible because the civil
registration and census data are now readily available online.
4. Documentary reconstruction is best
undertaken for surnames up to a certain frequency. Using the Pomeroy project, with fewer than
2,500 living name bearers in the UK across
its four constituent surnames, as a baseline I estimate that tackling surnames
up to twice that frequency using the dual approach is feasible.
5. After a surname-wide DNA
project is completed, the finishing touches to the dual project, and further
advances, will always be supplied by pre-modern documentary evidence.
Next Steps
Within
the family history community, given that a single method approach is markedly inferior
to a dual project approach, I would like to suggest that:
· DNA project managers can profitably explore collaboration with Guild members,
and the body of researchers that they each coordinate, in order
o to increase the number of dual-approach projects;
o to outsource the active reconstruction of trees to active researchers.
· For documentary
researchers, including Guild members and lineage societies, it is doubtful
whether any documentary surname project can be deemed accurate or complete
without a corresponding genetic testing project being undertaken to verify the
composition of the trees as documented.
Even in those cases where documentary researchers do not wish to run a DNA
project, it is a simple task to set up a surname DNA
project at Family Tree DNA
for the purpose of collecting any ad hoc DNA
results that you can, and you may later be able to identify a DNA
project administrator from among their number.
The discussion in this paper suggests that detailed papers in the following
areas would benefit both documentary researchers and DNA project managers:
· A list and an analysis of
the results of existing dual-approach projects, including an evaluation of how
many unique haplotypes have been found per surname and evidence of the
different genetic histories of their US
and UK populations.
· A detailed write up of a
single complete, or near complete, dual-approach surname project, or of a
single tree headed by a known ancestor analysed using the dual-approach method.
· A meta review of
Y-chromosome surname projects to analyse how often new genetic material enters
the male line and how often mutations occur, and to create statistics on the
number of unique haplotypes found among descendants holding a surname where the
documentary records indicate an unambiguous single origin in a specific county.
Looking
forward, the insights outlined in this paper suggest that historians may in
future wish to:
· Use databases of public
Y-chromosome results to identify links between surnames that have not been
hitherto recognised and thus shed light on the process of surname evolution in
the medieval period. This analysis would
be built upon documentary research to establish the presence of surnames in the
same location around the time of surname formation. Some surnames with the same DNA
haplotype(s) may be related because they stem from the same localised gene
pool, while others will be revealed as linked because they share the same
genetic ancestor within the timeframe since surnames became established through
a previously unrecognised etymological connection.
· Use databases of public
Y-chromosome results to develop a regional DNA
analysis, for example by collating and reviewing data on all Cornish-origin and
Yorkshire-origin surnames to see what patterns emerge, or a classification
analysis, for example looking at the different types of surnames such as
locative-origin names or nickname-origin names.
Databases of Irish and Scots clans already exist and reveal how a gene
pool can be mapped against a range of family/clan names.6
· Develop time series of data
within genetically-verified family trees, e.g. illegitimacy rates, infant
mortality rates, age at death and the length of a generation.
Acknowledgments
My thanks to Debbie Kennett, John Creer,
and Susan Meates for comments on pre-publication drafts of this paper.
Disclosure
Chris Pomery has a commercial contract to promote Family Tree DNA
in the UK. The opinions
expressed in this article are entirely his own.
Web Resources
Guild of One-Name Studies
http://www.one-name.org
FreeBMD
www.freebmd.org.uk
Irish DNA Data
www.gen.tcd.ie/molpopgen/resources.php
Scottish Clans Project Data
www.scottishdna.net/view.html
References
Carden A (2009) The Carden DNA Project.
Pomery C (2008) Using DNA Testing in the Pomeroy Surname Reconstruction Project. (29 June 2008).
Creer J (2008) Creer DNA Study: How DNA Analysis Has Transformed the Knowledge of a Manx Family's History.
Plant JS (2005) Modern methods and a controversial surname: Plant. Nomina, 28:115-133.
Meates S (2006a) DNA project has produced discoveries in the Meates One-Name Study not possible with paper records alone. J One-Name Studies, 9(1):6-10.
Meates S (2006b) DNA testing of tremendous value in sorting out variants in my one-name study (Part 2). J One-Name Studies, 9(2):6-9.
Meates S (2006c) Some tips on establishing a DNA project for your one-name study. Final part of a special series on DNA Projects. J One-Name Studies, 9(3):8-10.
Sykes B, Irven C (2000) Surnames and the Y Chromosome. Am J Hum Genet, 66:1417-1419.