Haplogroup E3b1a2 as a Possible Indicator of Settlement in Roman Britain by Soldiers of Balkan Origin

 

Steven C. Bird

 

Abstract 

 

The invasion of Britain by the Roman military in CE 43, and the subsequent occupation of Britain for nearly four centuries, brought thousands of soldiers from the Balkan peninsula to Britain as part of auxiliary units and as regular legionnaires.  The presence of Haplogroup E3b1a-M78 among the male populations of present-day Wales, England and Scotland, and its nearly complete absence among the modern male population of Ireland, provide a potential genetic indicator of settlement during the 1st through 4th Centuries CE by Roman soldiers from the Balkan peninsula and their male Romano-British descendants.  Haplotype data from several major genetic surveys of Britain and Ireland are examined, analyzed and correlated with historical, epigraphic and archaeological information, with the goal of identifying any significant phylogeographic associations between E3b1a-M78 and those known Romano-British settlements and military posts that were associated specifically with Roman soldiers of Balkan origin.  Studies by Cruciani et al. (2007), Peričić et al. (2005), and Marjanovic et al. (2005), examining the distribution of E3b1a-M78 and E3b1a2-V13 in the Balkans, are analyzed further to provide evidence of phylogeographic associations between the E3b1a2 haplotypes identified within the Balkans by these studies and those regions of the Balkans occupied first by the Roman army in antiquity.  E3b1a2 is found to be at its highest frequency worldwide in the geographic region corresponding closely to the ancient Roman province of Moesia Superior, a region that today encompasses Kosovo, southern Serbia, northern Macedonia and extreme northwestern Bulgaria.  The Balkan studies also provide evidence to support the use of E3b1a-M78 (in the present study) as a close proxy for the presence of E3b1a2-V13 (representing 85% of the parent E3b1a-M78 clade) in both the Balkans and in Britain.

 

 

 

Address for correspondence: stevenbird1000@hotmail.com

 

Received:  July 1, 2007;  accepted:  Sept. 15, 2007.

 

 

 

 

Introduction

 

The origins, arrival times and possible routes of migration of the E3b haplogroup[1] to Britain have been the subject of debate among population geneticists for several years.  In his book, The Origins of the British, Stephen Oppenheimer (2006) advanced a theory of a Neolithic time period for the arrival of E3b (and a companion haplogroup, J2) in Britain,[2] corresponding to the period from 6.5-5.5 kya (thousands of years ago) and originating from the Balkan peninsula.[3]  The data upon which his conclusions concerning E3b and J2 were based[4] were derived from two surveys of British haplotypes, published by Weale et al. (2002) and Capelli et al. (2003).  Oppenheimer based his hypothesis, in part, upon a study by Semino et al. (2004) that addressed the spread of haplogroups E3b and J in western Europe during the Neolithic era but did not include any data from the island of Britain specifically.  He also drew upon the earlier work of Ammerman and Cavalli-Sforza (1984) concerning the “demic diffusion” model, in which the Linearbandkeramik (LBK) culture arose after the Neolithic transition (with the Starčevo-Körös-Criş cultures) about 6000 BCE, or 8 kya.

 

At about the same time as the release of Oppenheimer’s book, Bryan Sykes (2006) published his book Blood of the Isles (entitled Saxons, Vikings, and Celts in the U.S. release).  Simultaneously, Sykes also published the data from the Oxford Genetic Atlas Project (OGAP) online.[5]  While the studies of Weale and Capelli were limited geographically either to a specific transect of Britain or to a grid pattern containing large gaps of unsampled and potentially significant geographic regions within Britain, the Sykes OGAP study covered every region of the island of Britain.[6] 

 

Very shortly after the publication of these two books, Cruciani et al. (2007) published a new study defining ten subclades of haplogroup E3b1a-M78 through several newly identified unique event polymorphisms (UEPs).[7]  The subclade E3b1a2 (identified by the presence of the V13 and V36 UEPs) was found by Cruciani et al. (2007) to have a strong phylogeographic association with the southern Balkan peninsula; the same study found that this subclade corresponds very closely to the α (“alpha”) cluster of E3b1a-M78, first identified by Cruciani et al. (2004) using microsatellite (STR) data.  Cruciani et al. (2007) also stated that the subclade defined by the V13 UEP (phylogenetically equivalent to E3b1a2 and E3b1α) was found in 85% of western European males who also were positive for E-M78.

 

Semino et al. (2004) viewed E3b1a-M78, of which E3b1a2 is, by far, the most common subclade in Europe, as an indicator of the diffusion of people from the Balkans (along with a “companion” clade, J2b1-M12/M102) and therefore a candidate for a residual genetic signature of the Neolithic demic diffusion model.  Cruciani et al. (2007) have brought the Neolithic dating assumption into question, however, by their revised dating of the expansion of E-V13 and J-M12, from the Balkans to the remainder of Europe, to a period no earlier than the Early Bronze Age (“EBA”).[8]

 

Two dating methods were employed by Cruciani et al. (2007) to calculate the “time to most recent common ancestor” ("TMRCA").  The first, that of Zhivotovsky et al. (2006), applied an “evolutionary effective” mutation rate to an average square distance ("ASD") calculation.  The second, based on Forster et al. (1996) and Saillard et al. (2000), utilized ρ ("rho") statistics and was employed to “assay how robust the time obtained is to choice of method.”  Cruciani et al. (2007) found that Forster’s method produced time estimates that were slightly younger than the ASD-based method but that the difference was significant only for the root of the entire haplogroup.
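As a rough illustration of how the two estimators named above differ, the following sketch computes an ASD-based and a ρ-based age from a toy set of STR haplotypes.  The haplotypes, the choice of founder (modal) haplotype, the rate of 6.9 × 10⁻⁴ per locus per 25 years, and all function names are assumptions made for illustration only; they are not the data or the exact procedure of Cruciani et al. (2007).

```python
from statistics import mean

# Hypothetical STR haplotypes (repeat counts at four loci); not real data.
HAPLOTYPES = [
    (13, 24, 13, 10),
    (13, 24, 13, 10),
    (13, 23, 13, 10),
    (13, 24, 14, 10),
    (13, 25, 13, 11),
]
FOUNDER = (13, 24, 13, 10)  # assumed ancestral (modal) haplotype
MU_PER_LOCUS = 6.9e-4       # assumed "evolutionary effective" rate per locus per step
YEARS_PER_STEP = 25         # assumed length of one step, in years


def asd(haplotypes, founder):
    """Average squared distance to the founder, averaged over loci and samples."""
    return mean(
        mean((a - f) ** 2 for a, f in zip(h, founder)) for h in haplotypes
    )


def rho(haplotypes, founder):
    """Mean number of single-repeat steps separating each sample from the founder."""
    return mean(sum(abs(a - f) for a, f in zip(h, founder)) for h in haplotypes)


def age_from_asd(haplotypes, founder, mu=MU_PER_LOCUS):
    """Age in years: ASD divided by the per-locus rate."""
    return asd(haplotypes, founder) / mu * YEARS_PER_STEP


def age_from_rho(haplotypes, founder, mu=MU_PER_LOCUS):
    """Age in years: rho divided by the rate for the whole haplotype."""
    n_loci = len(founder)
    return rho(haplotypes, founder) / (mu * n_loci) * YEARS_PER_STEP


print(round(age_from_asd(HAPLOTYPES, FOUNDER)))
print(round(age_from_rho(HAPLOTYPES, FOUNDER)))
```

In this toy set every difference from the founder is a single repeat, so the two estimates coincide; they diverge once multi-step differences appear, because ASD squares each difference while ρ only counts steps.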

 

An important finding of Cruciani et al. (2007) was that E-V13 and J-M12 had essentially identical population coalescence times.  The authors concluded that the E-V13 and J-M12 subclades expanded in Europe outside of the Balkans as the result of “a single evolutionary event at the basis of the distribution of haplogroups E-V13 and J-M12 within Europe, a finding never appreciated before.”  Further, Cruciani et al. (2007) wrote:

 

Our estimated coalescence age of about 4.5 ky for haplogroups E-V13 and J-M12 in Europe (and their C.I.s) would also exclude a demographic expansion associated with the introduction of agriculture from Anatolia and would place this event at the beginning of the Balkan Bronze Age, a period that saw strong demographic changes as clearly testified from archeological records.

 

These expansion times were calculated by Cruciani et al. (2007) to have occurred between 4.0-4.7 kya for E-V13 and 4.1-4.7 kya for J2-M12, with the upper limit of the expansion time for E-V13 at 5.3 kya and for J2-M12 at 6.4 kya.  Both expansion times therefore are centered at approximately 4.3-4.35 kya, a period of time corresponding to the EBA in the southern Balkans (Hoddinott, 1981).

 

Cruciani et al.’s E-V13 and J2-M12 coalescence times bear a striking similarity to carbon-14-based date calculations for certain archaeological sites in the Maritsa river valley and its tributaries, near the city of Nova Zagora, Bulgaria (Nikolova, 2002).  These sites are associated directly with the proto-Thracian culture of the southern Balkans that came to dominate the region during the first millennium BCE.  Sites surveyed included Ezero, Yunatsite, Dubene-Sarovka and Plovdiv-Nebet Tepe, all of which had deep associations with the developing EBA proto-Thracian culture of the region.  It is evident that if Cruciani et al. (2007) are approximately correct in their dating of the expansion of E-V13 from the Balkans, then Oppenheimer’s theory of the role of E3b in Neolithic Britain is fundamentally flawed.  E3b1a2 could not have arrived in Britain during the Neolithic era (6.5-5.5 kya) if it had not yet expanded from the southern Balkans.

 

Another difficulty for the acceptance of Oppenheimer’s “Neolithic” arrival time for E3b and J2 in Britain is the virtual absence of these haplogroups in Ireland, according to two recent large-scale population studies (McEvoy, 2006a; Moore, 2006).  The data, compiled by the Smurfit Institute of Genetics at Trinity College, University of Dublin, demonstrated that E3b appeared in Ireland at an extremely low level of just eight examples out of a total of 1921 haplotypes tested, or 0.42% (less than 1/2 of one percent) when combined.[9]  Remarkably, no samples of Y-haplogroup J (any subclade) were found by either study.  In a surname study, also from the Smurfit Institute, McEvoy et al. (2006b) found a 5-6% E3b presence (n=3) within a very small sample (N=47) among males in Ireland who had "Norse" surnames. Two of the resulting three E3b samples, however, may have resulted from a single founder (surname Arthur).  McEvoy (2006b) stated, "In the Arthur [surname] both samples . . . were identical suggesting a single origin or introduction to Ireland."  Therefore the apparently higher percentage of E3b haplotypes found in this study may be due to a founder effect and to the very small sample size.  Capelli et al. (2003) also had found the "Neolithic" haplogroups of E3b and J entirely absent in Ireland, based on two sample locations, one taken from "a site in central Ireland that has had no known history of contact with Anglo-Saxon or Viking invaders," (Castlerea) and the other near Dublin (Rush). 
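The contrast between the two Irish samples can be made concrete with a short calculation.  The sketch below recomputes the two proportions from the counts quoted above (8 of 1921 and 3 of 47) and attaches a 95% Wilson score interval to each.  The choice of the Wilson interval and the function names are assumptions of this illustration; the cited studies may have reported uncertainty differently, or not at all.

```python
from math import sqrt


def percent(successes, n):
    """Observed frequency as a percentage."""
    return 100.0 * successes / n


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# Counts as quoted in the text above.
SAMPLES = [
    ("Population surveys (McEvoy 2006a; Moore 2006)", 8, 1921),
    ("Norse-surname sample (McEvoy 2006b)", 3, 47),
]

for label, k, n in SAMPLES:
    low, high = wilson_interval(k, n)
    print(f"{label}: {k}/{n} = {percent(k, n):.2f}% "
          f"(95% interval {100 * low:.2f}%-{100 * high:.2f}%)")
```

Under these assumptions the interval for the surname sample is very wide (roughly 2% to 17%), which is consistent with the caution expressed above about its small size and the possible single-founder effect.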

 

If E3b1a-M78 had in fact arrived during the Neolithic era by water routes from Iberia and the Mediterranean, there would not appear to be any obvious reason for it to be distributed so unevenly between Britain and Ireland. Oppenheimer acknowledged this problem indirectly in Chapter 5 of his book (“Invasion of the Farmers”) and stated in so many words that he had no explanation for the avoidance of Ireland in favor of Britain by E3b (and J2), a problem that apparently did not affect other Neolithic haplogroups identified by Oppenheimer  (2006, pp. 193-194, 206-207), namely I1b* and I1b2, allegedly following the same route along the Iberian coast and from the Mediterranean.

 

While a Neolithic arrival date for E3b in Britain, as suggested by Oppenheimer, is evidently rendered impossible by Cruciani et al. (2007), a Bronze Age arrival date is not necessarily excluded.  The British Bronze Age lasted from approximately 2400 to 600 BCE and involved a succession of closely related cultures arriving from the continent.  Norman Davies (1999, pp. 21-25) has identified several groups associated with the British Bronze Age.  The first to arrive were the "Beaker Folk," followed by the "Flanged-axe Warriors" and, three centuries later, the "Urnfield People."  The Beaker Folk (also referred to as the "Beaker Culture") are believed to have come mainly from Northwest Europe (Cockburn, 1969, pp. 36-41), and also may have been associated with the spread of the Celtic language (Cunliffe, 2004).  An excavation of the so-called "Amesbury Archer" grave, near Stonehenge, has provided an example of a Beaker Culture high status burial, probably an elite ruler (as evidenced by the valuable grave goods found with the skeleton), whose origins have been traced to an area in the Alps using oxygen isotope analysis of tooth enamel (Fitzpatrick, 2003).  He is dated to approximately 2,300 BCE, and may have spoken an early form of Celtic.  The place of origin is significant because it locates the "Archer's" birthplace in a region of Europe other than the Balkan peninsula at approximately the same time that E-V13 was only beginning to expand from the Balkans to the rest of Europe.  Therefore, it is improbable that the "Archer" (and his associated Beaker culture peers) originated from a region closely associated with E3b1a2 during the EBA.

 

Bronze Age and Iron Age Celtic-style cultures have been identified by both Oppenheimer and Sykes as being associated with the so-called "Western Atlantic Modal Haplotype" (R1b1c).  Alcock (1972, pp. 99-112) has examined the model of a Celtic Irish-Sea culture-province in the pre-Roman Iron Age (“IA”), in particular connections across the Irish Sea, including a dominant Irish cultural component, as well as related settlement in Wales, Strathclyde, Argyll, and southwestern Britain.  The problems encountered by the Neolithic theory of Oppenheimer, i.e., the virtual absence of E3b and the complete absence of J in Ireland, also are inherent in any BA or IA scenario.  A BA or IA culture that extended across the Irish Sea probably would not have caused a significant difference in the Y genetic admixture of Britain and Ireland.  Barring any later discovery of significant levels of E3b or J in Ireland, it would appear that whatever historic or prehistoric migratory movements were responsible for these haplogroups' presence in Britain had little or no impact on the male genetic component of Ireland. 

 

Difficulties with Neolithic, BA, and IA models for the migration of E3b and J2 to Britain raise a question: If these haplogroups did not arrive during these eras, then when might they have migrated? To assist in developing a meaningful hypothesis concerning possible migratory routes for E3b1a2 from the Balkan peninsula to Britain, a comparison of the published haplotypes from the three aforementioned population surveys was made, with the goal of identifying any residual patterns of settlement among British E3b haplotypes. 

 

Methodology

 

The three data sets of Capelli, Weale, and Sykes used six to ten STR markers to determine the haplotype of each sample; this necessarily limited the resolution of the data.  Both Weale and Capelli reported the haplogroup results for each of their haplotypes, classifying E3b haplotypes as "M35" (Capelli) or “Haplogroup 21,” using the older nomenclature for the haplogroup defined by YAP/SRY4064 (Weale).  Sykes reported a greater number of Y-STR values for each haplotype tested, up to ten, but did not provide any individual haplogroup identifications in the OGAP.  In the case of the first two studies, samples were selected for inclusion if the donor and the donor's paternal grandfather had been born within 20 km (Weale) or 20 miles (32 km) (Capelli) of the location being surveyed.  In the third study (Sykes) the geographic origin of the haplotype was assigned according to the paternal grandfather's birthplace.[10]  In effect, the haplotypes gathered were a sample of Britain's male population distribution patterns largely before World War I. 

 

A factor preventing direct comparison (by percentage) of these three data sets was a substantial difference in the format used for reporting the geographic origins of individual haplotypes.  Weale (2002) and Capelli (2003) each specified geographic locations (towns or villages) in their reports of haplotype frequencies; Sykes (2006), however, reported his findings by combining locations into larger geographic regions that, in most cases, joined several British counties into a single data set (such as "Central England," or "Borders").[11]  A map of the approximate locations for all "Eshu" haplotypes was provided in the Appendix of the book, but the precise geographic location of individual haplotypes was not provided.[12] 

 

Even with these limitations, however, it was possible to identify some trends in the three data sets when combined.  Using Whit Athey's Y-Haplogroup Predictor, version 5 (Athey, 2006), nearly all of the haplotypes in Sykes’s OGAP study could be classified into their respective broader haplogroup categories (such as E3b-M35 or I1b1-P37.2), based on their reported allele values.  The haplotypes estimated to be E3b were then further analyzed and refined using histograms (Figure 1) developed in the present study, based on the allele frequencies found in E3b1-M35 and three of its subclades, and a public Y-DNA database compiled by the “E3b Y-DNA Project” from those haplotypes whose subclades have been confirmed as M35+.[13]
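One way to picture the histogram-based refinement is as a scoring exercise: each candidate subclade has a frequency histogram per marker, and a haplotype is scored by how common its alleles are under each histogram.  The sketch below is a minimal illustration of that idea only.  The frequencies are placeholders rather than the values plotted in Figure 1, the label "other_M35" and the function names are hypothetical, and the log-score rule is an assumption of this sketch, not a description of the Y-Haplogroup Predictor or of the procedure actually used in the present study.

```python
from math import log

FLOOR = 0.01  # assumed frequency for an allele absent from a histogram

# Placeholder allele-frequency histograms: subclade -> marker -> {allele: frequency}.
# These numbers are illustrative and are NOT taken from Figure 1.
HISTOGRAMS = {
    "E3b1a-M78": {
        "DYS390": {23: 0.15, 24: 0.75, 25: 0.10},
        "DYS19": {13: 0.85, 14: 0.15},
        "DYS391": {10: 0.80, 11: 0.20},
    },
    "other_M35": {
        "DYS390": {23: 0.10, 24: 0.60, 25: 0.30},
        "DYS19": {13: 0.40, 14: 0.60},
        "DYS391": {9: 0.50, 10: 0.40, 11: 0.10},
    },
}


def log_score(haplotype, histogram):
    """Sum of log allele frequencies over markers present in both inputs."""
    return sum(
        log(histogram[marker].get(allele, FLOOR))
        for marker, allele in haplotype.items()
        if marker in histogram
    )


def best_subclade(haplotype, histograms=HISTOGRAMS):
    """Return the highest-scoring subclade label and all scores."""
    scores = {name: log_score(haplotype, h) for name, h in histograms.items()}
    return max(scores, key=scores.get), scores


sample = {"DYS390": 24, "DYS19": 13, "DYS391": 10}
label, scores = best_subclade(sample)
print(label, scores)
```

Markers missing from a haplotype are simply skipped, which matters for data such as the OGAP set, where not every sample reports every marker.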

 

 

 

 

Figure 1.  Histograms of E3b-M35, Allele Frequencies by Subclade

 

 

 

As a third method of estimation (and to provide a check of the robustness of the derived subclade assignments), a median-joining network was constructed in NETWORK 4.2.0.1,[14] using the allele data from the OGAP predicted as E3b by Athey's Y-Haplogroup Predictor, as shown in Figure 2.[15]  The E3b (estimated) data from the OGAP is presented in Table 1, along with the subclade assignments determined using the methodology outlined above.  These assignments are also presented in Table 2, grouped according to the OGAP's geographic regions and the overall percentages for each E3b subclade.
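A full median-joining construction is beyond a short example, but two of its ingredients are easy to show: collapsing identical haplotypes into a single node holding several taxa, and counting the repeat-unit steps between nodes.  The sketch below does only that, using the seven marker values reported for all of the first five rows of Table 1 (assumed column order: 393, 390, 19, 391, 389-1, 392, 389-2).  It is not a reimplementation of NETWORK, and the function names are hypothetical.

```python
from collections import defaultdict

# Seven STR values per sample, transcribed from the first five rows of Table 1
# (assumed column order: 393, 390, 19, 391, 389-1, 392, 389-2).
SAMPLES = {
    "738": (13, 23, 13, 10, 13, 11, 17),
    "A2547": (13, 24, 13, 10, 14, 11, 17),
    "A2960": (13, 24, 13, 10, 13, 11, 17),
    "5251": (13, 24, 14, 10, 13, 11, 18),
    "5371": (13, 24, 13, 10, 13, 11, 17),
}


def collapse(samples):
    """Group sample names by identical haplotype (one network node each)."""
    nodes = defaultdict(list)
    for name, haplotype in samples.items():
        nodes[haplotype].append(name)
    return dict(nodes)


def step_distance(a, b):
    """Total number of single-repeat steps between two haplotypes."""
    return sum(abs(x - y) for x, y in zip(a, b))


nodes = collapse(SAMPLES)
# Treat the node holding the most taxa as the reference node.
modal = max(nodes, key=lambda haplotype: len(nodes[haplotype]))

for haplotype, names in nodes.items():
    print(names, "steps from largest node:", step_distance(haplotype, modal))
```

On this subset, A2960 and 5371 fall into the same node, which agrees with both being listed under the "Modal" node in the key to Figure 2; the remaining three samples sit one or two steps away.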

 

 

Key to multiple taxa nodes:

Modal:   A2960, 5371, 4018, A3040, A3097, A3174, A2923, A2833, A3429, A2967, A3029, A9065 (E-M78)
2745:    2745, A2243, A2950, A1201 (E-M78)
738:     738, A2090, A2109, A3093 (E-M78)
5251:    5251, A8115, A8135, A8584 (E-M78)
A2981:   A2981, 503 (E-M78)
A2751:   A2751, A2211 (E-M78)
A2547:   A2547, A231 (E-M78)

(All other nodes have one taxon each.  Geographic descriptions apply only to the adjacent taxon.)

 

 

Figure 2.  Median-Joining Network, OGAP Data, E3b

 

 

 

Table 1.  OGAP E3b Data Grouped by Region and Classified by Subclade

(-: no value reported)

OGAP Haplotype  OGAP Regional                                                         Estimated
Number          Identification  393  390  19   391  426  388  389-1  392  389-2  425  Subclade

738             Argyll          13   23   13   10   -    -    13     11   17     0    E3b1a-M78
A2547           Borders         13   24   13   10   11   12   14     11   17     0    E3b1a-M78
A2960           East Anglia     13   24   13   10   11   12   13     11   17     12   E3b1a-M78
5251            East Anglia     13   24   14   10   -    -    13     11   18     -    E3b1a-M78
5371            East Anglia     13   24   13   10   -    -    13     11   17     -    E3b1a-M78

 

 5924 

 East Anglia 

13