DNA and Family Tree Research

Friday, 17 April 2020

Statistical Analysis of Irish Type III signature

Chi-squared test on Irish Type III Analysis

Dennis Wright’s paper from 2009 describes a four-fold greater frequency of the Irish Type III (IT3) signature among Dalcassian surnames compared to non-Dalcassian surnames … http://www.jogg.info/pages/51/files/Wright.pdf

This data is summarised in Tables 7 and 8 of the paper.

Thus, among men with Dalcassian surnames, 57 had the IT3 signature and 214 did not. Similarly, among men with non-Dalcassian surnames, 37 had the IT3 signature and 334 did not.

Chi-square test

I put these values into the 2x2 contingency table that forms part of the chi-square calculator at https://www.socscistatistics.com/tests/chisquare/default2.aspx.

The contingency table below provides the following information: the observed cell totals, (the expected cell totals) and [the chi-square statistic for each cell].

The chi-square statistic, p-value and statement of significance appear beneath the table. Blue means you're dealing with dependent variables; red, independent.


	IT3+	IT3-	*Marginal Row Totals*
Dalcassian	57 (39.68) [7.56]	214 (231.32) [1.3]	271
Non-Dalcassian	37 (54.32) [5.52]	334 (316.68) [0.95]	371
*Marginal Column Totals*	94	548	642 (Grand Total)

The chi-square statistic is 15.3283. The p-value is .00009. This result is significant at p < .01.

The chi-square statistic with Yates correction is 14.4561. The p-value is .000143. Significant at p < .01. (There's probably a consensus now that the correction is over-cautious in its desire to avoid a type 1 error, but the statistic is there if you want to use it).

If we analyse only those Dalcassian surnames in bold in Table 7, we get the following results:


	IT3+	IT3-	*Marginal Row Totals*
Dalcassian	51 (22.04) [38.03]	73 (101.96) [8.22]	124
Non-Dalcassian	37 (65.96) [12.71]	334 (305.04) [2.75]	371
*Marginal Column Totals*	88	407	495 (Grand Total)

The chi-square statistic is 61.7173. The p-value is < .00001. This result is significant at p < .01.

The chi-square statistic with Yates correction is 59.6042. The p-value is < .00001. Significant at p < .01.
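For anyone who wants to check the calculator's figures without the website, here is a minimal Python sketch. It is not the calculator's own code: it assumes the standard shortcut formula for a 2x2 table with one degree of freedom, and the function names are purely illustrative.

```python
import math

def expected_counts(a, b, c, d):
    """Expected cell counts for the 2x2 table [[a, b], [c, d]] under independence."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    # Expected count = row total * column total / grand total
    return [[r * col / n for col in cols] for r in rows]

def chi_square_2x2(a, b, c, d, yates=False):
    """Return (chi-square statistic, p-value) for a 2x2 table, 1 degree of freedom."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Yates continuity correction: reduce |ad - bc| by n/2 (never below zero)
        diff = max(0.0, diff - n / 2)
    stat = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom the upper-tail probability is erfc(sqrt(stat / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Counts taken from the two tables above (IT3+, IT3- for each surname group)
tables = {
    "all Dalcassian surnames": (57, 214, 37, 334),
    "bold Dalcassian surnames only": (51, 73, 37, 334),
}

for label, cells in tables.items():
    stat, p = chi_square_2x2(*cells)
    stat_y, p_y = chi_square_2x2(*cells, yates=True)
    print(f"{label}: chi-square = {stat:.4f} (p = {p:.2g}), "
          f"with Yates = {stat_y:.4f} (p = {p_y:.2g})")
```

Run as is, this should give the same chi-square statistics as the tables above, with and without the Yates correction.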

Fisher Exact Test

I also used another calculator on the website to do a Fisher Exact Test … https://www.socscistatistics.com/tests/fisher/default2.aspx

The Fisher exact test statistic and statement of significance appear beneath the table. Blue means you're dealing with dependent variables; red, independent.

Results
	IT3+	IT3-	*Marginal Row Totals*
Dalcassian	57	214	271
non-Dalcassian	37	334	371
Marginal Column Totals	94	548	642 (Grand Total)

The Fisher exact test statistic value is 0.0001. The result is significant at p < .01.

If we analyse only those Dalcassian surnames in bold in Table 7, we get the following results:

Results
	IT3+	IT3-	*Marginal Row Totals*
Dalcassian	51	73	124
non-Dalcassian	37	334	371
Marginal Column Totals	88	407	495 (Grand Total)

The Fisher exact test statistic value is < 0.00001. The result is significant at p < .01.
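The Fisher exact test can also be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the common two-sided definition (summing the probabilities of every table that is no more likely than the observed one, with the margins held fixed), which may not be exactly how the online calculator defines its two-sided value. The function name is hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(k):
        # Hypergeometric probability of seeing k in the top-left cell
        # when all four marginal totals are held fixed
        return comb(row1, k) * comb(row2, col1 - k) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Add up every possible table that is no more probable than the observed one
    # (the small tolerance guards against floating-point ties)
    total = sum(
        p for p in (prob(k) for k in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )
    return min(1.0, total)

print(fisher_exact_two_sided(57, 214, 37, 334))   # all Dalcassian surnames
print(fisher_exact_two_sided(51, 73, 37, 334))    # bold surnames only
```

Both values should come out well below 0.01, in line with the calculator's results above.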

Conclusions

Both the chi-square test and the Fisher exact test show that every comparison is statistically significant (p < 0.01).

Maurice Gleeson

April 2020

Thursday, 16 April 2020

When were surnames introduced to Ireland?

The short answer is: about 1000 years ago, but ...

And that "but" represents the fact that different Irish surnames arose at different times, usually between 900 and 1350 AD, and mostly between 950-1150 AD, with the busiest period being 1000-1050 AD. Also, surnames with the prefix O or Ó were formed prior to 1200 AD and those that formed afterwards were mainly those with the prefix Mac. These are the top-line conclusions from the data analysis that follows.

Back in 1923, Woulfe claimed that "Irish surnames came into use gradually from about the middle of the 10th to the end of the 13th century". [1] But let's look at some of the hard evidence supporting this statement. I have gathered such data from two sources - the introduction to the 1923 edition of Woulfe's Irish Names and Surnames [1] and a more recent 1999 journal article by Ó Murchadha. [2]

In his 1999 review article, Ó Murchadha provides a table in the appendix with years of death for 78 progenitors of a selection of Irish surnames. [2] The complete list of 78 surnames is included in Footnote 1.

Ó Murchadha suggests that the average surname would have come into use during the time of the progenitors' great-grandsons.

Using Corpus Genealogiarum Hiberniae as source, I extracted thirty genealogies which could be traced in the annals and dated for at least ten generations. This gave a total of 416 generations, which when divided into a total of 13,779 years, furnished an average generation gap of 33.2 years, i.e. the number of years between the death of a father and that of his son / successor ... By my calculations, adding sixty-six years to the date of death of the eponym will give an approximate date for the death of his grandson, and at any time subsequent to that, in his great-grandson’s era, one could expect the surname to have come into use ... (Ó Murchadha p30)

Adding 66 years to the year of death of the progenitor to arrive at the approximate date of surname introduction seems like a reasonable approach. Woulfe takes a similar strategy by adding 60 years to the date when the progenitor flourished, died or was slain (see below). [1] This approach is supported by specific examples from the list of surnames.

The earliest surname (O Clery) is probably the first fixed surname to be used in Europe. The originator died in 858 AD and the first record of its use is 58 years later in 916 AD in the Annals of the Four Masters. A second occurrence of this fixed surname occurs 34 years later in 950 AD. [1]

Woulfe gives other examples of specific dates when several fixed surnames were first mentioned in the ancient texts, illustrating that inherited fixed patronymic surnames were well-established before the turn of the first millennium. [1]

Ó Canannáin (O'Cannon) ... 941 (he "flourished" in 950 AD so this surname appears to have been introduced during the lifetime of the progenitor. In the ancient texts, "flourished" simply means lived, indicated by the Latin word floruit, or the abbreviation fl.)
Ua Néill (O'Neill) ... 943 (the progenitor was slain in 919)
Ua Ruairc (O'Rourke) of Breifney ... 952 (progenitor died in 893)
Ua Ciardha (O'Keary) of Cairbre ... 952
Mag Aongusa (Maguiness) ... 956
Ó Maoldoraidh (O'Muldory) of Tirconnaill ... <999 (progenitor flourished in 870)
Ó Dubhda (O'Dowd) of Tireragh ... <999 (progenitor flourished in 876)
Ó Ceallaigh (O'Kelly) of Ui Maine ... <999 (progenitor flourished in 874)

This data also disproves the mistaken belief that surnames were introduced by Royal Decree during the reign of Brian Boru (1002-1014). In fact the O'Brien surname (which derived from Brian's name) did not become a fixed inherited surname until the time of his grandsons. [1]

Note that it was not uncommon for the same surname to arise in different places and hence the clan territory is frequently added after the surname as a qualifier. For example, we have the O'Donnell clan of Corca Bhaiscinn, another of the same name in Ui Maine, and a third in Tirconnell. And there was also the O'Conor clan of Connacht, another in Corcomruadh, and a third clan in Offaly.

So, adding up the various dates of surname introduction in Ó Murchadha's article and dividing by 78 gives an average date of 1072 AD for the introduction of surnames in Ireland. As we can see in the bar chart below, the majority of the surnames were introduced between 950 and 1150 AD (59/78 = 76%), with the time period 1000-1049 being the busiest for surname introduction.

Estimated dates of surname introduction for 78 Irish surnames - data extracted from Ó Murchadha 1999 [2]

A similar pattern is seen with the data extracted from Woulfe. [1] In the Introduction to the hard copy of his Irish Names and Surnames (page xvi), Woulfe lists 46 surnames that had previously been compiled by O'Donovan and extracted mainly from the Annals of Ulster and the Annals of the Four Masters. The list (see Footnote 2 below) includes dates when the various progenitors flourished, died or were slain.

Woulfe states that the date when the surname became fixed "cannot have been more than 60 years from the period when the ancestor flourished or died". However, in an attempt to standardise the dates, I have added an additional 10 years to any dates when they "flourished" (to give an approximate date of death or manslaughter) and then added 60 years on top of this. This gives an average of 66 years between the death of the progenitor and the introduction of the surname ... which is the same interval employed by Ó Murchadha above.
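The two dating rules are easy to express in code. The sketch below is only an illustration of the arithmetic described in the text; the function and constant names are hypothetical, and the three example progenitors are taken from the dates quoted earlier in this article rather than from the spreadsheet itself.

```python
# Offsets described in the text
O_MURCHADHA_OFFSET = 66      # two generation gaps of roughly 33 years after the eponym's death
WOULFE_OFFSET = 60           # Woulfe's upper limit after the ancestor "flourished or died"
FLOURISHED_ADJUSTMENT = 10   # extra years added to "flourished" dates to approximate a death date

def o_murchadha_estimate(death_year):
    """Estimated year of surname introduction from the progenitor's year of death."""
    return death_year + O_MURCHADHA_OFFSET

def woulfe_estimate(year, event):
    """Estimated year of surname introduction using the adjusted Woulfe rule."""
    if event not in ("flourished", "died", "slain"):
        raise ValueError(f"unknown event type: {event}")
    # A "flourished" date is first pushed forward to an approximate date of death
    death_year = year + FLOURISHED_ADJUSTMENT if event == "flourished" else year
    return death_year + WOULFE_OFFSET

# Illustrative examples using dates quoted earlier in the article
print(woulfe_estimate(919, "slain"))       # O'Neill progenitor, slain in 919
print(woulfe_estimate(893, "died"))        # O'Rourke progenitor, died in 893
print(woulfe_estimate(870, "flourished"))  # O'Muldory progenitor, flourished in 870
print(o_murchadha_estimate(858))           # O Clery originator, died in 858
```

Averaging the estimates for every surname in a list gives the kind of mean year of introduction discussed in this article.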

Based on these adjustments, the average year for the introduction of these 46 surnames was 1048 AD. The earliest was 920 AD and the latest was 1350 AD. The largest number of surnames appeared in the period between 1000-1049 AD with 83% (38/46) of surnames being introduced between 950-1150 AD.

Estimated dates of surname introduction for 46 Irish surnames - data extracted from Woulfe 1923 [1]
(based in turn on O'Donovan)

Combining the two datasets and removing any duplicates (or likely duplicates) produced 109 distinct surnames (see Excel spreadsheet here). The average year of introduction of surnames was 1068 AD (range 920-1350) but the busiest period for surname introduction was again 1000-1049 AD. The majority of surnames (84/109 = 77%) were introduced in the period between 950-1150 AD.

The proportion of surnames introduced in successive centuries is as follows:

10th Century ... 24/109 = 22%
11th Century ... 50/109 = 46%
12th Century ... 26/109 = 24%
13th Century ... 7/109 = 6%
14th Century ... 2/109 = 2%
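As a quick check on the arithmetic, the percentages can be recomputed from the counts listed above. This is just a sketch; the dictionary simply restates the five counts and the variable names are illustrative.

```python
# Number of surnames per century, as listed above (combined dataset)
counts = {"10th": 24, "11th": 50, "12th": 26, "13th": 7, "14th": 2}

total = sum(counts.values())  # should equal the 109 distinct surnames

# Percentage share per century, rounded to the nearest whole number
shares = {century: round(100 * n / total) for century, n in counts.items()}

for century, n in counts.items():
    print(f"{century} Century ... {n}/{total} = {shares[century]}%")

# Share of surnames that arose in the 10th to 12th centuries combined
early = counts["10th"] + counts["11th"] + counts["12th"]
print(f"10th-12th Century combined: {100 * early / total:.1f}%")
```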

Estimated dates of surname introduction for 109 Irish surnames - combined dataset

In the combined dataset, surnames prefixed with O or Ó arose between 920 and 1193 AD, and surnames prefixed with Mac between 955 and 1350 AD (as illustrated in the bar chart below). The prefix Mac means "son of" and O or Ó means "grandson of" or alternatively "descendant of".

Interestingly, O surnames were introduced about 124 years earlier than Mac surnames (1032 vs 1155 AD for average dates of introduction). Woulfe states that the creation of surnames with the O or Ó prefix "had almost certainly ceased" prior to the Norman Invasion, and that surnames that arose thereafter were primarily of the Mac variety. This is almost true - in the combined dataset, only two O surnames arose after 1169 AD (the start of the Norman Conquest); one in 1170 (O'Shaughnessy) and another in 1193 (O'Growney) - see Excel spreadsheet here.

Estimated dates of emergence of O and Mac surnames - combined dataset

This data paints a very clear picture of the emergence of surnames in Ireland.

But from a genetic genealogy perspective, why is this important? Knowing when a particular Irish surname was first introduced will be of particular use to Surname Project Administrators as it identifies a maximum age for the genetic group that is presumed to be descended from that particular surname founder. This upper age limit can help constrain the date calculations for the various branches in this portion of the Tree of Mankind, including those branches below the overarching SNP for the surname in question, as well as the adjacent branches of different but genetically-related surnames.

However, the data presented above does not answer the questions: when did surnames in Ireland become commonplace? when were they adopted by the majority of the population?

And that is a topic for a subsequent article.

Maurice Gleeson

April 2020

Sources

[1] Woulfe, Patrick. Sloinnte Gaedheal is Gall: Irish Names and Surnames (1923 Dublin), pages xv-xx. Online version available at http://www.libraryireland.com/names/of/o-fearghail.php

[2] The Formation of Gaelic Surnames in Ireland: Choosing the Eponyms by Diarmuid Ó Murchadha, Locus Project, University College, Cork. Nomina (1999) - available at http://www.snsbi.org.uk/Nomina_articles/Nomina_22_OMurchadha.pdf

Footnotes
1) 78 surnames from Ó Murchadha [2] above ...

2) 46 surnames from Woulfe [1] above ...

Wednesday, 4 March 2020

What are the most common mtDNA subclades in Ireland?

I have been co-administrator on the Ireland mtDNA Project for several years, along with project founder Katherine Borges. When I did this analysis (July 2019) there were over 2500 people in the project. That figure is now closer to 3000.

Project members are self-selected - in other words, they believe they have Irish ancestry on their direct female line (mother's mother's mother's line). This has certain important implications and limitations. Firstly, these results may be a truer reflection of the subclade prevalence among the Irish diaspora rather than the present-day population in Ireland itself. Secondly, biases due to founder effects among the diaspora population are likely to have introduced a certain amount of skew into these results. And as a consequence, these results are not likely to be a true reflection of the prevalence of these subclades in either modern Ireland or ancient Ireland (say 2000 years ago) ... but they may be close (more to the former than the latter). Furthermore, the sheer weight of numbers in this analysis is likely to minimise any "outliers" and reduce the influence of any biases (without removing them completely). In other words, this is probably close to what we would get with a Real World sample (if we were to take one).

The Eupedia website includes a simplified version of the mtDNA Haplotree (below). This can be considered to be the Tree of Womankind because mtDNA allows us to track back in time along the direct female line (mother, mother, mother, etc). More fine-detailed versions of the mtDNA haplotree are maintained on the FTDNA website and the phylotree.org website. And more detailed information about mitochondrial DNA can be found on Wikipedia and the ISOGG wiki.

The mtDNA-Haplotree = the Tree of Womankind

I sorted the project data by major haplogroup and subclade, and then added up the numbers for each major subclade in an Excel spreadsheet. Below is the breakdown by major haplogroup. Haplogroup H is the most common of the haplogroups, accounting for 40% of the members in the project. A further 50% is fairly evenly distributed between 5 haplogroups - U 12%, T 10%, K 10%, J 10% and I 8%. The remaining 10% of participants belong to V (4%), W, X & HV (all 2%) with the remainder being L (1%) and a small number from A, B, C, D, N & R.

Haplogroup H is the largest group within the project. Here is a further breakdown by subgroup.

Maciamo Hay has compiled a frequency distribution of mtDNA haplogroups for Europe on the Eupedia website. This analysis appears to have been performed in late 2014 / early 2015. Here are the frequencies for the UK & Ireland (this is based on the list of sources at the end of this article).

In the Eupedia sample, the frequencies for Ireland are based on only 299 subjects (only about 12% of the size of our Ireland mtDNA sample from July 2019). The results are broadly similar to what we see in the Ireland mtDNA Project but of note I is only 3% (vs 8%), V is incorporated into HV0, and X is only 0.7% (vs 2%). Using the same categories as those on the Eupedia website, we get the following comparative table ...

Comparison of the data from Eupedia & Ireland mtDNA Project

The values for each category are broadly in agreement. Again, the only notable difference is for Haplogroup I - this is 7.9% of the total population in the Ireland mtDNA Project sample but accounts for only 3% of the total numbers in the Eupedia analysis.
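One way to gauge whether the Haplogroup I difference could simply be due to the small Eupedia sample is to put a confidence interval around each proportion. The sketch below uses the Wilson score interval (my choice of method, not something taken from either source), and the counts are reconstructed approximately from the rounded percentages and sample sizes quoted above, so they are assumptions rather than the actual figures.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95% coverage)."""
    p_hat = successes / n
    denom = 1 + z ** 2 / n
    centre = (p_hat + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width

# Approximate counts for Haplogroup I, reconstructed from the rounded percentages (assumption)
eupedia_n, project_n = 299, 2506
eupedia_i = round(0.03 * eupedia_n)     # about 3% of 299
project_i = round(0.079 * project_n)    # about 7.9% of 2506

eupedia_low, eupedia_high = wilson_interval(eupedia_i, eupedia_n)
project_low, project_high = wilson_interval(project_i, project_n)

print(f"Eupedia:          {eupedia_low:.1%} to {eupedia_high:.1%}")
print(f"Ireland project:  {project_low:.1%} to {project_high:.1%}")
```

If the two intervals do not overlap, sampling error alone is an unlikely explanation for the difference, although the self-selection bias in the project (discussed above) is a separate issue that a confidence interval cannot address.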

Comparing the Irish results to those in England, Scotland & Wales, there is very little difference between the various countries, which is not surprising given the similar origins of the peoples of these islands, and the relatively slower rate of mutation of mitochondrial DNA compared to Y-DNA. The only notable differences are the slightly higher rate of K (10-12% vs 7-8%), and possibly U (0.3-0.5% vs up to 2.7%).

We can see how these proportions compare to the prevalence of mtDNA subclades in other European countries from additional Eupedia analyses. It is important to appreciate that when we look at modern day geographic distributions of the various haplogroups (and their subclades) that we are only looking at the survivors - it is likely that many subgroups died out over time or are very poorly represented in present-day populations.

An example of present-day geographic distributions (from Eupedia)

The Eupedia website also includes an analysis of each of the individual haplogroups and their major subclades. Here are the analyses for the most common Irish haplogroups:

Haplogroup H - most prevalent in Western Europe
Haplogroup J - most prevalent in Saudi Arabia, but with hotspots in Wales & Cornwall
Haplogroup K - widespread in Europe, with various hotspots (including Ireland)
Haplogroup T - two main branches, widespread distribution in Europe with various hotspots
Haplogroup I - fairly ubiquitous in Europe with hotspots in western Ireland and elsewhere
Haplogroup U5 - most prevalent in Scandinavia

The prevalence of each of these Haplogroups can be broken down further by the various subclades (i.e. subgroups). Below is a tabular summary of the various subclades and their frequency in the Ireland mtDNA Project.

mtDNA haplogroup subclades in the Ireland mtDNA Project (n=2506, July 2019)

Taken with the results of mtDNA from ancient samples, mtDNA data from present-day populations continues to inform us about the migration of Woman out of Africa and how She populated the rest of the world. As more people undertake mtDNA testing and join the Ireland mtDNA Project (and similar projects), more data will be available on the more downstream subgroups, and a more granular, fine-detailed picture will begin to emerge of the migration of women from Europe and into Ireland (since the Last Ice Age receded some 12,000 years ago).

Tree of Womankind with present-day geographic locations of the major branches
(from Wikipedia)

Maurice Gleeson

March 2020

Eupedia Sources of mtDNA frequencies for UK & Ireland

Thursday, 23 January 2020

Some Comments on the General Scheme of a "Certain Institutional Burials (Authorised Interventions) Bill 2019"

On 10th Dec 2019, Minister Katherine Zappone published some proposed legislation in relation to the excavation, identification and re-interment of the 800 or so children that could be buried in the disused pit on the site of the former Mothers & Babies Home in Tuam. There are several interesting aspects to this proposed legislation:

the identification programme is only open to members of the public who believe that they may be the parent, child, sibling or half-sibling of the deceased children and can prove that they have reasonable grounds to believe so
it does not apply to burial sites where the last burial occurred before 1950
there is no mention of genetic genealogy within the proposed legislation
members of the public can make Submissions regarding the proposed law by Friday 24th Jan 2020 (tomorrow).

I'll address these key issues below, but for those of you who may not be familiar with the story, here is a brief summary:

In 2013, a local historian (Catherine Corless) broke the news that there may be up to 800 children buried in a disused sewage pit on the site of the former Mothers & Babies Home in Tuam, Co. Galway.
She had found death records for 796 children but burial records for only 2 of them. This raised the question: where were the rest of the children buried?
It had been known locally since the 1970s that there were skeletal remains of children in a pit that had been on the site of the Home - could this be where some or all of them were buried? Or was it possible that some death records were fake and that some of the "dead" children had in fact been trafficked or sold to childless American couples?
In 2017, a Commission of Inquiry confirmed that there were skeletal remains in the pit. The Minister committed to bring forth legislation to allow for the excavation of the site and the identification of the children therein. The proposed legislation was published on 10th Dec 2019.
There was a higher death rate in the Tuam Home than in other similar institutions. There have been calls for further investigation, and the possibility of criminal negligence has been raised.

The proposed legislation is 89 pages in length and not too difficult to read. It allows for an "Agency" to be set up to supervise the excavation, exhumation, identification, and re-interment of any human remains found on the sites of former Mother & Baby Homes (including that at Tuam). The criteria for intervention include "manifestly inappropriate" burials that "would not reasonably be considered to be a dignified interment" and which are buried "in a manner or in a location that is repugnant to common decency".

Interestingly, the new law does not cover mass grave sites where the last burial was prior to 1950 ...

This could have implications for any Mother & Baby Homes that closed their doors prior to 1950. These Homes would not be subject to the proposed law and therefore no legal route for excavations at these Homes would exist. They would remain in legal limbo.

The sections of particular relevance to genetic genealogy are mainly in Part 6: Provision for Identification of Deceased Persons (from page 60 onwards). Whilst genetic genealogy is not specifically mentioned in the proposed legislation, neither is it ruled out. In fact, there may even be provision for it within the wording of the current text which states that one of the functions of the Agency could include ...

Part 6 refers to "familial matching" which is a term usually used in reference to standard forensic autosomal STR analysis. There is no specific inclusion of genetic genealogy. Similarly it is not specifically excluded. And in addition, the wording says that it "includes" comparison for familial matching, implying that other methods might also be used.

The legislation allows for the establishment of a "DNA (Historic Remains) Database" by Forensic Science Ireland (FSI, the Irish government's forensic lab). The database consists of 4 types of DNA profile:

DNA from the unidentified human remains (e.g. those in the pit at Tuam)
DNA from permitted family members
DNA from two groups of people ("Agency" & "Prescribed Persons") working on the project (i.e. this is a standard elimination database to identify possible contamination of samples by people working in the lab, etc)

Comparisons will be made between the DNA Profile generated for each set of unidentified human remains and every other profile in the database. Thus each DNA Profile from the mass grave will be compared against:

every other profile from the mass grave (to see if the children are related)
all the DNA profiles contributed by permitted family members
all the DNA profiles in the elimination databases ("Agency" & "Prescribed Persons")
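To get a feel for the scale of this all-against-all scheme, the number of comparisons can be counted with a short sketch. The function name and every number below are hypothetical: the count of 796 is simply the number of death records mentioned earlier (not the number of profiles that will actually be generated), and the family and elimination figures are invented for illustration.

```python
from math import comb

def comparison_count(remains, family, elimination):
    """Number of pairwise comparisons implied by the scheme described above."""
    among_remains = comb(remains, 2)          # every remains profile against every other one
    against_family = remains * family         # every remains profile against each family profile
    against_staff = remains * elimination     # every remains profile against each elimination profile
    return among_remains + against_family + against_staff

# Purely illustrative figures (assumptions, not taken from the proposed legislation)
print(comparison_count(remains=796, family=100, elimination=50))
```

The first term grows with the square of the number of profiles from the grave, so comparisons among the remains themselves quickly dominate the workload.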

The wording of the objectives of the identification programme is particularly interesting. The aim is to allow family members to be informed that an identification has been made and to allow them to decide what to do with the remains. Thus the objective is primarily focussed on the immediate family of the deceased person and less on the right of the deceased person to have their identity restored in death. This raises the question: as Irish citizens, do all the children buried in the mass grave at Tuam have the right to have their name on their gravestone?

Nowhere does the proposed legislation discuss what happens to the ones that cannot be identified (e.g. no DNA Profile obtained). Or what happens to the ones who can potentially be identified (e.g. excellent DNA Profile) but for whom no family has come forward.

Perhaps the most contentious aspect of the proposed legislation is who it allows to take part in the comparison database. All potential family members must be vetted. No one is automatically allowed in to the family part of the database. And those permitted to take part are restricted to immediate family members (i.e. parent, child, sibling, half-sibling). Aunts, uncles, nephews and nieces are all excluded as are 1st cousins and anyone more distantly related.

Furthermore, the evidence you need to prove that you can take part will be specified by the Director of the Agency ...

Appeals can be made to an Adjudicator (p85) but will the answer be "no" to other family members?

The section entitled "Head 66" appears to allow for non-immediate family to be involved but I am unsure of this interpretation - it may be that it only relates to samples given by permitted family members before the legislation was introduced who have subsequently died. It states that a sample from a person who "reasonably believed that he or she was closely related" to a person in the mass grave would be allowed into the database but there is no definition of what they mean by "closely related" - does it extend to nephews and nieces? The situation is complicated by a missing page in the document (i.e. page 71 which contains "Head 54 - Taking of Samples from family members to generate DNA Profiles").

Given the range of dates when the Tuam children died (1925-1961) there are unlikely to be many (if any) living parents, there may be some siblings, but the larger proportion of surviving family members will be more distantly related (e.g. nephews and nieces). It appears that these relatives will not be allowed in the database. Is this ethical?

It may be that the reason why the range of permitted family members is restricted to parents, children, siblings and half-siblings is because this is similar to the range of relationships that standard forensic tests can detect. Genetic genealogy tests can detect a much broader range of relationships, including 2nd, 3rd and 4th cousins. If these tests were to be used, the range of permitted family members could be extended much further ... and this would raise the ethical question: what range of family members should be included? Should anyone with Irish ancestry be allowed to be included? Should identification extend to all the children or just the ones whose family members come forward?

Familial Identification (p73) will be made on the balance of probabilities and the outcome of comparisons will describe one of several possible outcomes including:

a strong likelihood of a familial link consistent with the relationship to the permitted family member
moderate likelihood of a familial link
weak familial match
no familial match

... but how are these likelihoods defined in practice? This is unclear. And will Genetic Genealogy techniques be applied in doubtful cases? Appeals against the likelihood decision can also be made to an Adjudicator (p85).

In addition, the above wording suggests that genetic genealogy will not be employed in the identification process, as this sort of wording would not arise with standard genetic genealogy tests (e.g. autosomal SNP array). If genetic genealogy is not to be employed, this could have significant implications for the chances of identifying sets of remains.

The decision whether or not to proceed with a full excavation, exhumation and identification of the mass grave hinges on the results of a Pilot Programme and will be made by the Director of the Agency in charge of that particular mass grave (be it Tuam or Bessborough or wherever). The decision largely depends on the proportion of samples tested that generate a reasonable DNA Profile for comparison purposes. But this raises several pertinent questions:

what type of DNA Profiles will be generated as part of the Pilot Programme?
what would be considered a reasonable DNA profile?
what proportion of reasonable profiles would be needed for a Go decision? 30%? 50%? 70%?
if genetic genealogy tests are not performed in the Pilot Programme, this could detrimentally influence the Go / No Go decision

Destruction of samples is also a potential problem. Destruction of samples from permitted family members is set to occur 3 months after a DNA Profile has been generated ... but what kind of profile? autosomal STR? autosomal SNP array? WGS? It might be better to store the sample in case more comprehensive testing is indicated at a later point in time ... or until alternative family members come forward.

There is no mention of what will happen to samples from the unidentified human remains. Will these be stored? Or will they be destroyed? And under what circumstances?

If you want to make a submission regarding this proposed legislation, the closing date is tomorrow 24th Jan 2020. The draft legislation can be found here and you will find full instructions for making a submission on the Irish government's website here.

Maurice Gleeson

Jan 2020

Wednesday, 21 August 2019

Getting the most from your new Big Y-700 results

The Big Y test changed to a completely new technology earlier this year. It now covers 50% more of the Y chromosome than previously. And so it is anticipated that the new test will discover additional SNP markers that the old technology did not detect. Furthermore, the new SNPs should be able to more accurately date the various branching points on the Tree of Mankind.

It also gives us approximately 700 STR markers whereas the previous test only gave approximately 500 STRs. As a result, the old test is called the Big Y-500 and the new one is called the Big Y-700. Going forward, all new Big Y orders will use this new technology.

For those who did the old test, it is possible to upgrade from the Big Y-500 to the Big Y-700. But for everyone who does the new test, or upgrades from the old version to the new version, it is essential that you upload a copy of your results to the Big Tree so that we can get some essential additional analyses. You will find instructions for doing so on the Big Tree website here and on the Y-DNA Data Warehouse website here but I include a briefer summary below.

What do you get from your Results?

Your results should be analysed within a week or two and you can check them by navigating to your particular portion of the Big Tree. For members of Ryan Group 2 (for example), their Terminal SNP is M756 and you will find this branch on the Big Tree here (see screenshot below). The diagram nicely illustrates their placement on the Tree of Mankind and the surnames of the people sitting on neighbouring branches to their own. This information can be very useful for determining the geographic origins of your particular direct male line and for determining if your name is associated with an Ancient Irish Clan.

Project Administrators can use programmes like the SAPP tool to generate Mutation History Trees and determine the likely branching structure of your particular "genetic family" from the time of surname origins up to the present day. This process can also help identify which Ryans (for example) are more closely related to each other and which are more distantly related. It is also possible to date the branching points within the Mutation History Tree using SNP data as well as STR data. This process is likely to become more accurate with the advent of the new Big Y-700 data and the identification of new SNPs. It is anticipated that the new data will reduce the number of "years per SNP" from about 130 to about 80 years per SNP. You can read more about this here.
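To see why fewer "years per SNP" should sharpen the dating, here is a rough sketch of the arithmetic. It assumes the simplest possible model (age = number of SNPs multiplied by years per SNP), which is a deliberate simplification: real dating methods also allow for test coverage and statistical uncertainty. The function names are hypothetical.

```python
def estimated_age_years(snp_count, years_per_snp):
    """Rough age of a branching point from the SNPs accumulated below it."""
    return snp_count * years_per_snp

def expected_snp_count(age_years, years_per_snp):
    """Rough number of SNPs expected to accumulate on a line over a given time."""
    return age_years / years_per_snp

BIG_Y_500_RATE = 130   # approximate years per SNP with the old test (from the text)
BIG_Y_700_RATE = 80    # anticipated years per SNP with the new test (from the text)

# Over the same 1040 years, the new test should detect more SNPs on a given line ...
print(expected_snp_count(1040, BIG_Y_500_RATE))
print(expected_snp_count(1040, BIG_Y_700_RATE))

# ... so each SNP stands for a shorter time step when dating a branching point
print(estimated_age_years(1, BIG_Y_500_RATE), "vs", estimated_age_years(1, BIG_Y_700_RATE))
```

In other words, more SNPs over the same period means smaller steps between them, and therefore finer resolution for the dates on the tree.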

You can also click on your surname above your kit number for an analysis of your Unique / Private SNPs. These may prove useful in the future for defining new downstream branches in the Mutation History Tree and for dating new branching points. But this very much depends on new people joining the project and undertaking Big Y-700 testing (so that we can compare apples with apples). And as this is a new test, it is likely that we will have to wait some time before we begin to see real benefits from it.

Creating a Link to your Big Y results

In order to create a downloadable link to your Big Y results, first log in to your FTDNA account and go to your Big Y Results page ...

Then click on the blue Download Raw Data button ...

Then you need to create links to two separate files - your VCF file and your BAM file. The VCF file is used for placing you on The Big Tree. The BAM file is used for high-end technical analysis by the folks at the Y-DNA Data Warehouse. You can see some of the results so far on their Coverage Page here (and if you like you can search for kits by surname, including your own).

1) to create a link to your VCF file, right click on the green Download VCF button, and then click on "Copy link" from the drop-down menu. You will later paste this link into the "Download URL" box on the Submission Form.

Alternatively, you can simply (left) click on the green Download VCF button, which downloads a 10 MB file to your computer. This can then be directly uploaded via the Submission Form below. However, it is preferable (and less problematic) to generate a link instead.

2) to create a link to your BAM file, click on the green Generate BAM button. You will then get a message that "Your Big Y BAM file is currently being generated" (see below). This generates a very large BAM file ... but it takes several days to prepare so you will have to come back to this page in a few days' time! Put a reminder in your diary / calendar!

Uploading your VCF file

Having created the first link (to your VCF file) and copied it, click here to go to the Y-DNA Data Warehouse and fill in the form with your standard information: email, kit number, surname of your paternal MDKA (Most Distant Known Ancestor), and (most importantly) the link to your file. To add the link, paste the one you copied earlier into the "Download URL" box underneath the heading "Raw Data Upload" at the bottom of the page.

If you want to upload the actual file itself (rather than a link), click on the blue Direct tab under "Raw Data Upload" and then click on the "Choose File" button and attach the file from where you downloaded it onto your computer (on my laptop, the "Choose File" button appears to be slightly hidden under some text but it works if you click on the start of the text).

Don't forget to tick the checkbox to confirm you agree with the Data Policy and then click the blue Submit button.

Uploading your BAM file

Several days later, come back to this same place to get a link to your newly generated BAM file. So, navigate to your Big Y Results page, and after clicking on the blue Download Raw Data button, you will find that the BAM file has been generated. DO NOT DOWNLOAD IT - you don't need to and it is way too big. Instead, click on the green Share BAM button and then the green Copy button in order to copy a link to your BAM file. You will share this link in the next step.

Then go to the Y-DNA Data Warehouse and fill in the same form as before BUT ...

select Other for the Testing Lab
enter your Kit ID Number
leave everything else on its default setting
paste the link to the BAM file in the "Download URL" box underneath the heading "Raw Data Upload"
tick the checkbox to confirm you agree with the Data Policy and then click the blue Submit button

Maurice Gleeson

Aug 2019

Wednesday, 10 July 2019

Optimising your Anonymity & Privacy with DNA tests

Here are some practical hints and tips to optimise your Privacy if you are thinking of doing a DNA test (or you have already done one).

1) Don’t test!

This is the simplest way to avoid exposing yourself to potential online scrutiny and unwanted intrusion from others. If you are not sure whether you should do a DNA test or not, do yourself a favour and don't test. You will only worry about it if you do.

2) Get your brother to do it instead

Some people are less concerned about privacy than others ... so if this is how one of your siblings feels, why not ask them to test instead? One person I know did this and everyone was happy. Win-win.

3) Don't use your Real Name

You are not obliged to use your real name. You can use whatever name you want. I don't recommend using "Clint Eastwood" (unless you want unlimited fan-mail) - much better to use something completely nondescript like John Williams or Jane Jones.

Genealogically it makes sense to use your surname (as this will help with any genealogical research) but again, it's not essential. You can just as easily use an alias, a pseudonym, or a nom de plume. Or even a sequence of letters & numbers … FYL227 has a particular ring to it.

A cunning disguise will fool most people
(this is obviously Groucho Marx in a wig)

4) Disguise your Personal Information

Similar to above, you are under no obligation to use your real date of birth. Now is the perfect opportunity to take 10 years off your age. I did and I feel so much better.

You could also create a bespoke, untraceable email address just for your DNA tests. It's easy to set one up on Gmail and have any messages directed to your inbox. I believe 1234567@gmail.com is already taken but something similar would work just as well. It would be extremely difficult to identify you from a seemingly random combination of letters and numbers.
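If you would rather not invent a random-looking alias or email name yourself, a few lines of Python can do it for you. This is only a minimal sketch: the function name random_alias is made up for illustration, and the pattern of three letters followed by three digits simply mimics the "FYL227" style mentioned above. You would still need to check that the name is available with your email provider.

```python
import secrets
import string


def random_alias(letters: int = 3, digits: int = 3) -> str:
    """Return a random alias made of capital letters followed by digits."""
    letter_part = "".join(secrets.choice(string.ascii_uppercase) for _ in range(letters))
    digit_part = "".join(secrets.choice(string.digits) for _ in range(digits))
    return letter_part + digit_part


# Prints a different alias each time it is run.
print(random_alias())

# A longer alias is less likely to be already taken as an email name.
print(random_alias(letters=5, digits=5))
```

The secrets module is used here (rather than random) because it is designed for generating values that are hard to guess.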

Only give the minimum amount of information necessary. I don't bother with my postal address or telephone number. If they can't reach me by email then I am probably on a retreat to the North Pole and they are unlikely to reach me by snail mail or telephone either.

5) Privatise your DNA account
All the testing companies allow you the option to make your results completely private. For some, this means that your matches cannot see you, but you cannot see them either. And this seems like it might defeat the purpose of doing the test in the first place, but not so! You can de-privatise your results when you want to work on them, and re-privatise them when you have finished. This minimises the amount of time you are "exposed to public view" by your matches.

Here is how to privatise your DNA matches on the various websites ...

Ancestry: go to Your DNA Results Summary, click on Settings, then scroll down to Visibility & Sharing, click on DNA Matches, tick the Off button, and click Save. To reverse this process, tick the On button, and click Save. Once you have privatised your DNA matches, they cannot see you and you cannot see them.
23andMe: click on your name or icon in the top right, click on Settings, scroll down to Privacy / Sharing, click on the Edit button, scroll down to DNA Relatives and click on Manage your Preferences, then click on "I would like to stop participating in DNA Relatives". Then click the Finish button. Once you have privatised your DNA matches, they cannot see you and you cannot see them.
MyHeritage: click on your name in the menu bar at the top, then click on My Privacy, then click on My DNA Preferences, then select the DNA kit you wish to customise (from the drop-down menu), then untick the Enable DNA Matching box, and then click on Save. Once you have privatised your DNA matches, they cannot see you and you cannot see them.
FTDNA: click on your name in the menu bar at the top, then click on Account Settings, then click on Privacy & Sharing, and then under Matching Preferences, click on the button beside Opt in to Matching so that it switches to the Off position. Your changes are automatically saved. A pop-up box appears at the bottom of the page after about 10 seconds stating "Your selections have been saved". Once you have privatised your DNA matches, they cannot see you and you cannot see them.
LivingDNA: click on Profiles in the menu bar at the left, select your profile, scroll down to Family Matching, click on Opted In, tick the Opt Out button, and then click Save. Once you have privatised your DNA matches, they cannot see you and you cannot see them.
GEDmatch: on the Home Page, scroll down to Your DNA Resources and find your kit number. Click on the Edit icon to the right of your kit number. Scroll down to Public Profile, and under Change Access, tick the Research button and then the green Change button. This makes your kit private and no one can see you as a match, but you can still see all your DNA matches.

6) Privatise your Family Tree
Without a family tree attached to them, DNA results are relatively useless. You could show up as a close "2nd cousin match" to someone else but if you haven't supplied any family tree information, it can be very difficult for them to figure out how you fit in to their tree.

Keeping your family tree private is as effective as keeping your DNA results hidden (if not more so).

7) Delete your DNA account

If you have finished working with them, you could delete your results completely. This works really well if you have transferred your results to a particular website from another company - you can always keep the original results on the website you initially tested with and re-upload them at any time.

Similarly, you can delete your kit from any website and have your sample destroyed.

So there are ways and means of finding the level of privacy and security that you personally feel comfortable with. Can you think of any others? Leave a comment below.

Have fun! Play safe!

Maurice Gleeson

July 2019

updated Sep 2023

Tuesday, 9 July 2019

Irish Mother finds her son ... 60 years later

When it came time for her to deliver, she was taken into a room and put to sleep. When she woke up, the large bump of her pregnancy was gone, and so was her child. For the past 60 years she has always wondered if it was a boy or a girl - they wouldn't tell her.

Now, 60 years later, thanks to DNA, she knows. It's a boy.

There are many people in Ireland searching for their birth family. Some are adoptees, some are foundlings, some are people who were raised in industrial schools, some of whom were boarded out. Over the past few years, many of these people have turned to DNA for help, and these numbers are increasing all the time as the success stories of people finding family through DNA are becoming more widespread.

But it's not just the children that are searching for their families, it's the parents too. I have been working with several birth mothers (in their 70s and 80s) who are trying to locate the child that was taken away from them many decades beforehand. Many tell a similar story, like the one at the top of this article. They had little control over what happened to them. Decisions were made for them. And they were left with little or no information about the child they gave birth to, not even what gender it was.

I am delighted to announce that one of my clients (the woman above) has finally reconnected with her son. She gave up her child 60-odd years ago, and it only took 12 months for DNA to find him. She tested with Ancestry and then uploaded her data to FamilyTreeDNA, MyHeritage & GEDmatch (the recommended approach).

Now comes the next step in their journey - getting to know each other, building bridges, putting the past in the past, and moving into the future. This is a slow process that will take a lot of work on both sides.

Any birth parent who wants to find and contact their child should first seek advice from the Adoption Authority of Ireland (AAI). They can help you sign up to the National Adoption Contact Preference Register (application form here, and Frequently Asked Questions here) and help you to contact the Agency who placed your child for adoption. You can email the AAI at tracing@aai.gov.ie. This should be your first port of call before turning to DNA.

If tracing using the first-line method above is unsuccessful, then you can consider DNA testing. The recommended approach is to test with Ancestry, and then upload a copy of the results to MyHeritage, FamilyTreeDNA, LivingDNA and GEDmatch. If this is unsuccessful, you should also test with 23andMe. If this is still unsuccessful, then it becomes a waiting game. You are hoping that some time soon your child or one of their children will do a DNA test and pop up in one of the databases as your closest match.

When they do, the connection may be instantaneous and things may move very quickly indeed so be prepared - think about what you want to tell them, think about the sort of questions they may ask you, write it all down, and put it in a letter (or two) that you can post or email to your child.

For most people, reconnection is an emotional rollercoaster. It is best to have professional help on hand in case you need it. Take things slowly. You will need time to process your feelings. So will the other person and their family. Be kind to yourself and to others.

Further information can be found in an earlier blog post here. For ways of optimising your Privacy with DNA tests, read this post here.

My thanks go to Ancestry who provided free DNA kits to help with this research.

Maurice Gleeson

July 2019