Tuesday, 23 February 2021

Honorary Research Fellow

I have recently had the privilege of being appointed Honorary Research Fellow at the University of Strathclyde. I am looking forward to working with the team and advancing academic research into genetic genealogy.

The University is located in the heart of Glasgow and last year won Scottish University of the Year. I'll be working within the Department of Genealogical Studies, which is part of the Centre for Lifelong Learning, which in turn is part of the Faculty of Humanities & Social Science.

Many of you will be familiar with the Department of Genealogical Studies as it runs several popular Postgraduate Courses, including the MSc in Genealogical Studies. These courses are organised by Course Director Tahitia McCabe and she presents an overview of what is on offer in this video here.

Genetic Genealogy is strongly represented among the courses offered and the team features some familiar names, including Graham Holton (Principal Tutor), Alasdair Macdonald and Dr Iain McDonald (Honorary Research Fellow), all of whom have spoken at Genetic Genealogy Ireland or Family Tree Live.

Research undertaken by the team includes the following:

Battle of Bannockburn Family History Project
In June 1314, the Battle of Bannockburn saw the Scots (under Robert the Bruce) beat the English (under Edward II). The project was established to mark the 700th anniversary of the battle and aimed to highlight some of the principal participants. The project has both a genealogical strand (which includes biographical information and 4-generation genealogies) and a genetic genealogy component, which utilises Y-DNA of living descendants to identify the genetic signature of the combatants.

Declaration of Arbroath Family History Project
Over 40 Scottish nobles appended their signature or seal to the Declaration of Arbroath in 1320, asserting the independence of Scotland as a separate sovereign nation. This project also has both genealogical and genetic strands and aims to build the genealogies of each of the Scottish nobles and connect people living today with these signatories of the Declaration. Further information on the project can be found here. The output of this project will form the basis of a touring exhibition throughout Scotland. It is hoped that an online version of this exhibition will also be available.


The announcement of my appointment

Maurice Gleeson
Feb 2021


Thursday, 28 January 2021

How to group Surname Project members

There are several important questions that face Project Administrators of Surname DNA Projects:

  1. Why should I group people together?
  2. How should I group people together?
  3. What does each group tell me?

As an Administrator of 15 DNA Projects for a variety of Irish Surnames, I have pondered these issues, explored different alternatives, fallen down rabbit-holes, and revised my thinking. So here is my current streamlined approach - no doubt it will evolve further as time goes by. These are just my own personal musings - other admins may differ in their approach (and that's fine - there is no right way or wrong way to run a project). And the discussion below applies only to Surname DNA Projects - other DNA Projects will have different reasons for grouping and therefore alternative grouping strategies.

I offer these thoughts and ideas so that project members may get a better understanding of the thinking behind the process of grouping people, and so that project administrators might pick up a few useful tips - please take what you like and discard the rest. 

So let's explore each of these topics in turn.

Why should I group people together?

For me, the purpose of a Surname DNA Project is to study the surname. That may seem obvious but it has several important implications.

Firstly, fixed inherited surnames arose in Ireland about 1000 years ago and in England & Scotland about 800 years ago. Wales was a bit later still (with some parts of Wales not adopting the practice of a fixed inherited surname until the 1850s). This defines the period of study as being roughly the last 1000 years. And therefore, we should aim to create subgroups of people who are related to each other within that timeframe.

For Irish and Scottish surnames at least, anything beyond 1000 years ago steps into the realm of Clan history, and that in itself is a fascinating area of research, but one that falls more under the remit of geographic projects (e.g. the Munster Irish project), haplogroup projects (e.g. R-L226 project), or even specific clan projects (e.g. Ancient Breifne Clans project). 

So for surname projects, we should be aiming to identify groups of related people, with the same surname, who are likely to be related to each other within the last 1000 years. Such groups are likely to descend from a single individual who was the progenitor of the surname for that group.

And if we are lucky, we may be able to make a case for having identified the genetic signature of the first person to bear the name 1000 years ago. For Irish surnames, we may even be able to link this to some of the Traditional Genealogies and therefore to a specific Irish clan, thus connecting project members with a much deeper part of their ancestral heritage.


How should I group people together?

Some years ago I developed the concept of Markers of Potential Relatedness (MPR). Simply said, these are markers that point you toward the conclusion that two or more people are related to each other. And by "related" I mean within the last 1000 years.

These Markers of Potential Relatedness help us to identify people who may be related within the last 1000 years and who therefore belong within the same subgroup. 

You can see a presentation that takes a deep dive into this concept in this video here, but the most useful MPRs in practice (and the main ones I use for grouping people together) are as follows:

  1. a known relationship
  2. same downstream SNP
  3. close Genetic Distance to people with the same surname
  4. same USP (Unique STR Pattern)

Let's go through each in turn.

A Known Relationship

The first one is obvious - if two people have a known relationship, then clearly they are "related within the last 1000 years" and therefore belong in the same group. Some people may not know that they are related (e.g. 4th cousins) but have the same common ancestor showing up in the "Paternal Ancestor Name" column on the project's Results Page.  A little communication between these project members can confirm the connection and justify their being grouped together.


Same downstream SNP

If two people share the same "downstream" SNP (i.e. close to 1000 years old or less), then I group them together, especially if they have the same surname. 

Rob Spencer's Admin Utilities tool is a great way of seeing exactly where a particular SNP sits and what SNPs sit above it. Entering any SNP will generate the SNP Sequence for that SNP.


TMRCA dates for downstream SNPs can be checked by simply googling the SNP name and YFULL.

People with the same downstream SNP but a different surname may be an indication that a Surname Switch has happened at some point in the past - the trouble is that without other information, you won't know on whose ancestral line the switch occurred.  Then you are faced with the classic question: which came first? - the Fry chicken or the Boylan egg?

The Big Y test gives much more definitive data than SNP Packs or single SNP tests and is my preferred (and recommended) method of SNP-testing.

Close Genetic Distance to people with the same surname

When a new member joins one of my projects, the first thing I do is check whether or not he has the surname being studied (or one of its potential variants). I then check his Y-STR Matches and see if he matches any other project members - if he does, I assign him to the same group that they are in. I will also double check that any downstream SNP data he has is consistent with the SNP results of other members of that group. And I may also check to see if he shares any Unique STR Pattern that characterises that particular group (see below).

Much of the time this criterion is perfectly fine for grouping people together, but we can run into major difficulties if there is significant Convergence present i.e. just by chance, the genetic profile of a person is similar to the profile of many other "non-related" people. This has been a significant issue with the M222 groups in some of my projects. 

You can recognise when Convergence is likely to be present by looking at the number of matches - if a project member has a huge number of matches to a wide variety of different surnames, then Convergence is likely and most of these would be "false positive" matches. Yes, he does share a common ancestor with every single match but this may be thousands of years ago rather than hundreds. In other words, the connection is a lot further back than it looks. And it may be well beyond the arbitrary 1000 year threshold we have set for defining subgroups.

In this situation, I would group everyone with the same surname (or variant) into the same large overarching group (call it, say, Group 3). All of these people may or may not be related within the last 1000 years.

Then within this large group, I would create subgroups (3a, 3b, etc) of people with known downstream SNP data that places them on a downstream branch of the Tree of Mankind close to our 1000 year threshold. I may look up the age of the SNP on YFULL to make sure the TMRCA date is roughly between 1000 and 1500 years ago.


Having created these SNP-defined subgroups, I would then add in non-SNP-tested individuals based on much more restrictive Genetic Distance criteria than those used for "declaring a match" i.e. 2/37, 4/67 and 5/111 as opposed to 4/37, 7/67 and 10/111. This approach minimises the risk of inappropriate grouping but does not get rid of it completely. Ultimately the only way of being sure that someone has been placed in the correct subgroup is for that person to do the Big Y test to identify their SNP profile. This is the recommended course of action for anyone who has not managed to make it into one of the SNP-defined subgroups.
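The two sets of Genetic Distance limits mentioned above can be written down as a small lookup. This is only a minimal sketch: the threshold values come from the text, while the function and variable names are hypothetical and not part of any FTDNA tool.

```python
# Maximum Genetic Distance allowed at each marker level (values from the text).
MATCH_THRESHOLDS = {37: 4, 67: 7, 111: 10}     # used for "declaring a match"
SUBGROUP_THRESHOLDS = {37: 2, 67: 4, 111: 5}   # stricter limits for subgroup placement


def within_threshold(genetic_distance, markers_tested, thresholds):
    """Return True if the distance is within the limit for that marker level.

    `markers_tested` is expected to be 37, 67 or 111 (hypothetical helper).
    """
    if markers_tested not in thresholds:
        raise ValueError(f"no threshold defined for {markers_tested} markers")
    return genetic_distance <= thresholds[markers_tested]


# A distance of 3 at 37 markers counts as a match,
# but is too distant for placement in an SNP-defined subgroup.
is_match = within_threshold(3, 37, MATCH_THRESHOLDS)
fits_subgroup = within_threshold(3, 37, SUBGROUP_THRESHOLDS)
```

Keeping the two tables side by side makes it easy to see how much tighter the subgroup criteria are at every marker level.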

If you are taking over a dormant project with many Ungrouped individuals, there is a helpful shortcut to the above process. First, download a csv spreadsheet of the results from the FTDNA Results Page and upload it to Chase Ashley's Y-DNA Grouping App. Next, click on the "See Reorganized Table" button and the app will automatically group all the people for you. These new groupings are usually fairly accurate but will need to be checked visually. Problems occur when any new groups are likely to be plagued by false positive matches due to Convergence (e.g. any M222 groups). In this case, create subgroups using only SNP-tested participants - all non-SNP-tested individuals can remain in the Ungrouped section. Then click on each of the Ungrouped members to see how closely they are related to each of the SNP-defined subgroups you have created. If they do not meet the more stringent Genetic Distance criteria defined previously (i.e. 2/37, 4/67, 5/111), then they can be left in the overarching group (e.g. Group 3) rather than one of its subgroups (3a, 3b, etc). If they meet the criteria for 2 or more of the subgroups, then similarly they can be left in the overarching group. But if they meet the criteria for only one subgroup (e.g. 3f), then they can be moved into that group. 
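The placement rule for non-SNP-tested members described above can be sketched in a few lines. This is an illustration only, assuming you have already worked out (by hand, from the FTDNA matches) the closest Genetic Distance between the member and each SNP-defined subgroup. The function name and the input format are hypothetical.

```python
# Stricter Genetic Distance limits for subgroup placement (values from the text).
SUBGROUP_THRESHOLDS = {37: 2, 67: 4, 111: 5}


def place_member(distances, markers_tested, overarching="3"):
    """Decide where a non-SNP-tested member should sit.

    distances: dict mapping a subgroup label (e.g. "3f") to the member's
               closest Genetic Distance to that subgroup (hypothetical input).
    Returns the subgroup label if exactly one subgroup meets the criteria,
    otherwise the label of the overarching group.
    """
    limit = SUBGROUP_THRESHOLDS[markers_tested]
    qualifying = [label for label, gd in distances.items() if gd <= limit]
    if len(qualifying) == 1:
        return qualifying[0]
    # No subgroup qualifies, or two or more do: stay in the overarching group.
    return overarching


# At 67 markers the limit is 4, so only 3f qualifies here.
placement = place_member({"3a": 6, "3f": 3}, markers_tested=67)
```

Members left in the overarching group by this rule are the ones for whom Big Y testing would be the next step.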

Below is an example from the Riley DNA Project. Everyone in the overarching Group 1 needs Big Y testing in order to be accurately grouped in one of the subgroups (1a to 1f in this example). 

Participants who do not meet the criteria for a subgroup are left in the overarching group
(Group 1 in this example)

A good example of this process in practice is from my O'Malley DNA Project. Many Mayo O'Malleys test positive for the M222 SNP marker. I placed them in Group 3 - a large overarching group for all M222+ O'Malleys. So far, downstream SNP testing has identified 6 subgroups below this. The common ancestor for all 6 subgroups lived about 2000 years ago (the TMRCA for the M222 SNP Block), and the common ancestor for each subgroup lived about 1000 years ago or less. You can read a detailed account of this specific example in this blog post here.

The common ancestor for each of the individual 6 subgroups is within the last 1000 years

Same USP (Unique STR Pattern)

When a group of people have the same value for several specific STR markers, this can indicate a specific "signature" for that particular group and anyone with the same signature can be deemed to be "related" and thus should be grouped with them. The number of STR markers that make up a Unique STR Pattern varies a lot, but the more markers involved, the more robust the USP.
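One way to look for a candidate USP is to find the markers on which every member of a group has the same value, and where that value differs from the modal value of an upstream branch. The sketch below assumes that approach. The marker names are real STR markers, but the values are made up for illustration and the function name is hypothetical.

```python
def unique_str_pattern(group_haplotypes, upstream_modal):
    """Return {marker: value} for markers where every group member shares
    one value and that value differs from the upstream modal value."""
    usp = {}
    for marker, modal_value in upstream_modal.items():
        values = {haplotype.get(marker) for haplotype in group_haplotypes}
        if len(values) == 1:
            (shared,) = values
            if shared is not None and shared != modal_value:
                usp[marker] = shared
    return usp


# Illustrative values only - not taken from any real project.
upstream_modal = {"DYS390": 24, "DYS391": 11, "DYS392": 13, "DYS439": 12}
group = [
    {"DYS390": 25, "DYS391": 11, "DYS392": 13, "DYS439": 11},
    {"DYS390": 25, "DYS391": 10, "DYS392": 13, "DYS439": 11},
]

pattern = unique_str_pattern(group, upstream_modal)
```

In this made-up example the candidate pattern has two markers (DYS390 and DYS439). A real USP would normally be judged on more markers and more group members before being relied upon.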

USPs were easy to spot on the Results Pages of the old WorldFamilies.Net (WFN) website (sadly now defunct) and a similar scheme would be most welcome on FTDNA's Results Pages. The WFN website compared each group's genetic signature against the signature (modal haplotype) of an upstream branch of the Tree of Mankind and thus identified any USPs and presented them as coloured columns on their Results Pages. The coloured pattern in the diagram below beautifully portrays the Unique STR Pattern within different subgroups of the Gleason DNA Project. 

It is much more difficult to see USPs on the FTDNA pages because they are not highlighted in colour. You would need to use Dave Vance's SAPP Programme or Chase Ashley's Y-DNA Grouping App to highlight any USPs.


So those are the main methods I use for assigning project members to a specific group.

In addition, I have some general advice on formatting the name for each group:

  • number each group (01, 02, 03, etc) - it makes it easier to refer to when writing about it or discussing it with project members. 
  • include the possible ancestral location (this may be obvious from the MDKA information)
  • include the abbreviated SNP Sequence (get it from Rob Spencer's Admin Utilities)
  • include any specific guidance (e.g. if R-M269, upgrade to Big Y) or point members toward additional information (e.g. see Updates tab in About section for Next Steps) - this may include links to haplogroup, geographic & clan projects that they should join, as well as useful general information (e.g. how to get the most out of your Y-DNA test, essential things everyone should do).
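If you maintain many groups, the naming advice above can be applied consistently with a small helper. This is just a sketch: the layout of the name (separators, brackets) is my own assumption rather than any FTDNA convention, and the example values are illustrative.

```python
def format_group_name(number, location, snp_sequence, guidance=""):
    """Build a group name from its number, likely ancestral location,
    abbreviated SNP Sequence and optional guidance (hypothetical layout)."""
    name = f"{number:02d}. {location} ({snp_sequence})"
    if guidance:
        name += f" - {guidance}"
    return name


# Illustrative values only.
example = format_group_name(3, "Mayo", "R-M269 > L21 > M222", "upgrade to Big Y")
```

Zero-padding the number (01, 02, 03) keeps the groups in the intended order wherever they are sorted alphabetically.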


What does each group tell me?

Far more has been written about how to group project members than about how to analyse the resultant groups. The grouping process only takes you half-way ... you then need to analyse each group in turn. If the overall objective of a Surname DNA Project is to study the surname, then grouping merely lays the foundation upon which subsequent analysis is based.

The sort of questions that can be explored in any analysis of a specific group include: where is the group from? does this link us to the known history of the surname? how old is the group? what is the branching structure? how did the name evolve over time? is there an association with a pre-surname clan?

A practical example of how to approach analysis of individual groups is detailed in this video here (delivered at the O'Malley Clan Gathering in 2019).


Having a clear picture of the desired outcomes of your research allows you to create more specific project goals. Thus the objectives for any surname study could include the following:

  • To identify distinct genetic groups of people carrying surname X (or one of its variants)
  • To analyse each genetic group and assess where it came from, how old it is, and whether there is any connection to a pre-surname "clan"
  • To communicate the conclusions of the analysis for each genetic group
  • To help focus project members on specific directions for their own ongoing genealogical research


After all this work, you will need an effective way of communicating it to your project members. Different admins use different methods. Some publish regular updates on the project website on FTDNA. Others create a separate website or blog or newsletter or annual report. Whatever method you choose, you should plan to keep your project members informed about the current status of the project and any new developments affecting specific groups. Also bear in mind that you will eventually need to pass this task on to a successor so it is wise to design your communication strategy with this in mind.

Hope you find something of use among these hints and tips.

Maurice Gleeson
Jan 2021





Saturday, 26 December 2020

The Big Y & Irish Clan Research

There has been increasing interest in Irish Clan Research as more fine-detailed Y-DNA data becomes available and the Irish branches of the Tree of Mankind have grown larger and sprouted more downstream branches. Soon we should be able to identify specific DNA signatures for particular Irish clans. This has already happened with the Uí Neill (O'Neill) in northwest Ireland and the Dál gCais (ancestors of Brian Boru) in Clare & Limerick. The discovery of the burial site of Red Hugh O'Donnell has also created a lot of media buzz (as discussed in a previous post).


The usual test to start with is the Y-DNA-37 test from FTDNA. Only men can do this because women do not have a Y chromosome. So you will need to find the appropriate male relative to test. The more upscale test is the Big Y test and this tells us very specifically on which branch of the Tree of Mankind you sit. This is the most valuable test for Irish Clan Research.

So if you want to help further this type of research, find your surname project (just google: FTDNA & your surname), discuss testing options with the Project Admin (their email will be on the project's home page), and decide whether you want to start with the Y-DNA-37 or go straight for the Big Y test. The tests are frequently discounted in the many sales that occur throughout the year (the screenshot below is from the Christmas Sale 2020).



Here is a detailed step-by-step breakdown of the various options ...

Step 1. Have you done a Y-DNA test at FTDNA already? 

If no, read below. If yes, go to the next step.

  1. go to the FTDNA website & scroll down until you get to the 3 test types (see screenshot above)
  2. choose the test you want and click on the ORDER NOW button. You may wish to start with the usual starter test (Y-DNA-37) or you may wish to go for broke and order the top of the line test (the Big Y-700). You may want to discuss the pros & cons with the relevant Project Administrator (see Step 2 below)
  3. after you have ordered the test, find the project you want to join (google: FTDNA & your surname), click on the JOIN button in the photo on the project's main page, and follow the instructions (the example below is from the O'Donnell DNA Project)



  4. If you have a coupon code for an additional discount off the price of your test, follow the instructions below ... 
a) click on “Coupon Code” when you get to the Shopping Cart screen…




b) enter the code in the Coupon Code box that appears




c) click on Apply and you will see the price drop ... then click on Proceed to Checkout




Step 2. Are you a member of the relevant Clan Project?

If no, read below. If yes, go to the next step.

  1. find the project you want to join (google: FTDNA & your surname), click on the JOIN button in the photo on the project's main page, and follow the instructions (the example above is from the O'Donnell DNA Project)
  2. if you have questions, email the Project Admin for advice & guidance. Their email is on the project's Main Page.

Step 3. Have you or one of your close relatives done the Big Y test?

If no, read below. If yes, go to the next step.

    1. Sign in to your FTDNA account
    2. click on ADD ONS & UPGRADES at the top right of your main page (see screenshot below)
    3. Scroll down to the Big Y-700 test and click on the ORDER NOW button
    4. If you have a coupon code for an additional discount off the price of your test, follow the instructions in Step 1.4 above 



    Step 4. Is your version of the Big Y test the Big Y-500?

      If yes, read below. If no, go to the next step.

        1. the Big Y-500 test was upgraded in Jan 2019 to the Big Y-700 test, which provides a lot more information than the previous version. There is a good blog post about it here.
        2. You can check which version of the Big Y test you have done by hovering over your name (top right) & clicking on ORDER HISTORY (see screenshot below).
        3. If you have only done the Big Y-500, discuss the value of upgrading it to the Big Y-700 test with your Project Administrator. It may be helpful ... it may not.
        4. If you have a coupon code for an additional discount off the price of your test, follow the instructions in Step 1.4 above 



        Step 5. Have you uploaded your Big Y data to the Y-DNA Warehouse?

          If no, read below. If yes, you're good! You have optimised the value you will get from your Big Y test.

          1. The Y-DNA Warehouse is a repository for Big Y data. It allows researchers to use the data to help advance many types of research, including Irish Clan Research. Appropriate safeguards are taken to protect your data and your privacy. Full details can be found in the Data Policy section here (scroll down to the end of the page).
          2. Instructions for uploading your Big Y data can be found here.

            Remember, the volunteer Project Administrators are a great source of information, so never hesitate to drop them an email with any questions you may have.

            With a bit of luck, you may find that you have a direct genetic connection to one of the major Irish Clans.

            Maurice Gleeson
            Dec 2020








            Tuesday, 11 August 2020

            Digging up your Ancestors - Citizen Science meets Ancient DNA

            There have been major advances in recent years in the field of Ancient DNA. The science has evolved to the point where the DNA profile extracted from ancient bones can be linked directly to surname projects at FamilyTreeDNA. This is particularly relevant to Irish surname projects and is sparking a renewed interest in medieval Irish history and Irish Clan research. But what is the optimal way of connecting Ancient DNA to Citizen Science? Read on!

            Dutch Water Color Painting of Irish Men and Women, about 1575
            (from Wikimedia Commons)

            Y-DNA and Citizen Science

            Y-DNA has been used for paternity testing and forensic cases since the 1980s, but it was only with the advent of direct-to-consumer DNA testing by FamilyTreeDNA (FTDNA) in the early 2000s that Y-DNA began to be used in surname research (Y-DNA and surnames both follow the father's father's father's line). There are now over 10,000 group projects at FamilyTreeDNA, connecting people to their surname origins and in some cases a Clan history. You can find out if there is a project for your surname by simply googling: FTDNA & "your surname".

            Many projects have now reached an advanced state of maturity and have helped characterise the number of distinct genetic groups associated with a particular surname, how old each genetic group is, where it came from geographically, and whether or not it is associated with a particular Clan (Irish or Scottish).

            These projects are run by volunteer Administrators who help project members with their questions, collate & analyse the project data, and publish their results & conclusions. This is a great example of Citizen Science in action. The output from these projects has greatly accelerated the construction of the Tree of Mankind (Y-Haplotree) and the Tree of Womankind (mitochondrial Haplotree) to the extent that the ongoing construction of these trees has passed from the Academic Scientists to the Citizen Scientists.

            The Rapid Evolution of Ancient DNA Research

            Ancient DNA hit the headlines with the discovery of the remains of Richard III in 2012. DNA played a crucial role in his identification and the story captured the public imagination. Project Administrators discussed the possibility of using Ancient DNA within their surname projects. The Barrymore project was an early attempt to link ancient DNA to a specific surname project at FamilyTreeDNA. 

            From The Guardian, 4 Feb 2013

            More recently, advances in testing ancient DNA are producing exciting results about Ireland’s ancient past that are rewriting the history books. The Ancient DNA Lab at Trinity College Dublin has DNA tested over 100 ancient Irish samples collected over the last 200 years by intrepid archaeologists and antiquarians, and lying in wait in museum storerooms all over Ireland. These samples date from 6000 years ago up to medieval times. The first publication from this group was in 2015 and made news headlines across the world. [1,2] It completely upended long-established theories of “Celtic” origins for the Irish and showed that the modern Irish genome is substantially pre-Celtic. Since then testing ancient Irish DNA has progressed at a furious pace and further publications from this ground-breaking work are continuing to emerge. The most recent revealed evidence of an elite dynasty at the Newgrange passage tomb some 5000 years ago. [3,4] 

            Ancient DNA testing is now being applied to samples from the last millennium - in other words, within the surname era. In 2016, a road-widening scheme uncovered the site of a medieval community in Ranelagh, Co. Roscommon, which was occupied from about 500 to 1100 AD. Almost 800 skeletal remains have been found and DNA analysis of some of these is progressing. This major discovery will tell us a lot about life in medieval Ireland, how our ancestors lived, and how they died. We may even be able to link some of these medieval individuals to specific Irish Clans and even surnames, thanks to the multitude of people who have had their Y-DNA tested at FamilyTreeDNA (over 750,000).

            The medieval ring fort at Ranelagh
            (from The Irish Examiner)

            Then in May 2020, Spanish archaeologists found the site of the old chapel in Valladolid where Irish prince Red Hugh O'Donnell was buried in 1602. They located the chapel and discovered several intact skeletons. [8] It was anticipated that identification of Red Hugh would be facilitated by the absence of his two big toes, which were amputated due to frostbite following a daring escape from Dublin Castle across the Wicklow Hills. However, apparently many of the skeletons discovered were missing their feet and thus identification may have to rely heavily on DNA testing of all the skeletons and comparison of the resulting DNA profiles with those of living relatives with genealogically-established pedigrees. The O'Donnell DNA Project at FamilyTreeDNA will also help in this regard.

            The archaeological dig discovered 16 skeletons in the Chapel of Marvels at Valladolid, Spain 
            (photo: Jonathan Tajes from El Día de Valladolid website)

            It is highly likely that similar discoveries will be made over time and other examples of ancient DNA that falls within the surname era will emerge. Comparing this ancient DNA against the DNA of living people who have volunteered for surname projects and other group projects at FamilyTreeDNA will potentially allow these ancient individuals to be identified by surname and by Clan affiliation. And that will add considerably to the value of the academic research as well as advancing the aims of Surname DNA Projects run by Citizen Scientists. 

            But what kind of DNA extraction, testing and comparison needs to be done in order to optimise the chances of a successful outcome?

            Ancient DNA analysis

            There are no set standards for the retrieval and analysis of Ancient DNA, but recent projects have applied the following techniques and methods:

            1) Getting the tissue sample

            When ancient remains are excavated, one of the first questions faced by the project team is which bone to use to obtain a tissue sample for DNA testing. Bone tissue sampling from the petrous part of the temporal bone in the skull appears to offer the highest chances of success. This is the densest bone in the human body and, in ancient samples, the yield of human DNA from this bone is usually higher than elsewhere (e.g. molar teeth, other bones). There is also less risk of contamination by DNA from soil bacteria. Kendra Sirak at UCD (University College Dublin) has devised a technique of sampling the bone from inside the skull and this causes less bone destruction. For the identification of the remains of Irish rebel Thomas Kent in 2016, Jens Carlsson described how he and his team discarded the first third of the sample, analysed the middle third, and kept the last third for any future additional analyses.

            2)  Testing the extracted DNA

            Once the tissue sample has been obtained, the next step is to extract DNA from it (if possible). If there is a sufficient sample of DNA extracted, Whole Genome Sequencing (WGS) would be the test of choice. This provides all 3 types of DNA (Y-DNA, mitochondrial DNA, and autosomal DNA) and both types of DNA marker (STR & SNP markers). Coverage of the genome can be quite good - two of the first 4 ancient genomes sequenced in Ireland achieved 10-11x coverage. [1] WGS can also achieve high quality DNA data that is optimal for comparison against reference samples, including the types of test used by the commercial direct-to-consumer companies as well as standard forensic tests. 

            One of the major advantages of WGS is that it assesses 3 billion points on the human genome. In comparison, standard forensic tests only analyse about 17 DNA markers. That's 17 vs 3,000,000,000 - a difference of roughly eight orders of magnitude. And that difference is associated with a huge jump in the quality of the information that can be gleaned from the data. 

            Similarly, commercial DNA tests also analyse many more markers than forensic tests - commercial autosomal DNA tests assess >600,000 markers (compared to 17 in the standard forensic autosomal tests, which use STRs) and Y-DNA testing assesses up to 851 STR markers (compared to up to 23 STR markers with forensic Y-DNA tests). In addition, forensic Y-DNA tests do not assess Y-DNA SNP markers whereas commercial Y-DNA tests (like FTDNA's Big Y test) assess >200,000 SNP markers. Again, the quality of the information that can be extracted from these commercial Y-DNA tests is far superior to that associated with standard forensic Y-DNA tests.
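To put the marker counts quoted above side by side, here is a short calculation of each ratio and its order of magnitude. The counts are the ones given in the text (using 600,000 as the lower bound for commercial autosomal tests). The labels and the function name are mine.

```python
import math

# (larger count, smaller count) pairs taken from the figures in the text.
MARKER_COUNTS = {
    "WGS vs forensic autosomal STRs": (3_000_000_000, 17),
    "commercial autosomal SNPs vs forensic autosomal STRs": (600_000, 17),
    "commercial Y-STRs vs forensic Y-STRs": (851, 23),
}


def compare(larger, smaller):
    """Return the ratio between two marker counts and its base-10 logarithm."""
    ratio = larger / smaller
    return ratio, math.log10(ratio)


for label, (larger, smaller) in MARKER_COUNTS.items():
    ratio, magnitude = compare(larger, smaller)
    print(f"{label}: about {ratio:,.0f} times as many (10^{magnitude:.1f})")
```

The raw counts are only a rough guide to information content, since STR and SNP markers behave differently, but they do show the scale of the gap.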

            As an alternative to WGS, chip-based technologies are being developed (e.g. at David Reich's lab at Harvard University in Massachusetts). These would assess about 1.2 million autosomal SNP markers (compared to the 17 autosomal STR markers used in standard forensic tests).

            Summary of Process of Extraction of Ancient DNA
            (from my YouTube video)

            As far as relative-matching is concerned, the type of relationship that can be identified with standard forensic autosomal DNA tests extends only as far as parents, siblings, aunt/uncle, and niece/nephew, but cannot go beyond this with any degree of reliability. This is an important consideration if we hope to identify any of the remains of the 800 children believed to be buried at Tuam. In contrast, commercial DNA tests can reliably identify much more distant cousins, extending out to 4th cousins or greater.

            Forensic Y-DNA tests are very limited in their ability to group people into distinct genetic groups or place someone on the Tree of Mankind. In contrast, commercial Y-DNA tests (like the Big Y) are routinely used in Surname DNA Projects at FamilyTreeDNA to help group people into well-defined genetic groups with a common ancestor within a genealogical timeframe (the last 1000 years), and place people very precisely on the Tree of Mankind. In relation to ancient DNA, this could help identify a person's surname and potentially a Clan affiliation - something that would be relevant with regards to the identification of Red Hugh O'Donnell, for example.

            Thus WGS or chip-based tests would provide considerably more information than standard forensic tests and WGS should be the first choice when it comes to testing. However, the big disadvantages of WGS are a) cost and b) the need to utilise a much larger sample of DNA than is needed for standard forensic tests. So practicalities may dictate whether or not WGS is possible. Nevertheless, it should be the test of choice for ancient DNA analysis.

            3) Comparing the Ancient DNA to reference samples

            Any DNA extracted from ancient remains can be compared against targeted individuals (e.g. as was the case for Richard III, and the WWI soldiers from Fromelles - see video here) or against a more general population in a genetic genealogy database (such as GEDmatch or FTDNA) or even a forensic database (such as CODIS). In 2018, the new science of Investigative Genetic Genealogy was created and since then, the GEDmatch and FTDNA databases have been widely used to solve "cold cases" involving violent crime as well as identify unknown human remains. To date, over 100 cases have been solved.

            Y-STR data could be compared directly against the STR data on the public Results Page of specific Surname Projects. For example, the O'Donnell project's STR data could be used to help identify the remains of Red Hugh O'Donnell. 

            Y-SNP data could be compared against the available public Y-haplotrees such that the individual could be placed on a specific branch of the Tree of Mankind. This could help identify a likely surname for the individual (if the remains were <1000 years old) as well as an association to a specific Irish Clan. Of the available Y-Haplotrees, the most comprehensive is FTDNA's Big Y Block Tree which is available only to FamilyTreeDNA customers. However, they also maintain a public Y-Haplotree which can be used in conjunction with the Big Tree (for associated surnames and country origins) and YFULL's Y-Haplotree (for crude dating of branches).

            Using the example of Red Hugh O'Donnell again, based on SNP testing undertaken by the members of the O'Donnell DNA Project, it is anticipated that Red Hugh will carry the SNP marker BY21154. A SNP Sequence is simply the sequence of SNP markers that characterises each branching point on the Tree of Mankind, starting "upstream" at the level of the Haplogroup (R in this case) and progressing all the way "downstream" (i.e. towards the present day) to the Terminal SNP. Think of this string of SNPs as a line of ancestors coming forward in time towards the present day. Comparing the SNP Sequences of two branches shows exactly where the branches sit on the Tree of Mankind relative to each other, and this tells us how closely or distantly related the people on those branches are. The SNP Sequence for BY21154 is:
            • R-L21 >>> M222 > S658 > DF104 > DF105 > DF85 > S673 > S668 > DF97 > ZZ36 > FGC19851 > Z29319 > BY35773 > BY21154
            Age estimates for this SNP marker are available on the YFULL website here, and surnames on adjacent branches of the Tree of Mankind can be viewed on the Big Tree here.
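
For anyone who likes to see this comparison spelled out, here is a minimal Python sketch of how two SNP Sequences can be lined up to find the point where they branch apart. The BY21154 sequence is the one quoted above. The second sequence is entirely hypothetical (the markers EXAMPLE-1 and EXAMPLE-2 are invented placeholders, not real SNPs), and the function names are my own choices for illustration. Note that ">>>" stands for intermediate markers that have been skipped, so the sketch treats it as a single step.

```python
def parse_snp_sequence(text):
    """Split a SNP Sequence such as 'R-L21 >>> M222 > S658' into a list of markers.

    '>>>' (skipped intermediate markers) is treated as one ordinary step.
    """
    parts = text.replace(">>>", ">").split(">")
    return [part.strip() for part in parts if part.strip()]


def shared_path(seq_a, seq_b):
    """Return the markers both sequences share, reading from upstream to downstream."""
    shared = []
    for marker_a, marker_b in zip(seq_a, seq_b):
        if marker_a != marker_b:
            break
        shared.append(marker_a)
    return shared


# SNP Sequence for BY21154, as quoted in the text.
by21154 = parse_snp_sequence(
    "R-L21 >>> M222 > S658 > DF104 > DF105 > DF85 > S673 > S668 > DF97 > "
    "ZZ36 > FGC19851 > Z29319 > BY35773 > BY21154"
)

# HYPOTHETICAL second branch: EXAMPLE-1 and EXAMPLE-2 are invented placeholders.
other = parse_snp_sequence(
    "R-L21 >>> M222 > S658 > DF104 > DF105 > DF85 > EXAMPLE-1 > EXAMPLE-2"
)

shared = shared_path(by21154, other)
branch_point = shared[-1] if shared else None

print("Branching point:", branch_point)
print("Steps below it on the BY21154 branch:", len(by21154) - len(shared))
print("Steps below it on the other branch:", len(other) - len(shared))
```

The further downstream the branching point sits, the more recent the shared ancestor is likely to be. Counting steps is only a rough guide, because the number of years between SNPs varies considerably.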

            If Red Hugh does test positive for this SNP marker, then anyone else with this marker is in some way related to Red Hugh O’Donnell (via one of his direct male line ancestors - he had no descendants himself). This allows living people to connect directly with their O’Donnell ancestry and the history of the O’Donnell Clan in a very tangible way.

            Furthermore, DNA testing of ancient remains of known historical figures will help to confirm or refute the veracity of the ancient Irish annals and genealogies. We are seeing a lot of data from group projects at FamilyTreeDNA that supports the veracity of some genealogies and other data that suggests the opposite. Adding the DNA of known historical figures to the mix will help further this research. 

            This is a very exciting time for genetic genealogy as we explore the synergies between Ancient DNA analysis and Citizen Science. The rapid advances in these modern techniques are helping to enhance our understanding and appreciation of our ancient heritage.

            I can’t help feeling that Red Hugh would approve.

            Maurice Gleeson
            Aug 2020

            Resources and Links

            1) Neolithic and Bronze Age migration to Ireland and establishment of the insular Atlantic genome. Cassidy et al. PNAS 2016, 113 (2) 368-373. Available at https://doi.org/10.1073/pnas.1518445113
            2) Man’s discovery of bones under his pub could forever change what we know about the Irish. Peter Whoriskey, The Independent, 17 March 2016. Available at https://tinyurl.com/RathlinDNA
            3) A Genomic Compendium of an Island (2017) Lara M. Cassidy, PhD thesis, Smurfit Institute of Genetics, Trinity College Dublin.
            4) A dynastic elite in monumental Neolithic society. Cassidy, L.M., Maoldúin, R.Ó., Kador, T. et al. Nature 582, 384–388 (2020). https://doi.org/10.1038/s41586-020-2378-6



             


            Friday, 17 April 2020

            Statistical Analysis of Irish Type III signature

            Chi-squared test on Irish Type III Analysis

            Dennis Wright’s paper from 2009 describes a four-fold greater frequency of the Irish Type III (IT3) signature among Dalcassian surnames compared to non-Dalcassian surnames … http://www.jogg.info/pages/51/files/Wright.pdf

            This data is summarised in Tables 7 and 8 of the paper.


            Thus, among men with Dalcassian surnames, 57 had the IT3 signature and 214 did not. Similarly, among men with non-Dalcassian surnames, 37 had the IT3 signature and 334 did not.


            Chi-square test

            I put these values into the 2x2 contingency table that forms part of the chi-square calculator at https://www.socscistatistics.com/tests/chisquare/default2.aspx.

            The contingency table below provides the following information: the observed cell totals, (the expected cell totals) and [the chi-square statistic for each cell].

            The chi-square statistic, p-value and statement of significance appear beneath the table. Blue means you're dealing with dependent variables; red, independent.



                                      IT3+                    IT3-                     Marginal Row Totals
            Dalcassian                57 (39.68) [7.56]       214 (231.32) [1.3]       271
            Non-Dalcassian            37 (54.32) [5.52]       334 (316.68) [0.95]      371
            Marginal Column Totals    94                      548                      642 (Grand Total)

            The chi-square statistic is 15.3283. The p-value is .00009. This result is significant at p < .01.

            The chi-square statistic with Yates correction is 14.4561. The p-value is .000143. Significant at p < .01. 
            (There's probably a consensus now that the correction is over-cautious in its desire to avoid a type 1 error, but the statistic is there if you want to use it).

            If we analyse only those Dalcassian surnames in bold in Table 7, we get the following results:



                                      IT3+                    IT3-                     Marginal Row Totals
            Dalcassian                51 (22.04) [38.03]      73 (101.96) [8.22]       124
            Non-Dalcassian            37 (65.96) [12.71]      334 (305.04) [2.75]      371
            Marginal Column Totals    88                      407                      495 (Grand Total)

            The chi-square statistic is 61.7173. The p-value is < .00001. This result is significant at p < .01.

            The chi-square statistic with Yates correction is 59.6042. The p-value is < .00001. Significant at p < .01.
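
The figures above can be reproduced without the online calculator. What follows is a minimal Python sketch, using only the standard library, of the usual Pearson chi-square calculation for a 2x2 table, with the Yates correction as an option. The function name and layout are my own and are not taken from the calculator's website. The p-value uses the fact that, for one degree of freedom, the upper-tail probability of the chi-square distribution equals erfc(sqrt(x / 2)).

```python
import math


def chi_square_2x2(a, b, c, d, yates=False):
    """Pearson chi-square test for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value, expected), where expected holds the
    expected cell totals under independence.
    """
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    grand_total = a + b + c + d

    statistic = 0.0
    expected = []
    for i in range(2):
        expected_row = []
        for j in range(2):
            e = row_totals[i] * col_totals[j] / grand_total
            expected_row.append(e)
            diff = abs(observed[i][j] - e)
            if yates:
                # Yates continuity correction: shrink each difference by 0.5.
                diff = max(diff - 0.5, 0.0)
            statistic += diff * diff / e
        expected.append(expected_row)

    # One degree of freedom: upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value, expected


# All Dalcassian surnames: 57 IT3+, 214 IT3-; non-Dalcassian: 37 IT3+, 334 IT3-.
stat_all, p_all, expected_all = chi_square_2x2(57, 214, 37, 334)
stat_all_yates, p_all_yates, _ = chi_square_2x2(57, 214, 37, 334, yates=True)

# Dalcassian surnames in bold in Table 7 only: 51 IT3+, 73 IT3-.
stat_bold, p_bold, expected_bold = chi_square_2x2(51, 73, 37, 334)
stat_bold_yates, p_bold_yates, _ = chi_square_2x2(51, 73, 37, 334, yates=True)

print(f"All surnames:  chi-square = {stat_all:.4f}, p = {p_all:.6f}")
print(f"  with Yates:  chi-square = {stat_all_yates:.4f}, p = {p_all_yates:.6f}")
print(f"Bold surnames: chi-square = {stat_bold:.4f}, p = {p_bold:.3g}")
print(f"  with Yates:  chi-square = {stat_bold_yates:.4f}, p = {p_bold_yates:.3g}")
```

The expected cell totals returned by the function can be checked against the values in round brackets in the contingency tables.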


            Fisher Exact Test

            I also used another calculator on the website to do a Fisher Exact Test … https://www.socscistatistics.com/tests/fisher/default2.aspx

            The Fisher exact test statistic and statement of significance appear beneath the table. Blue means you're dealing with dependent variables; red, independent.

            Results


                                      IT3+    IT3-    Marginal Row Totals
            Dalcassian                57      214     271
            non-Dalcassian            37      334     371
            Marginal Column Totals    94      548     642 (Grand Total)

            The Fisher exact test statistic value is 0.0001. The result is significant at p < .01.

            If we analyse only those Dalcassian surnames in bold in Table 7, we get the following results:

            Results


                                      IT3+    IT3-    Marginal Row Totals
            Dalcassian                51      73      124
            non-Dalcassian            37      334     371
            Marginal Column Totals    88      407     495 (Grand Total)

            The Fisher exact test statistic value is < 0.00001. The result is significant at p < .01.
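
The Fisher exact test can also be checked by hand. Below is a minimal Python sketch, using only the standard library. It assumes the common two-sided convention in which the p-value is the sum of the probabilities of every possible table (with the same row and column totals) that is no more likely than the observed one. I am assuming, rather than asserting, that the online calculator follows the same convention. The function name is my own.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total_ways = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_ways

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)

    # Sum every table at least as unlikely as the observed one.
    # The small tolerance guards against floating-point ties.
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )


# All Dalcassian surnames versus non-Dalcassian surnames.
p_fisher_all = fisher_exact_2x2(57, 214, 37, 334)

# Dalcassian surnames in bold in Table 7 versus non-Dalcassian surnames.
p_fisher_bold = fisher_exact_2x2(51, 73, 37, 334)

print(f"All surnames:  p = {p_fisher_all:.3g}")
print(f"Bold surnames: p = {p_fisher_bold:.3g}")
```

Because the calculation uses exact integer counts of the possible tables, it does not rely on the large-sample approximation behind the chi-square test.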


            Conclusions

            Both the chi-square test and Fisher exact test confirm that all comparisons are statistically significant, with p < 0.01 for all comparisons.

            Maurice Gleeson
            April 2020