Thursday, 25 May 2017

Convergence - what is it?

There are several phenomena encountered in the analysis of Y-DNA STR data that can throw a genetic spanner in the works, and Convergence is one of them!

In genetic genealogy, Convergence occurs when two men have DNA signatures that are exactly or nearly identical, but have evolved that way purely by chance. As a result, the two men will show up in each other's list of matches and will give the false impression that they may be closely related (e.g. within the last several hundred years) when in fact they are much more distantly related (e.g. within the last several thousand years). The problem is that we cannot tell that Convergence has occurred simply by looking at the two men's present-day STR results: it is hidden from our view. And the danger is that if the two men think they are closely related, they may start chasing their common connection, thinking that they will find the answer via further documentary research, when in fact there is little hope of that at all. Their "close match" is a red herring. And their pursuit of the Common Ancestor is a wild goose chase.

So what can we do about it? How can we recognise it? How can we avoid it wasting our precious research time?


The concept is occasionally discussed in Facebook groups or on various blogs, but there tends to be quite a lot of confusion around what it actually means. And there are a variety of quite understandable reasons for this. 

Firstly, there isn't a standard definition for Convergence, so how it is used varies from person to person. Some people apply it only to exact matches, others apply it to exact and close matches. Moreover, the concept of Convergence is closely tied up with the concept of lack of Divergence. The two are different phenomena, but their effects and consequences are very similar. Another contributing factor is that Convergence is difficult to see or detect in practice. We know that it exists, but we have no way of identifying it just by comparing two sets of STR results. In other words, it's largely a hidden phenomenon (like Black Holes). It is only when we do SNP testing that the extent of Convergence becomes apparent. And the problem is that not enough people have done SNP testing.

The good news is that more and more people are doing SNP testing and as they do, the extent of Convergence becomes more apparent. The Lineage II members in the Gleason DNA Project are trailblazers in this regard and we will explore the results of the recent Z255 SNP Pack testing in subsequent blog posts.

But in this post, we will look at an example of Convergence from the Gleeson DNA Project in order to illustrate some of the key characteristics and consequences of Convergence. In later posts, we will look at clues that may indicate that Convergence is present, attempt to quantify the number of Back Mutations & Parallel Mutations that occur over time (using the Mutation History Tree that we have previously constructed for Lineage II - the North Tipperary Gleesons), and finally we will attempt to quantify Convergence itself.

But first of all, let's look at some of the aspects of the definition of the term.


A general definition for the term convergence from the Concise Oxford English Dictionary illustrates some general characteristics of convergence that are worth exploring, because they are relevant to how the term is applied in genetic genealogy and to the analysis of Y-DNA STR data in particular:
converge 1. come together from different directions so as eventually to meet

convergent 2. Biology (of unrelated animals and plants) showing a tendency to evolve superficially similar characteristics ...
There are several important aspects to these definitions that we can apply to the analysis of STR data (e.g. your 37 marker data). First of all, the sense that things were initially apart, but then they come together. Secondly, the idea that two things can look the same or similar on the surface, but in fact they have come from very different directions. And thirdly, the idea that two things can evolve from something different into something the same.

Let's look at how this more general concept can be applied to the analysis of Y-STR data.

And a good starting point is the description of Convergence on the ISOGG Wiki:
Convergence (also known as evolutionary convergence) is a term used in genetic genealogy to describe the process whereby two different genetic signatures (usually Y-STR-based haplotypes) have mutated over time to become identical or near identical resulting in an accidental or coincidental match.
One can think of convergence as producing misleading matches – two men appear to be more closely related than they actually are. The same situation may result (very occasionally) if there is an exceptional lack of divergence. In other words, so few mutations occurred in the descendants of a common ancestor over the course of time that the common ancestor may appear to have lived only a few hundred years ago when in fact he lived much further back than that, perhaps several thousand years ago.
So let's pick apart some of the key elements of this definition. You might like to refamiliarise yourself with some basic concepts, such as the different types of DNA markers (STRs and SNPs), and what you are actually seeing when you look at the DNA Results page.

Basic Concepts

Firstly, the above description of Convergence refers to the genetic signature - the Y-STR haplotype. This is the string of numbers you see associated with your results on the DNA Results page of the project. I like to think of it as if the Y-chromosomes of all the men in the group were stacked on top of each other, with the individual markers along each chromosome aligned so that there is one column for each marker. Thus in the diagram below, each of the men has a value of 13 for the first marker. The values for the second marker are a mixture of 23 and 24. And so on.

The Y-STR results for the men of Lineage II

Another key point in the above description is the concept that some markers mutate over time e.g. the number changes from 14 to 15. These mutations are identified by comparing the value in each square to the modal value for the entire group (i.e. the most frequent value among the men in that group). The most frequent values for each of the markers are used to generate the "modal haplotype" which is a virtual signature constructed from these most frequent values (and is represented by the row marked "MODE", the 3rd row from the top in the diagram above).

Mutations are indicated by coloured squares. If the value for any marker is the same as the modal value for that marker (i.e. the most common value among the men in that group), then the square that the value is in will not have a colour. If, however, the value is higher than the norm, it will be coloured pink; if it is lower than the norm, it will be coloured purple.

If you and someone else have exactly the same string of numbers, you will have the same coloured squares and the same "no-colour" squares. If you are not exactly identical, you will have some coloured squares that the other person does not have ... and vice versa. In other words, the sequence of numbers, and hence colours, will be different. Each coloured square represents a mutation - a small increase or decrease in the number (compared to the norm) for that particular marker, in that particular individual.
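The way the "MODE" row and the pink/purple colouring are derived can be sketched in a few lines of Python. This is only a minimal illustration and not the spreadsheet the project actually uses. The three five-marker haplotypes are made-up values. If two values are equally frequent in a column, the sketch simply picks the one it encounters first.

```python
from collections import Counter

# Hypothetical five-marker haplotypes, one row per man (illustrative values only)
haplotypes = {
    "Man 1": [13, 24, 14, 11, 15],
    "Man 2": [13, 23, 14, 11, 15],
    "Man 3": [13, 24, 14, 12, 16],
}

def modal_haplotype(rows):
    """Most frequent value in each marker column (the 'MODE' row)."""
    columns = zip(*rows)
    return [Counter(column).most_common(1)[0][0] for column in columns]

def colour_code(row, mode):
    """'pink' if a value is above the mode, 'purple' if below, '' if equal."""
    return ["pink" if v > m else "purple" if v < m else "" for v, m in zip(row, mode)]

mode = modal_haplotype(haplotypes.values())
for name, row in haplotypes.items():
    colours = colour_code(row, mode)
    coloured = sum(1 for colour in colours if colour)
    print(name, colours, f"{coloured} coloured square(s)")
```

Here the MODE row comes out as 13, 24, 14, 11, 15. Man 2 gets one purple square and Man 3 gets two pink squares.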

Convergence in theory

Let's imagine that some distant ancestor living 10,000 years ago gave rise to four distinct lines of descent surviving today (represented by the men A, B, C, and D in the diagram below). Let's look at what happened to their first 37 STR markers over time, and let's assume that mutations only occurred in 5 of these STR markers, as shown in the diagram below. How did the values change over the passage of time, from 10,000 years ago to the present day? And how many of the descendants of this ancestor "match" each other today?

In descendant A, only one of these 5 STR markers mutated. It underwent a single mutation (from 13 to 14) about 6000 years ago, and that was the only mutation over the span of 10,000 years. This is a rather extreme example of "lack of Divergence".

Descendant B had several mutations in his line of descent, but only affecting the first and the fifth markers. These show progressive "forward mutations" away from their original values. With the first marker, the mutations go forward in an upward direction (14,15,16,17) whilst with the fifth marker they go forward in a downward direction (15,14,13,12). The latter may seem counterintuitive, but it serves to emphasise that "forward" means "away from" the original value, no matter if it is up numerically or down numerically.

Descendant C has also experienced mutations in only the first and fifth markers. But here we see two examples of a Back Mutation. The first marker shows a forward mutation 6000 years ago (13 becomes 12) but this has gone back to 13 by 4000 years ago. It then undergoes another forward mutation by the time of the present day (13 to 14). Similarly, the fifth marker undergoes a forward mutation (16 to 17) by 4000 years ago but a Back Mutation by 2000 years ago.

Descendant D undergoes mutations on all 5 of his STR markers. A Back Mutation occurs with the second marker between 2000 years ago and the present day (15 to 14); and likewise with the third marker (12 to 13); and likewise with the fifth marker (17 to 16). Two Back Mutations occur with the fourth marker (29 to 30 by 4000 years ago; and 31 to 30 by the present day).

Mutations over time in 4 distinct lines of descendants

Remember, these are four distinct lines of descent, with the MRCA (Most Recent Common Ancestor) represented by the first row of 5 STR markers in the diagram above. So now let's look to see if any of the mutations that occurred in these four individual lines of descent occurred in parallel i.e. the same mutational change occurred in two completely separate lines of descent.

Have a look at the first marker in A, B and C. All three men developed the same mutation on this marker - a change from a value of 13 to 14. In Lines A and B this change occurred in parallel around 6000 years ago. In Line C, the change occurred in parallel around about the present day.

There is a similar parallel mutation between Lines C and D. Look at the fifth marker - it increases in value from 16 to 17 around 6000 years ago in Line D and 4000 years ago in Line C.

And there is a parallel back mutation present in Lines C and D also - the fifth marker switches from 17 to 16 about 2000 years ago in Line C and around about the present day in Line D.

With Back Mutations we are only looking at a single line of descent. With Parallel Mutations we are comparing two or more lines of descent. And we will see that in practice Parallel Mutations are much more common than Back Mutations and have a much greater role to play in the development of Convergence.
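The two kinds of mutation just described can be flagged mechanically. The short Python sketch below does this under a few stated assumptions:
- Each history lists only the successive values of one marker in one line, oldest first, so the timing is dropped.
- The first value is treated as the ancestral (MRCA) value.
- Following the definition given earlier ("forward" means away from the original value), a step back toward that value counts as a Back Mutation.
- The fifth marker's ancestral value of 16 is inferred from the descriptions of Lines B, C and D.

```python
from collections import defaultdict

# Successive values of one marker in each line of descent, oldest first.
# Timing is dropped; only the order of the single-step mutations is kept.
first_marker = {"A": [13, 14], "B": [13, 14, 15, 16, 17], "C": [13, 12, 13, 14]}
fifth_marker = {"B": [16, 15, 14, 13, 12], "C": [16, 17, 16], "D": [16, 17, 16]}

def back_mutations(history):
    """Steps that move the value back toward the ancestral value history[0]."""
    ancestral = history[0]
    return [
        (before, after)
        for before, after in zip(history, history[1:])
        if abs(after - ancestral) < abs(before - ancestral)
    ]

def parallel_mutations(histories):
    """Identical (from, to) steps that occur in two or more separate lines."""
    lines_by_step = defaultdict(set)
    for line, history in histories.items():
        for step in zip(history, history[1:]):
            lines_by_step[step].add(line)
    return {step: sorted(lines) for step, lines in lines_by_step.items() if len(lines) > 1}

print(back_mutations(first_marker["C"]))   # [(12, 13)]
print(parallel_mutations(first_marker))    # {(13, 14): ['A', 'B', 'C']}
print(parallel_mutations(fifth_marker))    # {(16, 17): ['C', 'D'], (17, 16): ['C', 'D']}
```

Take Line D's fourth marker and assume its unstated intermediate steps were 30, 29, 30, 31, 30. The same function then reports the two Back Mutations described above, (29, 30) and (31, 30).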

The STR results of living people today tell us nothing about their evolutionary history - it is hidden from view

Which brings us to Convergence itself. Let's look at the Genetic Distance between each of these lines of descent. This helps to make the point that the DNA results from living people are only a snapshot in time. They do not tell us anything about how those STR values have evolved over the past 10,000 years:
  • A and B have a Genetic Distance (GD) of 7. This is made up of a 3-step difference on the first marker (14 vs 17) and a 4-step difference on the fifth marker (16 vs 12). And as these were the only changes on their first 37 markers, the GD would be written as 7/37. This exceeds FTDNA's threshold for declaring a match (i.e. 4 steps or less over the first 37 markers; written as 0-4/37) and so A and B would not appear in each other's list of matches.
  • A and C have a GD of zero. They are an exact match. Their GD for the first 37 markers is thus 0/37. They appear in each other's match list and the match looks really close. They think they have a common ancestor in the last few hundred years. They start comparing family trees, looking for the elusive ancestor. They will never find him. This is a wild goose chase. This is the consequence of Convergence.
  • A and D have a GD of 2 (or 2/37). This GD falls within the threshold for declaring a match. They appear in each other's match list. They email each other, looking for the common ancestor - another wild goose chase. Another example of Convergence and its consequences.
  • B and C have a GD of 7/37. No match.
  • B and D have a GD of 9/37. No match.
  • C and D have a GD of 2/37. It's a match. It's Convergence. They don't know that. They spend months researching their connection. It's a wild goose chase.
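The Genetic Distances in the list above can be reproduced with a simple step-counting sketch. It rests on three assumptions:
- Markers 2 to 4 take the values that Line D's Back Mutations return to. Since every line ends on those same values, they contribute nothing to any GD.
- D's first-marker value of 12 is inferred from the stated distances (A-D = 2, B-D = 9). It is one reconstruction consistent with them, not a value read off the diagram.
- Every marker is counted stepwise. FTDNA's own calculation uses a hybrid model that counts some markers differently, so this is a simplified version.

```python
from itertools import combinations

# Present-day values of the five mutating markers in lines A-D.
# Markers 2-4 are assumed values: every line ends up on the same value there.
# D's first marker (12) is inferred from the stated distances, not from the diagram.
present_day = {
    "A": [14, 14, 13, 30, 16],
    "B": [17, 14, 13, 30, 12],
    "C": [14, 14, 13, 30, 16],
    "D": [12, 14, 13, 30, 16],
}
MATCH_THRESHOLD = 4  # a match is declared at a GD of 4 or less over 37 markers

def genetic_distance(a, b):
    """Simple stepwise GD: the sum of the step differences, marker by marker."""
    return sum(abs(x - y) for x, y in zip(a, b))

for (n1, h1), (n2, h2) in combinations(present_day.items(), 2):
    gd = genetic_distance(h1, h2)
    verdict = "match" if gd <= MATCH_THRESHOLD else "no match"
    # The other 32 markers are identical, so the GD is out of 37
    print(f"{n1} and {n2}: GD {gd}/37 ({verdict})")
```

The output reproduces the six comparisons listed: A-C, A-D and C-D fall within the threshold, while A-B, B-C and B-D do not.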

The STR results of people living today tell us nothing about how those STR marker values have evolved over time. They may have come from a relatively recent common source, or they may have come from widely differing directions.

Below is another way of conceptualising how the numerical value of a single STR marker might evolve over time. This marker started out with a value of 8 for the common ancestor of 4 distinct lines of descent. But by the time of the present day, two lines had a value of 9, one had a value of 13 and one had a value of 5. But the evolutionary history of these 4 lines of descent is peppered with Back Mutations and Parallel Mutations:
  • Back Mutations
    • Line 2 (red) - 14 becomes 13 some time between 1000 years ago and the present day (0)
    • Line 4 (purple) - 4 to 5 between 1000 and 0 years ago
    • Line 3 (green) - 5 to 6, 6 to 7, and 7 to 8 between 7000 (7K) and 4000 (4K) years ago
  • Parallel Mutations
    • 8 to 9 in Line 2 (10K to 9K), Line 1 (7K to 6K), and Line 3 (2K to 1K)
    • 8 to 7 in Line 3 (10K to 9K) and Line 4 (9K to 8K)
    • 7 to 6 in Line 3 (9K to 8K) and Line 4 (7K to 6K)
    • 6 to 5 in Line 3 (8K to 7K) and Line 4 (4K to 3K)

The evolution of values in a single STR marker over time in 4 descendant lines of a common ancestor who lived some 10,000 years ago

The consequence of all these Parallel & Back Mutations is that the present day descendants of two of the lines (green Line 3 & blue Line 1) have exactly the same numerical value for this STR marker despite the fact that their evolutionary histories are so different.

This is an example of the evolutionary history for a single STR marker. And if this is representative of all STR markers, then the chances that the values for a particular marker will converge over time are really quite high. But our DNA results usually consist of 37 markers (the standard test most people start with), so what are the chances of the first 37 markers evolving in such a way as to result in convergence of a sufficient number of STR values to cause a coincidental match? ... well, the probability of that happening would be a lot lower. And the probability would be lower still with 67 markers, and lower still with 111 markers. But because so many people have tested (over 600,000 currently), we do see the phenomenon occurring even at higher marker levels (67 and 111).

And in a subsequent post we will look at clues to the presence of Convergence, so that you can look at your own or anyone's list of matches and adjust your suspicion level accordingly.

Convergence in practice

And to illustrate these points, I have temporarily moved one of the ungrouped project members into Lineage II, namely member Jim Treacy (B38804)*. He is third from the end in the diagram below. Don't worry about not being able to read the text (you can click to enlarge the diagram if you like) - just focus on the coloured squares. 

The Y-STR results for the men of Lineage II (with a Treacy third from the end)

And Jim has no coloured squares for the first half of the markers. It is only when we reach the 19th marker in the row that he has a pink square with the value 16 inside it - everyone else in that column has a value of 15 for that marker, except for one person who has a value of 14. And as we continue along Jim's row, there are 4 other coloured squares, bringing the total to 5. This can be expressed as a Genetic Distance of 5/37 from the modal haplotype (i.e. the 3rd row from the top, which - to remind you - is a virtual signature constructed from the most frequent values for each of the markers).

Now a GD of 5/37 between two men would mean that they do not appear in each other's list of matches (because FTDNA have set the threshold for "declaring" a match at 4/37 or less). But among Jim's list of matches at the 37 marker level, there are two members of Lineage II (with a GD of 4/37). And at the 67 marker level, Jim has 6 members of Lineage II among his matches (with a GD of 6 to 7/67). So on the surface it looks as if Jim is relatively closely related to our Lineage II group, and that there may be a common ancestor some time in the past several hundred years, maybe somewhere between 1700 and 1850 (on the basis of TMRCA calculations from the TiP Report).

So what do we do next? Do we start looking for documentary evidence? Do we go back to the church records and land records and old newspapers to see if there is mention of a Gleeson-Treacy connection? 

We could do. But it would be a wild goose chase. Because the Treacy-Gleeson connection is a red herring. And we know this because we have done SNP testing.

Jim has done the Big Y test, as have 10 of the members of Lineage II. Jim and the Lineage II members all belong to Haplogroup R, and they share some SNP markers in common. Each SNP marker characterises a branching point in the Tree of Mankind, and a SNP Progression is a list of these SNP markers down to the finer, "more downstream" branches of the Tree. Here are the SNP Progressions for Jim and for the Lineage II Gleesons:
  • Jim Treacy: R-P312 > Z290 > L21 > DF13 > ZZ10 > Z255 > Z16437 > A557 > Z29008 > A10891
  • Lineage II Gleesons: R-P312 > Z290 > L21 > DF13 > ZZ10 > Z255 > Z16437 > Z16438 > BY2852 > A5631

You can see that the branching points are exactly the same up to and including marker Z16437. Thereafter, Jim goes down one branch and the Gleesons go down another. Now, let's be clear: the Gleesons and Jim do share a common ancestor. And if he were around today, he would test positive for the SNP marker Z16437. But his descendants followed different paths - one path leading down to our present-day Jim Treacy, the other leading down to our present-day Gleesons. You can see where Jim and the Gleesons are placed on the Tree of Mankind in the diagram below.
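Finding the point where two SNP Progressions part company amounts to walking down both lists together until they disagree. Here is a minimal Python sketch using the two progressions quoted above, with the spacing around the ">" separators normalised. The function name `branch_point` is just an illustrative label.

```python
def branch_point(path_a, path_b):
    """Return the most downstream SNP shared by two SNP progressions."""
    shared = None
    for snp_a, snp_b in zip(path_a, path_b):
        if snp_a != snp_b:
            break
        shared = snp_a
    return shared

treacy = "R-P312 > Z290 > L21 > DF13 > ZZ10 > Z255 > Z16437 > A557 > Z29008 > A10891".split(" > ")
gleeson = "R-P312 > Z290 > L21 > DF13 > ZZ10 > Z255 > Z16437 > Z16438 > BY2852 > A5631".split(" > ")

print(branch_point(treacy, gleeson))  # Z16437
```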

And when did this common ancestor live? YFull dates the formation of Z16437 to 1650 years ago. The two markers downstream of this, A557 (Jim Treacy) and A5631 (Gleeson), both have formation dates of 1400 years ago. So from this we can say that the common ancestor of Treacy & the Gleesons lived somewhere between 1400 and 1650 years ago. Or to give it an actual date (by subtracting from 1950, the approximate birth year for members of Lineage II), sometime between 300 and 550 AD.
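The conversion from "years ago" to a calendar date is a single subtraction. Here is a small sketch using the 1950 reference year given in the text; the helper name is only illustrative.

```python
REFERENCE_YEAR = 1950  # approximate birth year of Lineage II members, per the text

def years_ago_to_ad(years_before):
    """Convert a 'years ago' estimate into a calendar year (AD)."""
    return REFERENCE_YEAR - years_before

earliest = years_ago_to_ad(1650)  # formation of Z16437
latest = years_ago_to_ad(1400)    # formation of A557 and A5631
print(f"Common ancestor lived between about {earliest} and {latest} AD")
```

This gives a window of roughly 300 to 550 AD.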

This is clearly a lot further back in time than the 1700-1850 AD estimate suggested by the STR data.

So this is a great example of Convergence. By chance, Jim's STR signature has evolved over time to approximate that of the Gleeson's of Lineage II and as a result, he looks a lot more closely related to the group than he actually is.

Maurice Gleeson
May 2017

* a big thank you to Jim for allowing me to use his name and his results in this example

Gleesons to the left, Treacys to the right, & about 1500 years in between

Friday, 19 May 2017

23andMe Transition arrives in UK & Ireland

Some time ago, 23andMe transitioned their US customers to a new website format, whilst those of us in Europe remained with the old format. That was quite some time ago! But just this week, I have received an email informing me that I will be transitioned to the new format in June 2017. 

Below is the email I received. Of note, all Health Reports will be archived as pdf documents. I received mine before the FDA (Food & Drug Administration) put the extended hold on 23andMe's Health Reports, so I have 63 reports on physical traits, 53 on carrier status for inherited conditions, 25 on drug response, and 122 on health risks for a variety of medical conditions including Alzheimer's Disease and Parkinson's. 

The first bullet point talks about "Ethnicity" but on my screen it is described as "Ancestry" - click on your name (top right), then Edit Profile, & you will see it directly under the Ancestry Information heading. Click on Update.

You can also enter or update your ethnicity by clicking on the green button above (in the email you receive). Of particular note, if you manage several kits, after filling out the survey for your first kit, be sure to switch profiles and complete the survey for each one of your kits.

The new 23andMe experience is discussed on their international webpages here, and additional information for European customers is available on this link here and is abstracted below.

Some of the key features that stand out for me include:
  • some Health Reports may be available (depending on which chip was used - you can find this information on your Download Raw Data page in the Profile box toward the end of the page)
  • the maximum number of matches has increased to 2000
  • linking to online trees is allowed, even if they are with other companies (saves you the hassle of having to upload a gedcom ... which anyway is no longer available with the new experience)
  • when defining haplogroup subclades, they have switched from the old terminology (e.g. R1b1a) to the new one (e.g. I-M253)
  • any connections you currently share with your matches will be maintained in the new experience

One of the best additional features of the new experience will be the Relatives in Common feature. This is similar to the Shared Matches feature on Ancestry and the ICW (In Common With) Matches feature on FamilyTreeDNA.

Maurice Gleeson
May 2017

Thursday, 20 April 2017

7-Day Sale till 27 April 2017

National DNA Day is April 25.
Celebrate with us! 

FamilyTreeDNA are having a 7-Day Sale to celebrate National DNA Day (which marks the discovery of the structure of DNA in 1953). It runs from April 20th until April 27th 2017. The promotion ends at 11:59 pm Central Time on Thursday, April 27th (which is 5:59 am on April 28th in Britain & Ireland). Please note that all items must be paid for by that time, including items ordered through the invoice system (Bill Me Later).
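If you want to check the deadline conversion yourself, here is a quick sketch with Python's standard `datetime` module. It assumes Central Daylight Time (UTC-5) was in effect on 27 April 2017, and that Britain and Ireland were on summer time (UTC+1). Fixed offsets are used so that the sketch does not depend on a time-zone database.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offsets for late April 2017
CENTRAL_DAYLIGHT = timezone(timedelta(hours=-5), "CDT")
BRITISH_SUMMER = timezone(timedelta(hours=1), "BST")  # Irish Standard Time has the same offset

deadline = datetime(2017, 4, 27, 23, 59, tzinfo=CENTRAL_DAYLIGHT)
local = deadline.astimezone(BRITISH_SUMMER)
print(local.strftime("%H:%M on %d %B %Y"))  # 05:59 on 28 April 2017
```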

Below are the items that are on Sale. Of particular note are the Family Finder test for a mere $59 (55 euro, £46), the Y-DNA-37 test for $129 (121 euro, £101), and the SNP Packs for an unbelievably low price of $89 (83 euro, £70).
If you want to dip your toe in the genepool, now is the time - do the Family Finder test. It will give you your ethnic makeup estimates and connect you with cousins on all of your ancestral lines.

If you have thought about researching your surname, or buying a Y-DNA test for a relative, now is the time!

And if you have been advised to upgrade to a SNP Pack, they have never been cheaper and you should take advantage of this limited 7-day offer.

Happy DNA Day!
Maurice Gleeson
April 2017

Tuesday, 31 January 2017

Getting the Most out of your Y-DNA test (from FamilyTreeDNA)

The advice below pertains mainly to people who have tested their Y-DNA at FamilyTreeDNA, but some of the general principles apply to everybody, no matter which test you have done or which company you have tested with.

There are a few essential actions you should take to get the most out of your DNA test. You may not be able to do all of them all at once, so come back to this page often and check it out again to see if there is anything else you could be doing to maximise the value you get from your DNA test.

You may wish to share the link to this page with anyone else who might be interested in doing a DNA test so that they can see what they will get if they do.

Make yourself visible to your cousins

If no one can see you, you won't be able to connect with your cousins. So try to make yourself as visible as possible (or as visible as you feel comfortable with).

1) Prepare your surname's Ancestral Line (from you up to your surname's MDKA). This is the single most important piece of information that you can share. You will need this in your collaborations with other project members. In addition, many projects have a facility for posting this information somewhere on the project-related webpages. For example, in our Gleason/Gleeson DNA Project, these will go up on our Patriarchs & Matriarchs Page on the blog or the Patriarchs Page on the WFN website. This will potentially help other people to connect with you. It would help if you could provide it in the following format:
1) James GLEESON b c1835 Shallee, Co. Tipperary, d 12 Nov 1879 Longstone, Co. Tipperary, m 13 Apr 1860 Maria COYLE, Silvermines, Co. Tipperary
2) Morty GLEESON ...
3) John GLEESON ...
4) Abigail GLEESON …
(but do not include dates for a) births <100 years ago, b) marriages <75 years ago, or c) deaths <50 years ago)
Researcher: (insert your initials here)
Your email address
DNA Kits: (insert your DNA kit numbers)
Link to online tree:
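If you keep your ancestral line in a spreadsheet or script, the privacy rule in the template can be checked automatically. The sketch below is a minimal illustration; the function name `may_publish` and its arguments are hypothetical. It compares calendar years only, so it is approximate to within a year.

```python
from datetime import date

# Minimum number of years that must have passed before each type of date is shown
# (births 100, marriages 75, deaths 50, as in the template)
PRIVACY_YEARS = {"birth": 100, "marriage": 75, "death": 50}

def may_publish(event, event_year, today=None):
    """True if a date of this event type is old enough to include in a shared line."""
    today = today or date.today()
    return today.year - event_year >= PRIVACY_YEARS[event]

# Checked as of January 2017, when this page was written
as_of = date(2017, 1, 31)
print(may_publish("birth", 1835, as_of))     # True
print(may_publish("marriage", 1960, as_of))  # False: only 57 years ago
print(may_publish("death", 1960, as_of))     # True: 57 years ago
```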

2) Use your kit number and password to log in to your personal webpage and explore it. There are a lot of bits & pieces of information you can include on your personal webpage that will optimise your chances of successful collaboration with your DNA matches. And knowing what your DNA results can tell you will help you get the most out of them.

3) You should add your MDKA information (Most Distant Known Ancestor) including dates & locations for both birth and death. The format we recommend is the same as the one above, but you may have to abbreviate it as only a certain number of letters are allowed in this field. Location of birth is the most important piece of information. Here is an example:
James GLEESON b1835 Shallee, Tipp, d1879 Longstone, Tipp
To add this information, simply click on your name in the top right of your homepage - Account Settings - Genealogy - Most Distant Ancestors ... I have posted instructions on how to do this on the following link ...

4) Fill out your MDKA Profile. In essence, this is your Brick Wall. And the more information you can give about it, the better the chance of breaking through it. There are lots of clues and circumstantial evidence from documentary data that may help you identify a possible connection with other members of the group.  This applies to all project members but is most relevant to members with Irish ancestry given that the records tend to peter out about 1800. Check out the MDKA Profile page of the Gleeson DNA Project for instructions on how to complete the profile for your own MDKA. You can also view an example of it here.

5) Add your Ancestral Surnames (click on your name in the top right - Account Settings - Genealogy - Surnames). I suggest putting SURNAMES in capital letters and Locations in normal text, as this makes the surnames "jump out" and the list easier for the reader to scan.

6) Upload your Family Tree as a GEDCOM file so that you have a version of your family tree on your FTDNA webpages. This is particularly important if you have done a Family Finder test (autosomal DNA). You can also add your Family Tree manually if it is easier for you. And if you have a Family Tree online, leave a link to it in the About Me section of your Personal Profile. Click here for specific instructions on uploading a GEDCOM file.

7) Optimise your Privacy settings so that your potential cousins can see your results:
  • Hover over your Name in the top right
  • Click on Account Settings, then the Privacy & Sharing tab at the end of the menu bar above
  • Then simply change the settings under My DNA Results by clicking on the words "Project Members" at the end, and on the next screen checking the box beside "Make my mtDNA & Y-DNA data public". Then press Save.

Before the change
After the change

Check out Project-related Resources

There are a lot of resources that are particularly relevant for Surname DNA Projects and you should check out and use these as you feel appropriate.

8) Join the relevant Surname Project. There are over 9200 of them at FamilyTreeDNA (FTDNA). You can either search for it via Google (simply type in: FTDNA & your surname) or you can search for it via the FTDNA Search page. Once you have joined, the Project Administrator should look at your results (within a week or so) and assign you to a particular group within the project. You can also email the Admin if you have any questions. Their email address is usually on the Home Page of the project.

9) If you join a surname project, check out the various pages of the project website - they usually have a lot of useful information that will help you understand your results. See the Gleason / Gleeson DNA Project blog as an example.

10) Join the relevant Haplogroup projects
Your results will reveal your haplogroup (your branch of the human Y-DNA tree and/or human mtDNA tree). Once your results arrive, make sure you join all the relevant projects as these will assist in the further analysis of your data and in particular your deep ancestry (where in the world your particular ancestors originated several thousand years ago). The projects are run by volunteer project administrators and they are a rich source for advice, guidance, and support. Frequently there is an associated mailing list or Facebook group you can join to keep abreast of up-to-date developments (this is a fast-moving field).

As an example, relevant Y-DNA haplogroup projects for each of the Gleason Lineages identified thus far include the following:

If your haplogroup project is not listed here, you can see if there is a specific project for your haplogroup on this list:

11) Join the relevant Geographical Projects
As an example, relevant Y-DNA geographical projects for each of the Gleason Lineages identified thus far include the following:
There may be other geographical projects that are relevant to your ancestral line and you can find them on this list:

Check out General Resources

There is a lot of information out there about genetic genealogy in general and it can be a bit confusing knowing where to find it. Below is a selection of our "best bits".

12) FTDNA have a lot of useful information in their Learning Centre. Be sure to check out the FAQs (Frequently Asked Questions).

13) The ISOGG wiki is a great place to start looking for general information about any topic related to genetic genealogy, including your particular type of test.

14) Read Kelly Wheaton's beginners’ guide to genetic genealogy:

15) Download and read the e-book from the resources tab on your myFTDNA homepage.

16) There are a variety of different YouTube videos on genetic genealogy which have been prepared by ISOGG members and Project Administrators.
17) Sign up to the relevant genetic genealogy mailing lists, forums and Facebook groups. These can be great sources of help if you have a specific question. See the list here:
I particularly recommend:

18) Read blogs written by experienced genetic genealogists. See this list of genetic genealogy blogs:

19) Read the relevant articles about your specific DNA-test ...

Y-DNA - traces your father's father's father's line
Y-DNA basics:

Mitochondrial DNA (mtDNA) - traces your mother's mother's mother's line
mtDNA testing for advanced users:

These two pages are relevant if you have taken the full mitochondrial sequence (FMS) test:
mtDNA scientific collaboration:

Autosomal DNA (atDNA) - traces all your ancestral lines
Understanding Family Finder results:
Understanding Population Finder results:

Please let me know if any of these links are broken or cease working.

Maurice Gleeson
Jan 2017

Tuesday, 8 November 2016

Instructions for Doing a DNA Test

If you are not sure which DNA test is best for you, you should read this first.

When ordering the standard Y-DNA-37 test from FTDNA, always join a project first in order to avail of the $20 discount (the $169 cost gets reduced to $149). This could either be a surname project or a geographic project. For example, go to the Silvermines DNA Project and click on JOIN in the photo and follow the instructions to buy the test. If you have an additional Coupon Code, enter this in the "Enter Coupon Code" box (& hit Apply) before you click on the green Proceed to Checkout button.

When you get your FTDNA test kit, make sure you fill out all the forms correctly. The form is fairly straightforward but it is easy to leave something out. Be sure to check you have done the following:
  1. Put in your full name (maiden name is preferable for women, for genealogical reasons)
  2. Write your email twice - in upper case and lower case letters. This helps the lab staff read it correctly and helps minimise the risk of their emails to you going to the wrong address or bouncing.
  3. Be sure that you have put in the 5 pieces of information about your credit card, namely: the long number (16 digits), the short number on the back (3 digits), the expiry date, your name and your address. People often forget one or more of these! Alternatively you can pay by PayPal.
  4. Take a photo of the form or make a note of your kit number, the test you ordered and the date you ordered it.
  5. Sign the green consent form - you won't get the full benefit of your results otherwise.

Below is Brad Larkin's excellent video on how to swab (this does not apply to the 23andMe test or the AncestryDNA test as these require saliva samples rather than cheek swabs). The most important considerations are to fast for 1-2 hours before doing the swabbing (so we don't get any food in the sample) and to swab the inside of the cheek for 45-60 seconds (to ensure we get enough cheek cells on the cotton head of the swab). Repeat this procedure for the second sample (for example, using one cheek first for the first sample, and the other one for the second).

Dislodging the head of the swab into the test tube can be a bit tricky, so take your time, do it slowly, and apply constant but gentle pressure.

You will need to buy a padded envelope to post the kit back to the lab and you can easily pick one up at the post office. Get one with bubble wrap on the inside to ensure the plastic test tubes are safe.

The address for posting back to the FTDNA Lab in Houston, Texas is at the bottom of the white form. It is as follows:
FamilyTreeDNA, 1445 North Loop West, Suite 820, Houston, Texas 77008, USA

Postage is a few pounds/euro/dollars. There is no need for a customs form, but if the Post Office insists, be sure to put down "Genealogy kit" and NOT "DNA kit" (these kits are not considered biological samples), as "DNA kit" will delay it unnecessarily at US Customs.

Here is an alternative video about swabbing ... with music! This video is about the Geno 2.0 test kit from National Geographic (but the same principles apply to the test kits from FamilyTreeDNA and LivingDNA).

Maurice Gleeson
Nov 2016

Tuesday, 21 June 2016

Should I upgrade my Y-DNA test to 67 or 111 markers?

If you have done a Y-DNA-37 test with FamilyTreeDNA, you may be wondering if there is any point in testing a higher number of markers (67 or 111) and what would be the benefit of such testing. Is it worth doing it? And if so, why?

Well, the answer is yes, but only under certain circumstances. Outside of these circumstances you might be better off spending your hard-earned cash on a different DNA test ... or on your favourite ice-cream.

Here are the main reasons for upgrading to Y-DNA-67 or Y-DNA-111:
  1. No or few matches at 37 markers
  2. Lots of matches at 37 markers
  3. To assist the Project Administrator with difficulties in placing you in a group
  4. To more precisely estimate how closely two specific people are related
  5. To help the PA identify the branching pattern within a genetic group 
We will look at each of them in turn, but before we do let's mention a few key considerations about Y-DNA testing in general, how matches are identified, and some of the pitfalls involved in the process.

Some general considerations

Firstly, I choose 37 markers as the starting point because most people interested in surname research will have tested to this level. Not everyone will have, however, and some people (especially transfers from National Geographic's Genographic Project) will only have tested to 12 markers. Such low marker levels are not particularly useful for surname research (with rare exceptions), so I will only be addressing upgrading from 37 markers to 67 or 111.

Secondly, it is important to be aware of FTDNA's threshold criteria for declaring a match and listing them in your Matches List. These thresholds are based on Genetic Distance (GD) and are illustrated in the table below (see FTDNA's FAQ page and Privacy Policy page).  Having a GD of 4/37 means that the two individuals being compared are 4 steps away from an exact match (which would usually be expressed as 0/37, or sometimes 37/37).

The thresholds for declaring a match can be summarised as: having a GD at or below 1/12, 2/25, 4/37, 7/67, and 10/111. Each threshold value equates to roughly 10% of the total number of markers. 
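To check that "roughly 10%" rule of thumb, here is a minimal Python sketch that simply encodes the thresholds listed above (the dictionary and function names are my own, not anything from FTDNA) and prints each one as a percentage of the markers tested:

```python
# Maximum Genetic Distance (GD) for declaring a match, keyed by marker level,
# as summarised above: 1/12, 2/25, 4/37, 7/67, 10/111.
MATCH_THRESHOLDS = {12: 1, 25: 2, 37: 4, 67: 7, 111: 10}

def threshold_percent(markers: int) -> float:
    """Return the match threshold as a percentage of the markers tested."""
    return 100 * MATCH_THRESHOLDS[markers] / markers

for markers in sorted(MATCH_THRESHOLDS):
    print(f"{MATCH_THRESHOLDS[markers]}/{markers} = {threshold_percent(markers):.1f}%")
```

This prints 8.3%, 8.0%, 10.8%, 10.4% and 9.0% respectively, so the thresholds sit between about 8% and 11% of the markers tested.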

It is important to be aware that some people who fall within these thresholds will not be related to you within "a genealogical timeframe" (which we will take to be about the last 1000 years or so). Similarly, some people who fall outside these thresholds WILL be related to you "within a genealogical timeframe".

Also, it is important to appreciate that these thresholds are arbitrary. They are designed to maximise the number of true positives (high sensitivity) and minimise the number of false positives (high specificity). However, some true positives will escape being caught and some false positives will sneak through. And one or the other scenario may affect some people more than others. The question is: how do you recognise this? How do you separate the wheat from the chaff? Your chances of being able to do this are substantially increased by joining the appropriate surname and haplogroup projects and liaising with the Project Administrators because they have better oversight of the totality of the data within a genetic group and also have additional tools that they can use to better define how closely you are related to other people.
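For readers unfamiliar with the terms, sensitivity and specificity can be written down in a couple of lines. In this sketch a "true positive" is a man who is related within a genealogical timeframe and is declared a match, and a "false positive" is an unrelated man who is declared a match anyway. The counts in the example are purely hypothetical, chosen only to show the arithmetic; they are not FTDNA data.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of genuinely related men that the threshold catches as matches."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg: int, false_pos: int) -> float:
    """Share of unrelated men that the threshold correctly leaves out."""
    return true_neg / (true_neg + false_pos)

# Hypothetical counts, for illustration only (not FTDNA data).
print(sensitivity(true_pos=90, false_neg=10))   # 0.9
print(specificity(true_neg=950, false_pos=50))  # 0.95
```

Raising a threshold tends to catch more true positives (higher sensitivity) at the cost of letting more false positives through (lower specificity), which is why no single threshold suits everyone.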

Interpreting Genetic Distance is just as arbitrary as defining a threshold for "declaring a match" and our thinking on this subject is likely to change over time. The table below is derived from FTDNA FAQ pages relating to Genetic Distance at 12, 25, 37, 67, and 111 markers respectively. Match Thresholds are highlighted in yellow.

There is some apparent inconsistency at the 111 marker level when comparing the Match Threshold (a GD of 10/111 or less) with the interpretation of Genetic Distance (where a GD of 10/111 is interpreted as Not Related). If two people with a GD of 10/111 are Not Related, why declare them as a match?

Furthermore, with the advent of SNP testing and our increasing experience from surname and haplogroup projects, there is now strong evidence that these interpretations can be wildly wrong. Even two same-surname individuals with a GD of >10/37 could be related within a genealogical timeframe (Farrell DNA Project, group R1b-GF2). The interpretations above should therefore be used only as a guide.

So now let's look at the specific scenarios where it might be worthwhile upgrading. What follows expands on the advice already given by FTDNA in its FAQ pages.

Scenario 1:  No or few matches at 37 markers

If you have (say) no matches at the 37 marker level, it could be because someone has a Genetic Distance to you of (say) 5/37 ... in other words, there are 5 differences between you both in the first 37 markers. However the threshold for "declaring a match" is 4/37, and so neither of you will appear in the other's Matches List.

But if you both upgrade to 67 markers, and there are no further differences between you on markers 38 through 67, then the number of differences remains at 5 and the Genetic Distance is written as 5/67. This is within the threshold for declaring a match (7/67), and thus you will each appear in the other's Matches List.

In short, upgrading to 67 markers has revealed an additional match that was "hidden" at the 37 marker level.
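The same example can be sketched in a few lines of Python. As a simplifying assumption, each difference counts as one step of Genetic Distance, and the 5 differences found in markers 1-37 are carried over unchanged because markers 38-67 add none. The function name is hypothetical.

```python
# FTDNA thresholds as summarised earlier: maximum GD per marker level.
MATCH_THRESHOLDS = {12: 1, 25: 2, 37: 4, 67: 7, 111: 10}

def is_listed_match(genetic_distance: int, markers: int) -> bool:
    """True if the GD is at or below the match threshold for this marker level."""
    return genetic_distance <= MATCH_THRESHOLDS[markers]

gd_37 = 5          # 5 differences in markers 1-37
gd_67 = gd_37 + 0  # no further differences in markers 38-67

print(is_listed_match(gd_37, 37))  # False: 5/37 is outside the 4/37 threshold
print(is_listed_match(gd_67, 67))  # True: 5/67 is within the 7/67 threshold
```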

The same scenario may also apply at the 111 marker level. But the big caveat is you can only compare yourself with other people who have upgraded to at least the same marker level. You cannot detect more matches by upgrading to 67 markers if everyone else is still at 37 markers. Of the 238,000 people with Y-DNA-37 data in the FTDNA database, only 33% of them have Y-DNA-111 data.

There are several reasons why you may have no or few matches:
  • you may be the first person with your Y-DNA signature to do the test
  • your DNA signature may be very rare because you are the last of your line, or few people with that particular signature are left in the world
  • you may have unusual mutations which have moved you away from the rest of your group

Scenario 2:  Lots of matches at 37 markers 

If you have lots of matches at 37 markers, either your Y-DNA signature is very common in the population or you are a victim of Convergence. This is where, just by chance, people have a similar genetic profile to you that makes them fall within the matching threshold, but the common ancestor lived thousands of years ago rather than hundreds of years ago. 

Upgrading to higher marker levels will help weed out many of these Convergent matches but may not eliminate them completely. Convergence has been observed with a GD of 3/111 in the Stewart DNA Project (see this YouTube video from 28:50 onwards).

Scenario 3:  To assist the Project Administrator with difficulties in placing you in a group

Sometimes it can be difficult to allocate project members to a specific genetic group within a surname project, for example if the GD is borderline (e.g. 5/37) and/or the member has a surname variant that may or may not be related (e.g. Farrell and Harrell).

In these circumstances upgrading to a higher level of markers may provide additional supportive evidence for grouping you in a specific group (e.g. if the GD remained the same at 67 markers, namely 5/67, then this would be stronger evidence for including you in a specific group).

This scenario may be particularly relevant to you if you are in the Ungrouped category in a surname project. If so, ask your Project Administrator if upgrading to 67 or 111 markers would help him or her with the grouping process. 

Scenario 4:  To more precisely estimate how closely two specific people are related

Upgrading to 67 or 111 markers can help provide supportive data of a very close relationship on the direct male line. However, this should probably be done in conjunction with autosomal DNA testing (and even mtDNA testing) as the Y-DNA-111 test on its own is not conclusive.

FTDNA says that over 50% of exact matches at 111 markers (GD = 0/111) are first cousins. Similarly, over half of matches with a GD of 1/111 are 2nd cousins or closer, 2/111 are 4th cousins or closer, 3/111 are 5th cousins or closer, and so on (see full Table here). 

In short, upgrading to 111 markers will give you a better estimate of how closely you are related to someone else but will not define it precisely. There will still be quite a broad range around the "best guess". To pinpoint which ancestor on the direct male line is the common ancestor between two people, it may be necessary to do autosomal DNA testing to estimate the degree of kinship, or to additionally test selected cousins of one or both matches in order to triangulate with atDNA testing or even mtDNA testing (the latter technique was used to identify WWI soldiers found in Fromelles).

Scenario 5:  To help the PA identify the branching pattern within a genetic group

As surname projects mature, some Project Administrators may take on the task of better defining the branching pattern within certain genetic groups within the project. I am attempting this in the Gleason/Gleeson DNA Project (you can see more about it in this YouTube video).

This process of building a Mutation History Tree (or cladogram or phylogram) is not easy and requires a lot of work. It is best done with 111 STR marker data combined with SNP data (e.g. via the Big Y test). In the future, the number of STRs available to test may increase to 500 or more (e.g. via YFULL) and testing out to 500 markers may become the preferred option. Furthermore, this process requires that many people within a genetic group have this data available.  It is thus quite a costly undertaking for group members.
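One very simplified first step towards a Mutation History Tree can be sketched in Python: work out the modal (most common) value at each marker across the group, then look for non-modal values shared by two or more members, since a shared mutation hints at a shared branch. This is an illustration under stated assumptions, not the method used in the Gleason/Gleeson project: the members and marker values below are hypothetical, and the sketch ignores Back Mutations, Parallel Mutations, differing mutation rates and SNP data, all of which a real tree must take into account.

```python
from collections import Counter, defaultdict

def modal_haplotype(haplotypes):
    """Return the most common value at each marker across the group."""
    markers = next(iter(haplotypes.values())).keys()
    return {
        m: Counter(h[m] for h in haplotypes.values()).most_common(1)[0][0]
        for m in markers
    }

def shared_mutations(haplotypes):
    """Map each non-modal (marker, value) carried by 2+ members to those members."""
    modal = modal_haplotype(haplotypes)
    carriers = defaultdict(list)
    for member, h in haplotypes.items():
        for marker, value in h.items():
            if value != modal[marker]:
                carriers[(marker, value)].append(member)
    # A mutation carried by only one member says nothing about branching.
    return {k: v for k, v in carriers.items() if len(v) > 1}

# Hypothetical members and marker values, for illustration only.
group = {
    "A": {"DYS393": 13, "DYS390": 24, "DYS19": 14},
    "B": {"DYS393": 13, "DYS390": 23, "DYS19": 14},
    "C": {"DYS393": 13, "DYS390": 23, "DYS19": 15},
    "D": {"DYS393": 13, "DYS390": 24, "DYS19": 14},
    "E": {"DYS393": 13, "DYS390": 24, "DYS19": 14},
}
print(shared_mutations(group))  # {('DYS390', 23): ['B', 'C']}
```

Here members B and C share a DYS390 value that differs from the rest of the group, so they are candidates for a sub-branch, while C's DYS19 value is a "private" mutation seen in one man only.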

However, defining the branching pattern within a genetic group brings several specific potential benefits. It can more accurately define how long ago different branches of the family broke away from each other, and how closely specific individuals within a family are related. This can be very useful for both historical studies of the family and the personal genealogical research of individual members. It can also indicate where Back Mutations and Parallel Mutations occurred within a particular genetic group, and this furthers our understanding of the nature of these mutations which usually remain hidden.


So if you think you fall into one of the above categories, consider upgrading your Y-DNA-37 results to the 67 or 111 marker level. You can do it in a step-wise fashion as there is (usually) no extra cost in doing it this way rather than upgrading to the highest level all at once. And this potentially saves you money because all your questions may be answered by simply upgrading to 67 markers.

If you do not fall into one of the above categories, you may benefit more from some other test, such as Y-SNP testing or autosomal DNA testing. It all depends on the questions you want answered.

Defining the genealogical questions clearly in your own head will enable you to better arrive at the optimal testing strategy to answer your questions.

Maurice Gleeson
June 2016

Monday, 16 May 2016

Y-DNA matches with Different Surnames

Why do I have Y-DNA matches that don't have the same surname as me? 

This is a common question that is asked when people first get their Y-DNA results. And there are several explanations for it. The Y-DNA test only compares Y chromosome DNA to other Y chromosome DNA. A “match” between two men usually means one of three Scenarios (bear in mind there are exceptions to every general rule):

Scenario 1. 
The two men are related via a common ancestor who lived some time since the appearance of surnames (e.g. within the last 1000 years or so in Britain & Ireland). And there are several sub-scenarios in this situation:
a) the two men have the same surname - in which case, they are probably related via a common ancestor (who bore that same surname) some time within the last c.1000 years. This is the scenario we are most interested in and forms the basis of surname studies.
b) the two men have different surnames - in which case an NPE may be present i.e. Non-Paternity Event (or Not the Parent Expected). In other words, both men have a common ancestor within the last c.1000 years, but the surname on ONE of their lines (we don’t know which one) has changed over the years because of a secret adoption, or infidelity, or illegitimacy, etc. Postscript: as mentioned in the Comments below, there are many other possible causes for "surname discontinuity". For example, some families adopted new surnames after emigrating to the US, perhaps to make the name sound more English. And of course some societies adopted inherited surnames quite late (e.g. Turkey in 1934) or not at all (e.g. Iceland, Tibet).

Scenario 2. 
The two men are related before the appearance of surnames (e.g. pre-1000 AD) - in this scenario, the two men will have different surnames (with rare exceptions). This scenario can arise where there has been very little mutation in the DNA over the course of the last c.1000 years or so. Or where there has been a degree of Convergence (see below).

Scenario 3. 
The two men are related but much further back than they look. This is because of Convergence, where the two genetic profiles were identical 10,000 years ago (for example), but then mutated away from each other gradually over the millennia, and then (by chance) started mutating back towards each other so that it looks like the common ancestor is closer than he is (say 500 years ago rather than 10,000 years ago). Convergence is still being studied and not a huge amount is known about how commonly it is encountered. It is likely that it is more common in some haplogroup subclades than in others.

So in the situation where a man matches a man with a different surname, these are either cases of Scenario 1b (NPE) or Scenario 2 (pre-surname match) or Scenario 3 (Convergence). How can you distinguish between these three scenarios? Not easily, but there are certain clues that can help.

If one of the men matches other people with his surname, then it is less likely that his particular surname is the result of an NPE. And if the other man matches nobody with his surname (and there are people with his surname in the FTDNA database that he could potentially match), then the likelihood that an NPE has occurred somewhere along that man's direct male line is higher. On the other hand, if both men match others with their surname, then perhaps this is a case of Convergence.

If the two men have tested to 37 markers (or higher) and are exact matches, then this makes Scenario 1b more likely (i.e. an NPE has occurred somewhere in the past). The likelihood increases if there is an exact match at 67 markers or 111 markers. And on the contrary, the less close the match is (say 4/37 or 3/37), then the more likely this is a case of Scenario 2 (pre-surname match) or Scenario 3 (Convergence).

Looking at the terminal SNP results of a man's matches may give a clue as to which of the three scenarios is most likely to be present. You can examine the terminal SNPs of a man's matches (at the 111 marker level down to the 25 marker level) and see which SNPs are most common among his matches. Then by plotting these SNPs on the haplotree* you can get some indication whether or not there is evidence of Convergence (i.e. the SNPs fall onto different branches of the haplotree) or no evidence of Convergence (all of the SNPs fall onto the same branch of the haplotree). If there is no evidence of Convergence, then this makes Scenario 1b or Scenario 2 more likely. In the example below, the terminal SNPs of a man's matches all fall below SNP L226, suggesting that he and his matches all sit on the L226 branch of the haplotree. However, there may be some Convergence further downstream, as two of his matches sit on different branches below SNP FGC5628.
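The "plotting" step amounts to finding the most downstream SNP that sits above all of the terminal SNPs in question. Here is a minimal Python sketch of that idea. The tree fragment is hypothetical: only L226 and FGC5628 come from the example above (with FGC5628 downstream of L226, as the example implies), while SNP_A, SNP_B and SNP_C are invented placeholder names. In practice the parent-child relationships would be taken from a published haplotree such as FTDNA's, ISOGG's or YFULL's.

```python
# Hypothetical haplotree fragment: child SNP -> parent SNP.
PARENT = {
    "FGC5628": "L226",
    "SNP_A": "FGC5628",
    "SNP_B": "FGC5628",
    "SNP_C": "L226",
}

def path_to_root(snp):
    """List the SNP and all its ancestors, from the SNP up to the top of the fragment."""
    path = [snp]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def common_branch(snps):
    """Return the most downstream SNP that is upstream of (or equal to) every input SNP."""
    paths = [path_to_root(s) for s in snps]
    shared = set(paths[0]).intersection(*paths[1:])
    # Walking up from the first SNP, the first shared SNP is the most downstream one.
    return next((s for s in paths[0] if s in shared), None)

print(common_branch(["SNP_A", "SNP_B"]))           # FGC5628
print(common_branch(["SNP_A", "SNP_B", "SNP_C"]))  # L226
```

If the common branch is old and high up the tree, the matches are only distantly connected through that SNP, which is the pattern the text associates with Convergence.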

Performing additional downstream SNP testing (e.g. a SNP Pack or Big Y test) will help differentiate between the three scenarios. Here is what you might expect:
Scenario 1b (NPE) - the two men sit on the same downstream branch that is associated with the surname of one of them. The age of the common SNP might be somewhere in the last 1000-2000 years.
Scenario 2 (pre-surname match) - the two men sit on the same branch upstream (i.e. representative of a major subclade of the haplogroup, say L226). The age of the common SNP might be somewhere in the last 2000-8000 years.
Scenario 3 (Convergence) - the two men sit on completely different (i.e. very distantly related) branches of the haplotree and the common SNP is (say) >8000 years old.
If a recent NPE is suspected, autosomal DNA testing can help establish if the two men are closely related (i.e. within the past 5 generations or so).
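As a rough aid to memory, the age ranges above can be turned into a small lookup. This sketch is deliberately crude and rests on assumptions of my own: the boundaries at 2000 and 8000 years are simply the approximate figures given above, and it looks only at the age of the common SNP. In reality the branch position (e.g. whether the shared branch is associated with one man's surname) matters just as much, and the function name is hypothetical.

```python
def likely_scenario(common_snp_age_years: float) -> str:
    """Map the approximate age of the shared SNP to the scenario it most suggests."""
    if common_snp_age_years <= 2000:
        return "Scenario 1b (NPE)"
    if common_snp_age_years <= 8000:
        return "Scenario 2 (pre-surname match)"
    return "Scenario 3 (Convergence)"

print(likely_scenario(1500))   # Scenario 1b (NPE)
print(likely_scenario(4000))   # Scenario 2 (pre-surname match)
print(likely_scenario(10000))  # Scenario 3 (Convergence)
```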

Using these techniques will help distinguish between the three possible scenarios but in many cases there is unlikely to be a single definitive test that will give you the answer. The best you might be able to hope for is that taking all the evidence together, the balance of probabilities points toward a particular scenario as being the most likely.

Example: Plotting terminal SNP results of a man's matches shows that they all fall below SNP L226
(i.e. no evidence of Convergence before SNP FGC5628)

*I use FTDNA's haplotree but you can use others too - ISOGG, the Big Tree, or YFULL's tree