Tuesday, 5 April 2016

Chromosomes, Markers & Evolutionary Trees

This blog post (from the Gleason/Gleeson DNA Project blog) discusses some of the basic principles behind Y-DNA testing, the different types of DNA, the different types of marker, and their practical applications, including the construction of evolutionary trees.


Let's recap on some of the basic science behind Y-DNA as this will help you understand what you are seeing when you look at your results, and how your results can be applied in practice.

Chromosomes - a closer look

We have 46 chromosomes, arranged in 23 pairs. Each pair consists of 2 copies, one of which you got from your mother, the other from your father. So for example, you have one paternal chromosome 14 and one maternal chromosome 14. Before you were conceived, your father made a copy of each of his 46 chromosomes but only passed on one copy from each pair to you. Similarly your mother made copies of all her 46 chromosomes but only passed on to you one copy from each pair. In this way the 23 chromosomes you got from your father combined with the 23 from your mother to bring your chromosome count back up to the usual 46.

The 23rd pair is also known as the sex chromosomes. There are two types of sex chromosome - an X and a Y. At conception, if two X chromosomes combine, a female child is produced (XX). If an X and a Y chromosome combine, a male child is produced (XY). Women (XX) only have an X chromosome to pass on to their offspring, whereas men (XY) can pass on either an X or a Y to their offspring. Therefore the man's contribution determines the sex of the child. Women do not have a Y chromosome and so cannot do this particular DNA test.

Thus the Y chromosome is only passed on from father to son. This is why it is perfect for tracing the father's father's father's line and is the main type of DNA used for surname studies. Be aware though that it only assesses this single ancestral line, and if you go back 10 generations, this represents only 1 of your 1024 ancestors (which is equivalent to about 0.1% of your ancestors at that particular level).
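
The arithmetic behind that figure can be checked with a short Python sketch. The function names are my own, purely for illustration: each generation back doubles the number of ancestral positions, and the Y-DNA line occupies just one of them.

```python
def ancestors_at_generation(generations_back: int) -> int:
    """Number of ancestral positions at a given generation (2 parents, 4 grandparents, ...)."""
    return 2 ** generations_back


def y_line_share(generations_back: int) -> float:
    """Fraction of the positions at that generation that lie on the direct father's line."""
    return 1 / ancestors_at_generation(generations_back)


for g in (1, 2, 5, 10):
    # Prints the generation, the number of positions, and the Y-line share as a percentage
    print(g, ancestors_at_generation(g), f"{y_line_share(g):.2%}")
```

At 10 generations this gives 1024 positions, of which the direct male line is one (a little under 0.1%).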

Each of our 46 chromosomes consists of a long double-stranded helix of DNA. If we unwrapped it, it would look like a long ladder extending into infinity, or a railway track running from New York to Los Angeles. It's huge. If you untwisted all 46 chromosomes from a single cell, the DNA would stretch for 2-3 metres (6-10 feet). All the untwisted DNA from the human body would stretch to the moon and back thousands of times.

All along the "ladder" are the nucleotide bases, like rungs in the ladder, binding each strand of the helix to the other strand of the helix. The bases are called A, T, C, and G, after the first letters in their respective names - Adenine, Thymine, Cytosine, & Guanine. A only ever binds with T, C only ever binds with G. You can remember this by thinking the straight-sided letters only bind to each other, and the curved letters bind only to each other. Each base pair effectively forms a rung in the ladder.

Because A only ever binds with T, and C only ever binds with G, if we know the sequence of bases on one strand of the helix, we automatically can tell what bases are on the other strand. Therefore, the sequence of bases along the DNA is only ever written as a single line of letters (e.g. ATCCGAATTGG). The sequence is read from what is called the 5' (5 prime) end of the DNA molecule (and is read toward the 3' end, like reading from left to right).
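
Here is a minimal Python sketch of that pairing rule, using the example sequence above. The function names are my own. Because the two strands run in opposite directions, the partner strand read from its own 5' end is the complement written backwards (usually called the reverse complement).

```python
# Base pairing: A binds with T, C binds with G
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base sitting opposite each base of `strand`, in the same left-to-right order."""
    return "".join(PAIRS[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the partner strand as it would be read from its own 5' end."""
    return complement(strand)[::-1]


print(complement("ATCCGAATTGG"))          # TAGGCTTAACC
print(reverse_complement("ATCCGAATTGG"))  # CCAATTCGGAT
```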

In each pair of chromosomes, the two copies (maternal and paternal) are virtually identical to each other in terms of size, length, morphology, etc. The exception is the sex chromosome pair, X and Y: the X chromosome is about 3 times bigger than the Y chromosome.

Although the two chromosomes in a pair are virtually identical, there are subtle differences in the nucleotide bases that run along their length. These variations in the bases are called mutations. The specific locations along the chromosome where mutations occur are referred to as DNA "markers", and because each marker sits at a known position it can be given a particular name (e.g. DYS390 or Z255). People who share the same mutation may have inherited it from a shared Common Ancestor, and this is why DNA can be so helpful for genealogy.

A note on terminology: Y-DNA refers to the Y chromosome. Autosomal DNA refers to all the chromosomes EXCEPT the last pair (Pair 23, the sex chromosomes, X and Y - all the other chromosomes are called autosomes, hence autosomal DNA). Mitochondrial DNA refers to the DNA found in mitochondria (the "batteries" that power each cell). For a more detailed introduction to the three types of DNA test and how they are applied in genealogy, watch this YouTube video here.

The different types of DNA marker

There are two types of DNA marker - STR markers and SNP markers.

STR stands for Short Tandem Repeat and the key word here is "repeat". An STR marker is a sequence of bases repeated many times (e.g. CATCATCATCAT). In this example, the sequence is CAT and the repeat value of the sequence is 4. When the DNA is being copied before being passed on to any offspring, there are occasional mistakes made in the copying process. So for example, a copying mistake in the CAT sequence above might result in 3 repeats instead of 4, and so the value of that marker may shift from 4 in the parent to 3 in the offspring. This may be the first mistake to be made in this particular marker for many generations, and so not only will the male child differ from his father, grandfather, and great grandfather, but also from all his male siblings and cousins, who will all have a value of 4 for this particular marker.
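
To make the idea of a "repeat value" concrete, here is a small Python sketch that counts how many times a motif is repeated back-to-back. The flanking bases (GGA and TTC) and the function name are made up for illustration; real STR markers have longer motifs and known flanking sequences.

```python
import re


def max_tandem_repeats(sequence: str, motif: str) -> int:
    """Length (in repeats) of the longest uninterrupted run of `motif` in `sequence`."""
    runs = re.findall(f"(?:{re.escape(motif)})+", sequence)
    return max((len(run) // len(motif) for run in runs), default=0)


# Hypothetical stretches of DNA: the same marker in a father and his son
father = "GGA" + "CAT" * 4 + "TTC"
son = "GGA" + "CAT" * 3 + "TTC"   # one repeat lost through a copying mistake

print(max_tandem_repeats(father, "CAT"))  # 4
print(max_tandem_repeats(son, "CAT"))     # 3
```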

The second type of DNA marker is the SNP marker, which stands for Single Nucleotide Polymorphism. The key idea here is "substitution" - a single base at a specific location changes from what it normally is to a different base (e.g. an A changes to a C or a T or a G). Whereas the STR markers involve several bases in a row, the SNP marker only involves the substitution of a single base.
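
A substitution like this can be spotted by comparing two sequences base by base. The Python sketch below is only an illustration (the sequences, positions, and function name are invented), and it reports each difference as the position followed by the change, e.g. 5 (G>A).

```python
def find_substitutions(reference: str, sample: str, start_position: int = 1):
    """List single-base differences between two equal-length sequences.

    Each result is a tuple of (position, reference_base, sample_base).
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be the same length for a base-by-base comparison")
    return [
        (start_position + i, ref, alt)
        for i, (ref, alt) in enumerate(zip(reference, sample))
        if ref != alt
    ]


# Hypothetical reference and sample sequences differing at one base
reference = "ATCCGAATTGG"
sample = "ATCCAAATTGG"

for position, ref, alt in find_substitutions(reference, sample):
    print(f"{position} ({ref}>{alt})")  # 5 (G>A)
```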

Kelly Wheaton has written some excellent blog posts about DNA markers on the Y chromosome. You can read them by clicking here - STR markers & SNP markers.

There are some very important characteristics of STR and SNP markers which are key to understanding how they are applied in surname studies:
  • Mutations in STR markers are written as the value of the marker (e.g. 12) whereas mutations in SNP markers are given names (e.g. Z255) or are written as the location on the chromosome followed by the change that occurred in the bases there. For example, 17349992 (G>A) indicates that a G has been replaced by an A at position 17349992.
  • The mutation rate of STR markers varies from marker to marker. Some mutate relatively quickly (e.g. 1 mutation every 5 generations) whilst others mutate very slowly (e.g. 1 mutation every 500 generations). Mutations in slow-mutating markers are very good for studying human migration, whereas mutations in fast-mutating markers can be very useful for genealogy research (in the last 500 years or so).
  • A big problem with STR markers is that they can mutate back as well as forward. So for example an STR marker may have a value of 4 which changes to a 3 and then back to a 4. The first mutation (4 to 3) may have occurred 1000 years ago, and the second one (3 back to 4) may have occurred 300 years ago. The trouble is that the Back Mutation masks the fact that there was a significant mutation 1000 years ago and this may result in people with the 4 value being assigned to the wrong branch of the human evolutionary tree and hence the wrong family tree!
  • Another problem with STR markers is the Parallel Mutation. This happens when two very separate branches of the same family experience the same mutation "in parallel", giving the impression that the two branches are more closely related than they actually are in reality.
  • A further problem with STR markers is that it is very difficult to identify a Back Mutation or a Parallel Mutation, and as a result we don't know how often they occur. We suspect that they happen fairly frequently; a marker value may perhaps mutate back as often as it mutates forward. We really don't know. But such "hidden" mutations may seriously confound our interpretation of the data and may result in people being placed on the wrong branches of the human evolutionary tree.
  • Convergence is the name given to the situation when Back Mutations and Parallel Mutations on STR markers result in people appearing to be more closely related to each other than they actually are. This is a big problem when comparing people at 12 markers, but less of a problem when comparing at higher numbers of markers (e.g. 37, 67, or 111). However, even at 67 markers significant Convergence has been detected.
  • On the other hand, SNP markers mutate much more slowly. And because there are so many of them, Back Mutations and Parallel Mutations are extremely rare (and easily spotted). For this reason, when using DNA markers to place people on the human evolutionary tree, SNP markers trump STR markers i.e. more reliance is given to SNP markers than to STR markers.
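
The way Back Mutations and Parallel Mutations hide themselves can be shown with a toy comparison in Python. The marker names and values are invented, and the "genetic distance" here is a simple step-wise count that I am assuming for illustration; it is not necessarily how any testing company calculates it.

```python
def genetic_distance(haplotype_a: dict, haplotype_b: dict) -> int:
    """Sum of the absolute differences in repeat value over the markers both men share."""
    shared = haplotype_a.keys() & haplotype_b.keys()
    return sum(abs(haplotype_a[m] - haplotype_b[m]) for m in shared)


# Hypothetical marker names and values, for illustration only
ancestor = {"STR_1": 4, "STR_2": 12, "STR_3": 9}
after_forward = {"STR_1": 3, "STR_2": 12, "STR_3": 9}  # STR_1 mutates from 4 to 3
after_back = {"STR_1": 4, "STR_2": 12, "STR_3": 9}     # STR_1 later mutates back to 4

print(genetic_distance(ancestor, after_forward))  # 1
print(genetic_distance(ancestor, after_back))     # 0 (two mutations occurred, neither is visible)

# Parallel Mutation: two separate branches independently change STR_2 from 12 to 13
branch_a = {"STR_1": 4, "STR_2": 13, "STR_3": 9}
branch_b = {"STR_1": 4, "STR_2": 13, "STR_3": 9}
print(genetic_distance(branch_a, branch_b))       # 0 (they look identical despite separate histories)
```

In both cases the STR values alone cannot tell you what really happened, which is why SNP markers are given more weight.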


Y-DNA, Population Migration, & the Human Evolutionary Tree

Because the Y chromosome is passed on virtually unchanged from father to son, and because mutations in the DNA markers along the Y chromosome happen relatively infrequently, it is also an extremely useful tool for studying the last great human migration out of the African Motherland (about 50,000 years ago) that ultimately led to the populating of the entire planet. There is an excellent interactive animation of human migration here, including the various ice ages and the catastrophic eruption of the Mount Toba volcano that almost destroyed Mankind.

Population geneticists have been studying the evolution of mutations on the human Y chromosome (and on mitochondrial DNA) for many years and have developed an evolutionary tree based on these mutations (called the Haplotree).  They refer to each of the major branches of the tree as Haplogroups and have named them after the letters of the alphabet (e.g. Haplogroup R, or its subgroup Haplogroup R1b). You can think of a Haplogroup as a group of people with a broadly similar genetic signature.

As modern humans moved around Africa and then moved out of Africa and spread to different places around the world, the humans who moved to Europe developed a totally different set of mutations to those humans who moved to India or Australia (for example). Thus certain haplogroups are found more commonly in Europe (e.g. R1b, I2b) than in India (e.g. H, L) or Australia (e.g. C, T).

Furthermore, genetic genealogy is a very young science, and more markers are being discovered all the time (thanks to novel tests like the Big Y test from FTDNA). As a result, scientists are still discovering finer and finer sub-branches of the human evolutionary tree, and we are approaching the point where we will discover the finer branching patterns associated with individual surnames (such as those in the Gleason/Gleeson DNA Project).

The old nomenclature for the various branches of the tree used a long string of letters (e.g. R1b1a2a1a2c1e) but this has been superseded by a system that simply puts the main Haplogroup letter followed by the "terminal SNP" (e.g. R-Z255). You can still see both terminologies in use on the ISOGG tree.

The terminal SNP refers to the SNP marker that currently occurs at the end of a branch. The word "currently" is important because as new SNP markers are discovered the current terminal SNP marker is likely to be replaced with a new one, and we will continue to move further and further down the finer branches of the tree until we identify SNP markers that are specific for your own family branch and even single individuals.
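
The idea of a terminal SNP can be sketched in Python with a toy tree. Only P312 and Z255 are real marker names here (and the branches that sit between them on the real tree are left out); the HYPO_ markers and all the function names are invented for illustration.

```python
# Toy tree: each SNP points to the SNP immediately above it.
# HYPO_* markers are made up; intermediate branches between P312 and Z255 are omitted.
PARENT = {
    "P312": None,
    "Z255": "P312",
    "HYPO_1": "Z255",
    "HYPO_2": "HYPO_1",
    "HYPO_3": "Z255",
}


def depth(snp: str) -> int:
    """Number of steps from `snp` up to the top of the toy tree."""
    steps = 0
    while PARENT[snp] is not None:
        snp = PARENT[snp]
        steps += 1
    return steps


def terminal_snp(positive_snps: set) -> str:
    """The furthest-downstream SNP (among those on the tree) that a man tested positive for."""
    known = [snp for snp in positive_snps if snp in PARENT]
    return max(known, key=depth)


def label(haplogroup_letter: str, positive_snps: set) -> str:
    """Short-form name: main Haplogroup letter followed by the terminal SNP."""
    return f"{haplogroup_letter}-{terminal_snp(positive_snps)}"


print(label("R", {"P312", "Z255"}))            # R-Z255
print(label("R", {"P312", "Z255", "HYPO_1"}))  # R-HYPO_1
```

The second line shows why the word "currently" matters: once a new downstream marker is discovered and tested, the same man's label moves further down the tree.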

This will eventually allow us to reconstruct family trees based on DNA marker mutations. These are sometimes called phylogenetic trees, sometimes cladograms or phylograms, but my favourite is Mutation History Trees because it sounds similar to Family History Trees. The difference between the two is that Family History Trees are constructed using named individuals, whereas Mutation History Trees use DNA markers. It should be possible to superimpose one upon the other and in this way we can look "beyond the Brick Wall" of individual pedigrees and see where different family branches are likely to connect. This in turn will help focus further documentary research.

There are various groups working on the human evolutionary tree and each has produced its own version of the haplotree:
  • The YCC Haplotree is produced by the Y-Chromosome Consortium. This is an academic effort and it is frequently out of date, being surpassed by the ISOGG tree which is updated much more frequently and harnesses the continuous output of genetic genealogists working on Haplogroup Projects (such as the R-Z255 & Subclades Project to which all members of Lineage II in the Gleason/Gleeson DNA Project belong). The most recent update of the YCC tree is from March 2015 but the tree itself is not user-friendly.
  • The ISOGG tree is the result of the efforts of ISOGG (the International Society of Genetic Genealogy), which co-ordinates the analysis and interpretation of the findings from various Haplogroup Projects and as a result has developed a much larger tree than the YCC Tree. It too is quickly outdated as the pace of new SNP marker discovery advances and further sub-branches are discovered. Lineage II members can click here and search (Cmd+F or Ctrl+F) for Z255 to see where this particular sub-branch sits on the main Haplogroup R branch.
  • Several of the commercial companies have developed their own haplotrees which at times may be more advanced than the ISOGG tree, and at times less advanced:
    • FTDNA tree - this can be accessed from the Haplotree & SNPs page of your personal FTDNA webpage
    • YFULL Experimental Tree - YFULL is a company that offers SNP testing and will interpret the results of SNP testing carried out by other companies. This tree is relatively easy to navigate but again requires use of the Find function (Cmd+F or Ctrl+F).
    • FGC tree - like YFULL, FGC (Full Genomes Corporation) also offers SNP testing and interpretation. The visual presentation of the tree is not easy to navigate.
  • Haplogroup Project Administrators work at the coal face of scientific discovery in relation to the finer branches of their own particular haplogroup project. The R-Z255 & Subclades Haplogroup Project updates its draft tree periodically as new member results come in to the project. You have to sign up to the project to access these updates but here is the most recent update as of July 15th (for members only). It is important to appreciate the pivotal role that Haplogroup Project Administrators are playing in the ongoing discovery of the finer branches of the tree. Surname Project Admins will work closely with Haplogroup Project Admins to advise their project members regarding which tests to take next and why.
  • Alex Williamson's "Big Tree" is a tree that specifically focuses on the Haplogroup R-P312 branch of the human evolutionary tree (of which Z255 is a subgroup). Alex has done incredible work placing newly discovered SNP markers in their best estimated position on the tree, and most importantly for us, creating a visual representation that is easy to navigate and makes the current state of the tree so much more understandable. The members of Lineage II feature here too, in the Z255 subsection. There are two interesting features to Alex's tree:
    • if you click on the name of any individual, an analysis of their unique genetic signature comes up. Here is the analysis for member N74958 showing his position on the tree, his unique mutations, and his putative haplotype progression (i.e. the estimated progression of his mutations from previous ancestors).
    • the Overlay STR Feature allows you to compare the results for all STR markers (one by one) across the whole group. Here it is for DYS439.
  • Nigel McCarthy runs the McCarthy DNA Project and has pioneered the development of phylogenetic trees based on a combination of SNP and STR markers. Luckily for us in Lineage II, one particular area of his research is also focussed on the Z255 subclade to which we belong (Group E in his project). We'll be talking a lot about Nigel's work in due course as it is particularly relevant to the next steps in the DNA Project for Lineage II members.

The portion of Alex Williamson's "Big Tree" that deals specifically with members of Lineage II

You may have to read this several times before a lot of the information sinks in but stick with it - it's worth it! Knowing the basics behind the science of Y-DNA and how it can be applied will help you understand a lot of the discussion about SNP testing and Big Y results that will follow in subsequent posts.

Maurice Gleeson
30 July 2015



