How do you explain the difference between MyHeritage and Ancestry shared matches for a Triangulation Group?

+8 votes
336 views
I’m working on a Triangulation Group for Benjamin Sherman and Rebecca Phippen, Sherman-1506. But there are some issues.

All looks good at MyHeritage, MatchA and MatchB triangulate with over 30 other matches for a 21.5cM group and 3 consistent trees (including mine).  MatchA’s MRCA with me, Job Sherman, is one generation closer than matchB’s, Benjamin Sherman.

I have not found GEDmatch numbers for them, but both MatchA and MatchB are at Ancestry where I find a problem.  MatchA is consistent, matching many of what I consider Bristol/Sherman matches, including my maternal half-sister, but MatchB HAS NO SHARED MATCHES with me or matchA at Ancestry.

How is that explained?  

Can it be related to another issue….that I descend twice from Rebecca Phippen's father David Phippen, through both my maternal and paternal lines?

Or yet another issue…that I find that both matchA and MatchB have two Sherman paths to Job or Benjamin Sherman in their trees?

Ty, Ann
WikiTree profile: Benjamin Sherman
in Genealogy Help by Ann Weiner G2G4 (4.5k points)

3 Answers

+13 votes
 
Best answer

Hi, Ann. The good news: your DNA didn't change. The bad news: the testing and reporting companies analyze and evaluate matching comparisons differently. To compound that problem, centiMorgans are a calculated estimation (not a physical measurement) and the actual start and end points of matching segments can't be known: because we test only about 0.026% of our available genome, the companies either use the last tested marker (or SNP) or impute one based on how your DNA compares to something called reference genotypes. Oh, and if that wasn't enough, of the tests commonly in use there may be only 17% of the same SNPs tested between any two test versions.

All super clear now, huh?

But it's a fact of life in genetic genealogy. Most people expect, as is reasonable, that, "Hey! It's science, right? The results should be accurate and repeatable."

They really aren't. The smaller the segment in question, the more likely  that the reported information will differ among companies, and the more likely that the reported segment might not be valid at all.

At AncestryDNA, for example, they changed their minimum segment size criterion a few years ago to 8cM, and on top of that they employ a proprietary algorithm called Timber that considers large sets of the reference genotypes I mentioned before, this with the goal of identifying tiny segments of DNA that are much more likely to be present due to many generations of a population-level inheritance, of families living in the same region and inevitably interbreeding long ago. Without that--a result of what's called linkage disequilibrium--the companies wouldn't have even a hope of trying to estimate "ethnicities" (more correctly, admixture).

What AncestryDNA does, then, is to down-weight those deep ancestral segments in the matching algorithm in an effort to provide more accurate estimates within a relatively recent genealogical window of time.

MyHeritage takes an approach that is almost the opposite. They use an algorithm they refer to as "stitching." They use genotype imputation to make an educated guess about segments that are a little too far apart to call as being one continuous chunk of DNA. If the genotype imputation shows that you and your match might have some of those deep ancestral tiny chunks of DNA, even if they were never actually tested in your DNA, MyHeritage will "stitch" the two small segments together and call it one...and often that means two small segments that otherwise were off the radar. Say one that's 5.2cM and one that's 4.8cM now end up evaluating as a single segment that's over 10cM. The result is that many people feel that MyHeritage has the most lax matching standards, and that renders their take on smaller segments as probably inaccurate.

In my opinion, GEDmatch has a similar laxity problem but for an entirely different reason. They use no genotyping or imputation of any kind. They "slim" the kits we upload in something of a brute-force attempt to reduce some of the SNPs that may not be very meaningful for matching (consider that up to 19% of the SNPs examined in our tests look at markers that are targeted specifically for clinical, not genealogical purposes) and then the resultant comparisons are pure arithmetic. If we leave the default settings as is, there seems a fairly high incidence of false matches for segments below centiMorgan values in the high teens.

In 2020 I put together a table that used two 2nd cousins who had both tested using the same iteration of the AncestryDNA v1 test, then uploaded the same raw data to Family Tree DNA, GEDmatch, and MyHeritage. In addition to data supplied by those companies about the matching segments, I also ran centiMorgan calculations at Rutgers University and the Williams Lab at Cornell. Of the 11 segments compared, some are very similar (though none of the three testing companies reported any of the same start and end positions), and some are proportionately pretty different. The biggest gap there was one on Chr 9 where FTDNA reported a 7.58cM segment, and MyHeritage a 24.9cM segment. Big difference. The table is available here as a PDF file.

There's about 8% of our genome that isn't available to our common microarray tests. We need to take any results that fall into those chromosomal areas with varying grains of salt. You mentioned Chr 5 and, while not exactly in the middle of it, the region from 46.1 million base pairs (Mbp) to 50.7 Mbp is one of those areas. If, for example, a matching segment extends across that section but only by a fairly small amount (say, something like from 35 Mbp to 55 Mbp) yet the testing company calls it a continuous matching segment of 11.5cM, that would be one to discount and write off your matching list.
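For anyone who wants to flag such segments in a spreadsheet export, here is a minimal Python sketch of the check described above. The region boundaries (46.1 to 50.7 Mbp on Chr 5) are the ones quoted in the post; the function names are made up for illustration, and the result is only a flag for closer review, not a verdict on whether the segment is valid.

```python
# Chr 5 region the post describes as unavailable to microarray tests (in Mbp).
GAP_START_MBP = 46.1
GAP_END_MBP = 50.7


def spans_gap(seg_start_mbp, seg_end_mbp,
              gap_start=GAP_START_MBP, gap_end=GAP_END_MBP):
    """True if the reported segment runs all the way across the gap."""
    return seg_start_mbp < gap_start and seg_end_mbp > gap_end


def gap_overlap_fraction(seg_start_mbp, seg_end_mbp,
                         gap_start=GAP_START_MBP, gap_end=GAP_END_MBP):
    """Fraction of the segment's physical length that lies inside the gap."""
    length = seg_end_mbp - seg_start_mbp
    if length <= 0:
        return 0.0
    overlap = min(seg_end_mbp, gap_end) - max(seg_start_mbp, gap_start)
    return max(0.0, overlap) / length


# The example from the post: a segment reported from 35 Mbp to 55 Mbp.
print(spans_gap(35, 55))
print(f"{gap_overlap_fraction(35, 55):.0%} of the segment is in the gap")
```

Note that this works in physical distance (Mbp), not centiMorgans, so it says nothing about how much of the reported cM value the gap accounts for.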

This is a more technical PDF file, but it delineates many of those areas.

Not a simple answer to your question but, yes, the testing companies and GEDmatch are likely to all report slightly different information. When in doubt, I'll tend to consider only the most conservative data presented. To me, there's a much smaller downside in erring on the side of accuracy than to make some assumptive leaps and end up with misinterpreted DNA information.

Have fun!

by Edison Williams G2G6 Pilot (450k points)
selected by Lucas Van de Berg
Ty Edison,  that was very helpful.  I didn’t know about the effects of the centromere regions.  And yes, the CHR5 centromere goes right through the middle of matchA but misses MatchB.  MatchA extends either side of the centromere.

I know about pileups and unused areas from DNAPainter but those don’t affect these CHR5 matches, I don’t think.

It seems matchA is fairly well-behaved at Ancestry despite the centromere because he matches correctly with my sister who encompasses the whole region.  MatchB which avoids the centromere region does not match my sister at Ancestry, but triangulates with matchA's group at MyHeritage implying that MatchB should match her at Ancestry.

So, if I understand you, MatchB’s shared matches, including my sister, were  filtered by Ancestry. Ancestry’s algorithm determined that matchB’s 21.5cM segment likely consisted of even tinier ancient segments that needed to be down-weighted according to matchB’s calculated genotype to prevent the illusion of a recent MRCA when there wasn’t one. This broke matchB’s raw segment into pieces that were too small (< 8cM) to match anyone else, including my sister, except me (he and I are Ancestry matches).  Did I understand this correctly?

I think I see this at work when analyzing groups at MyHeritage that include matches whose trees don’t indicate they are anywhere nearly related on any branch.

In this case there is a MRCA.  Makes you go Hmmmmm. :)

Thanks for the discussion…I’m learning anyway.

Ann

Part 1 (Yeah; 12,000 character limit; it's a real thing.)

Well, shoot, Ann. MyHeritage just put the kibosh on your triangulations there, at least temporarily, by removing their chromosome browser and matching segment detail. See this G2G post from a few hours ago.

I admit I dashed out yesterday's answer without examining in detail what genealogical relationships your question concerned. My bad.

It isn't surprising at all that you and your sister display very different match groupings, especially the more generations distant the MRCA. You and she will share only about 50% of your DNA. Mercedes Brons notes that full siblings generally share 33% to 50% of their DNA, while the way 23andMe reports the calculation shows that the sibling sharing range is 38% to 61%. The higher side can come about if there was fairly recent pedigree collapse in the tree. So there's a lot of space differentiating the two of you.

In fact, germane to some of the rest of yet another lengthy post (gotta maintain my record of having the highest G2G word count of all time), is that it's rare we ever pass along our entire genome. Even with a cluster of children, some of a parent's DNA likely never makes it to the next generation. As U.C. Davis geneticist Graham Coop describes it, by 10 generations back the likelihood is that, for any given ancestor, there's less than a 50% chance that any of their autosomal DNA came down to you at all. By 12 generations, that's reduced to more like a 30% chance. By 12 generations you're approaching single digits.

Stands to reason. Each creation of a gamete, an egg or sperm cell, is an independent event that results in a cell that contains, after recombination, one of each of the parent's chromosomes. The homologous chromosomes are approximately the same size, so each one represents 50% of the DNA the parent starts with. To clarify, by "independent event" I mean in probability terms. It's like rolling a die: the odds aren't linked, they start fresh each time. Because you rolled a 6 the first time, it doesn't mean the odds are lesser or greater that you'll roll a 6 the next time.

We can do a little math to estimate the average amount of DNA that will be shared, in aggregate, with any number of full siblings by using a simple equation: 1-(0.5^n) where n is the total number of siblings. Note that this assumes the average sibling sharing is 50% overall...which should be in the ballpark, or perhaps even slightly on the high side, most of the time. It takes 7 children for one parent's DNA to hit a statistical 99%. 'Course, those independent events mean the real world will differ...could have hit 100%, could have been down under 90%.
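A quick Python sketch of that sibling equation, for anyone who wants to try other family sizes. The function name and the `share_per_child` parameter are illustrative only; the 50% default is the same averaging assumption stated above, so the output is an expectation, not a prediction for any real family.

```python
def parent_dna_coverage(n_children, share_per_child=0.5):
    """Expected fraction of one parent's DNA carried by at least one of
    n children, treating each child as an independent 50% draw."""
    return 1.0 - (1.0 - share_per_child) ** n_children


for n in range(1, 8):
    print(f"{n} children: {parent_dna_coverage(n):.2%}")
```

With the default assumption, seven children is the first family size at which the expected coverage passes 99%.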

Getting a little off track here, but the bottom line is that things get really complex, biologically speaking, when you try to go back multiple generations and link distant cousins to an identified MRCA. Ultimately, the problem is that it becomes all but impossible to rule out all the possible pathways by which DNA could have arrived at the compared cousins. Not only what seem direct-line pedigrees have to be considered, but you need to go back at least a couple of generations earlier than the MRCA and the tree branches have to be filled out laterally, as well, to attempt to find any possible cross-pollination.

I, of course, can't explain precisely what MyHeritage or AncestryDNA did with the data, but there's definitely something odd afoot. You mentioned shared segments of 36.5cM and 21.5cM, which are much larger than would be expected for 8th cousins or 7C1R.

Before anyone mentions it, I would caution against any interpretations based on the Shared cM Project for relationships this distant. In the last report from the project, version 4.0 in March 2020, you'll notice that Blaine Bettinger provides a meiosis grouping chart only up to his classification of "Grouping #10," which has as the most distant relationship that of 4C1R. He does provide individual relationship histograms for more distant cousinships, but you can see the distribution curves are going south--meaning the technical term "getting wonky"--by 5th cousins. In the end, these are user-reported data that Blaine has no way to vet and verify. For example, how often have we seen someone label a relationship as 4th cousins when it was, in fact, 3C1R? Happens all the time. Combine that with the fact that validating a specific ancestral relationship out as far as shared 4g-grandparents is exceedingly difficult--and likely not probable at 5g-grandparents and beyond--and you end up with what is undoubtedly a large number of errors in the Shared cM Project data, errors that grow more numerous as the relationship distances increase.

But...more math! Whoopee!

Give or take (I default to using the averages from David Reich at Harvard) your father's DNA went through crossover during meiosis (recombination) about 26 times; your mother's more, about 45 times. Your father's X chromosome doesn't undergo crossover, so only the 22 autosomes. That equals about 48 discrete segments (26+22 because each crossover point would then represent two segments, so we add the number of chromosomes involved to the number of crossovers to arrive at a working value). For your mother, it does include xDNA recombination, so that's 45+23=68 segments. You get one entire chromosome from each, naturally, but the total number of parental segments that you (and your sister) received would be about 116.

That will hold true for each meiosis event as we step into the past. We can express that as 116*(2^k), where k equals the number of generations, to roughly figure out how many segments could have been involved in your own genetic inheritance. At 2 generations, your grandparents would represent up to 464 possible segments that you might have received. You didn't get all of them, because biology! But that's how many were involved in the, er, production of your parents.
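The segment arithmetic above can be sketched in a few lines of Python. The crossover averages (26 paternal, 45 maternal) are the figures quoted in the post; the constant and function names are invented for the example, and the whole thing is a rough count under those averages, not a model of real recombination.

```python
PATERNAL_CROSSOVERS = 26   # average per paternal meiosis, as quoted above
MATERNAL_CROSSOVERS = 45   # average per maternal meiosis, as quoted above
AUTOSOMES = 22


def segments_from_both_parents():
    """Crossovers plus chromosomes involved, summed over both parents."""
    paternal = PATERNAL_CROSSOVERS + AUTOSOMES        # 48; father's X does not cross over
    maternal = MATERNAL_CROSSOVERS + AUTOSOMES + 1    # 68; includes the X chromosome
    return paternal + maternal


def segments_in_play(k):
    """Rough number of segments involved k generations back: 116 * 2^k."""
    return segments_from_both_parents() * 2 ** k


print(segments_from_both_parents())
print(segments_in_play(2))
```

Any other value of k can be passed to `segments_in_play` to see how quickly the count grows.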

Part 2

If we go back 10 generations, 116*(2^10), we see that as many as 118,784 segments are in play. If we use an approximate average of 7,200 calculated centiMorgans per sex-averaged genome and say that a value even as low as 5cM can accurately be detected (I would raise that, but most-optimistic scenario here), then our genomes would house about 85,556 detectable segments. You see the conundrum. Segments don't neatly divide during meiosis like, say, sheets on a roll of toilet paper (World's Worst Analogies™), but just the raw numbers demonstrate how ancestral genetic material drops out over generations. Some small chunks live on due to linkage disequilibrium, but much of the DNA sifts out over time so that our distant cousins are unlikely to have much in common with us, genetically. We end up patchwork quilts made from quite different bits of cloth.

Trying to validate at the 8th cousin level means, assuming no recent pedigree collapse, you have 512 paper-trail ancestors and your cousin also has 512 paper-trail ancestors...but you supposedly share only two among those 1,024 potential genetic donors.

I keep saying "recent pedigree collapse" because those effects--barring actual endogamy and oft-repeated incidents of it--filter out very quickly. For example, if you and I were both 3rd cousins and 4th cousins, the difference in the amount of expected DNA (theoretically) would be only about 0.196% (call it about 14cM) than it would be if we were 3C only. That drops in half with the next generation, our kids, assuming no further pedigree collapse. By our grandkids, the amount is too low for our standard tests to reliably measure, around 3cM.
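Here is a small sketch of where the 0.196% and roughly 14cM figures come from, assuming the usual expectation for full nth cousins (two shared ancestors, 2(n+1) meioses on each path) and the 7,200cM genome total used earlier in this answer. The function name is hypothetical, and these are theoretical averages only; real sharing at this distance is often zero.

```python
TOTAL_CM = 7200  # sex-averaged total used earlier in this answer


def expected_shared_fraction(cousin_degree):
    """Theoretical average shared fraction for full nth cousins:
    two common ancestors, each reached through 2*(n+1) meioses."""
    return 2 * 0.5 ** (2 * (cousin_degree + 1))


extra_fraction = expected_shared_fraction(4)   # the added 4th-cousin path
extra_cm = extra_fraction * TOTAL_CM
print(f"extra from the 4C path: {extra_fraction:.3%} (~{extra_cm:.1f} cM)")

# Each further meiosis on the path halves the expectation.
for extra_meioses in (1, 2):
    print(f"after {extra_meioses} more: ~{extra_cm / 2 ** extra_meioses:.1f} cM")
```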

We have some pretty good, peer-reviewed analyses of the likelihood that two cousins will share any detectable DNA (Henn et al., 2012; Caballero et al., 2019). For 7th cousins, that's about 1% of the time; 7C1R, about 0.57%; for 8th cousins, about 0.29%. By those numbers, we'd need to test around 345 8th cousins to find two who had matching, measurable, DNA.
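The "around 345" figure is just the reciprocal of the sharing probability. A minimal sketch, using only the percentages quoted above (the dictionary and function names are made up for the example):

```python
# Probability that two cousins share any detectable DNA, as quoted above.
SHARE_PROBABILITY = {"7C": 0.01, "7C1R": 0.0057, "8C": 0.0029}


def expected_tests_needed(probability):
    """Average number of comparisons needed to see one success (1 / p)."""
    return 1.0 / probability


for relationship, p in SHARE_PROBABILITY.items():
    print(f"{relationship}: about {expected_tests_needed(p):.0f}")
```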

But wait. There's more math! How great is that?!

Those percentages denote any matching DNA. Not the same bit of DNA that's identifiable as having come from the same, identified ancestor. Let's round up the 8th cousin matching and state it as a probability: 0.003. Because any two children of the assumed MRCA would share only about 50% of the same DNA, we can basically divide that probability in half to arrive at an approximation of the probability that any two 8th cousins will share some of the same DNA that originated from the same identified ancestor: 0.0015.

This next bit will not be accurate...because, biology! I've tried off and on for about three years to interest any of several well-known bioinformaticians to take up the challenge, but no luck so far. It's complicated. There's more in play than just straight math, but that's all we have to work with for now. Remember our "independent event" where the outcome of one event doesn't have any effect on the outcome of another? To arrive at a probability involving something like a throw of the dice, we'd simply multiply the separate probabilities.

That would mean the probability of finding three 8th cousins who all shared the same measurable segment of DNA that came from the same ancestor would be somewhere around 0.0015 x 0.0015. That's a sort of staggering 2.25x10^-6...or a 0.000225% chance. The reality could easily be an order of magnitude higher, but that would still mean we'd be testing over 44,000 8th cousins to locate three who meet the criteria. Not exactly lotto-winning odds, but not great.
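To make the multiplication explicit, here is the same back-of-the-envelope calculation in Python. It carries over every simplification already flagged above (rounded 0.003 starting probability, halving for the same-ancestor condition, treating the events as independent), so treat it as a check on the arithmetic and nothing more. The variable names are illustrative.

```python
import math

p_any_match_8c = 0.003                 # rounded-up 8th cousin probability
p_same_ancestor = p_any_match_8c / 2   # halved: siblings share ~50%

# Treating the two pairings as independent events, multiply them.
p_three_cousins = p_same_ancestor * p_same_ancestor
print(f"probability: {p_three_cousins:.3g}")
print(f"as a percentage: {p_three_cousins * 100:.6f}%")

# Allow for reality being ten times more favorable than the estimate.
optimistic = p_three_cousins * 10
print(f"cousins to test at 10x: about {math.floor(1 / optimistic):,}")
```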

It's such a humongous outlier that, if we think we've found it, Occam's razor would tell us that the proposed solution to the DNA sharing that's being displayed is far, far too complicated to be the best explanation. Part of the scientific method for any good lab experiment is to actively, diligently work to disprove the hypothesis. That's the only way to try to eliminate confirmation bias and to gain confidence that the hypothesis will stand up to rigorous scrutiny.

Unfortunately, popular genetic genealogy has led us to believe that it works the other way around: establish the hypothesis (A and B both descend genetically from C) and then set out to "prove" it. And that's also an unfortunate disconnect: genealogy uses the word "proof" liberally, as in the Genealogical Proof Standard, and that was an established practice long before traditional genealogy ran headlong into the biological sciences and bioinformatics a couple of decades ago.

On the science side, there is really no such thing as "proof" (as I've used recently, that's why Einstein's theory of general relativity is still labeled a theory even though it's been tested time and time again since 1911). In the physical and life sciences you can't "prove" a hypothesis, you can only disprove it.

For genetic genealogy we should be working more like a scientist and less like a genealogist. Gather all the data possible that might be germane to the hypothesis; then try to gather more data; then work to disprove the hypothesis, i.e., find other explanations for the data; and be sure to attempt to reconcile all data that's still left hanging, at least where it's possible to do so.

What I see far more often with autosomal triangulations is that the process starts at the wrong end of the proposition and stops when any data at all seems to present itself as a possible correlation to the hypothesis. E.g., I think A and B both descend genetically from C. I go run a triangulation at GEDmatch. Lo and behold, even though GEDmatch has a database only about 6% the size of AncestryDNA's, I find a bunch of reported possible triangulations that include that segment and, oh look!, there's B right there among the 28 other people listed for A as mutually sharing that segment. Done! I've "proven" that A and B descend from C with a triangulated segment of DNA.

But, nope. Nothing "proven." In fact, nothing even analyzed yet. Apparent correlation does not equal causation. Takes a lot of detailed work.

I'm here through the weekend with two shows Friday and Saturday night. Please be sure to tip your waitstaff. They work hard and your gratuity is a big part of their income. Thank you all! You've been a great audience!

<cough cough>

Ha, thanks for the prolific response! I give.

As to my sister…she is half, sharing 29% and 59 segments according to Ancestry. And I use her 54cM shared segment on CHR5 to phase my matches for this group.

As to crossovers, I’ve only tried to understand/locate those that delineate my grandparent contributions….a good check on my MRCA attributions.

So I agree with your Wonky assessment after reading your post.  Just makes me want to know what happened even more! Sometimes it’s the unique artifact that brings discovery.        And sometimes not….it may be as simple as the 34 cM length attribution of MatchA  was an artifact of the centromere location.??  I have another location where all the segments seem ambiguously lengthened to extend through a pileup area.  I have a crossover in there somewhere too.  One of what I thought was one of my best matches for a time ended up being a CHR9 pileup match..no wonder I couldn’t understand how we related.

And yes, losing our chromosome browsers will make me feel blind.  But we need people to be safe.

Anyways, I really appreciate your time on my problem and the education I received.

Oh…and your profile is a hoot. Enjoyed it….despite all those words.

Ann
Just an UPDATE:

I obtained a GEDmatch number for MatchB.  At default settings, GEDmatch broke the segment into two segments, 12cM and 6cM.  Significantly, they triangulate with my sister as I would expect.  This is consistent with what you all explained about Ancestry’s shared match filtering, and a possible explanation for why my sister does not triangulate with MatchB at Ancestry like we would expect.

Secondly, 5/8 great grandparents of MatchB are Danish lines with no American ancestors in which I would expect any near cousin commonality.  The Sherman line in question is one of the remaining 3.

Thirdly, I went to examine a few other matches in my MyHeritage TG and found two that were close cousins of MatchA, a 1C2R and a 1C1R that were 2C1R to each other which share his Sherman tree.

Unless someone gives me a reason not to, I will mark and document this very distant 7C  Sherman TG as Confident.

Thank you all for your excellent help!
+9 votes
I'm not a DNA expert but I have learned that:

Not every segment of our DNA is considered, only a small percentage that is deemed relevant to the goals of the testing.

Different DNA testing organizations consider different segments than other companies, with some overlap and some different segments.

So, the match results may differ some between different testing companies.

Further, if you only test with one company (e.g. Ancestry), but then port your DNA over to another company site (e.g. MyHeritage), you will get fewer matches on MyHeritage than if you had originally tested with MyHeritage. Your ported Ancestry DNA data will be missing some segments used by MyHeritage, and the others who tested with MyHeritage will be missing some segments that you have in your Ancestry DNA port.
by Joe Murray G2G6 Mach 8 (84.9k points)
Thank-you.  Yes, I tested at Ancestry and uploaded to MyHeritage, then sent all my MyHeritage matches to DNAPainter.  I then cross-reference between Ancestry and GEDmatch where I can to get trees and segment locations.

So, does it make sense that Ancestry would not use the middle of Chromosome 5 for Shared matches?  MatchA has a closer MRCA and a longer match with me than MatchB.  And there is a discrepancy in cM for MatchA - 36.5cM segment at MyHeritage and 22cM shared at Ancestry.  Does that support your answer?

Finally, can I still use the TG based on the MyHeritage results?

Wow, Edison Williams has given a great insight below into DNA analysis that seems to explain the discrepancy “in the middle.” Possibly not that Ancestry didn’t use it but that MyHeritage filled it in with an assumption (if I understand Edison correctly)

One more simple thing I can add is that while Ancestry uses an 8cM threshold for direct matches, their threshold for shared matches is higher (I think it’s 16cM), so your matching A and B separately may not result in A and B being shared matches with you. I see that a lot.

Ahhhh, thank you for that Joe.  A 16cM threshold for shared matches vs. an 8cM match threshold would explain why MatchB matches me and not my sister as I had expected he should.
+8 votes
The main difference is that MyHeritage and Ancestry each have their own software/programs to analyse the raw DNA test data, and their criteria for a match are different.
 The number of matches is not affected by which company you test with; it is a function of the size of the test database combined with the geographical spread of testers and their ancestors. As a New Zealander of UK origins for at least 10 generations, but with deeper Scandinavian/Western Europe ancestry, my match numbers were only a few hundred different between the two sites, both in the mid 15,000s.
 MyHeritage has been described as having looser matching criteria and therefore more false matches than Ancestry; however, I also find major match discrepancies with Ancestry, particularly with shared matches. If I match person B with shared match C, and I then check person C for shared matches, it is nothing unusual to find person B is not listed.
 My most absurd result was a match with person B and C on Ancestry with B and C not a shared match, on MyHeritage they were a shared triangulated match which was not surprising as they were mother and daughter.

 We simply have to make the best of the data we have being aware that the reporting and analysis systems of the various companies differ and the accuracies of the results are overhyped.

 It pays to keep in mind that the smaller the cM match(es), the higher the rate of false matches, with figures given of 50-60% false matches at a 7cM level.
 With an ancestor from the 1600's shared cM values should be low and approached with caution, for my part I'm pleased to find them as it's nice to know that the DNA results are consistent with the researched tree, but it is important to keep in mind the limitations in the current tests and reported results.

 There are also sections of DNA on the tested chromosomes that are widely shared across regions or populations and are not useful for genealogy, as they are passed on for a large number of generations.
by Gary Burgess G2G6 Mach 8 (88.5k points)
Thanks for the examples and explanation Gary,  I see. Your post prompted me to read about Ancestry’s algorithm TIMBER.  MatchB’s shared matches may have been filtered by TIMBER.  Question is….erroneously?   

TIMBER is used to filter identical segments that may not have a recent common ancestor due to commonality within a community.  The fact that MatchA's raw segment is 34.5cM and filtered segment is 22cM indicates that TIMBER may have played a role here.

Certainly, this is an endogamous Puritan/colonial population.  Possibly, TIMBER excessively filtered because it detected multiple paths to several of the ancestors on this line, which is common in my maternal and paternal colonial lines.

Furthermore, what does TIMBER consider “recent” ?  Each match in this TG is a 7C1R to me, pretty far back for aDNA.

Because both matches have strong trees on this line to an identifiable MRCA, triangulate well at MyHeritage, and what I read about TIMBER occasionally over-filtering in cases of endogamy,  I think I can argue that this is a case of excessive filtering.  ???


...