Questions about DNA and Clarks in NC

+2 votes
234 views

Long question with background.  I am looking for the father of Isaac Fleming Leger (Leger-5095).  His unwed mother listed Ike's father as John Roberson, Robertson or Robinson.  We have never known with any certainty that a Robertson was Ike's father.

A recent Y-DNA (Big Y) result (haplogroup I-FT133405) of a direct Leger male matched exactly with a Clark, so I am trying to find who Ike's father might have been. I searched the 1880 census, and it shows that two Clarks lived very close to Ike's mother: Abraham Lincoln Clark (Clark-2215), who would have been 13-14 years old when Ike was born, and Nathaniel McKenzie Clark (1849-1910), who would have been about 30 years old. Nathaniel does not have a WikiTree profile.

QUESTION

DNA: How different would the Y-DNA haplogroup of, for example, Nathaniel be from that of his cousins, grandfather, great-grandfather, etc.? Would they possibly be in a different subclade (but still related to our Y-DNA)?

  • In tracing Nathaniel McKenzie Clark as the possible father, he was the brother of my Y-DNA match's 2nd gr-grandfather.  
WikiTree profile: Ike Leger
in Genealogy Help by Living Leger G2G1 (1.3k points)

1 Answer

+3 votes
 
Best answer

This will be a difficult question to answer with solid specifics, so I'll begin by asking one of my own, and then try to provide some background.

You noted that, "A recent Y-DNA (Big Y) result (haplogroup I-FT133405) of a direct Leger male matched exactly with a Clark." Does this mean that only the (currently) deepest classified haplogroup--and you're correct in referring to it as a subclade--is identical? Or that the Big Y data is showing no non-matching SNP variants between the two test takers?

Since Ike's mother was born in 1858, it's reasonable to assume that his father was born circa 1855, give or take. There's a decent likelihood that, after roughly 3.5 generations to the MRCA (call it 112 years), there may be at least one private variant in the mix for one of them, and maybe even an additional named but non-matching SNP.

But I'll back up a bit. Closing in on the end of last year, FTDNA was nearing 90,000 total Big Y tests processed. Compared to the recent past, that's a huge number. Compared to humanity's global yDNA haplotree...not so much. Meaning that a seeming correlation with only two data points may not be indicating what it might seem to be. Matching results to surnames can be a bit iffy at best, especially in certain circumstances. Patronymic naming conventions are certainly one; another is 18th and 19th century America, where unrecorded adoptions and name changes weren't uncommon.

Before spending much time researching the Clark surname, I'd personally want to see the holistic picture of the Leger descendant's yDNA matches. If a surname is in question, it's often better to look at the proportions of reported matches than just the closest match...assuming those don't align. In other words, if there are multiple matches--not just in the Big Y results but also in the 111-marker STR panel (with a GD of 4 or closer)--evaluate the testers' surnames among the top 25% or 33%. If Clark dominates, then Clark is a good bet to chase. If it isn't Clark, or if Clark is a one-off, then there's a higher probability of an NPE or undocumented adoption somewhere along the line.
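The surname-proportion check described above could be sketched roughly as follows. This is only an illustration: the `matches` rows are made-up placeholders (not real test results), and `surname_shares` is a hypothetical helper name, not anything FTDNA provides.

```python
from collections import Counter

# Hypothetical match list: (surname, genetic distance at 111 markers).
# These rows are placeholders for illustration, not real test results.
matches = [
    ("Surname A", 2),
    ("Surname A", 3),
    ("Surname B", 4),
    ("Surname A", 4),
    ("Surname C", 7),
]


def surname_shares(rows, max_gd=4):
    """Return each surname's share of the matches at or under max_gd."""
    close = [name for name, gd in rows if gd <= max_gd]
    if not close:
        return {}
    counts = Counter(close)
    return {name: n / len(close) for name, n in counts.most_common()}


shares = surname_shares(matches)
for name, share in shares.items():
    print(f"{name}: {share:.0%}")
```

If one surname takes a clear majority of the close matches, that is the one to chase; a lone appearance is weaker evidence.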

Currently at FTDNA, only two men have tested as being I-FT133405...gotta be your two test-takers. And the estimated years-before-present mean value for the appearance of that SNP is 190 YBP, so call it 1832, with a 68% confidence interval of 1761-1893. We're in the ballpark.

Its parent SNP, I-FT133587, is much older. The 68% CI for it is reported as being 398 BCE to 418 CE. That, combined with the fact that FTDNA is showing 23 additional SNPs cataloged under FT133405 would indicate to me that there are at least a few more branches waiting to be determined, waiting on additional Big Y test-takers in order to allocate the demarcations. Which would also lead me to bet that your two Big Y test takers do have non-matching SNPs.

If you want to take a deep dive into TMRCA calculations with yDNA, FTDNA is basing at least some of its results with the fairly new FamilyTreeDNA Discover tool on this research paper: McDonald, Iain. "Improved Models of Coalescence Ages of Y-DNA Haplogroups." Genes 12, no. 6 (June 2021): 862. DOI: https://doi.org/10.3390/genes12060862.

In that paper, Iain provides an example calculation which shows an average of one identified Y-SNP every 83 years. Unfortunately, a lot of folks have read that as being the general average, not just one value derived from the variables defined for that particular example calculation. It is not a number that can be applied generally across the haplotree. This can be seen empirically in the number of branches currently cataloged under the various basal clades. The tree at FTDNA has 67,153 defined branches today (the count increases monthly; in June 2022 it was 55,989). Of those, 31,790 are in the R clade; that's 47.3% of all the branches. In comparison, haplogroup I has 10,133 branches, haplogroup E 5,106, and G 1,946. So clearly the same mutations-to-TMRCA relationship can't hold across the entire haplotree.
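For anyone who wants to redo the branch-share arithmetic as the tree grows, here is a small sketch using only the counts quoted above. The variable names are mine, and the counts will be out of date by the time you read this.

```python
# Branch counts as quoted in the post (FTDNA haplotree at the time of writing).
total_branches = 67_153
branches = {"R": 31_790, "I": 10_133, "E": 5_106, "G": 1_946}

# Share of all defined branches that falls under each basal clade.
shares = {clade: count / total_branches for clade, count in branches.items()}

for clade, share in shares.items():
    print(f"Haplogroup {clade}: {share:.1%} of all branches")
```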

As Iain summarizes: "For small groups, perhaps up to ~10 individuals, the dominant uncertainties are Poisson uncertainties from the small number of mutations, and the method by which mutations are counted; for larger, older groups, the accuracy of the mutation rates dominates; while for larger but recent groups, a number of second-order statistical and genetic effects become important.... The mutation rate, μSNP ∼ 8×10^-10 SNPs per base pair per year, must be definable over all b loci, so further restriction of b may be needed to account for this."

Iain's example calculation also used as a foundation that the average commercial test was obtaining about 15 million callable loci (the value of b), but the Big Y-700 is achieving a higher count than that, up to around 22 million.
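As a rough illustration of why the number of callable loci matters, one simplified reading is that the average time between SNPs is the reciprocal of the mutation rate times b. That is my own back-of-the-napkin simplification, not the paper's full model; `years_per_snp` is a hypothetical helper, and the inputs are just the figures quoted above.

```python
MU_SNP = 8e-10  # SNPs per base pair per year, as quoted from the paper


def years_per_snp(callable_loci, mu=MU_SNP):
    """Average years between SNPs across `callable_loci` base pairs.

    Simplified assumption: one constant rate applied evenly to every locus.
    """
    return 1.0 / (mu * callable_loci)


print(round(years_per_snp(15_000_000)))  # about 83 years at 15 million loci
print(round(years_per_snp(22_000_000)))  # about 57 years at 22 million loci
```

Under that simplification, better coverage means a shorter average interval per SNP, which is one more reason the 83-year figure shouldn't be treated as universal.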

The 83-year number can't be taken literally, but given a very broad view of the haplotree it, too, is in the ballpark. If we were simply to apply the 68-95-99.7 "rule" pretending that 83 years is a known median and that we have a constant, even Poisson distribution, a 68% CI would place us at 42 years to 124 years; a 95% CI would push that out to 143 years. If I'm doing a SWAG with no other information to go on, that's what I'll use: the 68% CI of 42 to 124 years.

When dealing with yDNA, the average generational interval--the average age of the parent taking into account his/her age at the time of birth of each child--is somewhat longer than the average. Group Project administrators at FTDNA generally use 32 to 36 years. I usually stick with 32. So still on our SWAG, that would translate--per mutation (i.e., non-matching variant)--to 1.3 to 3.9 generations.
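The years-to-generations conversion is simple division. Here is a sketch that takes the 42-to-124-year range as a given and shows both ends of the 32-to-36-year interval. The function name is just for illustration.

```python
def years_to_generations(years, interval):
    """Convert a span of years to generations for a given interval length."""
    return years / interval


low_years, high_years = 42, 124  # the 68% range used in the post

for interval in (32, 36):
    low = years_to_generations(low_years, interval)
    high = years_to_generations(high_years, interval)
    print(f"{interval}-year generations: {low:.1f} to {high:.1f} per mutation")
```

At 32 years this gives the 1.3 to 3.9 generations quoted above; a 36-year interval narrows it slightly.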

That's super-rough, back-of-a-napkin stuff...done at 15 minutes before last call in a noisy bar. ;-) But at least it's a place to start.

Circling back from yet another long and rambling post, the shared FT133405 SNP is rock solid evidence that the two men share a common patrilineal ancestor, probably in line with that projected ~1850 birth. But in isolation I wouldn't lean too heavily on the assumption that Clark is the correct surname.

I'd want to make certain I included all the data I could from existing yDNA tests. In particular, I'd want to see the entire match list (it can be downloaded as a CSV file if your cousin is willing to share it with you), and I'd want to see what non-matching variants were reported between the two men. Counting those non-matches may give you a better guess for the time to most recent common ancestor.

And if there are few results to compare, I'd want to identify and recruit test-takers from both your Leger side and what you know of the Clark side. Even 37-marker STR tests could help because, if there is an NPE or undocumented adoption, that could tell you.

While there seems to be no Leger or Ledger project active at FTDNA, there is a large Clark/Clarke project: https://www.familytreedna.com/groups/clark. I would encourage your tested Leger cousin to join that project so that you could at least see if/where he is grouped and compare the STR results.

Best of luck with the search! :-D

by Edison Williams G2G6 Pilot (446k points)
selected by Lucas Van de Berg

Dear Edison, thank you for your response, your knowledge of DNA genealogy is impressive and far greater than mine.

My (brother's) Big Y test is identical to a Clark's, and both testers are members of the FTDNA Clark surname group; they are the two you noted. The other tester does not answer my queries. There is only one other match at 111 markers, also a Clark. There are several Clarks among the 67-marker matches, too.

Interestingly, the Block Tree shows only one other surname, York (I-FT115469), with 19 variants. The tester is family finder at 5 steps. There is a York married to a Clark daughter. Her father is who I believe to be the MRCA to me and the Big Y Clark tester. (I hope this makes sense.)

Apologies for the late reply. And my knowledge of genetics qualifies me only enough to be controversial. Nothing I say constitutes professional advice. ;-)

I checked the Clark/Clarke project and located the Leger STR test results in "HAPLOGROUP I 12 - BANYON I-FT133587." Interesting way the admins chose to label the groups: all trees...though that one is spelled "banyan." Did I mention I can also be an annoying pedant? And while I'm at that particular pastime, it's a shame that none of the groups were named for the Norway Spruce. That particular species has one of the largest plant genomes ever sequenced: approximately 19.6 billion base pairs, over six times larger than the human genome. If I'm gonna be named for a tree in a DNA project, I'd want to be a Norway Spruce. :-D

Ahem. Looking at the STR results for the Clark and Leger match, I'm seeing a genetic distance of 1 at 67 markers, and a GD of 6 at 111 markers.

That said, STRs can be highly variable, and while the infinite allele method FTDNA uses is well-accepted, it really doesn't take into account the probable mutation rate for any given marker. The genetic distance is a rough estimate of the minimum number of generations likely to have resulted in the differences seen.

I mentioned Iain McDonald and his recent paper. He has also been maintaining a record of STR mutation rates specifically within the R-U106 haplogroup. To his data I added information from, and sometimes modified values based on, Heinila (2012), Burgarella, et al. (2011) and Willems, et al. (2016), to arrive at a guesstimate chart I've used for several years to get a better picture, for the projects I manage, of 104 of the 111 STRs that FTDNA tests. Here's a quick table comparing the STR results from that Clark/Clarke project group; the "Rank" is where the STR falls, from 1 to 104, from fastest-moving to slowest (the palindromic CDY gets my nod for the fastest and most volatile; DYS632 is the slowest and most stable):

STR      Clark   Leger   Mut. Rate μ   Rank   Location
DYS446   13      14      0.002877      #24    38-67 panel
DYS714   24      22      0.007726      #6     68-111 panel
DYS712   24      22      0.016378      #3     68-111 panel
DYS504   18      17      0.006949      #10    68-111 panel


I'd be inclined to think that DYS712 and DYS714, showing a GD of 2 each, might not correctly represent distinct generational events. My guess is that the GD count is more likely to be 5 rather than 6.
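As a rough check, summing the repeat differences in the table reproduces the GD figures quoted earlier (1 at 67 markers, 6 at 111). The sketch assumes the four markers in the table are the only ones that differ, and that every repeat of difference counts as one step. The project's actual handling of individual markers may differ, and the function names are mine.

```python
# (marker, Clark value, Leger value, first panel in which the marker appears)
str_values = [
    ("DYS446", 13, 14, 67),
    ("DYS714", 24, 22, 111),
    ("DYS712", 24, 22, 111),
    ("DYS504", 18, 17, 111),
]


def stepwise_gd(rows, panel):
    """Sum of absolute repeat differences for markers tested at `panel`."""
    return sum(abs(a - b) for _, a, b, first in rows if first <= panel)


def markers_differing(rows, panel):
    """Count of markers that differ at all, regardless of by how much."""
    return sum(1 for _, a, b, first in rows if first <= panel and a != b)


print(stepwise_gd(str_values, 67))         # 1
print(stepwise_gd(str_values, 111))        # 6
print(markers_differing(str_values, 111))  # 4
```

The gap between 6 (every step counted) and 4 (every marker counted once) is the range my guess of 5 sits in: it depends on how many of the two-step differences were really single events.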

However, that's definitely not an exact match, and it should mean that there are also non-matching SNPs reported in the Big Y results. To get a better idea of the TMRCA, those non-matching polymorphisms need to be taken into account.

The recently revised TiP data for predicting TMRCA with STRs indicates, at 111 markers, a 1650 CE median for a GD of 5 (a range of 1450-1800), and 1600 for a GD of 6 (range of 1350-1800). Given the huge gap between the SNP dating currently shown for FT133405 (68% CI of 1761-1893) and its parent clade, FT133587 (same CI at 398 BCE to 418 CE), I think there's a reasonable chance that FT133405 may end up being older than estimated.

Discounting the Big Y data for the moment (it's a shame that there are only three test-takers in your block tree), are there other surnames showing in the STR results, men who haven't taken a Big Y? I'd be checking GD of 3 or closer at 67 markers and 6 or closer at 111. In there, is there any clustering of a specific surname that jumps out at you?

There's no denying that the Clark match absolutely shares a patrilineal ancestor with you. But the uphill battle right now is that you don't have a well-vetted family tree from that test-taker to examine if he's not communicating with you, and with only the two data points there's no other empirical information to go on. If that Clark surname was also an NPE or undocumented adoption, it would derail the search and point it in an incorrect direction.

You noted earlier that, "In tracing Nathaniel McKenzie Clark as the possible father, he was the brother of my Y-DNA match's 2nd gr-grandfather." At the Clark/Clarke project, the EKA for the Clarke match is listed as a William Clark, b. 1740 d. 1818. I did a cursory look to see if we had that William Clark in WikiTree, and the only possible match looked to be this person: https://www.wikitree.com/wiki/Clark-2240. Could that be him? The Ancestry tree cited in the GEDCOM upload is--of course!--private and can't be viewed. Why someone has a tree with almost 29,000 individuals in it and opts to keep it private escapes me.

The father of Nathaniel Clark--if that's the MRCA--would be your match's 3g-grandfather, making the match no more than a 4C1R or 4C2R to you. An STR GD of 5 or 6 at 111 markers  would be rare for a relationship that close.

I suppose all I'm offering is a recommendation to continue pursuing all avenues of data. In experimental physical and life sciences, the constant battle is to avoid confirmation bias. We can't make any hypothesis a favorite to the exclusion of others. In fact, we have to actively work to disprove any hypothesis we're testing: actively seek out anything and everything that might disprove the hypothesis until, in the end, we have a solid preponderance of evidence with no data that negates our hypothesis. The "genealogical proof standard" describes it in a different way, but the result is roughly the same.

All the best!

Dear Edison, I’ll take some time to review your thoughts, they are really outstanding.  And Banyan vs Norway Spruce, how funny.

Thank you for the STR analysis.  It also shows, I think, that testing at fewer markers really is not all that helpful.

FWIW, FTDNA's TiP Report assigned my brothers a genetic distance of 1 step, with a TMRCA born around 1900. My mother would be shocked to learn she was 30 years older. (One brother did the 67-marker test while the other did the Big Y.) So I take the TiP reports with a huge grain of salt.


...