How many fourth cousins do I have?

+15 votes
936 views

Well, obviously, if I don't know, neither will anyone else.  This morning, based on another thread, I tried to estimate how many I have.  I developed a table which I hope I can paste in here, but at least I will provide a link.

Mostly, I had to estimate, but for much of the first generation--my third great grandparents--I knew how many children they'd had.  For some of the following generation, I also had an accurate listing.  For the others, I used an estimate of average family size in the U.S. over time that I found on-line, but it was not altogether helpful and I'd like a better source.

First discovery:  I may have around 14,000+ fourth cousins.  No wonder I haven't found nearly all of them, even though I have an Ancestry tree of over 10,000!

Second discovery:  It makes a huge difference how many children there were in the first couple of generations (pretty self-evident once I thought about it).  My calculated number of fourth cousins ranges from over 2,000 in my Cecil line to only 144 in my Glenn line.  No wonder I have so many Cecil matches!  No wonder I've had such a hard time with the Glenns!

Observation:  Doing this analysis can be a helpful tool for focusing DNA research.

Questions:  Can anyone point me to a better source for estimating family size?   Can anyone help me refine my table? I'm not at all sure all my assumptions are the best way to analyze the data (being only an accountant, not a geneticist or mathematician).

Table is at: https://www.wikitree.com/photo.php/c/c2/Chromosome_Mapping_Examples-4.pdf

in Genealogy Help by Living Kelts G2G6 Pilot (555k points)

Check the ISOGG table — the 4th table on this page: https://isogg.org/wiki/Cousin_statistics

Barry, why do you think the numbers vary so greatly from mine?  Where is the flaw in my logic?
Love, love this question! It's because of 4th cousins in DNA tests that I can trace back to the parents of my great-great-grandparents. It's highly fraught with challenges; it's teeming with possibilities. I am always reworking my system to correct errors, but it's very exciting. My latest "4th cousins" project is utilizing Genetic Affairs for clusters.
Just played with that new tool yesterday, Maggie, for both me and my husband. Some really good new leads for breaking brick walls! I love it! For like .25 cents I can do more than I could do with 20 hours of spreadsheet matching cMs and tree investigation.
There is no way of knowing that. But to give you a scenario...you have 36 Grandparents at 4th cousin level, if I am correct.
Well, first of all, I disagree that there is no way of knowing that.  Should I have the time, descendant tracing can at least give me some idea.  

We have four grandparents, 8 great grandparents, 16 second great grandparents, and 32 third great grandparents (whose descendants, other than the descendants of the second great grandparents in that same line, are our fourth cousins, if they are in the same generation as we are).  Just multiply by 2 for each generation.
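The doubling rule above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and it assumes no pedigree collapse (no ancestor appears twice in the tree).

```python
def ancestors_at_generation(generations_back: int) -> int:
    """Ancestor slots a given number of generations back (1 = parents)."""
    return 2 ** generations_back


def shared_ancestor_couples(cousin_degree: int) -> int:
    """Ancestral couples whose descendants can be nth cousins.

    Assumes no pedigree collapse; nth cousins share ancestors n + 1
    generations back, and each couple is two of those ancestors.
    """
    return ancestors_at_generation(cousin_degree + 1) // 2


labels = [
    "parents",
    "grandparents",
    "great grandparents",
    "second great grandparents",
    "third great grandparents",
]
for generation, label in enumerate(labels, start=1):
    print(f"{label}: {ancestors_at_generation(generation)}")

print("couples behind 4th cousins:", shared_ancestor_couples(4))
```

So fourth cousins trace back to 32 third great grandparents, or 16 couples.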
Yes.  Thanks.

4 Answers

+8 votes
 
Best answer

The table I mentioned in the comment comes from page 7 of this paper:

https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0034267&type=printable

You can check that the value in that table for the number of nth cousins of an individual is 

2^n (2.5^n (2.5 - 1))

(and then rounded). This formula is Equation 1 on p.11 of their paper. The 2.5 in this formula is the authors' estimate for the number of offspring in each generation.
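For anyone who wants to reproduce the table values, here is a minimal Python sketch of that formula, read as 2 to the power n, times 2.5 to the power n, times (2.5 - 1). The function name and the default argument are my own choices for illustration, not anything taken from the paper.

```python
def expected_nth_cousins(n: int, children_per_couple: float = 2.5) -> float:
    """Expected number of nth cousins: 2^n * c^n * (c - 1).

    c is the assumed average number of children per couple in every
    generation (2.5 is the value quoted from the paper).
    """
    c = children_per_couple
    return (2 ** n) * (c ** n) * (c - 1)


for degree in range(1, 6):
    print(f"{degree}th-degree cousins: {expected_nth_cousins(degree)}")
```

With 2.5 children per couple, the fourth cousin value comes out at 937.5 before rounding.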

Your version of this computation for fourth cousins is approximately the formula

2^4 (4 x 4 x 3 x 3 x 6)

 So the difference comes from comparing the stuff in parentheses in the two formulas:

2.5 x 2.5 x 2.5 x 2.5 x 1.5     vs    4 x 4 x 3 x 3 x 6

The 6 in this latter formula is the average of the numbers in your column 5, with one subtracted. This is the average number of children per 3-great-grandparent pair, excluding the child who is your ancestor. You perceptively noticed that this child would give you third cousins rather than fourth and eliminated them at the end. But you could just as well have reduced the numbers in column 5 by 1, and that is what the authors do in their formula.

So the entire difference comes from the authors' estimate of 2.5 children per generation versus your much higher numbers.
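To see the size of the gap, here is a rough Python sketch that multiplies out both sets of factors. The variable names are just labels I picked; the numbers are the ones quoted above, with 2 to the power 4 = 16 standing for the third-great-grandparent couples.

```python
from math import prod

# Factors per generation, copied from the comparison above.
paper_factors = [2.5, 2.5, 2.5, 2.5, 1.5]  # the authors' 2.5 children per couple
table_factors = [4, 4, 3, 3, 6]            # approximate averages from the table

ancestral_couples = 2 ** 4  # 16 third-great-grandparent couples

paper_estimate = ancestral_couples * prod(paper_factors)
table_estimate = ancestral_couples * prod(table_factors)

print("paper estimate:", paper_estimate)
print("table estimate:", table_estimate)
print("ratio:", round(table_estimate / paper_estimate, 1))
```

The table's factors give 13,824, which lines up with the "around 14,000+" in the question, against 937.5 from the paper's factors.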

Why did they use such low numbers? They are trying to estimate population averages and cite two papers that suggest an average value that should be about 2.1 children per household for Europeans and East Asians. Their paper uses data from Ashkenazi Jews, who tend to have an average more like 3. So they used 2.5 to split the difference. Your initial column with an average of 7 children is much higher than that. You are not a population average, but an individual person, and it appears that your third-great-grandparents were exceptionally good at having lots of children who made it to adulthood, compared to the average. And as you noted, the more people you have in the earliest generation, the bigger an increase you will see. I wonder, do you have more matches than most people on the DNA sites?


You are using 4 and 3 as averages for lower generations, which still are much higher than the value 2.5 that the authors used. One thing that you need to incorporate: it is not the number of people in a generation that matters, but the number of people with children who have survived to the present. Even if a sibling of one of your great-great-grandparents had grandchildren, if none of those grandchildren had offspring, then you get zero cousins from that sibling. So you need an estimate of the number of people in each generation who don't have living offspring, and you need to remove them. That's very hard to do. I have found several times tracing down a collateral line from my own ancestor that it seems to leave no further descendants after 3 or even 4 generations. You can't count any of these people in your figures.

It would be very hard to estimate how many people have no living descendants directly. The estimates from the paper do something else. You can imagine it this way: eliminate from the human family tree any person with no living descendants -- call this the "tree of success". Now count people in the tree that remains. If we assumed that every couple had two descendants in the tree of success, then they only manage to replace themselves, those children only manage to replace themselves, etc., and you'd see constant population size. If every couple had three descendants in the tree of success, you'd see a population growth rate of about 50% per generation. The statistical models cited show a world population growth rate of anywhere between 0.7% and 7% per generation (assuming 30 year generations), which leads to the value of around 2.1 for the estimated number of offspring each couple has in the tree of success.
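The link between children per couple and growth per generation can be sketched like this. It is a deliberately simple model of my own, not the statistical models the paper cites: every person in the tree of success belongs to exactly one couple, generations don't overlap, and the function names are invented for the example.

```python
def growth_per_generation(children_per_couple: float) -> float:
    """Fractional population growth per generation in the simple model.

    Two parents are replaced by `children_per_couple` children, so the
    population is multiplied by children_per_couple / 2 each generation.
    """
    return children_per_couple / 2 - 1


def children_for_growth(growth: float) -> float:
    """Inverse: children per couple needed for a given fractional growth."""
    return 2 * (1 + growth)


for children in (2, 2.1, 3):
    print(children, "children per couple ->",
          f"{growth_per_generation(children):.1%} growth")

for growth in (0.007, 0.07):
    print(f"{growth:.1%} growth ->",
          round(children_for_growth(growth), 3), "children per couple")
```

Under those assumptions, growth of 0.7% to 7% per generation corresponds to roughly 2.01 to 2.14 children per couple, which brackets the 2.1 figure.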

The 0.7% and 7% figures seem low to me, so I'll have to investigate them more. And I don't know why they need those complicated statistical models rather than just looking at estimates of world population itself. But the 50% is definitely too high, so you shouldn't be using factors larger than 3 in your computation if you are trying to figure out what happens on average, and not just in your particular case. 

 

by Barry Smith G2G6 Pilot (302k points)
edited by Barry Smith
Yes I am a member.  Not finding my Hannah in there though .  Lots of other Hannah's with the wrong spouse and wrong kids and wrong dates :)  But thank you for asking !
So I just switched to my husband's Ancestry matches to look: he only has 89,501, and only 3347 are 4th cousins or closer, almost half mine. Some of his lines had small families.
Loretta, my guess is that both of you are way above average.  As I think I said somewhere earlier, a lot of it depends on how recently your ancestors immigrated to the U.S.  

Other factors include, of course, family size, and in my case, the number of family genealogists among my relatives who have tested multiple family members.

Also, endogamy in your various family lines.

For identified matches, the number is greatly affected by published genealogies.  For example, the Austin Families website.  As another example, I am descended from Moses Cleveland, and [what I call] The Cleveland Genealogy traces generations of ancestors.  That allowed me and probably countless others to trace our ancestry all the way back to the 17th century.  I figure I have nearly four million distant Cleveland ancestors, and it's possible that over 8,000 have tested.  (These are old calculations that I haven't reviewed in a while; many of those identified matches came from what I used to call "tree matches" on Ancestry...).
Sounds like you're probably closer to reality on your overall number, but of course even a good estimate of this nature - even with excellent underlying assumptions - can be wildly inaccurate. I have taken a crack at this myself, for my Standley 1/16 - I have an Excel workbook that I use to track it.

More recently, I got to wondering why the heck I don't have more matches from my Baxter 1/16, and the answer seemed to lie in the whole proliferation pattern being completely irregular. Being on WikiTree now, and noticing that our descendant list feature is a really great way to visualize what's going on, I have taken to adding everybody out to my dad's 3rd cousins. So far I have all the grandchildren on there (51 of them), have carried the descendants of half of his children forward to my father's generation, and gotten mostly or partly through most of the rest.

The big picture is that of the 51 grandchildren, 20 are known to have no biologically-related descendants (one adopted, one was adopted). Another 10 I have carried forward to my father's generation, with a total of 8+1*+7**+11*+3*+18*+1+10 +6+5 = 70 gt-gt grandchildren (that's in my father's generation) who may themselves have living biological descendants. (A * indicates my father's 2nd cousins, and ** is his own branch). That leaves 21 unknowns (at least two of whom I don't think had any kids, but I'm not sure).

Nonetheless, I can use that average (those 10 grandchildren with known descendants averaged 7 grandchildren of their own) and estimate that there were perhaps (10+21)*7 = 217 gt-gt grandchildren (but I'm applying the 7 to all the unknowns, some of whom may not have had children, so this is probably an overestimate). 40 of those 217 are my father's 2Cs and closer, leaving 177. My 4Cs are the next generation, so maybe about 400 of them, on my Baxter 1/16.
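For anyone who wants to follow the arithmetic, here it is as a short Python sketch. The numbers are the ones from the two paragraphs above; the variable names are only labels for this example, and the last line just backs out the multiplier that "about 400" implies rather than deriving it.

```python
# gt-gt grandchildren counted for the 10 researched grandchildren,
# in the order given in the sum above.
researched_counts = [8, 1, 7, 11, 3, 18, 1, 10, 6, 5]

# The starred entries: father's 2nd cousins (*) and his own branch (**).
second_cousins_or_closer = [1, 7, 11, 3, 18]

unknown_grandchildren = 21  # grandchildren not yet carried forward

average = sum(researched_counts) / len(researched_counts)
estimated_total = (len(researched_counts) + unknown_grandchildren) * average
third_cousin_level = estimated_total - sum(second_cousins_or_closer)

print("counted so far:", sum(researched_counts))
print("average per researched grandchild:", average)
print("estimated gt-gt grandchildren:", estimated_total)
print("beyond 2C in father's generation:", third_cousin_level)

# "About 400" 4Cs in the next generation implies roughly this many
# surviving children per person (backed out, not a measured value).
print("implied multiplier:", round(400 / third_cousin_level, 2))
```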

On the aforementioned Standley 1/16 I have 100% of my grandfather's generation researched, and about 85% of my father's generation. So I know there are about 600 in my father's generation, 170 of whom are 2C or closer to him, leaving 430. So maybe about 1000 4Cs on the Standley 1/16. This seems to be an especially prolific family (maybe only my Johnson 1/16 is as big).

As far as matches (on AncestryDNA) go:

* On that Standley 1/16 (~1000 4Cs) my brother has 17 matches (I don't match 10 of those). About 1 in 60.

* On that Baxter 1/16 (~400 4Cs?) I have found exactly 2 matches. 1 in 60 would give me about 7, so maybe that 400 is way too high.

* On my Cronin 1/16, I've only ever been able to find about 7 3Cs for my mother, so maybe 20 4Cs for me there, tops. No matches have turned up that I've been able to identify.

* On my Brohan 1/16, I've been able to find a few traces of gt-gt grandma's siblings in the records, but as far as I can tell so far, the number of 4Cs on that 1/16 is a big fat ZERO. I also have ZERO 3C matches on my Cronin-Brohan 1/8, so this really stinks, BTW.

So with some as high as 1000 4Cs, and others practically ZERO, I'd guess I have maybe 500*16 = 8000 4Cs total. Really, most sides of the family seem to be a LOT smaller than the Standleys or Johnsons, so it might really be more like 4000 or 5000.
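The "1 in 60" comparison from the bullets above can be written out as a quick Python sketch. The figures are the rough ones quoted there; the variable names are just for illustration, and treating the match rate as the same on every line is an assumption, not something the data shows.

```python
standley_cousins = 1000   # estimated 4Cs on the Standley 1/16
standley_matches = 17     # identified AncestryDNA matches on that line

baxter_cousins = 400      # tentative estimate of 4Cs on the Baxter 1/16
baxter_matches = 2        # identified matches on that line

# Observed rate on the well-researched line: about one match per this many 4Cs.
cousins_per_match = standley_cousins / standley_matches
print("Standley: 1 match per", round(cousins_per_match), "4Cs")

# If the Baxter line matched at the same rate, how many matches would show up?
expected_baxter_matches = baxter_cousins / cousins_per_match
print("Baxter matches expected at that rate:", round(expected_baxter_matches, 1))

# Running it the other way: how many 4Cs would 2 matches suggest?
implied_baxter_cousins = baxter_matches * cousins_per_match
print("Baxter 4Cs implied by 2 matches:", round(implied_baxter_cousins))

# The overall guess: 16 lines at an average of 500 4Cs each.
print("total guess:", 500 * 16)
```

With only 2 matches this is a very noisy number, but it points the same direction as the suspicion that 400 is too high.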

Frank, I don't have a lot of time to think through your post at the moment, so I'm sorry if I'm asking something obvious here, but when you say you have 100% of your grandfather's generation researched, do you mean that you have 100% of your grandfather's, and his ancestors' in the two preceding generations researched?  Are you certain, or did you find any dead ends in your research?

Julie, I mean that I have found out, to the extent humanly possible, all the gt-grandchildren of my Standley gt-gt-gt grandparents. That's my grandfather's generation (I have most of them on WikiTree).

I suspect I'm missing maybe two or three who died as infants (maybe they'll turn up eventually), but other than that the only real possible "hole" is from the one grandson who the newspapers tell me had some brushes with the law (bank robbery!), and took an assumed name. He disappears from the records - I don't know what happened to him. His step father disappears also - I suspect he got married under a fake name, because he's not in the records BEFORE they got married either.

There was a minor "hole" in my father's generation, but DNA filled it in! A wife and young son in one census disappeared before the next. Unknowingly, I wrote to a DNA match and he said something like, "I can't really help you. My Dad's surname was McCaslin, but was changed when he was a baby, and we don't know anything about his biological family. His mother left his father, and they changed his surname to his step-dad's surname." He thought he couldn't help, but he filled that hole that I may never have figured out!

There may ultimately be a FEW issues with my dad's generation, but I think it'll ultimately be something like 98% complete.

So what I was getting at is that I have a REALLY good idea about how many relatives I have on that side - a virtually exact number at my grandfather's generation, and a really good estimate in my father's generation (with 85% being a solid count). It's a pretty solid data point.

Most of my rambling, I think, was about my Baxters, which is kind of "partly solid, partly something of a guesstimate" data point. I may develop that further in March, but that's what I've got right now, as far as real numbers being handy.
Here are some match stats, BTW, all AncestryDNA:

Me: 33,912 (692 of them over 20cM - so-called "close")

Mrs Me: 34,179 (546 are 20cM+)

My brother: 41,879 (1121 are 20cM+)

His girlfriend: 34,699 (408 are 20cM+)
Thanks, Frank.

Just so that Julie has new bullet-points for her AncestryDNA data:

  • All matches: 151,124
  • 4C or closer (20cM and higher): 5,974
  • "Distant" (6-20cM): 145,150
  • 100cM and higher: 25
And I've seen higher total matches. What may be a minor factor is that mine was an early Ancestry v1 test. Ancestry has officially announced only one version change, that in April 2016, but there have been several iterations of microarray chips in use along the way. The disparity in same-SNPs tested isn't as great as was the market move from Illumina's OmniExpress to Global Screening Array chips, but the least overlap among AncestryDNA's tests amounted to 64%, 427,858 of 668,942 markers (early v1 to first iteration of v2). All Ancestry matching is imputed against current genotype datasets, so the differences really wouldn't be significant, though.
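A quick Python check of those figures, for anyone who wants to verify them. The variable names are only labels for this example; the numbers are the ones quoted above.

```python
# Match counts from the bullet list above.
all_matches = 151_124
fourth_cousin_or_closer = 5_974   # 20 cM and higher
distant = 145_150                 # 6-20 cM

print("categories add up:", fourth_cousin_or_closer + distant == all_matches)
print(f"share at 20 cM or more: {fourth_cousin_or_closer / all_matches:.1%}")

# SNP overlap between the early v1 chip and the first iteration of v2.
shared_markers = 427_858
total_markers = 668_942
overlap = shared_markers / total_markers
print(f"marker overlap: {overlap:.1%}")
```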
Interesting, Edison!  If the chip is only a minor factor, what do you think accounts for your large number of matches?  I see from your tree that you don't appear to have any recent immigrants  among your ancestors.  Anything else?

Of that total, how many have you identified?
+10 votes
Julie,

This question is too interesting.   I shouldn't have read it.   I'm trying to finish taxes before we leave on a trip next  week.   Now I'll just be thinking about your assumptions for this estimate.   Hmmmmmm......
by Peggy McReynolds G2G6 Pilot (474k points)
+11 votes
In what ways do you find it helpful?
by Living Ford G2G6 Pilot (162k points)
It shows me why I have so many more matches in parts of my tree than other parts.  I had always had some vague ideas, but for me, this analysis really quantified it.

It can tell me the likelihood of results when I try to research various branches of my tree through DNA, and what I need to focus on.

It shows me how much more work I have to do just to identify my fourth cousins, and identification is a route to further discovery.
I have at least five (5) 4th cousins on WikiTree, as contributing members, or Family/Guest members (although it is the husband of one of them who is actually the contributing member).
+12 votes
Genetic Affairs is a new DNA cluster grouping tool that takes DNA matches from multiple services and groups them by matching segments and matching trees, and even shows where cousins match more than one way. I was playing around with it this afternoon; it's very cool. It does what many hours of spreadsheets and many more hours of tree research used to do. If you test at FTDNA you can download all your match cousins. I have over 13,000, last I looked. My most prolific lines are by far my Caudill and Maggard. Almost everyone I look at has one in the tree. This tool brought up some new matches in the lines I am working on that are less documented.
by Loretta Morrison G2G6 Pilot (180k points)
The clustering tools identify groups of shared matches, but do not identify any particular match specifically, nor identify the degree of relationship.

Well, that is the cool thing: this one does all that and more, and even lists above all the clusters which ancestor you go back to. And it maps the whole thing on a tree if you like, with cMs per match and everything. Explanations here by Roberta Estes of DNA Explained: Genetic Affairs new cluster tool

Loretta, it took me a while to refresh my memory about Genetic Affairs.  It was discussed in this thread about six weeks ago:

https://www.wikitree.com/g2g/968686/how-i-used-dna-to-find-my-cousins-great-grandfather?show=969237#c969237

I signed up for Genetic Affairs.  Here is what I said then about the autotrees:

It took me a while to figure out how to get the autotrees.  When I finally did, they were a disappointment.  For my big mystery group, they were just a bunch of little fragments.  I had already done much more extensive connection of the various people the old-fashioned manual way (although relying on other Ancestry trees).

For people who have not spent months of their lives analyzing their DNA matches, I think this website offers a good service for a cheap price.  But still, it takes a great deal of study to really understand one's ancestry and one's matches.  While I don't doubt that some day, someone will manage to create a true one-world tree based on genetics, the current products are a long, long way from that.

Well, maybe in a few months when you get more matches you could run it again. What is it about great things in small packages :) Thing is, I have been doing the same work with FTDNA, GEDmatch, and then 23andMe by making spreadsheets and researching trees, which took at least 20 hours or more for each ancestor, and I saw less. I found the tree by cluster to be helpful, and the whole tree. The clusters by themselves are just names. Yes, most of my work is done, with about 16K people in my tree, but there are still a few very important brick walls. Ancestry does not have a chromosome browser, so you don't truly know why you match, especially with those cousins you match more than one way. And when they give you a clue, it may not be right and is often based on just one person's new theory. I found a match I had overlooked that is a fair amount of cMs in my Morrison line, which could help with my brick wall. Well, have a great weekend!
Yes, of course I know that Ancestry does not have a chromosome browser.  When I want to know more about a match, I contact the person and ask him or her to upload to GEDmatch (and offer to help).
