Wikitree Statistics - November 2022

+75 votes
1.3k views

I have been tracking several statistics that approximately represent the quality of the Wikitree database.  Following is a summary as of November 2022:

Overall status:  32.3 M total profiles; 85% are connected; 33% have DNA links.   

Profiles with known consistency issues:  96,500 (down 200 since June).

Sourcing:  about 18% with 3 or more sources, 37% with 1-2 sources, 10% poorly sourced, 24% unsourced, and 11% unavailable.

Undated profiles:   444,000 (down 12,000 since June)

Compared with June 2022, there are 1.6 M more profiles.  The estimated fraction of profiles with 1 or more sources is about 55%, down slightly from the June estimate but the same within the estimate uncertainty.   
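A quick arithmetic check of the sourcing breakdown, written as a short Python sketch. It assumes that "1 or more sources" means the "3 or more sources" and "1-2 sources" categories combined (poorly sourced, unsourced and unavailable profiles excluded). The post does not spell this out, but that reading reproduces the 55% figure.

```python
# Sourcing breakdown from the November 2022 summary (percent of profiles).
sourcing = {
    "3+ sources": 18,
    "1-2 sources": 37,
    "poorly sourced": 10,
    "unsourced": 24,
    "unavailable": 11,
}

# The categories are meant to cover every sampled profile, so they should sum to 100%.
total = sum(sourcing.values())

# Assumption: "1 or more sources" = the two sourced categories combined.
with_sources = sourcing["3+ sources"] + sourcing["1-2 sources"]

print(f"total: {total}%")
print(f"1 or more sources: {with_sources}%")
```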

A Free Space page is available with graphs, historical data and technical details.

WikiTree profile: Space:Wikitree_Statistics
in The Tree House by Paul Gierszewski G2G6 Mach 9 (90.1k points)

11 Answers

+37 votes

Here is a graph summarizing the sourcing history.

by Paul Gierszewski G2G6 Mach 9 (90.1k points)
+22 votes
Fantastic, thank you for doing this, Paul!
by Elaine Martzen G2G6 Pilot (175k points)
+29 votes
It appears that the scores of Wikitreers that participate in the various sourcing events (Monthly Sourcerer's Challenge, Saturday Sourcing Sprint, USBH Sourcing Challenge) are making a big difference from the looks of Paul's graph. Kudos!
by Nancy Thomas G2G6 Pilot (210k points)
+20 votes
Thank you for gathering this info for us.  I do find the unsourced numbers depressing.
by Nan Starjak G2G6 Pilot (385k points)
I don't see any reason to be depressed. The total number of profiles is increasing, but the number of unsourced profiles is not.

Also, there is an optimistic way to look at unsourced profiles. In the future, our descendants will still have access to census records, etc. They won't have access to the memories of our loved ones. If someone passes on without sharing their "unsourced" recollections, they may be lost forever.

Paul is determining 'unsourced' profiles using BioCheck random profiles, I think.  Using the Unsourced count of profiles from any Wikitree + report is not accurate because the 'majority' of profiles that we come across do NOT have the unsourced template or unsourced category on them, so they are not included in any unsourced report that is generated by Wikitree +.

Unsourced count isn't decreasing because more sourced profiles are being created. It is partially staying the same because there are so many people that are 'sourcing' the unsourced profiles that are not identified. BioCheck identifies some of them and random discovery while doing other work on Wikitree is finding the profiles with no sources.  Pip even mentioned this in his Weekend Chat.

I come across lots of those when I'm wandering around WikiTree. No sources, no headings, no formal indications that they were unsourced. I think that the official number of unsourced profiles on WikiTree is way below the actual number.

Many profiles have been loaded over the years with no sources and they did not all come from gedcoms. We see new profiles every day that have no sources on them.  

Yup, and tons of recently added profiles from the 1700s on where the only "source" is an Unsourced Family Tree. Yeah, I know that meets WikiTree's requirement to be counted as a sourced profile, but no even semi-serious genealogist would find it palatable.
I agree Stu. We are not seeing the "Real Numbers"!
To be clear, the numbers are NOT based on BioCheck, and are NOT based on profiles with the Unsourced category.  There's a more complete description on the Free Space page, but basically I manually look at each of a number of randomly selected profiles and evaluate the types of sources listed. I do not count a "family tree" as a source.  So it's certainly only an estimate, but it is an estimate of the "real numbers" across the whole tree.
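The uncertainty of a sample-based estimate like this is only described qualitatively here. One common way to attach an uncertainty to a fraction estimated from randomly selected profiles is a normal-approximation (Wald) confidence interval for a proportion. The sketch below assumes a simple random sample; the sample size (400) and count (220) are hypothetical, not the actual numbers behind the Free Space page, which may use a different method.

```python
import math


def proportion_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float, float]:
    """Estimate a proportion from a simple random sample.

    Returns (estimate, lower, upper) using the normal (Wald) approximation.
    z = 1.96 gives an approximate 95% confidence interval.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)  # standard error of the sample proportion
    return p, max(0.0, p - z * se), min(1.0, p + z * se)


# Hypothetical example: 220 of 400 randomly chosen profiles cite at least one source.
est, lo, hi = proportion_ci(220, 400)
print(f"estimate {est:.1%}, 95% CI {lo:.1%} to {hi:.1%}")
```

With 400 profiles the 95% margin is roughly plus or minus 5 percentage points, so a shift of a point or two between surveys can easily fall within the uncertainty. For proportions near 0% or 100%, or for very small samples, a Wilson interval is usually the better choice.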

"Unsourced" recollections are not worrying. In fact it's doubtful whether they really count as unsourced, if they are within the time frame of recollection of the contributor.

And if an "unsourced family tree handed down to me" is really that - something stored in a family bible or something like a tree collated by great-aunt Jemima from her memory and consultations with uncle Homer, there is also some source value in it.

The problem is that in most cases the "unsourced family tree handed down to me" is a cover phrase for "I copied this off a tree at Ancestry / MyHeritage / FamilySearch / the Internet". Could be based on something that was solid genealogy in its first incarnation, could be based on a heap of same-name misunderstandings, could be based on pure myth. The contributor probably doesn't know.

Eva, I agree that if the "unsourced family tree" is for events within the lifetime of the contributor, or perhaps the lifetime of someone the contributor could have known, then it's a valid source; except that when it's simply documented as "unsourced family tree," how are you or I to know whether the creator of that tree had personal knowledge of the events, or had heard about them from the lips of someone who did?

In my own case, there are a very few places where I have relied on otherwise unsourced materials from family members. But in every case, I've documented this as "family history notes created by," followed by the name of the person who created the document. In a few other cases, I've used something like "event related by," followed by the name of the person who gave me the oral information. But in both situations, most of the things I've documented in these ways have been supplemental to the basic BMD information, for which I have always at least tried to find documentary sources.

In so many cases, "unsourced family tree" profiles could have been tremendously improved with only a very little effort on the part of the profile creator. I'm currently working on a profile for a man with the dates 1827-1899. The profile was created in 2020 with "unsourced family tree" as its lone source. But there exists for this man an Ancestry.com profile with no fewer than nine citations—six vital records, a city directory, a census, and a published family history. How much better the WikiTree profile would have been if the creator had investigated those sources and cited them here—or at the very least, included a reference to this well-sourced Ancestry profile (which I have now added, along with other sources)? I'm assuming, of course, that the Ancestry profile pre-dates the WikiTree one, but even if it does not, a little time searching with FamilySearch or Google would have unearthed the same sources that the Ancestry profile creator used.

Exactly: my genealogy adventure (4-5 years, so yes, a novice) started from a simple query about a family member that led to an Ancestry source covering a whole heap of related people. It was done by a respected genealogist, so a good start, with lots of census references, etc. Next I took a paid subscription with a different company and DNA tests from another; now I have GEDmatch, LDS, WikiTree and a second subscription (to get their data sources and an offline package that lets me research new trees without cluttering all the others). Great, except: I now have 11,000+ DNA "matches" from one company, 3,000 from another, and about 50 from the original DNA test company, so I'm quite overwhelmed with inputs, but nothing has solved the original challenges. There are similar numbers for two other relatives' DNA tests.

Now here's the problem: each of the commercial companies and LDS sends me messages saying "here is a match" that, on inspection, turns out to be extracted from one of the other two by someone copying my original input/photos. Where I spot the duplication I can ignore that match, but sometimes the match adds the odd cousin/spouse/dates, etc. How can I trust them as sources when clear discrepancies come to light (particularly BMDs)? I don't have the time or resources to validate them, so, being a good WikiTreer, I cannot provide assured sourcing. Yet that (potential) profile might just be the one that solves mysteries for many and prompts someone who has the time to document their more erudite research for the benefit of us all.

There is a period (100-150 years) where, allowing for misrecording and subsequent transcription errors, there is usually a surfeit of records that would confirm a particular fact. Further back in time (with certain notable exceptions) we are bordering on mythology, as is well exploited by certain companies! Yes, we are all descended from a long list of historical figures, recorded sailings, perhaps even Adam and Eve by some accounts.

Conclusion: Unsourced material is a rich diet of ideas that may prove indigestible if one over-indulges or ignores the health warnings. Ah well! Season's greetings to you all...
+23 votes
These are just fabulous statistics.  I especially like that there are 6 years of data on a number of the statistics on the space page.  Indeed, WikiTree is growing and maintaining accuracy. It's so rewarding to see that the efforts to get dates and sources on profiles are noticeably working.

Thank you for all this great information and excellent work!
by Kathy Zipperer G2G6 Pilot (477k points)
Kathy, keep watching those no-date profiles. I am adding more to that list every day while working old GedComs. :)
Loretta -- I know.  It's just great progress! I haven't worked too many GEDI profiles recently, but I'll get back to it.  Trying to get my watchlist under control from the last connect-a-thons.  It just ballooned.
I have a long way to go on the GedCom file I chose; I figure it will take another few years.

GedComs can be very rewarding; but they can be very stressful!  lol
+15 votes

Thanks, Paul! 

Our Friday Date Night participants will be excited to see the progress you've posted here. :-) I have some more stats on that here. (They're due for an update, which I hope to get to today.)

by Julie Ricketts G2G6 Pilot (489k points)
+14 votes
I appreciate you doing this so thank you Paul.
by Kathy Nava G2G6 Pilot (311k points)
+13 votes
If the number of undated profiles is counted strictly from the Suggestion numbers, that will also be lower than the actual number, because Suggestions are generated mostly for 'active' profiles and only some 'inactive' profiles.  The majority of the profiles on WikiTree are never reviewed for Suggestions, or the Suggestion process would take forever to run.

Using wikitree + with a b0 d0 query for a location will show more profiles than simply the specific suggestion numbers for the same location.  Many of us are using that process, also.
by Linda Peterson G2G6 Pilot (787k points)
Thanks for sharing that tip, Linda!
Thank you, Linda, I didn't know that!

The statistic tracked here is the sum of Suggestions 131-134.  As the WikiTree+ Help page states, these are based on all Open, Not Living profiles (not "active" vs "inactive").  If you use WikiTree+ to look at all the profiles, open or locked, the numbers without birth or death dates are as follows as of today:
Open (white) - 449,555 [as per Suggestions report]
Green - 226,950
Yellow/dark Yellow - 627,026
Orange - 69,013
Red - 148,337
Black (unlisted) - unknown
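Summing the listed levels gives only a lower bound on undated profiles across the whole tree, since the Black (unlisted) count is unknown. A short Python sketch of that arithmetic, which also shows what share of the known total the Open count (the figure tracked via Suggestions 131-134) represents:

```python
# Undated (no birth or death date) profile counts by privacy level, as listed above.
undated = {
    "Open (white)": 449_555,
    "Green": 226_950,
    "Yellow/dark Yellow": 627_026,
    "Orange": 69_013,
    "Red": 148_337,
    # "Black (unlisted)" is unknown, so it is left out; the sum is a lower bound.
}

known_total = sum(undated.values())
open_share = undated["Open (white)"] / known_total

print(f"known total: {known_total:,}")
print(f"Open share of known total: {open_share:.1%}")
```

By these numbers, Open profiles are just under 30% of the known undated total, so the tracked statistic covers only part of the undated profiles.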

Paul, I am only stating that the Suggestions Report does not include all profiles every week. Active profiles and some inactive profiles will generate Suggestions, but many inactive profiles do not generate Suggestions each week. Ales has stated it would take too long if they were generated for all profiles every week. There are many profiles seen with a b0 d0 query that do not have Suggestions.
Can you explain in more detail how to do this?
Running this query will start WikiTree+; hit the blue Get Profiles button and it will return all profiles with a Utah location that have no birth or death dates. You can replace Utah with any US state or country.

 https://wikitree.sdms.si/default.htm?report=srch1&Query=B0+D0+utah&MaxProfiles=500&PageSize=-1

Edited - this returned 136 Suggestions 131-134 for 214 profiles.
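For anyone wanting to run the same check for several locations, here is a small Python sketch that rebuilds the query URL above. The parameter names (report, Query, MaxProfiles, PageSize) are copied from that example; the helper name undated_query_url is hypothetical, and values other than those in the example URL (for instance multi-word locations) have not been checked against WikiTree+.

```python
from urllib.parse import urlencode

BASE_URL = "https://wikitree.sdms.si/default.htm"


def undated_query_url(location: str, max_profiles: int = 500) -> str:
    """Hypothetical helper: build a WikiTree+ search URL for profiles at `location`
    with no birth (B0) or death (D0) date.

    Parameter names are copied from the example link; other values are untested.
    """
    params = {
        "report": "srch1",
        "Query": f"B0 D0 {location}",  # urlencode turns the spaces into '+'
        "MaxProfiles": max_profiles,
        "PageSize": -1,
    }
    return f"{BASE_URL}?{urlencode(params)}"


print(undated_query_url("utah"))
```

Calling undated_query_url("utah") reproduces the link above; pass a different location to swap in another state or country.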
+10 votes

85% are connected; 33% have DNA links.  

This is outstanding, Paul! I know you said 24% are unsourced, but that just challenges us to find ways to source them.

by Mindy Silva G2G Astronaut (1.1m points)
+8 votes
Thanks for this data, Paul! I'd love for WikiTree to find some new and innovative ways to improve overall quality of profiles now that we're getting closer and closer to finishing cleaning those original gedcom uploads. I know they were a big part of the problem.
by Emma MacBeath G2G Astronaut (1.3m points)
+9 votes
I was just discussing the quality of WikiTree yesterday. I think creating a profile with one basic fact is worthwhile. It gives others a place to start. I do a lot of sourcing and try to make each profile useful to the next person who looks at it.
by Nancy Wilson G2G6 Pilot (148k points)
