Wikitree Statistics - November 2023

+38 votes
484 views
I have been tracking several statistics that approximately represent the quality of the Wikitree database.  I was initially conducting an assessment every 6 months, but Wikitree is now large enough that the statistics change slowly, so I am now providing an annual update.  The following is a summary as of November 2023:

Overall status:  36.2 M total profiles; 86% are connected; 34% have DNA test connections   

Sourcing:  about 22% with 3 or more sources, 36% with 1-2 sources, 12% poorly sourced, 22% unsourced, and 8% unavailable

Profiles with known consistency issues:  101,100 (up 4,600 since Nov 2022)

Undated profiles:  418,000 (down 26,000 since Nov 2022)

Duplicate profiles:  3-13% (Dec 2021 estimate)

Compared with Nov 2022, there are 3.9 M more profiles.  The estimated fraction of profiles with 1 or more sources is about 58%, up from the 2022 estimate.   
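For a rough sense of scale, the percentages above can be turned into approximate profile counts. This is only a sketch: the shares are rounded sample estimates taken from the summary, so the resulting counts are ballpark figures rather than actual tallies, and the function name `approx_counts` is made up for illustration.

```python
# Headline figure quoted in the post (November 2023).
TOTAL_PROFILES = 36_200_000

# Sourcing shares from the post, in percent of all profiles.
SOURCING_PCT = {
    "3 or more sources": 22,
    "1-2 sources": 36,
    "poorly sourced": 12,
    "unsourced": 22,
    "unavailable": 8,
}


def approx_counts(total, shares_pct):
    """Convert percentage shares into approximate profile counts."""
    return {label: round(total * pct / 100) for label, pct in shares_pct.items()}


counts = approx_counts(TOTAL_PROFILES, SOURCING_PCT)

# "1 or more sources" is the sum of the two sourced categories.
sourced_pct = SOURCING_PCT["3 or more sources"] + SOURCING_PCT["1-2 sources"]

for label, n in counts.items():
    print(f"{label:>18}: about {n / 1e6:.1f} M")
print(f"1 or more sources: {sourced_pct}% of profiles")
```

The five shares add up to 100%, and the two sourced categories together give the 58% quoted above.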

A Free Space page is available with graphs, historical data and technical details.  But essentially the sourcing review is done by manually checking a random set of profiles and looking at the listed sources.
WikiTree profile: Space:Wikitree_Statistics
in The Tree House by Paul Gierszewski G2G6 Mach 8 (89.9k points)
Thanks so much Paul. Always very interesting!
May I ask from where you are drawing the total number of unsourced profiles? If I recall correctly, to get a percentage you must have had a total number to start.

The reason I ask is that, depending on the source of the number, it might be off considerably. For example, the unsourced list for Washington state (https://www.wikitree.com/wiki/Category:Washington%2C_Unsourced_Profiles) contains more than just the unsourced profiles for that state. It also includes profiles where the person was born, married, or died in another state that has a Washington county in a location field. For example, https://www.wikitree.com/wiki/Beaty-2235 appears on both the Washington state unsourced list and the unsourced list for the State of Virginia.

Simply trying to understand this all better...
It's explained in a bit more detail in the linked free space page.  But basically I randomly sample profiles from across all of Wikitree, and then look at their sourcing.
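To illustrate why a random sample gives an estimate rather than an exact count, here is a small simulation. It is only a sketch under stated assumptions: the category shares are the November 2023 figures from the summary, the sample size of 500 is hypothetical (the actual sample size is given on the free space page, not here), and drawing labels with `random.choices` stands in for manually reviewing real profiles.

```python
import random

# Assumed category shares, taken from the November 2023 summary.
SHARES = {
    "3 or more sources": 0.22,
    "1-2 sources": 0.36,
    "poorly sourced": 0.12,
    "unsourced": 0.22,
    "unavailable": 0.08,
}


def sample_estimate(sample_size, rng):
    """Simulate reviewing a random sample; return the estimated share per category."""
    labels = list(SHARES)
    weights = list(SHARES.values())
    draws = rng.choices(labels, weights=weights, k=sample_size)
    return {label: draws.count(label) / sample_size for label in labels}


rng = random.Random(2023)  # fixed seed so the run is repeatable
for trial in range(3):
    est = sample_estimate(500, rng)  # 500 is a hypothetical sample size
    print(f"trial {trial + 1}: unsourced estimated at {est['unsourced']:.1%}")
```

Each trial lands near 22% but not exactly on it, which is the kind of sample-to-sample variation that shows up when the estimate is repeated over time.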

5 Answers

+24 votes

This is a historical view of sourcing in Wikitree.  Kinks are due to the statistical nature of the estimate.

by Paul Gierszewski G2G6 Mach 8 (89.9k points)
That is a fantastic statistic, from 5M to 20M that are sourced in the last 6 years! Great job everyone!
+19 votes

This figure shows the growth in Wikitree over time.

by Paul Gierszewski G2G6 Mach 8 (89.9k points)
edited by Paul Gierszewski
If onward and upward were a graph.
Is there a way to make the images larger? I cannot read the text.

You can click on this link to the original figure and should be able to expand it as needed.

https://www.wikitree.com/photo/png/Wikitree_Statistics-5

+22 votes
Has anyone collected the data for the monthly total of 100 contribution badge winners and 1000 contribution badge winners over the years? I think the sum of the two is a good measure of "active" contributors. It would be interesting to see how (if?) that has grown over time.
by Chase Ashley G2G6 Pilot (314k points)

I have not tracked this so will defer to others.  I know you had provided info on this up to 2019, but don't see the info posted for later years. https://www.wikitree.com/g2g/1000983/number-of-100-contribution-badge-winners-over-the-years?show=1000983#q1000983

Haha. I forgot that I had previously checked that. I just did the calculations and added them as a separate post: Number of Active WT Contributors by Year

+7 votes
It would be interesting to start tracking also Missing locations. If you are interested, I can provide you with a lot of historical data. Some is public in WT+ and I have some tables to get more of it.

https://plus.wikitree.com/default.htm?report=stat1&dataID=25&Year=0

https://docs.google.com/spreadsheets/d/1LwVViBHAseZGUUr16M3hN3N8nItFMoTdZSQ2IBhPxik/edit?usp=sharing Rows 6-8
by Aleš Trtnik G2G6 Pilot (811k points)
I agree that having locations is important for quality, and so worth tracking.  Your weekly report tracks multiple measures related to poorly defined locations, but I think in the interest of conciseness it would be best to focus on those with no location.  

Your Google table lists counts of profiles with birth locations, so would the number without a birth location just be the active profile count minus this value?  E.g. 32,826,375-24,806,818 = 8,019,557?  Similarly for no death locations?

The question then is whether I should try to track profiles with neither a birth nor a death location, as I don't think that is reported.  I could use "No birth location" as a general proxy for location quality.
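The subtraction proposed above can be checked with a short script. This is a sketch that assumes, as the question does, that the count of profiles without a birth location is simply the active profile count minus the count with a birth location; the helper name `missing_count` is made up for illustration.

```python
# Figures quoted in the comment above.
ACTIVE_PROFILES = 32_826_375
WITH_BIRTH_LOCATION = 24_806_818


def missing_count(active, with_value):
    """Profiles lacking a field, assuming every active profile either has it or not."""
    if with_value > active:
        raise ValueError("count with a value cannot exceed the active profile count")
    return active - with_value


missing_birth = missing_count(ACTIVE_PROFILES, WITH_BIRTH_LOCATION)
share = missing_birth / ACTIVE_PROFILES

print(f"{missing_birth:,} profiles without a birth location ({share:.1%})")
```

The same helper would apply to death locations. The number of profiles missing both locations cannot be derived from the two separate counts alone; they only bound it (it is at most the smaller of the two missing counts).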

Here are some quick graphs of Profiles with Missing Birth Locations and Profiles with Missing Death Locations over time, from the Suggestions report, as absolute counts and as percentages.

+1 vote
Thank you for these statistics. We have made great progress in some areas.

However, I am a bit dismayed that 22% of the profiles on WT are unsourced. Can that be correct?

Thank you.
by Laura Ward G2G6 Mach 4 (46.4k points)
It's a statistical estimate, not an actual count, so might not be correct.  But I think it's accurate within about plus or minus several %. I think we need to be realistic about where we are, so that we know what needs to improve.  I would also note that an unsourced profile is not necessarily incorrect; these are at least a framework for adding in sources.
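As a rough check on "plus or minus several %", here is a sketch of the standard margin of error for a sampled proportion, using the normal approximation at 95% confidence. The sample sizes below are hypothetical, since the actual number of profiles reviewed is not stated in this thread, and this is not necessarily the method used for the figures above.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p estimated from n samples."""
    return z * math.sqrt(p * (1 - p) / n)


P_UNSOURCED = 0.22  # unsourced share from the November 2023 summary

for n in (250, 500, 1000):  # hypothetical sample sizes
    moe = margin_of_error(P_UNSOURCED, n)
    print(f"n={n}: 22% plus or minus {moe:.1%}")
```

Under these assumptions, a sample of a few hundred profiles gives an uncertainty of a few percentage points, and quadrupling the sample size only halves it.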
Additionally, it is very important to remember that there are an amazing number of profiles that were created by gedcom years ago and then abandoned, and that now hit the Data Doctors suggestion lists with location issues. I primarily work on two states and can say that when I fix locations on those abandoned profiles, I add the unsourced tag to about 80% of them. These are generally profiles from 2010-2014. And we're making headway on those, as in the Appalachian group challenge this month. I know there are other challenges that also look for the unsourced tag with an associated state and work on those.  But it's like any housekeeping: everything might be good one week, but by next week the dust is back and there are hundreds more.
