Comparison of data between Wikidata and WikiTree

+24 votes
941 views

There are already 5600 profiles connected to Wikidata. It is time to start using it.

I prepared a comparison of the data for these profiles. I don't think the differences should be put into the error system, since the data is often wrong on the Wikidata side. Marking such a difference as a false error is also a problem, since the data on the Wikidata side can change. But I can change my mind.

I prepared lists by the first letter of the Wikidata name, except for royals, which are in a separate group. I can add more reports based on categories; just let me know.

Edit: I removed the links, since they are no longer valid. See this post http://www.wikitree.com/g2g/295172/wikidata-wikitree-update for new info.


Data from Wikidata can be wrong in the same way that data on WikiTree can be wrong. I think that where the two differ, sources should be found to determine which data is correct. Where data is missing in WikiTree, the data from Wikidata should be verified and used if sources are found. There is also the possibility of a wrong connection to Wikidata.
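A minimal sketch of the kind of comparison described above, assuming each profile has already been reduced to a flat dictionary of text fields. The field names (`birth_date`, `death_date`, `birth_location`) and the function `compare_records` are hypothetical and are not the code behind the actual reports. The sketch only separates differing values from missing ones, so that neither side is treated as authoritative.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, Optional[str]]


def compare_records(
    wikitree: Record,
    wikidata: Record,
    fields: Tuple[str, ...] = ("birth_date", "death_date", "birth_location"),
) -> Dict[str, List[str]]:
    """Classify each field as equal, different, or missing on one side."""
    report: Dict[str, List[str]] = {
        "equal": [],
        "different": [],
        "missing_in_wikitree": [],
        "missing_in_wikidata": [],
    }
    for field in fields:
        wt = wikitree.get(field) or None  # treat empty strings as missing
        wd = wikidata.get(field) or None
        if wt is None and wd is None:
            continue  # nothing to compare on either side
        if wt is None:
            report["missing_in_wikitree"].append(field)
        elif wd is None:
            report["missing_in_wikidata"].append(field)
        elif wt.strip().lower() == wd.strip().lower():
            report["equal"].append(field)
        else:
            report["different"].append(field)
    return report
```

Fields listed under `different` or `missing_in_wikitree` would still need a source before anything is changed on either site.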

I will also add a comparison of relations to the list.

WikiTree profile: Space:Wikidata
in The Tree House by Aleš Trtnik G2G6 Pilot (808k points)
retagged by Maggie N.
Outstanding, Aleš!

I know Robin is concerned about people misusing what you're providing. I'm sure we'll see some members citing Wikidata as the source for an addition or a correction, rather than digging in to find where Wikidata is getting it and independently judging that source. But I think we just need to be clear in the explanations, as you are above, and help make sure that people understand the limitations.

Interesting approach, comparing two wikis. Maybe WikiTree could integrate Aleš's work and add a red mark where the data differs...

See Extension:LinkedWiki/lua

 

6 Answers

+11 votes

Magic Aleš

I am still out travelling in Greece and surfing on a mobile phone...

The next step is that I will batch connect about 16000 names from the list you created with WikiTree profiles that have links to Wikipedia. As those profiles don't use a template stating that this person on WikiTree is the same as this person in Wikipedia, all items need to be checked manually.

Then I use a tool for uploading

https://tools.wmflabs.org/mix-n-match/

The basic idea is that if you have a list of people, this tool will help match them...

The plan is to be ready by the end of next week...

Also interesting to see at https://tools.wmflabs.org/mix-n-match/ is Genealogics, another genealogy site: they have uploaded +458000 entries and let the system automatch and/or match them manually.

Genealogics

Genealogics person ID

Show: Manual | Auto | Unmatched | No Wikidata | N/A | 458,240 (95.1%) entries to do

 

Maybe that is an approach we should take with people who have Template Famous or a Category House of...

 

by Living Sälgö G2G6 Pilot (298k points)
edited by Living Sälgö
I think I should prepare the list for mixnmatch.

When I was reviewing Wikipedia links, I often found that all relatives link to the same page, so a human has to find the correct page, if it even exists. Also, there are 50000 pages. I corrected the search algorithm for the bio.

I would take all profiles connected to Wikidata and get 1 or 2 generations from them. That would be about 50k profiles that likely have Wikidata records.

Also some categories or templates could be added to this list.

At the moment there are 3.2 million persons on Wikidata, so my previous estimate was wrong. I think close to half a million profiles will be connected.
Hm maybe you are right again Aleš ;-)

My experience is that many Wikipedia links are not to the person and could be to a relative.... But checking that the names are correct and that the Wikidata object is a human (Q5) would, I guess, find most of them....
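The Q5 check could be sketched like this, assuming the entity has already been downloaded as a dictionary in the JSON shape Wikidata serves for an item (a `claims` map keyed by property ID, with the value under `mainsnak` → `datavalue`). The helper names `item_ids` and `is_human` are hypothetical; P31 is Wikidata's "instance of" property.

```python
from typing import Any, Dict, List


def item_ids(entity: Dict[str, Any], prop: str) -> List[str]:
    """Return the item IDs (e.g. 'Q5') that the entity states for a property."""
    ids: List[str] = []
    for statement in entity.get("claims", {}).get(prop, []):
        snak = statement.get("mainsnak", {})
        if snak.get("snaktype") != "value":
            continue  # 'novalue' and 'somevalue' snaks carry no datavalue
        value = snak.get("datavalue", {}).get("value")
        if isinstance(value, dict) and "id" in value:
            ids.append(value["id"])
    return ids


def is_human(entity: Dict[str, Any]) -> bool:
    """True if the item is stated to be an instance of (P31) human (Q5)."""
    return "Q5" in item_ids(entity, "P31")
```

A link whose target fails `is_human` (a house, a battle, a disambiguation page) would go to manual review rather than being connected automatically.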

I have seen some odd Wikipedia articles that mix the description of a person and a house, which I guess is the wrong approach...

The mix-n-match tool has a manual and an automatic approach, and when I communicated with the author of the tool, Manske, the idea was to have one register file per source, not like I did with a small test file....
So we can give data only once? Or can it be updated? Our data is constantly changing and I think it should be updated weekly or monthly. I can exclude data that is already connected if that is the problem.

Can you forward me his email and cc me on future emails about mixnmatch?
@Aleš see talk page 11:52, 18 August 2016 (UTC)

https://meta.wikimedia.org/wiki/Talk:Mix%27n%27match#Upload_more_to_a_catalogue

I am still surfing on a mobile so I have problems finding information, but I think one solution for uploading is having a bot do the work of adding new...
+11 votes
  1. Locations

I guess WikiTree has not set a standard yet, but we need a naming convention with the country at the end.

If we compare with Wikidata, there you assign an object as the birth location, e.g. https://www.wikidata.org/wiki/Q437

This object has 

1) labels in more languages 

2) more properties like country P17, located in the administrative territorial entity P131

I guess a good first approach is that we use

English label, P17

Maybe we should have some administrative unit also like located in the administrative territorial entity (P131)
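A rough sketch of how such a location string could be assembled, assuming the place item and the items it points to are already available as dictionaries in Wikidata's entity JSON shape. The function names and the `entities_by_id` lookup are hypothetical, and the sketch simply takes the first listed value for P131 and P17.

```python
from typing import Any, Dict, List, Optional

Entity = Dict[str, Any]


def english_label(entity: Entity) -> Optional[str]:
    """Return the English label of an entity, if it has one."""
    return entity.get("labels", {}).get("en", {}).get("value")


def first_item_id(entity: Entity, prop: str) -> Optional[str]:
    """Return the first item ID stated for a property, if any."""
    for statement in entity.get("claims", {}).get(prop, []):
        value = statement.get("mainsnak", {}).get("datavalue", {}).get("value")
        if isinstance(value, dict) and "id" in value:
            return value["id"]
    return None


def location_name(place: Entity, entities_by_id: Dict[str, Entity]) -> str:
    """Build 'label, administrative unit, country' with the country at the end."""
    parts: List[str] = []
    label = english_label(place)
    if label:
        parts.append(label)
    for prop in ("P131", "P17"):  # administrative entity first, country last
        target_id = first_item_id(place, prop)
        target = entities_by_id.get(target_id) if target_id else None
        target_label = english_label(target) if target else None
        if target_label and target_label not in parts:
            parts.append(target_label)
    return ", ".join(parts)
```

Taking the first value is a simplification: it gives the current country, not the country at the time of the event.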

by Living Sälgö G2G6 Pilot (298k points)

I saw now that we have a special row for country...

As the Wikidata concept is to update from more sources it would be cool to update Wikidata from Wikitree when data is missing...

 

The concept of a Wikidata location sounds interesting. But the Wikidata location should be automatically extended by WikiTree.

So an address would be entered like Some street, Q12345

But with this approach we don't have the country at that time, until Wikidata has that possibility, if they ever will.

Updating Wikidata is up to them. We could provide them with WikiTree data and they should use it. But I would do that once the errors in the profiles have been corrected.

I believe in linked data and I think one nice approach is to have a location model like Wikidata that as you say we extend. 

Then we could say in the same way we do with profiles that Wikitree Rubén-1 is Q5599

We should be able to say that a location object in WikiTree is the same as Q437. Hopefully we set up a SPARQL endpoint for WikiTree, and then we could also use the location info in Wikidata in queries that are useful for genealogy....

https://www.wikidata.org/wiki/Q437

Dear Ales, Magnus and others,

I think you are doing fantastic work with D/B errors and Wikidata. I never thought I would see mention of Linked Data and SPARQL on a genealogy site!

If you think that there are serious shortcomings with those technologies and their implementations, let me know, as I am plugged into W3C (LD) and OGC (geo-SPARQL) and can help eventually to address strategic issues (so do not hold your breath!)

Best wishes, Chris
Seems unlikely that WikiTree has any genuine data on notables and aristos that WikiData hasn't got.

We do have lots of unknowns filled in with unsourced guesses and junk.  Fantasy parents a speciality.

@Chris 

Aleš is doing all the magic. I feel that WikiTree is not a technology-driven site, and all help is welcome in explaining to the WikiTree community, and also to Chris Whitten, why linked data is the future and has great benefits....

Status right now

We have started to explain Wikidata see Space:Wikidata please update if you have good links examples....

Open Issues

  1. Tighter integration
    Should WikiTree integrate better with Wikidata and mark on the profile if a field has a mismatch with Wikidata?

  2. Data model mismatch
    How do we best compare the data models of WikiTree and Wikidata and say that we have the same information?

    How do we extend WikiTree so it can be a better citizen in the linked genealogy landscape?
    1. Location model
      Today WikiTree just has a text field for location with no validation at all. Aleš has done some work on cleaning this field, but he is waiting for a decision by Chris on what direction WikiTree should move in....

      As mentioned above by Aleš, and I agree, historical places are good candidates for a location model like the one Wikidata has, so that on some level we say that church xxx is the same as Wikidata Q1234 or city zzzz is the same as Wikidata Q5678

      Chris, it would be interesting to hear what your point of view is..... my understanding is that you have been working with more advanced linked geo data... and meteorology....
       
Hi Magnus,

I agree with your assessment - we probably need a location model.

I think our time handling is OK, as it is event driven, and underlying math structures (Allen Algebra) rigorous. Policy of recording date as in Source is a solid one, otherwise into problematic area of calendar conversions. By leaving as is, and using +/- time tolerance, events maintain their integrity. W3C has an ontology for time, from 2006, but about to be updated to allow non-Gregorian calendars. There is also work just started in OGC and ISO to allow ISO8601 strings to be enhanced for non-Gregorian, so software-wise, life should get better.
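The +/- tolerance idea can be illustrated with a small sketch in which a date is kept exactly as recorded in the source and the tolerance turns it into an interval. The class `FuzzyDate` and both functions are hypothetical names; `definitely_before` corresponds to the "before" relation of Allen's interval algebra, and `possibly_same` is true whenever the two intervals share at least one day.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class FuzzyDate:
    """A date as recorded in the source, plus a +/- tolerance in days."""
    recorded: date
    tolerance_days: int = 0

    @property
    def earliest(self) -> date:
        return self.recorded - timedelta(days=self.tolerance_days)

    @property
    def latest(self) -> date:
        return self.recorded + timedelta(days=self.tolerance_days)


def definitely_before(a: FuzzyDate, b: FuzzyDate) -> bool:
    """Allen's 'before': the whole interval of a ends before b begins."""
    return a.latest < b.earliest


def possibly_same(a: FuzzyDate, b: FuzzyDate) -> bool:
    """True if the two intervals share at least one day."""
    return a.earliest <= b.latest and b.earliest <= a.latest
```

Under this model two sites that record slightly different dates for the same event need not count as a mismatch, while the recorded value itself is never altered.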

Location implies at least three models:

1. Geo-Location, needed for people to relate to real world: Google maps, Open Street Map, etc. GPS and WGS 84 probably good enough - to nearest 10 m or perhaps even 100 metres. Geo-loc people tend to ignore time aspects (e.g. Australia moving north at several cm/year -> several metres per century).

2. Gazetteer/named places. Generally OK but:

2.1 Names in sources may no longer be current, or no longer exist.

2.2 Any gazetteer or registry should recognise variants/synonyms/exonyms/etc (e.g. Londres for London). I use Cumbria Family History publication on "Place Names of Cumberland" for standardised names and UK National Grid reference (which is not WGS84).

2.3 Again, time aspects tend to be ignored. E.g. Cumberland UK became part of Cumbria in 1974. Again, policy of using names in sources good. However, some sources, (e.g. Censuses) use admin districts not useful for anything else, either now or at the time of source.

2.4 Also flawed for regions, rather than 'points'. E.g. USA is not some isolated farmstead in the middle of Kansas, but whole country - a polygon or bounding box.

3. Addressing. Great variety across countries. Some international standards, not used in USA or UK or Venice/Venezia. Postal codes are really for delivering, but probably do not need to distinguish front door from goods entrance.

So, does that help in scoping the 'location model'?

Chris

Magnus, 

Tighter integration 
Should WikiTree better integrate with Wikidata and on the profile mark if we have a mismatch with Wikidata on a field 

I don't think this would be good. I also no longer intend to create errors based on Wikidata's data. As I found while connecting it, there is probably the same amount of errors on Wikidata as there is on our side. And as new data is added to Wikidata from various sites (maybe also WikiTree at some point), all those errors would be shown here. I think I will just create a compare page for a single profile, so anyone can check and look into the differences.

But Chris Whitten should automatically add the Wikidata template to any connected page, so you could easily look into their data and related sites. And we will put useful links into that template (at the moment Wikidata, SQID, Reasonator and my compare page (TBD))

On location, Chris Little has a good point. Nothing is ideal for us, since the prime guideline is to use names as they were at the time of the event. I still think my solution http://www.wikitree.com/wiki/Space:Database_Errors_Definition#Country_timeframe_.28Not_Active.29

is good and depends only on us. It should be extended to regions and cities, and can be initially populated from existing locations in WikiTree. There are 11 million timed locations here. I think initially it would be a lot of work to fine-tune all locations, but the end result would be great. You would put in Some street 9, Ljubljana and, from the date, you would get the country in English and the local language. Or with London, you would get autocomplete to UK, and all other variations of https://en.wikipedia.org/wiki/London_(disambiguation) filtered by date.

Wikidata has an object for a place, https://www.wikidata.org/wiki/Q437 for Ljubljana. But when I was born it was in Yugoslavia, and you cannot deduce that from Wikidata. They might have that info in the future.
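The timed-location idea could look roughly like the sketch below. The `TimedLocation` structure, the `country_at` function and the two table rows are assumptions for illustration only; the validity dates are simplified placeholders, not the content of the actual country timeframe table.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class TimedLocation:
    place: str
    country: str
    valid_from: date
    valid_to: Optional[date]  # None means still valid today


# Illustrative rows only; a real table would need sourced, complete dates.
TIMED_LOCATIONS: List[TimedLocation] = [
    TimedLocation("Ljubljana", "Yugoslavia", date(1945, 1, 1), date(1991, 6, 24)),
    TimedLocation("Ljubljana", "Slovenia", date(1991, 6, 25), None),
]


def country_at(place: str, when: date,
               table: List[TimedLocation] = TIMED_LOCATIONS) -> Optional[str]:
    """Return the country a place belonged to on the date of the event."""
    for row in table:
        if row.place.lower() != place.lower():
            continue
        if row.valid_from <= when and (row.valid_to is None or when <= row.valid_to):
            return row.country
    return None  # no row covers this place and date
```

With such a table, entering a place together with the event date is enough to suggest the country as it was at that time.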

@Chris
Thanks for your input...

I got inspired and will join a session in Sweden next month about how to geocode cultural things, so maybe then I will get an understanding of what direction other people are moving in

WikiTree: I have suggested that we create a template for geocoding, see Template COORD at familysearch.org, but got no reaction....

My feeling is that not too many people inside WikiTree see the potential of using maps and geocoding locations when doing genealogy. Maybe we need to wait for a new generation that always uses smartphones; I have a feeling most people inside WikiTree are not used to the concept....

For me location-based genealogy is one of the most interesting things to do. Visiting graves and houses

Magnus,

Enjoy your cultural geo-coding.

Two (philosophical? Theological?) ideas to consider:

1. There are technical arguments for Lon/Lat rather than Lat/Lon (x,y; Eastings,Northings) and practice is different in different languages and countries.

2. There is no such thing as a point, only smaller regions. When coordinates of points are described less precisely, they move location. When regions are described less precisely, they should get bigger, but not move.
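The second idea can be shown with a small sketch. Rounding a point to fewer decimals shifts it, whereas a bounding box can be coarsened outwards so that it always still contains the original region. Both function names are hypothetical, and coordinates are given in lon/lat order as in point 1.

```python
import math
from typing import Tuple

Point = Tuple[float, float]              # (lon, lat)
Box = Tuple[float, float, float, float]  # (west, south, east, north)


def round_point(lon: float, lat: float, decimals: int) -> Point:
    """Less precise point: the rounded coordinate is a different location."""
    return (round(lon, decimals), round(lat, decimals))


def coarsen_box(box: Box, decimals: int) -> Box:
    """Less precise region: grow outwards so the original box stays inside."""
    west, south, east, north = box
    factor = 10 ** decimals
    return (
        math.floor(west * factor) / factor,
        math.floor(south * factor) / factor,
        math.ceil(east * factor) / factor,
        math.ceil(north * factor) / factor,
    )
```

For example, `round_point(14.5058, 46.0569, 0)` gives `(15.0, 46.0)`, a point roughly half a degree of longitude away, while `coarsen_box((14.41, 45.97, 14.76, 46.15), 0)` gives `(14.0, 45.0, 15.0, 47.0)`, which is larger but still covers the original box.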

@Chris Thanks

I did university training in GIS some years ago, and then you start to get a feeling for how difficult it is with coordinates, but also rewarding...... I guess when you start describing things you will meet more challenges.... and then we have RDF and linking.....

I started a crazy WikiTree project to add links useful for Swedish genealogy. The unit in Sweden that is best for genealogy is the parish (swe. socken)... 

My problem is to understand how you describe relations in RDF and what is best practice, plus getting other organizations that deliver information/tools to understand linked data

  1. In Sweden we have 2500 parishes (swe. socknar)
  2. All those parishes are described with links inside Wikipedia/Wikidata
    1. In Wikipedia you use an Infobox that is implemented on all articles, but the automatic transfer to Wikidata is not done
  3. I think all parishes that existed in 1880 have a unique number (ATA:s sockenkod)
  4. Parishes, I think, started in 1600 and they are rather stable but not totally.... and here we need a good way of describing a "fuzzy" relationship....in RDF
    1. One parish changes county
    2. One parish belongs to several counties
    3. A parish is not always the same as the division the churches use, "Församling", but nearly
  5. My understanding is that Wikimedia is also looking into having some description of the borders/area of a parish (?!?! kml?!?)

If you look at the SPC project, I have created user stories and created templates for linking external sources that are useful from a genealogy point of view. 

My hope is that we start getting Swedish genealogy sites to understand linked data and start to link to each other and not invent new structures unrelated to what other sites have....

Lesson learned is that genealogy in 2016 with internet and scanning etc. is magic, but not all people with a family tree are rocket scientists... so it's a challenge ;-) 

@Magnus, the W3C has a group trying to publish Best Practices for Spatial Data on the Web by the end of this year. (Generic) Data on the Web Best Practice is already published, if not yet a final recommendation.

In the SDW BP, use of vague and imprecise spatial relations is a strong requirement. Have a look at w3c.org Spatial Data on the Web Working Group.

Swedish socken sound exactly like British Parishes! No identifiers other than their name or a partly standardised three letter acronym assigned by genealogists!

Best wishes, Chris
+11 votes
http://www.wikitree.com/wiki/Template:Wikidata

http://www.wikitree.com/index.php?title=Special:Whatlinkshere/Template:Wikidata&limit=250&from=0

Feels Template Wikidata should have a link to the above pages....

Aleš any suggested page to link?
by Living Sälgö G2G6 Pilot (298k points)
I will prepare a page, that lists the person and first generation.

I think wikidata template should be handled automatically by wikitree. Connections should be retrieved from wikidata and template with links automatically shown on page if connection exists. We will need Chris to do that.
+12 votes
So just out of curiosity, what is our end goal here? I think having access to additional potential source data is an interesting idea and has merit, but are we considering wholesale replacement of the existing data in WikiTree for these profiles to conform to the WikiData that is out there? Or are we just using this for another comparative report so that we can generate additional errors for people to investigate? From what I'm hearing so far, it sounds like the latter, but I would be concerned if the work that many have done to improve profiles might be arbitrarily overwritten if we determine that WikiData on the whole is more accurate than WikiTree. Just doing a quick check to see where we believe this might lead. Thanks.
by Scott Fulkerson G2G Astronaut (1.5m points)

Status today is that the Database Error is "just" explaining that a WikiTree profile has logical errors or a difference from Wikidata. And I guess it will be so forever.... 

My vote is that WikiTree should have a genealogy "level" that we don't find on Wikipedia/Wikidata today. Just doing copy/paste from Wikipedia is, I feel, a waste of time; then it's better to link....

In WikiTree a profile should have sources, and if necessary we should have an analysis of why we believe something...

I have connected about +3000 profiles from WikiTree, and too many profiles just have a link to Wikipedia, i.e. we don't add any value

Personally, I also update Wikipedia articles if I find something wrong

Video I did about how to use the Wikidata/Royal report Aleš created....
https://youtu.be/y6OcM4xyTHM

+15 votes
I just want to compliment Ales on all of the tools he is making for us to use.   I have gone through many of the profiles above where there are differences and have found some very interesting and subtle changes to be made to profiles...I appreciate all that you are doing!
by Robin Lee G2G6 Pilot (863k points)

@Robin

Should we have some kind of workflow?!?! I have looked at some profiles, and some problems can be difficult to fix and need more research... plus we have privacy problems, so then we need to add a comment that needs to be followed up....

I feel we have a process like this

  1. We find a mismatch
    1. If we can fix it and understand then we fix it either in Wikidata and/or in WikiTree
    2. If not we flag the WikiTree profile with a template and a link to Aleš report or add a comment to the Talk page in Wikipedia 
 
A template in WikiTree could look like
 
 {{WikidataMismatch|looks like birth date Wikidata has a mismatch|Aleš report xxxx}}  plus add a timestamp ~~~~
+8 votes

Row Categories

It would be nice to get from Wikidata the values of the properties below, as those properties normally have a match in WikiTree categories 

  1. P53 - Noble family
  2. P39 - Position held  
  3. P166 - award received
by Living Sälgö G2G6 Pilot (298k points)
I intend to get all properties from wikidata, but first are the relatives.

Cool

You now also have 

Property:P535 = Template:FindAGrave
