I am preparing new error checks for the Prefix field. I analysed the Prefix field, and its value is usually one of 100 texts. Maybe we could set a standard for what exactly belongs in the Prefix field.
Here are all prefixes that occur more than 5 times:
http://www.softdata.si/osebe_staro/ales/Wikitree/Prefix5.htm
And here are all prefixes.
http://www.softdata.si/osebe_staro/ales/Wikitree/Prefix.htm
I did some review, and we could classify prefixes into a few groups, listed below and in the reports with the number of occurrences. This is not completely in line with the help page, but it reflects the current state of the Prefix field. Everything else should be an error. I could also add another error for cases where any of these prefixes appears in another name field, since it should be in the Prefix field. I would like your opinion on these errors.
Let me describe my idea of how this would work.
We have 230,000 prefixes in the database. We have already whitelisted or blacklisted 220,000 of them. Excluding numbers, there are about 8,000 questionable prefixes. That number will still go down a bit as we extend the lists.
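The check for prefixes that appear in other name fields could look roughly like the sketch below. It is only an illustration: the field names (`first_name`, `last_name`) and the small sample list are assumptions, not the actual database columns or the real whitelist.

```python
# Hypothetical sample of known prefixes, taken from the help page examples.
KNOWN_PREFIXES = {"mrs", "sir", "dr", "gov", "sgt", "capt", "lt"}


def misplaced_prefixes(name_fields):
    """Return (field, word) pairs where a known prefix sits outside the prefix field.

    name_fields maps a field name (assumed names) to its text value.
    """
    hits = []
    for field, value in name_fields.items():
        if field == "prefix":
            continue  # the prefix field itself is where these words belong
        for word in value.split():
            # Ignore trailing punctuation and case when comparing.
            if word.rstrip(".,").lower() in KNOWN_PREFIXES:
                hits.append((field, word))
    return hits


profile = {"prefix": "Dr", "first_name": "Sgt. John", "last_name": "Smith"}
print(misplaced_prefixes(profile))
```

A check like this would also flag real names that happen to match a title (King as a last name, for example), so it would need the same option to mark false errors.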
- We already have one error for numbers in the prefix.
- We also have one error for separators in the prefix.
- I will add one error for blacklisted prefixes: about 800 errors.
- I will probably add one error for prefixes that should be in the suffix: about 800 errors.
- I will add another error for all other prefixes that are not whitelisted: a few thousand errors. It will be possible to mark this error as a false error. I will occasionally review the false errors and add frequent words to the whitelist.
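The order of the checks listed above can be sketched as a single classification function. This is a minimal sketch under assumptions: the list contents, the set of separator characters, and the category names are all made up for illustration, and the real lists are the ones in the linked reports.

```python
import re

# Hypothetical sample lists; the real ones are in the linked reports.
WHITELIST = {"mrs", "sir", "dr", "gov", "sgt", "capt", "lt"}
BLACKLIST = {"unknown"}               # assumed example entry
SUFFIX_WORDS = {"jr", "sr", "esq"}    # assumed example entries
SEPARATORS = re.compile(r"[,;/&]")    # assumed set of separator characters


def classify_prefix(prefix):
    """Return the error category for a prefix value, or 'ok'."""
    text = prefix.strip()
    if not text:
        return "ok"  # an empty prefix is not an error
    if any(ch.isdigit() for ch in text):
        return "number"
    if SEPARATORS.search(text):
        return "separator"
    key = text.rstrip(".").lower()  # treat "Dr." and "dr" alike
    if key in BLACKLIST:
        return "blacklisted"
    if key in SUFFIX_WORDS:
        return "belongs_in_suffix"
    if key in WHITELIST:
        return "ok"
    return "not_whitelisted"  # the error that can be marked as false


for value in ["Sgt", "Dr.", "2nd Lt", "Capt/Maj", "Jr", "Bishop"]:
    print(value, "->", classify_prefix(value))
```

Checking the specific errors first means each prefix gets only one error, and only what is left over falls into the general "not whitelisted" error.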
WhiteList
Here is the latest list of prefixes that are OK:
http://www.softdata.si/osebe_staro/ales/Wikitree/PrefixWhiteList.htm
The definitions of the whitelist and blacklist will be kept on a WikiTree page, so anyone will be able to whitelist another word, although I would prefer some level of agreement in G2G before new items are added to the whitelist. This page is already in use and will be extended as we add specific errors: http://www.wikitree.com/wiki/Space:Database_Errors_Definition I will add the prefix list to the page once we conclude this discussion.
I would also ask Chris to include or reference the definition page in the help for the Prefix field.
I also need some help from you.
- Are the group names OK? Should any group be split, or a new group added?
- Are the items in the correct groups?
- What variations are used in non-English languages? See the links to the reports.
- Which form should be used where there are different spellings or abbreviations? For instance: Sargeant 17, Sargent 21, Sergeant 162, Sgt 881, and a few more spellings of this rank that I haven't classified yet.
This is the description from the help page:
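One possible rule for the last question, shown only as a sketch and not as a decision: take the most frequent spelling that fits within the 10-character limit from the help page, and map all other variants to it. The counts are the ones quoted above; the function names are hypothetical.

```python
# Occurrence counts quoted above for the sergeant variants.
SERGEANT_COUNTS = {"Sargeant": 17, "Sargent": 21, "Sergeant": 162, "Sgt": 881}
MAX_PREFIX_LENGTH = 10  # limit stated on the help page


def pick_canonical(counts):
    """Pick the most frequent variant that fits in the prefix field."""
    usable = {form: n for form, n in counts.items() if len(form) <= MAX_PREFIX_LENGTH}
    return max(usable, key=usable.get)


def build_variant_map(counts):
    """Map every variant (lower-cased) to the chosen canonical form."""
    canonical = pick_canonical(counts)
    return {form.lower(): canonical for form in counts}


print(pick_canonical(SERGEANT_COUNTS))
print(build_variant_map(SERGEANT_COUNTS))
```

With these counts the rule picks Sgt, which also matches the example given in the help page. A different rule, such as always preferring the full word, would be just as easy to apply once we agree on it.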
Prefix
This is for a name prefix or title [2] such as Mrs, Sir, Dr, Gov, Sgt, etc.
If a person has multiple prefixes or titles use the highest, last or preferred one, e.g. Capt over Lt..
The prefix is limited to 10 characters.
2. If a title cannot be properly paired with the Proper First Name at birth it should not be used in the Prefix field. Instead, it should be part of the Preferred First Name or Nicknames. For example, King is not an appropriate Prefix for George VI because his Proper First Name at birth was Albert and he cannot be called King Albert. See Name Fields for European Aristocrats.