NOTE: AGC is now part of the WikiTree Browser Extension. It is still also available as a standalone browser extension, but that may change at some point.
WikiTree AGC (Automatic GEDCOM Cleanup) is a Browser Extension developed by Rob Pavey that allows the user to reformat a profile that was created from a GEDCOM (by GEDCOMpare or the earlier GEDCOM import process).
NOTE: There are user options to configure the reformatting which are not immediately obvious, see the User Options section below.
How to install
As mentioned above this is now part of the WikiTree Browser extension so see that page for install instructions.
If for some reason you need to install the old standalone extension read on...
The extension is free to install and use (except on the Apple App Store). It works in many different browsers.
- For Chrome, Opera, Brave, Edge, Vivaldi and other Chromium-based browsers, install it from the Chrome Web Store.
- For Firefox install it from the Firefox Add-ons page.
- For Safari on Mac or iOS go to the App Store and search for "WikiTree AGC"
How to use
In WikiTree go to a profile that was created from a GEDCOM. Click the edit tab and scroll down to the Edit Text section.
If WikiTree AGC can recognize this as a GEDCOM created profile then there will be an AGC button to the left of the line of buttons above the big text box:
Press this button and the text in the big text box will be reformatted. Scroll down and press the "Preview" button to see what it will look like if saved.
There is a video demonstration on this Feature Friday Episode.
Undo feature
If you want to try again with different options or want to make some manual changes before reformatting you can undo the changes and try again. If you look at the AGC button again you will see it now looks like this (sorry still temporary art!):
Pressing it again will undo the changes (in both the text box and any other fields that the extension has changed). You can then edit the user options (see below) and press the button again to redo the reformatting with the new options.
What problem does this extension address?
The current GEDCOMpare tool creates profiles which many users are not happy with. The primary problem is that the biography section is so unlike what a typical WikiTree user would create that some users feel that it is faster to delete the whole biography and start over. While this may be an exaggeration, if we can automatically create a biography that provides a nice starting point that the users can then gradually improve, that would make the GEDCOM pathway more attractive.
So this extension adds a button that, when pressed, reads the GEDCOMpare created biography and replaces it with a new biography in a chronological narrative format. The user can then preview it and make edits before saving the changes.
Some of the most obvious issues with the latest GEDCOMpare created biographies are:
- There are no line breaks. So the entire biography section is one paragraph. This makes it hard to read and edit. Also known as "the blob".
- The facts are grouped by fact type. So all the Residence facts are in one section and all the Marriage facts are in another etc. Thus it is not in chronological order.
- Even within one fact section (such as Residence) the facts are not sorted chronologically. They are in an order that appears random (probably the order in the GEDCOM).
- For each fact, the biography contains first the description, then the date and then the location. The description is usually some extra info or notes added by the user or automatically by Ancestry etc. It would be more readable if the date and location came first.
- Nearly all of the citations (<ref>) are named and are referenced in multiple places. For example there is usually a "Name" fact that references all of the citations (since most citations include the person's name). This is mimicking how Ancestry works but is not how most WikiTree profiles work. (see advanced sourcing for more information).
- The citations and sources are separated. i.e. none of the <ref>s include the source details directly. Instead they reference a separate source like this: Source: #S-214974479. There can be some benefits to this in reducing the text size of the profile if the same source is used by multiple citations and some users may like this. But it does make the profile harder to read and edit in my opinion. Also it is officially not recommended.
The extension will also clean up profiles created by older WikiTree GEDCOM import tools. However, if they have been edited too much it will not be able to recognize the events.
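The separation of citations and sources described in the list above can be illustrated with a small sketch. It is written in Python for readability and is not the extension's actual code. The `Source: #S-...` pointer pattern is taken from the example above, but the function name `inline_sources`, the `sources` dictionary and the simplified biography text are assumptions made for illustration only. Real GEDCOMpare output is more involved.

```python
import re


def inline_sources(biography: str, sources: dict[str, str]) -> str:
    """Replace 'Source: #S-123' pointers with the text of the source itself.

    `sources` maps a source id such as 'S-214974479' to its full text.
    This is a hypothetical, simplified model of a GEDCOMpare biography.
    """

    def replace_pointer(match: re.Match) -> str:
        source_id = match.group(1)
        # If the source is not known, leave the pointer exactly as it was.
        return sources.get(source_id, match.group(0))

    return re.sub(r"Source: #(S-\d+)", replace_pointer, biography)


# Made-up example input: one citation that points at a separate source.
biography = 'Born 12 Sep 1920.<ref name="ref_0">Source: #S-214974479</ref>'
sources = {"S-214974479": "Example Birth Index (made-up title)"}
print(inline_sources(biography, sources))
```

After this step each `<ref>` carries its own source text, so a reader or editor no longer has to look up the source in a separate list.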
What does the extension do?
There is a more detailed list of what it does below. But these screenshots give you a quick taste.
Here is a biography created by GEDCOMpare from an Ancestry GEDCOM:
A typical Ancestry GEDCOM created profile.
If the user goes into edit mode they will see a new button at the left-hand end of the toolbar above the biography. This button only shows up if the biography looks like it was created by GEDCOMpare:
The AGC button is on the left of the toolbar.
Pressing the button will modify the text box below by rewriting the text:
The biography that WikiTree AGC creates.
More details of what the extension does
- It creates a biography in a narrative form. I.e. it says "Leonard was born on 12 September 1920 in Camden, London, England." rather than "Born: 12 Sep 1920, Camden, London, England.".
- All facts in the biography are sorted chronologically.
- It embeds the source text within the <ref>
- It optionally switches to having only one <ref> for each citation. That ref is typically placed on the latest fact that uses the citation. There are actually five options for when to use named references:
- Never: Refs are never named. There is one ref for each citation, typically on the latest fact that referenced it. All the sources will be in chronological order.
- Minimal: Only adds more than one ref for a citation if there would otherwise be no ref for a narrative event.
- Selective: Only adds more than one ref for a citation if adding a ref to a narrative event will likely be adding a ref that has a more accurate date or location for that narrative event.
- Multiple Use: Preserves all refs from the GEDCOMpare biography but only names refs that need to be named.
- All : Always names references and keeps all refs the GEDCOMpare created.
- If there is an exact baptism date in the same year as a year-only birth date it switches the "Birth date" field of the profile to be before the baptism date. Same for an exact burial date with a year-only death date in the same year. If this is done then a note of this is added in the research notes. (There is an option to turn this off).
- For female profiles, if the Current Last Name is the LNAB and there are marriage facts and the last husband's name is known, then the Current Last Name is changed to the last husband's last name (optional).
- If there is a File fact section, then a list of External Media Links is added at the end of the biography. (This is optional).
- If the description for a fact contains a link to FindMyPast then a new citation (ref) is created that contains this link and it is removed from the description.
- Arrival and Departure facts can get linked. For example, if there is an arrival fact and a departure fact that use the same citation it will combine them into a single narrative event.
- If the person died at an age of 12 or less the "Died Young" sticker is added.
- All dates in the narrative are transformed into a standard form: dd Month yyyy (e.g. 10 September 1842). This could be user configurable in the future.
- Duplicate marriage citations are merged.
- Any existing Biography or Research Notes sections that existed before the GEDCOM import are preserved.
- Meaningful titles are put on the source citations based on the fact section/type and source. (This is optional).
- If a source is an Ancestry or MyHeritage source (i.e. a subscription source) it will clean up the source information to remove extraneous data. Also, if it is a recognized source a link to available free sources for this data will be added (this is optional).
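Three of the steps in the list above (narrative sentences, chronological sorting and the standard "dd Month yyyy" date form) can be sketched together. This is a minimal illustration in Python, not the extension's own code. The `Fact` class, the function names and the second fact in the example are all hypothetical, and only the simple date shapes `dd Mon yyyy`, `Mon yyyy` and `yyyy` are handled.

```python
from dataclasses import dataclass

MONTH_ABBREVS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
                 "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]
MONTH_NAMES = ["January", "February", "March", "April", "May", "June",
               "July", "August", "September", "October", "November", "December"]


@dataclass
class Fact:
    verb: str   # e.g. "was born" (hypothetical field)
    date: str   # GEDCOM-style date text, e.g. "12 Sep 1920"
    place: str


def parse_date(text: str) -> tuple[int, int, int]:
    """Return (year, month, day), using 0 for any part that is missing."""
    parts = text.split()
    year = month = day = 0
    if parts and parts[-1].isdigit():
        year = int(parts[-1])
    if len(parts) >= 2 and parts[-2].upper() in MONTH_ABBREVS:
        month = MONTH_ABBREVS.index(parts[-2].upper()) + 1
    if len(parts) == 3 and month and parts[0].isdigit():
        day = int(parts[0])
    return (year, month, day)


def format_date(text: str) -> str:
    """Spell out the month; leave unrecognized date text unchanged."""
    year, month, day = parse_date(text)
    if year and month and day:
        return f"{day} {MONTH_NAMES[month - 1]} {year}"
    if year and month:
        return f"{MONTH_NAMES[month - 1]} {year}"
    return text


def build_narrative(first_name: str, facts: list[Fact]) -> list[str]:
    """Sort facts chronologically and write one sentence per fact."""
    ordered = sorted(facts, key=lambda fact: parse_date(fact.date))
    lines = []
    for fact in ordered:
        # Use "on" for exact dates and "in" for month-only or year-only dates.
        preposition = "on" if parse_date(fact.date)[2] else "in"
        lines.append(f"{first_name} {fact.verb} {preposition} "
                     f"{format_date(fact.date)} in {fact.place}.")
    return lines


# The death fact is invented for the example; the birth fact is from the text above.
facts = [
    Fact("died", "Mar 1985", "London, England"),
    Fact("was born", "12 Sep 1920", "Camden, London, England"),
]
for line in build_narrative("Leonard", facts):
    print(line)
```

Sorting on a `(year, month, day)` tuple keeps partial dates usable: a year-only date sorts before any fuller date in the same year, which is one possible convention rather than a statement of what AGC does.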
Things it does NOT do:
- If the GEDCOM was exported from Ancestry then the source citations still point to Ancestry records. However, if the source is a recognized source (currently limited to England sources) then it will add a link to where a free source may be found.
I am working on a new extension to assist with finding free sources for the subscription records.
User options
Not every WikiTree user will have the same preference for how the biography is formatted. The browser extension has user options to allow for this. If you have ideas for new options please let me know. Details of how to use the options follow...
How to edit the user options
NOTE: If you are using this in the WikiTree Browser Extension see the documentation here.
In the Chrome toolbar there is an icon that looks like a puzzle piece. Click that and then click the three dots to the right of WikiTree AGC. Then select options from the menu. See picture below:
In Firefox you click on the "burger menu" (top right corner of browser window) and select "Add ons and Themes". Then you click on the "..." next to WikiTree AGC and click Preferences.
The current options screen
That will bring up a new tab in the browser which looks like this (this image is a bit out of date as new options have been added):
Select the options that you want and press save.
Explanation of the options
These are the current options:
- General
- Spelling. UK English or US English. e.g. baptised vs. baptized. A future improvement could have an option to select spelling based on the fact location.
- Whether to add the person's age to narrative events.
- Whether to add an External Media section to biography if there are files referenced.
- References
- When to use named references.
- Whether to add a newline before the first reference on a narrative event. Doing so makes it slightly easier to edit but inserts a space before the [1] etc.
- Whether to add a newline between each reference on a narrative event. Doing so makes it slightly easier to edit but inserts a space between the [1] [2] etc.
- Whether to add newlines within each reference on a narrative event. The newlines are after the opening <ref> and before the closing </ref>. Doing so makes it easier to edit and has no effect on the public view.
- Whether to add meaningful names to references
- Research notes
- Whether to add an "Alternate names" section to the research notes if the GEDCOM has name variations.
- Other fields
- If there is an exact baptism date in the same year as a year-only birth date then change the "Birth date" field of the profile to be before the baptism date.
- If there is an exact burial date in the same year as a year-only death date then change the "Death date" field of the profile to be before the burial date.
- For female profiles, if the Current Last Name is the LNAB but there are marriages and the last husband's name is known then change the CLN to that.
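The two date rules under "Other fields" share the same shape, so they can be sketched with one function. This is an illustration in Python under stated assumptions, not the extension's code. `EventDate`, `adjust_with_later_event` and the `"before"` qualifier string are hypothetical names, and how AGC actually records the "before" status on the profile is not described on this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EventDate:
    year: int
    month: Optional[int] = None
    day: Optional[int] = None

    @property
    def is_exact(self) -> bool:
        return self.month is not None and self.day is not None

    @property
    def is_year_only(self) -> bool:
        return self.month is None and self.day is None


def adjust_with_later_event(
    vital: EventDate, later: Optional[EventDate]
) -> tuple[EventDate, Optional[str]]:
    """Apply the rule to birth/baptism or to death/burial.

    If `vital` is year-only and `later` is an exact date in the same year,
    return the later date with the qualifier "before". Otherwise return
    the vital date unchanged with no qualifier.
    """
    if (
        later is not None
        and vital.is_year_only
        and later.is_exact
        and later.year == vital.year
    ):
        return later, "before"
    return vital, None


# Year-only birth in 1842 and an exact baptism on 10 September 1842.
birth, qualifier = adjust_with_later_event(EventDate(1842), EventDate(1842, 9, 10))
print(birth, qualifier)
```

A baptism in a different year, or a birth date that already has a month or day, leaves the field untouched in this sketch.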
Reporting problems
Note that this code is not being actively worked on or maintained, as Rob Pavey is now working on the WikiTree Sourcer extension full time. If a MAJOR issue is found then it can be fixed. See below. Also, there are other developers working on the WikiTree Browser Extension who may be able to help. See WikiTree Browser Extension Communication.
If you try this on a profile and there seems to be a bug please do not "Save Changes". Instead, leave the profile as it was created by GEDCOMpare and send a private message to Pavey-429. Include the profile name. I will then try it on that profile myself and debug the issue.
Please include what seems wrong e.g.:
- The AGC button doesn't show up at all for this profile
- The AGC button shows up but does nothing
- The resulting reformatting has an issue
- ...
Please include the version number of WikiTree AGC that you are using. This is visible on the chrome://extensions/ page or in the "More information" box towards the bottom of the Firefox add-on page.
Also, if you see any issues with these instructions please let me know.
Release Notes
See WikiTree AGC Release Notes page.
Wish list for future enhancements
Acknowledgements
Thanks to the many WikiTree users who have beta tested the extension or provided feedback on it, including: Loralee, Hilary, Christina, Steve, Michelle, Kathleen, Jo, Leandra, Jonathan, Raewyn, Geoff, Frances
- Please cleanup GEDCOM imported profiles first... Jul 30, 2022.
- WikiTree AGC (Auto GEDCOM cleanup) now available for Safari on Mac and iOS Jan 8, 2022.
- WikiTree AGC (Automatic GEDCOM Cleanup) now works on Firefox May 9, 2021.
- Automatic GEDCOM Cleanup Aug 25, 2020.
- Beta testers wanted! WikiTree AGC reformats GEDCOMpare created profiles into a nice chronological narrative. Aug 4, 2020.
Under "What does the extension do?" it states, "This button only shows up if the biography looks like it was created by GEDCOMpare." However, it's possible that the first part of a bio might appear the way it did after being imported, but the user might have made changes from what the extension expects. That's what happened in the case of the profile above (and numerous other profiles I imported). I left the name and christening as imported but changed the marriage, death, and burial. Consequently, the extension made a mess of them.
It needs to be made clear to extension users that if a bio doesn't look exactly the way it would after being imported by GEDCOMPare, then they shouldn't use the extension. Or don't save the changes if the result doesn't look right.
And just as bios need to be fixed after importing a GEDCOM, they also need to be reviewed and fixed after using this extension. Users shouldn't just click the button and leave the profile a mess for the profile manager or other users to clean up; I consider that profile vandalism. I find it very disconcerting that a profile I put a bit of effort into can be so easily messed up with the click of a button. I shudder to think how many other bios have been messed up with the extension.
edited by Matthew Riggle
It is being used by over 1,000 users and mostly works well. There are bound to be some cases where a user uses it on an inappropriate profile or doesn't check the results and clean things up, but on balance it seems to work well. The GEDI (GEDCOM improvement project) is using it all the time to try to work through the old GEDCOM imports from way back.
Personally I think AGC strikes an OK balance between refusing to run on certain profiles and changing things without warnings.
Thanks for your work on this extension. I think with a few tweaks to cut down on unnecessary extra effort and a strong warning to use it only on GEDCOM-imported bios, it could really help.
Since you're not working on it, Rob, then who is? You might want to change the "Reporting Problems" section above and/or transfer ownership of this page to someone else.
Generated by WikiTree AGC. This section should be removed when all issues have been looked at.
https://www.wikitree.com/wiki/Cozijn-2
"For female profiles, if the Current Last Name is the LNAB but there are marriages and the last husband's name is known then change the CLN to that"
I am afraid someone will click on it. I don't know what that would do.
E.g.:
<!-- Name: David Ervin Lawhon. Given Name: David Ervin.
If that commented out part is removed then the AGC button will probably not show up.
This is the first WikiTree "edit" app I've applied to a profile. Incredible.
THANK YOU!! THANK YOU!! THANK YOU!! THANK YOU!!
Thanks for pointing that out. That style is something that I'm inheriting from WikiTree - the standard page uses that box for reporting some errors. It should be white on orange but is not as clear as it could be. I will put it on my list to look at. I don't remember now what causes the standard WikiTree site to use that error box - perhaps they do not anymore.
Cheers, Rob
I've only had one issue so far which is it wrapping out on this profile: Catherine Tiedke. When I hit the button, it freezes the entire Chrome browser tab and sometimes the one that launched that page, like the gedmatch page. Could it be her mother's umlaut?
Thanks for your fine work! Jane
edited by Jane (Snell) Copes
My problem: I have richly sourced GEDCOM entries which match to sparse, already-created WikiTree profiles. If a profile doesn't already exist, I can add it, use AGC to spiff it up and voilà. But, if a profile already exists and I match it, I don't have a good way to automatically extract all of my GEDCOM goodness so that I can manually add it to the existing profile ...
What would be good would be either:
Any suggestions appreciated, thanks! Jeff
For example, by exploring the detail in English census records, you can find out who people lived with, where, and in some cases why. To me, a lot of this detail is missed if the process is automated. What would be really cool is a process that makes the tedious task of transcribing sources easier that also asks pertinent questions about what may have been missed in the process. Anyway, I really appreciate this contribution. Just thought I would share my experience.
I'm asking to please please reconsider an option to turn off the change that splits multiple forenames into different fields.
I plan to get back to working on AGC a bit soon. I will look into adding an option.
Cheers, Rob
I have released version 1.0.0 of WikiTree AGC. It now has an option for this. The text on the options screen is:
"For old GEDCOM imports move additional names from the Proper First Name field to the Middle Name field:"
I hope that works for you.
<Package is invalid. Details: 'Could not load options page 'options.html'.'.>
I have never seen that error before. Which browser are you using? Chrome or Firefox? Thanks, Rob
Thanks, Rob
My GEDCOM comes from Roots Magic. The way the sources/citations are handled there ends up with me having a LOT of duplicated information in the large text box. I manually cleaned that up for that first 45 person GEDCOM. Just about to try a few more people and see if AGC handles it all. If not, I'll get back to you with examples.
But really just wondering if any other Roots Magic users have contacted you to consider changes that would handle Roots Magic specifics.
I did hear just last week from another user using a RootsMagic GEDCOM who was seeing similar issues and was talking to RootsMagic support about it. That was Schmehl-58 if you are interested in chatting directly.
If RM were to change to give you more options in the generation of a GEDCOM that could help but they are putting finishing touches on a major upgrade to V8 (which they've been working on for several years) and they're bug fixing not taking enhancement requests. So if any GEDCOM option were to get added it is a few years down the road - though waiting to see what V8 looks like seems a prudent thing to do. I had been thinking about writing a GEDCOM post-processor to deal with this until I found out about AGC. You've got the great majority of the infrastructure all in place and working, so that's a better place to deal with it - as long as supporting other GEDCOM generators than Ancestry is in your plan.
Give me a day or two to generate a decent test case with my RM V7, see what AGC does with it, and get back to you.
This handling of the complex source representation in Roots Magic is the main thing I was looking at. But there is a further issue in date representation - things that Roots Magic supports but AGC does not recognize as dates. RM allows not just the familiar "Before (BEF) date", the "After (AFT) date" but also "FROM date1 TO date2" and "BETween date1 AND date2". I can get you more information on how those are handled in Roots Magic.
I haven't looked at this page for a long time and I realize now that I missed your replies. I may be able to at least improve some of the handling of RootsMagic date formats.
If you can send me a Private Message with an example WikiTree profile text as it is created by GEDCOMpare I can add it to my test suite and work on improving the handling of it.
I have been spending all my time on my other extension (WikiTree Sourcer) but hope to get back to some work on AGC soon.
Cheers, Rob
~Kathy
I'm not able to reproduce the problem myself. I guess it is possible for the WikiTree website to change in a way that could mess up the extension but I'm not seeing it. Could you give a bit more detail? - Was the "SAVE CHANGES" button completely missing? If so, is this true for the one above the edit box as well as the one at the bottom of the page? - Which browser are you using? I assume Chrome?
Thanks, Rob
Chrome is my main browser, but this morning I tried it on Opera, Vivaldi, and Edge. Right now, Vivaldi is working with all extensions removed. When I tested it, I removed extensions right to left, and after I removed each extension I refreshed each time. After AGC was deleted it resolved, but that may have been because it was the last extension, I can't remember.
Now, I just went back and did that again in Chrome, added the extensions, removed them, and then added them back. The first time I added the extensions, the Save was grayed out. After I removed them, I could save. Then I just added them back in, and I can still Save.
So, obviously, there is no problem with your App, and I am sorry for bothering you with this. I appreciate your quick response.
The AGC extension does have some code to enable the save changes button after it modifies the text (by sending a 'change' event). Otherwise you get the "No changes to save" message. So there could be some relation. There isn't any code that disables the save button though.
I had this case where someone used AGC to reformat a profile that had a "Could not interpret date in Birth Date (20 MAY 19??)." note, which was reformatted to "was born on 20 May 1900". Please see https://www.wikitree.com/index.php?title=DeROO-85&diff=117958614&oldid=100169665 That sent me on a fruitless hunt searching for someone born 1900-05-20. Only after inspecting the change log did I notice the 1900 had no basis. Dropping that search filter it was not hard to find the person, with birth date 1911-05-20...
I guess the problem was initially caused by the GEDCOM import software assigning birth date (basic data item) 1900-00-00 from input 20 MAY 19??. Maybe you can implement logic in your otherwise wonderful tool to process "Could not interpret date" cases more sophisticated? Could not find any reference to "interpret" in your documentation.
Thanks for reporting that problem. I'm thinking about the best way for the extension to handle that. Perhaps you have some suggestions? I can detect the "Could not interpret date" text (and also ? characters in dates). Options that I can think of would be:
Any thoughts? Rob
edited by Rob Pavey
Looking at the latest 851 suggestions (almost 16K) there are fewer than 10 occurrences of ?? So that would, imo, not justify a significant development effort.
Thank you!
I will check that it is just initials. It did it with one that had a name followed by an initial.
I have fixed the issue in version 0.1.21
I have been using AGC for a couple of months now and generally like the way it works. One thing that may be helpful is to look for the first and middle name in the first name and preferred name field. If found, correct it or put an "Issue" in the reference notes to prompt correction. I adopted dozens of profiles imported by GEDCOMs and they all have this issue. Here is an example. Shirley-1037
Record File Number
to something like
in the Sources section, as the FamilySearch ID is currently handled?
I've been copying them out by hand to not lose the information and it is rather tedious.
Example https://www.wikitree.com/index.php?title=Zaborskas-53&diff=113732769&oldid=5477361
edited by Aaron Gullison
Thanks for the suggestion, Rob
I am glad to hear that my extension is helping you. Rob
I tried the GedCom cleanup on Bancroft-665. One minor glitch but otherwise awesome! For some reason it placed the 2nd level "Death" heading and information right after <ref> in the last census citation. Maybe because it was a 2nd level heading instead of 3rd level like it should have been?
Yes, it would be because the = Death = section was a second level heading rather than third. It looks like someone manually added that death section after the GEDCOM import. So it treats it like any other second level heading (e.g. Research notes). Rob