Thursday, January 4, 2018

Database Statistics Made Easy (Using Gigatrees and Other Programs)

I've been getting pretty lucky with school lately. Today (1/4/2018, that is) school was canceled due to "unsafe wind chills," and tomorrow it will be canceled for the same reason. The lack of school days had numerous benefits: most notably, my paper one on "German and Italian Expansion" was delayed (History and English are my weakest subjects, believe it or not), and I was able to investigate the different ways I could track statistics for my database in Legacy Family Tree.
A few of the statistics I am now tracking

This post was inspired by this one on the Michigan Family Trails blog, which was in turn inspired by one on Geneamusings.

The Michigan Family Trails post finishes with the following question: How do you keep track of your progress? I decided to find the best way I could to do so, and the rest of this post outlines my methodology. If you follow the guide, you will finish with a series of highly informative graphs showing how a variety of aspects of your database evolve over time. You will be able to target the specific areas where your database is lacking and remedy them. The guide will work for GEDCOM files made using any genealogy program.

1. Download Gigatrees

Gigatrees is a free program that you can use to make web pages out of GEDCOM files. The websites it makes are very nice; you may even consider making a Gigatrees site and hosting it on GitHub for a completely free family site. The reason we will use it, however, is the detailed statistics page it generates. To download it, just go to this site and click the download you would like to use.
Download Page for Gigatrees
Note: If you don't see the blue buttons, look in your address bar for any icons with a red x on them. If you see one, click it and enable whatever it is blocking. These icons mean that, for whatever reason, your browser doesn't trust certain parts of the website, even though they are perfectly safe to use.

2. Export a Gedcom From Your Database

How you complete this step varies based on what family tree program you use, so if you don't know how, a simple Google search should tell you. I recommend naming the file with the date that you saved it on, for clarity's sake.
All of the gedcoms I have saved over the past year
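
If you want date-based names that also sort in chronological order in a file browser, one option is to put the year first. Here is a minimal Python sketch of that idea. The `dated_gedcom_name` helper and the "family" prefix are made up for illustration; they are not part of Legacy Family Tree or any other genealogy program.

```python
from datetime import date


def dated_gedcom_name(day, prefix="family"):
    """Build a name like family-2018-01-04.ged (hypothetical naming scheme).

    Year-month-day order means an alphabetical sort is also a date sort.
    """
    return f"{prefix}-{day.isoformat()}.ged"


# Name for a GEDCOM exported on January 4, 2018
print(dated_gedcom_name(date(2018, 1, 4)))  # prints family-2018-01-04.ged
```

Any naming scheme works for this guide as long as you can tell which date each file came from.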

3. Make a Website With Gigatrees

When you boot up Gigatrees you should see a screen with 2 fields: Input file and Output path. Specify the GEDCOM file as the input file and choose a folder for the generated website pages to go to. To start running the program, press Run>Launch Application in the top bar. Alternatively, you can just press your F5 key.
The main screen on gigatrees

When you do this a command prompt will pop up listing what part of the process it is at. Just wait until this goes away before doing anything else.
The command prompt popping up. Just wait until it goes away before moving on.
Next, navigate to the folder you chose for the output path and click on index.html. You will be taken to a home screen with a lot of different views. Choose the 'statistics' view. 
The statistics button on the main home page

4. Saving the Statistics Page

One of the main parts of this tracking method is saving the statistics page every so often so that we can later input the data into Excel. To do this, right click the page and choose 'Save as'. When the file save dialog pops up, choose 'save as web page, HTML only'. I recommend that you pick a simple name, such as the date that the statistics are from.
Example file save dialog

5. Setting Up Your Excel Documents

You will have 2 documents in total.

First Document

This document will display the data in the "Total Records Processed" part of the Gigatrees statistics page. A finished document (not including graphs) will look something like this:
Total Records Processed statistics in Excel
To get started setting up your document, highlight the following and use ctrl+c to copy it:

Individuals,Families,Repositories,Sources,Notes,Locations,Source Citations,Source References,Note References,Claims,Documented,Claims,Impossible Claims,Census,Photos,?,Immigrants,Nobility,Titles

Select cell B1 in your document and click Home in the top bar of Excel. Then click the arrow at the bottom of the Paste button and select "Use Text Import Wizard".
Use Text Import Wizard In Excel
Tip: If you don't see the import wizard as an option, first paste the data into Word, then copy it in Word and try again.
You will be taken to a screen that looks like this:

Import Wizard first screen
Just click next.
The next screen will look like this:
Import Wizard 2nd screen
Just check the 'comma' box and click next.
The next screen will look like this:
Final Import wizard screen
Click finish. 
And you have successfully pasted the headings of your document. Put the dates of any GEDCOM files you have saved in the left-hand column.
Headings pasted into the document
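
For the curious, what the text import wizard does with the 'comma' box checked can be sketched in a few lines of Python: it splits the pasted text at every comma and puts each piece in its own cell. This is only an illustration of the idea, not something you need to run. The function names are my own, and the blank first cell stands in for column A, which is left free for the dates.

```python
import csv
import io

# The heading text from the step above, copied as is
HEADINGS = (
    "Individuals,Families,Repositories,Sources,Notes,Locations,"
    "Source Citations,Source References,Note References,Claims,Documented,"
    "Claims,Impossible Claims,Census,Photos,?,Immigrants,Nobility,Titles"
)


def heading_row(headings):
    """Split the comma-separated headings into one cell per heading.

    The leading empty string keeps column A free, so the first heading
    lands in column B (like pasting into cell B1).
    """
    return [""] + headings.split(",")


def rows_to_csv(rows):
    """Turn a list of rows into CSV text that Excel can open."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


print(rows_to_csv([heading_row(HEADINGS)]))
```

Saving that output with a .csv extension would give a file with the same heading row, though the wizard route above works just as well.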

Second Document

This document will look something like this:
The second excel file
Now, I could have formatted the previous one in a way that would have made it much easier to paste, but I wanted to show the text import tool because you will need it later on. With this one, though, I will be much nicer. Just copy and paste the following as is into cell A1:
Parents Relationships Names Census Vital Events Other Events Attributes

Then copy paste the following into the row below it:
Undocumented Documented

Now in the undocumented, documented line, select both undocumented and documented and click, hold, and drag the little square that appears in the bottom right corner until the entire row is filled with alternating Undocumented and Documented. 
Fill in the whole row with documented and undocumented
Now, this next part is more aesthetic than anything, but you probably still want to do it. For each pair of cells above "Undocumented" and "Documented", select both and click "Merge and Center" in the "Home" menu, like so:
Merge and Center option
You will see that the cells are now clearly underneath a specific option.
The merged result
Add your dates in the same location as the last document.
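
If it helps to see the layout of those two heading rows spelled out, here is a small Python sketch that builds them. It assumes, as in the first document, that column A is kept free for dates and that every category sits above exactly two cells; the `two_row_header` function is just a name I picked for the example.

```python
# The seven categories from the first heading row above
CATEGORIES = [
    "Parents", "Relationships", "Names", "Census",
    "Vital Events", "Other Events", "Attributes",
]


def two_row_header(categories):
    """Return (top_row, bottom_row) for the second spreadsheet.

    Each category takes up two columns: the category name followed by an
    empty cell (the pair you merge in Excel), with "Undocumented" and
    "Documented" directly underneath.
    """
    top = [""]      # column A is reserved for dates (assumption)
    bottom = [""]
    for category in categories:
        top += [category, ""]
        bottom += ["Undocumented", "Documented"]
    return top, bottom


top, bottom = two_row_header(CATEGORIES)
for row in (top, bottom):
    print("\t".join(row))  # tab-separated, so it pastes into Excel cell by cell
```

Seven categories times two columns, plus the date column, makes 15 columns in total.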

Download Converters

Now, you could just copy and paste every data point from the Gigatrees statistics page into your Excel sheet. Luckily for you, though, I made some programs so that you don't have to. Go here and download "Gigatrees Statistics to Excel". Unzip the folder and you will find 2 .exe files. For our first document we will use "GigatreesProcessedRecords.exe" and for our second we will use "GigatreesClaimsDocumented.exe". Put both of these in the same folder as the statistics HTML files you saved earlier.

Use the Converters

So what exactly do these .exe files do? It's pretty simple. They print out all the numbers you need for your excel spreadsheet in a way that's easy to paste. Here is an example. I will run the first one (they both function the same way).

When I first run it I see a screen that looks like this:
GigatreesProcessedRecords.exe initial screen
It is asking the name of the statistics html file I intend to use. I will type in "1-4-18", the name of my most recent one. It prints out a comma-separated list of numbers.
Comma-separated list
These are the values for every statistic in the "Processed Records" section of the Gigatrees statistics page. Use the text import wizard from earlier (make sure you check the 'comma' box) to paste the data values into your Excel sheet. Do this every so often until you have a reasonable number of data points. Then you can have fun making line graphs in Excel like you normally would. Here are some of my own statistics I was able to make graphs for:

Individuals and Families I added to my family file over time
Information about the Number of Sources and how often I used them in my file over time
The number of claims I made and how many of said claims have citations
Each of the specific categories of claims I made and how many of each type have citations
Because of the graphs, I realized that the only types of claims I wasn't slacking on citing were censuses and events (which fall under the "other" category). I need to make sure I am citing the records that list the names of people and the ones that prove relationships between them. My database is also less than a year old, and I can see that I have gotten better about citing things in each category as time has gone on. Hopefully, you discover something about your database as well!
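
One last aside for anyone who would rather script the pasting step than use the text import wizard each time: below is a rough Python sketch that turns one line of converter output into a dated spreadsheet row and works out what percentage of claims are documented. It assumes the converter prints whole numbers separated by commas. The function names, the file name, and the sample numbers are all invented for the example; they are not real figures from my database.

```python
import csv


def parse_values(line):
    """Split one comma-separated line of numbers into a list of ints."""
    return [int(value) for value in line.strip().split(",") if value.strip()]


def dated_row(label, line):
    """Put the date label first, just like the left-hand column in Excel."""
    return [label] + parse_values(line)


def append_row(csv_path, label, line):
    """Add one dated row to the end of a CSV file that Excel can open."""
    with open(csv_path, "a", newline="") as f:
        csv.writer(f).writerow(dated_row(label, line))


def percent_documented(documented, total):
    """Share of claims that have citations, as a percentage."""
    if total == 0:
        return 0.0
    return 100.0 * documented / total


# Invented sample values, only to show the shape of the data
sample_output = "120,45,3"
print(dated_row("1-4-18", sample_output))
print(percent_documented(25, 100))
```

Calling `append_row("processed-records.csv", "1-4-18", sample_output)` after each new statistics page would build up the same table one row at a time.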
