How Artificial Intelligence (AI) and Machine Learning Impact Genealogy

Artificial Intelligence and Genealogy
Elevenses with Lisa Episode 32

In this episode we tackle a few small geeky tech questions about artificial intelligence, better known as AI, that may have a pretty big impact on your genealogy life. Questions like:

  • Is artificial intelligence the same thing as machine learning?
    And if not, how are they related?
  • And am I using AI, maybe without even being aware of it?
  • And what impact is AI really having on our lives? Is it all good, or are there some pitfalls we need to know about?

We’re going to approach these with a focus on family history, but pretty quickly I think we’ll discover it’s a much more far-reaching subject. And that means this episode is for everyone.


Watch the free video below.

While I’ve done my own homework on this subject and written about it in my book The Genealogist’s Google Toolbox, I’m smart enough to call in an expert in the field. So, my special guest is Benjamin Lee. He is the developer of the Newspaper Navigator, the new free tool that uses artificial intelligence to help you find and extract images from the free historical newspaper collection at The Library of Congress’ Chronicling America. I covered Newspaper Navigator extensively in Elevenses with Lisa episode 26.

Ben is a 2020 Innovator-in-Residence at the Library of Congress, as well as a third-year Ph.D. student in the Paul G. Allen School of Computer Science & Engineering at the University of Washington, where he studies human-AI interaction with his advisor, Professor Daniel Weld.

He graduated from Harvard College in 2017 and has served as the inaugural Digital Humanities Associate Fellow at the United States Holocaust Memorial Museum, as well as a Visiting Fellow in Harvard’s History Department. And currently he’s a National Science Foundation Graduate Research Fellow.

Thank you so much to Ben Lee for a really interesting discussion and for making Newspaper Navigator available to researchers. I am really looking forward to hearing from him about his future updates and improvements.

Artificial Intelligence and Genealogy

Covering technology and its application to genealogy is always a bit of a double-edged sword. It can be exciting and helpful, and also problematic in its invasiveness.

Tools like family tree hints, the Newspaper Navigator and Google Lens (learn more about that in Elevenses with Lisa episode 27) all have a lot to offer our genealogy research. But on a personal level, you may be concerned about the far-reaching effects of artificial intelligence on the future and, most importantly, on your descendants. In today’s deeply concerning cancel culture and online censorship, AI can seriously impact our privacy, security and even our freedom.

As I did my research for this episode I discovered a few things. Artificial intelligence and machine learning are having the same kind of massive and disruptive impact that DNA has had on genealogy, with almost none of the same publicity. (For background on DNA data usage, listen to Genealogy Gems Podcast episode 217. That episode covers the use of DNA in criminal cases and how our data potentially has wide-reaching appeal to many other entities and industries.)

A quick search of artificial intelligence ancestry.com in Google Patents reveals that work continues on ways to apply AI to DNA and genealogy. (See image below)

Patents for AI machine learning and DNA

Patent search result: a pending patent involving AI and DNA by Regeneron Pharmaceuticals, Inc.

AI now makes our genealogical research and family tree data just as valuable to others outside of genealogy.

This raises the question: who else might be interested in our family tree research and data?

Who Is Interested in Your Genealogy Data

One answer to this question is academic researchers. During my research on this subject, the Record Linking Lab at Brigham Young University surfaced as just one example. It’s run by a BYU economics professor who published a research paper on the lab’s work called Combining Family History and Machine Learning to Link Historical Records. The paper was co-authored with a Notre Dame economics and women’s studies professor.

In this example, their goals are driven by economic, social, and political issues rather than genealogy. Their published paper does offer an eye-opening look at the value that those outside the genealogy community place on all of the personal data we’re collecting and the genealogical records we are linking. Our work is about our ancestors, and therefore it is about ourselves. Even if living people are not named on our tree, they are named in the records we are linking to it. We are making it all publicly available.

In the past, historical records like birth and death, military and census records have been available to these researchers, but only on an individual basis. This made them difficult to work with. Academic (and industry) researchers couldn’t easily follow these records for individual people, families, and generations of families through time in order to draw meaningful conclusions. But for the first time, machine learning is being applied to online genealogy research data, making it possible to link these records to living and deceased individuals and their families.
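
To make the idea of record linking a little more concrete, here is a minimal Python sketch. It is not the method used by the Record Linking Lab or any genealogy website. It is a simple hand-written rule (not machine learning) that asks whether two records plausibly describe the same person by comparing the name and birth year. The people, field names, two-year tolerance and 0.85 threshold are all invented for illustration.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Return a similarity score between 0 and 1 for two names, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_probable_match(rec_a: dict, rec_b: dict, threshold: float = 0.85) -> bool:
    """Hypothetical rule: birth years within 2 years AND similar names."""
    close_years = abs(rec_a["birth_year"] - rec_b["birth_year"]) <= 2
    similar_names = name_similarity(rec_a["name"], rec_b["name"]) >= threshold
    return close_years and similar_names


# Invented records, as they might appear in two different collections
census_entry = {"name": "Anna Schmidt", "birth_year": 1887}
passenger_entry = {"name": "Anna Schmitt", "birth_year": 1886}
unrelated_entry = {"name": "John Miller", "birth_year": 1850}

print(is_probable_match(census_entry, passenger_entry))
print(is_probable_match(census_entry, unrelated_entry))
```

A machine learning approach would, roughly speaking, learn from examples how much weight to give each clue instead of relying on fixed cutoffs like these. Either way, the more records we attach to our trees, the more examples there are to learn from.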

It’s a lot to think about, but it’s important because it is our family history data. We need to understand how our data is being used inside and outside the genealogy sandbox.

Answers to Your Live Chat Questions About AI

One of the advantages of tuning into the live broadcast of each Elevenses with Lisa show is participating in the Live Chat and asking your questions.

Elevenses with Lisa Q&A on AI and Genealogy

www.GenealogyGems.com/Elevenses

From Linda J: What about all the “people search” sites (not genealogy) that have all, or a lot of, our personal data?
Lisa’s Answer: My understanding is that much of the information provided on many of the “people search” websites comes from public information. So while the information is much easier to access these days, it’s been publicly available for years. That information isn’t as accessible to projects like the one discussed in this episode because those websites don’t make their Application Programming Interface (known as API) publicly available like FamilySearch does.

From Doug H: Wouldn’t that potentially find errors in our trees?
Lisa’s Answer: Yes.

From Sheryl T: Do these academic researchers have access to the living people on the trees? Or are those protected from them as they are from the public?
Lisa’s Answer: They have access to all information attached to people marked as “Living Person.” Therefore, if the attached record names them, their identity would then be known. Click a hint on your tree at Ancestry for example, and the found records clearly spell out the name of the person they believe is your “Living” person.

From Nancy M: How long do the show notes stay available? I am looking for Google Books from two weeks ago and last week’s Allen Co Library.
Lisa’s Answer: The show notes remain available until the episode is archived in Premium Membership. You can find all of the currently available free Elevenses with Lisa episodes on our website in the menu under VIDEOS click Elevenses with Lisa.

From Nannie A: I heard a rumor that Ancestry.com has been sold. Do you know if that’s true?
Lisa’s Answer: Yes, they were sold again this year. Read:
Private equity firm Blackstone Group Inc. buying Ancestry.com for $4.7 billion
Private equity wants to own your DNA by CBS News.

Resources

Get My Free Genealogy Gems Newsletter – click here.
Bonus Download exclusively for Premium Members: Download the show notes handout. 
Become a Genealogy Gems Premium Member today. 

 

Genealogy Gems Podcast Episode 208

with Lisa Louise Cooke

In this episode:

  • A free webinar!
  • Great comments from you: An inspiring Google Books success story, how one listener gets her shy husband talking about his life story, and a listener’s own version of the poem, “Where I’m From”
  • The Archive Lady talks to us about historical scrapbooks at archives that may be packed with genealogy gems for us
  • A genealogy hero who saved a life story
  • Your first look at RootsTech 2018

FREE GENEALOGY WEBINAR

“Reveal Your Unique Story through DNA & Family History”

Handouts:

Googling and Making Videos with Lisa Louise Cooke

Newspaper Research Worksheet from Lisa Louise Cooke

Genetic Genealogy: Here’s What You Need to Know from Your DNA Guide Diahan Southard

NEWS: FIRST LOOK AT ROOTSTECH 2018

Going to RootsTech for the first time? Read this RootsTech Q&A.

MAILBOX: PAT INTERVIEWS HER SHY HUSBAND

“Remembering Dad” video

Pat’s tip: When someone is shy about sharing life stories, interview them informally while traveling. Pat uses her iPad to transcribe his responses, then polishes it up when she gets home and transfers it to her own computer. “Eventually we will have enough to write the story of his life, with lots of pictures. And it’s completely painless.”

MAILBOX: GOOGLE BOOKS SUCCESS STORY FROM KIM

Click here for another inspiring genealogy discovery using Google Books, with how-to tips and a free video preview of Lisa Louise Cooke’s Premium video tutorial, “Google Books: The Tool You Need Every Day”

MAILBOX: “WHERE I’M FROM” POEM SUBMISSION

Genealogy Gems Podcast Episode 185: Learn more about the “Where I’m From” poetry project and hear a conversation with the original author, Kentucky poet laureate George Ella Lyon.

THE ARCHIVE LADY: HISTORICAL SCRAPBOOKS

Scrapbooks are one of my favorite record sources to do genealogy research in and to also process in the archives. There are all kinds of scrapbooks; each and every one is unique and one-of-a-kind. They were put together with love and the hope that what was saved and pasted onto those pages will be remembered.

The origins of scrapbooking are said to go back to the 15th century in England, and it is still a hobby enjoyed by many today. Most archives, libraries, historical and genealogical societies have scrapbooks in their collections. They will most likely be found in the Manuscript Collection as part of a specifically named collection.

Scrapbooks contain all kinds of wonderful genealogical records, photographs and ephemera. There is even a scrapbook in the Houston County, Tennessee Archives that has candy bar wrappers pasted in it. This particular scrapbook is one of my absolute favorites. It was compiled and owned by Evelyn Ellis and dates to the 1930s and 1940s.

Among the normal newspaper clippings and event programs are interesting pieces such as a Baby Ruth candy bar wrapper with a handwritten note by Evelyn that reads “Always remember June 11, 1938 at Beach Grove at the Ice Cream Supper.” There is also an original ticket pasted into the scrapbook from the Grand Ole Opry in Nashville, Tennessee where Evelyn Ellis visited and recorded her comments on April 1, 1939.

There are scrapbooks for just about any subject. Aside from personal scrapbooks, you can find war scrapbooks, obituary clipping scrapbooks and scrapbooks that collected and recorded local or national events. The obituaries found in scrapbooks could be a real find because sometimes they are the only pieces of the newspaper that survive and can be a treasure trove for any genealogist. Many scrapbooks contain one-of-a-kind documents, photographs and ephemera.

To find scrapbooks in an archive, ask the archivist if they have any scrapbooks in their records collections. Many times scrapbooks are housed with a particular manuscript collection and will be listed in the finding aid. Some archives have a collection of just scrapbooks that have been donated to them and can be easily accessed. Most scrapbooks will not be on research shelves; they will be stored in back rooms at the archives and will have to be requested. You should also check the archive’s online catalog for any listings of scrapbooks before you jump in the car and drive to the archives.

I encourage all genealogists to check with the archive in the area where your ancestors were from and see if they have any scrapbooks in their archived records collections. Scrapbooks are like time capsules: you don’t know what will be found in them until you open them up.

BONUS CONTENT for Genealogy Gems App Users

If you’re listening through the Genealogy Gems app, your bonus content for this episode is a PDF with tips for what to do if your own scrapbook gets wet. The Genealogy Gems app is FREE in Google Play and is only $2.99 for Windows, iPhone and iPad users.

ANIMOTO

Start creating fabulous, irresistible videos about your family history with Animoto.com. You don’t need special video-editing skills: just drag and drop your photos and videos, pick a layout and music, add a little text and voila! You’ve got an awesome video! Try this out for yourself at Animoto.

MYHERITAGE.COM

MyHeritage is the place to make connections with relatives overseas, particularly with those who may still live in your ancestral homeland. Click here to see what MyHeritage can do for you: it’s free to get started.

GEM: SAVING A LIFE STORY

Original story on SWVA Today: “String of Pearls: Marion’s Bob White Shares Family History Collection” by Margaret Linford, Columnist

Smyth County Public Library Local History webpage

Genealogy Gems how-to resources to help you:

Video record a loved one telling their life stories

How to video record a fantastic family history interview

How to create a family history video with Animoto

Digitize and share your research and your own life story: Interview with Larsen Digital in Genealogy Gems Podcast episode 183

How to Start Blogging series in the free Family History: Genealogy Made Easy podcast (episodes 38-42) and this article: 3 Ways to Improve Your Genealogy Blog

RootsMagic family history software has publishing tools (for print and online publishing):

Rootsmagic

Visit www.RootsMagic.com

Lisa Louise Cooke uses and recommends RootsMagic family history software. From within RootsMagic, you can search historical records on FamilySearch.org, Findmypast.com and MyHeritage.com. RootsMagic is now fully integrated with Ancestry.com: you can sync your RootsMagic trees with your Ancestry.com trees and search records on the site.

 

A BRILLIANT WAY TO “MEET” YOUR ANCESTOR

Your DNA Guide Diahan Southard shared this story from Christine:

“Friday night I brought out large cut out of my Grandmother, Christine Doering, sitting in an easy chair so it looks like she is talking with you, and I played a recording done in 1970’s of her talking and giggling about coming to America in 1896 at the age of 9.  For some they had never heard her voice before.”

Subscribe to the free Genealogy Gems YouTube channel.

PRODUCTION CREDITS

Lisa Louise Cooke, Host and Producer
Sunny Morton, Editor
Diahan Southard, Your DNA Guide, Content Contributor
Vienna Thomas, Associate Producer
Hannah Fullerton, Production Assistant
Lacey Cooke, Service Manager

FREE NEWSLETTER:

Genealogy Gems Newsletter Sign Up

Subscribe to the Genealogy Gems newsletter to receive a free weekly e-mail newsletter, with tips, inspiration and money-saving deals.

Resources

Download the episode

Download the show notes

5 Reasons You MUST Look at Original Records

Show Notes: When you find family history information online you MUST make every effort to find the original genealogy record so that your family tree will be accurate! There are 5 reasons to find original records. I’ll explain what they are, and what to look for so that you get the most information possible for your family tree.

If you’re a genealogy beginner, this video will help you avoid a lot of problems. And if you’re an advanced genealogist, now is the time to fix things. 

Watch the Video

Show Notes

Downloadable ad-free Show Notes handout for Premium Members

#1 Many online records are simply way too vague.

Records come in many forms. Many genealogy websites consider that each name that appears on a document is a “record” when they’re counting records. So, when you hear that 10 million records have been added to a website, it doesn’t necessarily mean that 10 million genealogical documents have been added. It oftentimes means that that’s the number of names that they’ve added.

One document could have a lot of names. In the case of a death certificate, it could have the name of the deceased, the name of the spouse, the name of the informant, and the names of the parents. Each one of those gets counted as a record.
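
The difference between counting documents and counting names can be sketched in a few lines of Python. The two documents and the roles listed on them are invented for illustration, and this is not how any particular website actually tallies its collections.

```python
# Two invented documents, each listing the people named on it
documents = [
    {"type": "death certificate",
     "names": ["deceased", "spouse", "informant", "father", "mother"]},
    {"type": "marriage record",
     "names": ["bride", "groom", "witness"]},
]

# Number of actual documents
document_count = len(documents)

# Number of "records" if every name on every document is counted
record_count = sum(len(doc["names"]) for doc in documents)

print(f"{document_count} documents, counted as {record_count} records")
# prints "2 documents, counted as 8 records"
```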

Recently, MyHeritage announced they’ve added 78 million new records to their website. However, many of these records are simply transcriptions: the information has been extracted from whatever the original source was. That information becomes searchable, and that’s terrific because it provides great clues. But sometimes when you go and look at the records themselves, it turns out that the record really is just a transcription. There is no digital record to look at.

Sometimes the website doesn’t even tell you what the original record was. There will be clues, though. You can use those clues and run a search on those words. So, if it talks about a particular location, or type of record, or the name of the record, you could start searching online and find out where those original records are actually held. Sometimes they are on another genealogy website. But a lot of times, and I’ve seen this more recently, they are publicly available records, oftentimes from governmental agencies. Very recently, we’ve been seeing more recent records that are just selected text. They may be records for people who just passed away a year or two ago.

There is a wide range of places where these types of records can come from. But if that genealogy website got its hands on the record, chances are you could too. And it’s really important to do that.

#2 What’s important to you might not have been prioritized for indexing.

The indexer is a person, or perhaps even an artificial intelligence machine, who has gone through the documents and extracted information and provided it in text form. Sometimes when you search on a genealogy website, all you’re getting is just that typed text, that transcription, of some of the key data from the original document.

I’ll tell you about one example in my family. I was looking at a 2x great grandmother back in Germany. Her name was Louise Leckzyk. She’s listed as Louise Nikolowski in the Ancestry record hint. Technically, that’s true, she was Louise Nikolowski at the time of the birth of her child. But if you pull up the original record, what you discover is she’s not listed as Louise Nikolowski on the record. She’s listed with her maiden name, which was usually the case in those old German church records. So that’s huge. We’ve talked about how challenging it can be to find maiden names here on the Genealogy Gems channel. So, we don’t want to miss any opportunity to get one. But if we had taken this record hint at face value, and just extracted that information, put it in our database, or attached it to our online family tree, and never looked at the original document, we would have completely missed her maiden name. And that maiden name is the key to finding the next generation, her parents.

#3 Not all information on a record is indexed.

It’s very common for large portions of information on a document not to be indexed. Here’s the reason for that: Indexing costs money. When a genealogy company takes a look at a new record collection they have some hard decisions to make. They have to decide which fields of information will be included in the indexing. Oftentimes, there will be several columns, as in a church record or a census record. The 1950 census was an example of this. There’s so much data that the company has to look at that and say, what do we think would be of the most value to our users? They then index those fields. They’ve got to pay to not only have them indexed, but potentially also reviewed by human eyes, or AI. That all costs money.

So, there will inevitably be information that gets left off the index. That means that when you search the website you’re going to see the record result, and it can give you the impression that that is the complete record. But very often, it’s not the complete record. Tracking down and taking a look at the original digital scan of the record is the only way to know.
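
Here is a small Python sketch of why a search result can look complete when it isn’t. The person, the fields on the original page, and the fields chosen for indexing are all hypothetical; real collections and websites will differ.

```python
# Hypothetical fields written on an original census page
original_record = {
    "name": "Anna Schmidt",
    "age": 34,
    "birthplace": "Germany",
    "occupation": "Seamstress",
    "year_of_immigration": 1896,
}

# Hypothetical subset of fields the website chose to pay to index
indexed_fields = ["name", "age", "birthplace"]

# What you see in the search result: only the indexed fields
search_result = {field: original_record[field] for field in indexed_fields}

# What you only find by viewing the original image
not_indexed = [field for field in original_record if field not in indexed_fields]

print(search_result)
print(not_indexed)
# second line prints ['occupation', 'year_of_immigration']
```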

It’s possible that the records have not been digitally scanned. In the case of public government records, that information may have been typed into a database, not extracted from a digital image. There may not be a digital scanned image. It may be very possible that the only original is sitting in a courthouse or church basement somewhere. It’s also possible that the digital images are only available on a subscription website that you don’t subscribe to.

We need to do our best to try to track down the original document and take a look at it to see if there’s anything else that’s of value to us in our research that the indexers or the company just didn’t pick up on or didn’t spend the money to index.

#4 Different websites potentially have different digital scans of the same record.

Websites sometimes collaborate on acquiring and indexing records. In those cases, they might be working with the same digital images. But oftentimes, they create their own digital scans. That means that a record may be darker or lighter, or sharper or blurrier from one website to the next. So while you found the record on one website, another might have a copy that’s much easier to read.

Digital scanning has also come a long way over the years. Many genealogy sites now are looking at some of the earlier scans they did. They’re realizing that some are pretty low quality by today’s standards. They might determine that it’s worth going back and rescanning the record collection. This happened with some of the earliest census records that were digitized many years ago. It makes a lot of sense, because a lot of time has passed, and technology has certainly changed.

So even though you found information many years ago, it might be worth taking a second look if you have any questions about what’s on that document. You may find that that record is actually a newly digitized image on the same website, or you might find that it’s also available somewhere else.

A lot of the partnerships out there are with FamilySearch, which is free. So, while you may have a paid subscription to a site like Ancestry or MyHeritage, if there’s anything that you have questions about, or you didn’t actually see the original document on one of those paid websites, head to FamilySearch.org. Run a search and see if they happen to have the digitized images. There’s a good chance they might, and it’s worth taking a look.

Sometimes the genealogy website will have tools that allow you to get a better look at the digitized document. Ancestry is a great example of this. On the digitized image page click the tool icon to open the Tools menu. One of my favorite tools is “Invert colors”. Click that button, and it will turn it into a negative image. Sometimes this allows words to pop out in a way that they were not as clearly visible in the normal view.
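
If you’re curious what a negative image actually is, here is a minimal Python sketch. It does not show how Ancestry’s tool is built. It only illustrates the general idea, assuming a grayscale scan where each pixel is a number from 0 (black) to 255 (white). The tiny sample “scan” is made up.

```python
def invert(pixels: list[list[int]]) -> list[list[int]]:
    """Return the negative of a grayscale image: each value v becomes 255 - v."""
    return [[255 - v for v in row] for row in pixels]


# A made-up 2 x 3 scan: light paper (high values) with darker ink (low values)
scan = [
    [250, 240, 60],
    [245, 50, 235],
]

negative = invert(scan)
print(negative)
# prints [[5, 15, 195], [10, 205, 20]]
```

Light paper becomes dark and dark ink becomes light, which is why faded writing can sometimes stand out more in the inverted view.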

I downloaded a digital scan from a website several years ago, and it was hard to decipher. I did some searching and was able to find a clearer copy on another website.

#5 You can verify that the words were indexed accurately.

Reviewing a scan of the entire document provides you with a lot of examples of the handwriting of the person who made the entry. If you have any doubt about words or spelling, making comparisons with other entries can be extremely helpful.

When I first looked at a baptismal record of my 2x great grandmother’s son, I thought her surname was Lekcyzk. However, after seeing a different digital scan, I started to question that. Having the original record allows me to review the handwriting of the person who wrote these records. Comparing the handwriting of other entries on the page helped me determine that the swish at the top is the dotting of an i that just had a bit more flourish. I also reconfirmed that the Z in the name is definitely a Z by comparing it to other Zs on the page.

Bonus Reason: You may have missed the second page.

Some records have more than one page, and it’s easy to miss them. If the indexer took information primarily off of the first page, it may not be obvious when you look at that page, that in fact, it’s a two-page (or more) document. More pages potentially means more valuable information!

It’s also possible that if you downloaded a document years ago when you first started doing genealogy, you might have missed the additional pages. Now that you’re a more experienced researcher, it would be worth going back and looking at particular types of records that are prone to having second pages. Examples of this are:

  • census records,
  • passenger lists,
  • passport records,
  • criminal records,
  • and probate records.

If you have single page records that fall in one of these categories saved to your computer, you might want to go back and do another search for them and check the images that come before and after that page to see if there are more gems to be found.

I hope I’ve convinced you to always make the effort to obtain and review original records for the information that you find while doing genealogy research online.

I’ll bet there are even more reasons to do this, so I’m counting on you. Please leave a comment and let me know what you’ve found following these 5 reasons, and any additional reasons that you have.

