How Artificial Intelligence AI and Machine Learning Impact Genealogy
Artificial Intelligence and Genealogy
Elevenses with Lisa Episode 32
In this episode we tackle a few small geeky tech questions about artificial intelligence, better known as AI, that may have a pretty big impact on your genealogy life. Questions like:
- Is artificial intelligence the same thing as machine learning? And if not, how are they related?
- And am I using AI, maybe without even being aware of it?
- And what impact is AI really having on our lives? Is it all good, or are there some pitfalls we need to know about?
We’re going to approach these with a focus on family history, but pretty quickly I think we’ll discover it’s a much more far-reaching subject. And that means this episode is for everyone.

Watch the free video below.
While I’ve done my own homework on this subject and written about it in my book The Genealogist’s Google Toolbox, I’m smart enough to call in an expert in the field. So, my special guest is Benjamin Lee. He is the developer of the Newspaper Navigator, the new free tool that uses artificial intelligence to help you find and extract images from the free historical newspaper collection at The Library of Congress’ Chronicling America. I covered Newspaper Navigator extensively in Elevenses with Lisa episode 26.
Ben is a 2020 Innovator-in-Residence at the Library of Congress, as well as a third-year Ph.D. student in the Paul G. Allen School for Computer Science & Engineering at the University of Washington, where he studies human-AI interaction with his advisor, Professor Daniel Weld.
He graduated from Harvard College in 2017 and has served as the inaugural Digital Humanities Associate Fellow at the United States Holocaust Memorial Museum, as well as a Visiting Fellow in Harvard’s History Department. And currently he’s a National Science Foundation Graduate Research Fellow.
Thank you so much to Ben Lee for a really interesting discussion and for making Newspaper Navigator available to researchers. I am really looking forward to hearing from him about his future updates and improvements.
Artificial Intelligence and Genealogy
Covering technology and its application to genealogy is always a bit of a double-edged sword. It can be exciting and helpful, and also problematic in its invasiveness.
Tools like family tree hints, the Newspaper Navigator and Google Lens (learn more about that in Elevenses with Lisa episode 27) all have a lot to offer our genealogy research. But on a personal level, you may be concerned about the far-reaching effects of artificial intelligence on the future, and most importantly on your descendants. In today’s deeply concerning cancel culture and online censorship, AI can seriously impact our privacy, security and even our freedom.
As I did my research for this episode I discovered a few things. Artificial intelligence and machine learning are having the same kind of massive and disruptive impact that DNA has had on genealogy, with almost none of the same publicity. (For background on DNA data usage, listen to Genealogy Gems Podcast episode 217. That episode covers the use of DNA in criminal cases and how our data potentially has wide-reaching appeal to many other entities and industries.)
A quick search of artificial intelligence in Google Patents reveals that work continues on ways to apply AI to DNA and genealogy. (See image below)

Patent search result: a pending patent involving AI and DNA by Regeneron Pharmaceuticals, Inc.
AI now makes our genealogical research and family tree data just as valuable to others outside of genealogy.
This raises the question: who else might be interested in our family tree research and data?
Who Is Interested in Your Genealogy Data
One answer to this question is academic researchers. During my research on this subject, the Record Linking Lab at Brigham Young University surfaced as just one example. It’s run by a BYU economics professor who published a research paper on the lab’s work called Combining Family History and Machine Learning to Link Historical Records. The paper was co-authored with a Notre Dame economics and women’s studies professor.
In this example, their goals are driven by economic, social, and political issues rather than genealogy. Their published paper does offer an eye-opening look at the value that those outside the genealogy community place on all of the personal data we’re collecting and the genealogical records we are linking. Our work is about our ancestors, and therefore it is about ourselves. Even if living people are not named on our tree, they are named in the records we are linking to it. We are making it all publicly available.
In the past, historical records like birth and death, military and the census have been available to these researchers, but on an individual basis. This made them difficult to work with. Academic (and industry) researchers couldn’t easily follow these records for individual people, families, and generations of families through time in order to draw meaningful conclusions. But for the first time, machine learning is being applied to online genealogy research data, making it possible to link these records to living and deceased individuals and their families.
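To make the idea of record linking a little more concrete, here is a toy sketch in Python. It is not the method used by the Record Linking Lab or any genealogy website. It swaps in a simple rule-based comparison (name similarity plus birth year) in place of real machine learning, and the people, field names, and thresholds are all made up for illustration.

```python
from difflib import SequenceMatcher

def name_similarity(a, b):
    """Score from 0.0 to 1.0 for how alike two names are, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def likely_same_person(rec_a, rec_b, name_threshold=0.8, max_year_gap=2):
    """Guess whether two records describe the same person.

    The threshold values are assumptions chosen for this example only.
    """
    names_close = name_similarity(rec_a["name"], rec_b["name"]) >= name_threshold
    years_close = abs(rec_a["birth_year"] - rec_b["birth_year"]) <= max_year_gap
    return names_close and years_close

# Hypothetical records for invented people (not from any real collection).
census_record = {"name": "Mary Johnson", "birth_year": 1862}
death_record = {"name": "Mary Jonson", "birth_year": 1863}
unrelated_record = {"name": "Anna Schmidt", "birth_year": 1862}

print(likely_same_person(census_record, death_record))
print(likely_same_person(census_record, unrelated_record))
```

A machine learning approach would learn from many examples which differences matter, rather than relying on hand-picked thresholds like these. The basic question is the same, though: do these two records belong to the same person?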
It’s a lot to think about, but it’s important because it is our family history data. We need to understand how our data is being used inside and outside the genealogy sandbox.
Answers to Your Live Chat Questions About AI
One of the advantages of tuning into the live broadcast of each Elevenses with Lisa show is participating in the Live Chat and asking your questions.
From Linda J: What about all the “people search” sites (not genealogy) that have all, or a lot of, our personal data?
Lisa’s Answer: My understanding is that much of the information provided on many of the “people search” websites comes from public information. So while the information is much easier to access these days, it’s been publicly available for years. That information isn’t as accessible to projects like the one discussed in this episode because those websites don’t make their Application Programming Interface (known as API) publicly available like FamilySearch does.
From Doug H: Wouldn’t that potentially find errors in our trees?
Lisa’s Answer: Yes.
From Sheryl T: Do these academic researchers have access to the living people on the trees? Or are those protected from them as it is to the public?
Lisa’s Answer: They have access to all information attached to people marked as “Living Person.” Therefore, if the attached record names them, their identity would then be known. Click a hint on your tree at Ancestry for example, and the found records clearly spell out the name of the person they believe is your “Living” person.
From Nancy M: How long do the show notes stay available? am looking for Google Books two weeks ago and last week’s Allen Co Library.
Lisa’s Answer: The show notes remain available until the episode is archived in Premium Membership. You can find all of the currently available free Elevenses with Lisa episodes on our website in the menu under VIDEOS click Elevenses with Lisa.
From Nannie A: I heard a rumor that Ancestry.com has been sold. Do you know if that’s true?
Lisa’s Answer: Yes, they were sold again this year. Read:
Private equity firm Blackstone Group Inc. buying for $4.7 billion
Private equity wants to own your DNA by CBS News.
5 Reasons You MUST Look at Original Records
Show Notes: When you find family history information online you MUST make every effort to find the original genealogy record so that your family tree will be accurate! There are 5 reasons to find original records. I’ll explain what they are, and what to look for so that you get the most information possible for your family tree.
If you’re a genealogy beginner, this video will help you avoid a lot of problems. And if you’re an advanced genealogist, now is the time to fix things.
Watch the Video
Show Notes
Downloadable ad-free Show Notes handout for Premium Members.
#1 Many online records are simply way too vague.
Records come in many forms. Many genealogy websites consider that each name that appears on a document is a “record” when they’re counting records. So, when you hear that 10 million records have been added to a website, it doesn’t necessarily mean that 10 million genealogical documents have been added. It oftentimes means that that’s the number of names that they’ve added.
One document could have a lot of names. In the case of a death certificate, it could have the name of the deceased, the name of the spouse, the name of the informant, and the names of the parents. Each one of those gets counted as a record.
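Here is a small Python sketch of that arithmetic. The two documents and the roles listed on them are made up for illustration. The point is only that counting names gives a much bigger number than counting documents.

```python
# Hypothetical example: each document lists the people named on it.
documents = [
    {"type": "death certificate",
     "names": ["deceased", "spouse", "informant", "father", "mother"]},
    {"type": "death certificate",
     "names": ["deceased", "informant", "father", "mother"]},
]

def count_documents(docs):
    """Number of actual documents."""
    return len(docs)

def count_name_records(docs):
    """Number of 'records' when every name on a document is counted."""
    return sum(len(doc["names"]) for doc in docs)

print("Documents:", count_documents(documents))
print("Records (names):", count_name_records(documents))
```

In this made-up case, two documents would be reported as nine records.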
Recently, MyHeritage announced they’ve added 78 million new records to their website. However, many of these records are simply transcriptions: the information has been extracted from whatever the original source was. That information becomes searchable, and that’s terrific because these are great clues. But sometimes when you go and look at the records themselves, it turns out that the record really is just a transcription. There is no digital image of the original to look at.
Sometimes the website doesn’t even tell you what the original record was. There will be clues, though. You can use those clues and run a search on those words. So, if it mentions a particular location, or type of record, or the name of the record, you could start searching online and find out where those original records are actually held. Sometimes they are on another genealogy website. But a lot of times, and I’ve seen this more recently, they are publicly available records, oftentimes from governmental agencies. Lately, we’ve also been seeing more recent records that are just selected text. They may be records for people who passed away just a year or two ago.
There are a wide range of places where these types of records can come from. But if that genealogy website got its hands on the record, chances are you could too. And it’s really important to do that.
#2 What’s important to you might not have been prioritized for indexing.
The indexer is a person, or perhaps even an artificial intelligence machine, who has gone through the documents and extracted information and provided it in text form. Sometimes when you search on a genealogy website, all you’re getting is just that typed text, that transcription, of some of the key data from the original document.
I’ll tell you about one example in my family. I was looking at a 2x great grandmother back in Germany. Her name was Louise Leckzyk. She’s listed as Louise Nikolowski in the Ancestry record hint. Technically, that’s true, she was Louise Nikolowski at the time of the birth of her child. But if you pull up the original record, what you discover is she’s not listed as Louise Nikolowski on the record. She’s listed with her maiden name, which was usually the case in those old German church records. So that’s huge. We’ve talked about how challenging it can be to find maiden names here on the Genealogy Gems channel. So, we don’t want to miss any opportunity to get one. But if we had taken this record hint at face value, and just extracted that information, put it in our database, or attached it to our online family tree, and never looked at the original document, we would have completely missed her maiden name. And that maiden name is the key to finding the next generation, her parents.
#3 Not all information on a record is indexed.
It’s very common for large portions of information on a document not to be indexed. Here’s the reason for that: Indexing costs money. When a genealogy company takes a look at a new record collection they have some hard decisions to make. They have to decide which fields of information will be included in the indexing. Oftentimes, there will be several columns, as in a church record or a census record. The 1950 census was an example of this. There’s so much data that the company has to look at that and say, what do we think would be of the most value to our users? They then index those fields. They’ve got to pay not only to have them indexed, but potentially also to have them reviewed by human eyes or AI. That all costs money.
So, there will inevitably be information that gets left off the index. That means that when you search the website you’re going to see the record result, and it can give you the impression that that is the complete record. But very often, it’s not the complete record. Tracking down and taking a look at the original digital scan of the record is the only way to know.
It’s possible that the records have not been digitally scanned. In the case of public government records, that information may have been typed into a database, not extracted from a digital image. There may not be a digital scanned image. It may be very possible that the only original is sitting in a courthouse or church basement somewhere. It’s also possible that the digital images are only available on a subscription website that you don’t subscribe to.
We need to do our best to try to track down the original document and take a look at it to see if there’s anything else that’s of value to us in our research that the indexers or the company just didn’t pick up on or didn’t spend the money to index.
#4 Different websites potentially have different digital scans of the same record.
Websites sometimes collaborate on acquiring and indexing records. In those cases, they might be working with the same digital images. But oftentimes, they create their own digital scans. That means that a record may be darker or lighter, or sharper or blurrier from one website to the next. So while you found the record on one website, another might have a copy that’s much easier to read.
Digital scanning has also come a long way over the years. Many genealogy sites now are looking at some of the earlier scans they did. They’re realizing that some are pretty low quality by today’s standards. They might determine that it’s worth going back and rescanning the record collection. This happened with some of the earliest census records that were digitized many years ago. It makes a lot of sense, because a lot of time has passed, and technology has certainly changed.
So even though you found information many years ago, it might be worth taking a second look if you have any questions about what’s on that document. You may find that that record is actually a newly digitized image on the same website, or you might find that it’s also available somewhere else.
A lot of the partnerships out there are with FamilySearch, which is free. So, while you may have a paid subscription to a site like Ancestry or MyHeritage, if there’s anything that you have questions about, or you didn’t actually see the original document on one of those paid websites, head to FamilySearch. Run a search and see if they happen to have the digitized images. There’s a good chance they might, and it’s worth taking a look.
Sometimes the genealogy website will have tools that allow you to get a better look at the digitized document. Ancestry is a great example of this. On the digitized image page click the tool icon to open the Tools menu. One of my favorite tools is “Invert colors”. Click that button, and it will turn it into a negative image. Sometimes this allows words to pop out in a way that they were not as clearly visible in the normal view.
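If you’re curious what inverting colors actually does, here is a minimal Python sketch. It is not how Ancestry’s tool is built. It simply assumes a grayscale image where every pixel is a number from 0 (black) to 255 (white), and the tiny “scan” below is made up for illustration.

```python
def invert_grayscale(pixels):
    """Return the negative of an 8-bit grayscale image.

    `pixels` is a list of rows, each row a list of values from 0 to 255.
    """
    return [[255 - value for value in row] for row in pixels]

# Hypothetical 3x3 "scan": light paper with one darker ink pixel in the middle.
scan = [
    [240, 235, 238],
    [236,  90, 237],
    [239, 234, 241],
]

negative = invert_grayscale(scan)
for row in negative:
    print(row)
```

Light paper becomes dark and ink becomes light. Nothing new is added to the image, but faded writing can be easier for your eyes to pick out against a dark background.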
I downloaded a digital scan from a website several years ago, and it was hard to decipher. I did some searching and was able to find a clearer copy on another website.
#5 You can verify that the words were indexed accurately.
Reviewing a scan of the entire document provides you with a lot of examples of the handwriting of the person who made the entry. If you have any doubt about words or spelling, making comparisons with other entries can be extremely helpful.
When I first looked at a baptismal record of my 2x great grandmother’s son, I thought her surname was Lekcyzk. However, after seeing a different digital scan, I started to question that. Having the original record allows me to review the handwriting of the person who wrote these records. Comparing the handwriting of other entries on the page helped me determine that the swish at the top is the dotting of an i that just had a bit more flourish. I also reconfirmed that the Z in the name is definitely a Z by comparing it to other Zs on the page.
Bonus Reason: You may have missed the second page.
Some records have more than one page, and it’s easy to miss them. If the indexer took information primarily off of the first page, it may not be obvious when you look at that page, that in fact, it’s a two-page (or more) document. More pages potentially means more valuable information!
It’s also possible that if you downloaded a document years ago when you first started doing genealogy, you might have missed the additional pages. Now that you’re a more experienced researcher, it would be worth going back and looking at particular types of records that are prone to having second pages. Examples of this are:
- census records,
- passenger lists,
- passport records,
- criminal records,
- and probate records.
If you have single page records that fall in one of these categories saved to your computer, you might want to go back and do another search for them and check the images that come before and after that page to see if there are more gems to be found.
I hope I’ve convinced you to always make the effort to obtain and review original records for the information that you find while doing genealogy research online.
I’ll bet there are even more reasons to do this, so I’m counting on you. Please leave a comment and let me know what you’ve found following these 5 reasons, and any additional reasons that you have.