
Building Family Trees: Identifying Enslaved People in Ecclesiastical Records

Posted by on Wednesday, August 25, 2021 in Arts and Humanities, College of Arts and Science, Completed Research, DS Team Engagement.

Example of historical record entry

The Slave Societies Digital Archive (SSDA) preserves the oldest serial records for slave societies in the Americas. The records consist of handwritten text in books kept by ecclesiastical (religious) authorities and notaries (other government officials) to record births, marriages, and deaths. The texts are written in ecclesiastical Spanish and Portuguese. Through the efforts of history and language experts across the globe, many of these texts have been transcribed into electronic form.

What familial relationships existed among these enslaved people? How did their families grow, intermingle, and develop? What insights might users gain by tracing these histories and discovering previously unknown family lineages and origins? Dr. Daniel Genkins, Executive Director of SSDA and Mellon Assistant Professor of History and Digital Humanities at Vanderbilt University, seeks to build these lineages and trace these familial networks.

Transcribing images of these documents into electronic text demands considerable expertise and time, so Dr. Genkins recognized the need for automated methods to extract people, locations, and other information of interest from the transcripts. The extracted entities would populate a database that serves as the foundation for entity matching and, ultimately, for building family trees. In a joint effort between the Vanderbilt Data Science Institute (DSI) and Dr. Genkins, a hybrid team of undergraduates from the History department, graduate and pre-doctoral students, Digital Humanities staff, and DSI data scientists began investigating state-of-the-art methods for named entity extraction. Together, the team created a documented, reproducible repository of code and models; the models were even able to identify entities that had been mislabeled in the original dataset. Building on the success of the initial model and the new data science skills the participants gained, Dr. Genkins and the SSDA team continue to move the project forward.
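The post does not describe the team's models or label scheme, so the following is only an illustrative sketch under stated assumptions. Many named entity extraction models emit one label per token using the BIO scheme (B- begins an entity, I- continues it, O marks a token outside any entity), and those token labels must be grouped into whole entities before they can be stored in a database for entity matching. The function name `bio_to_entities`, the PER/LOC entity types, and the sample baptism-style sentence are hypothetical and are not drawn from SSDA data.

```python
from typing import List, Optional, Tuple


def bio_to_entities(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Group token-level BIO labels into (entity_text, entity_type) pairs.

    Hypothetical helper for illustration; not part of the SSDA codebase.
    """
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")

    entities: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    for token, label in zip(tokens, labels):
        starts_new = label.startswith("B-") or (
            # Lenient handling: an I- tag with no matching open entity starts one.
            label.startswith("I-") and label[2:] != current_type
        )
        if starts_new:
            # Flush any entity in progress, then open a new one.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens = [token]
            current_type = label[2:]
        elif label.startswith("I-"):
            # Continuation of the current entity (same type).
            current_tokens.append(token)
        else:
            # "O": outside any entity; close the entity in progress, if any.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens = []
            current_type = None

    # Close an entity that runs to the end of the sentence.
    if current_tokens:
        entities.append((" ".join(current_tokens), current_type))
    return entities


# Invented example sentence and labels, for illustration only.
tokens = "bautizé a María hija de Juan y de Ana en La Habana".split()
labels = ["O", "O", "B-PER", "O", "O", "B-PER", "O", "O", "B-PER", "O", "B-LOC", "I-LOC"]
print(bio_to_entities(tokens, labels))
# [('María', 'PER'), ('Juan', 'PER'), ('Ana', 'PER'), ('La Habana', 'LOC')]
```

Treating a stray I- tag as the start of a new entity is one common design choice; a stricter decoder could instead discard such tokens, which would surface model labeling errors rather than silently absorbing them.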


To watch Dr. Genkins describe his work with SSDA and machine learning in more detail, view his video here: . To learn more about SSDA, visit their website at
