A Family Tree With 27 Million Branches

A new technique lets researchers plumb oceans of human genetic data to produce the largest-ever family tree, and they’re just getting started.

In our Study of the Day feature series, we highlight a research publication related to a John Templeton Foundation-supported project, connecting the fascinating and unique research we fund to important conversations happening around the world.

One day in the summer of 1873, an English barrister opened the “Pall Mall Gazette” to leaf through the society wedding pages, noticed a union announced between two people with the same last name, and began to wonder how likely it was that the couple were first cousins, as his own parents had been. So began one of the first endeavors in genetic genealogy: the attempt to reconstruct family trees en masse by analyzing sheaves of gathered data.

Portrait of Sir George Howard Darwin.

After two years of census-reading, fraction-computing, and deep dives into “Burke’s Landed Gentry,” George Darwin, the second son of cousins Charles Darwin and Emma Darwin, published his analysis. In “Marriages between First Cousins in England and their Effects,” the younger Darwin used the frequency of same-name marriages to estimate national rates of cousin marriage by location and social class. Those in higher social classes, like the Darwins, seemed to marry their cousins at up to double the rate of the general population.
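The logic of estimating cousin marriages from shared surnames can be made concrete with a little arithmetic. The sketch below is a simplified illustration, not a reconstruction of Darwin’s actual calculation. It assumes that surnames pass from father to child, so first cousins share a surname only when their fathers are brothers, which is one of four equally likely ways of being first cousins (through a father’s brother, father’s sister, mother’s brother, or mother’s sister). The function name and the input figures are hypothetical.

```python
def estimated_cousin_marriage_rate(same_surname_rate: float,
                                   cousin_share_of_same_surname: float) -> float:
    """Estimate the fraction of all marriages that are between first cousins.

    same_surname_rate: fraction of all marriages in which both spouses share a surname.
    cousin_share_of_same_surname: fraction of those same-surname marriages that are
        between first cousins (assumed to be known, e.g. from a sample of family records).
    """
    # Fraction of all marriages that are both same-surname AND first-cousin marriages.
    same_surname_cousins = same_surname_rate * cousin_share_of_same_surname
    # Assumption: only 1 of the 4 parental pairings (father's brother's child) yields
    # a shared surname, so multiply by 4 to cover all first-cousin marriages.
    return 4 * same_surname_cousins


# Hypothetical inputs: 1% of marriages are same-surname, and a quarter of those
# are between first cousins.
print(estimated_cousin_marriage_rate(0.01, 0.25))  # 0.01, i.e. about 1% of marriages
```

The factor of 4 is where the assumptions bite: if marriages through one side of the family were more common than through the others, the multiplier would shift accordingly.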

In the 150 years since, genetic genealogy has come a long, long way, both in breadth and ambition. Since the first human genome was sequenced in 2001, the amount of available human genetic data has grown rapidly as the cost per genome has plummeted. More than 150,000 human genomes have now been sequenced, and projects like the National Institutes of Health’s All of Us program will soon push the number of sequences into the millions. Nor is the data limited to contemporary humans: Harvard geneticist David Reich is leading an effort to collect genome-scale data from 10,000 ancient humans, amounting to an “Ancient DNA Atlas of Humanity.”

The oceans of data generated offer researchers the potential to explore the full genetic relationship among humans today and far into the past, but there are certain barriers. Geneticists must wrangle multiple data sets made with different (and “noisy”) sequencing techniques, often with varied access and governance restrictions. Once tens of thousands of sequences are combined, the resulting data set is often too massive to easily work with, but current data reduction techniques can identify important parts of the big picture.

The journal Science recently published “A unified genealogy of modern and ancient genomes,” by lead author Anthony Wilder Wohns of the University of Oxford’s Big Data Institute, in collaboration with Reich and other co-authors. It outlines new ways to combine genomic sequences from different databases and analyze them using tree sequences: essentially a series of family trees, one for each stretch of the genome, that trace how the versions of that stretch carried by different people diverged from their common ancestors. The team presents algorithms to infer tree sequences from sets of genomic data, estimate the timing of genomic changes, and integrate location and age data from ancient genomes to map how human genomes have changed across space and time. Because tree sequences are much more succinct than the underlying genetic data, they let researchers work efficiently with the data available today, with room to include millions more genomes as they are sequenced.
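To make the idea of a tree sequence concrete, here is a toy sketch of one way to store it, loosely modeled on the edge tables used by the open-source tskit library. It is plain Python, not tskit’s API, and not the authors’ inference algorithm. Each edge records that a parent node is the ancestor of a child node over one stretch of the genome, and the tree at any position is recovered by keeping the edges that cover it. The node numbers, the 100-unit genome, and the helper names `local_tree` and `mrca` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """One parent-child link that holds over the half-open genome interval [left, right)."""
    left: int
    right: int
    parent: int
    child: int


# Hypothetical toy data: sample nodes 0-3, ancestral nodes 4-7, a 100-unit genome.
# A recombination at position 50 changes part of the genealogy, but the links from
# node 4 to samples 0 and 1 hold along the whole genome, so they are stored once.
EDGES = [
    Edge(0, 100, 4, 0), Edge(0, 100, 4, 1),    # shared by both local trees
    Edge(0, 50, 5, 2), Edge(0, 50, 5, 3),      # left tree only
    Edge(0, 50, 6, 4), Edge(0, 50, 6, 5),
    Edge(50, 100, 7, 4), Edge(50, 100, 7, 3),  # right tree only
    Edge(50, 100, 6, 7), Edge(50, 100, 6, 2),
]


def local_tree(edges, position):
    """Return the child -> parent map of the local tree covering `position`."""
    return {e.child: e.parent for e in edges if e.left <= position < e.right}


def mrca(parent_of, a, b):
    """Most recent common ancestor of nodes a and b within one local tree."""
    ancestors_of_a = {a}
    while a in parent_of:          # walk from a up to the root
        a = parent_of[a]
        ancestors_of_a.add(a)
    while b not in ancestors_of_a:  # walk from b up until the paths meet
        b = parent_of[b]            # raises KeyError if a and b share no ancestor
    return b


if __name__ == "__main__":
    left, right = local_tree(EDGES, 10), local_tree(EDGES, 70)
    # Samples 0 and 3 have different most recent common ancestors on each side
    # of the recombination point.
    print(mrca(left, 0, 3), mrca(right, 0, 3))  # 6 7
    # Edges stored in the tree sequence vs. links needed to store each tree separately.
    print(len(EDGES), len(left) + len(right))   # 10 12
```

In this tiny example the savings are modest (10 edges instead of 12 links), but neighboring trees along a real genome typically differ by only a few edges, so sharing them across many trees is what keeps the representation compact.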

In their proof-of-concept study, the team used their tools to analyze thousands of individual sequences from 215 human populations, including ancient samples from people who lived as long as 100,000 years ago, creating the largest human genealogy to date. Their algorithms inferred a network of almost 27 million ancestral branchings. When combined with location data, the results captured key events in human history: the migration out of Africa and the spread across Asia, Europe, and the Americas, as well as the addition of archaic DNA as some Homo sapiens groups interbred with Neanderthal and Denisovan populations.

In addition to confirming some of what we already know about humans’ genetic history, tree sequence analysis offers huge promise as a tool for inferring a far more detailed history of human migration, mixture, and adaptation — allowing our genomes to tell us not just who we are today, but how we got here.

Still Curious?

Read the full text of the Science article and the press release from Oxford’s Big Data Institute
