I’ve been working on a project to simulate pedigrees and calculate the amounts of DNA shared between pairs of computer-generated relatives in different relationships. Using the Ped-sim Pedigree Simulator,1 I’ve generated data for eight relationships so far, with 10,000 simulated pairs of relatives apiece.
The main page for Sim-cM has summaries of the results that I have generated, along with links to download the raw data in text files. There is also a Sim-cM White Paper, which provides an in-depth discussion of the methods and preliminary data analysis, including comparisons between the simulated data and real-life data collected by the Shared cM Project.
From the white paper:
In genetic genealogy, evaluating the possible relationships for a pair of people given the amount of DNA they share is a cornerstone of being able to use DNA evidence in support of genealogical conclusions. While for many relationships, expected amounts of shared DNA can be estimated,2 the Shared cM Project3 has been an incredibly useful resource for analysis of genetic match data. The Shared cM Project’s utility stems not only from its basis in actual observations from DNA test takers, but also from the fact that it shows distributions of amounts of shared DNA, providing reasonable ranges to expect for many relationships of interest.
The Shared cM Project results are limited, though, to what relationships DNA test takers have reported, and by the number of people who have reported. Many of the most common relationships in the Shared cM Project are fairly well represented, but many others have been reported much more rarely, if at all. Simulated data can help fill in the gaps in the observations of the Shared cM Project. Some simulated data have been previously generated and used in the community’s analytical tools,4 but these data have only been presented in a limited format, and they are also limited in the relationships included.
Sim-cM aims to supplement the data from the Shared cM Project by simulating shared DNA for various genealogical relationships (and having a pipeline in place to analyze additional relationships). The goals of Sim-cM are:
- to generate distributions of amount of identical by descent (IBD) DNA, number of IBD segments shared, and size of largest segments shared for well-studied, rarely reported, and previously unreported relationships; and
- to evaluate these simulated datasets, in conjunction with the empirical data (such as from the Shared cM Project), to better understand how to translate the amounts of DNA shared, as reported by DNA testing companies, into genealogical relationships.
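To make the three quantities in the first goal concrete, here is a minimal Python sketch of how total shared cM, segment count, and largest segment could be tallied for each simulated pair. It is only an illustration: the `(pair_id, length_cm)` record layout, the function name `summarize_pairs`, and the numbers are all hypothetical, and this is not the Sim-cM pipeline or the Ped-sim output format.

```python
from collections import defaultdict


def summarize_pairs(segments):
    """Summarize IBD sharing per pair.

    `segments` is an iterable of (pair_id, length_cm) records. This layout is
    a hypothetical simplification, not the actual simulator output format.
    """
    # Group segment lengths by the pair they belong to.
    by_pair = defaultdict(list)
    for pair_id, length_cm in segments:
        by_pair[pair_id].append(length_cm)

    # For each pair, compute the three statistics named in the goals.
    return {
        pair_id: {
            "total_cm": sum(lengths),      # amount of IBD DNA shared
            "n_segments": len(lengths),    # number of IBD segments shared
            "largest_cm": max(lengths),    # size of the largest segment
        }
        for pair_id, lengths in by_pair.items()
    }


# Made-up values for illustration only; these are not Sim-cM results.
example_segments = [
    ("pair1", 40.0), ("pair1", 25.5), ("pair1", 10.0),
    ("pair2", 60.0), ("pair2", 15.0),
]
summary = summarize_pairs(example_segments)
```

Collecting these per-pair summaries across all simulated pairs for a relationship would give the distributions described above. Note that in this sketch a pair sharing no segments never appears in the input, so it would need to be added separately with a total of zero.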
1. Madison Caballero, Daniel N. Seidman, Thomas D. Dyer, et al., “Surprising impacts of crossover interference and sex-specific genetic maps on identical by descent distributions,” BioRxiv, 22 January 2019; preprint article (https://doi.org/10.1101/527655 : accessed 24 June 2019). Files downloaded from williamslab/ped-sim, GitHub (https://github.com/williamslab/ped-sim : 19 June 2019).
2. “Autosomal DNA statistics,” International Society of Genetic Genealogy Wiki (https://isogg.org/wiki/Autosomal_DNA_statistics : accessed 19 June 2019). F. M. Lancaster, “Calculation of the Coefficient of Relationship” in Genetic and Quantitative Aspects of Genealogy; accessed via Internet Archive (https://web.archive.org/web/20161229015050/http:/www.genetic-genealogy.co.uk/Toc115570135.html : 19 June 2019).
3. Blaine Bettinger, “August 2017 Update to the Shared cM Project,” The Genetic Genealogist, 26 August 2017 (https://thegeneticgenealogist.com/2017/08/26/august-2017-update-to-the-shared-cm-project/ : accessed 19 June 2019).
4. Catherine A. Ball, Mathew J. Barber, Jake Byrnes, et al., “AncestryDNA Matching White Paper,” AncestryDNA, Ancestry, 31 March 2016 (https://www.ancestry.com/dna/resource/whitePaper/AncestryDNA-Matching-White-Paper.pdf : accessed 19 June 2019), 5.2 Method for estimating relationships. Leah Larkin, “Science the Heck out of Your DNA – Part 3: The DNA Painter Lookup Tool,” The DNA Geek, 11 January 2018 (https://thednageek.com/science-the-heck-out-of-your-dna-part-3/ : accessed 19 June 2019).