Inference of population structure from ancient DNA

Inference of population structure from ancient DNA

“Inference of population structure from ancient DNA” by Tyler A. Joseph and Itsik Pe’er. https://doi.org/10.1101/261131

I selected this article for review for a few reasons:

  • Inference of populaton structure is of fundamental interest
  • Abundance of ancient DNA samples have increased; the data is challenging and has complexities
  • I wanted to get an update on the state-of-the-art in the field in this space

This paper focuses on tackling the problem of inference of structure in ancient DNA. The reviewers selected are all population geneticists, knowledgable about the general issues and the specifics regarding technical complexities of ancient DNA. Overall, it looks like the reviewers appreciated the novelty, but wanted to see more simulations to tease out gain and advances in inference compared to the usual workhorses. A lot of technical questions and concerns pervade - hopefully helpful to the authors as they work through paper submission and evaluation.

I want to thank the authors for sharing their work on bioRxiv before it has undergone peer review. I also want to take an opportunity to thank the reviewers who have donated their time to evaluate the novelty, strengths, and limitations of this work.

One reviewers chose to remain anonymous, and two chose to be named. All three reviewers were faculty at academic institutions. All three reviews are available in full below.


Reviewer 1 (Torsten Günther)

The preprint by Joseph & Pe’er presents a new method Dystruct (Dynamic Structure), a framework for inferring population structure across time based on the approach of STRUCTURE introduced by Pritchard et al 18 years ago. STRUCTURE and derived approaches are one of the main workhorses in population genomic analysis. The original publication from 2000 has more than 20000 citations already - a number that does not even include citations of extended models and derived implementations of the same concept (e.g. ADMIXTURE). The general idea is that all individuals in an empirical dataset are modeled as mixtures of K latent populations or clusters where the genotype at each locus depends on the cluster of origin as well as the allele frequencies in that latent population.

These approaches are also widely used in the genomic analysis of ancient DNA (aDNA) data. Recent technological advances allow to study data sets of several hundreds of individuals sampled across time and space. Applying STRUCTURE-like approaches to these datasets does not fully fit the actual model since individuals are sampled across time, while the allele frequencies in the latent populations are considered constant. A nice illustration of this limitation are the assignments of samples which have lived more than 20000 years ago (the three Pleistocene hunter-gatherers in Fig 5b) which are usually shown as a mixture of many other populations while they in fact lived before some of these lineages have split.

Dystruct is an important and obvious extension to these popular approaches specifically aimed towards application in ancient DNA datasets. Dystruct allows the allele frequencies in the latent populations to drift over time providing a more realistic model when samples are from different points in time. This has the potential to become an important extension to the standard popgen toolkit used in ancient DNA studies most relying on relatively simple statistics of allele frequency correlations or methods not directly taking the temporal nature of the data into account. Below, I am listing some thoughts on the model behind Dystruct and the results of the preprint. I will not go deeper into the details of the model or the implementation.

One major point that I noticed was the constant effective population size across time and in all latent populations whereas differences in sampling time are explicitly part of the model. The limitation here is that the amount of drift between two time points depends on both the absolute time difference and the effective population size. We know that effective population size differs substantially between human populations (especially on a worldwide scale between e.g. African’s and non-Africans), so this assumption is easily violated. I understand that this is a step chosen to keep everything feasible. The authors claim that this does not affect their results substantially, but I would like to see that better presented in the paper so the readers and potential users can draw their own conclusions on the limitations of this approach.

Another potentially limiting assumption is that all K populations have existed since the beginning of time. As mentioned above, some of the individuals included in the “Real data” section of the preprint have lived before some of the lineages seen in modern Eurasians have diversified or who might represent evolutionary dead ends. Dystruct seems to force these populations into clusters explaining the differences between early and later populations only by drift. It would be interesting to see how dependent this is on the sampling since there is a risk of overfitting. More realistic simulations could be coalescent simulations based on a population split model and then one could also assess how different sample sizes at different time points affect the results.

The application of Dystruct to real data in this preprint highlights both the potential for additional insights as well as the potential limitations. It is great to see that the Pleistocene hunter-gatherers are not displayed as extremely admixed between several younger populations. According to the general opinion in aDNA literature, Steppe individuals of the Yamnaya culture are deriving from the admixture of two earlier hunter-gatherer populations. Only one of them is included in this analysis (in form of Samara_HG and Karelia_HG), but neither ADMIXTURE nor Dystruct can pick up this expected pattern. The authors point out that Kostenki14 is modeled as an ancestor of these steppe herders although populations related to this individual are sometimes assumed to be closer related to other European hunter-gatherer groups. These differences between the results of Dystruct and the previous literature are interesting and they illustrate how methodology specifically geared towards aDNA applications can help to uncover patterns people didn’t see before. On the other hand it also highlights that all methods need to be tested properly so we understand the potential artifacts and biases of both newly developed methods as well as the methodology originally used to draw these conclusions.

Finally, when I read about the differences in entropy between ADMIXTURE and Dystruct, I was wondering if Dystruct would be able to explain the same temporal data set at a lower value of K. The authors say that they do not have an approach to estimate an “optimal K” (and people trying to represent their data in a single K is a problem on its own) and I have no idea how to test this - but intuitively this could be one feature of Dystruct compared to previous approaches.


Reviewer 2 (Joshua Schraiber)

The manuscript by Joseph and Pe’er presents an exciting extension of the standard Pritchard-Stephens-Donnelly (PSD) model used in Structure and Admixture to explicitly model temporally separated samples, with a particular focus on ancient DNA. As is well known, the PSD model doesn’t behave well in a context where there’s substantial but unmodeled genetic drift, and can in fact result in some very strange inferences. One of those instances is demonstrated in this manuscript, where Admixture reports some ancient samples are mixtures of modern populations (which seems very strange), while Dystruct reports modern populations as mixtures of the ancient samples (which seems to fit our intuition and verbal models of the historical process better).

The authors make use of the well-known connection between the PSD model and latent Dirichlet allocation (LDA). In particular, they recognized that some of the machinery that was developed to model topic frequencies changing through time in an LDA context could be used to model allele frequencies changing through time in an evolutionary context. In particularly, they apply a neat augmentation trick, and are able to apply a Kalman filter.

The application to simulated data raises a few questions for me, however. First of all, there seemed to be some very interesting behavior with respect to the effective size assumed by Dystruct. If I understand correctly, all simulations were done with all effective sizes fixed at 2500. However, it seems that in many cases, assuming N = 1000 provided nearly identical and in some cases BETTER performance than using the true effective size. This is sort of interesting, and I’m guessing maybe be a result of the approximate variance that the authors use (more on this in a moment). Moreover, comparison of Fig 3 and Fig 4 reveal that there’s some interesting interaction between the age of the samples and the total time in the simulations, which could be interesting to explore a bit further, perhaps by breaking down 4a even further as a supplemental figure, showing the impact for each sampling time.

The real data analysis was fairly convincing and showed a very interesting result, as I already pointed out. I haven’t quite had the time to think about how the particulars of the inference interact with all the details of the history of Europe, but nothing seems overtly shocking. Nonetheless, Fig 4b remains very, very interesting and a powerful testament to the importance of modeling drift explicitly in this kind of analysis.

Overall, I very much enjoyed reading the manuscript, as it presents a clever solution to an important and timely problem. I have some additional minor comments:

1) The authors adapt the variational algorithm from fastStruture, but fail to mention it by name in the introduction, which I think is a bit of a shame. It also seems like it might be relevant to explicitly compare to fastStructure, to ensure that the differences aren’t due to difference between a (variational) Bayes approach and the maximum likelihood approach that Admixture uses

2) I’m assuming the main reason for the argument leading to equation 4 is that the Kalman filtering won’t work easily if the variance is a nonlinear function of the allele frequencies? It struck me as a very strange thing to do at first, given that the allele frequencies are inferred as part of the algorithm, so I think it’s worth explaining to the reader in a bit more detail why this approximation is necessary.

3) It might be interesting to see simulations of a more biologically realistic model, in which the populations are related by a tree/admixture graph.

4) Starting section 2.6 with a numbered citation is just kind of ugly, I would reword that to avoid such an unaesthetic sentence.

5) I appreciated the discussion of the effective population size as a regularization parameter. I wonder if that suggests that the only real way to tune it would be to use some kind of cross validation, as is often the case for regularization hyper parameters?

6) The authors should cite and maybe even compare results to Ohana (https://academic.oup.com/bioinformatics/article/33/14/2148/3002763), which solves a similar problem, but explicitly builds a population tree.

I prefer to sign my reviews. My name is Joshua Schraiber.


Reviewer 3 (Anonymous Reviewer)

This paper describes a modification of the commonly-used structure/ADMIXTURE (“PSD”) clustering algorithms that allows allele frequencies in the hypothetical source populations to drift over time. It’s a smart idea and seems to work well in certain scenarios. I also think it’s an important question to think about, given that there is still a great deal of confusion in the field about the interpretation of structure plots. So I think that this approach is definitely worth looking into, and is likely to be useful as a sort of robustness check to structure analysis. That said, I’m not sure that it really solves any of the major issues with structure, and the results on the real data are not (to me) obviously easier to interpret than the ADMXITURE results.

I think it’s an interesting method in a technical sense, and I don’t think it’s any worse than structure in terms of interpretation. But as it stands, I am not clear that you really learn anything new about population relationships compared to structure. I’d suggest running some simulations under more realistic demographic scenarios (i.e. run coalescent simulations of realistic demographic history, rather than simulating under the model), and seeing how the results compare with ADMXITURE. I’d also suggest looking at the real data for a much wider range of K, and trying understand get a better sense of what the differences are, and how they should be interpreted in the light of other analyses (e.g. what we know about the ancient populations based on PCA or f-statistics, or smc-based analysis).

1) The main issue with interpreting structure plots is that the latent populations do not exist. So as hard as people try to interpret them as ancestral populations, it simply isn’t true. See here, for example: https://www.biorxiv.org/content/early/2017/11/27/066431. This is because, even if sampled populations were discrete mixtures derived from ancestral populations, under any reasonable model of history, the ancestral components in each sampled population would have drifted relative to the “true” ancestral populations. This variation in drift is precisely what makes it difficult to interpret the output of structure. One component of that drift is temporal - i.e. we would expect that populations that are closer in time share drift to some extent. That specific component is exactly what this model incorporates, in contrast to structure.

However, another component of that drift just comes from the fact that the actual populations contributing to the sampled populations are themselves drifted. For example, see Figure 2 in the paper - in these simulations, the individuals that contribute to the admixed (or directly to the ancient populations) are sampled directly from the latent populations. But in a realistic model, they would be sampled from populations that were themselves drifted relative to the latent populations. Further, the drift leading to each sampled population would not be independent, or equal. It’s these differences, particularly differences in the extent of drift, that make the results difficult to interpret. For example see Figure 1 in Rosenberg et al 2002 (http://science.sciencemag.org/content/298/5602/2381.full) where the Kalash get their own cluster at K=6 just because they are highly drifted. I think this effect, even for ancient samples (maybe apart from very old samples), is much bigger than the temporal effect described above.

So basically, the reason it is hard to interpret structure analysis is that the PSD model is misspecified compared to reality. Although Dystruct clearly performs better than ADMIXTURE when simulating under the Dystruct model, and maybe the Dystruct model is slightly less misspecified, it’s not clear to me that the Dystruct results with a realistic demography, or on real data, would be any more interpretable.

The other major issues with structure are that it is very sensitive to the sampling scheme and that there’s no good way to choose K. Dystruct doesn’t solve those problems, but it doesn’t claim to and it’s certainly no worse than structure, so that’s not really a criticism.

2) The authors write that their results on the Haak et al. data are better, or at least more interpretable than the ADMIXTURE results, but it’s not clear to me that’s true. Also, I think you need to show the analysis for different K, rather than just picking K=11 because it’s most interpretable.

The authors say that in the Dystruct analysis, most ancient samples are “pure”. But this is only really true for the Yamnaya and the Upper Palaeolithic hunter-gatherers. In any case, I don’t think this is very meaningful. Most (all?) of the ancient samples are relatively recent and are, themselves, admixed. For example, the Yamnaya come out as their own cluster in the Dystruct analysis, and as admixed in the ADMIXTURE analysis. But in that case the ADMIXTURE analysis actually makes a lot more sense because the Yamnaya are known from independent analyses to be admixed (between a population that is related to Eastern European hunter-gatherers and one that is related to hunter-gatherers from the Caucasus mountains). For example, see page 73 of the Haak et al. supplementary information, or Figure 4 of Lazaridis et al 2016 (https://www.nature.com/articles/nature19310). Similarly, I don’t know how to interpret the observation that Yamnaya has exactly the same ancestry as Kostenki. Just to give one example, the D statistic D(Outgroup, LBK_EN, Kostenki14, Yamnaya), is significantly nonzero so it cannot be correct to interpret this as saying that Kostenki and Yamnaya are a clade to the exclusion of LBK.

Similarly, how should we interpret the fact that Ust-Ishim and MA1 get their own cluster in the Dystruct analysis? Ust-Ishim is pretty close to being ancestral (and symmetrically related) to all later non-Africans. The ADMXITURE analysis has it as a roughly equal mixture of all non-African ancestry components. I know that’s modelling the ancient populations as mixtures of modern populations, but at least it makes sense in that context. In the Dystruct analysis, it has its own cluster that seems to be sharing a lot of ancestry with some Asian populations? I don’t really know what to make of that, but it doesn’t seem like it’s accurately representing the modern populations as mixtures of the ancient ones.


I am a statistical and population geneticist interested in understanding the relationship between DNA sequence variation and complex human disease. To build this understanding, my lab constructs statistical and computational methods grounded in principles of population biology, applying them to genetic data collected across entire human genomes. My research has answered population genetic questions about recent demographic and selective events in human populations, and I have helped to map hundreds of risk loci common diseases, particularly type-2 diabetes and heart attack. I have also contributed to the growing field of causal inference studies via Mendelian Randomization.