Abstract
Allele-sharing statistics for a genetic locus measure the dissimilarity between two populations as a mean of the dissimilarity between random pairs of individuals, one from each population. Owing to within-population variation in genotype, allele-sharing dissimilarities can have the property that they have a nonzero value when computed between a population and itself. We consider the mathematical properties of allele-sharing dissimilarities in a pair of populations, treating the allele frequencies in the two populations parametrically. Examining two formulations of allele-sharing dissimilarity, we obtain the distributions of within-population and between-population dissimilarities for pairs of individuals. We then mathematically explore the scenarios in which, for certain allele-frequency distributions, the within-population dissimilarity – the mean dissimilarity between randomly chosen members of a population – can exceed the dissimilarity between two populations. Such scenarios assist in explaining observations in population-genetic data that members of a population can be empirically more genetically dissimilar from each other on average than they are from members of another population. For a population pair, however, the mathematical analysis finds that at least one of the two populations always possesses smaller within-population dissimilarity than the value of the between-population dissimilarity. We illustrate the mathematical results with an application to human population-genetic data.
1 Introduction
Statistics that measure the genetic dissimilarity between pairs of populations are widely used for interpreting population-genetic data (Bowcock et al. 1994; Chakraborty and Jin 1993; Gao and Martin 2009; Mountain and Cavalli-Sforza 1997; Mountain and Ramakrishnan 2005; Rosenberg 2011; Tal 2013; Witherspoon et al. 2007). Patterns in numerical values of the statistics appear in calculations of the relative similarity and dissimilarity of different human groups (Mountain and Ramakrishnan 2005; Rosenberg 2011; Witherspoon et al. 2007). Further, genetic dissimilarity statistics, often termed “genetic distances,” underlie frequently applied tools for data analysis and visualization, including methods such as evolutionary tree construction (Bowcock et al. 1994) and multidimensional scaling (Gao and Martin 2009).
Population-level genetic dissimilarity statistics computed at a single genetic locus often proceed by considering pairs of vectors, $\mathbf{p}$ and $\mathbf{q}$, representing the allele frequencies of two populations. Each vector consists of nonnegative entries that sum to 1. Hence, for a locus with $I$ distinct alleles, such a genetic dissimilarity statistic has domain $\Delta^{I-1} \times \Delta^{I-1}$, where $\Delta^{I-1}$ is the simplex $\Delta^{I-1} = \{(x_1, \ldots, x_I) : x_i \geq 0 \text{ for all } i, \ \sum_{i=1}^I x_i = 1\}$.
Among the many genetic dissimilarity statistics that are available (Jorde 1985; Nei 1987), those known as allele-sharing dissimilarities form a distinctive subset. Such statistics view a dissimilarity between two populations as the mean of a dissimilarity between pairs of individuals, one from one population and one from the other. With this perspective, they have a simple interpretation as a population-level generalization of an individual-level statistic. They also have a natural connection to a fundamental computation in human population genetics – the apportionment of genetic diversity among different levels of genetic structure (Edge et al. 2022; Lewontin 1972) – which can be viewed in terms of various mean pairwise dissimilarities across certain subsets of individuals (Rosenberg 2011).
Allele-sharing dissimilarities emerge from inter-individual computations among non-identical individuals. As a result, unlike most dissimilarity statistics, such as those based on the Euclidean distance between functions of allele frequency vectors (Cavalli-Sforza and Edwards 1967) or the dot product of these vectors (Nei 1972), they can produce nonzero values for the dissimilarity between a polymorphic population and itself. This feature assists in understanding a property of genetic variation in structured populations: the extent to which genetic dissimilarity of individuals from the same population ever exceeds genetic dissimilarity of individuals from different populations, if at all.
Because individuals in a population generally possess a larger number of recent shared ancestors than individuals from different populations, a perspective focused on population-genetic descent predicts that individuals from the same population will be genetically more similar than individuals from different populations. Indeed, in human population genetics, studies of allele-sharing dissimilarity find that the mean dissimilarity across pairs of individuals from different populations does exceed the mean dissimilarity for pairs from the same populations (Mountain and Ramakrishnan 2005; Rosenberg 2011; Tal 2013; Witherspoon et al. 2007). However, such studies also find a perhaps unexpected result that the allele-sharing dissimilarity for some pairs of individuals from the same population can exceed the dissimilarity for some pairs from different populations.
Here, we seek to explain the properties of allele-sharing dissimilarities within and between populations. We study mathematical properties of population-level allele-sharing dissimilarities under the assumption that individuals in a population represent random draws from the vector of allele frequencies in the population. We consider mean allele-sharing dissimilarities for pairs of individuals from the same population and for pairs of individuals from different populations, evaluating the conditions on allele-frequency vectors under which the allele-sharing dissimilarity for a population to itself can exceed the allele-sharing dissimilarity between two populations. We interpret the results in relation to ongoing efforts to understand human genetic similarity and difference.
2 Methods
2.1 Allele-sharing dissimilarities
An allele-sharing dissimilarity (ASD) is a type of dissimilarity that is based on counting the number of alleles shared at a locus between two diploid individuals. We consider two different versions of the ASD concept.
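In one ASD variant, which we denote by $D^1$, the dissimilarity between two diploid genotypes is 1 minus half the number of alleles that they share, so that $D^1$ takes values in $\{0, \frac{1}{2}, 1\}$.
Another variant of ASD, which we denote by $D^2$, is the fraction of the four pairings of one allele from each individual in which the two alleles differ, so that $D^2$ takes values in $\{0, \frac{1}{2}, \frac{3}{4}, 1\}$.
Table 1 shows all seven possible pairs of unordered diploid genotypes for two individuals and their corresponding dissimilarities measured by $D^1$ and $D^2$.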
Case | Genotypes | $D^1$ | $D^2$ |
---|---|---|---|
1 | AA, AA | 0 | 0 |
2 | AA, AB | $\frac{1}{2}$ | $\frac{1}{2}$ |
3 | AA, BB | 1 | 1 |
4 | AA, BC | 1 | 1 |
5 | AB, AB | 0 | $\frac{1}{2}$ |
6 | AB, AC | $\frac{1}{2}$ | $\frac{3}{4}$ |
7 | AB, CD | 1 | 1 |
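The seven cases can be reproduced with a short sketch. It assumes that the first variant equals 1 minus half the number of shared alleles (a multiset intersection) and that the second equals the fraction of mismatching pairings among the four cross-individual allele pairings; the function names `asd1` and `asd2` are illustrative and not from the paper.

```python
from collections import Counter
from itertools import product

def asd1(g1, g2):
    """Assumed first variant: 1 minus half the number of shared alleles."""
    shared = sum((Counter(g1) & Counter(g2)).values())  # multiset intersection
    return 1 - shared / 2

def asd2(g1, g2):
    """Assumed second variant: fraction of the 4 allele pairings that mismatch."""
    return sum(a != b for a, b in product(g1, g2)) / 4

# The seven unordered genotype pairs of Table 1.
cases = [("AA", "AA"), ("AA", "AB"), ("AA", "BB"), ("AA", "BC"),
         ("AB", "AB"), ("AB", "AC"), ("AB", "CD")]
table = [(g1, g2, asd1(g1, g2), asd2(g1, g2)) for g1, g2 in cases]
for row in table:
    print(*row)
```

The two variants differ only in cases 5 and 6, where both individuals are heterozygous and share at least one allele.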
2.2 Notation
Consider a locus with $I$ distinct alleles. We consider allele-frequency vectors in each of two populations. In Population 1, the allele frequencies are $\mathbf{p} = (p_1, p_2, \ldots, p_I)$, where $p_i$ represents the frequency of allele $i$. In Population 2, they are $\mathbf{q} = (q_1, q_2, \ldots, q_I)$. The frequencies satisfy $0 \leq p_i, q_i \leq 1$ for all $i$, and $\sum_{i=1}^I p_i = \sum_{i=1}^I q_i = 1$.
We are interested in mathematical properties of the distributions of the ASD measures $D^1$ and $D^2$ for random pairs of individuals.
We will have occasion to use various symmetric sums involving allele frequencies. For $t = 1, 2, 3, 4$, for expressions in the separate populations, we use the notation
$$\sigma_t = \sum_{i=1}^I p_i^t, \qquad \tau_t = \sum_{i=1}^I q_i^t,$$
where $\sigma_1 = \tau_1 = 1$.
For expressions involving both populations, we use
$$\rho_{tu} = \sum_{i=1}^I p_i^t q_i^u,$$
where $(t, u)$ is equal to (1,1), (1,2), (2,1), or (2,2). Note that each of these sums can be viewed as an inner product.
2.3 Assumptions
We seek to perform ASD computations under the assumption that individuals are sampled at random from allele-frequency distributions. With this perspective, for a random pair of individuals, an ASD measure is a random variable that depends on the allele-frequency vectors of two populations of interest, treated as parameters.
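At a given locus, we assume that the two alleles of an individual are sampled independently, so that diploid genotypes in a population are assumed to follow Hardy–Weinberg proportions. In other words, the probabilities of diploid genotypes in a population with allele-frequency vector $\mathbf{p}$ equal $p_i^2$ for the homozygous genotype with two copies of allele $i$ and $2 p_i p_j$ for the heterozygous genotype with alleles $i$ and $j$, $i \neq j$.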
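Under these assumptions, the exact distribution of an ASD measure for a random pair of individuals can be obtained by brute-force enumeration of genotype pairs. The sketch below is illustrative only: it assumes the two ASD variants are computed as 1 minus half the number of shared alleles and as the fraction of mismatching allele pairings, and all function names are hypothetical.

```python
from collections import Counter, defaultdict
from fractions import Fraction
from itertools import product

def asd1(g1, g2):
    """Assumed first variant: 1 minus half the number of shared alleles."""
    return 1 - sum((Counter(g1) & Counter(g2)).values()) / 2

def asd2(g1, g2):
    """Assumed second variant: fraction of the 4 allele pairings that mismatch."""
    return sum(a != b for a, b in product(g1, g2)) / 4

def genotype_probs(freqs):
    """Hardy-Weinberg: the ordered allele pair (i, j) has probability p_i * p_j."""
    indexed = list(enumerate(freqs))
    return {(i, j): pi * pj for (i, pi), (j, pj) in product(indexed, repeat=2)}

def asd_distribution(p, q, dissim):
    """Exact distribution of dissim for one individual from p and one from q."""
    dist = defaultdict(int)
    for (g1, pr1), (g2, pr2) in product(genotype_probs(p).items(),
                                        genotype_probs(q).items()):
        dist[dissim(g1, g2)] += pr1 * pr2
    return dict(dist)

def mean_of(dist):
    """Mean of a distribution given as {value: probability}."""
    return sum(value * prob for value, prob in dist.items())

half = [Fraction(1, 2), Fraction(1, 2)]      # biallelic locus, p_1 = 1/2
d1_within = asd_distribution(half, half, asd1)
d2_within = asd_distribution(half, half, asd2)
print(d1_within, mean_of(d1_within))
print(d2_within, mean_of(d2_within))
```

Passing the same vector twice gives a within-population distribution; passing two different vectors gives a between-population distribution. Using `Fraction` inputs keeps the probabilities exact.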
3 Distribution of $D_w$
We first compute allele-sharing dissimilarities between random pairs of individuals sampled from the same population, evaluating the properties of random variables $D^1_w$ and $D^2_w$.
3.1 Distribution of $D^1_w$
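Case | Genotypes | Probability | Simplified probability |
---|---|---|---|
1 | AA, AA | $\sum_i p_i^4$ | $\sigma_4$ |
2 | AA, AB | $4\sum_i \sum_{j \neq i} p_i^3 p_j$ | $4\sigma_3 - 4\sigma_4$ |
3 | AA, BB | $\sum_i \sum_{j \neq i} p_i^2 p_j^2$ | $\sigma_2^2 - \sigma_4$ |
4 | AA, BC | $2\sum_i \sum_{j \neq i} \sum_{k \neq i,j} p_i^2 p_j p_k$ | $2\sigma_2 - 2\sigma_2^2 - 4\sigma_3 + 4\sigma_4$ |
5 | AB, AB | $2\sum_i \sum_{j \neq i} p_i^2 p_j^2$ | $2\sigma_2^2 - 2\sigma_4$ |
6 | AB, AC | $4\sum_i \sum_{j \neq i} \sum_{k \neq i,j} p_i^2 p_j p_k$ | $4\sigma_2 - 4\sigma_2^2 - 8\sigma_3 + 8\sigma_4$ |
7 | AB, CD | $\sum_{i,j,k,l \text{ distinct}} p_i p_j p_k p_l$ | $1 - 6\sigma_2 + 3\sigma_2^2 + 8\sigma_3 - 6\sigma_4$ |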
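With the probabilities of all genotype combinations obtained, we can sum across genotype combinations to compute probabilities for $D^1_w$ (Table 3).
Value of the dissimilarity (d) | $P(D^1_w = d)$ |
---|---|
0 | $2\sigma_2^2 - \sigma_4$ |
$\frac{1}{2}$ | $4\sigma_2 - 4\sigma_2^2 - 4\sigma_3 + 4\sigma_4$ |
1 | $1 - 4\sigma_2 + 2\sigma_2^2 + 4\sigma_3 - 3\sigma_4$ |
Using the probabilities in Table 3, the result is
$$E[D^1_w] = 1 - 2\sigma_2 + 2\sigma_3 - \sigma_4. \tag{3}$$
In the $I = 2$ case, using $p_2 = 1 - p_1$ so that $\sigma_t = p_1^t + (1 - p_1)^t$, Eq. (3) becomes
$$E[D^1_w] = 2p_1(1 - p_1) - 2p_1^2(1 - p_1)^2. \tag{4}$$
Figure 1A plots Eq. (4) as a function of $p_1$. In the figure, we can observe that the mean value of the dissimilarity increases from a value of 0 at $p_1 = 0$, when the population is monomorphic, to a peak of $\frac{3}{8}$ at $p_1 = \frac{1}{2}$.
The second moment, also from Table 3, is
$$E[(D^1_w)^2] = 1 - 3\sigma_2 + \sigma_2^2 + 3\sigma_3 - 2\sigma_4. \tag{5}$$
The variance can then be obtained from Eqs. (3) and (5) by
$$\mathrm{Var}[D^1_w] = E[(D^1_w)^2] - E[D^1_w]^2. \tag{6}$$
For the $I = 2$ case, we once again use that $p_2 = 1 - p_1$:
$$E[(D^1_w)^2] = p_1(1 - p_1), \tag{7}$$
$$\mathrm{Var}[D^1_w] = p_1(1 - p_1) - 4p_1^2(1 - p_1)^2[1 - p_1(1 - p_1)]^2. \tag{8}$$
Figure 1B plots Eq. (8) as a function of $p_1$. Like the mean, the variance of the dissimilarity increases from 0 at $p_1 = 0$ to a peak at $p_1 = \frac{1}{2}$, where it equals $\frac{7}{64}$.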
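3.2 Distribution of $D^2_w$
We compute the distribution of random variable $D^2_w$ by summing the genotype-pair probabilities of Table 2 according to the values of $D^2$ in Table 1 (Table 4),
Value of the dissimilarity (d) | $P(D^2_w = d)$ |
---|---|
0 | $\sigma_4$ |
$\frac{1}{2}$ | $2\sigma_2^2 + 4\sigma_3 - 6\sigma_4$ |
$\frac{3}{4}$ | $4\sigma_2 - 4\sigma_2^2 - 8\sigma_3 + 8\sigma_4$ |
1 | $1 - 4\sigma_2 + 2\sigma_2^2 + 4\sigma_3 - 3\sigma_4$ |
yielding the result
$$E[D^2_w] = 1 - \sigma_2. \tag{9}$$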
Note that Eq. (9) gives the “expected heterozygosity,” the probability that two draws from the allele-frequency distribution produce distinct alleles.
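As a small numerical illustration of this interpretation, the sketch below compares the closed form $1 - \sum_i p_i^2$ with a direct sum over ordered pairs of distinct alleles. The function names and the example frequency vector are hypothetical.

```python
def expected_heterozygosity(p):
    """Closed form: one minus the sum of squared allele frequencies."""
    return 1 - sum(x * x for x in p)

def prob_distinct_draws(p):
    """Direct sum over ordered pairs (i, j) of distinct alleles."""
    return sum(pi * pj
               for i, pi in enumerate(p)
               for j, pj in enumerate(p)
               if i != j)

freqs = [0.5, 0.3, 0.2]  # illustrative allele frequencies, not from the data
print(expected_heterozygosity(freqs), prob_distinct_draws(freqs))
```

Both functions return the same value because the probabilities of drawing the same allele and of drawing distinct alleles sum to 1.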
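For the $I = 2$ case, we have
$$E[D^2_w] = 2p_1(1 - p_1). \tag{10}$$
Figure 1A plots Eq. (10) as a function of $p_1$. The mean value of the dissimilarity is symmetric around a peak at $p_1 = \frac{1}{2}$, where it equals $\frac{1}{2}$. The second moment, from Table 4, is
$$E[(D^2_w)^2] = 1 - \tfrac{7}{4}\sigma_2 + \tfrac{1}{4}\sigma_2^2 + \tfrac{1}{2}\sigma_3. \tag{11}$$
Therefore,
$$\mathrm{Var}[D^2_w] = E[(D^2_w)^2] - E[D^2_w]^2 = \tfrac{1}{4}\sigma_2 - \tfrac{3}{4}\sigma_2^2 + \tfrac{1}{2}\sigma_3. \tag{12}$$
For the $I = 2$ case, we use $p_2 = 1 - p_1$ to obtain
$$E[(D^2_w)^2] = p_1(1 - p_1)[1 + p_1(1 - p_1)], \tag{13}$$
$$\mathrm{Var}[D^2_w] = p_1(1 - p_1)[1 - 3p_1(1 - p_1)]. \tag{14}$$
Figure 1B plots Eq. (14). The variance has peaks at $p_1 = \frac{1}{2} \pm \frac{\sqrt{3}}{6}$, where it equals $\frac{1}{12}$.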
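3.3 Comparison of $D^1_w$ and $D^2_w$
Comparing $E[D^1_w]$ and $E[D^2_w]$, we find
$$E[D^1_w] \leq E[D^2_w]. \tag{15}$$
The result follows by noting $E[D^2_w] - E[D^1_w] = \sigma_2 - 2\sigma_3 + \sigma_4 = \sum_{i=1}^I p_i^2 (1 - p_i)^2 \geq 0$.
For $I = 2$, Eq. (15) can be observed in Figure 1A, as it can be seen that the curve for $E[D^1_w]$ never lies above the curve for $E[D^2_w]$.
For the variances, Figure 1B finds that for $I = 2$, neither $\mathrm{Var}[D^1_w]$ nor $\mathrm{Var}[D^2_w]$ is uniformly larger across values of $p_1$.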
4 Distribution of $D_b$
We now examine allele-sharing dissimilarities between pairs of individuals from different populations. Let $\mathbf{p}$ be the allele frequency vector for the population from which the first individual is sampled, and let $\mathbf{q}$ be the corresponding vector for the population of the second individual; the special case of $\mathbf{q} = \mathbf{p}$ follows Section 3. We evaluate the properties of the random variables $D^1_b$ and $D^2_b$.
4.1 Distribution of $D^1_b$
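Case | Genotypes | Probability | Simplified probability |
---|---|---|---|
1 | AA, AA | $\sum_i p_i^2 q_i^2$ | $\rho_{22}$ |
2 | AA, AB | $2\sum_i \sum_{j \neq i} (p_i^2 q_i q_j + p_i p_j q_i^2)$ | $2\rho_{21} + 2\rho_{12} - 4\rho_{22}$ |
3 | AA, BB | $\sum_i \sum_{j \neq i} p_i^2 q_j^2$ | $\sigma_2 \tau_2 - \rho_{22}$ |
4 | AA, BC | $\sum_i \sum_{j \neq i} \sum_{k \neq i,j} (p_i^2 q_j q_k + p_j p_k q_i^2)$ | $\sigma_2 + \tau_2 - 2\sigma_2 \tau_2 - 2\rho_{21} - 2\rho_{12} + 4\rho_{22}$ |
5 | AB, AB | $2\sum_i \sum_{j \neq i} p_i p_j q_i q_j$ | $2\rho_{11}^2 - 2\rho_{22}$ |
6 | AB, AC | $4\sum_i \sum_{j \neq i} \sum_{k \neq i,j} p_i p_j q_i q_k$ | $4\rho_{11} - 4\rho_{11}^2 - 4\rho_{21} - 4\rho_{12} + 8\rho_{22}$ |
7 | AB, CD | $\sum_{i,j,k,l \text{ distinct}} p_i p_j q_k q_l$ | $1 - \sigma_2 - \tau_2 + \sigma_2 \tau_2 - 4\rho_{11} + 2\rho_{11}^2 + 4\rho_{21} + 4\rho_{12} - 6\rho_{22}$ |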
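We sum across genotype combinations to obtain probabilities for $D^1_b$ (Table 6).
Value of the dissimilarity (d) | $P(D^1_b = d)$ |
---|---|
0 | $2\rho_{11}^2 - \rho_{22}$ |
$\frac{1}{2}$ | $4\rho_{11} - 4\rho_{11}^2 - 2\rho_{12} - 2\rho_{21} + 4\rho_{22}$ |
1 | $1 - 4\rho_{11} + 2\rho_{11}^2 + 2\rho_{12} + 2\rho_{21} - 3\rho_{22}$ |
Using the values in Table 6, we obtain
$$E[D^1_b] = 1 - 2\rho_{11} + \rho_{12} + \rho_{21} - \rho_{22}. \tag{16}$$
For the $I = 2$ case, with $p_2 = 1 - p_1$ and $q_2 = 1 - q_1$, Eq. (16) simplifies to
$$E[D^1_b] = p_1 + q_1 - 4p_1 q_1 + 2p_1^2 q_1 + 2p_1 q_1^2 - 2p_1^2 q_1^2. \tag{17}$$
Figure 3A plots Eq. (17). The surface has maxima of 1 at $(p_1, q_1) = (1, 0)$ and $(0, 1)$, when the two populations have the greatest difference in allele frequency, and equals 0 at $(0, 0)$ and $(1, 1)$. It has a saddle surface with a value of $\frac{3}{8}$ at $(p_1, q_1) = (\frac{1}{2}, \frac{1}{2})$.
Using Table 6 for the second moment together with $\mathrm{Var}[D^1_b] = E[(D^1_b)^2] - E[D^1_b]^2$, we obtain
$$E[(D^1_b)^2] = 1 - 3\rho_{11} + \rho_{11}^2 + \tfrac{3}{2}(\rho_{12} + \rho_{21}) - 2\rho_{22}, \tag{18}$$
$$\mathrm{Var}[D^1_b] = 1 - 3\rho_{11} + \rho_{11}^2 + \tfrac{3}{2}(\rho_{12} + \rho_{21}) - 2\rho_{22} - (1 - 2\rho_{11} + \rho_{12} + \rho_{21} - \rho_{22})^2. \tag{19}$$
For the $I = 2$ case, we have $p_1 = 1 - p_2$ and $q_1 = 1 - q_2$. Equations (18) and (19) simplify to
$$E[(D^1_b)^2] = \tfrac{1}{2}[p_1(1 - p_1) + q_1(1 - q_1)] + (p_1 - q_1)^2, \tag{20}$$
$$\mathrm{Var}[D^1_b] = \tfrac{1}{2}[p_1(1 - p_1) + q_1(1 - q_1)] + (p_1 - q_1)^2 - E[D^1_b]^2, \tag{21}$$
with $E[D^1_b]$ as in Eq. (17).
Figure 3D shows that the variance has higher values away from the four corners (0,0), (1,0), (0,1), and (1,1) for $(p_1, q_1)$, equaling 0 in each of these corners.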
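4.2 Distribution of $D^2_b$
Value of the dissimilarity (d) | $P(D^2_b = d)$ |
---|---|
0 | $\rho_{22}$ |
$\frac{1}{2}$ | $2\rho_{11}^2 + 2\rho_{12} + 2\rho_{21} - 6\rho_{22}$ |
$\frac{3}{4}$ | $4\rho_{11} - 4\rho_{11}^2 - 4\rho_{12} - 4\rho_{21} + 8\rho_{22}$ |
1 | $1 - 4\rho_{11} + 2\rho_{11}^2 + 2\rho_{12} + 2\rho_{21} - 3\rho_{22}$ |
We obtain
$$E[D^2_b] = 1 - \rho_{11}. \tag{22}$$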
This quantity is the between-population analogue of expected heterozygosity, the probability that two random draws, one from the allele-frequency distribution of a locus in one population and one from the corresponding distribution in a second population, represent distinct alleles.
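The interpretation as a mismatch probability for one draw from each population can be sketched as follows. The code assumes the quantity equals $1 - \sum_i p_i q_i$ and, for a biallelic locus, checks it against the probability $p_1(1 - q_1) + (1 - p_1)q_1$ that the two draws differ; function names and example frequencies are hypothetical.

```python
def between_pop_mismatch(p, q):
    """1 - sum_i p_i q_i: probability that a draw from p and a draw from q differ."""
    return 1 - sum(pi * qi for pi, qi in zip(p, q))

def biallelic_mismatch(p1, q1):
    """Same probability for I = 2, written in terms of the frequencies of allele 1."""
    return p1 * (1 - q1) + (1 - p1) * q1

p = [0.4, 0.6]
q = [0.3, 0.7]
print(between_pop_mismatch(p, q), biallelic_mismatch(p[0], q[0]))
```

When the two populations are fixed for different alleles the value is 1, and when they are fixed for the same allele it is 0.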
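For the $I = 2$ case, Eq. (22) simplifies to
$$E[D^2_b] = p_1 + q_1 - 2p_1 q_1. \tag{23}$$
Figure 3B plots Eq. (23). The surface has maxima of 1 at $(p_1, q_1) = (1, 0)$ and $(0, 1)$ and equals 0 at $(0, 0)$ and $(1, 1)$. It has a saddle surface with a value of $\frac{1}{2}$ at $(p_1, q_1) = (\frac{1}{2}, \frac{1}{2})$. The second moment, from Table 7, is
$$E[(D^2_b)^2] = 1 - \tfrac{7}{4}\rho_{11} + \tfrac{1}{4}\rho_{11}^2 + \tfrac{1}{4}(\rho_{12} + \rho_{21}). \tag{24}$$
Therefore, by $\mathrm{Var}[D^2_b] = E[(D^2_b)^2] - E[D^2_b]^2$,
$$\mathrm{Var}[D^2_b] = \tfrac{1}{4}\rho_{11} - \tfrac{3}{4}\rho_{11}^2 + \tfrac{1}{4}(\rho_{12} + \rho_{21}). \tag{25}$$
For the $I = 2$ case, Eqs. (24) and (25) simplify to
$$E[(D^2_b)^2] = \tfrac{1}{2}(p_1 + q_1 + p_1^2 + q_1^2 - 2p_1 q_1 - 2p_1^2 q_1 - 2p_1 q_1^2 + 2p_1^2 q_1^2), \tag{26}$$
$$\mathrm{Var}[D^2_b] = \tfrac{1}{2}[p_1(1 - p_1) + q_1(1 - q_1)] - 3p_1 q_1 (1 - p_1)(1 - q_1). \tag{27}$$
Figure 3E plots Eq. (27). The variance is greatest at $(p_1, q_1) = (\frac{1}{2}, 0)$, $(\frac{1}{2}, 1)$, $(0, \frac{1}{2})$, and $(1, \frac{1}{2})$, where it equals $\frac{1}{8}$.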
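4.3 Comparison of $D^1_b$ and $D^2_b$
The two measures for the between-population dissimilarity satisfy the same inequality as the within-population measures,
$$E[D^1_b] \leq E[D^2_b]. \tag{28}$$
Note that $E[D^2_b] - E[D^1_b] = \rho_{11} - \rho_{12} - \rho_{21} + \rho_{22} = \sum_{i=1}^I p_i q_i (1 - p_i)(1 - q_i) \geq 0$.
The inequality in Eq. (28) can be observed for the $I = 2$ case in Figure 3C, where the surface plot of the difference between the two expectations does not change sign.
Figure 3F compares the variances of $D^1_b$ and $D^2_b$.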
5 The relative magnitudes of $E[D_w]$ and $E[D_b]$
We now examine the relative magnitudes of the expectations $E[D_w]$ and $E[D_b]$.
5.1 Inequality relationship between $E[D^1_w(\mathbf{p})]$ and $E[D^1_b(\mathbf{p}, \mathbf{q})]$
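For arbitrary $I$, using Eqs. (3) and (16), the expression $E[D^1_w(\mathbf{p})] > E[D^1_b(\mathbf{p}, \mathbf{q})]$ is equivalent to
$$2\sigma_2 - 2\sigma_3 + \sigma_4 < 2\rho_{11} - \rho_{12} - \rho_{21} + \rho_{22}. \tag{29}$$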
This condition can be written with vector notation. Let
Equation (29) thus becomes
which simplifies to
For I = 2, we can further simplify this condition on p 1 and q 1, noting p 2 = 1 − p 1 and q 2 = 1 − q 1.
Theorem 1
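Consider a locus with $I = 2$ distinct alleles. For individuals sampled from two populations with allele frequency vectors $\mathbf{p} = (p_1, 1 - p_1)$ and $\mathbf{q} = (q_1, 1 - q_1)$, $E[D^1_w(\mathbf{p})] > E[D^1_b(\mathbf{p}, \mathbf{q})]$ if and only if one of the following holds:
(i) $0 < p_1 < x^*$ and $0 \leq q_1 < p_1$;
(ii) $x^* \leq p_1 < \frac{1}{2}$ and $g(p_1) < q_1 < p_1$;
(iii) $\frac{1}{2} < p_1 \leq 1 - x^*$ and $p_1 < q_1 < g(p_1)$;
(iv) $1 - x^* < p_1 < 1$ and $p_1 < q_1 \leq 1$,
where
$$g(x) = \frac{2x^3 - 4x^2 + 4x - 1}{2x(1 - x)}$$
and
$x^* \approx 0.3522$ is the unique real root of $2x^3 - 4x^2 + 4x - 1$.
Proof
We simplify Eq. (29) noting $p_2 = 1 - p_1$ and $q_2 = 1 - q_1$. To find the region where $E[D^1_w(\mathbf{p})] > E[D^1_b(\mathbf{p}, \mathbf{q})]$, we solve
$$-2p_1(1 - p_1)q_1^2 - (1 - 4p_1 + 2p_1^2)q_1 + p_1 - 4p_1^2 + 4p_1^3 - 2p_1^4 > 0 \tag{33}$$
with $0 \leq p_1 \leq 1$ and $0 \leq q_1 \leq 1$. Solving for $q_1$ in terms of $p_1$, we find that the expression in Eq. (33) is 0 at $q_1 = p_1$ and at $q_1 = g(p_1)$, and for fixed $p_1$ with $0 < p_1 < 1$, it is positive when $q_1$ lies strictly between the two roots. The unique real root for $g(x) = x$ is at $x = \frac{1}{2}$.
For $0 < p_1 < \frac{1}{2}$, $g(p_1) < p_1$, so that the expression is positive for $\max(0, g(p_1)) < q_1 < p_1$ if $g(p_1) \geq 0$ and for $0 \leq q_1 < p_1$ if $g(p_1) < 0$; because $g$ is increasing on $(0, 1)$, $g(p_1) < 0$ if and only if $p_1 < x^*$.
For $\frac{1}{2} < p_1 < 1$, $g(p_1) > p_1$, and the symmetric argument applies with $g(p_1) > 1$ if and only if $p_1 > 1 - x^*$. At $p_1 = 0$, $p_1 = \frac{1}{2}$, and $p_1 = 1$, the expression is never positive. □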
Figure 4A plots the region identified in Theorem 1. That a nonempty region exists indicates that sometimes, allele frequencies for a biallelic locus produce a within-population dissimilarity that exceeds the between-population dissimilarity. Note that because the choice of which allele is labeled 1 and which is labeled 2 is arbitrary, (p 1, q 1) is included in the region if and only if (1 − p 1, 1 − q 1) is also included.
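The shape of this region can be explored numerically. The sketch below is not the paper's code: it assumes that for $I = 2$ the first dissimilarity equals $|X - Y|/2$, where $X$ and $Y$ are the numbers of copies of allele 1 carried by the two individuals, and it uses a boundary function `g` obtained by our own algebra from that assumption. It locates the root of $2x^3 - 4x^2 + 4x - 1$ by bisection.

```python
from itertools import product

def binom2(p):
    """P(X = 0, 1, 2) for X ~ Binomial(2, p), the copies of allele 1 in a genotype."""
    return [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]

def mean_d1(p1, q1):
    """E[D1] for I = 2 under the assumption D1 = |X - Y| / 2."""
    return sum(px * qy * abs(x - y) / 2
               for (x, px), (y, qy) in product(enumerate(binom2(p1)),
                                               enumerate(binom2(q1))))

def cubic(x):
    return 2 * x**3 - 4 * x**2 + 4 * x - 1

def g(x):
    """Assumed second root in q1 of E[D1w(p)] = E[D1b(p, q)] (our derivation)."""
    return cubic(x) / (2 * x * (1 - x))

def bisect_root(f, lo, hi, tol=1e-12):
    """Bisection for a sign change of f on [lo, hi]."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

root = bisect_root(cubic, 0.0, 1.0)
print(root)
# Within-population mean at p1 = 0.4 versus between-population means.
print(mean_d1(0.4, 0.4), mean_d1(0.4, 0.3), mean_d1(0.4, 0.1))
```

With $p_1 = 0.4$, a second population at $q_1 = 0.3$ falls inside the region (within exceeds between), whereas $q_1 = 0.1$ falls outside it.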
We can calculate the area of the region in the unit square representing the probability
To evaluate
Figure 5A plots the resulting probability. We can observe that for I = 2, the simulated
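5.2 Inequality relationship between $E[D^2_w(\mathbf{p})]$ and $E[D^2_b(\mathbf{p}, \mathbf{q})]$
For arbitrary $I$, via Eqs. (9) and (22), the expression $E[D^2_w(\mathbf{p})] > E[D^2_b(\mathbf{p}, \mathbf{q})]$ is equivalent to
$$\sigma_2 < \rho_{11}. \tag{35}$$
With $\sigma_2 = \mathbf{p}\mathbf{p}^T$ and $\rho_{11} = \mathbf{p}\mathbf{q}^T$, Eq. (35) thus becomes
$$\mathbf{p}\mathbf{p}^T < \mathbf{p}\mathbf{q}^T. \tag{36}$$
For $I = 2$, Eq. (36) can be simplified to a condition on $p_1$ and $q_1$, again noting $p_2 = 1 - p_1$ and $q_2 = 1 - q_1$.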
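Theorem 2
Consider a locus with $I = 2$ distinct alleles. For individuals sampled from two populations with allele frequency vectors $\mathbf{p} = (p_1, 1 - p_1)$ and $\mathbf{q} = (q_1, 1 - q_1)$, $E[D^2_w(\mathbf{p})] > E[D^2_b(\mathbf{p}, \mathbf{q})]$ if and only if (i) $0 < p_1 < \frac{1}{2}$ and $0 \leq q_1 < p_1$, or (ii) $\frac{1}{2} < p_1 < 1$ and $p_1 < q_1 \leq 1$.
Proof
With $p_2 = 1 - p_1$ and $q_2 = 1 - q_1$, Eq. (35) simplifies to
$$(2p_1 - 1)(p_1 - q_1) < 0.$$
Solving this inequality, we arrive at the result. □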
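A rough numerical check of this biallelic condition is sketched below. It assumes the $I = 2$ expectation of the second dissimilarity is the mismatch probability $p_1(1 - q_1) + (1 - p_1)q_1$ and estimates the area of the region on a midpoint grid; the grid size is arbitrary and the names are hypothetical.

```python
def mean_d2(p1, q1):
    """Assumed E[D2] for I = 2: probability that one draw from each population differs."""
    return p1 * (1 - q1) + (1 - p1) * q1

def within_exceeds_between(p1, q1):
    """True when Population 1's within value exceeds the between value."""
    return mean_d2(p1, p1) > mean_d2(p1, q1)

def grid_area(n=400):
    """Fraction of midpoint-grid cells in the unit square satisfying the inequality."""
    hits = sum(within_exceeds_between((i + 0.5) / n, (j + 0.5) / n)
               for i in range(n) for j in range(n))
    return hits / n**2

area = grid_area()
print(area)
```

The estimate should be close to the combined area of the two right triangles between the diagonal $q_1 = p_1$ and the edges $q_1 = 0$ (for $p_1 < \frac{1}{2}$) and $q_1 = 1$ (for $p_1 > \frac{1}{2}$).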
Figure 4B plots the region identified in Theorem 2. This region describes the locations in which allele frequencies for a biallelic locus produce a within-population dissimilarity that exceeds the between-population dissimilarity. As is true for
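The area of the region in the unit square, representing the probability that $E[D^2_w(\mathbf{p})] > E[D^2_b(\mathbf{p}, \mathbf{q})]$ when $(p_1, q_1)$ is uniformly distributed on the unit square, is $\frac{1}{4}$: the region consists of two right triangles, each of area $\frac{1}{8}$.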
We evaluate
Figure 5B plots the resulting probability, illustrating the agreement between the simulated
5.3 Comparison of the $E[D_w] - E[D_b]$ inequalities for $D^1$ and $D^2$
The inequality
In Figure 5, we also observe that the probabilities
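6 The relative magnitudes of $\overline{E[D_w]}$ and $E[D_b]$
We have seen that both for $D^1$ and for $D^2$, the within-population dissimilarity of one population can exceed the between-population dissimilarity of a population pair. We now compare the between-population dissimilarity with the mean of the two within-population dissimilarities.
For a pair of populations with allele frequency vectors $\mathbf{p}$ and $\mathbf{q}$, let
$$\overline{E[D_w]}(\mathbf{p}, \mathbf{q}) = \tfrac{1}{2}\left(E[D_w(\mathbf{p})] + E[D_w(\mathbf{q})]\right)$$
denote the mean of the two within-population dissimilarities.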
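6.1 Inequality relationship between $\overline{E[D^1_w]}(\mathbf{p}, \mathbf{q})$ and $E[D^1_b(\mathbf{p}, \mathbf{q})]$
Theorem 3
Consider a locus with $I$ distinct alleles. For two populations with allele frequency vectors $\mathbf{p}$ and $\mathbf{q}$, $\overline{E[D^1_w]}(\mathbf{p}, \mathbf{q}) \leq E[D^1_b(\mathbf{p}, \mathbf{q})]$, with equality if and only if $\mathbf{p} = \mathbf{q}$.
Proof
We use Eqs. (3) and (16) to rewrite $E[D^1_b(\mathbf{p}, \mathbf{q})] - \overline{E[D^1_w]}(\mathbf{p}, \mathbf{q})$ as
$$(\sigma_2 + \tau_2 - 2\rho_{11}) - (\sigma_3 + \tau_3 - \rho_{12} - \rho_{21}) + \tfrac{1}{2}(\sigma_4 + \tau_4 - 2\rho_{22}).$$
Rewriting in terms of the entries of the vectors $\mathbf{p}$, $\mathbf{q}$, this quantity equals
$$\sum_{i=1}^I (p_i - q_i)^2 \left[1 - (p_i + q_i) + \tfrac{1}{2}(p_i + q_i)^2\right] = \tfrac{1}{2}\sum_{i=1}^I (p_i - q_i)^2 \left[1 + (1 - p_i - q_i)^2\right] \geq 0.$$
Equality is reached in the last step if and only if $\mathbf{p} = \mathbf{q}$. □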
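6.2 Inequality relationship between $\overline{E[D^2_w(\mathbf{p}, \mathbf{q})]}$ and $E[D^2_b(\mathbf{p}, \mathbf{q})]$
Theorem 4
Consider a locus with $I$ distinct alleles. For two populations with allele frequency vectors $\mathbf{p}$ and $\mathbf{q}$, $\overline{E[D^2_w(\mathbf{p}, \mathbf{q})]} \leq E[D^2_b(\mathbf{p}, \mathbf{q})]$, with equality if and only if $\mathbf{p} = \mathbf{q}$.
Proof
We rewrite $E[D^2_b(\mathbf{p}, \mathbf{q})] - \overline{E[D^2_w(\mathbf{p}, \mathbf{q})]}$ using Eqs. (9) and (22) as $\tfrac{1}{2}(\sigma_2 + \tau_2 - 2\rho_{11})$.
In terms of the vectors $\mathbf{p}$ and $\mathbf{q}$, we have
$$\tfrac{1}{2}(\mathbf{p} - \mathbf{q})(\mathbf{p} - \mathbf{q})^T = \tfrac{1}{2}\sum_{i=1}^I (p_i - q_i)^2 \geq 0,$$
with equality if and only if $\mathbf{p} = \mathbf{q}$. □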
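The two inequalities of this section can be spot-checked numerically. The sketch below is an illustration, not a proof: it assumes the two ASD variants are computed as 1 minus half the number of shared alleles and as the fraction of mismatching allele pairings, draws random frequency vectors by normalizing exponential variates (a hypothetical choice), and computes exact means by enumerating allele quadruples under Hardy–Weinberg proportions.

```python
import random
from collections import Counter
from itertools import product

def asd1(g1, g2):
    """Assumed first variant: 1 minus half the number of shared alleles."""
    return 1 - sum((Counter(g1) & Counter(g2)).values()) / 2

def asd2(g1, g2):
    """Assumed second variant: fraction of the 4 allele pairings that mismatch."""
    return sum(a != b for a, b in product(g1, g2)) / 4

def mean_asd(p, q, dissim):
    """Exact E[D]: alleles (a, b) drawn from p and (c, d) drawn from q."""
    alleles = range(len(p))
    return sum(p[a] * p[b] * q[c] * q[d] * dissim((a, b), (c, d))
               for a, b, c, d in product(alleles, repeat=4))

def random_freqs(num_alleles, rng):
    """Random point on the simplex via normalized exponential variates."""
    weights = [rng.expovariate(1.0) for _ in range(num_alleles)]
    total = sum(weights)
    return [w / total for w in weights]

rng = random.Random(0)
smallest_gap = {"D1": float("inf"), "D2": float("inf")}
for _ in range(100):
    num_alleles = rng.randint(2, 4)
    p, q = random_freqs(num_alleles, rng), random_freqs(num_alleles, rng)
    for name, dissim in (("D1", asd1), ("D2", asd2)):
        between = mean_asd(p, q, dissim)
        mean_within = (mean_asd(p, p, dissim) + mean_asd(q, q, dissim)) / 2
        smallest_gap[name] = min(smallest_gap[name], between - mean_within)
print(smallest_gap)
```

A nonnegative smallest gap for both variants is consistent with the between-population value never falling below the mean of the two within-population values.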
6.3 Comparison of the $\overline{E[D_w]} - E[D_b]$ inequalities for $D^1$ and $D^2$
The inequality
The extent to which
7 Data analysis
7.1 Data
Our theoretical analysis predicts features of dissimilarities
7.2 Theoretical computations
For our theoretical calculations, given a population in the data set and a locus, we compute allele frequencies. We then apply our theoretical formulas to the allele frequency vectors. Note that if a locus is missing genotypes in an individual, then we omit that individual from the calculation of population allele frequencies at the locus, so that we maintain the property that allele frequencies at a locus in a population sum to 1.
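The allele-frequency step can be sketched as follows. The data layout is an assumption for illustration: each individual's genotype at a locus is a pair of allele labels, with `None` marking a missing genotype; the function name and example labels are hypothetical.

```python
from collections import Counter

def allele_frequencies(genotypes):
    """Allele frequencies at one locus, skipping individuals with missing genotypes.

    `genotypes` is a list of 2-tuples of allele labels, or None when missing.
    """
    counts = Counter(allele
                     for genotype in genotypes if genotype is not None
                     for allele in genotype)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no non-missing genotypes at this locus")
    return {allele: count / total for allele, count in counts.items()}

# Four individuals, one with a missing genotype (labels are made up).
sample = [(120, 124), None, (124, 124), (120, 128)]
freqs = allele_frequencies(sample)
print(freqs)
```

Because the individual with the missing genotype contributes no alleles, the denominator is six rather than eight and the frequencies sum to 1.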
7.3 Empirical computations
For empirical calculations, we consider the actual diploid individuals in the HGDP-CEPH data, for within-population computations comparing all pairs of individuals within a population. For between-population computations, we compare all pairs of individuals, one each from two populations. Pairwise dissimilarities between diploid genotypes are obtained according to Table 1. We compute within-population and between-population dissimilarities as the means across relevant pairs, and we compute variances of dissimilarity distributions across pairs of individuals.
For this analysis, we omit individuals with missing data prior to computation of empirical ASD values. In between-population comparisons, all allelic types present in one but not the other population are assigned a frequency of 0 in the population in which they are absent.
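The empirical computation can be sketched as below. This is a minimal illustration, not the authors' pipeline: genotypes are pairs of allele labels with missing data already removed, the dissimilarity is assumed to be 1 minus half the number of shared alleles, and the variance across pairs is taken as the population variance (`pvariance`), which is an assumption because the text does not specify the estimator.

```python
from collections import Counter
from itertools import combinations, product
from statistics import mean, pvariance

def asd1(g1, g2):
    """Assumed first variant: 1 minus half the number of shared alleles."""
    return 1 - sum((Counter(g1) & Counter(g2)).values()) / 2

def pairwise_asd(sample1, sample2=None, dissim=asd1):
    """Mean and variance of dissim across pairs of individuals.

    One sample: all unordered pairs within it.
    Two samples: all pairs with one individual from each.
    """
    if sample2 is None:
        pairs = combinations(sample1, 2)
    else:
        pairs = product(sample1, sample2)
    values = [dissim(g1, g2) for g1, g2 in pairs]
    return mean(values), pvariance(values)

pop1 = [("A", "A"), ("A", "B"), ("B", "B")]   # toy genotypes, not real data
pop2 = [("A", "A")]
within_mean, within_var = pairwise_asd(pop1)
between_mean, between_var = pairwise_asd(pop1, pop2)
print(within_mean, within_var, between_mean, between_var)
```

In this toy example the within-population mean for `pop1` exceeds its between-population mean with `pop2`, the kind of excess examined in the data analysis.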
We perform the theoretical and empirical calculations for all 783 loci.
7.4 Results of data analysis
Figure 6 compares empirical and theoretical means and variances of within-population dissimilarities across pairs of individuals, considering 100 randomly sampled loci in 30 populations. Figure 6A compares the empirical value of
Figure 6C and D compare empirical and theoretical variances across pairs of individuals for within-population dissimilarities, using Eqs. (6) and (12) for the theoretical computation. The theoretical variance predicts the empirical variance, but the agreement is not as close as for the mean (r = 0.676 for
Figure 7 plots analogous comparisons for between-population dissimilarities, considering a subset of loci from Figure 6. In Figure 7A, we see a close relationship between empirical
Figure 7C and D consider relationships between empirical and theoretical between-population variances for
Figure 8 empirically examines the inequalities in Theorems 3 and 4 stating that when computed from allele frequencies, the mean of the within-population dissimilarities for two populations is always less than the dissimilarity between them. It shows all population pairs from Figures 6 and 7 with a single random locus.
In Figure 8A, we find that the theoretical values of
Figure 9 tabulates the fraction of loci for which the empirical within-population dissimilarity of a population (denoted Population 1) exceeds the population’s empirical between-population dissimilarity with a second population (Population 2), or
8 Discussion
Allele-sharing statistics are often used to quantify genetic dissimilarity within and between populations. Because they typically share a larger number of recent ancestors, individuals from the same population might be predicted to possess a lower genetic dissimilarity than those from different populations. We have mathematically explored the circumstances under which this prediction fails, when the genetic dissimilarity within a population exceeds the genetic dissimilarity between two populations. The analysis characterizes the properties of allele frequency vectors that give rise to this counterintuitive scenario, illustrating its occurrence in human population-genetic data.
When does within-population dissimilarity for a population exceed between-population dissimilarity with a second population? The conditions that permit this inequality in the case of I = 2 alleles are instructive (Theorems 1 and 2 and Figure 4). In this case, two populations have unbalanced allele frequencies, with Population 2 more unbalanced than Population 1, but the two populations are similar in their frequencies. In Population 1, dissimilarity is generated from comparisons of homozygotes for one allele and homozygotes for the other allele. However, because Population 2 has allele frequencies that are more unbalanced than those of Population 1, fewer comparisons of distinct homozygotes occur in the between-population comparison. This phenomenon results in a within-population dissimilarity in Population 1 that exceeds the between-population dissimilarity. Beyond I = 2, such an excess is observed in empirical calculations with I ≥ 2 alleles (Figure 9), as well as in simulations, though with decreasing probability as I increases (Figure 5).
Although a population can possess greater within-population dissimilarity than its between-population dissimilarity to a second population, we find that for arbitrary numbers of alleles I, it is not possible for both populations in a pair to possess greater within-population dissimilarity than the between-population dissimilarity (Theorems 3 and 4). In data, “theoretical” dissimilarities obtained by treating allele frequencies in the data as parametric frequencies of two populations follow this inequality strictly, with greater between-population dissimilarity than at least one of the two within-population dissimilarities (Figure 8A and B). Similarly, the mean of the two within-population dissimilarities is strictly less than the between-population dissimilarity in theoretical calculations (Figure 8A and B); while “empirical” dissimilarities calculated from individual genotypes can violate the inequality, we find that these violations are generally mild (Figure 8C and D).
The results can contribute to understanding unexpected phenomena involving allele-sharing dissimilarities in human populations. We have seen that within-population dissimilarities in Population 1 sometimes exceed between-population dissimilarities, often in comparisons that involve a lower-diversity Population 2 and a higher-diversity Population 1 (Figure 9); in essence, a high-diversity population can possess enough variation that its inter-individual dissimilarity can exceed the dissimilarity between populations. Our theoretical calculations provide a basis for this scenario, and in fact, we saw for I = 2 that it is not unlikely in certain parts of the allele frequency space (Figure 4).
Our theoretical analysis deepens a line of inquiry on mathematical effects on allele-sharing. For each of two dissimilarity functions, we have obtained probability distributions of within- and between-population allele-sharing dissimilarities across pairs of individuals as functions of allele frequencies (Tables 3, 4, 6, 7), focusing on the mean and variance of the dissimilarity statistics (Eqs. (3), (6), (9), (12), (16), (19), (22) and (25)). The expressions for these quantities, and inequalities concerning their relationships (Theorems 1–4), augment previous efforts on the mathematics of allele-sharing dissimilarities in terms of allele frequencies (Chakraborty and Jin 1993; Tal 2013).
The two variants of allele-sharing dissimilarity that we studied,
However, some consistent differences between the two dissimilarities are also observed.
The within-population variance across pairs of individuals is not uniformly higher for either dissimilarity (Figures 1B and 2F); at I = 2, it has different shapes, as
Does our analysis suggest a preference for
This work has several possible extensions. We have focused on the first and second moments of allele-sharing dissimilarities across pairs of individuals; the full distributions (Tables 3, 4, 6, 7) could also be further investigated. We examined I = 2 in the greatest detail, but special cases that fix a maximal value of I could also be considered. We chose the two most frequently used ASD variants,
We have only considered allele-sharing dissimilarity between population pairs at a single locus, and it will be of interest to investigate dissimilarities that average across many loci. Our theoretical calculations focus on dissimilarities between two random individuals chosen from specified allele-frequency distributions at a locus. Although such distributions have nonzero probability only on the discrete values
We note significant caveats in interpreting our empirical analysis in relation to our theoretical computations. The empirical computations make use of all pairs of individuals drawn from specified samples; each sampled individual appears in many pairs, so that the empirical analysis does not follow the assumption of the theoretical analysis that pairs represent independent draws from allele frequency distributions. A second difference of the empirical and theoretical analyses is that the theoretical analysis assumes that pairs of alleles within an individual are independent draws from the allele-frequency distribution, whereas inbreeding can induce dependence of these alleles empirically. Such deviations from the assumptions of the theoretical analysis in conducting the empirical analysis could be explored in simulations that do and do not permit inbreeding and reuse of pairs of individuals and in empirical samples large enough to avoid such reuses.
Allele-sharing dissimilarities have long been used in population genetics. The mathematical relationships we have obtained assist both in predicting their properties in relation to allele frequencies and in understanding empirical aspects of their values. When counterintuitive phenomena are obtained with such dissimilarities – such as a greater within-population dissimilarity than the between-population dissimilarity – the mathematical results can potentially provide insight into the unexpected observations.
Funding source: National Institutes of Health
Award Identifier / Grant number: R01 HG005855
Funding source: National Science Foundation
Award Identifier / Grant number: BCS-2116322
- Research ethics: Not applicable.
- Author contributions: Study design: XL, NAR; Mathematical analysis: XL, ZA, TKM, NAR; Data analysis: XL, ZA; Manuscript preparation: XL, ZA, NAR. The authors have accepted responsibility for the entire content of this manuscript and approved its submission.
- Competing interests: The authors state no conflict of interest.
- Research funding: We acknowledge NIH grant R01 HG005855 and NSF grant BCS-2116322 for support.
- Data availability: Not applicable.
References
Bowcock, A.M., Ruiz-Linares, A., Tomfohrde, J., Minch, E., Kidd, J.R., and Cavalli-Sforza, L.L. (1994). High resolution of human evolutionary trees with polymorphic microsatellites. Nature 368: 455–457. https://doi.org/10.1038/368455a0.
Cavalli-Sforza, L.L. and Edwards, A.W.F. (1967). Phylogenetic analysis: models and estimation procedures. Am. J. Hum. Genet. 19: 233–257.
Chakraborty, R. and Jin, L. (1993). A unified approach to study hypervariable polymorphisms: statistical considerations of determining relatedness and population distances. In: Pena, S.D.J., Chakraborty, R., Epplen, J.T., and Jeffreys, A.J. (Eds.), DNA fingerprinting: state of the science. Birkhäuser Verlag, Basel, pp. 153–175. https://doi.org/10.1007/978-3-0348-8583-6_14.
Edge, M.D., Ramachandran, S., and Rosenberg, N.A. (2022). Celebrating 50 years since Lewontin’s apportionment of human diversity. Phil. Trans. Roy. Soc. Lond. B Biol. Sci. 377: 20200405. https://doi.org/10.1098/rstb.2020.0405.
Gao, X. and Martin, E.R. (2009). Using allele sharing distance for detecting human population stratification. Hum. Hered. 68: 182–191. https://doi.org/10.1159/000224638.
Jorde, L.B. (1985). Human genetic distance studies: present status and future prospects. Annu. Rev. Anthropol. 14: 343–373. https://doi.org/10.1146/annurev.an.14.100185.002015.
Lewontin, R.C. (1972). The apportionment of human diversity. Evol. Biol. 6: 381–398. https://doi.org/10.1007/978-1-4684-9063-3_14.
Mountain, J.L. and Cavalli-Sforza, L.L. (1997). Multilocus genotypes, a tree of individuals, and human evolutionary history. Am. J. Hum. Genet. 61: 705–718. https://doi.org/10.1086/515510.
Mountain, J.L. and Ramakrishnan, U. (2005). Impact of human population history on distributions of individual-level genetic distance. Hum. Genom. 2: 4–19. https://doi.org/10.1186/1479-7364-2-1-4.
Nei, M. (1972). Genetic distance between populations. Am. Nat. 106: 283–292. https://doi.org/10.1086/282771.
Nei, M. (1987). Molecular evolutionary genetics. Columbia University Press, New York. https://doi.org/10.7312/nei-92038.
Prugnolle, F., Manica, A., and Balloux, F. (2005). Geography predicts neutral genetic diversity of human populations. Curr. Biol. 15: R159–R160. https://doi.org/10.1016/j.cub.2005.02.038.
Ramachandran, S., Deshpande, O., Roseman, C.C., Rosenberg, N.A., Feldman, M.W., and Cavalli-Sforza, L.L. (2005). Support from the relationship of genetic and geographic distance in human populations for a serial founder effect originating in Africa. Proc. Natl. Acad. Sci. USA 102: 15942–15947. https://doi.org/10.1073/pnas.0507611102.
Rosenberg, N.A. (2006). Standardized subsets of the HGDP-CEPH human genome diversity cell line panel, accounting for atypical and duplicated samples and pairs of close relatives. Ann. Hum. Genet. 70: 841–847. https://doi.org/10.1111/j.1469-1809.2006.00285.x.
Rosenberg, N.A. (2011). A population-genetic perspective on the similarities and differences among worldwide human populations. Hum. Biol. 83: 659–684. https://doi.org/10.1353/hub.2011.a465110.
Rosenberg, N.A., Mahajan, S., Ramachandran, S., Zhao, C., Pritchard, J.K., and Feldman, M.W. (2005). Clines, clusters, and the effect of study design on the inference of human population structure. PLoS Genet. 1: e70. https://doi.org/10.1371/journal.pgen.0010070.
Tal, O. (2013). Two complementary perspectives on inter-individual genetic distance. Biosystems 111: 18–36. https://doi.org/10.1016/j.biosystems.2012.07.005.
Witherspoon, D.J., Wooding, S., Rogers, A.R., Marchani, E.E., Watkins, W.S., Batzer, M.A., and Jorde, L.B. (2007). Genetic similarities within and between human populations. Genetics 176: 351–359. https://doi.org/10.1534/genetics.106.067355.
© 2023 the author(s), published by De Gruyter, Berlin/Boston
This work is licensed under the Creative Commons Attribution 4.0 International License.