Introduction

Lineal admixture time is a microscale measure of admixture time. In this document, we derive an estimator of lineal admixture time. The derivation is based on a stochastic process whose underlying random object is a gametic lineage.

A simple stochastic process

We define a simple stochastic process with the following properties:

  • discrete regular time steps,

  • infinite population size,

  • proportion $\alpha_i$ from the $i$-th ancestral group,

  • stationary, and

  • underlying random object is a gametic lineage.

We further define an “island subpopulation” of individuals for which at each time step:

  1. there is random mating and then

  2. immigration such that a fraction $\phi$ of the population consists of new non-admixed immigrants.

The motivation for this stochastic process is to derive an estimator of lineal admixture time.

Lineal admixture time requires that there be a time horizon at which all individuals can be categorized into separate source populations.

Since the stochastic process in this document is stationary, full categorization of all lineages into ancestral source populations is achieved only asymptotically. In other words, for any lineage, there exists some time prior to which all gametes in the lineage are within one of the source populations. (Formally, this holds up to a measurable set of exceptions of probability zero.)

Main Result

Formal notation

From the underlying random gametic lineage, two random variables are defined by the stochastic process:

  • $M_t$ as lineal admixture time (or generation number) and

  • $A_t$ as ancestral origin, represented as a one-hot vector.

For brevity, we write $\operatorname{P}_{t}\!\left\{\ldots\right\}$ and $\operatorname{E}_{t}\!\left[\ldots\right]$ for the probability and expectation, respectively, of expressions in which $M$ and $A$ denote $M_t$ and $A_t$.

Base Facts

From the definition of the “island subpopulation” we deduce that

$$\operatorname{P}_{t+1}\!\left\{ M=0 \wedge A=\mathrm{e}_i \right\} = \phi \alpha_i + (1 - \phi) \operatorname{P}_{t}\!\left\{ M=0 \wedge A=\mathrm{e}_i \right\}^2$$
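The recursion above can be explored numerically. The following is a minimal sketch, not part of the original derivation: the function name `iterate_unadmixed`, the parameter values, and the starting state (a founding population with no admixture, so the probability for group $i$ starts at $\alpha_i$) are all assumptions chosen for illustration.

```python
def iterate_unadmixed(phi, alphas, steps=200):
    """Iterate x_i <- phi * alpha_i + (1 - phi) * x_i**2 for a fixed number of steps.

    The starting state x_i = alpha_i (an entirely non-admixed founding
    population) is an assumption made for this illustration.
    """
    xs = list(alphas)
    for _ in range(steps):
        xs = [phi * a + (1.0 - phi) * x * x for a, x in zip(alphas, xs)]
    return xs


phi = 0.1            # illustrative immigration fraction
alphas = [0.3, 0.7]  # illustrative ancestral proportions
xs = iterate_unadmixed(phi, alphas)

# Residual of the recursion at the final state; values near zero indicate
# that the iteration has settled to a fixed point.
residuals = [x - (phi * a + (1.0 - phi) * x * x) for a, x in zip(alphas, xs)]
print(xs, residuals)
```

For these illustrative values the iteration appears to settle quickly, which is consistent with treating the process as stationary.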

From the definition of lineal admixture generation number,

$$\begin{aligned}
\operatorname{E}_{t+1}\!\left[ M \right] & = \left(\operatorname{E}_{t}\!\left[ M \mid M>0 \right] + 1\right) \operatorname{P}_{t}\!\left\{ M>0 \right\}^2 (1-\phi) \\
& \quad + 2 \left(\frac{1}{2} \operatorname{E}_{t}\!\left[ M \mid M>0 \right] + 1\right) \operatorname{P}_{t}\!\left\{ M>0 \right\} \operatorname{P}_{t}\!\left\{ M=0 \right\} (1-\phi) \\
& \quad + \left( \operatorname{P}_{t}\!\left\{ M=0 \right\}^2 - \sum_i \operatorname{P}_{t}\!\left\{ M=0 \wedge A=\mathrm{e}_i \right\}^2 \right) (1-\phi)
\end{aligned}$$
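As a rough illustration, the two base facts can be iterated together, one time step at a time. This sketch is an assumption-laden reading of the recursion rather than the author's code: the name `step_generation`, the parameter values and the starting state are hypothetical, and it relies on the identity $\operatorname{E}_{t}[M] = \operatorname{E}_{t}[M \mid M>0] \operatorname{P}_{t}\{M>0\}$, which holds because lineages with $M=0$ contribute nothing to the expectation.

```python
def step_generation(phi, alphas, xs, mu):
    """Advance (x_1..x_k, E[M]) by one time step using the two base facts."""
    q = sum(xs)  # P_t{M = 0}
    p = 1.0 - q  # P_t{M > 0}
    # E_t[M | M > 0], recovered from E_t[M] = E_t[M | M > 0] * P_t{M > 0}.
    cond = mu / p if p > 0.0 else 0.0
    both_admixed = (cond + 1.0) * p * p
    one_admixed = 2.0 * (0.5 * cond + 1.0) * p * q
    distinct_sources = q * q - sum(x * x for x in xs)
    mu_next = (1.0 - phi) * (both_admixed + one_admixed + distinct_sources)
    xs_next = [phi * a + (1.0 - phi) * x * x for a, x in zip(alphas, xs)]
    return xs_next, mu_next


phi = 0.1                   # illustrative immigration fraction
alphas = [0.3, 0.7]         # illustrative ancestral proportions
xs, mu = list(alphas), 0.0  # assumed start: non-admixed founders
for _ in range(2000):
    xs, mu = step_generation(phi, alphas, xs, mu)

# One further step should leave the values essentially unchanged.
xs_after, mu_after = step_generation(phi, alphas, xs, mu)
print(mu, mu_after)
```

After enough steps the expectation stops changing, which gives a numerical counterpart to the stationary value of the expected lineal admixture time.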

Derivation

Under the stationarity assumption, the probabilities in the base facts do not depend on $t$, so we can define $x_i := \operatorname{P}_{t}\!\left\{ M=0 \wedge A=\mathrm{e}_i \right\}$. The first base fact then becomes

$$x_i = \phi \alpha_i + (1-\phi) x_i^2$$

where $\vec{\alpha} := \operatorname{E}\!\left[ A \right]$.

By Theorem 1, the only admissible root of this quadratic in $x_i$ is

$$x_i = \frac{ 1 - \sqrt{1 - 4 \phi (1- \phi) \alpha_i} }{ 2(1-\phi) }$$
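The closed form can be checked against the quadratic it solves. The sketch below is illustrative only: the function name `stationary_unadmixed` and the sample values of $\phi$ and $\alpha_i$ are assumptions, and it presumes $0 < \phi < 1$ and $0 \le \alpha_i \le 1$ so that the square root is real.

```python
import math


def stationary_unadmixed(phi, alpha_i):
    """Smaller root of x = phi * alpha_i + (1 - phi) * x**2 (assumes 0 < phi < 1)."""
    disc = 1.0 - 4.0 * phi * (1.0 - phi) * alpha_i
    return (1.0 - math.sqrt(disc)) / (2.0 * (1.0 - phi))


phi = 0.1      # illustrative immigration fraction
alpha_i = 0.3  # illustrative ancestral proportion
x_i = stationary_unadmixed(phi, alpha_i)

# Substitute back into the stationarity equation; the residual should be ~0.
residual = x_i - (phi * alpha_i + (1.0 - phi) * x_i ** 2)
print(x_i, residual)
```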

Let $q := \operatorname{P}_{t}\!\left\{ M=0 \right\}$. Since $A$ is one-hot and the $\alpha_i$ sum to one, summing the equation for $x_i$ over $i$ gives

$$q = \sum_i x_i = \phi + (1 - \phi) \sum_i x_i^2$$

We define $\mu := \operatorname{E}_{t}\!\left[ M \right]$, which is the expected lineal admixture time (and generation number).

Given the base facts, we make the following deduction using the newly defined variables $\mu$, $q$ and $x_i$, together with the identity $\operatorname{E}_{t}\!\left[ M \mid M>0 \right] (1-q) = \mu$.

$$\begin{aligned}
\mu & = \left(\operatorname{E}_{t}\!\left[ M \mid M>0 \right] + 1\right) (1-q)^2 (1-\phi) \\
& \quad + 2 \left(\frac{1}{2} \operatorname{E}_{t}\!\left[ M \mid M>0 \right] + 1\right) (1-q) q (1-\phi) \\
& \quad + \left( q^2 - \sum_i x_i^2 \right) (1-\phi) \\
& = \mu (1-q) (1-\phi) + (1-q)^2 (1-\phi) \\
& \quad + \mu q (1-\phi) + 2 (1-q) q (1-\phi) \\
& \quad + \left( q^2 - \sum_i x_i^2 \right) (1-\phi) \\
& = \mu (1-\phi) + ((1 - q) + q)^2 (1-\phi) - (1-\phi) \sum_i x_i^2 \\
0 & = - \mu \phi + 1 - q \\
\mu & = \frac{1-q}{\phi}
\end{aligned}$$

Substituting for $q$ gives

$$\mu = \frac{1-\phi}{\phi} \left( 1 - \sum_i x_i^2 \right)$$
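The result can be evaluated directly from $\phi$ and the ancestral proportions. The following is a minimal sketch under stated assumptions: the function names and the sample parameter values are hypothetical, and it presumes $0 < \phi < 1$. It also cross-checks the final formula against the intermediate form $(1-q)/\phi$.

```python
import math


def stationary_unadmixed(phi, alpha_i):
    """Smaller root of x = phi * alpha_i + (1 - phi) * x**2 (assumes 0 < phi < 1)."""
    disc = 1.0 - 4.0 * phi * (1.0 - phi) * alpha_i
    return (1.0 - math.sqrt(disc)) / (2.0 * (1.0 - phi))


def expected_lineal_admixture_time(phi, alphas):
    """mu = (1 - phi) / phi * (1 - sum_i x_i**2)."""
    xs = [stationary_unadmixed(phi, a) for a in alphas]
    return (1.0 - phi) / phi * (1.0 - sum(x * x for x in xs))


phi = 0.1            # illustrative immigration fraction
alphas = [0.3, 0.7]  # illustrative ancestral proportions
mu = expected_lineal_admixture_time(phi, alphas)

# Cross-check through q = phi + (1 - phi) * sum_i x_i**2 and mu = (1 - q) / phi.
xs = [stationary_unadmixed(phi, a) for a in alphas]
q = phi + (1.0 - phi) * sum(x * x for x in xs)
mu_via_q = (1.0 - q) / phi
print(mu, mu_via_q)
```

Both routes should agree up to floating-point rounding, since they are algebraically the same expression.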

We conjecture that this formula serves as a consistent maximum likelihood estimator.

Estimation of $\phi$

Let $\dot{\alpha}_i$ denote the expected frequency of the $i$-th ancestral source. Let $\ddot{\alpha}_{i,j}$ denote the frequency of a diploid genotype with an $i$-th maternal ancestral source and a $j$-th paternal ancestral source.

Thus

$$\begin{aligned}
\ddot{\alpha}_{i,i} & = \phi \dot{\alpha}_i + (1-\phi) \dot{\alpha}_i^2 \\
& = \phi \dot{\alpha}_i (1 - \dot{\alpha}_i) + \dot{\alpha}_i^2 \\
\phi & = \frac{ \ddot{\alpha}_{i,i} - \dot{\alpha}_i^2 }{ \dot{\alpha}_i (1 - \dot{\alpha}_i) }
\end{aligned}$$
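The per-source expression for $\phi$ lends itself to a round-trip check: generate homozygous genotype frequencies from a known $\phi$, then recover it. This is a sketch with hypothetical names (`estimate_phi`, `true_phi`) and made-up frequencies, not data from any study.

```python
def estimate_phi(alpha_dot_i, alpha_ddot_ii):
    """phi = (alpha_ddot_ii - alpha_dot_i**2) / (alpha_dot_i * (1 - alpha_dot_i))."""
    denom = alpha_dot_i * (1.0 - alpha_dot_i)
    if denom <= 0.0:
        raise ValueError("source frequency must lie strictly between 0 and 1")
    return (alpha_ddot_ii - alpha_dot_i ** 2) / denom


true_phi = 0.2               # illustrative value used to build the inputs
alpha_dot = [0.2, 0.3, 0.5]  # illustrative source frequencies
# Homozygous genotype frequencies implied by the model for each source.
homozygous = [true_phi * a + (1.0 - true_phi) * a * a for a in alpha_dot]
estimates = [estimate_phi(a, h) for a, h in zip(alpha_dot, homozygous)]
print(estimates)
```

With exact model frequencies every source yields the same value; with observed frequencies the per-source estimates would generally differ, which is one reason to combine them.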

Consider the case of only two ancestral sources. With $\beta := \ddot{\alpha}_{0,1} + \ddot{\alpha}_{1,0}$ we deduce that

$$\begin{aligned}
\ddot{\alpha}_{0,0} + \ddot{\alpha}_{1,1} & = 1 - \beta \\
\dot{\alpha}_1 & = 1 - \dot{\alpha}_0 \\
\dot{\alpha}_0^2 + \dot{\alpha}_1^2 & = 1 - 2 \dot{\alpha}_0 (1 - \dot{\alpha}_0)
\end{aligned}$$

$$\begin{aligned}
\phi & = \frac{\phi + \phi}{2} \\
& = \frac{ \ddot{\alpha}_{0,0} + \ddot{\alpha}_{1,1} - \dot{\alpha}_0^2 - \dot{\alpha}_1^2 }{ 2 \dot{\alpha}_0 (1 - \dot{\alpha}_0) } \\
& = 1 - \frac{\beta}{ 2 \dot{\alpha}_0 (1 - \dot{\alpha}_0) }
\end{aligned}$$
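For the two-source case, the estimate needs only the frequency of one source and the frequency $\beta$ of mixed-source genotypes. The sketch below uses a hypothetical function name and made-up values, and builds its inputs from the model itself so that the recovered $\phi$ can be compared with the value used to generate them.

```python
def estimate_phi_two_sources(alpha_dot_0, beta):
    """phi = 1 - beta / (2 * alpha_dot_0 * (1 - alpha_dot_0))."""
    denom = 2.0 * alpha_dot_0 * (1.0 - alpha_dot_0)
    if denom <= 0.0:
        raise ValueError("alpha_dot_0 must lie strictly between 0 and 1")
    return 1.0 - beta / denom


true_phi = 0.2     # illustrative value used to build the inputs
alpha_dot_0 = 0.3  # illustrative frequency of source 0
alpha_dot_1 = 1.0 - alpha_dot_0
# Same-source genotype frequencies implied by the model.
same_00 = true_phi * alpha_dot_0 + (1.0 - true_phi) * alpha_dot_0 ** 2
same_11 = true_phi * alpha_dot_1 + (1.0 - true_phi) * alpha_dot_1 ** 2
beta = 1.0 - same_00 - same_11  # frequency of mixed-source genotypes
phi_hat = estimate_phi_two_sources(alpha_dot_0, beta)
print(phi_hat)
```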

This form is the same as the inbreeding coefficient but with ancestral source as the allele state rather than haplotype.

Theorem 1

Given stationarity and the other assumptions above, the solution for $x_i$ cannot be

$$x_i = \frac{ 1 + \sqrt{1 - 4 \phi (1- \phi) \alpha_i} }{ 2(1-\phi) }$$

when $\alpha_i < 1$.

PROOF

Assume the contrary. Since $\phi < 1$ and $x_i \le 1$, we have

$$\begin{aligned}
1 & \ge \frac{ 1 + \sqrt{1 - 4 \phi (1- \phi) \alpha_i} }{ 2(1-\phi) } \\
2(1-\phi) & \ge 1 + \sqrt{1 - 4 \phi (1- \phi) \alpha_i} \\
1- 2 \phi & \ge \sqrt{1 - 4 \phi (1- \phi) \alpha_i} \\
1 - 4 \phi + 4 \phi^2 & \ge 1 - 4 \phi (1- \phi) \alpha_i \\
- 4 \phi (1- \phi) & \ge - 4 \phi (1- \phi) \alpha_i \\
1 & \le \alpha_i
\end{aligned}$$

which cannot be true given $\alpha_i < 1$.
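A numerical spot check can complement the proof, although it is no substitute for it. The sketch below evaluates the rejected root on a grid of $\phi$ and $\alpha_i$ values strictly between 0 and 1 and records any point where it would be a valid probability. The function name `plus_root` and the grid are assumptions made for illustration.

```python
import math


def plus_root(phi, alpha_i):
    """Larger root of x = phi * alpha_i + (1 - phi) * x**2 (assumes 0 < phi < 1)."""
    disc = 1.0 - 4.0 * phi * (1.0 - phi) * alpha_i
    return (1.0 + math.sqrt(disc)) / (2.0 * (1.0 - phi))


# Grid of values 0.05, 0.10, ..., 0.95 for both phi and alpha_i.
grid = [k / 20 for k in range(1, 20)]
violations = [
    (phi, alpha_i)
    for phi in grid
    for alpha_i in grid
    if plus_root(phi, alpha_i) <= 1.0
]
print(violations)
```

An empty list means the larger root exceeded 1 at every grid point, in line with the theorem.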