Outline a simple stochastic process of gametic lineages.
Deduce an estimator of the lineal admixture generation number.
Introduction
Lineal admixture time is a microscale measure of admixture time.
In this document, we derive an estimator of lineal admixture time.
The derivation is based on a stochastic process whose underlying
random object is a gametic lineage.
A simple stochastic process
We define a simple stochastic process with the following properties:
We further define an “island subpopulation” of individuals in which,
at each time step, there is first random mating and then immigration,
such that a fraction $\phi$ of the population consists of new,
non-admixed immigrants.
The motivation for this stochastic process is to derive an estimator of
lineal admixture time.
Lineal admixture time requires that there be a time horizon
at which all individuals can be categorized into separate
source populations.
Since the stochastic process in this document is stationary,
this condition of full categorization of all lineages
into ancestral source populations is only achieved
asymptotically. In other words, for any lineage,
there exists some time prior to which all gametes in the lineage
are within one of the source populations.
(Formally, the condition may fail for a measurable set of exceptional
lineages, but that set has probability zero.)
Main Result
Formal notation
From the underlying random gametic lineage, the stochastic process
defines two random variables: $M_t$, the lineal admixture time (or
generation number), and $A_t$, the ancestral origin, represented as a
one-hot vector.
For notational brevity, we write $P_t\{\ldots\}$ and $E_t[\ldots]$
for the probability and expectation, respectively, of expressions in
which $M$ and $A$ denote $M_t$ and $A_t$.
Base Facts
From the definition of the “island subpopulation”
we deduce that
$$P_{t+1}\{M=0 \wedge A=e_i\} = \phi\alpha_i + (1-\phi)\,P_t\{M=0 \wedge A=e_i\}^2$$
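Because this base fact is a deterministic recurrence in $t$, it can be iterated directly. The Python sketch below does so; the starting point $x_i(0) = \alpha_i$ (every gamete non-admixed at an initial time horizon) and the function name `iterate_unadmixed` are illustrative assumptions, not part of the derivation.

```python
def iterate_unadmixed(phi, alpha, generations):
    """Iterate x_i(t+1) = phi*alpha_i + (1 - phi)*x_i(t)**2.

    x_i(t) stands for P_t{M=0 and A=e_i}. Assumption (illustrative only):
    at t = 0 every gamete is non-admixed, so x_i(0) = alpha_i.
    Returns the list [x(0), x(1), ..., x(generations)].
    """
    x = list(alpha)
    history = [x]
    for _ in range(generations):
        # Fraction phi: new immigrants, non-admixed from source i w.p. alpha_i.
        # Fraction 1 - phi: offspring of random mating, non-admixed from
        # source i only if both parental gametes were, w.p. x_i(t)**2.
        x = [phi * a + (1.0 - phi) * xi * xi for a, xi in zip(alpha, x)]
        history.append(x)
    return history


if __name__ == "__main__":
    hist = iterate_unadmixed(phi=0.3, alpha=[0.2, 0.8], generations=30)
    for t in (0, 1, 2, 5, 30):
        # Per-source values and their total P_t{M=0} = sum_i x_i(t).
        print(t, [round(v, 4) for v in hist[t]], round(sum(hist[t]), 4))
```

With these settings each $x_i(t)$ decreases monotonically toward a fixed point of the recurrence, which it approaches only asymptotically.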
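From the definition of the lineal admixture generation number,
$$
\begin{aligned}
E_{t+1}[M] ={}& \left(E_t[M \mid M>0]+1\right) P_t\{M>0\}^2\,(1-\phi) \\
&+ 2\left(\tfrac{1}{2}E_t[M \mid M>0]+1\right) P_t\{M>0\}\,P_t\{M=0\}\,(1-\phi) \\
&+ \left(P_t\{M=0\}^2 - \sum_i P_t\{M=0 \wedge A=e_i\}^2\right)(1-\phi)
\end{aligned}
$$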
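Both base facts can be checked against a Monte Carlo simulation of a single gametic lineage. The update rule for $M$ below is our reading of the $E_{t+1}[M]$ equation, an assumption rather than a quoted definition: an immigrant gamete has $M=0$; a resident gamete whose two parental gametes are both non-admixed from the same source has $M=0$; otherwise $M$ is the average of the two parental values plus one. The sketch traces the lineage backwards until every branch ends in an immigrant. Each resident gamete contributes on average $2(1-\phi)$ further branches, so the trace terminates with probability one only when $\phi > 1/2$; the example therefore uses $\phi = 0.6$. Function names are hypothetical.

```python
import random


def sample_lineage(phi, alpha, rng):
    """Draw (M, source) for one island gamete at stationarity.

    source is the ancestral index i when M == 0 (non-admixed), else None.
    Assumed rule (read off the E_{t+1}[M] base fact): admixed gametes get
    M = (M_maternal + M_paternal) / 2 + 1. Terminates w.p. 1 only if phi > 1/2.
    """
    if rng.random() < phi:
        # New non-admixed immigrant from source i with probability alpha_i.
        i = rng.choices(range(len(alpha)), weights=alpha)[0]
        return 0.0, i
    # Resident gamete: trace its parent's two independent parental gametes.
    m1, s1 = sample_lineage(phi, alpha, rng)
    m2, s2 = sample_lineage(phi, alpha, rng)
    if m1 == 0.0 and m2 == 0.0 and s1 == s2:
        return 0.0, s1  # still a single, non-admixed source
    return (m1 + m2) / 2.0 + 1.0, None


def monte_carlo(phi, alpha, n, seed=0):
    """Estimate x_i = P{M=0 and A=e_i} and mu = E[M] from n sampled lineages."""
    rng = random.Random(seed)
    counts = [0] * len(alpha)
    total_m = 0.0
    for _ in range(n):
        m, s = sample_lineage(phi, alpha, rng)
        total_m += m
        if s is not None:
            counts[s] += 1
    return [c / n for c in counts], total_m / n


if __name__ == "__main__":
    x_hat, mu_hat = monte_carlo(phi=0.6, alpha=[0.3, 0.7], n=20_000, seed=1)
    print("x_hat:", [round(v, 3) for v in x_hat], "mu_hat:", round(mu_hat, 3))
```

Up to Monte Carlo error, the estimates should agree with the closed forms for $x_i$ and $\mu$ derived in the next section.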
Derivation
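Given the assumption of stationarity, probabilities and expectations
do not depend on $t$, and we define the following quantities.
Let $x_i := P_t\{M=0 \wedge A=e_i\}$. At stationarity the first base
fact becomes
$$x_i = \phi\alpha_i + (1-\phi)\,x_i^2,$$
where $\alpha := E[A]$.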
This is a quadratic in $x_i$, and by Theorem 1 its only admissible
root is
$$x_i = \frac{1-\sqrt{1-4\phi(1-\phi)\alpha_i}}{2(1-\phi)}.$$
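A direct evaluation of this root, as a small Python sketch (the function name `unadmixed_fraction` is hypothetical). It assumes $0 \le \phi < 1$ and $0 \le \alpha_i \le 1$, which keeps the discriminant non-negative because $4\phi(1-\phi) \le 1$, and reports the residual of the stationary equation.

```python
import math


def unadmixed_fraction(phi, alpha_i):
    """Stationary x_i = P{M=0 and A=e_i}, using the '-' root chosen by Theorem 1.

    Assumes 0 <= phi < 1 and 0 <= alpha_i <= 1, so the discriminant is >= 0.
    """
    disc = 1.0 - 4.0 * phi * (1.0 - phi) * alpha_i
    return (1.0 - math.sqrt(disc)) / (2.0 * (1.0 - phi))


if __name__ == "__main__":
    phi, alpha = 0.3, [0.2, 0.8]
    for a in alpha:
        x = unadmixed_fraction(phi, a)
        # Residual of the stationarity equation x = phi*alpha_i + (1-phi)*x^2.
        residual = x - (phi * a + (1.0 - phi) * x * x)
        print(a, round(x, 6), abs(residual) < 1e-12)
```

The algebraically equivalent form $x_i = 2\phi\alpha_i / \left(1 + \sqrt{1-4\phi(1-\phi)\alpha_i}\right)$ avoids floating-point cancellation when $\phi\alpha_i$ is very small; the sketch keeps the form used in the text.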
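Let $q := P_t\{M=0\}$. Since $q = \sum_i x_i$ and the components of
$\alpha = E[A]$ sum to one (because $A$ is one-hot), summing the
stationary equation over $i$ gives
$$q = \phi + (1-\phi)\sum_i x_i^2.$$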
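A numerical check of this step, assuming the $\alpha_i$ sum to one. The hypothetical helper `q_two_ways` computes $q$ both as $\sum_i x_i$ and via the formula above, using the stationary root for $x_i$.

```python
import math


def q_two_ways(phi, alpha):
    """Return (q via phi + (1-phi)*sum x_i^2, q via sum_i x_i).

    Both should agree when the alpha_i sum to 1.
    """
    xs = [(1.0 - math.sqrt(1.0 - 4.0 * phi * (1.0 - phi) * a)) / (2.0 * (1.0 - phi))
          for a in alpha]
    return phi + (1.0 - phi) * sum(x * x for x in xs), sum(xs)


if __name__ == "__main__":
    print(q_two_ways(0.3, [0.2, 0.8]))
    print(q_two_ways(0.6, [0.5, 0.3, 0.2]))
```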
We define
$$\mu := E_t[M],$$
the expected lineal admixture time (and generation number).
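Substituting these variables into the second base fact, and using
$E[M \mid M>0] = \mu/(1-q)$ (lineages with $M=0$ contribute nothing to
$\mu$), we deduce the following in terms of $\mu$, $q$, $\phi$ and $x_i$.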
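$$
\begin{aligned}
\mu &= \left(E[M \mid M>0]+1\right)(1-q)^2(1-\phi) + 2\left(\tfrac{1}{2}E[M \mid M>0]+1\right)(1-q)\,q\,(1-\phi) + \left(q^2-\sum_i x_i^2\right)(1-\phi) \\
&= \mu(1-q)(1-\phi) + (1-q)^2(1-\phi) + \mu q(1-\phi) + 2(1-q)q(1-\phi) + \left(q^2-\sum_i x_i^2\right)(1-\phi) \\
&= \mu(1-\phi) + \big((1-q)+q\big)^2(1-\phi) - (1-\phi)\sum_i x_i^2 \\
0 &= -\mu\phi + 1 - q \\
\mu &= \frac{1-q}{\phi}
\end{aligned}
$$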
Substituting for $q$ gives
$$\mu = \frac{1-\phi}{\phi}\left(1-\sum_i x_i^2\right).$$
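The derivation can also be checked by iterating the $E_{t+1}[M]$ base fact with $x_i$ and $q$ held at their stationary values. The simplified update is $\mu \mapsto \mu(1-\phi) + (1-\phi)\left(1-\sum_i x_i^2\right)$, which contracts by a factor $1-\phi$ per generation, so it should converge to $(1-q)/\phi$ from any starting value. A Python sketch, with the function name and the starting value `mu0` as illustrative choices:

```python
import math


def iterate_mean_m(phi, alpha, generations, mu0=0.0):
    """Iterate the E_{t+1}[M] base fact with x_i and q held at stationarity.

    Returns (mu after `generations` steps, closed form (1 - q) / phi).
    Assumes 0 < phi < 1 and at least two sources with alpha_i > 0, so q < 1.
    """
    xs = [(1.0 - math.sqrt(1.0 - 4.0 * phi * (1.0 - phi) * a)) / (2.0 * (1.0 - phi))
          for a in alpha]
    s2 = sum(x * x for x in xs)
    q = phi + (1.0 - phi) * s2
    mu = mu0
    for _ in range(generations):
        cond = mu / (1.0 - q)  # E[M | M > 0], since M = 0 adds nothing to E[M]
        mu = ((cond + 1.0) * (1.0 - q) ** 2 * (1.0 - phi)
              + 2.0 * (0.5 * cond + 1.0) * (1.0 - q) * q * (1.0 - phi)
              + (q * q - s2) * (1.0 - phi))
    return mu, (1.0 - q) / phi


if __name__ == "__main__":
    print(iterate_mean_m(0.3, [0.2, 0.8], generations=100))
```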
We conjecture that this formula serves as a consistent maximum likelihood
estimator.
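Evaluating the formula for $\mu$ directly, as a minimal Python sketch. The name `lineal_admixture_time` is hypothetical, and the inputs are assumed to satisfy $0 < \phi < 1$ with source frequencies $\alpha_i$ summing to one.

```python
import math


def lineal_admixture_time(phi, alpha):
    """mu = (1 - phi) / phi * (1 - sum_i x_i^2), with x_i from the '-' root.

    phi: immigrant fraction per generation, assumed 0 < phi < 1.
    alpha: source frequencies alpha_i, assumed to sum to 1.
    """
    if not 0.0 < phi < 1.0:
        raise ValueError("phi must lie strictly between 0 and 1")
    xs = [(1.0 - math.sqrt(1.0 - 4.0 * phi * (1.0 - phi) * a)) / (2.0 * (1.0 - phi))
          for a in alpha]
    return (1.0 - phi) / phi * (1.0 - sum(x * x for x in xs))


if __name__ == "__main__":
    for phi in (0.2, 0.5, 0.8):
        print(phi, round(lineal_admixture_time(phi, [0.5, 0.5]), 4))
```

For two equally frequent sources and $\phi = 1/2$ the formula gives $\mu = 2\sqrt{2} - 2 \approx 0.8284$; in this example, larger immigration fractions $\phi$ give shorter lineal admixture times.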
Estimation of ϕ
Let $\dot\alpha_i$ denote the expected frequency of the $i$-th ancestral
source.
Let $\ddot\alpha_{i,j}$ denote the frequency of diploid genotypes
with the $i$-th ancestral source on the maternal side and the $j$-th
ancestral source on the paternal side.
This setup is the same as that of the inbreeding coefficient, but with
ancestral source, rather than haplotype, as the allele state.
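As a hedged illustration only, the sketch below computes empirical versions of $\dot\alpha_i$ and $\ddot\alpha_{i,j}$ from a hypothetical list of (maternal source, paternal source) labels, together with the usual inbreeding-coefficient form $F = 1 - H_{\text{obs}}/H_{\text{exp}}$ with ancestral source as the allele state. Treating $\dot\alpha_i$ as the marginal source frequency over all gametes is our assumption, and the mapping from these frequencies to $\phi$ is not given here, so the sketch does not produce an estimate of $\phi$ itself.

```python
from collections import Counter


def ancestry_frequencies(genotypes):
    """Empirical alpha_dot, alpha_ddot and an inbreeding-coefficient analogue.

    genotypes: list of (maternal_source, paternal_source) labels, one per
    diploid individual (hypothetical input format, for illustration only).
    Assumes at least two distinct sources occur, so h_exp > 0.
    """
    n = len(genotypes)
    # alpha_ddot[(i, j)]: frequency of maternal source i with paternal source j.
    alpha_ddot = {pair: c / n for pair, c in Counter(genotypes).items()}
    # alpha_dot[i]: frequency of source i among all 2n gametes (assumed reading).
    gamete_counts = Counter(s for pair in genotypes for s in pair)
    alpha_dot = {s: c / (2 * n) for s, c in gamete_counts.items()}
    # Usual F = 1 - H_obs / H_exp, with ancestral source as the allele state.
    h_obs = sum(f for (i, j), f in alpha_ddot.items() if i != j)
    h_exp = 1.0 - sum(p * p for p in alpha_dot.values())
    return alpha_dot, alpha_ddot, 1.0 - h_obs / h_exp


if __name__ == "__main__":
    sample = [(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]
    dot, ddot, f = ancestry_frequencies(sample)
    print(dot, ddot, round(f, 3))
```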
Theorem 1
Under stationarity (and the other assumptions of the process), the
solution for $x_i$ cannot be
$$x_i = \frac{1+\sqrt{1-4\phi(1-\phi)\alpha_i}}{2(1-\phi)}$$
when $\alpha_i < 1$.
PROOF
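Assume the contrary. Since $0 < \phi < 1$ and $x_i \le 1$, we have
$$
\begin{aligned}
1 &\ge \frac{1+\sqrt{1-4\phi(1-\phi)\alpha_i}}{2(1-\phi)} \\
2(1-\phi) &\ge 1+\sqrt{1-4\phi(1-\phi)\alpha_i} \\
1-2\phi &\ge \sqrt{1-4\phi(1-\phi)\alpha_i} \\
1-4\phi+4\phi^2 &\ge 1-4\phi(1-\phi)\alpha_i \\
-4\phi(1-\phi) &\ge -4\phi(1-\phi)\alpha_i \\
1 &\le \alpha_i
\end{aligned}
$$
Squaring in the fourth line is valid because both sides of the third
line are non-negative, and the last step divides by
$-4\phi(1-\phi) < 0$. The conclusion cannot hold given $\alpha_i < 1$.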
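A numerical sanity check of Theorem 1, as a Python sketch with hypothetical helper names: on a grid with $0 < \phi < 1$ and $0 \le \alpha_i < 1$, the '+' root always exceeds 1 and so cannot be a probability. The boundary case $\alpha_i = 1$ with $\phi \le 1/2$ shows why the hypothesis $\alpha_i < 1$ is needed.

```python
import math


def plus_root(phi, alpha_i):
    """The '+' root of x = phi*alpha_i + (1-phi)*x^2, excluded by Theorem 1."""
    return (1.0 + math.sqrt(1.0 - 4.0 * phi * (1.0 - phi) * alpha_i)) / (2.0 * (1.0 - phi))


def theorem1_violations(steps=100):
    """Grid points with 0 < phi < 1 and 0 <= alpha_i < 1 where the '+' root is <= 1."""
    grid_phi = [k / steps for k in range(1, steps)]
    grid_alpha = [j / steps for j in range(steps)]
    return [(p, a) for p in grid_phi for a in grid_alpha if plus_root(p, a) <= 1.0]


if __name__ == "__main__":
    print(len(theorem1_violations()))  # → 0
    # Boundary case: with alpha_i = 1 and phi <= 1/2 the '+' root equals 1,
    # so the hypothesis alpha_i < 1 cannot simply be dropped.
    print(plus_root(0.25, 1.0))
```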