Gametic genealogy

A gametic genealogy is a mathematical formalization of the genealogy of a population from the perspective of gametes. It is a quadruple $(\mathsf{Gam}, \mathsf{Mate}, \mathsf{Par}, \mathsf{Fert})$ with components

  • $\mathsf{Gam}$, the set of underlying gametes,

  • $\mathsf{Mate}$, the set of zygotes formed by the fusion of egg gametes and sperm gametes,

  • $\mathsf{Par}$, a mapping from child gametes to parent zygotes, and

  • $\mathsf{Fert}$, a mapping from zygotes to fertilization time.

For convenience, given a gametic genealogy,

  • $\mathsf{Gam}_0$ denotes the set of egg gametes,

  • $\mathsf{Gam}_1$ denotes the set of sperm gametes, and

  • $\mathsf{Mate}_*$ denotes the mapping from gametes to the zygotes they formed during fertilization.

Formally, a gametic genealogy must satisfy the following conditions.

$\mathsf{Mate} \subset \mathsf{Gam}_0 \times \mathsf{Gam}_1$ where $\mathsf{Gam}_0 \cap \mathsf{Gam}_1 = \emptyset$, $\mathsf{Gam}_0 \cup \mathsf{Gam}_1 = \mathsf{Gam}$, and $\mathsf{Mate}$ forms a one-to-one mapping between $\mathsf{Gam}_0$ and $\mathsf{Gam}_1$.

$\mathsf{Par}$ is a function $C \mapsto \mathsf{Mate}$, where $C$ is a subset of $\mathsf{Gam}$ representing child gametes.

$\mathsf{Fert}$ is a function $\mathsf{Mate} \mapsto \mathbb{R}$ such that for all child gametes $g \in \operatorname{dom}\mathsf{Par}$, $\mathsf{Fert}(\mathsf{Mate}_*(g)) > \mathsf{Fert}(\mathsf{Par}(g))$.

Note that $\operatorname{dom}\mathsf{Par}$ denotes the domain of $\mathsf{Par}$, that is, the set of child gametes.
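For a finite genealogy, the three conditions above can be checked mechanically. The following Python sketch is only an illustration under stated assumptions: the function name `is_gametic_genealogy`, the encoding of zygotes as `(egg, sperm)` tuples, and the use of dictionaries for $\mathsf{Par}$ and $\mathsf{Fert}$ are choices made here, not part of the definition. It also assumes that $\mathsf{Gam}_0$ and $\mathsf{Gam}_1$ are recovered as the first and second components of the pairs in $\mathsf{Mate}$.

```python
from typing import Dict, Hashable, Set, Tuple

Gamete = Hashable
Zygote = Tuple[Gamete, Gamete]  # assumed encoding: (egg, sperm)


def is_gametic_genealogy(
    gam: Set[Gamete],
    mate: Set[Zygote],
    par: Dict[Gamete, Zygote],
    fert: Dict[Zygote, float],
) -> bool:
    """Check the conditions on (Gam, Mate, Par, Fert) for finite sets."""
    eggs = [egg for egg, _ in mate]
    sperms = [sperm for _, sperm in mate]
    gam0, gam1 = set(eggs), set(sperms)

    # Condition 1: Mate pairs eggs and sperm one-to-one, the two classes
    # are disjoint, and together they exhaust Gam.
    one_to_one = len(gam0) == len(eggs) and len(gam1) == len(sperms)
    if not one_to_one or (gam0 & gam1) or (gam0 | gam1) != gam:
        return False

    # Condition 2: Par maps a subset of Gam (the child gametes) into Mate.
    if not (set(par) <= gam and set(par.values()) <= mate):
        return False

    # Condition 3: Fert is defined on all of Mate, and each child gamete is
    # fertilized strictly later than the zygote that produced it.
    if set(fert) != mate:
        return False
    mate_star = {g: z for z in mate for g in z}  # Mate_*: gamete -> its zygote
    return all(fert[mate_star[g]] > fert[par[g]] for g in par)


# Hypothetical example: two founder zygotes and one zygote formed from
# a gamete of each.
mate = {("e0", "s0"), ("e1", "s1"), ("e2", "s2")}
gam = {"e0", "s0", "e1", "s1", "e2", "s2"}
par = {"e2": ("e0", "s0"), "s2": ("e1", "s1")}
fert = {("e0", "s0"): 0.0, ("e1", "s1"): 0.0, ("e2", "s2"): 1.0}
valid = is_gametic_genealogy(gam, mate, par, fert)
```

In this example `valid` is `True`; moving the fertilization time of `("e2", "s2")` to before that of its parent zygotes would violate the third condition.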

Gametic lineage space

A gametic lineage space is a mathematical formalism representing the lines of transmission of genetic information via gametes of a population over time. It is a triplet $(\mathsf{Loc}, G, \mathsf{Lin})$ where

  • $\mathsf{Loc}$ is the set of all genomic locations,

  • $G$ is a gametic genealogy $(\mathsf{Gam}, \mathsf{Mate}, \mathsf{Par}, \mathsf{Fert})$, and

  • $\mathsf{Lin}$ is a function $\mathsf{Loc} \times \mathsf{Gam} \mapsto 2^\mathsf{Gam}$ mapping a genomic position in a gamete to the set of gametes that transmitted genetic information to that position.

For every location $\ell \in \mathsf{Loc}$ and gamete $g \in \mathsf{Gam}$, $\mathsf{Lin}(\ell, g)$ is the lineage ending at gamete $g$ via locus $\ell$. When $g \in \operatorname{dom}\mathsf{Par}$, it must satisfy the condition $\mathsf{Lin}(\ell, g) = \{g\} \cup \mathsf{Lin}(\ell, \mathsf{Par}(g)_i)$ for either $i=0$ or $i=1$, where $\mathsf{Par}(g)_i$ is the $i$-th component of the parent zygote (the egg gamete for $i=0$, the sperm gamete for $i=1$); otherwise $\mathsf{Lin}(\ell, g) = \{g\}$.
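The recursive condition on $\mathsf{Lin}$ can be unrolled into a walk up the genealogy that picks one parental gamete per generation. The Python sketch below illustrates this under assumptions that are not part of the definition: zygotes are encoded as `(egg, sperm)` tuples, and a hypothetical dictionary `choice` records, for each locus and child gamete, which index $i$ transmitted that locus. The walk terminates for a finite genealogy because fertilization times strictly decrease along it.

```python
from typing import Dict, Hashable, Set, Tuple

Gamete = Hashable
Zygote = Tuple[Gamete, Gamete]  # assumed encoding: (egg, sperm), indices 0 and 1


def lineage(
    loc: Hashable,
    g: Gamete,
    par: Dict[Gamete, Zygote],
    choice: Dict[Tuple[Hashable, Gamete], int],
) -> Set[Gamete]:
    """Return Lin(loc, g) by following one parental gamete per generation.

    `choice[(loc, g)]` is a hypothetical input giving the index i (0 or 1)
    of the parental gamete Par(g)_i that transmitted locus `loc` to g.
    """
    lin = {g}
    while g in par:  # g is a child gamete, so it has a parent zygote
        g = par[g][choice[(loc, g)]]
        lin.add(g)
    return lin


# Hypothetical example: e2 descends from zygote (e1, s1), and e1 from (e0, s0).
par = {"e2": ("e1", "s1"), "e1": ("e0", "s0")}
choice = {("A", "e2"): 0, ("A", "e1"): 1, ("B", "e2"): 1}

lin_a = lineage("A", "e2", par, choice)  # passes through e1, then s0
lin_b = lineage("B", "e2", par, choice)  # passes through s1, a founder gamete
```

Here locus `"A"` and locus `"B"` of the same gamete `e2` trace back through different parental gametes, which is how recombination appears in this formalism.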

Example mathematical application

Given a sample of gametes $S$, define the genomic locations reached by an ancestral gamete $g$ as $R_S(g) := \left\{ \ell \in \mathsf{Loc} : \exists g' \in S \left( g \in \mathsf{Lin}(\ell, g') \right) \right\}$.

We conjecture that the set $\{ R_S(g) : g \in \mathsf{Gam} \}$ is the set of haplotype blocks defined in [1].
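For finite inputs, $R_S(g)$ and the family $\{ R_S(g) : g \in \mathsf{Gam} \}$ can be computed directly from the definition. The following Python sketch is an illustration only: the function names `reached` and `reached_family` are hypothetical, and $\mathsf{Lin}$ is assumed to be given as a dictionary keyed by `(location, gamete)` that covers at least the sampled gametes.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, Set, Tuple

# Assumed encoding of Lin: (location, gamete) -> set of gametes in the lineage.
LinMap = Dict[Tuple[Hashable, Hashable], Set[Hashable]]


def reached(
    g: Hashable,
    sample: Iterable[Hashable],
    locs: Iterable[Hashable],
    lin: LinMap,
) -> Set[Hashable]:
    """R_S(g): locations at which g lies on the lineage of some sampled gamete."""
    sample = list(sample)  # allow the sample to be iterated once per location
    return {l for l in locs if any(g in lin[(l, s)] for s in sample)}


def reached_family(
    gam: Iterable[Hashable],
    sample: Iterable[Hashable],
    locs: Iterable[Hashable],
    lin: LinMap,
) -> Set[FrozenSet[Hashable]]:
    """The set { R_S(g) : g in Gam }, with each R_S(g) stored as a frozenset."""
    sample, locs = list(sample), list(locs)
    return {frozenset(reached(g, sample, locs, lin)) for g in gam}


# Hypothetical example with two locations and a single sampled gamete e2.
gam = {"e0", "s0", "e1", "s1", "e2", "s2"}
locs = {"A", "B"}
sample = {"e2"}
lin = {("A", "e2"): {"e2", "e1", "s0"}, ("B", "e2"): {"e2", "s1"}}

r_s0 = reached("s0", sample, locs, lin)
family = reached_family(gam, sample, locs, lin)
```

In this example `r_s0` contains only location `"A"`, and `family` has four members: the empty set, `{"A"}`, `{"B"}`, and `{"A", "B"}`. The empty set appears because, as written, the definition ranges over all gametes, including those that are not ancestral to the sample.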


[1] Shipilina D, Stankowski S, Pal A, et al. (2022) On the origin and structure of haplotype blocks. Preprints.