## Gametic genealogy

A gametic genealogy is a mathematical formalization of a population's genealogy from the perspective of gametes. It is a quadruple $(\mathsf{Gam}, \mathsf{Mate}, \mathsf{Par}, \mathsf{Fert})$ with components

• $\mathsf{Gam}$, the set of underlying gametes,

• $\mathsf{Mate}$, the set of zygotes formed by the fusion of egg gametes and sperm gametes,

• $\mathsf{Par}$, a mapping from child gametes to parent zygotes, and

• $\mathsf{Fert}$, a mapping from zygotes to their fertilization times.

For convenience, given a gametic genealogy,

• $\mathsf{Gam}_0$ denotes the set of egg gametes,

• $\mathsf{Gam}_1$ denotes the set of sperm gametes, and

• $\mathsf{Mate}_*$ denotes the mapping from gametes to the zygotes they formed during fertilization.

Formally, a gametic genealogy must satisfy the following conditions.

$\mathsf{Mate}\subset \mathsf{Gam}_0 \times \mathsf{Gam}_1$, where $\mathsf{Gam}_0 \cap \mathsf{Gam}_1 = \emptyset$, $\mathsf{Gam}_0 \cup \mathsf{Gam}_1 = \mathsf{Gam}$, and $\mathsf{Mate}$ is a one-to-one correspondence (bijection) between $\mathsf{Gam}_0$ and $\mathsf{Gam}_1$, so that every gamete belongs to exactly one zygote.

$\mathsf{Par}$ is a function $C \to \mathsf{Mate}$, where $C$ is a subset of $\mathsf{Gam}$ representing child gametes.

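$\mathsf{Fert}$ is a function $\mathsf{Mate}\to \mathbb{R}$ such that for all child gametes $g \in \operatorname{dom}\mathsf{Par}$, $\mathsf{Fert}(\mathsf{Mate}_*(g)) > \mathsf{Fert}(\mathsf{Par}(g))$. That is, the zygote a child gamete forms is fertilized strictly later than the parent zygote that produced the gamete.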

Note that $\operatorname{dom}\mathsf{Par}$ denotes the domain of $\mathsf{Par}$, that is, the set of child gametes.
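The three conditions can be checked mechanically on a finite example. The following is a minimal Python sketch, not part of the formalism itself: the function name `is_gametic_genealogy`, the representation of zygotes as `(egg, sperm)` tuples, and the toy gamete labels are all assumptions made for illustration.

```python
def is_gametic_genealogy(gam0, gam1, mate, par, fert):
    """Return True if the arguments satisfy the three conditions above.

    gam0, gam1: sets of egg and sperm gametes
    mate: set of (egg, sperm) tuples, i.e. zygotes
    par: dict from child gamete to parent zygote
    fert: dict from zygote to fertilization time
    """
    # Condition 1: Gam_0 and Gam_1 are disjoint, Mate is a subset of
    # Gam_0 x Gam_1, and Mate pairs every egg with exactly one sperm.
    if gam0 & gam1:
        return False
    if any(e not in gam0 or s not in gam1 for e, s in mate):
        return False
    eggs = sorted(e for e, _ in mate)
    sperm = sorted(s for _, s in mate)
    if eggs != sorted(gam0) or sperm != sorted(gam1):
        return False

    # Mate_*: the zygote each gamete went on to form.
    mate_star = {g: z for z in mate for g in z}

    # Condition 2: Par maps a subset of Gam into Mate.
    if any(g not in mate_star or z not in mate for g, z in par.items()):
        return False

    # Condition 3: Fert is defined on all of Mate, and each child gamete is
    # fertilized strictly after its parent zygote was.
    if set(fert) != set(mate):
        return False
    return all(fert[mate_star[g]] > fert[par[g]] for g in par)


# Toy example (hypothetical labels): two founder zygotes each produce one
# gamete, and those two gametes fuse into a third zygote.
z_a, z_b, z_c = ("e0", "s0"), ("e1", "s1"), ("e2", "s2")
gam0 = {"e0", "e1", "e2"}
gam1 = {"s0", "s1", "s2"}
mate = {z_a, z_b, z_c}
par = {"e2": z_a, "s2": z_b}
fert = {z_a: 0.0, z_b: 0.0, z_c: 1.0}

print(is_gametic_genealogy(gam0, gam1, mate, par, fert))  # True

# Fertilizing the child zygote at the same time as its parents violates
# the strict inequality on Fert.
bad_fert = {**fert, z_c: 0.0}
print(is_gametic_genealogy(gam0, gam1, mate, par, bad_fert))  # False
```

In this sketch the founder gametes `e0`, `s0`, `e1`, and `s1` lie outside $\operatorname{dom}\mathsf{Par}$, so the condition on $\mathsf{Fert}$ places no constraint on them.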

## Gametic lineage space

A gametic lineage space is a mathematical formalism representing the lines of transmission of genetic information via gametes of a population over time. It is a triplet $(\mathsf{Loc}, G, \mathsf{Lin})$ where

• $\mathsf{Loc}$ is the set of all genomic locations,

• $G$ is a gametic genealogy $(\mathsf{Gam}, \mathsf{Mate}, \mathsf{Par}, \mathsf{Fert})$, and

• $\mathsf{Lin}$ is a function $\mathsf{Loc}\times \mathsf{Gam}\to 2^\mathsf{Gam}$ mapping each genomic location in each gamete to the set of gametes that transmitted genetic information to that location.

For every location $\ell \in \mathsf{Loc}$ and gamete $g \in \mathsf{Gam}$, $\mathsf{Lin}(\ell, g)$ is the lineage ending at gamete $g$ via location $\ell$. When $g \in \operatorname{dom}\mathsf{Par}$, it must satisfy $\mathsf{Lin}(\ell, g) = \{g\} \cup \mathsf{Lin}(\ell, \mathsf{Par}(g)_i)$ for either $i = 0$ or $i = 1$, where $\mathsf{Par}(g)_0$ and $\mathsf{Par}(g)_1$ are the egg and sperm gametes of the parent zygote $\mathsf{Par}(g)$; otherwise $\mathsf{Lin}(\ell, g) = \{g\}$.
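The recursive condition on $\mathsf{Lin}$ can be unrolled into a loop once it is known, for each location and child gamete, which parental gamete transmitted. The Python sketch below assumes this information is given as a hypothetical table `choice` mapping `(location, gamete)` to 0 (egg) or 1 (sperm); the function name `lineage` and the toy labels are likewise illustrative. In a finite genealogy the loop terminates, since the condition on $\mathsf{Fert}$ makes fertilization times strictly decrease along each step.

```python
def lineage(location, gamete, par, choice):
    """Compute Lin(location, gamete) by walking back through parent zygotes.

    par: dict from child gamete to parent zygote, an (egg, sperm) tuple
    choice: dict from (location, child gamete) to 0 or 1, selecting which
        gamete of the parent zygote transmitted at that location
        (a hypothetical encoding, assumed for this sketch)
    """
    lin = {gamete}
    # A gamete outside dom Par is a founder, so its lineage stops there.
    while gamete in par:
        i = choice[(location, gamete)]
        gamete = par[gamete][i]
        lin.add(gamete)
    return lin


# Toy example: founder zygotes z_a and z_b produce gametes e2 and s2,
# which fuse into z_c; z_c then produces gamete e3.
z_a, z_b, z_c = ("e0", "s0"), ("e1", "s1"), ("e2", "s2")
par = {"e2": z_a, "s2": z_b, "e3": z_c}

# At location 0 every child gamete inherits from the egg of its parent
# zygote; at location 1 it inherits from the sperm.
choice = {
    (0, "e3"): 0, (1, "e3"): 1,
    (0, "e2"): 0, (1, "e2"): 1,
    (0, "s2"): 0, (1, "s2"): 1,
}

print(sorted(lineage(0, "e3", par, choice)))  # ['e0', 'e2', 'e3']
print(sorted(lineage(1, "e3", par, choice)))  # ['e3', 's1', 's2']
print(sorted(lineage(0, "e0", par, choice)))  # ['e0']
```

The two lineages ending at `e3` share only `e3` itself, which is what one would expect when the two locations were inherited from different parental gametes.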

## Example mathematical application

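Given a sample of gametes $S \subseteq \mathsf{Gam}$, define the set of genomic locations reached by a gamete $g \in \mathsf{Gam}$ as $R_S(g) := \left\{ \ell \in \mathsf{Loc}: \exists g' \in S \left( g \in \mathsf{Lin}(\ell, g') \right) \right\}$. In words, $R_S(g)$ collects the locations at which $g$ lies on the lineage of at least one sampled gamete.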

We conjecture that the set $\{ R_S(g) : g \in \mathsf{Gam}\}$ is the set of haplotype blocks defined in Shipilina et al. [1].
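As a small illustration of the definition of $R_S(g)$, the following Python sketch computes the family $\{ R_S(g) \}$ on a toy example. It does not test the conjecture. The function name `reached`, the hand-written table `lin` standing in for $\mathsf{Lin}$, and the gamete labels are assumptions; only the gametes relevant to the sample are listed.

```python
def reached(g, sample, loci, lin):
    """Compute R_S(g): the locations at which g lies on the lineage of
    at least one sampled gamete.

    lin: dict from (location, gamete) to the lineage set Lin(location, gamete)
    """
    return frozenset(
        loc for loc in loci
        if any(g in lin[(loc, s)] for s in sample)
    )


# Toy example: sampled gamete "c" was produced by the zygote ("e", "s").
# It inherited locations 0 and 1 from the egg "e" and location 2 from
# the sperm "s".
loci = [0, 1, 2]
gametes = ["c", "e", "s"]
sample = {"c"}

lin = {}
for loc in loci:
    # Founder gametes have trivial lineages.
    lin[(loc, "e")] = {"e"}
    lin[(loc, "s")] = {"s"}
lin[(0, "c")] = {"c", "e"}
lin[(1, "c")] = {"c", "e"}
lin[(2, "c")] = {"c", "s"}

blocks = {reached(g, sample, loci, lin) for g in gametes}
for block in sorted(blocks, key=sorted):
    print(sorted(block))
```

Here the family consists of the full set of locations (reached by `c`), the pair of locations 0 and 1 (reached by `e`), and the single location 2 (reached by `s`).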

1. Shipilina D, Stankowski S, Pal A, Chan YF, Barton N. On the origin and structure of haplotype blocks. Preprints; 2022 Feb. doi:10.22541/au.164425910.09070763/v1