Intro

The predCrossVar package contains a complete set of functions for the prediction of additive and dominance genetic variances and co-variances among full-siblings based on parents, following (C. Lehermeier et al. 2017; Christina Lehermeier, Teyssèdre, and Schön 2017). predCrossVar enables the prediction of genetic variance on a multi-trait selection index. Built for diploid organisms with phased, chromosome- or linkage-group ordered biallelic marker data, and a centimorgan-scale genetic map.

Predicting genetic variance

Summarize formula for predicting genetic variance in crosses. Thanks to those whose work I have learned from and built upon.

  • Lehermeier

  • Allier UCPC

  • Neyhart

  • Bijma et al.

  • Lynch & Walsh 1998

  • Also in silico as in PopVar

  • Wolfe et al. Genomic mate selection MS

Notice: Package dev. status

At this early development stage, package is unforgiving. There are not many arguments with defaults. Input formats are pretty specific, but well defined.

Especially in the case of many markers and using parallel processing, memory consumption can be potentially very large and needs to be considered. Hopefully there is sig. room for improvement in this area.

Example data

I simulated 100 diploid individuals with 2 chromosomes, 10 QTL each, using the AlphaSimR library. Two correlated traits with additive and dominance effects. Main use is to exemplify the input formats and usage of predCrossVar functions.

Inputs

To predict cross variance, 3 types of input are required:

  1. Phased Parental Haplotypes
  2. Genetic Map (or pairwise-recombination frequencies)
  3. Marker Effects

We start with the included example simulation data. 100 individuals, 2 chromosomes with 10 QTL/chr.

Genetic map

The usual starting place for recombination frequency is a centimorgan-scale genetic map.

data(genmap)
str(genmap)
#>  Named num [1:20] 0.0816 0.102 0.2653 0.3061 0.551 ...
#>  - attr(*, "names")= chr [1:20] "1_5" "1_6" "1_14" "1_16" ...
# utility function to compute matrix of pairwise recombination frequencies. 
recombFreqMat<-genmap2recombfreq(genmap,nChr = 2)
dim(recombFreqMat)
#> [1] 20 20

In practice, these matrices get big as they have dimension \(N_{snp} \times N_{snp}\). It is recommended to precompute and store the recombFreqMat.

In predicting genetic variance in the \(F_1\) cross, the recombFreqMat the matrix \(1-(2 \times recombFreqMat)\) is computed. To avoid having to do that separately for each family being predicted, I designed the prediction functions to require \(1-(2 \times recombFreqMat)\) as input, to save a great deal of computation. Therefore pre-compute it and store that on disk. Probably this can be improved on!

recombFreqMat<-1-(2*recombFreqMat)
recombFreqMat[1:5,1:5]
#>            1_5       1_6      1_14      1_16      1_28
#> 1_5  1.0000000 0.9600054 0.6925693 0.6382791 0.3911064
#> 1_6  0.9600054 1.0000000 0.7214223 0.6648703 0.4074002
#> 1_14 0.6925693 0.7214223 1.0000000 0.9216104 0.5647181
#> 1_16 0.6382791 0.6648703 0.9216104 1.0000000 0.6127514
#> 1_28 0.3911064 0.4074002 0.5647181 0.6127514 1.0000000

Phased parental haplotypes

The example dataset haplos contains 2 chrom. with 10 loci each (columns) for 100 individuals (200 rows).

data("haploMat")
dim(haploMat)
#> [1] 200  20
haploMat[1:5,1:10]
#>        1_5 1_6 1_14 1_16 1_28 1_31 1_33 1_35 1_40 1_41
#> 1_HapA   1   0    1    0    1    1    0    1    0    1
#> 1_HapB   1   0    1    0    0    1    0    0    1    0
#> 2_HapA   0   0    0    0    1    0    0    0    0    0
#> 2_HapB   1   0    0    1    1    0    1    1    1    0
#> 3_HapA   0   0    1    1    0    1    0    1    0    0

Marker effects

The example data come in the form of an list of row-matrices, which is useful for multivariate analyses we will get into later. There is also an example DomEffectsList available to load later.

data("AddEffectsList")
str(AddEffectsList)
#> List of 2
#>  $ Trait1: num [1, 1:20] 0.218 0.03 0.232 0.343 -0.114 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : NULL
#>   .. ..$ : chr [1:20] "1_5" "1_6" "1_14" "1_16" ...
#>  $ Trait2: num [1, 1:20] -0.159 -0.297 -0.299 0.237 -0.716 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : NULL
#>   .. ..$ : chr [1:20] "1_5" "1_6" "1_14" "1_16" ...

Single-trait predictions

The first set of functions we will use are simplest. They only handle single traits. No covariance predictions.

Predict one cross

To start, predict a single cross between parents “1” and “2”

# For the univariate case, input effects should be a column matrix.
addEffectsT1<-t(AddEffectsList$Trait1)
addEffectsT2<-t(AddEffectsList$Trait2)
dim(addEffectsT1)
#> [1] 20  1
# Trait 1
predCrossVarA(sireID = "1", damID = "2", 
              addEffects = addEffectsT1,
              haploMat = haploMat,
              recombFreqMat = recombFreqMat)
#> [1] "Variances predicted for family: 1x2- took 0.001 mins"
#> # A tibble: 1 x 3
#>   varA[,1] segsnps    computetime
#>      <dbl> <list>           <dbl>
#> 1    0.557 <chr [18]>       0.035
# Trait 2
predCrossVarA(sireID = "1", damID = "2", 
              addEffects = addEffectsT2,
              haploMat = haploMat,
              recombFreqMat = recombFreqMat)
#> [1] "Variances predicted for family: 1x2- took 0.001 mins"
#> # A tibble: 1 x 3
#>   varA[,1] segsnps    computetime
#>      <dbl> <list>           <dbl>
#> 1    0.502 <chr [18]>       0.047

Predict several crosses

Multiple families can be predicted, with an option for parallel processing across families with the runCrossVarPredsA() (additive-only) and runCrossVarPredsAD() (additive-plus-dominance) functions.

For example, we will pick three parents of interest.

There is a helper function crosses2predict() to make a list of all pairwise matings given an input vecot of parent IDs. Includes self-crosses and the upper-triangle of a mating matrix (i.e. no reciprocal crosses).

ped2predict<-crosses2predict(parents = c("1","2","11"))
ped2predict
#> # A tibble: 6 x 2
#>   sireID damID
#>   <chr>  <chr>
#> 1 1      1    
#> 2 1      2    
#> 3 1      11   
#> 4 2      2    
#> 5 2      11   
#> 6 11     11
# Trait 1 - serial (DEFAULT, ncores = 1)
runCrossVarPredsA(ped = ped2predict,
                  addEffects = addEffectsT1,
                  haploMat = haploMat,
                  recombFreqMat = recombFreqMat)
#> Loading required package: furrr
#> Loading required package: future
#> [1] "Variances predicted for family: 1x1- took 0.001 mins"
#> [1] "Variances predicted for family: 1x2- took 0.001 mins"
#> [1] "Variances predicted for family: 1x11- took 0.001 mins"
#> [1] "Variances predicted for family: 2x2- took 0.001 mins"
#> [1] "Variances predicted for family: 2x11- took 0.001 mins"
#> [1] "Variances predicted for family: 11x11- took 0.001 mins"
#> [1] "Done predicting fam vars. Took 0.01 mins for 6 families"
#> # A tibble: 6 x 5
#>   sireID damID varA[,1] segsnps    computetime
#>   <chr>  <chr>    <dbl> <list>           <dbl>
#> 1 1      1        0.617 <chr [9]>       0.0580
#> 2 1      2        0.557 <chr [18]>      0.0500
#> 3 1      11       0.410 <chr [18]>      0.052 
#> 4 2      2        0.497 <chr [12]>      0.052 
#> 5 2      11       0.349 <chr [19]>      0.0550
#> 6 11     11       0.202 <chr [11]>      0.055

Multi-core across families

# Trait 1 - parallel (ncores = 6)
runCrossVarPredsA(ped = ped2predict,
                  addEffects = addEffectsT1,
                  haploMat = haploMat,
                  recombFreqMat = recombFreqMat,
                  ncores = 6)
#> [1] "Variances predicted for family: 1x1- took 0.001 mins"
#> [1] "Variances predicted for family: 1x2- took 0.001 mins"
#> [1] "Variances predicted for family: 1x11- took 0.001 mins"
#> [1] "Variances predicted for family: 2x2- took 0.001 mins"
#> [1] "Variances predicted for family: 2x11- took 0.001 mins"
#> [1] "Variances predicted for family: 11x11- took 0.001 mins"
#> [1] "Done predicting fam vars. Took 0.02 mins for 6 families"
#> # A tibble: 6 x 5
#>   sireID damID varA[,1] segsnps    computetime
#>   <chr>  <chr>    <dbl> <list>           <dbl>
#> 1 1      1        0.617 <chr [9]>       0.03  
#> 2 1      2        0.557 <chr [18]>      0.03  
#> 3 1      11       0.410 <chr [18]>      0.031 
#> 4 2      2        0.497 <chr [12]>      0.0360
#> 5 2      11       0.349 <chr [19]>      0.032 
#> 6 11     11       0.202 <chr [11]>      0.032
# Trait 2 - parallel (ncores = 6)
runCrossVarPredsA(ped = ped2predict,
                  addEffects = addEffectsT2,
                  haploMat = haploMat,
                  recombFreqMat = recombFreqMat,
                  ncores = 6)
#> [1] "Variances predicted for family: 1x1- took 0 mins"
#> [1] "Variances predicted for family: 1x2- took 0 mins"
#> [1] "Variances predicted for family: 1x11- took 0.001 mins"
#> [1] "Variances predicted for family: 2x2- took 0.001 mins"
#> [1] "Variances predicted for family: 2x11- took 0.001 mins"
#> [1] "Variances predicted for family: 11x11- took 0.001 mins"
#> [1] "Done predicting fam vars. Took 0.02 mins for 6 families"
#> # A tibble: 6 x 5
#>   sireID damID varA[,1] segsnps    computetime
#>   <chr>  <chr>    <dbl> <list>           <dbl>
#> 1 1      1        0.637 <chr [9]>       0.0300
#> 2 1      2        0.502 <chr [18]>      0.028 
#> 3 1      11       0.505 <chr [18]>      0.0330
#> 4 2      2        0.367 <chr [12]>      0.033 
#> 5 2      11       0.370 <chr [19]>      0.042 
#> 6 11     11       0.373 <chr [11]>      0.0350

Additive-plus-dominance

data("DomEffectsList")
domEffectsT1<-t(DomEffectsList$Trait1)
domEffectsT2<-t(DomEffectsList$Trait2)
# Trait 1
runCrossVarPredsAD(ped = ped2predict,
                  addEffects = addEffectsT1,
                  domEffects = domEffectsT1,
                  haploMat = haploMat,
                  recombFreqMat = recombFreqMat,
                  ncores = 6)
#> [1] "Variances predicted for family: 1x1- took 0 mins"
#> [1] "Variances predicted for family: 1x2- took 0.001 mins"
#> [1] "Variances predicted for family: 1x11- took 0 mins"
#> [1] "Variances predicted for family: 2x2- took 0.001 mins"
#> [1] "Variances predicted for family: 2x11- took 0.001 mins"
#> [1] "Variances predicted for family: 11x11- took 0 mins"
#> [1] "Done predicting fam vars. Took 0.02 mins for 6 families"
#> # A tibble: 6 x 6
#>   sireID damID varA[,1] varD[,1] segsnps    computetime
#>   <chr>  <chr>    <dbl>    <dbl> <list>           <dbl>
#> 1 1      1        0.617    0.349 <chr [9]>       0.029 
#> 2 1      2        0.557    0.363 <chr [18]>      0.03  
#> 3 1      11       0.410    0.266 <chr [18]>      0.029 
#> 4 2      2        0.497    0.497 <chr [12]>      0.03  
#> 5 2      11       0.349    0.355 <chr [19]>      0.034 
#> 6 11     11       0.202    0.349 <chr [11]>      0.0270
# Trait 2
runCrossVarPredsAD(ped = ped2predict,
                  addEffects = addEffectsT1,
                  domEffects = domEffectsT1,
                  haploMat = haploMat,
                  recombFreqMat = recombFreqMat,
                  ncores = 6)
#> [1] "Variances predicted for family: 1x1- took 0 mins"
#> [1] "Variances predicted for family: 1x2- took 0 mins"
#> [1] "Variances predicted for family: 1x11- took 0.001 mins"
#> [1] "Variances predicted for family: 2x2- took 0.001 mins"
#> [1] "Variances predicted for family: 2x11- took 0.001 mins"
#> [1] "Variances predicted for family: 11x11- took 0 mins"
#> [1] "Done predicting fam vars. Took 0.02 mins for 6 families"
#> # A tibble: 6 x 6
#>   sireID damID varA[,1] varD[,1] segsnps    computetime
#>   <chr>  <chr>    <dbl>    <dbl> <list>           <dbl>
#> 1 1      1        0.617    0.349 <chr [9]>       0.0280
#> 2 1      2        0.557    0.363 <chr [18]>      0.029 
#> 3 1      11       0.410    0.266 <chr [18]>      0.03  
#> 4 2      2        0.497    0.497 <chr [12]>      0.0310
#> 5 2      11       0.349    0.355 <chr [19]>      0.03  
#> 6 11     11       0.202    0.349 <chr [11]>      0.0290

Multiple-trait predictions

(including covariances)

predOneCrossVarA() -> predCrossVarsA() -> runMtCrossVarPredsA()

runMtCrossVarPredsA() and runMtCrossVarPredsAD() should supersede in function all the others.

NOTES:

  • Arguments names are pretty opinionated and long, although not without good intention. Might be worth simplifying.

mtpred<-runMtCrossVarPredsAD(CrossesToPredict = ped2predict,
                             AddEffectList = AddEffectsList,
                             DomEffectList = DomEffectsList,
                             haploMat = haploMat,
                             recombFreqMat = recombFreqMat, 
                             ncores = 6)
#> [1] "Variances predicted for family: 1x1- took 0 mins"
#> [1] "Variances predicted for family: 1x2- took 0.001 mins"
#> [1] "Variances predicted for family: 1x11- took 0.001 mins"
#> [1] "Variances predicted for family: 2x2- took 0.001 mins"
#> [1] "Variances predicted for family: 2x11- took 0.001 mins"
#> [1] "Variances predicted for family: 11x11- took 0.001 mins"
#> [1] "Done predicting fam vars. Took 0.05 mins for 6 families"
#> [1] "Variances predicted for family: 1x1- took 0.001 mins"
#> [1] "Variances predicted for family: 1x2- took 0.001 mins"
#> [1] "Variances predicted for family: 1x11- took 0 mins"
#> [1] "Variances predicted for family: 2x2- took 0.001 mins"
#> [1] "Variances predicted for family: 2x11- took 0.001 mins"
#> [1] "Variances predicted for family: 11x11- took 0.001 mins"
#> [1] "Done predicting fam vars. Took 0.05 mins for 6 families"
#> [1] "Variances predicted for family: 1x1- took 0.001 mins"
#> [1] "Variances predicted for family: 1x2- took 0.001 mins"
#> [1] "Variances predicted for family: 1x11- took 0.001 mins"
#> [1] "Variances predicted for family: 2x2- took 0.001 mins"
#> [1] "Variances predicted for family: 2x11- took 0.001 mins"
#> [1] "Variances predicted for family: 11x11- took 0 mins"
#> [1] "Done predicting fam vars. Took 0.05 mins for 6 families"

The results are a nested list structure. Let’s unpack them…

library(tidyverse); library(magrittr)
#> ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
#> ✓ ggplot2 3.3.2     ✓ dplyr   1.0.2
#> ✓ tibble  3.0.4     ✓ stringr 1.4.0
#> ✓ tidyr   1.1.2     ✓ forcats 0.5.0
#> ✓ readr   1.4.0
#> ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
#> x dplyr::filter() masks stats::filter()
#> x dplyr::lag()    masks stats::lag()
#> 
#> Attaching package: 'magrittr'
#> The following object is masked from 'package:tidyr':
#> 
#>     extract
#> The following object is masked from 'package:purrr':
#> 
#>     set_names
mtpred
#> $varcovars
#> # A tibble: 3 x 3
#>   Trait1 Trait2 varcomps        
#>   <chr>  <chr>  <list>          
#> 1 Trait1 Trait1 <named list [2]>
#> 2 Trait2 Trait2 <named list [2]>
#> 3 Trait1 Trait2 <named list [2]>
#> 
#> $totcomputetime
#> elapsed 
#>   9.268

The list element “varcovars”

mtpred$varcovars %>% 
  tidyr::unnest_wider(varcomps)
#> # A tibble: 3 x 4
#>   Trait1 Trait2 predictedfamvars totcomputetime
#>   <chr>  <chr>  <list>                    <dbl>
#> 1 Trait1 Trait1 <tibble [6 × 3]>           2.90
#> 2 Trait2 Trait2 <tibble [6 × 3]>           3.07
#> 3 Trait1 Trait2 <tibble [6 × 3]>           3.29

Unpack varcovars to get at the variance and covariance predictions for additive and dominances.

Variances are in column VPM. Output is in long-form.

mtpred$varcovars %>% 
  tidyr::unnest_wider(varcomps) %>% 
  dplyr::select(-totcomputetime) %>% 
  tidyr::unnest(predictedfamvars) %>% 
  tidyr::unnest(predVars) %>% head
#> # A tibble: 6 x 9
#>   Trait1 Trait2 sireID damID VarComp   VPM PMV   Nsegsnps totcomputetime
#>   <chr>  <chr>  <chr>  <chr> <chr>   <dbl> <lgl>    <int>          <dbl>
#> 1 Trait1 Trait1 1      1     VarA    0.617 NA           9          0.028
#> 2 Trait1 Trait1 1      1     VarD    0.349 NA          NA         NA    
#> 3 Trait1 Trait1 1      2     VarA    0.557 NA          18          0.032
#> 4 Trait1 Trait1 1      2     VarD    0.363 NA          NA         NA    
#> 5 Trait1 Trait1 1      11    VarA    0.410 NA          18          0.033
#> 6 Trait1 Trait1 1      11    VarD    0.266 NA          NA         NA

Cut out the extra columns and spread the additive-vs-dominance across columns, just to tidy up

mtpred$varcovars %>% 
  tidyr::unnest_wider(varcomps) %>% 
  dplyr::select(-totcomputetime) %>% 
  tidyr::unnest(predictedfamvars) %>% 
  tidyr::unnest(predVars) %>% 
  dplyr::select(-PMV,-Nsegsnps,-totcomputetime) %>% 
  spread(VarComp,VPM) %>% rmarkdown::paged_table()

Note that variance results Trait1==Trait2 should match the single-trait results and co-variance predictions are now included.

Lastly, let’s quickly verify that the additive-only function works and that the runMt___ functions won’t errory if we give just one cross / trait.

mtpred_A<-runMtCrossVarPredsA(CrossesToPredict = ped2predict,
                              AlleleSubEffectList = AddEffectsList,
                              haploMat = haploMat,
                              recombFreqMat = recombFreqMat, 
                              ncores = 6)
#> [1] "Variances predicted for family: 1x1- took 0.001 mins"
#> [1] "Variances predicted for family: 1x2- took 0.001 mins"
#> [1] "Variances predicted for family: 1x11- took 0.001 mins"
#> [1] "Variances predicted for family: 2x2- took 0.001 mins"
#> [1] "Variances predicted for family: 2x11- took 0.001 mins"
#> [1] "Variances predicted for family: 11x11- took 0.001 mins"
#> [1] "Done predicting fam vars. Took 0.05 mins for 6 families"
#> [1] "Variances predicted for family: 1x1- took 0.001 mins"
#> [1] "Variances predicted for family: 1x2- took 0.001 mins"
#> [1] "Variances predicted for family: 1x11- took 0 mins"
#> [1] "Variances predicted for family: 2x2- took 0.001 mins"
#> [1] "Variances predicted for family: 2x11- took 0.001 mins"
#> [1] "Variances predicted for family: 11x11- took 0.001 mins"
#> [1] "Done predicting fam vars. Took 0.05 mins for 6 families"
#> [1] "Variances predicted for family: 1x1- took 0 mins"
#> [1] "Variances predicted for family: 1x2- took 0 mins"
#> [1] "Variances predicted for family: 1x11- took 0.001 mins"
#> [1] "Variances predicted for family: 2x2- took 0.001 mins"
#> [1] "Variances predicted for family: 2x11- took 0.001 mins"
#> [1] "Variances predicted for family: 11x11- took 0 mins"
#> [1] "Done predicting fam vars. Took 0.05 mins for 6 families"
mtpred_A$varcovars %>% 
  tidyr::unnest_wider(varcomps) %>% 
  dplyr::select(-totcomputetime) %>% 
  tidyr::unnest(predictedfamvars) %>% 
  tidyr::unnest(predVars) %>% 
  rmarkdown::paged_table()
st_addefflist<-AddEffectsList
st_addefflist$Trait2<-NULL
stpred_sc<-runMtCrossVarPredsA(CrossesToPredict = ped2predict[1,],
                               AlleleSubEffectList = st_addefflist,
                               haploMat = haploMat,
                               recombFreqMat = recombFreqMat,
                               ncores = 6)
#> [1] "Variances predicted for family: 1x1- took 0.001 mins"
#> [1] "Done predicting fam vars. Took 0.04 mins for 1 families"
stpred_sc$varcovars %>% 
  tidyr::unnest_wider(varcomps) %>% 
  dplyr::select(-totcomputetime) %>% 
  tidyr::unnest(predictedfamvars) %>% 
  tidyr::unnest(predVars) %>% 
  rmarkdown::paged_table()

A note on marker effects input

data("AddEffectsList")
str(AddEffectsList)
#> List of 2
#>  $ Trait1: num [1, 1:20] 0.218 0.03 0.232 0.343 -0.114 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : NULL
#>   .. ..$ : chr [1:20] "1_5" "1_6" "1_14" "1_16" ...
#>  $ Trait2: num [1, 1:20] -0.159 -0.297 -0.299 0.237 -0.716 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : NULL
#>   .. ..$ : chr [1:20] "1_5" "1_6" "1_14" "1_16" ...
  • List of allele substitution effects.

  • Each list element is a matrix. One matrix per trait.

  • Each element of the list is named with a string identifying the trait and the colnames of each matrix are labeled with snpIDs.

  • Dimensions of each matrix of marker effects

    • \(1 \times N_{snp}\): Many users will have marker effects estimated with REML or only want to use the posterior mean marker effects from a Bayesian marker-regression. For these users, the argument predType=="VPM" should be used and each matrix of the AlleleSubEffectList will have only one row.
    • \(\frac{N_{iter} - N_{burnIn}}{N_{thin}} \times N_{snp}\): When predicting the Posterior Mean Variance (PMV), use argument predType=="PMV" when computing the predicted variance across a sample of marker effects, e.g. the thinned MCMC samples, usually stored on disk.

References

Lehermeier, C., G. de los Campos, V. Wimmer, and C.-C. Schön. 2017. “Genomic Variance Estimates: With or Without Disequilibrium Covariances?” Journal of Animal Breeding and Genetics 134 (3): 232–41. https://doi.org/10.1111/jbg.12268.
Lehermeier, Christina, Simon Teyssèdre, and Chris-Carolin Schön. 2017. “Genetic Gain Increases by Applying the Usefulness Criterion with Improved Variance Prediction in Selection of Crosses.” Genetics, October, genetics.300403.2017. https://doi.org/10.1534/genetics.117.300403.