The predCrossVar package contains a complete set of functions for the prediction of additive and dominance genetic variances and co-variances among full-siblings based on parents, following (C. Lehermeier et al. 2017; Christina Lehermeier, Teyssèdre, and Schön 2017). predCrossVar enables the prediction of genetic variance on a multi-trait selection index. Built for diploid organisms with phased, chromosome- or linkage-group ordered biallelic marker data, and a centimorgan-scale genetic map.
At this early development stage, package is unforgiving. There are not many arguments with defaults. Input formats are pretty specific, but well defined.
Especially in the case of many markers and using parallel processing, memory consumption can be potentially very large and needs to be considered. Hopefully there is sig. room for improvement in this area.
I simulated 100 diploid individuals with 2 chromosomes, 10 QTL each, using the AlphaSimR library. Two correlated traits with additive and dominance effects. Main use is to exemplify the input formats and usage of predCrossVar functions.
To predict cross variance, 3 types of input are required:
We start with the included example simulation data. 100 individuals, 2 chromosomes with 10 QTL/chr.
The usual starting place for recombination frequency is a centimorgan-scale genetic map.
data(genmap)
str(genmap)
#> Named num [1:20] 0.0816 0.102 0.2653 0.3061 0.551 ...
#> - attr(*, "names")= chr [1:20] "1_5" "1_6" "1_14" "1_16" ...
# utility function to compute matrix of pairwise recombination frequencies.
recombFreqMat<-genmap2recombfreq(genmap,nChr = 2)
dim(recombFreqMat)
#> [1] 20 20
In practice, these matrices get big as they have dimension \(N_{snp} \times N_{snp}\). It is recommended to precompute and store the recombFreqMat
.
In predicting genetic variance in the \(F_1\) cross, the recombFreqMat
the matrix \(1-(2 \times recombFreqMat)\) is computed. To avoid having to do that separately for each family being predicted, I designed the prediction functions to require \(1-(2 \times recombFreqMat)\) as input, to save a great deal of computation. Therefore pre-compute it and store that on disk. Probably this can be improved on!
recombFreqMat<-1-(2*recombFreqMat)
recombFreqMat[1:5,1:5]
#> 1_5 1_6 1_14 1_16 1_28
#> 1_5 1.0000000 0.9600054 0.6925693 0.6382791 0.3911064
#> 1_6 0.9600054 1.0000000 0.7214223 0.6648703 0.4074002
#> 1_14 0.6925693 0.7214223 1.0000000 0.9216104 0.5647181
#> 1_16 0.6382791 0.6648703 0.9216104 1.0000000 0.6127514
#> 1_28 0.3911064 0.4074002 0.5647181 0.6127514 1.0000000
The example dataset haplos contains 2 chrom. with 10 loci each (columns) for 100 individuals (200 rows).
haploMat[1:5,1:10]
#> 1_5 1_6 1_14 1_16 1_28 1_31 1_33 1_35 1_40 1_41
#> 1_HapA 1 0 1 0 1 1 0 1 0 1
#> 1_HapB 1 0 1 0 0 1 0 0 1 0
#> 2_HapA 0 0 0 0 1 0 0 0 0 0
#> 2_HapB 1 0 0 1 1 0 1 1 1 0
#> 3_HapA 0 0 1 1 0 1 0 1 0 0
The example data come in the form of an list of row-matrices, which is useful for multivariate analyses we will get into later. There is also an example DomEffectsList available to load later.
data("AddEffectsList")
str(AddEffectsList)
#> List of 2
#> $ Trait1: num [1, 1:20] 0.218 0.03 0.232 0.343 -0.114 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : NULL
#> .. ..$ : chr [1:20] "1_5" "1_6" "1_14" "1_16" ...
#> $ Trait2: num [1, 1:20] -0.159 -0.297 -0.299 0.237 -0.716 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : NULL
#> .. ..$ : chr [1:20] "1_5" "1_6" "1_14" "1_16" ...
The first set of functions we will use are simplest. They only handle single traits. No covariance predictions.
To start, predict a single cross between parents “1” and “2”
# For the univariate case, input effects should be a column matrix.
addEffectsT1<-t(AddEffectsList$Trait1)
addEffectsT2<-t(AddEffectsList$Trait2)
dim(addEffectsT1)
#> [1] 20 1
# Trait 1
predCrossVarA(sireID = "1", damID = "2",
addEffects = addEffectsT1,
haploMat = haploMat,
recombFreqMat = recombFreqMat)
#> [1] "Variances predicted for family: 1x2- took 0.001 mins"
#> # A tibble: 1 x 3
#> varA[,1] segsnps computetime
#> <dbl> <list> <dbl>
#> 1 0.557 <chr [18]> 0.035
# Trait 2
predCrossVarA(sireID = "1", damID = "2",
addEffects = addEffectsT2,
haploMat = haploMat,
recombFreqMat = recombFreqMat)
#> [1] "Variances predicted for family: 1x2- took 0.001 mins"
#> # A tibble: 1 x 3
#> varA[,1] segsnps computetime
#> <dbl> <list> <dbl>
#> 1 0.502 <chr [18]> 0.047
Multiple families can be predicted, with an option for parallel processing across families with the runCrossVarPredsA()
(additive-only) and runCrossVarPredsAD()
(additive-plus-dominance) functions.
For example, we will pick three parents of interest.
There is a helper function crosses2predict()
to make a list of all pairwise matings given an input vecot of parent IDs. Includes self-crosses and the upper-triangle of a mating matrix (i.e. no reciprocal crosses).
ped2predict<-crosses2predict(parents = c("1","2","11"))
ped2predict
#> # A tibble: 6 x 2
#> sireID damID
#> <chr> <chr>
#> 1 1 1
#> 2 1 2
#> 3 1 11
#> 4 2 2
#> 5 2 11
#> 6 11 11
# Trait 1 - serial (DEFAULT, ncores = 1)
runCrossVarPredsA(ped = ped2predict,
addEffects = addEffectsT1,
haploMat = haploMat,
recombFreqMat = recombFreqMat)
#> Loading required package: furrr
#> Loading required package: future
#> [1] "Variances predicted for family: 1x1- took 0.001 mins"
#> [1] "Variances predicted for family: 1x2- took 0.001 mins"
#> [1] "Variances predicted for family: 1x11- took 0.001 mins"
#> [1] "Variances predicted for family: 2x2- took 0.001 mins"
#> [1] "Variances predicted for family: 2x11- took 0.001 mins"
#> [1] "Variances predicted for family: 11x11- took 0.001 mins"
#> [1] "Done predicting fam vars. Took 0.01 mins for 6 families"
#> # A tibble: 6 x 5
#> sireID damID varA[,1] segsnps computetime
#> <chr> <chr> <dbl> <list> <dbl>
#> 1 1 1 0.617 <chr [9]> 0.0580
#> 2 1 2 0.557 <chr [18]> 0.0500
#> 3 1 11 0.410 <chr [18]> 0.052
#> 4 2 2 0.497 <chr [12]> 0.052
#> 5 2 11 0.349 <chr [19]> 0.0550
#> 6 11 11 0.202 <chr [11]> 0.055
# Trait 1 - parallel (ncores = 6)
runCrossVarPredsA(ped = ped2predict,
addEffects = addEffectsT1,
haploMat = haploMat,
recombFreqMat = recombFreqMat,
ncores = 6)
#> [1] "Variances predicted for family: 1x1- took 0.001 mins"
#> [1] "Variances predicted for family: 1x2- took 0.001 mins"
#> [1] "Variances predicted for family: 1x11- took 0.001 mins"
#> [1] "Variances predicted for family: 2x2- took 0.001 mins"
#> [1] "Variances predicted for family: 2x11- took 0.001 mins"
#> [1] "Variances predicted for family: 11x11- took 0.001 mins"
#> [1] "Done predicting fam vars. Took 0.02 mins for 6 families"
#> # A tibble: 6 x 5
#> sireID damID varA[,1] segsnps computetime
#> <chr> <chr> <dbl> <list> <dbl>
#> 1 1 1 0.617 <chr [9]> 0.03
#> 2 1 2 0.557 <chr [18]> 0.03
#> 3 1 11 0.410 <chr [18]> 0.031
#> 4 2 2 0.497 <chr [12]> 0.0360
#> 5 2 11 0.349 <chr [19]> 0.032
#> 6 11 11 0.202 <chr [11]> 0.032
# Trait 2 - parallel (ncores = 6)
runCrossVarPredsA(ped = ped2predict,
addEffects = addEffectsT2,
haploMat = haploMat,
recombFreqMat = recombFreqMat,
ncores = 6)
#> [1] "Variances predicted for family: 1x1- took 0 mins"
#> [1] "Variances predicted for family: 1x2- took 0 mins"
#> [1] "Variances predicted for family: 1x11- took 0.001 mins"
#> [1] "Variances predicted for family: 2x2- took 0.001 mins"
#> [1] "Variances predicted for family: 2x11- took 0.001 mins"
#> [1] "Variances predicted for family: 11x11- took 0.001 mins"
#> [1] "Done predicting fam vars. Took 0.02 mins for 6 families"
#> # A tibble: 6 x 5
#> sireID damID varA[,1] segsnps computetime
#> <chr> <chr> <dbl> <list> <dbl>
#> 1 1 1 0.637 <chr [9]> 0.0300
#> 2 1 2 0.502 <chr [18]> 0.028
#> 3 1 11 0.505 <chr [18]> 0.0330
#> 4 2 2 0.367 <chr [12]> 0.033
#> 5 2 11 0.370 <chr [19]> 0.042
#> 6 11 11 0.373 <chr [11]> 0.0350
data("DomEffectsList")
domEffectsT1<-t(DomEffectsList$Trait1)
domEffectsT2<-t(DomEffectsList$Trait2)
# Trait 1
runCrossVarPredsAD(ped = ped2predict,
addEffects = addEffectsT1,
domEffects = domEffectsT1,
haploMat = haploMat,
recombFreqMat = recombFreqMat,
ncores = 6)
#> [1] "Variances predicted for family: 1x1- took 0 mins"
#> [1] "Variances predicted for family: 1x2- took 0.001 mins"
#> [1] "Variances predicted for family: 1x11- took 0 mins"
#> [1] "Variances predicted for family: 2x2- took 0.001 mins"
#> [1] "Variances predicted for family: 2x11- took 0.001 mins"
#> [1] "Variances predicted for family: 11x11- took 0 mins"
#> [1] "Done predicting fam vars. Took 0.02 mins for 6 families"
#> # A tibble: 6 x 6
#> sireID damID varA[,1] varD[,1] segsnps computetime
#> <chr> <chr> <dbl> <dbl> <list> <dbl>
#> 1 1 1 0.617 0.349 <chr [9]> 0.029
#> 2 1 2 0.557 0.363 <chr [18]> 0.03
#> 3 1 11 0.410 0.266 <chr [18]> 0.029
#> 4 2 2 0.497 0.497 <chr [12]> 0.03
#> 5 2 11 0.349 0.355 <chr [19]> 0.034
#> 6 11 11 0.202 0.349 <chr [11]> 0.0270
# Trait 2
runCrossVarPredsAD(ped = ped2predict,
addEffects = addEffectsT1,
domEffects = domEffectsT1,
haploMat = haploMat,
recombFreqMat = recombFreqMat,
ncores = 6)
#> [1] "Variances predicted for family: 1x1- took 0 mins"
#> [1] "Variances predicted for family: 1x2- took 0 mins"
#> [1] "Variances predicted for family: 1x11- took 0.001 mins"
#> [1] "Variances predicted for family: 2x2- took 0.001 mins"
#> [1] "Variances predicted for family: 2x11- took 0.001 mins"
#> [1] "Variances predicted for family: 11x11- took 0 mins"
#> [1] "Done predicting fam vars. Took 0.02 mins for 6 families"
#> # A tibble: 6 x 6
#> sireID damID varA[,1] varD[,1] segsnps computetime
#> <chr> <chr> <dbl> <dbl> <list> <dbl>
#> 1 1 1 0.617 0.349 <chr [9]> 0.0280
#> 2 1 2 0.557 0.363 <chr [18]> 0.029
#> 3 1 11 0.410 0.266 <chr [18]> 0.03
#> 4 2 2 0.497 0.497 <chr [12]> 0.0310
#> 5 2 11 0.349 0.355 <chr [19]> 0.03
#> 6 11 11 0.202 0.349 <chr [11]> 0.0290
(including covariances)
predOneCrossVarA()
-> predCrossVarsA()
-> runMtCrossVarPredsA()
runMtCrossVarPredsA()
and runMtCrossVarPredsAD()
should supersede in function all the others.
NOTES:
Arguments names are pretty opinionated and long, although not without good intention. Might be worth simplifying.
predOneCrossVarA()
: postMeanAlleleSubEffects=
could just be addEffectsLists=
mtpred<-runMtCrossVarPredsAD(CrossesToPredict = ped2predict,
AddEffectList = AddEffectsList,
DomEffectList = DomEffectsList,
haploMat = haploMat,
recombFreqMat = recombFreqMat,
ncores = 6)
#> [1] "Variances predicted for family: 1x1- took 0 mins"
#> [1] "Variances predicted for family: 1x2- took 0.001 mins"
#> [1] "Variances predicted for family: 1x11- took 0.001 mins"
#> [1] "Variances predicted for family: 2x2- took 0.001 mins"
#> [1] "Variances predicted for family: 2x11- took 0.001 mins"
#> [1] "Variances predicted for family: 11x11- took 0.001 mins"
#> [1] "Done predicting fam vars. Took 0.05 mins for 6 families"
#> [1] "Variances predicted for family: 1x1- took 0.001 mins"
#> [1] "Variances predicted for family: 1x2- took 0.001 mins"
#> [1] "Variances predicted for family: 1x11- took 0 mins"
#> [1] "Variances predicted for family: 2x2- took 0.001 mins"
#> [1] "Variances predicted for family: 2x11- took 0.001 mins"
#> [1] "Variances predicted for family: 11x11- took 0.001 mins"
#> [1] "Done predicting fam vars. Took 0.05 mins for 6 families"
#> [1] "Variances predicted for family: 1x1- took 0.001 mins"
#> [1] "Variances predicted for family: 1x2- took 0.001 mins"
#> [1] "Variances predicted for family: 1x11- took 0.001 mins"
#> [1] "Variances predicted for family: 2x2- took 0.001 mins"
#> [1] "Variances predicted for family: 2x11- took 0.001 mins"
#> [1] "Variances predicted for family: 11x11- took 0 mins"
#> [1] "Done predicting fam vars. Took 0.05 mins for 6 families"
The results are a nested list structure. Let’s unpack them…
library(tidyverse); library(magrittr)
#> ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
#> ✓ ggplot2 3.3.2 ✓ dplyr 1.0.2
#> ✓ tibble 3.0.4 ✓ stringr 1.4.0
#> ✓ tidyr 1.1.2 ✓ forcats 0.5.0
#> ✓ readr 1.4.0
#> ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
#> x dplyr::filter() masks stats::filter()
#> x dplyr::lag() masks stats::lag()
#>
#> Attaching package: 'magrittr'
#> The following object is masked from 'package:tidyr':
#>
#> extract
#> The following object is masked from 'package:purrr':
#>
#> set_names
mtpred
#> $varcovars
#> # A tibble: 3 x 3
#> Trait1 Trait2 varcomps
#> <chr> <chr> <list>
#> 1 Trait1 Trait1 <named list [2]>
#> 2 Trait2 Trait2 <named list [2]>
#> 3 Trait1 Trait2 <named list [2]>
#>
#> $totcomputetime
#> elapsed
#> 9.268
The list element “varcovars”
mtpred$varcovars %>%
tidyr::unnest_wider(varcomps)
#> # A tibble: 3 x 4
#> Trait1 Trait2 predictedfamvars totcomputetime
#> <chr> <chr> <list> <dbl>
#> 1 Trait1 Trait1 <tibble [6 × 3]> 2.90
#> 2 Trait2 Trait2 <tibble [6 × 3]> 3.07
#> 3 Trait1 Trait2 <tibble [6 × 3]> 3.29
Unpack varcovars to get at the variance and covariance predictions for additive and dominances.
Variances are in column VPM. Output is in long-form.
mtpred$varcovars %>%
tidyr::unnest_wider(varcomps) %>%
dplyr::select(-totcomputetime) %>%
tidyr::unnest(predictedfamvars) %>%
tidyr::unnest(predVars) %>% head
#> # A tibble: 6 x 9
#> Trait1 Trait2 sireID damID VarComp VPM PMV Nsegsnps totcomputetime
#> <chr> <chr> <chr> <chr> <chr> <dbl> <lgl> <int> <dbl>
#> 1 Trait1 Trait1 1 1 VarA 0.617 NA 9 0.028
#> 2 Trait1 Trait1 1 1 VarD 0.349 NA NA NA
#> 3 Trait1 Trait1 1 2 VarA 0.557 NA 18 0.032
#> 4 Trait1 Trait1 1 2 VarD 0.363 NA NA NA
#> 5 Trait1 Trait1 1 11 VarA 0.410 NA 18 0.033
#> 6 Trait1 Trait1 1 11 VarD 0.266 NA NA NA
Cut out the extra columns and spread the additive-vs-dominance across columns, just to tidy up
mtpred$varcovars %>%
tidyr::unnest_wider(varcomps) %>%
dplyr::select(-totcomputetime) %>%
tidyr::unnest(predictedfamvars) %>%
tidyr::unnest(predVars) %>%
dplyr::select(-PMV,-Nsegsnps,-totcomputetime) %>%
spread(VarComp,VPM) %>% rmarkdown::paged_table()
Note that variance results Trait1==Trait2
should match the single-trait results and co-variance predictions are now included.
Lastly, let’s quickly verify that the additive-only function works and that the runMt___
functions won’t errory if we give just one cross / trait.
mtpred_A<-runMtCrossVarPredsA(CrossesToPredict = ped2predict,
AlleleSubEffectList = AddEffectsList,
haploMat = haploMat,
recombFreqMat = recombFreqMat,
ncores = 6)
#> [1] "Variances predicted for family: 1x1- took 0.001 mins"
#> [1] "Variances predicted for family: 1x2- took 0.001 mins"
#> [1] "Variances predicted for family: 1x11- took 0.001 mins"
#> [1] "Variances predicted for family: 2x2- took 0.001 mins"
#> [1] "Variances predicted for family: 2x11- took 0.001 mins"
#> [1] "Variances predicted for family: 11x11- took 0.001 mins"
#> [1] "Done predicting fam vars. Took 0.05 mins for 6 families"
#> [1] "Variances predicted for family: 1x1- took 0.001 mins"
#> [1] "Variances predicted for family: 1x2- took 0.001 mins"
#> [1] "Variances predicted for family: 1x11- took 0 mins"
#> [1] "Variances predicted for family: 2x2- took 0.001 mins"
#> [1] "Variances predicted for family: 2x11- took 0.001 mins"
#> [1] "Variances predicted for family: 11x11- took 0.001 mins"
#> [1] "Done predicting fam vars. Took 0.05 mins for 6 families"
#> [1] "Variances predicted for family: 1x1- took 0 mins"
#> [1] "Variances predicted for family: 1x2- took 0 mins"
#> [1] "Variances predicted for family: 1x11- took 0.001 mins"
#> [1] "Variances predicted for family: 2x2- took 0.001 mins"
#> [1] "Variances predicted for family: 2x11- took 0.001 mins"
#> [1] "Variances predicted for family: 11x11- took 0 mins"
#> [1] "Done predicting fam vars. Took 0.05 mins for 6 families"
mtpred_A$varcovars %>%
tidyr::unnest_wider(varcomps) %>%
dplyr::select(-totcomputetime) %>%
tidyr::unnest(predictedfamvars) %>%
tidyr::unnest(predVars) %>%
rmarkdown::paged_table()
st_addefflist<-AddEffectsList
st_addefflist$Trait2<-NULL
stpred_sc<-runMtCrossVarPredsA(CrossesToPredict = ped2predict[1,],
AlleleSubEffectList = st_addefflist,
haploMat = haploMat,
recombFreqMat = recombFreqMat,
ncores = 6)
#> [1] "Variances predicted for family: 1x1- took 0.001 mins"
#> [1] "Done predicting fam vars. Took 0.04 mins for 1 families"
stpred_sc$varcovars %>%
tidyr::unnest_wider(varcomps) %>%
dplyr::select(-totcomputetime) %>%
tidyr::unnest(predictedfamvars) %>%
tidyr::unnest(predVars) %>%
rmarkdown::paged_table()
data("AddEffectsList")
str(AddEffectsList)
#> List of 2
#> $ Trait1: num [1, 1:20] 0.218 0.03 0.232 0.343 -0.114 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : NULL
#> .. ..$ : chr [1:20] "1_5" "1_6" "1_14" "1_16" ...
#> $ Trait2: num [1, 1:20] -0.159 -0.297 -0.299 0.237 -0.716 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : NULL
#> .. ..$ : chr [1:20] "1_5" "1_6" "1_14" "1_16" ...
List of allele substitution effects.
Each list element is a matrix. One matrix per trait.
Each element of the list is named with a string identifying the trait and the colnames
of each matrix are labeled with snpIDs.
Dimensions of each matrix of marker effects
predType=="VPM"
should be used and each matrix of the AlleleSubEffectList
will have only one row.predType=="PMV"
when computing the predicted variance across a sample of marker effects, e.g. the thinned MCMC samples, usually stored on disk.