Last updated: 2021-08-31

Checks: 7 0

Knit directory: BreedingSchemeOpt/

This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: up-to-date

Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

Seed: set.seed(20210422)

The command set.seed(20210422) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: none

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

File paths: relative

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Repository version: 211ab47

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 211ab47. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .DS_Store
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    analysis/.DS_Store
    Ignored:    analysis/images/.DS_Store
    Ignored:    data/.DS_Store
    Ignored:    output/.DS_Store

Untracked files:
    Untracked:  code/scrap.R
    Untracked:  data/baselineScheme.gsheet
    Untracked:  data/siErrorVarEst_byTrialType_directApproach_2021Aug25.rds
    Untracked:  output/benchmark_sim.rds
    Untracked:  output/benchmark_sims5.rds
    Untracked:  output/burnInSims_bsp1_iita_2021Aug27.rds
    Untracked:  output/burnInSims_bsp2_iita_2021Aug27.rds
    Untracked:  output/burnInSims_bsp3_iita_2021Aug27.rds
    Untracked:  output/burnInSims_iita_2021Aug27.rds
    Untracked:  output/burnIn_test.rds
    Untracked:  output/postBurnInGS_test.rds
    Untracked:  output/postBurnIn_test.rds
    Untracked:  output/test_burnIn_sim.rds

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

These are the previous versions of the repository in which changes were made to the R Markdown (analysis/baselineSim.Rmd) and HTML (docs/baselineSim.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File	Version	Author	Date	Message
Rmd	211ab47	wolfemd	2021-08-31	Small test simulations with post burn-in GS vs. PS
Rmd	8d65f27	wolfemd	2021-08-30	Post burn-in test of new GS functions underway
html	e0d20bd	wolfemd	2021-08-27	Build site.
Rmd	9d369ee	wolfemd	2021-08-27	Publish burnInSims with the toy example completed and the full analysis almost ready to run.
html	e210a1f	wolfemd	2021-08-19	Build site.
Rmd	5914d8d	wolfemd	2021-08-19	Publish initial sims towards a baseline set of sims using runBreedingScheme_wBurnIn
Rmd	c2e379e	wolfemd	2021-08-13	Rebuild repo removing the “Group” part. Project is to contain BOTH the “Group” and a separate section just for the actual simulation analyses.

Previous steps

Empirical estimate of TrialType-specific error variances in terms of the IITA selection index (SELIND). See that analysis here.
Burn-in simulations

Set-up a singularity shell with R+OpenBLAS

This is optional and can be skipped.

If you want the advantage of multi-threaded BLAS to speed up predictions within the simulations, you need an R instance that is linked to OpenBLAS (another example is Microsoft R Open). For CBSU, the recommended approach is currently to use singularity shells provided by the “rocker” project. They even come pre-installed with tidyverse :). Linked to OpenBLAS, using a simple function RhpcBLASctl::blas_set_num_threads() I can add arguments to functions to control this feature. For optimal performance, it is import to balance the number of threads each R session uses for BLAS against any other form of parallel processing being used and considering total available system resources.

# 0) Pull a singularity image with OpenBLAS enabled R + tidyverse from rocker/
# singularity pull ~/rocker2.sif docker://rocker/tidyverse:latest;

# 1) start a screen shell 
screen; 
# 2) reserve interactive slurm
salloc -n 25 --mem 60G;
# 3) start the singularity Linux shell inside that
singularity shell ~/rocker2.sif; 
# Project directory, so R will use as working dir.
cd /home/mw489/BreedingSchemeOpt/;
# 3) Start R
export OMP_NUM_THREADS=1;
R

# Install genomicMateSelectR to user-accessible libPath
### In a singularity shell, sintall as follows:
libPath<-"/home/mw489/R/x86_64-pc-linux-gnu-library/4.1" # should be YOUR libPath
withr::with_libpaths(new=libPath, devtools::install_github("wolfemd/genomicMateSelectR", ref = 'master'))
### Else, simply
devtools::install_github("wolfemd/genomicMateSelectR", ref = 'master')
# Install my own forked repo of AlphaSimHlpR
withr::with_libpaths(new=libPath, install.packages("Rcpp"))
withr::with_libpaths(new=libPath, install.packages("AlphaSimR"))
withr::with_libpaths(new=libPath, install.packages("optiSel"))
withr::with_libpaths(new=libPath, install.packages("rgl"))
withr::with_libpaths(new=libPath, devtools::install_github("wolfemd/AlphaSimHlpR", ref = 'master', force=T))
# devtools::install_github("wolfemd/AlphaSimHlpR", ref = 'master', force=T)

A small example

Objective: develop a hopefully faster GS simulator which uses a fixed number of clones for predictions… hoping to constrain compute requirements better than using the trainingPopCycles parameter, without sacrificing much prediction accuracy.

For specifying a newBSP, there are some considerations. The bsp from previous run contains an entry bsp$checks specifying a pop-object with randomly chosen checks from the burn-in phase. For now, I will ensure the burn-in checks are used post burn-in. Keep it simple. For that reason, newBSP should not alter the nChks value. If it does, might cause code to break…

Set-up my own population improvement function
Added some options to bsp objects to control which phenotype records are used, which clones are includedin the TP, and which clones are considered selection candidates
JL’s original popImprov1Cyc drew candidates from the full records$F1@id (excluding potentially indivs only scored during the current year, if useCurrentPhenoTrain=FALSE).
My changes:
- nTrainPopCycles: draw training pop clones only from this number of recent cycles.
- nYrsAsCandidates: candidates for selection only from this number of recent years
- maxTrainingPopSize: From the lines in the most recent cycles (indicated by nTrainPopCycles), subsample this number of lines for training data.
  - This is in addition to the “check” (bsp$checks@id) and the lines indicates as selection candidates according to the setting of nYrsAsCandidates.
  - Phenotypic records of candidates will also be included in any predictions.
  - All “historical” data will always be used, but the number of maximum training lines will be held constant.
  - Replaces the stage-specific bsp$trainingPopCycles, which will be unused in this pipeline, but not deleted from the package.

suppressMessages(library(AlphaSimHlpR))
suppressMessages(library(tidyverse))
suppressMessages(library(genomicMateSelectR))
select <- dplyr::select

schemeDF<-read.csv(here::here("data","baselineScheme - Test.csv"), 
                   header = T, stringsAsFactors = F)

source(here::here("code","runSchemesPostBurnIn.R"))

simulations<-readRDS(here::here("output","burnIn_test.rds"))

Example

newBSP<-specifyBSP(schemeDF = schemeDF,
                   nChr = 3,effPopSize = 100,quickHaplo = F,
                   segSites = 400, nQTL = 40, nSNP = 100, genVar = 40,
                   gxeVar = NULL, gxyVar = 15, gxlVar = 10,gxyxlVar = 5,
                   meanDD = 0.5,varDD = 0.01,relAA = 0.5,
                   stageToGenotype = "PYT",
                   nParents = 10, nCrosses = 4, nProgeny = 50,nClonesToNCRP = 3,
                   phenoF1toStage1 = T,errVarPreStage1 = 500,
                   useCurrentPhenoTrain = F, 
                   nCyclesToKeepRecords = 30,
                   selCritPipeAdv = selCritIID, 
                   selCritPopImprov =  selCritIID,
                   nTrainPopCycles=6,
                   nYrsAsCandidates=2,
                   maxTrainingPopSize=500)

start<-proc.time()[3]
postBurnInGS_test<-runSchemesPostBurnIn(simulations = simulations,
                                        newBSP=newBSP,
                                        nPostBurnInCycles=10,
                                        selCritPop="parentSelCritGEBV",
                                        selCritPipe="selCritIID",
                                        productFunc="productPipeline",
                                        popImprovFunc="popImprovByParentSel",
                                        ncores=4,
                                        nBLASthreads=4)
saveRDS(postBurnInGS_test,file = here::here("output","postBurnInGS_test.rds"))
end<-proc.time()[3]; print(paste0((end-start)/60," mins elapsed"))
# [1] "3.8908 mins elapsed"

postBurnInGS<-readRDS(here::here("output","postBurnInGS_test.rds"))
postBurnInPS<-readRDS(here::here("output","postBurnIn_test.rds"))

forSimPlot<-postBurnInGS %>% 
  mutate(PostBurnIn="GS") %>% 
  bind_rows(postBurnInPS %>% 
              mutate(PostBurnIn="PS")) %>% 
  unnest_wider(SimOutput) %>% 
  select(SimRep,PostBurnIn,records) %>% 
  unnest_wider(records) %>% 
  select(SimRep,PostBurnIn,stageOutputs) %>% 
  unnest() %>% 
  filter(stage=="F1") %>% 
  mutate(YearPostBurnIn=year-10)
 
library(patchwork)
meanGplot<-forSimPlot %>% 
  group_by(PostBurnIn,YearPostBurnIn,year,stage) %>% 
  summarize(meanGenMean=mean(genValMean),
            seGenMean=sd(genValMean)/n()) %>% 
  ggplot(.,aes(x=YearPostBurnIn)) +
  geom_ribbon(aes(ymin = meanGenMean - seGenMean, 
                  ymax = meanGenMean + seGenMean,
                  fill = PostBurnIn), 
              alpha=0.75) + 
  geom_line(aes(y = meanGenMean, color=PostBurnIn))
sdGplot<-forSimPlot %>% 
  group_by(PostBurnIn,YearPostBurnIn,year,stage) %>% 
  summarize(meanGenSD=mean(genValSD),
            seGenSD=sd(genValSD)/n()) %>% 
  ggplot(.,aes(x=YearPostBurnIn)) +
  geom_ribbon(aes(ymin = meanGenSD - seGenSD, 
                  ymax = meanGenSD + seGenSD,
                  fill = PostBurnIn), 
              alpha=0.75) + 
  geom_line(aes(y = meanGenSD, group=PostBurnIn))
(meanGplot | sdGplot) & theme_bw() & geom_vline(xintercept = 0, color='darkred')

Benchmark “full-scale” GS cycles

Debug

# burnInSim<-simulations$burnInSim[[1]]
# selCritPop="parentSelCritGEBV";
# selCritPipe="selCritIID";
# productFunc="productPipeline";
# popImprovFunc="popImprov1Cyc";
# newBSP=burnInSim$bsp
# newBSP[["maxTrainingPopSize"]]<-500
# newBSP[["nYrsAsCandidates"]]<-2
# newBSP[["nTrainPopCycles"]]<-10
# newBSP$checks<-NULL

# runSchemesPostBurnIn<-function(simulations,
#                                newBSP=NULL, # so you can change the scheme entirely after burn-in
#                                nPostBurnInCycles,
#                                selCritPop="selCritIID",
#                                selCritPipe="selCritIID",
#                                productFunc="productPipeline",
#                                popImprovFunc="popImprovByParentSel",
#                                ncores=1,
#                                nBLASthreads=NULL,nThreadsMacs2=NULL){
# 
#   require(furrr); plan(multisession, workers = ncores)
#   options(future.globals.maxSize=+Inf); options(future.rng.onMisuse="ignore")
# 
#   simulations<-simulations %>%
#     mutate(SimOutput=future_map2(SimRep,burnInSim,function(SimRep,burnInSim,...){
#       # debug 
#       # burnInSim<-simulations$burnInSim[[1]]
#       if(!is.null(nBLASthreads)) { RhpcBLASctl::blas_set_num_threads(nBLASthreads) }
#       cat("******", SimRep, "\n")
# 
#       # This CONTINUES where previous sims left off
#       ## no initialize step
#       ## Keep burn-in stage sim params "SP"
#       SP<-burnInSim$SP
#       ## specify a potentially new bsp object 
#       ## (keep checks stored in burn-in stage's bsp)
#       if(!is.null(newBSP)){ 
#         bsp<-newBSP; bsp$checks<-burnInSim$bsp$checks 
#       } else { bsp<-burnInSim$bsp }
#       ## 'historical' records from burn-in
#       records<-burnInSim$records
#       ## override burn-in specified product and population improvement funcs 
#       bsp[["productPipeline"]] <- get(productFunc)
#       bsp[["populationImprovement"]] <- get(popImprovFunc)
#       bsp[["selCritPipeAdv"]] <- get(selCritPipe)
#       bsp[["selCritPopImprov"]] <- get(selCritPop)
# 
#       # Post burn-in cycles
#       cat("\n"); cat("Post burn-in cycles"); cat("\n")
#       for (cycle in 1:nPostBurnInCycles){
#         cat(cycle, " ")
#         records <- bsp$productPipeline(records, bsp, SP)
#         records <- bsp$populationImprovement(records, bsp, SP)
#       }
# 
#       # Finalize the stageOutputs
#       records <- AlphaSimHlpR:::lastCycStgOut(records, bsp, SP)
# 
#       return(list(records=records,
#                   bsp=bsp,
#                   SP=SP))
#     },
#     nPostBurnInCycles=nPostBurnInCycles,
#     selCritPop=selCritPop,
#     selCritPipe=selCritPipe,
#     productFunc=productFunc,
#     popImprovFunc=popImprovFunc,
#     nBLASthreads=nBLASthreads,
#     nThreadsMacs2=nThreadsMacs2))
#   plan(sequential)
#   return(simulations)
# }

sessionInfo()

R version 4.1.0 (2021-05-18)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] patchwork_1.1.1          genomicMateSelectR_0.2.0 forcats_0.5.1           
 [4] stringr_1.4.0            purrr_0.3.4              readr_2.0.1             
 [7] tidyr_1.1.3              tibble_3.1.4             ggplot2_3.3.5           
[10] tidyverse_1.3.1          AlphaSimHlpR_0.2.1       dplyr_1.0.7             
[13] AlphaSimR_1.0.3          R6_2.5.1                 workflowr_1.6.2         

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.7       here_1.0.1       lubridate_1.7.10 assertthat_0.2.1
 [5] rprojroot_2.0.2  digest_0.6.27    utf8_1.2.2       cellranger_1.1.0
 [9] backports_1.2.1  reprex_2.0.1     evaluate_0.14    highr_0.9       
[13] httr_1.4.2       pillar_1.6.2     rlang_0.4.11     readxl_1.3.1    
[17] rstudioapi_0.13  whisker_0.4      jquerylib_0.1.4  rmarkdown_2.10  
[21] labeling_0.4.2   munsell_0.5.0    broom_0.7.9      compiler_4.1.0  
[25] httpuv_1.6.2     modelr_0.1.8     xfun_0.25        pkgconfig_2.0.3 
[29] htmltools_0.5.2  tidyselect_1.1.1 fansi_0.5.0      crayon_1.4.1    
[33] tzdb_0.1.2       dbplyr_2.1.1     withr_2.4.2      later_1.3.0     
[37] grid_4.1.0       jsonlite_1.7.2   gtable_0.3.0     lifecycle_1.0.0 
[41] DBI_1.1.1        git2r_0.28.0     magrittr_2.0.1   scales_1.1.1    
[45] cli_3.0.1        stringi_1.7.4    farver_2.1.0     fs_1.5.0        
[49] promises_1.2.0.1 xml2_1.3.2       bslib_0.2.5.1    ellipsis_0.3.2  
[53] generics_0.1.0   vctrs_0.3.8      tools_4.1.0      glue_1.4.2      
[57] hms_1.1.0        fastmap_1.1.0    yaml_2.2.1       colorspace_2.0-2
[61] rvest_1.0.1      knitr_1.33       haven_2.4.3      sass_0.4.0

Baseline simulation of cassava GS

Marnin Wolfe