Last updated: 2020-10-09
Checks: 7 passed, 0 failed
Knit directory: NaCRRI_2020GS/
This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.
The command set.seed(20200826) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
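As a minimal illustration of what the seed buys you (a sketch in base R; the `sample()` call is only a stand-in for whatever random step an analysis might contain), resetting the same seed before a random draw reproduces that draw exactly:

```r
# Draw the same "random" subsample twice by resetting the seed first.
set.seed(20200826)
first_draw <- sample(1:100, size = 5)

set.seed(20200826)
second_draw <- sample(1:100, size = 5)

# The two draws match element for element because the seed was identical.
identical(first_draw, second_draw)
```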
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version bcbcec3. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .DS_Store
Ignored: .Rproj.user/
Ignored: analysis/.DS_Store
Ignored: data/.DS_Store
Untracked files:
Untracked: data/Report-DCas20-5419/
Untracked: output/BeagleLogs/
Untracked: output/DosageMatrix_DCas20_5419_EA_REFimputedAndFiltered.rds
Untracked: output/DosageMatrix_DCas20_5419_LA_REFimputedAndFiltered.rds
Untracked: output/chr10_DCas20_5419_EA_REFimputed.INFO
Untracked: output/chr10_DCas20_5419_EA_REFimputed.hwe
Untracked: output/chr10_DCas20_5419_EA_REFimputed.log
Untracked: output/chr10_DCas20_5419_EA_REFimputed.sitesPassing
Untracked: output/chr10_DCas20_5419_EA_REFimputed.vcf.gz
Untracked: output/chr10_DCas20_5419_EA_REFimputedAndFiltered.bed
Untracked: output/chr10_DCas20_5419_EA_REFimputedAndFiltered.bim
Untracked: output/chr10_DCas20_5419_EA_REFimputedAndFiltered.fam
Untracked: output/chr10_DCas20_5419_EA_REFimputedAndFiltered.log
Untracked: output/chr10_DCas20_5419_EA_REFimputedAndFiltered.nosex
Untracked: output/chr10_DCas20_5419_EA_REFimputedAndFiltered.raw
Untracked: output/chr10_DCas20_5419_EA_REFimputedAndFiltered.vcf.gz
Untracked: output/chr10_DCas20_5419_EAplusLA_REFimputedAndFiltered.log
Untracked: output/chr10_DCas20_5419_LA_REFimputed.log
Untracked: output/chr10_DCas20_5419_LA_REFimputed.vcf.gz
Untracked: output/chr10_DCas20_5419_LA_REFimputedAndFiltered.bed
Untracked: output/chr10_DCas20_5419_LA_REFimputedAndFiltered.bim
Untracked: output/chr10_DCas20_5419_LA_REFimputedAndFiltered.fam
Untracked: output/chr10_DCas20_5419_LA_REFimputedAndFiltered.log
Untracked: output/chr10_DCas20_5419_LA_REFimputedAndFiltered.nosex
Untracked: output/chr10_DCas20_5419_LA_REFimputedAndFiltered.raw
Untracked: output/chr10_DCas20_5419_LA_REFimputedAndFiltered.vcf.gz
Untracked: output/chr11_DCas20_5419_EA_REFimputed.INFO
Untracked: output/chr11_DCas20_5419_EA_REFimputed.hwe
Untracked: output/chr11_DCas20_5419_EA_REFimputed.log
Untracked: output/chr11_DCas20_5419_EA_REFimputed.sitesPassing
Untracked: output/chr11_DCas20_5419_EA_REFimputed.vcf.gz
Untracked: output/chr11_DCas20_5419_EA_REFimputedAndFiltered.bed
Untracked: output/chr11_DCas20_5419_EA_REFimputedAndFiltered.bim
Untracked: output/chr11_DCas20_5419_EA_REFimputedAndFiltered.fam
Untracked: output/chr11_DCas20_5419_EA_REFimputedAndFiltered.log
Untracked: output/chr11_DCas20_5419_EA_REFimputedAndFiltered.nosex
Untracked: output/chr11_DCas20_5419_EA_REFimputedAndFiltered.raw
Untracked: output/chr11_DCas20_5419_EA_REFimputedAndFiltered.vcf.gz
Untracked: output/chr11_DCas20_5419_EAplusLA_REFimputedAndFiltered.log
Untracked: output/chr11_DCas20_5419_LA_REFimputed.log
Untracked: output/chr11_DCas20_5419_LA_REFimputed.vcf.gz
Untracked: output/chr11_DCas20_5419_LA_REFimputedAndFiltered.bed
Untracked: output/chr11_DCas20_5419_LA_REFimputedAndFiltered.bim
Untracked: output/chr11_DCas20_5419_LA_REFimputedAndFiltered.fam
Untracked: output/chr11_DCas20_5419_LA_REFimputedAndFiltered.log
Untracked: output/chr11_DCas20_5419_LA_REFimputedAndFiltered.nosex
Untracked: output/chr11_DCas20_5419_LA_REFimputedAndFiltered.raw
Untracked: output/chr11_DCas20_5419_LA_REFimputedAndFiltered.vcf.gz
Untracked: output/chr12_DCas20_5419_EA_REFimputed.INFO
Untracked: output/chr12_DCas20_5419_EA_REFimputed.hwe
Untracked: output/chr12_DCas20_5419_EA_REFimputed.log
Untracked: output/chr12_DCas20_5419_EA_REFimputed.sitesPassing
Untracked: output/chr12_DCas20_5419_EA_REFimputed.vcf.gz
Untracked: output/chr12_DCas20_5419_EA_REFimputedAndFiltered.bed
Untracked: output/chr12_DCas20_5419_EA_REFimputedAndFiltered.bim
Untracked: output/chr12_DCas20_5419_EA_REFimputedAndFiltered.fam
Untracked: output/chr12_DCas20_5419_EA_REFimputedAndFiltered.log
Untracked: output/chr12_DCas20_5419_EA_REFimputedAndFiltered.nosex
Untracked: output/chr12_DCas20_5419_EA_REFimputedAndFiltered.raw
Untracked: output/chr12_DCas20_5419_EA_REFimputedAndFiltered.vcf.gz
Untracked: output/chr12_DCas20_5419_EAplusLA_REFimputedAndFiltered.log
Untracked: output/chr12_DCas20_5419_LA_REFimputed.log
Untracked: output/chr12_DCas20_5419_LA_REFimputed.vcf.gz
Untracked: output/chr12_DCas20_5419_LA_REFimputedAndFiltered.bed
Untracked: output/chr12_DCas20_5419_LA_REFimputedAndFiltered.bim
Untracked: output/chr12_DCas20_5419_LA_REFimputedAndFiltered.fam
Untracked: output/chr12_DCas20_5419_LA_REFimputedAndFiltered.log
Untracked: output/chr12_DCas20_5419_LA_REFimputedAndFiltered.nosex
Untracked: output/chr12_DCas20_5419_LA_REFimputedAndFiltered.raw
Untracked: output/chr12_DCas20_5419_LA_REFimputedAndFiltered.vcf.gz
Untracked: output/chr13_DCas20_5419_EA_REFimputed.INFO
Untracked: output/chr13_DCas20_5419_EA_REFimputed.hwe
Untracked: output/chr13_DCas20_5419_EA_REFimputed.log
Untracked: output/chr13_DCas20_5419_EA_REFimputed.sitesPassing
Untracked: output/chr13_DCas20_5419_EA_REFimputed.vcf.gz
Untracked: output/chr13_DCas20_5419_EA_REFimputedAndFiltered.bed
Untracked: output/chr13_DCas20_5419_EA_REFimputedAndFiltered.bim
Untracked: output/chr13_DCas20_5419_EA_REFimputedAndFiltered.fam
Untracked: output/chr13_DCas20_5419_EA_REFimputedAndFiltered.log
Untracked: output/chr13_DCas20_5419_EA_REFimputedAndFiltered.nosex
Untracked: output/chr13_DCas20_5419_EA_REFimputedAndFiltered.raw
Untracked: output/chr13_DCas20_5419_EA_REFimputedAndFiltered.vcf.gz
Untracked: output/chr13_DCas20_5419_EAplusLA_REFimputedAndFiltered.log
Untracked: output/chr13_DCas20_5419_LA_REFimputed.log
Untracked: output/chr13_DCas20_5419_LA_REFimputed.vcf.gz
Untracked: output/chr13_DCas20_5419_LA_REFimputedAndFiltered.bed
Untracked: output/chr13_DCas20_5419_LA_REFimputedAndFiltered.bim
Untracked: output/chr13_DCas20_5419_LA_REFimputedAndFiltered.fam
Untracked: output/chr13_DCas20_5419_LA_REFimputedAndFiltered.log
Untracked: output/chr13_DCas20_5419_LA_REFimputedAndFiltered.nosex
Untracked: output/chr13_DCas20_5419_LA_REFimputedAndFiltered.raw
Untracked: output/chr13_DCas20_5419_LA_REFimputedAndFiltered.vcf.gz
Untracked: output/chr14_DCas20_5419_EA_REFimputed.INFO
Untracked: output/chr14_DCas20_5419_EA_REFimputed.hwe
Untracked: output/chr14_DCas20_5419_EA_REFimputed.log
Untracked: output/chr14_DCas20_5419_EA_REFimputed.sitesPassing
Untracked: output/chr14_DCas20_5419_EA_REFimputed.vcf.gz
Untracked: output/chr14_DCas20_5419_EA_REFimputedAndFiltered.bed
Untracked: output/chr14_DCas20_5419_EA_REFimputedAndFiltered.bim
Untracked: output/chr14_DCas20_5419_EA_REFimputedAndFiltered.fam
Untracked: output/chr14_DCas20_5419_EA_REFimputedAndFiltered.log
Untracked: output/chr14_DCas20_5419_EA_REFimputedAndFiltered.nosex
Untracked: output/chr14_DCas20_5419_EA_REFimputedAndFiltered.raw
Untracked: output/chr14_DCas20_5419_EA_REFimputedAndFiltered.vcf.gz
Untracked: output/chr14_DCas20_5419_EAplusLA_REFimputedAndFiltered.log
Untracked: output/chr14_DCas20_5419_LA_REFimputed.log
Untracked: output/chr14_DCas20_5419_LA_REFimputed.vcf.gz
Untracked: output/chr14_DCas20_5419_LA_REFimputedAndFiltered.bed
Untracked: output/chr14_DCas20_5419_LA_REFimputedAndFiltered.bim
Untracked: output/chr14_DCas20_5419_LA_REFimputedAndFiltered.fam
Untracked: output/chr14_DCas20_5419_LA_REFimputedAndFiltered.log
Untracked: output/chr14_DCas20_5419_LA_REFimputedAndFiltered.nosex
Untracked: output/chr14_DCas20_5419_LA_REFimputedAndFiltered.raw
Untracked: output/chr14_DCas20_5419_LA_REFimputedAndFiltered.vcf.gz
Untracked: output/chr15_DCas20_5419_EA_REFimputed.INFO
Untracked: output/chr15_DCas20_5419_EA_REFimputed.hwe
Untracked: output/chr15_DCas20_5419_EA_REFimputed.log
Untracked: output/chr15_DCas20_5419_EA_REFimputed.sitesPassing
Untracked: output/chr15_DCas20_5419_EA_REFimputed.vcf.gz
Untracked: output/chr15_DCas20_5419_EA_REFimputedAndFiltered.bed
Untracked: output/chr15_DCas20_5419_EA_REFimputedAndFiltered.bim
Untracked: output/chr15_DCas20_5419_EA_REFimputedAndFiltered.fam
Untracked: output/chr15_DCas20_5419_EA_REFimputedAndFiltered.log
Untracked: output/chr15_DCas20_5419_EA_REFimputedAndFiltered.nosex
Untracked: output/chr15_DCas20_5419_EA_REFimputedAndFiltered.raw
Untracked: output/chr15_DCas20_5419_EA_REFimputedAndFiltered.vcf.gz
Untracked: output/chr15_DCas20_5419_EAplusLA_REFimputedAndFiltered.log
Untracked: output/chr15_DCas20_5419_LA_REFimputed.log
Untracked: output/chr15_DCas20_5419_LA_REFimputed.vcf.gz
Untracked: output/chr15_DCas20_5419_LA_REFimputedAndFiltered.bed
Untracked: output/chr15_DCas20_5419_LA_REFimputedAndFiltered.bim
Untracked: output/chr15_DCas20_5419_LA_REFimputedAndFiltered.fam
Untracked: output/chr15_DCas20_5419_LA_REFimputedAndFiltered.log
Untracked: output/chr15_DCas20_5419_LA_REFimputedAndFiltered.nosex
Untracked: output/chr15_DCas20_5419_LA_REFimputedAndFiltered.raw
Untracked: output/chr15_DCas20_5419_LA_REFimputedAndFiltered.vcf.gz
Untracked: output/chr16_DCas20_5419_EA_REFimputed.INFO
Untracked: output/chr16_DCas20_5419_EA_REFimputed.hwe
Untracked: output/chr16_DCas20_5419_EA_REFimputed.log
Untracked: output/chr16_DCas20_5419_EA_REFimputed.sitesPassing
Untracked: output/chr16_DCas20_5419_EA_REFimputed.vcf.gz
Untracked: output/chr16_DCas20_5419_EA_REFimputedAndFiltered.bed
Untracked: output/chr16_DCas20_5419_EA_REFimputedAndFiltered.bim
Untracked: output/chr16_DCas20_5419_EA_REFimputedAndFiltered.fam
Untracked: output/chr16_DCas20_5419_EA_REFimputedAndFiltered.log
Untracked: output/chr16_DCas20_5419_EA_REFimputedAndFiltered.nosex
Untracked: output/chr16_DCas20_5419_EA_REFimputedAndFiltered.raw
Untracked: output/chr16_DCas20_5419_EA_REFimputedAndFiltered.vcf.gz
Untracked: output/chr16_DCas20_5419_EAplusLA_REFimputedAndFiltered.log
Untracked: output/chr16_DCas20_5419_LA_REFimputed.log
Untracked: output/chr16_DCas20_5419_LA_REFimputed.vcf.gz
Untracked: output/chr16_DCas20_5419_LA_REFimputedAndFiltered.bed
Untracked: output/chr16_DCas20_5419_LA_REFimputedAndFiltered.bim
Untracked: output/chr16_DCas20_5419_LA_REFimputedAndFiltered.fam
Untracked: output/chr16_DCas20_5419_LA_REFimputedAndFiltered.log
Untracked: output/chr16_DCas20_5419_LA_REFimputedAndFiltered.nosex
Untracked: output/chr16_DCas20_5419_LA_REFimputedAndFiltered.raw
Untracked: output/chr16_DCas20_5419_LA_REFimputedAndFiltered.vcf.gz
Untracked: output/chr17_DCas20_5419_EA_REFimputed.INFO
Untracked: output/chr17_DCas20_5419_EA_REFimputed.hwe
Untracked: output/chr17_DCas20_5419_EA_REFimputed.log
Untracked: output/chr17_DCas20_5419_EA_REFimputed.sitesPassing
Untracked: output/chr17_DCas20_5419_EA_REFimputed.vcf.gz
Untracked: output/chr17_DCas20_5419_EA_REFimputedAndFiltered.bed
Untracked: output/chr17_DCas20_5419_EA_REFimputedAndFiltered.bim
Untracked: output/chr17_DCas20_5419_EA_REFimputedAndFiltered.fam
Untracked: output/chr17_DCas20_5419_EA_REFimputedAndFiltered.log
Untracked: output/chr17_DCas20_5419_EA_REFimputedAndFiltered.nosex
Untracked: output/chr17_DCas20_5419_EA_REFimputedAndFiltered.raw
Untracked: output/chr17_DCas20_5419_EA_REFimputedAndFiltered.vcf.gz
Untracked: output/chr17_DCas20_5419_EAplusLA_REFimputedAndFiltered.log
Untracked: output/chr17_DCas20_5419_LA_REFimputed.log
Untracked: output/chr17_DCas20_5419_LA_REFimputed.vcf.gz
Untracked: output/chr17_DCas20_5419_LA_REFimputedAndFiltered.bed
Untracked: output/chr17_DCas20_5419_LA_REFimputedAndFiltered.bim
Untracked: output/chr17_DCas20_5419_LA_REFimputedAndFiltered.fam
Untracked: output/chr17_DCas20_5419_LA_REFimputedAndFiltered.log
Untracked: output/chr17_DCas20_5419_LA_REFimputedAndFiltered.nosex
Untracked: output/chr17_DCas20_5419_LA_REFimputedAndFiltered.raw
Untracked: output/chr17_DCas20_5419_LA_REFimputedAndFiltered.vcf.gz
Untracked: output/chr18_DCas20_5419_EA_REFimputed.INFO
Untracked: output/chr18_DCas20_5419_EA_REFimputed.hwe
Untracked: output/chr18_DCas20_5419_EA_REFimputed.log
Untracked: output/chr18_DCas20_5419_EA_REFimputed.sitesPassing
Untracked: output/chr18_DCas20_5419_EA_REFimputed.vcf.gz
Untracked: output/chr18_DCas20_5419_EA_REFimputedAndFiltered.bed
Untracked: output/chr18_DCas20_5419_EA_REFimputedAndFiltered.bim
Untracked: output/chr18_DCas20_5419_EA_REFimputedAndFiltered.fam
Untracked: output/chr18_DCas20_5419_EA_REFimputedAndFiltered.log
Untracked: output/chr18_DCas20_5419_EA_REFimputedAndFiltered.nosex
Untracked: output/chr18_DCas20_5419_EA_REFimputedAndFiltered.raw
Untracked: output/chr18_DCas20_5419_EA_REFimputedAndFiltered.vcf.gz
Untracked: output/chr18_DCas20_5419_EAplusLA_REFimputedAndFiltered.log
Untracked: output/chr18_DCas20_5419_LA_REFimputed.log
Untracked: output/chr18_DCas20_5419_LA_REFimputed.vcf.gz
Untracked: output/chr18_DCas20_5419_LA_REFimputedAndFiltered.bed
Untracked: output/chr18_DCas20_5419_LA_REFimputedAndFiltered.bim
Untracked: output/chr18_DCas20_5419_LA_REFimputedAndFiltered.fam
Untracked: output/chr18_DCas20_5419_LA_REFimputedAndFiltered.log
Untracked: output/chr18_DCas20_5419_LA_REFimputedAndFiltered.nosex
Untracked: output/chr18_DCas20_5419_LA_REFimputedAndFiltered.raw
Untracked: output/chr18_DCas20_5419_LA_REFimputedAndFiltered.vcf.gz
Untracked: output/chr1_DCas20_5419_EA_REFimputed.INFO
Untracked: output/chr1_DCas20_5419_EA_REFimputed.hwe
Untracked: output/chr1_DCas20_5419_EA_REFimputed.log
Untracked: output/chr1_DCas20_5419_EA_REFimputed.sitesPassing
Untracked: output/chr1_DCas20_5419_EA_REFimputed.vcf.gz
Untracked: output/chr1_DCas20_5419_EA_REFimputedAndFiltered.bed
Untracked: output/chr1_DCas20_5419_EA_REFimputedAndFiltered.bim
Untracked: output/chr1_DCas20_5419_EA_REFimputedAndFiltered.fam
Untracked: output/chr1_DCas20_5419_EA_REFimputedAndFiltered.log
Untracked: output/chr1_DCas20_5419_EA_REFimputedAndFiltered.nosex
Untracked: output/chr1_DCas20_5419_EA_REFimputedAndFiltered.raw
Untracked: output/chr1_DCas20_5419_EA_REFimputedAndFiltered.vcf.gz
Untracked: output/chr1_DCas20_5419_EAplusLA_REFimputedAndFiltered.log
Untracked: output/chr1_DCas20_5419_LA_REFimputed.log
Untracked: output/chr1_DCas20_5419_LA_REFimputed.vcf.gz
Untracked: output/chr1_DCas20_5419_LA_REFimputedAndFiltered.bed
Untracked: output/chr1_DCas20_5419_LA_REFimputedAndFiltered.bim
Untracked: output/chr1_DCas20_5419_LA_REFimputedAndFiltered.fam
Untracked: output/chr1_DCas20_5419_LA_REFimputedAndFiltered.log
Untracked: output/chr1_DCas20_5419_LA_REFimputedAndFiltered.nosex
Untracked: output/chr1_DCas20_5419_LA_REFimputedAndFiltered.raw
Untracked: output/chr1_DCas20_5419_LA_REFimputedAndFiltered.vcf.gz
Untracked: output/chr2_DCas20_5419_EA_REFimputed.INFO
Untracked: output/chr2_DCas20_5419_EA_REFimputed.hwe
Untracked: output/chr2_DCas20_5419_EA_REFimputed.log
Untracked: output/chr2_DCas20_5419_EA_REFimputed.sitesPassing
Untracked: output/chr2_DCas20_5419_EA_REFimputed.vcf.gz
Untracked: output/chr2_DCas20_5419_EA_REFimputedAndFiltered.bed
Untracked: output/chr2_DCas20_5419_EA_REFimputedAndFiltered.bim
Untracked: output/chr2_DCas20_5419_EA_REFimputedAndFiltered.fam
Untracked: output/chr2_DCas20_5419_EA_REFimputedAndFiltered.log
Untracked: output/chr2_DCas20_5419_EA_REFimputedAndFiltered.nosex
Untracked: output/chr2_DCas20_5419_EA_REFimputedAndFiltered.raw
Untracked: output/chr2_DCas20_5419_EA_REFimputedAndFiltered.vcf.gz
Untracked: output/chr2_DCas20_5419_EAplusLA_REFimputedAndFiltered.log
Untracked: output/chr2_DCas20_5419_LA_REFimputed.log
Untracked: output/chr2_DCas20_5419_LA_REFimputed.vcf.gz
Untracked: output/chr2_DCas20_5419_LA_REFimputedAndFiltered.bed
Untracked: output/chr2_DCas20_5419_LA_REFimputedAndFiltered.bim
Untracked: output/chr2_DCas20_5419_LA_REFimputedAndFiltered.fam
Untracked: output/chr2_DCas20_5419_LA_REFimputedAndFiltered.log
Untracked: output/chr2_DCas20_5419_LA_REFimputedAndFiltered.nosex
Untracked: output/chr2_DCas20_5419_LA_REFimputedAndFiltered.raw
Untracked: output/chr2_DCas20_5419_LA_REFimputedAndFiltered.vcf.gz
Untracked: output/chr3_DCas20_5419_EA_REFimputed.INFO
Untracked: output/chr3_DCas20_5419_EA_REFimputed.hwe
Untracked: output/chr3_DCas20_5419_EA_REFimputed.log
Untracked: output/chr3_DCas20_5419_EA_REFimputed.sitesPassing
Untracked: output/chr3_DCas20_5419_EA_REFimputed.vcf.gz
Untracked: output/chr3_DCas20_5419_EA_REFimputedAndFiltered.bed
Untracked: output/chr3_DCas20_5419_EA_REFimputedAndFiltered.bim
Untracked: output/chr3_DCas20_5419_EA_REFimputedAndFiltered.fam
Untracked: output/chr3_DCas20_5419_EA_REFimputedAndFiltered.log
Untracked: output/chr3_DCas20_5419_EA_REFimputedAndFiltered.nosex
Untracked: output/chr3_DCas20_5419_EA_REFimputedAndFiltered.raw
Untracked: output/chr3_DCas20_5419_EA_REFimputedAndFiltered.vcf.gz
Untracked: output/chr3_DCas20_5419_EAplusLA_REFimputedAndFiltered.log
Untracked: output/chr3_DCas20_5419_LA_REFimputed.log
Untracked: output/chr3_DCas20_5419_LA_REFimputed.vcf.gz
Untracked: output/chr3_DCas20_5419_LA_REFimputedAndFiltered.bed
Untracked: output/chr3_DCas20_5419_LA_REFimputedAndFiltered.bim
Untracked: output/chr3_DCas20_5419_LA_REFimputedAndFiltered.fam
Untracked: output/chr3_DCas20_5419_LA_REFimputedAndFiltered.log
Untracked: output/chr3_DCas20_5419_LA_REFimputedAndFiltered.nosex
Untracked: output/chr3_DCas20_5419_LA_REFimputedAndFiltered.raw
Untracked: output/chr3_DCas20_5419_LA_REFimputedAndFiltered.vcf.gz
Untracked: output/chr4_DCas20_5419_EA_REFimputed.INFO
Untracked: output/chr4_DCas20_5419_EA_REFimputed.hwe
Untracked: output/chr4_DCas20_5419_EA_REFimputed.log
Untracked: output/chr4_DCas20_5419_EA_REFimputed.sitesPassing
Untracked: output/chr4_DCas20_5419_EA_REFimputed.vcf.gz
Untracked: output/chr4_DCas20_5419_EA_REFimputedAndFiltered.bed
Untracked: output/chr4_DCas20_5419_EA_REFimputedAndFiltered.bim
Untracked: output/chr4_DCas20_5419_EA_REFimputedAndFiltered.fam
Untracked: output/chr4_DCas20_5419_EA_REFimputedAndFiltered.log
Untracked: output/chr4_DCas20_5419_EA_REFimputedAndFiltered.nosex
Untracked: output/chr4_DCas20_5419_EA_REFimputedAndFiltered.raw
Untracked: output/chr4_DCas20_5419_EA_REFimputedAndFiltered.vcf.gz
Untracked: output/chr4_DCas20_5419_EAplusLA_REFimputedAndFiltered.log
Untracked: output/chr4_DCas20_5419_LA_REFimputed.log
Untracked: output/chr4_DCas20_5419_LA_REFimputed.vcf.gz
Untracked: output/chr4_DCas20_5419_LA_REFimputedAndFiltered.bed
Untracked: output/chr4_DCas20_5419_LA_REFimputedAndFiltered.bim
Untracked: output/chr4_DCas20_5419_LA_REFimputedAndFiltered.fam
Untracked: output/chr4_DCas20_5419_LA_REFimputedAndFiltered.log
Untracked: output/chr4_DCas20_5419_LA_REFimputedAndFiltered.nosex
Untracked: output/chr4_DCas20_5419_LA_REFimputedAndFiltered.raw
Untracked: output/chr4_DCas20_5419_LA_REFimputedAndFiltered.vcf.gz
Untracked: output/chr5_DCas20_5419_EA_REFimputed.INFO
Untracked: output/chr5_DCas20_5419_EA_REFimputed.hwe
Untracked: output/chr5_DCas20_5419_EA_REFimputed.log
Untracked: output/chr5_DCas20_5419_EA_REFimputed.sitesPassing
Untracked: output/chr5_DCas20_5419_EA_REFimputed.vcf.gz
Untracked: output/chr5_DCas20_5419_EA_REFimputedAndFiltered.bed
Untracked: output/chr5_DCas20_5419_EA_REFimputedAndFiltered.bim
Untracked: output/chr5_DCas20_5419_EA_REFimputedAndFiltered.fam
Untracked: output/chr5_DCas20_5419_EA_REFimputedAndFiltered.log
Untracked: output/chr5_DCas20_5419_EA_REFimputedAndFiltered.nosex
Untracked: output/chr5_DCas20_5419_EA_REFimputedAndFiltered.raw
Untracked: output/chr5_DCas20_5419_EA_REFimputedAndFiltered.vcf.gz
Untracked: output/chr5_DCas20_5419_EAplusLA_REFimputedAndFiltered.log
Untracked: output/chr5_DCas20_5419_LA_REFimputed.log
Untracked: output/chr5_DCas20_5419_LA_REFimputed.vcf.gz
Untracked: output/chr5_DCas20_5419_LA_REFimputedAndFiltered.bed
Untracked: output/chr5_DCas20_5419_LA_REFimputedAndFiltered.bim
Untracked: output/chr5_DCas20_5419_LA_REFimputedAndFiltered.fam
Untracked: output/chr5_DCas20_5419_LA_REFimputedAndFiltered.log
Untracked: output/chr5_DCas20_5419_LA_REFimputedAndFiltered.nosex
Untracked: output/chr5_DCas20_5419_LA_REFimputedAndFiltered.raw
Untracked: output/chr5_DCas20_5419_LA_REFimputedAndFiltered.vcf.gz
Untracked: output/chr6_DCas20_5419_EA_REFimputed.INFO
Untracked: output/chr6_DCas20_5419_EA_REFimputed.hwe
Untracked: output/chr6_DCas20_5419_EA_REFimputed.log
Untracked: output/chr6_DCas20_5419_EA_REFimputed.sitesPassing
Untracked: output/chr6_DCas20_5419_EA_REFimputed.vcf.gz
Untracked: output/chr6_DCas20_5419_EA_REFimputedAndFiltered.bed
Untracked: output/chr6_DCas20_5419_EA_REFimputedAndFiltered.bim
Untracked: output/chr6_DCas20_5419_EA_REFimputedAndFiltered.fam
Untracked: output/chr6_DCas20_5419_EA_REFimputedAndFiltered.log
Untracked: output/chr6_DCas20_5419_EA_REFimputedAndFiltered.nosex
Untracked: output/chr6_DCas20_5419_EA_REFimputedAndFiltered.raw
Untracked: output/chr6_DCas20_5419_EA_REFimputedAndFiltered.vcf.gz
Untracked: output/chr6_DCas20_5419_EAplusLA_REFimputedAndFiltered.log
Untracked: output/chr6_DCas20_5419_LA_REFimputed.log
Untracked: output/chr6_DCas20_5419_LA_REFimputed.vcf.gz
Untracked: output/chr6_DCas20_5419_LA_REFimputedAndFiltered.bed
Untracked: output/chr6_DCas20_5419_LA_REFimputedAndFiltered.bim
Untracked: output/chr6_DCas20_5419_LA_REFimputedAndFiltered.fam
Untracked: output/chr6_DCas20_5419_LA_REFimputedAndFiltered.log
Untracked: output/chr6_DCas20_5419_LA_REFimputedAndFiltered.nosex
Untracked: output/chr6_DCas20_5419_LA_REFimputedAndFiltered.raw
Untracked: output/chr6_DCas20_5419_LA_REFimputedAndFiltered.vcf.gz
Untracked: output/chr7_DCas20_5419_EA_REFimputed.INFO
Untracked: output/chr7_DCas20_5419_EA_REFimputed.hwe
Untracked: output/chr7_DCas20_5419_EA_REFimputed.log
Untracked: output/chr7_DCas20_5419_EA_REFimputed.sitesPassing
Untracked: output/chr7_DCas20_5419_EA_REFimputed.vcf.gz
Untracked: output/chr7_DCas20_5419_EA_REFimputedAndFiltered.bed
Untracked: output/chr7_DCas20_5419_EA_REFimputedAndFiltered.bim
Untracked: output/chr7_DCas20_5419_EA_REFimputedAndFiltered.fam
Untracked: output/chr7_DCas20_5419_EA_REFimputedAndFiltered.log
Untracked: output/chr7_DCas20_5419_EA_REFimputedAndFiltered.nosex
Untracked: output/chr7_DCas20_5419_EA_REFimputedAndFiltered.raw
Untracked: output/chr7_DCas20_5419_EA_REFimputedAndFiltered.vcf.gz
Untracked: output/chr7_DCas20_5419_EAplusLA_REFimputedAndFiltered.log
Untracked: output/chr7_DCas20_5419_LA_REFimputed.log
Untracked: output/chr7_DCas20_5419_LA_REFimputed.vcf.gz
Untracked: output/chr7_DCas20_5419_LA_REFimputedAndFiltered.bed
Untracked: output/chr7_DCas20_5419_LA_REFimputedAndFiltered.bim
Untracked: output/chr7_DCas20_5419_LA_REFimputedAndFiltered.fam
Untracked: output/chr7_DCas20_5419_LA_REFimputedAndFiltered.log
Untracked: output/chr7_DCas20_5419_LA_REFimputedAndFiltered.nosex
Untracked: output/chr7_DCas20_5419_LA_REFimputedAndFiltered.raw
Untracked: output/chr7_DCas20_5419_LA_REFimputedAndFiltered.vcf.gz
Untracked: output/chr8_DCas20_5419_EA_REFimputed.INFO
Untracked: output/chr8_DCas20_5419_EA_REFimputed.hwe
Untracked: output/chr8_DCas20_5419_EA_REFimputed.log
Untracked: output/chr8_DCas20_5419_EA_REFimputed.sitesPassing
Untracked: output/chr8_DCas20_5419_EA_REFimputed.vcf.gz
Untracked: output/chr8_DCas20_5419_EA_REFimputedAndFiltered.bed
Untracked: output/chr8_DCas20_5419_EA_REFimputedAndFiltered.bim
Untracked: output/chr8_DCas20_5419_EA_REFimputedAndFiltered.fam
Untracked: output/chr8_DCas20_5419_EA_REFimputedAndFiltered.log
Untracked: output/chr8_DCas20_5419_EA_REFimputedAndFiltered.nosex
Untracked: output/chr8_DCas20_5419_EA_REFimputedAndFiltered.raw
Untracked: output/chr8_DCas20_5419_EA_REFimputedAndFiltered.vcf.gz
Untracked: output/chr8_DCas20_5419_EAplusLA_REFimputedAndFiltered.log
Untracked: output/chr8_DCas20_5419_LA_REFimputed.log
Untracked: output/chr8_DCas20_5419_LA_REFimputed.vcf.gz
Untracked: output/chr8_DCas20_5419_LA_REFimputedAndFiltered.bed
Untracked: output/chr8_DCas20_5419_LA_REFimputedAndFiltered.bim
Untracked: output/chr8_DCas20_5419_LA_REFimputedAndFiltered.fam
Untracked: output/chr8_DCas20_5419_LA_REFimputedAndFiltered.log
Untracked: output/chr8_DCas20_5419_LA_REFimputedAndFiltered.nosex
Untracked: output/chr8_DCas20_5419_LA_REFimputedAndFiltered.raw
Untracked: output/chr8_DCas20_5419_LA_REFimputedAndFiltered.vcf.gz
Untracked: output/chr9_DCas20_5419_EA_REFimputed.INFO
Untracked: output/chr9_DCas20_5419_EA_REFimputed.hwe
Untracked: output/chr9_DCas20_5419_EA_REFimputed.log
Untracked: output/chr9_DCas20_5419_EA_REFimputed.sitesPassing
Untracked: output/chr9_DCas20_5419_EA_REFimputed.vcf.gz
Untracked: output/chr9_DCas20_5419_EA_REFimputedAndFiltered.bed
Untracked: output/chr9_DCas20_5419_EA_REFimputedAndFiltered.bim
Untracked: output/chr9_DCas20_5419_EA_REFimputedAndFiltered.fam
Untracked: output/chr9_DCas20_5419_EA_REFimputedAndFiltered.log
Untracked: output/chr9_DCas20_5419_EA_REFimputedAndFiltered.nosex
Untracked: output/chr9_DCas20_5419_EA_REFimputedAndFiltered.raw
Untracked: output/chr9_DCas20_5419_EA_REFimputedAndFiltered.vcf.gz
Untracked: output/chr9_DCas20_5419_EAplusLA_REFimputedAndFiltered.log
Untracked: output/chr9_DCas20_5419_LA_REFimputed.log
Untracked: output/chr9_DCas20_5419_LA_REFimputed.vcf.gz
Untracked: output/chr9_DCas20_5419_LA_REFimputedAndFiltered.bed
Untracked: output/chr9_DCas20_5419_LA_REFimputedAndFiltered.bim
Untracked: output/chr9_DCas20_5419_LA_REFimputedAndFiltered.fam
Untracked: output/chr9_DCas20_5419_LA_REFimputedAndFiltered.log
Untracked: output/chr9_DCas20_5419_LA_REFimputedAndFiltered.nosex
Untracked: output/chr9_DCas20_5419_LA_REFimputedAndFiltered.raw
Untracked: output/chr9_DCas20_5419_LA_REFimputedAndFiltered.vcf.gz
Untracked: workflowr_log.R
Unstaged changes:
Deleted: EMBRAPA_2020GS.Rproj
Deleted: analysis/Imputation_EMBRAPA_102419.Rmd
Deleted: analysis/ImputeDCas20_5360.Rmd
Deleted: analysis/Verify_gbs2dart_sampleMatches_EMBRAPA_102419.Rmd
Deleted: analysis/convertDCas19_4403_ToVCF_102419.Rmd
Deleted: analysis/convertDCas20_5360_ToVCF.Rmd
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were made to the R Markdown (analysis/Imputation_EastAfrica_StageII_91019.Rmd) and HTML (docs/Imputation_EastAfrica_StageII_91019.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
html | c6022b6 | wolfemd | 2020-10-09 | Build site. |
Rmd | 4f8a229 | wolfemd | 2020-10-09 | Publish imputations for 2020 of DCAs20_5360 (and 2019 code too) for |
MapforBeagle
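library(tidyverse)
# Read the per-chromosome genetic maps (18 chromosomes) into one table
geneticMap <- tibble(CHR=1:18) %>%
  mutate(CHROMdata=map(CHR,~read.table(file=paste0("~/Google Drive/NextGenGS/nextgenImputation2019/",
                                                   "CassavaGeneticMap/chr",.,"_cassava_cM_pred.v6_90619.map"),
                                       stringsAsFactors = F, header = F))) %>%
  unnest(CHROMdata)
dim(geneticMap)
# Plot column V3 against column V4, one panel per chromosome
ggplot(geneticMap,aes(x=V4,y=V3)) + geom_point() + facet_wrap(~CHR)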
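The plot above is a visual sanity check of the genetic map. A numeric version of the same check can be sketched as follows, under the assumption (not stated in the source) that the .map files use the four-column PLINK-style layout: chromosome, marker ID, genetic position in cM (V3), and physical position in bp (V4). The toy values and the column names `CHR`, `SNP`, `cM`, `bp` are made up for illustration.

```r
# Toy genetic map in the assumed four-column layout (all values invented).
map_text <- "1 S1_1000 0.00 1000
1 S1_5000 0.12 5000
1 S1_9000 0.35 9000
2 S2_2000 0.00 2000
2 S2_8000 0.20 8000"
toy_map <- read.table(text = map_text, header = FALSE, stringsAsFactors = FALSE)
names(toy_map) <- c("CHR", "SNP", "cM", "bp")

# Within each chromosome, cM should never decrease as bp increases.
is_monotone <- sapply(split(toy_map, toy_map$CHR), function(d) {
  d <- d[order(d$bp), ]
  all(diff(d$cM) >= 0)
})
is_monotone
```

A chromosome flagged FALSE here would show up in the faceted plot as a curve that dips rather than rising steadily.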
GBS-only samples from NaCRRI and TARI
These will mostly be NaCRRI C1 clones, but may also include some TARI TP clones (those whose new samples weren’t verified). Use the previously REF-imputed data so that only Beagle 5.1 is required.
Get Tanzania imputed data from:
Samples NOT in RefPanelIV from:
1) /workdir/marnin/beagle_GBS_june17 (imputed Tz data)
2) /workdir/marnin/beagle_GBS_october2016/CYCLE_NACRRI_june16/vcf/ (NaCRRI C1)
library(tidyverse); library(magrittr); library(furrr); options(mc.cores=18); plan(multiprocess)
system(paste0("bcftools query --list-samples /workdir/marnin/beagle_GBS_june17/TanzaniaData_20170601_withRef/",
"TanzaniaData_20170601_withRef_chr1.filt2.imputed.vcf.gz ",
"> /workdir/marnin/beagle_GBS_june17/TanzaniaData_20170601_withRef/",
"TanzaniaData_20170601_withRef_chr1.filt2.imputed.samples"))
system(paste0("bcftools query --list-samples /workdir/marnin/beagle_GBS_october2016/CYCLE_NACRRI_june16/vcf/",
"Subset_cassavaGBSbuild_June2016_withRef_NACCRI_CYCLE_chr1.imputed.vcf.gz ",
"> /workdir/marnin/beagle_GBS_october2016/CYCLE_NACRRI_june16/vcf/",
"Subset_cassavaGBSbuild_June2016_withRef_NACCRI_CYCLE_chr1.imputed.samples"))
system(paste0("bcftools query --list-samples /workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageI_82819/",
"chr1_ImputationReferencePanel_StageIV_82819.vcf.gz ",
"> /workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageI_82819/",
"chr1_ImputationReferencePanel_StageIV_82819.samples"))
refpanelIV<-read.table(paste0("/workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageI_82819/",
"chr1_ImputationReferencePanel_StageIV_82819.samples"),
stringsAsFactors = F, header = F)$V1
ugC1<-read.table(paste0("/workdir/marnin/beagle_GBS_october2016/CYCLE_NACRRI_june16/vcf/",
"Subset_cassavaGBSbuild_June2016_withRef_NACCRI_CYCLE_chr1.imputed.samples"),
stringsAsFactors = F, header = F)$V1
tzTP<-read.table(paste0("/workdir/marnin/beagle_GBS_june17/TanzaniaData_20170601_withRef/",
"TanzaniaData_20170601_withRef_chr1.filt2.imputed.samples"),
stringsAsFactors = F, header = F)$V1
table(ugC1 %in% tzTP) # false
table(ugC1 %in% refpanelIV)
# FALSE TRUE
# 1915 175
table(tzTP %in% refpanelIV)
# FALSE TRUE
# 1016 328
gbsOnlySamplesToImpute<-union(ugC1,tzTP) %>% .[!. %in% refpanelIV]
gbsOnlySamplesToImpute_ugC1<-ugC1 %>% .[!. %in% refpanelIV]
gbsOnlySamplesToImpute_tzTP<-tzTP %>% .[!. %in% refpanelIV]
write.table(gbsOnlySamplesToImpute_ugC1,
file=paste0("/workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/",
"gbsOnlySamplesToImpute_ugC1_90919.txt"),
row.names = F, col.names = F, quote = F)
write.table(gbsOnlySamplesToImpute_tzTP,
file=paste0("/workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/",
"gbsOnlySamplesToImpute_tzTP_90919.txt"),
row.names = F, col.names = F, quote = F)
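The sample-selection logic above (keep only the samples that are absent from the reference panel) can be tried on a small scale. This is a sketch with invented clone names; the object names are hypothetical stand-ins for `refpanelIV`, `ugC1` and `tzTP`.

```r
# Invented sample names standing in for the three sample lists.
ref_panel <- c("cloneA", "cloneB", "cloneC")
ug_c1     <- c("cloneB", "cloneD", "cloneE")
tz_tp     <- c("cloneC", "cloneF")

# How many samples of one set are already in the reference panel?
table(ug_c1 %in% ref_panel)

# Keep only the samples that are NOT in the reference panel.
to_impute_ug  <- ug_c1[!ug_c1 %in% ref_panel]
to_impute_tz  <- tz_tp[!tz_tp %in% ref_panel]
to_impute_all <- union(ug_c1, tz_tp)
to_impute_all <- to_impute_all[!to_impute_all %in% ref_panel]

# For duplicate-free vectors, setdiff() gives the same answer.
same_as_setdiff <- identical(to_impute_ug, setdiff(ug_c1, ref_panel))
```

The `x[!x %in% y]` idiom keeps any duplicate entries of `x`, whereas `setdiff()` drops them, so the two only agree when the sample lists contain no repeated names.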
library(tidyverse); library(magrittr); library(furrr); options(mc.cores=18); plan(multiprocess)
tibble(Chr=1:18) %>%
mutate(ExtractSiteList=future_map(Chr,function(Chr){
system(paste0("zcat /workdir/marnin/beagle_GBS_june17/TanzaniaData_20170601_withRef/",
"TanzaniaData_20170601_withRef_chr",Chr,".filt2.imputed.vcf.gz ",
"| cut -f1-5 > ",
"/workdir/marnin/beagle_GBS_june17/TanzaniaData_20170601_withRef/",
"TanzaniaData_20170601_withRef_chr",Chr,".filt2.imputed.sitesWithAlleles"))}))
tibble(Chr=1:18) %>%
mutate(ExtractSiteList=future_map(Chr,function(Chr){
system(paste0("zcat /workdir/marnin/beagle_GBS_october2016/CYCLE_NACRRI_june16/vcf/",
"Subset_cassavaGBSbuild_June2016_withRef_NACCRI_CYCLE_chr",Chr,".imputed.vcf.gz ",
"| cut -f1-5 > ",
"/workdir/marnin/beagle_GBS_october2016/CYCLE_NACRRI_june16/vcf/",
"Subset_cassavaGBSbuild_June2016_withRef_NACCRI_CYCLE_chr",Chr,".imputed.sitesWithAlleles"))}))
tibble(Chr=1:18) %>%
mutate(ExtractSiteList=future_map(Chr,function(Chr){
system(paste0("zcat /workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageI_82819/chr",
Chr,"_ImputationReferencePanel_StageIV_82819.vcf.gz ",
"| cut -f1-5 > ",
"/workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageI_82819/chr",
Chr,"_ImputationReferencePanel_StageIV_82819.sitesWithAlleles"))}))
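Each `zcat ... | cut -f1-5` call above keeps the first five VCF columns (CHROM, POS, ID, REF, ALT), header lines included. Because `read.table()` treats `#` as a comment character by default, those header lines are dropped when the site list is read back in. A small sketch on an in-memory toy VCF (the records are invented):

```r
vcf_lines <- c(
  "##fileformat=VCFv4.2",
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
  "1\t1001\tS1_1001\tA\tG\t.\tPASS\t.",
  "1\t2050\tS1_2050\tC\tT\t.\tPASS\t."
)

# Equivalent of `cut -f1-5`: keep the first five tab-separated fields
first5 <- vapply(strsplit(vcf_lines, "\t", fixed = TRUE),
                 function(x) paste(head(x, 5), collapse = "\t"),
                 character(1))

# Lines starting with '#' are skipped (comment.char = "#" is the default)
sites <- read.table(text = first5, header = FALSE,
                    colClasses = c("character", "integer", "character",
                                   "character", "character"))
names(sites) <- c("CHROM", "POS", "ID", "REF", "ALT")
print(sites)
```

Setting `colClasses` is an optional safeguard in this sketch: an allele column made up entirely of `T` would otherwise risk being read as logical.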
tibble(Chr=1:18) %>%
mutate(GetINFO=future_map(Chr,function(Chr){
fileIn<-paste0("vcftools --gzvcf /workdir/marnin/beagle_GBS_october2016/CYCLE_NACRRI_june16/vcf/",
"Subset_cassavaGBSbuild_June2016_withRef_NACCRI_CYCLE_chr",Chr,".imputed.vcf.gz ")
fileOut<-paste0("--out /workdir/marnin/beagle_GBS_october2016/CYCLE_NACRRI_june16/vcf/",
"Subset_cassavaGBSbuild_June2016_withRef_NACCRI_CYCLE_chr",Chr,".imputed")
system(paste0(fileIn,"--get-INFO AR2 --get-INFO AF ",fileOut))
system(paste0(fileIn,"--hardy ",fileOut)) }))
tibble(Chr=1:18) %>%
mutate(GetINFO=future_map(Chr,function(Chr){
fileIn<-paste0("vcftools --gzvcf /workdir/marnin/beagle_GBS_june17/TanzaniaData_20170601_withRef/",
"TanzaniaData_20170601_withRef_chr",Chr,".filt2.imputed.vcf.gz ")
fileOut<-paste0("--out /workdir/marnin/beagle_GBS_june17/TanzaniaData_20170601_withRef/",
"TanzaniaData_20170601_withRef_chr",Chr,".filt2.imputed")
system(paste0(fileIn,"--get-INFO AR2 --get-INFO AF ",fileOut))
system(paste0(fileIn,"--hardy ",fileOut)) }))
stats2filterOn_ugC1<-tibble(Chr=1:18) %>%
mutate(INFO=future_map(Chr,~read.table(paste0("/workdir/marnin/beagle_GBS_october2016/CYCLE_NACRRI_june16/vcf/",
"Subset_cassavaGBSbuild_June2016_withRef_NACCRI_CYCLE_chr",.,
".imputed.INFO"),
stringsAsFactors = F, header = T)),
hwe=future_map(Chr,~read.table(paste0("/workdir/marnin/beagle_GBS_october2016/CYCLE_NACRRI_june16/vcf/",
"Subset_cassavaGBSbuild_June2016_withRef_NACCRI_CYCLE_chr",.,
".imputed.hwe"),
stringsAsFactors = F, header = T)))
stats2filterOn_ugC1 %<>%
select(Chr,INFO,hwe) %>%
mutate(INFO=map2(INFO,hwe,~left_join(.x,(.y %>%
rename(CHROM=CHR))))) %>%
select(-hwe)
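# Note: only AR2 and AF were extracted with --get-INFO above, so .$DR2 is NULL here
# and is.numeric(NULL) returns FALSE for every chromosome.
stats2filterOn_ugC1 %>%
mutate(NotNumDR2=map_lgl(INFO,~is.numeric(.$DR2)))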
stats2filterOn_ugC1 %<>%
mutate(INFO=map(INFO,
~mutate(.,
AR2=as.numeric(AR2),
AF=as.numeric(AF)) %>%
filter(!is.na(AR2),
!is.na(AF))))
stats2filterOn_ugC1 %<>%
mutate(INFO=map(INFO,~mutate(.,MAF=ifelse(AF>0.5,1-AF,AF))))
stats2filterOn_ugC1 %>% unnest()
Chr INFO NotNumDR2
1 1 <data.frame [32,789 x 12]> FALSE
2 2 <data.frame [25,770 x 12]> FALSE
3 3 <data.frame [23,542 x 12]> FALSE
4 4 <data.frame [19,718 x 12]> FALSE
5 5 <data.frame [22,094 x 12]> FALSE
6 6 <data.frame [21,505 x 12]> FALSE
7 7 <data.frame [13,097 x 12]> FALSE
8 8 <data.frame [19,177 x 12]> FALSE
9 9 <data.frame [19,279 x 12]> FALSE
10 10 <data.frame [16,185 x 12]> FALSE
11 11 <data.frame [20,176 x 12]> FALSE
12 12 <data.frame [16,877 x 12]> FALSE
13 13 <data.frame [16,892 x 12]> FALSE
14 14 <data.frame [22,189 x 12]> FALSE
15 15 <data.frame [21,376 x 12]> FALSE
16 16 <data.frame [16,423 x 12]> FALSE
17 17 <data.frame [16,515 x 12]> FALSE
18 18 <data.frame [16,188 x 12]> FALSE
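The MAF column computed above folds the allele frequency onto the minor allele: frequencies above 0.5 are replaced by `1 - AF`. A tiny base-R check with invented frequencies shows that this matches `pmin(AF, 1 - AF)` and how the MAF threshold used later behaves:

```r
af <- c(0.02, 0.50, 0.85, 0.997)   # invented allele frequencies

maf_ifelse <- ifelse(af > 0.5, 1 - af, af)  # form used in this analysis
maf_pmin   <- pmin(af, 1 - af)              # equivalent shorthand
stopifnot(isTRUE(all.equal(maf_ifelse, maf_pmin)))

# Sites passing the MAF > 0.005 threshold
keep <- maf_ifelse > 0.005
print(data.frame(AF = af, MAF = maf_ifelse, keep = keep))
```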
stats2filterOn_tzTP<-tibble(Chr=1:18) %>%
mutate(INFO=future_map(Chr,~read.table(paste0("/workdir/marnin/beagle_GBS_june17/TanzaniaData_20170601_withRef/",
"TanzaniaData_20170601_withRef_chr",.,".filt2.imputed.INFO"),
stringsAsFactors = F, header = T)),
hwe=future_map(Chr,~read.table(paste0("/workdir/marnin/beagle_GBS_june17/TanzaniaData_20170601_withRef/",
"TanzaniaData_20170601_withRef_chr",.,".filt2.imputed.hwe"),
stringsAsFactors = F, header = T)))
stats2filterOn_tzTP %<>%
select(Chr,INFO,hwe) %>%
mutate(INFO=map2(INFO,hwe,~left_join(.x,(.y %>%
rename(CHROM=CHR))))) %>%
select(-hwe)
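# Note: only AR2 and AF were extracted with --get-INFO above, so .$DR2 is NULL here
# and is.numeric(NULL) returns FALSE for every chromosome.
stats2filterOn_tzTP %>%
mutate(NotNumDR2=map_lgl(INFO,~is.numeric(.$DR2)))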
stats2filterOn_tzTP %<>%
mutate(INFO=map(INFO,
~mutate(.,
AR2=as.numeric(AR2),
AF=as.numeric(AF)) %>%
filter(!is.na(AR2),
!is.na(AF))))
stats2filterOn_tzTP %<>%
mutate(INFO=map(INFO,~mutate(.,MAF=ifelse(AF>0.5,1-AF,AF))))
stats2filterOn_tzTP %>% unnest()
Chr INFO NotNumDR2
1 1 <data.frame [32,789 x 12]> FALSE
2 2 <data.frame [25,770 x 12]> FALSE
3 3 <data.frame [23,542 x 12]> FALSE
4 4 <data.frame [19,718 x 12]> FALSE
5 5 <data.frame [22,094 x 12]> FALSE
6 6 <data.frame [21,505 x 12]> FALSE
7 7 <data.frame [13,097 x 12]> FALSE
8 8 <data.frame [19,177 x 12]> FALSE
9 9 <data.frame [19,279 x 12]> FALSE
10 10 <data.frame [16,185 x 12]> FALSE
11 11 <data.frame [20,176 x 12]> FALSE
12 12 <data.frame [16,877 x 12]> FALSE
13 13 <data.frame [16,892 x 12]> FALSE
14 14 <data.frame [22,189 x 12]> FALSE
15 15 <data.frame [21,376 x 12]> FALSE
16 16 <data.frame [16,423 x 12]> FALSE
17 17 <data.frame [16,515 x 12]> FALSE
18 18 <data.frame [16,188 x 12]> FALSE
refpanelIV_sites<-tibble(Chr=1:18) %>%
mutate(SiteList=future_map(Chr,function(Chr){
refpanel4<-read.table(file = paste0("/workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageI_82819/chr",
Chr,"_ImputationReferencePanel_StageIV_82819.sitesWithAlleles"),
stringsAsFactors = F, header = F)
return(refpanel4)})) %>%
unnest() %>%
rename(CHROM=V1,
POS=V2,
ID=V3,
REF=V4,
ALT=V5)
dim(refpanelIV_sites) # [1] 65509 6
head(refpanelIV_sites)
After poor results on 9/09/19, I want to treat tzTP and ugC1 separately this time, with emphasis on ugC1.
Keep sites that:
1. Have matching chrom, pos and alleles in both imputed GBS datasets (ugC1 and tzTP).
2. Pass the pre-imputation filters in the GBS datasets (ugC1 and tzTP), plus AR2>=0.75, P_HWE>1e-20, MAF>0.005 [0.5%].
3. Intersect the refpanelIV sitesWithAlleles.
stats2filterOn_tzTP %<>%
mutate(allSitesPassing=map(INFO,
~filter(.,AR2>=0.75,
P_HWE>1e-20,
MAF>0.005) %>%
select(CHROM,POS,REF,ALT)))
stats2filterOn_ugC1 %<>%
mutate(allSitesPassing=map(INFO,
~filter(.,AR2>=0.75,
P_HWE>1e-20,
MAF>0.005) %>%
select(CHROM,POS,REF,ALT)))
sitesPassingFilters_tzTP<-stats2filterOn_tzTP %>%
unnest(allSitesPassing) %>%
semi_join(refpanelIV_sites) %>%
group_by(Chr) %>%
nest(.key = "allSitesPassing")
sitesPassingFilters_ugC1<-stats2filterOn_ugC1 %>%
unnest(allSitesPassing) %>%
semi_join(refpanelIV_sites) %>%
group_by(Chr) %>%
nest(.key = "allSitesPassing")
sitesPassingFilters_tzTP %>% unnest() %>% nrow() # 48996
sitesPassingFilters_ugC1 %>% unnest() %>% nrow() # 8170
sitesPassingFilters_tzTP
sitesPassingFilters_ugC1
Chr allSitesPassing
1 1 <tibble [5,985 x 4]>
2 2 <tibble [2,980 x 4]>
3 3 <tibble [3,144 x 4]>
4 4 <tibble [2,860 x 4]>
5 5 <tibble [2,781 x 4]>
6 6 <tibble [2,794 x 4]>
7 7 <tibble [1,476 x 4]>
8 8 <tibble [2,767 x 4]>
9 9 <tibble [2,492 x 4]>
10 10 <tibble [2,218 x 4]>
11 11 <tibble [2,421 x 4]>
12 12 <tibble [2,108 x 4]>
13 13 <tibble [2,079 x 4]>
14 14 <tibble [3,290 x 4]>
15 15 <tibble [2,933 x 4]>
16 16 <tibble [2,135 x 4]>
17 17 <tibble [2,109 x 4]>
18 18 <tibble [2,424 x 4]>
Chr allSitesPassing
1 1 <tibble [2,985 x 4]>
2 2 <tibble [332 x 4]>
3 3 <tibble [372 x 4]>
4 4 <tibble [597 x 4]>
5 5 <tibble [322 x 4]>
6 6 <tibble [321 x 4]>
7 7 <tibble [204 x 4]>
8 8 <tibble [152 x 4]>
9 9 <tibble [376 x 4]>
10 10 <tibble [186 x 4]>
11 11 <tibble [301 x 4]>
12 12 <tibble [187 x 4]>
13 13 <tibble [263 x 4]>
14 14 <tibble [326 x 4]>
15 15 <tibble [242 x 4]>
16 16 <tibble [263 x 4]>
17 17 <tibble [486 x 4]>
18 18 <tibble [255 x 4]>
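The two steps above (threshold on AR2, P_HWE and MAF, then keep only sites that also occur in the reference panel) can be sketched on a toy table. All values are invented, and base `merge()` stands in for `semi_join()` so the example runs without extra packages; the two agree here because the toy panel has no duplicated sites.

```r
# Toy per-site statistics (invented values)
info <- data.frame(
  CHROM = 1, POS = c(100, 200, 300, 400),
  REF = c("A", "C", "G", "A"), ALT = c("G", "T", "A", "C"),
  AR2 = c(0.90, 0.60, 0.80, 0.95),
  P_HWE = c(0.5, 0.2, 1e-30, 0.1),
  MAF = c(0.10, 0.20, 0.30, 0.05),
  stringsAsFactors = FALSE
)

# Toy reference-panel site list; the site at POS 400 has a different REF allele
refpanel_sites <- data.frame(
  CHROM = 1, POS = c(100, 300, 400),
  REF = c("A", "G", "T"), ALT = c("G", "A", "C"),
  stringsAsFactors = FALSE
)

# Step 1: quality thresholds, as in the filter() calls above
passing <- subset(info, AR2 >= 0.75 & P_HWE > 1e-20 & MAF > 0.005,
                  select = c(CHROM, POS, REF, ALT))

# Step 2: merging on all shared columns matches chrom, pos and both alleles
kept <- merge(passing, refpanel_sites)
print(kept)
```

In this toy case POS 200 fails AR2, POS 300 fails P_HWE, and POS 400 passes the thresholds but is dropped because its REF allele differs from the panel.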
sitesPassingFilters_ugC1 %>%
mutate(WriteSiteList=future_map2(Chr,allSitesPassing,function(Chr,allSitesPassing){
pathIn<-"/workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/"
#pathIn<-"/workdir/mw489/ImputationEastAfrica_StageI_82819/"
sitesPassing_thisChr<-allSitesPassing %>%
select(CHROM,POS) %>%
arrange(CHROM,POS)
write.table(sitesPassing_thisChr,
file = paste0(pathIn,"chr",Chr,
"_ugC1_ImputedGBS_91019.sitesPassing"),
row.names = F, col.names = F, quote = F)}))
sitesPassingFilters_tzTP %>%
mutate(WriteSiteList=future_map2(Chr,allSitesPassing,function(Chr,allSitesPassing){
pathIn<-"/workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/"
#pathIn<-"/workdir/mw489/ImputationEastAfrica_StageI_82819/"
sitesPassing_thisChr<-allSitesPassing %>%
select(CHROM,POS) %>%
arrange(CHROM,POS)
write.table(sitesPassing_thisChr,
file = paste0(pathIn,"chr",Chr,
"_tzTP_ImputedGBS_91019.sitesPassing"),
row.names = F, col.names = F, quote = F)}))
library(tidyverse); library(magrittr); require(furrr); options(mc.cores=18); plan(multiprocess)
tibble(Chr=1:18) %>%
mutate(ExtractRaw_gbsSamples=future_map(Chr,function(Chr){
system(paste0("vcftools --gzvcf /workdir/marnin/beagle_GBS_june17/TanzaniaData_20170601_withRef/",
"TanzaniaData_20170601_withRef_chr",Chr,".filt2.imputed.vcf.gz ",
"--keep /workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/",
"gbsOnlySamplesToImpute_tzTP_90919.txt ",
"--chr ",Chr," ",
"--positions /workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/chr",
Chr,"_tzTP_ImputedGBS_91019.sitesPassing ",
"--recode ",
"--stdout | bgzip -c -@ 24 > ",
"/workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/chr",
Chr,"_tzTPsamples_REFimputedGBSsites_91019.vcf.gz")) }))
library(tidyverse); library(magrittr); require(furrr); options(mc.cores=18); plan(multiprocess)
tibble(Chr=1:18) %>%
mutate(ExtractRaw_gbsSamples=future_map(Chr,function(Chr){
system(paste0("vcftools --gzvcf /workdir/marnin/beagle_GBS_october2016/CYCLE_NACRRI_june16/vcf/",
"Subset_cassavaGBSbuild_June2016_withRef_NACCRI_CYCLE_chr",Chr,".imputed.vcf.gz ",
"--keep /workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/",
"gbsOnlySamplesToImpute_ugC1_90919.txt ",
"--chr ",Chr," ",
"--positions /workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/chr",
Chr,"_ugC1_ImputedGBS_91019.sitesPassing ",
"--recode ",
"--stdout | bgzip -c -@ 24 > ",
"/workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/chr",
Chr,"_ugC1samples_REFimputedGBSsites_91019.vcf.gz")) }))
tibble(Chr=1:18) %>%
mutate(Index=future_map(Chr,function(Chr){
system(paste0("tabix -f -p vcf /workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/chr",
Chr,"_ugC1samples_REFimputedGBSsites_91019.vcf.gz"))
}))
# cbsulm__ --> cbsurobbins
rsync --update --archive --verbose /workdir/mw489/ImputationEastAfrica_StageI_82819/ mw489@cbsurobbins.biohpc.cornell.edu:/workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageI_82819
rsync --update --archive --verbose /workdir/mw489/ImputationEastAfrica_StageII_90919/ mw489@cbsurobbins.biohpc.cornell.edu:/workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919
# cbsurobbins --> cbsulm__
rsync --update --archive --verbose /workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageI_82819/ mw489@cbsulm17.biohpc.cornell.edu:/workdir/mw489/ImputationEastAfrica_StageI_82819;
rsync --update --archive --verbose /workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageI_82819/ mw489@cbsulm16.biohpc.cornell.edu:/workdir/mw489/ImputationEastAfrica_StageI_82819;
rsync --update --archive --verbose /workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageI_82819/ mw489@cbsulm14.biohpc.cornell.edu:/workdir/mw489/ImputationEastAfrica_StageI_82819;
rsync --update --archive --verbose /workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageI_82819/ mw489@cbsulm13.biohpc.cornell.edu:/workdir/mw489/ImputationEastAfrica_StageI_82819;
rsync --update --archive --verbose /workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageI_82819/ mw489@cbsulm12.biohpc.cornell.edu:/workdir/mw489/ImputationEastAfrica_StageI_82819;
rsync --update --archive --verbose /workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageI_82819/ mw489@cbsulm07.biohpc.cornell.edu:/workdir/mw489/ImputationEastAfrica_StageI_82819;
rsync --update --archive --verbose /workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/ mw489@cbsulm17.biohpc.cornell.edu:/workdir/mw489/ImputationEastAfrica_StageII_90919;
rsync --update --archive --verbose /workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/ mw489@cbsulm16.biohpc.cornell.edu:/workdir/mw489/ImputationEastAfrica_StageII_90919;
rsync --update --archive --verbose /workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/ mw489@cbsulm14.biohpc.cornell.edu:/workdir/mw489/ImputationEastAfrica_StageII_90919;
rsync --update --archive --verbose /workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/ mw489@cbsulm13.biohpc.cornell.edu:/workdir/mw489/ImputationEastAfrica_StageII_90919;
rsync --update --archive --verbose /workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/ mw489@cbsulm12.biohpc.cornell.edu:/workdir/mw489/ImputationEastAfrica_StageII_90919;
rsync --update --archive --verbose /workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/ mw489@cbsulm07.biohpc.cornell.edu:/workdir/mw489/ImputationEastAfrica_StageII_90919;
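The Beagle runs that follow are split into chunks by chromosome, one chunk per machine, and are otherwise identical. One way to avoid the repetition would be a small command builder. `beagle_cmd()` is a hypothetical helper, not part of the original analysis; the jar path, file-name patterns and parameters are copied from the calls in this document, and nothing is executed here.

```r
# Hypothetical helper: build the Beagle command string for one chromosome
beagle_cmd <- function(chr, pop = "tzTP", nthreads = 112,
                       wd = "/workdir/mw489/") {
  stage2 <- paste0(wd, "ImputationEastAfrica_StageII_90919/chr", chr)
  paste0("java -Xms2g -Xmx500g -jar /programs/beagle/beagle.jar ",
         "gt=", stage2, "_", pop, "samples_REFimputedGBSsites_91019.vcf.gz ",
         "map=", wd, "CassavaGeneticMap/chr", chr, "_cassava_cM_pred.v6_91019.map ",
         "ref=", wd, "ImputationEastAfrica_StageI_82819/chr", chr,
         "_ImputationReferencePanel_StageIV_82819.vcf.gz ",
         "out=", stage2, "_", pop, "samples_AllSitesREFimputedAndPhased_91019 ",
         "nthreads=", nthreads, " impute=true ne=100000")
}

# Chromosome subsets as assigned to the different machines
chunks <- list(c(1, 2, 3, 5), c(4, 6, 7, 8), c(9, 10, 11, 12),
               c(13, 14, 15), c(16, 17, 18))
cmds <- lapply(chunks, function(chrs) vapply(chrs, beagle_cmd, character(1)))
cat(cmds[[1]][1], "\n")
# To run one chunk on a given machine: invisible(lapply(cmds[[1]], system))
```

Passing `pop = "ugC1"` would produce the corresponding ugC1 commands, since the file names differ only in that label.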
library(tidyverse); library(magrittr);
tibble(Chr=c(1,2,3,5)) %>%
mutate(REFimpute_UnGenotypedSites_GBSonlySamples=map(Chr,function(Chr){
nthreads<-112
system(paste0("java -Xms2g -Xmx500g -jar /programs/beagle/beagle.jar ",
"gt=/workdir/mw489/ImputationEastAfrica_StageII_90919/chr",
Chr,"_tzTPsamples_REFimputedGBSsites_91019.vcf.gz ",
"map=/workdir/mw489/CassavaGeneticMap/chr",
Chr,"_cassava_cM_pred.v6_91019.map ",
"ref=/workdir/mw489/ImputationEastAfrica_StageI_82819/chr",
Chr,"_ImputationReferencePanel_StageIV_82819.vcf.gz ",
"out=/workdir/mw489/ImputationEastAfrica_StageII_90919/chr",
Chr,"_tzTPsamples_AllSitesREFimputedAndPhased_91019 ",
"nthreads=",nthreads," impute=true ne=100000"))}))
library(tidyverse); library(magrittr);
tibble(Chr=c(4,6,7,8)) %>%
mutate(REFimpute_UnGenotypedSites_GBSonlySamples=map(Chr,function(Chr){
nthreads<-112
system(paste0("java -Xms2g -Xmx500g -jar /programs/beagle/beagle.jar ",
"gt=/workdir/mw489/ImputationEastAfrica_StageII_90919/chr",
Chr,"_tzTPsamples_REFimputedGBSsites_91019.vcf.gz ",
"map=/workdir/mw489/CassavaGeneticMap/chr",
Chr,"_cassava_cM_pred.v6_91019.map ",
"ref=/workdir/mw489/ImputationEastAfrica_StageI_82819/chr",
Chr,"_ImputationReferencePanel_StageIV_82819.vcf.gz ",
"out=/workdir/mw489/ImputationEastAfrica_StageII_90919/chr",
Chr,"_tzTPsamples_AllSitesREFimputedAndPhased_91019 ",
"nthreads=",nthreads," impute=true ne=100000"))}))
library(tidyverse); library(magrittr);
tibble(Chr=c(9,10,11,12)) %>%
mutate(REFimpute_UnGenotypedSites_GBSonlySamples=map(Chr,function(Chr){
nthreads<-112
system(paste0("java -Xms2g -Xmx500g -jar /programs/beagle/beagle.jar ",
"gt=/workdir/mw489/ImputationEastAfrica_StageII_90919/chr",
Chr,"_tzTPsamples_REFimputedGBSsites_91019.vcf.gz ",
"map=/workdir/mw489/CassavaGeneticMap/chr",
Chr,"_cassava_cM_pred.v6_91019.map ",
"ref=/workdir/mw489/ImputationEastAfrica_StageI_82819/chr",
Chr,"_ImputationReferencePanel_StageIV_82819.vcf.gz ",
"out=/workdir/mw489/ImputationEastAfrica_StageII_90919/chr",
Chr,"_tzTPsamples_AllSitesREFimputedAndPhased_91019 ",
"nthreads=",nthreads," impute=true ne=100000"))}))
library(tidyverse); library(magrittr);
tibble(Chr=c(13,14,15)) %>%
mutate(REFimpute_UnGenotypedSites_GBSonlySamples=map(Chr,function(Chr){
nthreads<-112
system(paste0("java -Xms2g -Xmx500g -jar /programs/beagle/beagle.jar ",
"gt=/workdir/mw489/ImputationEastAfrica_StageII_90919/chr",
Chr,"_tzTPsamples_REFimputedGBSsites_91019.vcf.gz ",
"map=/workdir/mw489/CassavaGeneticMap/chr",
Chr,"_cassava_cM_pred.v6_91019.map ",
"ref=/workdir/mw489/ImputationEastAfrica_StageI_82819/chr",
Chr,"_ImputationReferencePanel_StageIV_82819.vcf.gz ",
"out=/workdir/mw489/ImputationEastAfrica_StageII_90919/chr",
Chr,"_tzTPsamples_AllSitesREFimputedAndPhased_91019 ",
"nthreads=",nthreads," impute=true ne=100000"))}))
library(tidyverse); library(magrittr);
tibble(Chr=c(16,17,18)) %>%
mutate(REFimpute_UnGenotypedSites_GBSonlySamples=map(Chr,function(Chr){
nthreads<-112
system(paste0("java -Xms2g -Xmx500g -jar /programs/beagle/beagle.jar ",
"gt=/workdir/mw489/ImputationEastAfrica_StageII_90919/chr",
Chr,"_tzTPsamples_REFimputedGBSsites_91019.vcf.gz ",
"map=/workdir/mw489/CassavaGeneticMap/chr",
Chr,"_cassava_cM_pred.v6_91019.map ",
"ref=/workdir/mw489/ImputationEastAfrica_StageI_82819/chr",
Chr,"_ImputationReferencePanel_StageIV_82819.vcf.gz ",
"out=/workdir/mw489/ImputationEastAfrica_StageII_90919/chr",
Chr,"_tzTPsamples_AllSitesREFimputedAndPhased_91019 ",
"nthreads=",nthreads," impute=true ne=100000"))}))
Post-imputation filters: DR2>=0.75, P_HWE>1e-20, MAF>0.005 [0.5%]
library(tidyverse); library(magrittr); library(furrr); options(mc.cores=18); plan(multiprocess)
pathIn<-"/workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/"
tibble(Chr=1:18) %>%
mutate(PostImputeFilter=future_map(Chr,function(Chr){
fileIn<-paste0("vcftools --gzvcf ",pathIn,"chr",
Chr,"_tzTPsamples_AllSitesREFimputedAndPhased_91019.vcf.gz ")
fileOut<-paste0("--out ",pathIn,"chr",
Chr,"_tzTPsamples_AllSitesREFimputedAndPhased_91019")
system(paste0(fileIn,"--get-INFO DR2 --get-INFO AF ",fileOut))
system(paste0(fileIn,"--hardy ",fileOut)) }))
pathIn<-"/workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/"
stats2filterOn<-tibble(Chr=1:18) %>%
mutate(INFO=future_map(Chr,~read.table(paste0(pathIn,"chr",.,
"_tzTPsamples_AllSitesREFimputedAndPhased_91019.INFO"),
stringsAsFactors = F, header = T)),
hwe=future_map(Chr,~read.table(paste0(pathIn,"chr",.,
"_tzTPsamples_AllSitesREFimputedAndPhased_91019.hwe"),
stringsAsFactors = F, header = T)))
stats2filterOn %<>%
select(Chr,INFO,hwe) %>%
mutate(INFO=map2(INFO,hwe,~left_join(.x,(.y %>%
rename(CHROM=CHR))))) %>%
select(-hwe)
stats2filterOn %>%
mutate(NotNumDR2=map_lgl(INFO,~is.numeric(.$DR2)))
stats2filterOn %<>%
mutate(INFO=map(INFO,
~mutate(.,
DR2=as.numeric(DR2),
AF=as.numeric(AF)) %>%
filter(!is.na(DR2),
!is.na(AF))))
stats2filterOn %<>%
mutate(INFO=map(INFO,~mutate(.,MAF=ifelse(AF>0.5,1-AF,AF))))
Check what’s left
sitesPassingFilters<-stats2filterOn %>%
mutate(allSitesPassing=map(INFO,
~filter(.,DR2>=0.75,
P_HWE>1e-20,
MAF>0.005) %>%
select(CHROM,POS))) %>%
select(-INFO)
stats2filterOn %>% unnest() %>% nrow() # 65451
sitesPassingFilters %>% unnest(allSitesPassing) %>% nrow() # 65130
stats2filterOn
sitesPassingFilters
# Chr allSitesPassing
# <int> <list>
# 1 1 <data.frame [8,201 x 2]>
# 2 2 <data.frame [3,854 x 2]>
# 3 3 <data.frame [4,102 x 2]>
# 4 4 <data.frame [3,773 x 2]>
# 5 5 <data.frame [3,807 x 2]>
# 6 6 <data.frame [3,594 x 2]>
# 7 7 <data.frame [1,959 x 2]>
# 8 8 <data.frame [3,628 x 2]>
# 9 9 <data.frame [3,449 x 2]>
# 10 10 <data.frame [2,901 x 2]>
# 11 11 <data.frame [3,297 x 2]>
# 12 12 <data.frame [2,893 x 2]>
# 13 13 <data.frame [2,773 x 2]>
# 14 14 <data.frame [4,172 x 2]>
# 15 15 <data.frame [3,959 x 2]>
# 16 16 <data.frame [2,875 x 2]>
# 17 17 <data.frame [2,775 x 2]>
# 18 18 <data.frame [3,118 x 2]>
sitesPassingFilters %>%
mutate(PostImputePrePhaseFilter=future_map2(Chr,allSitesPassing,function(Chr,allSitesPassing){
pathIn<-"/workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/"
sitesPassing_thisChr<-allSitesPassing %>%
select(CHROM,POS)
write.table(sitesPassing_thisChr,
file = paste0(pathIn,"chr",Chr,
"_tzTPsamples_AllSitesREFimputedAndPhased_91019.sitesPassing"),
row.names = F, col.names = F, quote = F)}))
sitesPassingFilters %>%
mutate(PostImputePrePhaseFilter=future_map2(Chr,allSitesPassing,function(Chr,allSitesPassing){
pathIn<-"/workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/"
system(paste0("vcftools --gzvcf ",pathIn,"chr",Chr,
"_tzTPsamples_AllSitesREFimputedAndPhased_91019.vcf.gz ",
"--positions ",pathIn,"chr",Chr,
"_tzTPsamples_AllSitesREFimputedAndPhased_91019.sitesPassing ",
"--recode --stdout | ",
"bgzip -c -@ 24 > ",
pathIn,"chr",Chr,
"_tzTPsamples_AllSitesREFimputedAndPhased_ReadyToMergeWithRefPanel_91019.vcf.gz"))}))
system(paste0("bcftools query --list-samples /workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/",
"chr1_tzTPsamples_AllSitesREFimputedAndPhased_ReadyToMergeWithRefPanel_91019.vcf.gz ",
"> /workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/",
"chr1_tzTPsamples_AllSitesREFimputedAndPhased_ReadyToMergeWithRefPanel_91019.samples"))
library(tidyverse); library(magrittr);
tibble(Chr=c(1,2,3,5)) %>%
mutate(REFimpute_UnGenotypedSites_GBSonlySamples=map(Chr,function(Chr){
nthreads<-112
system(paste0("java -Xms2g -Xmx500g -jar /programs/beagle/beagle.jar ",
"gt=/workdir/mw489/ImputationEastAfrica_StageII_90919/chr",
Chr,"_ugC1samples_REFimputedGBSsites_91019.vcf.gz ",
"map=/workdir/mw489/CassavaGeneticMap/chr",
Chr,"_cassava_cM_pred.v6_91019.map ",
"ref=/workdir/mw489/ImputationEastAfrica_StageI_82819/chr",
Chr,"_ImputationReferencePanel_StageIV_82819.vcf.gz ",
"out=/workdir/mw489/ImputationEastAfrica_StageII_90919/chr",
Chr,"_ugC1samples_AllSitesREFimputedAndPhased_91019 ",
"nthreads=",nthreads," impute=true ne=100000"))}))
library(tidyverse); library(magrittr);
tibble(Chr=c(4,6,7,8)) %>%
mutate(REFimpute_UnGenotypedSites_GBSonlySamples=map(Chr,function(Chr){
nthreads<-112
system(paste0("java -Xms2g -Xmx500g -jar /programs/beagle/beagle.jar ",
"gt=/workdir/mw489/ImputationEastAfrica_StageII_90919/chr",
Chr,"_ugC1samples_REFimputedGBSsites_91019.vcf.gz ",
"map=/workdir/mw489/CassavaGeneticMap/chr",
Chr,"_cassava_cM_pred.v6_91019.map ",
"ref=/workdir/mw489/ImputationEastAfrica_StageI_82819/chr",
Chr,"_ImputationReferencePanel_StageIV_82819.vcf.gz ",
"out=/workdir/mw489/ImputationEastAfrica_StageII_90919/chr",
Chr,"_ugC1samples_AllSitesREFimputedAndPhased_91019 ",
"nthreads=",nthreads," impute=true ne=100000"))}))
library(tidyverse); library(magrittr);
tibble(Chr=c(9,10,11,12)) %>%
mutate(REFimpute_UnGenotypedSites_GBSonlySamples=map(Chr,function(Chr){
nthreads<-112
system(paste0("java -Xms2g -Xmx500g -jar /programs/beagle/beagle.jar ",
"gt=/workdir/mw489/ImputationEastAfrica_StageII_90919/chr",
Chr,"_ugC1samples_REFimputedGBSsites_91019.vcf.gz ",
"map=/workdir/mw489/CassavaGeneticMap/chr",
Chr,"_cassava_cM_pred.v6_91019.map ",
"ref=/workdir/mw489/ImputationEastAfrica_StageI_82819/chr",
Chr,"_ImputationReferencePanel_StageIV_82819.vcf.gz ",
"out=/workdir/mw489/ImputationEastAfrica_StageII_90919/chr",
Chr,"_ugC1samples_AllSitesREFimputedAndPhased_91019 ",
"nthreads=",nthreads," impute=true ne=100000"))}))
library(tidyverse); library(magrittr);
tibble(Chr=c(13,14,15)) %>%
mutate(REFimpute_UnGenotypedSites_GBSonlySamples=map(Chr,function(Chr){
nthreads<-112
system(paste0("java -Xms2g -Xmx500g -jar /programs/beagle/beagle.jar ",
"gt=/workdir/mw489/ImputationEastAfrica_StageII_90919/chr",
Chr,"_ugC1samples_REFimputedGBSsites_91019.vcf.gz ",
"map=/workdir/mw489/CassavaGeneticMap/chr",
Chr,"_cassava_cM_pred.v6_91019.map ",
"ref=/workdir/mw489/ImputationEastAfrica_StageI_82819/chr",
Chr,"_ImputationReferencePanel_StageIV_82819.vcf.gz ",
"out=/workdir/mw489/ImputationEastAfrica_StageII_90919/chr",
Chr,"_ugC1samples_AllSitesREFimputedAndPhased_91019 ",
"nthreads=",nthreads," impute=true ne=100000"))}))
library(tidyverse); library(magrittr);
tibble(Chr=c(16,17,18)) %>%
mutate(REFimpute_UnGenotypedSites_GBSonlySamples=map(Chr,function(Chr){
nthreads<-112
system(paste0("java -Xms2g -Xmx500g -jar /programs/beagle/beagle.jar ",
"gt=/workdir/mw489/ImputationEastAfrica_StageII_90919/chr",
Chr,"_ugC1samples_REFimputedGBSsites_91019.vcf.gz ",
"map=/workdir/mw489/CassavaGeneticMap/chr",
Chr,"_cassava_cM_pred.v6_91019.map ",
"ref=/workdir/mw489/ImputationEastAfrica_StageI_82819/chr",
Chr,"_ImputationReferencePanel_StageIV_82819.vcf.gz ",
"out=/workdir/mw489/ImputationEastAfrica_StageII_90919/chr",
Chr,"_ugC1samples_AllSitesREFimputedAndPhased_91019 ",
"nthreads=",nthreads," impute=true ne=100000"))}))
Post-imputation filters: DR2>=0.75, P_HWE>1e-20, MAF>0.005 [0.5%]
library(tidyverse); library(magrittr); library(furrr); options(mc.cores=18); plan(multiprocess)
pathIn<-"/workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/"
tibble(Chr=1:18) %>%
mutate(PostImputeFilter=future_map(Chr,function(Chr){
fileIn<-paste0("vcftools --gzvcf ",pathIn,"chr",
Chr,"_ugC1samples_AllSitesREFimputedAndPhased_91019.vcf.gz ")
fileOut<-paste0("--out ",pathIn,"chr",
Chr,"_ugC1samples_AllSitesREFimputedAndPhased_91019")
system(paste0(fileIn,"--get-INFO DR2 --get-INFO AF ",fileOut))
system(paste0(fileIn,"--hardy ",fileOut)) }))
pathIn<-"/workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/"
stats2filterOn<-tibble(Chr=1:18) %>%
mutate(INFO=future_map(Chr,~read.table(paste0(pathIn,"chr",.,
"_ugC1samples_AllSitesREFimputedAndPhased_91019.INFO"),
stringsAsFactors = F, header = T)),
hwe=future_map(Chr,~read.table(paste0(pathIn,"chr",.,
"_ugC1samples_AllSitesREFimputedAndPhased_91019.hwe"),
stringsAsFactors = F, header = T)))
stats2filterOn %<>%
select(Chr,INFO,hwe) %>%
mutate(INFO=map2(INFO,hwe,~left_join(.x,(.y %>%
rename(CHROM=CHR))))) %>%
select(-hwe)
stats2filterOn %>%
mutate(NotNumDR2=map_lgl(INFO,~is.numeric(.$DR2)))
stats2filterOn %<>%
mutate(INFO=map(INFO,
~mutate(.,
DR2=as.numeric(DR2),
AF=as.numeric(AF)) %>%
filter(!is.na(DR2),
!is.na(AF))))
stats2filterOn %<>%
mutate(INFO=map(INFO,~mutate(.,MAF=ifelse(AF>0.5,1-AF,AF))))
Check what’s left
sitesPassingFilters<-stats2filterOn %>%
mutate(allSitesPassing=map(INFO,
~filter(.,DR2>=0.75,
P_HWE>1e-20,
MAF>0.005) %>%
select(CHROM,POS))) %>%
select(-INFO)
stats2filterOn %>% unnest() %>% nrow() # 65451
sitesPassingFilters %>% unnest(allSitesPassing) %>% nrow() # 24426
stats2filterOn
sitesPassingFilters
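# (The per-chromosome counts below sum to 65,130, the tzTP total, not 24,426; they appear to be carried over from the tzTP section.)
# Chr allSitesPassing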
# <int> <list>
# 1 1 <data.frame [8,201 x 2]>
# 2 2 <data.frame [3,854 x 2]>
# 3 3 <data.frame [4,102 x 2]>
# 4 4 <data.frame [3,773 x 2]>
# 5 5 <data.frame [3,807 x 2]>
# 6 6 <data.frame [3,594 x 2]>
# 7 7 <data.frame [1,959 x 2]>
# 8 8 <data.frame [3,628 x 2]>
# 9 9 <data.frame [3,449 x 2]>
# 10 10 <data.frame [2,901 x 2]>
# 11 11 <data.frame [3,297 x 2]>
# 12 12 <data.frame [2,893 x 2]>
# 13 13 <data.frame [2,773 x 2]>
# 14 14 <data.frame [4,172 x 2]>
# 15 15 <data.frame [3,959 x 2]>
# 16 16 <data.frame [2,875 x 2]>
# 17 17 <data.frame [2,775 x 2]>
# 18 18 <data.frame [3,118 x 2]>
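For reference, the totals quoted in the comments of the two post-imputation filtering steps (65,451 sites before filtering, 65,130 passing for tzTP and 24,426 passing for ugC1) translate into the following retention rates:

```r
total_sites <- 65451                       # sites before filtering
passing <- c(tzTP = 65130, ugC1 = 24426)   # sites passing DR2, P_HWE and MAF filters

retained_pct <- round(100 * passing / total_sites, 1)
print(retained_pct)
```

So nearly all sites survive the filters for tzTP, while a bit more than a third survive for ugC1.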
Exclude IITA samples from C1-C4 unless they have both GBS and DArT data. This saves memory and should still retain the key haplotypes.
refpanelIV<-read.table(paste0("/workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageI_82819/",
"chr1_ImputationReferencePanel_StageIV_82819.samples"),
stringsAsFactors = F, header = F)$V1
tzTP<-read.table(paste0("/workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/",
"chr1_tzTPsamples_AllSitesREFimputedAndPhased_ReadyToMergeWithRefPanel_91019.samples"),
stringsAsFactors = F, header = F)$V1
samplesWithVerifiedGBSandDart<-read.table(file=paste0("/workdir/marnin/nextgenImputation2019/ImputationStageI_71119/",
"samplesWithVerifiedGBSandDart_71119.txt"),
stringsAsFactors = F, header = F)$V1
table(refpanelIV %in% tzTP)
table(refpanelIV %in% samplesWithVerifiedGBSandDart)
iitaSamples2remove<-refpanelIV %>%
grep("TMS13F|TMS14F|TMS15F|TMS16F|TMS17F|TMS18F|2013_",.,value = T) %>%
.[!. %in% samplesWithVerifiedGBSandDart]
refpanelIV %>%
.[!. %in% iitaSamples2remove] %>%
c(.,tzTP) %>% length # 13100... reasonable!
iitaSamples2remove %>% #length # 10170 to remove
write.table(.,paste0("/workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/",
"iitaSamples2remove_fromUgC1RefPanel_91019.txt"),
row.names = F, col.names = F, quote = F)
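The exclusion rule above combines a name-pattern match with a whitelist: panel samples whose names match the IITA C1-C4 prefixes are dropped unless they appear in the verified GBS+DArT list. A toy illustration (all sample names are invented; only the regular expression is taken from the code above):

```r
# Invented sample names for illustration only
panel    <- c("TMS13F1001P0001", "TMS14F2002P0002", "UG110017",
              "2013_0456", "TMS18F3003P0003")
verified <- c("TMS14F2002P0002")   # has both GBS and DArT, so it is kept

pattern <- "TMS13F|TMS14F|TMS15F|TMS16F|TMS17F|TMS18F|2013_"
matches <- grep(pattern, panel, value = TRUE)

# Remove pattern matches unless they are on the verified list
to_remove <- matches[!matches %in% verified]
kept      <- panel[!panel %in% to_remove]
print(to_remove)
print(kept)
```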
tibble(Chr=1:18) %>%
mutate(ExtractRefPanelIVsamplesForStageV=future_map(Chr,function(Chr){
system(paste0("vcftools --gzvcf /workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageI_82819/chr",
Chr,"_ImputationReferencePanel_StageIV_82819.vcf.gz ",
"--remove /workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/",
"iitaSamples2remove_fromUgC1RefPanel_91019.txt ",
"--chr ",Chr," ",
"--positions /workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/chr",
Chr,"_tzTPsamples_AllSitesREFimputedAndPhased_91019.sitesPassing ",
"--recode ",
"--stdout | bgzip -c -@ 24 > ",
" /workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/chr",
Chr,"_ImputationReferencePanel_StageIVsamples_ReadyToMergeAndFormStageV_91019.vcf.gz")) }))
library(tidyverse); library(magrittr); library(furrr); options(mc.cores=18); plan(multiprocess)
pathIn<-"/workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/"
tibble(Chr=1:18) %>%
mutate(Index=future_map(Chr,function(Chr){
system(paste0("tabix -f -p vcf ",pathIn,"chr",Chr,
"_ImputationReferencePanel_StageIVsamples_ReadyToMergeAndFormStageV_91019.vcf.gz"))
system(paste0("tabix -f -p vcf ",pathIn,"chr",Chr,
"_tzTPsamples_AllSitesREFimputedAndPhased_ReadyToMergeWithRefPanel_91019.vcf.gz")) }))
tibble(Chr=1:18) %>%
mutate(Merge=future_map(Chr,function(Chr){
system(paste0("bcftools merge ",
"--output ",pathIn,"chr",Chr,
"_ImputationReferencePanel_StageV_91019.vcf.gz ",
"--merge snps --output-type z --threads 24 ",
pathIn,"chr",Chr,
"_ImputationReferencePanel_StageIVsamples_ReadyToMergeAndFormStageV_91019.vcf.gz ",
pathIn,"chr",Chr,
"_tzTPsamples_AllSitesREFimputedAndPhased_ReadyToMergeWithRefPanel_91019.vcf.gz"))}))
library(tidyverse); library(magrittr); library(furrr); options(mc.cores=18); plan(multiprocess)
pathIn<-"/workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/"
tibble(Chr=1:18) %>%
mutate(ExtractSiteList=future_map(Chr,function(Chr){
system(paste0("zcat ",pathIn,"chr",Chr,
"_ImputationReferencePanel_StageV_91019.vcf.gz ",
"| cut -f1-5 > ",
pathIn,"chr",Chr,"_ImputationReferencePanel_StageV_91019.sitesWithAlleles"))}))
tibble(Chr=1:18) %>%
mutate(ExtractSiteList=future_map(Chr,function(Chr){
system(paste0("zcat /workdir/marnin/June2016_VCF/",
"cassavaGBSbuild_June2016_withRef_chr",Chr,".vcf.gz ",
"| cut -f1-5 > ",
"/workdir/marnin/June2016_VCF/cassavaGBSbuild_June2016_withRef_chr",Chr,".sitesWithAlleles"))}))
gbs_sites<-tibble(Chr=1:18) %>%
mutate(sites=future_map(Chr,~read.table(paste0("/workdir/marnin/June2016_VCF/",
"cassavaGBSbuild_June2016_withRef_chr",.,".sitesWithAlleles"),
stringsAsFactors = F, header = F)))
refpanelV_sites<-tibble(Chr=1:18) %>%
mutate(sites=future_map(Chr,~read.table(paste0(pathIn,"chr",.,
"_ImputationReferencePanel_StageV_91019.sitesWithAlleles"),
stringsAsFactors = F, header = F)))
gbs_sites %>% unnest() %>% str()
refpanelV_sites %>% unnest() %>% str()
refpanelV_sites %>%
unnest() %>%
semi_join(gbs_sites %>% unnest()) %>%
group_by(Chr) %>%
nest(.key = "allSitesPassing") %>%
mutate(Sites2Keep=future_map2(Chr,allSitesPassing,function(Chr,allSitesPassing){
pathIn<-"/workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/"
sitesPassing_thisChr<-allSitesPassing %>%
select(V1,V2)
write.table(sitesPassing_thisChr,
file = paste0(pathIn,"chr",Chr,
"_ugC1samples_sitesInRefPanelVtoImpute_91019.sites2keep"),
row.names = F, col.names = F, quote = F)}))
library(tidyverse); library(magrittr); require(furrr); options(mc.cores=18); plan(multiprocess)
tibble(Chr=1:18) %>%
mutate(ExtractRaw_gbsSamples=future_map(Chr,function(Chr){
system(paste0("vcftools --gzvcf /workdir/marnin/June2016_VCF/",
"cassavaGBSbuild_June2016_withRef_chr",Chr,".vcf.gz ",
"--keep /workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/",
"gbsOnlySamplesToImpute_ugC1_90919.txt ",
"--chr ",Chr," ",
"--positions /workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/chr",
Chr,"_ugC1samples_sitesInRefPanelVtoImpute_91019.sites2keep ",
"--recode --stdout | ",
"awk '$4 != \"-\" {print}' | awk '$5 != \"-\" {print}' | grep -v 'INFO=' | ",
"bgzip -c -@ 24 > ",
"/workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/chr",
Chr,"_ugC1samples_FromJune2016vcf_Ready2REFimpute_91019.vcf.gz")) }))
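The `awk`/`grep` stage above drops records whose REF (column 4) or ALT (column 5) is `-`, and removes header lines containing `INFO=`. The same rule applied to toy VCF lines in R (records invented; this sketch splits on tabs only, whereas `awk` splits on any whitespace by default):

```r
lines <- c("##fileformat=VCFv4.2",
           "##INFO=<ID=AF,Number=A,Type=Float>",
           "#CHROM\tPOS\tID\tREF\tALT",
           "1\t100\tS1_100\tA\tG",
           "1\t200\tS1_200\t-\tT",
           "1\t300\tS1_300\tC\t-")

fields <- strsplit(lines, "\t", fixed = TRUE)
# i-th field of every line, or "" when the line has fewer fields
field <- function(i) {
  vapply(fields, function(x) if (length(x) >= i) x[i] else "", character(1))
}

keep <- field(4) != "-" & field(5) != "-" & !grepl("INFO=", lines, fixed = TRUE)
print(lines[keep])
```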
vcftools --gzvcf /workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/chr1_ugC1samples_FromJune2016vcf_Ready2REFimpute_91019.vcf.gz
zcat /workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/chr1_ugC1samples_FromJune2016vcf_Ready2REFimpute_91019.vcf.gz | head -n40 | cut -f1-12
1:29425054 [-]
# cbsulm__ --> cbsurobbins
rsync --update --archive --verbose /workdir/mw489/ImputationEastAfrica_StageI_82819/ mw489@cbsurobbins.biohpc.cornell.edu:/workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageI_82819
rsync --update --archive --verbose /workdir/mw489/ImputationEastAfrica_StageII_90919/ mw489@cbsurobbins.biohpc.cornell.edu:/workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919
# cbsurobbins --> cbsulm__
rsync --update --archive --verbose /workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/ mw489@cbsulm17.biohpc.cornell.edu:/workdir/mw489/ImputationEastAfrica_StageII_90919;
rsync --update --archive --verbose /workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/ mw489@cbsulm16.biohpc.cornell.edu:/workdir/mw489/ImputationEastAfrica_StageII_90919;
rsync --update --archive --verbose /workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/ mw489@cbsulm14.biohpc.cornell.edu:/workdir/mw489/ImputationEastAfrica_StageII_90919;
rsync --update --archive --verbose /workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/ mw489@cbsulm13.biohpc.cornell.edu:/workdir/mw489/ImputationEastAfrica_StageII_90919;
rsync --update --archive --verbose /workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/ mw489@cbsulm12.biohpc.cornell.edu:/workdir/mw489/ImputationEastAfrica_StageII_90919;
rsync --update --archive --verbose /workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/ mw489@cbsulm07.biohpc.cornell.edu:/workdir/mw489/ImputationEastAfrica_StageII_90919;
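The six pushes above differ only in the host name. As a minimal sketch (the variable names `src`, `dest`, `hosts` and `rsync_cmds` are illustrative and not part of the original workflow), the same command strings can be generated from a vector of hosts and inspected before anything is run:

```r
# Paths and host names copied from the rsync calls above
src   <- "/workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/"
dest  <- "/workdir/mw489/ImputationEastAfrica_StageII_90919"
hosts <- c("cbsulm17", "cbsulm16", "cbsulm14", "cbsulm13", "cbsulm12", "cbsulm07")

# One rsync command per host; paste0() recycles the scalar pieces
rsync_cmds <- paste0("rsync --update --archive --verbose ", src,
                     " mw489@", hosts, ".biohpc.cornell.edu:", dest)

# Print only; each string could be passed to system() to actually sync
writeLines(rsync_cmds)
```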
library(tidyverse); library(magrittr);
tibble(Chr=c(1,2,3)) %>%
mutate(REFimpute_GenotypedSites_GBSonly_ugC1samples=map(Chr,function(Chr){
nthreads<-112
system(paste0("java -Xms2g -Xmx500g -jar /programs/beagle41/beagle41.jar ",
"gl=/workdir/mw489/ImputationEastAfrica_StageII_90919/chr",
Chr,"_ugC1samples_FromJune2016vcf_Ready2REFimpute_91019.vcf.gz ",
"ref=/workdir/mw489/ImputationEastAfrica_StageII_90919/chr",
Chr,"_ImputationReferencePanel_StageV_91019.vcf.gz ",
"out=/workdir/mw489/ImputationEastAfrica_StageII_90919/chr",
Chr,"_ugC1samples_FromJune2016vcf_REFimputedGLmode_91019 ",
"nthreads=",nthreads," niterations=10"))}))
# Window=1 Iteration=1
# Time for building model: 16 minutes 29 seconds
# Time for sampling (singles): 31 hours 32 minutes 39 seconds
# Window=1 Iteration=2
# Time for building model: 23 minutes 59 seconds
# Time for sampling (singles): 28 hours 8 minutes 5 seconds
# Window=1 Iteration=3
# Time for building model: 24 minutes 58 seconds
# Time for sampling (singles): 30 hours 24 minutes 33 seconds
# Window=1 Iteration=4
# Time for building model: 24 minutes 16 seconds
# Time for sampling (singles): 27 hours 31 minutes 15 seconds
library(tidyverse); library(magrittr);
tibble(Chr=c(4,5,6)) %>%
mutate(REFimpute_GenotypedSites_GBSonly_ugC1samples=map(Chr,function(Chr){
nthreads<-112
system(paste0("java -Xms2g -Xmx500g -jar /programs/beagle41/beagle41.jar ",
"gl=/workdir/mw489/ImputationEastAfrica_StageII_90919/chr",
Chr,"_ugC1samples_FromJune2016vcf_Ready2REFimpute_91019.vcf.gz ",
"ref=/workdir/mw489/ImputationEastAfrica_StageII_90919/chr",
Chr,"_ImputationReferencePanel_StageV_91019.vcf.gz ",
"out=/workdir/mw489/ImputationEastAfrica_StageII_90919/chr",
Chr,"_ugC1samples_FromJune2016vcf_REFimputedGLmode_91019 ",
"nthreads=",nthreads," niterations=10"))}))
# Window=1 Iteration=2
# Time for building model: 7 minutes 9 seconds
# Time for sampling (singles): 6 hours 24 minutes 54 seconds
# Window=1 Iteration=5
# Time for building model: 7 minutes 5 seconds
# Time for sampling (singles): 6 hours 56 minutes 43 seconds
# Window=1 Iteration=8
# Time for building model: 7 minutes 53 seconds
# Time for sampling (singles): 7 hours 13 minutes 27 seconds
library(tidyverse); library(magrittr);
tibble(Chr=c(7,8,9)) %>%
mutate(REFimpute_GenotypedSites_GBSonly_ugC1samples=map(Chr,function(Chr){
nthreads<-112
system(paste0("java -Xms2g -Xmx500g -jar /programs/beagle41/beagle41.jar ",
"gl=/workdir/mw489/ImputationEastAfrica_StageII_90919/chr",
Chr,"_ugC1samples_FromJune2016vcf_Ready2REFimpute_91019.vcf.gz ",
"ref=/workdir/mw489/ImputationEastAfrica_StageII_90919/chr",
Chr,"_ImputationReferencePanel_StageV_91019.vcf.gz ",
"out=/workdir/mw489/ImputationEastAfrica_StageII_90919/chr",
Chr,"_ugC1samples_FromJune2016vcf_REFimputedGLmode_91019 ",
"nthreads=",nthreads," niterations=10"))}))
# Chr 7
# Window=1 Iteration=10
# Time for building model: 1 minute 57 seconds
# Time for sampling (singles): 1 hour 37 minutes 59 seconds
# Window=1 Iteration=11
# Time for building model: 1 minute 48 seconds
# Time for sampling (singles): 7 minutes 36 seconds
#
# Chr 8
# Window=1 Iteration=3
# Time for building model: 4 minutes 58 seconds
# Time for sampling (singles): 5 hours 32 minutes 27 seconds
# Window=1 Iteration=7
# Time for building model: 4 minutes 58 seconds
# Time for sampling (singles): 5 hours 40 minutes 3 seconds
library(tidyverse); library(magrittr);
tibble(Chr=c(10,11,12)) %>%
mutate(REFimpute_GenotypedSites_GBSonly_ugC1samples=map(Chr,function(Chr){
nthreads<-112
system(paste0("java -Xms2g -Xmx500g -jar /programs/beagle41/beagle41.jar ",
"gl=/workdir/mw489/ImputationEastAfrica_StageII_90919/chr",
Chr,"_ugC1samples_FromJune2016vcf_Ready2REFimpute_91019.vcf.gz ",
"ref=/workdir/mw489/ImputationEastAfrica_StageII_90919/chr",
Chr,"_ImputationReferencePanel_StageV_91019.vcf.gz ",
"out=/workdir/mw489/ImputationEastAfrica_StageII_90919/chr",
Chr,"_ugC1samples_FromJune2016vcf_REFimputedGLmode_91019 ",
"nthreads=",nthreads," niterations=10"))}))
# Chr. 10
# Window=1 Iteration=4
# Time for building model: 4 minutes 19 seconds
# Time for sampling (singles): 3 hours 45 minutes 9 seconds
# Window=1 Iteration=9
# Time for building model: 4 minutes 48 seconds
# Time for sampling (singles): 4 hours 30 minutes 56 seconds
# Number of markers: 2786
# Total time for building model: 1 hour 27 minutes 48 seconds
# Total time for sampling: 41 hours 38 minutes 45 seconds
# Total run time: 43 hours 6 minutes 49 seconds
#
# Chr. 11
# Window=1 Iteration=1
# Time for building model: 4 minutes 14 seconds
# Time for sampling (singles): 8 hours 48 minutes 18 seconds
# Window=1 Iteration=2
# Time for building model: 5 minutes 16 seconds
# Time for sampling (singles): 7 hours 33 minutes 32 seconds
# Window=1 Iteration=5
# Time for building model: 5 minutes 50 seconds
# Time for sampling (singles): 10 hours 9 minutes 35 seconds
# Number of markers: 3173
# Total time for building model: 2 hours 18 minutes 26 seconds
# Total time for sampling: 87 hours 44 minutes 39 seconds
# Total run time: 90 hours 3 minutes 23 seconds
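To compare run times across chromosomes it helps to turn the Beagle log durations into numbers. The following is a minimal sketch; `parse_duration()` is a hypothetical helper written for this illustration, and the three strings are the Chr. 10 totals copied from the log above.

```r
# Convert a Beagle-style duration such as "1 hour 27 minutes 48 seconds" to seconds
parse_duration <- function(x) {
  get_unit <- function(unit) {
    # Digits immediately followed by " hour", " minute" or " second" (plural also matches)
    m <- regmatches(x, regexpr(paste0("[0-9]+(?= ", unit, ")"), x, perl = TRUE))
    if (length(m) == 0) 0 else as.numeric(m)
  }
  get_unit("hour") * 3600 + get_unit("minute") * 60 + get_unit("second")
}

build    <- parse_duration("1 hour 27 minutes 48 seconds")
sampling <- parse_duration("41 hours 38 minutes 45 seconds")
total    <- parse_duration("43 hours 6 minutes 49 seconds")

(build + sampling) / 3600    # hours spent building the model and sampling
total - (build + sampling)   # 16 seconds of run time not attributed to either step
```

Almost all of the wall time is in the sampling step, which is consistent with the per-iteration timings listed for each chromosome.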
library(tidyverse); library(magrittr);
tibble(Chr=c(14,13,15)) %>%
mutate(REFimpute_GenotypedSites_GBSonly_ugC1samples=map(Chr,function(Chr){
nthreads<-112
system(paste0("java -Xms2g -Xmx500g -jar /programs/beagle41/beagle41.jar ",
"gl=/workdir/mw489/ImputationEastAfrica_StageII_90919/chr",
Chr,"_ugC1samples_FromJune2016vcf_Ready2REFimpute_91019.vcf.gz ",
"ref=/workdir/mw489/ImputationEastAfrica_StageII_90919/chr",
Chr,"_ImputationReferencePanel_StageV_91019.vcf.gz ",
"out=/workdir/mw489/ImputationEastAfrica_StageII_90919/chr",
Chr,"_ugC1samples_FromJune2016vcf_REFimputedGLmode_91019 ",
"nthreads=",nthreads," niterations=10"))}))
# Chr. 14
# Window=1 Iteration=2
# Time for building model: 14 minutes 3 seconds
# Time for sampling (singles): 16 hours 9 minutes 24 seconds
# Window=1 Iteration=3
# Time for building model: 10 minutes 38 seconds
# Time for sampling (singles): 19 hours 23 minutes 21 seconds
# Window=1 Iteration=4
# Time for building model: 11 minutes 57 seconds
# Time for sampling (singles): 16 hours 35 minutes 9 seconds
#
# Window=1 Iteration=7
# Time for building model: 10 minutes 0 seconds
# Time for sampling (singles): 20 hours 10 minutes 9 seconds
# DAG statistics
# mean edges/level: 651 max edges/level: 1150
# mean edges/node: 1.054 mean count/edge: 46
# Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00007eca31400000, 71757725696, 0) failed; error='Cannot allocate memory' (errno=12)
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 71757725696 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /local/workdir/mw489/hs_err_pid42464.log
#
# Chr. 13
# Window=1 Iteration=1
# Time for building model: 3 minutes 34 seconds
# Time for sampling (singles): 5 hours 3 minutes 50 seconds
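The Chr. 14 run above stopped when the JVM could not commit another block of memory. A quick conversion of the byte count in that warning, as a sketch: only the number comes from the log, and the reading that the machine could not back the `-Xmx500g` heap ceiling at that moment is an assumption.

```r
failed_bytes <- 71757725696          # from the os::commit_memory() warning above
failed_gib   <- failed_bytes / 2^30  # bytes -> GiB
round(failed_gib, 1)                 # about 66.8 GiB requested in a single allocation
```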
library(tidyverse); library(magrittr);
tibble(Chr=c(16)) %>%
mutate(REFimpute_GenotypedSites_GBSonly_ugC1samples=map(Chr,function(Chr){
nthreads<-64
system(paste0("java -Xms2g -Xmx500g -jar /programs/beagle41/beagle41.jar ",
"gl=/workdir/mw489/ImputationEastAfrica_StageII_90919/chr",
Chr,"_ugC1samples_FromJune2016vcf_Ready2REFimpute_91019.vcf.gz ",
"ref=/workdir/mw489/ImputationEastAfrica_StageII_90919/chr",
Chr,"_ImputationReferencePanel_StageV_91019.vcf.gz ",
"out=/workdir/mw489/ImputationEastAfrica_StageII_90919/chr",
Chr,"_ugC1samples_FromJune2016vcf_REFimputedGLmode_91019 ",
"nthreads=",nthreads," niterations=10"))}))
# Window=1 Iteration=1
# Time for building model: 5 minutes 15 seconds
# Time for sampling (singles): 12 hours 36 minutes 47 seconds
# Window=1 Iteration=2
# Time for building model: 6 minutes 2 seconds
# Time for sampling (singles): 12 hours 7 minutes 51 seconds
# Window=1 Iteration=3
# Time for building model: 6 minutes 38 seconds
# Time for sampling (singles): 10 hours 50 minutes 41 seconds
# Window=1 Iteration=5
# Time for building model: 9 minutes 28 seconds
# Time for sampling (singles): 10 hours 52 minutes 11 seconds
# Number of markers: 2778
# Total time for building model: 2 hours 41 minutes 47 seconds
# Total time for sampling: 125 hours 32 minutes 28 seconds
# Total run time: 128 hours 14 minutes 42 seconds
AR2>0.75, P_HWE>1e-20, MAF>0.005 [0.5%]
library(tidyverse); library(magrittr); library(furrr); options(mc.cores=18); plan(multiprocess)
pathIn<-"/workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/"
tibble(Chr=1:18) %>%
mutate(PostImputeFilter=future_map(Chr,function(Chr){
fileIn<-paste0("vcftools --gzvcf ",pathIn,"chr",
Chr,"_ugC1samples_FromJune2016vcf_REFimputedGLmode_91019.vcf.gz ")
fileOut<-paste0("--out ",pathIn,"chr",
Chr,"_ugC1samples_FromJune2016vcf_REFimputedGLmode_91019")
system(paste0(fileIn,"--get-INFO AR2 --get-INFO AF ",fileOut))
system(paste0(fileIn,"--hardy ",fileOut)) }))
pathIn<-"/workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/"
stats2filterOn<-tibble(Chr=1:18) %>%
mutate(INFO=future_map(Chr,~read.table(paste0(pathIn,"chr",.,
"_tzTPsamples_AllSitesREFimputedAndPhased_91019.INFO"),
stringsAsFactors = F, header = T)),
hwe=future_map(Chr,~read.table(paste0(pathIn,"chr",.,
"_tzTPsamples_AllSitesREFimputedAndPhased_91019.hwe"),
stringsAsFactors = F, header = T)))
stats2filterOn %<>%
select(Chr,INFO,hwe) %>%
mutate(INFO=map2(INFO,hwe,~left_join(.x,(.y %>%
rename(CHROM=CHR))))) %>%
select(-hwe)
stats2filterOn %>%
mutate(IsNumDR2=map_lgl(INFO,~is.numeric(.$DR2)))
stats2filterOn %<>%
mutate(INFO=map(INFO,
~mutate(.,
DR2=as.numeric(DR2),
AF=as.numeric(AF)) %>%
filter(!is.na(DR2),
!is.na(AF))))
stats2filterOn %<>%
mutate(INFO=map(INFO,~mutate(.,MAF=ifelse(AF>0.5,1-AF,AF))))
Check what’s left
sitesPassingFilters<-stats2filterOn %>%
mutate(allSitesPassing=map(INFO,
~filter(.,DR2>=0.75,
P_HWE>1e-20,
MAF>0.005) %>%
select(CHROM,POS))) %>%
select(-INFO)
stats2filterOn %>% unnest() %>% nrow() # 65451
sitesPassingFilters %>% unnest(allSitesPassing) %>% nrow() # 65130
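To make the filter concrete, here is a minimal base-R sketch on made-up sites. The data frame `toy_info` and its values are hypothetical; the MAF folding and the three thresholds are the ones used in the code above.

```r
# Four invented sites, each designed to pass or fail one criterion
toy_info <- data.frame(POS   = c(101, 202, 303, 404),
                       DR2   = c(0.99, 0.60, 0.90, 0.95),
                       AF    = c(0.80, 0.30, 0.002, 0.40),
                       P_HWE = c(0.5, 0.2, 0.9, 1e-30))

# Fold the allele frequency so that MAF is never above 0.5
toy_info$MAF <- ifelse(toy_info$AF > 0.5, 1 - toy_info$AF, toy_info$AF)

# Same thresholds as the filter above
passing <- subset(toy_info, DR2 >= 0.75 & P_HWE > 1e-20 & MAF > 0.005)
passing$POS  # only 101 passes: 202 fails DR2, 303 fails MAF, 404 fails HWE
```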
stats2filterOn
sitesPassingFilters
Apply filter
sitesPassingFilters %>%
mutate(PostImputePrePhaseFilter=future_map2(Chr,allSitesPassing,function(Chr,allSitesPassing){
pathIn<-"/workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/"
sitesPassing_thisChr<-allSitesPassing %>%
select(CHROM,POS)
write.table(sitesPassing_thisChr,
file = paste0(pathIn,"chr",Chr,
"_ugC1samples_FromJune2016vcf_REFimputedGLmode_91019.sitesPassing"),
row.names = F, col.names = F, quote = F)
system(paste0("vcftools --gzvcf ",pathIn,"chr",Chr,
"_ugC1samples_FromJune2016vcf_REFimputedGLmode_91019.vcf.gz ",
"--positions ",pathIn,"chr",Chr,
"_ugC1samples_FromJune2016vcf_REFimputedGLmode_91019.sitesPassing ",
"--recode --stdout | ",
"bgzip -c -@ 24 > ",
pathIn,"chr",Chr,
"_ugC1samples_FromJune2016vcf_ReadyToPhaseAndImputeUngenotypedSites_91119.vcf.gz"))}))
# cbsulm__ --> cbsurobbins
rsync --update --archive --verbose /workdir/mw489/ImputationEastAfrica_StageI_82819/ mw489@cbsurobbins.biohpc.cornell.edu:/workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageI_82819
rsync --update --archive --verbose /workdir/mw489/ImputationEastAfrica_StageII_90919/ mw489@cbsurobbins.biohpc.cornell.edu:/workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919
# cbsurobbins --> cbsulm__
rsync --update --archive --verbose /workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/ mw489@cbsulm17.biohpc.cornell.edu:/workdir/mw489/ImputationEastAfrica_StageII_90919;
rsync --update --archive --verbose /workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/ mw489@cbsulm16.biohpc.cornell.edu:/workdir/mw489/ImputationEastAfrica_StageII_90919;
rsync --update --archive --verbose /workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/ mw489@cbsulm14.biohpc.cornell.edu:/workdir/mw489/ImputationEastAfrica_StageII_90919;
rsync --update --archive --verbose /workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/ mw489@cbsulm13.biohpc.cornell.edu:/workdir/mw489/ImputationEastAfrica_StageII_90919;
rsync --update --archive --verbose /workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/ mw489@cbsulm12.biohpc.cornell.edu:/workdir/mw489/ImputationEastAfrica_StageII_90919;
rsync --update --archive --verbose /workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/ mw489@cbsulm07.biohpc.cornell.edu:/workdir/mw489/ImputationEastAfrica_StageII_90919;
I realized I can try Beagle5 using carefully filtered TASSEL GT calls on the unimputed data. The phasing may have caused problems in the previous UgC1 imputation. I had thought Beagle5 required the genotyped sites in a target dataset to be phased and to contain no missing data. However, it appears that missingness and lack of phase are handled; only genotype likelihoods (GL) are not.
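In terms of the command line, the change described above amounts to swapping the `gl=` input of the Beagle 4.1 runs for a `gt=` input to Beagle 5 and adding a genetic map. A minimal sketch of the two call shapes follows; the helper `beagle_cmd()` and its `mode` argument are hypothetical, while the paths, jar locations and options are copied from the calls in this document. Nothing is executed, the strings are only printed.

```r
# Hypothetical helper: build (not run) the Beagle call for one chromosome
beagle_cmd <- function(Chr, mode = c("gt", "gl")) {
  mode <- match.arg(mode)
  path <- "/workdir/mw489/ImputationEastAfrica_StageII_90919/"
  ref  <- paste0("ref=", path, "chr", Chr, "_ImputationReferencePanel_StageV_91019.vcf.gz ")
  if (mode == "gl") {
    # Beagle 4.1, genotype-likelihood input
    paste0("java -Xms2g -Xmx500g -jar /programs/beagle41/beagle41.jar ",
           "gl=", path, "chr", Chr, "_ugC1samples_FromJune2016vcf_Ready2REFimpute_91019.vcf.gz ",
           ref,
           "out=", path, "chr", Chr, "_ugC1samples_FromJune2016vcf_REFimputedGLmode_91019 ",
           "nthreads=112 niterations=10")
  } else {
    # Beagle 5, called-genotype input plus genetic map
    paste0("java -Xms2g -Xmx500g -jar /programs/beagle/beagle.jar ",
           "gt=", path, "chr", Chr, "_ugC1samples_FromJune2016vcf_Ready2GTimputeWithBeagle5_91119.vcf.gz ",
           "map=/workdir/mw489/CassavaGeneticMap/chr", Chr, "_cassava_cM_pred.v6_91019.map ",
           ref,
           "out=", path, "chr", Chr, "_ugC1samples_FromJune2016vcf_REFimputedGTmodeGenotypedSites_91119 ",
           "nthreads=112 impute=true ne=100000")
  }
}

cat(beagle_cmd(1, "gl"), "\n")
cat(beagle_cmd(1, "gt"), "\n")
```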
library(tidyverse); library(magrittr); library(furrr); options(mc.cores=18); plan(multiprocess)
pathIn<-"/workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/"
tibble(Chr=1:18) %>%
mutate(ExtractSiteList=future_map(Chr,function(Chr){
system(paste0("zcat ",pathIn,"chr",Chr,
"_ImputationReferencePanel_StageV_91019.vcf.gz ",
"| cut -f1-5 > ",
pathIn,"chr",Chr,"_ImputationReferencePanel_StageV_91019.sitesWithAlleles"))}))
tibble(Chr=1:18) %>%
mutate(ExtractSiteList=future_map(Chr,function(Chr){
system(paste0("zcat /workdir/marnin/June2016_VCF/",
"cassavaGBSbuild_June2016_withRef_chr",Chr,".vcf.gz ",
"| cut -f1-5 > ",
"/workdir/marnin/June2016_VCF/cassavaGBSbuild_June2016_withRef_chr",Chr,".sitesWithAlleles"))}))
gbs_sites<-tibble(Chr=1:18) %>%
mutate(sites=future_map(Chr,~read.table(paste0("/workdir/marnin/June2016_VCF/",
"cassavaGBSbuild_June2016_withRef_chr",.,".sitesWithAlleles"),
stringsAsFactors = F, header = F)))
refpanelV_sites<-tibble(Chr=1:18) %>%
mutate(sites=future_map(Chr,~read.table(paste0(pathIn,"chr",.,
"_ImputationReferencePanel_StageV_91019.sitesWithAlleles"),
stringsAsFactors = F, header = F)))
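The `.sitesWithAlleles` files come from `zcat ... | cut -f1-5`, so they still contain the VCF header lines. A minimal sketch, on made-up records, of why `read.table()` nevertheless returns only site rows: its default `comment.char = "#"` skips every line that starts with `#`, and with `header = F` the five columns are named `V1` to `V5` (CHROM, POS, ID, REF, ALT). The object names and toy records are hypothetical.

```r
# Invented output of `zcat file.vcf.gz | cut -f1-5` for two sites
toy_lines <- c("##fileformat=VCFv4.2",
               "#CHROM\tPOS\tID\tREF\tALT",
               "1\t1001\tS1_1001\tA\tG",
               "1\t2002\tS1_2002\tC\tT")

# comment.char = "#" is the read.table() default, so both header lines are dropped
toy_sites <- read.table(text = toy_lines, stringsAsFactors = FALSE, header = FALSE)

str(toy_sites)  # 2 obs. of 5 variables, V1 (CHROM) to V5 (ALT)
```

This is why the later steps refer to chromosome and position as `V1` and `V2`.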
gbs_sites %>% unnest() %>% str()
refpanelV_sites %>% unnest() %>% str()
refpanelV_sites %>%
unnest() %>%
semi_join(gbs_sites %>% unnest()) %>%
group_by(Chr) %>%
nest(.key = "allSitesPassing") %>%
mutate(Sites2Keep=future_map2(Chr,allSitesPassing,function(Chr,allSitesPassing){
pathIn<-"/workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/"
sitesPassing_thisChr<-allSitesPassing %>%
select(V1,V2)
write.table(sitesPassing_thisChr,
file = paste0(pathIn,"chr",Chr,
"_ugC1samples_sitesInRefPanelVtoImpute_91019.sites2keep"),
row.names = F, col.names = F, quote = F)}))
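`semi_join()` without a `by` argument matches on every shared column, so a reference-panel site is kept only if chromosome, position, ID and both alleles all agree with a GBS site. A minimal base-R sketch of that logic on made-up sites (the data frames and the helper `site_key()` are hypothetical and not part of the pipeline):

```r
ref_toy <- data.frame(V1 = c(1, 1, 1), V2 = c(1001, 2002, 3003),
                      V3 = c("S1_1001", "S1_2002", "S1_3003"),
                      V4 = c("A", "C", "G"), V5 = c("G", "T", "A"),
                      stringsAsFactors = FALSE)
gbs_toy <- data.frame(V1 = c(1, 1), V2 = c(1001, 2002),
                      V3 = c("S1_1001", "S1_2002"),
                      V4 = c("A", "C"), V5 = c("G", "A"),  # ALT differs at position 2002
                      stringsAsFactors = FALSE)

# Build one key per row from all five shared columns
site_key <- function(df) do.call(paste, c(df[, paste0("V", 1:5)], sep = "\t"))

# Keep reference-panel rows whose key also occurs among the GBS sites
kept <- ref_toy[site_key(ref_toy) %in% site_key(gbs_toy), ]
kept[, c("V1", "V2")]  # only the site at position 1001 survives
```

A site present in both sets but with a different ALT allele is dropped, which is the conservative behavior when the two VCFs were built separately.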
library(tidyverse); library(magrittr); require(furrr); options(mc.cores=18); plan(multiprocess)
tibble(Chr=1:18) %>%
mutate(ExtractRaw_gbsSamples=future_map(Chr,function(Chr){
system(paste0("vcftools --gzvcf /workdir/marnin/June2016_VCF/",
"cassavaGBSbuild_June2016_withRef_chr",Chr,".vcf.gz ",
"--keep /workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/",
"gbsOnlySamplesToImpute_ugC1_90919.txt ",
"--chr ",Chr," ",
"--minDP 4 --maxDP 50 ",
"--positions /workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/chr",
Chr,"_ugC1samples_sitesInRefPanelVtoImpute_91019.sites2keep ",
"--recode --stdout | ",
"awk '$4 != \"-\" {print}' | awk '$5 != \"-\" {print}' | grep -v 'INFO=' | ",
"bgzip -c -@ 24 > ",
"/workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/chr",
Chr,"_ugC1samples_FromJune2016vcf_Ready2GTimputeWithBeagle5_91119.vcf.gz")) }))
vcftools --gzvcf /workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/chr1_ugC1samples_FromJune2016vcf_Ready2REFimpute_91019.vcf.gz
zcat /workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/chr1_ugC1samples_FromJune2016vcf_Ready2REFimpute_91019.vcf.gz | head -n40 | cut -f1-12
1:29425054 [-]
# cbsulm__ --> cbsurobbins
rsync --update --archive --verbose /workdir/mw489/ImputationEastAfrica_StageI_82819/ mw489@cbsurobbins.biohpc.cornell.edu:/workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageI_82819
rsync --update --archive --verbose /workdir/mw489/ImputationEastAfrica_StageII_90919/ mw489@cbsurobbins.biohpc.cornell.edu:/workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919
# cbsurobbins --> cbsulm__
rsync --update --archive --verbose /workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/ mw489@cbsulm15.biohpc.cornell.edu:/workdir/mw489/ImputationEastAfrica_StageII_90919;
rsync --update --archive --verbose /workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/ mw489@cbsulm17.biohpc.cornell.edu:/workdir/mw489/ImputationEastAfrica_StageII_90919;
rsync --update --archive --verbose /workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/ mw489@cbsulm16.biohpc.cornell.edu:/workdir/mw489/ImputationEastAfrica_StageII_90919;
rsync --update --archive --verbose /workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/ mw489@cbsulm14.biohpc.cornell.edu:/workdir/mw489/ImputationEastAfrica_StageII_90919;
rsync --update --archive --verbose /workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/ mw489@cbsulm13.biohpc.cornell.edu:/workdir/mw489/ImputationEastAfrica_StageII_90919;
rsync --update --archive --verbose /workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/ mw489@cbsulm12.biohpc.cornell.edu:/workdir/mw489/ImputationEastAfrica_StageII_90919;
rsync --update --archive --verbose /workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/ mw489@cbsulm07.biohpc.cornell.edu:/workdir/mw489/ImputationEastAfrica_StageII_90919;
library(tidyverse); library(magrittr);
tibble(Chr=c(1:18)) %>%
mutate(REFimpute_GenotypedSites_GBSonly_ugC1samples=map(Chr,function(Chr){
nthreads<-112
system(paste0("java -Xms2g -Xmx500g -jar /programs/beagle/beagle.jar ",
"gt=/workdir/mw489/ImputationEastAfrica_StageII_90919/chr",
Chr,"_ugC1samples_FromJune2016vcf_Ready2GTimputeWithBeagle5_91119.vcf.gz ",
"map=/workdir/mw489/CassavaGeneticMap/chr",
Chr,"_cassava_cM_pred.v6_91019.map ",
"ref=/workdir/mw489/ImputationEastAfrica_StageII_90919/chr",
Chr,"_ImputationReferencePanel_StageV_91019.vcf.gz ",
"out=/workdir/mw489/ImputationEastAfrica_StageII_90919/chr",
Chr,"_ugC1samples_FromJune2016vcf_REFimputedGTmodeGenotypedSites_91119 ",
"nthreads=",nthreads," impute=true ne=100000"))}))
DR2>=0.75, P_HWE>1e-20, MAF>0.005 [0.5%]
library(tidyverse); library(magrittr); library(furrr); options(mc.cores=18); plan(multiprocess)
pathIn<-"/workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/"
tibble(Chr=1:18) %>%
mutate(PostImputeFilter=future_map(Chr,function(Chr){
fileIn<-paste0("vcftools --gzvcf ",pathIn,"chr",
Chr,"_ugC1samples_FromJune2016vcf_REFimputedGTmodeGenotypedSites_91119.vcf.gz ")
fileOut<-paste0("--out ",pathIn,"chr",
Chr,"_ugC1samples_FromJune2016vcf_REFimputedGTmodeGenotypedSites_91119")
system(paste0(fileIn,"--get-INFO DR2 --get-INFO AF ",fileOut))
system(paste0(fileIn,"--hardy ",fileOut)) }))
pathIn<-"/workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/"
stats2filterOn<-tibble(Chr=1:18) %>%
mutate(INFO=future_map(Chr,~read.table(paste0(pathIn,"chr",.,
"_ugC1samples_FromJune2016vcf_REFimputedGTmodeGenotypedSites_91119.INFO"),
stringsAsFactors = F, header = T)),
hwe=future_map(Chr,~read.table(paste0(pathIn,"chr",.,
"_ugC1samples_FromJune2016vcf_REFimputedGTmodeGenotypedSites_91119.hwe"),
stringsAsFactors = F, header = T)))
stats2filterOn %<>%
select(Chr,INFO,hwe) %>%
mutate(INFO=map2(INFO,hwe,~left_join(.x,(.y %>%
rename(CHROM=CHR))))) %>%
select(-hwe)
stats2filterOn %>%
mutate(IsNumDR2=map_lgl(INFO,~is.numeric(.$DR2)))
stats2filterOn %<>%
mutate(INFO=map(INFO,
~mutate(.,
DR2=as.numeric(DR2),
AF=as.numeric(AF)) %>%
filter(!is.na(DR2),
!is.na(AF))))
stats2filterOn %<>%
mutate(INFO=map(INFO,~mutate(.,MAF=ifelse(AF>0.5,1-AF,AF))))
Check what’s left
sitesPassingFilters<-stats2filterOn %>%
mutate(allSitesPassing=map(INFO,
~filter(.,DR2>=0.75,
P_HWE>1e-20,
MAF>0.005) %>%
select(CHROM,POS))) %>%
select(-INFO)
stats2filterOn %>% unnest() %>% nrow() # 65120
sitesPassingFilters %>% unnest(allSitesPassing) %>% nrow() # 51152
stats2filterOn
sitesPassingFilters
Apply filter
sitesPassingFilters %>%
mutate(PostImputePrePhaseFilter=future_map2(Chr,allSitesPassing,function(Chr,allSitesPassing){
pathIn<-"/workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/"
sitesPassing_thisChr<-allSitesPassing %>%
select(CHROM,POS)
write.table(sitesPassing_thisChr,
file = paste0(pathIn,"chr",Chr,
"_ugC1samples_FromJune2016vcf_REFimputedGTmodeGenotypedSites_91119.sitesPassing"),
row.names = F, col.names = F, quote = F)
system(paste0("vcftools --gzvcf ",pathIn,"chr",Chr,
"_ugC1samples_FromJune2016vcf_REFimputedGTmodeGenotypedSites_91119.vcf.gz ",
"--positions ",pathIn,"chr",Chr,
"_ugC1samples_FromJune2016vcf_REFimputedGTmodeGenotypedSites_91119.sitesPassing ",
"--recode --stdout | ",
"bgzip -c -@ 24 > ",
pathIn,"chr",Chr,
"_ugC1samples_FromJune2016vcf_ReadyToPhaseAndImputeUngenotypedSites_91119.vcf.gz"))}))
Second round of imputation for these samples, to see whether more markers pass the filters.
Rsync to cbsulm15
# cbsurobbins --> cbsulm__
rsync --update --archive --verbose /workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/ mw489@cbsulm15.biohpc.cornell.edu:/workdir/mw489/ImputationEastAfrica_StageII_90919;
library(tidyverse); library(magrittr);
tibble(Chr=c(1:18)) %>%
mutate(REFimpute_GenotypedSites_GBSonly_ugC1samples=map(Chr,function(Chr){
nthreads<-112
system(paste0("java -Xms2g -Xmx500g -jar /programs/beagle/beagle.jar ",
"gt=/workdir/mw489/ImputationEastAfrica_StageII_90919/chr",
Chr,"_ugC1samples_FromJune2016vcf_ReadyToPhaseAndImputeUngenotypedSites_91119.vcf.gz ",
"map=/workdir/mw489/CassavaGeneticMap/chr",
Chr,"_cassava_cM_pred.v6_91019.map ",
"ref=/workdir/mw489/ImputationEastAfrica_StageII_90919/chr",
Chr,"_ImputationReferencePanel_StageV_91019.vcf.gz ",
"out=/workdir/mw489/ImputationEastAfrica_StageII_90919/chr",
Chr,"_ugC1samples_FromJune2016vcf_AllSitesREFimputedAndPhased_91119 ",
"nthreads=",nthreads," impute=true ne=100000"))}))
DR2>=0.75, P_HWE>1e-20, MAF>0.005 [0.5%]
library(tidyverse); library(magrittr); library(furrr); options(mc.cores=18); plan(multiprocess)
pathIn<-"/workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/"
tibble(Chr=1:18) %>%
mutate(PostImputeFilter=future_map(Chr,function(Chr){
fileIn<-paste0("vcftools --gzvcf ",pathIn,"chr",
Chr,"_ugC1samples_FromJune2016vcf_AllSitesREFimputedAndPhased_91119.vcf.gz ")
fileOut<-paste0("--out ",pathIn,"chr",
Chr,"_ugC1samples_FromJune2016vcf_AllSitesREFimputedAndPhased_91119")
system(paste0(fileIn,"--get-INFO DR2 --get-INFO AF ",fileOut))
system(paste0(fileIn,"--hardy ",fileOut)) }))
pathIn<-"/workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/"
stats2filterOn<-tibble(Chr=1:18) %>%
mutate(INFO=future_map(Chr,~read.table(paste0(pathIn,"chr",.,
"_ugC1samples_FromJune2016vcf_AllSitesREFimputedAndPhased_91119.INFO"),
stringsAsFactors = F, header = T)),
hwe=future_map(Chr,~read.table(paste0(pathIn,"chr",.,
"_ugC1samples_FromJune2016vcf_AllSitesREFimputedAndPhased_91119.hwe"),
stringsAsFactors = F, header = T)))
stats2filterOn %<>%
select(Chr,INFO,hwe) %>%
mutate(INFO=map2(INFO,hwe,~left_join(.x,(.y %>%
rename(CHROM=CHR))))) %>%
select(-hwe)
stats2filterOn %>%
mutate(IsNumDR2=map_lgl(INFO,~is.numeric(.$DR2)))
stats2filterOn %<>%
mutate(INFO=map(INFO,
~mutate(.,
DR2=as.numeric(DR2),
AF=as.numeric(AF)) %>%
filter(!is.na(DR2),
!is.na(AF))))
stats2filterOn %<>%
mutate(INFO=map(INFO,~mutate(.,MAF=ifelse(AF>0.5,1-AF,AF))))
Check what’s left
sitesPassingFilters<-stats2filterOn %>%
mutate(allSitesPassing=map(INFO,
~filter(.,DR2>=0.75,
P_HWE>1e-20,
MAF>0.005) %>%
select(CHROM,POS))) %>%
select(-INFO)
stats2filterOn %>% unnest() %>% nrow() # 65113
sitesPassingFilters %>% unnest(allSitesPassing) %>% nrow() # 56250
stats2filterOn %>% left_join(sitesPassingFilters)
# Chr INFO allSitesPassing
# <int> <list> <list>
# 1 1 <data.frame [8,201 x 13]> <data.frame [7,254 x 2]>
# 2 2 <data.frame [3,854 x 13]> <data.frame [3,131 x 2]>
# 3 3 <data.frame [4,102 x 13]> <data.frame [3,536 x 2]>
# 4 4 <data.frame [3,756 x 13]> <data.frame [3,422 x 2]>
# 5 5 <data.frame [3,807 x 13]> <data.frame [3,424 x 2]>
# 6 6 <data.frame [3,594 x 13]> <data.frame [2,861 x 2]>
# 7 7 <data.frame [1,959 x 13]> <data.frame [1,538 x 2]>
# 8 8 <data.frame [3,628 x 13]> <data.frame [3,144 x 2]>
# 9 9 <data.frame [3,449 x 13]> <data.frame [2,902 x 2]>
# 10 10 <data.frame [2,901 x 13]> <data.frame [2,486 x 2]>
# 11 11 <data.frame [3,297 x 13]> <data.frame [2,702 x 2]>
# 12 12 <data.frame [2,893 x 13]> <data.frame [2,604 x 2]>
# 13 13 <data.frame [2,773 x 13]> <data.frame [2,378 x 2]>
# 14 14 <data.frame [4,172 x 13]> <data.frame [3,830 x 2]>
# 15 15 <data.frame [3,959 x 13]> <data.frame [3,284 x 2]>
# 16 16 <data.frame [2,875 x 13]> <data.frame [2,519 x 2]>
# 17 17 <data.frame [2,775 x 13]> <data.frame [2,545 x 2]>
# 18 18 <data.frame [3,118 x 13]> <data.frame [2,690 x 2]>
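The row counts in the printed tibble above can be turned into a per-chromosome retention rate. A minimal sketch in base R; the two vectors are copied by hand from that output, and the names `n_sites`, `n_passing` and `retention` are illustrative only.

```r
# Row counts copied from the printed tibble above (Chr 1 to 18)
n_sites   <- c(8201, 3854, 4102, 3756, 3807, 3594, 1959, 3628, 3449,
               2901, 3297, 2893, 2773, 4172, 3959, 2875, 2775, 3118)
n_passing <- c(7254, 3131, 3536, 3422, 3424, 2861, 1538, 3144, 2902,
               2486, 2702, 2604, 2378, 3830, 3284, 2519, 2545, 2690)

retention <- data.frame(Chr = 1:18, n_sites, n_passing,
                        prop_passing = round(n_passing / n_sites, 3))

# Chromosomes that lose the largest share of sites come first
retention[order(retention$prop_passing), ]

# Overall proportion of sites retained (56250 of 65113)
sum(n_passing) / sum(n_sites)
```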
Apply filter
sitesPassingFilters %>%
mutate(PostImputePrePhaseFilter=future_map2(Chr,allSitesPassing,function(Chr,allSitesPassing){
pathIn<-"/workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/"
sitesPassing_thisChr<-allSitesPassing %>%
select(CHROM,POS)
write.table(sitesPassing_thisChr,
file = paste0(pathIn,"chr",Chr,
"_ugC1samples_FromJune2016vcf_AllSitesREFimputedAndPhased_91119.sitesPassing"),
row.names = F, col.names = F, quote = F)
system(paste0("vcftools --gzvcf ",pathIn,"chr",Chr,
"_ugC1samples_FromJune2016vcf_AllSitesREFimputedAndPhased_91119.vcf.gz ",
"--positions ",pathIn,"chr",Chr,
"_ugC1samples_FromJune2016vcf_AllSitesREFimputedAndPhased_91119.sitesPassing ",
"--recode --stdout | ",
"bgzip -c -@ 24 > ",
pathIn,"chr",Chr,
"_ugC1samples_FromJune2016vcf_ReadyToMergeWithRefPanel_91119.vcf.gz"))}))
tibble(Chr=1:18) %>%
mutate(ExtractRefPanelIVsamplesForStageV=future_map(Chr,function(Chr){
system(paste0("vcftools --gzvcf /workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/chr",
Chr,"_ImputationReferencePanel_StageV_91019.vcf.gz ",
"--chr ",Chr," ",
"--positions /workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/chr",
Chr,"_ugC1samples_FromJune2016vcf_AllSitesREFimputedAndPhased_91119.sitesPassing ",
"--recode ",
"--stdout | bgzip -c -@ 24 > ",
" /workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/chr",
Chr,"_ImputationReferencePanel_StageVsamples_ReadyToMergeAndFormStageVI_91119.vcf.gz")) }))
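The `system()` calls in this pipeline return the shell's exit status, but that value is never inspected, so a failed extraction would go unnoticed until a later step reads a truncated file. A minimal sketch of a wrapper that stops on failure follows; `run_checked()` is a hypothetical helper, and the two test commands assume a Unix-like shell where `true` and `false` are available.

```r
# Hypothetical wrapper: run a shell command and stop if it exits with a non-zero status
run_checked <- function(cmd) {
  status <- system(cmd)
  if (status != 0) stop("command failed with status ", status, ": ", cmd)
  invisible(status)
}

run_checked("true")                              # succeeds silently
res <- try(run_checked("false"), silent = TRUE)  # fails, error captured by try()
inherits(res, "try-error")                       # TRUE: the failure is surfaced in R
```

For a pipe such as `vcftools ... | bgzip ...`, a plain shell reports the status of the last command only, so this check would catch a failing `bgzip` but not necessarily a failing `vcftools`.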
library(tidyverse); library(magrittr); library(furrr); options(mc.cores=18); plan(multiprocess)
pathIn<-"/workdir/marnin/nextgenImputation2019/ImputationEastAfrica_StageII_90919/"
tibble(Chr=1:18) %>%
mutate(Index=future_map(Chr,function(Chr){
system(paste0("tabix -f -p vcf ",pathIn,"chr",Chr,
"_ImputationReferencePanel_StageVsamples_ReadyToMergeAndFormStageVI_91119.vcf.gz"))
system(paste0("tabix -f -p vcf ",pathIn,"chr",Chr,
"_ugC1samples_FromJune2016vcf_ReadyToMergeWithRefPanel_91119.vcf.gz")) }))
tibble(Chr=1:18) %>%
mutate(Merge=future_map(Chr,function(Chr){
system(paste0("bcftools merge ",
"--output ",pathIn,"chr",Chr,
"_ImputationReferencePanel_StageVI_91119.vcf.gz ",
"--merge snps --output-type z --threads 24 ",
pathIn,"chr",Chr,
"_ImputationReferencePanel_StageVsamples_ReadyToMergeAndFormStageVI_91119.vcf.gz ",
pathIn,"chr",Chr,
"_ugC1samples_FromJune2016vcf_ReadyToMergeWithRefPanel_91119.vcf.gz"))}))