The function convertVCFtoDosage(pathIn, pathOut, vcfName) will (1) convert the input VCF to plink1.9 binary format and (2) convert the plink binary to a dosage (0, 1, 2) matrix, with special attention to which allele gets counted in the file.
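Uses plink1.9. With plink1.9 there is some risk that the counted allele could switch between files, e.g. between the reference panel and the progeny files, because by default plink1.9 treats the minor allele as A1 (the counted allele), and which allele is minor depends on the allele frequencies in each file (see the plink documentation).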
To avoid this, the function goes to some extra trouble: it writes a file suffixed *.alleleToCount listing the SNP ID (column 1) and the ALT allele from the VCF (column 2). That file is passed to plink1.9 with the --recode-allele flag to ensure that all output dosages count the ALT allele, consistent with the VCFs.
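The allele file described above could be produced along the following lines. This is a minimal Python sketch, not the function's own code: the helper names write_allele_to_count and plink_recode_args, the toy VCF records, and the file prefix are all hypothetical. It assumes an uncompressed, biallelic VCF with the standard column order (ID in column 3, ALT in column 5). The plink1.9 argument list pairs --recode A with --recode-allele; the exact invocation used by the function is an assumption here.

```python
import os
import tempfile


def write_allele_to_count(vcf_lines, out_path):
    """Write 'SNP_ID<TAB>ALT' for every VCF data line; return the record count.

    Hypothetical helper: assumes standard VCF column order
    (CHROM, POS, ID, REF, ALT, ...) and biallelic sites.
    """
    n = 0
    with open(out_path, "w") as out:
        for line in vcf_lines:
            # Skip blank lines, meta lines (##...) and the header line (#CHROM...)
            if not line.strip() or line.startswith("#"):
                continue
            fields = line.rstrip("\n").split("\t")
            snp_id, alt = fields[2], fields[4]
            out.write(f"{snp_id}\t{alt}\n")
            n += 1
    return n


def plink_recode_args(prefix):
    """Build (but do not run) an assumed plink1.9 call that counts the listed alleles."""
    return [
        "plink", "--bfile", prefix,
        "--recode", "A",                                # additive 0/1/2 coding
        "--recode-allele", prefix + ".alleleToCount",   # which allele to count per SNP
        "--out", prefix,
    ]


# Toy input: two SNPs, one sample (made-up records for illustration only)
example_vcf = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1\n",
    "1\t100\tS1_100\tC\tT\t.\tPASS\t.\tGT\t0|1\n",
    "1\t200\tS1_200\tG\tA\t.\tPASS\t.\tGT\t1|1\n",
]

prefix = os.path.join(tempfile.mkdtemp(), "toy")
n_written = write_allele_to_count(example_vcf, prefix + ".alleleToCount")
print(n_written, "records written")
print(" ".join(plink_recode_args(prefix)))
```

The argument list can be handed to subprocess.run() once plink1.9 is on the PATH; it is only printed here so the sketch runs without plink installed.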
The reason to use plink1.9 is that Beagle5 imputed files don't have a DS (dosage) field that can be directly extracted. Instead, phased genotypes, e.g. 0|1, need to be converted to dosages (e.g. 0|1 --> 1, 1|1 --> 2).
An alternative might be to extract the haplotypes using vcftools and manually compute the dosages.
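The manual computation amounts to counting ALT alleles in each genotype call. The sketch below illustrates that step in plain Python, parsing the GT field directly rather than going through vcftools. The function names gt_to_dosage and dosage_row and the toy record are hypothetical, and the code assumes biallelic sites with GT as the first FORMAT subfield.

```python
import re


def gt_to_dosage(gt):
    """Count ALT alleles in one biallelic genotype call such as '0|1' or '1/1'.

    Returns None when any allele is missing ('.').
    """
    # Keep only the GT subfield, then split on the phased (|) or unphased (/) separator
    alleles = re.split(r"[|/]", gt.split(":")[0])
    if "." in alleles:
        return None
    return sum(1 for allele in alleles if allele == "1")


def dosage_row(vcf_line):
    """Return (SNP ID, list of per-sample dosages) for one VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    # Column 3 is the SNP ID; sample columns start at column 10
    return fields[2], [gt_to_dosage(sample) for sample in fields[9:]]


# Toy record with four samples (made up for illustration only)
line = "1\t100\tS1_100\tC\tT\t.\tPASS\t.\tGT\t0|0\t0|1\t1|1\t.|.\n"
snp_id, dosages = dosage_row(line)
print(snp_id, dosages)  # S1_100 [0, 1, 2, None]
```

Because the ALT allele is counted directly from the VCF coding, this route cannot flip the counted allele between files, at the cost of handling missing calls and multi-allelic sites by hand.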
NOTICE: This function is part of a family of functions ("imputation_functions") developed as part of the NextGen Cassava Breeding Project genomic selection pipeline.
For some examples of their usage:
Other imputation_functions: convertDart2vcf(), createGenomewideDosage(), filter_positions(), mergeVCFs(), postImputeFilterBeagle4pt1(), postImputeFilter(), runBeagle4pt1GL(), runBeagle5(), splitVCFbyChr()