Convert a VCF file to a dosage (0, 1, 2) matrix using plink1.9.

convertVCFtoDosage(pathIn, pathOut, vcfName)

Arguments

vcfName

Value

Details

The function (1) converts the input VCF to plink1.9 binary format and (2) converts the plink binary to a dosage (0, 1, 2) matrix, with special attention to which allele is counted in the output.

This function uses plink1.9. With plink1.9 there is some risk that the counted allele could switch between files, e.g. between the reference panel and the progeny files, because plink's default allele assignment depends on allele frequency (see the plink documentation). To avoid this, the function writes a file suffixed *.alleleToCount listing the SNP ID (column 1) and the ALT allele from the VCF (column 2). That file is passed to plink1.9 with the --recode-allele flag to ensure that all output dosages count the ALT allele, consistent with the VCFs.

The reason to use plink1.9 is that Beagle5 imputed files don't have a DS (dosage) field that can be extracted directly. Instead, phased genotypes need to be converted to dosages (e.g. 0|1 --> 1, 1|1 --> 2). An alternative might be to extract the haplotypes using vcftools and compute the dosages manually.
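The manual route can be illustrated with a minimal Python sketch. It is not the function's implementation, which relies on plink1.9. The sketch assumes biallelic sites and an uncompressed, tab-delimited VCF. The helper names (alt_dosage, parse_vcf), the tab separator in the allele list, and the toy records are hypothetical and chosen only for illustration.

```python
import io


def alt_dosage(gt):
    """Count ALT alleles in a biallelic GT string such as '0|1' or '1/1'."""
    alleles = gt.replace("/", "|").split("|")
    if "." in alleles:
        return None  # missing call, no dosage can be computed
    return sum(a == "1" for a in alleles)


def parse_vcf(handle):
    """Return (SNP ID, ALT) pairs and per-sample ALT dosages for each SNP."""
    allele_to_count = []
    dosages = {}
    samples = []
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        snp_id, alt = fields[2], fields[4]
        gt_index = fields[8].split(":").index("GT")
        allele_to_count.append((snp_id, alt))
        dosages[snp_id] = {
            sample: alt_dosage(call.split(":")[gt_index])
            for sample, call in zip(samples, fields[9:])
        }
    return allele_to_count, dosages


# Toy VCF with two SNPs and three samples (made-up data).
header = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO",
          "FORMAT", "ind1", "ind2", "ind3"]
records = [
    ["1", "100", "S1_100", "A", "G", ".", "PASS", ".", "GT", "0|0", "0|1", "1|1"],
    ["1", "200", "S1_200", "C", "T", ".", "PASS", ".", "GT", "1|0", "1|1", "0|0"],
]
demo_vcf = "\n".join(
    ["##fileformat=VCFv4.2", "\t".join(header)] + ["\t".join(r) for r in records]
) + "\n"

allele_to_count, dosages = parse_vcf(io.StringIO(demo_vcf))

# Two-column text in the layout described above: SNP ID, then ALT allele.
allele_to_count_text = "\n".join(f"{snp}\t{alt}" for snp, alt in allele_to_count)

print(allele_to_count_text)
print(dosages["S1_100"])
```

Because the dosage is computed from the GT field against the VCF's own ALT column, the counted allele cannot flip with allele frequency, which is the same guarantee the *.alleleToCount file is meant to provide when plink1.9 does the recoding.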

NOTICE: This function is part of a family of functions ("imputation_functions") developed as part of the NextGen Cassava Breeding Project genomic selection pipeline. For some examples of their usage: