API

Here is a list of available function calls. A detailed description can be found below.

Index

fastPHASE.fastphase_estim_param
fastPHASE.process_fastphase_output

Functions

fastPHASE.fastphase_estim_param — Function

fastphase_estim_param(xdata; ...)

Runs fastPHASE to estimate the r, θ, and α parameters. Different initial EM starts are run in parallel if Julia is started with multiple threads. The θ (emission probability) values are automatically flipped so that θ[i, j] is the probability of observing allele A2 (usually the major allele) in the PLINK bim file at SNP i of haplotype j.

Inputs

xdata: A String for binary PLINK file (without .bed/bim/fam extensions) or a SnpData (see SnpArrays.jl).

Optional inputs

n: Number of samples used to fit the HMM in fastPHASE. Defaults to the sample size in xdata.

T: Number of different initial conditions for EM. Different initial conditions are run in parallel across Threads.nthreads() threads.

K: Number of haplotype clusters. Defaults to 12.

C: Number of EM iterations before convergence. Defaults to 10.

outfile: Prefix of the output alpha, theta, and r file names. Defaults to fastphase_out.

outdir: Output directory. By default all output is stored in a new knockoffs folder in the current directory.

fastphase_infile: Filename of fastPHASE's input, which is the decompressed PLINK genotypes readable by fastPHASE. Defaults to fastphase.inp.
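As a rough illustration of how the inputs above fit together, the call below passes a PLINK prefix and a few of the optional keywords. The prefix mydata is a hypothetical placeholder for mydata.bed, mydata.bim and mydata.fam, and the keyword values are arbitrary choices, not recommendations. The sketch assumes the package is installed and that Julia was started with several threads (for example julia --threads 4) so that the T starts can run in parallel.

```julia
using fastPHASE

# "mydata" is a hypothetical PLINK prefix (mydata.bed/.bim/.fam must exist).
# Keyword names are those listed under "Optional inputs"; values are examples only.
fastPHASE.fastphase_estim_param(
    "mydata";
    T = 4,                      # four EM starts, run across the available threads
    K = 12,                     # number of haplotype clusters (the documented default)
    C = 10,                     # EM iterations (the documented default)
    outfile = "fastphase_out",  # documented default name for the output files
)
```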


fastPHASE.process_fastphase_output — Function

process_fastphase_output(filename; [T])

Reads r, θ, α into memory, averaging over T simulations. θ (emission probabilities) must be flipped sometimes depending on which allele was defined as "allele 1". fastPHASE simply uses whichever allele was observed first in sample 1 haplotype 1. Thus, in sample 1, genotypes that start with "10" or "1?" must be flipped, and this info is provided in the "_origchars" file.
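The flipping rule described above can be sketched as follows. This is not the package's implementation: flip_theta! is a hypothetical helper, the per-SNP codes are assumed to have already been read from the "_origchars" file into a vector of strings (the file format is not described here), and "flipping" is assumed to mean replacing each emission probability in that SNP's row with one minus its value.

```julia
# Hypothetical helper, not part of fastPHASE.jl.
# θ has one row per SNP; codes[i] is the sample-1 genotype code for SNP i.
function flip_theta!(θ::AbstractMatrix{<:Real}, codes::AbstractVector{<:AbstractString})
    size(θ, 1) == length(codes) ||
        throw(DimensionMismatch("expected one code per SNP (row of θ)"))
    for (i, code) in enumerate(codes)
        # Rows whose code starts with "10" or "1?" refer to the other allele.
        if startswith(code, "10") || startswith(code, "1?")
            @views θ[i, :] .= 1 .- θ[i, :]
        end
    end
    return θ
end

# Toy data: 3 SNPs, 2 haplotype clusters.
θ = [0.75 0.25; 0.125 0.5; 0.25 0.375]
codes = ["01", "10", "1?"]
flip_theta!(θ, codes)  # rows 2 and 3 are flipped, row 1 is left unchanged
```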

Inputs

filename: Path prefix of the rhat.txt, thetahat.txt, alphahat.txt and _origchars files. E.g. use out if your output files are out_rhat.txt, out_thetahat.txt, out_alphahat.txt.

T: Number of different runs executed by fastPHASE. This is the number of different initial conditions used for the EM algorithm. Each of the files rhat.txt, thetahat.txt, alphahat.txt therefore has T × p rows.
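To make the T × p layout concrete, here is a minimal sketch of averaging over runs. It assumes that p is the number of SNPs and that the T runs are stacked one after another (rows 1 to p for run 1, rows p+1 to 2p for run 2, and so on); neither the stacking order nor the helper name average_runs comes from the package, so treat this as an illustration rather than the function's actual logic.

```julia
using Statistics

# Hypothetical helper, not part of fastPHASE.jl.
# `stacked` holds T runs of p rows each, one run after another.
function average_runs(stacked::AbstractMatrix{<:Real}, T::Integer)
    n, K = size(stacked)
    n % T == 0 || throw(ArgumentError("row count must be a multiple of T"))
    p = n ÷ T
    # Column-major reshape: entry (i, t, k) is row (t - 1) * p + i, column k.
    runs = reshape(stacked, p, T, K)
    # Average across the run dimension and drop it, leaving a p × K matrix.
    return dropdims(mean(runs; dims = 2); dims = 2)
end

# Toy data: T = 2 runs, p = 2 SNPs, 2 columns.
stacked = [1.0 2.0; 3.0 4.0; 5.0 6.0; 7.0 8.0]
averaged = average_runs(stacked, 2)
```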
