API
The functions available in this package are listed below, each with a detailed description.
Functions
fastPHASE.fastphase_estim_param (Function)
fastphase_estim_param(xdata; ...)
Runs fastPHASE to estimate the r, θ, and α parameters. If Julia is started with multiple threads, the different initial EM starts are run in parallel. The θ (emission probability) values are automatically flipped so that θ[i, j] is the probability of observing allele A2 (usually the major allele), as defined in the PLINK .bim file, at SNP i of haplotype cluster j.
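The parallel EM starts follow a common Julia threading pattern, sketched below. This is illustrative only and not the package's code: toy_em_run is a hypothetical placeholder for one EM fit, and the starts are assumed to be independent, so each loop iteration writes only its own result slot. Start Julia with, for example, julia --threads 4 to get more than one thread.

```julia
using Base.Threads

# Hypothetical placeholder for one EM fit from initial condition t.
# The real fitting is done by fastPHASE; this just returns a number.
toy_em_run(t::Int) = 10.0 * t

# Run T independent starts, spread over nthreads() threads.
function run_em_starts(T::Int)
    results = Vector{Float64}(undef, T)
    @threads for t in 1:T
        results[t] = toy_em_run(t)  # each iteration writes its own slot
    end
    return results
end

results = run_em_starts(4)
println("threads: ", nthreads(), ", results: ", results)
```

Because every start writes to a distinct index, no locking is needed and the result order does not depend on how the iterations are scheduled across threads.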
Inputs
xdata: A String giving the path to a binary PLINK file (without the .bed/.bim/.fam extensions), or a SnpData (see SnpArrays.jl).
Optional inputs
n: Number of samples used to fit the HMM in fastPHASE. Defaults to the sample size in xdata.
T: Number of different initial conditions for EM. The initial conditions are run in parallel on Threads.nthreads() threads.
K: Number of haplotype clusters. Defaults to 12.
C: Number of EM iterations. Defaults to 10.
outfile: Prefix of the output alpha, theta, and r file names. Defaults to fastphase_out.
outdir: Output directory. By default, all output is stored in a new folder named knockoffs in the current directory.
fastphase_infile: Filename of fastPHASE's input file, which contains the decompressed PLINK genotypes in a format fastPHASE can read. Defaults to fastphase.inp.
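To see where the outputs end up, the sketch below builds the three output paths from outdir and outfile. It is a hypothetical helper (fastphase_output_paths is not part of the package) and it assumes the outputs follow the prefix_rhat.txt, prefix_thetahat.txt, prefix_alphahat.txt naming described under process_fastphase_output, with a knockoffs folder as the default outdir.

```julia
# Hypothetical helper, not part of fastPHASE.jl. ASSUMES outputs are named
# <outfile>_rhat.txt, <outfile>_thetahat.txt and <outfile>_alphahat.txt.
function fastphase_output_paths(; outdir::AbstractString = joinpath(pwd(), "knockoffs"),
                                  outfile::AbstractString = "fastphase_out")
    prefix = joinpath(outdir, outfile)  # shared prefix of all output files
    return (prefix = prefix,
            r = prefix * "_rhat.txt",
            theta = prefix * "_thetahat.txt",
            alpha = prefix * "_alphahat.txt")
end

paths = fastphase_output_paths(outdir = "knockoffs")
println(paths.theta)
```

Under this assumption, paths.prefix is the value one would pass as filename when reading the results back.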
fastPHASE.process_fastphase_output (Function)
process_fastphase_output(filename; [T])
Reads r, θ, and α into memory, averaging over the T runs. The θ (emission probability) values sometimes must be flipped, depending on which allele fastPHASE defined as "allele 1": fastPHASE simply uses whichever allele it observes first, in haplotype 1 of sample 1. Thus, SNPs where the genotype of sample 1 starts with "10" or "1?" must be flipped; this information is provided in the "_origchars" file.
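The flipping rule can be sketched as follows. This is a minimal illustration, not the package's implementation: it assumes θ is stored as a p × K matrix (rows are SNPs, columns are haplotype clusters) and that the sample-1 genotype of each SNP is available as a two-character string such as "10" or "1?"; the actual layout of the _origchars file may differ. Flipping a row replaces each allele probability with its complement.

```julia
# Rule from the text: genotypes starting with "10" or "1?" must be flipped.
# Genotypes as plain strings is an ASSUMPTION about the _origchars content.
needs_flip(g::AbstractString) = startswith(g, "10") || startswith(g, "1?")

# θ is p × K (row i = SNP, column j = haplotype cluster).
# A flipped row holds the probability of the other allele: 1 - θ.
function flip_theta!(θ::AbstractMatrix{<:Real}, flip::AbstractVector{Bool})
    θ[flip, :] .= 1 .- θ[flip, :]
    return θ
end

θ = [0.875 0.25; 0.25 0.5; 0.5 0.125]  # toy data: 3 SNPs × 2 clusters
genos = ["01", "10", "1?"]             # toy sample-1 genotypes per SNP
flip_theta!(θ, needs_flip.(genos))
println(θ)
```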
Inputs
filename: Path prefix of the rhat.txt, thetahat.txt, alphahat.txt, and _origchars files. For example, use out if your output files are out_rhat.txt, out_thetahat.txt, and out_alphahat.txt.
T: Number of different runs executed by fastPHASE, i.e. the number of different initial conditions used for the EM algorithm. Each of the rhat.txt, thetahat.txt, and alphahat.txt files therefore has T × p rows.
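Averaging over the T runs can be sketched as below. This assumes, without confirmation from the package, that the T × p rows are stored run by run (rows 1 to p from run 1, rows p + 1 to 2p from run 2, and so on); average_runs is a hypothetical name. The same function applies to the rhat.txt rows when they are read as a (T × p) × 1 matrix.

```julia
# Average T stacked runs of a (T*p) × ncols matrix into a p × ncols matrix.
# ASSUMPTION: rows are grouped by run, each block of p rows being one run.
function average_runs(stacked::AbstractMatrix{<:Real}, T::Int)
    nrows = size(stacked, 1)
    nrows % T == 0 || throw(ArgumentError("row count $nrows is not a multiple of T = $T"))
    p = nrows ÷ T
    total = zeros(Float64, p, size(stacked, 2))
    for t in 1:T
        total .+= stacked[(t - 1) * p + 1 : t * p, :]  # block for run t
    end
    return total ./ T
end

# Toy example: T = 2 runs, p = 3 SNPs, K = 2 clusters.
run1 = [0.5 0.25; 0.75 0.5; 0.25 0.125]
run2 = [0.25 0.75; 0.25 0.5; 0.75 0.375]
θavg = average_runs(vcat(run1, run2), 2)
println(θavg)
```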