API

Here is a list of available function calls. A detailed description can be found below.

Knockoffs.MK_statisticsMethod
MK_statistics(T0::Vector, Tk::Vector{Vector}; filter_method)

Computes the multiple knockoff statistics kappa, tau, and W.

Inputs

  • T0: p-vector of importance score for original variables
  • Tk: Vector storing T1, ..., Tm, where Ti is importance scores for the ith knockoff copy
  • filter_method: Either Statistics.median (default) or max (original function used in 2019 Gimenez and Zou)

output

  • κ: Index of the most significant feature (κ[i] = 0 if original feature most important, otherwise κ[i] = k if the kth knockoff is most important)
  • τ: τ[i] stores the most significant statistic among original and knockoff variables minus filter_method() applied to the remaining statistics.
  • W: coefficient difference statistic W[i] = abs(T0[i]) - abs(Tk[i])
source
Knockoffs.MK_statisticsMethod
MK_statistics(T0::Vector, Tk::Vector)

Compute regular knockoff statistics tau and W.

Inputs

  • T0: p-vector of importance score for original variables
  • Tk: p-vector of importance score for knockoff variables

output

  • W: coefficient difference statistic W[i] = abs(T0[i]) - abs(Tk[i])
source
Knockoffs.adj_constrained_hclustMethod
adj_constrained_hclust(distmat::AbstractMatrix, h::Number)

Performs (single-linkage) hierarchical clustering, forcing groups to be contiguous. After clustering, variables in different group is guaranteed to have distance less than h.

Note: this is a custom (bottom-up) implementation because Clustering.jl does not support adjacency constraints, see https://github.com/JuliaStats/Clustering.jl/issues/230

source
Knockoffs.approx_modelX_gaussian_knockoffsMethod
approx_modelX_gaussian_knockoffs(X, method; [m=1], [windowsize = 500], [covariance_approximator], kwargs...)
approx_modelX_gaussian_knockoffs(X, method, window_ranges; [m=1], [covariance_approximator], kwargs...)

Generates Gaussian knockoffs by approximating the covariance as a block diagonal matrix. Each block contains windowsize consecutive features. One could alternatively specify the window_ranges argument to construct blocks of different sizes.

Inputs

  • X: A n × p numeric matrix or SnpArray. Each row is a sample, and each column is covariate.
  • method: Can be one of the following
    • :mvr for minimum variance-based reconstructability knockoffs (alg 1 in ref 2)
    • :maxent for maximum entropy knockoffs (alg 2 in ref 2)
    • :equi for equi-distant knockoffs (eq 2.3 in ref 1),
    • :sdp for SDP knockoffs (eq 2.4 in ref 1)
    • :sdp_fast for SDP knockoffs via coordiate descent (alg 2.2 in ref 3)
  • m: Number of knockoff copies per variable to generate, defaults to 1.
  • windowsize: Number of covariates to be included in a block. Each block consists of adjacent variables. The last block could contain less than windowsize variables.
  • window_ranges: Vector of ranges for each window. e.g. [1:97, 98:200, 201:500]
  • covariance_approximator: A covariance estimator, defaults to LinearShrinkage(DiagonalUnequalVariance(), :lw). See CovarianceEstimation.jl for more options.
  • kwargs...: Possible optional inputs to solvers specified in method, see solve_MVR, solve_max_entropy, and solve_sdp_ccd

Multithreading (todo)

To enable multiple threads, simply start Julia with >1 threads and this routine will run with all available threads.

Covariance Approximation:

The covariance is approximated by a LinearShrinkageEstimator using Ledoit-Wolf shrinkage with DiagonalUnequalVariance target, which seems to perform well for p>n cases. We do not simply use cov(X) since isposdef(cov(X)) is typically false. For comparison of different estimators, see: https://mateuszbaran.github.io/CovarianceEstimation.jl/dev/man/msecomp/#msecomp

source
Knockoffs.choose_group_repsMethod
choose_group_reps(Σ::Symmetric, groups::AbstractVector; [threshold=0.5], [prioritize_idx], [Σinv])

Chooses group representatives. Returns indices of Σ that are representatives. If R is the set of selected variables within a group and O is the set of variables outside the group, then we keep adding variables to R until the proportion of variance explained by R divided by the proportion of variance explained by R and O exceeds threshold.

Inputs

  • Σ: Correlation matrix wrapped in the Symmetric argument.
  • groups: Vector of group membership.

Optional inputs

  • threshold: Value between 0 and 1 that controls the number of representatives per group. Larger means more representatives (default 0.5)
  • prioritize_idx: Variable indices that should receive priority to be chosen as representatives, defaults to nothing
  • Σinv: Precomputed inv(Σ) (it will be computed if not supplied)
source
Knockoffs.conditionMethod
condition(x::AbstractVector, μ::AbstractVector, Σ::AbstractMatrix, S::AbstractMatrix, [m::Number=1])

Samples a knockoff x̃ from Gaussian x using conditional distribution formulas:

If (x, x̃) ~ N((μ, μ), G) where G = [Σ Σ - S; Σ - S Σ], then we sample x̃ from x̃|x = N(μ+(Σ-S)inv(Σ)(x-μ) , 2S-Sinv(Σ)S).

If we sample m knockoffs, we use the algorithm in "Improving the Stability of the Knockoff Procedure: Multiple Simultaneous Knockoffs and Entropy Maximization" by Gimenez and Zou.

Inputs

  • X: A n × p numeric matrix, each row is a sample, and each column is covariate.
  • μ: A p × 1 vector of column mean of X
  • Σ: A p × p covariance matrix of X
  • S: A p × p matrix solved to satisfy S ⪰ 0 and (m+1)/m*Σ - S ⪰ 0
  • m: Number of (simultaneous) knockoffs per variable to generate, default m=1

Output

  • : A n × pm numeric matrix. The first p columns store the first knockoff copy, and the next p columns store the second knockoff...etc

Todo

  • When s is the zero vector, X̃ should be identical to X but it isn't
  • Consider changing sampling code to using Distribution's MvNormal
  • For multiple knockoffs, can we avoid storing a pm × pm matrix in memory?
source
Knockoffs.fit_lassoMethod
fit_lasso(y, X, method=:maxent, ...)
fit_lasso(y, X, μ, Σ, method=:maxent, ...)

Generates model-X knockoffs with method, runs Lasso, then applies the knockoff-filter. If μ and Σ are not provided, they will be estimated from data.

Inputs

  • y: A n × 1 response vector
  • X: A n × p numeric matrix, each row is a sample, and each column is covariate.
  • method: Method for knockoff generation (defaults to :maxent)
  • μ: A p × 1 vector of column mean of X. If not provided, defaults to column mean.
  • Σ: A p × p covariance matrix of X. If not provided, it will be estimated based on a shrinked empirical covariance matrix, see modelX_gaussian_knockoffs
  • d: Distribution of response. Defaults Normal(), for binary response (logistic regression) use Binomial().
  • m: Number of simultaneous knockoffs to generate, defaults to m=1
  • fdrs: Target FDRs, defaults to [0.01, 0.05, 0.1, 0.25, 0.5]
  • groups: Vector of group membership. If not supplied, we generate regular knockoffs. If supplied, we run group knockoffs.
  • filter_method: Choices are :knockoff or :knockoff_plus (default)
  • debias: Defines how the selected coefficients are debiased. Specify :ls for least squares or :lasso for Lasso (only running on the support). To not debias, specify debias=nothing (default).
  • kwargs: Additional arguments to input into glmnetcv and glmnet
source
Knockoffs.fit_marginalMethod
fit_marginal(y, X, method=:maxent, ...)
fit_marginal(y, X, μ, Σ, method=:maxent, ...)

Generates model-X knockoffs with method and computes feature importance statistics based on squared marginal Z score: abs2(x[:, i]^t*y) / n. If μ and Σ are not provided, they will be estimated from data.

Inputs

  • y: A n × 1 response vector
  • X: A n × p numeric matrix, each row is a sample, and each column is covariate.
  • method: Method for knockoff generation (defaults to :maxent)
  • μ: A p × 1 vector of column mean of X. If not provided, defaults to column mean.
  • Σ: A p × p covariance matrix of X. If not provided, it will be estimated based on a shrinked empirical covariance matrix, see modelX_gaussian_knockoffs
  • d: Distribution of response. Defaults Normal(), for binary response (logistic regression) use Binomial().
  • m: Number of simultaneous knockoffs to generate, defaults to m=1
  • fdrs: Target FDRs, defaults to [0.01, 0.05, 0.1, 0.25, 0.5]
  • groups: Vector of group membership. If not supplied, we generate regular knockoffs. If supplied, we run group knockoffs.
  • filter_method: Choices are :knockoff or :knockoff_plus (default)
  • debias: Defines how the selected coefficients are debiased. Specify :ls for least squares or :lasso for Lasso (only running on the support). To not debias, specify debias=nothing (default).
  • kwargs: Additional arguments to input into glmnetcv and glmnet
source
Knockoffs.fixed_knockoffsMethod
fixed_knockoffs(X::Matrix{T}; [method], [kwargs...])

Creates fixed-X knockoffs. Internally, X will be automatically normalized before computing its knockoff.

Inputs

  • X: A column-normalized n × p numeric matrix, each row is a sample, and each column is covariate. We will internally normalized X if it is not.
  • method: Can be one of the following
    • :mvr: Minimum variance-based reconstructability knockoffs (alg 1 in ref 2)
    • :maxent: Maximum entropy knockoffs (alg 2 in ref 2)
    • :equi: Equi-distant knockoffs (eq 2.3 in ref 1),
    • :sdp: SDP knockoffs (eq 2.4 in ref 1)
    • :sdp_fast: SDP knockoffs via coordiate descent (alg 2.2 in ref 3)
  • kwargs...: Possible optional inputs to method, see solve_MVR, solve_max_entropy, and solve_sdp_ccd

Output

  • GaussianKnockoff: A struct containing the original (column-normalized) X and its knockoff , in addition to other variables (e.g. s)

Reference

  1. "Controlling the false discovery rate via Knockoffs" by Barber and Candes (2015).
  2. "Powerful knockoffs via minimizing reconstructability" by Spector, Asher, and Lucas Janson (2020)
  3. "FANOK: Knockoffs in Linear Time" by Askari et al. (2020).
source
Knockoffs.form_emission_prob_matrixMethod
form_emission_prob_matrix(a, θ, xi::AbstractVector)

Inputs

  • a: p × K matrix with values estimated from fastPHASE (i.e. they called it the α parameter)
  • θ: p × K matrix with values estimated from fastPHASE
  • xi: Length p vector with sample i's genotypes (entries 0, 1 or 2)
source
Knockoffs.forward_backward!Function
forward_backward!(x, L, y, storage=zeros(length(x)))

Non-allocating solver for finding x to the solution of LL'x = y where L is a cholesky factor.

source
Knockoffs.forward_backward_sampling!Method
forward_backward_sampling!(Z, X, Q, q, θ, ...)

Samples Z, the hidden states of a HMM, from observed sequence of unphased genotypes X.

Inputs

Z: Length p vector of integers. This will store the sampled Markov states X: Length p vector of genotypes (0, 1, or 2) Q: K × K × p array. Q[:, :, j] is a K × K matrix of transition probabilities for jth state, i.e. Q[l, k, j] = P(X{j} = k | X{j - 1} = l). The first transition matrix is not used. q: Length K vector of initial probabilities θ: The θ parameter estimated from fastPHASE

Preallocated storage variables

table: a MarkovChainTable that maps markov chain states to haplotype pairs (ka, kb) d: Sampling distribution, probabilities in d.p are mutated α̂: p × K scaled forward probability matrix, where α̂[j, k] = P(x_1,...,x_k, z_k) / P(x_1,...,x_k) c: normalizing constants, c[k] = p(x_k | x_1,...,x_{k-1})

Reference

Algorithm 3 of "Gene hunting with hidden Markov model knockoffs" by Sesia et al

source
Knockoffs.full_knockoffscreenMethod
full_knockoffscreen(x::SnpArray; windowsize::Int=100)

Generates knockoffs X̃ⱼ by on regressing Xⱼ on SNPs knockoffs within a sliding window of width windowsize.

Inputs

  • x: A SnpArray or String for the path of the PLINK .bed file
  • windowsize: Int specifying window width. Defaults to 100

Outputs

  • : A n × p dense matrix of Float64, each row is a sample.

References

  • He, Zihuai, Linxi Liu, Chen Wang, Yann Le Guen, Justin Lee, Stephanie Gogarten, Fred Lu et al. "Identification of putative causal loci in whole-genome sequencing data via knockoff statistics." Nature communications 12, no. 1 (2021): 1-18.
  • He, Zihuai, Yann Le Guen, Linxi Liu, Justin Lee, Shiyang Ma, Andrew C. Yang, Xiaoxia Liu et al. "Genome-wide analysis of common and rare variants via multiple knockoffs at biobank scale, with an application to Alzheimer disease genetics." The American Journal of Human Genetics 108, no. 12 (2021): 2336-2353.

TODO

  • Use ElasticArrays.jl to avoid reallocating design matrix in each loop
  • Write iterator interface to avoid allocating and storing all knockoffs at once
source
Knockoffs.get_genotype_emission_probabilitiesMethod
get_genotype_emission_probabilities(θ::AbstractMatrix, xj::Number, ka::Int, kb::Int, j::Int)

Computes P(xj | k={ka,kb}, θ): emission probabilities for genotypes. This is eq 10 of "Gene hunting with hidden Markov model knockoffs" by Sesia et al.

source
Knockoffs.get_genotype_transition_matrixMethod
get_genotype_transition_matrix(r, θ, α, q, table)

Compute transition matrices for the hidden Markov chains in unphased genotypes. This is equation 9 of "Gene hunting with hidden Markov model knockoffs" by Sesia et al.

Inputs

r: Length p vector, the "recombination rates" θ: Size p × K matrix, θ[j, k] is probability that the allele is 1 for SNP p at kth haplotype motif α: Size p × K matrix, probabilities that haplotype motifs succeed each other. Rows should sum to 1. q: Length K vector of initial probabilities table: a MarkovChainTable that maps markov chain states k = 1, ..., K+(K+1)/2 to haplotype pairs (ka, kb).

source
Knockoffs.get_haplotype_emission_probabilitiesMethod
get_haplotype_emission_probabilities(θ::AbstractMatrix, j::Int, hj::Number, zj::Int)

Computes emission probabilities for unphased HMM. This is the equation above eq8 of "Gene hunting with hidden Markov model knockoffs" by Sesia et al.

source
Knockoffs.get_haplotype_transition_matrixMethod
get_haplotype_transition_matrix(r, θ, α)

Compute transition matrices for the hidden Markov chains in haplotypes. This is 2 equations above eq8 in "Gene hunting with hidden Markov model knockoffs" by Sesia et al.

Inputs

r: Length p vector, the "recombination rates" θ: Size p × K matrix, θ[j, k] is probability that the allele is 1 for SNP p at kth haplotype motif α: Size p × K matrix, probabilities that haplotype motifs succeed each other. Rows should sum to 1.

Output

Q: A p-dimensional vector of K × K matrices. Q[:, :, j] is the jth transition matrix.

source
Knockoffs.group_block_objectiveMethod
group_block_objective(Σ, S, groups, m, method)

Evaluate the objective for SDP/MVR/ME. This is not an efficient function, so it should only be called at the start of each algorithm.

Inputs

  • Σ: Covariance or correlation matrix for original data
  • S: Optimization variable (group-block-diagonal)
  • groups: Vector of group membership. Variable i belongs to group groups[i]
  • m: Number of knockoffs to generate for each variable
  • method: The optimization method for group knockoffs
source
Knockoffs.hc_partition_groupsMethod
hc_partition_groups(X::AbstractMatrix; [cutoff], [min_clusters], [force_contiguous])
hc_partition_groups(Σ::Symmetric; [cutoff], [min_clusters], [force_contiguous])

Computes a group partition based on individual level data X or correlation matrix Σ using hierarchical clustering with specified linkage.

Inputs

  • X: n × p data matrix. Each row is a sample
  • Σ: p × p correlation matrix. Must be wrapped in the Symmetric argument, otherwise we will treat it as individual level data
  • cutoff: Height value for which the clustering result is cut, between 0 and 1 (default 0.5). This ensures that no variables between 2 groups have correlation greater than cutoff. 1 recovers ungrouped structure, 0 corresponds to everything in a single group.
  • min_clusters: The desired number of clusters.
  • linkage: cluster linkage function to use (when force_contiguous=true, linkage must be :single). linkage defines how the distances between the data points are aggregated into the distances between the clusters. Naturally, it affects what clusters are merged on each iteration. The valid choices are:
    • :single (default): use the minimum distance between any of the cluster members
    • :average: use the mean distance between any of the cluster members
    • :complete: use the maximum distance between any of the members
    • :ward: the distance is the increase of the average squared distance of a point to its cluster centroid after merging the two clusters
    • :ward_presquared: same as :ward, but assumes that the distances in d are already squared.
  • rep_method: Method for selecting representatives for each group. Options are :id (tends to select roughly independent variables) or :rss (tends to select more correlated variables)

If force_contiguous = false and both min_clusters and cutoff are specified, it is guaranteed that the number of clusters is not less than min_clusters and their height is not above cutoff. If force_contiguous = true, min_clusters keyword is ignored.

Outputs

  • groups: Length p vector of group membership for each variable
  • group_reps: Columns of X selected as representatives. Each group have at most nrep representatives. These are typically used to construct smaller group knockoff for extremely large groups
source
Knockoffs.hmm_knockoffMethod
hmm_knockoff(plinkname; [datadir], [plink_outfile], [fastphase_outfile], [outdir], [verbose], args...)

Generates HMM knockoffs from binary PLINK formatted files. This is done by first running fastPHASE, then running Algorithm 2 of "Gene hunting with hidden Markov model knockoffs" by Sesia, Sabatti, and Candes

Input

  • plinkname: Binary PLINK file names without the .bed/.bim/.fam suffix.

Optional arguments

  • datadir: Full path to the PLINK and fastPHASE files (default = current directory)
  • plink_outfile: Output PLINK format name
  • fastphase_outfile: The output file name from fastPHASE's alpha, theta, r files
  • args...: Any parameter that accepted in fastPHASE.fastphase_estim_param()

Output

  • plink_outfile.bed: n × p knockoff genotypes
  • plink_outfile.bim: SNP mapping file. Knockoff have SNP names ending in ".k"
  • plink_outfile.fam: Sample mapping file, this is a copy of the original plinkname.fam file
  • fastphase_outfile_rhat.txt: averaged r hat file from fastPHASE
  • fastphase_outfile_alphahat.txt: averaged alpha hat file from fastPHASE
  • fastphase_outfile_thetahat.txt: averaged theta hat file from fastPHASE
source
Knockoffs.hmm_knockoffMethod
hmm_knockoff(snpdata::SnpData, r::AbstractVecOrMat, θ::AbstractMatrix, α::AbstractMatrix)

Generates knockoff of snpdata with loaded r, θ, α

Input

  • SnpData: A SnpData object from SnpArrays
  • r: The r vector estimated by fastPHASE
  • θ: The θ matrix estimated by fastPHASE
  • α: The α matrix estimated by fastPHASE

Optional Inputs

  • outdir: Output directory for generated knockoffs
  • plink_outfile: Output file name for knockoff genotypes
  • estimate_δ: If true, will estimate pseudo-FDR by computing a δ value for each SNP via likelihood ratio bound
source
Knockoffs.id_partition_groupsMethod
id_partition_groups(X::AbstractMatrix; [rss_target], [force_contiguous])
id_partition_groups(Σ::Symmetric; [rss_target], [force_contiguous])

Compute group members based on interpolative decompositions. An initial pass first selects the most representative features such that regressing each non-represented feature on the selected will have residual less than rss_target. The selected features are then defined as group centers and the remaining features are assigned to groups

Inputs

  • G: Either individual level data X or a correlation matrix Σ. If one inputs Σ, it must be wrapped in the Symmetric argument, otherwise we will treat it as individual level data
  • rss_target: Target residual level (greater than 0) for the first pass, smaller means more groups
  • force_contiguous: Whether groups are forced to be contiguous. If true, variants are assigned its left or right center, whichever has the largest correlation with it without breaking contiguity.

Outputs

  • groups: Length p vector of group membership for each variable

Note: interpolative decomposition is a stochastic algorithm. Set a seed to guarantee reproducible results.

source
Knockoffs.initialize_SFunction
initialize_S(Σ, groups, m, method, verbose)

Internal function to help initialize S to a good starting value, returns the final S matrix as well as the cholesky factorizations L and C where

  • L.LL.U = cholesky((m+1)/mΣ - S)
  • C.L*C.U = cholesky(S)
source
Knockoffs.interpolative_decompositionMethod
interpolative_decomposition(A::AbstractMatrix, rk::Int)

Computes the interpolative decomposition of A with rank rk and returns the top rk most representative columns of A

source
Knockoffs.ipadMethod
ipad(X::Matrix; [r_method], [m])

Generates knockoffs based on intertwined probabilitistic factors decoupling (IPAD). This assumes that X can be factored as X = FΛ' + E where F is a n × r random matrix of latent factors, Λ are factor loadings, and E are residual errors. When this assumption is met, FDR can be controlled with no power loss when applying the knockoff procedure. Internally, we need to compute an eigenfactorization for a n × n matrix. This is often faster than standard model-X knockoffs which requires solving p-dimensional convex optimization problem.

Inputs

  • X: A n × p numeric matrix, each row is a sample, and each column is covariate.
  • r_method: Method used for estimating r, the number of latent factors. Choices include :er (default), :gr, or :ve
  • m: Number of (simultaneous) knockoffs per variable to generate, default m=1

References

  • Fan, Y., Lv, J., Sharifvaghefi, M. and Uematsu, Y., 2020. IPAD: stable interpretable forecasting with knockoffs inference. Journal of the American Statistical Association, 115(532), pp.1822-1834.
  • Bai, J., 2003. Inferential theory for factor models of large dimensions. Econometrica, 71(1), pp.135-171.
  • Ahn, S.C. and Horenstein, A.R., 2013. Eigenvalue ratio test for the number of factors. Econometrica, 81(3), pp.1203-1227.
source
Knockoffs.likelihood_ratioMethod
likelihood_ratio(θa, θb, ρ; α=0.1, n = 1000, threshold = true)

Estimates the likelihood ratio bound log(P(x)Q(x̃) / Q(x)P(x̃)) for each a single HMM state (fixed i and j)

  • θa: State 1 of genotype markov state
  • θb: State 2 of genotype markov state
  • ρ is maf of SNP
  • α is % SNPs decorrelated (defaults 10%)
source
Knockoffs.lowrankdowndate_turbo!Method
lowrankdowndate_turbo!(C::Cholesky, v::AbstractVector)

Vectorized version of lowrankdowndate!, source https://github.com/JuliaLang/julia/blob/742b9abb4dd4621b667ec5bb3434b8b3602f96fd/stdlib/LinearAlgebra/src/cholesky.jl#L753 Takes advantage of the fact that v is 0 everywhere except at 1 position

source
Knockoffs.lowrankupdate_turbo!Method
lowrankupdate_turbo!(C::Cholesky, v::AbstractVector)

Vectorized version of lowrankupdate!, source https://github.com/JuliaLang/julia/blob/742b9abb4dd4621b667ec5bb3434b8b3602f96fd/stdlib/LinearAlgebra/src/cholesky.jl#L707 Takes advantage of the fact that v is 0 everywhere except at 1 position

source
Knockoffs.markov_knockoffsMethod
markov_knockoffs(Z::Vector{Int}, Q::Array{T, 3}, q::Vector{T})

Generates knockoff of variables distributed as a discrete Markov Chain with K states.

Inputs

  • Z: Length p vector of Int where Z[i] is the ith state
  • Q: K × K × p array. Q[:, :, j] is a K × K matrix of transition probabilities for jth state, i.e. Q[l, k, j] = P(X{j} = k | X{j - 1} = l) The first transition matrix is not used.
  • q: K × 1 vector of initial probabilities

Reference

Equations 4-5 of "Gene hunting with hidden Markov model knockoffs" by Sesia, Sabatti, and Candes

source
Knockoffs.mk_thresholdMethod
mk_threshold(τ::Vector{T}, κ::Vector{Int}, m::Int, q::Number)

Chooses the multiple knockoff threshold τ̂ > 0 by setting τ̂ = min{ t > 0 : (1/m + 1/m * {#j: κ[j] ≥ 1 and W[j] ≥ t}) / {#j: κ[j] == 0 and W[j] ≥ τ̂} ≤ q }.

Inputs

  • τ: τ[i] stores the feature importance score for the ith feature, i.e. the value T0 - median(T1,...,Tm). Note in Gimenez and Zou, the max function is used instead of median
  • κ: κ[i] stores which of m knockoffs has largest importance score. When original variable has largest score, κ[i] == 0.
  • m: Number of knockoffs per variable generated
  • q: target FDR (between 0 and 1)
  • rej_bounds: Number of values of top τ to consider (default = 10000)

Reference:

  • Equations 8 and 9 in supplement of "Identification of putative causal loci in wholegenome sequencing data via knockoff statistics" by He et al.
  • Algorithm 1 of "Improving the Stability of the Knockoff Procedure: Multiple Simultaneous Knockoffs and Entropy Maximization" by Gimenez and Zou.
source
Knockoffs.modelX_gaussian_group_knockoffsMethod
modelX_gaussian_group_knockoffs(X, method, groups, μ, Σ; [m], [covariance_approximator], [kwargs])
modelX_gaussian_group_knockoffs(X, method, groups; [m], [covariance_approximator], [kwargs])

Constructs Gaussian model-X group knockoffs. If the covariance Σ and mean μ are not specified, they will be estimated from data, i.e. we will make second-order group knockoffs. To incorporate group structure, the (true or estimated) covariance matrix is block-diagonalized according to groups membership to solve a relaxed optimization problem. See reference paper and Knockoffs.jl docs for more details.

Inputs

  • X: A n × p design matrix. Each row is a sample, each column is a feature.
  • method: Method for constructing knockoffs. Options include
    • :maxent: (recommended) for fully general maximum entropy group knockoffs
    • :mvr: for fully general minimum variance-based reconstructability (MVR) group knockoffs
    • :equi: for equi-correlated knockoffs. This is the methodology proposed in Dai R, Barber R. The knockoff filter for FDR control in group-sparse and multitask regression. International conference on machine learning 2016 Jun 11 (pp. 1851-1859). PMLR.
    • :sdp: Fully general SDP group knockoffs based on coodinate descent
    • :sdp_block: Fully general SDP group knockoffs where each block is solved exactly using an interior point solver.
    • :sdp_subopt: Chooses each block S_{i} = γ_i * Σ_{ii}. This slightly generalizes the equi-correlated group knockoff idea proposed in Dai and Barber 2016.
  • groups: Vector of group membership
  • μ: A length p vector storing the true column means of X
  • Σ: A p × p covariance matrix for columns of X
  • m: Number of knockoffs per variable, defaults to 1.
  • covariance_approximator: A covariance estimator, defaults to LinearShrinkage(DiagonalUnequalVariance(), :lw). See CovarianceEstimation.jl for more options.
  • kwargs: Extra keyword arguments for solve_s_group

How to define groups

The exported functions hc_partition_groups and id_partition_groups can be used to build a group membership vector.

A note on compute time

The computational complexity of group knockoffs scales quadratically with group size. Thus, very large groups (e.g. >100 members per group) dramatically slows down parameter estimation. In such cases, one can consider running the routine modelX_gaussian_rep_group_knockoffs which constructs group knockoffs by choosing top representatives from each group.

Reference

Dai & Barber 2016, The knockoff filter for FDR control in group-sparse and multitask regression

source
Knockoffs.modelX_gaussian_knockoffsMethod
modelX_gaussian_knockoffs(X::Matrix, method::Symbol; [m], [covariance_approximator], [kwargs...])
modelX_gaussian_knockoffs(X::Matrix, method::Symbol, μ::Vector, Σ::Matrix; [m], [kwargs...])

Creates model-free multivariate normal knockoffs by sequentially sampling from conditional multivariate normal distributions. The true mean μ and covariance Σ is estimated from data if not supplied.

Inputs

  • X: A n × p numeric matrix, each row is a sample, and each column is covariate.
  • method: Can be one of the following
    • :mvr for minimum variance-based reconstructability knockoffs (alg 1 in ref 2)
    • :maxent for maximum entropy knockoffs (alg 2 in ref 2)
    • :equi for equi-distant knockoffs (eq 2.3 in ref 1),
    • :sdp for SDP knockoffs (eq 2.4 in ref 1)
    • :sdp_ccd for SDP knockoffs via coordiate descent (alg 2.2 in ref 3)
  • μ: A p × 1 vector of column mean of X, defaults to column mean
  • Σ: A p × p matrix of covariance of X, defaults to a shrinkage estimator specified by covariance_approximator.
  • m: Number of knockoff copies per variable to generate, defaults to 1.
  • covariance_approximator: A covariance estimator, defaults to LinearShrinkage(DiagonalUnequalVariance(), :lw) which tends to give good empirical performance when p>n. See CovarianceEstimation.jl for more options.
  • kwargs...: Possible optional inputs to solvers specified in method, see solve_MVR, solve_max_entropy, and solve_sdp_ccd

Reference:

  1. "Panning for Gold: Model-X Knockoffs for High-dimensional Controlled Variable Selection" by Candes, Fan, Janson, and Lv (2018)
  2. "Powerful knockoffs via minimizing reconstructability" by Spector, Asher, and Lucas Janson (2020)
  3. "FANOK: Knockoffs in Linear Time" by Askari et al. (2020).

Covariance Approximation:

The covariance is approximated by a linear shrinkage estimator using Ledoit-Wolf with DiagonalUnequalVariance target, which seems to perform well for p>n cases. We do not simply use cov(X) since isposdef(cov(X)) is typically false. For comparison of various estimators, see: https://mateuszbaran.github.io/CovarianceEstimation.jl/dev/man/msecomp/#msecomp

source
Knockoffs.modelX_gaussian_rep_group_knockoffsMethod
modelX_gaussian_rep_group_knockoffs(X, method, groups; [m], [covariance_approximator], [kwargs...])
modelX_gaussian_rep_group_knockoffs(X, method, groups, μ, Σ; [m], [kwargs...])

Constructs group knockoffs by choosing representatives from each group and solving a smaller optimization problem based on the representatives only. Remaining knockoffs are generated based on a conditional independence assumption similar to a graphical model (details to be given later). The representatives are computed by choose_group_reps

Inputs

  • X: A n × p design matrix. Each row is a sample, each column is a feature.
  • method: Method for constructing knockoffs. Options are the same as modelX_gaussian_group_knockoffs
  • groups: Vector of Int denoting group membership. groups[i] is the group of X[:, i]
  • covariance_approximator: A covariance estimator, defaults to LinearShrinkage(DiagonalUnequalVariance(), :lw). See CovarianceEstimation.jl for more options.
  • μ: A length p vector storing the true column means of X
  • Σ: A p × p covariance matrix for columns of X
  • rep_threshold: Value between 0 and 1 that controls the number of representatives per group. Larger means more representatives (default 0.5)
  • m: Number of knockoffs per variable, defaults to 1.
  • kwargs: Extra keyword arguments for solve_s_group
source
Knockoffs.prioritize_variantsMethod
prioritize_variants!(index::AbstractVector, priority_vars::AbstractVector)

Given (unsorted) index, we make variables in priority_vars appear first in index, preserving the original order in index and those not in priority_vars.

Example

index = [11, 4, 5, 9, 7]
priority_vars = [4, 9]
result = prioritize_variants(index, priority_vars)
result == [4, 9, 11, 5, 7]
source
Knockoffs.rapidMethod
rapid(rapid_exe, vcffile, mapfile, d, outfolder, w, r, s, [a])

Wrapper for the RaPID program.

Inputs

  • rapid_exe: Full path to the RaPID_v.1.7 executable file
  • vcffile: Phased VCF file name
  • mapfile: Map file name
  • d: Actual Minimum IBD length in cM
  • outfolder: Output folder name
  • w: Number of SNPs in a window for sub-sampling
  • r: Number of runs
  • s: Minimum number of successes to consider a hit

Optional Inputs

  • a: If true, ignore MAFs. By default (a=false) the sites are selected at random weighted by their MAFs.
source
Knockoffs.search_rankFunction
search_rank(A::AbstractMatrix, sk::Vector{Int}, target=0.25, verbose=false)

Finds the rank (number of columns of A) that best approximates the remaining columns such that regressing each remaining variable on those selected has RSS less than some target.

  • Σ: Original (p × p) correlation matrix
  • A: The (upper triangular) cholesky factor of Σ
  • sk: The (unsorted) columns of A, earlier ones are more important
  • target: Target residual level

note: we cannot do binary search because large ranks can increase residuals

source
Knockoffs.simulate_AR1Method
simulate_AR1(p::Int, a=1, b=1, tol=1e-3, max_corr=1, rho=nothing)

Generates p-dimensional correlation matrix for AR(1) Gaussian process, where successive correlations are drawn from Beta(a,b) independently. If rho is specified, then the process is stationary with correlation rho.

Source

https://github.com/amspector100/knockpy/blob/20eddb3eb60e0e82b206ec989cb936e3c3ee7939/knockpy/dgp.py#L61

source
Knockoffs.simulate_ERMethod
simulate_ER(p::Int; [invert])

Simulates a covariance matrix from a clustered Erdos-Renyi graph, which is a block diagonal matrix where each block is an Erdo-Renyi graph. The result is scaled back to a correlation matrix.

For details, see the 4th simulation routine in section 5.1 of Li and Maathius https://academic.oup.com/jrsssb/article/83/3/534/7056103?login=false

Inputs

  • p: Dimension of covariance matrix
  • ϕ: Probability of forming an edge between any 2 nodes
  • lb: lower bound for the value of an edge (drawn from uniform distribution)
  • ub: upper bound for the value of an edge (drawn from uniform distribution)
  • invert: Whther to invert the covariance matrix (to obtain the precision)
  • λmin: minimum eigenvalue of the resulting covariance matrix
  • blocksize: Number of variables within each ER graph.
source
Knockoffs.simulate_block_covarianceMethod
simulate_block_covariance(groups, ρ, γ, num_v, w)

Simulates a block covariance matrix similar to the one in Dai & Barber 2016, The knockoff filter for FDR control in group-sparse and multitask regression. That is, all diagonal elements will be 1, correlation within groups will be ρ, and correlation between groups will be ρ*γ.

Inputs

  • groups: Vector of group membership
  • ρ: within group correlation
  • γ: between group correlation

Optional arguments

  • num_v: Number of added rank 1 update Σ + v1*v1' + ... + vn*vn' where v is iid N(0, w) (default 0)
  • w: variance of the rank 1 update used in num_v (default 1)
source
Knockoffs.single_linkage_distanceMethod
single_linkage_distance(distmat, left, right)

Computes the minimum distance (i.e. single-linkage distance) between members in left and members in right. Member distances are precomputed in distmat

source
Knockoffs.solve_MVRMethod
solve_MVR(Σ::AbstractMatrix)

Solves the minimum variance-based reconstructability problem for fixed-X and model-X knockoffs given correlation matrix Σ. Users should call solve_s instead of this function.

See algorithm 1 of "Powerful knockoffs via minimizing reconstructability" by Spector, Asher, and Lucas Janson (2020) https://arxiv.org/pdf/2011.14625.pdf

source
Knockoffs.solve_SDPMethod
solve_SDP(Σ::AbstractMatrix)

Solves the SDP problem for fixed-X and model-X knockoffs given correlation matrix Σ. Users should call solve_s instead of this function.

The optimization problem is stated in equation 3.13 of https://arxiv.org/pdf/1610.02351.pdf

Arguments

  • Σ: A correlation matrix (diagonals all equal to 1)
  • m: Number of knockoffs to generate, defaults to 1
  • optm: SDP solver. Defaults to Hypatia.Optimizer(verbose=false). This can be any solver that supports the JuMP interface. For example, use SDPT3.Optimizer in SDPT3.jl package (which is a MATLAB dependency) for the best performance.
source
Knockoffs.solve_equiMethod
solve_equi(Σ::AbstractMatrix)

Solves the equicorrelated problem for fixed-X and model-X knockoffs given correlation matrix Σ. Users should call solve_s instead of this function.

source
Knockoffs.solve_group_SDP_single_blockMethod
solve_group_SDP_single_block(Σ11, ub)

Solves a single block of the fully general group SDP problem. The objective is min sum_{i,j} |Σ[i,j] - S[i,j]| s.t. 0 ⪯ S ⪯ A11 - [A12 A13]inv(A22-S2 A32; A23 A33-S3)[A21; A31]

Inputs

  • Σ11: The block corresponding to the current group. Must be a correlation matrix.
  • ub: The matrix defined as A11 - [A12 A13]inv(A22-S2 A32; A23 A33-S3)[A21; A31]
  • optm: Any solver compatible with JuMP.jl
source
Knockoffs.solve_group_SDP_suboptMethod

Solves the SDP group knockoff problem using analogy to the equi-correlated group knockoffs. Basically, the idea is to optimize a vector γ where γ[j] multiplies Σ_jj. In the equi-correlated setting, all γ[j] is forced to be equal.

Details can be found in Dai & Barber 2016, The knockoff filter for FDR control in group-sparse and multitask regression

source
Knockoffs.solve_group_block_updateMethod

Todo

  • somehow avoid reallocating ub every iteration
  • When solving each individual block,
    • warmstart
    • avoid reallocating S1_new
    • allocate vector of models
    • use loose convergence criteria
  • For singleton groups, don't use JuMP and directly update
  • Currently all objective values are computed based on SDP case. Need to display objective values for ME/MVR objective
source
Knockoffs.solve_group_equiMethod

Solves the equi-correlated group knockoff problem. Here Σ is the true covariance matrix (scaled so that it has 1 on its diagonal) and Σblocks is the block-diagonal covariance matrix where each block corresponds to groups.

Details can be found in Dai & Barber 2016, The knockoff filter for FDR control in group-sparse and multitask regression

source
Knockoffs.solve_group_max_entropy_hybridMethod
solve_group_max_entropy_hybrid(Σ, groups, [outer_iter=100], [inner_pca_iter=1],
    [inner_ccd_iter=1], [tol=0.0001], [ϵ=1e-6], [m=1], [robust=false], [verbose=false])

Solves the group-knockoff optimization problem based on Maximum Entropy objective. Users should call solve_s_group instead of this function.

Inputs

  • Σ: Correlation matrix
  • groups: Group membership vector

Optional inputs

  • outer_iter: Maximum number of outer iterations. Each outer iteration will perform inner_pca_iter PCA updates inner_ccd_iter full optimization updates (default = 100).
  • inner_pca_iter: Number of full PCA updates before changing to fully general coordinate descent updates (default = 1)
  • inner_ccd_iter: Number of full general coordinate descent updates before changing to PCA updates (default = 1)
  • tol: convergence tolerance. Algorithm converges when abs((obj_new-obj_old)/obj_old) < tol OR when changes in S matrix falls below 1e-4
  • ϵ: tolerance added to the lower and upper bound, prevents numerical issues (default = 1e-6)
  • m: Number of knockoffs per variable (defaults 1)
  • robust: whether to use "robust" Cholesky updates. If robust=true, alg will be ~10x slower, only use this if robust=false causes cholesky updates to fail. (default false)
  • verbose: Whether to print intermediate results (default false)
source
Knockoffs.solve_group_mvr_hybridMethod
solve_group_mvr_hybrid(Σ, groups, [outer_iter=100], [inner_pca_iter=1],
    [inner_ccd_iter=1], [tol=0.0001], [ϵ=1e-6], [m=1], [robust=false], [verbose=false])

Solves the group-knockoff optimization problem based on MVR objective. Users should call solve_s_group instead of this function.

Inputs

  • Σ: Correlation matrix
  • groups: Group membership vector

Optional inputs

  • outer_iter: Maximum number of outer iterations. Each outer iteration will perform inner_pca_iter PCA updates inner_ccd_iter full optimization updates (default = 100).
  • inner_pca_iter: Number of full PCA updates before changing to fully general coordinate descent updates (default = 1)
  • inner_ccd_iter: Number of full general coordinate descent updates before changing to PCA updates (default = 1)
  • tol: convergence tolerance. Algorithm converges when abs((obj_new-obj_old)/obj_old) < tol OR when changes in S matrix falls below 1e-4
  • ϵ: tolerance added to the lower and upper bound, prevents numerical issues (default = 1e-6)
  • m: Number of knockoffs per variable (defaults 1)
  • robust: whether to use "robust" Cholesky updates. If robust=true, alg will be ~10x slower, only use this if robust=false causes cholesky updates to fail. (default false)
  • verbose: Whether to print intermediate results (default false)
source
Knockoffs.solve_group_sdp_hybridMethod
solve_group_sdp_hybrid(Σ, groups, [outer_iter=100], [inner_pca_iter=1],
    [inner_ccd_iter=1], [tol=0.0001], [ϵ=1e-6], [m=1], [robust=false], [verbose=false])

Solves the group-knockoff optimization problem based on SDP objective. Users should call solve_s_group instead of this function.

Inputs

  • Σ: Correlation matrix
  • groups: Group membership vector

Optional inputs

  • outer_iter: Maximum number of outer iterations. Each outer iteration will perform inner_pca_iter PCA updates inner_ccd_iter full optimization updates (default = 100).
  • inner_pca_iter: Number of full PCA updates before changing to fully general coordinate descent updates (default = 1)
  • inner_ccd_iter: Number of full general coordinate descent updates before changing to PCA updates (default = 1)
  • tol: convergence tolerance. Algorithm converges when abs((obj_new-obj_old)/obj_old) < tol OR when changes in S matrix falls below 1e-4
  • ϵ: tolerance added to the lower and upper bound, prevents numerical issues (default = 1e-6)
  • m: Number of knockoffs per variable (defaults 1)
  • robust: whether to use "robust" Cholesky updates. If robust=true, alg will be ~10x slower, only use this if robust=false causes cholesky updates to fail. (default false)
  • verbose: Whether to print intermediate results (default false)
source
Knockoffs.solve_max_entropyMethod
solve_max_entropy(Σ::AbstractMatrix)

Solves the maximum entropy knockoff problem for fixed-X and model-X knockoffs given correlation matrix Σ. Users should call solve_s instead of this function.

Reference

Algorithm 2.2 from Powerful Knockoffs via Minimizing Reconstructability: https://arxiv.org/pdf/2011.14625.pdf

Note

There is a typo in algorithm for computing ME knockoffs in "Powerful knockoffs via minimizing reconstructability" by Spector, Asher, and Lucas Janson (2020). In the supplemental section, equation 59, they needed to evaluate c_m = D^t_{-j,j}D^{-1}_{-j,-j}D_{-j,j}. They claimed the FANOK paper ("FANOK: KNOCKOFFS IN LINEAR TIME" by Askari et al. (2020)) implies that c_m = ||v_m||^2 where Lv_m = u. However, according to section A.1.2 of the FANOK paper, it seems like the actual update should be D^t_{-j,j}D^{-1}_{-j,-j}D_{-j,j} = ζ*||c_m||^2 / (ζ + ||c_m||^2) where ζ = 2Σ_{jj} - s_j.

source
Knockoffs.solve_sMethod
solve_s(Σ::Symmetric, method::Symbol; m=1, kwargs...)

Solves the vector s for generating knockoffs. Σ can be a general covariance matrix but it must be wrapped in the Symmetric keyword.

Inputs

  • Σ: A covariance matrix (one must wrap Symmetric(Σ) explicitly)
  • method: Can be one of the following
    • :mvr for minimum variance-based reconstructability knockoffs (alg 1 in ref 2)
    • :maxent for maximum entropy knockoffs (alg 2 in ref 2)
    • :equi for equi-distant knockoffs (eq 2.3 in ref 1),
    • :sdp for SDP knockoffs (eq 2.4 in ref 1)
    • :sdp_ccd fast SDP knockoffs via coordiate descent (alg 2.2 in ref 3)
  • m: Number of knockoffs per variable, defaults to 1.
  • kwargs: Extra arguments available for specific methods. For example, to use less stringent convergence tolerance for MVR knockoffs, specify tol = 0.001. For a list of available options, see solve_MVR, solve_max_entropy, solve_sdp_ccd, solve_SDP, or solve_equi

Reference

  1. "Controlling the false discovery rate via Knockoffs" by Barber and Candes (2015).
  2. "Powerful knockoffs via minimizing reconstructability" by Spector, Asher, and Lucas Janson (2020)
  3. "FANOK: Knockoffs in Linear Time" by Askari et al. (2020).
source
Knockoffs.solve_s_graphical_groupMethod
solve_s_graphical_group(Σ::Symmetric, groups::Vector{Int}, group_reps::Vector{Int},
method; [m], [verbose])

Solves the group knockoff problem but the convex optimization problem only runs on the representatives. The non-representative variables are assumed to be independent by groups when conditioning on the reprensetatives.

Inputs

  • Σ: Symmetric p × p covariance matrix
  • groups: p dimensional vector of group membership
  • group_reps: Indices for the representatives.
  • method: Method for solving group knockoff problem
  • m: Number of knockoffs to generate per feature
  • verbose: Whether to print informative intermediate results
  • kwargs...: extra arguments for solve_s_group

Outputs

  • S: Matrix obtained from solving the optimization problem on the representatives.
  • D: A p × p (dense) matrix corresponding to the S matrix for both the representative and non-representative variables. Knockoff sampling should use this matrix. If the graphical conditional independent assumption is satisfied exactly, this matrix should be sparse, but it is always never sparse unless we use cond_indep_corr to force the covariance matrix to satisify it.
  • obj: Objective value for solving the optimization problem on the representatives.
source
Knockoffs.solve_s_groupMethod
solve_s_group(Σ, groups, method; [m=1], kwargs...)

Solves the group knockoff problem, returns block diagonal matrix S satisfying (m+1)/m*Σ - S ⪰ 0 where m is number of knockoffs per feature.

Inputs

  • Σ: A general covariance matrix wrapped by Symmetric keyword
  • groups: Vector of group membership, does not need to be contiguous
  • method: Method for constructing knockoffs. Options include
    • :maxent: (recommended) for fully general maximum entropy group knockoffs
    • :mvr: for fully general minimum variance-based reconstructability (MVR) group knockoffs
    • :equi: for equi-correlated knockoffs. This is the methodology proposed in Dai R, Barber R. The knockoff filter for FDR control in group-sparse and multitask regression. International conference on machine learning 2016 Jun 11 (pp. 1851-1859). PMLR.
    • :sdp: Fully general SDP group knockoffs based on coodinate descent
    • :sdp_subopt: Chooses each block S_{i} = γ_i * Σ_{ii}. This slightly generalizes the equi-correlated group knockoff idea proposed in Dai and Barber 2016.
    • :sdp_block: Fully general SDP group knockoffs where each block is solved exactly using an interior point solver.
  • m: Number of knockoffs per variable, defaults to 1.
  • kwargs: Extra arguments available for specific methods. For example, to use less stringent convergence tolerance, specify tol = 0.001. For a list of available options, see solve_group_mvr_hybrid, solve_group_max_entropy_hybrid, solve_group_sdp_hybrid, or solve_group_equi

Output

  • S: A matrix solved so that (m+1)/m*Σ - S ⪰ 0 and S ⪰ 0
  • γ: A vector that is only non-empty for equi and suboptimal knockoff constructions. They correspond to values of γ where S_{gg} = γΣ_{gg}. So for equi, the vector is length 1. For SDP, the vector has length equal to number of groups
  • obj: Final SDP/MVR/ME objective value given S. Equi-correlated group knockoffs and singleton (non-grouped knockoffs) returns 0 because they either no objective value or it is not necessary to evaluate the objectives

Warning

This function potentially permutes the columns/rows of Σ, and puts them back at the end. Thus one should NOT call solve_s_group on the same Σ simultaneously, e.g. in a multithreaded for loop. Permutation does not happen when groups are contiguous.

source
Knockoffs.solve_sdp_ccdMethod
solve_sdp_ccd(Σ::AbstractMatrix)

Solves the SDP problem for fixed-X and model-X knockoffs using coordinate descent, given correlation matrix Σ. Users should call solve_s instead of this function.

Reference

Algorithm 2.2 from "FANOK: Knockoffs in Linear Time" by Askari et al. (2020).

source
Knockoffs.thresholdMethod
threshold(w::AbstractVector, q::Number, [method=:knockoff], [m::Int=1])

Chooses a threshold τ > 0 by choosing τ to be one of the following τ = min{ t > 0 : {#j: w[j] ≤ -t} / {#j: w[j] ≥ t} ≤ q } (method=:knockoff) τ = min{ t > 0 : (1 + {#j: w[j] ≤ -t}) / {#j: w[j] ≥ t} ≤ q } (method=:knockoff)

Inputs

  • w: Vector of feature important statistics
  • q: target FDR (between 0 and 1)
  • method: either :knockoff or :knockoff_plus (default)
  • rej_bounds: Number of values of top W to consider (default = 10000)

Reference:

Equation 3.10 (method=:knockoff) or 3.11 (method=:knockoff_plus) of "Panning for Gold: Model-X Knockoffs for High-dimensional Controlled Variable Selection" by Candes, Fan, Janson, and Lv (2018)

source
Knockoffs.update_normalizing_constants!Method
update_normalizing_constants!(Q::AbstractMatrix{T}, q::AbstractVector{T})

Computes normalizing constants recursively using equation (5).

Inputs

  • Q: K × K × p array. Q[:, :, j] is a K × K matrix of transition probabilities for jth state, i.e. Q[l, k, j] = P(X{j} = k | X{j - 1} = l). The first transition matrix is not used.
  • q: K × 1 vector of initial probabilities

todo: efficiency

source
Knockoffs.KnockoffFilterType

A KnockoffFilter is essentially a Knockoff that has gone through a feature selection procedure, such as the Lasso. It stores, among other things, the final estimated parameters beta after applying the knockoff-filter procedure.

The debiased variable is a boolean indicating whether estimated effect size have been debiased with Lasso. The W vector stores the feature importance statistic that satisfies the flip coin property. tau is the knockoff threshold, which controls the empirical FDR at level q

source
Knockoffs.MarkovChainTableType

Genotype states are index pairs (ka, kb) where ka, kb is unordered haplotype 1 and 2. If there are K=5 haplotype motifs, then the 15 possible genotype states and their index are

(1, 1) = 1 (1, 2) = 2 (2, 2) = 6 (1, 3) = 3 (2, 3) = 7 (3, 3) = 10 (1, 4) = 4 (2, 4) = 8 (3, 4) = 11 (4, 4) = 13 (1, 5) = 5 (2, 5) = 9 (3, 5) = 12 (4, 5) = 14 (5, 5) = 15

source