API

Here is a list of available function calls. A detailed description can be found below.

Knockoffs.Knockoff
Knockoffs.KnockoffFilter
Knockoffs.MarkovChainTable
Knockoffs.MK_statistics
Knockoffs.MK_statistics
Knockoffs.adj_constrained_hclust
Knockoffs.approx_modelX_gaussian_knockoffs
Knockoffs.block_diagonalize
Knockoffs.check_model_solution
Knockoffs.choose_group_reps
Knockoffs.cond_indep_corr
Knockoffs.condition
Knockoffs.fit_lasso
Knockoffs.fit_marginal
Knockoffs.fixed_knockoffs
Knockoffs.form_emission_prob_matrix
Knockoffs.forward_backward!
Knockoffs.forward_backward_sampling!
Knockoffs.full_knockoffscreen
Knockoffs.get_genotype_emission_probabilities
Knockoffs.get_genotype_transition_matrix
Knockoffs.get_haplotype_emission_probabilities
Knockoffs.get_haplotype_transition_matrix
Knockoffs.ghost_knockoffs
Knockoffs.group_block_objective
Knockoffs.hc_partition_groups
Knockoffs.hmm_knockoff
Knockoffs.hmm_knockoff
Knockoffs.initialize_S
Knockoffs.inverse_mat_sqrt
Knockoffs.ipad
Knockoffs.likelihood_ratio
Knockoffs.lowrankdowndate_turbo!
Knockoffs.lowrankupdate_turbo!
Knockoffs.markov_knockoffs
Knockoffs.mk_threshold
Knockoffs.modelX_gaussian_group_knockoffs
Knockoffs.modelX_gaussian_knockoffs
Knockoffs.modelX_gaussian_rep_group_knockoffs
Knockoffs.normalize_col!
Knockoffs.prioritize_variants
Knockoffs.rapid
Knockoffs.sample_mvn_efficient
Knockoffs.search_rank
Knockoffs.shift_until_PSD!
Knockoffs.simulate_AR1
Knockoffs.simulate_ER
Knockoffs.simulate_block_covariance
Knockoffs.single_linkage_distance
Knockoffs.single_state_dmc_knockoff!
Knockoffs.solve_MVR
Knockoffs.solve_SDP
Knockoffs.solve_equi
Knockoffs.solve_group_SDP_single_block
Knockoffs.solve_group_SDP_subopt
Knockoffs.solve_group_block_update
Knockoffs.solve_group_equi
Knockoffs.solve_group_max_entropy_hybrid
Knockoffs.solve_group_mvr_hybrid
Knockoffs.solve_group_sdp_hybrid
Knockoffs.solve_max_entropy
Knockoffs.solve_s
Knockoffs.solve_s_graphical_group
Knockoffs.solve_s_group
Knockoffs.solve_sdp_ccd
Knockoffs.threshold
Knockoffs.update_normalizing_constants!

Knockoffs.MK_statistics — Method

MK_statistics(T0::Vector, Tk::Vector{Vector}; filter_method)

Computes the multiple knockoff statistics kappa, tau, and W.

Inputs

T0: p-vector of importance score for original variables
Tk: Vector storing T1, ..., Tm, where Ti is importance scores for the ith knockoff copy
filter_method: Either Statistics.median (default) or max (original function used in 2019 Gimenez and Zou)

Output

κ: Index of the most significant feature (κ[i] = 0 if original feature most important, otherwise κ[i] = k if the kth knockoff is most important)
τ: τ[i] stores the most significant statistic among original and knockoff variables minus filter_method() applied to the remaining statistics.
W: coefficient difference statistic W[i] = abs(T0[i]) - abs(Tk[i])
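
The definitions of κ and τ above can be illustrated with a short sketch. It assumes that "most significant" means the largest importance score; the helper name mk_kappa_tau is hypothetical and is not part of Knockoffs.jl.

```julia
using Statistics

# Hypothetical helper (not the package's implementation): multiple-knockoff
# statistics κ and τ from original scores T0 and m knockoff score vectors Tk.
# Pass `filter_method = maximum` to mimic the max rule on a vector.
function mk_kappa_tau(T0::AbstractVector, Tk::AbstractVector{<:AbstractVector};
                      filter_method = median)
    p, m = length(T0), length(Tk)
    κ = zeros(Int, p)
    τ = zeros(Float64, p)
    for i in 1:p
        scores = [T0[i]; [Tk[k][i] for k in 1:m]]   # original first, then knockoffs
        best = argmax(scores)                        # position 1 means the original wins
        κ[i] = best - 1                              # 0 = original, k = kth knockoff
        rest = deleteat!(copy(scores), best)         # the remaining statistics
        τ[i] = scores[best] - filter_method(rest)
    end
    return κ, τ
end

T0 = [3.0, 0.5]
Tk = [[1.0, 2.0], [2.0, 0.1]]                        # m = 2 knockoff copies
κ, τ = mk_kappa_tau(T0, Tk)
```

In this toy input the original wins for the first feature (κ[1] = 0) and the first knockoff wins for the second feature (κ[2] = 1).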

source

Knockoffs.MK_statistics — Method

MK_statistics(T0::Vector, Tk::Vector)

Compute regular knockoff statistics tau and W.

Inputs

T0: p-vector of importance score for original variables
Tk: p-vector of importance score for knockoff variables

Output

W: coefficient difference statistic W[i] = abs(T0[i]) - abs(Tk[i])

source

Knockoffs.adj_constrained_hclust — Method

adj_constrained_hclust(distmat::AbstractMatrix, h::Number)

Performs (single-linkage) hierarchical clustering, forcing groups to be contiguous. After clustering, variables in different groups are guaranteed to have distance less than h.

Note: this is a custom (bottom-up) implementation because Clustering.jl does not support adjacency constraints, see https://github.com/JuliaStats/Clustering.jl/issues/230
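
A bottom-up, adjacency-constrained procedure of this kind might look roughly like the sketch below. It assumes that neighbouring clusters are merged for as long as their single-linkage distance is below h; that merge rule and the name contiguous_single_linkage are assumptions made for illustration, not the package's code.

```julia
# Hypothetical sketch of adjacency-constrained single-linkage clustering.
function contiguous_single_linkage(distmat::AbstractMatrix, h::Number)
    p = size(distmat, 1)
    clusters = [i:i for i in 1:p]                 # every cluster is a contiguous range
    while length(clusters) > 1
        # single-linkage distance between each pair of neighbouring clusters
        d = [minimum(distmat[clusters[c], clusters[c + 1]]) for c in 1:length(clusters)-1]
        dmin, c = findmin(d)
        dmin < h || break                         # assumed stopping rule: no pair below h
        merged = first(clusters[c]):last(clusters[c + 1])
        splice!(clusters, c:c+1, [merged])        # replace the two neighbours by their union
    end
    groups = zeros(Int, p)
    for (g, r) in enumerate(clusters)
        groups[r] .= g
    end
    return groups
end

pos = [0.0, 0.1, 1.0, 1.05]                       # toy 1-D positions
distmat = [abs(a - b) for a in pos, b in pos]
groups = contiguous_single_linkage(distmat, 0.5)
```

Only neighbouring clusters are ever compared, which is what keeps every group contiguous.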

source

Knockoffs.approx_modelX_gaussian_knockoffs — Method

approx_modelX_gaussian_knockoffs(X, method; [m=1], [windowsize = 500], [covariance_approximator], kwargs...)
approx_modelX_gaussian_knockoffs(X, method, window_ranges; [m=1], [covariance_approximator], kwargs...)

Generates Gaussian knockoffs by approximating the covariance as a block diagonal matrix. Each block contains windowsize consecutive features. One could alternatively specify the window_ranges argument to construct blocks of different sizes.

Inputs

X: An n × p numeric matrix or SnpArray. Each row is a sample, and each column is a covariate.
method: Can be one of the following
- :mvr for minimum variance-based reconstructability knockoffs (alg 1 in ref 2)
- :maxent for maximum entropy knockoffs (alg 2 in ref 2)
- :equi for equi-distant knockoffs (eq 2.3 in ref 1)
- :sdp for SDP knockoffs (eq 2.4 in ref 1)
- :sdp_fast for SDP knockoffs via coordinate descent (alg 2.2 in ref 3)
m: Number of knockoff copies per variable to generate, defaults to 1.
windowsize: Number of covariates to be included in a block. Each block consists of adjacent variables. The last block could contain fewer than windowsize variables.
window_ranges: Vector of ranges for each window. e.g. [1:97, 98:200, 201:500]
covariance_approximator: A covariance estimator, defaults to LinearShrinkage(DiagonalUnequalVariance(), :lw). See CovarianceEstimation.jl for more options.
kwargs...: Possible optional inputs to solvers specified in method, see solve_MVR, solve_max_entropy, and solve_sdp_ccd
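
To make the relationship between windowsize and window_ranges concrete, here is a minimal sketch of how consecutive windows could be derived from a fixed window size. The helper make_windows is hypothetical and only illustrates the layout described above.

```julia
# Hypothetical helper: split p features into consecutive windows of at most
# `windowsize` features each; the last window may be shorter.
make_windows(p::Int, windowsize::Int) =
    [i:min(i + windowsize - 1, p) for i in 1:windowsize:p]

windows = make_windows(1200, 500)
```

With p = 1200 and windowsize = 500 the last window holds only 200 features. A vector of ranges in this shape is what the window_ranges argument expects, for example [1:97, 98:200, 201:500].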

Multithreading (todo)

To enable multiple threads, simply start Julia with >1 threads and this routine will run with all available threads.

Covariance Approximation:

The covariance is approximated by a LinearShrinkageEstimator using Ledoit-Wolf shrinkage with DiagonalUnequalVariance target, which seems to perform well for p>n cases. We do not simply use cov(X) since isposdef(cov(X)) is typically false. For comparison of different estimators, see: https://mateuszbaran.github.io/CovarianceEstimation.jl/dev/man/msecomp/#msecomp

source

Knockoffs.block_diagonalize — Method

block_diagonalize(Σ, groups)

Internal function to block-diagonalize the covariance Σ according to groups.
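
As an illustration, block-diagonalizing by group can be read as keeping Σ[i, j] only when variables i and j share a group. The sketch below follows that reading; it is an assumption about the behaviour, and block_diagonalize_sketch is a hypothetical name.

```julia
# Hypothetical sketch: zero out every entry whose row and column variables
# belong to different groups.
function block_diagonalize_sketch(Σ::AbstractMatrix, groups::AbstractVector)
    Σb = zeros(eltype(Σ), size(Σ))
    for j in axes(Σ, 2), i in axes(Σ, 1)
        groups[i] == groups[j] && (Σb[i, j] = Σ[i, j])
    end
    return Σb
end

Σ = [1.0 0.5 0.2; 0.5 1.0 0.3; 0.2 0.3 1.0]
Σb = block_diagonalize_sketch(Σ, [1, 1, 2])
```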

source

Knockoffs.check_model_solution — Method

check_model_solution(model; verbose=false)

After solving a JuMP model, checks if the solution is accurate.

source

Knockoffs.choose_group_reps — Method

choose_group_reps(Σ::Symmetric, groups::AbstractVector; [threshold=0.5], [prioritize_idx], [Σinv])

Chooses group representatives. Returns indices of Σ that are representatives. If R is the set of selected variables within a group and O is the set of variables outside the group, then we keep adding variables to R until the proportion of variance explained by R divided by the proportion of variance explained by R and O exceeds threshold.

Inputs

Σ: Correlation matrix wrapped in the Symmetric argument.
groups: Vector of group membership.

Optional inputs

threshold: Value between 0 and 1 that controls the number of representatives per group. Larger means more representatives (default 0.5)
prioritize_idx: Variable indices that should receive priority to be chosen as representatives, defaults to nothing
Σinv: Precomputed inv(Σ) (it will be computed if not supplied)

source

Knockoffs.cond_indep_corr — Method

Returns Σnew as a covariance matrix that strictly satisfies the conditional independence assumption.

source

Knockoffs.condition — Method

condition(x::AbstractVector, μ::AbstractVector, Σ::AbstractMatrix, S::AbstractMatrix, [m::Number=1])

Samples a knockoff x̃ from Gaussian x using conditional distribution formulas:

If (x, x̃) ~ N((μ, μ), G) where G = [Σ Σ - S; Σ - S Σ], then we sample x̃ from x̃|x = N(μ+(Σ-S)inv(Σ)(x-μ) , 2S-Sinv(Σ)S).

If we sample m knockoffs, we use the algorithm in "Improving the Stability of the Knockoff Procedure: Multiple Simultaneous Knockoffs and Entropy Maximization" by Gimenez and Zou.
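
For the single-knockoff case (m = 1), the conditional formula above can be sketched directly with the standard library. This is an illustrative sketch only: sample_knockoff is a hypothetical name, and the example S is simply a choice for which 2Σ - S is positive definite.

```julia
using LinearAlgebra, Random

# Hypothetical sketch for m = 1:
# draw x̃ | x ~ N(μ + (Σ - S)Σ⁻¹(x - μ), 2S - SΣ⁻¹S).
function sample_knockoff(rng::AbstractRNG, x::AbstractVector, μ::AbstractVector,
                         Σ::AbstractMatrix, S::AbstractMatrix)
    cond_mean = μ + (Σ - S) * (Σ \ (x - μ))
    cond_cov = Symmetric(2S - S * (Σ \ S))
    L = cholesky(cond_cov).L                      # requires cond_cov to be positive definite
    return cond_mean + L * randn(rng, length(x))
end

Σ = [1.0 0.3; 0.3 1.0]
S = [0.6 0.0; 0.0 0.6]                            # 2Σ - S is positive definite here
x = [0.2, -0.1]
x̃ = sample_knockoff(MersenneTwister(1), x, zeros(2), Σ, S)
```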

Inputs

X: An n × p numeric matrix; each row is a sample and each column is a covariate.
μ: A p × 1 vector of column mean of X
Σ: A p × p covariance matrix of X
S: A p × p matrix solved to satisfy S ⪰ 0 and (m+1)/m*Σ - S ⪰ 0
m: Number of (simultaneous) knockoffs per variable to generate, default m=1

Output

X̃: An n × pm numeric matrix. The first p columns store the first knockoff copy, the next p columns store the second knockoff copy, and so on.

Todo

When s is the zero vector, X̃ should be identical to X but it isn't
Consider changing sampling code to using Distribution's MvNormal
For multiple knockoffs, can we avoid storing a pm × pm matrix in memory?

source

Knockoffs.fit_lasso — Method

fit_lasso(y, X, [method], [d], [m], [fdrs], [groups], [filter_method], 
    [debias], [kwargs...])
fit_lasso(y, X, μ, Σ, [method], [d], [m], [fdrs], [groups], [filter_method], 
    [debias], [kwargs...])

Generates model-X knockoffs with method, runs Lasso, then applies the knockoff-filter. If μ and Σ are not provided, they will be estimated from data.

Inputs

y: A n × 1 response vector
X: An n × p numeric matrix; each row is a sample and each column is a covariate.
method: Method for knockoff generation (defaults to :maxent)
μ: A p × 1 vector of column mean of X. If not provided, defaults to column mean.
Σ: A p × p covariance matrix of X. If not provided, it will be estimated by a shrinkage estimate of the empirical covariance matrix, see modelX_gaussian_knockoffs
d: Distribution of the response. Defaults to Normal(); for a binary response (logistic regression) use Binomial().
m: Number of simultaneous knockoffs to generate, defaults to m=1
fdrs: Target FDRs, defaults to [0.01, 0.05, 0.1, 0.25, 0.5]
groups: Vector of group membership. If not supplied, we generate regular knockoffs. If supplied, we run group knockoffs.
filter_method: Choices are :knockoff or :knockoff_plus (default)
debias: Defines how the selected coefficients are debiased. Specify :ls for least squares or :lasso for Lasso (only running on the support). To not debias, specify debias=nothing (default).
kwargs: Additional arguments to input into glmnetcv and glmnet

source

Knockoffs.fit_marginal — Method

fit_marginal(y, X, method=:maxent, ...)
fit_marginal(y, X, μ, Σ, method=:maxent, ...)

Generates model-X knockoffs with method and computes feature importance statistics based on squared marginal Z score: abs2(x[:, i]^t*y) / n. If μ and Σ are not provided, they will be estimated from data.
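
The importance statistic in the formula above, abs2(x[:, i]^t*y) / n, can be computed for all columns at once. The following is a minimal sketch with a hypothetical helper name; it covers only the score computation, not knockoff generation or filtering.

```julia
# Hypothetical sketch: squared marginal score for every column of X.
marginal_scores(X::AbstractMatrix, y::AbstractVector) = abs2.(X' * y) ./ size(X, 1)

X = [1.0 0.0; 0.0 1.0; 1.0 1.0]
y = [1.0, 2.0, 3.0]
T = marginal_scores(X, y)
```

Here X' * y equals [4, 5], so the scores are 16/3 and 25/3 with n = 3.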

Inputs

y: A n × 1 response vector
X: An n × p numeric matrix; each row is a sample and each column is a covariate.
method: Method for knockoff generation (defaults to :maxent)
μ: A p × 1 vector of column mean of X. If not provided, defaults to column mean.
Σ: A p × p covariance matrix of X. If not provided, it will be estimated by a shrinkage estimate of the empirical covariance matrix, see modelX_gaussian_knockoffs
d: Distribution of the response. Defaults to Normal(); for a binary response (logistic regression) use Binomial().
m: Number of simultaneous knockoffs to generate, defaults to m=1
fdrs: Target FDRs, defaults to [0.01, 0.05, 0.1, 0.25, 0.5]
groups: Vector of group membership. If not supplied, we generate regular knockoffs. If supplied, we run group knockoffs.
filter_method: Choices are :knockoff or :knockoff_plus (default)
debias: Defines how the selected coefficients are debiased. Specify :ls for least squares or :lasso for Lasso (only running on the support). To not debias, specify debias=nothing (default).
kwargs: Additional arguments to input into glmnetcv and glmnet

source

Knockoffs.fixed_knockoffs — Method

fixed_knockoffs(X::Matrix{T}; [method], [kwargs...])

Creates fixed-X knockoffs. Internally, X will be automatically normalized before computing its knockoff.

Inputs

X: A column-normalized n × p numeric matrix; each row is a sample and each column is a covariate. X will be normalized internally if it is not already.
method: Can be one of the following
- :mvr: Minimum variance-based reconstructability knockoffs (alg 1 in ref 2)
- :maxent: Maximum entropy knockoffs (alg 2 in ref 2)
- :equi: Equi-distant knockoffs (eq 2.3 in ref 1)
- :sdp: SDP knockoffs (eq 2.4 in ref 1)
- :sdp_fast: SDP knockoffs via coordinate descent (alg 2.2 in ref 3)
kwargs...: Possible optional inputs to method, see solve_MVR, solve_max_entropy, and solve_sdp_ccd

Output

GaussianKnockoff: A struct containing the original (column-normalized) X and its knockoff X̃, in addition to other variables (e.g. s)

Reference

"Controlling the false discovery rate via Knockoffs" by Barber and Candes (2015).
"Powerful knockoffs via minimizing reconstructability" by Asher Spector and Lucas Janson (2020)
"FANOK: Knockoffs in Linear Time" by Askari et al. (2020).

source

Knockoffs.form_emission_prob_matrix — Method

form_emission_prob_matrix(a, θ, xi::AbstractVector)

Inputs

a: p × K matrix with values estimated from fastPHASE (i.e. they called it the α parameter)
θ: p × K matrix with values estimated from fastPHASE
xi: Length p vector with sample i's genotypes (entries 0, 1 or 2)

source

Knockoffs.forward_backward! — Function

forward_backward!(x, L, y, storage=zeros(length(x)))

Non-allocating solver that finds the x satisfying LL'x = y, where L is a Cholesky factor.
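
Solving LL'x = y amounts to one forward and one backward triangular solve. The sketch below shows that idea with in-place ldiv! calls from LinearAlgebra. It is illustrative only: it omits the storage argument, and forward_backward_sketch! is a hypothetical name.

```julia
using LinearAlgebra

# Hypothetical sketch: solve L L' x = y in place with two triangular solves.
function forward_backward_sketch!(x::AbstractVector, L::LowerTriangular, y::AbstractVector)
    copyto!(x, y)
    ldiv!(L, x)        # forward substitution: solve L z = y
    ldiv!(L', x)       # backward substitution: solve L' x = z
    return x
end

A = [4.0 2.0; 2.0 3.0]
L = cholesky(Symmetric(A)).L
y = [2.0, 1.0]
x = forward_backward_sketch!(zeros(2), L, y)
```

Because A = LL', the result satisfies A * x ≈ y without ever forming an inverse.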

source

Knockoffs.forward_backward_sampling! — Method

forward_backward_sampling!(Z, X, Q, q, θ, ...)

Samples Z, the hidden states of a HMM, from observed sequence of unphased genotypes X.

Inputs

Z: Length p vector of integers. This will store the sampled Markov states
X: Length p vector of genotypes (0, 1, or 2)
Q: K × K × p array. Q[:, :, j] is a K × K matrix of transition probabilities for jth state, i.e. Q[l, k, j] = P(X{j} = k | X{j - 1} = l). The first transition matrix is not used.
q: Length K vector of initial probabilities
θ: The θ parameter estimated from fastPHASE

Preallocated storage variables

table: a MarkovChainTable that maps markov chain states to haplotype pairs (ka, kb)
d: Sampling distribution, probabilities in d.p are mutated
α̂: p × K scaled forward probability matrix, where α̂[j, k] = P(x_1,...,x_k, z_k) / P(x_1,...,x_k)
c: normalizing constants, c[k] = p(x_k | x_1,...,x_{k-1})

Reference

Algorithm 3 of "Gene hunting with hidden Markov model knockoffs" by Sesia et al

source

Knockoffs.full_knockoffscreen — Method

full_knockoffscreen(x::SnpArray; windowsize::Int=100)

Generates knockoffs X̃ⱼ by regressing Xⱼ on the SNPs and knockoffs within a sliding window of width windowsize.

Inputs

x: A SnpArray or String for the path of the PLINK .bed file
windowsize: Int specifying window width. Defaults to 100

Outputs

X̃: A n × p dense matrix of Float64, each row is a sample.

References

He, Zihuai, Linxi Liu, Chen Wang, Yann Le Guen, Justin Lee, Stephanie Gogarten, Fred Lu et al. "Identification of putative causal loci in whole-genome sequencing data via knockoff statistics." Nature communications 12, no. 1 (2021): 1-18.
He, Zihuai, Yann Le Guen, Linxi Liu, Justin Lee, Shiyang Ma, Andrew C. Yang, Xiaoxia Liu et al. "Genome-wide analysis of common and rare variants via multiple knockoffs at biobank scale, with an application to Alzheimer disease genetics." The American Journal of Human Genetics 108, no. 12 (2021): 2336-2353.

TODO

Use ElasticArrays.jl to avoid reallocating design matrix in each loop
Write iterator interface to avoid allocating and storing all knockoffs at once

source

Knockoffs.get_genotype_emission_probabilities — Method

get_genotype_emission_probabilities(θ::AbstractMatrix, xj::Number, ka::Int, kb::Int, j::Int)

Computes P(xj | k={ka,kb}, θ): emission probabilities for genotypes. This is eq 10 of "Gene hunting with hidden Markov model knockoffs" by Sesia et al.

source

Knockoffs.get_genotype_transition_matrix — Method

get_genotype_transition_matrix(r, θ, α, q, table)

Compute transition matrices for the hidden Markov chains in unphased genotypes. This is equation 9 of "Gene hunting with hidden Markov model knockoffs" by Sesia et al.

Inputs

r: Length p vector, the "recombination rates"
θ: Size p × K matrix, θ[j, k] is the probability that the allele is 1 for SNP j at the kth haplotype motif
α: Size p × K matrix, probabilities that haplotype motifs succeed each other. Rows should sum to 1.
q: Length K vector of initial probabilities
table: a MarkovChainTable that maps markov chain states k = 1, ..., K(K+1)/2 to haplotype pairs (ka, kb).

source

Knockoffs.get_haplotype_emission_probabilities — Method

get_haplotype_emission_probabilities(θ::AbstractMatrix, j::Int, hj::Number, zj::Int)

Computes emission probabilities for unphased HMM. This is the equation above eq8 of "Gene hunting with hidden Markov model knockoffs" by Sesia et al.

source

Knockoffs.get_haplotype_transition_matrix — Method

get_haplotype_transition_matrix(r, θ, α)

Compute transition matrices for the hidden Markov chains in haplotypes. This is 2 equations above eq8 in "Gene hunting with hidden Markov model knockoffs" by Sesia et al.

Inputs

Output

Q: A p-dimensional vector of K × K matrices. Q[:, :, j] is the jth transition matrix.

source

Knockoffs.ghost_knockoffs — Method

ghost_knockoffs(Zscores, D, Σinv; [m=1])

Generate Ghost knockoffs given a list of z-scores (GWAS summary statistic).

Inputs

Zscores: List of z-score statistics
D: Matrix obtained from solving the knockoff problem satisfying (m+1)/m*Σ - D ⪰ 0
Σinv: Inverse of the covariance matrix

Optional inputs

m: Number of knockoffs

Reference

He, Z., Liu, L., Belloy, M. E., Le Guen, Y., Sossin, A., Liu, X., ... & Ionita-Laza, I. (2021). Summary statistics knockoff inference empowers identification of putative causal variants in genome-wide association studies.

source

Knockoffs.group_block_objective — Method

group_block_objective(Σ, S, groups, m, method)

Evaluate the objective for SDP/MVR/ME. This is not an efficient function, so it should only be called at the start of each algorithm.

Inputs

Σ: Covariance or correlation matrix for original data
S: Optimization variable (group-block-diagonal)
groups: Vector of group membership. Variable i belongs to group groups[i]
m: Number of knockoffs to generate for each variable
method: The optimization method for group knockoffs

source

Knockoffs.hc_partition_groups — Method

hc_partition_groups(X::AbstractMatrix; [cutoff], [min_clusters], [force_contiguous])
hc_partition_groups(Σ::Symmetric; [cutoff], [min_clusters], [force_contiguous])

Computes a group partition based on individual level data X or correlation matrix Σ using hierarchical clustering with specified linkage.

Inputs

X: n × p data matrix. Each row is a sample
Σ: p × p correlation matrix. Must be wrapped in the Symmetric argument, otherwise we will treat it as individual level data
cutoff: Height value for which the clustering result is cut, between 0 and 1 (default 0.5). This ensures that no variables between 2 groups have correlation greater than cutoff. 1 recovers ungrouped structure, 0 corresponds to everything in a single group.
min_clusters: The desired number of clusters.
linkage: cluster linkage function to use (when force_contiguous=true, linkage must be :single). linkage defines how the distances between the data points are aggregated into the distances between the clusters. Naturally, it affects what clusters are merged on each iteration. The valid choices are:
- :single (default): use the minimum distance between any of the cluster members
- :average: use the mean distance between any of the cluster members
- :complete: use the maximum distance between any of the members
- :ward: the distance is the increase of the average squared distance of a point to its cluster centroid after merging the two clusters
- :ward_presquared: same as :ward, but assumes that the distances in d are already squared.
rep_method: Method for selecting representatives for each group. Options are :id (tends to select roughly independent variables) or :rss (tends to select more correlated variables)

If force_contiguous = false and both min_clusters and cutoff are specified, it is guaranteed that the number of clusters is not less than min_clusters and their height is not above cutoff. If force_contiguous = true, min_clusters keyword is ignored.

Outputs

groups: Length p vector of group membership for each variable
group_reps: Columns of X selected as representatives. Each group has at most nrep representatives. These are typically used to construct smaller group knockoffs for extremely large groups

source

Knockoffs.hmm_knockoff — Method

hmm_knockoff(plinkname; [datadir], [plink_outfile], [fastphase_outfile], [outdir], [verbose], args...)

Generates HMM knockoffs from binary PLINK formatted files. This is done by first running fastPHASE, then running Algorithm 2 of "Gene hunting with hidden Markov model knockoffs" by Sesia, Sabatti, and Candes.

Input

plinkname: Binary PLINK file names without the .bed/.bim/.fam suffix.

Optional arguments

datadir: Full path to the PLINK and fastPHASE files (default = current directory)
plink_outfile: Output PLINK format name
fastphase_outfile: The output file name from fastPHASE's alpha, theta, r files
args...: Any parameter accepted by fastPHASE.fastphase_estim_param()

Output

plink_outfile.bed: n × p knockoff genotypes
plink_outfile.bim: SNP mapping file. Knockoffs have SNP names ending in ".k"
plink_outfile.fam: Sample mapping file, this is a copy of the original plinkname.fam file
fastphase_outfile_rhat.txt: averaged r hat file from fastPHASE
fastphase_outfile_alphahat.txt: averaged alpha hat file from fastPHASE
fastphase_outfile_thetahat.txt: averaged theta hat file from fastPHASE

source

Knockoffs.hmm_knockoff — Method

hmm_knockoff(snpdata::SnpData, r::AbstractVecOrMat, θ::AbstractMatrix, α::AbstractMatrix)

Generates knockoffs of snpdata using the loaded r, θ, and α.

Input

SnpData: A SnpData object from SnpArrays
r: The r vector estimated by fastPHASE
θ: The θ matrix estimated by fastPHASE
α: The α matrix estimated by fastPHASE

Optional Inputs

outdir: Output directory for generated knockoffs
plink_outfile: Output file name for knockoff genotypes
estimate_δ: If true, will estimate pseudo-FDR by computing a δ value for each SNP via likelihood ratio bound

source

Knockoffs.initialize_S — Function

initialize_S(Σ, groups, m, method, verbose)

Internal function to help initialize S to a good starting value, returns the final S matrix as well as the cholesky factorizations L and C where

L.L*L.U = cholesky((m+1)/m*Σ - S)
C.L*C.U = cholesky(S)

source

Knockoffs.inverse_mat_sqrt — Method

Computes A^{-1/2} via eigen-decomposition

source

Knockoffs.ipad — Method

ipad(X::Matrix; [r_method], [m])

Generates knockoffs based on intertwined probabilistic factors decoupling (IPAD). This assumes that X can be factored as X = FΛ' + E where F is an n × r random matrix of latent factors, Λ are factor loadings, and E are residual errors. When this assumption is met, FDR can be controlled with no power loss when applying the knockoff procedure. Internally, we need to compute an eigenfactorization of an n × n matrix. This is often faster than standard model-X knockoffs, which require solving a p-dimensional convex optimization problem.

Inputs

X: An n × p numeric matrix; each row is a sample and each column is a covariate.
r_method: Method used for estimating r, the number of latent factors. Choices include :er (default), :gr, or :ve
m: Number of (simultaneous) knockoffs per variable to generate, default m=1

References

Fan, Y., Lv, J., Sharifvaghefi, M. and Uematsu, Y., 2020. IPAD: stable interpretable forecasting with knockoffs inference. Journal of the American Statistical Association, 115(532), pp.1822-1834.
Bai, J., 2003. Inferential theory for factor models of large dimensions. Econometrica, 71(1), pp.135-171.
Ahn, S.C. and Horenstein, A.R., 2013. Eigenvalue ratio test for the number of factors. Econometrica, 81(3), pp.1203-1227.

source

Knockoffs.likelihood_ratio — Method

likelihood_ratio(θa, θb, ρ; α=0.1, n = 1000, threshold = true)

Estimates the likelihood ratio bound log(P(x)Q(x̃) / Q(x)P(x̃)) for a single HMM state (fixed i and j).

θa: State 1 of genotype markov state
θb: State 2 of genotype markov state
ρ: Minor allele frequency (MAF) of the SNP
α: Percentage of SNPs decorrelated (defaults to 10%)

source

Knockoffs.lowrankdowndate_turbo! — Method

lowrankdowndate_turbo!(C::Cholesky, v::AbstractVector)

Vectorized version of lowrankdowndate!, source: https://github.com/JuliaLang/julia/blob/742b9abb4dd4621b667ec5bb3434b8b3602f96fd/stdlib/LinearAlgebra/src/cholesky.jl#L753. Takes advantage of the fact that v is 0 everywhere except at 1 position.

source

Knockoffs.lowrankupdate_turbo! — Method

lowrankupdate_turbo!(C::Cholesky, v::AbstractVector)

Vectorized version of lowrankupdate!, source: https://github.com/JuliaLang/julia/blob/742b9abb4dd4621b667ec5bb3434b8b3602f96fd/stdlib/LinearAlgebra/src/cholesky.jl#L707. Takes advantage of the fact that v is 0 everywhere except at 1 position.

source

Knockoffs.markov_knockoffs — Method

markov_knockoffs(Z::Vector{Int}, Q::Array{T, 3}, q::Vector{T})

Generates knockoff of variables distributed as a discrete Markov Chain with K states.

Inputs

Z: Length p vector of Int where Z[i] is the ith state
Q: K × K × p array. Q[:, :, j] is a K × K matrix of transition probabilities for jth state, i.e. Q[l, k, j] = P(X{j} = k | X{j - 1} = l). The first transition matrix is not used.
q: K × 1 vector of initial probabilities

Reference

Equations 4-5 of "Gene hunting with hidden Markov model knockoffs" by Sesia, Sabatti, and Candes

source

Knockoffs.mk_threshold — Method

mk_threshold(τ::Vector{T}, κ::Vector{Int}, m::Int, q::Number)

Chooses the multiple knockoff threshold τ̂ > 0 by setting τ̂ = min{ t > 0 : (1/m + 1/m * {#j: κ[j] ≥ 1 and τ[j] ≥ t}) / {#j: κ[j] == 0 and τ[j] ≥ t} ≤ q }.

Inputs

τ: τ[i] stores the feature importance score for the ith feature, i.e. the value T0 - median(T1,...,Tm). Note in Gimenez and Zou, the max function is used instead of median
κ: κ[i] stores which of m knockoffs has largest importance score. When original variable has largest score, κ[i] == 0.
m: Number of knockoffs per variable generated
q: target FDR (between 0 and 1)
rej_bounds: Number of values of top τ to consider (default = 10000)
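
The threshold rule can be sketched as a direct search over candidate values of t, using τ as the per-feature statistic. This is an illustrative sketch rather than the package's implementation: mk_threshold_sketch is a hypothetical name, the rej_bounds option is ignored, and the max(1, ·) guard in the denominator and the Inf return value are assumptions added to keep the example well defined.

```julia
# Hypothetical sketch of the threshold search; not the package's implementation.
function mk_threshold_sketch(τ::AbstractVector, κ::AbstractVector{Int}, m::Int, q::Number)
    for t in sort(filter(>(0), τ))                 # candidate thresholds, ascending
        numer = 1 / m + count(j -> κ[j] ≥ 1 && τ[j] ≥ t, eachindex(τ)) / m
        denom = count(j -> κ[j] == 0 && τ[j] ≥ t, eachindex(τ))
        # max(1, denom) avoids dividing by zero (an assumption of this sketch)
        numer / max(1, denom) ≤ q && return t
    end
    return Inf                                     # no threshold meets the target FDR
end

τ = [5.0, 4.0, 3.0, 2.0, 1.0]
κ = [0, 0, 0, 0, 1]
t_hat = mk_threshold_sketch(τ, κ, 1, 0.3)
```

With these inputs, t = 1 gives a ratio of 2/4 and is rejected, while t = 2 gives 1/4 ≤ 0.3 and is returned.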

Reference:

Equations 8 and 9 in supplement of "Identification of putative causal loci in whole-genome sequencing data via knockoff statistics" by He et al.
Algorithm 1 of "Improving the Stability of the Knockoff Procedure: Multiple Simultaneous Knockoffs and Entropy Maximization" by Gimenez and Zou.

source

Knockoffs.modelX_gaussian_group_knockoffs — Method

modelX_gaussian_group_knockoffs(X, method, groups, μ, Σ; [m], [covariance_approximator], [kwargs])
modelX_gaussian_group_knockoffs(X, method, groups; [m], [covariance_approximator], [kwargs])

Constructs Gaussian model-X group knockoffs. If the covariance Σ and mean μ are not specified, they will be estimated from data, i.e. we will make second-order group knockoffs. To incorporate group structure, the (true or estimated) covariance matrix is block-diagonalized according to groups membership to solve a relaxed optimization problem. See reference paper and Knockoffs.jl docs for more details.

Inputs

X: A n × p design matrix. Each row is a sample, each column is a feature.
method: Method for constructing knockoffs. Options include
- :maxent: (recommended) for fully general maximum entropy group knockoffs
- :mvr: for fully general minimum variance-based reconstructability (MVR) group knockoffs
- :equi: for equi-correlated knockoffs. This is the methodology proposed in Dai R, Barber R. The knockoff filter for FDR control in group-sparse and multitask regression. International conference on machine learning 2016 Jun 11 (pp. 1851-1859). PMLR.
- :sdp: Fully general SDP group knockoffs based on coordinate descent
- :sdp_block: Fully general SDP group knockoffs where each block is solved exactly using an interior point solver.
- :sdp_subopt: Chooses each block S_{i} = γ_i * Σ_{ii}. This slightly generalizes the equi-correlated group knockoff idea proposed in Dai and Barber 2016.
groups: Vector of group membership
μ: A length p vector storing the true column means of X
Σ: A p × p covariance matrix for columns of X
m: Number of knockoffs per variable, defaults to 1.
covariance_approximator: A covariance estimator, defaults to LinearShrinkage(DiagonalUnequalVariance(), :lw). See CovarianceEstimation.jl for more options.
kwargs: Extra keyword arguments for solve_s_group

How to define groups

The exported function hc_partition_groups can be used to build a group membership vector.

A note on compute time

The computational complexity of group knockoffs scales quadratically with group size. Thus, very large groups (e.g. >100 members per group) dramatically slow down parameter estimation. In such cases, consider the routine modelX_gaussian_rep_group_knockoffs, which constructs group knockoffs by choosing top representatives from each group.

Reference

Dai & Barber 2016, The knockoff filter for FDR control in group-sparse and multitask regression

source

Knockoffs.modelX_gaussian_knockoffs — Method

modelX_gaussian_knockoffs(X::Matrix, method::Symbol; [m], [covariance_approximator], [kwargs...])
modelX_gaussian_knockoffs(X::Matrix, method::Symbol, μ::Vector, Σ::Matrix; [m], [kwargs...])

Creates model-free multivariate normal knockoffs by sequentially sampling from conditional multivariate normal distributions. The true mean μ and covariance Σ are estimated from data if not supplied.

Inputs

X: An n × p numeric matrix; each row is a sample and each column is a covariate.
method: Can be one of the following
- :mvr for minimum variance-based reconstructability knockoffs (alg 1 in ref 2)
- :maxent for maximum entropy knockoffs (alg 2 in ref 2)
- :equi for equi-distant knockoffs (eq 2.3 in ref 1)
- :sdp for SDP knockoffs (eq 2.4 in ref 1)
- :sdp_ccd for SDP knockoffs via coordinate descent (alg 2.2 in ref 3)
μ: A p × 1 vector of column mean of X, defaults to column mean
Σ: A p × p matrix of covariance of X, defaults to a shrinkage estimator specified by covariance_approximator.
m: Number of knockoff copies per variable to generate, defaults to 1.
covariance_approximator: A covariance estimator, defaults to LinearShrinkage(DiagonalUnequalVariance(), :lw) which tends to give good empirical performance when p>n. See CovarianceEstimation.jl for more options.
kwargs...: Possible optional inputs to solvers specified in method, see solve_MVR, solve_max_entropy, and solve_sdp_ccd

Reference:

"Panning for Gold: Model-X Knockoffs for High-dimensional Controlled Variable Selection" by Candes, Fan, Janson, and Lv (2018)
"Powerful knockoffs via minimizing reconstructability" by Asher Spector and Lucas Janson (2020)
"FANOK: Knockoffs in Linear Time" by Askari et al. (2020).

Covariance Approximation:

The covariance is approximated by a linear shrinkage estimator using Ledoit-Wolf with DiagonalUnequalVariance target, which seems to perform well for p>n cases. We do not simply use cov(X) since isposdef(cov(X)) is typically false. For comparison of various estimators, see: https://mateuszbaran.github.io/CovarianceEstimation.jl/dev/man/msecomp/#msecomp

source

Knockoffs.modelX_gaussian_rep_group_knockoffs — Method

modelX_gaussian_rep_group_knockoffs(X, method, groups; [m], [covariance_approximator], [kwargs...])
modelX_gaussian_rep_group_knockoffs(X, method, groups, μ, Σ; [m], [kwargs...])

Constructs group knockoffs by choosing representatives from each group and solving a smaller optimization problem based on the representatives only. Remaining knockoffs are generated based on a conditional independence assumption similar to a graphical model (details to be given later). The representatives are computed by choose_group_reps.

Inputs

X: A n × p design matrix. Each row is a sample, each column is a feature.
method: Method for constructing knockoffs. Options are the same as modelX_gaussian_group_knockoffs
groups: Vector of Int denoting group membership. groups[i] is the group of X[:, i]
covariance_approximator: A covariance estimator, defaults to LinearShrinkage(DiagonalUnequalVariance(), :lw). See CovarianceEstimation.jl for more options.
μ: A length p vector storing the true column means of X
Σ: A p × p covariance matrix for columns of X
rep_threshold: Value between 0 and 1 that controls the number of representatives per group. Larger means more representatives (default 0.5)
m: Number of knockoffs per variable, defaults to 1.
kwargs: Extra keyword arguments for solve_s_group

source

Knockoffs.normalize_col! — Method

normalize_col!(X::AbstractVecOrMat, [center=false])

Normalize each column of X so they sum to 1.

source

Knockoffs.prioritize_variants — Method

prioritize_variants!(index::AbstractVector, priority_vars::AbstractVector)

Given an (unsorted) index, moves the variables in priority_vars to the front of index. Otherwise the original order of index is preserved, including among the entries not in priority_vars.

Example

index = [11, 4, 5, 9, 7]
priority_vars = [4, 9]
result = prioritize_variants(index, priority_vars)
result == [4, 9, 11, 5, 7]
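
The reordering in this example can be reproduced with a few lines of plain Julia. This is a minimal sketch, not the package's implementation: `prioritize_sketch` is a hypothetical name, and it assumes prioritized entries keep the order in which they appear in `index`.

```julia
# Hypothetical helper illustrating the documented reordering (not the package code).
function prioritize_sketch(index::AbstractVector, priority_vars::AbstractVector)
    # entries of `index` that are prioritized, in their original order
    first_part = [i for i in index if i in priority_vars]
    # all remaining entries, also in their original order
    rest = [i for i in index if !(i in priority_vars)]
    return vcat(first_part, rest)
end

index = [11, 4, 5, 9, 7]
priority_vars = [4, 9]
result = prioritize_sketch(index, priority_vars)
```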

source

Knockoffs.rapid — Method

rapid(rapid_exe, vcffile, mapfile, d, outfolder, w, r, s, [a])

Wrapper for the RaPID program.

Inputs

rapid_exe: Full path to the RaPID_v.1.7 executable file
vcffile: Phased VCF file name
mapfile: Map file name
d: Actual Minimum IBD length in cM
outfolder: Output folder name
w: Number of SNPs in a window for sub-sampling
r: Number of runs
s: Minimum number of successes to consider a hit

Optional Inputs

a: If true, ignore MAFs. By default (a=false) the sites are selected at random weighted by their MAFs.

source

Knockoffs.sample_mvn_efficient — Method

sample_mvn_efficient(C::AbstractMatrix{T}, D::AbstractMatrix{T}, m::Int)

Efficiently samples from N(0, A) where

\[\begin{aligned} A &= \begin{pmatrix} C & C-D & \cdots & C-D\\ C-D & C & \cdots & C-D\\ \vdots & & \ddots & \vdots\\ C-D & C-D & & C \end{pmatrix} \end{aligned}\]

Note there are m blocks per row/col
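
For intuition, the block structure of A can be written with Kronecker products, and a naive sampler can serve as a reference when checking an efficient one. The sketch below is an assumption-laden illustration: `build_A` and `sample_mvn_naive` are hypothetical names, and the dense Cholesky used here is exactly the cost the efficient routine is meant to avoid.

```julia
using LinearAlgebra, Random

# Materialize the (m*p) × (m*p) matrix A: diagonal blocks C, off-diagonal blocks C - D.
function build_A(C::AbstractMatrix, D::AbstractMatrix, m::Int)
    # every block starts as C - D; adding D on the diagonal blocks gives C there
    return kron(ones(m, m), C - D) + kron(Matrix(I, m, m), D)
end

# Naive reference sampler from N(0, A) via a dense Cholesky factor.
function sample_mvn_naive(rng::AbstractRNG, C::AbstractMatrix, D::AbstractMatrix, m::Int)
    A = build_A(C, D, m)
    L = cholesky(Symmetric(A)).L      # requires A to be positive definite
    return L * randn(rng, size(A, 1))
end

C = [2.0 0.5; 0.5 2.0]
D = Matrix(1.0I, 2, 2)
A = build_A(C, D, 3)
x = sample_mvn_naive(MersenneTwister(1), C, D, 3)
```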

source

Knockoffs.search_rank — Function

search_rank(A::AbstractMatrix, sk::Vector{Int}, target=0.25, verbose=false)

Finds the rank (number of columns of A) that best approximates the remaining columns such that regressing each remaining variable on those selected has RSS less than some target.

Σ: Original (p × p) correlation matrix
A: The (upper triangular) cholesky factor of Σ
sk: The (unsorted) columns of A, earlier ones are more important
target: Target residual level

note: we cannot do binary search because large ranks can increase residuals

source

Knockoffs.shift_until_PSD! — Function

shift_until_PSD!(Σ::AbstractMatrix)

Keeps adding λI to Σ until the minimum eigenvalue > tol
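
A minimal sketch of this idea in plain Julia follows. The name `shift_until_psd_sketch!` and the values of `λ` and `tol` are assumptions chosen for illustration; the package's defaults are not stated here.

```julia
using LinearAlgebra

# Hypothetical sketch: add λ to the diagonal until the smallest eigenvalue exceeds tol.
function shift_until_psd_sketch!(Σ::AbstractMatrix; λ=1e-3, tol=1e-4)
    while eigmin(Symmetric(Σ)) <= tol
        for i in axes(Σ, 1)
            Σ[i, i] += λ      # Σ ← Σ + λI, done in place
        end
    end
    return Σ
end

Σ = [1.0 1.0; 1.0 1.0]        # singular: smallest eigenvalue is 0
shift_until_psd_sketch!(Σ)
```

Each pass raises every eigenvalue by λ, so the loop terminates for any symmetric input.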

source

Knockoffs.simulate_AR1 — Method

simulate_AR1(p::Int, a=1, b=1, tol=1e-3, max_corr=1, rho=nothing)

Generates p-dimensional correlation matrix for AR(1) Gaussian process, where successive correlations are drawn from Beta(a,b) independently. If rho is specified, then the process is stationary with correlation rho.

Source

https://github.com/amspector100/knockpy/blob/20eddb3eb60e0e82b206ec989cb936e3c3ee7939/knockpy/dgp.py#L61
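
In the stationary case (rho specified), the AR(1) correlation matrix has entries rho^|i-j|. The sketch below covers only that case; `ar1_corr` is a hypothetical name, and the Beta-distributed case and the `tol` and `max_corr` options are not modeled.

```julia
# Hypothetical sketch of the stationary AR(1) correlation matrix: Σ[i, j] = rho^|i - j|.
ar1_corr(p::Int, rho::Real) = [rho^abs(i - j) for i in 1:p, j in 1:p]

Σ = ar1_corr(4, 0.5)
```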

source

Knockoffs.simulate_ER — Method

simulate_ER(p::Int; [invert])

Simulates a covariance matrix from a clustered Erdos-Renyi graph, which is a block diagonal matrix where each block is an Erdos-Renyi graph. The result is scaled back to a correlation matrix.

For details, see the 4th simulation routine in section 5.1 of Li and Maathuis https://academic.oup.com/jrsssb/article/83/3/534/7056103?login=false

Inputs

p: Dimension of covariance matrix
ϕ: Probability of forming an edge between any 2 nodes
lb: lower bound for the value of an edge (drawn from uniform distribution)
ub: upper bound for the value of an edge (drawn from uniform distribution)
invert: Whether to invert the covariance matrix (to obtain the precision)
λmin: minimum eigenvalue of the resulting covariance matrix
blocksize: Number of variables within each ER graph.

source

Knockoffs.simulate_block_covariance — Method

simulate_block_covariance(groups, ρ, γ, num_v, w)

Simulates a block covariance matrix similar to the one in Dai & Barber 2016, The knockoff filter for FDR control in group-sparse and multitask regression. That is, all diagonal elements will be 1, correlation within groups will be ρ, and correlation between groups will be ρ*γ.

Inputs

groups: Vector of group membership
ρ: within group correlation
γ: between-group correlation factor; the correlation between groups is ρ*γ

Optional arguments

num_v: Number of rank-1 updates added to Σ, giving Σ + v1*v1' + ... + vn*vn' with n = num_v, where each v is iid N(0, w) (default 0)
w: variance of the rank 1 update used in num_v (default 1)
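
The base structure (before any rank-1 updates) can be sketched directly from the description above. This is an illustrative sketch only: `block_cov_sketch` is a hypothetical name and the `num_v` and `w` options are omitted.

```julia
# Hypothetical sketch: 1 on the diagonal, ρ within groups, ρ*γ between groups.
function block_cov_sketch(groups::AbstractVector{<:Integer}, ρ::Real, γ::Real)
    p = length(groups)
    Σ = Matrix{Float64}(undef, p, p)
    for j in 1:p, i in 1:p
        Σ[i, j] = i == j ? 1.0 : (groups[i] == groups[j] ? ρ : ρ * γ)
    end
    return Σ
end

groups = [1, 1, 2, 2, 2]
Σ = block_cov_sketch(groups, 0.5, 0.2)
```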

source

Knockoffs.single_linkage_distance — Method

single_linkage_distance(distmat, left, right)

Computes the minimum distance (i.e. single-linkage distance) between members in left and members in right. Member distances are precomputed in distmat
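
As a minimal sketch, single-linkage distance is the smallest entry of distmat over all cross pairs. The name `single_linkage_sketch` is hypothetical, and `left` and `right` are assumed to be collections of row and column indices into distmat.

```julia
# Hypothetical sketch: smallest precomputed distance between any member of
# `left` and any member of `right`.
function single_linkage_sketch(distmat::AbstractMatrix, left, right)
    return minimum(distmat[i, j] for i in left, j in right)
end

distmat = [0.0 0.3 0.9 0.8;
           0.3 0.0 0.4 0.7;
           0.9 0.4 0.0 0.2;
           0.8 0.7 0.2 0.0]
d = single_linkage_sketch(distmat, [1, 2], [3, 4])
```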

source

Knockoffs.single_state_dmc_knockoff! — Method

Samples Zj, the jth state of the hidden Markov chain.

source

Knockoffs.solve_MVR — Method

solve_MVR(Σ::AbstractMatrix)

Solves the minimum variance-based reconstructability problem for fixed-X and model-X knockoffs given correlation matrix Σ. Users should call solve_s instead of this function.

See algorithm 1 of "Powerful knockoffs via minimizing reconstructability" by Asher Spector and Lucas Janson (2020) https://arxiv.org/pdf/2011.14625.pdf

source

Knockoffs.solve_SDP — Method

solve_SDP(Σ::AbstractMatrix)

Solves the SDP problem for fixed-X and model-X knockoffs given correlation matrix Σ. Users should call solve_s instead of this function.

The optimization problem is stated in equation 3.13 of https://arxiv.org/pdf/1610.02351.pdf

Arguments

Σ: A correlation matrix (diagonals all equal to 1)
m: Number of knockoffs to generate, defaults to 1
optm: SDP solver. Defaults to Hypatia.Optimizer(verbose=false). This can be any solver that supports the JuMP interface. For example, use SDPT3.Optimizer from the SDPT3.jl package (which requires MATLAB) for the best performance.

source

Knockoffs.solve_equi — Method

solve_equi(Σ::AbstractMatrix)

Solves the equicorrelated problem for fixed-X and model-X knockoffs given correlation matrix Σ. Users should call solve_s instead of this function.
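
For a single knockoff (m = 1), the standard equicorrelated choice from Barber and Candes (2015) sets every s_j = min(1, 2λ_min(Σ)). The sketch below assumes that formula and m = 1 only; `equi_s_sketch` is a hypothetical name, and how the package handles m > 1 is not shown.

```julia
using LinearAlgebra

# Hypothetical sketch of the equicorrelated solution for m = 1:
# s_j = min(1, 2 * λ_min(Σ)) for every j.
function equi_s_sketch(Σ::AbstractMatrix)
    λmin = eigmin(Symmetric(Σ))
    return fill(min(1.0, 2λmin), size(Σ, 1))
end

Σ = [1.0 0.8; 0.8 1.0]        # eigenvalues are 0.2 and 1.8
s = equi_s_sketch(Σ)
```

With this choice 2Σ - diag(s) is positive semidefinite, which is the feasibility condition for m = 1.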

source

Knockoffs.solve_group_SDP_single_block — Method

solve_group_SDP_single_block(Σ11, ub)

Solves a single block of the fully general group SDP problem. The objective is min sum_{i,j} |Σ[i,j] - S[i,j]| subject to 0 ⪯ S ⪯ A11 - [A12 A13] inv([A22-S2 A32; A23 A33-S3]) [A21; A31].

Inputs

Σ11: The block corresponding to the current group. Must be a correlation matrix.
ub: The matrix defined as A11 - [A12 A13] inv([A22-S2 A32; A23 A33-S3]) [A21; A31]
optm: Any solver compatible with JuMP.jl

source

Knockoffs.solve_group_SDP_subopt — Method

Solves the SDP group knockoff problem by analogy to equi-correlated group knockoffs. The idea is to optimize a vector γ where γ[j] multiplies Σ_jj. In the equi-correlated setting, all γ[j] are forced to be equal.

Details can be found in Dai & Barber 2016, The knockoff filter for FDR control in group-sparse and multitask regression

source

Knockoffs.solve_group_block_update — Method

Todo

somehow avoid reallocating ub every iteration
When solving each individual block,
- warmstart
- avoid reallocating S1_new
- allocate vector of models
- use loose convergence criteria
For singleton groups, don't use JuMP and directly update
Currently all objective values are computed based on SDP case. Need to display objective values for ME/MVR objective

source

Knockoffs.solve_group_equi — Method

Solves the equi-correlated group knockoff problem. Here Σ is the true covariance matrix (scaled so that it has 1 on its diagonal) and Σblocks is the block-diagonal covariance matrix where each block corresponds to groups.

Details can be found in Dai & Barber 2016, The knockoff filter for FDR control in group-sparse and multitask regression

source

Knockoffs.solve_group_max_entropy_hybrid — Method

solve_group_max_entropy_hybrid(Σ, groups, [outer_iter=100], [inner_pca_iter=1],
    [inner_ccd_iter=1], [tol=0.0001], [ϵ=1e-6], [m=1], [robust=false], [verbose=false])

Solves the group-knockoff optimization problem based on Maximum Entropy objective. Users should call solve_s_group instead of this function.

Inputs

Σ: Correlation matrix
groups: Group membership vector

Optional inputs

outer_iter: Maximum number of outer iterations. Each outer iteration performs inner_pca_iter PCA updates and inner_ccd_iter full optimization updates (default = 100).
inner_pca_iter: Number of full PCA updates before switching to fully general coordinate descent updates (default = 1)
inner_ccd_iter: Number of fully general coordinate descent updates before switching to PCA updates (default = 1)
tol: Convergence tolerance. The algorithm converges when abs((obj_new-obj_old)/obj_old) < tol OR when changes in the S matrix fall below 1e-4
ϵ: Tolerance added to the lower and upper bounds to prevent numerical issues (default = 1e-6)
m: Number of knockoffs per variable (default 1)
robust: Whether to use "robust" Cholesky updates. If robust=true, the algorithm will be ~10x slower; only use this if robust=false causes Cholesky updates to fail (default false)
verbose: Whether to print intermediate results (default false)

source

Knockoffs.solve_group_mvr_hybrid — Method

solve_group_mvr_hybrid(Σ, groups, [outer_iter=100], [inner_pca_iter=1],
    [inner_ccd_iter=1], [tol=0.0001], [ϵ=1e-6], [m=1], [robust=false], [verbose=false])

Solves the group-knockoff optimization problem based on MVR objective. Users should call solve_s_group instead of this function.

Inputs

Σ: Correlation matrix
groups: Group membership vector

Optional inputs

outer_iter: Maximum number of outer iterations. Each outer iteration performs inner_pca_iter PCA updates and inner_ccd_iter full optimization updates (default = 100).
inner_pca_iter: Number of full PCA updates before switching to fully general coordinate descent updates (default = 1)
inner_ccd_iter: Number of fully general coordinate descent updates before switching to PCA updates (default = 1)
tol: Convergence tolerance. The algorithm converges when abs((obj_new-obj_old)/obj_old) < tol OR when changes in the S matrix fall below 1e-4
ϵ: Tolerance added to the lower and upper bounds to prevent numerical issues (default = 1e-6)
m: Number of knockoffs per variable (default 1)
robust: Whether to use "robust" Cholesky updates. If robust=true, the algorithm will be ~10x slower; only use this if robust=false causes Cholesky updates to fail (default false)
verbose: Whether to print intermediate results (default false)

source

Knockoffs.solve_group_sdp_hybrid — Method

solve_group_sdp_hybrid(Σ, groups, [outer_iter=100], [inner_pca_iter=1],
    [inner_ccd_iter=1], [tol=0.0001], [ϵ=1e-6], [m=1], [robust=false], [verbose=false])

Solves the group-knockoff optimization problem based on SDP objective. Users should call solve_s_group instead of this function.

Inputs

Σ: Correlation matrix
groups: Group membership vector

Optional inputs

outer_iter: Maximum number of outer iterations. Each outer iteration performs inner_pca_iter PCA updates and inner_ccd_iter full optimization updates (default = 100).
inner_pca_iter: Number of full PCA updates before switching to fully general coordinate descent updates (default = 1)
inner_ccd_iter: Number of fully general coordinate descent updates before switching to PCA updates (default = 1)
tol: Convergence tolerance. The algorithm converges when abs((obj_new-obj_old)/obj_old) < tol OR when changes in the S matrix fall below 1e-4
ϵ: Tolerance added to the lower and upper bounds to prevent numerical issues (default = 1e-6)
m: Number of knockoffs per variable (default 1)
robust: Whether to use "robust" Cholesky updates. If robust=true, the algorithm will be ~10x slower; only use this if robust=false causes Cholesky updates to fail (default false)
verbose: Whether to print intermediate results (default false)

source

Knockoffs.solve_max_entropy — Method

solve_max_entropy(Σ::AbstractMatrix)

Solves the maximum entropy knockoff problem for fixed-X and model-X knockoffs given correlation matrix Σ. Users should call solve_s instead of this function.

Reference

Algorithm 2.2 from Powerful Knockoffs via Minimizing Reconstructability: https://arxiv.org/pdf/2011.14625.pdf

Note

There is a typo in the algorithm for computing ME knockoffs in "Powerful knockoffs via minimizing reconstructability" by Asher Spector and Lucas Janson (2020). In equation 59 of the supplement, they need to evaluate c_m = D^t_{-j,j}D^{-1}_{-j,-j}D_{-j,j}. They claim that the FANOK paper ("FANOK: Knockoffs in Linear Time" by Askari et al. (2020)) implies that c_m = ||v_m||^2 where Lv_m = u. However, according to section A.1.2 of the FANOK paper, the actual update appears to be D^t_{-j,j}D^{-1}_{-j,-j}D_{-j,j} = ζ*||c_m||^2 / (ζ + ||c_m||^2) where ζ = 2Σ_{jj} - s_j.

source

Knockoffs.solve_s — Method

solve_s(Σ::Symmetric, method::Symbol; m=1, kwargs...)

Solves for the vector s used to generate knockoffs. Σ can be a general covariance matrix, but it must be wrapped in Symmetric.

Inputs

Σ: A covariance matrix (one must wrap Symmetric(Σ) explicitly)
method: Can be one of the following
- :mvr for minimum variance-based reconstructability knockoffs (alg 1 in ref 2)
- :maxent for maximum entropy knockoffs (alg 2 in ref 2)
- :equi for equi-correlated knockoffs (eq 2.3 in ref 1)
- :sdp for SDP knockoffs (eq 2.4 in ref 1)
- :sdp_ccd for fast SDP knockoffs via coordinate descent (alg 2.2 in ref 3)
m: Number of knockoffs per variable, defaults to 1.
kwargs: Extra arguments available for specific methods. For example, to use less stringent convergence tolerance for MVR knockoffs, specify tol = 0.001. For a list of available options, see solve_MVR, solve_max_entropy, solve_sdp_ccd, solve_SDP, or solve_equi

Reference

"Controlling the false discovery rate via Knockoffs" by Barber and Candes (2015).
"Powerful knockoffs via minimizing reconstructability" by Asher Spector and Lucas Janson (2020)
"FANOK: Knockoffs in Linear Time" by Askari et al. (2020).

source

Knockoffs.solve_s_graphical_group — Method

solve_s_graphical_group(Σ::Symmetric, groups::Vector{Int}, group_reps::Vector{Int},
method; [m], [verbose])

Solves the group knockoff problem, but the convex optimization problem runs only on the representatives. The non-representative variables are assumed to be independent across groups when conditioning on the representatives.

Inputs

Σ: Symmetric p × p covariance matrix
groups: p dimensional vector of group membership
group_reps: Indices for the representatives.
method: Method for solving group knockoff problem
m: Number of knockoffs to generate per feature
verbose: Whether to print informative intermediate results
kwargs...: extra arguments for solve_s_group

Outputs

S: Matrix obtained from solving the optimization problem on the representatives.
D: A p × p (dense) matrix corresponding to the S matrix for both the representative and non-representative variables. Knockoff sampling should use this matrix. If the graphical conditional independence assumption is satisfied exactly, this matrix should be sparse, but it is almost never sparse unless we use cond_indep_corr to force the covariance matrix to satisfy it.
obj: Objective value for solving the optimization problem on the representatives.

source

Knockoffs.solve_s_group — Method

solve_s_group(Σ, groups, method; [m=1], kwargs...)

Solves the group knockoff problem, returns block diagonal matrix S satisfying (m+1)/m*Σ - S ⪰ 0 where m is number of knockoffs per feature.

Inputs

Σ: A general covariance matrix wrapped in Symmetric
groups: Vector of group membership, does not need to be contiguous
method: Method for constructing knockoffs. Options include
- :maxent: (recommended) for fully general maximum entropy group knockoffs
- :mvr: for fully general minimum variance-based reconstructability (MVR) group knockoffs
- :equi: for equi-correlated knockoffs. This is the methodology proposed in Dai R, Barber R. The knockoff filter for FDR control in group-sparse and multitask regression. International conference on machine learning 2016 Jun 11 (pp. 1851-1859). PMLR.
- :sdp: Fully general SDP group knockoffs based on coordinate descent
- :sdp_subopt: Chooses each block S_{i} = γ_i * Σ_{ii}. This slightly generalizes the equi-correlated group knockoff idea proposed in Dai and Barber 2016.
- :sdp_block: Fully general SDP group knockoffs where each block is solved exactly using an interior point solver.
m: Number of knockoffs per variable, defaults to 1.
kwargs: Extra arguments available for specific methods. For example, to use less stringent convergence tolerance, specify tol = 0.001. For a list of available options, see solve_group_mvr_hybrid, solve_group_max_entropy_hybrid, solve_group_sdp_hybrid, or solve_group_equi

Output

S: A matrix solved so that (m+1)/m*Σ - S ⪰ 0 and S ⪰ 0
γ: A vector that is only non-empty for equi and suboptimal knockoff constructions. They correspond to values of γ where S_{gg} = γΣ_{gg}. So for equi, the vector is length 1. For SDP, the vector has length equal to number of groups
obj: Final SDP/MVR/ME objective value given S. Equi-correlated group knockoffs and singleton (non-grouped) knockoffs return 0 because either they have no objective value or it is not necessary to evaluate the objective
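
The two constraints on S stated above can be checked numerically on any candidate. The following is a minimal sketch under stated assumptions: `is_valid_S` is a hypothetical helper (not part of the package API shown here), and `tol` is an illustrative slack for floating-point error.

```julia
using LinearAlgebra

# Hypothetical check of the constraints S ⪰ 0 and (m+1)/m * Σ - S ⪰ 0.
function is_valid_S(Σ::AbstractMatrix, S::AbstractMatrix; m::Int=1, tol=1e-8)
    lower_ok = eigmin(Symmetric(Matrix(S))) >= -tol
    upper_ok = eigmin(Symmetric((m + 1) / m * Σ - S)) >= -tol
    return lower_ok && upper_ok
end

Σ = [1.0 0.5; 0.5 1.0]
ok  = is_valid_S(Σ, 0.5 * Σ)     # 2Σ - 0.5Σ = 1.5Σ is positive definite
bad = is_valid_S(Σ, 3.0 * Σ)     # 2Σ - 3Σ = -Σ is negative definite
```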

Warning

This function potentially permutes the columns/rows of Σ, and puts them back at the end. Thus one should NOT call solve_s_group on the same Σ simultaneously, e.g. in a multithreaded for loop. Permutation does not happen when groups are contiguous.

source

Knockoffs.solve_sdp_ccd — Method

solve_sdp_ccd(Σ::AbstractMatrix)

Solves the SDP problem for fixed-X and model-X knockoffs using coordinate descent, given correlation matrix Σ. Users should call solve_s instead of this function.

Reference

Algorithm 2.2 from "FANOK: Knockoffs in Linear Time" by Askari et al. (2020).

source

Knockoffs.threshold — Method

threshold(w::AbstractVector, q::Number, [method=:knockoff], [m::Int=1])

Chooses a threshold τ > 0 as one of the following:

τ = min{ t > 0 : #{j : w[j] ≤ -t} / #{j : w[j] ≥ t} ≤ q } (method=:knockoff)
τ = min{ t > 0 : (1 + #{j : w[j] ≤ -t}) / #{j : w[j] ≥ t} ≤ q } (method=:knockoff_plus)

Inputs

w: Vector of feature importance statistics
q: target FDR (between 0 and 1)
method: either :knockoff or :knockoff_plus (default)
rej_bounds: Number of values of top W to consider (default = 10000)

Reference:

Equation 3.10 (method=:knockoff) or 3.11 (method=:knockoff_plus) of "Panning for Gold: Model-X Knockoffs for High-dimensional Controlled Variable Selection" by Candes, Fan, Janson, and Lv (2018)
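
The threshold rule for a single knockoff (m = 1) can be sketched by scanning the candidate values |w[j]| in increasing order. This is an illustrative sketch, not the package's code: `knockoff_threshold_sketch` and its `plus` keyword are hypothetical, the denominator is guarded with max(·, 1) as an assumption to avoid division by zero, and Inf is returned when no candidate qualifies.

```julia
# Hypothetical sketch of the knockoff (plus=false) and knockoff+ (plus=true) thresholds.
function knockoff_threshold_sketch(w::AbstractVector, q::Real; plus::Bool=true)
    offset = plus ? 1 : 0
    # candidate thresholds are the distinct nonzero magnitudes of w, smallest first
    for t in sort(unique(abs.(filter(!iszero, w))))
        neg = count(<=(-t), w)        # #{j : w[j] ≤ -t}
        pos = count(>=(t), w)         # #{j : w[j] ≥ t}
        if (offset + neg) / max(pos, 1) <= q
            return t
        end
    end
    return Inf                        # no threshold meets the target FDR
end

w = [5.0, 4.0, 3.0, -1.0, 2.0, 6.0, -0.5, 7.0, 8.0, 9.0]
τ = knockoff_threshold_sketch(w, 0.2)
selected = findall(>=(τ), w)          # features with w[j] ≥ τ are selected
```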

source

Knockoffs.update_normalizing_constants! — Method

update_normalizing_constants!(Q::AbstractMatrix{T}, q::AbstractVector{T})

Computes normalizing constants recursively using equation (5).

Inputs

Q: K × K × p array. Q[:, :, j] is a K × K matrix of transition probabilities for the jth state, i.e. Q[l, k, j] = P(X_j = k | X_{j-1} = l). The first transition matrix is not used.
q: K × 1 vector of initial probabilities

todo: efficiency

source

Knockoffs.Knockoff — Type

A Knockoff holds the original design matrix X, along with its knockoff Xko.

source

Knockoffs.KnockoffFilter — Type

A KnockoffFilter is essentially a Knockoff that has gone through a feature selection procedure, such as the Lasso. It stores, among other things, the final estimated parameters beta after applying the knockoff-filter procedure.

The debiased variable is a boolean indicating whether the estimated effect sizes have been debiased with Lasso. The W vector stores the feature importance statistics, which satisfy the coin-flip property. tau is the knockoff threshold, which controls the empirical FDR at level q.

source

Knockoffs.MarkovChainTable — Type

Genotype states are index pairs (ka, kb), where ka and kb are the unordered haplotypes 1 and 2. If there are K=5 haplotype motifs, then the 15 possible genotype states and their indices are

(1, 1) = 1
(1, 2) = 2   (2, 2) = 6
(1, 3) = 3   (2, 3) = 7   (3, 3) = 10
(1, 4) = 4   (2, 4) = 8   (3, 4) = 11   (4, 4) = 13
(1, 5) = 5   (2, 5) = 9   (3, 5) = 12   (4, 5) = 14   (5, 5) = 15
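
The indexing listed above follows a closed form, which can be sketched as below. `genotype_index` is a hypothetical helper written to match the listed values; the package may store a lookup table instead.

```julia
# Hypothetical sketch: linear index of the unordered pair (ka, kb) among K motifs,
# matching the enumeration (1,1)=1, (1,2)=2, ..., (1,K)=K, (2,2)=K+1, ...
function genotype_index(ka::Int, kb::Int, K::Int)
    a, b = minmax(ka, kb)             # pairs are unordered, so sort them
    # pairs whose smaller motif is below a: (a-1)*K - (a-1)*(a-2)/2
    return (a - 1) * K - ((a - 1) * (a - 2)) ÷ 2 + (b - a + 1)
end

all_indices = [genotype_index(a, b, 5) for a in 1:5 for b in a:5]
```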

source