API
Here is a list of the available functions. Detailed descriptions can be found below.
Knockoffs.Knockoff, Knockoffs.KnockoffFilter, Knockoffs.MarkovChainTable, Knockoffs.MK_statistics, Knockoffs.adj_constrained_hclust, Knockoffs.approx_modelX_gaussian_knockoffs, Knockoffs.block_diagonalize, Knockoffs.check_model_solution, Knockoffs.choose_group_reps, Knockoffs.cond_indep_corr, Knockoffs.condition, Knockoffs.fit_lasso, Knockoffs.fit_marginal, Knockoffs.fixed_knockoffs, Knockoffs.form_emission_prob_matrix, Knockoffs.forward_backward!, Knockoffs.forward_backward_sampling!, Knockoffs.full_knockoffscreen, Knockoffs.get_genotype_emission_probabilities, Knockoffs.get_genotype_transition_matrix, Knockoffs.get_haplotype_emission_probabilities, Knockoffs.get_haplotype_transition_matrix, Knockoffs.ghost_knockoffs, Knockoffs.group_block_objective, Knockoffs.hc_partition_groups, Knockoffs.hmm_knockoff, Knockoffs.initialize_S, Knockoffs.inverse_mat_sqrt, Knockoffs.ipad, Knockoffs.likelihood_ratio, Knockoffs.lowrankdowndate_turbo!, Knockoffs.lowrankupdate_turbo!, Knockoffs.markov_knockoffs, Knockoffs.mk_threshold, Knockoffs.modelX_gaussian_group_knockoffs, Knockoffs.modelX_gaussian_knockoffs, Knockoffs.modelX_gaussian_rep_group_knockoffs, Knockoffs.normalize_col!, Knockoffs.prioritize_variants, Knockoffs.rapid, Knockoffs.sample_mvn_efficient, Knockoffs.search_rank, Knockoffs.shift_until_PSD!, Knockoffs.simulate_AR1, Knockoffs.simulate_ER, Knockoffs.simulate_block_covariance, Knockoffs.single_linkage_distance, Knockoffs.single_state_dmc_knockoff!, Knockoffs.solve_MVR, Knockoffs.solve_SDP, Knockoffs.solve_equi, Knockoffs.solve_group_SDP_single_block, Knockoffs.solve_group_SDP_subopt, Knockoffs.solve_group_block_update, Knockoffs.solve_group_equi, Knockoffs.solve_group_max_entropy_hybrid, Knockoffs.solve_group_mvr_hybrid, Knockoffs.solve_group_sdp_hybrid, Knockoffs.solve_max_entropy, Knockoffs.solve_s, Knockoffs.solve_s_graphical_group, Knockoffs.solve_s_group, Knockoffs.solve_sdp_ccd, Knockoffs.threshold, Knockoffs.update_normalizing_constants!
Knockoffs.MK_statistics — Method

MK_statistics(T0::Vector, Tk::Vector{Vector}; filter_method)

Computes the multiple knockoff statistics κ, τ, and W.
Inputs
- T0: p-vector of importance scores for the original variables
- Tk: Vector storing T1, ..., Tm, where Ti contains the importance scores for the ith knockoff copy
- filter_method: Either Statistics.median (default) or max (the function used in Gimenez and Zou, 2019)
Output
- κ: Index of the most significant feature (κ[i] = 0 if the original feature is most important, otherwise κ[i] = k if the kth knockoff is most important)
- τ: τ[i] stores the most significant statistic among the original and knockoff variables minus filter_method() applied to the remaining statistics
- W: Coefficient difference statistic W[i] = abs(T0[i]) - abs(Tk[i])
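Example

A minimal usage sketch with simulated importance scores. All values below are illustrative, and the three outputs are assumed to be returned in the order listed above:

using Knockoffs, Random
Random.seed!(2023)
p, m = 100, 3
T0 = rand(p)                   # importance scores of the original variables
Tk = [rand(p) for _ in 1:m]    # importance scores of each of the m knockoff copies
κ, τ, W = MK_statistics(T0, Tk)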
Knockoffs.MK_statistics — Method

MK_statistics(T0::Vector, Tk::Vector)

Computes regular knockoff statistics τ and W.
Inputs
- T0: p-vector of importance scores for the original variables
- Tk: p-vector of importance scores for the knockoff variables
Output
- W: Coefficient difference statistic W[i] = abs(T0[i]) - abs(Tk[i])
Knockoffs.adj_constrained_hclust — Method

adj_constrained_hclust(distmat::AbstractMatrix, h::Number)

Performs (single-linkage) hierarchical clustering, forcing groups to be contiguous. After clustering, variables in different groups are guaranteed to have distance greater than h.
Note: this is a custom (bottom-up) implementation because Clustering.jl does not support adjacency constraints, see https://github.com/JuliaStats/Clustering.jl/issues/230
Knockoffs.approx_modelX_gaussian_knockoffs — Method

approx_modelX_gaussian_knockoffs(X, method; [m=1], [windowsize=500], [covariance_approximator], kwargs...)
approx_modelX_gaussian_knockoffs(X, method, window_ranges; [m=1], [covariance_approximator], kwargs...)

Generates Gaussian knockoffs by approximating the covariance as a block diagonal matrix. Each block contains windowsize consecutive features. One can alternatively specify the window_ranges argument to construct blocks of different sizes.
Inputs
- X: An n × p numeric matrix or SnpArray. Each row is a sample, and each column is a covariate.
- method: Can be one of the following:
  - :mvr for minimum variance-based reconstructability knockoffs (alg 1 in ref 2)
  - :maxent for maximum entropy knockoffs (alg 2 in ref 2)
  - :equi for equi-distant knockoffs (eq 2.3 in ref 1)
  - :sdp for SDP knockoffs (eq 2.4 in ref 1)
  - :sdp_fast for SDP knockoffs via coordinate descent (alg 2.2 in ref 3)
- m: Number of knockoff copies per variable to generate, defaults to 1.
- windowsize: Number of covariates to be included in a block. Each block consists of adjacent variables. The last block could contain fewer than windowsize variables.
- window_ranges: Vector of ranges for each window, e.g. [1:97, 98:200, 201:500]
- covariance_approximator: A covariance estimator, defaults to LinearShrinkage(DiagonalUnequalVariance(), :lw). See CovarianceEstimation.jl for more options.
- kwargs...: Possible optional inputs to the solver specified in method, see solve_MVR, solve_max_entropy, and solve_sdp_ccd
Multithreading (todo)
To enable multithreading, simply start Julia with more than 1 thread; this routine will then run with all available threads.
Covariance Approximation:
The covariance is approximated by a LinearShrinkageEstimator using Ledoit-Wolf shrinkage with DiagonalUnequalVariance target, which seems to perform well for p>n cases. We do not simply use cov(X) since isposdef(cov(X)) is typically false. For comparison of different estimators, see: https://mateuszbaran.github.io/CovarianceEstimation.jl/dev/man/msecomp/#msecomp
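Example

A minimal sketch on simulated data, assuming default solver options (the dimensions below are illustrative):

using Knockoffs, Random
Random.seed!(2023)
X = randn(250, 1000)    # n = 250 samples, p = 1000 features
# covariance is approximated as block diagonal with blocks of 200 adjacent features
ko = approx_modelX_gaussian_knockoffs(X, :maxent; windowsize = 200)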
Knockoffs.block_diagonalize — Method

block_diagonalize(Σ, groups)

Internal function to block-diagonalize the covariance Σ according to groups.
Knockoffs.check_model_solution — Method

check_model_solution(model; verbose=false)

After solving a JuMP model, checks if the solution is accurate.
Knockoffs.choose_group_reps — Method

choose_group_reps(Σ::Symmetric, groups::AbstractVector; [threshold=0.5], [prioritize_idx], [Σinv])

Chooses group representatives. Returns indices of Σ that are representatives. If R is the set of selected variables within a group and O is the set of variables outside the group, then we keep adding variables to R until the proportion of variance explained by R divided by the proportion of variance explained by R and O exceeds threshold.
Inputs
- Σ: Correlation matrix wrapped in Symmetric.
- groups: Vector of group membership.
Optional inputs
- threshold: Value between 0 and 1 that controls the number of representatives per group. Larger means more representatives (default 0.5)
- prioritize_idx: Variable indices that should receive priority to be chosen as representatives, defaults to nothing
- Σinv: Precomputed inv(Σ) (it will be computed if not supplied)
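Example

A minimal sketch, assuming simulate_AR1 (documented below) is used to generate a valid correlation matrix:

using Knockoffs, LinearAlgebra, Random
Random.seed!(2023)
Σ = simulate_AR1(100)                # 100 × 100 AR(1) correlation matrix
groups = repeat(1:20, inner = 5)     # 20 contiguous groups of size 5
reps = choose_group_reps(Symmetric(Σ), groups; threshold = 0.5)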
Knockoffs.cond_indep_corr — Method

Returns Σnew as a covariance matrix that strictly satisfies the conditional independence assumption.
Knockoffs.condition — Method

condition(x::AbstractVector, μ::AbstractVector, Σ::AbstractMatrix, S::AbstractMatrix, [m::Number=1])

Samples a knockoff x̃ from Gaussian x using conditional distribution formulas:
If (x, x̃) ~ N((μ, μ), G) where G = [Σ Σ-S; Σ-S Σ], then we sample x̃ from x̃ | x ~ N(μ + (Σ-S)inv(Σ)(x-μ), 2S - S*inv(Σ)*S).
If we sample m knockoffs, we use the algorithm in "Improving the Stability of the Knockoff Procedure: Multiple Simultaneous Knockoffs and Entropy Maximization" by Gimenez and Zou.
Inputs
- X: An n × p numeric matrix, each row is a sample, and each column is a covariate.
- μ: A p × 1 vector of column means of X
- Σ: A p × p covariance matrix of X
- S: A p × p matrix solved to satisfy S ⪰ 0 and (m+1)/m*Σ - S ⪰ 0
- m: Number of (simultaneous) knockoffs per variable to generate, default m=1
Output
- X̃: An n × pm numeric matrix. The first p columns store the first knockoff copy, the next p columns store the second knockoff copy, and so on
Todo
- When s is the zero vector, X̃ should be identical to X but it isn't
- Consider changing sampling code to using Distribution's MvNormal
- For multiple knockoffs, can we avoid storing a pm × pm matrix in memory?
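Example

A minimal sketch with an identity covariance, where S = 0.5I satisfies both constraints for m = 1 (this example is purely illustrative):

using Knockoffs, LinearAlgebra, Random
Random.seed!(2023)
p = 5
μ, Σ = zeros(p), Matrix(1.0I, p, p)
S = 0.5 * Matrix(1.0I, p, p)           # S ⪰ 0 and 2Σ - S ⪰ 0
x = randn(p)
x̃ = Knockoffs.condition(x, μ, Σ, S)   # sample one knockoff copy of x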
Knockoffs.fit_lasso — Method

fit_lasso(y, X, [method], [d], [m], [fdrs], [groups], [filter_method], [debias], [kwargs...])
fit_lasso(y, X, μ, Σ, [method], [d], [m], [fdrs], [groups], [filter_method], [debias], [kwargs...])

Generates model-X knockoffs with method, runs Lasso, then applies the knockoff filter. If μ and Σ are not provided, they will be estimated from data.
Inputs
- y: An n × 1 response vector
- X: An n × p numeric matrix, each row is a sample, and each column is a covariate.
- method: Method for knockoff generation (defaults to :maxent)
- μ: A p × 1 vector of column means of X. If not provided, defaults to the column means.
- Σ: A p × p covariance matrix of X. If not provided, it will be estimated from a shrunken empirical covariance matrix, see modelX_gaussian_knockoffs
- d: Distribution of the response. Defaults to Normal(); for binary response (logistic regression) use Binomial().
- m: Number of simultaneous knockoffs to generate, defaults to m=1
- fdrs: Target FDRs, defaults to [0.01, 0.05, 0.1, 0.25, 0.5]
- groups: Vector of group membership. If not supplied, we generate regular knockoffs. If supplied, we run group knockoffs.
- filter_method: Choices are :knockoff or :knockoff_plus (default)
- debias: Defines how the selected coefficients are debiased. Specify :ls for least squares or :lasso for Lasso (only running on the support). To not debias, specify debias=nothing (default).
- kwargs: Additional arguments passed to glmnetcv and glmnet
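Example

A minimal sketch on simulated data with a sparse linear model (dimensions and effect sizes are illustrative; the returned struct is a KnockoffFilter as described at the end of this page):

using Knockoffs, Random
Random.seed!(2023)
n, p = 500, 100
X = randn(n, p)
β = vcat(ones(5), zeros(p - 5))    # only the first 5 features are causal
y = X * β + randn(n)
result = fit_lasso(y, X)           # :maxent knockoffs + Lasso + knockoff filter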
Knockoffs.fit_marginal — Method

fit_marginal(y, X, method=:maxent, ...)
fit_marginal(y, X, μ, Σ, method=:maxent, ...)

Generates model-X knockoffs with method and computes feature importance statistics based on the squared marginal Z score abs2(x[:, i]'*y) / n. If μ and Σ are not provided, they will be estimated from data.
Inputs
- y: An n × 1 response vector
- X: An n × p numeric matrix, each row is a sample, and each column is a covariate.
- method: Method for knockoff generation (defaults to :maxent)
- μ: A p × 1 vector of column means of X. If not provided, defaults to the column means.
- Σ: A p × p covariance matrix of X. If not provided, it will be estimated from a shrunken empirical covariance matrix, see modelX_gaussian_knockoffs
- d: Distribution of the response. Defaults to Normal(); for binary response (logistic regression) use Binomial().
- m: Number of simultaneous knockoffs to generate, defaults to m=1
- fdrs: Target FDRs, defaults to [0.01, 0.05, 0.1, 0.25, 0.5]
- groups: Vector of group membership. If not supplied, we generate regular knockoffs. If supplied, we run group knockoffs.
- filter_method: Choices are :knockoff or :knockoff_plus (default)
- debias: Defines how the selected coefficients are debiased. Specify :ls for least squares or :lasso for Lasso (only running on the support). To not debias, specify debias=nothing (default).
- kwargs: Additional arguments passed to glmnetcv and glmnet
Knockoffs.fixed_knockoffs — Method

fixed_knockoffs(X::Matrix{T}; [method], [kwargs...])

Creates fixed-X knockoffs. Internally, X will be automatically normalized before computing its knockoff.
Inputs
- X: A column-normalized n × p numeric matrix, each row is a sample, and each column is a covariate. X will be normalized internally if it is not.
- method: Can be one of the following:
  - :mvr: Minimum variance-based reconstructability knockoffs (alg 1 in ref 2)
  - :maxent: Maximum entropy knockoffs (alg 2 in ref 2)
  - :equi: Equi-distant knockoffs (eq 2.3 in ref 1)
  - :sdp: SDP knockoffs (eq 2.4 in ref 1)
  - :sdp_fast: SDP knockoffs via coordinate descent (alg 2.2 in ref 3)
- kwargs...: Possible optional inputs to method, see solve_MVR, solve_max_entropy, and solve_sdp_ccd
Output
- GaussianKnockoff: A struct containing the original (column-normalized) X and its knockoff X̃, in addition to other variables (e.g. s)
Reference
- "Controlling the false discovery rate via Knockoffs" by Barber and Candes (2015).
- "Powerful knockoffs via minimizing reconstructability" by Spector, Asher, and Lucas Janson (2020)
- "FANOK: Knockoffs in Linear Time" by Askari et al. (2020).
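Example

A minimal sketch (fixed-X knockoffs need n comfortably larger than p; the dimensions below are illustrative):

using Knockoffs, Random
Random.seed!(2023)
X = randn(1000, 200)
ko = fixed_knockoffs(X; method = :mvr)   # ko stores the normalized X, its knockoff, and s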
Knockoffs.form_emission_prob_matrix — Method

form_emission_prob_matrix(a, θ, xi::AbstractVector)

Inputs

- a: p × K matrix with values estimated from fastPHASE (i.e. what they call the α parameter)
- θ: p × K matrix with values estimated from fastPHASE
- xi: Length p vector with sample i's genotypes (entries 0, 1, or 2)
Knockoffs.forward_backward! — Function

forward_backward!(x, L, y, storage=zeros(length(x)))

Non-allocating solver that finds the solution x to LL'x = y, where L is a Cholesky factor.
Knockoffs.forward_backward_sampling! — Method

forward_backward_sampling!(Z, X, Q, q, θ, ...)

Samples Z, the hidden states of an HMM, from an observed sequence of unphased genotypes X.
Inputs
- Z: Length p vector of integers. This will store the sampled Markov states
- X: Length p vector of genotypes (0, 1, or 2)
- Q: K × K × p array. Q[:, :, j] is a K × K matrix of transition probabilities for the jth state, i.e. Q[l, k, j] = P(X_j = k | X_{j-1} = l). The first transition matrix is not used.
- q: Length K vector of initial probabilities
- θ: The θ parameter estimated from fastPHASE
Preallocated storage variables
- table: A MarkovChainTable that maps Markov chain states to haplotype pairs (ka, kb)
- d: Sampling distribution; probabilities in d.p are mutated
- α̂: p × K scaled forward probability matrix, where α̂[j, k] = P(x_1,...,x_j, z_j = k) / P(x_1,...,x_j)
- c: Normalizing constants, c[k] = p(x_k | x_1,...,x_{k-1})
Reference
Algorithm 3 of "Gene hunting with hidden Markov model knockoffs" by Sesia et al
Knockoffs.full_knockoffscreen — Method

full_knockoffscreen(x::SnpArray; windowsize::Int=100)

Generates knockoffs X̃ⱼ by regressing Xⱼ on the SNPs and knockoffs within a sliding window of width windowsize.
Inputs
- x: A SnpArray or a String for the path of the PLINK .bed file
- windowsize: Int specifying the window width. Defaults to 100
Outputs
- X̃: An n × p dense matrix of Float64, each row is a sample.
References
- He, Zihuai, Linxi Liu, Chen Wang, Yann Le Guen, Justin Lee, Stephanie Gogarten, Fred Lu et al. "Identification of putative causal loci in whole-genome sequencing data via knockoff statistics." Nature communications 12, no. 1 (2021): 1-18.
- He, Zihuai, Yann Le Guen, Linxi Liu, Justin Lee, Shiyang Ma, Andrew C. Yang, Xiaoxia Liu et al. "Genome-wide analysis of common and rare variants via multiple knockoffs at biobank scale, with an application to Alzheimer disease genetics." The American Journal of Human Genetics 108, no. 12 (2021): 2336-2353.
TODO

- Use ElasticArrays.jl to avoid reallocating the design matrix in each loop
- Write an iterator interface to avoid allocating and storing all knockoffs at once
Knockoffs.get_genotype_emission_probabilities — Method

get_genotype_emission_probabilities(θ::AbstractMatrix, xj::Number, ka::Int, kb::Int, j::Int)

Computes P(xj | k = {ka, kb}, θ): emission probabilities for genotypes. This is eq 10 of "Gene hunting with hidden Markov model knockoffs" by Sesia et al.
Knockoffs.get_genotype_transition_matrix — Method

get_genotype_transition_matrix(r, θ, α, q, table)

Computes transition matrices for the hidden Markov chains in unphased genotypes. This is equation 9 of "Gene hunting with hidden Markov model knockoffs" by Sesia et al.
Inputs
- r: Length p vector of "recombination rates"
- θ: Size p × K matrix, where θ[j, k] is the probability that the allele is 1 for SNP j at the kth haplotype motif
- α: Size p × K matrix of probabilities that haplotype motifs succeed each other. Rows should sum to 1.
- q: Length K vector of initial probabilities
- table: A MarkovChainTable that maps Markov chain states k = 1, ..., K(K+1)/2 to haplotype pairs (ka, kb).
Knockoffs.get_haplotype_emission_probabilities — Method

get_haplotype_emission_probabilities(θ::AbstractMatrix, j::Int, hj::Number, zj::Int)

Computes emission probabilities for the unphased HMM. This is the equation above eq 8 of "Gene hunting with hidden Markov model knockoffs" by Sesia et al.
Knockoffs.get_haplotype_transition_matrix — Method

get_haplotype_transition_matrix(r, θ, α)

Computes transition matrices for the hidden Markov chains in haplotypes. This is given 2 equations above eq 8 in "Gene hunting with hidden Markov model knockoffs" by Sesia et al.
Inputs
- r: Length p vector of "recombination rates"
- θ: Size p × K matrix, where θ[j, k] is the probability that the allele is 1 for SNP j at the kth haplotype motif
- α: Size p × K matrix of probabilities that haplotype motifs succeed each other. Rows should sum to 1.
Output
- Q: A K × K × p array of transition matrices; Q[:, :, j] is the jth transition matrix.
Knockoffs.ghost_knockoffs — Method

ghost_knockoffs(Zscores, D, Σinv; [m=1])

Generates ghost knockoffs given a list of z-scores (GWAS summary statistics).
Inputs
- Zscores: List of z-score statistics
- D: Matrix obtained from solving the knockoff problem satisfying (m+1)/m*Σ - D ⪰ 0
- Σinv: Inverse of the covariance matrix
Optional inputs
- m: Number of knockoffs
Reference
He, Z., Liu, L., Belloy, M. E., Le Guen, Y., Sossin, A., Liu, X., ... & Ionita-Laza, I. (2021). Summary statistics knockoff inference empowers identification of putative causal variants in genome-wide association studies.
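Example

A minimal sketch for the non-grouped case, assuming D is the diagonal matrix formed from the s vector returned by solve_s (documented below); the z-scores are simulated:

using Knockoffs, LinearAlgebra, Random
Random.seed!(2023)
p = 200
Σ = simulate_AR1(p)                   # true correlation matrix
s = solve_s(Symmetric(Σ), :maxent)    # vector s from the knockoff problem
D = diagm(s)                          # satisfies 2Σ - D ⪰ 0 for m = 1
Zscores = randn(p)                    # GWAS summary statistics (simulated)
Z̃ = ghost_knockoffs(Zscores, D, inv(Σ))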
Knockoffs.group_block_objective — Method

group_block_objective(Σ, S, groups, m, method)

Evaluates the objective for SDP/MVR/ME. This is not an efficient function, so it should only be called at the start of each algorithm.
Inputs
- Σ: Covariance or correlation matrix of the original data
- S: Optimization variable (group-block-diagonal)
- groups: Vector of group membership. Variable i belongs to group groups[i]
- m: Number of knockoffs to generate for each variable
- method: The optimization method for group knockoffs
Knockoffs.hc_partition_groups — Method

hc_partition_groups(X::AbstractMatrix; [cutoff], [min_clusters], [force_contiguous])
hc_partition_groups(Σ::Symmetric; [cutoff], [min_clusters], [force_contiguous])

Computes a group partition based on individual-level data X or correlation matrix Σ using hierarchical clustering with the specified linkage.
Inputs
- X: n × p data matrix. Each row is a sample
- Σ: p × p correlation matrix. Must be wrapped in Symmetric, otherwise it will be treated as individual-level data
- cutoff: Height at which the clustering result is cut, between 0 and 1 (default 0.5). This ensures that no variables between 2 groups have correlation greater than cutoff. 1 recovers the ungrouped structure; 0 puts everything in a single group.
- min_clusters: The desired number of clusters.
- linkage: Cluster linkage function to use (when force_contiguous=true, linkage must be :single). linkage defines how the distances between data points are aggregated into distances between clusters, and hence affects which clusters are merged at each iteration. The valid choices are:
  - :single (default): use the minimum distance between any of the cluster members
  - :average: use the mean distance between any of the cluster members
  - :complete: use the maximum distance between any of the members
  - :ward: the distance is the increase of the average squared distance of a point to its cluster centroid after merging the two clusters
  - :ward_presquared: same as :ward, but assumes that the distances in d are already squared
- rep_method: Method for selecting representatives for each group. Options are :id (tends to select roughly independent variables) or :rss (tends to select more correlated variables)
If force_contiguous = false and both min_clusters and cutoff are specified, it is guaranteed that the number of clusters is not less than min_clusters and their height is not above cutoff. If force_contiguous = true, min_clusters keyword is ignored.
Outputs
- groups: Length p vector of group membership for each variable
- group_reps: Columns of X selected as representatives. Each group has at most nrep representatives. These are typically used to construct smaller group knockoffs for extremely large groups
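Example

A minimal sketch on simulated data, assuming the two outputs listed above are returned as a tuple:

using Knockoffs, Random
Random.seed!(2023)
X = randn(500, 200)
groups, group_reps = hc_partition_groups(X; cutoff = 0.5)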
Knockoffs.hmm_knockoff — Method

hmm_knockoff(plinkname; [datadir], [plink_outfile], [fastphase_outfile], [outdir], [verbose], args...)

Generates HMM knockoffs from binary PLINK formatted files. This is done by first running fastPHASE, then running Algorithm 2 of "Gene hunting with hidden Markov model knockoffs" by Sesia, Sabatti, and Candes.
Input
- plinkname: Binary PLINK file name without the .bed/.bim/.fam suffix.
Optional arguments
- datadir: Full path to the PLINK and fastPHASE files (default = current directory)
- plink_outfile: Output PLINK file name
- fastphase_outfile: The output file name for fastPHASE's alpha, theta, and r files
- args...: Any parameter accepted by fastPHASE.fastphase_estim_param()
Output
- plink_outfile.bed: n × p knockoff genotypes
- plink_outfile.bim: SNP mapping file. Knockoffs have SNP names ending in ".k"
- plink_outfile.fam: Sample mapping file; this is a copy of the original plinkname.fam file
- fastphase_outfile_rhat.txt: Averaged r-hat file from fastPHASE
- fastphase_outfile_alphahat.txt: Averaged alpha-hat file from fastPHASE
- fastphase_outfile_thetahat.txt: Averaged theta-hat file from fastPHASE
Knockoffs.hmm_knockoff — Method

hmm_knockoff(snpdata::SnpData, r::AbstractVecOrMat, θ::AbstractMatrix, α::AbstractMatrix)

Generates knockoffs of snpdata with preloaded r, θ, α.
Input
- snpdata: A SnpData object from SnpArrays
- r: The r vector estimated by fastPHASE
- θ: The θ matrix estimated by fastPHASE
- α: The α matrix estimated by fastPHASE
Optional Inputs
- outdir: Output directory for the generated knockoffs
- plink_outfile: Output file name for knockoff genotypes
- estimate_δ: If true, will estimate a pseudo-FDR by computing a δ value for each SNP via a likelihood ratio bound
Knockoffs.initialize_S — Function

initialize_S(Σ, groups, m, method, verbose)

Internal function to help initialize S to a good starting value. Returns the final S matrix as well as the Cholesky factorizations L and C where

- L.L*L.U = cholesky((m+1)/m*Σ - S)
- C.L*C.U = cholesky(S)
Knockoffs.inverse_mat_sqrt — Method

Computes A^{-1/2} via eigen-decomposition.
Knockoffs.ipad — Method

ipad(X::Matrix; [r_method], [m])

Generates knockoffs based on intertwined probabilistic factors decoupling (IPAD). This assumes that X can be factored as X = FΛ' + E where F is an n × r random matrix of latent factors, Λ are the factor loadings, and E are residual errors. When this assumption is met, FDR can be controlled with no power loss when applying the knockoff procedure. Internally, we need to compute an eigenfactorization of an n × n matrix. This is often faster than standard model-X knockoffs, which require solving a p-dimensional convex optimization problem.
Inputs
- X: An n × p numeric matrix, each row is a sample, and each column is a covariate.
- r_method: Method used for estimating r, the number of latent factors. Choices include :er (default), :gr, or :ve
- m: Number of (simultaneous) knockoffs per variable to generate, default m=1
References
- Fan, Y., Lv, J., Sharifvaghefi, M. and Uematsu, Y., 2020. IPAD: stable interpretable forecasting with knockoffs inference. Journal of the American Statistical Association, 115(532), pp.1822-1834.
- Bai, J., 2003. Inferential theory for factor models of large dimensions. Econometrica, 71(1), pp.135-171.
- Ahn, S.C. and Horenstein, A.R., 2013. Eigenvalue ratio test for the number of factors. Econometrica, 81(3), pp.1203-1227.
Knockoffs.likelihood_ratio — Method

likelihood_ratio(θa, θb, ρ; α=0.1, n = 1000, threshold = true)

Estimates the likelihood ratio bound log(P(x)Q(x̃) / Q(x)P(x̃)) for a single HMM state (fixed i and j).

- θa: State 1 of the genotype Markov state
- θb: State 2 of the genotype Markov state
- ρ: MAF of the SNP
- α: Percentage of SNPs decorrelated (default 10%)
Knockoffs.lowrankdowndate_turbo! — Method

lowrankdowndate_turbo!(C::Cholesky, v::AbstractVector)

Vectorized version of lowrankdowndate!, source: https://github.com/JuliaLang/julia/blob/742b9abb4dd4621b667ec5bb3434b8b3602f96fd/stdlib/LinearAlgebra/src/cholesky.jl#L753. Takes advantage of the fact that v is 0 everywhere except at one position.
Knockoffs.lowrankupdate_turbo! — Method

lowrankupdate_turbo!(C::Cholesky, v::AbstractVector)

Vectorized version of lowrankupdate!, source: https://github.com/JuliaLang/julia/blob/742b9abb4dd4621b667ec5bb3434b8b3602f96fd/stdlib/LinearAlgebra/src/cholesky.jl#L707. Takes advantage of the fact that v is 0 everywhere except at one position.
Knockoffs.markov_knockoffs — Method

markov_knockoffs(Z::Vector{Int}, Q::Array{T, 3}, q::Vector{T})

Generates knockoffs of variables distributed as a discrete Markov chain with K states.
Inputs
- Z: Length p vector of Int where Z[i] is the ith state
- Q: K × K × p array. Q[:, :, j] is a K × K matrix of transition probabilities for the jth state, i.e. Q[l, k, j] = P(X_j = k | X_{j-1} = l). The first transition matrix is not used.
- q: K × 1 vector of initial probabilities
Reference
Equations 4-5 of "Gene hunting with hidden Markov model knockoffs" by Sesia, Sabatti, and Candes
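Example

A minimal sketch with a uniform chain (the transition structure below is illustrative; each row of Q[:, :, j] must sum to 1):

using Knockoffs, Random
Random.seed!(2023)
p, K = 100, 3
q = fill(1/K, K)                # initial distribution
Q = fill(1/K, K, K, p)          # uniform transition matrices
Z = rand(1:K, p)                # a realization of the chain
Z̃ = markov_knockoffs(Z, Q, q)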
Knockoffs.mk_threshold — Method

mk_threshold(τ::Vector{T}, κ::Vector{Int}, m::Int, q::Number)

Chooses the multiple knockoff threshold τ̂ > 0 by setting

τ̂ = min{ t > 0 : (1/m + 1/m * #{j: κ[j] ≥ 1 and τ[j] ≥ t}) / #{j: κ[j] == 0 and τ[j] ≥ t} ≤ q }.
Inputs
- τ: τ[i] stores the feature importance score for the ith feature, i.e. the value T0 - median(T1,...,Tm). Note that in Gimenez and Zou, the max function is used instead of median
- κ: κ[i] stores which of the m knockoffs has the largest importance score. When the original variable has the largest score, κ[i] == 0.
- m: Number of knockoffs per variable generated
- q: Target FDR (between 0 and 1)
- rej_bounds: Number of top values of τ to consider (default = 10000)
Reference:
- Equations 8 and 9 in the supplement of "Identification of putative causal loci in whole-genome sequencing data via knockoff statistics" by He et al.
- Algorithm 1 of "Improving the Stability of the Knockoff Procedure: Multiple Simultaneous Knockoffs and Entropy Maximization" by Gimenez and Zou.
Knockoffs.modelX_gaussian_group_knockoffs — Method

modelX_gaussian_group_knockoffs(X, method, groups, μ, Σ; [m], [covariance_approximator], [kwargs])
modelX_gaussian_group_knockoffs(X, method, groups; [m], [covariance_approximator], [kwargs])

Constructs Gaussian model-X group knockoffs. If the covariance Σ and mean μ are not specified, they will be estimated from data, i.e. we will make second-order group knockoffs. To incorporate group structure, the (true or estimated) covariance matrix is block-diagonalized according to group membership to solve a relaxed optimization problem. See the reference paper and the Knockoffs.jl docs for more details.
Inputs
- X: An n × p design matrix. Each row is a sample, each column is a feature.
- method: Method for constructing knockoffs. Options include:
  - :maxent: (recommended) fully general maximum entropy group knockoffs
  - :mvr: fully general minimum variance-based reconstructability (MVR) group knockoffs
  - :equi: equi-correlated knockoffs. This is the methodology proposed in Dai R, Barber R. The knockoff filter for FDR control in group-sparse and multitask regression. International conference on machine learning 2016 Jun 11 (pp. 1851-1859). PMLR.
  - :sdp: fully general SDP group knockoffs based on coordinate descent
  - :sdp_block: fully general SDP group knockoffs where each block is solved exactly using an interior point solver
  - :sdp_subopt: chooses each block S_{i} = γ_i * Σ_{ii}. This slightly generalizes the equi-correlated group knockoff idea proposed in Dai and Barber 2016.
- groups: Vector of group membership
- μ: A length p vector storing the true column means of X
- Σ: A p × p covariance matrix for the columns of X
- m: Number of knockoffs per variable, defaults to 1.
- covariance_approximator: A covariance estimator, defaults to LinearShrinkage(DiagonalUnequalVariance(), :lw). See CovarianceEstimation.jl for more options.
- kwargs: Extra keyword arguments for solve_s_group
How to define groups
The exported function hc_partition_groups can be used to build a group membership vector.
A note on compute time
The computational complexity of group knockoffs scales quadratically with group size. Thus, very large groups (e.g. >100 members per group) dramatically slows down parameter estimation. In such cases, one can consider running the routine modelX_gaussian_rep_group_knockoffs which constructs group knockoffs by choosing top representatives from each group.
Reference
Dai & Barber 2016, The knockoff filter for FDR control in group-sparse and multitask regression
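Example

A minimal second-order sketch where groups are defined by hierarchical clustering (the tuple destructuring assumes the outputs documented for hc_partition_groups above):

using Knockoffs, Random
Random.seed!(2023)
X = randn(250, 100)
groups, _ = hc_partition_groups(X; cutoff = 0.5)
ko = modelX_gaussian_group_knockoffs(X, :maxent, groups)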
Knockoffs.modelX_gaussian_knockoffs — Method

modelX_gaussian_knockoffs(X::Matrix, method::Symbol; [m], [covariance_approximator], [kwargs...])
modelX_gaussian_knockoffs(X::Matrix, method::Symbol, μ::Vector, Σ::Matrix; [m], [kwargs...])

Creates model-free multivariate normal knockoffs by sequentially sampling from conditional multivariate normal distributions. The true mean μ and covariance Σ are estimated from data if not supplied.
Inputs
- X: An n × p numeric matrix, each row is a sample, and each column is a covariate.
- method: Can be one of the following:
  - :mvr for minimum variance-based reconstructability knockoffs (alg 1 in ref 2)
  - :maxent for maximum entropy knockoffs (alg 2 in ref 2)
  - :equi for equi-distant knockoffs (eq 2.3 in ref 1)
  - :sdp for SDP knockoffs (eq 2.4 in ref 1)
  - :sdp_ccd for SDP knockoffs via coordinate descent (alg 2.2 in ref 3)
- μ: A p × 1 vector of column means of X, defaults to the column means
- Σ: A p × p covariance matrix of X, defaults to a shrinkage estimator specified by covariance_approximator.
- m: Number of knockoff copies per variable to generate, defaults to 1.
- covariance_approximator: A covariance estimator, defaults to LinearShrinkage(DiagonalUnequalVariance(), :lw), which tends to give good empirical performance when p>n. See CovarianceEstimation.jl for more options.
- kwargs...: Possible optional inputs to the solver specified in method, see solve_MVR, solve_max_entropy, and solve_sdp_ccd
Reference:
- "Panning for Gold: Model-X Knockoffs for High-dimensional Controlled Variable Selection" by Candes, Fan, Janson, and Lv (2018)
- "Powerful knockoffs via minimizing reconstructability" by Spector, Asher, and Lucas Janson (2020)
- "FANOK: Knockoffs in Linear Time" by Askari et al. (2020).
Covariance Approximation:
The covariance is approximated by a linear shrinkage estimator using Ledoit-Wolf with DiagonalUnequalVariance target, which seems to perform well for p>n cases. We do not simply use cov(X) since isposdef(cov(X)) is typically false. For comparison of various estimators, see: https://mateuszbaran.github.io/CovarianceEstimation.jl/dev/man/msecomp/#msecomp
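Example

A minimal second-order sketch (μ and Σ are estimated internally via the shrinkage estimator described above):

using Knockoffs, Random
Random.seed!(2023)
X = randn(100, 50)
ko = modelX_gaussian_knockoffs(X, :maxent)   # ko holds X and its knockoff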
Knockoffs.modelX_gaussian_rep_group_knockoffs — Method

modelX_gaussian_rep_group_knockoffs(X, method, groups; [m], [covariance_approximator], [kwargs...])
modelX_gaussian_rep_group_knockoffs(X, method, groups, μ, Σ; [m], [kwargs...])

Constructs group knockoffs by choosing representatives from each group and solving a smaller optimization problem based on the representatives only. The remaining knockoffs are generated based on a conditional independence assumption similar to a graphical model (details to be given later). The representatives are computed by choose_group_reps.
Inputs
- X: An n × p design matrix. Each row is a sample, each column is a feature.
- method: Method for constructing knockoffs. Options are the same as modelX_gaussian_group_knockoffs
- groups: Vector of Int denoting group membership. groups[i] is the group of X[:, i]
- covariance_approximator: A covariance estimator, defaults to LinearShrinkage(DiagonalUnequalVariance(), :lw). See CovarianceEstimation.jl for more options.
- μ: A length p vector storing the true column means of X
- Σ: A p × p covariance matrix for the columns of X
- rep_threshold: Value between 0 and 1 that controls the number of representatives per group. Larger means more representatives (default 0.5)
- m: Number of knockoffs per variable, defaults to 1.
- kwargs: Extra keyword arguments for solve_s_group
Knockoffs.normalize_col! — Method

normalize_col!(X::AbstractVecOrMat, [center=false])

Normalizes each column of X so it sums to 1.
Knockoffs.prioritize_variants — Method

prioritize_variants!(index::AbstractVector, priority_vars::AbstractVector)

Given an (unsorted) index, makes variables in priority_vars appear first in index, preserving the original relative order both within priority_vars and among the remaining variables.
Example
index = [11, 4, 5, 9, 7]
priority_vars = [4, 9]
result = prioritize_variants(index, priority_vars)
result == [4, 9, 11, 5, 7]

Knockoffs.rapid — Method

rapid(rapid_exe, vcffile, mapfile, d, outfolder, w, r, s, [a])

Wrapper for the RaPID program.
Inputs
- rapid_exe: Full path to the RaPID_v.1.7 executable file
- vcffile: Phased VCF file name
- mapfile: Map file name
- d: Actual minimum IBD length in cM
- outfolder: Output folder name
- w: Number of SNPs in a window for sub-sampling
- r: Number of runs
- s: Minimum number of successes to consider a hit
Optional Inputs
- a: If true, ignore MAFs. By default (a=false) the sites are selected at random, weighted by their MAFs.
Knockoffs.sample_mvn_efficient — Method

sample_mvn_efficient(C::AbstractMatrix{T}, D::AbstractMatrix{T}, m::Int)

Efficiently samples from N(0, A) where
\[\begin{aligned} A &= \begin{pmatrix} C & C-D & \cdots & C-D\\ C-D & C & \cdots & C-D\\ \vdots & & \ddots & \vdots\\ C-D & C-D & & C \end{pmatrix} \end{aligned}\]
Note there are m blocks per row/col
Knockoffs.search_rank — Function

search_rank(A::AbstractMatrix, sk::Vector{Int}, target=0.25, verbose=false)

Finds the rank (number of columns of A) that best approximates the remaining columns, such that regressing each remaining variable on those selected has RSS less than some target.

- Σ: Original (p × p) correlation matrix
- A: The (upper triangular) Cholesky factor of Σ
- sk: The (unsorted) columns of A; earlier ones are more important
- target: Target residual level

Note: we cannot do a binary search because large ranks can increase residuals.
Knockoffs.shift_until_PSD! — Function

shift_until_PSD!(Σ::AbstractMatrix)

Keeps adding λI to Σ until the minimum eigenvalue exceeds tol.
Knockoffs.simulate_AR1 — Method

simulate_AR1(p::Int, a=1, b=1, tol=1e-3, max_corr=1, rho=nothing)

Generates a p-dimensional correlation matrix for an AR(1) Gaussian process, where successive correlations are drawn independently from Beta(a,b). If rho is specified, the process is stationary with correlation rho.
Source
https://github.com/amspector100/knockpy/blob/20eddb3eb60e0e82b206ec989cb936e3c3ee7939/knockpy/dgp.py#L61
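Example

A minimal sketch; the resulting matrix is a valid input for solve_s and the model-X constructors above:

using Knockoffs, LinearAlgebra
Σ = simulate_AR1(100)    # successive correlations drawn from Beta(1, 1)
isposdef(Σ)              # should be true: Σ is a valid correlation matrix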
Knockoffs.simulate_ER — Method

simulate_ER(p::Int; [invert])

Simulates a covariance matrix from a clustered Erdos-Renyi graph, i.e. a block diagonal matrix where each block is an Erdos-Renyi graph. The result is scaled back to a correlation matrix.

For details, see the 4th simulation routine in section 5.1 of Li and Maathuis: https://academic.oup.com/jrsssb/article/83/3/534/7056103?login=false
Inputs
- p: Dimension of the covariance matrix
- ϕ: Probability of forming an edge between any 2 nodes
- lb: Lower bound for the value of an edge (drawn from a uniform distribution)
- ub: Upper bound for the value of an edge (drawn from a uniform distribution)
- invert: Whether to invert the covariance matrix (to obtain the precision)
- λmin: Minimum eigenvalue of the resulting covariance matrix
- blocksize: Number of variables within each ER graph
Knockoffs.simulate_block_covariance — Method

simulate_block_covariance(groups, ρ, γ, num_v, w)

Simulates a block covariance matrix similar to the one in Dai & Barber 2016, The knockoff filter for FDR control in group-sparse and multitask regression. That is, all diagonal elements will be 1, correlation within groups will be ρ, and correlation between groups will be ρ*γ.
Inputs
- groups: Vector of group membership
- ρ: Within-group correlation
- γ: Between-group correlation
Optional arguments
- num_v: Number of rank-1 updates added, i.e. Σ + v1*v1' + ... + vn*vn' where each v is iid N(0, w) (default 0)
- w: Variance of the rank-1 updates used in num_v (default 1)
Knockoffs.single_linkage_distance — Method

single_linkage_distance(distmat, left, right)

Computes the minimum distance (i.e. single-linkage distance) between members of left and members of right. Member distances are precomputed in distmat.
Knockoffs.single_state_dmc_knockoff! — Method

Samples Zj, the jth state of the hidden Markov chain.
Knockoffs.solve_MVR — Method

solve_MVR(Σ::AbstractMatrix)

Solves the minimum variance-based reconstructability problem for fixed-X and model-X knockoffs given correlation matrix Σ. Users should call solve_s instead of this function.
See algorithm 1 of "Powerful knockoffs via minimizing reconstructability" by Spector, Asher, and Lucas Janson (2020) https://arxiv.org/pdf/2011.14625.pdf
Knockoffs.solve_SDP — Method

solve_SDP(Σ::AbstractMatrix)

Solves the SDP problem for fixed-X and model-X knockoffs given correlation matrix Σ. Users should call solve_s instead of this function.
The optimization problem is stated in equation 3.13 of https://arxiv.org/pdf/1610.02351.pdf
Arguments
- Σ: A correlation matrix (diagonals all equal to 1)
- m: Number of knockoffs to generate, defaults to 1
- optm: SDP solver. Defaults to Hypatia.Optimizer(verbose=false). This can be any solver that supports the JuMP interface. For example, use SDPT3.Optimizer from the SDPT3.jl package (which is a MATLAB dependency) for the best performance.
Knockoffs.solve_equi — Method

solve_equi(Σ::AbstractMatrix)

Solves the equicorrelated problem for fixed-X and model-X knockoffs given correlation matrix Σ. Users should call solve_s instead of this function.
Knockoffs.solve_group_SDP_single_block — Method

solve_group_SDP_single_block(Σ11, ub)

Solves a single block of the fully general group SDP problem. The objective is

min sum_{i,j} |Σ[i,j] - S[i,j]| s.t. 0 ⪯ S ⪯ A11 - [A12 A13] * inv([A22-S2 A23; A32 A33-S3]) * [A21; A31]
Inputs
- Σ11: The block corresponding to the current group. Must be a correlation matrix.
- ub: The matrix defined as A11 - [A12 A13] * inv([A22-S2 A23; A32 A33-S3]) * [A21; A31]
- optm: Any solver compatible with JuMP.jl
Knockoffs.solve_group_SDP_subopt — Method

Solves the SDP group knockoff problem by analogy to the equi-correlated group knockoffs. The idea is to optimize a vector γ where γ[j] multiplies Σ_jj. In the equi-correlated setting, all γ[j] are forced to be equal.
Details can be found in Dai & Barber 2016, The knockoff filter for FDR control in group-sparse and multitask regression
Knockoffs.solve_group_block_update — Method

Todo

- somehow avoid reallocating ub every iteration
- When solving each individual block:
  - warmstart
  - avoid reallocating S1_new
  - allocate a vector of models
  - use loose convergence criteria
- For singleton groups, don't use JuMP and directly update
- Currently all objective values are computed based on the SDP case. Need to display objective values for the ME/MVR objectives
Knockoffs.solve_group_equi — Method

Solves the equi-correlated group knockoff problem. Here Σ is the true covariance matrix (scaled so that it has 1 on its diagonal) and Σblocks is the block-diagonal covariance matrix where each block corresponds to a group.
Details can be found in Dai & Barber 2016, The knockoff filter for FDR control in group-sparse and multitask regression
Knockoffs.solve_group_max_entropy_hybrid — Method

solve_group_max_entropy_hybrid(Σ, groups, [outer_iter=100], [inner_pca_iter=1], [inner_ccd_iter=1], [tol=0.0001], [ϵ=1e-6], [m=1], [robust=false], [verbose=false])

Solves the group-knockoff optimization problem based on the maximum entropy objective. Users should call solve_s_group instead of this function.
Inputs
- Σ: Correlation matrix
- groups: Group membership vector
Optional inputs
- outer_iter: Maximum number of outer iterations. Each outer iteration performs inner_pca_iter PCA updates and inner_ccd_iter full coordinate descent updates (default = 100).
- inner_pca_iter: Number of full PCA updates before switching to fully general coordinate descent updates (default = 1)
- inner_ccd_iter: Number of fully general coordinate descent updates before switching to PCA updates (default = 1)
- tol: Convergence tolerance. The algorithm converges when abs((obj_new-obj_old)/obj_old) < tol OR when changes in the S matrix fall below 1e-4
- ϵ: Tolerance added to the lower and upper bounds, prevents numerical issues (default = 1e-6)
- m: Number of knockoffs per variable (default 1)
- robust: Whether to use "robust" Cholesky updates. If robust=true, the algorithm will be ~10x slower; only use this if robust=false causes Cholesky updates to fail. (default false)
- verbose: Whether to print intermediate results (default false)
Knockoffs.solve_group_mvr_hybrid — Method

solve_group_mvr_hybrid(Σ, groups, [outer_iter=100], [inner_pca_iter=1], [inner_ccd_iter=1], [tol=0.0001], [ϵ=1e-6], [m=1], [robust=false], [verbose=false])

Solves the group-knockoff optimization problem based on the MVR objective. Users should call solve_s_group instead of this function.
Inputs
- Σ: Correlation matrix
- groups: Group membership vector
Optional inputs
- outer_iter: Maximum number of outer iterations. Each outer iteration performs inner_pca_iter PCA updates and inner_ccd_iter full coordinate descent updates (default = 100).
- inner_pca_iter: Number of full PCA updates before switching to fully general coordinate descent updates (default = 1)
- inner_ccd_iter: Number of fully general coordinate descent updates before switching to PCA updates (default = 1)
- tol: Convergence tolerance. The algorithm converges when abs((obj_new-obj_old)/obj_old) < tol OR when changes in the S matrix fall below 1e-4
- ϵ: Tolerance added to the lower and upper bounds, prevents numerical issues (default = 1e-6)
- m: Number of knockoffs per variable (default 1)
- robust: Whether to use "robust" Cholesky updates. If robust=true, the algorithm will be ~10x slower; only use this if robust=false causes Cholesky updates to fail. (default false)
- verbose: Whether to print intermediate results (default false)
Knockoffs.solve_group_sdp_hybrid — Method

solve_group_sdp_hybrid(Σ, groups, [outer_iter=100], [inner_pca_iter=1], [inner_ccd_iter=1], [tol=0.0001], [ϵ=1e-6], [m=1], [robust=false], [verbose=false])

Solves the group-knockoff optimization problem based on the SDP objective. Users should call solve_s_group instead of this function.
Inputs
- Σ: Correlation matrix
- groups: Group membership vector
Optional inputs
- outer_iter: Maximum number of outer iterations. Each outer iteration performs inner_pca_iter PCA updates and inner_ccd_iter full coordinate descent updates (default = 100).
- inner_pca_iter: Number of full PCA updates before switching to fully general coordinate descent updates (default = 1)
- inner_ccd_iter: Number of fully general coordinate descent updates before switching to PCA updates (default = 1)
- tol: Convergence tolerance. The algorithm converges when abs((obj_new-obj_old)/obj_old) < tol OR when changes in the S matrix fall below 1e-4
- ϵ: Tolerance added to the lower and upper bounds, prevents numerical issues (default = 1e-6)
- m: Number of knockoffs per variable (default 1)
- robust: Whether to use "robust" Cholesky updates. If robust=true, the algorithm will be ~10x slower; only use this if robust=false causes Cholesky updates to fail. (default false)
- verbose: Whether to print intermediate results (default false)
Knockoffs.solve_max_entropy — Method

solve_max_entropy(Σ::AbstractMatrix)

Solves the maximum entropy knockoff problem for fixed-X and model-X knockoffs given correlation matrix Σ. Users should call solve_s instead of this function.
Reference
Algorithm 2.2 from Powerful Knockoffs via Minimizing Reconstructability: https://arxiv.org/pdf/2011.14625.pdf
Note
There is a typo in algorithm for computing ME knockoffs in "Powerful knockoffs via minimizing reconstructability" by Spector, Asher, and Lucas Janson (2020). In the supplemental section, equation 59, they needed to evaluate c_m = D^t_{-j,j}D^{-1}_{-j,-j}D_{-j,j}. They claimed the FANOK paper ("FANOK: KNOCKOFFS IN LINEAR TIME" by Askari et al. (2020)) implies that c_m = ||v_m||^2 where Lv_m = u. However, according to section A.1.2 of the FANOK paper, it seems like the actual update should be D^t_{-j,j}D^{-1}_{-j,-j}D_{-j,j} = ζ*||c_m||^2 / (ζ + ||c_m||^2) where ζ = 2Σ_{jj} - s_j.
Knockoffs.solve_s — Method

solve_s(Σ::Symmetric, method::Symbol; m=1, kwargs...)

Solves the vector s for generating knockoffs. Σ can be a general covariance matrix but it must be wrapped in Symmetric.
Inputs
- Σ: A covariance matrix (one must wrap Symmetric(Σ) explicitly)
- method: Can be one of the following:
  - :mvr for minimum variance-based reconstructability knockoffs (alg 1 in ref 2)
  - :maxent for maximum entropy knockoffs (alg 2 in ref 2)
  - :equi for equi-distant knockoffs (eq 2.3 in ref 1)
  - :sdp for SDP knockoffs (eq 2.4 in ref 1)
  - :sdp_ccd for fast SDP knockoffs via coordinate descent (alg 2.2 in ref 3)
- m: Number of knockoffs per variable, defaults to 1.
- kwargs: Extra arguments available for specific methods. For example, to use a less stringent convergence tolerance for MVR knockoffs, specify tol = 0.001. For a list of available options, see solve_MVR, solve_max_entropy, solve_sdp_ccd, solve_SDP, or solve_equi
Reference
- "Controlling the false discovery rate via Knockoffs" by Barber and Candes (2015).
- "Powerful knockoffs via minimizing reconstructability" by Spector, Asher, and Lucas Janson (2020)
- "FANOK: Knockoffs in Linear Time" by Askari et al. (2020).
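Example

A minimal sketch comparing two solvers on the same correlation matrix (assuming simulate_AR1 as documented below):

using Knockoffs, LinearAlgebra
Σ = simulate_AR1(100)
s_mvr = solve_s(Symmetric(Σ), :mvr)      # vector s of length 100
s_equi = solve_s(Symmetric(Σ), :equi)    # equi-correlated solution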
Knockoffs.solve_s_graphical_group — Method

solve_s_graphical_group(Σ::Symmetric, groups::Vector{Int}, group_reps::Vector{Int}, method; [m], [verbose])

Solves the group knockoff problem where the convex optimization problem runs only on the representatives. The non-representative variables are assumed to be independent across groups when conditioning on the representatives.
Inputs
- Σ: Symmetric p × p covariance matrix
- groups: p-dimensional vector of group membership
- group_reps: Indices of the representatives
- method: Method for solving the group knockoff problem
- m: Number of knockoffs to generate per feature
- verbose: Whether to print informative intermediate results
- kwargs...: Extra arguments for solve_s_group
Outputs
- S: Matrix obtained from solving the optimization problem on the representatives.
- D: A p × p (dense) matrix corresponding to the S matrix for both the representative and non-representative variables. Knockoff sampling should use this matrix. If the graphical conditional independence assumption is satisfied exactly, this matrix should be sparse, but it is almost never sparse unless we use cond_indep_corr to force the covariance matrix to satisfy it.
- obj: Objective value from solving the optimization problem on the representatives.
Knockoffs.solve_s_group — Method

solve_s_group(Σ, groups, method; [m=1], kwargs...)

Solves the group knockoff problem; returns a block diagonal matrix S satisfying (m+1)/m*Σ - S ⪰ 0 where m is the number of knockoffs per feature.
Inputs
- Σ: A general covariance matrix wrapped by the Symmetric keyword
- groups: Vector of group membership, does not need to be contiguous
- method: Method for constructing knockoffs. Options include:
  - :maxent: (recommended) fully general maximum entropy group knockoffs
  - :mvr: fully general minimum variance-based reconstructability (MVR) group knockoffs
  - :equi: equi-correlated knockoffs. This is the methodology proposed in Dai R, Barber R. The knockoff filter for FDR control in group-sparse and multitask regression. International conference on machine learning 2016 Jun 11 (pp. 1851-1859). PMLR.
  - :sdp: fully general SDP group knockoffs based on coordinate descent
  - :sdp_subopt: chooses each block S_{i} = γ_i * Σ_{ii}. This slightly generalizes the equi-correlated group knockoff idea proposed in Dai and Barber 2016.
  - :sdp_block: fully general SDP group knockoffs where each block is solved exactly using an interior point solver
- m: Number of knockoffs per variable, defaults to 1.
- kwargs: Extra arguments available for specific methods. For example, to use a less stringent convergence tolerance, specify tol = 0.001. For a list of available options, see solve_group_mvr_hybrid, solve_group_max_entropy_hybrid, solve_group_sdp_hybrid, or solve_group_equi
Output
- S: A matrix solved so that (m+1)/m*Σ - S ⪰ 0 and S ⪰ 0
- γ: A vector that is only non-empty for the equi and suboptimal knockoff constructions. The entries correspond to values of γ where S_{gg} = γΣ_{gg}. For equi, the vector has length 1; for SDP, the vector has length equal to the number of groups
- obj: Final SDP/MVR/ME objective value given S. Equi-correlated group knockoffs and singleton (non-grouped) knockoffs return 0 because they either have no objective value or it is not necessary to evaluate the objective
Warning
This function potentially permutes the columns/rows of Σ, and puts them back at the end. Thus one should NOT call solve_s_group on the same Σ simultaneously, e.g. in a multithreaded for loop. Permutation does not happen when groups are contiguous.
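Example

A minimal sketch with contiguous groups (assuming simulate_AR1 as documented below; the outputs follow the order listed above):

using Knockoffs, LinearAlgebra
Σ = simulate_AR1(90)
groups = repeat(1:30, inner = 3)    # 30 contiguous groups of size 3
S, γ, obj = solve_s_group(Symmetric(Σ), groups, :maxent)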
Knockoffs.solve_sdp_ccd — Method

solve_sdp_ccd(Σ::AbstractMatrix)

Solves the SDP problem for fixed-X and model-X knockoffs using coordinate descent, given correlation matrix Σ. Users should call solve_s instead of this function.
Reference
Algorithm 2.2 from "FANOK: Knockoffs in Linear Time" by Askari et al. (2020).
Knockoffs.threshold — Method

threshold(w::AbstractVector, q::Number, [method=:knockoff], [m::Int=1])

Chooses a threshold τ > 0 as one of the following:

τ = min{ t > 0 : #{j: w[j] ≤ -t} / #{j: w[j] ≥ t} ≤ q } (method=:knockoff)
τ = min{ t > 0 : (1 + #{j: w[j] ≤ -t}) / #{j: w[j] ≥ t} ≤ q } (method=:knockoff_plus)
Inputs
- w: Vector of feature importance statistics
- q: Target FDR (between 0 and 1)
- method: Either :knockoff or :knockoff_plus (default)
- rej_bounds: Number of top values of W to consider (default = 10000)
Reference:
Equation 3.10 (method=:knockoff) or 3.11 (method=:knockoff_plus) of "Panning for Gold: Model-X Knockoffs for High-dimensional Controlled Variable Selection" by Candes, Fan, Janson, and Lv (2018)
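Example

A minimal sketch with toy statistics (all values below are illustrative):

using Knockoffs
W = [3.0, -1.2, 2.5, -0.4, 1.8, 0.9, -2.2, 1.1]    # toy feature importance statistics
τ = threshold(W, 0.25, :knockoff)
selected = findall(w -> w ≥ τ, W)                  # indices passing the threshold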
Knockoffs.update_normalizing_constants! — Method

update_normalizing_constants!(Q::AbstractMatrix{T}, q::AbstractVector{T})

Computes the normalizing constants recursively using equation (5).
Inputs
- Q: K × K × p array. Q[:, :, j] is a K × K matrix of transition probabilities for the jth state, i.e. Q[l, k, j] = P(X_j = k | X_{j-1} = l). The first transition matrix is not used.
- q: K × 1 vector of initial probabilities
Todo: efficiency
Knockoffs.Knockoff — Type

A Knockoff holds the original design matrix X, along with its knockoff Xko.
Knockoffs.KnockoffFilter — Type

A KnockoffFilter is essentially a Knockoff that has gone through a feature selection procedure, such as the Lasso. It stores, among other things, the final estimated parameters beta after applying the knockoff-filter procedure.

The debiased variable is a boolean indicating whether the estimated effect sizes have been debiased with Lasso. The W vector stores the feature importance statistics that satisfy the flip-coin property. tau is the knockoff threshold, which controls the empirical FDR at level q.
Knockoffs.MarkovChainTable — Type

Genotype states are index pairs (ka, kb), where (ka, kb) is an unordered pair of haplotypes 1 and 2. If there are K=5 haplotype motifs, then the 15 possible genotype states and their indices are

(1, 1) = 1
(1, 2) = 2   (2, 2) = 6
(1, 3) = 3   (2, 3) = 7   (3, 3) = 10
(1, 4) = 4   (2, 4) = 8   (3, 4) = 11   (4, 4) = 13
(1, 5) = 5   (2, 5) = 9   (3, 5) = 12   (4, 5) = 14   (5, 5) = 15