IPAD knockoffs
This tutorial generates knockoffs based on the intertwined probabilistic factors decoupling (IPAD) method, described in the following paper
Fan, Yingying, Jinchi Lv, Mahrad Sharifvaghefi, and Yoshimasa Uematsu. "IPAD: stable interpretable forecasting with knockoffs inference." Journal of the American Statistical Association 115, no. 532 (2020): 1822-1834.
As we will see shortly,
- IPAD knockoffs are more powerful and much more efficient to generate than model-X MVR/ME/SDP knockoffs, if we can assume a low-dimensional factored model for the data matrix
X
. - If
X
doesn't really assume a factored model, then IPAD knockoff's empirical FDR will be inflated, sometimes severely so.
Our comparison uses target FDR 10%, Lasso coefficient difference statistic, and we explore 3 ways (ER, GR, VR) to choose the number of latent factors r
for IPAD knockoffs.
# load packages
using Knockoffs
using LinearAlgebra
using Random
using StatsKit
using ToeplitzMatrices
using Distributions
using Random
using CSV, DataFrames
Test1: Simulate under factored model
This is design 1 of IPAD: Stable Interpretable Forecasting with Knockoffs Inference.
- Here $n=500, p=1000$
- $X = F\Lambda' + \sqrt{r\theta}E$ where $r = 3, \theta=1$ and $F, \Lambda, E$ are $n \times r, p \times r$, and $n \times p$ iid gaussian matrices.
- $y = X\beta + \sqrt{c}\epsilon$ where $c = 0.2$
- 50 causal variables with $\beta_i = A = 0.1$
- For MVR/ME/SDP, we generate 2nd order knockoffs by estimating a shrinked covariance matrix
function compare_ipad(nsims)
n = 500 # number of samples
p = 1000 # number of covariates
m = 1 # number of knockoffs per variable
k = 50 # number of causal variables
rtrue = 3 # true rank
A = 0.1 # causal beta
θ = 1
c = 0.2 # some noise term for simulating y
sdp_powers, sdp_fdrs, sdp_times = 0.0, 0.0, 0.0
me_powers, me_fdrs, me_times = 0.0, 0.0, 0.0
mvr_powers, mvr_fdrs, mvr_times = 0.0, 0.0, 0.0
ipad_er_powers, ipad_er_fdrs, ipad_er_times = 0.0, 0.0, 0.0
ipad_gr_powers, ipad_gr_fdrs, ipad_gr_times = 0.0, 0.0, 0.0
ipad_ve_powers, ipad_ve_fdrs, ipad_ve_times = 0.0, 0.0, 0.0
for seed in 1:nsims
# simulate X
Random.seed!(seed)
F = randn(n, rtrue)
Λ = randn(p, rtrue)
C = F * Λ'
E = randn(n, p)
X = C + sqrt(rtrue * θ) * E
# simulate y
Random.seed!(seed)
βtrue = zeros(p)
βtrue[1:k] .= rand(-1:2:1, k) .* A
shuffle!(βtrue)
ϵ = randn(n)
y = X * βtrue + sqrt(c) .* ϵ
μ = zeros(p)
correct_position = findall(!iszero, βtrue)
# ipad with ER
Random.seed!(seed)
ipad_er_t = @elapsed ipad_er = ipad(X, r_method = :er, m = m)
ipad_er_ko_filter = fit_lasso(y, ipad_er)
selected = ipad_er_ko_filter.selected[3]
ipad_er_power = length(selected ∩ correct_position) / k
ipad_er_fdr = length(setdiff(selected, correct_position)) / max(length(selected), 1)
#ipad with GR
Random.seed!(seed)
ipad_gr_t = @elapsed ipad_gr = ipad(X, r_method = :gr, m = m)
ipad_gr_ko_filter = fit_lasso(y, ipad_gr)
selected = ipad_gr_ko_filter.selected[3]
ipad_gr_power = length(selected ∩ correct_position) / k
ipad_gr_fdr = length(setdiff(selected, correct_position)) / max(length(selected), 1)
# ipad with VE
Random.seed!(seed)
ipad_ve_t = @elapsed ipad_ve = ipad(X, r_method = :ve, m = m)
ipad_ve_ko_filter = fit_lasso(y, ipad_ve)
selected = ipad_ve_ko_filter.selected[3]
ipad_ve_power = length(selected ∩ correct_position) / k
ipad_ve_fdr = length(setdiff(selected, correct_position)) / max(length(selected), 1)
# ME knockoffs
Random.seed!(seed)
me_t = @elapsed me = modelX_gaussian_knockoffs(X, :maxent, m = m)
me_ko_filter = fit_lasso(y, me)
selected = me_ko_filter.selected[3]
me_power = length(selected ∩ correct_position) / k
me_fdr = length(setdiff(selected, correct_position)) / max(length(selected), 1)
# MVR knockoffs
Random.seed!(seed)
mvr_t = @elapsed mvr = modelX_gaussian_knockoffs(X, :mvr, m = m)
mvr_ko_filter = fit_lasso(y, mvr)
selected = mvr_ko_filter.selected[3]
mvr_power = length(selected ∩ correct_position) / k
mvr_fdr = length(setdiff(selected, correct_position)) / max(length(selected), 1)
# SDP (ccd) knockoffs
Random.seed!(seed)
sdp_t = @elapsed sdp = modelX_gaussian_knockoffs(X, :sdp_ccd, m = m)
sdp_ko_filter = fit_lasso(y, sdp)
selected = sdp_ko_filter.selected[3]
sdp_power = length(selected ∩ correct_position) / k
sdp_fdr = length(setdiff(selected, correct_position)) / max(length(selected), 1)
# record results
println("ipad ER: power = $ipad_er_power, FDR = $ipad_er_fdr, rank = $(ipad_er.r), time = $(round(ipad_er_t, digits=3))")
println("ipad GR: power = $ipad_gr_power, FDR = $ipad_gr_fdr, rank = $(ipad_gr.r), time = $(round(ipad_gr_t, digits=3))")
println("ipad VE: power = $ipad_ve_power, FDR = $ipad_ve_fdr, rank = $(ipad_ve.r), time = $(round(ipad_ve_t, digits=3))")
println("ME: power = $me_power, FDR = $me_fdr, time = $(round(me_t, digits=3))")
println("MVR: power = $mvr_power, FDR = $mvr_fdr, time = $(round(mvr_t, digits=3))")
println("SDP: power = $sdp_power, FDR = $sdp_fdr, time = $(round(sdp_t, digits=3))\n\n")
sdp_powers += sdp_power
sdp_fdrs += sdp_fdr
sdp_times += sdp_t
me_powers += me_power
me_fdrs += me_fdr
me_times += me_t
mvr_powers += mvr_power
mvr_fdrs += mvr_fdr
mvr_times += mvr_t
ipad_er_powers += ipad_er_power
ipad_er_fdrs += ipad_er_fdr
ipad_er_times += ipad_er_t
ipad_gr_powers += ipad_gr_power
ipad_gr_fdrs += ipad_gr_fdr
ipad_gr_times += ipad_gr_t
ipad_ve_powers += ipad_ve_power
ipad_ve_fdrs += ipad_ve_fdr
ipad_ve_times += ipad_ve_t
end
# compute average
ipad_er_powers /= nsims
ipad_er_fdrs /= nsims
ipad_er_times /= nsims
ipad_gr_powers /= nsims
ipad_gr_fdrs /= nsims
ipad_gr_times /= nsims
ipad_ve_powers /= nsims
ipad_ve_fdrs /= nsims
ipad_ve_times /= nsims
me_powers /= nsims
me_fdrs /= nsims
me_times /= nsims
mvr_powers /= nsims
mvr_fdrs /= nsims
mvr_times /= nsims
sdp_powers /= nsims
sdp_fdrs /= nsims
sdp_times /= nsims
# save in dataframe
result = DataFrame(
method=["IPAD-er", "IPAD-gr", "IPAD-ve", "ME", "MVR", "SDP"],
power=[ipad_er_powers,ipad_gr_powers,ipad_ve_powers,me_powers,mvr_powers,sdp_powers],
FDR=[ipad_er_fdrs,ipad_gr_fdrs,ipad_ve_fdrs,me_fdrs,mvr_fdrs,sdp_fdrs],
time=[ipad_er_times,ipad_gr_times,ipad_ve_times,me_times,mvr_times,sdp_times]
)
return result
end
nsims = 10
result = compare_ipad(nsims);
ipad ER: power = 0.98, FDR = 0.07547169811320754, rank = 3, time = 0.232
ipad GR: power = 0.98, FDR = 0.07547169811320754, rank = 3, time = 0.068
ipad VE: power = 1.0, FDR = 0.16666666666666666, rank = 257, time = 0.052
ME: power = 0.0, FDR = 0.0, time = 2.312
MVR: power = 0.0, FDR = 0.0, time = 2.953
SDP: power = 0.48, FDR = 0.0, time = 11.525
ipad ER: power = 1.0, FDR = 0.2647058823529412, rank = 3, time = 0.029
ipad GR: power = 1.0, FDR = 0.2647058823529412, rank = 3, time = 0.028
ipad VE: power = 1.0, FDR = 0.12280701754385964, rank = 263, time = 0.031
ME: power = 0.5, FDR = 0.0, time = 2.256
MVR: power = 0.4, FDR = 0.0, time = 2.944
SDP: power = 0.94, FDR = 0.0784313725490196, time = 11.536
ipad ER: power = 0.96, FDR = 0.02040816326530612, rank = 3, time = 0.043
ipad GR: power = 0.96, FDR = 0.02040816326530612, rank = 3, time = 0.027
ipad VE: power = 0.98, FDR = 0.09259259259259259, rank = 257, time = 0.031
ME: power = 0.66, FDR = 0.0, time = 2.268
MVR: power = 0.66, FDR = 0.0, time = 3.047
SDP: power = 0.8, FDR = 0.024390243902439025, time = 11.565
ipad ER: power = 0.94, FDR = 0.04081632653061224, rank = 3, time = 0.049
ipad GR: power = 0.94, FDR = 0.04081632653061224, rank = 3, time = 0.027
ipad VE: power = 0.92, FDR = 0.041666666666666664, rank = 260, time = 0.225
ME: power = 0.0, FDR = 0.0, time = 2.61
MVR: power = 0.0, FDR = 0.0, time = 2.943
SDP: power = 0.0, FDR = 0.0, time = 11.484
ipad ER: power = 1.0, FDR = 0.20634920634920634, rank = 3, time = 0.029
ipad GR: power = 1.0, FDR = 0.20634920634920634, rank = 3, time = 0.03
ipad VE: power = 0.98, FDR = 0.07547169811320754, rank = 256, time = 0.03
ME: power = 0.78, FDR = 0.025, time = 2.255
MVR: power = 0.64, FDR = 0.0, time = 3.009
SDP: power = 0.9, FDR = 0.1, time = 11.51
ipad ER: power = 1.0, FDR = 0.05660377358490566, rank = 3, time = 0.03
ipad GR: power = 1.0, FDR = 0.05660377358490566, rank = 3, time = 0.029
ipad VE: power = 0.9, FDR = 0.1346153846153846, rank = 253, time = 0.03
ME: power = 0.84, FDR = 0.023255813953488372, time = 2.294
MVR: power = 0.78, FDR = 0.025, time = 2.946
SDP: power = 0.76, FDR = 0.05, time = 11.468
ipad ER: power = 0.92, FDR = 0.09803921568627451, rank = 3, time = 0.033
ipad GR: power = 0.92, FDR = 0.09803921568627451, rank = 3, time = 0.036
ipad VE: power = 0.92, FDR = 0.041666666666666664, rank = 256, time = 0.029
ME: power = 0.54, FDR = 0.0, time = 2.257
MVR: power = 0.62, FDR = 0.0, time = 2.95
SDP: power = 0.8, FDR = 0.047619047619047616, time = 11.574
ipad ER: power = 1.0, FDR = 0.10714285714285714, rank = 3, time = 0.029
ipad GR: power = 1.0, FDR = 0.10714285714285714, rank = 3, time = 0.03
ipad VE: power = 1.0, FDR = 0.15254237288135594, rank = 260, time = 0.028
ME: power = 0.66, FDR = 0.0, time = 2.254
MVR: power = 0.66, FDR = 0.0, time = 3.003
SDP: power = 0.8, FDR = 0.0, time = 11.761
ipad ER: power = 1.0, FDR = 0.09090909090909091, rank = 3, time = 0.031
ipad GR: power = 1.0, FDR = 0.09090909090909091, rank = 3, time = 0.027
ipad VE: power = 1.0, FDR = 0.07407407407407407, rank = 257, time = 0.032
ME: power = 0.0, FDR = 0.0, time = 2.245
MVR: power = 0.0, FDR = 0.0, time = 2.952
SDP: power = 0.86, FDR = 0.0, time = 11.547
ipad ER: power = 0.94, FDR = 0.14545454545454545, rank = 3, time = 0.048
ipad GR: power = 0.94, FDR = 0.14545454545454545, rank = 3, time = 0.026
ipad VE: power = 0.94, FDR = 0.1896551724137931, rank = 255, time = 0.04
ME: power = 0.68, FDR = 0.0, time = 2.269
MVR: power = 0.7, FDR = 0.0, time = 2.962
SDP: power = 0.9, FDR = 0.021739130434782608, time = 11.542
# check average
@show result;
result = 6×4 DataFrame
Row │ method power FDR time
│ String Float64 Float64 Float64
─────┼──────────────────────────────────────────
1 │ IPAD-er 0.974 0.11059 0.055441
2 │ IPAD-gr 0.974 0.11059 0.032905
3 │ IPAD-ve 0.964 0.109176 0.0528907
4 │ ME 0.466 0.00482558 2.30208
5 │ MVR 0.446 0.0025 2.97096
6 │ SDP 0.724 0.032218 11.5513
Summary (when $X$ follows the IPAD model assumption)
- All methods control FDR (target = 10%)
- ER and GR method always find the correct rank (r = 3), while VE overestimates the rank
- IPAD method has much better power compared to model-X knockoffs via ME/MVR/SDP construction
- Surprisingly, ME/MVR has worse power than SDP
- IPAD method is much more efficient to construct
Test2: Try $X$ that's not a factored model
- Here $y \sim N(X\beta, 1)$ and $X_i \sim N(0, \Sigma)$ where $\Sigma$ is an AR(1) model.
- 50 causal SNPs with $\beta_i \sim \pm N(0, 0.5)$
- For MVR/ME/SDP knockoffs, we assume the true $\mu$ and $\Sigma$ are available.
function compare_ipad2(nsims)
n = 500 # number of samples
p = 1000 # number of covariates
m = 1 # number of knockoffs per variable
k = 50 # number of causal variables
sdp_powers, sdp_fdrs, sdp_times = 0.0, 0.0, 0.0
me_powers, me_fdrs, me_times = 0.0, 0.0, 0.0
mvr_powers, mvr_fdrs, mvr_times = 0.0, 0.0, 0.0
ipad_er_powers, ipad_er_fdrs, ipad_er_times = 0.0, 0.0, 0.0
ipad_gr_powers, ipad_gr_fdrs, ipad_gr_times = 0.0, 0.0, 0.0
ipad_ve_powers, ipad_ve_fdrs, ipad_ve_times = 0.0, 0.0, 0.0
for seed in 1:nsims
# simulate X
Random.seed!(seed)
Σ = simulate_AR1(p, a=3, b=1)
μ = zeros(p)
X = rand(MvNormal(μ, Σ), n)' |> Matrix
zscore!(X, mean(X, dims=1), std(X, dims=1))
# simulate y
Random.seed!(seed)
βtrue = zeros(p)
βtrue[1:k] .= rand(-1:2:1, k) .* rand(Normal(0, 0.5), k)
shuffle!(βtrue)
ϵ = randn(n)
y = X * βtrue + ϵ
μ = zeros(p)
correct_position = findall(!iszero, βtrue)
# ipad with ER
Random.seed!(seed)
ipad_er_t = @elapsed ipad_er = ipad(X, r_method = :er, m = m)
ipad_er_ko_filter = fit_lasso(y, ipad_er)
selected = ipad_er_ko_filter.selected[3]
ipad_er_power = length(selected ∩ correct_position) / k
ipad_er_fdr = length(setdiff(selected, correct_position)) / max(length(selected), 1)
#ipad with GR
Random.seed!(seed)
ipad_gr_t = @elapsed ipad_gr = ipad(X, r_method = :gr, m = m)
ipad_gr_ko_filter = fit_lasso(y, ipad_gr)
selected = ipad_gr_ko_filter.selected[3]
ipad_gr_power = length(selected ∩ correct_position) / k
ipad_gr_fdr = length(setdiff(selected, correct_position)) / max(length(selected), 1)
# ipad with VE
Random.seed!(seed)
ipad_ve_t = @elapsed ipad_ve = ipad(X, r_method = :ve, m = m)
ipad_ve_ko_filter = fit_lasso(y, ipad_ve)
selected = ipad_ve_ko_filter.selected[3]
ipad_ve_power = length(selected ∩ correct_position) / k
ipad_ve_fdr = length(setdiff(selected, correct_position)) / max(length(selected), 1)
# ME knockoffs
Random.seed!(seed)
me_t = @elapsed me = modelX_gaussian_knockoffs(X, :maxent, μ, Σ, m = m)
me_ko_filter = fit_lasso(y, me)
selected = me_ko_filter.selected[3]
me_power = length(selected ∩ correct_position) / k
me_fdr = length(setdiff(selected, correct_position)) / max(length(selected), 1)
# MVR knockoffs
Random.seed!(seed)
mvr_t = @elapsed mvr = modelX_gaussian_knockoffs(X, :mvr, μ, Σ,m = m)
mvr_ko_filter = fit_lasso(y, mvr)
selected = mvr_ko_filter.selected[3]
mvr_power = length(selected ∩ correct_position) / k
mvr_fdr = length(setdiff(selected, correct_position)) / max(length(selected), 1)
# SDP (ccd) knockoffs
Random.seed!(seed)
sdp_t = @elapsed sdp = modelX_gaussian_knockoffs(X, :sdp_ccd, μ, Σ, m = m)
sdp_ko_filter = fit_lasso(y, sdp)
selected = sdp_ko_filter.selected[3]
sdp_power = length(selected ∩ correct_position) / k
sdp_fdr = length(setdiff(selected, correct_position)) / max(length(selected), 1)
# record results
println("ipad ER: power = $ipad_er_power, FDR = $ipad_er_fdr, rank = $(ipad_er.r), time = $(round(ipad_er_t, digits=3))")
println("ipad GR: power = $ipad_gr_power, FDR = $ipad_gr_fdr, rank = $(ipad_gr.r), time = $(round(ipad_gr_t, digits=3))")
println("ipad VE: power = $ipad_ve_power, FDR = $ipad_ve_fdr, rank = $(ipad_ve.r), time = $(round(ipad_ve_t, digits=3))")
println("ME: power = $me_power, FDR = $me_fdr, time = $(round(me_t, digits=3))")
println("MVR: power = $mvr_power, FDR = $mvr_fdr, time = $(round(mvr_t, digits=3))")
println("SDP: power = $sdp_power, FDR = $sdp_fdr, time = $(round(sdp_t, digits=3))\n\n")
sdp_powers += sdp_power
sdp_fdrs += sdp_fdr
sdp_times += sdp_t
me_powers += me_power
me_fdrs += me_fdr
me_times += me_t
mvr_powers += mvr_power
mvr_fdrs += mvr_fdr
mvr_times += mvr_t
ipad_er_powers += ipad_er_power
ipad_er_fdrs += ipad_er_fdr
ipad_er_times += ipad_er_t
ipad_gr_powers += ipad_gr_power
ipad_gr_fdrs += ipad_gr_fdr
ipad_gr_times += ipad_gr_t
ipad_ve_powers += ipad_ve_power
ipad_ve_fdrs += ipad_ve_fdr
ipad_ve_times += ipad_ve_t
end
# compute average
ipad_er_powers /= nsims
ipad_er_fdrs /= nsims
ipad_er_times /= nsims
ipad_gr_powers /= nsims
ipad_gr_fdrs /= nsims
ipad_gr_times /= nsims
ipad_ve_powers /= nsims
ipad_ve_fdrs /= nsims
ipad_ve_times /= nsims
me_powers /= nsims
me_fdrs /= nsims
me_times /= nsims
mvr_powers /= nsims
mvr_fdrs /= nsims
mvr_times /= nsims
sdp_powers /= nsims
sdp_fdrs /= nsims
sdp_times /= nsims
# save in dataframe
result = DataFrame(
method=["IPAD-er", "IPAD-gr", "IPAD-ve", "ME", "MVR", "SDP"],
power=[ipad_er_powers,ipad_gr_powers,ipad_ve_powers,me_powers,mvr_powers,sdp_powers],
FDR=[ipad_er_fdrs,ipad_gr_fdrs,ipad_ve_fdrs,me_fdrs,mvr_fdrs,sdp_fdrs],
time=[ipad_er_times,ipad_gr_times,ipad_ve_times,me_times,mvr_times,sdp_times]
)
return result
end
nsims = 10
result = compare_ipad2(nsims);
ipad ER: power = 0.0, FDR = 0.0, rank = 496, time = 0.052
ipad GR: power = 0.64, FDR = 0.1794871794871795, rank = 1, time = 0.039
ipad VE: power = 0.62, FDR = 0.20512820512820512, rank = 217, time = 0.044
ME: power = 0.28, FDR = 0.0, time = 2.299
MVR: power = 0.26, FDR = 0.0, time = 3.286
SDP: power = 0.4, FDR = 0.0, time = 8.727
ipad ER: power = 0.5, FDR = 0.24242424242424243, rank = 1, time = 0.043
ipad GR: power = 0.5, FDR = 0.24242424242424243, rank = 1, time = 0.227
ipad VE: power = 0.5, FDR = 0.24242424242424243, rank = 219, time = 0.041
ME: power = 0.0, FDR = 0.0, time = 2.627
MVR: power = 0.2, FDR = 0.0, time = 3.515
SDP: power = 0.38, FDR = 0.05, time = 8.974
ipad ER: power = 0.58, FDR = 0.21621621621621623, rank = 1, time = 0.039
ipad GR: power = 0.58, FDR = 0.21621621621621623, rank = 1, time = 0.031
ipad VE: power = 0.62, FDR = 0.24390243902439024, rank = 217, time = 0.078
ME: power = 0.5, FDR = 0.10714285714285714, time = 2.364
MVR: power = 0.48, FDR = 0.07692307692307693, time = 3.309
SDP: power = 0.0, FDR = 0.0, time = 8.754
ipad ER: power = 0.46, FDR = 0.14814814814814814, rank = 480, time = 0.049
ipad GR: power = 0.62, FDR = 0.29545454545454547, rank = 10, time = 0.035
ipad VE: power = 0.66, FDR = 0.2826086956521739, rank = 213, time = 0.049
ME: power = 0.54, FDR = 0.12903225806451613, time = 2.489
MVR: power = 0.46, FDR = 0.08, time = 3.403
SDP: power = 0.4, FDR = 0.09090909090909091, time = 9.418
ipad ER: power = 0.56, FDR = 0.2, rank = 1, time = 0.058
ipad GR: power = 0.56, FDR = 0.2, rank = 1, time = 0.077
ipad VE: power = 0.56, FDR = 0.2222222222222222, rank = 217, time = 0.073
ME: power = 0.38, FDR = 0.13636363636363635, time = 2.52
MVR: power = 0.5, FDR = 0.16666666666666666, time = 3.628
SDP: power = 0.4, FDR = 0.09090909090909091, time = 8.948
ipad ER: power = 0.0, FDR = 0.0, rank = 493, time = 0.039
ipad GR: power = 0.72, FDR = 0.25, rank = 3, time = 0.048
ipad VE: power = 0.74, FDR = 0.3508771929824561, rank = 218, time = 0.039
ME: power = 0.64, FDR = 0.058823529411764705, time = 2.555
MVR: power = 0.64, FDR = 0.058823529411764705, time = 3.265
SDP: power = 0.62, FDR = 0.06060606060606061, time = 8.543
ipad ER: power = 0.76, FDR = 0.3448275862068966, rank = 3, time = 0.029
ipad GR: power = 0.76, FDR = 0.3448275862068966, rank = 3, time = 0.04
ipad VE: power = 0.76, FDR = 0.36666666666666664, rank = 214, time = 0.064
ME: power = 0.6, FDR = 0.09090909090909091, time = 2.302
MVR: power = 0.6, FDR = 0.11764705882352941, time = 3.329
SDP: power = 0.22, FDR = 0.0, time = 8.788
ipad ER: power = 0.0, FDR = 0.0, rank = 495, time = 0.064
ipad GR: power = 0.58, FDR = 0.23684210526315788, rank = 2, time = 0.036
ipad VE: power = 0.56, FDR = 0.2, rank = 213, time = 0.034
ME: power = 0.0, FDR = 0.0, time = 2.239
MVR: power = 0.0, FDR = 0.0, time = 3.351
SDP: power = 0.4, FDR = 0.13043478260869565, time = 8.523
ipad ER: power = 0.68, FDR = 0.2916666666666667, rank = 1, time = 0.047
ipad GR: power = 0.68, FDR = 0.2916666666666667, rank = 1, time = 0.036
ipad VE: power = 0.7, FDR = 0.3137254901960784, rank = 218, time = 0.034
ME: power = 0.0, FDR = 0.0, time = 2.29
MVR: power = 0.0, FDR = 0.0, time = 3.332
SDP: power = 0.0, FDR = 0.0, time = 8.96
ipad ER: power = 0.6, FDR = 0.375, rank = 1, time = 0.105
ipad GR: power = 0.6, FDR = 0.375, rank = 1, time = 0.025
ipad VE: power = 0.54, FDR = 0.38636363636363635, rank = 213, time = 0.173
ME: power = 0.44, FDR = 0.15384615384615385, time = 2.265
MVR: power = 0.36, FDR = 0.1, time = 3.223
SDP: power = 0.0, FDR = 0.0, time = 8.907
@show result;
result = 6×4 DataFrame
Row │ method power FDR time
│ String Float64 Float64 Float64
─────┼────────────────────────────────────────
1 │ IPAD-er 0.414 0.181828 0.0524697
2 │ IPAD-gr 0.624 0.263192 0.059193
3 │ IPAD-ve 0.626 0.281392 0.0629533
4 │ ME 0.338 0.0676118 2.39515
5 │ MVR 0.35 0.060006 3.36422
6 │ SDP 0.282 0.0422859 8.85414
Summary (when $X$ does not follow the factored model)
- IPAD methods have slightly~pretty inflated FDR (target = 10%)
- ER/GR/VE finds wildly differing ranks
Test3: Simulate with gnomAD panel
- Here we simulate $X_i \sim N(0, \Sigma)$ where $\Sigma$ is from the European gnomAD LD panel.
- $\Sigma$ can be downloaded and extract with the software EasyLD.jl.
function compare_ipad3(nsims)
# import panel
datadir = "/Users/biona001/Benjamin_Folder/research/4th_project_groupKO/group_knockoff_test_data"
mapfile = CSV.read(joinpath(datadir, "Groups_2_127374341_128034347.txt"), DataFrame)
groups = mapfile[!, :group]
covfile = CSV.read(joinpath(datadir, "CorG_2_127374341_128034347.txt"), DataFrame)
Σ = covfile |> Matrix{Float64}
Σ = 0.99Σ + 0.01I #ensure PSD
# test on smaller data
idx = findlast(x -> x == 263, groups) # 263 is largest group with 192 members
groups = groups[1:idx]
Σ = Σ[1:idx, 1:idx]
# simulation parameters
n = 500 # number of samples
p = size(Σ, 1) # number of covariates
m = 1 # number of knockoffs per variable
k = 50 # number of causal variables
sdp_powers, sdp_fdrs, sdp_times = 0.0, 0.0, 0.0
me_powers, me_fdrs, me_times = 0.0, 0.0, 0.0
mvr_powers, mvr_fdrs, mvr_times = 0.0, 0.0, 0.0
ipad_er_powers, ipad_er_fdrs, ipad_er_times = 0.0, 0.0, 0.0
ipad_gr_powers, ipad_gr_fdrs, ipad_gr_times = 0.0, 0.0, 0.0
ipad_ve_powers, ipad_ve_fdrs, ipad_ve_times = 0.0, 0.0, 0.0
for seed in 1:nsims
# simulate X
Random.seed!(seed)
X = rand(MvNormal(Σ), n)' |> Matrix
zscore!(X, mean(X, dims=1), std(X, dims=1));
# simulate y
Random.seed!(seed)
βtrue = zeros(p)
βtrue[1:k] .= rand(-1:2:1, k) .* randn(k)
shuffle!(βtrue)
ϵ = randn(n)
y = X * βtrue + ϵ
μ = zeros(p)
correct_position = findall(!iszero, βtrue)
# ipad with ER
Random.seed!(seed)
ipad_er_t = @elapsed ipad_er = ipad(X, r_method = :er, m = m)
ipad_er_ko_filter = fit_lasso(y, ipad_er)
selected = ipad_er_ko_filter.selected[3]
ipad_er_power = length(selected ∩ correct_position) / k
ipad_er_fdr = length(setdiff(selected, correct_position)) / max(length(selected), 1)
#ipad with GR
Random.seed!(seed)
ipad_gr_t = @elapsed ipad_gr = ipad(X, r_method = :gr, m = m)
ipad_gr_ko_filter = fit_lasso(y, ipad_gr)
selected = ipad_gr_ko_filter.selected[3]
ipad_gr_power = length(selected ∩ correct_position) / k
ipad_gr_fdr = length(setdiff(selected, correct_position)) / max(length(selected), 1)
# ipad with VE
Random.seed!(seed)
ipad_ve_t = @elapsed ipad_ve = ipad(X, r_method = :ve, m = m)
ipad_ve_ko_filter = fit_lasso(y, ipad_ve)
selected = ipad_ve_ko_filter.selected[3]
ipad_ve_power = length(selected ∩ correct_position) / k
ipad_ve_fdr = length(setdiff(selected, correct_position)) / max(length(selected), 1)
# ME knockoffs
Random.seed!(seed)
me_t = @elapsed me = modelX_gaussian_knockoffs(X, :maxent, m = m)
me_ko_filter = fit_lasso(y, me)
selected = me_ko_filter.selected[3]
me_power = length(selected ∩ correct_position) / k
me_fdr = length(setdiff(selected, correct_position)) / max(length(selected), 1)
# MVR knockoffs
Random.seed!(seed)
mvr_t = @elapsed mvr = modelX_gaussian_knockoffs(X, :mvr, m = m)
mvr_ko_filter = fit_lasso(y, mvr)
selected = mvr_ko_filter.selected[3]
mvr_power = length(selected ∩ correct_position) / k
mvr_fdr = length(setdiff(selected, correct_position)) / max(length(selected), 1)
# SDP (ccd) knockoffs
Random.seed!(seed)
sdp_t = @elapsed sdp = modelX_gaussian_knockoffs(X, :sdp_ccd, m = m)
sdp_ko_filter = fit_lasso(y, sdp)
selected = sdp_ko_filter.selected[3]
sdp_power = length(selected ∩ correct_position) / k
sdp_fdr = length(setdiff(selected, correct_position)) / max(length(selected), 1)
# record results
println("ipad ER: power = $ipad_er_power, FDR = $ipad_er_fdr, rank = $(ipad_er.r), time = $(round(ipad_er_t, digits=3))")
println("ipad GR: power = $ipad_gr_power, FDR = $ipad_gr_fdr, rank = $(ipad_gr.r), time = $(round(ipad_gr_t, digits=3))")
println("ipad VE: power = $ipad_ve_power, FDR = $ipad_ve_fdr, rank = $(ipad_ve.r), time = $(round(ipad_ve_t, digits=3))")
println("ME: power = $me_power, FDR = $me_fdr, time = $(round(me_t, digits=3))")
println("MVR: power = $mvr_power, FDR = $mvr_fdr, time = $(round(mvr_t, digits=3))")
println("SDP: power = $sdp_power, FDR = $sdp_fdr, time = $(round(sdp_t, digits=3))\n\n")
sdp_powers += sdp_power
sdp_fdrs += sdp_fdr
sdp_times += sdp_t
me_powers += me_power
me_fdrs += me_fdr
me_times += me_t
mvr_powers += mvr_power
mvr_fdrs += mvr_fdr
mvr_times += mvr_t
ipad_er_powers += ipad_er_power
ipad_er_fdrs += ipad_er_fdr
ipad_er_times += ipad_er_t
ipad_gr_powers += ipad_gr_power
ipad_gr_fdrs += ipad_gr_fdr
ipad_gr_times += ipad_gr_t
ipad_ve_powers += ipad_ve_power
ipad_ve_fdrs += ipad_ve_fdr
ipad_ve_times += ipad_ve_t
end
# compute average
ipad_er_powers /= nsims
ipad_er_fdrs /= nsims
ipad_er_times /= nsims
ipad_gr_powers /= nsims
ipad_gr_fdrs /= nsims
ipad_gr_times /= nsims
ipad_ve_powers /= nsims
ipad_ve_fdrs /= nsims
ipad_ve_times /= nsims
me_powers /= nsims
me_fdrs /= nsims
me_times /= nsims
mvr_powers /= nsims
mvr_fdrs /= nsims
mvr_times /= nsims
sdp_powers /= nsims
sdp_fdrs /= nsims
sdp_times /= nsims
# save in dataframe
result = DataFrame(
method=["IPAD-er", "IPAD-gr", "IPAD-ve", "ME", "MVR", "SDP"],
power=[ipad_er_powers,ipad_gr_powers,ipad_ve_powers,me_powers,mvr_powers,sdp_powers],
FDR=[ipad_er_fdrs,ipad_gr_fdrs,ipad_ve_fdrs,me_fdrs,mvr_fdrs,sdp_fdrs],
time=[ipad_er_times,ipad_gr_times,ipad_ve_times,me_times,mvr_times,sdp_times]
)
return result
end
nsims = 10
result = compare_ipad3(nsims);
ipad ER: power = 0.54, FDR = 0.5645161290322581, rank = 1, time = 0.036
ipad GR: power = 0.54, FDR = 0.5645161290322581, rank = 1, time = 0.03
ipad VE: power = 0.54, FDR = 0.578125, rank = 90, time = 0.033
ME: power = 0.32, FDR = 0.1111111111111111, time = 6.315
MVR: power = 0.32, FDR = 0.1111111111111111, time = 6.39
SDP: power = 0.34, FDR = 0.05555555555555555, time = 22.637
ipad ER: power = 0.28, FDR = 0.6818181818181818, rank = 1, time = 0.036
ipad GR: power = 0.28, FDR = 0.6818181818181818, rank = 1, time = 0.035
ipad VE: power = 0.28, FDR = 0.6888888888888889, rank = 91, time = 0.029
ME: power = 0.2, FDR = 0.0, time = 7.001
MVR: power = 0.2, FDR = 0.0, time = 6.373
SDP: power = 0.24, FDR = 0.0, time = 22.658
ipad ER: power = 0.46, FDR = 0.43902439024390244, rank = 1, time = 0.029
ipad GR: power = 0.46, FDR = 0.43902439024390244, rank = 1, time = 0.029
ipad VE: power = 0.46, FDR = 0.4772727272727273, rank = 90, time = 0.029
ME: power = 0.44, FDR = 0.12, time = 6.205
MVR: power = 0.46, FDR = 0.14814814814814814, time = 6.395
SDP: power = 0.5, FDR = 0.24242424242424243, time = 22.824
ipad ER: power = 0.32, FDR = 0.5151515151515151, rank = 1, time = 0.039
ipad GR: power = 0.32, FDR = 0.5151515151515151, rank = 1, time = 0.039
ipad VE: power = 0.32, FDR = 0.5294117647058824, rank = 91, time = 0.042
ME: power = 0.28, FDR = 0.0, time = 6.303
MVR: power = 0.32, FDR = 0.058823529411764705, time = 6.45
SDP: power = 0.44, FDR = 0.2903225806451613, time = 22.526
ipad ER: power = 0.46, FDR = 0.41025641025641024, rank = 1, time = 0.028
ipad GR: power = 0.46, FDR = 0.41025641025641024, rank = 1, time = 0.027
ipad VE: power = 0.46, FDR = 0.425, rank = 91, time = 0.034
ME: power = 0.42, FDR = 0.16, time = 6.166
MVR: power = 0.42, FDR = 0.16, time = 6.561
SDP: power = 0.44, FDR = 0.12, time = 23.091
ipad ER: power = 0.56, FDR = 0.4716981132075472, rank = 1, time = 0.028
ipad GR: power = 0.56, FDR = 0.4716981132075472, rank = 1, time = 0.044
ipad VE: power = 0.56, FDR = 0.4716981132075472, rank = 90, time = 0.04
ME: power = 0.46, FDR = 0.11538461538461539, time = 6.062
MVR: power = 0.42, FDR = 0.08695652173913043, time = 6.837
SDP: power = 0.48, FDR = 0.14285714285714285, time = 23.626
ipad ER: power = 0.32, FDR = 0.627906976744186, rank = 1, time = 0.062
ipad GR: power = 0.32, FDR = 0.627906976744186, rank = 1, time = 0.046
ipad VE: power = 0.34, FDR = 0.6304347826086957, rank = 89, time = 0.05
ME: power = 0.0, FDR = 0.0, time = 6.593
MVR: power = 0.0, FDR = 0.0, time = 7.597
SDP: power = 0.0, FDR = 0.0, time = 22.777
ipad ER: power = 0.42, FDR = 0.6181818181818182, rank = 1, time = 0.029
ipad GR: power = 0.42, FDR = 0.6181818181818182, rank = 1, time = 0.044
ipad VE: power = 0.42, FDR = 0.6666666666666666, rank = 90, time = 0.069
ME: power = 0.2, FDR = 0.23076923076923078, time = 6.604
MVR: power = 0.2, FDR = 0.23076923076923078, time = 6.423
SDP: power = 0.2, FDR = 0.16666666666666666, time = 23.46
ipad ER: power = 0.58, FDR = 0.4727272727272727, rank = 1, time = 0.067
ipad GR: power = 0.58, FDR = 0.4727272727272727, rank = 1, time = 0.031
ipad VE: power = 0.58, FDR = 0.48214285714285715, rank = 90, time = 0.096
ME: power = 0.32, FDR = 0.0, time = 6.368
MVR: power = 0.3, FDR = 0.0, time = 6.641
SDP: power = 0.42, FDR = 0.0, time = 22.832
ipad ER: power = 0.36, FDR = 0.4857142857142857, rank = 1, time = 0.044
ipad GR: power = 0.36, FDR = 0.4857142857142857, rank = 1, time = 0.045
ipad VE: power = 0.36, FDR = 0.4857142857142857, rank = 91, time = 0.04
ME: power = 0.28, FDR = 0.0, time = 6.269
MVR: power = 0.28, FDR = 0.0, time = 6.526
SDP: power = 0.4, FDR = 0.09090909090909091, time = 23.831
@show result;
result = 6×4 DataFrame
Row │ method power FDR time
│ String Float64 Float64 Float64
─────┼─────────────────────────────────────────
1 │ IPAD-er 0.43 0.5287 0.0398029
2 │ IPAD-gr 0.43 0.5287 0.0370067
3 │ IPAD-ve 0.432 0.543536 0.0462691
4 │ ME 0.292 0.0737265 6.38856
5 │ MVR 0.292 0.0795809 6.61936
6 │ SDP 0.346 0.110874 23.0262
Summary (when $X$ is simulated based on real data)
- IPAD methods have extremely inflated FDR (target = 10%)
- model-X Knockoffs control FDR