Introduction

This package conducts knockoff-based inference to perform genome-wide conditional independence tests based on GWAS summary statistics. The methodology is described in the following papers:

Chen Z, He Z, Chu BB, Gu J, Morrison T, Sabatti C, Candes E. "Controlled Variable Selection from Summary Statistics Only? A Solution via GhostKnockoffs and Penalized Regression", arXiv preprint arXiv:2402.12724 (2024); doi: https://doi.org/10.48550/arXiv.2402.12724

Chu BB, Gu J, Chen Z, Morrison T, Candes E, He Z, Sabatti C. "Second-order group knockoffs with applications to GWAS", arXiv preprint arXiv:2310.15069 (2023); doi: https://doi.org/10.48550/arXiv.2310.15069

He Z, Chu BB, Yang J, Gu J, Chen Z, Liu L, Morrison T, Belloy M, Qi X, Hejazi N, Mathur M, Le Guen Y, Tang H, Hastie T, Ionita-Laza I, Sabatti C, Candes E. "In silico identification of putative causal genetic variants", bioRxiv, 2024.02.28.582621; doi: https://doi.org/10.1101/2024.02.28.582621

The main working assumption is that we do not have access to individual-level genotype or phenotype data. Rather, for each SNP, we have its Z-score with respect to some phenotype from a GWAS, and access to LD (linkage disequilibrium) data. The user is expected to supply the Z-scores, while we supply the LD data along with some pre-computed knockoff data.

Q: When should I use GhostKnockoffGWAS?

Answer: Use it if you have already conducted a GWAS, have an output file that includes Z-scores (or equivalent) for each SNP, and the downloads page offers pre-processed LD files whose listed population matches the ethnicities in your original GWAS study.

  • If your original study had few (e.g. <5) discoveries, then GhostKnockoffGWAS may not give better results. The methodology works better for more polygenic traits.
  • If your study subjects are somewhat admixed, you can try the most suitable LD files and check how much your data deviate from them by examining the LD_shrinkage parameter in the output of GhostKnockoffGWAS; see this FAQ.
  • If you instead have individual-level genotypes, you should run a GWAS using standard tools (e.g. PLINK, BOLT, GCTA, SAIGE, GEMMA, etc.) before running GhostKnockoffGWAS.
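
The "Z scores (or equivalent)" requirement in the answer above often comes down to a GWAS output that reports an effect size and its standard error instead of a Z-score. In that case the Z-score is simply the effect size divided by the standard error. The following is a minimal sketch, not part of GhostKnockoffGWAS: the file names gwas_results.txt and gwas_with_z.txt, the toy rows, and the column names BETA and SE are all assumptions made for illustration. The result still has to be arranged into the accepted input format before it is passed to the software.

```shell
# Hypothetical toy input; replace it with your own GWAS output.
# The column names (in particular BETA and SE) are assumptions.
cat > gwas_results.txt <<'EOF'
CHR POS REF ALT BETA SE
1 1000 A G 0.10 0.05
1 2000 C T -0.30 0.10
EOF

# Find the BETA and SE columns by name in the header,
# then append a Z column computed as Z = BETA / SE.
awk 'NR == 1 {
         for (i = 1; i <= NF; i++) {
             if ($i == "BETA") b = i
             if ($i == "SE")   s = i
         }
         if (!b || !s) { print "BETA or SE column not found" > "/dev/stderr"; exit 1 }
         print $0, "Z"
         next
     }
     # Skip rows where the standard error is zero to avoid dividing by zero.
     $s == 0 { print "skipping line " NR ": SE is zero" > "/dev/stderr"; next }
     { printf "%s %.4f\n", $0, $b / $s }' gwas_results.txt > gwas_with_z.txt

cat gwas_with_z.txt
```

Looking up the columns by header name rather than by position keeps the sketch usable when a GWAS tool orders its columns differently. If your tool reports odds ratios, take the natural logarithm first to obtain the effect size.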

Quick Start

Most users are expected to follow this workflow. Detailed explanations for each step are available in the Tutorial.

  1. Go to the Download Page and download (1) the software and (2) the pre-processed LD files. For example,

     wget https://github.com/biona001/GhostKnockoffGWAS/releases/download/v0.2.2/app_linux_x86.tar.gz
     wget https://zenodo.org/records/10433663/files/EUR.zip
  2. Unzip them both:

     tar -xvzf app_linux_x86.tar.gz
     unzip EUR.zip  # decompresses to ~8.7GB
  3. Prepare your input Z-score file in the accepted format; see Acceptable Z-scores. A toy example can be downloaded by:

     wget https://github.com/biona001/GhostKnockoffGWAS/raw/main/data/example_zfile.txt
  4. Run the executable:

     app_linux_x86/bin/GhostKnockoffGWAS --zfile example_zfile.txt --LD-files EUR --N 506200 --genome-build 38 --out example_output
  5. Make a Manhattan plot with this R script. See step 5 in the Tutorial for more details.

Those familiar with the Julia programming language can use GhostKnockoffGWAS as a regular Julia package; see usage within Julia.
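
When step 4 of the Quick Start is run repeatedly, for example on several Z-score files, it can help to check the inputs before launching the executable. The sketch below is one possible wrapper and is not shipped with the software: the function name run_ghostknockoffgwas and its argument order are hypothetical, and the only command-line flags it uses are the ones shown in step 4.

```shell
# Hypothetical pre-flight wrapper around the step 4 command.
# Arguments: executable, Z-score file, LD files path, sample size,
#            genome build, output name.
run_ghostknockoffgwas() {
    local exe=$1 zfile=$2 ld_files=$3 n=$4 build=$5 out=$6

    # Fail early with a readable message instead of a late runtime error.
    [ -x "$exe" ]      || { echo "executable not found: $exe" >&2; return 1; }
    [ -s "$zfile" ]    || { echo "Z-score file missing or empty: $zfile" >&2; return 1; }
    [ -e "$ld_files" ] || { echo "LD files not found: $ld_files" >&2; return 1; }

    "$exe" --zfile "$zfile" --LD-files "$ld_files" --N "$n" \
           --genome-build "$build" --out "$out"
}

# Usage with the Quick Start values (uncomment once the downloads are in place):
# run_ghostknockoffgwas app_linux_x86/bin/GhostKnockoffGWAS \
#     example_zfile.txt EUR 506200 38 example_output
```

The function returns a non-zero status when a check fails, so it can be combined with `&&` or used in a script running under `set -e`.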

More general knockoff constructions

If you are interested in the broader knockoff methodology, not necessarily based on GWAS summary statistics, see for example