Hi everyone, first ever blog post!

I am Benjamin Chu, a second-year Ph.D. student in the Department of Biomathematics at UCLA. This is my first time as a GSoC student, and my mentor is Kevin Keys.

I’m working on IHT.jl, a package that implements iterative hard thresholding (IHT). This algorithm is fast enough (it avoids Hessian matrices!) that we can analyze a standard biology dataset of today, of size $$\approx 10{,}000 \times 1{,}000{,}000$$, on a personal computer and finish within an hour or so. The idea has been around for maybe 10 to 20 years(?), but I guess it is new enough that it still doesn’t have a wiki page.
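To give a flavor of why IHT avoids Hessians, here is a minimal Python/NumPy sketch of the textbook IHT update for sparse least squares, $$x_{t+1} = H_k\big(x_t + \mu A^T (y - A x_t)\big)$$, where $$H_k$$ keeps the $$k$$ largest-magnitude entries of a vector and zeroes out the rest. This is not the IHT.jl implementation (which is written in Julia and handles GWAS-specific details). The function names, the fixed step size $$\mu = 1/\|A\|_2^2$$, and the iteration count are my own assumptions for illustration.

```python
import numpy as np

def hard_threshold(x, k):
    """H_k: keep the k entries of x with the largest magnitude, zero out the rest."""
    out = np.zeros_like(x)
    if k <= 0:
        return out
    idx = np.argsort(np.abs(x))[-k:]  # indices of the k largest |x_i|
    out[idx] = x[idx]
    return out

def iht(A, y, k, n_iter=200):
    """Sketch of IHT for min ||y - A x||^2 subject to x having at most k nonzeros.

    Assumption: a fixed step size 1 / ||A||_2^2 (squared spectral norm),
    chosen for simplicity; real implementations may pick the step adaptively.
    """
    p = A.shape[1]
    x = np.zeros(p)
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    for _ in range(n_iter):
        # Negative gradient of 0.5 * ||y - A x||^2: only two matrix-vector
        # products per iteration, so no p-by-p Hessian is ever formed.
        grad = A.T @ (y - A @ x)
        x = hard_threshold(x + step * grad, k)
    return x
```

Each iteration costs roughly two passes over the data matrix, which is what makes a $$10{,}000 \times 1{,}000{,}000$$ problem feasible on a laptop, whereas a Newton-type method would need a $$1{,}000{,}000 \times 1{,}000{,}000$$ Hessian.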

I’m planning to add 3 features to the current IHT package for analyzing GWAS (i.e. genetics) data. Some of these are more or less grind-throughs, such as learning to manipulate binary data files, while others, such as grouping predictors, are quite non-trivial for me in terms of how to actually implement them. But I live by the principle that if I try my best, life will figure out a way (most of the time), so I’ll probably be okay.

tl;dr: I’m working on genetics data, using math!