RNA-seq data for ~70,000 human samples have been aligned using a single analytic pipeline called Rail-RNA, developed and implemented by Abhi Nellore. In an effort spearheaded by Leo Collado-Torres and involving many in our group, these data have been processed and made available in a resource called recount. While the expression data are publicly available, critical phenotype information is missing for many of the samples included in this resource. In addition to identifying technical artifacts to be removed across these data, I’m developing phenotype predictors (e.g., sex, tissue) from the gene expression data to make important sample information available across all samples within recount.