Résumé

Motivation : Generation of genotype data has been growing exponentially over the last decade. With the large size of recent datasets comes a storage and computational burden with ever increasing costs. To reduce this burden, we propose XSI, a file format with reduced storage footprint that also allows computation on the compressed data and we show how this can improve future analyses. Results : We show that xSqueezeIt (XSI) allows for a file size reduction of 4−20× compared with compressed BCF and demonstrate its potential for ‘compressive genomics’ on the UK Biobank whole-genome sequencing genotypes with 8× faster loading times, 5× faster run of homozygozity computation, 30× faster dot products computation and 280× faster allele counts. Availability and implementation : The XSI file format specifications, API and command line tool are released under open-source (MIT) license and are available at https://github.com/rwk-unil/xSqueezeIt . Supplementary information : Supplementary data are available at Bioinformatics online.

Détails

Actions

PDF