Oxford Brain Imaging Genetics Server - BIG40

Wellcome Centre for Integrative Neuroimaging (WIN/FMRIB), Oxford, UK
Department of Statistics and Actuarial Science, Simon Fraser University, Canada

Led by Lloyd T. Elliott (SFU) and Stephen Smith (Oxford)

Interactive PheWeb server live here

This open data server contains results from GWAS of almost 4,000 imaging-derived phenotypes from the multimodal brain imaging in UK Biobank. It is a major update to the original BIG server, using data from the 40,000 subject imaging data release from early 2020. The discovery sample size was 22,138 and the replication sample 11,086. Chromosomes 1:22 and X are included, resulting in associations with 17,103,079 SNPs.

The work was funded by Wellcome Trust. Compute resources were provided by the Oxford Biomedical Research Computing (BMRC) facility (a joint development between Oxford's Wellcome Centre for Human Genetics and Big Data Institute, supported by Health Data Research UK and the NIHR Oxford Biomedical Research Centre). This work was conducted in part using the UK Biobank Resource under Application Number 8107.

A preprint on the work is on bioRxiv (and see overview of Methods below).

Version history:
BIG40 24/04/20 Initial release. For a short period this will remain available here.
BIG40 29/06/20 Minor update with improved filtering on X chromosome SNPs, and provision of sample sizes for the pseudoautosomal and non-pseudoautosomal regions of the X chromosome.
BIG40 22/07/20 Minor correction to table of local peak associations (ChrX beta values were incorrect by a factor of 2).
BIG40 15/10/20 Added summary stats for all 33k (discovery+replication) subjects combined.

Brain imaging GWAS Data

Interactive PheWeb server: 33k subjects (discovery+replication pooled) and 22k subjects (discovery only)

Table of local-peak associations (-Log10(P) > 7.5):    Online table / Raw text

Table of IDPs (imaging-derived phenotypes) with individual IDPs' Manhattan plots
This includes names and descriptions of all IDPs, and categorisations into 16 structural and functional IDP categories (plus 1 QC category).
The table also includes links to a Manhattan plot for each IDP (column 1), and links to each IDP's UKB Showcase variable page (column 2).
The rightmost columns show the exact sample sizes per IDP, which vary slightly due to different patterns of missing data for different imaging modalities. Sample sizes for X chromosome associations vary further due to additional X chromosome exclusions; these are shown in the "par" and "nonpar" columns. Separate values of N are given for the discovery dataset ("disc"), the replication dataset ("rep") and all subjects combined ("all").

Combined PDF with all Manhattan plots (3,935 pages, 0.75GB)

Table of all variants (SNPs, etc.)    Compressed raw text table download only (due to size)
This has the following information for each variant: chr rsid pos a1 a2 af info
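
As a minimal sketch of how one row of this table might be read, the Python below maps the seven listed columns onto a typed record. The whitespace delimiter, the field types, and the names `Variant` and `parse_variant_line` are assumptions made for illustration and are not taken from the server documentation; the example row is invented.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """One row of the variants table: chr rsid pos a1 a2 af info."""
    chrom: str   # kept as a string so that "X" is representable
    rsid: str
    pos: int
    a1: str
    a2: str
    af: float    # allele frequency
    info: float  # imputation INFO score


def parse_variant_line(line: str) -> Variant:
    """Parse one data row, assuming whitespace-separated fields."""
    fields = line.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}")
    chrom, rsid, pos, a1, a2, af, info = fields
    return Variant(chrom, rsid, int(pos), a1, a2, float(af), float(info))


# Hypothetical row, for illustration only (not real data).
example = parse_variant_line("01 rs0000001 12345 A G 0.25 0.98")
print(example.rsid, example.pos, example.af)
```

If the file starts with a header row naming the columns, that row would need to be skipped before calling the parser.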

Summary stats downloads
Sumstats from 33k subjects (discovery and replication datasets combined)
The download for IDP 1 is:    release2/stats33k/0001.txt.gz
The download can be automated with curl:    curl -O -L -C - https://open.win.ox.ac.uk/ukbiobank/big40/release2/stats33k/0001.txt.gz
Sumstats from 22k discovery-sample subjects
Use links such as:    release2/stats/0001.txt.gz
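
To script downloads for several IDPs, the link pattern above can be turned into a small URL builder. This is a sketch only: it assumes that every IDP file follows the four-digit zero-padded naming shown for IDP 1, and that the relative 22k link resolves against the same base URL as the 33k curl example. The names `BASE_URL` and `sumstats_url` are illustrative.

```python
BASE_URL = "https://open.win.ox.ac.uk/ukbiobank/big40/"


def sumstats_url(idp: int, pooled: bool = True) -> str:
    """Build the download URL for one IDP's summary statistics.

    pooled=True  -> 33k discovery+replication results (release2/stats33k)
    pooled=False -> 22k discovery-only results (release2/stats)
    Assumes four-digit zero-padded file names, as in 0001.txt.gz.
    """
    if idp < 1:
        raise ValueError("IDP numbers start at 1")
    subdir = "release2/stats33k" if pooled else "release2/stats"
    return f"{BASE_URL}{subdir}/{idp:04d}.txt.gz"


# Print the URLs for the first three IDPs (no network access is performed).
for idp in range(1, 4):
    print(sumstats_url(idp))
```

Each printed URL can then be passed to curl with the same options as in the example above.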


Overview of Methods

Brain imaging data were from the 40,000-participant release from early 2020, as processed by WIN/FMRIB on behalf of UK Biobank (Alfaro-Almagro, NeuroImage, 2018). We used all 3,929 IDPs and QC measures available from UKB, as well as 6 derived summary connectivity features (see Elliott, Nature, 2018 and code), giving 3,935 phenotypes in total. IDPs were deconfounded for an expanded new set of potential imaging confounds (Alfaro-Almagro, bioRxiv, 2020), as well as the standard 40 population genetic principal components. Subject and SNP selection, discovery and replication samples, and GWAS calculations using BGENIE, were as described in Elliott, Nature, 2018 (though with increased subject numbers).

In the Manhattan plots and the table of local-peak associations, we used the following SNP filters: MAF >= 0.01 and INFO >= 0.3 and HWE -Log10(P) <= 7. In the summary stats downloads, we used the following SNP filters: MAF >= 0.001 and INFO >= 0.3 and HWE -Log10(P) <= 7. For the X chromosome, the HWE filter was computed using genetic females only.

The GWAS beta coefficient is in the direction of a2. The phenotypes are scaled to have unit variance after deconfounding, and the variants on chromosomes 1:22 are not scaled (variants for genetic males on the non-pseudoautosomal region of chromosome X are scaled to 0:2). A beta value of 1.0 therefore indicates that each copy of the a2 allele generally confers an increase in the phenotype of one standard deviation.
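
The SNP filters described above can be expressed as a single predicate. The sketch below is illustrative rather than the code used to build the server: the function and constant names are hypothetical, and it assumes that MAF is obtained from an allele frequency as min(af, 1 - af) and that the caller supplies the HWE -Log10(P) value (computed from genetic females only for the X chromosome).

```python
PLOT_MIN_MAF = 0.01       # Manhattan plots and local-peak table
SUMSTATS_MIN_MAF = 0.001  # summary stats downloads
MIN_INFO = 0.3
MAX_HWE_NEGLOG10_P = 7.0


def minor_allele_frequency(af: float) -> float:
    """Minor allele frequency of a biallelic variant with allele frequency af."""
    return min(af, 1.0 - af)


def passes_filters(af: float, info: float, hwe_neglog10_p: float,
                   min_maf: float) -> bool:
    """Apply MAF >= min_maf, INFO >= 0.3 and HWE -Log10(P) <= 7."""
    return (minor_allele_frequency(af) >= min_maf
            and info >= MIN_INFO
            and hwe_neglog10_p <= MAX_HWE_NEGLOG10_P)


# A variant with MAF 0.005 is kept in the downloads but not in the plots.
print(passes_filters(0.995, 0.9, 2.0, SUMSTATS_MIN_MAF))
print(passes_filters(0.995, 0.9, 2.0, PLOT_MIN_MAF))
```

Keeping the MAF threshold as a parameter reflects that the two outputs differ only in that one cut-off.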


Lloyd T. Elliott, Winfield Chen - Department of Statistics and Actuarial Science, Simon Fraser University, Canada.
Stephen Smith, Gwenaëlle Douaud, Fidel Alfaro-Almagro, Paul McCarthy, Duncan Mortimer - Wellcome Centre for Integrative Neuroimaging (WIN/FMRIB), Oxford University, UK.
Sinan Shi - Department of Statistics, Oxford University, UK.
Kevin Sharp - Genomics PLC, Oxford, UK.