Manuscript title: Genome-wide associations of human gut microbiome variation and implications for causal inference analyses

Journal: Nature Microbiology

Authors: David A Hughes1,2*, Rodrigo Bacigalupe3,4*, Jun Wang3,4,5, Malte C Rühlemann6, Raul Y. Tito3,4, Gwen Falony3,4, Marie Joossens3,4, Sara Vieira-Silva3,4, Liesbet Henckaerts7,6, Leen Rymenans3,4, Chloë Verspecht3,4, Susan Ring2,9, Andre Franke6, Kaitlin H. Wade1,2, Nicholas J. Timpson1,2**, Jeroen Raes3,4**.

Affiliations:

1. MRC Integrative Epidemiology Unit at University of Bristol, Bristol, BS8 2BN, UK.
2. Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, BS8 2BN, UK.
3. Department of Microbiology and Immunology, Rega Instituut, KU Leuven–University of Leuven, Leuven, Belgium.
4. Center for Microbiology, VIB, Leuven, Belgium.
5. Institute of Microbiology, Chinese Academy of Sciences, Chaoyang District, 100101 Beijing, China.
6. Institute of Clinical Molecular Biology, Christian Albrechts University of Kiel, Kiel, Germany.
7. KU Leuven-University of Leuven, Department of Microbiology and Immunology, Leuven, Belgium.
8. KU Leuven-University Hospitals Leuven, Department of General Internal Medicine, Leuven, Belgium.
9. Bristol Bioresource Laboratories (BBL), University of Bristol, UK.

## More information can be found on the Raes Lab companion website at:

- https://www.raeslab.org/companion/fgfp-gwas/

## Data directory tree

    mGWAS
    |--gwas
    |  |--method_score
    |     |--bristol_md5sums.txt
    |     `--fgfp
    |        |--*_HB_allchr.txt.gz
    |        `--*_RNT_allchr.txt.gz
    `--targeted_meta
       |--method_em
          |--bristol_md5sums.txt
          |--EM_targeted_meta_estimates.txt
          |--fgfp
          |  |--*_HB.txt
          |  `--*_RNT.txt
          |--focus
          |  |--*_HB.txt
          |  `--*_RNT.txt
          `--popgen
             |--*_HB.txt
             `--*_RNT.txt

## Description of data in directories

1. gwas : directory of results for the complete FGFP 16S microbiome GWAS data set
   a. method_score : SNPTEST results obtained with the "score" method
      - bristol_md5sums.txt : the 128-bit MD5 hash values for all files in this directory. They can be used to ensure that downloaded data are complete and not corrupted.
      - fgfp : data from the FGFP cohort
        (i) *_HB_allchr.txt.gz : presence|absence microbial trait SNPTEST GWAS result files derived from the FGFP data set.
        (ii) *_RNT_allchr.txt.gz : rank normal transformed (tied values not split) continuous microbial trait SNPTEST GWAS result files derived from the FGFP data set.

2. targeted_meta : directory of results from the targeted meta-analysis
   a. method_em : SNPTEST results obtained with the "em" method
      - bristol_md5sums.txt : the 128-bit MD5 hash values for all files in this directory. They can be used to ensure that downloaded data are complete and not corrupted.
      - EM_targeted_meta_estimates.txt : the data set used in the targeted meta-analysis. Results are compiled from FGFP, Focus and PopGen.
      - fgfp : targeted em results from the FGFP cohort
        (i) *_HB.txt : presence|absence microbial trait SNPTEST GWAS result files.
        (ii) *_RNT.txt : rank normal transformed (tied values not split) continuous microbial trait SNPTEST GWAS result files.
      - focus : targeted em results from the Focus cohort
        (i) *_HB.txt : presence|absence microbial trait SNPTEST GWAS result files.
        (ii) *_RNT.txt : rank normal transformed (tied values not split) continuous microbial trait SNPTEST GWAS result files.
      - popgen : targeted em results from the PopGen cohort
        (i) *_HB.txt : presence|absence microbial trait SNPTEST GWAS result files.
        (ii) *_RNT.txt : rank normal transformed (tied values not split) continuous microbial trait SNPTEST GWAS result files.
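
The MD5 check mentioned for the bristol_md5sums.txt files could be carried out along the following lines. This is a minimal sketch, not a tool shipped with the data: it assumes the manifest follows the usual `md5sum` layout (one `<hash>  <file name>` pair per line, with paths relative to the directory that holds the manifest). The function names `md5_of_file`, `parse_md5sums` and `verify_directory` are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_md5sums(text):
    """Parse md5sum-style text into {file name: expected hex digest}.

    Assumption: each non-empty line is '<hash> <name>'; a leading '*'
    on the name (md5sum binary-mode marker) is dropped.
    """
    expected = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        checksum, name = line.split(maxsplit=1)
        expected[name.lstrip("*")] = checksum.lower()
    return expected


def verify_directory(directory, manifest_name="bristol_md5sums.txt"):
    """Compare every file listed in the manifest against its expected hash.

    Returns {file name: 'ok' | 'corrupted' | 'missing'}.
    """
    directory = Path(directory)
    expected = parse_md5sums((directory / manifest_name).read_text())
    report = {}
    for name, checksum in expected.items():
        target = directory / name
        if not target.is_file():
            report[name] = "missing"
        elif md5_of_file(target) == checksum:
            report[name] = "ok"
        else:
            report[name] = "corrupted"
    return report
```

Calling `verify_directory("mGWAS/gwas/method_score")` on a downloaded copy would return a per-file status; any entry other than `"ok"` suggests the corresponding file should be downloaded again. If the manifest turns out to use a different layout, only `parse_md5sums` needs adjusting.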
## SNPTEST flat text result file column headers defined

- rsid : RSID (taken from input files)
- snpid : SNP ID (taken from input files)
- chromosome : chromosome (taken from input files)
- position : base pair position (taken from input files)
- alleleA : allele coded as 0
- alleleB : allele coded as 1
- index : the 'i'th test being performed in the submitted chromosome
- average_maximum_posterior_call : the average maximum posterior probability across all individuals in the sample that are used for the test at each SNP. This is a measure of how much uncertainty there is at each SNP.
- info : imputation quality; a measure of the observed statistical information for the estimate of allele frequency of the SNP using all individuals in the sample that are used for the test at each SNP.
- cohort_1_AA : count of AA genotypes in the cohort
- cohort_1_AB : count of AB genotypes in the cohort
- cohort_1_BB : count of BB genotypes in the cohort
- cohort_1_NULL : count of NULL genotypes in the cohort
- all_AA : count of AA genotypes in all samples
- all_AB : count of AB genotypes in all samples
- all_BB : count of BB genotypes in all samples
- all_NULL : count of NULL genotypes in all samples
- all_total : total number of individuals (N) in the sample
- cases_AA : count of AA genotypes in cases (or presence)
- cases_AB : count of AB genotypes in cases (or presence)
- cases_BB : count of BB genotypes in cases (or presence)
- cases_NULL : count of NULL genotypes in cases (or presence)
- cases_total : total number of cases
- controls_AA : count of AA genotypes in controls (or absence)
- controls_AB : count of AB genotypes in controls (or absence)
- controls_BB : count of BB genotypes in controls (or absence)
- controls_NULL : count of NULL genotypes in controls (or absence)
- controls_total : total number of controls
- all_maf : minor allele frequency in all individuals
- cases_maf : minor allele frequency in cases
- controls_maf : minor allele frequency in controls
- missing_data_proportion : the proportion of missing data across all cohorts.
- het_OR : estimated odds ratio for the heterozygote genotype AB versus the (baseline) AA genotype.
- het_OR_lower : lower 95% confidence limit for the heterozygote genotype AB versus the (baseline) AA genotype.
- het_OR_upper : upper 95% confidence limit for the heterozygote genotype AB versus the (baseline) AA genotype.
- hom_OR : estimated odds ratio for the homozygote genotype BB versus the (baseline) AA genotype.
- hom_OR_lower : lower 95% confidence limit for the homozygote genotype BB versus the (baseline) AA genotype.
- hom_OR_upper : upper 95% confidence limit for the homozygote genotype BB versus the (baseline) AA genotype.
- all_OR : estimated allelic odds ratio for the B allele versus the (baseline) A allele.
- all_OR_lower : lower 95% confidence limit for the B allele versus the (baseline) A allele.
- all_OR_upper : upper 95% confidence limit for the B allele versus the (baseline) A allele.
- frequentist_add_pvalue : p-value for the frequentist (additive model) association test
- frequentist_add_info : imputation information score for the samples used in the association test
- frequentist_add_beta_1 : the effect estimate for the association test
- frequentist_add_se_1 : the standard error of the effect estimate for the association test
- comment : any comment or error reported by SNPTEST
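
To illustrate how the columns defined above might be used, the following sketch reads a result file, keeps well-imputed variants below a p-value threshold, and adds a Wald 95% confidence interval (beta ± 1.96 × se). It rests on several assumptions that should be checked against the actual files: that columns are separated by single spaces, that lines beginning with `#` are comments, and that missing values appear as `NA`. The thresholds (`max_p`, `min_info`) and the function names are illustrative only and are not taken from the manuscript. The rows in `toy_result` are made-up values used to show the layout.

```python
import csv
import gzip
import io
import math


def open_text(path):
    """Open a plain or gzip-compressed result file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt", newline="")
    return open(path, "r", newline="")


def read_snptest(handle, delimiter=" "):
    """Yield one dict per variant, skipping blank and '#' comment lines."""
    data_lines = (
        line for line in handle
        if line.strip() and not line.startswith("#")
    )
    yield from csv.DictReader(data_lines, delimiter=delimiter)


def to_float(value):
    """Convert a field to float; 'NA' or absent fields become NaN."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return math.nan


def top_hits(rows, max_p=5e-8, min_info=0.3):
    """Filter on p-value and info, add a Wald 95% CI, sort by p-value."""
    hits = []
    for row in rows:
        p = to_float(row.get("frequentist_add_pvalue"))
        info = to_float(row.get("info"))
        beta = to_float(row.get("frequentist_add_beta_1"))
        se = to_float(row.get("frequentist_add_se_1"))
        if math.isnan(p) or math.isnan(info) or p > max_p or info < min_info:
            continue
        hits.append({
            "rsid": row["rsid"],
            "chromosome": row["chromosome"],
            "position": int(row["position"]),
            "beta": beta,
            "se": se,
            "ci_lower": beta - 1.96 * se,
            "ci_upper": beta + 1.96 * se,
            "pvalue": p,
        })
    return sorted(hits, key=lambda hit: hit["pvalue"])


# Made-up rows with a subset of the documented columns.
toy_result = """# toy example, not real data
rsid chromosome position alleleA alleleB info frequentist_add_pvalue frequentist_add_beta_1 frequentist_add_se_1
rs1 1 100 A G 0.98 1e-9 0.5 0.1
rs2 1 200 C T 0.20 1e-10 0.4 0.1
rs3 2 300 G A 0.95 0.2 0.01 0.1
rs4 2 400 T C 0.99 NA NA NA
"""

hits = top_hits(read_snptest(io.StringIO(toy_result)))
```

For a downloaded file, the same functions would be combined as `with open_text(path) as handle: hits = top_hits(read_snptest(handle))`. For the presence|absence (HB) traits, and assuming the effect estimate is a log odds ratio from an additive logistic model, `math.exp(beta)` gives the corresponding per-allele odds ratio.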