Identifying Population-Specific IBD-Associated Mutations, Genes and Pathways

Inflammatory bowel disease (IBD) is a group of disorders that involve chronic inflammation of the colon and small intestine. The two major types of IBD are ulcerative colitis (UC), which causes long-term inflammation and ulcers of the colon and rectum, and Crohn’s disease (CD), which causes inflammation of the digestive tract lining that can spread deep into the affected tissues. IBD can be debilitating and, in severe cases, life-threatening. More than 1.5 million people in the United States are currently estimated to have IBD. IBD incurs substantial direct healthcare costs (over US$6 billion per year in the United States), and its incidence continues to rise worldwide. Over the last two decades, genetics has been shown to be a major contributor to IBD predisposition. Most genetic studies of IBD have been genome-wide association studies (GWAS) of common variants, conducted mostly in European populations. IBD has been shown to have a substantial Mendelian (monogenic) component of high-impact mutations, and a significant proportion of IBD mutations and genes are likely population-specific (for example, occurring in African Americans but not in Europeans). Nevertheless, no large-scale investigation of population-specific, high-impact IBD-causing mutations, genes and pathways has yet been performed, in IBD or in any other disease. Understanding population-specific IBD genomics would ultimately contribute to personalized medicine for IBD patients, in which a patient’s population background is combined with the patient’s estimated IBD mutations to tailor an optimal therapy. We hypothesize that IBD has a major component of population-specific, high-impact predisposing mutations and genes, and that these mutations and genes can be discovered through gene burden analyses of whole exome sequencing (WES) data from IBD patients.
We therefore propose an ancillary study on IBDGC’s WES data of IBD patients from four major human populations: Ashkenazi Jewish, African American, Hispanic and European. IBDGC is well suited to such a study because it contains enough samples per population to perform WES gene burden analyses of high-impact genetic variants (as shown in our preliminary results). We also have full access to Mount Sinai’s BioMe BioBank, which contains WES data from over 30,845 patients of diverse populations (mostly from the four IBDGC populations), including IBD patients; these data will increase our sample sizes and statistical power. We propose a rigorous analysis pipeline that combines established state-of-the-art software with cutting-edge approaches from human population and disease genomics, together with diverse computational algorithms, to: (1) obtain high-quality sets of IBD cases and controls in the IBDGC data, assigned by genetic population estimates to the Ashkenazi Jewish, African American, Hispanic and European populations; (2) perform a case-control gene burden analysis of high-impact variants within each population; and (3) identify population-specific IBD-associated mutations, genes and pathways, computationally assess the functional relatedness of these results to IBD, and perform a phenome-wide association study (PheWAS) relating them to IBD in Mount Sinai’s BioMe BioBank.
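To make the per-population gene burden test of aim (2) concrete, here is a minimal collapsing sketch under stated assumptions. It assumes each sample already carries a population label from aim (1) and that variants are already annotated with a consequence term; the `HIGH_IMPACT` set, the input layout, and the names `gene_burden` and `fisher_exact_two_sided` are hypothetical. Fisher's exact test on carrier counts is a simple stand-in here, not necessarily the statistic the final pipeline will use (regression-based burden tests that adjust for covariates are a common alternative).

```python
from collections import defaultdict
from math import comb

# Hypothetical filter: consequence terms (Sequence Ontology names, as emitted by
# annotators such as VEP) treated here as "high impact". A real pipeline would
# also apply allele-frequency and call-quality filters.
HIGH_IMPACT = {"stop_gained", "frameshift_variant", "splice_donor_variant",
               "splice_acceptor_variant", "start_lost"}


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def gene_burden(samples, variants):
    """Collapsing case-control burden test, run separately in each population.

    samples:  {sample_id: (population, is_case)}
    variants: iterable of (sample_id, gene, consequence)
    Returns {population: {gene: (carrier_cases, carrier_controls, p_value)}}.
    """
    # Collapse variants: a sample "carries" a gene if it has >= 1 qualifying variant.
    carriers = defaultdict(set)
    for sample_id, gene, consequence in variants:
        if consequence in HIGH_IMPACT and sample_id in samples:
            carriers[gene].add(sample_id)

    results = {}
    for pop in sorted({p for p, _ in samples.values()}):
        members = {s for s, (p, _) in samples.items() if p == pop}
        n_cases = sum(1 for s in members if samples[s][1])
        n_controls = len(members) - n_cases
        per_gene = {}
        for gene, gene_carriers in carriers.items():
            in_pop = gene_carriers & members
            a = sum(1 for s in in_pop if samples[s][1])  # carrier cases
            c = len(in_pop) - a                           # carrier controls
            p = fisher_exact_two_sided(a, n_cases - a, c, n_controls - c)
            per_gene[gene] = (a, c, p)
        results[pop] = per_gene
    return results


# Toy usage with made-up labels: GENE_A is hit only in Ashkenazi Jewish cases.
samples = {f"aj_case{i}": ("AJ", True) for i in range(4)}
samples.update({f"aj_ctrl{i}": ("AJ", False) for i in range(4)})
samples.update({f"eur_case{i}": ("EUR", True) for i in range(4)})
samples.update({f"eur_ctrl{i}": ("EUR", False) for i in range(4)})
variants = [(f"aj_case{i}", "GENE_A", "stop_gained") for i in range(4)]
variants.append(("eur_ctrl0", "GENE_A", "missense_variant"))  # not high impact
print(gene_burden(samples, variants))
```

On this toy input the AJ stratum has 4 carrier cases and 0 carrier controls (p = 2/70, about 0.029), while the single EUR call is a missense variant that the filter drops, so GENE_A has no EUR carriers and p = 1. Testing each population separately, rather than pooling, is what allows a signal confined to one population to surface. A real analysis would also correct for the number of genes and populations tested.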