Genetic variants are conjectured to be the primary contributor to human disease. The overarching goal of our lab is to explain how variants in the human genome may lead to disease. We develop robust statistical models -- combining biophysics, machine learning, and comparative genomics -- that can integrate a wide range of data and relate genomic DNA mechanistically to gene expression and regulatory activity under both normal (i.e., healthy/wild-type) and perturbed (i.e., diseased/mutant) conditions. Developing such models is fundamental to establishing causal relationships between genomic variants and disease. We further aim to contribute to translational research by generating hypotheses about disease mechanisms that can be tested in human cells (e.g., via CRISPR editing) and disease models. Ongoing projects in the lab focus on gene expression at multiple scales:
Source of Specificity That DNA-Binding Proteins Recognize
Transcription factors and other DNA-binding proteins are the most direct regulators of gene expression. It is generally assumed that these proteins recognize sequence patterns (or motifs) when identifying their target binding sites in genomic DNA. However, many protein-occupied regions do not contain the known motif for the corresponding protein, an observation made in both in vivo and in vitro data. We are working to determine whether a more accurate specificity model can explain protein-DNA binding in all regions occupied by a protein.
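To make the motif assumption concrete, the sketch below scores every window of a sequence against a position weight matrix (PWM), the conventional way of representing a binding motif. It illustrates the standard motif model only and is not the lab's specificity model. The matrix values, the uniform background frequency, and the function names are all hypothetical and chosen for illustration.

```python
import math

# Hypothetical base probabilities for a 4-bp motif (illustrative values,
# not taken from any real transcription factor).
MOTIF_PROBS = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # assumed uniform background frequency for each base


def log_odds_matrix(probs, background=BACKGROUND):
    """Convert per-position base probabilities into log2 odds scores."""
    return [{base: math.log2(p / background) for base, p in row.items()}
            for row in probs]


def scan(sequence, pwm):
    """Return (start, score) for every window of the motif's width.

    Assumes the sequence contains only the uppercase letters A, C, G, T.
    """
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        hits.append((start, score))
    return hits


def best_site(sequence, pwm):
    """Return the (start, score) pair of the highest-scoring window."""
    return max(scan(sequence, pwm), key=lambda hit: hit[1])


pwm = log_odds_matrix(MOTIF_PROBS)
print(best_site("TTACGTAA", pwm))  # window containing the consensus ACGT
print(best_site("GGGGGGGG", pwm))  # no window resembles the motif
```

Under this model, a region whose best window scores below zero looks less like the motif than like background, so protein occupancy there cannot be attributed to the motif alone. That is the kind of case the question above concerns.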
Models of Gene Expression From DNA Sequence, Structure, Chromatin Configuration, and Multi-Omics Data
Gene expression is influenced by many factors, including DNA sequence, DNA structure, expression of other genes, chromatin organization, histone modifications, and nucleosome positioning. It is not clear how often gene expression is the cause of these events and how often it is their effect. Our ultimate goal is to infer the causal regulatory networks that link these factors, which are currently confounded with gene expression. As a first step, we are modeling gene expression from datasets on the factors noted above.
Interaction Between Signaling Pathways and Gene Expression
Extracellular signaling is another strong influence on gene expression. Both developmental and disease-related genes are part of, and regulated by, signaling pathways. However, how information from signaling pathways and other regulatory factors is integrated to regulate gene expression is poorly understood. We are currently working to model the DNA-sequence-mediated effects of extracellular signaling on gene expression.
Mechanisms of Chromatin Organization and Consequences of Variation in Chromatin Organization
Recent advances in chromatin conformation assays have shown that chromatin organization is a major regulator of gene expression. Interestingly, chromatin organization is itself regulated. We are interested in understanding the mechanisms that give rise to the three-dimensional organization of chromatin and the consequences of variation in that organization.