Research & Initiatives

We aim to better understand multi-level transcriptional regulation in the context of chromatin architecture, genetic variation, and epigenetic modification, and ultimately to decode the multi-dimensional information underlying development and complex diseases.

Multi-level genetic and epigenetic regulation of transcription

At the interface between biology and computer science, our laboratory seeks to better understand transcriptional regulation during development and disease pathogenesis. Relying mainly on high-throughput sequencing technologies, we combine statistical and computational methods with experimental approaches to disentangle the genetic and epigenetic regulation of transcription at both bulk-tissue and single-cell resolution.

1. Epigenetic regulation of transcription

Although all cells in an organism contain essentially the same DNA, cell types and functions differ because of qualitative and quantitative differences in gene expression. Control of transcription is therefore at the heart of differentiation, development, and disease pathogenesis. Epigenetic processes, including DNA methylation, histone modification, and various noncoding RNA-mediated processes, together with the closely related chromatin conformation, are thought to influence gene expression chiefly at the level of transcription; however, other steps in the process (for example, splicing and translation) may also be regulated epigenetically.

We develop and apply computational approaches to high-throughput epigenomic data to determine how transcription is controlled epigenetically and what functional relevance epigenetic changes have during development and disease pathogenesis.

2. Chromatin looping and transcriptional regulation

The control of gene expression involves regulatory elements that can lie very far from the genes they control. Several recent technological advances, such as chromosome conformation capture (3C)-based technologies, have allowed the direct detection of chromatin loops that juxtapose distant genomic sites in the nucleus.

We develop and apply computational approaches to mine 3C-based data, aiming to provide new insights into

1) the functions of chromatin loops, such as their widespread impact on gene activation, repression, and genomic imprinting, and on the function of enhancers and insulators.

2) the mechanisms that form and change the loops, such as epigenetic modifications and genetic variations, as well as transcription factor binding and enhancer RNA (eRNA) mediation.

3) reconstruction of trajectories for cell differentiation or development.

4) integration and analysis of multiple single-cell omics data.

3. Single-cell omics

Single-cell sequencing interrogates the sequence or chromatin information from individual cells with advanced next-generation sequencing technologies. It provides a higher resolution of cellular differences and a better understanding of the underlying genetic and epigenetic mechanisms of an individual cell in the context of development, adaptation, and disease pathologies.

We develop and apply computational methods to single-cell multi-omics sequencing data to better understand transcription in individual cells. Research interests include, but are not limited to,

1) single-cell sequencing data cleaning, denoising, and modeling.

2) identification of novel cell types or subclones, and characterization of global patterns of stochastic gene expression underlying cell-to-cell heterogeneity.

4. Machine learning in multi-omics

Innovation in biotechnology has allowed omics data to accumulate at an unprecedented rate, ushering in the era of “big data”. Extracting valuable knowledge from diverse omics data remains a daunting problem in bioinformatics. The development and application of machine learning have substantially advanced our insights into biology and biomedicine and have greatly promoted the development of therapeutic strategies, especially for precision medicine.

We develop and apply machine learning methods, most of them deep learning approaches, to multi-omics sequencing data to better understand the multi-layer regulation of transcription. Research interests include, but are not limited to,

1) Computational modeling of 3D genome architecture using 3C-based data (A).

2) Machine learning-based modeling and prediction of epigenetic and chromatin states (B).

3) Genome annotation and functional characterization for gene expression (C).

4) Machine learning-assisted genome editing (D).

See Publications