Please use this identifier to cite or link to this item: http://dr.iiserpune.ac.in:8080/xmlui/handle/123456789/8028
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | NARLIKAR, LEELAVATI | en_US
dc.contributor.author | BISWAS, ANUSHUA | en_US
dc.date.accessioned | 2023-06-20T09:03:08Z | -
dc.date.available | 2023-06-20T09:03:08Z | -
dc.date.issued | 2023-05 | en_US
dc.identifier.citation | 197 | en_US
dc.identifier.uri | http://dr.iiserpune.ac.in:8080/xmlui/handle/123456789/8028 | -
dc.description.abstract | The expression levels of the thousands of genes in a eukaryotic cell influence its phenotype and functioning. The binding of proteins such as transcription factors (TFs) to particular regulatory regions on the DNA plays an important role in regulating gene expression. Mutations in these regulatory regions can affect gene expression, often leading to misregulation that results in disorders and diseases. To profile a wide range of regulation-related biochemical activities, a variety of high-throughput experimental assays have therefore been designed. Each gives a genome-wide map of the regions that share the characteristic being profiled. Examples include STARR-seq, which identifies active enhancers; ATAC-seq, which detects accessible chromatin; and ChIP-seq, which identifies TF binding sites. These assays report regions that are 200 to 1000 bases long, although the functional elements within them are only about 15 bases long. Current computational algorithms identify these short sequence signatures by looking for a characteristic shared across the reported regions. Evidence, however, suggests that the regions reported by these experiments are considerably heterogeneous, and while such methods can pick up the stronger signals, they can easily miss the weaker or less frequent ones. To characterize this heterogeneity explicitly, I framed the question as a mixture modelling problem. Our first method, DIVERSITY, clusters regions from ChIP-seq experiments into groups while simultaneously learning, de novo, the sequence signatures specific to each group. DIVERSITY provides novel insights into the different ways in which a protein can bind DNA, including cooperative binding with other proteins. We next looked at regions identified by exonuclease-based ChIP (ChIP-exo) experiments. These experiments measure exonuclease activity with high precision very close to the actual protein binding sites, so the regions they report have sharp read-profile distributions. Our next method, ExoDiversity, models these regions by learning a joint probability distribution over the DNA sequences and the distinct ChIP-exo read signals on the forward and reverse strands. It resolved the binding footprints of the profiled TFs at nucleotide resolution. Differences among these footprints correlated with distinct DNA structural properties and sequence conservation profiles, suggesting that they are likely to be functionally important. Finally, we investigated the varied mechanisms of TF binding at regulatory regions. Our method cisDiversity models the combinatorial control that TFs exert at these regions. Applied to multiple datasets from diverse species, and without requiring any prior knowledge of the TFs, cell type or organism, cisDiversity identified discrete regulatory modules, each characterized by a particular combination of TF binding sites. We believe that our model-based approach, which explains the data in terms of distinct sequence components, provides a comprehensive understanding of the regulatory information encoded in the data. | en_US
dc.language.iso | en | en_US
dc.subject | mixture modelling | en_US
dc.subject | motif discovery | en_US
dc.subject | high throughput sequencing | en_US
dc.subject | ChIP seq | en_US
dc.subject | Gibbs sampling | en_US
dc.subject | transcription factor binding | en_US
dc.subject | regulatory regions | en_US
dc.subject | sequence conservation | en_US
dc.title | Mixture modelling to characterize diversity in DNA regions | en_US
dc.type | Thesis | en_US
dc.description.embargo | No Embargo | en_US
dc.type.degree | Ph.D | en_US
dc.contributor.department | Dept. of Data Science | en_US
dc.contributor.registration | 20213301 | en_US
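The abstract describes treating a heterogeneous set of regions as a mixture, where each group has its own sequence signature. A minimal Python sketch of one step of that idea follows: computing the posterior probability that a region belongs to each of K components. This is not DIVERSITY's actual model or its Gibbs sampler. It assumes that each component is a single position weight matrix (PWM), that each region contains exactly one motif occurrence at a uniformly random start, and that all other bases come from a uniform background. The helper names (`log_likelihood`, `responsibilities`, `consensus_pwm`) and the toy consensus motifs are hypothetical.

```python
import math
from typing import Dict, List, Sequence

BASES = "ACGT"
# A position weight matrix (PWM): one {base: probability} dict per motif position.
PWM = List[Dict[str, float]]


def log_likelihood(seq: str, pwm: PWM, background: float = 0.25) -> float:
    """log P(seq | component) under a simple one-occurrence motif model.

    Assumptions (illustrative only): exactly one motif occurrence per sequence,
    at a uniformly random start; every other base is drawn from a uniform background.
    """
    width = len(pwm)
    n_starts = len(seq) - width + 1
    if n_starts < 1:
        raise ValueError("sequence is shorter than the motif")
    # Log-odds of the motif vs. background at each possible start position.
    scores = [
        sum(math.log(pwm[j][seq[s + j]] / background) for j in range(width))
        for s in range(n_starts)
    ]
    # Log-sum-exp over start positions (numerically stable).
    top = max(scores)
    log_sum = top + math.log(sum(math.exp(v - top) for v in scores))
    # P(seq) = background^L * (1 / n_starts) * sum over starts of the odds ratio.
    return len(seq) * math.log(background) - math.log(n_starts) + log_sum


def responsibilities(seq: str, pwms: Sequence[PWM], weights: Sequence[float]) -> List[float]:
    """Posterior P(component k | seq) for a K-component mixture of motif models."""
    logs = [math.log(w) + log_likelihood(seq, p) for w, p in zip(weights, pwms)]
    top = max(logs)
    unnorm = [math.exp(v - top) for v in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def consensus_pwm(consensus: str, p_match: float = 0.91) -> PWM:
    """Toy PWM favouring one base per position; other bases share the remainder."""
    p_other = (1.0 - p_match) / 3
    return [{b: (p_match if b == c else p_other) for b in BASES} for c in consensus]


if __name__ == "__main__":
    # Two hypothetical components, e.g. two distinct sequence signatures.
    pwms = [consensus_pwm("CACGTG"), consensus_pwm("GATAAG")]
    weights = [0.5, 0.5]
    for region in ["TTTCACGTGTTT", "TTGATAAGTTTT"]:
        print(region, [round(r, 3) for r in responsibilities(region, pwms, weights)])
```

In a complete mixture model, these memberships, or samples drawn from them in a Gibbs sampler, would alternate with re-estimating the PWMs and mixture weights. The sketch covers only the assignment step.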
Appears in Collections: PhD THESES

Files in This Item:
File | Description | Size | Format
20213301_Anushua_Biswas_PhD_Thesis.pdf | PhD thesis | 56.4 MB | Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.