Digital Repository

One size does not fit all: On how Markov model order dictates performance of genomic sequence analyses


dc.contributor.author Narlikar, Leelavati en_US
dc.contributor.author Mehta, Nidhi en_US
dc.contributor.author GALANDE, SANJEEV en_US
dc.contributor.author Arjunwadkar, Mihir en_US
dc.date.accessioned 2019-02-14T05:03:28Z
dc.date.available 2019-02-14T05:03:28Z
dc.date.issued 2013-02 en_US
dc.identifier.citation Nucleic Acids Research, 41(3), 1416-1424. en_US
dc.identifier.issn 0305-1048 en_US
dc.identifier.issn 1362-4962 en_US
dc.identifier.uri http://dr.iiserpune.ac.in:8080/xmlui/handle/123456789/1717
dc.identifier.uri https://doi.org/10.1093/nar/gks1285 en_US
dc.description.abstract The structural simplicity and ability to capture serial correlations make Markov models a popular modeling choice in several genomic analyses, such as identification of motifs, genes and regulatory elements. A critical, yet relatively unexplored, issue is the determination of the order of the Markov model. Most biological applications use a predetermined order for all data sets indiscriminately. Here, we show the vast variation in the performance of such applications with the order. To identify the ‘optimal’ order, we investigated two model selection criteria: Akaike information criterion and Bayesian information criterion (BIC). The BIC optimal order delivers the best performance for mammalian phylogeny reconstruction and motif discovery. Importantly, this order is different from orders typically used by many tools, suggesting that a simple additional step determining this order can significantly improve results. Further, we describe a novel classification approach based on BIC optimal Markov models to predict functionality of tissue-specific promoters. Our classifier discriminates between promoters active across 12 different tissues with remarkable accuracy, yielding 3 times the precision expected by chance. Application to the metagenomics problem of identifying the taxon from a short DNA fragment yields accuracies at least as high as the more complex mainstream methodologies, while retaining conceptual and computational simplicity. en_US
dc.language.iso en en_US
dc.publisher Oxford University Press en_US
dc.subject Markov model en_US
dc.subject Genomic sequence en_US
dc.subject Structural simplicity en_US
dc.subject Phylogeny reconstruction en_US
dc.subject computational simplicity en_US
dc.subject 2013 en_US
dc.title One size does not fit all: On how Markov model order dictates performance of genomic sequence analyses en_US
dc.type Article en_US
dc.contributor.department Dept. of Biology en_US
dc.identifier.sourcetitle Nucleic Acids Research en_US
dc.publication.originofpublisher Foreign en_US
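
The abstract's central idea, choosing the Markov model order with an information criterion instead of fixing it in advance, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names `log_likelihood` and `select_order`, the four-letter DNA alphabet, and the choice to score every order on the same positions (those after the first `max_order` symbols) are assumptions made for illustration. The criteria use the standard textbook forms AIC = -2 log L + 2p and BIC = -2 log L + p log n, where an order-k chain has p = 3 * 4^k free parameters.

```python
import math
from collections import Counter

ALPHABET = "ACGT"  # assumed DNA alphabet


def log_likelihood(seq, k, start):
    """Maximized log-likelihood of an order-k Markov chain on seq[start:].

    Transition probabilities are the maximum-likelihood estimates
    count(context, symbol) / count(context).
    """
    pair_counts = Counter()     # (context, next symbol) -> count
    context_counts = Counter()  # context -> count
    for i in range(start, len(seq)):
        context = seq[i - k:i]  # empty string when k == 0
        pair_counts[(context, seq[i])] += 1
        context_counts[context] += 1
    return sum(
        n * math.log(n / context_counts[ctx])
        for (ctx, _), n in pair_counts.items()
    )


def select_order(seq, max_order):
    """Return (AIC-optimal order, BIC-optimal order, per-order scores).

    Every order is scored on the same positions so that the
    likelihoods are comparable across orders.
    """
    if len(seq) <= max_order:
        raise ValueError("sequence must be longer than max_order")
    n_obs = len(seq) - max_order
    scores = {}
    for k in range(max_order + 1):
        ll = log_likelihood(seq, k, max_order)
        # Free parameters: (|alphabet| - 1) per context, |alphabet|^k contexts.
        n_params = (len(ALPHABET) - 1) * len(ALPHABET) ** k
        aic = -2.0 * ll + 2.0 * n_params
        bic = -2.0 * ll + n_params * math.log(n_obs)
        scores[k] = (aic, bic)
    best_aic = min(scores, key=lambda k: scores[k][0])
    best_bic = min(scores, key=lambda k: scores[k][1])
    return best_aic, best_bic, scores
```

Calling `select_order(sequence, max_order)` on a DNA string returns the order that minimizes each criterion, which could then be used in place of a fixed, predetermined order. Because the BIC penalty grows with log n, it tends to choose an order no higher than the one AIC chooses on long sequences.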


