Please use this identifier to cite or link to this item:
http://dr.iiserpune.ac.in:8080/xmlui/handle/123456789/8825
Title: Probabilistic Unsupervised Learning with Heterogeneous Noisy Data
Authors: NARLIKAR, LEELAVATI; PARDHI, SAMARTH (Dept. of Mathematics, 20191078)
Keywords: Bayesian Theory; Model-based Clustering; Mixture Model; Markov Chain Monte Carlo; Feature Selection
Issue Date: May-2024
Citation: 59
Abstract: Mixture models are widely used when interest centres on the underlying probability densities; the usual aim is to estimate the component parameters and mixing probabilities while clustering the data successfully. In real-life applications, however, the analysis becomes substantially harder for two reasons: first, the features may be heterogeneous, mixing continuous, categorical, and even ordinal types; second, when there are many features the algorithm becomes computationally expensive, and not all features contribute to the inference, which motivates feature selection. The goal of this project is to summarise existing methods and to develop model-based approaches that are robust and scalable, enabling model-based clustering while simultaneously selecting relevant features within heterogeneous data. The idea is to work in a Bayesian framework using Gibbs sampling, a technique that is very popular for mixture models. We investigate Gaussian and categorical features in model-based clustering, assuming the number of clusters is finite and does not grow with the sample size; such frameworks are called finite mixture models. The proposed methods are compared with the maximum likelihood approach, which uses the Expectation-Maximisation (EM) algorithm.
URI: http://dr.iiserpune.ac.in:8080/xmlui/handle/123456789/8825
Appears in Collections: MS THESES
Files in This Item:
File | Description | Size | Format
---|---|---|---
20191078_Samarth_Pardhi_MS_Thesis.pdf | MS Thesis | 6.3 MB | Adobe PDF
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
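The abstract describes Gibbs sampling for finite mixture models. As a rough illustration of that idea (not the thesis's own implementation), the sketch below runs a Gibbs sampler for a two-component 1-D Gaussian mixture, making the simplifying assumptions of a known, shared variance, conjugate normal priors on the means, and a Dirichlet prior on the mixing weights; all constants here are illustrative choices.

```python
# A minimal Gibbs-sampling sketch for a two-component 1-D Gaussian mixture.
# Assumptions (for illustration only): known component variance sigma2,
# mu_k ~ N(0, tau2) priors, symmetric Dirichlet(alpha) prior on weights.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data drawn from a two-component mixture.
x = np.concatenate([rng.normal(-2.0, 1.0, 150), rng.normal(3.0, 1.0, 100)])

K = 2            # fixed, finite number of clusters
sigma2 = 1.0     # known component variance
tau2 = 100.0     # prior variance of each mean
alpha = 1.0      # symmetric Dirichlet prior parameter

mu = rng.normal(0.0, 1.0, K)   # initial component means
pi = np.full(K, 1.0 / K)       # initial mixing weights

for it in range(500):
    # 1. Sample cluster assignments z_i | mu, pi (softmax of log-densities).
    logp = np.log(pi) - 0.5 * (x[:, None] - mu[None, :]) ** 2 / sigma2
    p = np.exp(logp - logp.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    z = np.array([rng.choice(K, p=p_i) for p_i in p])

    # 2. Sample mixing weights pi | z from the Dirichlet posterior.
    counts = np.bincount(z, minlength=K)
    pi = rng.dirichlet(alpha + counts)

    # 3. Sample each mean mu_k | z, x from its conjugate normal posterior.
    for k in range(K):
        xk = x[z == k]
        prec = len(xk) / sigma2 + 1.0 / tau2
        mean = (xk.sum() / sigma2) / prec
        mu[k] = rng.normal(mean, np.sqrt(1.0 / prec))

print(np.sort(mu))  # draws should settle near the true means (-2, 3)
```

Sorting the means before reading them off sidesteps label switching, a standard identifiability issue in mixture-model MCMC; a full treatment would also place a prior on the variances and, as in the thesis's setting, add categorical likelihood terms and feature-selection indicators.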