Application of imitation learning in automating end-to-end exploratory data analysis

PATEL, DEVARSH


dc.contributor.advisor	Pate, Hima
dc.contributor.advisor	Manwani, Naresh
dc.contributor.author	PATEL, DEVARSH
dc.date.accessioned	2023-05-19T07:00:33Z
dc.date.available	2023-05-19T07:00:33Z
dc.date.issued	2023-05
dc.identifier.citation	80	en_US
dc.identifier.uri	http://dr.iiserpune.ac.in:8080/xmlui/handle/123456789/7933
dc.description.abstract	One of the open problems in data science is how to automate the end-to-end EDA process, which involves exploring the dataset, identifying patterns, outliers, and relationships among variables, and preparing the data for further analysis or modeling. Some existing approaches frame this problem as a sequential decision-making problem and use Reinforcement Learning (RL) to solve it. However, a major challenge in this approach is how to define and assign rewards for each action (such as GROUP, FILTER, etc.) taken during the EDA process. These rewards are essential for RL to learn an optimal policy. They are usually defined manually using various interestingness measures that capture how relevant or informative an action is, given the current state of the analysis. However, these measures may not capture all the important aspects of an action, such as its impact on subsequent actions or its alignment with the analysis goals. We present a novel end-to-end EDA method that learns to perform data analysis tasks from human expert EDA notebooks without explicitly relying on any interestingness measures. Our method uses an imitation learning framework that learns the optimal policy for EDA by mimicking the actions of expert data analysts. Specifically, we employ generative adversarial imitation learning (GAIL), which allows our model to capture the essential aspects of data analysis in various domains. Our method can generate EDA notebooks that are comparable to human-generated ones in terms of quality and diversity. The proposed approach can generate EDA sessions on different datasets that share the same schema. We evaluate our method on existing datasets for AutoEDA benchmarking and on synthetic datasets. We show that our method surpasses the current state-of-the-art end-to-end EDA method on various performance metrics and generalizes well to unseen datasets. 
Moreover, we show that, as a byproduct, the EDA sessions generated by the learned model draw on a diverse set of interestingness measures across the steps of the EDA process.	en_US
dc.language.iso	en	en_US
dc.subject	REINFORCEMENT LEARNING	en_US
dc.subject	IMITATION LEARNING	en_US
dc.subject	GENERATIVE ADVERSARIAL IMITATION LEARNING	en_US
dc.subject	GAIL	en_US
dc.subject	EXPLORATORY DATA ANALYSIS	en_US
dc.subject	EDA	en_US
dc.title	Application of imitation learning in automating end-to-end exploratory data analysis	en_US
dc.title.alternative	Application of imitation learning in automating end to end exploratory data analysis	en_US
dc.type	Thesis	en_US
dc.description.embargo	One Year	en_US
dc.type.degree	BS-MS	en_US
dc.contributor.department	Dept. of Data Science	en_US
dc.contributor.registration	20181222	en_US
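The abstract describes replacing hand-crafted interestingness rewards with GAIL, in which a discriminator learns to tell expert (state, action) pairs from the learner's pairs, and its output becomes the learner's reward. The sketch below illustrates only that reward-learning step under stated assumptions: the action vocabulary (FILTER, GROUP, BACK), the two-number state summary, the logistic-regression discriminator, and the `features`/`Discriminator` names are all hypothetical and not taken from the thesis. It uses the common convention where D(s, a) is the probability that a pair came from the expert and the reward is -log(1 - D(s, a)). The original GAIL formulation labels the classes the other way round and uses log D as a cost.

```python
import math

# Hypothetical action vocabulary; the abstract names GROUP and FILTER ("etc.").
ACTIONS = ("FILTER", "GROUP", "BACK")


def features(state, action):
    """Concatenate a hypothetical numeric state summary with a one-hot action."""
    one_hot = [1.0 if action == a else 0.0 for a in ACTIONS]
    return list(state) + one_hot + [1.0]  # trailing 1.0 is a bias input


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class Discriminator:
    """Logistic-regression D(s, a): estimated probability that (s, a) is expert."""

    def __init__(self, dim, lr=0.5):
        self.w = [0.0] * dim
        self.lr = lr

    def prob_expert(self, x):
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)))

    def update(self, expert_xs, policy_xs):
        # One gradient-ascent step on the log-likelihood with
        # label 1 for expert pairs and label 0 for policy pairs.
        batch = [(x, 1.0) for x in expert_xs] + [(x, 0.0) for x in policy_xs]
        grad = [0.0] * len(self.w)
        for x, y in batch:
            err = y - self.prob_expert(x)
            for i, xi in enumerate(x):
                grad[i] += err * xi
        self.w = [wi + self.lr * g / len(batch) for wi, g in zip(self.w, grad)]

    def reward(self, x):
        # Surrogate reward -log(1 - D(s, a)): grows as the pair looks more expert-like.
        return -math.log(max(1.0 - self.prob_expert(x), 1e-8))


# Toy state (hypothetical): (session depth / max depth, fraction of rows in view).
expert = [features((0.0, 1.0), "FILTER"), features((0.5, 0.3), "GROUP")]
policy = [features((0.0, 1.0), "BACK"), features((0.5, 0.3), "BACK")]

disc = Discriminator(dim=len(expert[0]))
for _ in range(200):
    disc.update(expert, policy)

r_expert_like = disc.reward(features((0.0, 1.0), "FILTER"))
r_policy_like = disc.reward(features((0.0, 1.0), "BACK"))
print(r_expert_like > r_policy_like)  # True
```

In a full GAIL loop, this discriminator update would alternate with a policy-optimization step (for example TRPO or PPO) that uses `disc.reward` as the per-step reward, so the learner is pushed toward expert-like EDA actions without any explicit interestingness measure. That policy step is omitted here.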

Files in this item

Name: 20181222_Devarsh_ ...

Size: 1.122Mb

Format: PDF

Description: MS Thesis


This item appears in the following Collection(s)

MS THESES [1980]
Thesis submitted to IISER Pune in partial fulfilment of the requirements for the BS-MS Dual Degree Programme/MSc. Programme/MS-Exit Programme
