Cell Detection in Tabular data

MANOHAR, SHARDUL


dc.contributor.advisor	Mundankar, Ajinkya
dc.contributor.author	MANOHAR, SHARDUL
dc.date.accessioned	2023-12-22T04:03:41Z
dc.date.available	2023-12-22T04:03:41Z
dc.date.issued	2023-11
dc.identifier.citation	40	en_US
dc.identifier.uri	http://dr.iiserpune.ac.in:8080/xmlui/handle/123456789/8370
dc.description.abstract	The primary goal of this project is to create a non-deep-learning solution for segmenting cells in tabular data, for tables with or without gridlines. We have devised an algorithm based on K-Means clustering that segments cells regardless of whether gridlines are present. The approach identifies clusters of characters, which typically represent words or numbers, and calculates their centres of mass. The x and y coordinates of these centres are stored in separate arrays. We apply K-Means clustering to the x coordinates and the y coordinates separately, and for each axis we choose the number of clusters 'k' from the range 1 to a predefined maximum ('max_k'). Because existing methods for choosing 'k' yielded unsatisfactory results, we use a novel selection method. We then determine rows and columns separately by running K-Means with the selected 'k', and identify individual cells as the intersections of these rows and columns. In addition, we have developed an alternative algorithm tailored to tables that contain gridlines. Here we use Canny edge detection and the Hough transform to detect lines and then identify their intersection points. We use the intersection points to detect gridlines, and from the detected gridlines we reconstruct the table structure.	en_US
dc.language.iso	en	en_US
dc.subject	Table cell detection	en_US
dc.subject	Cell detection in tabular data	en_US
dc.title	Cell Detection in Tabular data	en_US
dc.type	Article	en_US
dc.description.embargo	No Embargo	en_US
dc.type.degree	BS-MS	en_US
dc.contributor.department	Dept. of Data Science	en_US
dc.contributor.registration	20181104	en_US
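
The row-and-column clustering described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the abstract does not describe the novel method for selecting 'k', so the number of rows and columns is passed in directly here. The function names `kmeans_1d` and `segment_cells`, the quantile-based initialisation, and the sample centres are all assumptions made for illustration. The sketch also assumes that clustering the y coordinates yields rows and clustering the x coordinates yields columns.

```python
def kmeans_1d(values, k, max_iter=100):
    """Plain Lloyd's K-Means on scalars (hypothetical helper, not from the thesis).

    Returns one label per input value, with labels numbered in order of
    increasing centroid position.
    """
    pts = sorted(values)
    # Assumed deterministic initialisation: spread centroids over the sorted values.
    centroids = [pts[(2 * i + 1) * len(pts) // (2 * k)] for i in range(k)]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each value goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: abs(v - centroids[j])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            updated.append(sum(members) / len(members) if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    # Renumber labels so that 0 is the smallest centroid (top row / left column).
    order = sorted(range(k), key=lambda j: centroids[j])
    rank = {j: r for r, j in enumerate(order)}
    return [rank[lab] for lab in labels]


def segment_cells(centres, n_rows, n_cols):
    """Assign each word centre (x, y) to a (row, column) cell.

    n_rows and n_cols stand in for the 'k' values that the thesis selects
    with its own method, which the abstract does not specify.
    """
    xs = [x for x, _ in centres]
    ys = [y for _, y in centres]
    col_labels = kmeans_1d(xs, n_cols)  # x coordinates -> columns (assumption)
    row_labels = kmeans_1d(ys, n_rows)  # y coordinates -> rows (assumption)
    return list(zip(row_labels, col_labels))


# Made-up centres of mass for a 2 x 3 table, listed row by row.
centres = [(10, 5), (50, 6), (90, 4), (11, 25), (49, 24), (91, 26)]
cells = segment_cells(centres, n_rows=2, n_cols=3)
print(cells)  # [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
```

Clustering each axis independently keeps the problem one-dimensional, so a cell is simply the pair of row and column labels that a word centre receives.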

Files in this item

Name: 20181104_Shardul_ ...

Size: 3.533Mb

Format: PDF

Description: MS Thesis

This item appears in the following Collection(s)

MS THESES [2219]
Thesis submitted to IISER Pune in partial fulfilment of the requirements for the BS-MS Dual Degree Programme/MSc. Programme/MS-Exit Programme
