With the data revolution of the 21st century, processing an ever-increasing volume of documents has become essential. Most of the data in the banking, financial, and administrative domains is still stored in physical documents, and there is a pressing need to automate their processing. The majority of the useful data in these documents is presented in tables. To preserve its value, data extracted from a table must retain the tabular structure. We use an image processing approach to extract these tables and the data they contain. We perform operations on scanned documents to identify the rows and columns of each table, then extract the text from each cell using Optical Character Recognition (OCR). We applied this approach to bordered tables and achieved more than 90% accuracy in extracting the tabular data.
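The abstract does not specify which image operations are used, so the following is only a minimal sketch of one common way to locate the cells of a bordered table, assuming a binarized scan and projection profiles: rows and columns that are almost entirely ink are treated as ruling lines, and the gaps between consecutive lines become cells. All names here (`find_rulings`, `cell_boxes`, `extract_table`, `min_fill`) are hypothetical, and the `ocr` argument is a caller-supplied function standing in for whatever OCR engine is actually used.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]          # binary image: 1 = ink (dark pixel), 0 = background
Box = Tuple[int, int, int, int]  # (top, bottom, left, right), end-exclusive


def find_rulings(profile: List[int], length: int, min_fill: float) -> List[Tuple[int, int]]:
    """Return (start, end) runs of indices whose ink count is >= min_fill * length."""
    runs, start = [], None
    for i, count in enumerate(profile):
        is_line = count >= min_fill * length
        if is_line and start is None:
            start = i                      # a ruling line begins here
        elif not is_line and start is not None:
            runs.append((start, i))        # the ruling line ended at i - 1
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def cell_boxes(img: Image, min_fill: float = 0.9) -> List[List[Box]]:
    """Locate table cells as the gaps between detected border lines."""
    height, width = len(img), len(img[0])
    row_profile = [sum(row) for row in img]
    col_profile = [sum(img[y][x] for y in range(height)) for x in range(width)]
    h_lines = find_rulings(row_profile, width, min_fill)   # horizontal borders
    v_lines = find_rulings(col_profile, height, min_fill)  # vertical borders
    grid = []
    for (_, top), (bottom, _) in zip(h_lines, h_lines[1:]):
        grid.append([(top, bottom, left, right)
                     for (_, left), (right, _) in zip(v_lines, v_lines[1:])])
    return grid


def extract_table(img: Image, ocr: Callable[[Image], str]) -> List[List[str]]:
    """Crop every cell and pass it to the OCR callback, keeping the row/column layout."""
    return [[ocr([line[l:r] for line in img[t:b]]) for (t, b, l, r) in row]
            for row in cell_boxes(img)]


# Synthetic 7x9 "scan": borders on rows 0, 3, 6 and columns 0, 4, 8, one ink pixel in the first cell.
demo = [[1 if (y in (0, 3, 6) or x in (0, 4, 8)) else 0 for x in range(9)] for y in range(7)]
demo[1][2] = 1

# Stand-in for a real OCR engine: it just reports how many ink pixels the cell contains.
fake_ocr = lambda cell: str(sum(map(sum, cell)))
table = extract_table(demo, fake_ocr)
```

Because the cells are returned row by row, the output of `extract_table` is already a list of rows, so the tabular structure survives the OCR step. Real scans are noisy and slightly skewed, so a practical pipeline would likely need deskewing and a more tolerant line detector than this threshold on exact pixel counts.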
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.