Owing to the data revolution of the 21st century, processing the ever-increasing volume of documents has become essential. Most data in the banking, financial, and administrative sectors is still stored in physical documents, so there is a strong need to automate their processing. Much of the useful data in these documents is held in tables, and it retains its value only if it is extracted with the tabular structure preserved. We use an image processing approach to extract these tables and the data they contain. We perform operations on scanned documents to identify the rows and columns of each table, and then extract the text from each cell using Optical Character Recognition (OCR). We applied this approach to bordered tables and achieved more than 90% accuracy in extracting the tabular data.
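As a rough illustration of the row and column identification step for bordered tables, the sketch below finds ruling lines in a binarized image with projection profiles and derives one bounding box per cell. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say which operations are used, and the names `find_line_runs`, `cell_boxes`, and the `min_fill` threshold are hypothetical. The input is assumed to be already binarized (1 = ink, 0 = background) and deskewed, and the OCR step is left out because it needs an external engine.

```python
def find_line_runs(binary, axis, min_fill=0.8):
    """Return (start, end) index runs of ruling lines.

    binary: list of rows, each a list of 0/1 values (1 = ink). Assumed input format.
    axis=0 looks for horizontal lines, axis=1 for vertical lines.
    min_fill: hypothetical threshold, the fraction of ink a row or column
    must contain to count as part of a border line.
    """
    height, width = len(binary), len(binary[0])
    if axis == 0:
        profile = [sum(row) / width for row in binary]
    else:
        profile = [sum(binary[y][x] for y in range(height)) / height
                   for x in range(width)]

    runs, start = [], None
    # The trailing 0.0 is a sentinel that closes a run touching the image edge.
    for i, fill in enumerate(profile + [0.0]):
        if fill >= min_fill and start is None:
            start = i
        elif fill < min_fill and start is not None:
            runs.append((start, i - 1))
            start = None
    return runs


def cell_boxes(binary, min_fill=0.8):
    """Return a grid of (left, top, right, bottom) boxes, one per table cell.

    right and bottom are exclusive, so each box covers only the cell
    interior between two neighbouring border lines.
    """
    rows = find_line_runs(binary, 0, min_fill)
    cols = find_line_runs(binary, 1, min_fill)
    return [[(cols[c][1] + 1, rows[r][1] + 1, cols[c + 1][0], rows[r + 1][0])
             for c in range(len(cols) - 1)]
            for r in range(len(rows) - 1)]


# Synthetic 9x7 image of a 2x2 bordered table, with one ink pixel inside a cell.
demo = [[1 if y in (0, 3, 6) or x in (0, 4, 8) else 0 for x in range(9)]
        for y in range(7)]
demo[1][2] = 1  # stands in for text; too sparse to be mistaken for a border

grid = cell_boxes(demo)
```

Each box in `grid` could then be cropped from the scanned page and passed to an OCR engine, which keeps the recognized text tied to its row and column position. Real scans would likely need binarization, deskewing, and a tolerance for broken or faint borders before a profile-based approach like this is reliable.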


Image Processing, Optical Character Recognition

Aditya Kekare, Atharva Gosavi, Abhishek Jachak, and Amit Deshmane, "IMAGE PROCESSING APPROACH FOR EXTRACTING TABLES FROM SCANNED DOCUMENTS", IEJRD - International Multidisciplinary Journal, vol. 5, no. 5, p. 5, Jun. 2020.


