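A script that uses img2table to convert an image of a table into an Excel spreadsheet (https://github.com/xavctn/img2table).
Install the img2table package with the PaddleOCR backend. PaddleOCR handles Chinese text well, while the default Tesseract backend does not seem to support Chinese particularly well.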
pip install "img2table[paddle]"
To run the script, change the input and output file names (src and dest) and execute it.
# pip install img2table[paddle]
from img2table.ocr import PaddleOCR
from img2table.document import Image
src = 'Scan_0001.jpg'
dest = 'Scan_0001.xlsx'
# Instantiation of OCR
ocr = PaddleOCR(lang='ch')
# Instantiation of document, either an image or a PDF
doc = Image(src)
# Extraction of tables and creation of an xlsx file containing them
doc.to_xlsx(dest=dest,
            ocr=ocr,
            implicit_rows=False,
            implicit_columns=False,
            borderless_tables=False,
            min_confidence=50)
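
The script above handles one image, with src and dest edited by hand. For a folder of scans, a minimal sketch along the following lines could derive each output name from its input instead. The helper names xlsx_path_for and convert_folder, and the "*.jpg" pattern, are assumptions made for illustration and are not part of img2table; the to_xlsx call relies on the library defaults for the options not listed.

```python
from pathlib import Path


def xlsx_path_for(src):
    """Return the output path for a scan: same folder and stem, .xlsx suffix."""
    return Path(src).with_suffix(".xlsx")


def convert_folder(folder, pattern="*.jpg"):
    """Convert every matching image in `folder` and return the written paths."""
    # Imported here so the naming helper works even without img2table installed.
    from img2table.ocr import PaddleOCR
    from img2table.document import Image

    ocr = PaddleOCR(lang="ch")  # one OCR instance reused for all images
    written = []
    for src in sorted(Path(folder).glob(pattern)):
        dest = xlsx_path_for(src)
        Image(str(src)).to_xlsx(dest=str(dest), ocr=ocr, min_confidence=50)
        written.append(dest)
    return written
```

Calling convert_folder("scans") (a hypothetical folder name) would then write Scan_0001.xlsx next to Scan_0001.jpg, and so on for each image found.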