A script that uses img2table to convert an image into an Excel spreadsheet (https://github.com/xavctn/img2table).
Install the img2table package. The PaddleOCR backend works well for Chinese; the default Tesseract backend does not seem to support Chinese particularly well.
pip install img2table[paddle]
To run the script, change the input and output file names (src and dest) and execute it.
# pip install img2table[paddle]
from img2table.ocr import PaddleOCR
from img2table.document import Image

src = 'Scan_0001.jpg'
dest = 'Scan_0001.xlsx'

# Instantiation of OCR
ocr = PaddleOCR(lang='ch')

# Instantiation of document, either an image or a PDF
doc = Image(src)

# Extraction of tables and creation of a xlsx file containing tables
doc.to_xlsx(dest=dest,
            ocr=ocr,
            implicit_rows=False,
            implicit_columns=False,
            borderless_tables=False,
            min_confidence=50)
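If there are many scans, editing the file names by hand for each run gets tedious. The following is a minimal sketch of a batch variant, not part of the original script: the helper names `dest_for` and `convert_folder` and the `IMAGE_SUFFIXES` set are hypothetical, and it assumes each output file should sit next to its source image with the same base name. The img2table calls are the same ones used in the script above.

```python
from pathlib import Path
from typing import List

# Assumed set of scan formats to pick up; adjust as needed.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def dest_for(src: Path) -> Path:
    """Return the .xlsx path next to the source image (hypothetical helper)."""
    return src.with_suffix(".xlsx")


def convert_folder(folder: str) -> List[Path]:
    """Convert every image in `folder` to an .xlsx file; return the written paths."""
    # Imported lazily so dest_for() can be used without img2table installed.
    from img2table.document import Image
    from img2table.ocr import PaddleOCR

    # One OCR instance is created and reused for all images.
    ocr = PaddleOCR(lang="ch")

    written = []
    for src in sorted(Path(folder).iterdir()):
        if src.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip anything that is not an image we expect
        dest = dest_for(src)
        # Same extraction settings as the single-file script.
        Image(str(src)).to_xlsx(dest=str(dest),
                                ocr=ocr,
                                implicit_rows=False,
                                implicit_columns=False,
                                borderless_tables=False,
                                min_confidence=50)
        written.append(dest)
    return written
```

Calling `convert_folder("scans")`, where `scans` is a placeholder directory name, would write one workbook per image into that directory. Reusing a single `PaddleOCR` instance avoids reloading the OCR model for every file.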