A handy img2table script

A script that uses img2table (https://github.com/xavctn/img2table) to convert an image into an Excel spreadsheet.

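Install the img2table package with the PaddleOCR backend. PaddleOCR handles Chinese fairly well, while the default Tesseract backend does not seem to support Chinese especially well.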

pip install img2table[paddle]

To run the script, change the input and output file names and execute it.

# pip install img2table[paddle]
from img2table.ocr import PaddleOCR
from img2table.document import Image

src = 'Scan_0001.jpg'
dest = 'Scan_0001.xlsx'

# Instantiation of OCR
ocr = PaddleOCR(lang='ch')

# Instantiation of document, either an image or a PDF
doc = Image(src)

# Extraction of tables and creation of an xlsx file containing the tables
doc.to_xlsx(dest=dest,
            ocr=ocr,
            implicit_rows=False,
            implicit_columns=False,
            borderless_tables=False,
            min_confidence=50)
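
If you have a whole folder of scans, editing the file names by hand gets tedious. Here is a minimal sketch of a batch version, assuming the same options as the script above. The helper names `xlsx_path_for` and `convert_folder` are my own for illustration and are not part of img2table.

```python
from pathlib import Path


def xlsx_path_for(src):
    """Return the output path: same folder and stem as src, with an .xlsx suffix."""
    return Path(src).with_suffix(".xlsx")


def convert_folder(folder, pattern="*.jpg"):
    """Convert every image in folder that matches pattern; return the written paths."""
    # Imported here so the path helper above works without img2table installed.
    from img2table.ocr import PaddleOCR
    from img2table.document import Image

    ocr = PaddleOCR(lang="ch")  # one OCR instance, reused for every image
    written = []
    for src in sorted(Path(folder).glob(pattern)):
        dest = xlsx_path_for(src)
        # Same extraction options as the single-file script
        Image(str(src)).to_xlsx(dest=str(dest),
                                ocr=ocr,
                                implicit_rows=False,
                                implicit_columns=False,
                                borderless_tables=False,
                                min_confidence=50)
        written.append(dest)
    return written
```

Calling `convert_folder('scans')` would then write `Scan_0001.xlsx` next to `Scan_0001.jpg`, and so on for each matching file in a hypothetical `scans` folder.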
