Building a demo with Gradio that extracts table data from PDF and Excel files


Live demo (https://swanhub.co/patch/TabularScan/demo)

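You can try the tool at the link above; if your needs are modest, there is no need to set up the code yourself.
If you later have code or features that you want to deploy quickly as a service, whether an AI project or a web project, you can also host them directly on the swanhub open-source community, which is quick, convenient, and free.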

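I recently needed to extract the tables from PDF and Excel files for some analysis, so I wrote a small tool in Python. It takes an uploaded PDF or Excel (.xlsx) file, extracts every table in it, and returns the result as a list: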

import gradio as gr
import pdfplumber
import os
import openpyxl


def process_pdf(file):
    # Depending on the Gradio version, the upload arrives either as a path
    # string or as a temp-file object whose .name is the path on disk.
    # (.orig_name is only the original filename, so opening it would fail.)
    path = file if isinstance(file, str) else file.name
    file_extension = os.path.splitext(path)[-1].lower()

    tables = []

    if file_extension == ".pdf":
        with pdfplumber.open(path) as pdf:
            for page in pdf.pages:
                # extract_tables() returns a list of tables for this page;
                # extend() keeps the result a flat list of tables
                tables.extend(page.extract_tables())
    elif file_extension == ".xlsx":
        excel = openpyxl.load_workbook(path)
        for name in excel.sheetnames:
            sheet = excel[name]

            for row in sheet.iter_rows(values_only=True):
                # Add this row's cell values to the main list
                tables.append(list(row))

    return tables


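iface = gr.Interface(
    fn=process_pdf,
    # gr.inputs.File is deprecated and was removed in newer Gradio releases;
    # gr.File is the current component
    inputs=gr.File(),
    outputs="text",
    title="Upload a PDF/Excel file",
    description="Extract every table in the uploaded file and output them as a list",
)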

iface.launch()
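
Since the extracted tables are meant for analysis, it may help to turn each table into a list of records first. Below is a minimal sketch of that step, not part of the original tool. It assumes that a table is a list of rows (the shape that pdfplumber's extract_tables() produces, where a cell is a string or None) and that the first row is the header. The names clean_cell, table_to_records, and sample_table are hypothetical and exist only for illustration.

```python
from typing import Any, Dict, List, Optional

Row = List[Optional[Any]]


def clean_cell(value: Any) -> str:
    """Turn None into an empty string and collapse newlines/extra spaces."""
    if value is None:
        return ""
    return " ".join(str(value).split())


def table_to_records(table: List[Row]) -> List[Dict[str, str]]:
    """Convert one table (a list of rows) into dicts keyed by the header row.

    Assumption: the first row of the table is the header.
    """
    if not table:
        return []
    header = [clean_cell(cell) for cell in table[0]]
    records = []
    for row in table[1:]:
        cells = [clean_cell(cell) for cell in row]
        # Skip rows that are entirely empty after cleaning.
        if not any(cells):
            continue
        records.append(dict(zip(header, cells)))
    return records


# Hypothetical sample data in the list-of-rows shape described above.
sample_table = [
    ["Name", "Unit\nprice", "Qty"],
    ["Widget", "9.50", "3"],
    [None, None, None],
    ["Gadget", None, "12"],
]

records = table_to_records(sample_table)
print(records)
```

For a PDF, each element of the list returned by process_pdf could be passed to table_to_records in turn. For an .xlsx file the tool returns rows rather than whole tables, so the full row list would be passed as a single table.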