I. System Architecture Overview

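The system is built around a PyQt5 graphical user interface (GUI). It covers several OCR scenarios through a cloud text-recognition API (Baidu AI OCR) and stores recognition results in a lightweight SQLite database. The architecture has three layers: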

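Presentation layer: PyQt5 handles user interaction and result display, with drag-and-drop upload, live preview, and history lookup.
Service layer: wraps the Baidu AI OCR API calls and supports four recognition modes: general text, ID card, bank card, and driving license.
Data layer: an SQLite database stores recognition records, including a timestamp, the file path, the recognition result, and the recognition type as a category label.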

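The design balances development speed with runtime performance. PyQt5 is cross-platform and SQLite needs no server or configuration, so the system runs on personal computers as well as lightweight servers.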
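The three-layer split can be sketched in a few lines. This is a minimal illustration under stated assumptions. `OCRService` and `fake_recognize` are hypothetical names. The recognizer is any callable of the form `(path, mode) -> dict`, a shape the Baidu wrapper in Section III fits. An in-memory SQLite connection stands in for the data layer so the sketch runs without network access.

```python
import json
import sqlite3
from typing import Callable


class OCRService:
    """Hypothetical service layer: call a recognizer, then persist the result."""

    def __init__(self, recognize: Callable[[str, str], dict], conn: sqlite3.Connection):
        self.recognize = recognize  # OCR backend (e.g. a Baidu API wrapper)
        self.conn = conn            # data layer (SQLite connection)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS records ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "timestamp DATETIME DEFAULT CURRENT_TIMESTAMP, "
            "file_path TEXT NOT NULL, recognition_type TEXT NOT NULL, "
            "result TEXT NOT NULL, accuracy REAL)"
        )

    def process(self, path: str, mode: str = "general") -> dict:
        result = self.recognize(path, mode)
        with self.conn:  # commit the insert as one transaction
            self.conn.execute(
                "INSERT INTO records (file_path, recognition_type, result) VALUES (?, ?, ?)",
                (path, mode, json.dumps(result, ensure_ascii=False)),
            )
        return result  # the presentation layer only needs to render this dict


def fake_recognize(path: str, mode: str) -> dict:
    # Stand-in for a real OCR call so the sketch runs offline
    return {"words_result": [{"words": f"text from {path}"}]}


service = OCRService(fake_recognize, sqlite3.connect(":memory:"))
print(service.process("scan.png"))
```

With this split, the GUI never talks to SQLite or the HTTP API directly. Swapping in another OCR provider only means passing a different `recognize` callable.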

II. PyQt5 GUI Implementation

1. Main Window Design

The main window is built on QMainWindow and contains the following core components:

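Menu bar: file operations (open, save), recognition-mode switching, and history management.
Toolbar: buttons for quick access to the recognition functions.
Central area: a file upload area on the left (with drag-and-drop support) and a result display area on the right (QTextEdit or QTableWidget).
Status bar: shows the current operation status and recognition progress.

The code below sets up the central area. The menu bar, toolbar, and status bar are omitted for brevity.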

import sys

from PyQt5.QtWidgets import (QApplication, QFileDialog, QHBoxLayout, QLabel,
                             QMainWindow, QPushButton, QTextEdit, QVBoxLayout,
                             QWidget)


class MainWindow(QMainWindow):
    def __init__(self):
        super().__init__()
        self.initUI()

    def initUI(self):
        self.setWindowTitle("Multi-Function OCR System")
        self.setGeometry(100, 100, 800, 600)
        # Central widget: file area on the left, results on the right
        central_widget = QWidget()
        main_layout = QHBoxLayout()
        # Left: file upload area
        left_layout = QVBoxLayout()
        self.file_label = QLabel("Drag a file here or click to select")
        self.upload_btn = QPushButton("Select File")
        self.upload_btn.clicked.connect(self.open_file_dialog)
        left_layout.addWidget(self.file_label)
        left_layout.addWidget(self.upload_btn)
        # Right: read-only result display
        right_layout = QVBoxLayout()
        self.result_text = QTextEdit()
        self.result_text.setReadOnly(True)
        right_layout.addWidget(QLabel("Recognition result:"))
        right_layout.addWidget(self.result_text)
        # Stretch factors 1:2 give the result area two thirds of the width
        main_layout.addLayout(left_layout, 1)
        main_layout.addLayout(right_layout, 2)
        central_widget.setLayout(main_layout)
        self.setCentralWidget(central_widget)

    def open_file_dialog(self):
        file_path, _ = QFileDialog.getOpenFileName(self, "Select Image", "", "Images (*.png *.jpg *.bmp)")
        if file_path:
            self.file_label.setText(f"Selected: {file_path}")
            # Hand the file to the recognition logic
            self.recognize_text(file_path)

    def recognize_text(self, file_path):
        # Calls the Baidu AI API; covered in the next section
        pass


if __name__ == "__main__":
    app = QApplication(sys.argv)
    window = MainWindow()
    window.show()
    sys.exit(app.exec_())

2. Drag-and-Drop Support

Drag-and-drop upload is supported by overriding the dragEnterEvent and dropEvent methods:

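from PyQt5.QtCore import Qt, pyqtSignal
from PyQt5.QtWidgets import QLabel


class DropArea(QLabel):
    # Emitted with the local path of a dropped image file.
    # Example wiring in MainWindow: drop_area.file_dropped.connect(self.recognize_text)
    file_dropped = pyqtSignal(str)

    def __init__(self, parent=None):
        super().__init__("Drag a file here", parent)
        self.setAlignment(Qt.AlignCenter)
        self.setStyleSheet("""
            QLabel {
                border: 2px dashed #aaa;
                padding: 20px;
            }
        """)
        self.setAcceptDrops(True)

    def dragEnterEvent(self, event):
        # Only accept drags that carry file URLs
        if event.mimeData().hasUrls():
            event.acceptProposedAction()
        else:
            event.ignore()

    def dropEvent(self, event):
        for url in event.mimeData().urls():
            file_path = url.toLocalFile()
            if file_path.lower().endswith(('.png', '.jpg', '.bmp')):
                self.setText(f"Selected: {file_path}")
                # Use a signal rather than self.parent(): once the label is added
                # to a layout, its parent is the central widget, not MainWindow
                self.file_dropped.emit(file_path)
                event.acceptProposedAction()
                break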
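The file dialog (`*.png *.jpg *.bmp`) and the drop handler (`('.png', '.jpg', '.bmp')`) repeat the same extension list. One way to keep them in sync is to derive both from a single tuple. This is a suggestion, not part of the original design. `SUPPORTED_EXTENSIONS`, `is_supported_image`, and `dialog_filter` are hypothetical names.

```python
from pathlib import Path

# Single source of truth for accepted image types (hypothetical constant)
SUPPORTED_EXTENSIONS = (".png", ".jpg", ".bmp")


def is_supported_image(file_path: str) -> bool:
    """Case-insensitive extension check, usable in dropEvent."""
    return Path(file_path).suffix.lower() in SUPPORTED_EXTENSIONS


def dialog_filter() -> str:
    """Build a QFileDialog name filter from the same tuple."""
    patterns = " ".join(f"*{ext}" for ext in SUPPORTED_EXTENSIONS)
    return f"Images ({patterns})"


print(dialog_filter())                 # Images (*.png *.jpg *.bmp)
print(is_supported_image("A/B.JPG"))   # True
```

In the GUI, `QFileDialog.getOpenFileName(self, "Select Image", "", dialog_filter())` and `if is_supported_image(file_path):` in `dropEvent` would then share one list. Adding a format such as `.jpeg` becomes a one-line change.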

III. Baidu AI OCR API Integration

1. API Call Wrapper

The system wraps four Baidu AI OCR modes behind a single interface that handles both requests and responses:

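import base64
import json

import requests


class BaiduOCR:
    TOKEN_URL = "https://aip.baidubce.com/oauth/2.0/token"
    URL_MAP = {
        "general": "https://aip.baidubce.com/rest/2.0/ocr/v1/general_basic",
        "id_card": "https://aip.baidubce.com/rest/2.0/ocr/v1/idcard",
        "bank_card": "https://aip.baidubce.com/rest/2.0/ocr/v1/bankcard",
        "driving_license": "https://aip.baidubce.com/rest/2.0/ocr/v1/driving_license",
    }

    def __init__(self, api_key, secret_key):
        self.api_key = api_key
        self.secret_key = secret_key
        self.access_token = self._get_access_token()

    def _get_access_token(self):
        # Exchange the API Key / Secret Key for an access token (client_credentials grant)
        params = {
            "grant_type": "client_credentials",
            "client_id": self.api_key,
            "client_secret": self.secret_key,
        }
        response = requests.post(self.TOKEN_URL, params=params, timeout=10)
        response.raise_for_status()
        token = response.json().get("access_token")
        if not token:
            raise RuntimeError(f"Failed to obtain access token: {response.text}")
        return token

    def recognize(self, image_path, recognition_type="general", id_card_side="front"):
        url = self.URL_MAP.get(recognition_type)
        if not url:
            raise ValueError(f"Unsupported recognition type: {recognition_type}")
        with open(image_path, "rb") as f:
            image_base64 = base64.b64encode(f.read()).decode("utf-8")
        data = {"image": image_base64}
        if recognition_type == "id_card":
            # The ID card endpoint requires the side: "front" (photo) or "back" (emblem)
            data["id_card_side"] = id_card_side
        headers = {"Content-Type": "application/x-www-form-urlencoded"}
        params = {"access_token": self.access_token}
        response = requests.post(url, params=params, headers=headers, data=data, timeout=30)
        response.raise_for_status()
        result = response.json()
        # API-level failures (invalid token, quota exceeded, ...) come back as error_code
        if "error_code" in result:
            raise RuntimeError(f"OCR error {result['error_code']}: {result.get('error_msg')}")
        return result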
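The wrapper requests a token only once, in `__init__`. Baidu access tokens expire: the token response carries an `expires_in` field in seconds, and the documented lifetime is 30 days. A GUI left running long enough would therefore eventually send a stale token. Below is a minimal refresh sketch. It assumes a hypothetical `TokenCache` class that accepts any `fetch()` callable returning `(access_token, expires_in)`. The clock is injectable so the expiry logic can be exercised without waiting.

```python
import time
from typing import Callable, Optional, Tuple


class TokenCache:
    """Hypothetical helper: reuse an access token until shortly before it expires."""

    def __init__(self, fetch: Callable[[], Tuple[str, int]],
                 margin: int = 3600, clock: Callable[[], float] = time.time):
        self.fetch = fetch    # returns (access_token, expires_in_seconds)
        self.margin = margin  # refresh this many seconds before expiry
        self.clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self.clock()
        if self._token is None or now >= self._expires_at - self.margin:
            token, expires_in = self.fetch()
            self._token = token
            self._expires_at = now + expires_in
        return self._token


# Demo with a fake clock and a fake token endpoint
fake_now = [0.0]
calls = []


def fake_fetch():
    calls.append(fake_now[0])
    return f"token-{len(calls)}", 2592000  # 30 days in seconds


cache = TokenCache(fake_fetch, clock=lambda: fake_now[0])
print(cache.get())            # token-1
fake_now[0] = 2592000 - 60    # one minute before expiry, inside the margin
print(cache.get())            # token-2
```

In `BaiduOCR`, `fetch` would read `access_token` and `expires_in` from the token response, and `recognize` would call `cache.get()` instead of reading `self.access_token`.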

2. Multi-Mode Recognition

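General text recognition: for ordinary documents, screenshots, and similar images.
ID card recognition: handles both the front (photo side) and the back (national emblem side). The API's id_card_side parameter states which side is submitted. Extracts fields such as name and ID number.
Bank card recognition: returns the card number, expiry date, and bank name.
Driving license recognition: parses the license number, name, permitted vehicle class, validity period, and other fields.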

Usage example:

ocr = BaiduOCR("your_api_key", "your_secret_key")
result = ocr.recognize("id_card.jpg", "id_card")
# ensure_ascii=False keeps Chinese field names and values readable
print(json.dumps(result, indent=2, ensure_ascii=False))
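The raw JSON differs by mode, so the result pane needs a small formatter before it can show plain text. The sketch below assumes the response shapes described in Baidu's OCR documentation. General text returns `words_result` as a list of `{"words": ...}` entries. ID card and driving license return `words_result` as a dict that maps Chinese field names to `{"words": ...}`. Bank card returns a flat `result` object. Verify these shapes against the current docs for each endpoint. `format_result` is a hypothetical helper, and the sample inputs are made-up placeholders, not real API output.

```python
def format_result(result: dict) -> str:
    """Flatten a Baidu OCR JSON response into plain text (hypothetical helper)."""
    words = result.get("words_result")
    if isinstance(words, list):
        # general_basic: one entry per recognized line
        return "\n".join(item.get("words", "") for item in words)
    if isinstance(words, dict):
        # ID card / driving license: field name -> {"words": ..., "location": ...}
        return "\n".join(f"{field}: {value.get('words', '')}" for field, value in words.items())
    card = result.get("result")
    if isinstance(card, dict):
        # bank card: flat dict of fields
        return "\n".join(f"{key}: {value}" for key, value in card.items())
    return ""  # error responses or unknown shapes


# Placeholder inputs shaped like the documented responses
general = {"words_result_num": 2, "words_result": [{"words": "Hello"}, {"words": "OCR"}]}
print(format_result(general))
```

In `MainWindow.recognize_text`, the output would go to `self.result_text.setPlainText(format_result(result))`.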

IV. SQLite Data Persistence

1. Database Design

A records table stores the recognition history:

CREATE TABLE IF NOT EXISTS records (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp DATETIME DEFAULT CURRENT_TIMESTAMP,
    file_path TEXT NOT NULL,
    recognition_type TEXT NOT NULL,
    result TEXT NOT NULL,
    accuracy REAL
);
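One detail of this schema is easy to miss. `DEFAULT CURRENT_TIMESTAMP` stores the time in UTC as text of the form `YYYY-MM-DD HH:MM:SS`, not in local time. If the history view should show local time, the conversion has to happen on read. The sketch below uses an in-memory database and the standard `datetime` module.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE IF NOT EXISTS records (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        timestamp DATETIME DEFAULT CURRENT_TIMESTAMP,
        file_path TEXT NOT NULL,
        recognition_type TEXT NOT NULL,
        result TEXT NOT NULL,
        accuracy REAL
    )
""")
conn.execute(
    "INSERT INTO records (file_path, recognition_type, result) VALUES (?, ?, ?)",
    ("scan.png", "general", "{}"),
)
raw = conn.execute("SELECT timestamp FROM records").fetchone()[0]
# SQLite returns the UTC text unchanged; attach the UTC zone explicitly
utc_time = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
local_time = utc_time.astimezone()  # convert to the machine's local time zone
print(raw, "->", local_time.strftime("%Y-%m-%d %H:%M:%S"))
```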

2. Python Example

The sqlite3 module handles storing and querying records:

import json
import sqlite3


class OCRDatabase:
    def __init__(self, db_path="ocr_records.db"):
        self.conn = sqlite3.connect(db_path)
        self._init_db()

    def _init_db(self):
        # IF NOT EXISTS makes this safe to run on every start
        self.conn.execute("""
            CREATE TABLE IF NOT EXISTS records (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                timestamp DATETIME DEFAULT CURRENT_TIMESTAMP,
                file_path TEXT NOT NULL,
                recognition_type TEXT NOT NULL,
                result TEXT NOT NULL,
                accuracy REAL
            );
        """)
        self.conn.commit()

    def add_record(self, file_path, recognition_type, result, accuracy=None):
        # The API returns a dict; serialize it so it fits the TEXT column
        if not isinstance(result, str):
            result = json.dumps(result, ensure_ascii=False)
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT INTO records (file_path, recognition_type, result, accuracy) "
                "VALUES (?, ?, ?, ?)",
                (file_path, recognition_type, result, accuracy),
            )

    def get_records(self, recognition_type=None, limit=10):
        # id DESC breaks ties between rows written within the same second
        if recognition_type:
            cursor = self.conn.execute(
                "SELECT * FROM records WHERE recognition_type = ? "
                "ORDER BY timestamp DESC, id DESC LIMIT ?",
                (recognition_type, limit),
            )
        else:
            cursor = self.conn.execute(
                "SELECT * FROM records ORDER BY timestamp DESC, id DESC LIMIT ?",
                (limit,),
            )
        return cursor.fetchall()

    def close(self):
        self.conn.close()
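The presentation layer's history lookup will often want to search by content, not just by type. The sketch below assumes a hypothetical `search_records` function that filters on the `result` column with `LIKE`. It escapes `%` and `_` so a keyword containing them matches literally. In SQLite, `LIKE` is case-insensitive only for ASCII letters, so Chinese text is matched exactly.

```python
import sqlite3


def search_records(conn: sqlite3.Connection, keyword: str, limit: int = 10):
    """Hypothetical history search: rows whose result text contains `keyword`."""
    # Escape LIKE wildcards (and the escape character itself) in the user's keyword
    escaped = keyword.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return conn.execute(
        "SELECT id, file_path, recognition_type, result FROM records "
        "WHERE result LIKE ? ESCAPE '\\' ORDER BY timestamp DESC, id DESC LIMIT ?",
        (f"%{escaped}%", limit),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "timestamp DATETIME DEFAULT CURRENT_TIMESTAMP, file_path TEXT NOT NULL, "
    "recognition_type TEXT NOT NULL, result TEXT NOT NULL, accuracy REAL)"
)
conn.executemany(
    "INSERT INTO records (file_path, recognition_type, result) VALUES (?, ?, ?)",
    [("a.png", "general", "invoice total 100%"),
     ("b.png", "general", "meeting notes"),
     ("c.png", "general", "total_due")],
)
for row in search_records(conn, "total"):
    print(row[1], row[3])
```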

V. Optimization and Extension Suggestions

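Performance:
- Compress or tile large images before upload to shorten API calls and stay within the API's image-size limits.
- Run recognition in a worker thread, separate from the UI thread, so the interface does not freeze during network calls.
Features:
- Add batch recognition so several files can be processed in one run.
- Add an option to export results to Excel or PDF.
- Store files by category automatically, with one subdirectory per recognition type.
Error handling:
- Catch API failures such as network errors and exceeded quotas.
- Validate image format and size up front to filter out invalid files.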
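Batch recognition, up-front validation, and per-file error capture fit together naturally. The sketch below runs them with a thread pool. `batch_recognize`, `validate`, and `MAX_BYTES` are hypothetical names, and the 4 MB cap is an assumed value to be checked against the API's current limits. `recognize` is any callable that takes a path, for example `lambda p: ocr.recognize(p, "general")`. Keep `max_workers` small, because the API enforces request-rate (QPS) limits.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

ALLOWED = (".png", ".jpg", ".bmp")  # same types as the file dialog
MAX_BYTES = 4 * 1024 * 1024         # assumed size cap; check the API's current limit


def validate(path: str) -> None:
    """Reject files the API would refuse before spending a request on them."""
    if not path.lower().endswith(ALLOWED):
        raise ValueError(f"unsupported format: {path}")
    if os.path.getsize(path) > MAX_BYTES:  # raises OSError if the file is missing
        raise ValueError(f"file too large: {path}")


def batch_recognize(paths: List[str], recognize: Callable[[str], dict],
                    max_workers: int = 2) -> Dict[str, dict]:
    """Recognize many files concurrently; one failure does not stop the batch."""
    def run_one(path: str) -> dict:
        try:
            validate(path)
            return {"ok": True, "result": recognize(path)}
        except Exception as exc:  # bad files, network errors, quota errors, ...
            return {"ok": False, "error": str(exc)}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map keeps the input order
        return dict(zip(paths, pool.map(run_one, paths)))


with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "page.png")
    with open(good, "wb") as f:
        f.write(b"\x89PNG placeholder bytes")
    bad = os.path.join(tmp, "notes.txt")
    report = batch_recognize([good, bad], lambda p: {"words_result": []})
    for path, outcome in report.items():
        print(os.path.basename(path), outcome["ok"])
```

In the PyQt5 app, a call like this still blocks whichever thread makes it. It belongs inside a worker thread (for example a QThread) that reports results back to the UI through signals.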

VI. Summary

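By combining PyQt5, the Baidu AI OCR API, and an SQLite database, the system provides one OCR tool that both recognizes several document types and manages the results. Its strengths are: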

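User-friendly: an intuitive GUI and drag-and-drop input lower the barrier to entry.
Broad coverage: handles the most common ID and document recognition needs.
Lightweight: no complex services to deploy, which suits individuals and small teams.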

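Developers can extend this framework further, for example by adding more recognition types or integrating other cloud OCR APIs, to build a more capable document-processing platform.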

Multi-Function OCR System: Deep Integration of PyQt5, Baidu AI, and SQLite