A Complete Guide to Image Processing in Python: Format Conversion and OCR Text Recognition in Practice

I. Introduction: Core Needs in Image Processing

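In digital office work and automated workflows, image format conversion and optical character recognition (OCR) are two of the most frequent needs. Turning a scanned contract PDF into an editable Word document, or pulling key figures out of a screenshot, both depend on them. Python's ecosystem (Pillow, OpenCV, and Tesseract OCR through the pytesseract wrapper) makes it a common choice for this kind of work. This article walks through image format conversion and OCR in Python, with code examples that can be adapted to real projects.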

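II. Image Format Conversion: Working with Pillow

1. Pillow Basics and Installation

Pillow is the maintained fork of the Python Imaging Library (PIL). It reads, writes, and converts dozens of formats, including JPEG, PNG, and BMP. Install it with: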

pip install pillow

2. Basic Format Conversion

Format conversion comes down to a call to Image.save(). Example:

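from PIL import Image


def convert_image_format(input_path, output_path, output_format):
    """
    Convert an image to another format.
    :param input_path: path to the input image
    :param output_path: path to the output image
    :param output_format: target format (e.g. 'JPEG', 'PNG')
    """
    try:
        with Image.open(input_path) as img:
            # JPEG cannot store alpha or palette data, so flatten to RGB first
            if output_format.upper() == "JPEG" and img.mode not in ("RGB", "L"):
                img = img.convert("RGB")
            img.save(output_path, format=output_format)
        print(f"Converted: {input_path} → {output_path}")
    except Exception as e:
        print(f"Conversion failed: {e}")


# Example: convert a PNG to JPEG
convert_image_format("input.png", "output.jpg", "JPEG")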

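3. Advanced Features: Batch Conversion and Quality Tuning

Batch conversion: walk a folder with os.listdir(); the work can also be spread across threads to speed it up.

Quality parameter: JPEG accepts a quality parameter that trades file size against image fidelity. Pillow's documentation describes the scale as 0 to 95 with a default of 75, and advises against values above 95.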

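import os
from PIL import Image


def batch_convert(input_dir, output_dir, output_format, quality=85):
    """Convert every PNG, JPEG, or BMP file in input_dir to output_format."""
    os.makedirs(output_dir, exist_ok=True)
    for filename in os.listdir(input_dir):
        if filename.lower().endswith(('.png', '.jpg', '.jpeg', '.bmp')):
            input_path = os.path.join(input_dir, filename)
            stem = os.path.splitext(filename)[0]
            output_path = os.path.join(output_dir, f"{stem}.{output_format.lower()}")
            with Image.open(input_path) as img:
                # JPEG cannot store alpha or palette data, so flatten to RGB first
                if output_format.upper() == "JPEG" and img.mode not in ("RGB", "L"):
                    img = img.convert("RGB")
                # quality is used by lossy encoders such as JPEG
                img.save(output_path, format=output_format, quality=quality)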
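
To make the effect of the quality parameter concrete, the following sketch encodes one image at several quality settings in memory and reports the resulting sizes. The helper names jpeg_sizes and make_test_image are hypothetical, and the synthetic test pattern is only an assumption to keep the example free of input files; real photographs will give different numbers.

```python
import io

from PIL import Image


def jpeg_sizes(img, qualities=(30, 60, 90)):
    """Return {quality: encoded size in bytes} for in-memory JPEG encodes."""
    rgb = img.convert("RGB")  # JPEG cannot store alpha or palette data
    sizes = {}
    for q in qualities:
        buf = io.BytesIO()
        rgb.save(buf, format="JPEG", quality=q)
        sizes[q] = len(buf.getvalue())
    return sizes


def make_test_image(size=128):
    """Build a synthetic, detail-rich RGB image (hypothetical test pattern)."""
    data = bytearray()
    for y in range(size):
        for x in range(size):
            data += bytes(((x * y) % 256, (x * 7) % 256, (y * 13) % 256))
    return Image.frombytes("RGB", (size, size), bytes(data))


sizes = jpeg_sizes(make_test_image())
for q, n in sizes.items():
    print(f"quality={q}: {n} bytes")
```

Running the same comparison on a representative sample of your own images is a reasonable way to pick a default for batch_convert.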

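III. OCR Text Recognition: Tesseract OCR in Practice

1. Installing and Configuring Tesseract OCR

Install: run pip install pytesseract to get the Python wrapper, and separately install the Tesseract OCR engine itself (downloadable from the project's official site). Recognizing Chinese also requires the chi_sim language data.
Configure the environment variable: add the Tesseract install directory (e.g. C:\Program Files\Tesseract-OCR) to the system PATH.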
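
Before calling pytesseract, it can help to confirm that the engine is actually discoverable. The sketch below uses only the standard library; the function names find_tesseract and tesseract_version are hypothetical, and it assumes the executable is named tesseract.

```python
import shutil
import subprocess


def find_tesseract():
    """Return the full path of the tesseract executable, or None if it is not on PATH."""
    return shutil.which("tesseract")


def tesseract_version(executable):
    """Return the first line printed by `tesseract --version`."""
    result = subprocess.run([executable, "--version"],
                            capture_output=True, text=True, check=False)
    # Some older builds print the version to stderr rather than stdout
    output = result.stdout or result.stderr
    return output.splitlines()[0] if output else ""


path = find_tesseract()
if path is None:
    print("tesseract not found on PATH; install the engine or fix PATH")
else:
    print(path, "|", tesseract_version(path))
```

If changing PATH is not an option, pytesseract also lets you point at the executable directly by assigning its full path to pytesseract.pytesseract.tesseract_cmd.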

2. Basic Text Recognition

import pytesseract
from PIL import Image


def extract_text_from_image(image_path):
    """
    Extract the text in an image.
    :param image_path: path to the image
    :return: the recognized text, or None on failure
    """
    try:
        with Image.open(image_path) as img:
            # Simplified Chinese plus English; needs the chi_sim language data
            text = pytesseract.image_to_string(img, lang='chi_sim+eng')
        return text
    except Exception as e:
        print(f"Recognition failed: {e}")
        return None


# Example
text = extract_text_from_image("document.png")
print(text)

3. Improving Recognition Accuracy: Preprocessing

(1) Binarization (boosting contrast)

from PIL import Image


def preprocess_image(image_path, output_path):
    img = Image.open(image_path).convert('L')  # convert to grayscale
    # Fixed-threshold binarization: pixels brighter than the threshold become
    # white, the rest black. The image is not inverted, because Tesseract's
    # documentation recommends dark text on a light background.
    threshold = 128
    binary_img = img.point(lambda x: 255 if x > threshold else 0)
    binary_img.save(output_path)
    return output_path


# Run OCR on the preprocessed image
preprocessed_path = preprocess_image("blurry.png", "preprocessed.png")
text = extract_text_from_image(preprocessed_path)
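
A fixed threshold of 128 will not suit every scan. One common alternative, which is not what the code above does, is Otsu's method: it picks the threshold from the image's own histogram by maximizing the variance between the dark and light classes. The following is a minimal pure-Pillow sketch; the function names otsu_threshold and binarize_otsu are hypothetical, and the synthetic demo image is only there so the example runs without an input file.

```python
from PIL import Image


def otsu_threshold(gray):
    """Return the Otsu threshold (0-255) for a mode-'L' Pillow image."""
    hist = gray.histogram()  # 256 bins for a grayscale image
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    w_bg = 0    # number of pixels at or below the candidate threshold
    sum_bg = 0  # sum of the gray levels of those pixels
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        w_fg = total - w_bg
        if w_bg == 0:
            continue  # no dark pixels yet
        if w_fg == 0:
            break     # no light pixels left
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance, up to a constant factor
        var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t


def binarize_otsu(gray):
    """Binarize with a data-driven threshold; dark text stays dark."""
    t = otsu_threshold(gray)
    return gray.point(lambda x: 255 if x > t else 0), t


# Demo on a synthetic image: left half dark (50), right half light (200)
demo = Image.new("L", (40, 20), color=200)
demo.paste(50, (0, 0, 20, 20))
binary, t = binarize_otsu(demo)
print(t, binary.getpixel((0, 0)), binary.getpixel((39, 0)))
```

Otsu's method still uses one global threshold, so it helps with overall exposure differences between scans but not with uneven lighting inside a single image.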

(2) Advanced Preprocessing with OpenCV

Install OpenCV: pip install opencv-python

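import cv2


def cv2_preprocess(image_path):
    img = cv2.imread(image_path)
    if img is None:  # cv2.imread returns None instead of raising on a bad path
        raise FileNotFoundError(f"Cannot read image: {image_path}")
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Gaussian blur to suppress noise
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)
    # Adaptive threshold: each pixel is compared with a Gaussian-weighted mean
    # of its 11x11 neighborhood, minus the constant 2
    thresh = cv2.adaptiveThreshold(blurred, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                   cv2.THRESH_BINARY, 11, 2)
    return thresh


# Combine OpenCV with Tesseract
processed_img = cv2_preprocess("noisy.png")
cv2.imwrite("cv_processed.png", processed_img)
text = extract_text_from_image("cv_processed.png")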

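IV. Enterprise Use Cases and Optimization Tips

1. Automated Document Processing Pipeline

Combining format conversion with OCR gives a pipeline like this:

Scanner outputs a multi-page TIFF → convert to single-page PNGs
Preprocess to remove noise → OCR → store structured data (e.g. Excel or a database)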
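
The first step of this pipeline, splitting a multi-page TIFF into single-page PNGs, can be sketched with Pillow's ImageSequence module. The function name split_tiff_to_png and the page_001.png naming scheme are hypothetical, and the sketch assumes the pages are in a mode PNG can store (such as 1, L, or RGB). The demo builds its own three-page TIFF in a temporary directory so it runs without a scanner file.

```python
import os
import tempfile

from PIL import Image, ImageSequence


def split_tiff_to_png(tiff_path, output_dir):
    """Write each page of a multi-page TIFF as page_NNN.png and return the paths."""
    os.makedirs(output_dir, exist_ok=True)
    paths = []
    with Image.open(tiff_path) as tiff:
        for index, page in enumerate(ImageSequence.Iterator(tiff), start=1):
            out = os.path.join(output_dir, f"page_{index:03d}.png")
            page.save(out, format="PNG")
            paths.append(out)
    return paths


# Demo: build a three-page grayscale TIFF, then split it
with tempfile.TemporaryDirectory() as tmp:
    pages = [Image.new("L", (20, 10), color=c) for c in (0, 128, 255)]
    src = os.path.join(tmp, "scan.tiff")
    pages[0].save(src, save_all=True, append_images=pages[1:])
    demo_names = [os.path.basename(p)
                  for p in split_tiff_to_png(src, os.path.join(tmp, "pages"))]

print(demo_names)
```

Each resulting PNG can then go through preprocessing and extract_text_from_image in turn.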

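2. Performance Optimization Strategies

Multithreading: use concurrent.futures to process images in parallel.
Language model tuning: train a custom Tesseract model for a specific domain (e.g. legal or medical).
Cloud services: for high-concurrency workloads, call AWS Textract or the Azure Computer Vision API (mind the cost and data privacy implications).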
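
The concurrent.futures suggestion can be sketched as a small helper that runs a worker over many paths and keeps going when one file fails. The names process_in_parallel and fake_ocr are hypothetical; fake_ocr is only a stand-in so the example runs without Tesseract, and in practice the worker would be something like extract_text_from_image.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_in_parallel(paths, worker, max_workers=4):
    """Run worker(path) for every path; return (results, errors) keyed by path."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(worker, p): p for p in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:  # one bad file should not stop the batch
                errors[path] = exc
    return results, errors


def fake_ocr(path):
    """Stand-in worker for the demo (hypothetical); rejects non-image names."""
    if path.endswith(".txt"):
        raise ValueError("not an image")
    return f"text from {path}"


results, errors = process_in_parallel(["a.png", "b.png", "notes.txt"], fake_ocr)
print(sorted(results), sorted(errors))
```

Threads are a reasonable fit here because pytesseract runs the tesseract executable as a separate process, so several recognitions can overlap while Python waits on them. The right max_workers value depends on the machine and is worth measuring.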

3. Error Handling and Logging

import logging
import os

logging.basicConfig(filename='image_processing.log', level=logging.INFO)


def safe_convert_and_ocr(input_path, output_dir):
    try:
        # Format conversion
        output_path = os.path.join(output_dir, "converted.png")
        convert_image_format(input_path, output_path, "PNG")
        # OCR
        text = extract_text_from_image(output_path)
        if text is None:  # extract_text_from_image returns None on failure
            raise RuntimeError("OCR returned no result")
        logging.info(f"Processed: {input_path} → characters recognized: {len(text)}")
        return text
    except Exception as e:
        logging.error(f"Failed: {input_path}, error: {e}")
        return None

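V. Summary and Outlook

This article covered an end-to-end approach to image format conversion and OCR in Python: basic and advanced use of Pillow, configuring and tuning Tesseract OCR, and preprocessing with OpenCV. In practice, these pieces can be combined as needed into an image processing pipeline. Looking ahead, as deep learning models (such as CRNN and Transformer architectures) become more widespread, OCR accuracy and multilingual support are likely to keep improving, and the Python tooling around them should mature as well.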

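Key recommendations:

For images with complex backgrounds, preprocess with OpenCV first.
For mixed Chinese and English text, specify lang='chi_sim+eng'.
For batch jobs, always add error handling and logging.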

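With these techniques, developers can handle common business scenarios such as contract recognition and report data extraction, and cut down on manual transcription work.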