iOS Speech Framework in Practice: A Complete Guide to Speech-to-Text
I. Core Value and Use Cases of the Speech Framework

The Speech framework is the speech recognition API Apple introduced in iOS 10. Compared with traditional third-party SDKs, its advantages fall into three areas:

- System-level integration: it drives the device's built-in speech engine directly, with no network requests needed when on-device (offline) recognition is used
- Privacy protection: with on-device recognition, speech processing stays local, in line with App Store privacy policy requirements
- Performance: deeply optimized for A-series chips, keeping latency within roughly 200 ms
Typical use cases include:

- Real-time meeting notes (e.g. combined with ReplayKit for screen sharing plus transcription)
- Accessibility features (voice navigation for visually impaired users)
- Fitness app command recognition (e.g. "start running", "pause timer")
- Medical terminology transcription (combined with a custom vocabulary)
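Before committing to one of these scenarios, it is worth checking at runtime whether the target locale is supported at all and whether on-device (offline) recognition is available on the current hardware. A minimal sketch:

```swift
import Speech

// Check whether a recognizer exists for the desired locale
if let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "zh-CN")) {
    // isAvailable can change at runtime (e.g. network loss for server-based recognition)
    print("Recognizer available: \(recognizer.isAvailable)")
    // true only on hardware/locale combinations that ship an offline model
    print("On-device recognition supported: \(recognizer.supportsOnDeviceRecognition)")
} else {
    print("Locale not supported on this device")
}

// Enumerate every locale the Speech framework can recognize
for locale in SFSpeechRecognizer.supportedLocales() {
    print(locale.identifier)
}
```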
II. Basic Environment Setup

1. Permission declarations

Add the two required keys to Info.plist:

```xml
<key>NSSpeechRecognitionUsageDescription</key>
<string>Speech recognition is needed to convert your voice to text</string>
<key>NSMicrophoneUsageDescription</key>
<string>Microphone access is needed to capture voice input</string>
```
2. Import the framework

```swift
import Speech
```
3. Permission request flow

```swift
func requestSpeechRecognitionPermission() {
    SFSpeechRecognizer.requestAuthorization { authStatus in
        DispatchQueue.main.async {
            switch authStatus {
            case .authorized:
                print("Speech recognition authorized")
            case .denied:
                print("User denied permission")
            case .restricted:
                print("Speech recognition restricted on this device")
            case .notDetermined:
                print("Authorization status not determined yet")
            @unknown default:
                break
            }
        }
    }
}
```
III. Core Feature Implementation

1. Basic transcription

```swift
let audioEngine = AVAudioEngine()
let speechRecognizer = SFSpeechRecognizer(locale: Locale(identifier: "zh-CN"))
var recognitionRequest: SFSpeechAudioBufferRecognitionRequest?
var recognitionTask: SFSpeechRecognitionTask?

func startRecording() throws {
    // Configure the audio session
    let audioSession = AVAudioSession.sharedInstance()
    try audioSession.setCategory(.record, mode: .measurement, options: .duckOthers)
    try audioSession.setActive(true, options: .notifyOthersOnDeactivation)

    // Create the recognition request
    recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
    guard let recognitionRequest = recognitionRequest else { return }

    // Start the recognition task
    recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest) { result, error in
        if let result = result {
            let transcribedText = result.bestTranscription.formattedString
            print("Transcription: \(transcribedText)")
            // UI updates must happen on the main thread
            DispatchQueue.main.async {
                self.textView.text = transcribedText
            }
        }
        if let error = error {
            self.stopRecording()
            print("Recognition error: \(error.localizedDescription)")
        }
    }

    // Configure the audio engine
    let inputNode = audioEngine.inputNode
    let recordingFormat = inputNode.outputFormat(forBus: 0)
    inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { buffer, _ in
        recognitionRequest.append(buffer)
    }
    audioEngine.prepare()
    try audioEngine.start()
}

func stopRecording() {
    audioEngine.stop()
    // Remove the tap so the next startRecording() does not install a second one
    audioEngine.inputNode.removeTap(onBus: 0)
    recognitionRequest?.endAudio()
    recognitionTask?.finish()
    recognitionTask = nil
    recognitionRequest = nil
}
```
2. Advanced features

Multi-language support

```swift
// Recognizer with Simplified Chinese as the primary language
var speechRecognizer = SFSpeechRecognizer(locale: Locale(identifier: "zh-Hans-CN"))

// Or switch languages dynamically
func switchLanguage(to localeIdentifier: String) {
    guard let newRecognizer = SFSpeechRecognizer(locale: Locale(identifier: localeIdentifier)) else {
        print("Locale not supported")
        return
    }
    speechRecognizer = newRecognizer
    // Any in-flight recognitionTask must be cancelled and recreated
}
```
Custom vocabulary

The Speech framework has no standalone vocabulary class; domain terms are supplied through the `contextualStrings` property of the recognition request, and offline mode is requested per request:

```swift
// Bias recognition toward domain-specific terms
let request = SFSpeechAudioBufferRecognitionRequest()
request.contextualStrings = ["Xcode", "SwiftUI"]

// Prefer on-device (offline) recognition where the recognizer supports it
let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "zh-CN"))
if recognizer?.supportsOnDeviceRecognition == true {
    request.requiresOnDeviceRecognition = true
}
```
Real-time processing optimization

```swift
// Use delegate callbacks for faster partial results
recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest, delegate: self)

// Requires conforming to SFSpeechRecognitionTaskDelegate
extension ViewController: SFSpeechRecognitionTaskDelegate {
    func speechRecognitionTask(_ task: SFSpeechRecognitionTask,
                               didHypothesizeTranscription transcription: SFTranscription) {
        // Called whenever an interim hypothesis arrives
        let partialText = transcription.formattedString
        updateUI(with: partialText)
    }

    func speechRecognitionTaskFinishedReadingAudio(_ task: SFSpeechRecognitionTask) {
        print("Finished reading audio input")
    }
}
```
IV. Error Handling and Performance Optimization

1. Common error handling

| Error situation | Handling |
|---|---|
| Recognizer not ready (`isAvailable == false`) | Check speech/microphone permissions and network connectivity |
| Audio buffer too small or choppy input | Adjust the tap's `bufferSize` parameter (512–2048 recommended) |
| Request times out with no final result | Set `recognitionRequest.shouldReportPartialResults = true` and handle partial results |
| Audio engine fails to start | Check the AVAudioSession configuration and make sure no other app is holding the microphone |
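The failure modes in the table above can be funneled through a single throwing entry point so callers get one error path to handle. A minimal sketch, where the `SpeechSetupError` enum and the `beginSession` name are illustrative, not framework API:

```swift
import Speech
import AVFoundation

// Illustrative error type; not part of the Speech framework
enum SpeechSetupError: Error {
    case notAuthorized
    case recognizerUnavailable
}

// Validate preconditions before touching the audio engine,
// so each failure surfaces as a distinct, catchable error.
func beginSession(with recognizer: SFSpeechRecognizer?) throws {
    guard SFSpeechRecognizer.authorizationStatus() == .authorized else {
        throw SpeechSetupError.notAuthorized
    }
    guard let recognizer = recognizer, recognizer.isAvailable else {
        throw SpeechSetupError.recognizerUnavailable
    }
    let session = AVAudioSession.sharedInstance()
    try session.setCategory(.record, mode: .measurement, options: .duckOthers)
    try session.setActive(true, options: .notifyOthersOnDeactivation)
}
```

A caller can then `do { try beginSession(with: speechRecognizer) } catch { ... }` and branch on the specific error instead of force-trying each step.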
2. Performance optimization tips

- Offline mode configuration (`supportsOnDeviceRecognition` is read-only on the recognizer; offline mode is requested on the recognition request):

```swift
if let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "zh-CN")),
   recognizer.supportsOnDeviceRecognition {
    // Prefer offline recognition for this request
    recognitionRequest?.requiresOnDeviceRecognition = true
}
```

- Memory management:
  - Call `finish()` promptly to release the recognition task, then nil out the task and request
  - Capture `self` weakly in recognition callbacks to avoid retain cycles:

```swift
recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest) { [weak self] result, error in
    guard let self = self else { return }
    // handle results without retaining the view controller
}
```

- Battery optimization:
  - Pause recognition while the app is in the background
  - Monitor battery level changes and adjust recognition precision dynamically
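The battery points above can be implemented by enabling battery monitoring on `UIDevice` and reacting to level changes. A sketch, where the 20% threshold and the `preferOnDeviceRecognition` flag are illustrative choices, not framework API:

```swift
import UIKit

final class PowerAwareRecognitionPolicy {
    // Illustrative flag the recording code would consult; not framework API
    private(set) var preferOnDeviceRecognition = false

    init() {
        // Battery level reads -1.0 unless monitoring is enabled first
        UIDevice.current.isBatteryMonitoringEnabled = true
        NotificationCenter.default.addObserver(
            self,
            selector: #selector(batteryLevelChanged),
            name: UIDevice.batteryLevelDidChangeNotification,
            object: nil)
    }

    @objc private func batteryLevelChanged() {
        // Below 20%, prefer the cheaper on-device path (assumed trade-off)
        preferOnDeviceRecognition = UIDevice.current.batteryLevel < 0.2
    }
}
```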
V. Complete Example

```swift
import UIKit
import Speech
import AVFoundation

class SpeechRecognitionViewController: UIViewController {
    @IBOutlet weak var textView: UITextView!
    @IBOutlet weak var recordButton: UIButton!

    private let audioEngine = AVAudioEngine()
    private var speechRecognizer: SFSpeechRecognizer?
    private var recognitionRequest: SFSpeechAudioBufferRecognitionRequest?
    private var recognitionTask: SFSpeechRecognitionTask?
    private var isRecording = false

    override func viewDidLoad() {
        super.viewDidLoad()
        setupSpeechRecognizer()
        requestSpeechRecognitionPermission()
    }

    private func setupSpeechRecognizer() {
        speechRecognizer = SFSpeechRecognizer(locale: Locale(identifier: "zh-CN"))
    }

    private func requestSpeechRecognitionPermission() {
        SFSpeechRecognizer.requestAuthorization { authStatus in
            DispatchQueue.main.async {
                guard authStatus == .authorized else {
                    self.showPermissionAlert()
                    return
                }
                self.recordButton.isEnabled = true
            }
        }
    }

    private func showPermissionAlert() {
        let alert = UIAlertController(title: "Permission required",
                                      message: "Please enable microphone access in Settings",
                                      preferredStyle: .alert)
        alert.addAction(UIAlertAction(title: "OK", style: .default))
        present(alert, animated: true)
    }

    @IBAction func toggleRecording(_ sender: UIButton) {
        isRecording.toggle()
        if isRecording {
            startRecording()
            sender.setTitle("Stop", for: .normal)
        } else {
            stopRecording()
            sender.setTitle("Start", for: .normal)
        }
    }

    private func startRecording() {
        // Configure the audio session
        let audioSession = AVAudioSession.sharedInstance()
        do {
            try audioSession.setCategory(.record, mode: .measurement, options: .duckOthers)
            try audioSession.setActive(true)
        } catch {
            print("Audio session error: \(error.localizedDescription)")
            return
        }

        // Create the recognition request
        recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
        guard let recognitionRequest = recognitionRequest else { return }

        // Start the recognition task
        recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest) { [weak self] result, error in
            guard let self = self else { return }
            if let result = result {
                let text = result.bestTranscription.formattedString
                DispatchQueue.main.async {
                    self.textView.text = text
                }
            }
            if let error = error {
                self.stopRecording()
                print("Recognition error: \(error.localizedDescription)")
            }
        }

        // Configure the audio engine
        let inputNode = audioEngine.inputNode
        let recordingFormat = inputNode.outputFormat(forBus: 0)
        inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { buffer, _ in
            recognitionRequest.append(buffer)
        }
        audioEngine.prepare()
        do {
            try audioEngine.start()
        } catch {
            print("Audio engine failed to start: \(error.localizedDescription)")
        }
    }

    private func stopRecording() {
        audioEngine.stop()
        // Remove the tap so the next recording does not install a duplicate
        audioEngine.inputNode.removeTap(onBus: 0)
        recognitionRequest?.endAudio()
        recognitionTask?.finish()
        recognitionTask = nil
        recognitionRequest = nil
    }

    deinit {
        stopRecording()
        try? AVAudioSession.sharedInstance().setActive(false)
    }
}
```
VI. Best Practice Recommendations

- Offline-first strategy: on supported devices, prefer recognizers where `supportsOnDeviceRecognition` is true
- Dynamic language detection: analyze language changes via the `segments` property of `SFTranscription`
- Power saving: automatically lower recognition precision when battery drops below 20%
- User guidance: show a clear explanation of why microphone access is needed before the first permission prompt
- Result validation: run a second, regex-based validation pass over domain-specific terminology
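The last point, regex-based validation of domain terms, can be sketched with Foundation's `NSRegularExpression`. The term list below is illustrative; in practice it would mirror the `contextualStrings` fed to the recognizer:

```swift
import Foundation

// Normalize transcribed tokens that match known product names but are
// mis-cased, e.g. "xcode" becomes "Xcode" (term list is illustrative)
func normalizeKnownTerms(in text: String) -> String {
    let terms = ["Xcode", "SwiftUI"]
    var result = text
    for term in terms {
        // Whole-word, case-insensitive match on the term
        let pattern = "\\b\(NSRegularExpression.escapedPattern(for: term))\\b"
        guard let regex = try? NSRegularExpression(pattern: pattern,
                                                   options: [.caseInsensitive]) else {
            continue
        }
        let range = NSRange(result.startIndex..., in: result)
        result = regex.stringByReplacingMatches(in: result, options: [],
                                                range: range,
                                                withTemplate: term)
    }
    return result
}
```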
Speech-to-text built on the Speech framework delivers high accuracy while protecting user privacy. Developers should choose between online and offline modes based on the scenario, and handle the permission-request flow carefully to provide a smooth user experience.