A Practical Guide to the iOS Speech Framework: Implementing Speech-to-Text

1. Core Value and Use Cases of the Speech Framework

The Speech framework is Apple's speech recognition API, introduced in iOS 10. Compared with traditional third-party SDKs, it offers three main advantages:

  1. System-level integration: it uses the device's built-in speech engine, and on supported hardware (iOS 13 and later) recognition can run fully on-device with no network request
  2. Privacy: with on-device recognition enabled, audio never leaves the device (note that the default, server-based mode does send audio to Apple), which fits App Store privacy requirements
  3. Performance: optimized for A-series chips, with latency low enough for real-time partial results

Typical use cases include:

  • Real-time meeting notes (e.g. combined with ReplayKit for screen sharing plus transcription)
  • Accessibility features (voice navigation for visually impaired users)
  • Voice commands in fitness apps (e.g. "start running", "pause timer")
  • Transcription of medical terminology (best combined with contextual strings for domain terms)

2. Basic Setup

1. Permission Declarations

Add two required keys to Info.plist:

```xml
<key>NSSpeechRecognitionUsageDescription</key>
<string>Speech recognition is used to convert your voice to text</string>
<key>NSMicrophoneUsageDescription</key>
<string>Microphone access is needed to capture voice input</string>
```

2. Importing the Framework

```swift
import Speech
```

3. Requesting Permission

```swift
func requestSpeechRecognitionPermission() {
    SFSpeechRecognizer.requestAuthorization { authStatus in
        DispatchQueue.main.async {
            switch authStatus {
            case .authorized:
                print("Speech recognition authorized")
            case .denied:
                print("User denied permission")
            case .restricted:
                print("Speech recognition is restricted on this device")
            case .notDetermined:
                print("Authorization not yet determined")
            @unknown default:
                break
            }
        }
    }
}
```

3. Core Implementation

1. Basic Transcription

```swift
let audioEngine = AVAudioEngine()
let speechRecognizer = SFSpeechRecognizer(locale: Locale(identifier: "zh-CN"))
var recognitionRequest: SFSpeechAudioBufferRecognitionRequest?
var recognitionTask: SFSpeechRecognitionTask?

func startRecording() throws {
    // Configure the audio session
    let audioSession = AVAudioSession.sharedInstance()
    try audioSession.setCategory(.record, mode: .measurement, options: .duckOthers)
    try audioSession.setActive(true, options: .notifyOthersOnDeactivation)

    // Create the recognition request
    recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
    guard let recognitionRequest = recognitionRequest else { return }
    recognitionRequest.shouldReportPartialResults = true

    // Start the recognition task
    recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest) { result, error in
        if let result = result {
            let transcribedText = result.bestTranscription.formattedString
            print("Transcription: \(transcribedText)")
            // UI updates must happen on the main thread
            DispatchQueue.main.async {
                self.textView.text = transcribedText
            }
        }
        if let error = error {
            self.stopRecording()
            print("Recognition error: \(error.localizedDescription)")
        }
    }

    // Configure the audio engine
    let inputNode = audioEngine.inputNode
    let recordingFormat = inputNode.outputFormat(forBus: 0)
    inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { buffer, _ in
        recognitionRequest.append(buffer)
    }
    audioEngine.prepare()
    try audioEngine.start()
}

func stopRecording() {
    audioEngine.stop()
    audioEngine.inputNode.removeTap(onBus: 0) // remove the tap so it can be reinstalled next time
    recognitionRequest?.endAudio()
    recognitionTask?.finish()
    recognitionTask = nil
    recognitionRequest = nil
}
```

2. Advanced Features

Multi-language Support

```swift
// Simplified Chinese (mainland China); each recognizer targets a single
// locale, so true mixed-language recognition is not supported
let chineseLocale = Locale(identifier: "zh-Hans-CN")
var speechRecognizer = SFSpeechRecognizer(locale: chineseLocale)

// Or switch languages dynamically
func switchLanguage(to localeIdentifier: String) {
    guard let newRecognizer = SFSpeechRecognizer(locale: Locale(identifier: localeIdentifier)) else {
        print("Locale not supported")
        return
    }
    speechRecognizer = newRecognizer
    // Any in-flight recognitionTask must be cancelled and recreated
}
```
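Because the set returned by `SFSpeechRecognizer.supportedLocales()` varies by device and OS version, it helps to pick a locale from the user's preferred languages with a fallback. Here is a minimal sketch of that selection as a pure function (the `pickLocale` helper and its fallback default are illustrative choices, not framework API), which keeps the logic unit-testable without the Speech framework:

```swift
import Foundation

// Return the first preferred locale identifier that the recognizer
// supports, falling back to a default when none match.
func pickLocale(preferred: [String],
                supported: Set<String>,
                fallback: String = "en-US") -> String {
    for identifier in preferred where supported.contains(identifier) {
        return identifier
    }
    return fallback
}

// In an app, `supported` would come from
// Set(SFSpeechRecognizer.supportedLocales().map { $0.identifier })
let supported: Set<String> = ["zh-CN", "en-US", "ja-JP"]
print(pickLocale(preferred: ["fr-FR", "zh-CN"], supported: supported)) // zh-CN
print(pickLocale(preferred: ["de-DE"], supported: supported))          // en-US
```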

Custom Vocabulary (Contextual Strings)

The framework has no standalone vocabulary class; to bias recognition toward domain-specific terms, set `contextualStrings` on the recognition request:

```swift
// Bias recognition toward domain-specific or unusual terms
let request = SFSpeechAudioBufferRecognitionRequest()
request.contextualStrings = ["Xcode", "SwiftUI"]

// Prefer offline recognition where the recognizer supports it
if let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "zh-CN")),
   recognizer.supportsOnDeviceRecognition {
    request.requiresOnDeviceRecognition = true
}
```

Real-time Processing Optimization

```swift
// Use the delegate-based API for finer-grained callbacks
recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest,
                                                    delegate: self) // conform to SFSpeechRecognitionTaskDelegate

extension ViewController: SFSpeechRecognitionTaskDelegate {
    func speechRecognitionTask(_ task: SFSpeechRecognitionTask,
                               didHypothesizeTranscription transcription: SFTranscription) {
        // Called each time a partial (hypothesis) transcription arrives
        let partialText = transcription.formattedString
        updateUI(with: partialText)
    }

    func speechRecognitionTaskFinishedReadingAudio(_ task: SFSpeechRecognitionTask) {
        print("Finished reading audio input")
    }
}
```

4. Error Handling and Performance Optimization

1. Common Errors

Errors surface as `NSError` values in the task callback rather than as a dedicated public error enum, so handle them by symptom:

| Symptom | Handling |
| --- | --- |
| Recognizer unavailable (`isAvailable == false`) | Check authorization status and, for server-based recognition, network connectivity |
| No transcription results arriving | Verify the tap uses the input node's own output format and a reasonable bufferSize (512-2048 works well) |
| Task ends after about a minute | Server-based sessions are time-limited; keep `shouldReportPartialResults = true` and restart the task as needed |
| Audio engine fails to start | Check the AVAudioSession configuration and make sure no other app is holding the microphone |
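When a task ends with an error, restarting it unconditionally can loop forever (for example, while another app holds the microphone). One way to keep recovery bounded is a small retry gate; the `RetryGate` type below is an illustrative sketch, not part of the framework:

```swift
import Foundation

// Permits up to `maxAttempts` restarts of a failed recognition task,
// then refuses until reset() is called (e.g. when the user taps record).
struct RetryGate {
    let maxAttempts: Int
    private(set) var attempts = 0

    mutating func shouldRetry() -> Bool {
        guard attempts < maxAttempts else { return false }
        attempts += 1
        return true
    }

    mutating func reset() { attempts = 0 }
}

var gate = RetryGate(maxAttempts: 3)
print(gate.shouldRetry()) // true  (attempt 1)
print(gate.shouldRetry()) // true  (attempt 2)
print(gate.shouldRetry()) // true  (attempt 3)
print(gate.shouldRetry()) // false (give up until reset)
```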

2. Performance Tips

  1. On-device recognition

Note that `supportsOnDeviceRecognition` is a read-only capability flag on the recognizer; offline recognition is requested per request:

```swift
if let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "zh-CN")),
   recognizer.supportsOnDeviceRecognition {
    // Prefer offline recognition for this request
    let request = SFSpeechAudioBufferRecognitionRequest()
    request.requiresOnDeviceRecognition = true
}
```

  2. Memory management

  • Call finish() promptly to release the recognition task
  • Capture self weakly in long-lived handlers to avoid retain cycles

```swift
recognitionTask = speechRecognizer?.recognitionTask(with: request) { [weak self] result, error in
    // [weak self] breaks the retain cycle between the task's handler
    // and the view controller that owns the task
    guard let self = self else { return }
    // ... handle result and error ...
}
```

  3. Battery optimization

  • Pause recognition when the app moves to the background
  • Monitor battery level and adjust the recognition strategy dynamically
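The battery heuristic can be isolated as pure logic and fed from `UIDevice.current.batteryLevel` (which reports -1.0 when battery monitoring is disabled). The 20% threshold, the `RecognitionMode` enum, and the assumption that on-device mode draws less power than the radio-heavy server mode are all illustrative choices, not framework behavior:

```swift
import Foundation

enum RecognitionMode { case onDevice, server }

// Choose on-device recognition below a battery threshold (assumed to
// save radio power); batteryLevel is 0.0...1.0, or -1 when unknown.
func recognitionMode(batteryLevel: Float,
                     threshold: Float = 0.2,
                     onDeviceSupported: Bool) -> RecognitionMode {
    guard onDeviceSupported else { return .server }
    if batteryLevel >= 0 && batteryLevel < threshold {
        return .onDevice
    }
    return .server
}

print(recognitionMode(batteryLevel: 0.15, onDeviceSupported: true)) // onDevice
print(recognitionMode(batteryLevel: 0.80, onDeviceSupported: true)) // server
print(recognitionMode(batteryLevel: -1.0, onDeviceSupported: true)) // server (level unknown)
```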

5. Complete Example

```swift
import UIKit
import Speech
import AVFoundation

class SpeechRecognitionViewController: UIViewController {
    @IBOutlet weak var textView: UITextView!
    @IBOutlet weak var recordButton: UIButton!

    private let audioEngine = AVAudioEngine()
    private var speechRecognizer: SFSpeechRecognizer?
    private var recognitionRequest: SFSpeechAudioBufferRecognitionRequest?
    private var recognitionTask: SFSpeechRecognitionTask?
    private var isRecording = false

    override func viewDidLoad() {
        super.viewDidLoad()
        setupSpeechRecognizer()
        requestSpeechRecognitionPermission()
    }

    private func setupSpeechRecognizer() {
        speechRecognizer = SFSpeechRecognizer(locale: Locale(identifier: "zh-CN"))
    }

    private func requestSpeechRecognitionPermission() {
        SFSpeechRecognizer.requestAuthorization { authStatus in
            DispatchQueue.main.async {
                guard authStatus == .authorized else {
                    self.showPermissionAlert()
                    return
                }
                self.recordButton.isEnabled = true
            }
        }
    }

    private func showPermissionAlert() {
        let alert = UIAlertController(title: "Permission Required",
                                      message: "Please enable microphone and speech recognition access in Settings",
                                      preferredStyle: .alert)
        alert.addAction(UIAlertAction(title: "OK", style: .default))
        present(alert, animated: true)
    }

    @IBAction func toggleRecording(_ sender: UIButton) {
        isRecording.toggle()
        if isRecording {
            do {
                try startRecording()
                sender.setTitle("Stop", for: .normal)
            } catch {
                isRecording = false
                print("Failed to start recording: \(error.localizedDescription)")
            }
        } else {
            stopRecording()
            sender.setTitle("Start", for: .normal)
        }
    }

    private func startRecording() throws {
        // Configure the audio session
        let audioSession = AVAudioSession.sharedInstance()
        try audioSession.setCategory(.record, mode: .measurement, options: .duckOthers)
        try audioSession.setActive(true, options: .notifyOthersOnDeactivation)

        // Create the recognition request
        recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
        guard let recognitionRequest = recognitionRequest else { return }
        recognitionRequest.shouldReportPartialResults = true

        // Start the recognition task
        recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest) { [weak self] result, error in
            guard let self = self else { return }
            if let result = result {
                let text = result.bestTranscription.formattedString
                DispatchQueue.main.async {
                    self.textView.text = text
                }
            }
            if let error = error {
                self.stopRecording()
                print("Recognition error: \(error.localizedDescription)")
            }
        }

        // Configure the audio engine
        let inputNode = audioEngine.inputNode
        let recordingFormat = inputNode.outputFormat(forBus: 0)
        inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { buffer, _ in
            recognitionRequest.append(buffer)
        }
        audioEngine.prepare()
        try audioEngine.start()
    }

    private func stopRecording() {
        audioEngine.stop()
        audioEngine.inputNode.removeTap(onBus: 0)
        recognitionRequest?.endAudio()
        recognitionTask?.finish()
        recognitionTask = nil
        recognitionRequest = nil
    }

    deinit {
        stopRecording()
        try? AVAudioSession.sharedInstance().setActive(false)
    }
}
```

6. Best Practices

  1. Offline first: on supported devices, check supportsOnDeviceRecognition and set requiresOnDeviceRecognition on the request
  2. Language handling: inspect the per-segment confidence values in SFTranscription.segments to detect poor matches and prompt a locale switch
  3. Power saving: reduce recognition work (for example, prefer on-device mode) when the battery drops below 20%
  4. User guidance: clearly explain why microphone access is needed before the first permission prompt
  5. Result validation: post-process domain-specific terms with regular expressions
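Point 5 can be sketched as a small post-processing pass over the transcript; the correction table below is an illustrative assumption (a real app would load domain-specific entries):

```swift
import Foundation

// Normalize commonly misrecognized domain terms in a transcript.
let corrections: [(pattern: String, replacement: String)] = [
    ("(?i)\\bx\\s?code\\b", "Xcode"),
    ("(?i)\\bswift\\s?ui\\b", "SwiftUI"),
]

func normalizeTerms(_ text: String) -> String {
    var result = text
    for (pattern, replacement) in corrections {
        result = result.replacingOccurrences(of: pattern,
                                             with: replacement,
                                             options: .regularExpression)
    }
    return result
}

print(normalizeTerms("I opened x code and built a swift UI view"))
// I opened Xcode and built a SwiftUI view
```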

Speech-to-text built on the Speech framework delivers high accuracy while protecting user privacy. Developers should choose between online and on-device modes to suit the use case, and handle the permission flow carefully, to provide a smooth user experience.