iOS Speech Framework in Practice: A Complete Walkthrough of Speech-to-Text

Introduction

In mobile app development, speech-to-text has become an important tool for improving the user experience. The Speech framework that ships with iOS gives developers efficient, reliable speech recognition, with support for real-time transcription, multi-language recognition, and advanced capabilities such as contextual understanding. This article walks through building a complete speech-to-text feature with the Speech framework, from basic configuration to advanced tuning, so you can quickly ship a dependable recognition service.

1. Speech Framework Core Components

The Speech framework is built around three core classes: SFSpeechRecognizer, SFSpeechAudioBufferRecognitionRequest, and SFSpeechRecognitionTask:

  1. SFSpeechRecognizer: the main recognizer class, responsible for managing the lifecycle of recognition tasks. Note that it only works on devices that support speech recognition (iOS 10+).
  2. Recognition requests: SFSpeechAudioBufferRecognitionRequest handles live audio streams, while SFSpeechURLRecognitionRequest is for pre-recorded audio files.
  3. Recognition tasks: SFSpeechRecognitionTask wraps recognition results and state callbacks, reporting progress in real time through delegate methods or the result handler.

2. Permission Configuration and Initialization

2.1 Adding Privacy Permissions

You must add the NSSpeechRecognitionUsageDescription key to Info.plist, with a clear explanation of why the app uses speech recognition (for example, "This app uses speech recognition to enable voice input"). Because the examples below also capture microphone audio, NSMicrophoneUsageDescription is required as well. Missing either key will crash the app at runtime.
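
For reference, a minimal sketch of the two entries (the description strings are placeholders to adapt to your app):

    <key>NSSpeechRecognitionUsageDescription</key>
    <string>This app uses speech recognition to enable voice input.</string>
    <key>NSMicrophoneUsageDescription</key>
    <string>This app needs microphone access to capture your speech.</string>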

2.2 Creating the Recognizer Instance

    import Speech
    import AVFoundation

    let audioEngine = AVAudioEngine()
    var speechRecognizer: SFSpeechRecognizer?
    var recognitionRequest: SFSpeechAudioBufferRecognitionRequest?
    var recognitionTask: SFSpeechRecognitionTask?

    func setupRecognizer() {
        speechRecognizer = SFSpeechRecognizer(locale: Locale(identifier: "zh-CN")) // Chinese recognition
        guard let recognizer = speechRecognizer else {
            print("Failed to initialize the speech recognizer")
            return
        }
        // Check service availability
        if !recognizer.isAvailable {
            print("Speech recognition is currently unavailable")
        }
    }
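
You must also ask the user for speech-recognition authorization before starting any task. A minimal sketch follows; the complete example in Section 7 does the same thing inside a view controller:

    import Speech

    func requestSpeechAuthorization() {
        SFSpeechRecognizer.requestAuthorization { status in
            // The callback may arrive on a background queue; hop to main before touching UI
            DispatchQueue.main.async {
                switch status {
                case .authorized:
                    print("Speech recognition authorized")
                case .denied, .restricted, .notDetermined:
                    print("Speech recognition unavailable: \(status)")
                @unknown default:
                    break
                }
            }
        }
    }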

3. Implementing Real-Time Speech Recognition

3.1 Configuring the Audio Engine

    func startRecording() throws {
        // Configure the audio session for recording
        let audioSession = AVAudioSession.sharedInstance()
        try audioSession.setCategory(.record, mode: .measurement, options: .duckOthers)
        try audioSession.setActive(true, options: .notifyOthersOnDeactivation)

        // Create the recognition request
        recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
        guard let request = recognitionRequest else {
            fatalError("Unable to create the recognition request")
        }
        request.shouldReportPartialResults = true // enable live partial results

        // Start the recognition task
        recognitionTask = speechRecognizer?.recognitionTask(with: request) { result, error in
            if let result = result {
                let transcribedText = result.bestTranscription.formattedString
                print("Partial result: \(transcribedText)")
                // Handle the final result when result.isFinal == true
            }
            if let error = error {
                print("Recognition error: \(error.localizedDescription)")
                self.stopRecording()
            }
        }

        // Feed microphone audio into the request
        let inputNode = audioEngine.inputNode
        let recordingFormat = inputNode.outputFormat(forBus: 0)
        inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer: AVAudioPCMBuffer, when: AVAudioTime) in
            self.recognitionRequest?.append(buffer)
        }
        audioEngine.prepare()
        try audioEngine.start()
    }

3.2 Stopping the Recording and Releasing Resources

    func stopRecording() {
        if audioEngine.isRunning {
            audioEngine.stop()
            recognitionRequest?.endAudio() // signal that no more audio is coming
            audioEngine.inputNode.removeTap(onBus: 0)
        }
        recognitionTask?.cancel()
        recognitionTask = nil
    }
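
One design note: cancel() abandons the task and discards any pending final result, while SFSpeechRecognitionTask also offers finish(), which stops accepting audio but still delivers the final transcription. If you want the trailing text after the user taps stop, finish() is usually the better choice.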

4. Advanced Features

4.1 Multi-Language Support

Switching languages is just a matter of passing a different Locale:

    // English recognition
    let enRecognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))
    // Japanese recognition
    let jaRecognizer = SFSpeechRecognizer(locale: Locale(identifier: "ja-JP"))
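
If you are unsure which languages the current system supports, SFSpeechRecognizer.supportedLocales() returns the full set; a quick sketch:

    import Speech

    // Print every locale the Speech framework can recognize on this system
    for locale in SFSpeechRecognizer.supportedLocales().sorted(by: { $0.identifier < $1.identifier }) {
        print(locale.identifier)
    }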

4.2 Configuring Offline Recognition

On iOS 13 and later, you can enable offline recognition through the request's requiresOnDeviceRecognition property. Note that the recognizer's supportsOnDeviceRecognition property is read-only, so check it rather than assigning to it:

    if #available(iOS 13, *) {
        speechRecognizer = SFSpeechRecognizer(locale: Locale.current)
        // supportsOnDeviceRecognition is read-only; it reflects whether this
        // device has an on-device model for the recognizer's locale
        if speechRecognizer?.supportsOnDeviceRecognition == true {
            recognitionRequest?.requiresOnDeviceRecognition = true // force on-device recognition
        }
    }
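
Keep in mind that on-device support depends on the device and the recognizer's locale, and offline results are generally somewhat less accurate than server-based recognition; the trade-off is better privacy, no network dependency, and freedom from server-side audio duration limits.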

4.3 Error Handling

The Speech framework reports failures as plain NSError values and exposes no public error enum before iOS 17 (which added SFSpeechError), so a practical handler diagnoses the surrounding state instead of matching framework-specific error cases:

    func handleRecognitionError(_ error: Error) {
        if SFSpeechRecognizer.authorizationStatus() != .authorized {
            showAlert(title: "Insufficient permissions", message: "Please enable speech recognition in Settings")
        } else if AVAudioSession.sharedInstance().recordPermission != .granted {
            showAlert(title: "Audio error", message: "Cannot access the microphone; please enable it in Settings")
        } else if speechRecognizer?.isAvailable != true {
            showAlert(title: "Service unavailable", message: "Speech recognition is currently unavailable on this device")
        } else {
            showAlert(title: "Recognition error", message: error.localizedDescription)
        }
    }

5. Performance Optimization Tips

  1. Audio format: 16 kHz mono audio generally yields the best recognition results
  2. Memory management: release recognition tasks and request objects promptly once they are no longer needed
  3. Network strategy: in offline mode, disabling network requests lowers power consumption
  4. Result filtering: post-process live results to strip repeated words and stray characters (see the sketch below)
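
As a concrete example of result filtering, here is a minimal sketch (a hypothetical helper, not part of the Speech API) that collapses immediate word repetitions in a whitespace-delimited partial transcript:

    // Collapse immediate word repetitions, e.g. "hello hello world" -> "hello world"
    func deduplicateWords(in transcript: String) -> String {
        var previous: Substring?
        var kept: [Substring] = []
        for word in transcript.split(separator: " ") {
            if word != previous { kept.append(word) }
            previous = word
        }
        return kept.joined(separator: " ")
    }

    print(deduplicateWords(in: "hello hello world")) // prints "hello world"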

6. Common Problems and Solutions

6.1 Recognition Latency

  • Enable shouldReportPartialResults to receive intermediate results
  • Tune the audio tap's buffer size (512-1024 samples is a reasonable range)
  • Process recognition results on a background thread to avoid blocking the main thread (see the sketch below)
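
For the last point, a minimal sketch (the surrounding wiring is assumed) of keeping heavy post-processing off the main thread:

    import Foundation
    import Speech

    let resultQueue = DispatchQueue(label: "speech.result.processing", qos: .utility)

    func handle(_ result: SFSpeechRecognitionResult, display: @escaping (String) -> Void) {
        // Do any expensive cleanup on a utility queue...
        resultQueue.async {
            let text = result.bestTranscription.formattedString
            // ...and publish only the finished string back on the main queue for UI work
            DispatchQueue.main.async { display(text) }
        }
    }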

6.2 Tips for Improving Accuracy

  • Restrict the recognition language explicitly (avoid relying on automatic language detection)
  • Record in a quiet environment
  • Bias the recognizer toward domain-specific terms with the request's contextualStrings property (available since iOS 10); fully custom language models came later with SFSpeechLanguageModel (iOS 17+). See the sketch below
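
A sketch of the contextualStrings approach, using made-up domain terms:

    import Speech

    let request = SFSpeechAudioBufferRecognitionRequest()
    // Bias recognition toward domain vocabulary (the terms here are just examples)
    request.contextualStrings = ["SwiftUI", "AVAudioEngine", "Core ML"]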

7. Complete Example Code

    import UIKit
    import Speech
    import AVFoundation

    class VoiceRecognitionViewController: UIViewController {
        let audioEngine = AVAudioEngine()
        var speechRecognizer: SFSpeechRecognizer?
        var recognitionRequest: SFSpeechAudioBufferRecognitionRequest?
        var recognitionTask: SFSpeechRecognitionTask?

        override func viewDidLoad() {
            super.viewDidLoad()
            setupSpeechRecognizer()
            requestAuthorization()
        }

        func setupSpeechRecognizer() {
            speechRecognizer = SFSpeechRecognizer(locale: Locale(identifier: "zh-CN"))
        }

        func requestAuthorization() {
            SFSpeechRecognizer.requestAuthorization { authStatus in
                DispatchQueue.main.async {
                    switch authStatus {
                    case .authorized:
                        print("Speech recognition authorized")
                    case .denied:
                        self.showAlert(title: "Permission denied", message: "Please enable speech recognition in Settings")
                    case .restricted:
                        self.showAlert(title: "Permission restricted", message: "Speech recognition is not accessible on this device")
                    case .notDetermined:
                        print("Authorization not determined yet")
                    @unknown default:
                        break
                    }
                }
            }
        }

        @IBAction func startRecording(_ sender: UIButton) {
            do {
                try startAudioEngine()
                sender.setTitle("Stop Recording", for: .normal)
            } catch {
                showAlert(title: "Error", message: error.localizedDescription)
            }
        }

        func startAudioEngine() throws {
            let audioSession = AVAudioSession.sharedInstance()
            try audioSession.setCategory(.record, mode: .measurement, options: .duckOthers)
            try audioSession.setActive(true, options: .notifyOthersOnDeactivation)

            recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
            guard let request = recognitionRequest else {
                fatalError("Unable to create the recognition request")
            }
            request.shouldReportPartialResults = true

            recognitionTask = speechRecognizer?.recognitionTask(with: request) { result, error in
                if let result = result {
                    let text = result.bestTranscription.formattedString
                    print("Recognition result: \(text)")
                    if result.isFinal {
                        DispatchQueue.main.async {
                            // Update the UI with the final result
                        }
                    }
                }
                if let error = error {
                    self.handleRecognitionError(error)
                }
            }

            let inputNode = audioEngine.inputNode
            let recordingFormat = inputNode.outputFormat(forBus: 0)
            inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { buffer, _ in
                self.recognitionRequest?.append(buffer)
            }
            audioEngine.prepare()
            try audioEngine.start()
        }

        @IBAction func stopRecording(_ sender: UIButton) {
            if audioEngine.isRunning {
                audioEngine.stop()
                recognitionRequest?.endAudio()
                audioEngine.inputNode.removeTap(onBus: 0)
            }
            recognitionTask?.cancel()
            recognitionTask = nil
            sender.setTitle("Start Recording", for: .normal)
        }

        func showAlert(title: String, message: String) {
            let alert = UIAlertController(title: title, message: message, preferredStyle: .alert)
            alert.addAction(UIAlertAction(title: "OK", style: .default))
            present(alert, animated: true)
        }

        func handleRecognitionError(_ error: Error) {
            // Implement error handling here (see Section 4.3)
        }
    }

Conclusion

The Speech framework gives iOS developers powerful, flexible speech recognition. With a properly configured audio engine, tuned recognition parameters, and solid error handling, you can build a stable and efficient speech-to-text application. In practice, tailor the feature to your specific scenario, for example by adding punctuation prediction or speaker separation, to further improve the user experience. As iOS continues to evolve, the Speech framework keeps gaining capabilities and is well worth following.