Hands-On with the iOS 10 Speech Framework: Building an Efficient Speech-to-Text App

I. Overview of the iOS 10 Speech Framework

The Speech framework (Speech.framework), introduced in iOS 10, gives developers powerful speech recognition capabilities, including real-time speech-to-text (STT). Compared with third-party APIs, its core advantages are:

  1. Native integration: no third-party SDK or API key is required, and the framework ties directly into the system audio stack;
  2. Privacy controls: recognition is governed by Apple's privacy policy, and on iOS 13+ it can be forced to run entirely on-device so audio never leaves the device (on iOS 10-12, audio is normally sent to Apple's servers for processing);
  3. Multi-language support: more than 50 languages and dialects are supported (for example Chinese, English, and Spanish).

The framework manages recognition tasks through the SFSpeechRecognizer class and, combined with SFSpeechAudioBufferRecognitionRequest for handling the audio stream, performs efficient speech-to-text conversion.
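To see which languages a particular device actually supports, the following minimal sketch simply enumerates SFSpeechRecognizer.supportedLocales() and prints each identifier; the exact set varies by iOS version and device:

    import Speech

    // List every locale the Speech framework can recognize on this device.
    for locale in SFSpeechRecognizer.supportedLocales().sorted(by: { $0.identifier < $1.identifier }) {
        print(locale.identifier)
    }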

II. Before You Start: Permissions and Configuration

1. Add Permission Descriptions

Add the following keys to Info.plist to declare microphone access and describe how speech recognition will be used:

    <key>NSSpeechRecognitionUsageDescription</key>
    <string>This app uses speech recognition to convert your speech to text</string>
    <key>NSMicrophoneUsageDescription</key>
    <string>This app needs microphone access to record your speech</string>

2. Import the Framework

Import the Speech framework in your Swift file:

    import Speech

3. Check and Request Authorization

Use SFSpeechRecognizer.authorizationStatus() to check the current authorization state, and request permission if needed:

    func requestSpeechAuthorization() {
        SFSpeechRecognizer.requestAuthorization { authStatus in
            // The callback may arrive on a background queue; hop to the main queue before touching UI.
            DispatchQueue.main.async {
                switch authStatus {
                case .authorized:
                    print("Speech recognition permission granted")
                case .denied, .restricted, .notDetermined:
                    print("Permission denied, restricted, or not yet determined")
                @unknown default:
                    break
                }
            }
        }
    }
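Before starting a recording session it is also worth checking the current status directly; the following minimal sketch (startRecording() refers to the function defined in Section III) only begins when recognition has already been authorized:

    // Only start a session if the user has already granted access.
    if SFSpeechRecognizer.authorizationStatus() == .authorized {
        try? startRecording()
    } else {
        requestSpeechAuthorization()
    }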

III. Core Implementation: The Speech-to-Text Flow

1. Initialize the Speech Recognizer

Create an SFSpeechRecognizer instance, optionally specifying a language:

    let speechRecognizer = SFSpeechRecognizer(locale: Locale(identifier: "zh-CN")) // Mandarin Chinese
    guard let recognizer = speechRecognizer else {
        print("Speech recognition is not supported for this locale")
        return
    }
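Even for a supported locale the recognizer can be temporarily unavailable (for example, when server-based recognition has no network connection). A short sketch of the extra check:

    // isAvailable reflects the recognizer's current state and can change at runtime;
    // SFSpeechRecognizerDelegate reports changes via its availabilityDidChange callback.
    guard recognizer.isAvailable else {
        print("The speech recognizer is currently unavailable")
        return
    }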

2. Create the Recognition Request

Use SFSpeechAudioBufferRecognitionRequest to handle the live audio stream:

    // SFSpeechAudioBufferRecognitionRequest() never returns nil, so no optional check is needed here.
    let request = SFSpeechAudioBufferRecognitionRequest()
    request.shouldReportPartialResults = true // Deliver partial results as the user speaks
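As a side note (an assumption about a common companion use case, not something the live-transcription flow needs): prerecorded audio files can be transcribed with SFSpeechURLRecognitionRequest instead of the buffer-based request. The recordingURL below is a placeholder for a local audio file:

    // Transcribe an existing audio file instead of the microphone stream.
    let fileRequest = SFSpeechURLRecognitionRequest(url: recordingURL)
    // Keep a reference so the transcription can be cancelled early if needed.
    let fileTask = recognizer.recognitionTask(with: fileRequest) { result, _ in
        if let result = result, result.isFinal {
            print("File transcription: \(result.bestTranscription.formattedString)")
        }
    }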

3. Configure the Audio Engine

Use AVAudioEngine to capture microphone input and feed the captured buffers to the recognition request:

    import AVFoundation // AVAudioEngine and AVAudioSession are part of AVFoundation

    // Uses the `recognizer` and `request` created in the previous two steps.
    let audioEngine = AVAudioEngine()
    var recognitionTask: SFSpeechRecognitionTask?

    func startRecording() throws {
        // Configure the audio session for recording
        let audioSession = AVAudioSession.sharedInstance()
        try audioSession.setCategory(.record, mode: .measurement, options: .duckOthers)
        try audioSession.setActive(true, options: .notifyOthersOnDeactivation)

        // Tap the input node and append each captured buffer to the recognition request
        let inputNode = audioEngine.inputNode
        let recordingFormat = inputNode.outputFormat(forBus: 0)
        inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { buffer, _ in
            request.append(buffer)
        }

        // Start the audio engine
        audioEngine.prepare()
        try audioEngine.start()

        // Start the recognition task and handle streaming results
        recognitionTask = recognizer.recognitionTask(with: request) { result, error in
            if let result = result {
                let transcribedText = result.bestTranscription.formattedString
                print("Live transcription: \(transcribedText)")
            }
            if let error = error {
                print("Recognition error: \(error.localizedDescription)")
                stopRecording()
            }
        }
    }
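Putting the pieces together, a call site might look like the following sketch, with error handling kept deliberately simple:

    do {
        try startRecording()
        print("Recording started; speak now")
    } catch {
        print("Failed to start recording: \(error.localizedDescription)")
    }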

4. Stop Recording and Clean Up Resources

    func stopRecording() {
        audioEngine.stop()
        audioEngine.inputNode.removeTap(onBus: 0)
        // cancel() discards any pending results; call request.endAudio() first
        // if you want the final transcription to be delivered before the task ends.
        recognitionTask?.cancel()
        recognitionTask = nil
    }
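When the session is ended with request.endAudio() rather than cancelled, the result handler eventually delivers a final result; a minimal sketch of detecting that inside the handler from step 3:

    recognitionTask = recognizer.recognitionTask(with: request) { result, error in
        if let result = result {
            print("Transcription: \(result.bestTranscription.formattedString)")
            if result.isFinal {
                // No further results will arrive for this request.
                stopRecording()
            }
        }
    }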

IV. Advanced Features and Optimization

1. Handling Multiple Languages

Switch the recognition language dynamically:

    // speechRecognizer must be declared with var (e.g. a mutable stored property) for this to compile.
    func updateRecognizerLanguage(to localeIdentifier: String) {
        speechRecognizer = SFSpeechRecognizer(locale: Locale(identifier: localeIdentifier))
    }
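The switch only affects tasks started afterwards, so any running session should be stopped first; a usage sketch:

    // Stop the current session, switch to US English, then start a new one.
    stopRecording()
    updateRecognizerLanguage(to: "en-US")
    try? startRecording()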

2. On-Device (Offline) Recognition

Enable on-device recognition via the requiresOnDeviceRecognition property (iOS 13 or later):

    request.requiresOnDeviceRecognition = true // Process audio on the device only
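Not every device or locale supports on-device recognition, so it is safer to gate the flag; a short sketch, assuming the recognizer and request variables from Section III:

    if #available(iOS 13.0, *), recognizer.supportsOnDeviceRecognition {
        request.requiresOnDeviceRecognition = true // Audio is guaranteed not to leave the device
    } else {
        print("On-device recognition is unavailable; falling back to server-based recognition")
    }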

3. Performance Optimization Strategies

  • Reduce the sample rate: convert captured buffers down to 16 kHz (for example AVAudioFormat(commonFormat: .pcmFormatFloat32, sampleRate: 16000, channels: 1, interleaved: false) together with AVAudioConverter) to shrink the amount of audio data handed to the recognizer; see the sketch after this list.
  • Background mode: add UIBackgroundModes to Info.plist with the audio value to keep recording while the app is in the background.
  • Error retry: inspect the error returned by the recognition task and implement automatic retry logic for transient failures.
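The following is a minimal sketch of the first strategy, converting the microphone format to 16 kHz mono inside the tap before appending to the request. It assumes the inputNode and request variables from Section III and omits error handling for brevity:

    let inputFormat = inputNode.outputFormat(forBus: 0)
    // 16 kHz mono float is sufficient for speech and much smaller than the hardware format.
    let targetFormat = AVAudioFormat(commonFormat: .pcmFormatFloat32,
                                     sampleRate: 16000,
                                     channels: 1,
                                     interleaved: false)!
    let converter = AVAudioConverter(from: inputFormat, to: targetFormat)!

    inputNode.installTap(onBus: 0, bufferSize: 1024, format: inputFormat) { buffer, _ in
        // Size the output buffer according to the sample-rate ratio.
        let capacity = AVAudioFrameCount(Double(buffer.frameLength) * targetFormat.sampleRate / inputFormat.sampleRate)
        guard let converted = AVAudioPCMBuffer(pcmFormat: targetFormat, frameCapacity: capacity) else { return }

        var consumed = false
        var conversionError: NSError?
        let status = converter.convert(to: converted, error: &conversionError) { _, outStatus in
            // Feed the captured buffer exactly once per tap callback.
            if consumed {
                outStatus.pointee = .noDataNow
                return nil
            }
            consumed = true
            outStatus.pointee = .haveData
            return buffer
        }

        if status != .error {
            request.append(converted)
        }
    }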

V. Complete Code Example

    import UIKit
    import Speech
    import AVFoundation

    class VoiceToTextViewController: UIViewController {

        // Force-unwrapped for brevity; zh-CN is a supported locale.
        private let speechRecognizer = SFSpeechRecognizer(locale: Locale(identifier: "zh-CN"))!
        private var recognitionRequest: SFSpeechAudioBufferRecognitionRequest?
        private var recognitionTask: SFSpeechRecognitionTask?
        private let audioEngine = AVAudioEngine()

        @IBOutlet weak var textView: UITextView!
        @IBOutlet weak var recordButton: UIButton!

        override func viewDidLoad() {
            super.viewDidLoad()
            requestSpeechAuthorization()
        }

        @IBAction func toggleRecording(_ sender: UIButton) {
            if audioEngine.isRunning {
                stopRecording()
                recordButton.setTitle("Start Recording", for: .normal)
            } else {
                do {
                    try startRecording()
                    recordButton.setTitle("Stop Recording", for: .normal)
                } catch {
                    print("Failed to start recording: \(error.localizedDescription)")
                }
            }
        }

        private func requestSpeechAuthorization() {
            SFSpeechRecognizer.requestAuthorization { authStatus in
                DispatchQueue.main.async {
                    switch authStatus {
                    case .authorized:
                        print("Speech recognition permission granted")
                    case .denied, .restricted, .notDetermined:
                        print("Permission denied, restricted, or not yet determined")
                    @unknown default:
                        break
                    }
                }
            }
        }

        private func startRecording() throws {
            // Create the streaming recognition request
            recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
            guard let request = recognitionRequest else { return }
            request.shouldReportPartialResults = true

            // Start the recognition task; results may arrive on a background queue
            recognitionTask = speechRecognizer.recognitionTask(with: request) { [weak self] result, error in
                guard let self = self else { return }
                if let result = result {
                    DispatchQueue.main.async {
                        self.textView.text = result.bestTranscription.formattedString
                    }
                }
                if let error = error {
                    print("Error: \(error.localizedDescription)")
                    self.stopRecording()
                }
            }

            // Configure the audio session and route microphone buffers into the request
            let audioSession = AVAudioSession.sharedInstance()
            try audioSession.setCategory(.record, mode: .measurement, options: .duckOthers)
            try audioSession.setActive(true)

            let inputNode = audioEngine.inputNode
            let recordingFormat = inputNode.outputFormat(forBus: 0)
            inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { buffer, _ in
                request.append(buffer)
            }

            audioEngine.prepare()
            try audioEngine.start()
        }

        private func stopRecording() {
            audioEngine.stop()
            audioEngine.inputNode.removeTap(onBus: 0)
            recognitionRequest?.endAudio()
            recognitionTask?.cancel()
            recognitionTask = nil
            recognitionRequest = nil
        }
    }
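One small design note, as an assumption about typical app flow rather than a requirement: stopping the session when the screen disappears avoids leaving the microphone tap installed. A sketch of the extra override:

    override func viewWillDisappear(_ animated: Bool) {
        super.viewWillDisappear(animated)
        // Tear down any in-progress session when leaving this screen.
        if audioEngine.isRunning {
            stopRecording()
        }
    }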

VI. Common Issues and Solutions

  1. Permission denied: verify that Info.plist contains NSSpeechRecognitionUsageDescription (and NSMicrophoneUsageDescription).
  2. No recognition results: make sure the microphone input is working and record in a reasonably quiet environment; heavy background noise sharply degrades accuracy.
  3. Language not supported: call SFSpeechRecognizer.supportedLocales() to get the list of available locales.

By following these steps, developers can quickly integrate the iOS 10 Speech framework and build an efficient, reliable speech-to-text app. Tuning parameters such as the sample rate and buffer size for the actual usage scenario can further improve recognition accuracy and the overall user experience.