Android Speech Recognition: A Guide from Zero to a Complete Implementation

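1. Speech Recognition Basics and the Android Implementation Path

Speech recognition converts human speech into text. On Android it is exposed mainly through the SpeechRecognizer class. The class relies on a speech recognition engine already installed on the device (for example, Google's speech services), so basic functionality requires no third-party SDK.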

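1.1 How It Works

Android speech recognition proceeds in four stages:

  1. Initialization: create a SpeechRecognizer instance and configure its parameters
  2. Recording: the recognition service captures microphone audio (with SpeechRecognizer, the app does not drive AudioRecord itself)
  3. Recognition: the audio stream is passed to the system recognition service
  4. Result delivery: the recognized text is returned through callback interfaces

Below this API the processing is engine-specific and not visible to the app. Google's engine, for example, reportedly uses WebRTC-based audio processing for noise suppression and feature extraction together with a deep neural network (DNN) model, and supports more than 80 languages, including Chinese and English.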

1.2 Before You Start

Add the required permissions to AndroidManifest.xml:

    <uses-permission android:name="android.permission.RECORD_AUDIO" />
    <uses-permission android:name="android.permission.INTERNET" /> <!-- used by online recognition; offline recognition needs extra configuration -->

RECORD_AUDIO is a dangerous permission, so on Android 6.0 (API 23) and higher it must also be requested at runtime:

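    // File-level constant. The value is hypothetical (the original leaves it
    // undefined); any request code unique within the app works.
    private const val REQUEST_RECORD_AUDIO_PERMISSION = 1

    // Activity method: request the microphone permission if it is not granted yet.
    private fun checkPermission() {
        if (ContextCompat.checkSelfPermission(this, Manifest.permission.RECORD_AUDIO)
            != PackageManager.PERMISSION_GRANTED) {
            ActivityCompat.requestPermissions(this,
                arrayOf(Manifest.permission.RECORD_AUDIO),
                REQUEST_RECORD_AUDIO_PERMISSION)
        }
    }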
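The request above only asks for the permission; the answer arrives later in onRequestPermissionsResult. A minimal sketch of the check that callback might perform is shown below. It is plain Kotlin so it can run outside Android: PERMISSION_GRANTED mirrors the value of PackageManager.PERMISSION_GRANTED (0), while the request code value and the name isRecordAudioGranted are assumptions made for illustration.

```kotlin
// Mirrors of Android constants so the sketch runs on a plain JVM.
object PermissionSketch {
    const val PERMISSION_GRANTED = 0               // same value as PackageManager.PERMISSION_GRANTED
    const val REQUEST_RECORD_AUDIO_PERMISSION = 1  // hypothetical request code
}

// Returns true only when this is the answer to our own request and the single
// requested permission (RECORD_AUDIO) was granted. An empty grantResults array
// means the request was interrupted, which is treated as "not granted".
fun isRecordAudioGranted(requestCode: Int, grantResults: IntArray): Boolean =
    requestCode == PermissionSketch.REQUEST_RECORD_AUDIO_PERMISSION &&
        grantResults.isNotEmpty() &&
        grantResults[0] == PermissionSketch.PERMISSION_GRANTED
```

Inside the Activity, onRequestPermissionsResult(requestCode, permissions, grantResults) could call this helper and enable the start button only when it returns true.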

2. Complete Code Implementation

2.1 Basic Recognition

    import android.content.Intent
    import android.os.Bundle
    import android.speech.RecognitionListener
    import android.speech.RecognizerIntent
    import android.speech.SpeechRecognizer
    import android.widget.Button
    import android.widget.TextView
    import android.widget.Toast
    import androidx.appcompat.app.AppCompatActivity

    class VoiceRecognitionActivity : AppCompatActivity() {
        private lateinit var speechRecognizer: SpeechRecognizer
        private lateinit var recognitionIntent: Intent

        override fun onCreate(savedInstanceState: Bundle?) {
            super.onCreate(savedInstanceState)
            setContentView(R.layout.activity_main)
            checkPermission() // defined in section 1.2
            initSpeechRecognizer()
            findViewById<Button>(R.id.btn_start).setOnClickListener {
                startListening()
            }
        }

        private fun initSpeechRecognizer() {
            speechRecognizer = SpeechRecognizer.createSpeechRecognizer(this)
            recognitionIntent = Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH).apply {
                putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
                    RecognizerIntent.LANGUAGE_MODEL_FREE_FORM)
                putExtra(RecognizerIntent.EXTRA_CALLING_PACKAGE, packageName)
                putExtra(RecognizerIntent.EXTRA_MAX_RESULTS, 5) // return at most 5 candidates
            }
            speechRecognizer.setRecognitionListener(object : RecognitionListener {
                override fun onResults(results: Bundle?) {
                    val matches = results?.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION)
                    // Take the first candidate; the list can be empty, so avoid indexing with [0]
                    matches?.firstOrNull()?.let { resultText ->
                        findViewById<TextView>(R.id.tv_result).text = resultText
                    }
                }

                override fun onError(error: Int) {
                    val errorMsg = when (error) {
                        SpeechRecognizer.ERROR_AUDIO -> "Audio error"
                        SpeechRecognizer.ERROR_CLIENT -> "Client error"
                        SpeechRecognizer.ERROR_NETWORK -> "Network error"
                        else -> "Unknown error"
                    }
                    Toast.makeText(this@VoiceRecognitionActivity, errorMsg, Toast.LENGTH_SHORT).show()
                }

                // The remaining callbacks are required by the interface
                override fun onReadyForSpeech(params: Bundle?) {}
                override fun onBeginningOfSpeech() {}
                override fun onRmsChanged(rmsdB: Float) {}
                override fun onBufferReceived(buffer: ByteArray?) {}
                override fun onEndOfSpeech() {}
                override fun onPartialResults(partialResults: Bundle?) {}
                override fun onEvent(eventType: Int, params: Bundle?) {}
            })
        }

        private fun startListening() {
            speechRecognizer.startListening(recognitionIntent)
        }

        override fun onDestroy() {
            super.onDestroy()
            speechRecognizer.destroy() // release the connection to the recognition service
        }
    }
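The listing takes the first candidate. A recognizer may also attach a confidence score to each candidate under the SpeechRecognizer.CONFIDENCE_SCORES key: a FloatArray parallel to the candidate list, in which a negative value means the score is unavailable. The sketch below shows one way to use those scores. pickBestCandidate is a hypothetical helper, written as plain Kotlin so it needs no Android classes.

```kotlin
// Picks the candidate with the highest available confidence score.
// Falls back to the first candidate (the recognizer's own top pick) when the
// scores are missing, differ in length, or are all unavailable (negative).
fun pickBestCandidate(candidates: List<String>, scores: FloatArray?): String? {
    if (candidates.isEmpty()) return null
    if (scores == null || scores.size != candidates.size) return candidates[0]

    var bestIndex = 0
    var bestScore = -1f
    for (i in candidates.indices) {
        // Strict comparison: on a tie the earlier candidate wins.
        if (scores[i] > bestScore) {
            bestScore = scores[i]
            bestIndex = i
        }
    }
    return candidates[bestIndex]
}
```

In onResults, the two arguments would come from results.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION) and results.getFloatArray(SpeechRecognizer.CONFIDENCE_SCORES).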

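2.2 Advanced Features

2.2.1 Offline Recognition

Request offline recognition on devices that support it (EXTRA_PREFER_OFFLINE requires API 23 or higher, and the engine typically needs the language pack to be downloaded already):

    recognitionIntent.apply {
        putExtra(RecognizerIntent.EXTRA_PREFER_OFFLINE, true)
        putExtra(RecognizerIntent.EXTRA_LANGUAGE, "zh-CN") // request Chinese
    }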

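2.2.2 Voice Command Recognition

EXTRA_LANGUAGE_MODEL tells the recognizer what kind of input to expect. Choose one of the two models:

    // Short, search-style phrases such as voice commands
    recognitionIntent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL, RecognizerIntent.LANGUAGE_MODEL_WEB_SEARCH)
    // Free-form dictation, e.g. for spoken numbers; end input after 1 second of silence
    recognitionIntent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL, RecognizerIntent.LANGUAGE_MODEL_FREE_FORM)
    recognitionIntent.putExtra(RecognizerIntent.EXTRA_SPEECH_INPUT_COMPLETE_SILENCE_LENGTH_MILLIS, 1000)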

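2.2.3 Continuous Listening

SpeechRecognizer ends a session after each utterance, so continuous recognition means calling startListening() again once onResults or onError has fired. The flag and toggle below track whether the user wants listening to stay on:

    private var isListening = false

    private fun toggleListening() {
        if (isListening) {
            speechRecognizer.stopListening()
            isListening = false
        } else {
            speechRecognizer.startListening(recognitionIntent)
            isListening = true
        }
    }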
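A toggle by itself does not keep recognition running, because each session stops after one utterance. One way to decide when to restart is sketched below in plain Kotlin. ContinuousListeningPolicy is a hypothetical helper, and the two constants mirror the values of SpeechRecognizer.ERROR_SPEECH_TIMEOUT (6) and SpeechRecognizer.ERROR_NO_MATCH (7) so that the code runs without the Android SDK.

```kotlin
// Mirrors of SpeechRecognizer error codes, copied so the sketch is self-contained.
object SpeechErrors {
    const val ERROR_SPEECH_TIMEOUT = 6
    const val ERROR_NO_MATCH = 7
}

// Tracks whether the user wants listening to stay on and answers
// "should a new session be started now?" for each listener callback.
class ContinuousListeningPolicy {
    var enabled = false
        private set

    // Flips the state and returns the new value.
    fun toggle(): Boolean {
        enabled = !enabled
        return enabled
    }

    // After a final result: restart only while continuous mode is on.
    fun shouldRestartAfterResults(): Boolean = enabled

    // After an error: restart only for "nothing was heard" style errors.
    // Other errors (network, permissions, ...) should surface to the user instead.
    fun shouldRestartAfterError(error: Int): Boolean =
        enabled && (error == SpeechErrors.ERROR_NO_MATCH ||
            error == SpeechErrors.ERROR_SPEECH_TIMEOUT)
}
```

In the Activity, onResults and onError would consult the policy and call speechRecognizer.startListening(recognitionIntent) when it returns true.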

3. Performance Optimization and Best Practices

3.1 Audio Parameter Tuning

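    // The recognition service controls microphone capture, so the sample rate of
    // microphone input cannot be set from the app (AudioManager has no such extras).
    // On API 33+, an app that supplies its own audio through
    // RecognizerIntent.EXTRA_AUDIO_SOURCE can describe that stream's format:
    if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.TIRAMISU) {
        recognitionIntent.apply {
            putExtra(RecognizerIntent.EXTRA_AUDIO_SOURCE_SAMPLING_RATE, 16000) // 16 kHz
            putExtra(RecognizerIntent.EXTRA_AUDIO_SOURCE_CHANNEL_COUNT, 1) // mono
            putExtra(RecognizerIntent.EXTRA_AUDIO_SOURCE_ENCODING, AudioFormat.ENCODING_PCM_16BIT)
        }
    }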

3.2 Error Handling

    override fun onError(error: Int) {
        when (error) {
            SpeechRecognizer.ERROR_NO_MATCH -> {
                // Nothing matched: let the user try again
                showRetryDialog()
            }
            SpeechRecognizer.ERROR_RECOGNIZER_BUSY -> {
                // The recognition service is busy: retry after one second
                Handler(Looper.getMainLooper()).postDelayed({
                    startListening()
                }, 1000)
            }
            // Handle other errors here...
        }
    }
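A fixed one-second retry is fine for an occasional ERROR_RECOGNIZER_BUSY, but repeated failures are usually better served by a delay that grows up to an upper bound. The following is a minimal sketch of that idea. retryDelayMillis and its default values are assumptions, not part of the Android API.

```kotlin
// Exponential backoff: base, 2x base, 4x base, ... capped at maxMillis.
// attempt is zero-based (0 = first retry); negative values are treated as 0.
fun retryDelayMillis(attempt: Int, baseMillis: Long = 1000L, maxMillis: Long = 8000L): Long {
    require(baseMillis > 0 && maxMillis >= baseMillis) { "invalid delay bounds" }
    val shift = attempt.coerceIn(0, 20) // clamp the exponent to keep the shift small
    return (baseMillis shl shift).coerceAtMost(maxMillis)
}
```

The result could replace the literal 1000 passed to postDelayed, with the attempt counter reset whenever onResults fires.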

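3.3 Memory Management

    // Cancel recognition in the Activity's onPause
    override fun onPause() {
        super.onPause()
        if (isListening) {
            speechRecognizer.cancel()
            isListening = false
        }
    }

    // Hold the Activity through a WeakReference so the listener cannot leak it
    private class WeakRecognitionListener(activity: VoiceRecognitionActivity) : RecognitionListener {
        private val weakActivity = WeakReference(activity)
        override fun onResults(results: Bundle?) {
            weakActivity.get()?.run {
                // Handle the results
            }
        }
        // The remaining callbacks are left empty here
        override fun onError(error: Int) {}
        override fun onReadyForSpeech(params: Bundle?) {}
        override fun onBeginningOfSpeech() {}
        override fun onRmsChanged(rmsdB: Float) {}
        override fun onBufferReceived(buffer: ByteArray?) {}
        override fun onEndOfSpeech() {}
        override fun onPartialResults(partialResults: Bundle?) {}
        override fun onEvent(eventType: Int, params: Bundle?) {}
    }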

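4. Common Problems and Solutions

4.1 Reducing Recognition Latency

  • Lower EXTRA_MAX_RESULTS (section 2.1 requests 5; when the extra is omitted, the recognizer chooses the count itself)
  • Set EXTRA_PARTIAL_RESULTS to receive intermediate results in real time
  • Bound the length of the audio input (these values are hints that an engine may ignore):
        putExtra(RecognizerIntent.EXTRA_SPEECH_INPUT_MINIMUM_LENGTH_MILLIS, 3000) // listen for at least 3 seconds
        putExtra(RecognizerIntent.EXTRA_SPEECH_INPUT_COMPLETE_SILENCE_LENGTH_MILLIS, 1500) // end after 1.5 seconds of silence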

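4.2 Multi-Language Support

EXTRA_LANGUAGE takes a BCP 47 language tag and selects a single language per request. If it is omitted, the recognizer uses the device's default locale. Set only one of the following:

    putExtra(RecognizerIntent.EXTRA_LANGUAGE, "zh-CN") // force Chinese
    // or
    putExtra(RecognizerIntent.EXTRA_LANGUAGE, "en-US") // force English
    // or a more specific tag, where the engine supports it
    putExtra(RecognizerIntent.EXTRA_LANGUAGE, "cmn-Hans-CN") // Mandarin in Simplified Chinese script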
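Because EXTRA_LANGUAGE expects a BCP 47 tag, the tag can be derived from a java.util.Locale instead of being hard-coded. The sketch below shows that conversion. recognitionLanguageTag and the "en-US" fallback are assumptions made for illustration.

```kotlin
import java.util.Locale

// Converts a Locale to a BCP 47 tag suitable for EXTRA_LANGUAGE.
// A locale without a language (such as Locale.ROOT) yields "und"
// (undetermined), so a fallback tag is substituted in that case.
fun recognitionLanguageTag(locale: Locale, fallback: String = "en-US"): String {
    val tag = locale.toLanguageTag()
    return if (tag == "und") fallback else tag
}
```

A call such as putExtra(RecognizerIntent.EXTRA_LANGUAGE, recognitionLanguageTag(Locale.getDefault())) would then follow the user's device language. Whether a given tag is actually supported still depends on the installed engine.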

4.3 Compatibility

    // Check whether a recognition service is available on this device.
    // Apps targeting Android 11 (API 30) or higher also need a <queries> entry for
    // the android.speech.RecognitionService intent action in the manifest.
    private fun isSpeechRecognitionAvailable(): Boolean {
        return SpeechRecognizer.isRecognitionAvailable(this)
    }

    // Fallback
    if (!isSpeechRecognitionAvailable()) {
        // Show a notice or send the user to the app store
        Toast.makeText(this, "Speech recognition is not supported on this device", Toast.LENGTH_LONG).show()
    }

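5. Advanced Features

5.1 Wake Word Detection

Combine AudioRecord with your own detector to implement a custom wake word. detectWakeWord() is a placeholder for that detector, and the capture loop needs the RECORD_AUDIO permission:

    @Volatile private var isWakeWordActive = false

    private fun setupWakeWordDetection() {
        val bufferSize = AudioRecord.getMinBufferSize(
            16000, AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT)
        val audioRecord = AudioRecord(
            MediaRecorder.AudioSource.MIC, 16000,
            AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT, bufferSize)
        audioRecord.startRecording()
        isWakeWordActive = true
        Thread {
            val buffer = ShortArray(bufferSize / 2)
            var detected = false
            while (isWakeWordActive && !detected) {
                val read = audioRecord.read(buffer, 0, buffer.size)
                // Plug the wake word detection algorithm in here
                if (read > 0 && detectWakeWord(buffer)) detected = true
            }
            // Free the microphone before the recognizer needs it
            audioRecord.stop()
            audioRecord.release()
            if (detected) runOnUiThread { startListening() }
        }.start()
    }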
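The guide leaves detectWakeWord() unimplemented. A real detector needs a trained keyword-spotting model, which is beyond what can be shown here. What can be shown cheaply is an energy gate that skips silent buffers, so that a detector only runs when someone may be speaking. The sketch below is that gate, not a wake word detector. rmsLevel, isLoudEnough, and the threshold of 500 are illustrative assumptions to be tuned per device.

```kotlin
import kotlin.math.sqrt

// Root-mean-square amplitude of the first `length` 16-bit PCM samples.
// Returns 0.0 for an empty range.
fun rmsLevel(buffer: ShortArray, length: Int = buffer.size): Double {
    val n = length.coerceIn(0, buffer.size)
    if (n == 0) return 0.0
    var sumOfSquares = 0.0
    for (i in 0 until n) {
        val sample = buffer[i].toDouble()
        sumOfSquares += sample * sample
    }
    return sqrt(sumOfSquares / n)
}

// True when the buffer is loud enough to be worth passing to a detector.
fun isLoudEnough(
    buffer: ShortArray,
    length: Int = buffer.size,
    threshold: Double = 500.0
): Boolean = rmsLevel(buffer, length) >= threshold
```

In the capture loop, the value returned by audioRecord.read() would be passed as length, and detectWakeWord(buffer) would be called only when isLoudEnough(buffer, read) is true.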

5.2 Real-Time Transcription

Enable EXTRA_PARTIAL_RESULTS and handle the intermediate results:

    recognitionIntent.putExtra(RecognizerIntent.EXTRA_PARTIAL_RESULTS, true)

    // In the RecognitionListener
    override fun onPartialResults(partialResults: Bundle?) {
        val partialMatches = partialResults?.getStringArrayList(
            SpeechRecognizer.RESULTS_RECOGNITION)
        // The list can be empty, so avoid indexing with [0]
        partialMatches?.firstOrNull()?.let { interimText ->
            findViewById<TextView>(R.id.tv_interim).text = interimText
        }
    }
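Partial results replace one another until the final result arrives, so a live transcript has to keep committed text separate from the current interim text. A minimal sketch of that bookkeeping follows. TranscriptBuffer is a hypothetical helper in plain Kotlin, meant to be fed from onPartialResults and onResults.

```kotlin
// Keeps finished utterances and the in-progress utterance apart.
class TranscriptBuffer {
    private val committed = StringBuilder()
    private var interim = ""

    // Called with each partial result; replaces the previous interim text.
    fun updatePartial(text: String) {
        interim = text.trim()
    }

    // Called with the final result; appends it and clears the interim text.
    fun commitFinal(text: String) {
        val finalText = text.trim()
        if (finalText.isNotEmpty()) {
            if (committed.isNotEmpty()) committed.append(' ')
            committed.append(finalText)
        }
        interim = ""
    }

    // Text to display: committed utterances followed by the in-progress one.
    fun displayText(): String = when {
        interim.isEmpty() -> committed.toString()
        committed.isEmpty() -> interim
        else -> "$committed $interim"
    }
}
```

Joining utterances with a space suits English; for Chinese output the separator would likely be dropped or replaced with punctuation.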

6. Testing and Debugging

6.1 Log Analysis

    # Enable verbose logging
    adb shell setprop log.tag.SpeechRecognizer VERBOSE
    adb logcat | grep SpeechRecognizer

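6.2 Simulating Voice Input

Use the Android Emulator that ships with Android Studio:

  1. Open the Extended Controls panel
  2. Select Microphone
  3. Enable "Virtual microphone uses host audio input", then speak or play a recording into the host machine's microphone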

6.3 Performance Monitoring

    // Measure the time from the end of speech to the final result.
    // Starting the clock in onBeginningOfSpeech would also count the time spent speaking.
    private var speechEndTime: Long = 0
    override fun onEndOfSpeech() {
        speechEndTime = SystemClock.elapsedRealtime()
    }
    override fun onResults(results: Bundle?) {
        val duration = SystemClock.elapsedRealtime() - speechEndTime
        Log.d("SpeechPerf", "Recognition took $duration ms")
    }
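A single log line per recognition is hard to compare across runs. The sketch below aggregates the measured durations instead. LatencyStats is a hypothetical helper in plain Kotlin, and each recorded value would be the duration computed in onResults.

```kotlin
// Collects recognition durations and reports simple aggregates.
class LatencyStats {
    private val samples = mutableListOf<Long>()

    fun record(durationMillis: Long) {
        require(durationMillis >= 0) { "duration must be non-negative" }
        samples += durationMillis
    }

    val count: Int get() = samples.size

    // Mean duration, or 0.0 when nothing has been recorded yet.
    fun averageMillis(): Double = if (samples.isEmpty()) 0.0 else samples.average()

    // Longest duration, or 0 when nothing has been recorded yet.
    fun maxMillis(): Long = samples.maxOrNull() ?: 0L

    // One-line summary; the average is truncated to whole milliseconds.
    fun summary(): String =
        "n=$count avg=${averageMillis().toLong()}ms max=${maxMillis()}ms"
}
```

Logging summary() every few recognitions gives a rough picture of how parameter changes such as EXTRA_MAX_RESULTS affect latency.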

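With the implementation above, developers can quickly add speech recognition to an Android app. In practice, tune the parameters for the specific scenario, and handle errors and performance carefully.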