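Building an AI Q&A Assistant in Vue (3): A Complete Guide to Recording and Speech-to-Text

1. Technology Choices and Core Principles

Adding recording and speech-to-text to a Vue project means combining native browser APIs with a speech recognition service. The core flow has three steps: capturing audio, processing the audio format, and calling a speech-to-text API.

1.1 Choosing a Recording Approach

Browsers provide the MediaRecorder API natively. It captures audio from the microphone and delivers it as Blob chunks, which can be read into an ArrayBuffer when needed. It requires no plugins and is supported by modern browsers such as Chrome and Firefox.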

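// Initialize recording
async startRecording() {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  // Most browsers cannot record 'audio/wav'; pick a container the browser supports
  const mimeType = ['audio/webm', 'audio/mp4', 'audio/ogg']
    .find((type) => MediaRecorder.isTypeSupported(type));
  this.mediaRecorder = new MediaRecorder(stream, {
    ...(mimeType ? { mimeType } : {}), // otherwise use the browser default
    audioBitsPerSecond: 128000
  });
  this.audioChunks = [];
  this.mediaRecorder.ondataavailable = (event) => {
    if (event.data.size > 0) this.audioChunks.push(event.data);
  };
  this.mediaRecorder.start();
}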
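
Because the containers MediaRecorder can produce differ between browsers, it can help to isolate the format choice in small helpers. The following is a minimal sketch: pickMimeType, extensionFor, the candidate order, and the extension table are illustrative assumptions, not part of any browser API.

```javascript
// Candidate container/codec pairs, most preferred first (an assumed ordering).
const CANDIDATE_TYPES = [
  'audio/webm;codecs=opus',
  'audio/ogg;codecs=opus',
  'audio/mp4'
];

// Assumed mapping from MIME subtype to a file extension for uploads.
const EXTENSIONS = new Map([
  ['webm', 'webm'],
  ['ogg', 'ogg'],
  ['mp4', 'm4a'],
  ['wav', 'wav'],
  ['mpeg', 'mp3']
]);

// Return the first type accepted by isSupported, or '' so the caller
// can let the browser choose its default format.
function pickMimeType(isSupported, candidates = CANDIDATE_TYPES) {
  for (const type of candidates) {
    if (isSupported(type)) return type;
  }
  return '';
}

// Derive a file extension from a MIME type such as 'audio/webm;codecs=opus'.
function extensionFor(mimeType) {
  const subtype = mimeType.split(';')[0].split('/')[1];
  return EXTENSIONS.get(subtype) || 'bin';
}
```

In the browser this would be called as pickMimeType((type) => MediaRecorder.isTypeSupported(type)), and the recorder's mimeType property can be passed to extensionFor when naming the uploaded file.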

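1.2 Comparing Speech-to-Text Services

Service type	Advantages	Limitations
Native browser API	No API key or extra cost; real-time results	Uneven browser support; some implementations (Chrome, for example) send audio to a server; accuracy varies
Third-party cloud service	Multi-language support, high accuracy	Requires API calls and may incur costs
WebAssembly library	Works offline, highly controllable	Large model files, slow first load

A hybrid approach is recommended: try the native browser API first (such as SpeechRecognition from the Web Speech API), and fall back to a cloud service if it is unavailable or fails.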
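
The fallback idea can be sketched as a small chain that tries recognizers in order. This is an illustration under assumptions: transcribeWithFallback and the recognizer functions are hypothetical names, and each recognizer is assumed to be an async function that takes the audio and resolves to text. Note that SpeechRecognition listens to the live microphone rather than a recorded Blob, so a native entry in the chain would wrap a live session.

```javascript
// Try each recognizer in order and return the first non-empty transcript.
// A recognizer is any async function (audio) => string.
async function transcribeWithFallback(audio, recognizers) {
  const errors = [];
  for (const recognize of recognizers) {
    try {
      const text = await recognize(audio);
      if (typeof text === 'string' && text.trim() !== '') return text;
      errors.push(new Error('empty transcript'));
    } catch (err) {
      errors.push(err); // remember the failure and move on to the next option
    }
  }
  const error = new Error('All recognizers failed');
  error.causes = errors; // keep the individual failures for logging
  throw error;
}
```

A call might look like transcribeWithFallback(audioBlob, [recognizeNatively, recognizeInCloud]), where both function names are placeholders for your own implementations.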

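2. Implementing Recording

2.1 Permissions and Error Handling

The user must explicitly grant microphone access. navigator.mediaDevices.getUserMedia returns a promise that rejects when access is denied, so catch the rejection and show a friendly message.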

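let stream; // declared outside the try block so it can be used afterwards
try {
  stream = await navigator.mediaDevices.getUserMedia({ audio: true });
} catch (err) {
  if (err.name === 'NotAllowedError') {
    alert('Please allow microphone access to use voice features');
  } else {
    alert(`Failed to initialize recording: ${err.message}`);
  }
  return;
}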
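
Beyond NotAllowedError, getUserMedia can reject with other named errors, and a lookup table keeps the messages in one place. The sketch below is an assumption-laden example: describeMicError and the message wording are made up for illustration, while the error names are ones getUserMedia is documented to use.

```javascript
// User-facing messages keyed by the name of the rejected error.
const MIC_ERROR_MESSAGES = {
  NotAllowedError: 'Please allow microphone access to use voice features.',
  NotFoundError: 'No microphone was found on this device.',
  NotReadableError: 'The microphone could not be read. It may be in use by another application.'
};

// Map an error from getUserMedia to a message suitable for display.
function describeMicError(err) {
  const name = err && err.name;
  if (Object.prototype.hasOwnProperty.call(MIC_ERROR_MESSAGES, name)) {
    return MIC_ERROR_MESSAGES[name];
  }
  const detail = err && err.message ? err.message : 'unknown error';
  return `Failed to initialize recording: ${detail}`;
}
```

Inside the catch block, alert(describeMicError(err)) would then replace the if/else branches.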

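2.2 Optimizing Audio Data

Bitrate: MediaRecorder's audioBitsPerSecond sets the encoding bitrate, trading audio quality against file size. It does not change the sample rate.
Chunked processing: upload audio in segments as they are recorded (every 2 seconds, for example) to reduce memory usage.
Format conversion: if the backend requires MP3, a library such as lamejs can convert on the front end.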
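
Chunked processing can be sketched with a small queue that uploads segments strictly in order. This is a minimal sketch: createChunkUploader and the uploadChunk callback are hypothetical, and the server is assumed to reassemble the segments by sequence number, since segments after the first are typically not playable on their own.

```javascript
// Upload chunks one at a time, in arrival order.
// uploadChunk(chunk, seq) is a caller-supplied async function.
function createChunkUploader(uploadChunk) {
  let nextSeq = 0;
  let tail = Promise.resolve();
  return {
    // Queue a chunk; it is sent after all earlier chunks have finished.
    // A failed upload rejects the chain, so later chunks are not sent.
    push(chunk) {
      const seq = nextSeq++;
      tail = tail.then(() => uploadChunk(chunk, seq));
      return tail;
    },
    // Resolves once every queued chunk has been uploaded.
    done() {
      return tail;
    }
  };
}

// Browser wiring (assumes `recorder` is a MediaRecorder instance):
// const uploader = createChunkUploader(sendToServer);
// recorder.ondataavailable = (e) => { if (e.data.size > 0) uploader.push(e.data); };
// recorder.start(2000); // deliver a chunk roughly every 2 seconds
```

Serializing the uploads keeps the segments in order on the server at the cost of some latency; sendToServer in the comment is a placeholder for your own upload function.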

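// Stop recording and build the audio file
stopRecording() {
  // Register the handler before calling stop() so the event cannot be missed
  this.mediaRecorder.onstop = () => {
    // Label the Blob with the type the recorder actually produced
    const audioBlob = new Blob(this.audioChunks, { type: this.mediaRecorder.mimeType });
    this.audioUrl = URL.createObjectURL(audioBlob);
    // Upload audioBlob to the backend or pass it to speech recognition
  };
  this.mediaRecorder.stop();
}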
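
If the backend does require WAV, one option is to encode decoded PCM samples yourself. The sketch below assumes mono audio and 16-bit output; encodeWav is a hypothetical helper, not a browser API.

```javascript
// Encode mono Float32 PCM samples (range -1..1) as a 16-bit PCM WAV file.
// Returns an ArrayBuffer containing the 44-byte header followed by the data.
function encodeWav(samples, sampleRate) {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);
  const writeString = (offset, text) => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeString(0, 'RIFF');
  view.setUint32(4, 36 + dataSize, true);   // file size minus the first 8 bytes
  writeString(8, 'WAVE');
  writeString(12, 'fmt ');
  view.setUint32(16, 16, true);             // size of the fmt chunk
  view.setUint16(20, 1, true);              // audio format 1 = PCM
  view.setUint16(22, 1, true);              // channel count (mono)
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true);             // bits per sample
  writeString(36, 'data');
  view.setUint32(40, dataSize, true);

  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp to the valid range
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}
```

In the browser, the samples could come from decoding the recorded Blob with AudioContext.decodeAudioData and reading getChannelData(0); the result can then be wrapped as new Blob([encodeWav(samples, rate)], { type: 'audio/wav' }).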

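3. Integrating Speech-to-Text

3.1 Using the Native Browser API

The SpeechRecognition interface of the Web Speech API transcribes speech in real time, with the recognition language set through the lang property. Browser support is uneven, accuracy is limited, and some implementations send the audio to a server for recognition.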

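// Initialize speech recognition
const recognition = new (window.SpeechRecognition ||
  window.webkitSpeechRecognition)();
recognition.lang = 'en-US'; // set another BCP 47 tag, such as 'zh-CN', for other languages
recognition.interimResults = true; // deliver interim results while the user speaks
recognition.onresult = (event) => {
  const transcript = Array.from(event.results)
    .map(result => result[0].transcript)
    .join('');
  this.aiResponse = transcript; // display the recognized text
};
recognition.onerror = (event) => {
  console.error(`Speech recognition error: ${event.error}`);
};
recognition.start();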
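
With interimResults enabled, each result carries an isFinal flag, and it is often useful to render confirmed text and in-progress text differently. A minimal sketch, where splitTranscript is a hypothetical helper name:

```javascript
// Split a list of recognition results into confirmed and in-progress text.
// Each result is expected to expose isFinal and a first alternative with
// a transcript, as SpeechRecognitionResult does.
function splitTranscript(results) {
  let finalText = '';
  let interimText = '';
  for (const result of Array.from(results)) {
    const text = result[0].transcript;
    if (result.isFinal) {
      finalText += text;
    } else {
      interimText += text;
    }
  }
  return { finalText, interimText };
}
```

In the onresult handler, splitTranscript(event.results) would return both parts, so the interim text can be shown in a lighter style until it is confirmed.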

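3.2 Integrating a Third-Party Cloud Service (Generic REST API Example)

When high accuracy or multi-language support is needed, call a cloud service API. The following example uses a placeholder endpoint and response shape: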

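async function transcribeAudio(audioBlob) {
  const formData = new FormData();
  formData.append('audio', audioBlob, 'recording.wav');
  // Placeholder endpoint and key. In production, proxy the request through
  // your own backend so the API key is never shipped to the browser.
  const response = await fetch('https://api.example.com/asr', {
    method: 'POST',
    body: formData,
    headers: {
      'Authorization': 'Bearer YOUR_API_KEY'
    }
  });
  if (!response.ok) {
    throw new Error(`Transcription request failed: ${response.status}`);
  }
  const data = await response.json();
  return data.transcript; // the recognized text
}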

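3.3 Offline Option: WebAssembly Libraries

WebAssembly libraries such as Vosk run speech recognition offline inside the browser, which suits scenarios with strict privacy requirements.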

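<!-- Load the Vosk library (check the package path and version against the vosk-browser README) -->
<script src="https://unpkg.com/@alphacep/vosk-browser@0.3.15/vosk.js"></script>
<script>
  async function loadModel() {
    // Placeholder model URL; check the README for the archive format the library expects
    const model = await Vosk.createModel('https://example.com/models/vosk-model-small-en-us-0.15.zip');
    // In vosk-browser, recognizers are created from the loaded model
    const recognizer = new model.KaldiRecognizer(16000);
    // Feed audio data to the recognizer and read the results
  }
</script>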
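
Microphone capture commonly runs at 44.1 kHz or 48 kHz, while the recognizer above is configured for 16 kHz. Assuming the audio has to be brought down to that rate before it is passed in, a simple averaging downsampler is one way to do it. This is a rough sketch: downsample is a hypothetical helper, and a production resampler would use a proper low-pass filter.

```javascript
// Downsample mono PCM by averaging each window of input samples
// (a simple box filter). Returns a new Float32Array.
function downsample(input, inputRate, outputRate) {
  if (outputRate > inputRate) {
    throw new RangeError('outputRate must not exceed inputRate');
  }
  if (outputRate === inputRate) return Float32Array.from(input);

  const ratio = inputRate / outputRate;
  const length = Math.floor(input.length / ratio);
  const output = new Float32Array(length);
  for (let i = 0; i < length; i++) {
    const start = Math.floor(i * ratio);
    const end = Math.min(input.length, Math.floor((i + 1) * ratio));
    let sum = 0;
    for (let j = start; j < end; j++) sum += input[j];
    output[i] = sum / (end - start); // average of the window
  }
  return output;
}
```

For example, downsample(samples, 48000, 16000) averages every three input samples into one output sample.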

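4. Performance and User Experience

4.1 Managing Recording State

Use Vuex or Pinia to manage recording state so that it stays consistent across components.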

// store/modules/audio.js
export default {
  state: () => ({
    isRecording: false,
    audioUrl: null,
    transcript: ''
  }),
  mutations: {
    SET_RECORDING(state, value) {
      state.isRecording = value;
    },
    SET_TRANSCRIPT(state, text) {
      state.transcript = text;
    }
  }
};

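4.2 Error Recovery

Network interruptions: cache audio segments and retry once the connection is restored.
API rate limiting: apply exponential backoff to avoid sending requests too frequently.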

async function safeTranscribe(audioBlob, retries = 3) {
  for (let i = 0; i < retries; i++) {
    try {
      return await transcribeAudio(audioBlob);
    } catch (err) {
      if (i === retries - 1) throw err;
      await new Promise(resolve => setTimeout(resolve, 1000 * Math.pow(2, i)));
    }
  }
}
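
The network-interruption item above can be sketched as a queue that holds unsent segments and flushes them when the connection returns. createRetryQueue and the send callback are hypothetical names used for illustration only.

```javascript
// Hold items that could not be sent and retry them later, in order.
// send(item) is a caller-supplied async function that rejects on failure.
function createRetryQueue(send) {
  const pending = [];
  let flushing = false;
  return {
    get size() {
      return pending.length;
    },
    add(item) {
      pending.push(item);
    },
    // Send queued items in order. Stops at the first failure and keeps
    // the failed item and everything after it for the next attempt.
    async flush() {
      if (flushing) return []; // avoid sending the same item twice
      flushing = true;
      const results = [];
      try {
        while (pending.length > 0) {
          try {
            results.push(await send(pending[0]));
          } catch (err) {
            break;
          }
          pending.shift();
        }
      } finally {
        flushing = false;
      }
      return results;
    }
  };
}

// Browser wiring: retry when the connection comes back.
// window.addEventListener('online', () => queue.flush());
```

Passing safeTranscribe as the send function would combine the queue with the backoff logic shown earlier.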

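4.3 Cross-Platform Compatibility

Mobile: check navigator.userAgent and prompt the user to switch to Chrome or Safari when needed. Where possible, also feature-detect the required APIs rather than relying on the user agent string alone.
iOS restriction: on iOS, recording must be started from a user interaction event (such as a tap), otherwise it is blocked.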
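
Checking for the APIs directly is usually more reliable than parsing the user agent. The sketch below takes a window-like object so it can be exercised outside the browser; detectSpeechSupport and the shape of its return value are assumptions made for this example.

```javascript
// Report which speech-related capabilities a window-like object exposes.
function detectSpeechSupport(env) {
  const nav = env.navigator || {};
  const hasGetUserMedia = Boolean(
    nav.mediaDevices && typeof nav.mediaDevices.getUserMedia === 'function'
  );
  return {
    // Recording needs both microphone access and MediaRecorder
    canRecord: hasGetUserMedia && typeof env.MediaRecorder === 'function',
    // Native recognition may only exist under the webkit prefix
    canRecognize:
      typeof (env.SpeechRecognition || env.webkitSpeechRecognition) === 'function',
    // Microphone access is only available in secure contexts
    isSecureContext: env.isSecureContext === true
  };
}
```

Calling detectSpeechSupport(window) when the component mounts lets the UI hide the record button or show a browser hint before the user tries to record.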

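5. Security and Privacy

Encryption: always upload audio over HTTPS, which is what protects against man-in-the-middle attacks. Encrypting the audio with CryptoJS before upload can add an application-level layer, but only if the key is not exposed in client code.
Privacy policy: tell users clearly what their audio is used for and how long it is stored.
Local processing first: in sensitive scenarios, prefer the offline WebAssembly option.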

6. Complete Code Example

<template>
  <div>
    <button @click="toggleRecording">{{ isRecording ? 'Stop' : 'Start' }} recording</button>
    <div v-if="transcript">Transcript: {{ transcript }}</div>
    <audio v-if="audioUrl" :src="audioUrl" controls></audio>
  </div>
</template>
<script>
export default {
  data() {
    return {
      isRecording: false,
      mediaRecorder: null,
      audioChunks: [],
      audioUrl: null,
      transcript: ''
    };
  },
  methods: {
    async toggleRecording() {
      if (this.isRecording) {
        this.stopRecording();
      } else {
        await this.startRecording();
      }
    },
    async startRecording() {
      try {
        const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
        // Use the browser's default format; most browsers cannot record 'audio/wav'
        this.mediaRecorder = new MediaRecorder(stream);
        this.audioChunks = [];
        this.mediaRecorder.ondataavailable = (event) => {
          this.audioChunks.push(event.data);
        };
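        this.mediaRecorder.onstop = async () => {
          stream.getTracks().forEach((track) => track.stop()); // release the microphone
          const audioBlob = new Blob(this.audioChunks, { type: this.mediaRecorder.mimeType });
          this.audioUrl = URL.createObjectURL(audioBlob);
          // Run speech-to-text (placeholder implementation below)
          this.transcript = await this.transcribeAudio(audioBlob);
        };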
        this.mediaRecorder.start();
        this.isRecording = true;
      } catch (err) {
        alert(`Recording failed: ${err.message}`);
      }
    },
    stopRecording() {
      this.mediaRecorder.stop();
      this.isRecording = false;
    },
    async transcribeAudio(audioBlob) {
      // Replace with a real API call in production
      return new Promise(resolve => {
        setTimeout(() => resolve('Sample transcript'), 1000);
      });
    }
  }
};
</script>

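7. Summary and Next Steps

Progressive enhancement: build the core features first, then add advanced ones such as real-time recognition and multi-language support.
Test coverage: focus on mobile compatibility, network interruptions, and high-concurrency scenarios.
Monitoring and alerting: integrate a tool such as Sentry to track the speech recognition failure rate, so the model can be tuned or the provider switched promptly.

With the approach above, developers can add recording and speech-to-text to a Vue project efficiently and give users a natural, fluid voice interaction experience.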