ASR Speech Recognition

All ASR engines implement the recognizer.SpeechRecognitionEngine interface and are created through a factory, keyed by a vendor identifier.

Interface

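type SpeechRecognitionEngine interface {
    // Register the transcription-result callback and the error callback
    Init(resultCb SpeechRecognitionResult, errorCb RecognitionError)
    // Vendor identifier of this engine, e.g. "qcloud"
    Vendor() string
    // Open the connection for a dialog and start receiving results
    ConnAndReceive(dialogId string) error
    // Send a chunk of PCM audio
    SendAudioBytes(data []byte) error
    // Signal that no more audio will be sent
    SendEnd() error
    // Close the connection
    StopConn() error
}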

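Recognition is callback-driven: you continuously feed in PCM audio frames and receive real-time transcription results through the callbacks.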

Basic Usage

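package main

import (
    "fmt"
    "log"
    "os"
    "time"

    "github.com/LingByte/lingllm/recognizer"
)

func main() {
    factory := recognizer.NewTranscriberFactory()

    cfg, err := recognizer.NewTranscriberConfigFromMap("qcloud", map[string]interface{}{
        "appId":     "your-app-id",
        "secretId":  "your-secret-id",
        "secretKey": "your-secret-key",
    })
    if err != nil {
        log.Fatal(err)
    }

    engine, err := factory.CreateTranscriber(cfg)
    if err != nil {
        log.Fatal(err)
    }

    // Register the result and error callbacks before connecting
    engine.Init(
        func(text string, isLast bool, duration time.Duration, uuid string) {
            fmt.Printf("[%v] %s (final=%t)\n", duration, text, isLast)
        },
        func(err error, isFatal bool) {
            fmt.Printf("ASR error (fatal=%t): %v\n", isFatal, err)
        },
    )

    if err := engine.ConnAndReceive("dialog-1"); err != nil {
        log.Fatal(err)
    }
    // Hypothetical input file of raw 16 kHz, 16-bit mono PCM, sent in one call for brevity
    pcm, err := os.ReadFile("audio.pcm")
    if err != nil {
        log.Fatal(err)
    }
    if err := engine.SendAudioBytes(pcm); err != nil {
        log.Println("send audio:", err)
    }
    if err := engine.SendEnd(); err != nil { // signal end of audio
        log.Println("send end:", err)
    }
    // Results may still arrive after SendEnd; real code should wait for them before closing
    engine.StopConn()
}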

Supported Engines

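ID	Engine	Notes
`qcloud`	Tencent Cloud	Optimized for Chinese
`deepgram`	Deepgram	Primarily English
`google`	Google	Multilingual
`aws`	AWS	Multilingual
`baidu`	Baidu	Optimized for Chinese
`volcengine`	Volcengine	Optimized for Chinese
`volcengine_llm`	Volcengine LLM ASR	Enhanced with a large model
`funasr`	FunASR	Open source, can be deployed locally
`funasr_realtime`	FunASR Real-time	Streaming local recognition
`whisper`	Whisper	Open source, can be deployed locally
`gladia`	Gladia	Multilingual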

Call factory.GetSupportedVendors() to get the full list at runtime.
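When several vendors are configured, one option is to choose the first preferred vendor that the factory reports as supported. A sketch, assuming GetSupportedVendors returns the vendor identifiers as a []string; the `pickVendor` helper and the preference order are hypothetical.

```go
package main

import "fmt"

// pickVendor returns the first entry of preferred that also appears in
// supported, and false if none match.
func pickVendor(preferred, supported []string) (string, bool) {
	set := make(map[string]struct{}, len(supported))
	for _, v := range supported {
		set[v] = struct{}{}
	}
	for _, v := range preferred {
		if _, ok := set[v]; ok {
			return v, true
		}
	}
	return "", false
}

func main() {
	// In real code: supported := factory.GetSupportedVendors() (assumed to be []string)
	supported := []string{"qcloud", "deepgram", "funasr"}
	vendor, ok := pickVendor([]string{"volcengine", "funasr", "qcloud"}, supported)
	fmt.Println(vendor, ok) // prints "funasr true"
}
```

The chosen identifier would then be passed as the first argument to recognizer.NewTranscriberConfigFromMap.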

Audio Format

Recommended input: 16 kHz, 16-bit, mono PCM
Audio in other formats must first be converted (resampled) with the media package
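The media package's API is not documented on this page, so the sketch below does not use it. As a stand-in that shows what such a conversion involves, it downmixes interleaved 16-bit little-endian stereo PCM to mono by averaging the channels and resamples by linear interpolation (for example 48 kHz to 16 kHz). The helper names are hypothetical, and linear interpolation without a low-pass filter can alias when downsampling, so a proper resampler is preferable for production audio.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// stereoToMono averages the left and right channels of interleaved
// 16-bit little-endian stereo PCM and returns mono samples.
func stereoToMono(pcm []byte) []int16 {
	n := len(pcm) / 4 // 4 bytes per stereo sample pair
	out := make([]int16, n)
	for i := 0; i < n; i++ {
		l := int32(int16(binary.LittleEndian.Uint16(pcm[4*i:])))
		r := int32(int16(binary.LittleEndian.Uint16(pcm[4*i+2:])))
		out[i] = int16((l + r) / 2)
	}
	return out
}

// resampleLinear converts samples from srcRate to dstRate by linear
// interpolation between neighbouring input samples (no anti-alias filter).
func resampleLinear(in []int16, srcRate, dstRate int) []int16 {
	if len(in) == 0 || srcRate <= 0 || dstRate <= 0 {
		return nil
	}
	outLen := int(int64(len(in)) * int64(dstRate) / int64(srcRate))
	out := make([]int16, outLen)
	for i := range out {
		pos := float64(i) * float64(srcRate) / float64(dstRate)
		j := int(pos)
		frac := pos - float64(j)
		next := j + 1
		if next >= len(in) {
			next = len(in) - 1 // clamp at the last input sample
		}
		out[i] = int16(float64(in[j])*(1-frac) + float64(in[next])*frac)
	}
	return out
}

// toBytes encodes mono samples back to 16-bit little-endian PCM.
func toBytes(samples []int16) []byte {
	out := make([]byte, 2*len(samples))
	for i, s := range samples {
		binary.LittleEndian.PutUint16(out[2*i:], uint16(s))
	}
	return out
}

func main() {
	// Six stereo sample pairs: left = 100*i, right = 300*i.
	stereo := make([]byte, 4*6)
	for i := 0; i < 6; i++ {
		binary.LittleEndian.PutUint16(stereo[4*i:], uint16(int16(100*i)))
		binary.LittleEndian.PutUint16(stereo[4*i+2:], uint16(int16(300*i)))
	}
	mono := stereoToMono(stereo)               // [0 200 400 600 800 1000]
	down := resampleLinear(mono, 48000, 16000) // keeps every 3rd sample: [0 600]
	fmt.Println(mono, down, len(toBytes(down))) // prints "[0 200 400 600 800 1000] [0 600] 4"
}
```

The resulting bytes are 16 kHz, 16-bit mono PCM that could be framed and sent with engine.SendAudioBytes.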

Examples

See examples/voice-demo.

Related Links

Type	URL
Source	github.com/LingByte/lingllm/tree/main/recognizer
Go docs	pkg.go.dev/github.com/LingByte/lingllm/recognizer
Example	voice-demo
Upstream docs	Tencent Cloud ASR · Deepgram · Google STT
