Note: the environment is macOS 11.13.3 on a MacBook Pro (Retina, 15-inch, Mid 2015).
DeepSpeech is a speech recognition engine that Mozilla implemented with TensorFlow; see https://github.com/mozilla/DeepSpeech.
1) Create the project directory
mkdir deepspeech
cd deepspeech
2) Create the virtual environment
virtualenv env-deepspeech --system-site-packages
3) Activate the virtual environment
source env-deepspeech/bin/activate
4) Install deepspeech
pip install deepspeech
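At this point you can run deepspeech -h to check that the installation works. When I ran it, the following error appeared:
RuntimeError: module compiled against API version 0xc but this version of numpy is 0x9
The installed numpy is older than the one deepspeech was compiled against (the environment was created with --system-site-packages, so it can pick up a system-wide numpy). Upgrading numpy fixes it: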
pip install --upgrade numpy
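To see which numpy the virtual environment actually resolves, a small check like the one below may help. It is only a sketch: it assumes Python 3.8 or later (for importlib.metadata), and describe_package is a hypothetical helper name, not part of deepspeech or numpy.

```python
import importlib.metadata
import importlib.util


def describe_package(name):
    """Return the installed version and import location of a package.

    Returns None when the package cannot be found at all.
    (Hypothetical helper for illustration only.)
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    try:
        version = importlib.metadata.version(name)
    except importlib.metadata.PackageNotFoundError:
        # Importable, but no distribution metadata (e.g. a stdlib module).
        version = None
    return {"version": version, "origin": spec.origin}
```

Calling describe_package("numpy") before and after the upgrade shows whether the path in "origin" points inside env-deepspeech or at a system-wide site-packages directory.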
5) Download the pre-trained model and extract it
wget https://github.com/mozilla/DeepSpeech/releases/download/v0.1.1/deepspeech-0.1.1-models.tar.gz
tar xvzf deepspeech-0.1.1-models.tar.gz
This creates a models folder in the current directory containing the trained deepspeech model files:
-rw-r--r--  1 none  staff         329 11 18 03:25 alphabet.txt
-rw-r--r--  1 none  staff  1601028778 11 18 03:25 lm.binary
-rw-r--r--  1 none  staff   490978889  1 17 22:09 output_graph.pb
-rw-r--r--  1 none  staff    43550345 11 18 03:25 trie
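Because the archive is large, a truncated download or partial extraction is easy to miss. The sketch below checks that the four files from the listing above exist and are non-empty; missing_model_files is a hypothetical helper, and the default "models" path simply assumes the directory layout described in this post.

```python
from pathlib import Path

# File names taken from the listing of the 0.1.1 model archive in this post.
EXPECTED_FILES = ("alphabet.txt", "lm.binary", "output_graph.pb", "trie")


def missing_model_files(models_dir="models"):
    """Return the expected model files that are absent or empty."""
    root = Path(models_dir)
    missing = []
    for name in EXPECTED_FILES:
        path = root / name
        # Treat a zero-byte file as missing: it usually means a failed extract.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing
```

An empty list from missing_model_files() means all four files are in place; it does not verify that their contents are intact.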
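6) Prepare a 16 kHz, 16-bit, mono WAV file
I converted an MP3 of my daughter's English listening exercises to a 16 kHz, 16-bit, mono file (test.wav) and ran recognition on it: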
deepspeech models/output_graph.pb models/alphabet.txt models/lm.binary models/trie test.wav
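Since step 6 requires a specific audio format, it may be worth confirming that the converted file really is 16 kHz, 16-bit, mono before looking elsewhere for problems. This is a minimal sketch using Python's standard wave module; wav_format and is_deepspeech_ready are hypothetical helper names, and the check only works for uncompressed PCM WAV files.

```python
import wave


def wav_format(path):
    """Return (sample_rate_hz, bits_per_sample, channels) for a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        # getsampwidth() is in bytes, so multiply by 8 to get bits.
        return wav.getframerate(), wav.getsampwidth() * 8, wav.getnchannels()


def is_deepspeech_ready(path):
    """True when the file is 16 kHz, 16-bit, mono, as step 6 requires."""
    return wav_format(path) == (16000, 16, 1)
```

For example, is_deepspeech_ready("test.wav") should return True for the file used in this post if the conversion worked as intended.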
Running the deepspeech command fails with an error and exits. The error message is:
libc++abi.dylib: terminating with uncaught exception of type lm::FormatLoadException: native_client/kenlm/lm/read_arpa.cc:65 in void lm::ReadARPACounts(util::FilePiece &, std::vector &) threw FormatLoadException.
first non-empty line was "1414678853" not \data\. Byte: 11
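I suspected that the deepspeech version installed through pip was the problem, so I cloned the source repository and used its util/taskcluster.py script to get a native client (the script downloads a pre-built binary rather than compiling locally):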
git clone https://github.com/mozilla/DeepSpeech
cd DeepSpeech/
python util/taskcluster.py --arch osx --target .
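A deepspeech binary now appears in the directory, which shows that the step succeeded. Copy deepspeech into the original project directory and run it again: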
./deepspeech models/output_graph.pb models/alphabet.txt models/lm.binary models/trie test.wav
The recognition result:
he aihtbebyureunittwoo smell and taste lisenanti one the water melon is big and round to here are two parts on the tall tree lets get them three taste these grapes are the nice for what a nice lemon it smells good five what would you like id like some oranges six what of those they are strawverieslisenancircle one is it lemenjuce or oringtuce to taste these grapes or they taste three taste what is it here are some strowberies for you five or those pairs sweet or sour six what would you like watermolanjuceororangejucepage nine beat yo year listen choose and complete one its nice here lets ever pigknack what do you have jo guess they are round they are or range they smell nice what are the kitty oh they are oringers i like sweetorringers here you are thank you to we do you have alice close your eyes now taste it oudesitase its soar is it a lemon yes youre right it sour but nice we can make some lemenjece three im thirsty what do you have beat look a big water melon wow thats great i like what i mean for what do you have in your pack kitty at them are the grape so strawberies there small and round their grapes i think yes they are grapes do you like grapes pen no i like stroperies five what do you like del uatamolans or apples guess they are sweet big and round a pulse no youre wrong i like water melons
Frankly, the recognition accuracy is still fairly low, and the run took a very long time:
real 4m31.706s
user 8m35.168s
sys 0m14.004s
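The user time above is larger than the real (wall-clock) time, which suggests that more than one CPU core was busy during recognition. The arithmetic can be sketched as follows; to_seconds is a hypothetical helper, and since the length of the audio is not stated here, no real-time factor is computed.

```python
def to_seconds(stamp):
    """Convert a `time`-style stamp such as '4m31.706s' to seconds."""
    minutes, _, seconds = stamp.rstrip("s").partition("m")
    return int(minutes) * 60 + float(seconds)


real = to_seconds("4m31.706s")
user = to_seconds("8m35.168s")
sys_time = to_seconds("0m14.004s")

# CPU seconds consumed per wall-clock second; about 1.95 for these numbers.
cpu_per_wall = (user + sys_time) / real
print(f"wall {real:.1f}s, cpu {user + sys_time:.1f}s, ratio {cpu_per_wall:.2f}")
```

A ratio close to 2 means that, on average, roughly two cores were kept busy for the whole four and a half minutes.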
Note: deepspeech may fail at startup because it cannot find the libsox2 library. Installing sox with brew fixes this.