(Reposting is welcome. Original article: http://blog.csdn.net/speeds3/article/details/76209152)

I recently wanted to explore whether OLAMI could be used in games, and this post records the experiment.

The projects in Unity's official tutorials are compact but look good, and each comes with a complete set of assets. In the end I chose tanks-tutorial for this experiment.

Downloading and modifying the project

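First, download the project as the tutorial describes and add the code for tank movement and shooting. At this point it already qualifies as a "game": you can drive the tank around the map and fire shells. There are no enemies to hit yet, but that is enough for what I had in mind, which is controlling the tank's movement and shooting by voice. I also enlarged the map a little and lowered the tank's speed so that it doesn't reach the edge of the map after a few moments of driving.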

[Screenshot: changing the tank speed]

Preparing the semantic understanding service

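Now we can start adding voice features. The OLAMI website provides C# samples in two parts, cloud-speech-recognition and natural-language-understanding. Judging by the names, the former does speech recognition and the latter does natural language understanding. The latter is further split into speech-input and text-input, but speech-input is empty; the readme explains that speech input is already covered by cloud-speech-recognition. Since I don't care about the raw recognition result here, I treat the two as a pair: one handles understanding of speech, which is the part we actually need, and the other handles understanding of text, which is handy for testing.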

Drag SpeechApiSample.cs and NluApiSample.cs into Unity; with a few small changes they can be used directly.

Adding voice control interfaces to the movement and shooting scripts

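Because the plan is to mix voice and keyboard input, with keyboard input able to interrupt voice control, the scripts need to keep some state: whether movement or turning is currently voice-controlled, the angle the voice command asked for, and the angle already turned. The code is as follows: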

TankMovement.cs

    // Degrees already turned during the current voice-controlled turn
    private float turnAmount = 0f;
    // Degrees the voice command asked to turn (negative = left)
    private float turnTarget = 0f;
    // Whether movement is currently driven by voice
    private bool voiceMove;
    // Whether turning is currently driven by voice
    private bool voiceTurn;

    private void Update () {
        // Store the value of both input axes.
        // Keyboard input always wins and cancels the matching voice command.
        float movement = Input.GetAxis (m_MovementAxisName);
        if (movement != 0) {
            voiceMove = false;
            m_MovementInputValue = movement;
        } else if (!voiceMove) {
            m_MovementInputValue = 0f;
        }

        float turn = Input.GetAxis (m_TurnAxisName);
        if (turn != 0) {
            // Drop any pending voice turn so its target does not linger
            ResetVoiceTurn ();
            m_TurnInputValue = turn;
        } else if (!voiceTurn) {
            m_TurnInputValue = 0f;
        }
        EngineAudio ();
    }

    private void Turn () {
        // Determine the number of degrees to be turned based on the input, speed and time between frames.
        float turn = m_TurnInputValue * m_TurnSpeed * Time.deltaTime;

        if (voiceTurn) {
            turnAmount += turn;
            // Stop once the accumulated angle reaches the target in either direction
            if ((turnTarget > 0 && turnAmount >= turnTarget) ||
                (turnTarget < 0 && turnAmount <= turnTarget)) {
                ResetVoiceTurn ();
            }
        }

        // Make this into a rotation in the y axis.
        Quaternion turnRotation = Quaternion.Euler (0f, turn, 0f);

        // Apply this rotation to the rigidbody's rotation.
        m_Rigidbody.MoveRotation (m_Rigidbody.rotation * turnRotation);
    }

    public void VoiceMove(float movement) {
        // A non-zero value keeps the simulated movement key held; 0 releases it
        if (movement != 0) {
            voiceMove = true;
            m_MovementInputValue = movement;
        } else {
            voiceMove = false;
            m_MovementInputValue = 0f;
        }
    }

    public void VoiceTurn(float turn) {
        // Every voice turn starts from a clean state
        ResetVoiceTurn ();
        if (turn == 0) {
            return;
        }
        turnTarget = turn;
        voiceTurn = true;
        // Simulate holding the left (-1) or right (+1) turn key
        m_TurnInputValue = turn > 0 ? 1.0f : -1.0f;
    }

    // Clear all voice-turn state and release the simulated turn key
    private void ResetVoiceTurn () {
        m_TurnInputValue = 0f;
        turnTarget = 0f;
        turnAmount = 0f;
        voiceTurn = false;
    }

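Turning is a little different from moving. For movement it is enough to keep the simulated key value at a constant 1 the whole time, but a turn has to stop after a given number of degrees, so Turn needs some extra handling.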
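The stop-after-N-degrees bookkeeping in Turn does not depend on Unity, so it can be checked in isolation. Below is a minimal plain C# sketch of that idea; VoiceTurnTracker, Begin and Step are hypothetical names, and a fixed frame time stands in for Time.deltaTime.

```csharp
using System;

// Plain C# model of the voice-turn bookkeeping; all names here are hypothetical.
public class VoiceTurnTracker
{
    // Simulated turn key: -1 (left), 0 (released) or 1 (right)
    public float InputValue { get; private set; }
    // True while a voice turn is in progress
    public bool Active { get; private set; }

    private float target;  // requested angle in degrees, negative = left
    private float turned;  // angle accumulated so far in degrees

    // Begin a new voice turn; 0 degrees simply cancels any current one.
    public void Begin(float degrees)
    {
        target = degrees;
        turned = 0f;
        Active = degrees != 0f;
        InputValue = Math.Sign(degrees);
    }

    // Advance one frame and return the rotation (degrees) to apply this frame.
    public float Step(float turnSpeed, float deltaTime)
    {
        float delta = InputValue * turnSpeed * deltaTime;
        if (Active)
        {
            turned += delta;
            bool reached = (target > 0f && turned >= target) || (target < 0f && turned <= target);
            if (reached)
            {
                // Release the key; this frame's rotation is still applied, as in Turn()
                Active = false;
                InputValue = 0f;
            }
        }
        return delta;
    }
}
```

Because the frame that crosses the target still applies its full rotation, the final angle can overshoot by up to turnSpeed * deltaTime degrees. Clamping delta to the remaining angle would remove that overshoot if precise turns matter.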

TankShooting is simpler; just add this method:

public void VoiceFire() {
    // Fire immediately at a fixed launch force instead of charging up
    m_CurrentLaunchForce = m_MaxLaunchForce / 2;
    Fire ();
}

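Since voice input itself takes time, I didn't add any cooldown code, and instead of charging, the launch force is simply fixed at half of m_MaxLaunchForce.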

To make the voice controls easy to call after recording audio or entering text, I wrapped them in TankVoiceControl and attached that script to the tank.

TankVoiceControl.cs

using System.Collections;
using System.Collections.Generic;
using UnityEngine;

public class TankVoiceControl : MonoBehaviour {

    TankMovement move;

    TankShooting shooting;

    // Use this for initialization
    void Start () {
        move = GetComponent<TankMovement> ();
        shooting = GetComponent<TankShooting> ();
    }

    // Update is called once per frame
    void Update () {

    }

    public void VoiceMove(float movement) {
        move.VoiceMove (movement);
    }

    public void VoiceTurn(float turn) {
        move.VoiceTurn (turn);
    }

    public void VoiceFire() {
        shooting.VoiceFire ();
    }

    // Handle a semantic result parsed by OLAMI
    public void ProcessSemantic(Semantic sem) {
        if (sem == null || sem.app != "game" || sem.modifier == null || sem.modifier.Length == 0) {
            return;
        }
        switch (sem.modifier [0]) {
        case "move":
            VoiceMove (GetSlotFloat (sem.slots, "movement"));
            break;
        case "stop":
            VoiceMove (0f);
            break;
        case "leftturn":
            // Left turns are expressed as negative angles
            VoiceTurn (-GetSlotFloat (sem.slots, "turn"));
            break;
        case "rightturn":
            VoiceTurn (GetSlotFloat (sem.slots, "turn"));
            break;
        case "fire":
            VoiceFire ();
            break;
        }
    }

    // Return the numeric value of the named slot, or 0 if it is missing or not a number.
    // TryParse with the invariant culture avoids exceptions on malformed values
    // and on locales that use a comma as the decimal separator.
    static float GetSlotFloat(Slot[] slots, string name) {
        if (slots != null) {
            foreach (Slot slot in slots) {
                float value;
                if (slot.name == name && float.TryParse (slot.value,
                        System.Globalization.NumberStyles.Float,
                        System.Globalization.CultureInfo.InvariantCulture, out value)) {
                    return value;
                }
            }
        }
        return 0f;
    }
}

ProcessSemantic handles the semantics returned by the OLAMI API.

Adding the semantics on the OLAMI platform

I actually wrote my semantics before ProcessSemantic, but planning the semantics first and then adding them on OLAMI works just as well.

[Screenshot: adding the semantics on the OLAMI platform]

After adding them, don't forget to publish, and then enable the newly added NLI module on the application management page.

Testing semantic parsing with text

Now we can test whether the semantics work. I added an InputField to the scene; its On End Edit callback calls NluApiSample's GetRecognitionResult method, with a thin wrapper in between.

The On End Edit callback

    public void OnSubmitText(string text) {
        string result = VoiceService.GetInstance().sendText (text);
        VoiceResult voiceResult = JsonUtility.FromJson<VoiceResult> (result);
        if (voiceResult != null && voiceResult.status == "ok" && voiceResult.data != null) {
            Nli[] nlis = voiceResult.data.nli;
            if (nlis != null && nlis.Length != 0) {
                foreach (Nli nli in nlis) {
                    if (nli.type == "game" && nli.semantic != null) {
                        // Only the first semantic of the first "game" result is applied
                        foreach (Semantic sem in nli.semantic) {
                            voiceControl.ProcessSemantic (sem);
                            return;
                        }
                    }
                }
            }
        }
    }

VoiceService's sendText method

public string sendText(string text) {
        return nluApi.GetRecognitionResult ("nli", text);
    }

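Save the scripts and test. Semantic understanding of text is very fast: although the result comes back through an HTTP request, I couldn't notice any delay when testing on my machine, and the tank turned and moved smoothly.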

Adding audio recording

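Unity provides a Microphone class for microphone input, which returns an AudioClip directly. Here recording starts when F1 is pressed and stops when it is released, and recordings are capped at 5 seconds for now. Since the OLAMI API accepts PCM audio in WAV format, I found a WavUtility on GitHub to do the conversion.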
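WavUtility's source is not reproduced here. As a rough sketch of what such a conversion involves, the following writes a canonical 44-byte WAV header followed by little-endian 16-bit PCM samples. It assumes mono input and 16-bit samples; check the OLAMI documentation for the exact format it expects. PcmWav.Encode is a hypothetical helper, not WavUtility's API. In Unity the float samples would come from AudioClip.GetData, which has to be called on the main thread.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: encodes mono float samples as a 16-bit PCM WAV file.
public static class PcmWav
{
    // samples are expected in [-1, 1]; values outside are clamped.
    public static byte[] Encode(float[] samples, int sampleRate)
    {
        const short channels = 1;
        const short bitsPerSample = 16;
        int blockAlign = channels * bitsPerSample / 8;
        int dataSize = samples.Length * blockAlign;

        using (var stream = new MemoryStream(44 + dataSize))
        using (var writer = new BinaryWriter(stream))  // BinaryWriter is little-endian
        {
            // RIFF header
            writer.Write(Encoding.ASCII.GetBytes("RIFF"));
            writer.Write(36 + dataSize);              // bytes remaining after this field
            writer.Write(Encoding.ASCII.GetBytes("WAVE"));

            // "fmt " chunk describing the PCM layout
            writer.Write(Encoding.ASCII.GetBytes("fmt "));
            writer.Write(16);                         // fmt chunk size for PCM
            writer.Write((short)1);                   // audio format 1 = PCM
            writer.Write(channels);
            writer.Write(sampleRate);
            writer.Write(sampleRate * blockAlign);    // byte rate
            writer.Write((short)blockAlign);
            writer.Write(bitsPerSample);

            // "data" chunk with the samples themselves
            writer.Write(Encoding.ASCII.GetBytes("data"));
            writer.Write(dataSize);
            foreach (float s in samples)
            {
                float clamped = Math.Max(-1f, Math.Min(1f, s));
                writer.Write((short)(clamped * short.MaxValue));
            }
            writer.Flush();
            return stream.ToArray();
        }
    }
}
```

With the 16000 Hz rate used in the Microphone.Start call, a full 5-second clip encodes to 44 + 5 × 16000 × 2 = 160044 bytes.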

VoiceController.cs

using System.Collections;
using System.Collections.Generic;
using UnityEngine.UI;
using UnityEngine;
using System;
using System.Threading;

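public class VoiceController : MonoBehaviour {
    AudioClip audioclip;

    bool recording;

    [SerializeField]
    TankVoiceControl voiceControl;

    // Semantics received on the worker thread, waiting to be applied on the main thread
    readonly Queue<Semantic> pendingSemantics = new Queue<Semantic> ();
    readonly object pendingLock = new object ();

    // Update is called once per frame
    void Update () {
        if (Input.GetKeyDown (KeyCode.F1)) {
            recording = true;
        } else if (Input.GetKeyUp (KeyCode.F1)) {
            recording = false;
        }

        // Apply queued semantics here: firing instantiates a shell, and Unity
        // APIs such as Instantiate may only be called from the main thread
        lock (pendingLock) {
            while (pendingSemantics.Count > 0) {
                voiceControl.ProcessSemantic (pendingSemantics.Dequeue ());
            }
        }
    }

    void LateUpdate () {
        if (recording) {
            if (!Microphone.IsRecording (null)) {
                // Start recording: default device, no looping, at most 5 s, 16 kHz
                audioclip = Microphone.Start (null, false, 5, 16000);
            }
        } else {
            if (Microphone.IsRecording (null)) {
                Microphone.End (null);
                if (audioclip != null) {
                    // Some WavUtility methods must run on the main thread, so convert here
                    byte[] audiodata = WavUtility.FromAudioClip (audioclip);
                    audioclip = null;
                    // Send the recording on a new thread to avoid stalling the main thread
                    Thread thread = new Thread (new ParameterizedThreadStart (process));
                    thread.Start ((object) audiodata);
                }
            }
        }
    }

    // Runs on the worker thread: only the network request and JSON parsing happen here
    void process (object obj) {
        byte[] audiodata = (byte[]) obj;
        string result = VoiceService.GetInstance ().sendSpeech (audiodata);
        Debug.Log (result);
        VoiceResult voiceResult = JsonUtility.FromJson<VoiceResult> (result);
        if (voiceResult == null || voiceResult.status != "ok" || voiceResult.data == null) {
            return;
        }
        Nli[] nlis = voiceResult.data.nli;
        if (nlis == null) {
            return;
        }
        foreach (Nli nli in nlis) {
            if (nli.type == "game" && nli.semantic != null) {
                lock (pendingLock) {
                    foreach (Semantic sem in nli.semantic) {
                        pendingSemantics.Enqueue (sem);
                    }
                }
            }
        }
    }
}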

// The classes below are used to parse the JSON response.
[Serializable]
public class VoiceResult {
    public VoiceData data;
    public string status;
}

[Serializable]
public class VoiceData {
    public Nli[] nli;
}

[Serializable]
public class Nli {
    public DescObj desc;
    public Semantic[] semantic;
    public string type;
}

[Serializable]
public class DescObj {
    public string result;
    public int status;
}

[Serializable]
public class Semantic {
    public string app;
    public string input;
    public Slot[] slots;
    public string[] modifier;
    public string customer;
}

[Serializable]
public class Slot {
    public string name;
    public string value;
    public string[] modifier;
}
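JsonUtility maps JSON keys onto the public fields of [Serializable] classes by name, so these classes have to mirror the response field by field. To make the expected shape concrete, here is a sketch outside Unity. It assumes System.Text.Json (.NET 5 or later) in place of JsonUtility, and the sample response is hypothetical: its layout is inferred from the classes above, not copied from OLAMI's documentation.

```csharp
using System;
using System.Text.Json;

// Field-for-field copies of the classes above, without UnityEngine.
public class VoiceResult { public VoiceData data; public string status; }
public class VoiceData { public Nli[] nli; }
public class Nli { public DescObj desc; public Semantic[] semantic; public string type; }
public class DescObj { public string result; public int status; }
public class Semantic { public string app; public string input; public Slot[] slots; public string[] modifier; public string customer; }
public class Slot { public string name; public string value; public string[] modifier; }

public static class VoiceResultDemo
{
    // Hypothetical response whose shape is inferred from the classes, not from OLAMI docs.
    public const string SampleJson = @"{
      ""status"": ""ok"",
      ""data"": { ""nli"": [ {
        ""type"": ""game"",
        ""semantic"": [ {
          ""app"": ""game"",
          ""input"": ""turn left 90 degrees"",
          ""modifier"": [ ""leftturn"" ],
          ""slots"": [ { ""name"": ""turn"", ""value"": ""90"" } ]
        } ]
      } ] }
    }";

    public static VoiceResult Parse(string json)
    {
        // These classes use public fields, which System.Text.Json only binds with IncludeFields.
        var options = new JsonSerializerOptions { IncludeFields = true };
        return JsonSerializer.Deserialize<VoiceResult>(json, options);
    }
}
```

Fields missing from the JSON (desc and customer in this sample) are simply left null, which is why the dispatch code checks for null before walking the arrays.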

Testing

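Now you can start the game and try voice control. On my machine it takes roughly one to two seconds from the end of recording until the tank starts moving. Still, after saying "forward" or "backward" you no longer need to hold a key down, which feels quite nice. You can also say "turn left 1800 degrees" and watch the tank spin around in circles.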

Summary

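Overall, even though the semantic understanding runs online, OLAMI can still be used in game scenarios without strict real-time requirements, such as automatically running forward. Its speed on text understanding was surprisingly good. If speech recognition became faster, for example through an offline package, I expect voice control could be applied much more widely. I'll keep improving this game, so stay tuned.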