FFmpeg: Audio/Video Playback, Synchronization, and Hardware Decoding

Background

Since the release of version 1803, Windows 10 no longer ships a built-in H.265 (HEVC) video decoder. You can install a plug-in to get playback working, but even on an NVidia card capable of hardware-decoding 8K video it then plays back with software decoding. This overturned my assumptions about DXVA (DirectX Video Acceleration) on Windows, so I had little choice but to hardware-decode video through the SDK provided by NVidia, which involves FFmpeg. It turned out to be a great starting point: I learned a lot about audio and video along the way, and this article shares that experience.

Playing Video

Playing a video essentially means displaying a sequence of images one after another at the frame rate. The frame rate determines when to switch to the next image, and the images here are raw ARGB pixel arrays, not compressed formats such as PNG or JPEG.

Suppose a video has a frame rate (fps) of 30, a resolution of 1920x1080, and a duration of 30 seconds. The raw data would be 1920 x 1080 x 30 x 30 x 4 bytes, roughly 7 GB, yet the actual file is nowhere near that large, a few tens of megabytes at most. Playing a video therefore comes down to decompressing the data into ARGB pixel arrays and displaying them one by one at the frame rate.

Video Decoding

Video decoding really consists of two steps:

1. Decode each frame's image according to the video's codec.

2. Convert each frame's image from its color space to RGB.

Decoding Each Frame

There are many video codecs, such as H.264, HEVC, and VP9. On the ffmpeg command line, the -c:v option selects the video codec.


For example, the video codec can be set to VP9 to produce a WebM file.
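
A minimal example command (the file names here are just placeholders):

ffmpeg -i input.mp4 -c:v libvpx-vp9 output.webm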

A complete video-decoding pass with the FFmpeg API looks like this:

// FFmpeg is a C library; from C++ its headers need an extern "C" wrapper
extern "C" {
#include <libavformat/avformat.h>
#include <libavcodec/avcodec.h>
}
#include <stdexcept>

// Open the media file
AVFormatContext *fmtc = NULL;
avformat_open_input(&fmtc, "the video file path", NULL, NULL);
avformat_find_stream_info(fmtc, NULL);
int videoIndex = av_find_best_stream(fmtc, AVMEDIA_TYPE_VIDEO, -1, -1, NULL, 0);

// Create a decoder from the video stream's codec parameters
AVCodecContext* avctx = avcodec_alloc_context3(NULL);
auto st = fmtc->streams[videoIndex];
avcodec_parameters_to_context(avctx, st->codecpar);
AVCodec*  codec = avcodec_find_decoder(avctx->codec_id);
avcodec_open2(avctx, codec, NULL);

// Decode the video
AVPacket packet;
av_init_packet(&packet);
packet.data = NULL;
packet.size = 0;
AVFrame* frame = av_frame_alloc();

while (av_read_frame(fmtc, &packet) >= 0)
{

    if (packet.stream_index == videoIndex)
    {

        avcodec_send_packet(avctx, &packet);

        while(true)
        {
            int ret = avcodec_receive_frame(avctx, frame);
            if(ret == AVERROR(EAGAIN) || ret == AVERROR_EOF)
                break;
            if(ret < 0)
            {
                throw std::runtime_error("CantDecode");
            }
            // a newly decoded frame is now available in frame
        }
        if(packet.data) av_packet_unref(&packet);
        continue;
    }
    av_packet_unref(&packet);
}


// Clean up
avcodec_free_context(&avctx);
av_frame_free(&frame);
avformat_close_input(&fmtc);


From the code above you can see that FFmpeg decoding ultimately produces an AVFrame, which is the decompressed frame image. The AVFrame is obtained from an AVPacket and an AVCodecContext: the AVPacket is a compressed video frame read from the AVFormatContext, and the AVCodecContext is created from the video's codec.

There is also an inner while loop when receiving the AVFrame. You might expect one AVPacket to map to exactly one AVFrame, but that is not always the case. With some codecs, even though an AVPacket nominally corresponds to one AVFrame, the complete image can only be reconstructed with the help of neighboring AVPackets. Video compression uses various techniques to shrink the data, the most common being IPB frames (Intra-coded frames, Predicted pictures, and Bi-directional predictive pictures): an AVPacket may carry an I frame, a P frame, or a B frame. An I frame is self-contained; a P frame records only the differences from the previous frame; a B frame records differences from both the previous and the following frame. That explains the while loop: when a B frame is encountered, the decoder may have to wait for the next AVPacket before it can emit a frame.
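
A related detail: the decoder buffers frames internally, so when av_read_frame reaches the end of the file some frames are usually still inside it. A minimal drain sketch, reusing the avctx and frame from above:

// Flush the decoder after the read loop: a NULL packet tells it no more input is coming
avcodec_send_packet(avctx, NULL);
while (avcodec_receive_frame(avctx, frame) == 0)
{
    // handle the remaining buffered frames here, just like in the loop above
}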

Converting to RGB

An AVFrame cannot be displayed directly, because its pixels are usually not in an RGB color space; most of the time they are YUV, so we need to convert YUV to RGB.
On the ffmpeg command line, the -pix_fmt option selects the pixel format:

ffmpeg -i in.mov -c:v libx264 -pix_fmt yuv420p out.mp4

YUV is a different color space from RGB: Y is the luminance (luma) component, while U and V carry the chrominance (chroma). An image with only Y is a grayscale picture; adding U and V makes it color. Compression exploits this: when four Y samples share one pair of UV values, that is YUV420; two Y samples sharing one UV pair is YUV422; one Y per UV pair is YUV444, and so on. This is where the space saving comes from: a 1920x1080 frame takes about 8.3 MB as RGBA but only about 3.1 MB as YUV420 (a full-resolution Y plane plus quarter-resolution U and V planes).
The P in YUV420P stands for planar data, meaning Y, U, and V are stored separately. That is why AVFrame's data member is uint8_t* data[8]: data[0] is the Y plane, data[1] the U plane, and data[2] the V plane.
The color-conversion code is as follows:

// Color space conversion
// this part runs where the video stream is opened
int w = fmtc->streams[videoIndex]->codecpar->width;
int h = fmtc->streams[videoIndex]->codecpar->height;

SwsContext* swsctx = 0;
uint8_t* pixels = new uint8_t[w * h * 4];

// this part runs where the AVFrame has been obtained
swsctx = sws_getCachedContext(swsctx, frame->width, frame->height, (AVPixelFormat)frame->format, w, h, AV_PIX_FMT_RGB32, SWS_BICUBIC, 0, 0, 0);
AVPicture pict = { { 0 } };
avpicture_fill(&pict, pixels, AV_PIX_FMT_RGB32, w, h); // destination buffer is w x h RGB32
sws_scale(swsctx, frame->data, frame->linesize, 0, frame->height, pict.data, pict.linesize);

Video Playback

Once we have the RGB pixel values, we can display the video according to its frame rate.

// Current time in milliseconds, based on the high-resolution performance counter
double getTime()
{
    __int64 freq = 0;
    __int64 count = 0;
    if (QueryPerformanceFrequency((LARGE_INTEGER*)&freq) && freq > 0 && QueryPerformanceCounter((LARGE_INTEGER*)&count))
    {
        return (double)count / (double)freq * 1000.0;
    }
    return 0.0;
}

double interval = 1000.0/av_q2d(fmtc->streams[videoIndex]->r_frame_rate);
double estimateTime = frameIndex * interval;    // expected presentation time of this frame
double actualTime = (getTime() - startTime);    // time actually elapsed since playback started

In the code above, the expected time is derived from the frame rate, and the actual time is the current time minus the time playback started. If the actual time is less than the expected time, we sleep for a while and show the next frame at its expected time; otherwise we show the next frame as soon as possible.
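
Putting the pieces together, a minimal pacing loop might look like the sketch below; decodeNextFrame is a placeholder for the decode-and-convert steps shown earlier, and presentVideoFrame for drawing the RGB pixels:

// Pacing sketch: show each frame no earlier than its scheduled time
double startTime = getTime();
int frameIndex = 0;
while (decodeNextFrame())                           // hypothetical: decode and convert one frame
{
    double estimateTime = frameIndex * interval;    // when this frame should appear
    double actualTime = getTime() - startTime;      // how long we have actually been playing
    if (actualTime < estimateTime)
        Sleep((DWORD)(estimateTime - actualTime));  // ahead of schedule: wait
    presentVideoFrame();                            // hypothetical: draw the RGB pixels
    frameIndex++;
}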

Hardware Decoding

With the code above, if a video's frame rate is 30 fps, each frame must be shown within about 33 milliseconds. The problem: if the next frame cannot be decoded within those 33 ms, playback lags or drops frames. With hardware decoding the odds of hitting that problem drop dramatically; for massively parallel image processing, a CPU simply cannot match a GPU.
Hardware decoding works by submitting the AVPacket directly to the GPU; the GPU decodes it and hands the application a surface that lives in video memory. DirectX 9 is used here, so on the CPU side the surface is accessed indirectly as an IDirect3DTexture9 and rendered directly as a texture. Note that for H.264/HEVC streams, the AVPacket has had parameter sets such as SPS/PPS stripped out, and they have to be added back before the packet can be submitted to the GPU, as follows.

// Hardware decoding
// (SPS) Sequence Parameter Set, (PPS) Picture Parameter Set
//Convert an H.264 bitstream from length prefixed mode to start code prefixed mode (as defined in the Annex B of the ITU-T H.264 specification).
AVPacket pktFiltered;
AVBSFContext *bsfc = NULL;
av_init_packet(&pktFiltered);
pktFiltered.data = 0;
pktFiltered.size = 0;

const AVBitStreamFilter *bsf = av_bsf_get_by_name(
    avctx->codec_id == AV_CODEC_ID_HEVC ? "hevc_mp4toannexb" : "h264_mp4toannexb");
av_bsf_alloc(bsf, &bsfc);
avcodec_parameters_copy(bsfc->par_in, fmtc->streams[videoIndex]->codecpar);
av_bsf_init(bsfc);

av_bsf_send_packet(bsfc, &packet);
av_bsf_receive_packet(bsfc, &pktFiltered);

For a complete hardware-decoding example, see the sample code in NVidia's SDK.

Semi-Hardware Decoding

There are countless video encodings, but a GPU has strict limits on which codecs and resolutions it can decode. When the hardware does not support a stream we still have to decode on the CPU, but we can at least move the YUV-to-RGB conversion onto the GPU to lighten the CPU's load. FFmpeg does contain YUV-to-RGB source code, but it is not a useful reference here because it is optimized for the CPU with integer arithmetic, whereas GPUs excel at floating-point math. The sample code below converts YUV420P to RGB.

// GPU code
// Color space conversion: YUV420P to RGB
float4x4 colormtx;
texture  tex0; 
texture  tex1; 
texture  tex2; 
sampler sam0 =  sampler_state { Texture = <tex0>;  MipFilter = LINEAR; MinFilter = LINEAR;  MagFilter = LINEAR; };
sampler sam1 =  sampler_state { Texture = <tex1>;  MipFilter = LINEAR; MinFilter = LINEAR;  MagFilter = LINEAR; };
sampler sam2 =  sampler_state { Texture = <tex2>;  MipFilter = LINEAR; MinFilter = LINEAR;  MagFilter = LINEAR; };


// pixel-shader body: uv is the texture coordinate, color is the output;
// the Y/U/V planes are bound as A8 textures, hence the .a swizzle
float4 c = float4(tex2D(sam0, uv).a, tex2D(sam1, uv).a, tex2D(sam2, uv).a, 1); 
color = mul(c, colormtx); 

// CPU code
D3DXMATRIXA16 yuv2rgbMatrix()
{
    /*
            FLOAT r = (1.164 * (Y - 16) + 1.596 * (V - 128));
            FLOAT g = (1.164 * (Y - 16) - 0.813 * (V - 128) - 0.391 * (U - 128));
            FLOAT b = (1.164 * (Y - 16) + 2.018 * (U - 128));

            FLOAT r = 1.164 * Y + 1.596*V - 1.596*128.0/255.0 - 1.164*16.0/255.0;
            FLOAT g = 1.164 * Y - 0.391*U - 0.813*V - 1.164*16.0/255.0+0.813*128.0/255.0+0.391*128.0/255.0;
            FLOAT b = 1.164 * Y + 2.018*U - 1.164*16.0/255.0  - 2.018*128.0/255.0;

        */
    D3DXMATRIXA16 m(
        1.164, 0, 1.596, -1.596*128.0 / 255.0 - 1.164*16.0 / 255.0,
        1.164, -0.391, -0.813, -1.164*16.0 / 255.0 + 0.813*128.0 / 255.0 + 0.391*128.0 / 255.0,
        1.164, 2.018, 0, -1.164*16.0 / 255.0 - 2.018*128.0 / 255.0,
        0, 0, 0, 1
    );
    D3DXMatrixTranspose(&m, &m);
    return m;
}

void update(AVFrame* frame)
{
    int w = ctx_->textureWidth();
    int h = ctx_->textureHeight();
    int w2 = w /2;
    int h2 = h / 2;
    auto device = ctx_->getDevice3D(); //IDirect3DDevice9Ex
    

    if (!texY_) //IDirect3DTexture9* 
    {
        auto effect = render_->effect(); //ID3DXEffect* 
        device->CreateTexture(w, h, 1, D3DUSAGE_DYNAMIC, D3DFMT_A8, D3DPOOL_DEFAULT, &texY_, NULL);
        device->CreateTexture(w2, h2, 1, D3DUSAGE_DYNAMIC, D3DFMT_A8, D3DPOOL_DEFAULT, &texU_, NULL);
        device->CreateTexture(w2, h2, 1, D3DUSAGE_DYNAMIC, D3DFMT_A8, D3DPOOL_DEFAULT, &texV_, NULL);
        effect->SetTexture("tex0", texY_);
        effect->SetTexture("tex1", texU_);
        effect->SetTexture("tex2", texV_);
        D3DXMATRIXA16 m = yuv2rgbMatrix();
        effect->SetMatrix("colormtx", &m);
    }
    upload(frame->data[0], frame->linesize[0], h,  texY_);
    upload(frame->data[1], frame->linesize[1], h2, texU_);
    upload(frame->data[2], frame->linesize[2], h2, texV_);
}

void upload(uint8_t * data, int linesize, int h, IDirect3DTexture9* tex)
{
    D3DLOCKED_RECT locked = { 0 };
    HRESULT hr = tex->LockRect(0, &locked, NULL, D3DLOCK_DISCARD);
    if (SUCCEEDED(hr))
    {
        uint8_t* dst = (uint8_t*)locked.pBits;
        int size = linesize < locked.Pitch ? linesize : locked.Pitch;
        for (INT y = 0; y < h; y++)
        {
            CopyMemory(dst, data, size);
            dst += locked.Pitch;
            data += linesize;
        }
        tex->UnlockRect(0);

    }
}
        


Here is another example, converting YUVA444P12LE to ARGB. FFmpeg describes this format as planar YUV 4:4:4 at 36 bpp with a 12-bit alpha channel: each of the four components (Y, U, V, A) carries 12 significant bits and is stored in its own plane as a two-byte little-endian value, which is why the textures here are created with D3DFMT_L16.

// YUVA444P12LE to ARGB
///< planar YUV 4:4:4, 36bpp, (1 Cr & Cb sample per 1x1 Y samples), 12b alpha, little-endian
//GPU Code
float4x4 colormtx;
texture  tex0; 
texture  tex1; 
texture  tex2; 
texture  tex3; 
sampler sam0 =  sampler_state { Texture = <tex0>;  MipFilter = LINEAR; MinFilter = LINEAR;  MagFilter = LINEAR; };
sampler sam1 =  sampler_state { Texture = <tex1>;  MipFilter = LINEAR; MinFilter = LINEAR;  MagFilter = LINEAR; };
sampler sam2 =  sampler_state { Texture = <tex2>;  MipFilter = LINEAR; MinFilter = LINEAR;  MagFilter = LINEAR; };
sampler sam3 =  sampler_state { Texture = <tex3>;  MipFilter = LINEAR; MinFilter = LINEAR;  MagFilter = LINEAR; };

float4 c = float4(tex2D(sam0, uv).x, tex2D(sam1, uv).x, tex2D(sam2, uv).x, 0.06248569466697185); //0xfff/0xffff
c = c * 16.003663003663004; //0xffff/0xfff
color = mul(c, colormtx); 
color.a = tex2D(sam3, uv).x * 16.003663003663004; 


//CPU Code
int w = ctx_->textureWidth();
int h = ctx_->textureHeight();
auto device = ctx_->getDevice3D(); //IDirect3DDevice9Ex


if (!texY_) //IDirect3DTexture9* 
{
    auto effect = render_->effect(); //ID3DXEffect* 
    check_hr(device->CreateTexture(w, h, 1, D3DUSAGE_DYNAMIC, D3DFMT_L16, D3DPOOL_DEFAULT, &texY_, NULL)); //12b Y
    check_hr(device->CreateTexture(w, h, 1, D3DUSAGE_DYNAMIC, D3DFMT_L16, D3DPOOL_DEFAULT, &texU_, NULL)); //12b U
    check_hr(device->CreateTexture(w, h, 1, D3DUSAGE_DYNAMIC, D3DFMT_L16, D3DPOOL_DEFAULT, &texV_, NULL)); //12b V
    check_hr(device->CreateTexture(w, h, 1, D3DUSAGE_DYNAMIC, D3DFMT_L16, D3DPOOL_DEFAULT, &texA_, NULL)); //12b A
    effect->SetTexture("tex0", texY_);
    effect->SetTexture("tex1", texU_);
    effect->SetTexture("tex2", texV_);
    effect->SetTexture("tex3", texA_);
    D3DXMATRIXA16 m = yuv2rgbMatrix();
    effect->SetMatrix("colormtx", &m);
}
upload(frame->data[0], frame->linesize[0], h, texY_);
upload(frame->data[1], frame->linesize[1], h, texU_);
upload(frame->data[2], frame->linesize[2], h, texV_);
upload(frame->data[3], frame->linesize[3], h, texA_);

Playing Audio

Raw audio data has a few important parameters: the sample rate (samples per second, sps), the number of channels, and the number of bits per sample (bps).
Playing audio simply means continuously feeding audio data to the sound card, which produces sound according to the sps, channel count, and bps. For example, if a chunk of audio data is 4 MB, the sample rate is 44100, there are 2 channels, and each sample is 16 bits, then after you send that data to the sound card it will tell you the sound has finished after (4 x 1024 x 1024 x 8) / (44100 x 2 x 16) ≈ 23.8 seconds.
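
That arithmetic generalizes into a small helper (a sketch, not part of the original code):

// Playback duration of a raw PCM buffer, in seconds
double pcmDurationSeconds(size_t bytes, int sampleRate, int channels, int bitsPerSample)
{
    return (double)bytes * 8.0 / ((double)sampleRate * channels * bitsPerSample);
}
// pcmDurationSeconds(4 * 1024 * 1024, 44100, 2, 16) is roughly 23.8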

Playing Audio with the Wave API

On Windows, audio can be played with the wave API; the playback steps are open, write, and close. You can use GoldWave to export raw audio data and save it as an .snd file; when exporting, pay attention to the channel count, sample rate, and bps settings. Below is code that plays raw audio data with a sample rate of 44100, 2 channels, and 16 bits per sample.

// Play audio using the Wave API
#include <windows.h>
#include <mmsystem.h>
#pragma comment(lib, "winmm.lib")

const BYTE* pcmData =  ....  // the audio data to be played (assumed given)
int pcmSize = ....           // its size in bytes (assumed given)
openAudio();
writeAudio(pcmData, pcmSize);
closeAudio();

////////////////////////////
#define AUDIO_DEV_BLOCK_SIZE 8192
#define AUDIO_DEV_BLOCK_COUNT 4

HWAVEOUT dev = 0;
int available = AUDIO_DEV_BLOCK_COUNT; // number of blocks currently free
WAVEHDR* blocks = 0;
int index = 0;
Mutex mtx; // custom class based on EnterCriticalSection / LeaveCriticalSection
void CALLBACK waveOutProc(HWAVEOUT, UINT, DWORD_PTR, DWORD_PTR, DWORD_PTR); // completion callback, sketched after this section

void openAudio()
{

    WAVEFORMATEX wfx = {0};
    wfx.nSamplesPerSec = 44100;
    wfx.wBitsPerSample = 16;
    wfx.nChannels = 2;
    wfx.cbSize = 0;
    wfx.wFormatTag = WAVE_FORMAT_PCM;
    wfx.nBlockAlign = (wfx.wBitsPerSample * wfx.nChannels) >> 3;
    wfx.nAvgBytesPerSec = wfx.nBlockAlign * wfx.nSamplesPerSec;
    waveOutOpen(&dev, WAVE_MAPPER, &wfx, (DWORD_PTR)waveOutProc, (DWORD_PTR)0, CALLBACK_FUNCTION);


    blocks = new WAVEHDR[AUDIO_DEV_BLOCK_COUNT];
    memset(blocks, 0, sizeof(WAVEHDR) * AUDIO_DEV_BLOCK_COUNT);
    for (int i = 0; i < AUDIO_DEV_BLOCK_COUNT; i++)
    {
        blocks[i].lpData = new char[AUDIO_DEV_BLOCK_SIZE];
        blocks[i].dwBufferLength = AUDIO_DEV_BLOCK_SIZE;
    }

}

void closeAudio()
{
    for (int i = 0; i < AUDIO_DEV_BLOCK_COUNT; i++)
    {
        if (blocks[i].dwFlags & WHDR_PREPARED)
        {
            waveOutUnprepareHeader(dev, &blocks[i], sizeof(WAVEHDR));
        }
        delete[] blocks[i].lpData;
    }
    delete[] blocks; 
    waveOutClose(dev);
}

void writeAudio(const BYTE* data, int size)
{
    if (!dev) return; // device not opened
    WAVEHDR* current;
    int remain;
    current = &blocks[index];
    while (size > 0) 
    {
        if (current->dwFlags & WHDR_PREPARED)
        {
            waveOutUnprepareHeader(dev, current, sizeof(WAVEHDR));
        }
        if (size < (int)(AUDIO_DEV_BLOCK_SIZE - current->dwUser))
        {
            memcpy(current->lpData + current->dwUser, data, size);
            current->dwUser += size;
            break;
        }
        remain = AUDIO_DEV_BLOCK_SIZE - current->dwUser;
        memcpy(current->lpData + current->dwUser, data, remain);
        size -= remain;
        data += remain;
        current->dwBufferLength = AUDIO_DEV_BLOCK_SIZE;
        waveOutPrepareHeader(dev, current, sizeof(WAVEHDR));
        waveOutWrite(dev, current, sizeof(WAVEHDR));


        mtx.lock();
        available--;
        mtx.unlock();

        while (!available)
        {
            Sleep(10);
        }
        index++;
        index %= AUDIO_DEV_BLOCK_COUNT;
        current = &blocks[index];
        current->dwUser = 0;
    }
}



In the code above, the callback waveOutProc is registered when the device is opened; when it is called, it means one 8192-byte block of audio data has finished playing. writeAudio keeps looping and filling the four 8192-byte blocks: they are written ahead of time (via waveOutWrite), and whenever waveOutProc fires another block becomes available to write into, so the sound plays continuously.
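
The callback itself was not shown above; a minimal sketch of what it needs to do, assuming the available counter and mtx from openAudio/writeAudio, is:

// Called by the system when one block has finished playing: mark a block as free again
void CALLBACK waveOutProc(HWAVEOUT hWaveOut, UINT uMsg, DWORD_PTR dwInstance, DWORD_PTR dwParam1, DWORD_PTR dwParam2)
{
    if (uMsg != WOM_DONE)
        return;
    mtx.lock();
    available++;
    mtx.unlock();
}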

Decoding Audio with FFmpeg

Likewise, the audio in a video file is compressed, frame by frame. The complete code to decode it is as follows:


// Open the media file
AVFormatContext *fmtc = NULL;
avformat_network_init();
avformat_open_input(&fmtc, "video file path", NULL, NULL);
avformat_find_stream_info(fmtc, NULL);
int audioIndex = av_find_best_stream(fmtc, AVMEDIA_TYPE_AUDIO, -1, -1, NULL, 0);

// Create the audio decoder
AVCodecContext* avctx = avcodec_alloc_context3(NULL);
auto st = fmtc->streams[audioIndex];
avcodec_parameters_to_context(avctx, st->codecpar);
AVCodec*  codec = avcodec_find_decoder(avctx->codec_id);
avcodec_open2(avctx, codec, NULL);



// Decode the audio
AVFrame* frame = av_frame_alloc();
AVPacket pkt;
av_init_packet(&pkt);
pkt.data = NULL;
pkt.size = 0;

while (av_read_frame(fmtc, &pkt) >= 0)
{

    if (pkt.stream_index == audioIndex)
    {
        int gotFrame = 0;
        if (avcodec_decode_audio4(avctx, frame, &gotFrame, &pkt) < 0) {
            //fprintf(stderr, "Error decoding audio frame (%s)\n", av_err2str(ret));
            break;
        }
        if (gotFrame) {
            // frame data size; assumes a packed format already matching the device
            // (see the conversion section below for the general case)
            int dataSize = av_samples_get_buffer_size(NULL, avctx->channels, frame->nb_samples, avctx->sample_fmt, 1);
            writeAudio(frame->extended_data[0], dataSize);
        }
    }
    av_packet_unref(&pkt); // release the packet each iteration
}

// Clean up
avcodec_free_context(&avctx);
av_frame_free(&frame);
if (pkt.data) av_packet_unref(&pkt);
avformat_close_input(&fmtc);

As you can see, this is much the same as decoding video; the final audio data ends up in the AVFrame.

Audio Conversion

Although the example above decodes the audio and plays it, it only works if the audio in the file has an sps of 44100, a bps of 16, and 2 channels; otherwise the playback sounds wrong.
The sps, bps, and channel count in a video file are not fixed, so we have to convert to a format we can play. The conversion code is as follows:

// Audio conversion
// Create the audio resampler
AVSampleFormat devSampleFormat = AV_SAMPLE_FMT_S16; // the playback device here uses 16 bits per sample; use AV_SAMPLE_FMT_U8 for an 8-bit device
SwrContext * swrc = swr_alloc();
av_opt_set_int(swrc, "in_channel_layout", av_get_default_channel_layout(avctx->channels), 0);
av_opt_set_int(swrc, "in_sample_rate", avctx->sample_rate, 0);
av_opt_set_sample_fmt(swrc, "in_sample_fmt", avctx->sample_fmt, 0);

av_opt_set_int(swrc, "out_channel_layout", av_get_default_channel_layout(2), 0);
av_opt_set_int(swrc, "out_sample_rate", 44100, 0);
av_opt_set_sample_fmt(swrc, "out_sample_fmt", devSampleFormat, 0);
swr_init(swrc);


struct SwrBuffer
{
    int samplesPerSec;
    int numSamples, maxNumSamples;
    uint8_t **data;
    int channels;
    int linesize;
    
};
SwrBuffer dst = {0};
dst.samplesPerSec = dev.samplesPerSec();
dst.channels = dev.channels();
// numSamples: samples per source frame (e.g. frame->nb_samples of the first decoded frame)
dst.numSamples = dst.maxNumSamples = av_rescale_rnd(numSamples, dst.samplesPerSec, avctx->sample_rate, AV_ROUND_UP);
av_samples_alloc_array_and_samples(&dst.data, &dst.linesize, dst.channels, dst.numSamples, devSampleFormat, 0);

// Convert one decoded frame
dst.numSamples = av_rescale_rnd(swr_get_delay(swrc, avctx->sample_rate) + frame->nb_samples, dst.samplesPerSec, avctx->sample_rate, AV_ROUND_UP);

if (dst.numSamples > dst.maxNumSamples) {
    av_freep(&dst.data[0]);
    av_samples_alloc(dst.data, &dst.linesize, dst.channels, dst.numSamples, devSampleFormat, 1);
    dst.maxNumSamples = dst.numSamples;
}
/* convert to destination format */
int ret = swr_convert(swrc, dst.data, dst.numSamples, (const uint8_t**)frame->data, frame->nb_samples);
if (ret < 0) {
    //error
}
int bufsize = av_samples_get_buffer_size(&dst.linesize, dst.channels, ret, devSampleFormat, 1);
if (bufsize < 0) {
    //fprintf(stderr, "Could not get sample buffer size\n");
}
writeAudio(dst.data[0], bufsize);
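
One more detail: swr_convert may keep samples buffered internally, so at end of stream the resampler should be drained before it is freed. A minimal sketch:

// Drain the resampler: a NULL input asks it to output whatever it has buffered
int flushed = swr_convert(swrc, dst.data, dst.maxNumSamples, NULL, 0);
if (flushed > 0)
{
    int flushedSize = av_samples_get_buffer_size(&dst.linesize, dst.channels, flushed, devSampleFormat, 1);
    writeAudio(dst.data[0], flushedSize);
}
swr_free(&swrc);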


Querying the Sound Device's Supported Formats

The examples above always play sound at 44100 / 16 / 2; these are also the parameters I used to open the sound device. If the sound card does not support them, waveOutOpen fails. The following code shows how to find out which formats the device supports:

int bps_ = 0, channels_ = 0, sps_ = 0; // chosen device parameters

WAVEOUTCAPS caps = {0};
if (waveOutGetDevCaps(0, &caps, sizeof(caps)) == MMSYSERR_NOERROR)
{
    //checkCaps(caps.dwFormats, WAVE_FORMAT_96S16, 96000, 2, 16);
    //checkCaps(caps.dwFormats, WAVE_FORMAT_96S08, 96000, 2, 8);
}
void checkCaps(DWORD devfmt, DWORD fmt, int sps, int channels, int bps)
{
    if (bps_)return;
    if (devfmt & fmt)
    {
        bps_ = bps;
        channels_ = channels;
        sps_ = sps;
    }

}

Audio/Video Synchronization

Video and audio need to stay in sync during playback; they cannot each run at their own pace, or the lips will not match the voice. Here we synchronize by slaving the video to the audio.
From the sample rate and related parameters we can tell exactly how long the audio has been playing, and the video can be synchronized against that time. Pseudocode:

int audioFrameIndex = 0;
int videoFrameIndex = 0;

//Thread 1

while (true)
{
    decodeAudioData();
    writeAudio(...);
}
void CALLBACK waveOutProc(HWAVEOUT hWaveOut, UINT uMsg, DWORD_PTR dwInstance, DWORD_PTR dwParam1, DWORD_PTR dwParam2)
{
    if (uMsg != WOM_DONE)
        return;
    ...
    audioFrameIndex ++;
}
//Thread 2
double audioBitsPerSec = audioDev->bitsPerSample() * audioDev->samplesPerSec() * audioDev->channels();
double interval = 1000.0/av_q2d(fmtc->streams[videoIndex]->r_frame_rate);
while (true)
{
    
    if (!decodeVideoFrame())
        continue;
    videoFrameIndex ++;
    while (true)
    {
        double bits = audioFrameIndex * AUDIO_DEV_BLOCK_SIZE * 8.0;
        double ms = bits / audioBitsPerSec * 1000.0; // time the audio has actually been playing
        double to = videoFrameIndex * interval;      // time this video frame is due
        if (ms < to) // video is ahead of the audio: wait
        {
            Sleep(1);
            continue;
        }
        presentVideoFrame();
        break;
    }
}


The locking needed in this multithreaded code is glossed over here; tread carefully.

Afterword

These are some observations from my own development work; where they fall short, I hope readers will point out the mistakes.
