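WebRTC audio data processing aims to process and transmit audio with low latency and good interactivity, with smooth, jitter-free sound, and at a low bitrate that consumes little bandwidth. For transport, WebRTC uses the UDP-based RTP/RTCP protocols, and RTP/RTCP by itself provides neither reliable delivery nor quality guarantees. A packet-switched network such as the public Internet is inherently subject to packet loss, duplication, reordering, and delay. These goals are hard to achieve all at once, so WebRTC's audio network resilience implementation balances them differently depending on the situation.
This article takes a closer look at the WebRTC audio data processing pipeline, paying particular attention to the logic related to audio network resilience.
WebRTC's audio data receiving, decoding, and playout control pipeline
The earlier article, WebRTC's audio data encoding and sending control pipeline, analyzed the logic for encoding and sending audio data in WebRTC. Here we look at how WebRTC receives, decodes, and plays audio data.
At the level of conceptual abstraction, the complete flow of WebRTC's audio receive processing is roughly as follows: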
-----------------------------     --------------------------     ---------------------------
|                           |     |                        |     |                         |
| webrtc::AudioDeviceModule | <== | webrtc::AudioTransport | <== | webrtc::AudioProcessing |
|                           |     |                        |     |                         |
-----------------------------     --------------------------     ---------------------------
                                             / \
                                              ||
                            +=+===============================+=+
                             |                                 |
                        --------------------------------------------
                        |                                          |
                        |            webrtc::AudioMixer            |
                        |                                          |
                        --------------------------------------------
                                                         / \
                                                         | |
-------------------------     ---------------------------------------------------------
|                       |     |                                                       |
| cricket::MediaChannel | ==> | webrtc::AudioMixer::Source/webrtc::AudioReceiveStream |
|                       |     |                                                       |
-------------------------     ---------------------------------------------------------
                                                          ||
                                                         \ /
-------------------------------------------     ---------------------
|                                         |     |                   |
| cricket::MediaChannel::NetworkInterface | <== | webrtc::Transport |
|                                         |     |                   |
-------------------------------------------     ---------------------
In WebRTC's audio receive path, webrtc::AudioDeviceModule is responsible for sending PCM audio data to the device for playout through system interfaces. webrtc::AudioDeviceModule usually starts a dedicated playout thread internally, and that thread drives the entire decode-and-playout process. webrtc::AudioTransport is an adapter and glue module that ties audio playout together with mixing and with the audio processing done by webrtc::AudioProcessing: it synchronously fetches and mixes the remote audio streams through webrtc::AudioMixer, and the mixed audio, besides being returned to webrtc::AudioDeviceModule for playout, is also fed into webrtc::AudioProcessing as the reference signal for echo cancellation. webrtc::AudioMixer::Source / webrtc::AudioReceiveStream supplies decoded data to the playout process. RTCP feedback is sent out from webrtc::AudioMixer::Source / webrtc::AudioReceiveStream through webrtc::Transport. webrtc::Transport is also an adapter and glue module; it actually sends packets to the network through cricket::MediaChannel::NetworkInterface. cricket::MediaChannel receives audio packets from the network and feeds them into webrtc::AudioMixer::Source / webrtc::AudioReceiveStream.
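The pull model described above, in which the playout thread asks the mixer for data and the mixer in turn asks every source, can be sketched as follows. This is only a minimal illustration and not WebRTC code: `Source`, `ConstantSource`, and `Mixer` are hypothetical stand-ins for webrtc::AudioMixer::Source and webrtc::AudioMixer, and the mixing is assumed to be a plain saturating sum.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a decoded-audio source (one per remote stream).
class Source {
 public:
  virtual ~Source() = default;
  // Fills `out` with exactly `samples` PCM samples for one pull.
  virtual void GetAudioFrame(std::size_t samples,
                             std::vector<int16_t>* out) = 0;
};

// Trivial source that always returns the same sample value.
class ConstantSource : public Source {
 public:
  explicit ConstantSource(int16_t value) : value_(value) {}
  void GetAudioFrame(std::size_t samples,
                     std::vector<int16_t>* out) override {
    out->assign(samples, value_);
  }

 private:
  int16_t value_;
};

// Hypothetical mixer: pulls one frame from every source and sums the frames
// with saturation to the 16-bit range.
class Mixer {
 public:
  void AddSource(Source* source) {
    assert(source != nullptr);
    sources_.push_back(source);
  }

  // Called from the playout thread each time the device needs more data.
  std::vector<int16_t> Mix(std::size_t samples) {
    std::vector<int32_t> sum(samples, 0);
    std::vector<int16_t> frame;
    for (Source* source : sources_) {
      source->GetAudioFrame(samples, &frame);
      for (std::size_t i = 0; i < samples; ++i) sum[i] += frame[i];
    }
    std::vector<int16_t> mixed(samples);
    for (std::size_t i = 0; i < samples; ++i) {
      const int32_t clamped =
          std::max<int32_t>(-32768, std::min<int32_t>(32767, sum[i]));
      mixed[i] = static_cast<int16_t>(clamped);
    }
    return mixed;
  }

 private:
  std::vector<Source*> sources_;
};
```

In this sketch nothing happens until `Mix()` is called, which mirrors the point that the playout thread drives the whole receive-side pipeline.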
If the adapter and glue modules are left out, the audio receive pipeline can be simplified to something like the following:
-----------------------------     ---------------------------
|                           |     |                         |
| webrtc::AudioDeviceModule | <== | webrtc::AudioProcessing |
|                           |     |                         |
-----------------------------     ---------------------------
                                             / \
                                              ||
                        --------------------------------------------
                        |                                          |
                        |            webrtc::AudioMixer            |
                        |                                          |
                        --------------------------------------------
                                                         / \
                                                         | |
-------------------------     ---------------------------------------------------------
|                       |     |                                                       |
| cricket::MediaChannel | ==> | webrtc::AudioMixer::Source/webrtc::AudioReceiveStream |
|                       |     |                                                       |
-------------------------     ---------------------------------------------------------
                                                          ||
                                                         \ /
------------------------------------------------------------------------
|                                                                      |
|               cricket::MediaChannel::NetworkInterface                |
|                                                                      |
------------------------------------------------------------------------
webrtc::AudioMixer::Source / webrtc::AudioReceiveStream is the center of the whole process. Its implementation is in webrtc/audio/audio_receive_stream.h / webrtc/audio/audio_receive_stream.cc, and the related class hierarchy is shown in the figure below:

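In RTC, to achieve interactivity and low latency, audio receive processing cannot stop at reordering packets and decoding them. It also has to take network resilience fully into account, for example PLC and sending RTCP feedback, which makes it a fairly complex process. WebRTC's design makes heavy use of the idea of separating control flow from data flow, and this shows in the design and implementation of webrtc::AudioReceiveStream as well. The design and implementation of webrtc::AudioReceiveStream can therefore be examined from two angles: configuration and control, and data flow.
The main configuration and control operations that can be applied to webrtc::AudioReceiveStream are:
- NACK, the maximum jitter buffer size, the payload type to codec mapping, and so on;
- Configuring the webrtc::Transport used to send RTCP packets to the network, decryption parameters, and so on;
- Lifecycle control of webrtc::AudioReceiveStream, such as start and stop.
There are three data flows. First, packets received from the network are fed into webrtc::AudioReceiveStream. Second, at playout time, webrtc::AudioDeviceModule obtains decoded data from webrtc::AudioReceiveStream and sends it to the playout device. Third, webrtc::AudioReceiveStream sends RTCP feedback packets to the sender to assist congestion control, which in turn influences the encoding and sending process.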
Within the webrtc::AudioReceiveStream implementation, the main data processing flow (receiving, decoding, and playing audio data) and the related modules are shown in the figure below:

The arrows in this figure indicate the direction of data flow, and data passes through the modules from left to right. The red box at the bottom of the figure contains the logic closely related to network resilience.
In the data processing flow implemented by webrtc::AudioReceiveStream, the inputs are audio network packets and RTCP packets from the remote peer, both coming from cricket::MediaChannel. The outputs are decoded PCM data, which goes to webrtc::AudioTransport, and constructed RTCP feedback packets such as TransportCC and RTCP NACK packets, which go to webrtc::Transport to be sent out.
Inside webrtc::AudioReceiveStream, audio network packets end up in NetEQ's buffer, webrtc::PacketBuffer. At playout time NetEQ performs decoding, PLC, and so on, and the decoded data is provided to webrtc::AudioDeviceModule.
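The division of labor just described (packets are stored on arrival, and the decision between decoding and concealment is made at playout time) can be illustrated with a small sketch. It is not NetEQ's implementation: `Packet`, `Action`, and `SimplePacketBuffer` are hypothetical names, RTP timestamp wraparound is ignored, and dropping the oldest packet on overflow is simply an assumption made here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

struct Packet {
  uint32_t timestamp;            // RTP timestamp of the first sample.
  std::vector<uint8_t> payload;  // Encoded audio.
};

// What the playout side should do for the next frame.
enum class Action { kDecode, kConceal };

// Hypothetical packet buffer that keeps packets ordered by timestamp.
class SimplePacketBuffer {
 public:
  explicit SimplePacketBuffer(std::size_t max_packets)
      : max_packets_(max_packets) {
    assert(max_packets_ > 0);
  }

  // Stores a packet in timestamp order, so reordered arrivals are sorted.
  // Returns false if a packet with the same timestamp is already stored.
  bool Insert(Packet packet) {
    const bool inserted =
        packets_.emplace(packet.timestamp, std::move(packet.payload)).second;
    // Assumed overflow policy for this sketch: drop the oldest packet.
    if (packets_.size() > max_packets_) packets_.erase(packets_.begin());
    return inserted;
  }

  // Called once per frame at playout time. If the expected packet is present
  // it is handed out for decoding; otherwise the caller should run PLC.
  Action GetNext(uint32_t expected_timestamp, std::vector<uint8_t>* payload) {
    auto it = packets_.find(expected_timestamp);
    if (it == packets_.end()) return Action::kConceal;
    *payload = std::move(it->second);
    packets_.erase(it);
    return Action::kDecode;
  }

  std::size_t size() const { return packets_.size(); }

 private:
  const std::size_t max_packets_;
  std::map<uint32_t, std::vector<uint8_t>> packets_;
};
```

With 20 ms frames at 48 kHz, consecutive timestamps differ by 960, so a caller would ask for timestamps 0, 960, 1920, and so on, and conceal whichever frame is missing.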
How the WebRTC audio receive pipeline is set up
First, let us look at how the data processing pipeline implemented by webrtc::AudioReceiveStream is set up.
The pipeline is built in several steps. We follow this process with reference to the webrtc::AudioReceiveStream data processing flow figure above.
When the webrtc::AudioReceiveStream object is created, that is, when the webrtc::voe::(anonymous namespace)::ChannelReceive object is created, a number of key objects are created and some of the connections between them are established. The call stack is as follows:
#0 webrtc::voe::(anonymous namespace)::ChannelReceive::ChannelReceive(webrtc::Clock*, webrtc::NetEqFactory*, webrtc::AudioDeviceModule*, webrtc::Transport*, webrtc::RtcEventLog*, unsigned int, unsigned int, unsigned long, bool, int, bool, bool, rtc::scoped_refptr<webrtc::AudioDecoderFactory>, absl::optional<webrtc::AudioCodecPairId>, rtc::scoped_refptr<webrtc::FrameDecryptorInterface>, webrtc::CryptoOptions const&, rtc::scoped_refptr<webrtc::FrameTransformerInterface>)
(this=0x61b000008c80, clock=0x602000003bb0, neteq_factory=0x0, audio_device_module=0x614000010040, rtcp_send_transport=0x619000017cb8, rtc_event_log=0x613000011f40, local_ssrc=4195875351, remote_ssrc=1443723799, jitter_buffer_max_packets=200, jitter_buffer_fast_playout=false, jitter_buffer_min_delay_ms=0, jitter_buffer_enable_rtx_handling=false, enable_non_sender_rtt=false, decoder_factory=..., codec_pair_id=..., frame_decryptor=..., crypto_options=..., frame_transformer=...) at webrtc/audio/channel_receive.cc:517
#2 webrtc::voe::CreateChannelReceive(webrtc::Clock*, webrtc::NetEqFactory*, webrtc::AudioDeviceModule*, webrtc::Transport*, webrtc::RtcEventLog*, unsigned int, unsigned int, unsigned long, bool, int, bool, bool, rtc::scoped_refptr<webrtc::AudioDecoderFactory>, absl::optional<webrtc::AudioCodecPairId>, rtc::scoped_refptr<webrtc::FrameDecryptorInterface>, webrtc::CryptoOptions const&, rtc::scoped_refptr<webrtc::FrameTransformerInterface>)
(clock=0x602000003bb0, neteq_factory=0x0, audio_device_module=0x614000010040, rtcp_send_transport=0x619000017cb8, rtc_event_log=0x613000011f40, local_ssrc=4195875351, remote_ssrc=1443723799, jitter_buffer_max_packets=200, jitter_buffer_fast_playout=false, jitter_buffer_min_delay_ms=0, jitter_buffer_enable_rtx_handling=false, enable_non_sender_rtt=false, decoder_factory=..., codec_pair_id=..., frame_decryptor=..., crypto_options=..., frame_transformer=...) at webrtc/audio/channel_receive.cc:1137
#3 webrtc::internal::(anonymous namespace)::CreateChannelReceive(webrtc::Clock*, webrtc::AudioState*, webrtc::NetEqFactory*, webrtc::AudioReceiveStream::Config const&, webrtc::RtcEventLog*) (clock=0x602000003bb0, audio_state=0x628000004100, neteq_factory=0x0, config=..., event_log=0x613000011f40) at webrtc/audio/audio_receive_stream.cc:79
#4 webrtc::internal::AudioReceiveStream::AudioReceiveStream(webrtc::Clock*, webrtc::PacketRouter*, webrtc::NetEqFactory*, webrtc::AudioReceiveStream::Config const&, rtc::scoped_refptr<webrtc::AudioState> const&, webrtc::RtcEventLog*) (this=0x61600005be80, clock=0x602000003bb0, packet_router=0x61c000060908, neteq_factory=0x0, config=..., audio_state=..., event_log=0x613000011f40)
at webrtc/audio/audio_receive_stream.cc:103
#5 webrtc::internal::Call::CreateAudioReceiveStream(webrtc::AudioReceiveStream::Config const&) (this=0x620000001080, config=...) at webrtc/call/call.cc:954
#6 cricket::WebRtcVoiceMediaChannel::WebRtcAudioReceiveStream::WebRtcAudioReceiveStream(webrtc::AudioReceiveStream::Config, webrtc::Call*) (this=0x60b000010fd0, config=..., call=0x620000001080) at webrtc/media/engine/webrtc_voice_engine.cc:1220
#7 cricket::WebRtcVoiceMediaChannel::AddRecvStream(cricket::StreamParams const&) (this=0x619000017c80, sp=...)
at webrtc/media/engine/webrtc_voice_engine.cc:2025
#8 cricket::BaseChannel::AddRecvStream_w(cricket::StreamParams const&) (this=0x619000018180, sp=...)
ebrtc/pc/channel.cc:567
#9 cricket::BaseChannel::UpdateRemoteStreams_w(std::vector<cricket::StreamParams, std::allocator<cricket::StreamParams> > const&, webrtc::SdpType, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*)
(this=0x619000018180, streams=std::vector of length 1, capacity 1 = {...}, type=webrtc::SdpType::kOffer, error_desc=0x7ffff2387e00)
at webrtc/pc/channel.cc:725
#10 cricket::VoiceChannel::SetRemoteContent_w(cricket::MediaContentDescription const*, webrtc::SdpType, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*) (this=0x619000018180, content=0x6130000003c0, type=webrtc::SdpType::kOffer, error_desc=0x7ffff2387e00)
at webrtc/pc/channel.cc:926
#11 cricket::BaseChannel::SetRemoteContent(cricket::MediaContentDescription const*, webrtc::SdpType, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*) (this=0x619000018180, content=0x6130000003c0, type=webrtc::SdpType::kOffer, error_desc=0x7ffff2387e00)
at webrtc/pc/channel.cc:292
webrtc::AudioReceiveStream is created through webrtc::Call, which is passed a webrtc::AudioReceiveStream::Config containing the configuration for NACK, the maximum jitter buffer size, the payload type to codec mapping, the webrtc::Transport, and so on.
The constructor of webrtc::voe::(anonymous namespace)::ChannelReceive is as follows:
ChannelReceive::ChannelReceive(
    Clock* clock,
    NetEqFactory* neteq_factory,
    AudioDeviceModule* audio_device_module,
    Transport* rtcp_send_transport,
    RtcEventLog* rtc_event_log,
    uint32_t local_ssrc,
    uint32_t remote_ssrc,
    size_t jitter_buffer_max_packets,
    bool jitter_buffer_fast_playout,
    int jitter_buffer_min_delay_ms,
    bool jitter_buffer_enable_rtx_handling,
    bool enable_non_sender_rtt,
    rtc::scoped_refptr<AudioDecoderFactory> decoder_factory,
    absl::optional<AudioCodecPairId> codec_pair_id,
    rtc::scoped_refptr<FrameDecryptorInterface> frame_decryptor,
    const webrtc::CryptoOptions& crypto_options,
    rtc::scoped_refptr<FrameTransformerInterface> frame_transformer)
    : worker_thread_(TaskQueueBase::Current()),
      event_log_(rtc_event_log),
      rtp_receive_statistics_(ReceiveStatistics::Create(clock)),
      remote_ssrc_(remote_ssrc),
      acm_receiver_(AcmConfig(neteq_factory,
                              decoder_factory,
                              codec_pair_id,
                              jitter_buffer_max_packets,
                              jitter_buffer_fast_playout)),
      _outputAudioLevel(),
      clock_(clock),
      ntp_estimator_(clock),
      playout_timestamp_rtp_(0),
      playout_delay_ms_(0),
      rtp_ts_wraparound_handler_(new rtc::TimestampWrapAroundHandler()),
      capture_start_rtp_time_stamp_(-1),
      capture_start_ntp_time_ms_(-1),
      _audioDeviceModulePtr(audio_device_module),
      _outputGain(1.0f),
      associated_send_channel_(nullptr),
      frame_decryptor_(frame_decryptor),
      crypto_options_(crypto_options),
      absolute_capture_time_interpolator_(clock) {
  RTC_DCHECK(audio_device_module);
  network_thread_checker_.Detach();
  acm_receiver_.ResetInitialDelay();
  acm_receiver_.SetMinimumDelay(0);
  acm_receiver_.SetMaximumDelay(0);
  acm_receiver_.FlushBuffers();
  _outputAudioLevel.ResetLevelFullRange();
  rtp_receive_statistics_->EnableRetransmitDetection(remote_ssrc_, true);
  RtpRtcpInterface::Configuration configuration;
  configuration.clock = clock;
  configuration.audio = true;
  configuration.receiver_only = true;
  configuration.outgoing_transport = rtcp_send_transport;
  configuration.receive_statistics = rtp_receive_statistics_.get();
  configuration.event_log = event_log_;
  configuration.local_media_ssrc = local_ssrc;
  configuration.rtcp_packet_type_counter_observer = this;
  configuration.non_sender_rtt_measurement = enable_non_sender_rtt;
  if (frame_transformer)
    InitFrameTransformerDelegate(std::move(frame_transformer));
  rtp_rtcp_ = ModuleRtpRtcpImpl2::Create(configuration);
  rtp_rtcp_->SetSendingMediaStatus(false);
  rtp_rtcp_->SetRemoteSSRC(remote_ssrc_);
  // Ensure that RTCP is enabled for the created channel.
  rtp_rtcp_->SetRTCPStatus(RtcpMode::kCompound);
}
The constructor of webrtc::voe::(anonymous namespace)::ChannelReceive does the following:
- It creates a webrtc::acm2::AcmReceiver object, establishing the two connections labeled 1 and 2 in the figure below;
- It creates a webrtc::ModuleRtpRtcpImpl2 object. The outgoing_transport field of the configuration argument passed at creation points to the webrtc::Transport that was passed in, establishing the two connections labeled 3 and 4 in the figure below.

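Modules shown in green in the figure are those already connected to webrtc::voe::(anonymous namespace)::ChannelReceive at this stage, and modules shown in yellow are those not yet connected. Solid arrows indicate connections already established at this stage, and dashed arrows indicate connections not yet established.
webrtc::PacketRouter is connected in the RegisterReceiverCongestionControlObjects() function of ChannelReceive: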
#0 webrtc::voe::(anonymous namespace)::ChannelReceive::RegisterReceiverCongestionControlObjects(webrtc::PacketRouter*)
(this=0x61b000008c80, packet_router=0x61c000060908) at webrtc/audio/channel_receive.cc:786
#1 webrtc::internal::AudioReceiveStream::AudioReceiveStream(webrtc::Clock*, webrtc::PacketRouter*, webrtc::AudioReceiveStream::Config const&, rtc::scoped_refptr<webrtc::AudioState> const&, webrtc::RtcEventLog*, std::unique_ptr<webrtc::voe::ChannelReceiveInterface, std::default_delete<webrtc::voe::ChannelReceiveInterface> >)
(this=0x61600005be80, clock=0x602000003bb0, packet_router=0x61c000060908, config=..., audio_state=..., event_log=0x613000011f40, channel_receive=std::unique_ptr<webrtc::voe::ChannelReceiveInterface> = {...}) at webrtc/audio/audio_receive_stream.cc:130
#2 webrtc::internal::AudioReceiveStream::AudioReceiveStream(webrtc::Clock*, webrtc::PacketRouter*, webrtc::NetEqFactory*, webrtc::AudioReceiveStream::Config const&, rtc::scoped_refptr<webrtc::AudioState> const&, webrtc::RtcEventLog*)
(this=0x61600005be80, clock=0x602000003bb0, packet_router=0x61c000060908, neteq_factory=0x0, config=..., audio_state=..., event_log=0x613000011f40)
at webrtc/audio/audio_receive_stream.cc:98
#3 webrtc::internal::Call::CreateAudioReceiveStream(webrtc::AudioReceiveStream::Config const&) (this=0x620000001080, config=...)
at webrtc/call/call.cc:954
This operation also takes place while the webrtc::AudioReceiveStream object is being created. The RegisterReceiverCongestionControlObjects() function of ChannelReceive is implemented as follows:
void ChannelReceive::RegisterReceiverCongestionControlObjects(
    PacketRouter* packet_router) {
  RTC_DCHECK_RUN_ON(&worker_thread_checker_);
  RTC_DCHECK(packet_router);
  RTC_DCHECK(!packet_router_);
  constexpr bool remb_candidate = false;
  packet_router->AddReceiveRtpModule(rtp_rtcp_.get(), remb_candidate);
  packet_router_ = packet_router;
}
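Here webrtc::PacketRouter and webrtc::ModuleRtpRtcpImpl2 are connected, so the connection labeled 5 in the earlier figure is now established as well. NetEQ creates an audio decoder when it needs one; that process is not covered further here.
After this, the state of the data processing pipeline inside webrtc::AudioReceiveStream is as shown in the figure below: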

When the lifecycle function Start() of webrtc::AudioReceiveStream is called, the webrtc::AudioReceiveStream is added to webrtc::AudioMixer:
#0 webrtc::internal::AudioState::AddReceivingStream(webrtc::AudioReceiveStream*) (this=0x628000004100, stream=0x61600005be80)
at webrtc/audio/audio_state.cc:59
#1 webrtc::internal::AudioReceiveStream::Start() (this=0x61600005be80) at webrtc/audio/audio_receive_stream.cc:201
#2 cricket::WebRtcVoiceMediaChannel::WebRtcAudioReceiveStream::SetPlayout(bool) (this=0x60b000010fd0, playout=true)
at webrtc/media/engine/webrtc_voice_engine.cc:1289
#3 cricket::WebRtcVoiceMediaChannel::SetPlayout(bool) (this=0x619000017c80, playout=true)
at webrtc/media/engine/webrtc_voice_engine.cc:1865
#4 cricket::VoiceChannel::UpdateMediaSendRecvState_w() (this=0x619000018180) at webrtc/pc/channel.cc:811
With this, the data processing pipeline of webrtc::AudioReceiveStream is fully set up. The state of the whole audio data processing pipeline is as shown in the figure below:

The main process of WebRTC audio receive processing
In WebRTC's audio receive implementation, the buffer that holds audio packets received from the network is NetEQ's webrtc::PacketBuffer. The process of receiving an audio packet and storing it in NetEQ's webrtc::PacketBuffer looks like this:
#0 webrtc::PacketBuffer::InsertPacketList(std::__cxx11::list<webrtc::Packet, std::allocator<webrtc::Packet> >*, webrtc::DecoderDatabase const&, absl::optional<unsigned char>*, absl::optional<unsigned char>*, webrtc::StatisticsCalculator*, unsigned long, unsigned long, int)
(this=0x606000030e60, packet_list=0x7ffff2629810, decoder_database=..., current_rtp_payload_type=0x61600005c5c5, current_cng_rtp_payload_type=0x61600005c5c7, stats=0x61600005c180, last_decoded_length=480, sample_rate=16000, target_level_ms=80)
at webrtc/modules/audio_coding/neteq/packet_buffer.cc:216
#1 webrtc::NetEqImpl::InsertPacketInternal(webrtc::RTPHeader const&, rtc::ArrayView<unsigned char const, -4711l>)
(this=0x61600005c480, rtp_header=..., payload=...) at webrtc/modules/audio_coding/neteq/neteq_impl.cc:690
#2 webrtc::NetEqImpl::InsertPacket(webrtc::RTPHeader const&, rtc::ArrayView<unsigned char const, -4711l>)
(this=0x61600005c480, rtp_header=..., payload=...) at webrtc/modules/audio_coding/neteq/neteq_impl.cc:170
#3 webrtc::acm2::AcmReceiver::InsertPacket(webrtc::RTPHeader const&, rtc::ArrayView<unsigned char const, -4711l>)
(this=0x61b000008e48, rtp_header=..., incoming_payload=...) at webrtc/modules/audio_coding/acm2/acm_receiver.cc:136
#4 webrtc::voe::(anonymous namespace)::ChannelReceive::OnReceivedPayloadData(rtc::ArrayView<unsigned char const, -4711l>, webrtc::RTPHeader const&) (this=0x61b000008c80, payload=..., rtpHeader=...) at webrtc/audio/channel_receive.cc:340
#5 webrtc::voe::(anonymous namespace)::ChannelReceive::ReceivePacket(unsigned char const*, unsigned long, webrtc::RTPHeader const&)
(this=0x61b000008c80, packet=0x60700002b670 "\220\357\037\261\377\364?\a\350\224\177\276", <incomplete sequence \336>, packet_length=67, header=...) at webrtc/audio/channel_receive.cc:719
#6 webrtc::voe::(anonymous namespace)::ChannelReceive::OnRtpPacket(webrtc::RtpPacketReceived const&)
(this=0x61b000008c80, packet=...) at webrtc/audio/channel_receive.cc:669
#7 webrtc::RtpDemuxer::OnRtpPacket(webrtc::RtpPacketReceived const&) (this=0x620000001330, packet=...)
at webrtc/call/rtp_demuxer.cc:249
#8 webrtc::RtpStreamReceiverController::OnRtpPacket(webrtc::RtpPacketReceived const&)
(this=0x6200000012d0, packet=...) at webrtc/call/rtp_stream_receiver_controller.cc:52
#9 webrtc::internal::Call::DeliverRtp(webrtc::MediaType, rtc::CopyOnWriteBuffer, long) (this=0x620000001080, media_type=webrtc::MediaType::AUDIO, packet=..., packet_time_us=1654829839622021)
at webrtc/call/call.cc:1606
#10 webrtc::internal::Call::DeliverPacket(webrtc::MediaType, rtc::CopyOnWriteBuffer, long)
(this=0x620000001080, media_type=webrtc::MediaType::AUDIO, packet=..., packet_time_us=1654829839622021)
at webrtc/call/call.cc:1637
#11 cricket::WebRtcVoiceMediaChannel::OnPacketReceived(rtc::CopyOnWriteBuffer, long)::$_2::operator()() const
(this=0x606000074c68) at webrtc/media/engine/webrtc_voice_engine.cc:2229
At playout time, webrtc::AudioDeviceModule ultimately requests PCM data from NetEQ, at which point NetEQ takes packets out of webrtc::PacketBuffer and decodes them. The number of audio samples carried in one network packet is not necessarily the same as the number webrtc::AudioDeviceModule requests each time. For example, for 48 kHz audio, webrtc::AudioDeviceModule requests 10 ms of data each time, that is 480 samples, while an OPUS encoded frame here contains 20 ms of data, that is 960 samples. After NetEQ returns the samples requested by webrtc::AudioDeviceModule, some decoded audio may be left over, which calls for a dedicated PCM buffer. That buffer is NetEQ's webrtc::SyncBuffer.
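The arithmetic above can be made concrete with a small sketch. It only illustrates why leftover decoded samples need a buffer and is not the webrtc::SyncBuffer implementation: `SamplesPerChannel` and `LeftoverPcmBuffer` are hypothetical names, the audio is assumed to be mono, and the "decoder" just produces silence.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Number of samples per channel in `duration_ms` of audio.
constexpr std::size_t SamplesPerChannel(int sample_rate_hz, int duration_ms) {
  return static_cast<std::size_t>(sample_rate_hz) * duration_ms / 1000;
}

// Hypothetical playout-side PCM buffer that keeps decoded samples which the
// device has not asked for yet.
class LeftoverPcmBuffer {
 public:
  explicit LeftoverPcmBuffer(std::size_t decoded_frame_samples)
      : decoded_frame_samples_(decoded_frame_samples) {
    assert(decoded_frame_samples_ > 0);
  }

  // Returns `requested` samples, decoding a new frame only when the samples
  // left over from earlier frames are not enough.
  std::vector<int16_t> GetAudio(std::size_t requested) {
    while (pcm_.size() < requested) DecodeNextFrame();
    const auto end = pcm_.begin() + static_cast<std::ptrdiff_t>(requested);
    std::vector<int16_t> out(pcm_.begin(), end);
    pcm_.erase(pcm_.begin(), end);
    return out;
  }

  std::size_t leftover_samples() const { return pcm_.size(); }
  int decoded_frames() const { return decoded_frames_; }

 private:
  // Placeholder for a real decoder: appends one frame of silence.
  void DecodeNextFrame() {
    pcm_.insert(pcm_.end(), decoded_frame_samples_, 0);
    ++decoded_frames_;
  }

  const std::size_t decoded_frame_samples_;
  std::deque<int16_t> pcm_;
  int decoded_frames_ = 0;
};
```

With a 960-sample decoded frame and 480-sample requests, the first request decodes one frame and leaves 480 samples behind, and the second request is served entirely from the leftover without touching the decoder.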
The rough process by which webrtc::AudioDeviceModule requests playout data is as follows:
#0 webrtc::SyncBuffer::GetNextAudioInterleaved (this=0x606000062a80, requested_len=480, output=0x628000010110)
at webrtc/modules/audio_coding/neteq/sync_buffer.cc:86
#1 webrtc::NetEqImpl::GetAudioInternal (this=0x61600005c480, audio_frame=0x628000010110, muted=0x7fffdc92a990, action_override=...)
at webrtc/modules/audio_coding/neteq/neteq_impl.cc:939
#2 webrtc::NetEqImpl::GetAudio (this=0x61600005c480, audio_frame=0x628000010110, muted=0x7fffdc92a990, current_sample_rate_hz=0x7fffdcc933b0,
action_override=...) at webrtc/modules/audio_coding/neteq/neteq_impl.cc:239
#3 webrtc::acm2::AcmReceiver::GetAudio (this=0x61b000008e48, desired_freq_hz=48000, audio_frame=0x628000010110, muted=0x7fffdc92a990)
at webrtc/modules/audio_coding/acm2/acm_receiver.cc:151
#4 webrtc::voe::(anonymous namespace)::ChannelReceive::GetAudioFrameWithInfo (this=0x61b000008c80, sample_rate_hz=48000,
audio_frame=0x628000010110) at webrtc/audio/channel_receive.cc:388
#5 webrtc::internal::AudioReceiveStream::GetAudioFrameWithInfo (this=0x61600005be80, sample_rate_hz=48000, audio_frame=0x628000010110)
at webrtc/audio/audio_receive_stream.cc:393
#6 webrtc::AudioMixerImpl::GetAudioFromSources (this=0x61d000021280, output_frequency=48000)
at webrtc/modules/audio_mixer/audio_mixer_impl.cc:205
#7 webrtc::AudioMixerImpl::Mix (this=0x61d000021280, number_of_channels=2, audio_frame_for_mixing=0x6280000042e8)
at webrtc/modules/audio_mixer/audio_mixer_impl.cc:175
#8 webrtc::AudioTransportImpl::NeedMorePlayData (this=0x6280000041e0, nSamples=441, nBytesPerSample=4, nChannels=2, samplesPerSec=44100,
audioSamples=0x61c000080080, nSamplesOut=@0x7fffdc929c00: 0, elapsed_time_ms=0x7fffdc929cc0, ntp_time_ms=0x7fffdc929ce0)
at webrtc/audio/audio_transport_impl.cc:215
#9 webrtc::AudioDeviceBuffer::RequestPlayoutData (this=0x614000010058, samples_per_channel=441)
at webrtc/modules/audio_device/audio_device_buffer.cc:303
#10 webrtc::AudioDeviceLinuxPulse::PlayThreadProcess (this=0x61900000ff80)
at webrtc/modules/audio_device/linux/audio_device_pulse_linux.cc:2106
Another look at WebRTC's audio data processing, encoding, and sending
Examining WebRTC's audio data processing, encoding, and sending more closely, and taking network resilience more fully into account, the process and its related modules are shown in the figure below:

In WebRTC's audio data processing, encoding, and sending, the encoder plays a major role in network resilience. WebRTC uses a module called the audio network adapter (ANA) to adjust the encoding process according to network conditions.
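To give a feel for what "adjusting the encoding process according to network conditions" can mean, here is a deliberately simple sketch. It is not WebRTC's ANA logic: `NetworkMetrics`, `EncoderSettings`, and `Adapt` are hypothetical names, and every threshold below is a made-up value chosen only for illustration.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical snapshot of the network conditions fed to the adapter.
struct NetworkMetrics {
  int uplink_bandwidth_bps;
  float uplink_packet_loss_fraction;  // In the range 0.0 to 1.0.
};

// Encoder settings that an adapter of this kind could choose.
struct EncoderSettings {
  bool enable_fec;
  bool enable_dtx;
  int bitrate_bps;
  int frame_length_ms;
  int num_channels;
};

// Illustrative decision rule. All thresholds are assumptions of this sketch.
EncoderSettings Adapt(const NetworkMetrics& metrics) {
  assert(metrics.uplink_packet_loss_fraction >= 0.0f &&
         metrics.uplink_packet_loss_fraction <= 1.0f);
  const int bandwidth = metrics.uplink_bandwidth_bps;
  const bool low_bandwidth = bandwidth < 20000;

  EncoderSettings settings;
  // In-band FEC costs bits, so enable it only when loss is noticeable and
  // there is bandwidth to spare.
  settings.enable_fec =
      metrics.uplink_packet_loss_fraction >= 0.05f && !low_bandwidth;
  // DTX saves bitrate during silence, which matters most on a thin link.
  settings.enable_dtx = low_bandwidth;
  // Keep the target bitrate inside an assumed sane range.
  settings.bitrate_bps = std::min(std::max(bandwidth, 6000), 64000);
  // Longer frames mean fewer packets and less header overhead.
  settings.frame_length_ms = low_bandwidth ? 60 : 20;
  // Stereo only when bandwidth is plentiful.
  settings.num_channels = bandwidth >= 48000 ? 2 : 1;
  return settings;
}
```

The point of the sketch is the shape of the interface: network measurements go in, and a set of encoder parameters comes out that the encoder applies to subsequent frames.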
The pacing module sends media data to the network smoothly, and the congestion control module influences how media data is sent by influencing the pacing module, in order to keep congestion under control.
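The relationship between the two modules can be sketched with a byte-budget pacer whose rate is set from outside. This is an assumption-laden illustration rather than WebRTC's pacer: `BudgetPacer` is a hypothetical class, time is advanced manually, and unused budget is discarded when the queue is empty so that idle periods do not turn into bursts.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical pacer: packets wait in a queue and are released only as fast
// as the configured pacing rate allows.
class BudgetPacer {
 public:
  // Called by the (hypothetical) congestion controller whenever its estimate
  // of the available bandwidth changes.
  void SetPacingRate(int64_t bits_per_second) {
    assert(bits_per_second >= 0);
    rate_bps_ = bits_per_second;
  }

  void Enqueue(std::size_t packet_bytes) { queue_.push_back(packet_bytes); }

  // Advances time by `elapsed_ms` and returns the sizes of the packets that
  // may be sent during that interval.
  std::vector<std::size_t> Process(int64_t elapsed_ms) {
    assert(elapsed_ms >= 0);
    budget_bytes_ += rate_bps_ * elapsed_ms / 8000;  // bits/s * ms -> bytes
    std::vector<std::size_t> sent;
    while (!queue_.empty() &&
           budget_bytes_ >= static_cast<int64_t>(queue_.front())) {
      budget_bytes_ -= static_cast<int64_t>(queue_.front());
      sent.push_back(queue_.front());
      queue_.pop_front();
    }
    // Do not bank unused budget while idle.
    if (queue_.empty()) budget_bytes_ = 0;
    return sent;
  }

 private:
  int64_t rate_bps_ = 0;
  int64_t budget_bytes_ = 0;
  std::deque<std::size_t> queue_;
};
```

Lowering the rate passed to `SetPacingRate()` spreads the same packets over a longer time, which is the sense in which congestion control shapes sending through the pacer.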
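Overview of WebRTC audio network resilience
From WebRTC's audio capture, processing, encoding, and sending path, and from its audio receiving, decoding, processing, and playout path, the mechanisms behind WebRTC's audio network resilience can be roughly summarized as follows:
- OPUS audio codec: the encoding options OPUS supports include in-band FEC, DTX, CBR/VBR, bitrate, and so on.
- RED.
- Audio network adapter (ANA): ANA provides network resilience by influencing the encoding process according to network conditions, and is mainly used with the OPUS encoder. ANA can influence 5 parameters of the encoding process:
  - In-band FEC: the OPUS encoder can generate in-band FEC. When packets are lost, the FEC information can be used to partially recover what was lost, although the quality of the recovered audio may not be very high. It is used to counter packet loss;
  - DTX: when the data to be encoded is empty for a long time, DTX packets can be generated to reduce the bitrate. This mechanism may increase latency;
  - Bitrate;
  - Frame length: OPUS supports encoded frame lengths from 10 ms to 120 ms;
  - Number of channels.
- Pacing: smooth sending of packets.
- congestion_controller/goog_cc: congestion control probes network conditions and influences the sending rhythm by influencing pacing.
- NACK: when packets are lost, the receiver asks the sender to retransmit some of them. The NACK list is maintained by NetEQ.
- Jitter buffer: reorders packets and counters network jitter. It is where NetEQ stores the received audio network packets.
- PLC: generates the missing data when packets are lost. It is performed by NetEQ.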
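As an illustration of the NACK item in the list above, the following sketch shows one way a receiver could track which RTP sequence numbers are missing, including across the 16-bit wraparound. It is not NetEQ's NACK tracking: `IsNewerSequenceNumber` and `NackList` are hypothetical names written for this example, and no limit is placed on the size or age of the list.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// True if sequence number `a` is newer than `b`, taking 16-bit wraparound
// into account.
inline bool IsNewerSequenceNumber(uint16_t a, uint16_t b) {
  return a != b && static_cast<uint16_t>(a - b) < 0x8000;
}

// Hypothetical tracker for the sequence numbers a receiver would request
// again via RTCP NACK.
class NackList {
 public:
  void OnPacketReceived(uint16_t seq) {
    if (!started_) {
      started_ = true;
      highest_seq_ = seq;
      return;
    }
    if (IsNewerSequenceNumber(seq, highest_seq_)) {
      // Every sequence number skipped between the previous highest and the
      // new packet is now considered missing.
      for (uint16_t s = static_cast<uint16_t>(highest_seq_ + 1); s != seq;
           ++s) {
        missing_.push_back(s);
      }
      highest_seq_ = seq;
    } else {
      // A late or retransmitted packet fills a gap (duplicates are ignored).
      missing_.erase(std::remove(missing_.begin(), missing_.end(), seq),
                     missing_.end());
    }
  }

  // Sequence numbers still missing, oldest first.
  const std::vector<uint16_t>& missing() const { return missing_; }

 private:
  bool started_ = false;
  uint16_t highest_seq_ = 0;
  std::vector<uint16_t> missing_;
};
```

A real implementation would also bound the list and stop requesting packets that are too old to be played, since a retransmission that arrives after its playout time is of no use.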
I did not find an implementation of out-of-band FEC for audio in WebRTC.