Version History
| Version | Date |
|---|---|
| V1.0 | 2017.12.31 |
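Preface
AAC (Advanced Audio Coding) is an audio coding technology based on MPEG-2 that appeared in 1997. It was developed jointly by Fraunhofer IIS, Dolby Laboratories, AT&T, Sony and other companies, with the aim of replacing the MP3 format. After the MPEG-4 standard appeared in 2000, AAC was reworked to integrate its features and gained SBR and PS technology. To distinguish it from the original MPEG-2 AAC, this version is also called MPEG-4 AAC.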
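PCM Encoding
Pulse code modulation (PCM) was proposed by A. Reeves in 1937, and the concept laid the foundation for digital communication. In the 1960s it began to be used in local telephone networks to expand capacity, increasing the transmission capacity of most cores of existing audio cables by 24 to 48 times. By the mid to late 1970s, many countries had successfully applied PCM to medium- and large-capacity transmission systems such as coaxial cable, microwave relay, satellite and optical fiber communication. By the early 1980s, PCM was in use for local trunk transmission, high-capacity backbone transmission and digital program-controlled exchanges, and had been adopted in subscriber telephone sets.
In an optical fiber communication system, the fiber carries binary optical pulses ("0" and "1" codes) produced by on-off modulation of the light source with a binary digital signal. The digital signal itself is produced by sampling, quantizing and encoding a continuously varying analog signal, a process called PCM (pulse-code modulation). This electrical digital signal is called the digital baseband signal and is generated by the PCM terminal equipment. Today's digital transmission systems all use PCM. PCM was not originally intended for carrying computer data. Its purpose was to let a single trunk line between exchanges carry more than one telephone signal. PCM has two standards (forms), E1 and T1.
China uses the European E1 standard. T1 runs at 1.544 Mbit/s and E1 at 2.048 Mbit/s.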
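The two line rates follow from the frame layout of each carrier. The sketch below is only an illustration, with a hypothetical helper name `pcm_line_rate`; it assumes the usual framing of 8000 frames per second, 32 eight-bit time slots for E1, and 24 eight-bit time slots plus one framing bit for T1.

```c
#include <assert.h>

/* Line rate in bit/s of a PCM carrier whose frame repeats 8000 times per
 * second (one frame per 125 microsecond sampling period).
 * slots:        number of 8-bit time slots in each frame
 * framing_bits: extra framing bits per frame outside the time slots
 * (pcm_line_rate is a hypothetical name used only for this illustration.) */
static long pcm_line_rate(int slots, int framing_bits)
{
    assert(slots > 0 && framing_bits >= 0);
    const long frames_per_second = 8000;
    return ((long)slots * 8 + framing_bits) * frames_per_second;
}
```

Calling `pcm_line_rate(32, 0)` gives the E1 rate and `pcm_line_rate(24, 1)` gives the T1 rate.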
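PCM can provide users with a variety of services: digital leased-line data services at rates from 2 Mbit/s to 155 Mbit/s, as well as voice, image transmission, distance learning and other services. It is particularly suited to users who need high data rates and more bandwidth.
Sounds in nature are very complex, with extremely complex waveforms. The encoding usually used to digitize them is pulse code modulation, i.e. PCM. PCM converts a continuously varying analog signal into a digital code in three steps: sampling, quantization and encoding.
- Sampling: periodically scan the analog signal, turning a signal that is continuous in time into one that is discrete in time;
- Quantization: represent each instantaneous sample by the nearest of a set of predefined levels, usually expressed in binary;
- Encoding: represent each quantized value, which now has a fixed level, by a group of binary codes.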
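The three steps above can be sketched in a few lines of C. This is a minimal illustration, not how any particular ADC works: the names `unit_ramp`, `quantize16` and `pcm_encode` are hypothetical, the "analog" signal is a simple ramp so that the results are easy to check by hand, and the target is signed 16-bit linear PCM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "analog" signal: a ramp from -1.0 at t = 0 to +1.0 at t = 1 s. */
static double unit_ramp(double t)
{
    return 2.0 * t - 1.0;
}

/* Quantization: map a value in [-1.0, 1.0] to the nearest of the signed
 * 16-bit levels (values outside the range are clipped first). */
static int16_t quantize16(double x)
{
    if (x > 1.0)  x = 1.0;
    if (x < -1.0) x = -1.0;
    double scaled = x * 32767.0;
    return (int16_t)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
}

/* Sampling + quantization + encoding: evaluate the signal at regular
 * intervals and store each level as a 16-bit two's-complement code. */
static void pcm_encode(double (*signal)(double), double sample_rate_hz,
                       int16_t *out, size_t count)
{
    assert(signal != NULL && out != NULL && sample_rate_hz > 0.0);
    for (size_t n = 0; n < count; n++) {
        double t = (double)n / sample_rate_hz;  /* sampling instant */
        out[n] = quantize16(signal(t));         /* level -> binary code */
    }
}
```

Sampling the ramp at 4 Hz for five samples yields the codes -32767, -16384, 0, 16384, 32767. Real audio capture does the same thing at rates such as 44100 Hz.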
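AAC Encoding on iOS
The iOS platform supports AAC encoding, mainly through the AudioConverter API in AudioToolbox. The reason for building an AAC encoder here is an HLS feature: the TS files that HLS requires need H.264 video and AAC audio. H.264 can be encoded with a hardware or software encoder, as covered earlier. AAC can likewise be encoded in hardware or software, and iOS supports both.
AAC is a compression format designed specifically for audio data. Unlike MP3, it uses a newer algorithm that encodes more efficiently and gives better quality for the same size. With AAC, files can be made smaller without a perceptible loss in sound quality. Apple's iPod and Nokia phones support AAC audio files.
- Advantage: compared with MP3, AAC offers better sound quality at a smaller file size.
- Disadvantage: AAC is a lossy format, so there is a fundamental quality gap compared with popular lossless formats such as APE and FLAC. In addition, faster USB 3.0 transfers and MP3 players with 16 GB or more of storage are spreading quickly, which erodes AAC's advantage of small file size.
Encoding PCM Audio into an AAC Stream on iOS
- Set up the encoder (codec) and start recording;
- Collect the PCM data and pass it to the encoder;
- In the encode-complete callback, write the output to a file.
The details are as follows.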

Creating and Configuring the AVCaptureSession
Create an AVCaptureSession, find the audio AVCaptureDevice, create an input from that device and add it to the session, and finally add an output to the session. audioFileHandle is an NSFileHandle used to write the encoded AAC audio to a file.
- (void)startCapture
{
self.mCaptureSession = [[AVCaptureSession alloc] init];
mCaptureQueue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
mEncodeQueue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
AVCaptureDevice *audioDevice = [[AVCaptureDevice devicesWithMediaType:AVMediaTypeAudio] lastObject];
self.mCaptureAudioDeviceInput = [[AVCaptureDeviceInput alloc] initWithDevice:audioDevice error:nil];
if ([self.mCaptureSession canAddInput:self.mCaptureAudioDeviceInput]) {
[self.mCaptureSession addInput:self.mCaptureAudioDeviceInput];
}
self.mCaptureAudioOutput = [[AVCaptureAudioDataOutput alloc] init];
if ([self.mCaptureSession canAddOutput:self.mCaptureAudioOutput]) {
[self.mCaptureSession addOutput:self.mCaptureAudioOutput];
}
[self.mCaptureAudioOutput setSampleBufferDelegate:self queue:mCaptureQueue];
NSString *audioFile = [[NSSearchPathForDirectoriesInDomains(NSDocumentDirectory, NSUserDomainMask, YES) lastObject] stringByAppendingPathComponent:@"abc.aac"];
[[NSFileManager defaultManager] removeItemAtPath:audioFile error:nil];
[[NSFileManager defaultManager] createFileAtPath:audioFile contents:nil attributes:nil];
audioFileHandle = [NSFileHandle fileHandleForWritingAtPath:audioFile];
[self.mCaptureSession startRunning];
}
Creating the Converter
Create a converter, which here is an AAC encoder. Its input parameters are the source and destination data formats. For AAC encoding, the source format is the captured PCM data and the destination format is AAC.
extern OSStatus
AudioConverterNew( const AudioStreamBasicDescription* inSourceFormat,
const AudioStreamBasicDescription* inDestinationFormat,
AudioConverterRef* outAudioConverter) __OSX_AVAILABLE_STARTING(__MAC_10_1,__IPHONE_2_0);
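// Source format: 16-bit mono LPCM at 44.1 kHz, filled in by hand
// (zero the struct first so that unset fields are well defined).
AudioStreamBasicDescription inAudioStreamBasicDescription = {0};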
inAudioStreamBasicDescription.mFormatID = kAudioFormatLinearPCM;
inAudioStreamBasicDescription.mSampleRate = 44100;
inAudioStreamBasicDescription.mBitsPerChannel = 16;
inAudioStreamBasicDescription.mFramesPerPacket = 1;
inAudioStreamBasicDescription.mBytesPerFrame = 2;
inAudioStreamBasicDescription.mBytesPerPacket = inAudioStreamBasicDescription.mBytesPerFrame * inAudioStreamBasicDescription.mFramesPerPacket;
inAudioStreamBasicDescription.mChannelsPerFrame = 1;
inAudioStreamBasicDescription.mFormatFlags = kLinearPCMFormatFlagIsPacked | kLinearPCMFormatFlagIsSignedInteger | kLinearPCMFormatFlagIsNonInterleaved;
inAudioStreamBasicDescription.mReserved = 0;
AudioStreamBasicDescription outAudioStreamBasicDescription = {0};
// Always initialize the fields of a new audio stream basic description structure to zero, as shown here: ...
outAudioStreamBasicDescription.mChannelsPerFrame = 1;
outAudioStreamBasicDescription.mFormatID = kAudioFormatMPEG4AAC;
UInt32 size = sizeof(outAudioStreamBasicDescription);
AudioFormatGetProperty(kAudioFormatProperty_FormatInfo, 0, NULL, &size, &outAudioStreamBasicDescription);
OSStatus status = AudioConverterNew(&inAudioStreamBasicDescription, &outAudioStreamBasicDescription, &_audioConverter);
if(status != 0)
{
NSLog(@"setup converter failed: %d", (int)status);
}
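This creates the AAC encoder. By default Apple creates a hardware encoder and falls back to a software encoder if the hardware is unavailable. The hardware AAC encoder has high latency: it needs to buffer roughly 2 seconds of data before it starts encoding. The software encoder has normal latency and starts encoding as soon as it has been fed 1024 samples.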
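To put the 1024-sample figure in terms of time, the duration of one frame depends only on the sample rate. The helper below is a small sketch with a hypothetical name, assuming the 1024 samples per frame mentioned above.

```c
#include <assert.h>

/* Duration in milliseconds of one AAC frame of 1024 PCM samples
 * (per channel) at the given sample rate.
 * aac_frame_duration_ms is a hypothetical helper for illustration only. */
static double aac_frame_duration_ms(double sample_rate_hz)
{
    assert(sample_rate_hz > 0.0);
    return 1024.0 * 1000.0 / sample_rate_hz;
}
```

At 44100 Hz one frame lasts about 23.2 ms, and at 16000 Hz it lasts 64 ms, which is far below the roughly 2 seconds buffered by the hardware encoder.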
Selecting the Software Encoder
The following shows how to request the software encoder explicitly.
- (AudioClassDescription *)getAudioClassDescriptionWithType:(UInt32)type
fromManufacturer:(UInt32)manufacturer
{
static AudioClassDescription desc;
UInt32 encoderSpecifier = type;
OSStatus st;
UInt32 size;
st = AudioFormatGetPropertyInfo(kAudioFormatProperty_Encoders,
sizeof(encoderSpecifier),
&encoderSpecifier,
&size);
if (st) {
NSLog(@"error getting audio format property info: %d", (int)(st));
return nil;
}
unsigned int count = size / sizeof(AudioClassDescription);
AudioClassDescription descriptions[count];
st = AudioFormatGetProperty(kAudioFormatProperty_Encoders,
sizeof(encoderSpecifier),
&encoderSpecifier,
&size,
descriptions);
if (st) {
NSLog(@"error getting audio format property: %d", (int)(st));
return nil;
}
for (unsigned int i = 0; i < count; i++) {
if ((type == descriptions[i].mSubType) &&
(manufacturer == descriptions[i].mManufacturer)) {
memcpy(&desc, &(descriptions[i]), sizeof(desc));
return &desc;
}
}
return nil;
}
AudioClassDescription *desc = [self getAudioClassDescriptionWithType:kAudioFormatMPEG4AAC fromManufacturer:kAppleSoftwareAudioCodecManufacturer];
OSStatus status = AudioConverterNewSpecific(&inAudioStreamBasicDescription, &outAudioStreamBasicDescription, 1, desc, &_audioConverter);
Setting the Encoder Bit Rate
UInt32 ulBitRate = 64000;
UInt32 ulSize = sizeof(ulBitRate);
status = AudioConverterSetProperty(_audioConverter, kAudioConverterEncodeBitRate, ulSize, &ulBitRate);
AAC does not support arbitrary bit rates. For example, with a PCM sample rate of 44.1 kHz the bit rate can be set to 64000 bps, and with 16 kHz it can be set to 32000 bps.
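As a rough sanity check on these numbers, the bit rate and sample rate together determine how large an encoded packet is on average. The sketch below uses a hypothetical helper name and assumes 1024 samples per AAC packet; actual packet sizes vary from packet to packet.

```c
#include <assert.h>

/* Average size in bytes of one encoded AAC packet, assuming each packet
 * covers 1024 PCM samples. Individual packets may be larger or smaller.
 * aac_avg_packet_bytes is a hypothetical helper for illustration only. */
static double aac_avg_packet_bytes(long bit_rate_bps, double sample_rate_hz)
{
    assert(bit_rate_bps > 0 && sample_rate_hz > 0.0);
    double packets_per_second = sample_rate_hz / 1024.0;
    return (double)bit_rate_bps / 8.0 / packets_per_second;
}
```

With 64000 bps at 44.1 kHz this comes to roughly 186 bytes per packet, and with 32000 bps at 16 kHz to 256 bytes.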
Querying the Encoder's Maximum Output Size
UInt32 value = 0;
size = sizeof(value);
AudioConverterGetProperty(_audioConverter, kAudioConverterPropertyMaximumOutputPacketSize, &size, &value);
Starting the Encode
The value obtained above is the encoder's maximum output packet size.
Then call AudioConverterFillComplexBuffer to encode.
AudioBufferList outAudioBufferList = {0};
outAudioBufferList.mNumberBuffers = 1;
outAudioBufferList.mBuffers[0].mNumberChannels = 1;
outAudioBufferList.mBuffers[0].mDataByteSize = value; // value is the size queried above
outAudioBufferList.mBuffers[0].mData = malloc(value); // free() once the packet has been written out
UInt32 ioOutputDataPacketSize = 1;
status = AudioConverterFillComplexBuffer(_audioConverter, inInputDataProc, (__bridge void *)(self), &ioOutputDataPacketSize, &outAudioBufferList, NULL);
In the encode call, inInputDataProc is the input-data callback that feeds PCM data to the converter. Setting ioOutputDataPacketSize to 1 makes the call return as soon as one frame has been produced. outAudioBufferList receives the encoded data.
inInputDataProc does the following:
static OSStatus inInputDataProc(AudioConverterRef inAudioConverter, UInt32 *ioNumberDataPackets, AudioBufferList *ioData, AudioStreamPacketDescription **outDataPacketDescription, void *inUserData)
{
AACEncoder *encoder = (__bridge AACEncoder *)(inUserData);
UInt32 requestedPackets = *ioNumberDataPackets;
uint8_t *buffer;
uint32_t bufferLength = requestedPackets * 2;
uint32_t bufferRead;
bufferRead = [encoder.pcmPool readBuffer:&buffer withLength:bufferLength];
if (bufferRead == 0) {
*ioNumberDataPackets = 0;
return -1;
}
ioData->mBuffers[0].mData = buffer;
ioData->mBuffers[0].mDataByteSize = bufferRead;
ioData->mNumberBuffers = 1;
ioData->mBuffers[0].mNumberChannels = 1;
*ioNumberDataPackets = bufferRead >> 1;
return noErr;
}
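The callback relies on a `pcmPool` object that is not shown in the article. As a rough idea of what such a pool might do, here is a minimal byte FIFO in plain C. Everything in it (`PcmPool`, `pcm_pool_write`, `pcm_pool_read`, the fixed capacity) is hypothetical. It copies data out instead of handing back a pointer as `readBuffer:withLength:` appears to do, and it is not thread safe, so a real capture/encode pair would need a lock or a serial queue around it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PCM_POOL_CAPACITY 8192  /* arbitrary size chosen for this sketch */

typedef struct {
    uint8_t data[PCM_POOL_CAPACITY];
    size_t  length;             /* bytes currently buffered */
} PcmPool;

/* Append captured PCM bytes; returns how many bytes were accepted
 * (fewer than len if the pool is full). */
static size_t pcm_pool_write(PcmPool *pool, const uint8_t *src, size_t len)
{
    assert(pool != NULL && src != NULL);
    size_t room = PCM_POOL_CAPACITY - pool->length;
    if (len > room) len = room;
    memcpy(pool->data + pool->length, src, len);
    pool->length += len;
    return len;
}

/* Copy out up to max_len bytes, rounded down to whole 16-bit mono frames
 * (2 bytes each). Returns the number of bytes read; 0 means "no data yet". */
static size_t pcm_pool_read(PcmPool *pool, uint8_t *dst, size_t max_len)
{
    assert(pool != NULL && dst != NULL);
    size_t len = pool->length < max_len ? pool->length : max_len;
    len &= ~(size_t)1;          /* never split a 2-byte sample */
    memcpy(dst, pool->data, len);
    memmove(pool->data, pool->data + len, pool->length - len);
    pool->length -= len;
    return len;
}
```

The capture delegate would call the write side with each sample buffer, and the input callback would call the read side with `requestedPackets * 2` bytes, reporting `bytesRead / 2` packets back to the converter.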
Adding the ADTS Header
AAC audio comes in two formats, ADIF and ADTS:
- ADIF (Audio Data Interchange Format): the start of the audio data can be located unambiguously, and decoding cannot begin in the middle of the stream. It must begin at the clearly defined start. This format is therefore commonly used for files on disk.
- ADTS (Audio Data Transport Stream): a bitstream with a syncword, so decoding can begin at any position in the stream. In this respect it resembles the MP3 stream format.
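The syncword is what lets a decoder join an ADTS stream at an arbitrary point. A minimal sketch of that search is shown below; the function name is hypothetical, and a robust parser would also validate the frame length and confirm that another syncword follows, since the same bit pattern can occur by chance inside the payload.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the offset of the first candidate ADTS header in buf, or -1.
 * A header starts with the 12-bit syncword 0xFFF, followed by a 1-bit
 * ID, a 2-bit layer field that is always 00, and a protection bit.
 * adts_find_sync is a hypothetical helper for illustration only. */
static ptrdiff_t adts_find_sync(const uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i + 1 < len; i++) {
        /* 0xF6 masks the low 4 sync bits and the layer bits,
         * ignoring the ID and protection bits. */
        if (buf[i] == 0xFF && (buf[i + 1] & 0xF6) == 0xF0) {
            return (ptrdiff_t)i;
        }
    }
    return -1;
}
```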
AudioConverterFillComplexBuffer returns a raw AAC stream, so an ADTS header must be added to each AAC frame. The header is generated by the adtsDataForPacketLength method, and the data is then written to the file behind audioFileHandle.
For TS files, each AAC packet needs an ADTS header. The header is 7 bytes long and carries the AAC encoding parameters, so the decoder knows how to decode the data. It is computed as follows:
- (NSData*) adtsDataForPacketLength:(NSUInteger)packetLength
{
int adtsLength = 7;
char *packet = (char *)malloc(sizeof(char) * adtsLength);
int profile = 2; // AAC LC
int freqIdx = 8; // 16 kHz; must match the encoder's sample rate (index 4 for the 44.1 kHz configured earlier)
int chanCfg = 1; // MPEG-4 Audio Channel Configuration. 1 Channel front-center
NSUInteger fullLength = adtsLength + packetLength;
// fill in ADTS data
packet[0] = (char)0xFF; // 11111111 = syncword
packet[1] = (char)0xF9; // 1111 1 00 1 = syncword (low 4 bits), ID = MPEG-2, layer = 00, no CRC
packet[2] = (char)(((profile-1)<<6) + (freqIdx<<2) +(chanCfg>>2));
packet[3] = (char)(((chanCfg&3)<<6) + (fullLength>>11));
packet[4] = (char)((fullLength&0x7FF) >> 3);
packet[5] = (char)(((fullLength&7)<<5) + 0x1F);
packet[6] = (char)0xFC;
NSData *data = [NSData dataWithBytesNoCopy:packet length:adtsLength freeWhenDone:YES];
return data;
}
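The same header layout can be written in plain C, which makes it easy to check outside of an iOS project. The sketch below mirrors the bit packing of the method above and adds two hypothetical helpers: `adts_freq_index`, which looks up the sampling frequency index instead of hard-coding it, and `adts_frame_length`, which reads the 13-bit frame length back out of a header. The function names are illustrative assumptions, not part of any Apple API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* MPEG-4 sampling frequency index for a sample rate in Hz,
 * or -1 if the rate is not in the standard table. */
static int adts_freq_index(int sample_rate_hz)
{
    static const int rates[] = { 96000, 88200, 64000, 48000, 44100, 32000,
                                 24000, 22050, 16000, 12000, 11025, 8000, 7350 };
    for (int i = 0; i < (int)(sizeof rates / sizeof rates[0]); i++) {
        if (rates[i] == sample_rate_hz) return i;
    }
    return -1;
}

/* Write a 7-byte ADTS header (no CRC) for one raw AAC packet of
 * payload_len bytes. profile is the audio object type (2 = AAC LC). */
static void adts_write_header(uint8_t out[7], int profile, int freq_idx,
                              int chan_cfg, size_t payload_len)
{
    assert(out != NULL && freq_idx >= 0 && freq_idx <= 12);
    size_t full = payload_len + 7;      /* header + payload */
    assert(full <= 0x1FFF);             /* frame length is a 13-bit field */
    out[0] = 0xFF;                      /* syncword, high 8 bits */
    out[1] = 0xF9;                      /* syncword low 4 bits, ID, layer, no CRC */
    out[2] = (uint8_t)(((profile - 1) << 6) | (freq_idx << 2) | (chan_cfg >> 2));
    out[3] = (uint8_t)(((chan_cfg & 3) << 6) | (full >> 11));
    out[4] = (uint8_t)((full & 0x7FF) >> 3);
    out[5] = (uint8_t)(((full & 7) << 5) | 0x1F);
    out[6] = 0xFC;
}

/* Read the 13-bit frame length (header + payload) back from a header. */
static size_t adts_frame_length(const uint8_t h[7])
{
    return ((size_t)(h[3] & 0x03) << 11) | ((size_t)h[4] << 3) | (size_t)(h[5] >> 5);
}
```

For a 200-byte AAC LC mono packet at 44.1 kHz, `adts_write_header(h, 2, adts_freq_index(44100), 1, 200)` produces a header whose frame length reads back as 207.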
Afterword
To be continued.
