How Apple defines audio
In Core Audio, the following definitions apply:
- An audio stream is a continuous series of data that represents a sound, such as a song.
- A channel is a discrete track of monophonic audio. A monophonic stream has one channel; a stereo stream has two channels.
- A sample is a single numerical value for a single audio channel in an audio stream.
- A frame is a collection of time-coincident samples. For instance, a linear PCM stereo sound file has two samples per frame, one for the left channel and one for the right channel.
- A packet is a collection of one or more contiguous frames. A packet defines the smallest meaningful set of frames for a given audio data format, and is the smallest data unit for which time can be measured. In linear PCM audio, a packet holds a single frame. In compressed formats, it typically holds more; in some formats, the number of frames per packet varies.
- The sample rate for a stream is the number of frames per second of uncompressed (or, for compressed formats, the equivalent in decompressed) audio.
The AudioStreamBasicDescription structure
struct AudioStreamBasicDescription
{
    Float64             mSampleRate;
    AudioFormatID       mFormatID;
    AudioFormatFlags    mFormatFlags;
    UInt32              mBytesPerPacket;
    UInt32              mFramesPerPacket;
    UInt32              mBytesPerFrame;
    UInt32              mChannelsPerFrame;
    UInt32              mBitsPerChannel;
    UInt32              mReserved;
};
typedef struct AudioStreamBasicDescription AudioStreamBasicDescription;
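For PCM, the sampling frequency is called the sample rate.
Each sampling instant yields several sample values, one for each channel.
The sample values taken at one sampling instant, grouped together, form a frame.
One or more frames grouped together form a packet.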
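The relationship between frames, samples, and packets comes down to a little arithmetic. The sketch below only illustrates the definitions: the helper names (`frames_for_duration`, `samples_for_frames`, `packets_for_frames`) are hypothetical and are not Core Audio functions, and the packet helper assumes a fixed, nonzero number of frames per packet.

```c
#include <stdint.h>

/* Hypothetical helpers illustrating the definitions; not part of Core Audio. */

/* The sample rate is frames per second, so frames = rate * seconds (rounded). */
static uint64_t frames_for_duration(double sample_rate, double seconds)
{
    return (uint64_t)(sample_rate * seconds + 0.5);
}

/* Each frame holds one sample per channel. */
static uint64_t samples_for_frames(uint64_t frames, uint32_t channels)
{
    return frames * channels;
}

/* Packets needed to hold `frames`, rounding up.
   Assumes frames_per_packet is fixed and nonzero. */
static uint64_t packets_for_frames(uint64_t frames, uint32_t frames_per_packet)
{
    return (frames + frames_per_packet - 1) / frames_per_packet;
}
```

For one second of 44100 Hz stereo audio this gives 44100 frames and 88200 samples. With one frame per packet (linear PCM) that is 44100 packets; with 1024 frames per packet it is 44 packets, the last one only partly filled.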
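Meaning of each AudioStreamBasicDescription field
mSampleRate
- The sample rate: how many times per second the recording device samples the sound signal. Common values are 16000, 32000, and 44100 Hz.
mFormatID
The type of the audio data (an AudioFormatID), such as PCM or AAC.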
kAudioFormatLinearPCM = 'lpcm',
kAudioFormatMPEG4AAC = 'aac ',
kAudioFormatMPEGLayer3 = '.mp3',
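Each of these identifiers is a four-character code packed into a 32-bit integer, which is why `'aac '` carries a trailing space. As a rough sketch, the packing and unpacking can be written out by hand. The `FOURCC` macro and `fourcc_to_string` helper are hypothetical names, and the byte order shown (first character in the most significant byte) is an assumption based on how common compilers evaluate such constants.

```c
#include <stdint.h>

/* Pack four characters into a 32-bit code, first character in the
   most significant byte (assumed layout, see the note above). */
#define FOURCC(a, b, c, d) \
    (((uint32_t)(uint8_t)(a) << 24) | ((uint32_t)(uint8_t)(b) << 16) | \
     ((uint32_t)(uint8_t)(c) << 8)  |  (uint32_t)(uint8_t)(d))

/* Hypothetical helper: unpack a code into a NUL-terminated string,
   which is handy when logging an unknown format ID. */
static void fourcc_to_string(uint32_t code, char out[5])
{
    out[0] = (char)((code >> 24) & 0xFF);
    out[1] = (char)((code >> 16) & 0xFF);
    out[2] = (char)((code >> 8) & 0xFF);
    out[3] = (char)(code & 0xFF);
    out[4] = '\0';
}
```

Under that assumption, `FOURCC('l', 'p', 'c', 'm')` evaluates to 0x6C70636D, and passing that value to `fourcc_to_string` yields the string "lpcm".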
mFormatFlags
Flags that describe the format of the data carried in an AudioBufferList.
kAudioFormatFlagIsFloat = (1U << 0), // 0x1
kAudioFormatFlagIsBigEndian = (1U << 1), // 0x2
kAudioFormatFlagIsSignedInteger = (1U << 2), // 0x4
kAudioFormatFlagIsPacked = (1U << 3), // 0x8
kAudioFormatFlagIsAlignedHigh = (1U << 4), // 0x10
kAudioFormatFlagIsNonInterleaved = (1U << 5), // 0x20
kAudioFormatFlagIsNonMixable = (1U << 6), // 0x40
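kAudioFormatFlagIsFloat
Set if the samples are floating point. If clear, the samples are integers.
kAudioFormatFlagIsBigEndian
Set if the samples are big endian. If clear, they are little endian.
kAudioFormatFlagIsSignedInteger
Set if the samples are signed integers. If clear, they are unsigned integers. This flag is only meaningful when kAudioFormatFlagIsFloat is clear.
kAudioFormatFlagIsPacked
Set if the sample bits occupy all of the bits available for the channel. If they do not, the sample is high-aligned or low-aligned within the channel.
Even when this flag is clear, it is implied to be set if the fields satisfy ((mBitsPerChannel / 8) * mChannelsPerFrame) == mBytesPerFrame.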
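The implied-packed rule can be expressed as a small check. This is only a sketch of the relationship quoted above, not an API from Core Audio: `is_effectively_packed` and `FLAG_IS_PACKED` are hypothetical names, and the flag value is copied from the list of flags.

```c
#include <stdbool.h>
#include <stdint.h>

#define FLAG_IS_PACKED (1U << 3)   /* mirrors kAudioFormatFlagIsPacked */

/* Hypothetical helper: true if the packed flag is set explicitly,
   or implied because the samples exactly fill each frame. */
static bool is_effectively_packed(uint32_t format_flags,
                                  uint32_t bits_per_channel,
                                  uint32_t channels_per_frame,
                                  uint32_t bytes_per_frame)
{
    if (format_flags & FLAG_IS_PACKED)
        return true;
    return (bits_per_channel / 8) * channels_per_frame == bytes_per_frame;
}
```

With no flags set, 16-bit stereo at 4 bytes per frame counts as packed, because 2 * 2 == 4. 24-bit stereo samples stored in 32-bit containers (8 bytes per frame) do not, because 3 * 2 != 8.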
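kAudioFormatFlagIsNonInterleaved
Set if the data is non-interleaved (planar). If clear, the data is interleaved.
Audio data can be laid out in an interleaved or a planar layout. Taking two-channel audio as an example, the data can be arranged in two ways:
- Interleaved layout: LRLRLR...
- Planar layout:
- Plane 1: LLLLLL...
- Plane 2: RRRRRR...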
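The difference between the two layouts is easiest to see by converting one into the other. The following is a minimal sketch assuming 16-bit stereo samples; `deinterleave_stereo` is a hypothetical helper written for illustration, not a Core Audio function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: split interleaved stereo (LRLRLR...) into two planes.
   `interleaved` holds 2 * frames samples; `left` and `right` hold `frames` each. */
static void deinterleave_stereo(const int16_t *interleaved, size_t frames,
                                int16_t *left, int16_t *right)
{
    for (size_t i = 0; i < frames; i++) {
        left[i]  = interleaved[2 * i];      /* channel 0 of frame i */
        right[i] = interleaved[2 * i + 1];  /* channel 1 of frame i */
    }
}
```

In the interleaved buffer, frame i occupies indices 2*i and 2*i + 1. In the planar form, frame i is simply index i in each plane.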
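mChannelsPerFrame
The number of channels in each frame of audio data: 1 for mono, 2 for stereo. This value must not be 0.
mBitsPerChannel
The number of bits in each audio sample (1 byte = 8 bits). Typical values are 8, 16, and 32.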
mBytesPerFrame
The number of bytes in each audio frame.
How to calculate it:
- Interleaved layout: mBytesPerFrame = mBitsPerChannel / 8 * mChannelsPerFrame
- Planar layout: mBytesPerFrame = mBitsPerChannel / 8
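Both cases can be folded into one small function. This is a sketch that assumes packed linear PCM with a bit depth that is a multiple of 8; `pcm_bytes_per_frame` is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: mBytesPerFrame for packed linear PCM. */
static uint32_t pcm_bytes_per_frame(uint32_t bits_per_channel,
                                    uint32_t channels_per_frame,
                                    bool non_interleaved)
{
    uint32_t bytes_per_sample = bits_per_channel / 8;
    /* In the planar layout each buffer carries a single channel,
       so a frame within a buffer is just one sample. */
    return non_interleaved ? bytes_per_sample
                           : bytes_per_sample * channels_per_frame;
}
```

For 16-bit stereo this yields 4 bytes per frame when interleaved and 2 bytes per frame when planar.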
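mFramesPerPacket
The number of frames in each packet. For uncompressed audio, the value is 1. For variable bit rate formats, the value is a larger fixed number, such as 1024 for AAC. For formats with a variable number of frames per packet (such as Ogg Vorbis), set this field to 0.
mBytesPerPacket
The number of bytes in each packet. For formats with constant-size packets, such as linear PCM, mBytesPerPacket = mBytesPerFrame * mFramesPerPacket. To indicate a variable packet size, set this field to 0.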
mReserved
Pads the structure out to force an even 8-byte alignment. Must be set to 0.
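Putting the fields together, a description of interleaved, packed, signed-integer linear PCM might be filled out as sketched below. Everything prefixed with `Sketch` or `SKETCH_` is a local stand-in so that the example builds without the Core Audio headers, and `make_interleaved_pcm` is a hypothetical helper. On Apple platforms you would use the real AudioStreamBasicDescription type and the kAudioFormat... constants instead.

```c
#include <stdint.h>

/* Local stand-in mirroring AudioStreamBasicDescription (assumption:
   Float64 is a double and the other fields are 32-bit unsigned). */
typedef struct {
    double   mSampleRate;
    uint32_t mFormatID;
    uint32_t mFormatFlags;
    uint32_t mBytesPerPacket;
    uint32_t mFramesPerPacket;
    uint32_t mBytesPerFrame;
    uint32_t mChannelsPerFrame;
    uint32_t mBitsPerChannel;
    uint32_t mReserved;
} SketchStreamDescription;

#define SKETCH_FORMAT_LPCM     0x6C70636DU  /* the characters 'l' 'p' 'c' 'm' */
#define SKETCH_FLAG_SIGNED_INT (1U << 2)    /* mirrors kAudioFormatFlagIsSignedInteger */
#define SKETCH_FLAG_PACKED     (1U << 3)    /* mirrors kAudioFormatFlagIsPacked */

/* Hypothetical helper: describe interleaved, packed, signed-integer PCM.
   The big-endian flag is left clear, so samples are little endian. */
static SketchStreamDescription make_interleaved_pcm(double sample_rate,
                                                    uint32_t channels,
                                                    uint32_t bits_per_channel)
{
    SketchStreamDescription d = {0};
    d.mSampleRate       = sample_rate;
    d.mFormatID         = SKETCH_FORMAT_LPCM;
    d.mFormatFlags      = SKETCH_FLAG_SIGNED_INT | SKETCH_FLAG_PACKED;
    d.mChannelsPerFrame = channels;
    d.mBitsPerChannel   = bits_per_channel;
    d.mBytesPerFrame    = bits_per_channel / 8 * channels;
    d.mFramesPerPacket  = 1;  /* uncompressed audio: one frame per packet */
    d.mBytesPerPacket   = d.mBytesPerFrame * d.mFramesPerPacket;
    d.mReserved         = 0;  /* must always be 0 */
    return d;
}
```

Calling `make_interleaved_pcm(44100.0, 2, 16)` produces a description with 4 bytes per frame, 1 frame per packet, and 4 bytes per packet.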