[Work] Audio Compression

CD-quality audio requires a transmission bandwidth of 1.411 Mbps (44,100 samples per second × 16 bits × 2 channels). Substantial compression is clearly needed before sending it over a network becomes practical. Many audio compression algorithms have been developed, and MPEG audio is probably the most popular. It comes in three layers, of which MP3 (layer 3) is the most powerful and the best known. A great deal of music in MP3 format is available on the Internet. Not all of it was obtained legally, which has led to numerous lawsuits from artists and copyright holders. MP3 is the audio part of the MPEG video compression standard.

Audio compression can be done in two ways. In waveform coding, the signal is transformed by a Fourier transform into its frequency components. The amplitude of each component is then encoded as compactly as possible, the goal being to reproduce the waveform accurately at the other end in as few bits as possible.

The other approach is perceptual coding. It exploits flaws in the human auditory system to encode the signal so that a listener hears no difference, even though the replayed waveform looks quite different on an oscilloscope. Perceptual coding rests on psychoacoustics, the study of how people perceive sound. MP3 is based on perceptual coding.

The key property of perceptual coding is that some sounds can mask others. Imagine listening to a live broadcast of a flute concert on a warm summer day. Suddenly a crew of workers nearby starts up their jackhammers and begins tearing up the street. Nobody can hear the flute any more; its sound has been masked by the jackhammers. For transmission purposes it is now enough to encode only the frequency band occupied by the jackhammers, because listeners can no longer hear the flute. This is called frequency masking: a loud sound at one frequency hides a quieter sound at another frequency.
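To make the idea of waveform coding concrete, the sketch below splits a short test signal into its frequency components with a direct discrete Fourier transform. It is an illustration under stated assumptions only: the 8 kHz sample rate, the 64-sample window, the two tone frequencies and amplitudes, and the function name `dft_amplitudes` are all invented for this example, and a practical coder would use a fast transform rather than this O(N²) loop.

```python
import cmath
import math


def dft_amplitudes(samples):
    """Amplitude of each frequency component (bins 0..N/2) of a real signal."""
    n = len(samples)
    amplitudes = []
    for k in range(n // 2 + 1):
        # Correlate the signal with a complex sinusoid at bin k.
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(samples))
        # Scale so that a sinusoid of amplitude A in bin k reports A.
        one_sided = k == 0 or (n % 2 == 0 and k == n // 2)
        amplitudes.append(abs(coeff) * (1 if one_sided else 2) / n)
    return amplitudes


# Hypothetical test signal: a loud 500 Hz tone plus a quiet 1250 Hz tone.
SAMPLE_RATE = 8000  # Hz (assumed); with N = 64 each bin is 125 Hz wide
N = 64
signal = [1.0 * math.sin(2 * math.pi * 500 * t / SAMPLE_RATE)
          + 0.05 * math.sin(2 * math.pi * 1250 * t / SAMPLE_RATE)
          for t in range(N)]

amplitudes = dft_amplitudes(signal)

# Keep only the components that are not numerically zero.
components = {k * SAMPLE_RATE / N: a
              for k, a in enumerate(amplitudes) if a > 1e-6}

for freq, amp in components.items():
    print(f"{freq:7.1f} Hz  amplitude {amp:.3f}")
```

Here 64 samples are summarized by two (frequency, amplitude) pairs, which is the sense in which encoding the components can take fewer bits than encoding the waveform sample by sample.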
The quieter sound would have been audible had the loud sound not been present. In fact, the flute stays inaudible even for a short time after the jackhammers stop, because the ear turned down its gain when they started and needs some time to turn it back up. This effect is called temporal masking.

To make these effects more quantitative, consider Experiment 1. In a quiet room, a person wears headphones connected to a computer's sound card. The computer generates a pure 100 Hz sine wave at low power and slowly increases the power. The person has been told to press a key on hearing the tone. The computer records the power level at that moment and then repeats the experiment at 200 Hz, 300 Hz, and other frequencies up to the limit of human hearing. Averaging over many runs yields a log-log plot like Figure 20.1, showing how much power a single tone needs in order to be heard. One conclusion follows directly from this curve: there is never any need to encode frequency components whose power lies below the audibility threshold. For example, if the 100 Hz signal in Figure 20.1 has a power of 20 dB, it can be left out of the output with no perceptible loss of quality, because 20 dB at 100 Hz is below the audible level.

Now consider Experiment 2. The computer runs Experiment 1 again, this time with a fixed-amplitude sine wave superimposed on the test tone. We find that the audibility threshold for signals near 150 Hz is raised. The consequence is that by tracking which signals are masked by stronger signals in nearby frequency bands, we can leave even more frequency components out of the encoded signal and so save bits.
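The two experiments amount to a simple decision rule: keep a component only if its power reaches the audibility threshold, and raise that threshold near a strong masker. The sketch below applies the rule to a handful of frequencies. Every number in it is an illustrative assumption, not a value read from Figure 20.1 or from measured data: the threshold table, the 150 Hz masker at 60 dB, the linear fall-off, and the names `QUIET_THRESHOLD_DB`, `MASK_WIDTH_HZ`, and `MASK_DROP_DB` were chosen so that the outcome mirrors the text (a 100 Hz tone at 20 dB is inaudible even in quiet, and a 125 Hz tone is lost once the masker is present).

```python
# Illustrative threshold-in-quiet values (dB) at a few test frequencies (Hz).
# These are assumptions for the sketch, not measured data.
QUIET_THRESHOLD_DB = {100: 30.0, 125: 25.0, 150: 22.0, 200: 18.0, 300: 12.0}

MASK_WIDTH_HZ = 100.0  # assumed: the masker affects tones closer than this
MASK_DROP_DB = 10.0    # assumed: raised threshold peaks this far below the masker


def audibility_threshold(freq_hz, masker=None):
    """Threshold at freq_hz, optionally raised by a (freq_hz, power_db) masker."""
    quiet = QUIET_THRESHOLD_DB[freq_hz]
    if masker is None:
        return quiet
    masker_hz, masker_db = masker
    distance = abs(freq_hz - masker_hz)
    if distance >= MASK_WIDTH_HZ:
        return quiet
    # Assumed shape: the raised threshold falls off linearly with distance.
    raised = (masker_db - MASK_DROP_DB) * (1.0 - distance / MASK_WIDTH_HZ)
    return max(quiet, raised)


def audible_components(power_db, masker=None):
    """Frequencies whose power reaches the (possibly raised) threshold."""
    return sorted(f for f, p in power_db.items()
                  if p >= audibility_threshold(f, masker))


# Hypothetical spectrum: power in dB at each frequency.
spectrum = {100: 20.0, 125: 35.0, 150: 60.0, 200: 40.0, 300: 15.0}

in_quiet = audible_components(spectrum)                          # Experiment 1 rule
with_masker = audible_components(spectrum, masker=(150, 60.0))   # Experiment 2 rule

print("kept using the quiet threshold only:", in_quiet)
print("kept once the masker is modelled:   ", with_masker)
```

With these made-up values the quiet threshold alone removes the 100 Hz component, and modelling the masker additionally removes the 125 Hz component, so fewer components need to be encoded.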
In Figure 20.1, the 125 Hz signal can be omitted from the output entirely and nobody will hear the difference. Even after a strong signal in some band has stopped, temporal masking allows the masked frequencies to be omitted for a further recovery period. The core of the MP3 algorithm is to use a Fourier transform to obtain the power of the sound at each frequency, and then to output only the unmasked frequencies, encoded in as few bits as possible.

With this background, we can look at how MP3 encoding is done. The waveform is sampled at 32 kHz, 44.1 kHz, or 48 kHz. Sampling can be mono or two-channel, in one of the following configurations:

1. Mono
2. Dual mono
3. Disjoint (non-joint) stereo
4. Joint stereo

First, the output bit rate is chosen. MP3 can compress a rock CD down to 96 kbps (roughly a 15:1 reduction from 1.411 Mbps) with almost no perceptible loss of quality, even to rock fans. For a piano concert, at least 128 kbps is needed. The two rates differ because the signal-to-noise ratio of rock music is far higher than that of a piano concert. Lower output rates can also be chosen, at the cost of some loss in quality.

The samples are then processed in groups of 1152 (about 26 ms of audio at 44.1 kHz). Each group first passes through 32 digital filters, yielding 32 frequency bands. At the same time, the input is fed into a psychoacoustic model to determine which frequencies are masked. Next, each of the 32 bands is transformed further to obtain finer spectral resolution. The available bits are then divided among the bands: unmasked bands with high spectral power receive more bits, unmasked bands with low spectral power receive fewer, and fully masked bands receive none. Finally, the data are encoded with Huffman coding, which assigns short codewords to values that occur frequently and long codewords to values that occur rarely.
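The Huffman step can be illustrated with a minimal sketch that builds a prefix code from symbol counts. The input values are invented stand-ins for quantized spectral data, and the function names `huffman_code` and `huffman_decode` are hypothetical. MP3 itself selects from fixed code tables defined in the standard, so building a code from the data, as done here, shows the principle only.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Build a prefix code in which frequent symbols get short bit strings."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code built so far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees, prefixing their codes with 0 and 1.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def huffman_decode(bits, code):
    """Decode a bit string; works because no codeword is a prefix of another."""
    reverse = {c: s for s, c in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:
            out.append(reverse[current])
            current = ""
    return out


# Hypothetical quantized values: small magnitudes dominate.
values = [0] * 8 + [1] * 4 + [-1] * 2 + [2, 3]

code = huffman_code(values)
encoded = "".join(code[v] for v in values)

print(code)
print(len(encoded), "bits with Huffman coding,",
      3 * len(values), "bits at a fixed 3 bits per value")
```

In this example the most common value gets a 1-bit codeword and the two rarest get 4-bit codewords, which is the short-for-frequent, long-for-rare assignment the text describes.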
There is in fact more to it than this. Various other techniques are used for noise reduction, antialiasing, and exploiting redundancy between channels, but these are beyond the scope of this book.
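Returning to the bit-allocation step described earlier in this section, one simple way to picture it is a greedy loop that hands out bits one at a time to the band whose quantization noise would be most audible. The sketch below is a toy under loud assumptions, not the allocation procedure of the MP3 standard: each band's need is expressed as a signal-to-mask ratio in dB (positive means unmasked), each extra bit is taken to lower quantization noise by about 6 dB, only four bands are used instead of 32, and the name `allocate_bits` and all input values are hypothetical.

```python
DB_PER_BIT = 6.02  # each extra quantizer bit lowers noise by roughly 6 dB


def allocate_bits(signal_to_mask_db, budget):
    """Greedily give bits to the band whose noise is furthest above its mask."""
    bits = [0] * len(signal_to_mask_db)
    # need[i] > 0 means noise in band i would still be audible.
    need = list(signal_to_mask_db)
    for _ in range(budget):
        worst = max(range(len(need)), key=lambda i: need[i])
        if need[worst] <= 0:
            break  # remaining bands are masked or already clean
        bits[worst] += 1
        need[worst] -= DB_PER_BIT
    return bits


# Hypothetical per-band signal-to-mask ratios in dB.
# Band 2 is fully masked (negative), so it should receive no bits.
smr_db = [30.0, 12.0, -5.0, 3.0]

allocation = allocate_bits(smr_db, budget=8)
print(allocation)
```

The loud unmasked band ends up with the most bits, the weaker unmasked bands with fewer, and the masked band with none, matching the qualitative rule in the text. A larger budget changes nothing here, because the loop stops once no band has audible noise left.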
