Audio Compression

CD-quality audio requires a transmission bandwidth of 1.411 Mbps (44,100 samples/sec x 16 bits x 2 channels). Clearly, substantial compression is needed to make transmission over a network practical. Many audio compression algorithms have been developed; MPEG audio is perhaps the most popular. MPEG audio has three layers, of which MP3 (MPEG audio layer 3) is the most effective and the best known. Large amounts of music in MP3 format are available on the Internet. Not all of it was obtained legally, which has led to numerous lawsuits from artists and copyright owners. MP3 is the audio portion of the MPEG video compression standard.

Audio compression can be done in two ways. In waveform coding, the signal is transformed mathematically by a Fourier transform into its frequency components. The amplitude of each component is then encoded in a minimal way, the goal being to reproduce the waveform accurately at the other end in as few bits as possible. The other way is perceptual coding. It exploits flaws in the human auditory system to encode a signal so that it sounds the same to a human listener, even if the replayed waveform looks quite different on an oscilloscope. Perceptual coding is based on psychoacoustics, the study of how people perceive sound. MP3 is based on perceptual coding.

The key property of perceptual coding is that some sounds can mask other sounds. Imagine that you are listening to a live broadcast of a flute concert on a warm summer day. Suddenly, a crew of workers nearby turn on their jackhammers and start tearing up the street. No one can hear the flute any more; its sound has been masked by the jackhammers.
From a transmission standpoint, it is now sufficient to encode just the frequency band used by the jackhammers, because the listeners cannot hear the flute anyway. This is called frequency masking: a loud sound in one frequency band hides a softer sound in another frequency band that would have been audible in the absence of the loud sound. In fact, even after the jackhammers stop, the flute remains inaudible for a short time, because the ear turned down its gain when the jackhammers started and needs some time to turn it up again. This effect is called temporal masking.

To make these effects more quantitative, imagine experiment 1. A person in a quiet room puts on headphones connected to a computer's sound card. The computer generates a pure sine wave at 100 Hz at low but gradually increasing power. The person is told to strike a key as soon as the tone becomes audible. The computer records the current power level and then repeats the experiment at 200 Hz, 300 Hz, and other frequencies up to the limit of human hearing. After averaging over many runs, a log-log graph of how much power a single tone needs in order to be audible looks like Fig. 20.1(a).

A direct consequence of this curve is that it is never necessary to encode frequency components whose power falls below the threshold of audibility. For example, if the power at 100 Hz were 20 dB in Fig. 20.1(a), that component could be omitted from the output with no perceptible loss of quality, because 20 dB at 100 Hz falls below the level of audibility.

Now consider experiment 2. The computer runs experiment 1 again, but this time with a constant-amplitude sine wave superimposed on the test frequency. We find that the threshold of audibility for signals near 150 Hz is raised.
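The threshold test from experiment 1 can be sketched in a few lines of Python. Since Fig. 20.1(a) is not reproduced here, this sketch substitutes Terhardt's widely used closed-form approximation of the threshold in quiet. That formula is an assumption brought in for illustration, not something the text specifies, and it is also assumed that its dB scale is comparable to the one in the figure. The names threshold_in_quiet_db and is_audible are hypothetical.

```python
import math


def threshold_in_quiet_db(freq_hz: float) -> float:
    """Approximate threshold of audibility for a pure tone, in dB.

    Assumption: Terhardt's curve fit stands in for the measured
    curve of Fig. 20.1(a); it is not taken from the text.
    """
    khz = freq_hz / 1000.0
    return (
        3.64 * khz ** -0.8                          # steep rise toward low frequencies
        - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)   # dip where the ear is most sensitive
        + 1e-3 * khz ** 4                           # rise toward high frequencies
    )


def is_audible(freq_hz: float, level_db: float) -> bool:
    """True if a tone at this level lies above the threshold in quiet."""
    return level_db > threshold_in_quiet_db(freq_hz)


if __name__ == "__main__":
    # The example from the text: 20 dB at 100 Hz.
    print(round(threshold_in_quiet_db(100.0), 1), is_audible(100.0, 20.0))
```

Under this approximation the threshold at 100 Hz comes out at about 23 dB, so a 20 dB component at 100 Hz is classified as inaudible, which agrees with the example in the text. Modeling experiment 2 would additionally require a masking curve around the masker frequency, which this sketch does not attempt.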
This observation has the following consequence: by keeping track of which signals are masked by stronger signals in nearby frequency bands, we can omit more frequency components from the encoded signal and thus save bits. In Fig. 20.1, the 125 Hz signal can be left out of the output entirely and no one will hear the difference. Even after a strong signal in some frequency band stops, temporal masking allows the masked frequencies to be omitted for a further interval while the ear recovers. The essence of MP3 is to Fourier-transform the sound to get the power at each frequency and then output only the unmasked frequencies, encoding them in as few bits as possible.

With this background, we can now look at how MP3 encoding is done. The waveform is sampled at 32 kHz, 44.1 kHz, or 48 kHz. Sampling can be done on one or two channels, in one of the following configurations:

1. Mono
2. Dual mono
3. Non-joint stereo
4. Joint stereo

First, the output bit rate is chosen. MP3 can compress a rock CD down to 96 kbps with almost no perceptible loss in quality, even for rock fans. For a piano concert, at least 128 kbps is needed. The two rates differ because the signal-to-noise ratio of rock music is much higher than that of a piano concert. Lower output rates can also be chosen, at the cost of some loss in quality.

After that, the samples are processed in groups of 1152. Each group is first passed through 32 digital filters to get 32 frequency bands. At the same time, the input is fed into a psychoacoustic model to determine the masked frequencies. Next, each of the 32 frequency bands is transformed further to obtain a finer spectral resolution.
In the next step, the available bits are divided among the bands: unmasked bands with high spectral power get more bits, unmasked bands with low spectral power get fewer bits, and completely masked bands get no bits at all. Finally, the data are encoded using Huffman coding, which assigns short codewords to values that occur frequently and long codewords to values that occur infrequently.

There is actually more to the story. Various techniques are also used for noise reduction, antialiasing, and exploiting the redundancy between channels, but these are beyond the scope of this book.
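The Huffman coding step mentioned above can be illustrated with a minimal, generic sketch. It builds a code from symbol counts in the input; it is not the MP3 bitstream format, which is not described in the text. The function name huffman_code and the sample list of quantized values are hypothetical.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Return a dict mapping each distinct symbol to a binary codeword string."""
    counts = Counter(symbols)
    if not counts:
        return {}
    if len(counts) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(counts)): "0"}

    # Heap entries are (count, tie_breaker, {symbol: partial codeword}).
    # The unique tie_breaker keeps Python from ever comparing the dicts.
    heap = [(n, i, {s: ""}) for i, (s, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)

    while len(heap) > 1:
        # Merge the two least frequent subtrees; prefix their codewords
        # with 0 and 1 so the result stays prefix-free.
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1

    return heap[0][2]


if __name__ == "__main__":
    # Hypothetical quantized values: 0 is common, 3 is rare.
    quantized = [0] * 8 + [1] * 4 + [-1] * 2 + [3]
    codes = huffman_code(quantized)
    total_bits = sum(len(codes[v]) for v in quantized)
    print(codes, total_bits)
```

In this example the most frequent value, 0, receives a 1-bit codeword and the rarest value, 3, receives a 3-bit codeword, so the 15 values take 25 bits instead of the 30 bits a fixed 2-bit code would need.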
