Human Ears

Scottsdale, AZ USA Photo by Yuichi

This page discusses the music reproduction process, including the acoustic properties of human ears, the functions of the human brain and, finally, a layer system of music reproduction. There is no consensus yet about a layer system for the music reproduction process in the listening room. Therefore, this is my own thought and my proposal. There are many misunderstandings about “Music Reproduction”. Some say, “This cable opens a new horizon of sound!” Others say, “It stays the same, it is just nice-to-have stuff.” Some groups of people say that “a tube amplifier is absolutely better than a solid state one”, and many other opinions exist in this world. To understand “Music Reproduction”, I start from the properties of human ears, then I explain how the sound signal is treated in the brain. Finally, I propose the layer system to reproduce sound and music in our brain.

Property of human ears

First, we have to understand the properties of human ears. Ears, as well as eyes, are among the most important input devices to the brain. We also sense sound pressure through our skin. However, here we discuss ears as an input device to the brain.

Sound is transmitted from a sound source to our ears through the air. The ear guides the incoming sound wave into the external auditory canal, which provides gain. Then the tympanic membrane is vibrated by the sound pressure. The mechanical vibration is transmitted to three tiny bones. The vibration of the bones is then transferred to the cochlea, and then to the cochlear nerves. The signals are transmitted to the brain.

This is the famous Fletcher-Munson curve. It shows how much SPL is needed in each frequency band to hear the same loudness (phon) as at 1 kHz. This means that, to obtain the same ear output in the low frequency domain, we need a higher SPL than at 1 kHz. It also means that we need more speaker energy at lower frequencies when the phon level is lower.
Data from Wikipedia
This curve comes from more recent research. The data is from ISO226, updated in 2003. The difference between higher and lower phon levels in the low frequency domain is much larger than in the Fletcher-Munson curve. ISO defines these curves as the “Equal Loudness Curve”. So we refer to this curve when we compensate for the “Equal Loudness” in a listening room.
Compensation curves are shown. High frequency compensation is not needed because the differences between the curves are almost the same at every phon level. Therefore, in practice, low frequency compensation is applied to make the sound more real.
Data from Wikipedia (ISO226-2003); data from Yuichi's article in MJ

This is the horizontal directivity of human ears at 8 kHz. The curve for one side (right) is mirrored to generate the curve for the other side (left), and the two are combined for better visibility on this chart. The width of the face is neglected. The horizontal directivity is generally fair at all angles.

This is the vertical directivity of human ears, also at 8 kHz. The curve for one side (right) is mirrored to generate the curve for the other side (left), and the two are combined for better visibility on this chart. The width of the face is neglected. Sensitivity is almost flat in the upward direction, but it is poor in the downward direction because of the body.
Data: from Hirotake Yoshizawa and Makoto Namekata,
Gunma Sangyou Gijutu Center 2004
I am not so happy to see this data. But unfortunately the data was taken by researchers, and now it is an ISO standard. Anyway, people get old, so high frequency sensitivity goes down. This is the data for males. Surprisingly enough, degradation starts from 2 kHz, and from age 40. This is the same data for females. The high frequency sensitivity of ladies' ears also goes down, but not as much as in the male case. Ladies' ears keep better sensitivity than gents'. My God!
Edited from Data: ISO-7029

This is the STI (Speech Transmission Index) and a classification of the ease of speech recognition. The X axis shows reverberation time (sec) and the Y axis is the grade of ease (STI). ISO defines five STI classes: “Excellent”, “Good”, “Fair”, “Poor” and “Bad”.
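
The five classes can be written as a small lookup. This is only a sketch: the band edges (0.30, 0.45, 0.60, 0.75) are the commonly quoted ones and are my assumption here, since they are not read from the chart, and the function name `sti_class` is made up for this illustration.

```python
def sti_class(sti):
    """Map an STI value (0.0 to 1.0) to one of the five classes.

    The band edges are commonly quoted values and are an assumption here;
    check the standard before relying on them.
    """
    if not 0.0 <= sti <= 1.0:
        raise ValueError("STI must be between 0.0 and 1.0")
    if sti < 0.30:
        return "Bad"
    if sti < 0.45:
        return "Poor"
    if sti < 0.60:
        return "Fair"
    if sti < 0.75:
        return "Good"
    return "Excellent"
```

With these edges, a room measured at an STI of 0.5 would fall in the “Fair” class.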

This data shows how the STI shifts with age. My God, again! Aging of human ears degrades speech recognition capability. Therefore, older people have to listen to music or vocals in a noise-free environment. Otherwise, they get frustrated.
Data from Yuichi's article in MJ Magazine, ISO and AIST No.26
Data: Edited from AIST No.26 presentation

The data above is from statistics of human ears. Needless to say, the performance and properties depend on each individual person: on individual nature, past training, the environment where he/she has lived, etc. It is not difficult to find out our own frequency sensitivity by a conventional method. We can use a sine wave generator from a free download site and compare what we hear with the microphone output. Please!!! Start from ZERO volume on the amplifier and use a higher quality speaker. Then check your sensitivity at 5 kHz, 10 kHz and 15 kHz.
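
If you prefer to make the test tones yourself instead of downloading a generator, a few lines of Python are enough. This is a minimal sketch using only the standard library; the function name `write_sine_wav`, the file names and the default amplitude of 10% of full scale are my assumptions, not part of any particular tool.

```python
import math
import struct
import wave


def write_sine_wav(path, freq_hz, seconds=3.0, amplitude=0.1, sample_rate=44100):
    """Write a mono 16-bit WAV file containing a sine tone.

    amplitude is a fraction of full scale (0.0 to 1.0). It is small by
    default so that the tone starts quietly.
    """
    if not 0.0 <= amplitude <= 1.0:
        raise ValueError("amplitude must be between 0.0 and 1.0")
    if freq_hz >= sample_rate / 2:
        raise ValueError("frequency must be below half the sample rate")
    n_samples = int(seconds * sample_rate)
    peak = int(amplitude * 32767)
    frames = bytearray()
    for n in range(n_samples):
        value = int(peak * math.sin(2.0 * math.pi * freq_hz * n / sample_rate))
        frames += struct.pack("<h", value)  # little-endian signed 16-bit sample
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))
    return n_samples
```

For example, calling `write_sine_wav("tone_5000Hz.wav", 5000)`, then the same for 10000 and 15000, gives the three test files. The same warning applies: start from ZERO volume on the amplifier.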

Property of Human brain

Next, we try to understand the human brain, the cerebrum in this case, which is the most mysterious space in our body. Here I only try to look at its properties from the point of view of music reproduction. I try to explain it in a very simple way.

The picture of the cerebrum shows a basic flow diagram from the sound signal input, which is the output of the ears, to the output of the brain. In parallel, knowledge, visual images and other things related to the music are also input to the brain.

(1) The output signal from the ears is transmitted to the cochlear nerves. These nerves are parallel lines, and they feed the signals to the auditory cortex in the cerebral cortex, frequency band by frequency band. At this point the transmitted data is still “Sound data”, not yet “Music data”.
(2) The signal is now brought to filters. This function eliminates noise. On top of that, some sound information on topics that are not of interest is also eliminated.
(3) Then the selected sound information is transmitted to the hippocampus. The hippocampus is a temporary memory, or working memory, that keeps a sound or a series of sounds for a certain period of time. The contents of the temporary memory disappear soon, maybe in an hour or so.

However, we can recognize the music scenario here, so it may also have some function to assemble music from a series of sounds.
(4) Information that made a strong impression, or was memorized repeatedly in the hippocampus, is now transmitted to the long term memory, or non-volatile memory area. A lot of information is already stored in this memory, including his/her knowledge, past experiences, happy/sad feelings, visual and sound images, music images, unique values, etc.

New ideas, new findings or new values are generated from new combinations of those pieces of information.
(5) Output from the memory triggers the left cortex and the right cortex. Information from the hippocampus may also hit the cortexes directly.
(6) The right cortex feels excitement, including positive and negative feelings. When positive stimuli are added, people feel hope, happiness and vivid feelings. On the other hand, people feel fear or frustration when negative stimuli are given.
(7) The left cortex handles logical functions. For example, if the frequency response of a speaker system is flat, it decides that the reproduced sound must be good.
(8) Memorized information does not stay the same. The brain always fetches a memory, processes it and then returns it. The information is changed a little bit. Most memories of the old days become nicer. We forget some unwanted ones.
(9) There is a common database for all humans. Most people do not like certain noises or sounds, such as glass rubbing. Almost all people have some favorite sounds or tempos. These influence the whole decision making process in the brain. Therefore, individuals can share the same feelings through this common database.
(10) Finally, the cerebrum generates control and/or activation signals to the body.
The rules of brain operation related to the music reproduction process are as follows.

(A) : In general, the brain tries to reach a conclusion in the most economical way, whether or not the conclusion is true. This is because the brain has to finish the current job as soon as possible and make itself ready for the next urgent job(s), which may come soon.

(B) : The brain always likes to stay in the most stable and comfortable situation. When the brain has an unsolved problem or an unhappy situation, this is an unstable condition. So the brain tries to solve the problem as soon as possible. Sometimes the brain reaches a temporary solution and is happy with it, even if the solution does not eliminate the real cause.
ex. When someone has bought very expensive audio cables, he/she tries to believe that the sound must be very good and feels satisfaction, even if the physical sound stays exactly the same as before. This shortcut to the conclusion comes from behaviors (A) and (B) above. “OK, no doubt! I am satisfied. My investment is fully justified.”

(C) : Visual information is superior to auditory information in most cases. When we listen to music, we also see the speaker system, the amplifiers and the interior of the room with our eyes. We have already memorized beautiful photos, good catalogue data about amplifiers or speakers, and maybe gold plated terminals, cables, etc. This visual information greatly influences the sound or the music in the brain, not through the ears. It always happens. Good restaurants serve not only good food but also a good atmosphere, including plates and decoration with candles.

(D) : Re-generation of memory

Even if there is no input from input devices such as ears, eyes, nose or skin, the brain generates output independently of the input. An older person does not physically sense higher frequency sound. However, he/she has music memories stored a long time ago, when he/she was young. So, in the listening room now, they may hear only 70% of all the information, but they can hear 100% in the brain.

(E) : 24H365D operation

The brain works 24 hours a day, 365 days a year. It does not sleep. When we are in bed, the brain fetches data from the memory and returns it. It may modify the data, add something, or combine it with other data in the memory. Therefore, what we remember is not always the same as before.

Sound Reproduction Layers

The chart on the left describes the process from “Music Creation” to “Music Reproduction”. First, some impression exists. Then (a) musician(s) tries to express his/her impression by sound or music. There are many people involved in the process. However, in the end these activities generate a series of sounds. The sound is captured by microphone(s) and recorded on the master tape or HDD. After some editing process, the sound is pressed onto a recording medium. The sound is now in the media or, in recent years, on a file server for distribution through the internet.

The sound is now reproduced in the listening room using sophisticated equipment. The listener hears the sound. The sound reaches the listeners' ears and is assembled as music in their brain. Finally, they try to reproduce the original impression. However, it is impossible to do this 100%, because they are not the same person as the musician.

So, they try to create their own impression or music based on their interpretation.
This series of processes is realized by a layer system, as illustrated in this picture. The layers on the left side are for the music creation process. The bottom layer is for the generation of media. This process yields a CD, an LP or digital data for music listeners. The layers on the right side are for the reproduction, or re-creation, of music.

There are some discussions related to the quality of the source sound. If we mix up the quality of the source and the quality of reproduction, we lose our way. Therefore, we first set aside the quality of the source. There are many factors that decide the sound quality between the source data and the sound that reaches our ears. These are mostly equipment quality, including the CD player, amplifiers and speakers, as well as room acoustics and the performance of the ears. The performance of the ears includes not only equal loudness but also our own ear properties, such as frequency response, dynamic range, S/N ratio, etc.

Then the sound captured by the ears is brought to the brain system, and the brain reproduces the original music. However, the problem is that most listeners, except those who heard the live performance, do not know the original sound or music. So the reference may be the "listener’s feelings": “I prefer this” or “I don’t like it” in most cases.

This is the physical process from the recording media to the human ears. In the world of electronic equipment interfaces, the output impedance is usually sufficiently lower than the input impedance of the next stage. Therefore, the next stage does not influence the previous stage.
The loudspeaker and the power amplifier behave differently when the DF (damping factor) is low, as with a tube amp. As of today, the largest contributors to sound quality are the loudspeakers and the listening room. The technology of the other electronic components is well established.
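
The impedance point above can be put into numbers. The sketch below treats every impedance as a plain resistance, which is a simplification, and all the ohm values are hypothetical examples, not measurements. The damping factor is the speaker impedance divided by the amplifier output impedance, and the two impedances form a simple voltage divider.

```python
import math


def damping_factor(speaker_ohms, amp_output_ohms):
    """Damping factor: speaker impedance divided by amplifier output impedance."""
    return speaker_ohms / amp_output_ohms


def level_at_speaker_db(speaker_ohms, amp_output_ohms):
    """Level at the speaker terminals relative to the amplifier's unloaded
    output, in dB. The two impedances form a simple voltage divider."""
    return 20.0 * math.log10(speaker_ohms / (speaker_ohms + amp_output_ohms))


def response_swing_db(z_min_ohms, z_max_ohms, amp_output_ohms):
    """How much the level changes between the speaker's lowest and highest
    impedance points when driven from the given output impedance."""
    return (level_at_speaker_db(z_max_ohms, amp_output_ohms)
            - level_at_speaker_db(z_min_ohms, amp_output_ohms))


# Hypothetical example values, not measurements:
# a speaker whose impedance moves between 4 and 30 ohms with frequency,
# driven from a 0.05 ohm (solid state) and a 2 ohm (tube) output.
for name, z_out in (("solid state", 0.05), ("tube", 2.0)):
    print(f"{name}: DF at 8 ohms = {damping_factor(8.0, z_out):.0f}, "
          f"swing = {response_swing_db(4.0, 30.0, z_out):.2f} dB")
```

With a low damping factor the swing becomes several dB, so the frequency response follows the impedance curve of the loudspeaker. With a high damping factor the swing is negligible.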

Let's start from the bottom. One of the most important factors in obtaining a high quality reproduction system is good physical properties. This belongs to the bottom layer. The quality of everything from the CD player to the loudspeaker system, including the listening room, can be physically measured. The properties include the frequency response, the distortions, the signal to noise ratio, the dynamic range, the reflections from the walls, and the acoustic properties, including the RT60 and others, at the listening location. This layer is for “Sound” reproduction. For this reason, many people share these parameters or properties and improve them, because the reference is relatively clear.

In the middle layer, it gets a little complex. Every individual has different ears and different feelings. In this layer, we can adjust, tune or compensate the sound for equal loudness, for the ear properties, for the acoustics of the listening room and for the loudspeaker properties. There are no physical references. Therefore, the reference may be “I prefer this sound or music” or “I don’t like it”. Compensation can also be done for the original source. A good example is that some individuals boost the lower frequencies for a certain CD because they like it.

The highest layer is much more difficult. This is a kind of creation of the listener's own music, or a reproduction that refers to their old memories and imagination. Sometimes it can have a bad signal to noise ratio. In this case, noise is a part of the music, which recalls 50-60 years ago. The cable matter exists in this layer. Some expensive cables are not effective for physical sound quality. Only a blind test can show whether there is a difference. But some people like to spend money on expensive cables, because they hear better sound in their brain. I do not feel the difference between expensive cables and normal cables by ears, but sometimes I do by the brain. Feelings fluctuate day by day and as time goes by.

The most important thing is that we always start from the bottom layer. Then we go up. If we are satisfied, that is fine. If not, we can go back to the bottom layer at any time. Then we do not get lost.

When we discuss music reproduction quality, we had better clarify whether it is sound quality, music quality or image quality. Those are located in separate layers. Then we can focus the discussion on the same layer. Now we can communicate with each other.

Another headache is the reverberation of a listening room. If there is no reverberation in the room, the reproduced sound is generally not realistic. With current two channel stereo technology, adequate reverberation must be added even in the listening room. The reverberation time is expressed by the so-called RT60, which is the time for the sound level to decay from 0 dB to –60 dB. So, how many seconds? Is it for classical music? Solo? Jazz? Instrumental? Vocal? It is widely said that the time should be less than 1 sec, depending on what he or she likes. This is the most expensive matter in music reproduction, but it is very important.
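
In practice a room's decay can rarely be followed all the way down to –60 dB, because the background noise gets in the way. One common workaround is to fit a straight line to the part of the decay that can be measured and extend it to –60 dB. The sketch below does this with a plain least-squares fit; the function name `rt60_from_decay` and the sample numbers in the usage note are hypothetical, not measured data.

```python
def rt60_from_decay(times_s, levels_db):
    """Estimate RT60 from a measured decay.

    Fits a least-squares straight line to level (dB) versus time (s) and
    returns the time that line needs to fall by 60 dB.
    """
    n = len(times_s)
    if n < 2 or n != len(levels_db):
        raise ValueError("need two or more points and equal-length lists")
    mean_t = sum(times_s) / n
    mean_l = sum(levels_db) / n
    sxx = sum((t - mean_t) ** 2 for t in times_s)
    if sxx == 0:
        raise ValueError("time values must not all be the same")
    sxy = sum((t - mean_t) * (l - mean_l) for t, l in zip(times_s, levels_db))
    slope = sxy / sxx  # dB per second, negative for a decaying sound
    if slope >= 0:
        raise ValueError("levels do not decay over time")
    return -60.0 / slope
```

For example, a made-up decay that drops 10 dB every 0.1 sec, such as `rt60_from_decay([0.0, 0.1, 0.2, 0.3], [0.0, -10.0, -20.0, -30.0])`, falls at 100 dB per second, so the estimate is about 0.6 sec.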
