Understanding IP Communications: Cisco VoIP September 17
Most people understand Voice over IP (VoIP) as making phone calls over the Internet instead of over traditional phone lines. In general terms, this is correct.
This post is going to explain Voice and VoIP further using the Cisco CallManager product line for reference (CallManager 4.2 specifically).
Voice Telephony Fundamentals
The human ear can hear frequencies from roughly 20 Hz to 20,000 Hz, while the human voice generates frequencies from about 300 to 3,300 Hz. So what does this have to do with telephony? Through the standards that grew out of the development of telecommunications and signal processing, the audio frequencies carried by telephones run from 0 to 4,000 Hz, a range that covers the human voice.
A critical component of understanding voice is the Nyquist theorem. Paraphrased, the theorem says that to exactly reconstruct a signal (as in the case of telecommunications), you must sample it at a rate of at least two times the highest frequency it contains. In this case that is two times 4,000 Hz, the maximum audio frequency for telephony: 2 x 4,000 = 8,000 Hz. So how is the sound reconstructed? You take small measurements of x length (for example, 8 bits each), y times per second. This process is called sampling.
Sampling is the process of taking an analog signal, such as the human voice, and measuring its level at a defined number of intervals per second. When those measurements are played back in sequence, the human ear (and mind) cannot distinguish the result from continuous analog speech, provided the sampling is frequent enough. Remember that each individual measurement is recorded as an 8 bit value, taken at the sampling frequency. A visual example is located below.
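The Nyquist arithmetic above can be written out as a small sketch. The function name and constant here are illustrative only, not part of any Cisco tool:

```python
def nyquist_rate(max_frequency_hz: float) -> float:
    """Minimum sampling rate for a signal whose highest frequency is max_frequency_hz."""
    return 2 * max_frequency_hz

TELEPHONY_MAX_HZ = 4_000  # top of the telephony audio band described above

print(nyquist_rate(TELEPHONY_MAX_HZ))  # 8000
```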
Spoken Voice:

Voice after being sampled:

Standard digital voice sampling in telecommunications occurs at 8,000 Hz, which is 8,000 times per second, or once every 125 microseconds (1/8,000 of a second). As mentioned before, every time voice is sampled an 8 bit measurement is taken. This 125 microsecond interval is critical in planning your network and implementing QoS therein. It is also a huge factor when planning convergence and redundancy (HSRP, for example). The reasoning for this will be discussed later in this post.
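As a rough sketch of what sampling at 8,000 Hz with 8 bit measurements looks like, the code below samples a test tone and stores each measurement as a value from 0 to 255. It assumes a pure 1,000 Hz sine wave as the input and uses simple linear quantization for readability; real telephony codecs such as G.711 use logarithmic companding instead, so treat this as an illustration rather than a codec implementation. All names are hypothetical.

```python
import math

SAMPLE_RATE_HZ = 8_000    # samples per second
BITS_PER_SAMPLE = 8       # each measurement is stored in 8 bits
SAMPLE_INTERVAL_US = 1_000_000 / SAMPLE_RATE_HZ   # microseconds between samples

def sample_and_quantize(tone_hz: float, duration_s: float) -> list[int]:
    """Sample a sine tone and map each measurement to an integer in 0..255."""
    levels = 2 ** BITS_PER_SAMPLE                  # 256 possible values
    count = round(SAMPLE_RATE_HZ * duration_s)     # number of measurements taken
    samples = []
    for n in range(count):
        t = n / SAMPLE_RATE_HZ                     # time of the n-th measurement
        amplitude = math.sin(2 * math.pi * tone_hz * t)   # between -1.0 and 1.0
        # Linear quantization (assumption for illustration, not G.711 companding)
        code = min(levels - 1, int((amplitude + 1) / 2 * levels))
        samples.append(code)
    return samples

samples = sample_and_quantize(tone_hz=1_000, duration_s=0.001)
print(SAMPLE_INTERVAL_US)  # 125.0
print(len(samples))        # 8 measurements in one millisecond
```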
So why do people always say that voice bandwidth is 64k? You are taking an 8 bit measurement 8,000 times per second: 8 bits x 8,000 samples per second = 64,000 bits per second, or 64 kilobits per second (kbps), which is your bandwidth. That's why. A much more comprehensive and complicated explanation of this can be found in the tech notes for Voice over IP on Cisco's site. There is also great documentation on this subject on Cisco's documentation page. For the sake of simplicity, the length of each measurement and the number of measurements per second are standardized into what are called voice codecs. Codecs differ in how long (in bits) each sample is and how frequently samples are taken, among other things that are outside the scope of this post.
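The same bandwidth arithmetic, written as a short sketch (the function name is just for illustration):

```python
def bit_rate_bps(bits_per_sample: int, sample_rate_hz: int) -> int:
    """Raw bit rate of an uncompressed sampled voice stream, in bits per second."""
    return bits_per_sample * sample_rate_hz

bps = bit_rate_bps(bits_per_sample=8, sample_rate_hz=8_000)
print(f"{bps} bps = {bps / 1000:g} kbps")  # 64000 bps = 64 kbps
```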
As previously mentioned, there are different codecs, and each has a substantial effect on the quality of the audio being transmitted and received. The codecs that Cisco CallManager uses are G.729 and G.711. There are others such as G.729a and G.720, however CallManager only supports G.729 and G.711. G.729 has a bit rate of 8 kbps, while G.711 has a bit rate of 64 kbps. To keep it simple, G.711 has the higher bit rate and delivers the better voice quality. It is important to note that different codecs run at different bit rates, which confuses many people; how that relates to quality is covered in the next paragraph.
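To make the difference in bit rates concrete, here is a minimal sketch comparing how much voice data each codec produces per second, using only the two bit rates quoted above. It counts codec payload only and ignores IP, UDP, and RTP header overhead, which adds to the bandwidth actually used on the wire. The dictionary and function names are hypothetical.

```python
# Bit rates as quoted in this post, in kilobits per second
CODEC_BIT_RATES_KBPS = {"G.711": 64, "G.729": 8}

def payload_bytes(codec: str, seconds: float) -> float:
    """Bytes of encoded voice produced by a codec over the given time (payload only)."""
    bits = CODEC_BIT_RATES_KBPS[codec] * 1000 * seconds
    return bits / 8

g711 = payload_bytes("G.711", 1)
g729 = payload_bytes("G.729", 1)
print(g711, g729, g711 / g729)  # 8000.0 1000.0 8.0
```

In other words, one second of G.711 audio carries eight times as much payload as one second of G.729.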
This brings us to the quality of voice codecs. There are two common methods by which voice codecs are measured and given numerical quality scores: the Mean Opinion Score (MOS) and the Perceptual Speech Quality Measurement (PSQM). The ratings quoted here are on a scale of 1 to 5, with 5 being the best, which is the MOS convention (PSQM scores run the other way, with lower being better). G.711 at 64 kbps rates 4.8, while G.729 and G.729a (at 8 kbps) rate 4.2 and 3.8 respectively.
*Note: Conversion between these codecs must be done at the CallManager.*
Many more granular details exist in the IP Telephony (IP-T) world, such as signaling. Since topics like this branch out into separate areas such as Enterprise and Service Provider scenarios, this part of the discussion will be skipped. SIP/Skinny/H.323 for Enterprise IP-T, MGCP/MEGACO/SS7 for Service Providers, and the differences between them are outside the scope of these posts. You are more than welcome to learn the skipped information from Cisco's documentation page.

Tom Callan (Sacman Dec 6
This was very informative and well written. I had to read it twice to fully understand it tbh.