Speech Processing

By Ravi Bandakkanavar | June 5, 2015

Speech is a natural mode of communication for people. We learn all the relevant skills during early childhood, without instruction, and we continue to rely on speech communication throughout our lives. It comes so naturally to us that we don't realize how complex a phenomenon speech is. Speech recognition means making a computer understand spoken language, where "understand" means reacting appropriately or converting the input speech into another medium. Speech recognition is becoming more and more useful nowadays. Various interactive software packages are available on the market today, but they are aimed at general-purpose computers. With the growing need for embedded computing and the demand for embedded platforms, speech recognition systems need to be available on them too.


Speech recognition basically means talking to a computer and having it recognise what we are saying, in real time. The process works as a pipeline that converts PCM (Pulse Code Modulation) digital audio from a sound card into recognised speech. We humans have natural speech recognition: articulation produces sound waves, which the ear conveys to the brain for processing. The basic question is, how might a computer do it? It does so in three steps: digitization, acoustic analysis of the speech signal, and linguistic interpretation.

Steps in speech processing


Digitization is the analog-to-digital conversion of the speech signal, which consists of sampling and quantising the signal.

This is done using filters that measure energy levels at various points on the frequency spectrum. Knowing the relative importance of different frequency bands (for speech) makes this process more efficient.
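To make the idea of measuring energy per frequency band concrete, here is a minimal Python sketch. It approximates a filter bank with a naive discrete Fourier transform and sums the energy falling into each band. The function name band_energies, the band edges and the 8000 Hz sampling frequency are assumptions chosen for illustration, not values from this article.

```python
import cmath
import math

def band_energies(samples, sampling_freq, band_edges_hz):
    """Sum spectral energy inside each band [edge_i, edge_i+1) in Hz."""
    n = len(samples)
    # Naive DFT over the bins from 0 Hz up to the Nyquist frequency.
    spectrum = [
        sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))
        for k in range(n // 2 + 1)
    ]
    energies = []
    for low_edge, high_edge in zip(band_edges_hz[:-1], band_edges_hz[1:]):
        energy = 0.0
        for k, coeff in enumerate(spectrum):
            freq = k * sampling_freq / n  # centre frequency of bin k
            if low_edge <= freq < high_edge:
                energy += abs(coeff) ** 2
        energies.append(energy)
    return energies

# Assumed test signal: a 1000 Hz tone sampled at 8000 Hz.
fs = 8000.0
tone_samples = [math.sin(2 * math.pi * 1000.0 * i / fs) for i in range(64)]
# Assumed band edges in Hz; a real system would choose bands suited to speech.
energies = band_energies(tone_samples, fs, [0.0, 500.0, 1500.0, 4000.0])
```

For this tone, nearly all of the energy lands in the middle band (500 to 1500 Hz), which is the kind of per-band measurement the paragraph above describes.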

Sampling: Samples are taken from the continuous signal at periodic moments t_n = n·T, and their values correspond to the instantaneous values of the continuous signal at the sampling times t_n. T is the sampling period and n = 0, 1, …, ∞.

According to Shannon's sampling theorem, the sampling frequency fv must be at least twice the maximum frequency fm of the analog signal.
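The sampling rule and the sampling theorem can be sketched in a few lines of Python. This is only an illustration: the helper names, the 8000 Hz sampling frequency and the two test tones are assumptions, not values from this article.

```python
import math

def sample_signal(signal, sampling_period, num_samples):
    """Return the values signal(t_n) at t_n = n * T for n = 0 .. num_samples - 1."""
    return [signal(n * sampling_period) for n in range(num_samples)]

def satisfies_sampling_theorem(sampling_freq, max_signal_freq):
    """True if the sampling frequency is at least twice the maximum signal frequency."""
    return sampling_freq >= 2 * max_signal_freq

def tone(freq_hz):
    """A continuous-time sine wave of the given frequency, as a function of t."""
    return lambda t: math.sin(2 * math.pi * freq_hz * t)

fs = 8000.0      # assumed sampling frequency in Hz
T = 1.0 / fs     # sampling period

# 1000 Hz is below fs / 2, so it is sampled correctly.
low = sample_signal(tone(1000.0), T, 16)
# 7000 Hz is above fs / 2, so its samples are indistinguishable
# from those of an inverted 1000 Hz tone (aliasing).
high = sample_signal(tone(7000.0), T, 16)
```

The second tone shows what goes wrong when the theorem is violated: once sampled, the 7000 Hz signal can no longer be told apart from a 1000 Hz one.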

Quantization: the operation that changes a signal with a continuous range of values into a signal with a finite number of values. This is shown in Fig. 1.


Fig.1- Quantization
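As a rough illustration of quantization, the following Python sketch maps a continuous value to the nearest of a finite set of uniformly spaced levels. The function name quantize, the 3-bit resolution and the input range of -1.0 to 1.0 are assumptions made for this example.

```python
def quantize(value, num_bits, v_min=-1.0, v_max=1.0):
    """Map a continuous value to the nearest of 2**num_bits uniform levels.

    Returns (level_index, quantized_value).
    """
    levels = 2 ** num_bits
    step = (v_max - v_min) / (levels - 1)      # spacing between adjacent levels
    clipped = min(max(value, v_min), v_max)    # keep the value inside the range
    index = round((clipped - v_min) / step)    # nearest level index
    return index, v_min + index * step

# Quantize a ramp of values with an assumed 3-bit resolution (8 levels).
inputs = [i / 50 - 1.0 for i in range(101)]
quantized = [quantize(v, 3) for v in inputs]
```

With uniform levels, the error introduced for any in-range value is at most half a step, so adding bits shrinks the error at the cost of more data per sample.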

B. Separating speech from background noise:

We can do this with noise-cancelling microphones: two mics are used, one facing the speaker and the other facing away. Ambient noise is roughly the same for both mics, so it can be cancelled out of the signal from the mic facing the speaker. We also need to know which bits of the signal relate to speech, which is determined by spectrograph analysis.
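The two-microphone idea can be sketched in Python under a strong simplifying assumption: the mic facing away picks up exactly the same ambient noise as the mic facing the speaker, and no speech at all. Under that assumption, subtracting one signal from the other recovers the speech. The names and the synthetic signals below are hypothetical; real systems typically rely on adaptive filtering because the noise at the two mics is only roughly the same.

```python
import math
import random

def cancel_noise(primary, reference):
    """Subtract the reference (noise-only) signal from the primary signal."""
    return [p - r for p, r in zip(primary, reference)]

random.seed(0)  # fixed seed so the example is repeatable

# Synthetic stand-ins: a 200 Hz tone plays the role of speech.
speech = [math.sin(2 * math.pi * 200.0 * n / 8000.0) for n in range(80)]
noise = [random.uniform(-0.5, 0.5) for _ in range(80)]

primary = [s + v for s, v in zip(speech, noise)]  # mic facing the speaker
reference = list(noise)                           # mic facing away (idealised)

cleaned = cancel_noise(primary, reference)
```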


The first and very important step in recognizing speech is signal processing. It produces the output that is fed to the classifiers. For faster classification, this information must be reduced to the lowest possible rate with insignificant loss of information content. This is especially important for embedded systems in cars, which have less memory and computing power than a PC.
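One simple way to reduce the data rate before classification is to summarise the signal frame by frame. The Python sketch below computes the average energy of overlapping frames, so that many samples collapse into one value each. The function name, the frame length of 160 samples and the hop of 80 samples are assumptions for illustration. Real recognisers use richer features than energy alone.

```python
def frame_energies(samples, frame_len, hop):
    """Return the mean energy of each frame of frame_len samples, advancing by hop."""
    return [
        sum(x * x for x in samples[start:start + frame_len]) / frame_len
        for start in range(0, len(samples) - frame_len + 1, hop)
    ]

# Assumed input: 800 samples (0.1 s at 8000 Hz) of constant amplitude.
signal = [1.0] * 800
features = frame_energies(signal, frame_len=160, hop=80)
```

Here 800 samples are reduced to 9 feature values, which shows the kind of rate reduction that matters on memory-constrained embedded hardware.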

Author: Ravi Bandakkanavar

