Understanding Objective Voice Evaluation and Acoustic Analysis Techniques
Nov 29, 2024

Introduction
Speech is the expression or the ability to express thoughts and feelings by articulate sounds. Phonation is a term used to describe the physical and physiological processes of vocal fold vibration in the production of speech sounds.
Dysphonia: Impaired voice production due to abnormal vocal fold vibration.
Aphonia: No voice or whispery voice associated with no vocal fold vibration.
Hoarseness is the non-specific, general term used to describe any change in voice quality.
Criteria for the normal voice:
- Audible, with clarity and stability in a wide range of settings
- Appropriate for gender and age
- Can fulfill its linguistic and paralinguistic functions.
- Should not get easily fatigued.
- Not associated with phonatory discomfort or pain.
Normal Speech and Voice Production
Normal voice production requires three essential elements:
- A pressure gradient across the vocal folds, created by the flow of expired air from the lungs against the partly closed vocal folds.
- Vocal folds of appropriate structure, mass, and elasticity that approximate with appropriate tension to allow them to vibrate at a range of frequencies.
- A resonating chamber, the vocal tract, whose size and shape can be changed to modulate the acoustic properties of the sound generated by the vocal folds.
Essential features of the acoustic properties of voice
The sound wave can be characterized acoustically in terms of:
- Fundamental frequency (Hertz or Hz)
- Frequency spectrum amplitude
- Intensity (dB)
Fundamental Frequency
The rate of vibration of the vocal folds (cycles per second) determines the fundamental frequency (F0). Normal vocal fold vibration produces a spectrum of frequencies that are multiples of this fundamental frequency. These multiples are called harmonics or overtones.
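The relationship between the fundamental frequency and its harmonics can be illustrated with a short sketch. The function name `harmonic_series` and the 120 Hz example value are illustrative assumptions, not taken from the text.

```python
def harmonic_series(f0_hz, count):
    """Return the first `count` harmonics of f0_hz.

    The first harmonic is the fundamental itself; each later
    harmonic is the next whole-number multiple of it.
    """
    if f0_hz <= 0 or count < 0:
        raise ValueError("f0_hz must be positive and count non-negative")
    return [n * f0_hz for n in range(1, count + 1)]


# Hypothetical example: a voice with a fundamental of 120 Hz.
print(harmonic_series(120.0, 4))  # [120.0, 240.0, 360.0, 480.0]
```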
For good and 'normal' voice production, three conditions are required:
- There should be uniform quasi-periodic vibration of the vocal folds.
- There should be a well-defined harmonic structure of the voice signal radiating from the mouth.
- The voice signal should be loud enough, that is, have enough sound intensity (energy), to overcome the threshold of hearing of the listener. In other words, the voice should be audible.
Pathological Voice Production
Abnormalities in the mass, elasticity, and tensioning of the vocal folds affect the rate and regularity of vibration, which can change the voice pitch and lead to a pathological voice. Abnormalities in the dimensions or structure of the vocal tract can affect the energy levels and harmonic structure of the radiated sound, causing the voice to sound strained or effortful. Inadequate control or amount of subglottic pressure can also impair voice production.
Normal voice vs. Pathological voice
A generally accepted and pragmatic definition of a normal voice is one described as having the following characteristics:
- It is audible, clear, and stable in a wide range of acoustic settings.
- It is appropriate for the gender and age of the speaker.
- It is capable of fulfilling its linguistic and paralinguistic functions.
- It does not fatigue easily.
- It is not associated with discomfort and pain on phonation.
Pathological voice can be defined as one that does not fulfill the criteria above.
Use of voice evaluation
- It helps to provide a measure of the severity of the disorder and the degree of variance from established normal values.
- It serves as an outcome measure to help assess responsiveness to treatment.
Laryngeal visual assessment
- This assessment is helpful for inspecting the structure and dynamic function of the larynx and the rest of the vocal tract, together with the vibratory patterns of the vocal folds during phonation.
- A flexible fiberoptic endoscope is used.
- It visualizes the larynx and identifies abnormalities in the nasal, nasopharyngeal, and oropharyngeal airways.
- It helps diagnose mucosal abnormalities, nodules, cysts, and Reinke's edema.
- Laryngeal visual assessment can be combined with stroboscopy.
Comparison of CAPE-V and GRBAS
| | CAPE-V | GRBAS |
| --- | --- | --- |
| Parameters | Overall severity, roughness, breathiness, strain, pitch, loudness | G: Grade (degree of overall voice abnormality); R: Roughness; B: Breathiness; A: Asthenia; S: Strain |
| Rating scale | Continuous visual analog scale, with 0 = normal endpoint, 100 = severe endpoint | Ordinal 4-point Likert scale, with 0 = normal, 3 = extreme |
| Protocol for administration | Yes | No |
| Guidelines for analysis | Yes | No |

Acoustic Analysis
Acoustic analysis provides quantitative measures based on the voice signal (waveform and spectrum) recorded using a microphone placed near the mouth. The microphone acts as a transducer, converting the acoustic signal into an electrical signal. The amplified electrical signal is most commonly recorded directly to the hard disk as uncompressed .wav files. A variety of free and commercial software programs are available for display, measurement, and statistical analysis of the acoustic waveform and spectrum.
There are three main types of voice material used in acoustic analysis:
- Sustained vowels
- Fluent speech
- Consonant-vowel sequences
PRAAT is a software program commonly used for acoustic analysis of the voice.
Parts of the acoustic analysis
Fundamental frequency
Fundamental frequency is a measure of the rate of vibration of the vocal folds. It is the inverse of the time taken to complete a single vibratory cycle. It is measured in cycles per second or Hertz (Hz). In males, the fundamental frequency drops from young adulthood till middle age and rises again in old age. In women, the fundamental frequency remains fairly constant from 20 to 50 years and then drops.
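Because the fundamental frequency is the inverse of the cycle duration, it can be computed directly once the duration of each vibratory cycle is known. The sketch below assumes the cycle durations have already been measured in seconds. The function names and example values are hypothetical.

```python
def f0_from_period(period_s):
    """Fundamental frequency in Hz for one vibratory cycle lasting period_s seconds."""
    if period_s <= 0:
        raise ValueError("period must be positive")
    return 1.0 / period_s


def mean_f0(periods_s):
    """Mean fundamental frequency over several cycles: number of cycles / total time."""
    if not periods_s or min(periods_s) <= 0:
        raise ValueError("need at least one positive period")
    return len(periods_s) / sum(periods_s)


# Hypothetical example: a cycle lasting 5 ms corresponds to about 200 Hz.
print(round(f0_from_period(0.005), 1))
print(round(mean_f0([0.0050, 0.0051, 0.0049]), 1))
```

Dividing the number of cycles by the total duration, rather than averaging the per-cycle frequencies, keeps the mean consistent with the definition of cycles per second.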
Sound intensity and sound pressure level (SPL)
The amplitude of the signal relates to its strength. As listeners, we perceive this as loudness.
In practice, sound intensity is measured as the sound pressure level (SPL): the logarithmic ratio of the absolute sound pressure to a reference sound pressure, expressed in decibels (dB).
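The logarithmic ratio can be written out as a short calculation. The sketch below uses the standard formula of 20 times the base-10 logarithm of the pressure ratio, with the conventional reference pressure in air of 20 micropascals. That reference value is not stated in the text above and is an assumption here, as are the function and constant names.

```python
import math

# Conventional reference sound pressure in air (assumed; not given in the text).
P_REF_PA = 20e-6  # 20 micropascals


def spl_db(pressure_pa, p_ref_pa=P_REF_PA):
    """Sound pressure level in dB for an RMS sound pressure given in pascals."""
    if pressure_pa <= 0 or p_ref_pa <= 0:
        raise ValueError("pressures must be positive")
    return 20.0 * math.log10(pressure_pa / p_ref_pa)


# Hypothetical example: 0.02 Pa is 1000 times the reference, i.e. about 60 dB SPL.
print(round(spl_db(0.02), 1))
# Doubling the sound pressure raises the level by about 6 dB.
print(round(spl_db(0.04) - spl_db(0.02), 2))
```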
Jitter and Shimmer
It is normally possible for an individual to produce a vowel sound for several seconds with little variation (perturbation) in the frequency (jitter) or intensity (shimmer). Pathological voice samples have been shown to have higher levels of jitter and shimmer than normal. However, there is a poor correlation between perceived vocal quality and acoustic measures of jitter and shimmer.
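One commonly used way to quantify these perturbations, often called "local" jitter and shimmer, is the mean absolute difference between consecutive cycles divided by the mean value. The sketch below assumes the cycle durations and per-cycle peak amplitudes have already been extracted from a sustained vowel. The function names are hypothetical, and analysis software offers several other variants of these measures.

```python
def relative_perturbation(values):
    """Mean absolute difference between consecutive values, divided by the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return mean_diff / mean_value


def local_jitter(periods_s):
    """Cycle-to-cycle variation in period, as a fraction of the mean period."""
    return relative_perturbation(periods_s)


def local_shimmer(peak_amplitudes):
    """Cycle-to-cycle variation in amplitude, as a fraction of the mean amplitude."""
    return relative_perturbation(peak_amplitudes)


# Hypothetical measurements from a sustained vowel.
periods = [0.00500, 0.00502, 0.00499, 0.00501]
amplitudes = [0.80, 0.82, 0.79, 0.81]
print(f"jitter:  {100 * local_jitter(periods):.2f} %")
print(f"shimmer: {100 * local_shimmer(amplitudes):.2f} %")
```

A perfectly regular signal gives 0 for both measures, and larger values indicate more cycle-to-cycle irregularity.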
Voice Range Profile (VRP) or Phonetogram
A voice range profile (VRP) is a visual display of the dynamic range of the voice in terms of frequency and vocal intensity. It is used in both adults and children.
Voice Spectrogram
The acoustic output of the vocal tract, resulting from the interaction of the vocal fold vibration with the vocal tract, can be displayed graphically in three dimensions in what is known as a spectrogram. Time from the beginning of the vowel utterance to the end is displayed on the x-axis. A logarithmic display of the frequency distribution is seen on the y-axis, while the amplitude, or amount of energy in the spectrum for a given frequency or frequency band, is represented by increasingly dark shades of grey.
Harmonics to noise ratio
The harmonics-to-noise ratio (HNR) (measured in dB) is the mean intensity of an average waveform (noise-free) divided by the mean intensity of the isolated noise component for the series of waveforms in the utterance. The greater the noise, the lower the HNR. The noise element can be generated from irregular (aperiodic) vibration of the vocal folds or some other structure within the vocal tract (e.g., the false cords) and/or turbulent airflow (e.g., air leakage through an incompletely closed glottis).
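The definition above can be turned into a minimal sketch. It assumes the utterance has already been cut into individual vocal cycles and that each cycle has been resampled to the same number of samples. Both steps are assumptions and are not shown. The average cycle stands in for the noise-free waveform, and each cycle's deviation from it is treated as noise. The function name is hypothetical, and analysis software may estimate HNR by other methods.

```python
import math


def hnr_db(cycles):
    """Harmonics-to-noise ratio in dB from equal-length vocal cycles.

    cycles: list of lists of samples, one inner list per vocal cycle.
    """
    n = len(cycles)
    if n < 2:
        raise ValueError("need at least two cycles")
    length = len(cycles[0])
    if length == 0 or any(len(c) != length for c in cycles):
        raise ValueError("all cycles must have the same non-zero length")

    # Average waveform across cycles: the periodic (harmonic) component.
    avg = [sum(c[i] for c in cycles) / n for i in range(length)]
    harmonic_energy = n * sum(a * a for a in avg)
    if harmonic_energy == 0:
        raise ValueError("signal has no periodic energy")

    # Deviation of each cycle from the average: the noise component.
    noise_energy = sum((c[i] - avg[i]) ** 2 for c in cycles for i in range(length))
    if noise_energy == 0:
        return math.inf  # perfectly periodic signal
    return 10.0 * math.log10(harmonic_energy / noise_energy)


# Hypothetical example: two nearly identical cycles give a high HNR.
print(round(hnr_db([[1.1, -1.0], [0.9, -1.0]]), 1))
```

Adding more cycle-to-cycle variation to the input increases the noise energy and lowers the result, matching the statement that the greater the noise, the lower the HNR.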
Electro-laryngography
The electrolaryngograph (ELG) consists of two electrodes placed on the skin on either side of the thyroid cartilage. A high-frequency current (3 megahertz) is applied between the two electrodes and held at a constant voltage. Vocal fold vibration changes the electrical conductance between the electrodes. The resulting waveform can be analyzed automatically to obtain measures of the rate of vocal fold vibration (fundamental frequency, Fx) and frequency perturbations (e.g., jitter).
- Laryngograph electrodes placed over thyroid cartilage.
- The speech (acoustic) waveform is acquired using a tie-clip or head-mounted microphone.
- The ELG waveform reflects the variation in conductance between the electrodes during phonation. In modal voice, the vocal folds close more rapidly than they open, giving a sharper rise in the waveform.
Aerodynamic measures
Clinically, there are three main factors that can be measured that are of interest in voice production:
- Air Volume
- Air flow
- Subglottic pressure
Voice accumulators
Voice accumulators are wearable voice-monitoring systems that can provide reliable and objective measures of voice use during daily activities away from a clinic environment.
They can potentially provide a better understanding of the role of daily voice use in the causation of voice disorders.

PrepLadder Medical