How does Tibetan throat singing work?




















Move the body of your tongue back and forth, keeping the tip of your tongue on the roof of your mouth. Think of it as shifting between an "R" and an "L" sound with your tongue. Slowly change the shape of your lips to adjust the sound. Think of moving your mouth from an "E" sound to a "U" sound, as if saying "see you" without the "s". This changes the shape of your lips and the resonance of your mouth, that is, how sound bounces around inside.

Do this slowly, then bring it all together to throat sing. Everyone's mouth is a little different, and there is no perfect formula for tongue position, mouth opening, or volume.

Start with your basic "oooo" note, and then: Place your tongue near the roof of your mouth in an "R" position. Move your lips slowly between the "E" and "U" vowel sounds. Slowly curl your tongue back and away from your lips. When you hear your overtones, stop moving your mouth and hold the tone. Practice with some background noise. Background sounds will hide your normal vocal tones and make your high-pitched "whistling" tones seem louder.

Try practicing in the shower, while you drive, or while the TV is on in the background. Don't worry if you cannot hear the overtones at first.

It is difficult to hear yourself singing overtones when you first begin, even if you are producing them properly, because of the resonance in your head.

Sing with a loud, bright voice. When they are first starting out, most people don't put enough power and energy behind their voice. To get the "oooo" sound right, imagine you are trying to sing as someone squeezes your throat.

Your voice will need to be loud and forceful, and this will help you create overtones. Focus on singing from your upper chest. There is a difference between your "chest voice" and your "head voice". A chest voice feels "resonant", and you can feel the vibrations along your upper chest. Practice changing notes. Once you can comfortably sing with overtones, you can learn to make melodies by moving your lips and adjusting your base note.

Listen to real-life examples. Throat singing is found in cultures from Alaska to Mongolia and South Africa. The Smithsonian has an incredible collection of videos from these cultures, as well as some tutorials for burgeoning throat singers.

A tip from voice coach Jonathan Stancato: blow your nose before a performance, use bigger breaths, and sing from the gut.

But Tuva, a republic sandwiched between Siberia and Mongolia, is known for a different sound. So yes, you heard a stringed instrument. But all those other notes are made by one person. He is singing two notes at one time.

How does he do it? Well, that is the topic of our latest Macroscope video series. Hey, Luke. Hello, Aaron. What is unique about Tuvan throat singing? So if you listened to that, there was this whistling noise. And that noise is meant to kind of mimic the sounds of the wind on the steppe, the Tuvan steppe.

Some of the styles sound a little bit like animals. OK, so Aaron— well, you know what? We have a clip of the high style. Can we hear that? So what are throat singers doing to get this two-distinct-note effect? The truth is that every time we use our voice, we have a fundamental frequency. Our vocal folds are vibrating so many times per second.

However, their MRI data reveal limited detail, since they were static images of singers already in the biphonation state. Small variations in vocal tract geometry can have pronounced effects on the produced song (Story et al.). To understand which features of vocal tract morphology are crucial to biphonation, a dynamic description of vocal tract morphology is required.

Here we study the dynamic changes in the vocal tracts of multiple expert practitioners from Tuva as they produce Khoomei. We use MRI to acquire the volumetric 3D shape of the vocal tract of a singer during biphonation. Then, we capture the dynamic changes in a midsagittal slice of the vocal tract as singers transition from tonal to biphonic singing, while making simultaneous audio recordings of the song.

We use these empirical data to guide our use of a computational model, which allows us to gain insight into which features of vocal tract morphology are responsible for the singing phonetics observed during biphonic Khoomei song.

We focus specifically on the Sygyt (or Sigit) style of Khoomei (Aksenov). We made measurements from three Tuvan singers performing Khoomei in the Sygyt style (designated T1–T3) and one (T4) in a non-Sygyt style. Songs were analyzed using short-time Fourier transforms (STFT), which provide detailed information in both temporal and spectral domains. We recorded the singers transitioning from normal singing into biphonation, with Figure 1 showing this transition for three singers.
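
The STFT analysis described above can be sketched in Python with SciPy. This is a hypothetical illustration on a synthetic note; the 140 Hz fundamental and the 1/k overtone rolloff are assumptions, not the singers' actual data:

```python
import numpy as np
from scipy.signal import stft

fs = 96_000                      # sample rate matching the study's recordings
t = np.arange(0, 1.0, 1/fs)
f0 = 140.0                       # hypothetical fundamental frequency
# synthetic stand-in for a sung note: f0 plus a decaying overtone cascade
x = sum(np.sin(2*np.pi*k*f0*t)/k for k in range(1, 15))

# STFT gives detail in both temporal and spectral domains
f, frames, Z = stft(x, fs=fs, nperseg=4096, noverlap=3072)
S = 20*np.log10(np.abs(Z) + 1e-12)       # dB magnitude spectrogram
peak_hz = f[np.argmax(S.mean(axis=1))]   # strongest band averaged across time
print(S.shape, round(peak_hz))
```

In a real recording the horizontal bands of the spectrogram correspond to the overtone structure, and the window length trades temporal against spectral resolution.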

The f0 of each singer's song is marked in the figure, and the overtone structure appears as horizontal bands. Varying degrees of vibrato can be observed, depending upon the singer (Figure 1; see also the longer spectrograms in Appendix 1—figures 6 and 7).

Most of the energy in their song is concentrated in the overtones, and no subharmonics are present. In contrast to these three singers, singer T4, performing in a non-Sygyt style, exhibited a distinct fundamental frequency, although significant energy additionally appears around 50-55 Hz, well below an expected subharmonic (Appendix 1—figure 5).

If we take a slice, that is, a single time point from the spectrogram, and plot the spectrum, we can observe the peaks and infer the formant structure from this representation of the sound (red-dashed lines in Figure 1 and Appendix 1—figure 4).
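
Taking a spectral slice and picking its peaks can be sketched as follows. The slice is synthesized from an assumed 140 Hz drone shaped by two invented formant bumps, and `scipy.signal.find_peaks` stands in for the paper's unspecified peak-picking algorithm:

```python
import numpy as np
from scipy.signal import find_peaks

# hypothetical spectral slice: harmonics of a 140 Hz drone shaped by
# two formant-like envelope bumps (all values are illustrative assumptions)
f0 = 140.0
freqs = np.arange(0.0, 4000.0, 10.0)
slice_db = np.full_like(freqs, -80.0)          # noise floor
for k in range(1, 28):
    idx = int(round(k*f0/10.0))
    env = np.exp(-((k*f0 - 500)/400)**2) + np.exp(-((k*f0 - 1700)/150)**2)
    slice_db[idx] = -80.0 + 70.0*env
# pick the harmonic peaks standing well above the floor
peaks, _ = find_peaks(slice_db, height=-40.0)
print(freqs[peaks])   # harmonics lying under the formant bumps
```

The frequencies of the surviving peaks trace out the formant envelope, which is exactly the inference being made from the red-dashed spectra.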

As the singers transition from normal singing to biphonation, we see that the formant structure changes significantly and the positions of formant peaks shift dramatically and rapidly. Note that considering time points before and after the transitions also provides an internal control for both normal and focused song types (Appendix 1—figure 4).

Once in the biphonation mode, all three singers demonstrate overtones concentrated in a narrow spectral band. Whereas the energy in the low-frequency region associated with the first formant is roughly constant between the normal-singing and focused states, there is a dramatic change in the spectrum for the higher formants.

In normal singing (i.e., before the transition), energy is spread broadly across the higher formants. In the focused state after the transition, that energy becomes narrowly focused into a single band.

However, once the transition occurs (red triangle in Figure 1), those values become large. For one of the singers (T2), the situation was more complex, as he created multiple focused formants (Figure 1, middle panels; Appendix 1—figures 6 and 8). The second focused state was not explicitly dependent upon the first: the first focused state clearly moves and transitions in frequency independently.

Thus, the focused states are not harmonically related. Unlike the other singers, T2 not only produced a second focused state but also had more energy in the higher overtones (Figure 1). As such, singer T2 also exhibited a different e_R time course, which took on values that could be relatively large even prior to the transition.

This may be because he approached the transition into a focused state in multiple ways. Plotting spectra around the transition from normal to biphonation singing in a waterfall plot indicates that the sharp focused filter is achieved by merging two broader formants together (F2 and F3 in Figure 2; Kob). The superimposed arrows are color-coded to help visualize how the formants change around the transition, chiefly with F3 shifting to merge with F2.

This plot also indicates that the second focused state, centered just above 3 kHz, is a sharpened F4 formant. While we can infer the shape of the formants in Khoomei by examining audio recordings, such analysis is not conclusive in explaining the mechanism used to achieve these formants.

The working hypothesis was that vocal tract shape determines these formants. Therefore, it was crucial to examine the shape and dynamics of the vocal tract to determine whether the acoustic measurements are consistent with this hypothesis. To accomplish this, we obtained MRI data from one of the singers (T2) that are unique in two regards.

First, there are two types of MRI data reported here: steady-state volumetric data (Figure 3 and Appendix 1—figure 18) and dynamic midsagittal images, at several frames per second, that capture changes in vocal tract position (Figure 4A—B and the Appendix). Second, the dynamic data allow us to examine vocal tract changes as the song transitions into a focused state.

Airspaces were determined manually (green areas behind the tongue tip, red beyond it). The shadow from the dental post is visible in the axial view on the left-hand side and stops near the midline, leaving that view relatively unaffected. (B) Reconstructed airspace of the vocal tract from four different perspectives. The red circle highlights the presence of the piriform sinuses (Dang and Honda). (A) 2D measurement of tract shape.

The inner and outer profiles were manually traced, whereas the centerline (white dots) was found with an iterative bisection technique. The distance from the inner to the outer profile was measured along a line perpendicular to each point on the centerline (thin white lines). (B) Collection of cross-distance measurements plotted as a function of distance from the glottis. The area function can be computed directly from these values and is derived by assuming the cross-distances to be equivalent diameters of circular cross-sections (see Materials and methods).
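
The cross-distance-to-area step amounts to treating each measured distance as the diameter of a circular cross-section. A tiny sketch, with invented cross-distance values standing in for the measured profile:

```python
import numpy as np

# hypothetical cross-distances d(x) along the centerline, glottis to lips (cm)
d = np.array([0.5, 1.2, 1.8, 2.0, 1.5, 0.4, 1.9, 2.2, 0.3, 1.0])

# treat each cross-distance as the equivalent diameter of a circular
# cross-section, as in the paper's area-function derivation
area = np.pi*(d/2.0)**2   # cm^2
print(area.round(3))
```

The narrowest cross-distances (here the invented 0.4 and 0.3 cm entries) become the constrictions of the area function, which is what the model below takes as input.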

(C) Schematic indicating associated modeling assumptions, including the vocal tract configuration from panel B (adapted from Bunton et al.). (D) Model frequency response calculated from the associated area function stemming from panels B and C. Each labeled peak can be considered a formant frequency, and the dashed circle indicates the merging of formants F2 and F3. The human vocal tract begins at the vocal folds and ends at the lips.

Airflow produced by the vocal folds sets the air column in the tract into vibration, and its acoustics determine the sound that emanates from the mouth. The vocal tract is effectively a tube-like cavity whose shape can be altered by several articulators: the jaw, lips, tongue, velum, epiglottis, larynx, and trachea (Figure 4C). Producing speech or song requires that the shape of the vocal tract, and hence its acoustics, be precisely controlled (Story). Several salient aspects of the vocal tract during the production of Khoomei can be observed in the volumetric MRI data.

The most important feature, however, is that there are two distinct and relevant constrictions in the focused state, corresponding roughly to the uvula and the alveolar ridge. Additionally, the vocal tract is expanded in the region just anterior to the alveolar ridge (Figure 4A). The retroflex position of the tongue tip and blade produces a constriction at 14 cm, and also results in the opening of this sublingual space.

It is the degree of constriction at these two locations that we hypothesize to be the primary mechanism for creating and controlling the frequency at which the formant is focused. Having established that the shape of the vocal tract during Khoomei does indeed have two constrictions, consistent with observations from other groups, the primary goals of our modeling efforts were to use the dynamic MRI data as morphological benchmarks and to capture the merging of formants that creates the focused states, as well as the dynamic transitions into them.

Here, the vibrating vocal folds act as the broadband sound source (with the f0 and associated overtone cascade), while resonances of the vocal tract, considered as a series of 1-D concatenated tubes of variable uniform radius, act as a primary filter. We begin with the first-order assumption that the system behaves linearly, which allows for a simple multiplicative relationship between the source and filter in the spectral domain.
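
That multiplicative source-filter relationship can be sketched numerically. All values here (the 140 Hz f0, the rolloff exponent, and the formant centers and bandwidths) are illustrative assumptions, not measurements from the singers:

```python
import numpy as np

n_harm = 40
f0 = 140.0                                # hypothetical fundamental
harmonics = f0*np.arange(1, n_harm + 1)   # overtone cascade from the source
source = 1.0/np.arange(1, n_harm + 1)**2  # assumed spectral rolloff

def formant(f, fc, bw):
    # Lorentzian-shaped resonance standing in for a vocal tract formant
    return 1.0/np.sqrt(1.0 + ((f - fc)/bw)**2)

# filter with two formants; in the focused state these would be driven together
filt = formant(harmonics, 500.0, 80.0) + formant(harmonics, 1600.0, 60.0)

# linearity => the output spectrum is simply source times filter
output = source*filt
loudest = harmonics[np.argmax(output)]
print(loudest)
```

Because the source and filter only multiply, reshaping the filter (moving formants) changes which overtones are emphasized without touching the source's harmonic structure, which is the core of the linear argument made later in the paper.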

Acoustic characteristics of the vocal tract can be captured by transforming the three-dimensional configuration (Figure 3) into a tube with variation in its cross-sectional area from the glottis to the lips (Figure 4 and Figure 5). This representation of the vocal tract shape is called an area function, and it allows for calculation of the corresponding frequency response function, from which the formant frequencies can be determined with a one-dimensional wave propagation algorithm.
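
A minimal version of that tube-based frequency response calculation can be written with lossless chain (ABCD) matrices. This is a simplified stand-in for the paper's actual wave propagation algorithm, with an assumed tract length, a hand-picked uniform area function, and an idealized pressure-release (open) termination at the lips:

```python
import numpy as np
from scipy.signal import find_peaks

C = 35000.0    # speed of sound in warm, moist air (cm/s)
RHO = 0.00114  # air density (g/cm^3)

def tract_response(areas, seg_len, freqs):
    """|U_lips/U_glottis| for a chain of lossless tube segments, open at the lips."""
    resp = np.empty(len(freqs))
    for i, f in enumerate(freqs):
        k = 2.0*np.pi*f/C
        M = np.eye(2, dtype=complex)
        for A in areas:                      # glottis -> lips
            Zc = RHO*C/A                     # characteristic impedance of segment
            M = M @ np.array([[np.cos(k*seg_len), 1j*Zc*np.sin(k*seg_len)],
                              [1j*np.sin(k*seg_len)/Zc, np.cos(k*seg_len)]])
        resp[i] = 1.0/abs(M[1, 1])           # p = 0 at the lips => U_out/U_in = 1/D
    return resp

freqs = np.arange(50.0, 4000.0, 5.0)
# a uniform 17 cm, 3 cm^2 tract (neutral-vowel baseline): 34 half-cm segments
H = tract_response(np.full(34, 3.0), 0.5, freqs)
formants = freqs[find_peaks(H)[0]]
print(formants[:3])   # odd quarter-wave resonances, near 515, 1545, 2574 Hz
```

Replacing the uniform `areas` profile with one containing constrictions (e.g., near the uvula and alveolar ridge) shifts the peaks and can crowd two of them together, which is the formant-merging behavior the paper models.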

Although the area function can be obtained directly from a 3D vocal tract reconstruction, here a cross-sectional area function was instead measured from the midsagittal slice of the 3D image set (see Materials and methods and Appendix for details). Thus, the MRI data provided crucial bounds for model parameters: the locations of the primary constrictions and, thereby, the associated area functions.

(A) Frames from dynamic MRI, with red and blue dashed circles highlighting the locations of the key vocal tract constrictions. (B) Model-based vocal tract shapes stemming from the MRI data, including both the associated area functions (top inset) and frequency response functions (bottom inset). C_O indicates the constriction near the alveolar ridge, while C_P indicates the constriction near the uvula in the upper pharynx.

(C) Waveform and corresponding spectrogram of audio from singer T2 (a spectrogram from the model is shown in the Appendix). Note that the merged formants lie atop the 7th overtone. The frequency response functions derived from the above static volumetric MRI data place F2 and F3 in close proximity. Clearly, if F2 and F3 could be driven closer together in frequency, they would merge and form a single formant with unusually high amplitude. We hypothesize that this mechanism could be useful for effectively amplifying a specific overtone, such that it becomes a prominent acoustic feature in the sound produced by a singer, specifically the high-frequency component of Khoomei.
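
The merging claim can be illustrated numerically: two formant-shaped resonances driven close in frequency sum to a single, unusually tall peak. The Lorentzian profiles and all center frequencies and bandwidths below are assumptions for illustration only:

```python
import numpy as np

def formant(f, fc, bw):
    # Lorentzian stand-in for a single formant resonance
    return 1.0/(1.0 + ((f - fc)/bw)**2)

f = np.linspace(500.0, 3000.0, 2501)   # 1 Hz grid
apart = formant(f, 1400.0, 150.0) + formant(f, 2200.0, 150.0)    # F2, F3 separate
merged = formant(f, 1750.0, 150.0) + formant(f, 1850.0, 150.0)   # F2, F3 driven together
print(round(float(apart.max()), 2), round(float(merged.max()), 2))
```

When the two resonances overlap, their amplitudes add almost in full, so the merged filter passes one overtone with far more energy than either formant alone, consistent with the amplification hypothesis above.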

Next, we used the model in conjunction with time-resolved MRI data to investigate how the degree of constriction and expansion at different locations along the vocal tract axis could act as a mechanism for controlling the transition from normal to overtone singing, and the pitch while in the focused state. These results are summarized in Figure 5 (further details are in the Appendix).

While the singers are in the normal song mode, there are no obvious strong constrictions in their vocal tracts. After they transition, in each MRI frame from the focused state we observe a strong constriction near the alveolar ridge. We also observe a constriction near the uvula in the upper pharynx, though the degree of constriction there varies. If we examine the simultaneous audio recordings, we find that variations in this constriction co-vary with the frequency of the focused formant.

From this, we surmise that the mechanism for controlling the enhancement of voice harmonics is the degree of constriction near the alveolar ridge in the oral cavity (labeled C_O in Figure 5), which affects the proximity of F2 and F3 to each other. Additionally, the degree of constriction near the uvula in the upper pharynx (C_P) controls the actual frequency at which F2 and F3 converge. Other parts of the vocal tract, specifically the expansion anterior to C_O, may also contribute, since they also show small co-variations with the focused formant frequency (Appendix 1—figures). Taken together, the model confirms and explains how these articulatory changes give rise to the observed acoustic effects.

This study has shown that Tuvan singers performing Sygyt-style Khoomei exercise precise control of the vocal tract to effectively merge multiple formants together. They morph their vocal tracts so as to create a sustained focused state that effectively filters an underlying stable array of overtones. Some singers are even capable of producing additional foci at higher frequencies. Below, we argue that a linear framework suffices. That is, since the filter characteristics are highly sensitive to vocal tract geometry, the singers' precise biomechanical motor control is sufficient to achieve a focused state without invoking nonlinearities or a second source, as found in other vocalization types.

Lastly, we describe several considerations associated with how focused overtone song produces such a salient percept by virtue of a pitch decoherence. The notion of a focused state is mostly consistent with vocal-tract-filter-based explanations for biphonation in previous studies. Further, the present data appear in several ways inconsistent with conclusions from previous studies of Khoomei, especially those that center on effects arising from changes in the source.

Three salient examples are highlighted. First, we observed the overtone structure to be highly stable, though some vibrato may be present. Second, a single sharply defined harmonic alone is not sufficient to produce the salient perception of a focused state, as had been suggested by Levin and Edgerton. Pressed phonation, also referred to as ventricular voice, occurs when glottal flow is affected by tightening of the laryngeal muscles such that the ventricular folds are brought into vibration.

This has the perceptual effect of adding a degree of roughness to the voice sound (Lindestad et al.). There, a single emphasized harmonic is present, but it is not until the higher cluster of overtones emerges that the focused percept arises. Third, we do not observe subharmonics, which contrasts with a prior claim (Lindestad et al.). An underlying biophysical question is whether focused overtone song arises from inherently linear or nonlinear processes.

Given that Khoomei consists of the voicing of two or more pitches at once and exhibits dramatic and fast transitions from normal singing to biphonation, nonlinear phenomena may seem like an obvious candidate (Herzel and Reuter). It should be noted that Herzel and Reuter go so far as to define biphonation explicitly through the lens of nonlinearity. We relax such a definition and argue for a perceptual basis for delineating the boundaries of biphonation. Certain frog species exhibit biphonation, and it has been suggested that their vocalizations can arise from complex nonlinear oscillatory regimes of separate, elastically coupled masses (Suthers et al.).

Further, the appearance of abrupt changes in physiological systems, as seen in Figure 1, has been argued to be a flag for nonlinear mechanisms (Goldberger et al.). Our results present two lines of evidence that argue against Sygyt-style Khoomei arising primarily from a nonlinear process.

First, the underlying harmonic structure of the vocal fold source appears highly stable through the transition into the focused state (Figure 1).

There is little evidence of subharmonics. A source spectral structure comprised of an f0 and integral harmonics suggests a primarily linear source mechanism. Second, our modeling efforts, which are chiefly linear in nature, reasonably account for the sudden and salient transition. That is, the model readily captures the characteristic that small changes in the vocal tract can produce large changes in the filter.

Thereby, precise and fast motor control of the articulators within a linear framework accounts for the transitions into and out of the focused state. Thus, in essence, Sygyt-style Khoomei can be considered a linear means of achieving biphonation. Nevertheless, features that appear transiently in spectrograms do provide hints of source nonlinearity, such as the brief appearance of subharmonics in some instances (Appendix 1—figure 15B). This provides an opportunity to address the limitations of the current modeling efforts and to highlight future considerations.

We suggest that further analysis of these transient features would be useful. Several potential areas for improvement include nonlinear source-filter coupling (Titze et al.).

Although this study did not directly assess the percept associated with these vocal productions, the results raise pressing questions about how the spectro-temporal signatures of biphonic Khoomei described here create the classical perception of Sygyt-style Khoomei as two distinct sounds (Aksenov). The first, the low-pitched drone, present during both the normal-singing and the focused-state biphonation intervals, reflects the pitch associated with f0, extracted from the harmonic representation of the stimulus.

The frequency resolution of the peripheral auditory system is such that these low-order harmonics are individually resolved by the cochlea, and it appears that such filtering is an important prerequisite for pitch extraction associated with that common f0. The second sound, the high-pitched melody, is present only during the focused-state intervals and probably reflects a pitch associated with the focused formant. An open question, however, is why this focused formant would be perceived incoherently as a separate pitch (Shamma et al.).

The auditory system tends to group concurrent harmonics together into a single perceived object with a common pitch (Roberts et al.). The fact that the focused formant is so narrow apparently leads the auditory system to interpret this sound as if it were a separate tone, independent of the low harmonics associated with the drone percept, thereby effectively leading to a pitch decoherence.

This perceptual separation could be attributable to a combination of both bottom-up (i.e., peripheral) and top-down factors. From the bottom-up standpoint, even if the focused formant is broad enough to encompass several harmonic components, the fact that it consists of harmonics at or above 10 f0 means that the cochlea will not resolve them individually. Instead, the formant will be represented as a single spectral peak, similar to the representation of a single pure tone at the formant frequency. Although the interaction of harmonic components at this cochlear location will generate amplitude modulation at a rate equal to the f0 (Plack and Oxenham), it has been argued that a common f0 is a weak cue for binding low- and high-frequency formants (Culling and Darwin). Rather, other top-down mechanisms of auditory-object formation may play a more important role in generating the perception of two separate objects in Khoomei.
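
The unresolved-harmonics point can be sketched: several high harmonics passing through one cochlear filter sum to a composite waveform that repeats at the common spacing f0, i.e., is amplitude-modulated at the f0 rate. The 140 Hz fundamental and the harmonic numbers 10-13 below are assumptions for illustration:

```python
import numpy as np

fs = 44_100
f0 = 140.0                      # hypothetical drone fundamental; fs/f0 = 315 exactly
t = np.arange(0, 0.1, 1/fs)
# harmonics 10-13: too high for the cochlea to resolve individually,
# so they would pass through a single auditory filter together
x = sum(np.cos(2*np.pi*k*f0*t) for k in range(10, 14))
# their sum repeats with the period of the common spacing f0, so the
# composite waveform carries modulation at the f0 rate
period = int(fs/f0)             # 315 samples
print(np.allclose(x[:1000], x[period:period + 1000]))  # True
```

The individual components near 1.4-1.8 kHz are not separately represented, yet their joint envelope still carries the f0 periodicity, which is exactly the weak binding cue discussed above.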

For example, the rapid onsets of the focused formant may enhance its perceptual separation from the constant drone (Darwin). Further, the fact that the focused formant has a variable frequency (i.e., it carries frequency modulation, FM) may also contribute. Although it has been argued that FM differences between harmonic sounds generally have little influence on their perceived separation (Darwin), others have reported enhanced separation in the special case in which one complex was static and the other had FM applied (Summerfield and Culling), similar to the first and second formants during the Tuvan focused state.

The perceptual separation of the two sounds in Tuvan song might be further affected by a priori expectations about the spectral qualities of vocal formants (Billig et al.). Because a narrow formant occurs so rarely in natural singing and speech, the auditory system might be predisposed against perceiving it as a phonetic element, limiting its perceptual integration with the other existing formants.

When three or four individual frequency-modulated sinusoids are presented at formant frequencies in lieu of natural formants, listeners can, with sufficient training, perceive the combination as speech (Remez et al.). Nevertheless, listeners largely perceive these unnatural individual pure tones as separate auditory objects (Remez et al.).

Further research exploring these considerations would help close the production-perception circle underlying the unique percept arising from Tuvan throat song. A sample rate of 96 kHz was used for the audio recordings. Spectral analysis was done using custom-coded software in Matlab. Harmonics (black circles in Figure 1) were estimated using a custom-coded peak-picking algorithm. The energy ratio e_R can be readily computed from the spectrogram data as follows.

Then integrate across frequency, first for a limited range spanning [f_L, f_H] (e.g., a band about the focused formant), and then across the entire bandwidth. The ratio of these two energies is then defined as e_R, and takes on values between 0 and 1. This can be expressed more explicitly as:

e_R = (∫_{f_L}^{f_H} |S(f)|^2 df) / (∫_{0}^{f_max} |S(f)|^2 df),

where S(f) is the short-time spectrum at the given time point. Values of e_R for the waveforms used in Figure 1 are shown in Appendix 1—figures 2 and 3. The participant was fitted with an MRI-compatible noise-cancelling microphone (Optoacoustics, Mazor, Israel) mounted directly above the lips.
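
A minimal numerical version of the e_R calculation. The spectrum below is synthetic (broad low-frequency energy plus a focused band near 1.7 kHz), and the band edges are illustrative assumptions:

```python
import numpy as np

def energy_ratio(freqs, power, f_lo, f_hi):
    """e_R: fraction of total spectral energy inside [f_lo, f_hi]; lies in [0, 1]."""
    band = (freqs >= f_lo) & (freqs <= f_hi)
    return power[band].sum()/power.sum()   # uniform grid, so df cancels

freqs = np.linspace(0.0, 8000.0, 4001)
# hypothetical power spectrum: broad low-frequency energy + focused band at 1.7 kHz
power = np.exp(-freqs/500.0) + 5.0*np.exp(-((freqs - 1700.0)/60.0)**2)
e_r = energy_ratio(freqs, power, 1500.0, 2000.0)
print(round(e_r, 2))   # ≈ 0.53
```

In the normal-singing state the focused band is absent and e_R stays small; a jump toward 1 within the band marks the transition into the focused state, which is how the red-triangle transition points are read off.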

The latency of the microphone and noise-cancelling algorithm was 24 ms.


