This is the module 1 lab worksheet annotated with comments and answers (generally in red)
You don’t need to submit anything for this lab
This lab is really about exploring articulation and having a go at using Praat. You don’t need to learn everything about Praat, though there are some resources linked below if you want to find out more. Similarly, our goal isn’t to do close phonetic transcription here, but rather to have some practice linking articulation to the phonetic terms and the IPA.
After opening Praat you should be able to see two windows:
Download the following sound file to your computer: seashells.wav
Open the sound in Praat:
1. Click Open on the Objects window to show a drop down menu
2. Click Read from file...
3. Select the seashells.wav file you just downloaded and click Open
You should now see a line in the Objects list called: Sound seashells.
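As an aside, everything you do through these menus can also be scripted: Praat has its own scripting language (Praat menu → New Praat script). Here’s a minimal sketch of the same loading step, assuming seashells.wav sits in the directory the script is run from:

```
# Sketch: load and play the recording from a Praat script.
# Assumes seashells.wav is in the directory the script runs from.
Read from file: "seashells.wav"
selectObject: "Sound seashells"
Play
```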
Praat will allow you to do many things with this new Sound object, but for now let’s just open it and have a listen: click on the View & Edit button.
You should see a new window that looks like this:
The horizontal axis shows different points in time. The panels show different representations of the recording.
The bottom three bars give some playback controls. Clicking on the bar labelled:
- Total duration: will play the entire recording
- Visible part: will play just the bit you can see (e.g., if you’ve zoomed in)

In the bottom left corner you’ll see some buttons (all, in, out, sel, bak). You can play with them to zoom in and out. You can also move around in time by scrolling left or right with your mouse.
Clicking on different spots will show you different information about the audio at that point in time. For example, in the image above we see a red vertical line and a horizontal dashed line.
If you click on the spectrogram panel, you’ll also get a red horizontal line that gives you the coordinates of the point you clicked on in the spectrogram, where the x-axis (horizontal) is time and the y-axis (vertical) is frequency (in Hertz). We won’t get into it this week, but this can be helpful for measuring different properties of the spectrogram by hand.
You may see a blue bar superimposed on the spectrogram (as in the screenshot above). This represents the estimated pitch at a point in time (more accurately: Fundamental Frequency - we’ll talk about this more in Module 2!). This is estimated using a different algorithm from the spectrogram so the fact that you see it here is really just a design choice from the makers of this software.
You can turn the pitch track on and off by clicking on the Pitch menu at the top of the window and checking/unchecking Show Pitch. The default method used here is “filtered autocorrelation”, which you can see from the check mark in the menu.
Automation Warning: If the pitch tracking is good (as it is in the example above) you should be able to see a relatively smooth contour that matches your perception of when pitch goes up and down through the speech. Unfortunately, pitch tracking can be quite prone to error. Almost all pitch trackers are sensitive to the range settings (i.e. the expected minimum and maximum pitch values in Hertz). If the expected range doesn’t really match the speaker’s actual range you can get errors like octave doubling and halving. You will also get errors if the phonation is “non-modal”, e.g. creaky or breathy. Sometimes data-driven studies don’t bother to check this and end up with spurious results.
To change the range settings click on Pitch settings from the Pitch menu. The default range for Praat (50-800Hz) is ok, but you can often do better if you tweak this (e.g., see this paper).
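If you want to experiment with range settings programmatically, here’s a minimal script sketch. The 75-500 Hz range is just an illustrative choice, not a recommendation, and this uses Praat’s classic autocorrelation command rather than the newer “filtered autocorrelation” default mentioned above:

```
# Sketch: extract a pitch track with an explicit range.
# Arguments: time step (0 = automatic), pitch floor (Hz), pitch ceiling (Hz).
selectObject: "Sound seashells"
To Pitch: 0.0, 75, 500
meanF0 = Get mean: 0, 0, "Hertz"
writeInfoLine: "Mean F0: ", meanF0, " Hz"
```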
Another common overlay is the intensity estimate (green or yellow). You can turn this on by going to the Intensity menu and clicking on Show Intensity. This essentially gives you a measure of loudness over time (based on the amplitude of the wave). You should see that the peaks in this contour broadly correspond to syllables in the speech.
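The intensity contour can also be computed from a script; a sketch (the argument values are illustrative, not tuned to this recording):

```
# Sketch: compute an intensity (loudness) contour from the waveform.
# Arguments: minimum pitch (Hz), time step (0 = automatic), subtract mean.
selectObject: "Sound seashells"
To Intensity: 100, 0.0, "yes"
maxDb = Get maximum: 0, 0, "Parabolic"
writeInfoLine: "Peak intensity: ", maxDb, " dB"
```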
There are many other menus at the top of the window that will show other overlays (e.g. Formants, Pulses). Feel free to click around and see what they do. You can even use the functions in the Edit menu to cut and paste speech segments!
For the moment, we will just press on and learn what we need to as we go, so we can get started analysing some speech. If you want to learn more about Praat, there are several tutorials linked from the Praat website (which also hosts a lot of documentation): tutorials page. You may also find this video based guide by Richard Ogden (University of York) helpful: video guide.
Now that we’ve got the basics of Praat, let’s go back to thinking about speech articulation. Specifically, we’re going to use Praat to visualise and analyse what’s going on in some tongue twisters! These are phrases that are difficult to say properly. Thinking about why they are difficult to articulate will hopefully help us better understand differences in place and manner of articulation.
Let’s start with some classic English ones, recorded with fast and slow speaking rates:
This one is, of course, the same tongue twister as the one we looked at above, but spoken by a different speaker - can you tell just by looking at the waveform or spectrogram?
You should be able to see some differences in the following two visualisations: the first one is Catherine’s recording, the second one is the link above (slow), recorded by Simon King. You should see similarities in the spectrograms but they aren’t completely the same. Part of the difference here is the difference in their voices, but some of it is also just recording conditions (i.e., noise). The waveforms also look different, but you can’t really tell that one recording is by a different speaker just by looking at them. Various other factors could be changing the waveform and spectrogram.
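If you want to generate spectrograms like these yourself outside the editor window, here’s a script sketch; the analysis settings are standard broadband values, not anything specific to these recordings:

```
# Sketch: compute and draw a broadband spectrogram in the Picture window.
# Arguments to To Spectrogram: window length (s), max frequency (Hz),
# time step (s), frequency step (Hz), window shape.
selectObject: "Sound seashells"
To Spectrogram: 0.005, 5000, 0.002, 20, "Gaussian"
Paint: 0, 0, 0, 0, 100, "yes", 50, 6, 0, "yes"
```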
Please note, for this lab it really doesn’t matter if you can say these correctly! In fact, errors will probably be more useful!
Task: Before we start analysing these in Praat, try saying each of these phrases out loud. You may wish to take turns with the people next to you in the lab and then discuss the following questions (but it’s totally fine to do this on your own).
Questions
What parts of these phrases are difficult to say? What words tend to be said incorrectly? Are there specific phones that are difficult?
This part of the lab is really to get you thinking about your articulators, but here are some observations (which will be repeated in the later tasks):
Seashells: the difficulty here is usually mixing up “she” and “sea”, i.e. the place of articulation of the syllable-initial fricative. I tend to say “by the she shore”. The difference in place between “s” [s] and “sh” [ʃ] is quite small, so it’s easy to hit the wrong target.
Peter Piper: in this case you have alternation between “p” (first syllable) and other oral stops “t”, “p”, “k” (second syllables). The vowel variation can also trip people up - the first three words have high front vowels in the first syllable, so you sort of expect the last three to be lower front [e], but you also have “pickled” in there. The addition of the “l” in “pickled” also trips me up.
Benevolent elephants: I always want to say “benelovent elephants”. This seems to be a planning/rhyme thing: you have “seventy seven be-” but then “-nevolent elephants”, which breaks the pattern of the first two onsets matching in place ([n]-[l] vs [l]-[f]). You probably get priming for the “elephants” pattern because it’s a more common word.
Do you need to speak slower than you usually would to say these correctly?
What happens when you try to say them faster?
We’ll do some analysis on these one by one.
Some more practice with Praat. This time adding TextGrids for annotations.
Download and open one of the recordings of “She sells sea shells by the sea shore”. In the following, I’ll just use the first example (seashells.wav, spoken by Catherine) but you can use one of the others (spoken by Simon) if you prefer.
A big reason Praat is so popular with phoneticians is that it’s convenient for annotation. Let’s add a TextGrid and do some annotation now.
1. Click on the Sound seashells object in the Objects window
2. Click the Annotation button to the right
3. Click To TextGrid...
You should see a little popup window named Sound: To TextGrid which you can use to set the annotation parameters. Edit the parameters there as follows:
- All tier names: delete “Mary John bell” and replace it with “Phone Word Errors”
- Which of these are point tiers: write “Errors”
Then click Ok.
You should now see a new TextGrid seashells object in the Objects window.
Select the Sound seashells and TextGrid seashells objects so both are highlighted and then click on View & Edit.
You should now see the sound viewer with the waveform and spectrogram up top, but now also 3 blank annotation tiers: Phone, Word, and Errors. The first two are interval tiers, while the last is a point tier. As the names suggest, we use interval tiers to annotate spans of time (intervals!), and point tiers to annotate specific points in time. The choice to make the Errors tier a point tier here is a bit arbitrary and just for illustrating what you can do with Praat.
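For reference, the same TextGrid setup can be created from a script. A sketch using the tier names above:

```
# Sketch: make a TextGrid with Phone and Word interval tiers plus an
# Errors point tier, then open the Sound and TextGrid together.
selectObject: "Sound seashells"
To TextGrid: "Phone Word Errors", "Errors"
plusObject: "Sound seashells"
View & Edit
```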
Toggling the IPA symbol selector: You’ll probably see a large table of IPA symbols on the right of the viewer. You can use this to add IPA symbols into annotations, but it takes up a lot of space. So, for the moment, let’s just hide this by clicking the pink crossed box at the top right of it. You should see it turn into a pink triangle - clicking on this will show the IPA symbol table again.
Click at a point in the waveform or spectrogram. You should see a vertical line with some circles on the text tiers. Click the circle on the Word tier to make a boundary. You should now see a vertical red line on the Word tier. Some things to note:
Question: When you are transcribing speech, how do you know whether “sea” and “shells” should be transcribed as separate words or as a compound word (“seashells”)?
The goal here is to break down articulations in terms of manner and place (hence the annotation task). We also start to see the relationship between speech sounds (i.e., phones), the waveform, and the spectrogram. Even without much knowledge of what a spectrogram is, you should be able to see that there are some consistent patterns associated with specific types of speech sounds.
The tricky bit of this tongue twister is the syllable-initial consonants (aka syllable onsets). Let’s see what’s going on by annotating the first phone in each syllable for place and manner, in the Phone tier. You may find it useful to say the phrase yourself to determine what your articulators are doing.
Add boundaries for the start and end of the first phone in each of the syllables in the recording.
Using the IPA chart, annotate each syllable-initial phone interval with:
The interval box itself will be too small to see the full annotation, but you can see and edit the full thing at the top of the window. Here’s the first one as an example (re-expanding the IPA symbol selector):
Let’s now look at the pattern of movement for vowels. Again, you may find it useful to say the phrase yourself to determine what your articulators are doing. Mark any speech errors you notice at the relevant point in time on the Errors tier.
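If you end up doing a lot of annotation, these editor actions also have script equivalents. A sketch, with made-up times and labels (tier numbers follow the order above: 1 = Phone, 2 = Word, 3 = Errors):

```
# Sketch: add an interval label on the Phone tier and a point on the
# Errors tier. The times (in seconds) and labels are invented for
# illustration; read yours off your own recording.
selectObject: "TextGrid seashells"
Insert boundary: 1, 0.25
Insert boundary: 1, 0.40
Set interval text: 1, 2, "ʃ"
Insert point: 3, 1.20, "said sea for she"
Save as text file: "seashells.TextGrid"
```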
Questions:
Even without any training in spectrogram interpretation (i.e. acoustic phonetics), you should be able to see that fricatives are quite distinctive from other consonants in the spectrogram!
Questions:
The recording sounds pretty clear if you just listen to it. The spectrogram is a good example of how reduced some vowels can get! The stops are fairly clear.
Usually we’d mark the start of each stop at the point of closure (when the tongue or lips actually stop air going through your mouth). But it’s not that clear here where the stop segments should start and end because he’s speaking at such a slow rate! In this case it’s a little bit arbitrary. If you were doing a fine-grained phonetic analysis you would need to make some decisions on how to be consistent with this. For automatic word transcription, we’ll see that we don’t need to be too worried about what the exact boundary is as long as it is consistent.
Tongue-twister: “Peter Piper picked a peck of pickled peppers”
We see this in the spectrogram as a light/grey coloured section (closure) followed by a dark vertical line (burst). After the burst there is often a period of aspiration: turbulent air flow from the burst. In the spectrogram this will look like a fading away of the burst. In module 2 we’ll see that we can guess which plosive it is from the spectrogram, often by the way it affects the spectrum of the following vowel.
Now let’s try recording a tongue twister yourself and analysing it. If you speak a language other than English, you might like to try one in another language.
After recording yourself in Praat (see instructions below), try to identify the articulation patterns that cause difficulty. Again, think about whether the confusions/errors that arise are in terms of the placement of articulators. Describe this in terms of voicing, place, and manner for consonants. For vowels, think about tongue frontness, height, and rounding. We’ve focused mostly on consonants in this lab, but don’t worry, we’ll do more on vowels next week.
This part of the lab is more of an extension, so that you can have a go at recording yourself. I think it’s also fun to see that tongue twisters are something common across languages!
I wouldn’t expect full analyses of these examples. The main thing is to think about what your articulators are doing if you try to say some of these.
If you do the recording in a noisy environment, you’ll see that the noise makes the spectrogram less clear.
Some more English examples:
You can find many more on the internet!
And for inspiration, here are some tongue twisters in other languages offered up by members of the Centre for Speech Technology Research (including the lab tutors):
You can also find many others linked in the description of this video by Hank Green: Tongue twisters. This also has a nice discussion of why tongue twisters are hard!
1. Click on New in the Praat Objects window
2. Click Record mono Sound... to open the SoundRecorder window

There are 3 main parameters you can change:
- Change the name of the recording from untitled to whatever you’d like it to be.
- Click Record to start recording and Stop to stop the recording.
to stop the recording.When you start speaking you’ll see some colours appear in the the Meter box. This will tell you if the sound level is at an appropriate level. If you see some green movement, you should fine. If you see the meter go into yellow up into red, the sound is too loud to capture faithfully and you likely get distortion in the recording. This usually happens if your microphone volume is too high and/or you’re too close to the microphone (we’ll come back to this in module 3).
The following shows the meter going into the red (produced by clapping several times with the microphone set to high volume):
You can listen back to your recording using the play button. When you’re happy with it click Save to list & Close. You should now see a new Sound object in the Objects window with the name you gave your recording.
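One quick way to check for clipping after the fact is to query the peak sample value from a script. A sketch, where mytwister is a made-up name standing in for whatever you called your recording:

```
# Sketch: check whether a recording clips (peak at or near +/-1).
# "mytwister" is a placeholder; use the name you typed in the recorder.
selectObject: "Sound mytwister"
peak = Get absolute extremum: 0, 0, "None"
writeInfoLine: "Peak amplitude: ", peak
```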
As for the other examples, add a TextGrid for annotations and inspect the audio.
The goal of this lab was to get you thinking about how people create speech using actual physical articulators in our vocal tracts. Tongue twisters show that this process, between thinking and speaking, is actually very complicated.
Speaking is also constrained by the physicality of our actual articulators and respiratory systems. It has actually proven very hard to reproduce human speech in purely physical models. You can get an idea of how difficult this problem is by looking at the work from the lab of Prof. Takayuki Arai (Sophia University, Japan). See this recent paper, for example:
There are several other very interesting demos on the lab’s YouTube page: Acoustic-phonetics demonstrations
You’re probably now getting to understand why most humanoid robots don’t even attempt to include vocal tracts! Instead, we generally synthesize speech waveforms using non-physical means on computers and play them out of speakers. To do this we’ll need to understand how we can “see speech” just from the waveform: i.e., acoustic phonetics. This is the focus of module 2.