You don’t need to submit anything for this lab
After opening Praat you should be able to see two windows:
Download the following sound file to your computer: seashells.wav
Open the sound in Praat:
1. Click Open on the Objects window to show a drop-down menu
2. Click Read from file... and select the file you just downloaded
3. Click Open
You should now see a line in the Objects list called: Sound seashells.
Praat will allow you to do many things with this new Sound object, but for now let’s just open it and have a listen. Click View & Edit.
You should see a new window that looks like this:
The horizontal axis shows different points in time. The panels show different representations of the recording.
The bottom three bars give some playback controls. Clicking on the bar labelled:
- Total duration: will play the entire recording
- Visible part: will just play the bit you can see (e.g., if you’ve zoomed in)

In the bottom left corner you’ll see some buttons (all, in, out, sel, bak). You can play with them to zoom in and out. You can move around in time by scrolling left or right with your mouse.
Clicking on different spots will show you different information about the audio at that point in time. For example, in the image above we see red vertical and horizontal dashed lines.
If you click on the spectrogram panel, you’ll also get a red horizontal line giving the coordinates of the point you clicked on in the spectrogram, where the x-axis (horizontal) is time and the y-axis (vertical) is frequency (in Hertz). We won’t get into it this week, but this can be helpful for measuring different properties of the spectrogram by hand.
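Under the hood, a spectrogram is built by cutting the waveform into short frames and measuring how much energy each frame has at each frequency. The sketch below illustrates this idea for a single frame using a plain discrete Fourier transform on a synthetic 1000 Hz sine wave. This is only a conceptual illustration, not Praat’s actual spectrogram algorithm (which uses windowing and Gaussian analysis by default); all names and parameters here are made up for the example.

```python
import math
import cmath

def dft_magnitudes(frame):
    """Magnitude spectrum of one analysis frame via a plain DFT."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2)]

sample_rate = 8000
# One frame of a pure 1000 Hz tone (256 samples at 8 kHz).
frame = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(256)]
mags = dft_magnitudes(frame)

# The strongest frequency bin should sit at 1000 Hz.
peak_bin = max(range(len(mags)), key=lambda k: mags[k])
peak_hz = peak_bin * sample_rate / 256
print(peak_hz)  # → 1000.0
```

A spectrogram just repeats this analysis for many overlapping frames and paints the energy at each (time, frequency) point as darkness, which is exactly what the y-axis of the spectrogram panel is showing you.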
You may see a blue bar superimposed on the spectrogram (as in the screenshot above). This represents the estimated pitch at a point in time (more accurately: Fundamental Frequency - we’ll talk about this more in Module 2!). This is estimated using a different algorithm from the spectrogram so the fact that you see it here is really just a design choice from the makers of this software.
You can turn the pitch track on and off by clicking on the Pitch menu at the top of the window and checking/unchecking Show Pitch. The default method used here is “filtered autocorrelation”, which you can see from the check mark in the menu.
Automation Warning: If the pitch tracking is good (as it is in the example above) you should be able to see a relatively smooth contour that matches your perception of when pitch goes up and down through the speech. Unfortunately, pitch tracking can be quite prone to error. Almost all pitch trackers are sensitive to the range settings (i.e. the expected minimum and maximum pitch values in Hertz). If the expected range doesn’t really match the speaker’s actual range you can get errors like octave doubling and halving. You will also get errors if the phonation is “non-modal”, e.g. creaky or breathy. Sometimes data-driven studies don’t bother to check this and then end up with spurious results.
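To see why the range settings matter so much, consider the core idea behind autocorrelation pitch tracking: find the delay (lag) at which the waveform best matches a shifted copy of itself, and convert that lag to a frequency. The sketch below is a deliberately naive toy version, not Praat’s filtered autocorrelation, but it reproduces the octave-halving error: when the allowed range excludes the true pitch, the picker lands on a multiple of the true period.

```python
import math

def estimate_f0(samples, sample_rate, f0_min, f0_max):
    """Naive autocorrelation pitch picker restricted to [f0_min, f0_max] Hz."""
    lag_min = int(sample_rate / f0_max)   # shortest period considered
    lag_max = int(sample_rate / f0_min)   # longest period considered
    best_lag = max(
        range(lag_min, lag_max + 1),
        key=lambda lag: sum(samples[t] * samples[t + lag]
                            for t in range(len(samples) - lag)),
    )
    return sample_rate / best_lag

sample_rate = 16000
# A synthetic 200 Hz tone standing in for a voiced stretch of speech.
tone = [math.sin(2 * math.pi * 200 * t / sample_rate) for t in range(1600)]

good = estimate_f0(tone, sample_rate, 50, 800)   # range includes 200 Hz
bad = estimate_f0(tone, sample_rate, 50, 150)    # range excludes 200 Hz
print(good, bad)  # → 200.0 100.0 (octave halving!)
```

With the sensible range the estimator recovers 200 Hz, but with a ceiling of 150 Hz it confidently reports 100 Hz, because a copy of the signal delayed by two full periods also lines up perfectly with itself. Real trackers are more robust than this toy, but the same failure mode is why checking the range against the speaker matters.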
To change the range settings, click on Pitch settings from the Pitch menu. The default range for Praat (50-800 Hz) is OK, but you can often do better if you tweak this (e.g., see this paper).
Another common overlay is the Intensity estimate (green or yellow). You can turn this on by going to the Intensity menu and clicking on Show Intensity. This will essentially give you a measure of the loudness over time (based on the amplitude of the wave). You should see that the peak structure in this contour broadly represents syllables in the speech.
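The standard way to turn amplitude into a loudness-like number is to take the root-mean-square (RMS) of a short window of samples and convert it to decibels. The sketch below shows that calculation on two synthetic tones; it is a simplified illustration of the idea (Praat’s intensity analysis additionally smooths and weights the signal), and the tones and names are made up for the example.

```python
import math

def intensity_db(frame, reference=1.0):
    """RMS amplitude of a frame of samples, converted to decibels."""
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    return 20 * math.log10(rms / reference)

sample_rate = 16000
quiet = [0.1 * math.sin(2 * math.pi * 220 * t / sample_rate) for t in range(800)]
loud = [0.2 * math.sin(2 * math.pi * 220 * t / sample_rate) for t in range(800)]

quiet_db = intensity_db(quiet)
loud_db = intensity_db(loud)
print(quiet_db, loud_db)  # doubling the amplitude adds about 6 dB
```

Computing this in a sliding window across the whole recording produces the intensity contour you see in the viewer: loud vowel nuclei give peaks, and the quieter consonants between them give dips, which is why the contour roughly tracks syllables.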
There are many other menus at the top of the window that will show other overlays (e.g. Formants, Pulses). Feel free to click around and see what they do. You can even use the functions in the Edit menu to cut and paste speech segments!
For the moment, we will just press on and learn what we need to as we go, so we can get started analysing some speech. If you want to learn more about Praat, there are several tutorials linked from the Praat website (which also hosts a lot of documentation): tutorials page. You may also find this video-based guide by Richard Ogden (University of York) helpful: video guide.
Now that we’ve got the basics of Praat, let’s go back to thinking about speech articulation. Specifically, we’re going to use Praat to visualise and analyse what’s going on in some tongue twisters! These are phrases that are difficult to say properly. Thinking about why they are difficult to articulate will hopefully help you better understand differences in place and manner of articulation.
Let’s start with some classic English ones, recorded with fast and slow speaking rates:
This one is, of course, the same tongue twister as the one we looked at above but spoken by a different speaker - can you tell just by looking at the waveform or spectrogram?
Please note, for this lab it really doesn’t matter if you can say these correctly! In fact, errors will probably be more useful!
Task: Before we start analysing these in Praat, try saying each of these phrases out loud. You may wish to take turns with the people next to you in the lab and then discuss the following questions (but it’s totally fine to do this on your own).
Questions
We’ll do some analysis on these one by one.
Download and open one of the recordings of “She sells sea shells by the sea shore”. In the following I’ll just use the first example (seashells.wav, spoken by Catherine) but you can use one of the others (spoken by Simon) if you prefer.
A big reason Praat is so popular with phoneticians is that it’s convenient for annotation. Let’s add some textgrids to do annotations now.
1. Click on the Sound seashells object in the Objects window
2. Click the Annotation button to the right
3. Click To TextGrid...
You should see a little popup window named Sound: To TextGrid which you can use to set the annotation parameters. Edit the parameters there as follows:
- All Tier Names: delete “Mary John bell” and replace it with “Phone Word Errors”
- Which of these are point tiers: write Errors

Then click Ok.
You should now see a new TextGrid seashells object in the Objects window.
Select the Sound seashells and TextGrid seashells objects so both are highlighted and then click on View & edit.

You should now see the sound viewer with the waveform and spectrogram up top, but now also 3 blank annotation tiers: Phone, Word, and Errors. The first two are interval tiers, while the last is a point tier. As the name suggests, we use interval tiers to annotate spans of time (intervals!), and point tiers to annotate specific points in time. The choice to make the Errors tier a point tier here is a bit arbitrary and just for illustrating what you can do with Praat.
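Another way to see the difference between the two kinds of tier is to think about what data each annotation carries. The sketch below models them with two small hypothetical classes (these names, times, and labels are made up for illustration; they are not Praat’s TextGrid file format):

```python
from dataclasses import dataclass

@dataclass
class Interval:
    """A labelled span of time on an interval tier (e.g. a word or phone)."""
    start: float  # seconds
    end: float    # seconds
    label: str

@dataclass
class Point:
    """A labelled single instant on a point tier (e.g. an error mark)."""
    time: float  # seconds
    label: str

# Illustrative annotations for the start of the recording.
word_tier = [Interval(0.10, 0.45, "she"), Interval(0.45, 0.92, "sells")]
error_tier = [Point(0.47, "s/sh swap")]

# An interval annotation has a duration; a point annotation does not.
duration = word_tier[0].end - word_tier[0].start
print(duration)
```

An interval needs a start and an end, so it is natural for things that occupy time (phones, words), while a point has only a single timestamp, which suits one-off events like marking where an error occurred.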
Toggling the IPA symbol selector: You’ll probably see a large table of IPA symbols on the right of the viewer. You can use this to add IPA symbols into annotations, but it takes up a lot of space. So, for the moment, let’s just hide it by clicking the pink crossed box at the top right. You should see it turn into a pink triangle - clicking on this will show the IPA symbol table again.
You should see a vertical line with some circles on the text tiers. Click the circle on the Word tier to make a boundary. You should now see a vertical red line on the Word tier.

Some things to note:
The tricksy bit of this tongue twister is the syllable initial consonants (aka syllable onsets). Let’s see what’s going on by annotating the first phone in each syllable for place and manner, in the phone tier. You may find it useful to say the phrase yourself and determine what your articulators are doing.
Add boundaries for the start and end of the first phone in each of the syllables in the recording.
Using the IPA chart, annotate each syllable-initial phone interval with:
The interval box itself will be too small to see the full annotation, but you can see and edit the full thing at the top of the window. Here’s the first one as an example (re-expanding the IPA symbol selector):
Let’s now look at the pattern of movement for vowels. Again, you may find it useful to say the phrase yourself and determine what your articulators are doing.
Mark any speech errors you notice on the Errors tier.
Questions:
Even without any training in spectrogram interpretation (i.e. acoustic phonetics), you should be able to see that fricatives are quite distinctive from other consonants in the spectrogram!
Questions:
Tongue-twister: “Peter Piper picked a peck of pickled peppers”
Now let’s try recording a tongue twister yourself and analysing it. If you speak a language other than English, you might like to try one in another language.
After recording yourself in Praat (see instructions below), try to identify the articulation patterns that cause difficulty. Again, think about whether the confusions/errors that arise are in terms of placement of the articulators. Describe this in terms of voicing, place, and manner for consonants. For vowels, think about tongue frontness, height, and rounding. We’ve focused mostly on consonants in this lab, but don’t worry, we’ll do more on vowels next week.
Some more English examples:
You can find many more on the internet!
And for inspiration, here are some tongue twisters in other languages offered up by members of the Centre for Speech Technology Research (including the lab tutors):
You can also find many others linked in the description of this video by Hank Green: Tongue twisters. This also has a nice discussion of why tongue twisters are hard!
Click New in the Praat Objects window and select Record mono Sound... to open the recorder.

There are 3 main parameters you can change:
- Change the name untitled to whatever you’d like it to be.
- Click Record to start recording and Stop to stop the recording.

When you start speaking you’ll see some colours appear in the Meter box. This will tell you if the sound level is at an appropriate level. If you see some green movement, you should be fine. If you see the meter go past yellow up into red, the sound is too loud to capture faithfully and you will likely get distortion in the recording. This usually happens if your microphone volume is too high and/or you’re too close to the microphone (we’ll come back to this in module 3).
The following shows the meter going into the red (produced by clapping several times with the microphone set to high volume):
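What the red zone on the meter actually means is that the recorder can only store sample values up to a fixed maximum: anything beyond full scale gets flattened, or “clipped”, which is the distortion you hear. The sketch below simulates this on a synthetic tone recorded too hot (the function name and amplitudes are made up for illustration):

```python
import math

def clip(samples, full_scale=1.0):
    """Simulate a recorder's limited range: flatten anything beyond full scale."""
    return [max(-full_scale, min(full_scale, x)) for x in samples]

sample_rate = 16000
# A tone "recorded too hot": its peaks reach 1.5x what the recorder can store.
hot = [1.5 * math.sin(2 * math.pi * 220 * t / sample_rate) for t in range(800)]
recorded = clip(hot)

# Fraction of samples that got flattened and therefore distorted.
clipped_fraction = sum(1 for x, y in zip(hot, recorded) if x != y) / len(hot)
print(clipped_fraction)
```

Here over half the samples end up flattened against the limits, turning the smooth sine peaks into plateaus. Unlike a recording that is merely too quiet, this damage cannot be undone afterwards, which is why it is worth fixing the microphone level before you record.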
You can listen back to your recording using the play button. When you’re happy with it click Save to list & Close. You should now see a new Sound object in the Objects window with the name you gave your recording.
As for the other examples, add a TextGrid for annotations and inspect the audio.
The goal of this lab was to get you thinking about how people create speech using actual physical articulators in our vocal tracts. Tongue twisters show that this process of going from thinking to speaking is actually very complicated.
Speaking is also constrained by the physicality of our actual articulators and respiratory systems. It has actually proven very hard to reproduce human speech in purely physical models. You can get an idea of how difficult this problem is by looking at the work from the lab of Prof. Takayuki Arai (Sophia University, Japan). See this recent paper, for example:
There are several other very interesting demos on the lab’s YouTube page: Acoustic-phonetics demonstrations
You’re probably now getting to understand why most humanoid robots don’t even attempt to include vocal tracts! Instead, we generally synthesize speech waveforms using non-physical means on computers and play them out of speakers. To do this we’ll need to understand how we can “see speech” just from the waveform: i.e., acoustic phonetics. This is the focus of module 2.