Playing with sounds – Nao is a parrot
This post is a beginner's exercise in DSP (Digital Signal Processing). We describe how to change the playback frequency of a WAV file, then how to use that to make a robot repeat what you say, but faster, like a parrot.
First step : read a wav file.
I recorded a short sentence with Audacity, then exported it to a WAV file for testing.
I play it with code based on a PyAudio Python example:
[code language="python"]
import pyaudio
import wave
import sys

# for my test
my_file = "PierreVoice.wav"
wf = wave.open(my_file, 'rb')
print (wf.getsampwidth())  # 2
print (wf.getnchannels())  # 2
print (wf.getframerate())  # 48008

# instantiate PyAudio
p = pyaudio.PyAudio()

# open stream
stream = p.open(format=p.get_format_from_width(wf.getsampwidth()),
                channels=wf.getnchannels(),
                rate=wf.getframerate(),
                output=True)

# define chunk size
CHUNK = 1024

# read data
data = wf.readframes(CHUNK)

# play stream
while len(data) > 0:
    stream.write(data)
    # read next chunk
    data = wf.readframes(CHUNK)

# stop stream
stream.stop_stream()
stream.close()

# close PyAudio
p.terminate()
[/code]
The important values are the sampling frequency, here nominally 48000 Hz (the framerate printed in my test was 8 higher), and the number of channels, here 2: left and right.
When we open the output stream, we pass these values from the original file so that playback is faithful.
A voice beyond the grave
Channels are interleaved, and the player takes the values for each channel at the right frequency. What happens if we cheat and tell the player that there is only one channel:
[code]
# open stream
stream = p.open(format=p.get_format_from_width(wf.getsampwidth()),
                # channels=wf.getnchannels()
                channels=1,
                rate=wf.getframerate(),
                output=True)
[/code]
The player assumes that every value belongs to the single channel, so it has twice as many samples to play as it should: each (left, right) pair becomes (single, single). The voice is therefore played at half speed, and you hear a deep, drawling voice.
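To make the halving concrete, here is a minimal sketch with made-up sample values (the lists `left` and `right` and the helper `duration_seconds` are only illustrative, not part of the original test). It interleaves four stereo frames and compares how long the same bytes last when read as two channels or as one:

```python
import struct

# Four stereo frames of 16-bit samples, stored as interleaved (left, right) pairs.
left = [100, 200, 300, 400]
right = [-100, -200, -300, -400]
interleaved = [s for pair in zip(left, right) for s in pair]
raw = struct.pack("<%dh" % len(interleaved), *interleaved)


def duration_seconds(raw_bytes, channels, rate, sample_width=2):
    """Playback time if the raw bytes are interpreted with these parameters."""
    frames = len(raw_bytes) // (channels * sample_width)
    return frames / float(rate)


stereo_time = duration_seconds(raw, channels=2, rate=48000)
mono_time = duration_seconds(raw, channels=1, rate=48000)
print(mono_time / stereo_time)  # the mono interpretation lasts twice as long
```

The same bytes give 4 frames as stereo but 8 frames as mono, which is why the voice is stretched.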
A voice like a duck
We can also cheat by telling the player that the recording was made at a faster rate, for example twice the real one.
[code]
stream = p.open(format=p.get_format_from_width(wf.getsampwidth()),
                channels=wf.getnchannels(),
                # rate=wf.getframerate(),
                rate=96000,
                output=True)
[/code]
Now the voice is very fast and sounds like a duck.
I find it a bit too fast to understand the sentence. I remember that in the old days we played 45 rpm vinyl records at 78 rpm. The ratio is 78/45; applied to 48000 Hz, it gives 83200 Hz. Try it with 84000 (funny):
[code]
rate = 84000,
[/code]
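The arithmetic behind these cheated rates can be sketched as a tiny helper (the name `cheated_rate` is mine, purely for illustration). It multiplies the real rate by the wanted speed-up and rounds to a whole number of hertz:

```python
def cheated_rate(real_rate, speedup):
    """Rate to announce to the player so that playback is `speedup` times faster."""
    return int(round(real_rate * speedup))


print(cheated_rate(48000, 2))            # the "duck" setting
print(cheated_rate(48000, 78.0 / 45.0))  # the 45 rpm record played at 78 rpm
```

The exact product of 48000 and 78/45 is 83200; 84000 is simply a nearby round value.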
Second step : patch a wav file
The previous step is fine as long as we play the file with our own program.
If we want the same effect with any standard player, we must change the frequency inside the WAV file itself.
Wikipedia Wav description
The WAV format is well documented on Wikipedia.
I use Notepad++ with a hex plugin to look at the beginning of the file.
(Caution: the hex plugin is buggy with large files. Better to use UltraEdit if you have it.)
Find the original frequency in hex value
You can see the frequency at offset 018h and following: 80 bb 00 00 in hexadecimal (see the red square).
Take care of LSB format of numbers
Pay attention to the binary format: a 4-byte integer is stored least significant byte (LSB) first, which is called little-endian. If the bytes are numbered 1 2 3 4 in normal reading order, the file holds them as 4, 3, 2, 1.
So the value must be read in reverse order. Here: 00 00 bb 80 hex.
00 00 bb 80 hex is 48000 decimal. OK, that is what we expect: 48 kHz.
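As a quick check of this reading, here is a small sketch that decodes the four bytes seen in the hex editor, once by hand (each byte weighted by its position) and once with the standard `struct` module, where `"<I"` means a little-endian unsigned 32-bit integer:

```python
import struct

# The four bytes as they appear in the file at offset 0x18.
header_bytes = bytearray([0x80, 0xBB, 0x00, 0x00])

# By hand: byte number i is worth 256**i, the first byte being the least significant.
manual = sum(b << (8 * i) for i, b in enumerate(header_bytes))

# With struct: "<" = little-endian, "I" = unsigned 32-bit integer.
(rate,) = struct.unpack("<I", bytes(header_bytes))

print(manual, rate)
```

Both methods give 48000.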
Patch the frequency value in wav file
It is easy to compute a new, faster frequency.
Let's say we want to set it to 84000 Hz. In hex this is 14820h (I use a calculator in programmer mode).
To get the right sequence, we put the LSB first: 00 01 48 20 -> 20 48 01 00
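If you have no calculator at hand, the conversion can be sketched in a few lines. The helper `to_little_endian` is a name of my own; it just shifts and masks the value one byte at a time, least significant byte first:

```python
def to_little_endian(value, size=4):
    """Split an unsigned integer into `size` bytes, least significant first."""
    return bytearray((value >> (8 * i)) & 0xFF for i in range(size))


new_rate_bytes = to_little_endian(84000)
print(hex(84000))
print(" ".join("%02x" % b for b in new_rate_bytes))
```

For 84000 this yields the sequence 20 48 01 00.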
PyAudio is no longer needed; plain Python is enough:
[code]
# for my test
my_file = "PierreVoice.wav"
# r+ = read/write, b=binary
wf = open(my_file, 'r+b')
# frequency on 4 (reverse) bytes starts at 0x18
wf.seek(0x18)
# optional check: must print 80 bb 00 00
for k in range(0, 4):
    freq = wf.read(1)
    trace = str(k) + " :" + hex(ord(freq))
    print (trace)
# set to 84000 Hz: 20 48 01 00
# (bytearray rather than chr(): it can be written to a binary file in Python 2 and 3)
freqNew = bytearray([0x20, 0x48, 0x01, 0x00])
# write data at the right place
wf.seek(0x18)
wf.write(freqNew)
wf.close()
[/code]
Remember that this destroys the previous value. If you want to restore it, note it beforehand, then write the original frequency back:
freqNew = bytearray([0x80, 0xbb, 0x00, 0x00])
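A slightly more careful variant, as a sketch only: the WAV header also stores a byte rate at offset 0x1C (sample rate times bytes per frame), which the patch above leaves unchanged. Many players seem to ignore it, but it can be kept consistent. The function name `patch_sample_rate` is mine, and the code assumes the usual header layout where the `fmt ` chunk comes right after `WAVE`. It returns the old rate so that it can be restored later, and the demo runs on a throwaway file instead of a real recording:

```python
import os
import struct
import tempfile
import wave


def patch_sample_rate(path, new_rate):
    """Rewrite sample rate (0x18) and byte rate (0x1C); return the old rate.

    Assumption: canonical header, with the 'fmt ' chunk right after 'WAVE'.
    """
    with open(path, "r+b") as f:
        header = f.read(36)
        if header[0:4] != b"RIFF" or header[8:16] != b"WAVEfmt ":
            raise ValueError("unexpected WAV header layout")
        # sample rate, byte rate, block align (bytes per frame)
        old_rate, _, block_align = struct.unpack("<IIH", header[24:34])
        f.seek(0x18)
        f.write(struct.pack("<II", new_rate, new_rate * block_align))
    return old_rate


# Demo on a throwaway file: 100 silent stereo 16-bit frames at 48000 Hz.
fd, demo_path = tempfile.mkstemp(suffix=".wav")
os.close(fd)
w = wave.open(demo_path, "wb")
w.setnchannels(2)
w.setsampwidth(2)
w.setframerate(48000)
w.writeframes(b"\x00" * 400)
w.close()

original_rate = patch_sample_rate(demo_path, 84000)
r = wave.open(demo_path, "rb")
patched_rate = r.getframerate()
r.close()

patch_sample_rate(demo_path, original_rate)  # put the original value back
r = wave.open(demo_path, "rb")
restored_rate = r.getframerate()
r.close()
os.remove(demo_path)
print(original_rate, patched_rate, restored_rate)
```

Checking the `RIFF` and `WAVEfmt ` markers first avoids overwriting four bytes in a file that does not have the expected layout.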
Let Nao do the parrot
Record and play with Choregraphe
My Nao is not young: it has a V2 body and a V3 head, so I must stay on version 2.1.4.
The simplest project
First we place a Say box to ask for some interaction.
Then a Record Sound box, with the Temporary storage option and the 4-channel WAV format.
Then a Play Sound File box, which we chain to the temporary file name generated by the record box.
Caution: this is not the Play Sound box, but a secondary box inside it. The Play Sound box takes the file name as a parameter and passes it to an inner Play Sound File box. Here we need to push the file name through an input, not set it in a parameter.
To get the right box: expand a Play Sound box, then copy its Play Sound File box to the root.
Check the behavior on robot.
Nao must say ‘Hello, tell me something’.
Then what you say is recorded in a temporary WAV file.
After a few seconds (a parameter of the wait inside the record box), recording stops and Nao plays the recorded sound: you can hear your own voice (and some noise from the fan …)
Let Nao repeat like a duck
Now it is time to apply our WAV patch: add a script box to the flow and connect it:
This box receives the temporary file name on its input, then applies the patch we discussed previously:
[code]
def onInput_onStart(self, p_file):
    # log temporary file name (if needed to get it with Filezilla)
    self.logger.info("file :" + p_file)
    # r+ = read/write, b=binary
    wf = open(p_file, 'r+b')
    # frequency on 4 (reverse) bytes starts at 0x18
    wf.seek(0x18)
    freq = wf.read(4)
    # optional check: here returns 80 bb 00 00
    for k in range(0, 4):
        self.logger.info(hex(ord(freq[k])))
    # set to 84000 Hz: 20 48 01 00
    freqNew = chr(0x20) + chr(0x48) + chr(0x01) + chr(0x00)
    # write at right place
    wf.seek(0x18)
    wf.write(freqNew)
    wf.close()
    self.logger.info("file closed")
    self.onStopped(p_file)
[/code]
Now everything you say is repeated quickly in a funny voice. Enjoy it with children!
A little enhancement
It is better to set the frequency directly in a box parameter. I add a ‘khz’ parameter to the script box and take it into account in the code:
(Here 24 kHz speeds things up if you chose the 16 kHz, one-channel record mode. For 48 kHz, set 84 as before.)
[code]
...
# get the KILO hz parameter, put it in hz and hex
khz = self.getParameter("khz")
shz = hex(khz * 1000)
# avoid being too short, and skip '0x' at the beginning of hex
shz = "00000000" + shz[2:]
# now: 0000000014820: take in reverse order 20 48 01 00 and convert it
freqNew = chr(int(shz[-2:], 16))
freqNew += chr(int(shz[-4:-2], 16))
freqNew += chr(int(shz[-6:-4], 16))
freqNew += chr(int(shz[-8:-6], 16))
# write at right place
...
[/code]
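As a possible shortcut, the standard `struct` module can build the same four bytes without slicing a hex string. This is only a sketch: the helper name `khz_to_header_bytes` is hypothetical, and it assumes the box parameter holds a whole number of kilohertz. `struct` is part of the standard library in both Python 2.7 and Python 3.

```python
import struct


def khz_to_header_bytes(khz):
    """Return the 4 little-endian header bytes for a rate given in kHz."""
    return struct.pack("<I", int(khz) * 1000)


for khz in (24, 84):
    raw = khz_to_header_bytes(khz)
    print(khz, " ".join("%02x" % b for b in bytearray(raw)))
```

Inside the box, the result could be written at offset 0x18 in place of `freqNew`.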
Some strange things (for geeks)
Wrong order of logs
In the previous code you can see the loop over self.logger.info(hex(ord(freq[k]))), which logs 80 bb 00 00. At the very beginning of a new session, it once happened that the log showed bb 00 80 00. I downloaded the file from the robot: it was OK. So I think the order of log events is not always guaranteed. If you depend on the order, as here, it is better to number your logs yourself.
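One way to number the logs, sketched here outside Choregraphe: a small wrapper (the class `NumberedLog` is my own invention, not part of NAOqi) that prefixes each message with a running counter before passing it on. In this demo the messages go to a plain list; in a box you would presumably pass `self.logger.info` as the sink instead.

```python
import itertools


class NumberedLog(object):
    """Prefix every message with a running sequence number."""

    def __init__(self, sink):
        self._sink = sink                    # any callable taking one string
        self._counter = itertools.count(1)

    def info(self, message):
        self._sink("%03d %s" % (next(self._counter), message))


lines = []
log = NumberedLog(lines.append)
for b in bytearray(b"\x80\xbb\x00\x00"):
    log.info(hex(b))
print(lines)
```

Even if the viewer displays the events out of order, the prefix shows the order in which they were emitted.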
Wrong documentation in Nao’s Alaudio.
When you choose a format in the record box, you have a choice between a WAV file and an OGG file.
The documentation says that the WAV file is 4 channels at 48 kHz (which is correct) and that the OGG file is one (front) channel at 16 kHz.
I spent some hours trying to find good OGG documentation to see where the frequency is set, without success. Then I downloaded the OGG file from the robot (with Filezilla, on a ssh/ftp connection in order to see \ ), looked at it in hex and found that:
Despite the OGG file option, the record sound box in Python creates a WAV file named .ogg!
So the behavior we designed works well with the 4-channel 48 kHz file named WAV, and also with the 1-channel 16 kHz file improperly named ogg.
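Rather than opening a hex editor, the real container can be recognised from the first bytes of the file. A minimal sketch (the function name `sniff_audio_container` is mine): a WAV file starts with `RIFF`, followed by a 4-byte size and then `WAVE`, while a genuine Ogg file starts with `OggS`. The sample headers below are fabricated for the demo.

```python
def sniff_audio_container(first_bytes):
    """Guess the container from the first 12 bytes of a file."""
    if first_bytes[0:4] == b"RIFF" and first_bytes[8:12] == b"WAVE":
        return "wav"
    if first_bytes[0:4] == b"OggS":
        return "ogg"
    return "unknown"


# Made-up headers, only to exercise the function.
fake_wav = b"RIFF" + b"\x24\x00\x00\x00" + b"WAVE"
fake_ogg = b"OggS" + b"\x00" * 8
print(sniff_audio_container(fake_wav), sniff_audio_container(fake_ogg))
```

On a real file you would read the first 12 bytes with `open(path, 'rb').read(12)` and pass them in, whatever the extension says.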
Patching a WAV file is the simplest way to create a funny parrot that repeats what you say. It can be used almost anywhere, as shown with Nao, since it leaves the audio input and output as they are.
But the sound is not good, because of the noise of the robot itself.
To improve the recorded sound, we must work on the data itself.
That will be another DSP beginner's chapter.