How to extract the phonemes? #11
Unfortunately, your reference concerning phonemes gives no details beyond a link to CMU Sphinx.
I did some research and ended up with the following code:
import json
import wave

import pocketsphinx as ps
from pocketsphinx import Decoder

ASSUMED_FRAME_RATE = 30  # video frames per second (see note below)


def create_phoneme(audio_wave_file):
    # Run phoneme-level (allphone) recognition over the whole file.
    with wave.open(audio_wave_file, "rb") as audio:
        decoder = Decoder(samprate=audio.getframerate(), allphone=ps.get_model_path("en-us/en-us-phone.lm.bin"))
        decoder.start_utt()
        decoder.process_raw(audio.getfp().read(), full_utt=True)
        decoder.end_utt()

    # Collect each recognized phone with the decoder frame at which it ends.
    input_phoneme_list = []
    if decoder.hyp():
        segments = decoder.seg()
        for seg in segments:
            input_phoneme_list.append({'phone': seg.word, 'phone_end_frame': seg.end_frame})
    else:
        raise Exception('Phoneme recognition failed')

    # Decoder frames are assumed to be 100 per second; convert to video frames.
    total_number_of_frames_in_audio = int(input_phoneme_list[-1]['phone_end_frame'] / 100 * ASSUMED_FRAME_RATE)
    print(total_number_of_frames_in_audio)

    # Emit one phone label per video frame.
    frame_index = 0
    phone_list = []
    phone_index = 0
    while frame_index < total_number_of_frames_in_audio:
        if (frame_index * 100 / ASSUMED_FRAME_RATE) < input_phoneme_list[phone_index]['phone_end_frame']:
            phone_list.append(input_phoneme_list[phone_index]['phone'])
            frame_index += 1
        else:
            phone_index += 1

    # Map phone labels to integer indices; unknown labels become silence.
    with open("phindex.json") as f:
        ph2index = json.load(f)
    phonemes = []
    for p in phone_list:
        if p in ph2index:
            phonemes.append(ph2index[p])
        else:
            print(f"Weird Phoneme found: {p}. Ignoring...")
            phonemes.append(31)  # Silence
    phone_list = phonemes
    print("Phoneme generation done")
    return phone_list
I'm using the phindex.json file from https://github.com/FuxiVirtualHuman/AAAI22-one-shot-talking-face/blob/main/phindex.json and an ASSUMED_FRAME_RATE of 30 (this seems to match the number of phonemes in your samples, rather than the 25 referenced in the papers).
However, my phonemes look very different from the ones in your samples for the same sample wave files. What am I doing wrong?
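To make the frame-rate assumption concrete, here is the resampling step on its own, as a minimal sketch that needs no audio or decoder. It assumes the decoder reports segment boundaries at 100 frames per second, which is the same assumption my code makes. The names `expand_segments` and `DECODER_FRAMES_PER_SECOND` are made up for illustration and do not come from the repository.

```python
from typing import List, Tuple

# Assumption: segment end frames are counted at 100 frames per second.
DECODER_FRAMES_PER_SECOND = 100


def expand_segments(segments: List[Tuple[str, int]], video_fps: int) -> List[str]:
    """Repeat each phone label once per video frame it covers.

    `segments` holds (phone, end_frame) pairs in ascending order, with
    end_frame counted in decoder frames.
    """
    if not segments:
        return []
    last_end_frame = segments[-1][1]
    total_video_frames = int(last_end_frame / DECODER_FRAMES_PER_SECOND * video_fps)
    labels = []
    seg_index = 0
    for frame_index in range(total_video_frames):
        # Position of this video frame, expressed in decoder frames.
        decoder_frame = frame_index * DECODER_FRAMES_PER_SECOND / video_fps
        # Advance to the segment that contains this position.
        while decoder_frame >= segments[seg_index][1]:
            seg_index += 1
        labels.append(segments[seg_index][0])
    return labels


if __name__ == "__main__":
    one_second = [("SIL", 50), ("HH", 100)]
    print(len(expand_segments(one_second, 30)))
    print(len(expand_segments(one_second, 25)))
```

With this, one second of audio yields 30 labels at 30 fps and 25 labels at 25 fps, so dividing the length of a sample phoneme file by the duration of its wave file should show which rate the samples were generated with.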