Author Information

Sarah Reynolds

Is this project an undergraduate, graduate, or faculty project?

Graduate


What campus are you from?

Daytona Beach

Authors' Class Standing

Sarah Reynolds, Graduate Student; Daniel Machado, Senior

Lead Presenter's Name

Sarah Reynolds

Faculty Mentor Name

Omar Ochoa

Abstract

This poster presents a novel method for analyzing synthetic voices, built on a transformer-enabled phonological framework. The study emphasizes phoneme-level analysis to evaluate and improve synthetic voices, an interdisciplinary effort spanning artificial intelligence and linguistics. Using transformer models such as wav2vec for speech-to-text (STT) and DeepPhonemizer for grapheme-to-phoneme conversion, this work analyzes audio samples of synthetic voices generated by OpenAI's text-to-speech (TTS) system. By applying an adapted Wagner-Fischer algorithm to calculate phoneme distances, the study identifies key differences in phoneme accuracy and variability between human and synthetic voices. The results demonstrate that while synthetic voices exhibit lower phoneme variation than human speakers, they offer opportunities for further refinement to enhance naturalness and diversity. This approach not only aids in improving synthetic voice generation but also points toward future work on creating more unique and personalized synthetic voices through machine learning and phonological integration.
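The Wagner-Fischer algorithm mentioned in the abstract is a classic dynamic-programming edit distance. A minimal sketch of how it might be applied to phoneme sequences is shown below; the ARPAbet-style phoneme symbols and the uniform unit-cost model are illustrative assumptions, not the study's exact adaptation.

```python
# Sketch: Wagner-Fischer edit distance over phoneme sequences.
# Costs (all operations = 1) are an assumption for illustration;
# the study's adapted version may weight phoneme substitutions differently.

def phoneme_distance(ref, hyp):
    """Edit distance between two phoneme sequences (lists of symbols)."""
    m, n = len(ref), len(hyp)
    # dp[i][j] = minimum cost of converting ref[:i] into hyp[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = i                              # deletions only
    for j in range(1, n + 1):
        dp[0][j] = j                              # insertions only
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,      # delete ref phoneme
                           dp[i][j - 1] + 1,      # insert hyp phoneme
                           dp[i - 1][j - 1] + sub)  # substitute or match
    return dp[m][n]

# "cat" vs. "bat" in ARPAbet-style phonemes: one substitution
print(phoneme_distance(["K", "AE", "T"], ["B", "AE", "T"]))  # → 1
```

Normalizing this distance by the reference length gives a phoneme error rate, a common way to compare accuracy across utterances of different lengths.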

Did this research project receive funding support from the Office of Undergraduate Research?

Yes, Spark Grant


Phonological Insights for Synthetic Speech: A Transformer-Driven Phoneme Evaluation Approach

