Is this project an undergraduate, graduate, or faculty project?
Graduate
Group
What campus are you from?
Daytona Beach
Authors' Class Standing
Sarah Reynolds, Graduate Student; Daniel Machado, Senior
Lead Presenter's Name
Sarah Reynolds
Faculty Mentor Name
Omar Ochoa
Abstract
This poster presents a novel method for analyzing synthetic voices, developed through a transformer-enabled phonological framework. The study emphasizes phoneme-level analysis to evaluate and improve synthetic voices, an interdisciplinary effort combining artificial intelligence and linguistics. Using transformer models such as wav2vec for speech-to-text (STT) transcription and DeepPhonemizer for grapheme-to-phoneme conversion, this work analyzes audio samples of synthetic voices generated by OpenAI's text-to-speech (TTS) system. By applying an adapted Wagner-Fischer algorithm to calculate phoneme distances, the study identifies key differences in phoneme accuracy and variability between human and synthetic voices. The results demonstrate that while synthetic voices exhibit lower phoneme variation than human speakers, they offer opportunities for further refinement to enhance naturalness and diversity. This approach not only aids in improving synthetic voice generation but also provides insight into future work toward more unique and personalized synthetic voices through machine learning and phonological integration.
Did this research project receive funding support from the Office of Undergraduate Research?
Yes, Spark Grant
Phonological Insights for Synthetic Speech: A Transformer-Driven Phoneme Evaluation Approach
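The adapted Wagner-Fischer phoneme distance mentioned in the abstract can be illustrated with a minimal sketch. This is the classic dynamic-programming edit distance applied to phoneme tokens, not the authors' exact adaptation; the ARPAbet-style phoneme symbols and unit costs below are illustrative assumptions.

```python
# Minimal sketch of a Wagner-Fischer edit distance over phoneme sequences.
# Phoneme symbols and unit edit costs are illustrative assumptions, not
# the study's exact implementation.

def phoneme_distance(ref, hyp):
    """Minimum number of insertions, deletions, and substitutions
    needed to turn the phoneme sequence `ref` into `hyp`."""
    m, n = len(ref), len(hyp)
    # dp[i][j] = distance between ref[:i] and hyp[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i          # delete all remaining reference phonemes
    for j in range(n + 1):
        dp[0][j] = j          # insert all hypothesis phonemes
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # match/substitution
    return dp[m][n]

# Example: a human rendering of "speech" vs. a slightly off synthetic one
human = ["S", "P", "IY", "CH"]
synth = ["S", "P", "IH", "CH"]
print(phoneme_distance(human, synth))  # 1 (one vowel substitution)
```

Normalizing this count by the reference length would give a per-utterance phoneme error rate, one plausible way to compare accuracy and variability across human and synthetic speakers.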