Carnegie Mellon University

End-to-End Speech Recognition Models

For the past few decades, the bane of Automatic Speech Recognition (ASR) systems has been phonemes and Hidden Markov Models (HMMs). HMMs assume conditional independence between observations, and the reliance on explicit phonetic representations requires expensive handcrafted pronunciation dictionaries. Learning often proceeds through detached proxy problems, and in particular there is a disconnect between acoustic model performance and actual speech recognition performance. Connectionist Temporal Classification (CTC) character models were recently proposed to address some of these issues, namely by jointly learning the pronunciation model and acoustic model. However, HMM and CTC models still suffer from conditional independence assumptions and must rely heavily on language models during decoding. In this thesis, we question the traditional paradigm of ASR and highlight the limitations of HMM and CTC models. We propose a novel approach to ASR with neural attention models in which we directly optimize speech transcriptions. Our proposed method is not only an end-to-end trained system but also an end-to-end model. The end-to-end model jointly learns all the traditional components of a speech recognition system: the pronunciation model, acoustic model and language model. Our model can directly emit English/Chinese characters or even word pieces given the audio signal. There is no need for explicit phonetic representations, intermediate heuristic loss functions or conditional independence assumptions. We demonstrate our end-to-end speech recognition model on various ASR tasks. We show competitive results compared to a state-of-the-art HMM based system on the Google voice search task. We demonstrate an online end-to-end Chinese Mandarin model and show how to jointly optimize the Pinyin transcriptions during training. Finally, we also show state-of-the-art results on the Wall Street Journal ASR task compared to other end-to-end models.
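
To make the model the abstract describes concrete, here is a minimal sketch of an attention-based encoder-decoder that emits characters directly from acoustic frames. This is not the thesis code: PyTorch, the layer sizes, the 80-dimensional filterbank input, and the 30-symbol character vocabulary are all illustrative assumptions.

```python
# A minimal sketch (not the thesis code) of an attention-based
# encoder-decoder that maps filterbank frames directly to characters.
import torch
import torch.nn as nn

class AttentionASR(nn.Module):
    def __init__(self, n_mels=80, hidden=256, vocab_size=30):
        super().__init__()
        self.hidden = hidden
        # Acoustic encoder ("listener"): BiLSTM over the input frames.
        self.encoder = nn.LSTM(n_mels, hidden, num_layers=2,
                               bidirectional=True, batch_first=True)
        # Character decoder ("speller"), conditioned on prior characters.
        self.embed = nn.Embedding(vocab_size, hidden)
        self.decoder = nn.LSTMCell(hidden + 2 * hidden, hidden)
        self.attn_query = nn.Linear(hidden, 2 * hidden)
        self.out = nn.Linear(hidden + 2 * hidden, vocab_size)

    def forward(self, feats, targets):
        # feats: (B, T, n_mels); targets: (B, U) character ids,
        # assumed to begin with a start-of-sequence symbol.
        enc, _ = self.encoder(feats)                        # (B, T, 2H)
        B, U = targets.shape
        h = feats.new_zeros(B, self.hidden)
        c = feats.new_zeros(B, self.hidden)
        logits = []
        for u in range(U - 1):                              # teacher forcing
            # Content-based attention: score every encoder state against
            # the decoder state, then form a weighted-sum context vector.
            q = self.attn_query(h).unsqueeze(1)             # (B, 1, 2H)
            align = torch.softmax((q * enc).sum(-1), dim=-1)  # (B, T)
            context = (align.unsqueeze(-1) * enc).sum(1)      # (B, 2H)
            emb = self.embed(targets[:, u])                 # previous char
            h, c = self.decoder(torch.cat([emb, context], -1), (h, c))
            logits.append(self.out(torch.cat([h, context], -1)))
        # Cross-entropy on these logits against targets[:, 1:] trains the
        # pronunciation, acoustic, and language models jointly -- no HMM,
        # no phoneme lexicon, no conditional independence assumption.
        return torch.stack(logits, dim=1)                   # (B, U-1, vocab)

model = AttentionASR()
feats = torch.randn(2, 120, 80)            # 2 utterances, 120 frames each
targets = torch.randint(0, 30, (2, 16))    # 16 character ids each
print(model(feats, targets).shape)         # torch.Size([2, 15, 30])
```

The softmax over encoder states is what removes the conditional independence assumption: each character is predicted from the full acoustic context and from all previously emitted characters.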

Degree Type

  • Dissertation
  • Electrical and Computer Engineering

Degree Name

  • Doctor of Philosophy (PhD)

Usage metrics

  • Computer Engineering
  • Electrical and Electronic Engineering not elsewhere classified

2011 Doctoral Thesis

Automatic Dialect and Accent Recognition and its Application to Speech Recognition

Biadsy, Fadi

A fundamental challenge for current research on speech science and technology is understanding and modeling individual variation in spoken language. Individuals have their own speaking styles, depending on many factors, such as their dialect and accent as well as their socioeconomic background. These individual differences typically introduce modeling difficulties for large-scale speaker-independent systems designed to process input from any variant of a given language. This dissertation focuses on automatically identifying the dialect or accent of a speaker given a sample of their speech, and demonstrates how such a technology can be employed to improve Automatic Speech Recognition (ASR). In this thesis, we describe a variety of approaches that make use of multiple streams of information in the acoustic signal to build a system that recognizes the regional dialect and accent of a speaker. In particular, we examine frame-based acoustic, phonetic, and phonotactic features, as well as high-level prosodic features, comparing generative and discriminative modeling techniques. We first analyze the effectiveness of approaches to language identification that have been successfully employed by that community, applying them here to dialect identification. We next show how we can improve upon these techniques. Finally, we introduce several novel modeling approaches -- Discriminative Phonotactics and kernel-based methods. We test our best performing approach on four broad Arabic dialects, ten Arabic sub-dialects, American English vs. Indian English accents, American English Southern vs. Non-Southern, American dialects at the state level plus Canada, and three Portuguese dialects. Our experiments demonstrate that our novel approach, which relies on the hypothesis that certain phones are realized differently across dialects, achieves new state-of-the-art performance on most dialect recognition tasks. This approach achieves an Equal Error Rate (EER) of 4% for four broad Arabic dialects, an EER of 6.3% for American vs. Indian English accents, 14.6% for American English Southern vs. Non-Southern dialects, and 7.9% for three Portuguese dialects. Our framework can also be used to automatically extract linguistic knowledge, specifically the context-dependent phonetic cues that may distinguish one dialect from another. We illustrate the efficacy of our approach by demonstrating the correlation of our results with geographical proximity of the various dialects. As a final measure of the utility of our studies, we also show that it is possible to improve ASR. Employing our dialect identification system prior to ASR to identify the Levantine Arabic dialect in mixed speech of a variety of dialects allows us to optimize the engine's language model and use Levantine-specific acoustic models where appropriate. This procedure improves the Word Error Rate (WER) for Levantine by 4.6% absolute; 9.3% relative. In addition, we demonstrate in this thesis that, using a linguistically-motivated pronunciation modeling approach, we can improve the WER of a state-of-the-art ASR system by 2.2% absolute and 11.5% relative on Modern Standard Arabic.
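
Since the results above are reported as Equal Error Rates, a brief illustration of that metric may help: the EER is the operating point at which the false-acceptance rate equals the false-rejection rate. The NumPy sketch and synthetic scores below are assumptions for illustration, not the dissertation's evaluation code.

```python
# A hedged sketch of the Equal Error Rate metric: the operating point
# where the false-acceptance rate equals the false-rejection rate.
import numpy as np

def equal_error_rate(scores, labels):
    """scores: higher = more target-like; labels: 1 = target dialect."""
    order = np.argsort(scores)[::-1]        # sweep threshold high -> low
    labels = np.asarray(labels, dtype=float)[order]
    n_target = labels.sum()
    n_nontarget = len(labels) - n_target
    frr = 1.0 - np.cumsum(labels) / n_target          # missed targets
    far = np.cumsum(1.0 - labels) / n_nontarget       # accepted impostors
    i = np.argmin(np.abs(far - frr))        # closest point to FAR == FRR
    return (far[i] + frr[i]) / 2.0

rng = np.random.default_rng(0)
scores = np.concatenate([rng.normal(1.0, 1.0, 500),    # target trials
                         rng.normal(-1.0, 1.0, 500)])  # non-target trials
labels = np.concatenate([np.ones(500), np.zeros(500)])
print(f"EER ~ {equal_error_rate(scores, labels):.1%}") # ~16% for this toy
```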

  • Computer science


ScienceDaily

Machine listening: Making speech recognition systems more inclusive

Study explores how African American English speakers adapt their speech to be understood by voice technology.

Interactions with voice technology, such as Amazon's Alexa, Apple's Siri, and Google Assistant, can make life easier by increasing efficiency and productivity. However, errors in generating and understanding speech during interactions are common. When using these devices, speakers often style-shift their speech from their normal patterns into a louder and slower register, called technology-directed speech.

Research on technology-directed speech typically focuses on mainstream varieties of U.S. English without considering speaker groups that are more consistently misunderstood by technology. In JASA Express Letters, published on behalf of the Acoustical Society of America by AIP Publishing, researchers from Google Research, the University of California, Davis, and Stanford University wanted to address this gap.

One group commonly misunderstood by voice technology is individuals who speak African American English, or AAE. Because automatic speech recognition error rates can be higher for AAE speakers, there can be downstream effects of linguistic discrimination in technology.

"Across all automatic speech recognition systems, four out of every ten words spoken by Black men were being transcribed incorrectly," said co-author Zion Mengesha. "This affects fairness for African American English speakers in every institution using voice technology, including health care and employment."

"We saw an opportunity to better understand this problem by talking to Black users and understanding their emotional, behavioral, and linguistic responses when engaging with voice technology," said co-author Courtney Heldreth.

The team designed an experiment to test how AAE speakers adapt their speech when imagining talking to a voice assistant, compared to talking to a friend, family member, or stranger. The study tested familiar human, unfamiliar human, and voice assistant-directed speech conditions by comparing speech rate and pitch variation. Study participants included 19 adults identifying as Black or African American who had experienced issues with voice technology. Each participant asked a series of questions to a voice assistant. The same questions were repeated as if speaking to a familiar person and, again, to a stranger. Each question was recorded for a total of 153 recordings.

Analysis of the recordings showed that the speakers exhibited two consistent adjustments when talking to voice technology compared to talking to another person: a slower rate of speech and less pitch variation (more monotone speech).
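
As a rough illustration of how these two measures can be computed from a recording, here is a hedged sketch using librosa. It is not the authors' analysis pipeline; "clip.wav", the pYIN pitch range, and the use of acoustic onset rate as a proxy for speaking rate are assumptions.

```python
# Illustrative recipe for the two measures the study compares:
# pitch variation and speaking rate.
import librosa
import numpy as np

y, sr = librosa.load("clip.wav", sr=16000)

# Pitch variation: standard deviation of F0 over voiced frames only
# (pYIN returns NaN for unvoiced frames). A lower SD = more monotone.
f0, voiced_flag, voiced_prob = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"), sr=sr)
pitch_sd = np.nanstd(f0)

# Speaking-rate proxy: acoustic onsets (roughly syllable nuclei)
# per second. A lower value = slower speech.
onsets = librosa.onset.onset_detect(y=y, sr=sr, units="time")
rate = len(onsets) / (len(y) / sr)

print(f"pitch SD: {pitch_sd:.1f} Hz, onset rate: {rate:.2f} per second")
```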

"These findings suggest that people have mental models of how to talk to technology," said co-author Michelle Cohn. "A set 'mode' that they engage to be better understood, in light of disparities in speech recognition systems."

There are other groups misunderstood by voice technology, such as second-language speakers. The researchers hope to expand the language varieties explored in human-computer interaction experiments and address barriers in technology so that it can support everyone who wants to use it.


Story Source:

Materials provided by American Institute of Physics. Note: Content may be edited for style and length.

Journal Reference:

  • Michelle Cohn, Zion Mengesha, Michal Lahav, Courtney Heldreth. African American English speakers' pitch variation and rate adjustments for imagined technological and human addressees. JASA Express Letters, 2024; 4 (4). DOI: 10.1121/10.0025484


Dissertations / Theses on the topic 'Speech emotion recognition'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles.

Consult the top 50 dissertations / theses for your research on the topic 'Speech emotion recognition.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

Sidorova, Julia. "Optimization techniques for speech emotion recognition." Doctoral thesis, Universitat Pompeu Fabra, 2009. http://hdl.handle.net/10803/7575.

Pachoud, Samuel. "Audio-visual speech and emotion recognition." Thesis, Queen Mary, University of London, 2010. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.528923.

Iliev, Alexander Iliev. "Emotion Recognition Using Glottal and Prosodic Features." Scholarly Repository, 2009. http://scholarlyrepository.miami.edu/oa_dissertations/515.

Väyrynen, E. (Eero). "Emotion recognition from speech using prosodic features." Doctoral thesis, Oulun yliopisto, 2014. http://urn.fi/urn:isbn:9789526204048.

Ma, Rui. "Parametric Speech Emotion Recognition Using Neural Network." Thesis, Högskolan i Gävle, Avdelningen för elektronik, matematik och naturvetenskap, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:hig:diva-17694.

Rintala, Jonathan. "Speech Emotion Recognition from Raw Audio using Deep Learning." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-278858.

Mancini, Eleonora. "Disruptive Situations Detection on Public Transports through Speech Emotion Recognition." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/24721/.

Al-Talabani, Abdulbasit. "Automatic Speech Emotion Recognition : feature space dimensionality and classification challenges." Thesis, University of Buckingham, 2015. http://bear.buckingham.ac.uk/101/.

Sun, Rui. "The evaluation of the stability of acoustic features in affective conveyance across multiple emotional databases." Diss., Georgia Institute of Technology, 2013. http://hdl.handle.net/1853/49041.

Noé, Paul-Gauthier. "Emotion Recognition in Football Commentator Speech : Is the action intense or not ?" Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-289370.

Tinnemore, Anna, and Anna Tinnemore. "Improving Understanding of Emotional Speech Acoustic Content." Diss., The University of Arizona, 2017. http://hdl.handle.net/10150/625368.

Bhullar, Naureen. "Effects of Facial and Vocal Emotion on Word Recognition in 11-to-13-month-old infants." Diss., Virginia Tech, 2007. http://hdl.handle.net/10919/27502.

Nguyen, Tien Dung. "Multimodal emotion recognition using deep learning techniques." Thesis, Queensland University of Technology, 2020. https://eprints.qut.edu.au/180753/1/Tien%20Dung_Nguyen_Thesis.pdf.

Siddiqui, Mohammad Faridul Haque. "A Multi-modal Emotion Recognition Framework Through The Fusion Of Speech With Visible And Infrared Images." University of Toledo / OhioLINK, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=toledo1556459232937498.

Acosta, Jaime Cesar. "Using emotion to gain rapport in a spoken dialog system." To access this resource online via ProQuest Dissertations and Theses @ UTEP, 2009. http://0-proquest.umi.com.lib.utep.edu/login?COPT=REJTPTU0YmImSU5UPTAmVkVSPTI=&clientId=2515.

Pon-Barry, Heather Roberta. "Inferring Speaker Affect in Spoken Natural Language Communication." Thesis, Harvard University, 2012. http://dissertations.umi.com/gsas.harvard:10710.

Deng, Jun. "Feature Transfer Learning for Speech Emotion Recognition." Doctoral thesis, Technische Universität München, 2016. Supervisor: Björn W. Schuller; reviewers: Björn W. Schuller and Werner Hemmert. http://d-nb.info/1106382331/34.

Iriya, Rafael. "Análise de sinais de voz para reconhecimento de emoções" [Analysis of voice signals for emotion recognition]. Universidade de São Paulo, 2014. http://www.teses.usp.br/teses/disponiveis/3/3142/tde-14042015-160249/.

Chandrapati, Srivardhan. "Multi-modal expression recognition." Thesis, Manhattan, Kan. : Kansas State University, 2008. http://hdl.handle.net/2097/762.

Khalifa, Intissar. "Deep psychology recognition based on automatic analysis of non-verbal behaviors." Doctoral thesis, Università degli Studi di Milano-Bicocca, 2021. http://hdl.handle.net/10281/314920.

Guerrero, Razuri Javier Francisco. "Decisional-Emotional Support System for a Synthetic Agent : Influence of Emotions in Decision-Making Toward the Participation of Automata in Society." Doctoral thesis, Stockholms universitet, Institutionen för data- och systemvetenskap, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-122084.

Žukas, Gediminas. "Kalbos emocijų požymių tyrimas" [A study of speech emotion features]. Master's thesis, Lithuanian Academic Libraries Network (LABT), 2014. http://vddb.library.lt/obj/LT-eLABa-0001:E.02~2014~D_20140617_133242-89394.

Vlasenko, Andrej. "Studentų emocinės būklės testavimo metu tyrimas panauduojant biometrines technologijas" [A study of students' emotional state during examinations using biometric technologies]. Doctoral thesis, Lithuanian Academic Libraries Network (LABT), 2012. http://vddb.laba.lt/obj/LT-eLABa-0001:E.02~2012~D_20120329_153219-37955.

Zhu, Winstead Xingran. "Hotspot Detection for Automatic Podcast Trailer Generation." Thesis, Uppsala universitet, Institutionen för lingvistik och filologi, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-444887.

Navrátil, Michal. "Rozpoznávání emočních stavů pomocí analýzy řečového signálu" [Recognition of emotional states through speech signal analysis]. Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2008. http://www.nusl.cz/ntk/nusl-217263.

Pfeifer, Leon. "Automatické rozpoznávání emočních stavů člověka na základě analýzy řečového projevu" [Automatic recognition of human emotional states based on speech analysis]. Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2008. http://www.nusl.cz/ntk/nusl-217520.

Shaukat, Arslan. "Automatic Emotional State Analysis and Recognition from Speech Signals." Thesis, University of Manchester, 2009. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.511910.

Atassi, Hicham. "Rozpoznání emočního stavu z hrané a spontánní řeči" [Recognition of emotional state from acted and spontaneous speech]. Doctoral thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2014. http://www.nusl.cz/ntk/nusl-233665.

Hansson, Svan Angus, and Carl Mannerstråle. "Prediktion av användaromdömen om språkcafé-samtal baserat på automatisk röstanalys." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-261639.

Planet, García Santiago. "Reconeixement afectiu automàtic mitjançant l'anàlisi de paràmetres acústics i lingüístics de la parla espontània" [Automatic affect recognition through analysis of acoustic and linguistic parameters of spontaneous speech]. Doctoral thesis, Universitat Ramon Llull, 2013. http://hdl.handle.net/10803/125335.

Ferro, Adelino Rafael Mendes. "Speech emotion recognition through statistical classification." Master's thesis, 2017. http://hdl.handle.net/10400.14/22817.

"Optimization techniques for speech emotion recognition." Universitat Pompeu Fabra, 2009. http://www.tesisenxarxa.net/TDX-0113110-133822/.

Yeh, Jun-Heng, and 葉俊亨. "Emotion Recognition from Mandarin Speech Signals." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/2f4evr.

Chiou, Bo-Chang, and 邱柏菖. "Cross-Lingual Automatic Speech Emotion Recognition." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/23736438894309347506.

SHEN, MENG-JHEN, and 沈孟蓁. "Research on Speech Emotion Recognition Systems." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/j5m53v.

CHENG, KUAN-JUNG, and 程冠融. "Cross-Lingual Speech Emotion Recognition Based on Speech Recognition Technology in An Emotional Speech Database in Mandarin, Taiwanese, and Hakka." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/6c4m2x.

Wang, Chun-Ming, and 王俊明. "Speech Emotion Recognition using 2D texture features." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/6y77cs.

Su, Yu-Che, and 蘇于哲. "Emotion Recognition based on Chinese Speech Signals." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/54453655022687274699.

Li, Pei-jia, and 李珮嘉. "Emotion Recognition from Continuous Mandarin Speech Signal." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/79746323884839442339.

Bakhshi, Ali. "Speech emotion recognition using deep neural networks." Thesis, 2021. http://hdl.handle.net/1959.13/1430839.

Yeh, Lan-Ying, and 葉藍霙. "Spectro-Temporal Modulations for Robust Speech Emotion Recognition." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/56883607879338166423.

Wu, Chien-Feng, and 吳鑑峰. "Bimodal Emotion Recognition from Speech and Facial Expression." Thesis, 2002. http://ndltd.ncl.edu.tw/handle/g8tuye.

"Emotion Recognition and Traumatic Brain Injury." Master's thesis, 2011. http://hdl.handle.net/2286/R.I.9087.

Huang, Ching-Hsiu, and 黃慶修. "Emotion recognition of spontaneous speech using mutiple-instance learning." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/21389130030838831132.

Hsu, Jin-Huai, and 許晉懷. "Bimodal Emotion Recognition System Using Image and Speech Information." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/75013414815783421468.

Chen, Chia-ying, and 陳嘉穎. "Speech Emotion Recognition Using Factor Analysis and Identity Vectors." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/14016869692695939636.

Yang, Bo-Cheng, and 楊博丞. "Adversarial Feature Augmentation for Cross-corpus Speech Emotion Recognition." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/hrvxvp.

Manamela, Phuti John. "The automatic recognition of emotions in speech." Thesis, 2020. http://hdl.handle.net/10386/3347.

Lin, Ching-yi, and 林靜宜. "A Study on Identifying the Most Effective Speech Features for Speech Emotion Recognition." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/77453967950782385836.

Vogt, Thurid. "Real-time automatic emotion recognition from speech." 2010. http://d-nb.info/1010495038/34.

Spring 2024 Recognition Ceremony Program

Congratulations to the electrical and computer engineering Class of 2024!

And a special thank you to all of the friends and family who have supported our graduates during their time at CU Boulder. 

Table of Contents

  • Today's Ceremony
  • Awards and Honors
  • PhD Candidates
  • Master's Candidates: Master of Science with Thesis, Master of Science, Master of Engineering
  • Bachelor's Graduates: Electrical and Computer Engineering, Electrical Engineering

Today's Ceremony

Processional

Welcoming Remarks Professor Chris Myers, Department Chair and Palmer Leadership Chair in Electrical, Computer & Energy Engineering

Acknowledgement of Award Winners

Holland Teaching Award Faculty Keynote: Assistant Professor Joshua Combes

Undergraduate Student Address Jasleen Batra

Master's Student Address Gabriel Altman

PhD Student Address Neeraj Prakash

Presentation of the PhD Candidates  Professor Sean Shaheen, Associate Chair for Research and Graduate Education

Presentation of Master’s Candidates Professor Sean Shaheen, Associate Chair for Research and Graduate Education

Presentation of BS Candidates Associate Teaching Professor Mona ElHelbawy, Associate Chair for Undergraduate Education

Closing Remarks Professor Chris Myers

Reception Please join us to continue celebrating our graduates. Light refreshments will be provided. 

Awards and Honors 

  • Best Thesis: Jieqiu Shao
  • Excellence in Graduate Research: Lukas Buecherel
  • Excellence in Graduate Teaching: Aravind Venkitasubramony
  • Undergraduate Academic Engagement: Khalid Shahba and Zane McMorris
  • Undergraduate Community Impact Award: Jasleen Batra
  • Undergraduate Perseverance Award: Bruno Armas and Insar Magadeev
  • Outstanding ECEE Undergraduate: Olivia Egbert and Suhana Zeutzius

Doctor of Philosophy Candidates

Conrad Corbella Bagot Advised by Professor Won Park  Dissertation to be defended in Summer 2024

Paige Danielson  Advised by Professor Zoya Popovic Dissertation to be defended in Summer 2024

James William Hurtt Advised by Professor Kyri Baker Dissertation: "On the Techno-Economic Merits and Challenges of Clean Hybrid Energy Systems in Contemporary Power Systems" 

Connor Nogales Advised by Professor Gregor Lasser  Dissertation: "Broadband Supply Modulated PAs for Efficient and Linear Transmit Arrays"

Neeraj Prakash Advised by Professor Shu-Wei Huang  Dissertation: "High-Energy Single-Cavity Fiber Dual-Comb Source"  

Anthony Romano Advised by Professor Zoya Popovic Dissertation: "Monolithic Integration of Millimeter Wave Circuits in Advanced GaN Processes"

Jieqiu Shao Advised by Professor Marco Nicotra Dissertation: "Quantum Optimal Control and its Applications to Shaken Lattice Interferometry and Superconducting Qubits"

Terrence Skibik Advised by Professor Marco Nicotra Dissertation: "Advancements in Model Predictive Control for Real-Time Applications" 

Dong-Chan Son Advised by Professor Dejan Filipovic  Dissertation to be defended in Summer 2024 

Timothy Sonnenberg Advised by Professor Zoya Popovic Dissertation: "GaN MMICs for Millimeter-Wave Front Ends" 

Jack Wampler Advised by Professor Eric Wustrow Dissertation: "Opt Out at Your Own Expense - Designing Systems for Adversarial Contexts" 

Songyi Yen Advised by Professor Dejan Filipovic Dissertation: "Unconventional Arrays for HF and Other Applications" 

Master's Candidates 

Master of Science with Thesis

Gabriel Altman Advised by Professor Dejan Filipovic Thesis to be defended in Summer 2024

Sai Abhishek Aravind Advised by Professor Marco Nicotra Thesis: "Influence of Discretization on Hypersampled Model Predictive Control"

Master of Science

  • Lauren Teresa Baker
  • Suraj Ajjampur 
  • Chris Thomas Alexander 
  • Tasneem Alnajdi 
  • Gabriel Altman 
  • Akshith Aluguri 
  • Nileshkartik Ashokkumar 
  • Timothy Bailey 
  • Donggeun Bak 
  • Rylee Beach 
  • Harsh Beriwal 
  • Khalid Mohamed Abdelgalil Bakhit 
  • Vishwanath Bhavikatti 
  • Devang Boradhara 
  • Alexander Bork 
  • Naman Buch 
  • Isha Burange 
  • Aamir Suhail Burhan 
  • Ruthvik Rangaiah Chanda 
  • Chandinee Chandrasekaran 
  • Rajesh Chittiappa 
  • Hyoun J. Cho 
  • Padmakshi Dahal 
  • Tyler Davidson 
  • Sauranil Debarshi 
  • Aneesh Sadanand Deshpande 
  • Varsha Dewangan 
  • Paras Dhameliya 
  • Kshitija Ramesh Dhondage 
  • Jichao Fang 
  • Harinarayanan Gajapathy 
  • Joshua Galeno 
  • Avirup Gupta 
  • Avirup Kumar Gupta 
  • Angel Manuel Hernandez Ortega 
  • Ranjith Janardhana 
  • Ayswariya Kannan 
  • Sricharan Kidambi 
  • Rakshit Kulkarni 
  • Lalit Kumar 
  • Abhinav Kumar 
  • Ankit Kumar 
  • Anuhya Kuraparthy 
  • Sylvia Llosa 
  • Spandana Mahendra 
  • Erick Mancera 
  • Kanin James McGuire 
  • Colin Bruce McRae 
  • Daniel Mendez 
  • Nicole Danisha Milligan 
  • Rylan Moore 
  • Amey Chandrakant More 
  • Sayali Sanjay Mule 
  • Aditi Vijay Nanaware 
  • Vidhya Palaniappan 
  • Vaishnavi Sudhakar Patekar 
  • Divyesh Shashikant Patel 
  • Viraj Gopal Patel 
  • Akash Patil 
  • Mihir Jivan Patil 
  • Aakash Pednekar 
  • May An Ying van de Poll 
  • Karthik Baggaon Rajendra 
  • Chirayu Rajpurohit 
  • Ritika Ramchandani 
  • Thomas Ramirez 
  • Lexie Roberts 
  • Jessica Roosz 
  • Satish Kumar Sankella 
  • Cija Sathishkumar 
  • Arun Kumar Sesha 
  • Saquib Yasir Shaikh 
  • Chinmay Venkatesh Shalawadi 
  • Daanish Mohammed Shariff 
  • Isha Sharma 
  • Gregory James Southards 
  • Malola Simman Srinivasan Kannan 
  • Mangala Sneha Srinivasan 
  • Rajesh Srirangam 
  • Swapnil Alkesh Trivedi 
  • Vignesh Vadivel 
  • Robert Enright Van Trees 
  • Swathi Venkatachalam 
  • Mrunal Ankush Yadav
  • Omkar Abhay Yeole

Master of Engineering 

  • Francis Xavier Bergh 
  • Ashwin Ravindra 
  • Abhishek Limaye 
  • Viveka Salinamakki 

Bachelor of Science Graduates

Bachelor of Science, Electrical and Computer Engineering

  • Ahmed Adam 
  • Yusef Jamal Al-Balushi 
  • Saud Almuzaiel 
  • Bruno Armas 
  • Abhinav Avula 
  • Jasleen Batra 
  • William Boenning – Cum Laude 
  • John Cates 
  • Chandana Challa 
  • Richard Chuang 
  • Nicholas Alexander Cisne 
  • Kailer Hawk Driscoll 
  • Sullivan Fleming 
  • Aidan Francis Hanlon Fitton 
  • Timothy Houck 
  • Daniel Juhwan Lee 
  • Peter William Magro 
  • Louis Marfone 
  • Frank McDermott 
  • Weston Carroll McEvoy – Magna Cum Laude 
  • Zane McMorris 
  • Caden McVey 
  • Dominic Fawzi Menassa 
  • Sarah Mesgina 
  • Daniel Orthel 
  • Madelyn Polly – Summa Cum Laude 
  • Guillermo Alexander Rivas Calles 
  • Samuel Robertson 
  • Ginn Sato – Summa Cum Laude 
  • Connor Smith 
  • Aidan St. Cyr 
  • Taylor Stevenson 
  • Taylore Todd 
  • Anton Manuel Vandenberge 
  • Alexander Joseph Walker – Summa Cum Laude 
  • William White 
  • Suhana Zeutzius – Summa Cum Laude 

Bachelor of Science, Electrical Engineering

  • Ali Karam Ali 
  • Nasser Taleb Allanqawi 
  • Meshal Alosaimi 
  • Michelle Amankwah 
  • Erika Antúnez  
  • Andrew Aramians 
  • Joshua Thomas Bay – Cum Laude 
  • Katherine Christiansen 
  • Michael Takuya Driscoll – Magna Cum Laude 
  • Olivia Egbert – Cum Laude  
  • Travis Fahrney 
  • Luke Hanley – Cum Laude  
  • Nicholas Haratsaris  
  • Luke Jeseritz – Summa Cum Laude 
  • Ryan McCallan 
  • Oscar Omar Medina-Salazar 
  • Tucker Mothersell 
  • Matthew Joel Pollard – Cum Laude 
  • Stewart Patrick Rojec – Magna Cum Laude 
  • Khalid Shahba – Summa Cum Laude 
  • Nathan Sharp 
  • Danny Ming Sit 
  • Timothy Henry Tomerlin 
  • Robert B Traxler 


Microsoft bans US police departments from using enterprise AI tool for facial recognition

Microsoft has reaffirmed its ban on U.S. police departments from using generative AI for facial recognition through Azure OpenAI Service, the company's fully managed, enterprise-focused wrapper around OpenAI tech.

Language added Wednesday to the terms of service for Azure OpenAI Service more clearly prohibits integrations with Azure OpenAI Service from being used “by or for” police departments for facial recognition in the U.S., including integrations with OpenAI’s current — and possibly future — image-analyzing models.

A separate new bullet point covers “any law enforcement globally,” and explicitly bars the use of “real-time facial recognition technology” on mobile cameras, like body cameras and dashcams, to attempt to identify a person in “uncontrolled, in-the-wild” environments.

The changes in policy come a week after Axon, a maker of tech and weapons products for military and law enforcement, announced a new product that leverages OpenAI’s GPT-4 generative text model to summarize audio from body cameras. Critics were quick to point out the potential pitfalls, like hallucinations (even the best generative AI models today invent facts) and racial biases introduced from the training data (which is especially concerning given that people of color are far more likely to be stopped by police than their white peers).

It’s unclear whether Axon was using GPT-4 via Azure OpenAI Service, and, if so, whether the updated policy was in response to Axon’s product launch. OpenAI had previously restricted the use of its models for facial recognition through its APIs. We’ve reached out to Axon, Microsoft and OpenAI and will update this post if we hear back.

The new terms leave wiggle room for Microsoft.

The complete ban on Azure OpenAI Service usage pertains only to U.S., not international, police. And it doesn't cover facial recognition performed with stationary cameras in controlled environments, like a back office (although the terms prohibit any use of facial recognition by U.S. police).

That tracks with Microsoft’s and close partner OpenAI’s recent approach to AI-related law enforcement and defense contracts.

In January, reporting by Bloomberg revealed that OpenAI is working with the Pentagon on a number of projects including cybersecurity capabilities — a departure from the startup’s earlier ban on providing its AI to militaries. Elsewhere, Microsoft has pitched using OpenAI’s image generation tool, DALL-E, to help the Department of Defense (DoD) build software to execute military operations, per The Intercept.

Azure OpenAI Service became available in Microsoft's Azure Government product in February, adding additional compliance and management features geared toward government agencies, including law enforcement. In a blog post, Candice Ling, SVP of Microsoft's government-focused division Microsoft Federal, pledged that Azure OpenAI Service would be "submitted for additional authorization" to the DoD for workloads supporting DoD missions.

Update: After publication, Microsoft said its original change to the terms of service contained an error, and in fact the ban applies only to facial recognition in the U.S. It is not a blanket ban on police departments using the service. 

IMAGES

  1. (PDF) Use of Speech Recognition in Computer-assisted Language Learning

  2. (PDF) A systematic review of speech recognition technology in health care

  3. (PDF) Review on Speech Recognition System for Indian Languages

  4. (PDF) PHONETIC EVENT-BASED WHOLE-WORD MODELING …

  5. (PDF) An Overview on Speech Recognition System and Comparative Study of

  6. An Introduction To Speech Recognition

VIDEO

  1. Sound Capture and Speech Enhancement for Communication and Distant Speech Recognition

  2. Real-Time Speech Enhancement

  3. Automatic Speech Recognition: An Overview

  4. Autism Spectrum Disorder Prediction Using a Convolutional Neural Network CNN fMRI data python code

  5. Fall2022-SpeechRecognition&Understanding (Lecture4

  6. ASR / speech-to-text with Whisper at Stanford Libraries. P Leonard

COMMENTS

  1. PDF Deep Neural Networks in Speech Recognition a Dissertation Submitted to

    This thesis comes from a close collaboration with my advisor, Andrew Ng. Andrew has been an amazing mentor in the process of planning and solving research problems, and constantly encouraged me to work on challenging, impactful problems. Much of the work on speech recognition in this thesis comes from close collaboration with Dan Jurafsky.

  2. PDF Semi-supervised Training for Automatic Speech Recognition

    AUTOMATIC SPEECH RECOGNITION by Vimal Manohar A dissertation submitted to The Johns Hopkins University in conformity with the requirements for the degree of ... In the second part of this thesis, we investigate using lattice-based supervision as numerator graph to incorporate uncertainties in unsupervised data in

  3. PDF X-vectors: Robust Neural Embeddings for Speaker Recognition

    Speaker recognition is the task of identifying speakers based on their speech signal. Typically, this involves comparing speech from a known speaker, with recordings from unknown speakers, and making same-or-different speaker decisions. If the lexical contents of the recordings are fixed to some phrase,
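
To illustrate the same-or-different decision described in this snippet, here is a toy sketch that scores a pair of fixed-length speaker embeddings with cosine similarity against a threshold. The random 512-dimensional vectors stand in for real x-vectors, and the threshold is an assumption; in practice both come from trained networks and held-out trials.

```python
# Toy same-or-different speaker decision via cosine scoring of
# fixed-length embeddings (random vectors stand in for x-vectors).
import numpy as np

def cosine_score(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(1)
enrolled = rng.normal(size=512)                 # known speaker's embedding
test = enrolled + 0.3 * rng.normal(size=512)    # same speaker, new session
impostor = rng.normal(size=512)                 # a different speaker

threshold = 0.5   # assumption; tuned on held-out trials in practice
for name, emb in [("same", test), ("different", impostor)]:
    s = cosine_score(enrolled, emb)
    print(f"{name}: score={s:.2f} -> {'accept' if s > threshold else 'reject'}")
```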

  4. PDF Deep learning approaches to problems in speech recognition

    distributed representations of their input. This dissertation demonstrates the efficacy and generality of this approach in a series of diverse case studies in speech recognition, computational chemistry, and natural language processing. Throughout these studies, I extend and modify the neural network models as needed to be more effective for each ...

  5. PDF Towards Robust Conversational Speech Recognition and Understanding

    great deal of expertise and developed a deep passion in speech recognition from knowing nothing about it. He has been and will always be a role model for my career. Thanks to the professors who spend their precious time to serve on my dissertation committee: Prof. Chin-Hui Lee, Prof. Mark Clements, Prof. Elliot Moore II and Prof. Yajun Mei.

  6. PDF Deep Learning for Distant Speech Recognition

    ligibility of a speech signal, making Distant Speech Recognition (DSR) one of the major open challenges in the field. This thesis addresses the latter scenario and proposes some novel techniques, architectures, and algorithms to improve the robustness of distant-talking acoustic models. We first elaborate on methodologies for realistic

  7. PDF Model-based Approaches to Robust Speech Recognition in Diverse Environments

    Many speech recognition applications will benefit from distant-talking speech capture. This avoids problems caused by using hand-held or body-worn equipment. However, due to the large speaker-to-microphone distance, both background noise and reverberant noise will significantly corrupt speech signals and negatively impact speech recognition ...

  8. PDF Deep Learning Approaches for Automatic Sung Speech Recognition

    techniques have revolutionised spoken speech recognition systems through advances in both acoustic modelling and audio source separation. This thesis evaluates whether these new techniques can be adapted to work for sung speech recognition. For this, it first presents an analysis of the differences between spoken and sung speech.

  9. PDF Multi-Modal and Deep Learning for Robust Speech Recognition

    Automatic speech recognition (ASR) decodes speech signals into text. While ASR can produce accurate word recognition in clean environments, system performance can degrade dramatically when noise and reverberation are present. In this thesis, speech denoising and model adaptation for robust speech recognition were studied, and four novel meth-

  10. Automatic Dialect and Accent Recognition and its Application to Speech

    variant of a given language. This dissertation focuses on automatically identifying the dialect or accent of a speaker given a sample of their speech, and demonstrates how such a technology can be employed to improve Automatic Speech Recognition (ASR). In this thesis, we describe a variety of approaches that make use of multiple streams of

  11. End-to-End Speech Recognition Models

    The end-to-end model jointly learns all the traditional components of a speech recognition system: the pronunciation model, acoustic model and language model. Our model can directly emit English/Chinese characters or even word pieces given the audio signal. There is no need for explicit phonetic representations, intermediate heuristic loss ...

  12. [PDF] End-to-End Speech Recognition Models

    This thesis proposes a novel approach to ASR with neural attention models and demonstrates the end-to-end speech recognition model, which can directly emit English/Chinese characters or even word pieces given the audio signal. For the past few decades, the bane of Automatic Speech Recognition (ASR) systems have been phonemes and Hidden Markov Models (HMMs). HMMs assume conditional independence ...

  13. PDF Speech Recognition using Neural Networks

    This thesis examines how artificial neural networks can benefit a large vocabulary, speaker independent, continuous speech recognition system. Currently, most speech recognition systems are based on hidden Markov models (HMMs), a statistical framework that supports both acoustic and temporal modeling. Despite their state-of-the-art ...

  14. Automatic Speech Recognition for Low-Resource and Morphologically

    target language. As such, the motivation of this thesis is to expand upon the current deep learning implementations of low-resource speech recognition [8, 9, 10] through a study against small and diverse ASR training corpora. The final result will be a pipeline capable of accepting an under-resourced language, determining the appro-

  15. Novel NLP Methods for Improved Text-To-Speech Synthesis

    These methods are also useful for automatic speech recognition (ASR) and dialogue systems. In my dissertation, I cover three different tasks: Grapheme-to-phoneme Conversion (G2P), Text ...

  16. Automatic Dialect and Accent Recognition and its Application to Speech

    This dissertation focuses on automatically identifying the dialect or accent of a speaker given a sample of their speech, and demonstrates how such a technology can be employed to improve Automatic Speech Recognition (ASR). In this thesis, we describe a variety of approaches that make use of multiple streams of information in the acoustic ...

  17. Dissertations / Theses: 'Speech recognition'

    This thesis describes a speech recognition system that was built to support spontaneous speech understanding. The system is composed of (1) a front end acoustic analyzer which computes Mel-frequency cepstral coefficients, (2) acoustic models of context-dependent phonemes (triphones), (3) a back-off bigram statistical language model, and (4) a ...
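
The snippet above names Mel-frequency cepstral coefficients as the front-end features. As a rough illustration, here is a minimal MFCC extraction step using librosa; the file name and the common 13-coefficient, 25 ms window / 10 ms hop framing are assumptions rather than details from that thesis.

```python
# Minimal sketch of a classic MFCC front end for ASR.
import librosa

y, sr = librosa.load("utterance.wav", sr=16000)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                            n_fft=400, hop_length=160)  # 25 ms / 10 ms
print(mfcc.shape)  # (13, n_frames): one 13-dim vector per 10 ms frame
```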

  18. PDF Acoustical and Environmental Robustness in Automatic Speech Recognition

    This dissertation describes a number of algorithms developed to increase the robustness of automatic speech recognition systems with respect to changes in the environment. These algorithms attempt to improve the recognition accuracy of speech recognition systems when they are trained and tested in different acoustical

  19. Speech Recognition Using Connectionist Networks Dissertation Proposal

    That speech recognition and understanding is an important problem will be taken for granted. The extent to which computer speech recognition would change (improve) many aspects of work and life is certainly of great magnitude. Acoustic phonetic recognition is a well-defined and substantial subproblem of speech recognition.

  20. (PDF) speech recognition and application

    ABSTRACT. In this thesis, speech recognition systems are developed. These applications are medium-sized, discrete, and individual-dependent systems. In these systems, training and testing ...

  21. EfficientASR: Speech Recognition Network Compression via Attention

    In recent years, Transformer networks have shown remarkable performance in speech recognition tasks. However, their deployment poses challenges due to high computational and storage resource requirements. To address this issue, a lightweight model called EfficientASR is proposed in this paper, aiming to enhance the versatility of Transformer models. EfficientASR employs two primary modules ...

  22. PDF Continuous speech recognition for people with dysarthria

    cus on dysarthric speech recognition research has not moved from isolated word to more challenging connected speech scenarios yet. There is a clear need to improve continuous dysarthric speech recognition. This thesis is the first to systematically investigate various methods for continuous dysarthric speech recognition.

  23. PDF Recognition and support of children with speech, language and

    The thesis considers how early years and primary teachers support children with speech, language and communication needs and how mentors might then support student primary teachers in advancing their understanding of language development. The study discusses how, in the process, teachers might draw on and interact with their own knowledge, the

  24. Machine listening: Making speech recognition systems more inclusive

    Machine listening: Making speech recognition systems more inclusive. ScienceDaily. Retrieved May 1, 2024 from www.sciencedaily.com/releases/2024/04/240430131852.htm

  25. Dissertations / Theses: 'Speech emotion recognition'

    In this thesis, speech signals have mainly been used for emotion recognition, as speech signals are the simplest means of communicating between humans and are a rich source of emotional information. Hence, the first speech emotion recognition architecture was designed based on a hierarchical classifier that used Cepstral coefficients based on ...

  27. Spring 2024 Recognition Ceremony Program

    Spring 2024 Recognition Ceremony Program . Congratulations to the electrical and computer engineering Class of 2024! And a special thank you to all of the friends and family who have supported our graduates during their time at CU Boulder. Table of Contents ... Dissertation: "Opt Out at Your Own Expense - Designing Systems for Adversarial ...

  28. Microsoft bans US police departments from using enterprise AI tool for

    Microsoft has reaffirmed its ban on U.S. police departments from using generative AI for facial recognition through Azure OpenAI Service, the company's fully managed, enterprise-focused wrapper ...