Carnegie Mellon University

End-to-End Speech Recognition Models

For the past few decades, the bane of Automatic Speech Recognition (ASR) systems has been phonemes and Hidden Markov Models (HMMs). HMMs assume conditional independence between observations, and the reliance on explicit phonetic representations requires expensive handcrafted pronunciation dictionaries. Learning often proceeds through detached proxy problems, and in particular there is a disconnect between acoustic model performance and actual speech recognition performance. Connectionist Temporal Classification (CTC) character models were recently proposed to address some of these issues, namely by jointly learning the pronunciation model and acoustic model. However, HMM and CTC models still suffer from conditional independence assumptions and must rely heavily on language models during decoding. In this thesis, we question the traditional paradigm of ASR and highlight the limitations of HMM and CTC models. We propose a novel approach to ASR with neural attention models in which we directly optimize speech transcriptions. Our proposed method is not only an end-to-end trained system but also an end-to-end model. The end-to-end model jointly learns all the traditional components of a speech recognition system: the pronunciation model, acoustic model and language model. Our model can directly emit English/Chinese characters or even word pieces given the audio signal. There is no need for explicit phonetic representations, intermediate heuristic loss functions or conditional independence assumptions. We demonstrate our end-to-end speech recognition model on various ASR tasks. We show competitive results compared to a state-of-the-art HMM based system on the Google voice search task. We demonstrate an online end-to-end Chinese Mandarin model and show how to jointly optimize the Pinyin transcriptions during training. Finally, we also show state-of-the-art results on the Wall Street Journal ASR task compared to other end-to-end models.
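
To make the model the abstract describes concrete, here is a minimal sketch of an attention-based encoder-decoder that emits characters directly from acoustic frames. This is not the thesis code: PyTorch, the layer sizes, the 80-dimensional filterbank input, and the 30-symbol character vocabulary are all illustrative assumptions.

```python
# A minimal sketch (not the thesis code) of an attention-based
# encoder-decoder that maps filterbank frames directly to characters.
import torch
import torch.nn as nn

class AttentionASR(nn.Module):
    def __init__(self, n_mels=80, hidden=256, vocab_size=30):
        super().__init__()
        self.hidden = hidden
        # Acoustic encoder ("listener"): BiLSTM over the input frames.
        self.encoder = nn.LSTM(n_mels, hidden, num_layers=2,
                               bidirectional=True, batch_first=True)
        # Character decoder ("speller"), conditioned on prior characters.
        self.embed = nn.Embedding(vocab_size, hidden)
        self.decoder = nn.LSTMCell(hidden + 2 * hidden, hidden)
        self.attn_query = nn.Linear(hidden, 2 * hidden)
        self.out = nn.Linear(hidden + 2 * hidden, vocab_size)

    def forward(self, feats, targets):
        # feats: (B, T, n_mels); targets: (B, U) character ids,
        # assumed to begin with a start-of-sequence symbol.
        enc, _ = self.encoder(feats)                        # (B, T, 2H)
        B, U = targets.shape
        h = feats.new_zeros(B, self.hidden)
        c = feats.new_zeros(B, self.hidden)
        logits = []
        for u in range(U - 1):                              # teacher forcing
            # Content-based attention: score every encoder state against
            # the decoder state, then form a weighted-sum context vector.
            q = self.attn_query(h).unsqueeze(1)             # (B, 1, 2H)
            align = torch.softmax((q * enc).sum(-1), dim=-1)  # (B, T)
            context = (align.unsqueeze(-1) * enc).sum(1)      # (B, 2H)
            emb = self.embed(targets[:, u])                 # previous char
            h, c = self.decoder(torch.cat([emb, context], -1), (h, c))
            logits.append(self.out(torch.cat([h, context], -1)))
        # Cross-entropy on these logits against targets[:, 1:] trains the
        # pronunciation, acoustic, and language models jointly -- no HMM,
        # no phoneme lexicon, no conditional independence assumption.
        return torch.stack(logits, dim=1)                   # (B, U-1, vocab)

model = AttentionASR()
feats = torch.randn(2, 120, 80)            # 2 utterances, 120 frames each
targets = torch.randint(0, 30, (2, 16))    # 16 character ids each
print(model(feats, targets).shape)         # torch.Size([2, 15, 30])
```

The softmax over encoder states is what removes the conditional independence assumption: each character is predicted from the full acoustic context and from all previously emitted characters.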

Degree Type

  • Dissertation
  • Electrical and Computer Engineering

Degree Name

  • Doctor of Philosophy (PhD)

Usage metrics

  • Computer Engineering
  • Electrical and Electronic Engineering not elsewhere classified

2011 Doctoral Thesis

Automatic Dialect and Accent Recognition and its Application to Speech Recognition

Biadsy, Fadi

A fundamental challenge for current research on speech science and technology is understanding and modeling individual variation in spoken language. Individuals have their own speaking styles, depending on many factors, such as their dialect and accent as well as their socioeconomic background. These individual differences typically introduce modeling difficulties for large-scale speaker-independent systems designed to process input from any variant of a given language. This dissertation focuses on automatically identifying the dialect or accent of a speaker given a sample of their speech, and demonstrates how such a technology can be employed to improve Automatic Speech Recognition (ASR). In this thesis, we describe a variety of approaches that make use of multiple streams of information in the acoustic signal to build a system that recognizes the regional dialect and accent of a speaker. In particular, we examine frame-based acoustic, phonetic, and phonotactic features, as well as high-level prosodic features, comparing generative and discriminative modeling techniques. We first analyze the effectiveness of approaches to language identification that have been successfully employed by that community, applying them here to dialect identification. We next show how we can improve upon these techniques. Finally, we introduce several novel modeling approaches -- Discriminative Phonotactics and kernel-based methods. We test our best performing approach on four broad Arabic dialects, ten Arabic sub-dialects, American English vs. Indian English accents, American English Southern vs. Non-Southern, American dialects at the state level plus Canada, and three Portuguese dialects. Our experiments demonstrate that our novel approach, which relies on the hypothesis that certain phones are realized differently across dialects, achieves new state-of-the-art performance on most dialect recognition tasks. This approach achieves an Equal Error Rate (EER) of 4% for four broad Arabic dialects, an EER of 6.3% for American vs. Indian English accents, 14.6% for American English Southern vs. Non-Southern dialects, and 7.9% for three Portuguese dialects. Our framework can also be used to automatically extract linguistic knowledge, specifically the context-dependent phonetic cues that may distinguish one dialect from another. We illustrate the efficacy of our approach by demonstrating the correlation of our results with geographical proximity of the various dialects. As a final measure of the utility of our studies, we also show that it is possible to improve ASR. Employing our dialect identification system prior to ASR to identify the Levantine Arabic dialect in mixed speech of a variety of dialects allows us to optimize the engine's language model and use Levantine-specific acoustic models where appropriate. This procedure improves the Word Error Rate (WER) for Levantine by 4.6% absolute; 9.3% relative. In addition, we demonstrate in this thesis that, using a linguistically-motivated pronunciation modeling approach, we can improve the WER of a state-of-the-art ASR system by 2.2% absolute and 11.5% relative on Modern Standard Arabic.
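
Since the results above are reported as Equal Error Rates, a brief illustration of that metric may help: the EER is the operating point at which the false-acceptance rate equals the false-rejection rate. The NumPy sketch and synthetic scores below are assumptions for illustration, not the dissertation's evaluation code.

```python
# A hedged sketch of the Equal Error Rate metric: the operating point
# where the false-acceptance rate equals the false-rejection rate.
import numpy as np

def equal_error_rate(scores, labels):
    """scores: higher = more target-like; labels: 1 = target dialect."""
    order = np.argsort(scores)[::-1]        # sweep threshold high -> low
    labels = np.asarray(labels, dtype=float)[order]
    n_target = labels.sum()
    n_nontarget = len(labels) - n_target
    frr = 1.0 - np.cumsum(labels) / n_target          # missed targets
    far = np.cumsum(1.0 - labels) / n_nontarget       # accepted impostors
    i = np.argmin(np.abs(far - frr))        # closest point to FAR == FRR
    return (far[i] + frr[i]) / 2.0

rng = np.random.default_rng(0)
scores = np.concatenate([rng.normal(1.0, 1.0, 500),    # target trials
                         rng.normal(-1.0, 1.0, 500)])  # non-target trials
labels = np.concatenate([np.ones(500), np.zeros(500)])
print(f"EER ~ {equal_error_rate(scores, labels):.1%}") # ~16% for this toy
```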

  • Computer science


ScienceDaily

Machine listening: Making speech recognition systems more inclusive

Study explores how African American English speakers adapt their speech to be understood by voice technology.

Interactions with voice technology, such as Amazon's Alexa, Apple's Siri, and Google Assistant, can make life easier by increasing efficiency and productivity. However, errors in generating and understanding speech during interactions are common. When using these devices, speakers often style-shift their speech from their normal patterns into a louder and slower register, called technology-directed speech.

Research on technology-directed speech typically focuses on mainstream varieties of U.S. English without considering speaker groups that are more consistently misunderstood by technology. In JASA Express Letters, published on behalf of the Acoustical Society of America by AIP Publishing, researchers from Google Research, the University of California, Davis, and Stanford University wanted to address this gap.

One group commonly misunderstood by voice technology is individuals who speak African American English, or AAE. Because automatic speech recognition error rates can be higher for AAE speakers, there can be downstream effects of linguistic discrimination in technology.

"Across all automatic speech recognition systems, four out of every ten words spoken by Black men were being transcribed incorrectly," said co-author Zion Mengesha. "This affects fairness for African American English speakers in every institution using voice technology, including health care and employment."

"We saw an opportunity to better understand this problem by talking to Black users and understanding their emotional, behavioral, and linguistic responses when engaging with voice technology," said co-author Courtney Heldreth.

The team designed an experiment to test how AAE speakers adapt their speech when imagining talking to a voice assistant, compared to talking to a friend, family member, or stranger. The study tested familiar human, unfamiliar human, and voice assistant-directed speech conditions by comparing speech rate and pitch variation. Study participants included 19 adults identifying as Black or African American who had experienced issues with voice technology. Each participant asked a series of questions to a voice assistant. The same questions were repeated as if speaking to a familiar person and, again, to a stranger. Each question was recorded for a total of 153 recordings.

Analysis of the recordings showed that the speakers exhibited two consistent adjustments when talking to voice technology compared to talking to another person: a slower rate of speech and less pitch variation (more monotone speech).
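
As a rough illustration of how these two measures can be computed from a recording, here is a hedged sketch using librosa. It is not the authors' analysis pipeline; "clip.wav", the pYIN pitch range, and the use of acoustic onset rate as a proxy for speaking rate are assumptions.

```python
# Illustrative recipe for the two measures the study compares:
# pitch variation and speaking rate.
import librosa
import numpy as np

y, sr = librosa.load("clip.wav", sr=16000)

# Pitch variation: standard deviation of F0 over voiced frames only
# (pYIN returns NaN for unvoiced frames). A lower SD = more monotone.
f0, voiced_flag, voiced_prob = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"), sr=sr)
pitch_sd = np.nanstd(f0)

# Speaking-rate proxy: acoustic onsets (roughly syllable nuclei)
# per second. A lower value = slower speech.
onsets = librosa.onset.onset_detect(y=y, sr=sr, units="time")
rate = len(onsets) / (len(y) / sr)

print(f"pitch SD: {pitch_sd:.1f} Hz, onset rate: {rate:.2f} per second")
```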

"These findings suggest that people have mental models of how to talk to technology," said co-author Michelle Cohn. "A set 'mode' that they engage to be better understood, in light of disparities in speech recognition systems."

There are other groups misunderstood by voice technology, such as second-language speakers. The researchers hope to expand the language varieties explored in human-computer interaction experiments and address barriers in technology so that it can support everyone who wants to use it.


Story Source:

Materials provided by American Institute of Physics. Note: Content may be edited for style and length.

Journal Reference:

  • Michelle Cohn, Zion Mengesha, Michal Lahav, Courtney Heldreth. African American English speakers' pitch variation and rate adjustments for imagined technological and human addressees. JASA Express Letters, 2024; 4 (4). DOI: 10.1121/10.0025484


Dissertations / Theses on the topic 'Speech emotion recognition'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles.

Consult the top 50 dissertations / theses for your research on the topic 'Speech emotion recognition.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

Sidorova, Julia. "Optimization techniques for speech emotion recognition." Doctoral thesis, Universitat Pompeu Fabra, 2009. http://hdl.handle.net/10803/7575.

Pachoud, Samuel. "Audio-visual speech and emotion recognition." Thesis, Queen Mary, University of London, 2010. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.528923.

Iliev, Alexander Iliev. "Emotion Recognition Using Glottal and Prosodic Features." Scholarly Repository, 2009. http://scholarlyrepository.miami.edu/oa_dissertations/515.

Väyrynen, E. (Eero). "Emotion recognition from speech using prosodic features." Doctoral thesis, Oulun yliopisto, 2014. http://urn.fi/urn:isbn:9789526204048.

Ma, Rui. "Parametric Speech Emotion Recognition Using Neural Network." Thesis, Högskolan i Gävle, Avdelningen för elektronik, matematik och naturvetenskap, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:hig:diva-17694.

Rintala, Jonathan. "Speech Emotion Recognition from Raw Audio using Deep Learning." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-278858.

Mancini, Eleonora. "Disruptive Situations Detection on Public Transports through Speech Emotion Recognition." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/24721/.

Al-Talabani, Abdulbasit. "Automatic Speech Emotion Recognition : feature space dimensionality and classification challenges." Thesis, University of Buckingham, 2015. http://bear.buckingham.ac.uk/101/.

Sun, Rui. "The evaluation of the stability of acoustic features in affective conveyance across multiple emotional databases." Diss., Georgia Institute of Technology, 2013. http://hdl.handle.net/1853/49041.

Noé, Paul-Gauthier. "Emotion Recognition in Football Commentator Speech : Is the action intense or not ?" Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-289370.

Tinnemore, Anna, and Anna Tinnemore. "Improving Understanding of Emotional Speech Acoustic Content." Diss., The University of Arizona, 2017. http://hdl.handle.net/10150/625368.

Bhullar, Naureen. "Effects of Facial and Vocal Emotion on Word Recognition in 11-to-13-month-old infants." Diss., Virginia Tech, 2007. http://hdl.handle.net/10919/27502.

Nguyen, Tien Dung. "Multimodal emotion recognition using deep learning techniques." Thesis, Queensland University of Technology, 2020. https://eprints.qut.edu.au/180753/1/Tien%20Dung_Nguyen_Thesis.pdf.

Siddiqui, Mohammad Faridul Haque. "A Multi-modal Emotion Recognition Framework Through The Fusion Of Speech With Visible And Infrared Images." University of Toledo / OhioLINK, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=toledo1556459232937498.

Acosta, Jaime Cesar. "Using emotion to gain rapport in a spoken dialog system." To access this resource online via ProQuest Dissertations and Theses @ UTEP, 2009. http://0-proquest.umi.com.lib.utep.edu/login?COPT=REJTPTU0YmImSU5UPTAmVkVSPTI=&clientId=2515.

Pon-Barry, Heather Roberta. "Inferring Speaker Affect in Spoken Natural Language Communication." Thesis, Harvard University, 2012. http://dissertations.umi.com/gsas.harvard:10710.

Deng, Jun. "Feature Transfer Learning for Speech Emotion Recognition." Doctoral thesis, Technische Universität München, 2016. Supervisor: Björn W. Schuller; reviewers: Björn W. Schuller and Werner Hemmert. http://d-nb.info/1106382331/34.

Iriya, Rafael. "Análise de sinais de voz para reconhecimento de emoções" [Analysis of voice signals for emotion recognition]. Universidade de São Paulo, 2014. http://www.teses.usp.br/teses/disponiveis/3/3142/tde-14042015-160249/.

Chandrapati, Srivardhan. "Multi-modal expression recognition." Thesis, Manhattan, Kan. : Kansas State University, 2008. http://hdl.handle.net/2097/762.

Khalifa, Intissar. "Deep psychology recognition based on automatic analysis of non-verbal behaviors." Doctoral thesis, Università degli Studi di Milano-Bicocca, 2021. http://hdl.handle.net/10281/314920.

Guerrero, Razuri Javier Francisco. "Decisional-Emotional Support System for a Synthetic Agent : Influence of Emotions in Decision-Making Toward the Participation of Automata in Society." Doctoral thesis, Stockholms universitet, Institutionen för data- och systemvetenskap, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-122084.

Žukas, Gediminas. "Kalbos emocijų požymių tyrimas" [A study of speech emotion features]. Master's thesis, Lithuanian Academic Libraries Network (LABT), 2014. http://vddb.library.lt/obj/LT-eLABa-0001:E.02~2014~D_20140617_133242-89394.

Vlasenko, Andrej. "Studentų emocinės būklės testavimo metu tyrimas panauduojant biometrines technologijas" [A study of students' emotional state during examinations using biometric technologies]. Doctoral thesis, Lithuanian Academic Libraries Network (LABT), 2012. http://vddb.laba.lt/obj/LT-eLABa-0001:E.02~2012~D_20120329_153219-37955.

Zhu, Winstead Xingran. "Hotspot Detection for Automatic Podcast Trailer Generation." Thesis, Uppsala universitet, Institutionen för lingvistik och filologi, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-444887.

Navrátil, Michal. "Rozpoznávání emočních stavů pomocí analýzy řečového signálu" [Recognition of emotional states through speech signal analysis]. Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2008. http://www.nusl.cz/ntk/nusl-217263.

Pfeifer, Leon. "Automatické rozpoznávání emočních stavů člověka na základě analýzy řečového projevu" [Automatic recognition of human emotional states based on speech analysis]. Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2008. http://www.nusl.cz/ntk/nusl-217520.

Shaukat, Arslan. "Automatic Emotional State Analysis and Recognition from Speech Signals." Thesis, University of Manchester, 2009. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.511910.

Atassi, Hicham. "Rozpoznání emočního stavu z hrané a spontánní řeči" [Recognition of emotional state from acted and spontaneous speech]. Doctoral thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2014. http://www.nusl.cz/ntk/nusl-233665.

Hansson, Svan Angus, and Carl Mannerstråle. "Prediktion av användaromdömen om språkcafé-samtal baserat på automatisk röstanalys." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-261639.

Planet, García Santiago. "Reconeixement afectiu automàtic mitjançant l'anàlisi de paràmetres acústics i lingüístics de la parla espontània" [Automatic affect recognition through analysis of acoustic and linguistic parameters of spontaneous speech]. Doctoral thesis, Universitat Ramon Llull, 2013. http://hdl.handle.net/10803/125335.

Ferro, Adelino Rafael Mendes. "Speech emotion recognition through statistical classification." Master's thesis, 2017. http://hdl.handle.net/10400.14/22817.

"Optimization techniques for speech emotion recognition." Universitat Pompeu Fabra, 2009. http://www.tesisenxarxa.net/TDX-0113110-133822/.

Yeh, Jun-Heng, and 葉俊亨. "Emotion Recognition from Mandarin Speech Signals." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/2f4evr.

Chiou, Bo-Chang, and 邱柏菖. "Cross-Lingual Automatic Speech Emotion Recognition." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/23736438894309347506.

SHEN, MENG-JHEN, and 沈孟蓁. "Research on Speech Emotion Recognition Systems." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/j5m53v.

CHENG, KUAN-JUNG, and 程冠融. "Cross-Lingual Speech Emotion Recognition Based on Speech Recognition Technology in An Emotional Speech Database in Mandarin, Taiwanese, and Hakka." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/6c4m2x.

Wang, Chun-Ming, and 王俊明. "Speech Emotion Recognition using 2D texture features." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/6y77cs.

Su, Yu-Che, and 蘇于哲. "Emotion Recognition based on Chinese Speech Signals." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/54453655022687274699.

Li, Pei-jia, and 李珮嘉. "Emotion Recognition from Continuous Mandarin Speech Signal." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/79746323884839442339.

Bakhshi, Ali. "Speech emotion recognition using deep neural networks." Thesis, 2021. http://hdl.handle.net/1959.13/1430839.

Yeh, Lan-Ying, and 葉藍霙. "Spectro-Temporal Modulations for Robust Speech Emotion Recognition." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/56883607879338166423.

Wu, Chien-Feng, and 吳鑑峰. "Bimodal Emotion Recognition from Speech and Facial Expression." Thesis, 2002. http://ndltd.ncl.edu.tw/handle/g8tuye.

"Emotion Recognition and Traumatic Brain Injury." Master's thesis, 2011. http://hdl.handle.net/2286/R.I.9087.

Huang, Ching-Hsiu, and 黃慶修. "Emotion recognition of spontaneous speech using mutiple-instance learning." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/21389130030838831132.

Hsu, Jin-Huai, and 許晉懷. "Bimodal Emotion Recognition System Using Image and Speech Information." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/75013414815783421468.

Chen, Chia-ying, and 陳嘉穎. "Speech Emotion Recognition Using Factor Analysis and Identity Vectors." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/14016869692695939636.

Yang, Bo-Cheng, and 楊博丞. "Adversarial Feature Augmentation for Cross-corpus Speech Emotion Recognition." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/hrvxvp.

Manamela, Phuti John. "The automatic recognition of emotions in speech." Thesis, 2020. http://hdl.handle.net/10386/3347.

Lin, Ching-yi, and 林靜宜. "A Study on Identifying the Most Effective Speech Features for Speech Emotion Recognition." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/77453967950782385836.

Vogt, Thurid. "Real-time automatic emotion recognition from speech." 2010. http://d-nb.info/1010495038/34.

Spring 2024 Recognition Ceremony Program

Congratulations to the electrical and computer engineering Class of 2024!

And a special thank you to all of the friends and family who have supported our graduates during their time at CU Boulder. 

Table of Contents

  • Today's Ceremony
  • Awards and Honors
  • PhD Candidates
  • Master's Candidates: Master of Science with Thesis, Master of Science, Master of Engineering
  • Bachelor's Graduates: Electrical and Computer Engineering, Electrical Engineering

Today's Ceremony

Processional

Welcoming Remarks Professor Chris Myers, Department Chair and Palmer Leadership Chair in Electrical, Computer & Energy Engineering

Acknowledgement of Award Winners

Holland Teaching Award Faculty Keynote: Assistant Professor Joshua Combes

Undergraduate Student Address Jasleen Batra

Master's Student Address Gabriel Altman

PhD Student Address Neeraj Prakash

Presentation of the PhD Candidates  Professor Sean Shaheen, Associate Chair for Research and Graduate Education

Presentation of Master’s Candidates Professor Sean Shaheen, Associate Chair for Research and Graduate Education

Presentation of BS Candidates Associate Teaching Professor Mona ElHelbawy, Associate Chair for Undergraduate Education

Closing Remarks Professor Chris Myers

Reception Please join us to continue celebrating our graduates. Light refreshments will be provided. 

Awards and Honors 

  • Best Thesis: Jieqiu Shao
  • Excellence in Graduate Research: Lukas Buecherel
  • Excellence in Graduate Teaching: Aravind Venkitasubramony
  • Undergraduate Academic Engagement: Khalid Shahba and Zane McMorris
  • Undergraduate Community Impact Award: Jasleen Batra
  • Undergraduate Perseverance Award: Bruno Armas and Insar Magadeev
  • Outstanding ECEE Undergraduate: Olivia Egbert and Suhana Zeutzius

Doctor of Philosophy Candidates

Conrad Corbella Bagot Advised by Professor Won Park  Dissertation to be defended in Summer 2024

Paige Danielson  Advised by Professor Zoya Popovic Dissertation to be defended in Summer 2024

James William Hurtt Advised by Professor Kyri Baker Dissertation: "On the Techno-Economic Merits and Challenges of Clean Hybrid Energy Systems in Contemporary Power Systems" 

Connor Nogales Advised by Professor Gregor Lasser  Dissertation: "Broadband Supply Modulated PAs for Efficient and Linear Transmit Arrays"

Neeraj Prakash Advised by Professor Shu-Wei Huang  Dissertation: "High-Energy Single-Cavity Fiber Dual-Comb Source"  

Anthony Romano Advised by Professor Zoya Popovic Dissertation: "Monolithic Integration of Millimeter Wave Circuits in Advanced GaN Processes"

Jieqiu Shao Advised by Professor Marco Nicotra Dissertation: "Quantum Optimal Control and its Applications to Shaken Lattice Interferometry and Superconducting Qubits"

Terrence Skibik Advised by Professor Marco Nicotra Dissertation: "Advancements in Model Predictive Control for Real-Time Applications" 

Dong-Chan Son Advised by Professor Dejan Filipovic  Dissertation to be defended in Summer 2024 

Timothy Sonnenberg Advised by Professor Zoya Popovic Dissertation: "GaN MMICs for Millimeter-Wave Front Ends" 

Jack Wampler Advised by Professor Eric Wustrow Dissertation: "Opt Out at Your Own Expense - Designing Systems for Adversarial Contexts" 

Songyi Yen Advised by Professor Dejan Filipovic Dissertation: "Unconventional Arrays for HF and Other Applications" 

Master's Candidates 

Master of Science with Thesis

Gabriel Altman Advised by Professor Dejan Filipovic Thesis to be defended in Summer 2024

Sai Abhishek Aravind Advised by Professor Marco Nicotra Thesis: "Influence of Discretization on Hypersampled Model Predictive Control"

Master of Science

  • Lauren Teresa Baker
  • Suraj Ajjampur 
  • Chris Thomas Alexander 
  • Tasneem Alnajdi 
  • Gabriel Altman 
  • Akshith Aluguri 
  • Nileshkartik Ashokkumar 
  • Timothy Bailey 
  • Donggeun Bak 
  • Rylee Beach 
  • Harsh Beriwal 
  • Khalid Mohamed Abdelgalil Bakhit 
  • Vishwanath Bhavikatti 
  • Devang Boradhara 
  • Alexander Bork 
  • Naman Buch 
  • Isha Burange 
  • Aamir Suhail Burhan 
  • Ruthvik Rangaiah Chanda 
  • Chandinee Chandrasekaran 
  • Rajesh Chittiappa 
  • Hyoun J. Cho 
  • Padmakshi Dahal 
  • Tyler Davidson 
  • Sauranil Debarshi 
  • Aneesh Sadanand Deshpande 
  • Varsha Dewangan 
  • Paras Dhameliya 
  • Kshitija Ramesh Dhondage 
  • Jichao Fang 
  • Harinarayanan Gajapathy 
  • Joshua Galeno 
  • Avirup Gupta 
  • Avirup Kumar Gupta 
  • Angel Manuel Hernandez Ortega 
  • Ranjith Janardhana 
  • Ayswariya Kannan 
  • Sricharan Kidambi 
  • Rakshit Kulkarni 
  • Lalit Kumar 
  • Abhinav Kumar 
  • Ankit Kumar 
  • Anuhya Kuraparthy 
  • Sylvia Llosa 
  • Spandana Mahendra 
  • Erick Mancera 
  • Kanin James McGuire 
  • Colin Bruce McRae 
  • Daniel Mendez 
  • Nicole Danisha Milligan 
  • Rylan Moore 
  • Amey Chandrakant More 
  • Sayali Sanjay Mule 
  • Aditi Vijay Nanaware 
  • Vidhya Palaniappan 
  • Vaishnavi Sudhakar Patekar 
  • Divyesh Shashikant Patel 
  • Viraj Gopal Patel 
  • Akash Patil 
  • Mihir Jivan Patil 
  • Aakash Pednekar 
  • May An Ying van de Poll 
  • Karthik Baggaon Rajendra 
  • Chirayu Rajpurohit 
  • Ritika Ramchandani 
  • Thomas Ramirez 
  • Lexie Roberts 
  • Jessica Roosz 
  • Satish Kumar Sankella 
  • Cija Sathishkumar 
  • Arun Kumar Sesha 
  • Saquib Yasir Shaikh 
  • Chinmay Venkatesh Shalawadi 
  • Daanish Mohammed Shariff 
  • Isha Sharma 
  • Gregory James Southards 
  • Malola Simman Srinivasan Kannan 
  • Mangala Sneha Srinivasan 
  • Rajesh Srirangam 
  • Swapnil Alkesh Trivedi 
  • Vignesh Vadivel 
  • Robert Enright Van Trees 
  • Swathi Venkatachalam 
  • Mrunal Ankush Yadav
  • Omkar Abhay Yeole

Master of Engineering 

  • Francis Xavier Bergh 
  • Ashwin Ravindra 
  • Abhishek Limaye 
  • Viveka Salinamakki 

Bachelor of Science Graduates

Bachelor of Science, Electrical and Computer Engineering

  • Ahmed Adam 
  • Yusef Jamal Al-Balushi 
  • Saud Almuzaiel 
  • Bruno Armas 
  • Abhinav Avula 
  • Jasleen Batra 
  • William Boenning – Cum Laude 
  • John Cates 
  • Chandana Challa 
  • Richard Chuang 
  • Nicholas Alexander Cisne 
  • Kailer Hawk Driscoll 
  • Sullivan Fleming 
  • Aidan Francis Hanlon Fitton 
  • Timothy Houck 
  • Daniel Juhwan Lee 
  • Peter William Magro 
  • Louis Marfone 
  • Frank McDermott 
  • Weston Carroll McEvoy – Magna Cum Laude 
  • Zane McMorris 
  • Caden McVey 
  • Dominic Fawzi Menassa 
  • Sarah Mesgina 
  • Daniel Orthel 
  • Madelyn Polly – Summa Cum Laude 
  • Guillermo Alexander Rivas Calles 
  • Samuel Robertson 
  • Ginn Sato – Summa Cum Laude 
  • Connor Smith 
  • Aidan St. Cyr 
  • Taylor Stevenson 
  • Taylore Todd 
  • Anton Manuel Vandenberge 
  • Alexander Joseph Walker – Summa Cum Laude 
  • William White 
  • Suhana Zeutzius – Summa Cum Laude 

Bachelor of Science, Electrical Engineering

  • Ali Karam Ali 
  • Nasser Taleb Allanqawi 
  • Meshal Alosaimi 
  • Michelle Amankwah 
  • Erika Antúnez  
  • Andrew Aramians 
  • Joshua Thomas Bay – Cum Laude 
  • Katherine Christiansen 
  • Michael Takuya Driscoll – Magna Cum Laude 
  • Olivia Egbert – Cum Laude  
  • Travis Fahrney 
  • Luke Hanley – Cum Laude  
  • Nicholas Haratsaris  
  • Luke Jeseritz – Summa Cum Laude 
  • Ryan McCallan 
  • Oscar Omar Medina-Salazar 
  • Tucker Mothersell 
  • Matthew Joel Pollard – Cum Laude 
  • Stewart Patrick Rojec – Magna Cum Laude 
  • Khalid Shahba – Summa Cum Laude 
  • Nathan Sharp 
  • Danny Ming Sit 
  • Timothy Henry Tomerlin 
  • Robert B Traxler 


Microsoft bans US police departments from using enterprise AI tool for facial recognition

Microsoft has reaffirmed its ban on U.S. police departments from using generative AI for facial recognition through Azure OpenAI Service, the company's fully managed, enterprise-focused wrapper around OpenAI tech.

Language added Wednesday to the terms of service for Azure OpenAI Service more clearly prohibits integrations with Azure OpenAI Service from being used “by or for” police departments for facial recognition in the U.S., including integrations with OpenAI’s current — and possibly future — image-analyzing models.

A separate new bullet point covers “any law enforcement globally,” and explicitly bars the use of “real-time facial recognition technology” on mobile cameras, like body cameras and dashcams, to attempt to identify a person in “uncontrolled, in-the-wild” environments.

The changes in policy come a week after Axon, a maker of tech and weapons products for military and law enforcement, announced a new product that leverages OpenAI’s GPT-4 generative text model to summarize audio from body cameras. Critics were quick to point out the potential pitfalls, like hallucinations (even the best generative AI models today invent facts) and racial biases introduced from the training data (which is especially concerning given that people of color are far more likely to be stopped by police than their white peers).

It’s unclear whether Axon was using GPT-4 via Azure OpenAI Service, and, if so, whether the updated policy was in response to Axon’s product launch. OpenAI had previously restricted the use of its models for facial recognition through its APIs. We’ve reached out to Axon, Microsoft and OpenAI and will update this post if we hear back.

The new terms leave wiggle room for Microsoft.

The complete ban on Azure OpenAI Service usage pertains only to U.S., not international, police. And it doesn't cover facial recognition performed with stationary cameras in controlled environments, like a back office (although the terms prohibit any use of facial recognition by U.S. police).

That tracks with Microsoft’s and close partner OpenAI’s recent approach to AI-related law enforcement and defense contracts.

In January, reporting by Bloomberg revealed that OpenAI is working with the Pentagon on a number of projects including cybersecurity capabilities — a departure from the startup’s earlier ban on providing its AI to militaries. Elsewhere, Microsoft has pitched using OpenAI’s image generation tool, DALL-E, to help the Department of Defense (DoD) build software to execute military operations, per The Intercept.

Azure OpenAI Service became available in Microsoft's Azure Government product in February, adding additional compliance and management features geared toward government agencies, including law enforcement. In a blog post, Candice Ling, SVP of Microsoft's government-focused division Microsoft Federal, pledged that Azure OpenAI Service would be "submitted for additional authorization" to the DoD for workloads supporting DoD missions.

Update: After publication, Microsoft said its original change to the terms of service contained an error, and in fact the ban applies only to facial recognition in the U.S. It is not a blanket ban on police departments using the service. 

IMAGES

  1. (PDF) Use of Speech Recognition in Computer-assisted Language Learning

  2. (PDF) A systematic review of speech recognition technology in health care

  3. (PDF) Review on Speech Recognition System for Indian Languages

  4. (PDF) PHONETIC EVENT-BASED WHOLE-WORD MODELING …

  5. (PDF) An Overview on Speech Recognition System and Comparative Study of

  6. An Introduction To Speech Recognition

VIDEO

  1. Sound Capture and Speech Enhancement for Communication and Distant Speech Recognition

  2. Real-Time Speech Enhancement

  3. Automatic Speech Recognition: An Overview

  4. Autism Spectrum Disorder Prediction Using a Convolutional Neural Network CNN fMRI data python code

  5. Fall2022-SpeechRecognition&Understanding (Lecture4

  6. ASR / speech-to-text with Whisper at Stanford Libraries. P Leonard

COMMENTS

  1. PDF Deep Neural Networks in Speech Recognition a Dissertation Submitted to

    This thesis comes from a close collaboration with my advisor, Andrew Ng. Andrew has been an amazing mentor in the process of planning and solving research problems, and constantly encouraged me to work on challenging, impactful problems. Much of the work on speech recognition in this thesis comes from close collaboration with Dan Jurafsky.

  2. PDF Semi-supervised Training for Automatic Speech Recognition

    AUTOMATIC SPEECH RECOGNITION by Vimal Manohar A dissertation submitted to The Johns Hopkins University in conformity with the requirements for the degree of ... In the second part of this thesis, we investigate using lattice-based supervision as numerator graph to incorporate uncertainties in unsupervised data in

  3. PDF X-vectors: Robust Neural Embeddings for Speaker Recognition

    Speaker recognition is the task of identifying speakers based on their speech signal. Typically, this involves comparing speech from a known speaker, with recordings from unknown speakers, and making same-or-different speaker decisions. If the lexical contents of the recordings are fixed to some phrase,
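
To illustrate the same-or-different decision described in this snippet, here is a toy sketch that scores a pair of fixed-length speaker embeddings with cosine similarity against a threshold. The random 512-dimensional vectors stand in for real x-vectors, and the threshold is an assumption; in practice both come from trained networks and held-out trials.

```python
# Toy same-or-different speaker decision via cosine scoring of
# fixed-length embeddings (random vectors stand in for x-vectors).
import numpy as np

def cosine_score(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(1)
enrolled = rng.normal(size=512)                 # known speaker's embedding
test = enrolled + 0.3 * rng.normal(size=512)    # same speaker, new session
impostor = rng.normal(size=512)                 # a different speaker

threshold = 0.5   # assumption; tuned on held-out trials in practice
for name, emb in [("same", test), ("different", impostor)]:
    s = cosine_score(enrolled, emb)
    print(f"{name}: score={s:.2f} -> {'accept' if s > threshold else 'reject'}")
```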

  4. PDF Deep learning approaches to problems in speech recognition

    distributed representations of their input. This dissertation demonstrates the efficacy and generality of this approach in a series of diverse case studies in speech recognition, computational chemistry, and natural language processing. Throughout these studies, I extend and modify the neural network models as needed to be more effective for each ...

  5. PDF Towards Robust Conversational Speech Recognition and Understanding

    great deal of expertise and developed a deep passion in speech recognition from knowing nothing about it. He has been and will always be a role model for my career. Thanks to the professors who spend their precious time to serve on my dissertation committee: Prof. Chin-Hui Lee, Prof. Mark Clements, Prof. Elliot Moore II and Prof. Yajun Mei.

  6. PDF Deep Learning for Distant Speech Recognition

    ligibility of a speech signal, making Distant Speech Recognition (DSR) one of the major open challenges in the field. This thesis addresses the latter scenario and proposes some novel techniques, architectures, and algorithms to improve the robustness of distant-talking acoustic models. We first elaborate on methodologies for realistic

  7. PDF Model-based Approaches to Robust Speech Recognition in Diverse Environments

    Many speech recognition applications will benefit from distant-talking speech capture. This avoids problems caused by using hand-held or body-worn equipment. However, due to the large speaker-to-microphone distance, both background noise and reverberant noise will significantly corrupt speech signals and negatively impact speech recognition ...

  8. PDF Deep Learning Approaches for Automatic Sung Speech Recognition

    techniques have revolutionised spoken speech recognition systems through advances in both acoustic modelling and audio source separation. This thesis evaluates whether these new techniques can be adapted to work for sung speech recognition. For this, it first presents an analysis of the differences between spoken and sung speech.

  9. PDF Multi-Modal and Deep Learning for Robust Speech Recognition

    Automatic speech recognition (ASR) decodes speech signals into text. While ASR can produce accurate word recognition in clean environments, system performance can degrade dramatically when noise and reverberation are present. In this thesis, speech denoising and model adaptation for robust speech recognition were studied, and four novel meth-

  10. Automatic Dialect and Accent Recognition and its Application to Speech

    variant of a given language. This dissertation focuses on automatically identifying the dialect or accent of a speaker given a sample of their speech, and demonstrates how such a technology can be employed to improve Automatic Speech Recognition (ASR). In this thesis, we describe a variety of approaches that make use of multiple streams of

  11. End-to-End Speech Recognition Models

    The end-to-end model jointly learns all the traditional components of a speech recognition system: the pronunciation model, acoustic model and language model. Our model can directly emit English/Chinese characters or even word pieces given the audio signal. There is no need for explicit phonetic representations, intermediate heuristic loss ...

  12. [PDF] End-to-End Speech Recognition Models

    This thesis proposes a novel approach to ASR with neural attention models and demonstrates the end-to-end speech recognition model, which can directly emit English/Chinese characters or even word pieces given the audio signal. For the past few decades, the bane of Automatic Speech Recognition (ASR) systems have been phonemes and Hidden Markov Models (HMMs). HMMs assume conditional independence ...

  13. PDF Speech Recognition using Neural Networks

    This thesis examines how artificial neural networks can benefit a large vocabulary, speaker independent, continuous speech recognition system. Currently, most speech recognition systems are based on hidden Markov models (HMMs), a statistical framework that supports both acoustic and temporal modeling. Despite their state-of-the-art ...

  14. Automatic Speech Recognition for Low-Resource and Morphologically

    target language. As such, the motivation of this thesis is to expand upon the current deep learning implementations of low-resource speech recognition [8, 9, 10] through a study against small and diverse ASR training corpora. The final result will be a pipeline capable of accepting an under-resourced language, determining the appro-

  15. Novel NLP Methods for Improved Text-To-Speech Synthesis

    These methods are also useful for automatic speech recognition (ASR) and dialogue systems. In my dissertation, I cover three different tasks: Grapheme-to-phoneme Conversion (G2P), Text ...

  16. Automatic Dialect and Accent Recognition and its Application to Speech

    This dissertation focuses on automatically identifying the dialect or accent of a speaker given a sample of their speech, and demonstrates how such a technology can be employed to improve Automatic Speech Recognition (ASR). In this thesis, we describe a variety of approaches that make use of multiple streams of information in the acoustic ...

  17. Dissertations / Theses: 'Speech recognition'

    This thesis describes a speech recognition system that was built to support spontaneous speech understanding. The system is composed of (1) a front end acoustic analyzer which computes Mel-frequency cepstral coefficients, (2) acoustic models of context-dependent phonemes (triphones), (3) a back-off bigram statistical language model, and (4) a ...
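
The snippet above names Mel-frequency cepstral coefficients as the front-end features. As a rough illustration, here is a minimal MFCC extraction step using librosa; the file name and the common 13-coefficient, 25 ms window / 10 ms hop framing are assumptions rather than details from that thesis.

```python
# Minimal sketch of a classic MFCC front end for ASR.
import librosa

y, sr = librosa.load("utterance.wav", sr=16000)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                            n_fft=400, hop_length=160)  # 25 ms / 10 ms
print(mfcc.shape)  # (13, n_frames): one 13-dim vector per 10 ms frame
```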

  18. PDF Acoustical and Environmental Robustness in Automatic Speech Recognition

    This dissertation describes a number of algorithms developed to increase the robustness of automatic speech recognition systems with respect to changes in the environment. These algorithms attempt to improve the recognition accuracy of speech recognition systems when they are trained and tested in different acoustical

  19. Speech Recognition Using Connectionist Networks Dissertation Proposal

    That speech recognition and understanding is an important problem will be taken for granted. The extent to which computer speech recognition would change (improve) many aspects of work and life is certainly of great magnitude. Acoustic phonetic recognition is a well-defined and substantial subproblem of speech recognition.

  20. (PDF) speech recognition and application

    ABSTRACT. In this thesis, speech recognition systems are developed. These applications are medium-sized, discrete, and individual-dependent systems. In these systems, training and testing ...

  21. EfficientASR: Speech Recognition Network Compression via Attention

    In recent years, Transformer networks have shown remarkable performance in speech recognition tasks. However, their deployment poses challenges due to high computational and storage resource requirements. To address this issue, a lightweight model called EfficientASR is proposed in this paper, aiming to enhance the versatility of Transformer models. EfficientASR employs two primary modules ...

  22. PDF Continuous speech recognition for people with dysarthria

    cus on dysarthric speech recognition research has not moved from isolated word to more challenging connected speech scenarios yet. There is a clear need to improve continuous dysarthric speech recognition. This thesis is the first to systematically investigate various methods for continuous dysarthric speech recognition.

  23. PDF Recognition and support of children with speech, language and

    The thesis considers how early years and primary teachers support children with speech, language and communication needs and how mentors might then support student primary teachers in advancing their understanding of language development. The study discusses how, in the process, teachers might draw on and interact with their own knowledge, the

  24. Machine listening: Making speech recognition systems more inclusive

    Machine listening: Making speech recognition systems more inclusive. ScienceDaily. Retrieved May 1, 2024 from www.sciencedaily.com/releases/2024/04/240430131852.htm

  25. Dissertations / Theses: 'Speech emotion recognition'

    In this thesis, speech signals have mainly been used for emotion recognition, as speech signals are the simplest means of communicating between humans and are a rich source of emotional information. Hence, the first speech emotion recognition architecture was designed based on a hierarchical classifier that used Cepstral coefficients based on ...

  27. Spring 2024 Recognition Ceremony Program

    Spring 2024 Recognition Ceremony Program . Congratulations to the electrical and computer engineering Class of 2024! And a special thank you to all of the friends and family who have supported our graduates during their time at CU Boulder. Table of Contents ... Dissertation: "Opt Out at Your Own Expense - Designing Systems for Adversarial ...

  28. Microsoft bans US police departments from using enterprise AI tool for

    Microsoft has reaffirmed its ban on U.S. police departments from using generative AI for facial recognition through Azure OpenAI Service, the company's fully managed, enterprise-focused wrapper ...