IEEE MIPR 2019
San Jose, California, USA
March 28-30, 2019
The panel will present the latest research being conducted in academia and industry in the area of NLP with audio. Specifically, the panel will cover diverse topics in NLP related to conversational speech recognition and its performance with deep learning; audio event detection, including isolating live spoken words from words spoken by devices to prevent false triggers or mitigate threats; emotion and behavior analysis using unimodal and multimodal (camera and microphone) signals; speech synthesis and voice conversion; speech rate modeling and augmentation; and intelligent agents for natural language conversation.
Sunil Bharitkar received his Ph.D. in Electrical Engineering (minor in Mathematics) from the University of Southern California (USC) in 2004 and is presently a Distinguished Member of Technical Staff at HP Labs. His research spans neural networks, signal processing, speech/audio analysis, bioinformatics (signal processing and deep learning for cancer detection), and machine learning. From 2011 to 2016 he was the Director of Audio Technology at Dolby, leading and guiding research in audio, signal processing, haptics, machine learning, and hearing augmentation, as well as standardization activities at ITU, SMPTE, and AES. In 2002 he co-founded Audyssey Labs, where, as VP of Research, he was responsible for inventing new technologies licensed to companies including IMAX, Denon, Audi, and Sharp. He also taught in the Department of Electrical Engineering at USC. Sunil has published over 50 peer-reviewed conference and journal papers, holds over 20 patents in the areas of signal processing, acoustics, neural networks, and pattern recognition, and authored the textbook Immersive Audio Signal Processing (Springer-Verlag). He is a reviewer for various IEEE journals, the Journal of the Acoustical Society of America, EURASIP, and the Journal of the Audio Engineering Society. He has served on the Organizing and Technical Program Committees of conferences such as the 2008 and 2009 European Signal Processing Conference (EUSIPCO), the 57th AES Conference, and SMPTE, and was an invited tutorial speaker at the 2006 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Sunil is a recipient of a Best Paper Award at the 2003 37th IEEE Asilomar Conference on Signals, Systems, and Computers and of the USC Department of Electrical Engineering TA Award for DSP. He is a Senior Member of the IEEE, a member of the ILB of the IEEE Systems, Man, & Cybernetics Society, the Acoustical Society of America (ASA), the European Association for Signal and Image Processing (EURASIP), and the Audio Engineering Society (AES). Sunil is a PADI diver and plays the didgeridoo.
Phil Hilmes is Director of Audio Technology at Amazon Lab126, where he has been since 2012. He helped create the first Alexa Echo device there and manages the team of scientists, software developers, and hardware developers delivering audio technologies for Amazon’s products, including all Echo devices, tablets, Fire TVs, and more, to enable far-field speech recognition, voice communication, playback, and audio event detection. Previously, he was VP of Engineering and co-founder of Audyssey Laboratories, where he delivered room correction and audio enhancement technologies for consumer, professional, and automotive audio products. He has also held positions at DIRECTV delivering digital audio and communications solutions. He earned his B.S. in Engineering from Harvey Mudd College and his graduate degree in Electrical Engineering, focused on signal processing and communications, from the University of Southern California. He is a member of the IEEE and AES and has numerous publications and patents to his credit.
Vivek Kumar is Director, Applied AI at Dolby Laboratories, where he leads a team of researchers and developers focused on creating deep-learning-based technology for speech/NLP and audio. He is also responsible for understanding the implications of recent advances and developing an AI strategy for Dolby. The team he previously led was responsible for the development of Dolby Atmos for Home, the next-generation audio and surround sound technologies for which Dolby is widely recognized. Vivek is a lifelong maker, and his compulsion to “build, break and hack” is a trait that extends beyond his workday. He also provides mentoring and strategy development for startups and has invested in several early-stage ventures.
Panayiotis G. Georgiou received the B.A. and M.Eng. degrees (with Honors) from Cambridge University (Pembroke College), Cambridge, U.K., in 1996, where he was a Cambridge-Commonwealth Scholar, and the M.Sc. and Ph.D. degrees from the University of Southern California (USC), Los Angeles, in 1998 and 2002, respectively. Since 2003, he has been a member of the Signal Analysis and Interpretation Lab at USC, where he is currently an Assistant Professor. His interests span the fields of multimodal and behavioral signal processing and speech-to-speech translation. He has published over 100 papers in the fields of behavioral signal processing, statistical signal processing, alpha-stable distributions, speech and multimodal signal processing and interfaces, speech translation, language modeling, immersive sound processing, sound source localization, and speaker identification. He is a Senior Member of the IEEE. He has been a PI and co-PI on federally funded projects, notably including the DARPA Transtac "SpeechLinks"; the NSF projects "An Integrated Approach to Creating Enriched Speech Translation Systems" and "Quantitative Observational Practice in Family Studies: The case of reactivity"; and the DoD project "Technologies for assessing suicide risk." He is currently on the editorial board of the EURASIP Journal on Audio, Speech, and Music Processing, a guest editor of the Computer Speech and Language journal, the Technical Chair for Interspeech 2016, and a member of the Speech and Language Technical Committee. His current focus is on behavioral signal processing, multimodal environments, and speech-to-speech translation.
Kyu Jeong Han received his Ph.D. in Electrical Engineering from the University of Southern California and is currently a Senior Staff Scientist at JD AI Research, focusing on deep learning technologies for various domains including automatic speech recognition (ASR) and natural language processing (NLP). Over more than a decade of researching and developing speech technologies, from large industry players to early-stage Silicon Valley startups, Dr. Han held research positions at the IBM T. J. Watson Research Center and the Ford Research and Innovation Center, and also led Capio's R&D team as a Principal Scientist, achieving the world's best conversational speech recognition results on the industry-standard Switchboard dataset in 2017-2018. He is actively involved in service to the speech community as a reviewer for a number of IEEE, ISCA, and Elsevier journals and conferences. In addition, he is a member of the Speech and Language Processing Technical Committee of the IEEE Signal Processing Society. Last year he received the ISCA Award for the Best Paper Published in Computer Speech & Language 2013-2017.