Posted by Catherine Armato, Program Manager, Google
This week, the 24th Annual Conference of the International Speech Communication Association (INTERSPEECH 2023) is being held in Dublin, Ireland, representing one of the world’s most extensive conferences on research and technology of spoken language understanding and processing. Experts in speech-related research fields gather to take part in oral presentations and poster sessions and to build collaborations across the globe.
We are excited to be a Platinum Sponsor of INTERSPEECH 2023, where we will be showcasing more than 20 research publications and supporting a number of workshops and special sessions. We welcome in-person attendees to drop by the Google Research booth to meet our researchers and participate in Q&As and demonstrations of some of our latest speech technologies, which help to improve accessibility and provide convenience in communication for billions of users. In addition, online attendees are encouraged to visit our virtual booth in Topia, where they can get up-to-date information on research and opportunities at Google. Visit the @GoogleAI Twitter account to find out about Google booth activities (e.g., demos and Q&A sessions). You can also learn more about the Google research being presented at INTERSPEECH 2023 below (Google affiliations in bold).
Board and Organizing Committee
ISCA Board, Technical Committee Chair: Bhuvana Ramabhadran
Area Chairs include:
Analysis of Speech and Audio Signals: Richard Rose
Speech Synthesis and Spoken Language Generation: Rob Clark
Special Areas: Tara Sainath
Satellite events
VoxCeleb Speaker Recognition Challenge 2023 (VoxSRC-23)
Organizers include: Arsha Nagrani
ISCA Speech Synthesis Workshop (SSW12)
Speakers include: Rob Clark
Keynote talk – ISCA Medalist
Bridging Speech Science and Technology — Now and Into the Future
Speaker: Shrikanth Narayanan
Survey Talk
Speech Compression in the AI Era
Speaker: Jan Skoglund
Special session papers
Cascaded Encoders for Fine-Tuning ASR Models on Overlapped Speech
Richard Rose, Oscar Chang, Olivier Siohan
TokenSplit: Using Discrete Speech Representations for Direct, Refined, and Transcript-Conditioned Speech Separation and Recognition
Hakan Erdogan, Scott Wisdom, Xuankai Chang*, Zalán Borsos, Marco Tagliasacchi, Neil Zeghidour, John R. Hershey
Papers
DeePMOS: Deep Posterior Mean-Opinion-Score of Speech
Xinyu Liang, Fredrik Cumlin, Christian Schüldt, Saikat Chatterjee
O-1: Self-Training with Oracle and 1-Best Hypothesis
Murali Karthick Baskar, Andrew Rosenberg, Bhuvana Ramabhadran, Kartik Audhkhasi
Re-investigating the Efficient Transfer Learning of Speech Foundation Model Using Feature Fusion Methods
Zhouyuan Huo, Khe Chai Sim, Dongseong Hwang, Tsendsuren Munkhdalai, Tara N. Sainath, Pedro Moreno
MOS vs. AB: Evaluating Text-to-Speech Systems Reliably Using Clustered Standard Errors
Joshua Camp, Tom Kenter, Lev Finkelstein, Rob Clark
LanSER: Language-Model Supported Speech Emotion Recognition
Taesik Gong, Josh Belanich, Krishna Somandepalli, Arsha Nagrani, Brian Eoff, Brendan Jou
Modular Domain Adaptation for Conformer-Based Streaming ASR
Qiujia Li, Bo Li, Dongseong Hwang, Tara N. Sainath, Pedro M. Mengibar
On Training a Neural Residual Acoustic Echo Suppressor for Improved ASR
Sankaran Panchapagesan, Turaj Zakizadeh Shabestary, Arun Narayanan
MD3: The Multi-dialect Dataset of Dialogues
Jacob Eisenstein, Vinodkumar Prabhakaran, Clara Rivera, Dorottya Demszky, Devyani Sharma
Dual-Mode NAM: Effective Top-K Context Injection for End-to-End ASR
Zelin Wu, Tsendsuren Munkhdalai, Pat Rondon, Golan Pundak, Khe Chai Sim, Christopher Li
Using Text Injection to Improve Recognition of Personal Identifiers in Speech
Yochai Blau, Rohan Agrawal, Lior Madmony, Gary Wang, Andrew Rosenberg, Zhehuai Chen, Zorik Gekhman, Genady Beryozkin, Parisa Haghani, Bhuvana Ramabhadran
How to Estimate Model Transferability of Pre-trained Speech Models?
Zih-Ching Chen, Chao-Han Huck Yang*, Bo Li, Yu Zhang, Nanxin Chen, Shuo-yiin Chang, Rohit Prabhavalkar, Hung-yi Lee, Tara N. Sainath
Improving Joint Speech-Text Representations Without Alignment
Cal Peyser, Zhong Meng, Ke Hu, Rohit Prabhavalkar, Andrew Rosenberg, Tara N. Sainath, Michael Picheny, Kyunghyun Cho
Text Injection for Capitalization and Turn-Taking Prediction in Speech Models
Shaan Bijwadia, Shuo-yiin Chang, Weiran Wang, Zhong Meng, Hao Zhang, Tara N. Sainath
Streaming Parrotron for On-Device Speech-to-Speech Conversion
Oleg Rybakov, Fadi Biadsy, Xia Zhang, Liyang Jiang, Phoenix Meadowlark, Shivani Agrawal
Semantic Segmentation with Bidirectional Language Models Improves Long-Form ASR
W. Ronny Huang, Hao Zhang, Shankar Kumar, Shuo-yiin Chang, Tara N. Sainath
Universal Automatic Phonetic Transcription into the International Phonetic Alphabet
Chihiro Taguchi, Yusuke Sakai, Parisa Haghani, David Chiang
Mixture-of-Expert Conformer for Streaming Multilingual ASR
Ke Hu, Bo Li, Tara N. Sainath, Yu Zhang, Francoise Beaufays
Real Time Spectrogram Inversion on Mobile Phone
Oleg Rybakov, Marco Tagliasacchi, Yunpeng Li, Liyang Jiang, Xia Zhang, Fadi Biadsy
2-Bit Conformer Quantization for Automatic Speech Recognition
Oleg Rybakov, Phoenix Meadowlark, Shaojin Ding, David Qiu, Jian Li, David Rim, Yanzhang He
LibriTTS-R: A Restored Multi-speaker Text-to-Speech Corpus
Yuma Koizumi, Heiga Zen, Shigeki Karita, Yifan Ding, Kohei Yatabe, Nobuyuki Morioka, Michiel Bacchiani, Yu Zhang, Wei Han, Ankur Bapna
PronScribe: Highly Accurate Multimodal Phonemic Transcription from Speech and Text
Yang Yu, Matthew Perez*, Ankur Bapna, Fadi Haik, Siamak Tazari, Yu Zhang
Label Aware Speech Representation Learning for Language Identification
Shikhar Vashishth, Shikhar Bharadwaj, Sriram Ganapathy, Ankur Bapna, Min Ma, Wei Han, Vera Axelrod, Partha Talukdar
* Work done while at Google