Google at Interspeech 2023

Posted by Catherine Armato, Program Manager, Google

This week, the 24th Annual Conference of the International Speech Communication Association (INTERSPEECH 2023) is being held in Dublin, Ireland. One of the world’s largest conferences on the research and technology of spoken language understanding and processing, it brings together experts in speech-related research fields for oral presentations and poster sessions and for building collaborations across the globe.

We are excited to be a Platinum Sponsor of INTERSPEECH 2023, where we will be showcasing more than 20 research publications and supporting a number of workshops and special sessions. We welcome in-person attendees to drop by the Google Research booth to meet our researchers and take part in Q&As and demonstrations of some of our latest speech technologies, which help improve accessibility and make communication more convenient for billions of users. Online attendees are encouraged to visit our virtual booth in Topia, where they can get up-to-date information on research and opportunities at Google. Visit the @GoogleAI Twitter account to find out about Google booth activities (e.g., demos and Q&A sessions). You can also learn more about the Google research being presented at INTERSPEECH 2023 below (Google affiliations in bold).

Board and Organizing Committee

ISCA Board, Technical Committee Chair: Bhuvana Ramabhadran

Area Chairs include:
Analysis of Speech and Audio Signals: Richard Rose
Speech Synthesis and Spoken Language Generation: Rob Clark
Special Areas: Tara Sainath

Satellite events

VoxCeleb Speaker Recognition Challenge 2023 (VoxSRC-23)
Organizers include: Arsha Nagrani

ISCA Speech Synthesis Workshop (SSW12)
Speakers include: Rob Clark

Keynote talk – ISCA Medalist

Survey Talk

Speech Compression in the AI Era
Speaker: Jan Skoglund

Special session papers

Cascaded Encoders for Fine-Tuning ASR Models on Overlapped Speech
Richard Rose, Oscar Chang, Olivier Siohan

TokenSplit: Using Discrete Speech Representations for Direct, Refined, and Transcript-Conditioned Speech Separation and Recognition
Hakan Erdogan, Scott Wisdom, Xuankai Chang*, Zalán Borsos, Marco Tagliasacchi, Neil Zeghidour, John R. Hershey

Papers

DeePMOS: Deep Posterior Mean-Opinion-Score of Speech
Xinyu Liang, Fredrik Cumlin, Christian Schüldt, Saikat Chatterjee

O-1: Self-Training with Oracle and 1-Best Hypothesis
Murali Karthick Baskar, Andrew Rosenberg, Bhuvana Ramabhadran, Kartik Audhkhasi

Re-investigating the Efficient Transfer Learning of Speech Foundation Model Using Feature Fusion Methods
Zhouyuan Huo, Khe Chai Sim, Dongseong Hwang, Tsendsuren Munkhdalai, Tara N. Sainath, Pedro Moreno

MOS vs. AB: Evaluating Text-to-Speech Systems Reliably Using Clustered Standard Errors
Joshua Camp, Tom Kenter, Lev Finkelstein, Rob Clark

LanSER: Language-Model Supported Speech Emotion Recognition
Taesik Gong, Josh Belanich, Krishna Somandepalli, Arsha Nagrani, Brian Eoff, Brendan Jou

Modular Domain Adaptation for Conformer-Based Streaming ASR
Qiujia Li, Bo Li, Dongseong Hwang, Tara N. Sainath, Pedro M. Mengibar

On Training a Neural Residual Acoustic Echo Suppressor for Improved ASR
Sankaran Panchapagesan, Turaj Zakizadeh Shabestary, Arun Narayanan

MD3: The Multi-dialect Dataset of Dialogues
Jacob Eisenstein, Vinodkumar Prabhakaran, Clara Rivera, Dorottya Demszky, Devyani Sharma

Dual-Mode NAM: Effective Top-K Context Injection for End-to-End ASR
Zelin Wu, Tsendsuren Munkhdalai, Pat Rondon, Golan Pundak, Khe Chai Sim, Christopher Li

Using Text Injection to Improve Recognition of Personal Identifiers in Speech
Yochai Blau, Rohan Agrawal, Lior Madmony, Gary Wang, Andrew Rosenberg, Zhehuai Chen, Zorik Gekhman, Genady Beryozkin, Parisa Haghani, Bhuvana Ramabhadran

How to Estimate Model Transferability of Pre-trained Speech Models?
Zih-Ching Chen, Chao-Han Huck Yang*, Bo Li, Yu Zhang, Nanxin Chen, Shuo-yiin Chang, Rohit Prabhavalkar, Hung-yi Lee, Tara N. Sainath

Improving Joint Speech-Text Representations Without Alignment
Cal Peyser, Zhong Meng, Ke Hu, Rohit Prabhavalkar, Andrew Rosenberg, Tara N. Sainath, Michael Picheny, Kyunghyun Cho

Text Injection for Capitalization and Turn-Taking Prediction in Speech Models
Shaan Bijwadia, Shuo-yiin Chang, Weiran Wang, Zhong Meng, Hao Zhang, Tara N. Sainath

Streaming Parrotron for On-Device Speech-to-Speech Conversion
Oleg Rybakov, Fadi Biadsy, Xia Zhang, Liyang Jiang, Phoenix Meadowlark, Shivani Agrawal

Semantic Segmentation with Bidirectional Language Models Improves Long-Form ASR
W. Ronny Huang, Hao Zhang, Shankar Kumar, Shuo-yiin Chang, Tara N. Sainath

Universal Automatic Phonetic Transcription into the International Phonetic Alphabet
Chihiro Taguchi, Yusuke Sakai, Parisa Haghani, David Chiang

Mixture-of-Expert Conformer for Streaming Multilingual ASR
Ke Hu, Bo Li, Tara N. Sainath, Yu Zhang, Françoise Beaufays

Real Time Spectrogram Inversion on Mobile Phone
Oleg Rybakov, Marco Tagliasacchi, Yunpeng Li, Liyang Jiang, Xia Zhang, Fadi Biadsy

2-Bit Conformer Quantization for Automatic Speech Recognition
Oleg Rybakov, Phoenix Meadowlark, Shaojin Ding, David Qiu, Jian Li, David Rim, Yanzhang He

LibriTTS-R: A Restored Multi-speaker Text-to-Speech Corpus
Yuma Koizumi, Heiga Zen, Shigeki Karita, Yifan Ding, Kohei Yatabe, Nobuyuki Morioka, Michiel Bacchiani, Yu Zhang, Wei Han, Ankur Bapna

PronScribe: Highly Accurate Multimodal Phonemic Transcription from Speech and Text
Yang Yu, Matthew Perez*, Ankur Bapna, Fadi Haik, Siamak Tazari, Yu Zhang

Label Aware Speech Representation Learning for Language Identification
Shikhar Vashishth, Shikhar Bharadwaj, Sriram Ganapathy, Ankur Bapna, Min Ma, Wei Han, Vera Axelrod, Partha Talukdar


* Work done while at Google