Jaesung Bae

jb82 [at] illinois [dot] edu

About

I am a PhD student in the Computer Science (CS) Department at the University of Illinois Urbana-Champaign, advised by Prof. Minje Kim and Prof. Paris Smaragdis.
Previously, I worked as a speech AI researcher at Samsung Research. My main research topics include personalized and zero-shot on-device TTS systems, and I’m proud to have contributed to the TTS systems integrated in the Galaxy S24. Before that, I worked at NCSOFT, a game company, where I primarily studied expressive TTS and prosody-controllable TTS systems. I earned my MS in Electrical Engineering from KAIST, where I was advised by Prof. Daeshik Kim in the BREIL lab, and my BS in Electrical and Electronic Engineering from Yonsei University.
I am interested in speech synthesis and speech representation learning of prosody and speaker identity. Currently, I am expanding my interests to generative models for data augmentation, multi-modal AI, and other areas in speech processing.

Below are my projects, publications, invited talks, and academic services. Please refer to my CV for further details.

News

(Dec 2024) I am organizing the "ICASSP 2025 Generative Data Augmentation Challenge: Zero-Shot Speech Synthesis for Personalized Speech Enhancement." Looking forward to your participation! [link]
(Aug 2024) Starting my PhD at the University of Illinois Urbana-Champaign (UIUC).
(Dec 2023) Two papers have been accepted to ICASSP 2024! (one first author, one second author)

Publications

*: Equal Contribution

2025

Generative Data Augmentation Challenge: Zero-Shot Speech Synthesis for Personalized Speech Enhancement
Jae-Sung Bae, Anastasia Kuznetsova, Dinesh Manocha, John Hershey, Trausti Kristjansson, and Minje Kim
In Proc. of the IEEE Int. Conf. on Acoustics, Speech, and Signal Processing Workshops (ICASSPW): Generative Data Augmentation for Real-World Signal Processing Applications (GenDA 2025), 2025.
[paper] [code] [website]

2024

Latent Filling: Latent Space Data Augmentation for Zero-shot Speech Synthesis
Jae-Sung Bae, Joun Yeop Lee, Ji-Hyun Lee, Seongkyu Mun, Taehwa Kang, Hoon-Young Cho, Chanwoo Kim
In Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2024.
[paper] [demo]

MELS-TTS : Multi-Emotion Multi-Lingual Multi-Speaker Text-to-Speech System via Disentangled Style Tokens
Heejin Choi, Jae-Sung Bae, Joun Yeop Lee, Seongkyu Mun, Jihwan Lee, Hoon-Young Cho, Chanwoo Kim
In Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2024.

2023

Hierarchical Timbre-Cadence Speaker Encoder for Zero-shot Speech Synthesis
Joun Yeop Lee, Jae-Sung Bae, Seongkyu Mun, Jihwan Lee, Ji-Hyun Lee, Hoon-Young Cho, Chanwoo Kim
In Proc. INTERSPEECH, 2023.
[paper] [demo]

Avocodo: Generative Adversarial Network for Artifact-free Vocoder
Taejun Bak, Junmo Lee, Hanbin Bae, Jinhyeok Yang, Jae-Sung Bae, Young-Sun Joo
In Proc. AAAI, 2023.
[paper] [demo] [code]

2022

Hierarchical and Multi-Scale Variational Autoencoder for Diverse and Natural Non-Autoregressive Text-to-Speech
Jae-Sung Bae, Jinhyeok Yang, Tae-Jun Bak, Young-Sun Joo
In Proc. INTERSPEECH, 2022.
[paper] [demo] [video]

Into-TTS : Intonation Template Based Prosody Control System
Jihwan Lee, Joun Yeop Lee, Heejin Choi, Seongkyu Mun, Sangjun Park, Jae-Sung Bae, Chanwoo Kim
arXiv preprint arXiv:2204.01271, 2022.
[paper] [demo]

2021

Hierarchical Context-Aware Transformers for Non-Autoregressive Text to Speech
Jae-Sung Bae, Tae-Jun Bak, Young-Sun Joo, and Hoon-Young Cho
In Proc. INTERSPEECH, 2021.
[paper] [demo]

GANSpeech: Adversarial Training for High-Fidelity Multi-Speaker Speech Synthesis
Jinhyeok Yang*, Jae-Sung Bae*, Taejun Bak, Youngik Kim, and Hoon-Young Cho
In Proc. INTERSPEECH, 2021.
[paper] [demo]

FastPitchFormant: Source-filter based Decomposed Modeling for Speech Synthesis
Taejun Bak, Jae-Sung Bae, Hanbin Bae, Young-Ik Kim, and Hoon-Young Cho
In Proc. INTERSPEECH, 2021.
[paper] [demo]

A Neural Text-to-Speech Model Utilizing Broadcast Data Mixed with Background Music
Hanbin Bae, Jae-Sung Bae, Young-Sun Joo, Young-Ik Kim, and Hoon-Young Cho
In Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2021.
[paper] [demo]

2020

Speaking Speed Control of End-to-End Speech Synthesis using Sentence-Level Conditioning
Jae-Sung Bae, Hanbin Bae, Young-Sun Joo, Junmo Lee, Gyeong-Hoon Lee, Hoon-Young Cho
In Proc. INTERSPEECH, 2020.
[paper] [demo] [video]

2018

End-to-End Speech Command Recognition with Capsule Network
Jae-Sung Bae, Dae-Shik Kim
In Proc. INTERSPEECH, 2018.
[paper]

Projects

Click each project to see demos and further information.

On-device TTS System in various languages for Galaxy S24's Live Translation
Mar 2023 - Jan 2024 (@Samsung Research)

I contributed to the research and development of an on-device TTS system in eight different languages, which is included as a Live Translation feature and introduced as a main AI feature in the Galaxy S24. My contribution involved enhancing the model architecture and achieving a high-quality TTS system that supports various languages with a reduced model size.

On-device Personalized TTS System for Bixby Custom Voice Creation
May 2022 - Jan 2024 (@Samsung Research)

I contributed to the research and development of an on-device personalized TTS system, which was integrated into Samsung Galaxy Bixby's Custom Voice Creation and utilized within Bixby Text-call functionality. This system can create a personalized TTS system by fine-tuning the TTS directly on the user’s device with just 10 utterances.

Fine-grained Prosody Control of TTS System (prototype web service)
Mar 2021 - Apr 2022 (@NCSOFT)

I researched and developed a TTS system capable of controlling the prosody of speech at a fine-grained level. With this system, users could modify speech to have the desired prosody. The system was released as an in-company web service and was widely used to make guide videos for NCSOFT's games.

TTS System for K-pop Fandom Platform, “UNIVERSE” (live service)
Mar 2019 - Apr 2022 (@NCSOFT)

I contributed to the research and development of a multi-speaker TTS system that replicates the voices of approximately 100 K-pop artists within a single TTS system. This TTS system was used in the "UNIVERSE" service, a K-pop fan community platform.

TTS System in Baseball Broadcast Scenario
Mar 2019 - Mar 2021 (@NCSOFT)

I researched and developed an expressive TTS system that can generate speech with dynamic expressions suitable for diverse baseball situations. I published several demos on NCSOFT’s official blog and in news articles. I recommend clicking this project to see the demo videos.

Invited Talks

End-to-End Speech Command Recognition with Capsule Network
NAVER Corp., Seong-Nam, Republic of Korea
Sep 2018

Academic Services

Challenge Organizer for the ICASSP 2025 Generative Data Augmentation for Real-World Signal Processing Applications (GenDA 2025) Workshop: Zero-Shot Speech Synthesis for Personalized Speech Enhancement [link]
Reviewer: AAAI 2025
