Jaesung Bae

jb82 [at] illinois [dot] edu

About


I am a PhD student in the Computer Science (CS) Department at the University of Illinois Urbana-Champaign, advised by Prof. Minje Kim and Prof. Paris Smaragdis.
Previously, I worked as a speech AI researcher at Samsung Research. My main research topics include personalized and zero-shot on-device TTS systems, and I’m proud to have contributed to the TTS systems integrated into the Galaxy S24. Before that, I worked at NCSOFT, a game company, where I primarily studied expressive TTS and prosody-controllable TTS systems. I earned my MS in Electrical Engineering from KAIST, where I was advised by Daeshik Kim in the BREIL lab, and my BS in Electrical and Electronic Engineering from Yonsei University.
I am interested in speech synthesis and speech representation learning of prosody and speaker identity. Currently, I am expanding my interests to generative models for data augmentation, multi-modal AI, and other areas of speech processing.

Below are my projects, publications, invited talks, and academic services. Please refer to my CV for further details.

Publications


*: Equal Contribution
2024

Latent Filling: Latent Space Data Augmentation for Zero-shot Speech Synthesis
Jae-Sung Bae, Joun Yeop Lee, Ji-Hyun Lee, Seongkyu Mun, Taehwa Kang, Hoon-Young Cho, Chanwoo Kim
In Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2024.
[paper] [demo]

MELS-TTS: Multi-Emotion Multi-Lingual Multi-Speaker Text-to-Speech System via Disentangled Style Tokens
Heejin Choi, Jae-Sung Bae, Joun Yeop Lee, Seongkyu Mun, Jihwan Lee, Hoon-Young Cho, Chanwoo Kim
In Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2024.


2023

Hierarchical Timbre-Cadence Speaker Encoder for Zero-shot Speech Synthesis
Joun Yeop Lee, Jae-Sung Bae, Seongkyu Mun, Jihwan Lee, Ji-Hyun Lee, Hoon-Young Cho, Chanwoo Kim
In Proc. INTERSPEECH, 2023.
[paper] [demo]

Avocodo: Generative Adversarial Network for Artifact-free Vocoder
Taejun Bak, Junmo Lee, Hanbin Bae, Jinhyeok Yang, Jae-Sung Bae, Young-Sun Joo
In Proc. AAAI, 2023.
[paper] [demo] [code]


2022

Hierarchical and Multi-Scale Variational Autoencoder for Diverse and Natural Non-Autoregressive Text-to-Speech
Jae-Sung Bae, Jinhyeok Yang, Tae-Jun Bak, Young-Sun Joo
In Proc. INTERSPEECH, 2022.
[paper] [demo] [video]

Into-TTS: Intonation Template Based Prosody Control System
Jihwan Lee, Joun Yeop Lee, Heejin Choi, Seongkyu Mun, Sangjun Park, Jae-Sung Bae, Chanwoo Kim
arXiv preprint arXiv:2204.01271, 2022.
[paper] [demo]


2021

Hierarchical Context-Aware Transformers for Non-Autoregressive Text to Speech
Jae-Sung Bae, Tae-Jun Bak, Young-Sun Joo, Hoon-Young Cho
In Proc. INTERSPEECH, 2021.
[paper] [demo]

GANSpeech: Adversarial Training for High-Fidelity Multi-Speaker Speech Synthesis
Jinhyeok Yang*, Jae-Sung Bae*, Taejun Bak, Youngik Kim, Hoon-Young Cho
In Proc. INTERSPEECH, 2021.
[paper] [demo]

FastPitchFormant: Source-filter based Decomposed Modeling for Speech Synthesis
Taejun Bak, Jae-Sung Bae, Hanbin Bae, Young-Ik Kim, Hoon-Young Cho
In Proc. INTERSPEECH, 2021.
[paper] [demo]

A Neural Text-to-Speech Model Utilizing Broadcast Data Mixed with Background Music
Hanbin Bae, Jae-Sung Bae, Young-Sun Joo, Young-Ik Kim, Hoon-Young Cho
In Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2021.
[paper] [demo]


2020

Speaking Speed Control of End-to-End Speech Synthesis using Sentence-Level Conditioning
Jae-Sung Bae, Hanbin Bae, Young-Sun Joo, Junmo Lee, Gyeong-Hoon Lee, Hoon-Young Cho
In Proc. INTERSPEECH, 2020.
[paper] [demo] [video]


2018

End-to-End Speech Command Recognition with Capsule Network
Jae-Sung Bae, Dae-Shik Kim
In Proc. INTERSPEECH, 2018.
[paper]

Projects


Click each project to see demos and further information.

On-device TTS System in various languages for Galaxy S24's Live Translation
Mar 2023 - Jan 2024 (@Samsung Research)

I contributed to the research and development of an on-device TTS system covering eight languages, which is included in the Galaxy S24 as the Live Translation feature, one of its main AI features. My contributions involved enhancing the model architecture and delivering a high-quality TTS system that supports multiple languages with a reduced model size.


On-device Personalized TTS System for Bixby Custom Voice Creation
May 2022 - Jan 2024 (@Samsung Research)

I contributed to the research and development of an on-device personalized TTS system, which was integrated into Samsung Galaxy Bixby's Custom Voice Creation and used within the Bixby Text Call feature. The system creates a personalized voice by fine-tuning the TTS model directly on the user’s device with just 10 utterances.


Fine-grained Prosody Control of TTS System (prototype web service)
Mar 2021 - Apr 2022 (@NCSOFT)

I conducted research and developed a TTS system capable of controlling the prosody of speech at a fine-grained level, allowing users to modify speech to have the desired prosody. The system was released as an internal web service and was widely used to produce guide videos for NCSOFT's games.


TTS System for K-pop Fandom Platform, “UNIVERSE” (live service)
Mar 2019 - Apr 2022 (@NCSOFT)

I contributed to the research and development of a multi-speaker TTS system replicating the voices of approximately 100 K-pop artists within a single model. This system was used in "UNIVERSE," a K-pop fan community platform.


TTS System in Baseball Broadcast Scenario
Mar 2019 - Mar 2021 (@NCSOFT)

I researched and developed an expressive TTS system that can generate speech with dynamic expressions suitable for diverse baseball situations. Several demos were published on NCSOFT’s official blog and in news articles; I recommend clicking this project to see the demo videos.

Invited Talks


End-to-End Speech Command Recognition with Capsule Network
NAVER Corp., Seongnam, Republic of Korea
Sep 2018

Academic Services


Reviewer: AAAI 2025