I am currently a speech AI researcher at Samsung Research.
My recent research focuses on personalized and zero-shot on-device TTS systems.
Previously, I worked at NCSOFT, a game company, where I mainly
studied expressive TTS and prosody-controllable TTS systems.
I received my BS in Electrical and Electronic Engineering from Yonsei University
and my MS in Electrical Engineering from KAIST,
where I was advised by Daeshik Kim in the BREIL lab.
I am interested in speech synthesis and speech representation learning of prosody and speaker identity.
Currently, I am broadening my interests to combine speech synthesis with techniques from other fields,
such as spontaneous speech-to-speech, multimodal generation, and video dubbing.
On-device TTS System in various languages for Galaxy S24's Live Translation Mar 2023 - Jan 2024 (@Samsung Research)
I contributed to the research and development of an on-device TTS system supporting eight languages,
which powers the Live Translation feature introduced as a main AI feature of the Galaxy S24.
My contributions included enhancing the model architecture to achieve a high-quality multilingual TTS system
with a reduced model size.
On-device Personalized TTS System for Bixby Custom Voice Creation May 2022 - Jan 2024 (@Samsung Research)
I contributed to the research and development of an on-device personalized TTS system, which was integrated into Samsung
Galaxy Bixby's Custom Voice Creation and used in Bixby's Text Call functionality. The system creates a
personalized voice by fine-tuning the TTS model directly on the user's device with just 10 utterances.
Fine-grained Prosody Control of TTS System (prototype web service) Mar 2021 - Apr 2022 (@NCSOFT)
I researched and developed a TTS system capable of controlling the prosody of speech at a fine-grained level,
allowing users to modify speech to have the desired prosody. The system was released as an internal web service and was widely used to produce guide videos for NCSOFT's games.
TTS System for K-pop Fandom Platform, “UNIVERSE” (live service) Mar 2019 - Apr 2022 (@NCSOFT)
I contributed to the research and development of a multi-speaker TTS system that replicates the voices of approximately 100 K-pop artists within a single model.
This system was used in "UNIVERSE", a K-pop fan community platform.
TTS System in Baseball Broadcast Scenario Mar 2019 - Mar 2021 (@NCSOFT)
I researched and developed an expressive TTS system that generates speech with dynamic expressions suited to diverse baseball situations. Several demos were published on NCSOFT's official blog and in news articles.
Please click this project to see the demo videos.
2024
Latent Filling: Latent Space Data Augmentation for Zero-shot Speech Synthesis
Jae-Sung Bae, Joun Yeop Lee, Ji-Hyun Lee, Seongkyu Mun, Taehwa Kang, Hoon-Young Cho, Chanwoo Kim
arXiv preprint arXiv:2310.03538, 2023. Accepted to ICASSP 2024. [paper][demo]
MELS-TTS: Multi-Emotion Multi-Lingual Multi-Speaker Text-to-Speech System via Disentangled Style Tokens
Heejin Choi, Jae-Sung Bae, Joun Yeop Lee, Seongkyu Mun, Jihwan Lee, Hoon-Young Cho, Chanwoo Kim
Accepted to ICASSP 2024.
2023
Hierarchical Timbre-Cadence Speaker Encoder for Zero-shot Speech Synthesis
Joun Yeop Lee, Jae-Sung Bae, Seongkyu Mun, Jihwan Lee, Ji-Hyun Lee, Hoon-Young Cho, Chanwoo Kim
In Proc. INTERSPEECH, 2023. [paper][demo]
Avocodo: Generative Adversarial Network for Artifact-free Vocoder
Taejun Bak, Junmo Lee, Hanbin Bae, Jinhyeok Yang, Jae-Sung Bae, Young-Sun Joo
In Proc. AAAI, 2023. [paper][demo][code]
2022
Hierarchical and Multi-Scale Variational Autoencoder for Diverse and Natural Non-Autoregressive Text-to-Speech
Jae-Sung Bae, Jinhyeok Yang, Tae-Jun Bak, Young-Sun Joo
In Proc. INTERSPEECH, 2022. [paper][demo][video]
Into-TTS: Intonation Template Based Prosody Control System
Jihwan Lee, Joun Yeop Lee, Heejin Choi, Seongkyu Mun, Sangjun Park, Jae-Sung Bae, Chanwoo Kim
arXiv preprint arXiv:2204.01271, 2022. [paper][demo]
2021
Hierarchical Context-Aware Transformers for Non-Autoregressive Text to Speech
Jae-Sung Bae, Tae-Jun Bak, Young-Sun Joo, and Hoon-Young Cho
In Proc. INTERSPEECH, 2021. [paper][demo]
GANSpeech: Adversarial Training for High-Fidelity Multi-Speaker Speech Synthesis
Jinhyeok Yang*, Jae-Sung Bae*, Taejun Bak, Youngik Kim, and Hoon-Young Cho
In Proc. INTERSPEECH, 2021. [paper][demo]
FastPitchFormant: Source-filter based Decomposed Modeling for Speech Synthesis
Taejun Bak, Jae-Sung Bae, Hanbin Bae, Young-Ik Kim, and Hoon-Young Cho
In Proc. INTERSPEECH, 2021. [paper][demo]
A Neural Text-to-Speech Model Utilizing Broadcast Data Mixed with Background Music
Hanbin Bae, Jae-Sung Bae, Young-Sun Joo, Young-Ik Kim, and Hoon-Young Cho
In Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2021. [paper][demo]
2020
Speaking Speed Control of End-to-End Speech Synthesis using Sentence-Level Conditioning
Jae-Sung Bae, Hanbin Bae, Young-Sun Joo, Junmo Lee, Gyeong-Hoon Lee, Hoon-Young Cho
In Proc. INTERSPEECH, 2020. [paper][demo][video]
2019
End-Point Detection with State Transition Model based on Chunk-Wise Classification
Juntae Kim*, Jaesung Bae*, Minsoo Hahn
arXiv preprint arXiv:1912.10442, 2019. [paper]
Phase-Aware Speech Enhancement with a Recurrent Two Stage Network
Juntae Kim and Jae-Sung Bae
arXiv preprint arXiv:2001.09772, 2019. [paper]
2018
End-to-End Speech Command Recognition with Capsule Network
Jae-Sung Bae, Dae-Shik Kim
In Proc. INTERSPEECH, 2018. [paper]