I am currently a speech AI researcher at Samsung Research.
My recent research focuses on personalized and zero-shot on-device TTS systems.
Previously, I worked at NCSOFT, a game company, where I mainly
studied expressive TTS and prosody-controllable TTS systems.
I received my BS in Electrical and Electronic Engineering from Yonsei University
and my MS in Electrical Engineering from KAIST,
where I was advised by Daeshik Kim in the BREIL lab.
I am interested in speech synthesis and speech representation learning of prosody and speaker identity.
Currently, I am broadening my interests to combine speech synthesis with techniques from other fields,
such as spontaneous speech-to-speech, multimodal generation, and video dubbing.
On-device TTS System in various languages for Galaxy S24's Live Translation Mar 2023 - Jan 2024 (@Samsung Research)
I contributed to the research and development of an on-device TTS system supporting eight languages,
which powers the Live Translation feature introduced as a main AI feature of the Galaxy S24.
My contributions included enhancing the model architecture to achieve a high-quality multilingual TTS system
with a reduced model size.
On-device Personalized TTS System for Bixby Custom Voice Creation May 2022 - Jan 2024 (@Samsung Research)
I contributed to the research and development of an on-device personalized TTS system, which was integrated into Samsung
Galaxy Bixby's Custom Voice Creation and used in Bixby's Text Call functionality. The system creates a
personalized voice by fine-tuning the TTS model directly on the user's device with just 10 utterances.
Fine-grained Prosody Control of TTS System (prototype web service) Mar 2021 - Apr 2022 (@NCSOFT)
I researched and developed a TTS system capable of controlling the prosody of speech at a fine-grained level,
allowing users to modify speech to have the desired prosody. The system was released as an internal web service and was widely used to produce guide videos for NCSOFT's games.
TTS System for K-pop Fandom Platform, “UNIVERSE” (live service) Mar 2019 - Apr 2022 (@NCSOFT)
I contributed to the research and development of a multi-speaker TTS system that replicates the voices of approximately 100 K-pop artists within a single model.
This system was used in "UNIVERSE", a K-pop fan community platform.
TTS System in Baseball Broadcast Scenario Mar 2019 - Mar 2021 (@NCSOFT)
I researched and developed an expressive TTS system that generates speech with dynamic expressions suited to diverse baseball situations. Several demos were published on NCSOFT's official blog and in news articles.
Please click this project to see the demo videos.
2024
Latent Filling: Latent Space Data Augmentation for Zero-shot Speech Synthesis
Jae-Sung Bae, Joun Yeop Lee, Ji-Hyun Lee, Seongkyu Mun, Taehwa Kang, Hoon-Young Cho, Chanwoo Kim
arXiv preprint arXiv:2310.03538, 2023. Accepted to ICASSP 2024. [paper][demo]
MELS-TTS: Multi-Emotion Multi-Lingual Multi-Speaker Text-to-Speech System via Disentangled Style Tokens
Heejin Choi, Jae-Sung Bae, Joun Yeop Lee, Seongkyu Mun, Jihwan Lee, Hoon-Young Cho, Chanwoo Kim
Accepted to ICASSP 2024.
2023
Hierarchical Timbre-Cadence Speaker Encoder for Zero-shot Speech Synthesis
Joun Yeop Lee, Jae-Sung Bae, Seongkyu Mun, Jihwan Lee, Ji-Hyun Lee, Hoon-Young Cho, Chanwoo Kim
In Proc. INTERSPEECH, 2023. [paper][demo]
Avocodo: Generative Adversarial Network for Artifact-free Vocoder
Taejun Bak, Junmo Lee, Hanbin Bae, Jinhyeok Yang, Jae-Sung Bae, Young-Sun Joo
In Proc. AAAI, 2023. [paper][demo][code]
2022
Hierarchical and Multi-Scale Variational Autoencoder for Diverse and Natural Non-Autoregressive Text-to-Speech
Jae-Sung Bae, Jinhyeok Yang, Tae-Jun Bak, Young-Sun Joo
In Proc. INTERSPEECH, 2022. [paper][demo][video]
Into-TTS: Intonation Template Based Prosody Control System
Jihwan Lee, Joun Yeop Lee, Heejin Choi, Seongkyu Mun, Sangjun Park, Jae-Sung Bae, Chanwoo Kim
arXiv preprint arXiv:2204.01271, 2022. [paper][demo]
2021
Hierarchical Context-Aware Transformers for Non-Autoregressive Text to Speech
Jae-Sung Bae, Tae-Jun Bak, Young-Sun Joo, and Hoon-Young Cho
In Proc. INTERSPEECH, 2021. [paper][demo]
GANSpeech: Adversarial Training for High-Fidelity Multi-Speaker Speech Synthesis
Jinhyeok Yang*, Jae-Sung Bae*, Taejun Bak, Youngik Kim, and Hoon-Young Cho
In Proc. INTERSPEECH, 2021. [paper][demo]
FastPitchFormant: Source-filter based Decomposed Modeling for Speech Synthesis
Taejun Bak, Jae-Sung Bae, Hanbin Bae, Young-Ik Kim, and Hoon-Young Cho
In Proc. INTERSPEECH, 2021. [paper][demo]
A Neural Text-to-Speech Model Utilizing Broadcast Data Mixed with Background Music
Hanbin Bae, Jae-Sung Bae, Young-Sun Joo, Young-Ik Kim, and Hoon-Young Cho
In Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2021. [paper][demo]
2020
Speaking Speed Control of End-to-End Speech Synthesis using Sentence-Level Conditioning
Jae-Sung Bae, Hanbin Bae, Young-Sun Joo, Junmo Lee, Gyeong-Hoon Lee, Hoon-Young Cho
In Proc. INTERSPEECH, 2020. [paper][demo][video]
2019
End-Point Detection with State Transition Model based on Chunk-Wise Classification
Juntae Kim*, Jaesung Bae*, Minsoo Hahn
arXiv preprint arXiv:1912.10442, 2019. [paper]
Phase-Aware Speech Enhancement with a Recurrent Two Stage Network
Juntae Kim and Jae-Sung Bae
arXiv preprint arXiv:2001.09772, 2019. [paper]
2018
End-to-End Speech Command Recognition with Capsule Network
Jae-Sung Bae, Dae-Shik Kim
In Proc. INTERSPEECH, 2018. [paper]