I am currently a 4th-year Ph.D. student at the School of Computer Science and Engineering, Nanyang Technological University, supervised by Prof. Eng Siong Chng. Prior to this, I received my B.Eng. degree from the University of Science and Technology of China in 2020.
My research topics are:
- Large Language Models (LLMs): Generative Seq2seq Learning, Speech LLMs, Text-to-Speech Synthesis with RLHF;
- Speech Processing: Efficient Adaptation of Speech Foundation Models, Speech Recognition / Translation / Enhancement;
- Multimodal: Audio-Visual Representation Learning, Video-to-Audio Generation.
📖 Education
Nanyang Technological University (NTU), 08/2021 - 08/2025
- Ph.D. in Computer Science. Supervisor: Eng Siong Chng.
University of Science and Technology of China (USTC), 09/2016 - 06/2020
- B.Eng. in Automation. GPA: 3.76/4.3 (Rank: Top 5%) [Transcript]
📝 Publications
- Preprint Robust Zero-Shot Text-to-Speech Synthesis with Reverse Inference Optimization, Yuchen Hu, Chen Chen, Siyin Wang, Eng Siong Chng, Chao Zhang. [Demo]
- Preprint Enhancing Zero-shot Text-to-Speech Synthesis with Human Feedback, Chen Chen*, Yuchen Hu*, Wen Wu, Helin Wang, Eng Siong Chng, Chao Zhang. [Demo]
- NeurIPS 2024 Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models, Yuchen Hu, Chen Chen, Chao-Han Huck Yang, Chengwei Qin, Pin-Yu Chen, Eng Siong Chng, Chao Zhang. [Code]
- ACL 2024 (Oral) GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators, Yuchen Hu, Chen Chen, Chao-Han Huck Yang, Ruizhe Li, Dong Zhang, Zhehuai Chen, Eng Siong Chng. [Code] [Data]
- ACL 2024 Listen Again and Choose the Right Answer: A New Paradigm for Automatic Speech Recognition with Large Language Models, Yuchen Hu, Chen Chen, Chengwei Qin, Qiushi Zhu, Eng Siong Chng, Ruizhe Li.
- ACL 2024 Overcoming Catastrophic Forgetting by Exemplar Selection in Task-oriented Dialogue System, Chen Chen, Ruizhe Li, Yuchen Hu, Yuanyuan Chen, Chengwei Qin, Qiang Zhang.
- ICLR 2024 (Spotlight, Top 5%) Large Language Models are Efficient Learners of Noise-Robust Speech Recognition, Yuchen Hu, Chen Chen, Chao-Han Huck Yang, Ruizhe Li, Chao Zhang, Pin-Yu Chen, Eng Siong Chng. [Code] [Data]
- ICLR 2024 It’s Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition, Chen Chen, Ruizhe Li, Yuchen Hu, Sabato Marco Siniscalchi, Pin-Yu Chen, Eng Siong Chng, Chao-Han Huck Yang.
- NeurIPS 2023 HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models, Chen Chen*, Yuchen Hu*, Chao-Han Huck Yang, Sabato Marco Siniscalchi, Pin-Yu Chen, Eng Siong Chng. [Code] [Data]
- AAAI 2024 Multichannel AV-wav2vec2: A Framework for Learning Multichannel Multi-modal Speech Representation, Qiushi Zhu, Jie Zhang, Yu Gu, Yuchen Hu, Lirong Dai. [Code]
- ICASSP 2024 Cross-Modality and Within-Modality Regularization for Audio-Visual DeepFake Detection, Heqing Zou, Meng Shen, Yuchen Hu, Chen Chen, Eng Siong Chng, Deepu Rajan. [Code]
- ICASSP 2024 An Experimental Comparison of Noise-Robust Text-To-Speech Synthesis Systems Based On Self-Supervised Representation, Xiaoying Zhao, Qiushi Zhu, Yuchen Hu. [Demo]
- ACL 2023 (Oral) Hearing Lips in Noise: Universal Viseme-Phoneme Mapping and Transfer for Robust Audio-Visual Speech Recognition, Yuchen Hu, Ruizhe Li, Chen Chen, Chengwei Qin, Qiushi Zhu, Eng Siong Chng. [Code]
- ACL 2023 (Oral) MIR-GAN: Refining Frame-Level Modality-Invariant Representations with Adversarial Network for Audio-Visual Speech Recognition, Yuchen Hu, Chen Chen, Ruizhe Li, Heqing Zou, Eng Siong Chng. [Code]
- ACL 2023 UniS-MMC: Multimodal Classification via Unimodality-supervised Multimodal Contrastive Learning, Heqing Zou, Meng Shen, Chen Chen, Yuchen Hu, Deepu Rajan, Eng Siong Chng. [Code]
- IJCAI 2023 Cross-Modal Global Interaction and Local Alignment for Audio-Visual Speech Recognition, Yuchen Hu, Ruizhe Li, Chen Chen, Heqing Zou, Qiushi Zhu, Eng Siong Chng. [Code]
- AAAI 2023 (Oral) Leveraging Modality-specific Representations for Audio-visual Speech Recognition via Reinforcement Learning, Chen Chen, Yuchen Hu, Qiang Zhang, Heqing Zou, Beier Zhu, Eng Siong Chng.
- IEEE/ACM TASLP 2023 Wav2code: Restore Clean Speech Representations via Codebook Lookup for Noise-Robust ASR, Yuchen Hu, Chen Chen, Qiushi Zhu, Eng Siong Chng.
- InterSpeech 2024 Noise-aware Speech Enhancement using Diffusion Probabilistic Model, Yuchen Hu, Chen Chen, Ruizhe Li, Qiushi Zhu, Eng Siong Chng. [Code]
- Preprint Rep2wav: Noise Robust Text-to-speech Using Self-supervised Representations, Qiushi Zhu, Yu Gu, Chao Weng, Yuchen Hu, Lirong Dai, Jie Zhang. [Demo]
- InterSpeech 2023 Dual-Path Style Learning for End-to-End Noise-Robust Speech Recognition, Yuchen Hu, Nana Hou, Chen Chen, Eng Siong Chng. [Code]
- InterSpeech 2023 A Neural State-space Model Approach to Efficient Speech Separation, Chen Chen, Chao-Han Huck Yang, Kai Li, Yuchen Hu, Pin-Jui Ku, Eng Siong Chng.
- ICASSP 2023 Unifying Speech Enhancement and Separation with Gradient Modulation for End-to-End Noise-Robust Speech Separation, Yuchen Hu, Chen Chen, Heqing Zou, Xionghu Zhong, Eng Siong Chng. [Code]
- ICASSP 2023 Gradient Remedy for Multi-Task Learning in End-to-End Noise-Robust Speech Recognition, Yuchen Hu, Chen Chen, Ruizhe Li, Qiushi Zhu, Eng Siong Chng. [Code]
- ICASSP 2023 Robust Data2vec: Noise-robust Speech Representation Learning for ASR by Combining Regression and Improved Contrastive Learning, Qiushi Zhu, Long Zhou, Jie Zhang, Shujie Liu, Yuchen Hu, Lirong Dai. [Code]
- ICASSP 2023 Metric-oriented Speech Enhancement Using Diffusion Probabilistic Model, Chen Chen, Yuchen Hu, Weiwei Weng, Eng Siong Chng.
- ICASSP 2023 Unsupervised Noise Adaptation using Data Simulation, Chen Chen, Yuchen Hu, Heqing Zou, Linhui Sun, Eng Siong Chng.
- InterSpeech 2022 Interactive Audio-text Representation for Automated Audio Captioning with Contrastive Learning, Chen Chen, Nana Hou, Yuchen Hu, Heqing Zou, Xiaofeng Qi, Eng Siong Chng.
- ICASSP 2022 Interactive Feature Fusion for End-to-End Noise-Robust Speech Recognition, Yuchen Hu, Nana Hou, Chen Chen, Eng Siong Chng. [Code]
- ICASSP 2022 Noise-robust Speech Recognition with 10 Minutes Unparalleled In-domain Data, Chen Chen, Nana Hou, Yuchen Hu, Shashank Shirol, Eng Siong Chng.
- ICASSP 2022 Self-Critical Sequence Training for Automatic Speech Recognition, Chen Chen, Yuchen Hu, Nana Hou, Xiaofeng Qi, Heqing Zou, Eng Siong Chng.
- IWSLT 2021 The USTC-NELSLIP Systems for Simultaneous Speech Translation Task at IWSLT 2021, Dan Liu, Mengge Du, Xiaoxi Li, Yuchen Hu, Lirong Dai.
💻 Internships
iFLYTEK AI Research & USTC NEL-SLIP, 05/2020 - 07/2021
- Research intern in the ASR group. Supervisor: Lirong Dai.
- Worked on end-to-end ASR and speech translation.
🧑‍🔬 Services
- Reviewer: NeurIPS (24), ICLR (25), ACL (23-24), EMNLP (23-24), AAAI (24-25), ICASSP (22-25), InterSpeech (22-24), IEEE/ACM TASLP, IEEE SPL
- Volunteer: EMNLP (23), ICASSP (22)
🎖 Honors and Awards
- ACL 2023 Area Chair Award, 07/2023
- Winner of the IWSLT 2021 Evaluation Campaign, 08/2021
- USTC Excellent Graduate (Top 10%), 06/2020
- Scholarship of SIMIT, Chinese Academy of Sciences (Top 5%), 10/2018
- USTC Outstanding Student Scholarship (Top 5%), 10/2017 & 10/2019
Thanks to Yi Ren for the template.