Publications

For the most up-to-date list of publications and citation counts, please visit my Google Scholar profile.

Preprints

[2] Kun Zhou, You Zhang, Shengkui Zhao, Hao Wang, Zexu Pan, Dianwen Ng, Chong Zhang, Chongjia Ni, Yukun Ma, Trung Hieu Nguyen, Jia Qi Yip, and Bin Ma, Emotional Dimension Control in Language Model-Based Text-to-Speech: Spanning a Broad Spectrum of Human Emotions, 2025. <arXiv> <demo>

[1] Yuxiang Wang, You Zhang, Zhiyao Duan, and Mark Bocko, Predicting Global Head-Related Transfer Functions From Scanned Head Geometry Using Deep Learning and Compact Representations, 2025. <arXiv> <code>

Book Chapters

[1] You Zhang, Fei Jiang, Ge Zhu, Xinhui Chen, and Zhiyao Duan, Generalizing Voice Presentation Attack Detection to Unseen Synthetic Attacks and Channel Variation, Marcel, S., Fierrez, J., Evans, N. (eds) Handbook of Biometric Anti-Spoofing. Advances in Computer Vision and Pattern Recognition. Springer, Singapore, 2023. <link> <code>

Journals

[3] Xin Wang, Héctor Delgado, Hemlata Tak, Jee-weon Jung, Hye-jin Shim, Massimiliano Todisco, Ivan Kukanov, Xuechen Liu, Md Sahidullah, Tomi Kinnunen, Nicholas Evans, Kong Aik Lee, Junichi Yamagishi, Myeonghun Jeong, Ge Zhu, Yongyi Zang, You Zhang, Soumi Maiti, Florian Lux, Nicolas Müller, Wangyou Zhang, Chengzhe Sun, Shuwei Hou, Siwei Lyu, Sébastien Le Maguer, Cheng Gong, Hanjie Guo, Liping Chen, and Vishwanath Singh, ASVspoof 5: Design, Collection, and Validation of Resources for Spoofing, Deepfake, and Adversarial Attack Detection Using Crowdsourced Speech, Computer Speech & Language, 2025. <link> <dataset>

[2] Sefik Emre Eskimez, You Zhang, and Zhiyao Duan, Speech Driven Talking Face Generation From a Single Image and an Emotion Condition, IEEE Transactions on Multimedia, vol. 24, pp. 3480-3490, 2022. <link> <arXiv> <code> <project>

[1] You Zhang, Fei Jiang, and Zhiyao Duan, One-Class Learning Towards Synthetic Voice Spoofing Detection, IEEE Signal Processing Letters, vol. 28, pp. 937-941, 2021. <link> <arXiv> <code> <poster> <slides> <video> <project>

Peer-reviewed Conferences and Workshops

[19] You Zhang, Andrew Francl, Ruohan Gao, Paul Calamia, Zhiyao Duan, and Ishwarya Ananthabhotla, Towards Perception-Informed Latent HRTF Representations, in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2025. <arXiv>

[18] You Zhang*, Baotong Tian*, Lin Zhang, and Zhiyao Duan, PartialEdit: Identifying Partial Deepfakes in the Era of Neural Speech Editing, in Proc. Interspeech, 2025. (* equal contribution) <arXiv> <dataset> <demo page>

[17] Kyungbok Lee‡, You Zhang, and Zhiyao Duan, Audio Visual Segmentation Through Text Embeddings, in Proc. IEEE International Conference on Image Processing (ICIP), 2025. <arXiv> <code>

[16] Jiatong Shi, Hyejin Shim, Jinchuan Tian, Siddhant Arora, Haibin Wu, Darius Petermann, Jia Qi Yip, You Zhang, Yuxun Tang, Wangyou Zhang, Dareen Alharthi, Yichen Huang, Koichi Saito, Jionghao Han, Yiwen Zhao, Chris Donahue, and Shinji Watanabe, VERSA: A Versatile Evaluation Toolkit for Speech, Audio, and Music, in Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), System Demonstration Track, 2025. <arXiv> <code>

[15] You Zhang, Yongyi Zang, Jiatong Shi, Ryuichi Yamamoto, Tomoki Toda, and Zhiyao Duan, SVDD 2024: The Inaugural Singing Voice Deepfake Detection Challenge, in Proc. IEEE Spoken Language Technology Workshop (SLT), 2024. <link> <arXiv> <code> <webpage>

[14] Kyungbok Lee, You Zhang, and Zhiyao Duan, A Multi-Stream Fusion Approach with One-Class Learning for Audio-Visual Deepfake Detection, in Proc. IEEE 26th International Workshop on Multimedia Signal Processing (MMSP), 2024. <link> <arXiv> <code>

[13] Yongyi Zang, Jiatong Shi, You Zhang, Ryuichi Yamamoto, Jionghao Han, Yuxun Tang, Shengyuan Xu, Wenxiao Zhao, Jing Guo, Tomoki Toda, and Zhiyao Duan, CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection, in Proc. Interspeech, 2024. <link> <arXiv> <dataset>

[12] Yongyi Zang*, You Zhang*, Mojtaba Heydari, and Zhiyao Duan, SingFake: Singing Voice Deepfake Detection, in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024. (* equal contribution) <link> <arXiv> <code> <webpage>

[11] Enting Zhou, You Zhang, and Zhiyao Duan, Learning Arousal-Valence Representation from Categorical Emotion Labels of Speech, in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024. <link> <arXiv> <code>

[10] Yutong Wen, You Zhang, and Zhiyao Duan, Mitigating Cross-Database Differences for Learning Unified HRTF Representation, in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2023. <link> <arXiv> <code> <video>

[9] Yongyi Zang, You Zhang, and Zhiyao Duan, Phase perturbation improves channel robustness for speech spoofing countermeasures, in Proc. Interspeech, 2023, pp. 3162-3166. <link> <arXiv> <code>

[8] You Zhang, Yuxiang Wang, and Zhiyao Duan, HRTF Field: Unifying Measured HRTF Magnitude Representation with Neural Fields, in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023. (Recognized as one of the top 3% of all papers accepted at ICASSP 2023) <link> <arXiv> <code> <video>

[7] Siwen Ding, You Zhang, and Zhiyao Duan, SAMO: Speaker Attractor Multi-Center One-Class Learning for Voice Anti-Spoofing, in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023. <link> <arXiv> <code> <video>

[6] Abudukelimu Wuerkaixi, Kunda Yan, You Zhang, Zhiyao Duan, and Changshui Zhang, DyViSE: Dynamic Vision-Guided Speaker Embedding for Audio-Visual Speaker Diarization, in Proc. IEEE 24th International Workshop on Multimedia Signal Processing (MMSP), 2022, pp. 1-6. <link> <pdf> <code>

[5] Abudukelimu Wuerkaixi, You Zhang, Zhiyao Duan, and Changshui Zhang, Rethinking Audio-Visual Synchronization for Active Speaker Detection, in Proc. IEEE 32nd International Workshop on Machine Learning for Signal Processing (MLSP), 2022, pp. 1-6. <link> <pdf> <code>

[4] You Zhang, Ge Zhu, and Zhiyao Duan, A Probabilistic Fusion Framework for Spoofing Aware Speaker Verification, in Proc. The Speaker and Language Recognition Workshop (Odyssey), 2022, pp. 77-84. <link> <pdf> <code> <video> <slides>

[3] Xinhui Chen*, You Zhang*, Ge Zhu*, and Zhiyao Duan, UR Channel-Robust Synthetic Speech Detection System for ASVspoof 2021, in Proc. 2021 Edition of the Automatic Speaker Verification and Spoofing Countermeasures Challenge Workshop (ASVspoof), 2021, pp. 75-82. (* equal contribution) <link> <pdf> <code> <video>

[2] Yuxiang Wang, You Zhang, Zhiyao Duan, and Mark Bocko, Global HRTF Personalization Using Anthropometric Measures, in Audio Engineering Society 150th Convention, 2021. <link> <pdf> <code>

[1] You Zhang, Ge Zhu, Fei Jiang, and Zhiyao Duan, An Empirical Study on Channel Effects for Synthetic Voice Spoofing Countermeasure Systems, in Proc. Interspeech, 2021, pp. 4309-4313. <link> <pdf> <code> <video> <slides> <dataset>

Conference Abstracts

[3] You Zhang, Yuxiang Wang, Mark Bocko, and Zhiyao Duan, Grid-agnostic personalized head-related transfer function modeling with neural fields, in Acoustical Society of America 184th Meeting, 2023. (ASA Student Paper Award in Signal Processing, Second Place) <link>

[2] Samantha E. Lettenberger, Maryam Zafar, Julia M. Soto, You Zhang, Ge Zhu, Aaron J. Masino, Grace Nkrumah, Emma Waddell, Kelsey Spear, Abigail Arky, Rajbir Toor, Emily Hartman, Jacob Epifano, Rich Christie, Zhiyao Duan, and Ray Dorsey, Words Spoken Daily: A Novel Measure of Cognition, in International Congress of Parkinson’s Disease and Movement Disorders (MDS), 2023. <link>

[1] Yuxiang Wang, You Zhang, Zhiyao Duan, and Mark Bocko, Employing deep learning method to predict global head-related transfer functions from scanned head geometry, in Acoustical Society of America 181st Meeting, 2021. <link>