My name is Xiaoke Huang, and I am a Ph.D. student at the University of California, Santa Cruz. My research focuses on multimodal and reasoning models, media generation, and AI for healthcare. Previously, I received my master’s degree from Tsinghua University, where I worked on vision–language learning and on 3D reconstruction and generation of digital humans, and my bachelor’s degree from Beijing Normal University. I am currently interning at Meta and previously interned at Microsoft Research. I am interested in building scalable environments for agentic learning.
News
- 2025-10: A preprint on multimodal verifiable question-answering synthesis for RLVR.
- 2025-08: (Accepted by ML4H’25) A preprint on multimodal medical reasoning.
- 2025-06: Started my internship at Meta.
- 2025-03: (Accepted by ML4H’25) A preprint on test-time scaling for medical reasoning LLMs.
- 2025-01: A preprint on human DNA methylation prediction.
- 2024-10: A preprint on text-to-image long story visualization.
- 2024-09: Started my PhD journey at UCSC.
- 2024-06: (Accepted by CVPR’25) A preprint on visual compression with LLM, featured in Hugging Face 🤗 Daily Papers!
- 2024-06: A preprint on unified in-context medical vision models.
- 2023-12: (Accepted by CVPR’24) Excited to share a new preprint enhancing SAM with regional captioning capabilities (featured in Hugging Face 🤗 Daily Papers)! Had amazing days at Microsoft!
- 2023-05: Started an internship at Microsoft Research Lab - Asia (MSRA)
- 2023-03: A preprint on efficient human digitization
- 2022-09: A paper on language-guided ordinal regression accepted by NeurIPS’22
- 2021-03: A paper on uncertainty-aware ordinal regression accepted by CVPR’21
Publications and Preprints
For more work, please check here.
* indicates equal contribution.
Vision-Language Learning
Segment and Caption Anything
Xiaoke Huang, Jianfeng Wang, Yansong Tang, Zheng Zhang, Han Hu, Jiwen Lu, Lijuan Wang, Zicheng Liu
Conference on Computer Vision and Pattern Recognition (CVPR), 2024
[project page] [paper] [code]
OrdinalCLIP: Learning Rank Prompts for Language-Guided Ordinal Regression
Wanhua Li*, Xiaoke Huang*, Zheng Zhu, Yansong Tang, Xiu Li, Jiwen Lu, Jie Zhou
Conference on Neural Information Processing Systems (NeurIPS), 2022
[project page] [paper] [code] [中文解读]
Human Digitization
EMA: Efficient Meshy Neural Fields for Animatable Human Avatars
Xiaoke Huang, Yiji Cheng, Yansong Tang, Xiu Li, Jiwen Lu, Jie Zhou
Preprint, 2023
[project page] [paper] [code] [demo video]
SD-NeRF: Lifelike Talking Head Animation via Spatially-adaptive Dual-driven NeRFs
Shuai Shen*, Wanhua Li*, Xiaoke Huang*, Zheng Zhu, Jie Zhou, Jiwen Lu
IEEE Transactions on Multimedia (TMM), 2023
[paper]
Internship
Research Intern, Meta, London, UK. June-Nov., 2025.
Project: Unified Vector Glyph Generation with Language Models
Work with Brandon Han, Bhavul Gauri, Tony Ng, Kam Woh Ng, Frost Xu, and Tao Xiang.
Research Intern, Microsoft Research Asia (MSRA), Beijing, China. April-September, 2023.
Project: Generative Regional Understanding with Vision and Language
Worked with Jianfeng Wang, Zheng Zhang, Han Hu, Lijuan Wang, and Zicheng Liu.
Awards
Programming Contest
- ACM-ICPC Contest Jiang Su, Silver Medal, June 2018
- ICPC Asia Regional Contest {Nanchang, Xuzhou, Shanghai}, Bronze Medal, {June, Nov., Nov.} 2019
Links
- GitHub: https://www.github.com/xk-huang
- Google Scholar: https://scholar.google.com/citations?user=BD9AT04AAAAJ&hl=en
- Twitter: https://twitter.com/xiaoke_shawn_h
- LinkedIn: https://www.linkedin.com/in/xiaoke-huang-283470189/
- E-mail: click me
Misc
I enjoy reading non-fiction.
Some recent and highly recommended books (Oct. 2025):
- Exercised: Why Something We Never Evolved to Do Is Healthy and Rewarding
- The Story of the Human Body: Evolution, Health and Disease
- Fooled by Randomness: The Hidden Role of Chance in Life and in the Markets
- Skin in the Game: Hidden Asymmetries in Daily Life