I am a PhD student in the VLAA Lab at UCSC, advised by Prof. Yuyin Zhou and Prof. Cihang Xie, working on generative multi-modal models and reasoning models. Previously, I received my M.Eng. from Tsinghua University (THU), advised by Prof. Jiwen Lu and Prof. Yansong Tang, where I worked on vision-language models and human digitization. I received my B.S. in CS from Beijing Normal University (BNU) in 2021.

News

2025-03: A preprint on test-time scaling for medical reasoning LLMs.
2025-02, 2024-06: (Accepted by CVPR’25) A preprint on visual compression with LLMs, featured in Hugging Face 🤗 Daily Papers!
2025-01: A preprint on human DNA methylation prediction.

2024-10: A preprint on text-to-image long story visualization.
2024-09: Started my PhD journey at UCSC!
2024-06: A preprint on unified in-context medical vision models.
2024-02, 2023-12: (Accepted by CVPR’24) Excited to share a new preprint enhancing SAM with regional captioning capabilities (featured in Hugging Face 🤗 Daily Papers)! Had amazing days at Microsoft!
2023-05: Started an internship at Microsoft Research Lab - Asia (MSRA).
2023-03: A preprint on efficient human digitization.
2022-09: A paper on language-guided ordinal regression accepted by NeurIPS’22.
2021-03: A paper on uncertainty-aware ordinal regression accepted by CVPR’21.

Publications and Preprints

For more work, please check here.

* indicates equal contribution.

Vision-Language Learning

Segment and Caption Anything
Xiaoke Huang, Jianfeng Wang, Yansong Tang, Zheng Zhang, Han Hu, Jiwen Lu, Lijuan Wang, Zicheng Liu
Conference on Computer Vision and Pattern Recognition (CVPR), 2024
[project page] [paper] [code]
OrdinalCLIP: Learning Rank Prompts for Language-Guided Ordinal Regression
Wanhua Li*, Xiaoke Huang*, Zheng Zhu, Yansong Tang, Xiu Li, Jiwen Lu, Jie Zhou
Conference on Neural Information Processing Systems (NeurIPS), 2022
[project page] [paper] [code] [中文解读]

Human Digitization

EMA: Efficient Meshy Neural Fields for Animatable Human Avatars
Xiaoke Huang, Yiji Cheng, Yansong Tang, Xiu Li, Jiwen Lu, Jie Zhou
Preprint, 2023
[project page] [paper] [code] [demo video]
SD-NeRF: Lifelike Talking Head Animation via Spatially-adaptive Dual-driven NeRFs
Shuai Shen*, Wanhua Li*, Xiaoke Huang*, Zheng Zhu, Jie Zhou, Jiwen Lu
IEEE Transactions on Multimedia (TMM), 2023
[paper]

Internship

Research Intern, Microsoft Research - Asia (MSRA), Beijing, China. April-September, 2023.

Project: Generative Regional Understanding with Vision and Language
Worked with Dr. Jianfeng Wang, Dr. Zheng Zhang, Dr. Han Hu, Dr. Lijuan Wang, and Dr. Zicheng Liu.

Awards

Scholarship

National Scholarship of China, March 2021
Huiyan Talent Second Prize of THU, Nov. 2022
JingShi First Prize of BNU, Oct. {2018, 2019, 2020}

Programming Contest

ACM-ICPC Contest Jiangsu, Silver Medal, June 2018
ICPC Asia Regional Contest {Nanchang, Xuzhou, Shanghai}, Bronze Medal, {June, Nov., Nov.} 2019

Activities

Reviewer for CVPR’{22,23,24}, ICCV’23, ECCV’22, ICML’24, FG’{23,24}.

Misc

I take pleasure in reading non-fiction books. A recent and highly recommended read is Johann Hari’s “Stolen Focus”, which thoroughly examines the challenges of living in this information-saturated era.

Additionally, I often go hiking in the suburbs, where I find tranquility and inner peace.

Since June 2023, I have been passionate about bodybuilding, as it deeply connects me with the sensation of “being present”.

My Chinese name is 黄小可 (Huang, Xiaoke).

Xiaoke Huang