My name is Xiaoke Huang, and I am a Ph.D. student at the University of California, Santa Cruz (UCSC). My research focuses on multimodal and reasoning models, as well as media generation; previously, I worked on vision–language learning and 3D reconstruction and generation of digital humans.

I received my master’s degree from Tsinghua University (THU) and my bachelor’s degree from Beijing Normal University (BNU). I am currently interning at Meta and have previously interned at Microsoft Research.

I am currently exploring scalable environments (e.g., verifiable question answering and multimodal world models) for agentic reinforcement learning.

News

  • 2025-10: A preprint on multimodal verifiable question-answering synthesis for RLVR.
  • 2025-08: A preprint on multimodal medical reasoning.
  • 2025-06: Started my internship at Meta.
  • 2025-03: A preprint on test-time scaling for medical reasoning LLMs.
  • 2025-02, 2024-06: (Accepted by CVPR’25) A preprint on visual compression with LLMs, featured in Hugging Face 🤗 Daily Papers!
  • 2025-01: A preprint on human DNA methylation prediction.
  • 2024-10: A preprint on text-to-image long story visualization.
  • 2024-09: Started my Ph.D. journey at UCSC!
  • 2024-06: A preprint on unified in-context medical vision models.
  • 2024-02, 2023-12: (Accepted by CVPR’24) Excited to share a new preprint enhancing SAM with regional captioning capabilities (featured in Hugging Face 🤗 Daily Papers)! Had amazing days at Microsoft!
  • 2023-05: Started an internship at Microsoft Research Asia (MSRA)
  • 2023-03: A preprint on efficient human digitization
  • 2022-09: A paper on language-guided ordinal regression accepted by NeurIPS’22
  • 2021-03: A paper on uncertainty-aware ordinal regression accepted by CVPR’21

Publications and Preprints

For more works, please check here.

* indicates equal contribution.

Vision-Language Learning

  • Segment and Caption Anything
    Xiaoke Huang, Jianfeng Wang, Yansong Tang, Zheng Zhang, Han Hu, Jiwen Lu, Lijuan Wang, Zicheng Liu
    Conference on Computer Vision and Pattern Recognition (CVPR), 2024
    [project page] [paper] [code]

  • OrdinalCLIP: Learning Rank Prompts for Language-Guided Ordinal Regression
    Wanhua Li*, Xiaoke Huang*, Zheng Zhu, Yansong Tang, Xiu Li, Jiwen Lu, Jie Zhou
    Conference on Neural Information Processing Systems (NeurIPS), 2022
    [project page] [paper] [code] [Chinese explainer]

    ordinalclip_framework

Human Digitization

  • EMA: Efficient Meshy Neural Fields for Animatable Human Avatars
    Xiaoke Huang, Yiji Cheng, Yansong Tang, Xiu Li, Jiwen Lu, Jie Zhou
    Preprint, 2023
    [project page] [paper] [code] [demo video]

  • SD-NeRF: Lifelike Talking Head Animation via Spatially-adaptive Dual-driven NeRFs
    Shuai Shen*, Wanhua Li*, Xiaoke Huang*, Zheng Zhu, Jie Zhou, Jiwen Lu
    IEEE Transactions on Multimedia (TMM), 2023
    [paper]

    sdnerf-teaser

Internship

Research Intern, Meta, London, UK. June 2025 to present.

Research Intern, Microsoft Research Asia (MSRA), Beijing, China. April to September 2023.

Awards

Scholarship

  • National Scholarship of China, March 2021
  • Huiyan Talent Second Prize of THU, Nov. 2022
  • JingShi First Prize of BNU, Oct. {2018, 2019, 2020}

Programming Contest

  • ACM-ICPC Jiangsu Contest, Silver Medal, June 2018
  • ICPC Asia Regional Contest {Nanchang, Xuzhou, Shanghai}, Bronze Medal, {June, Nov., Nov.} 2019

Activities

Reviewer for CVPR’{22,23,24}, ICCV’23, ECCV’22, ICML’24, FG’{23,24}.

Misc

I enjoy reading non-fiction books. One recent read I highly recommend is Johann Hari’s “Stolen Focus”, which thoroughly examines the challenges of living in an information-saturated era.

Additionally, I often go hiking in the suburbs, where I find tranquility and inner peace.

Since June 2023, I have been passionate about bodybuilding, as it deeply connects me with the sensation of “being present”.

My Chinese name is 黄小可 (Huang, Xiaoke).