CVPR 2026

VecGlypher: Unified Vector Glyph Generation with Language Models

Xiaoke Huang1,2,*, Bhavul Gauri1, Kam Woh Ng1, Tony Ng1, Mengmeng Xu1, Zhiheng Liu1, Weiming Ren1, Zhaochong An1, Zijian Zhou1, Haonan Qiu1, Yuyin Zhou2, Sen He1, Ziheng Wang1, Tao Xiang1, Xiao Han1

1Meta AI   2UC Santa Cruz   *Work done at Meta

February 7, 2026



Abstract

VecGlypher is a single multimodal language model that generates editable SVG glyph outlines directly from either natural-language style prompts or reference glyph images. Instead of relying on raster intermediates and post-vectorization, it autoregressively emits path tokens in one pass. The training recipe combines large-scale continuation on noisy Envato fonts with post-training on expert-tagged Google Fonts, alongside typography-aware preprocessing for stable long-sequence decoding.

Unified Generation Modes

Text-Referenced

Input: style tags + target character.

"Active, Cute, Vintage" + "V"

Output: a valid SVG path that matches requested style and content.

Image-Referenced

Input: 1-8 exemplar glyph images + target character.

[ref glyphs] + "V"

Output: style-consistent target glyph with closed, editable outlines.
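The two conditioning modes above can be summarized as one request schema. A minimal sketch in Python; every field name here (`mode`, `style_tags`, `reference_glyphs`, `target_char`) is an illustrative assumption for exposition, not the paper's actual interface.

```python
# Illustrative request payloads for the two conditioning modes; all field
# names are assumptions for exposition, not the paper's real API.
def validate_request(req: dict) -> dict:
    """Check the constraints stated above: one target character, and
    1-8 exemplar glyph images in the image-referenced mode."""
    assert len(req["target_char"]) == 1
    if req["mode"] == "image":
        assert 1 <= len(req["reference_glyphs"]) <= 8
    else:
        assert req["style_tags"]  # text-referenced mode needs style tags
    return req

text_req = validate_request({
    "mode": "text",
    "style_tags": ["Active", "Cute", "Vintage"],
    "target_char": "V",
})
image_req = validate_request({
    "mode": "image",
    "reference_glyphs": ["ref_0.png", "ref_1.png", "ref_2.png"],
    "target_char": "V",
})
```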

Figure 1. VecGlypher generation examples for image-referenced and text-referenced settings.

Method at a Glance

1. Data Curation

Filter malformed and duplicate fonts, normalize coordinate systems, keep only SVG path geometry, and quantize coordinates to one decimal place.
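The quantization step can be sketched as a small regex pass over the path data; a minimal illustration of rounding coordinates to one decimal, not the paper's actual preprocessing code (it ignores leading-dot numbers like `.5` for brevity).

```python
import re

def quantize_path(d: str, decimals: int = 1) -> str:
    """Round every numeric coordinate in an SVG path string."""
    def _round(m):
        v = round(float(m.group()), decimals)
        # Drop a trailing ".0" so the token sequence stays short.
        return str(int(v)) if v == int(v) else str(v)
    return re.sub(r"-?\d+\.?\d*", _round, d)

print(quantize_path("M 10.4999 20.0 L 3.14159 -0.061 Z"))
# -> "M 10.5 20 L 3.1 -0.1 Z"
```

Shorter numeric tokens both shrink the sequence the model must emit and make long-horizon decoding more stable.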

2. Two-Stage Training

Stage 1 (Envato, text-referenced): learn SVG syntax and long-horizon geometry. Stage 2 (Google Fonts, text+image): align style conditioning with geometry.
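The two-stage recipe can be written down as data; stage names and field layout below are illustrative, and no hyperparameters from the paper are implied.

```python
# Sketch of the two-stage recipe as configuration; field names are
# illustrative, not the paper's actual training config.
STAGES = [
    {"stage": 1, "data": "Envato (noisy, large-scale)",
     "conditioning": ("text",),
     "objective": "continuation: learn SVG syntax and long-horizon geometry"},
    {"stage": 2, "data": "Google Fonts (expert-tagged)",
     "conditioning": ("text", "image"),
     "objective": "post-training: align style conditioning with geometry"},
]

def supports_images(stage: dict) -> bool:
    return "image" in stage["conditioning"]
```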

3. Autoregressive Decoding

A single multimodal LLM predicts SVG tokens directly. No raster denoiser or vector post-optimizer is required.
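The single-pass decoding reduces to a plain greedy loop over SVG tokens. In the sketch below, `next_token` is a scripted stand-in for the multimodal LLM, and the explicit `</svg>` end-of-glyph token is an assumption.

```python
def decode_glyph(next_token, prompt, max_tokens=2048):
    """Greedy, single-pass emission of SVG tokens: no raster
    intermediate, no vector post-optimizer, just next-token prediction."""
    tokens = []
    while len(tokens) < max_tokens:
        tok = next_token(prompt, tokens)
        tokens.append(tok)
        if tok == "</svg>":  # assumed explicit end-of-glyph token
            break
    return " ".join(tokens)

# Scripted stand-in "model" that always draws the same closed triangle.
script = iter(['<svg>', '<path', 'd="M 0 0 L 10 0 L 5 9 Z"', '/>', '</svg>'])
svg = decode_glyph(lambda prompt, prefix: next(script),
                   'style="Active, Cute, Vintage" char="V"')
```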

Figure 2. Paradigm comparisons: prior pipelines versus unified VecGlypher formulation.
Figure 3. VecGlypher pipeline and two-stage training recipe.

Data Scale

After Filtering

Google Fonts: 2,497 fonts

Envato: 39,497 fonts

Font Families

Google: 1,117 families

Envato: 23,543 families

Glyph Instances

Google: 157,899

Envato: 2,495,363
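The counts above imply a consistent per-font glyph coverage; a quick check using only the numbers in this section:

```python
# Average glyphs per font implied by the filtered counts above.
google_avg = 157_899 / 2_497      # Google Fonts
envato_avg = 2_495_363 / 39_497   # Envato
print(round(google_avg, 1), round(envato_avg, 1))  # -> 63.2 63.2
```

Both averages land near the size of a basic Latin alphanumeric set (26 + 26 + 10 = 62), consistent with per-character glyph instances.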

Table 5. Dataset statistics and vocabulary/length distribution analysis.

Quantitative Results (Cross-Family OOD)

VecGlypher outperforms general-purpose LLMs on text-referenced generation and dedicated vector-font baselines on image-referenced generation.

Text-Referenced Comparison (Table 8)

Model R-ACC ↑ CD ↓ DINO ↑ FID ↓
Claude Sonnet 4.5 46.65 5.28 88.31 19.59
GPT-5 43.98 6.12 86.92 29.00
VecGlypher 27B (T,I,A) 100.5 1.72 94.22 3.46
VecGlypher 70B (T,A) 100.4 1.68 94.28 3.34

Image-Referenced Comparison (Table 9)

Model R-ACC ↑ CD ↓ DINO ↑ FID ↓
DeepVecFont-v2 37.86 14.58 79.41 115.5
DualVector 49.20 16.45 79.57 105.5
VecGlypher 27B (T,I,A) 99.12 1.18 95.82 2.32
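CD in both tables denotes Chamfer distance between generated and ground-truth outlines. A minimal symmetric variant over 2-D point sets, for intuition only; the paper's exact sampling and normalization may differ.

```python
import math

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two 2-D point sets: average
    nearest-neighbor distance in each direction, summed."""
    def one_way(src, dst):
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)
    return one_way(a, b) + one_way(b, a)

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
shifted = [(x + 0.1, y) for x, y in square]
print(round(chamfer_distance(square, shifted), 3))  # -> 0.2
```

Lower is better: identical outlines score 0, and small rigid shifts contribute proportionally small penalties.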
Figure 6. Text-referenced qualitative comparisons against general LLMs.
Figure 7. Image-referenced qualitative comparisons against vector-font baselines.

Citation

@article{VecGlypher,
  title     = {VecGlypher: Unified Vector Glyph Generation with Language Models},
  author    = {Huang, Xiaoke and Gauri, Bhavul and Ng, Kam Woh and Ng, Tony and Xu, Mengmeng and Liu, Zhiheng and Ren, Weiming and An, Zhaochong and Zhou, Zijian and Qiu, Haonan and Zhou, Yuyin and He, Sen and Wang, Ziheng and Xiang, Tao and Han, Xiao},
  journal   = {arXiv preprint},
  year      = {2026}
}