CVPR 2026

VecGlypher: Unified Vector Glyph Generation with Language Models

Xiaoke Huang1,2,*, Bhavul Gauri1, Kam Woh Ng1, Tony Ng1, Mengmeng Xu1, Zhiheng Liu1, Weiming Ren1, Zhaochong An1, Zijian Zhou1, Haonan Qiu1, Yuyin Zhou2, Sen He1, Ziheng Wang1, Tao Xiang1, Xiao Han1

1Meta AI   2UC Santa Cruz   *Work done at Meta

February 7, 2026



Abstract

VecGlypher is a single multimodal language model that generates editable SVG glyph outlines directly from either natural-language style prompts or reference glyph images. Instead of relying on raster intermediates and post-vectorization, it autoregressively emits path tokens in one pass. The training recipe combines large-scale continuation on noisy Envato fonts with post-training on expert-tagged Google Fonts, alongside typography-aware preprocessing for stable long-sequence decoding.

Unified Generation Modes

Text-Referenced

Input: style tags + target character.

"Active, Cute, Vintage" + "V"

Output: a valid SVG path that matches requested style and content.

Image-Referenced

Input: 1-8 exemplar glyph images + target character.

[ref glyphs] + "V"

Output: style-consistent target glyph with closed, editable outlines.
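The two conditioning modes above can be summarized as one request schema. A minimal sketch in Python; every field name here (`mode`, `style_tags`, `reference_glyphs`, `target_char`) is an illustrative assumption for exposition, not the paper's actual interface.

```python
# Illustrative request payloads for the two conditioning modes; all field
# names are assumptions for exposition, not the paper's real API.
def validate_request(req: dict) -> dict:
    """Check the constraints stated above: one target character, and
    1-8 exemplar glyph images in the image-referenced mode."""
    assert len(req["target_char"]) == 1
    if req["mode"] == "image":
        assert 1 <= len(req["reference_glyphs"]) <= 8
    else:
        assert req["style_tags"]  # text-referenced mode needs style tags
    return req

text_req = validate_request({
    "mode": "text",
    "style_tags": ["Active", "Cute", "Vintage"],
    "target_char": "V",
})
image_req = validate_request({
    "mode": "image",
    "reference_glyphs": ["ref_0.png", "ref_1.png", "ref_2.png"],
    "target_char": "V",
})
```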

Figure 1. VecGlypher generation examples for image-referenced and text-referenced settings.

Method at a Glance

1. Data Curation

Filter malformed and duplicate fonts, normalize coordinate systems, keep only SVG path geometry, and quantize coordinates to one decimal place.
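The quantization step can be sketched as a small regex pass over the path data; a minimal illustration of rounding coordinates to one decimal, not the paper's actual preprocessing code (it ignores leading-dot numbers like `.5` for brevity).

```python
import re

def quantize_path(d: str, decimals: int = 1) -> str:
    """Round every numeric coordinate in an SVG path string."""
    def _round(m):
        v = round(float(m.group()), decimals)
        # Drop a trailing ".0" so the token sequence stays short.
        return str(int(v)) if v == int(v) else str(v)
    return re.sub(r"-?\d+\.?\d*", _round, d)

print(quantize_path("M 10.4999 20.0 L 3.14159 -0.061 Z"))
# -> "M 10.5 20 L 3.1 -0.1 Z"
```

Shorter numeric tokens both shrink the sequence the model must emit and make long-horizon decoding more stable.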

2. Two-Stage Training

Stage 1 (Envato, text-referenced): learn SVG syntax and long-horizon geometry. Stage 2 (Google Fonts, text+image): align style conditioning with geometry.
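The two-stage recipe can be written down as data; stage names and field layout below are illustrative, and no hyperparameters from the paper are implied.

```python
# Sketch of the two-stage recipe as configuration; field names are
# illustrative, not the paper's actual training config.
STAGES = [
    {"stage": 1, "data": "Envato (noisy, large-scale)",
     "conditioning": ("text",),
     "objective": "continuation: learn SVG syntax and long-horizon geometry"},
    {"stage": 2, "data": "Google Fonts (expert-tagged)",
     "conditioning": ("text", "image"),
     "objective": "post-training: align style conditioning with geometry"},
]

def supports_images(stage: dict) -> bool:
    return "image" in stage["conditioning"]
```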

3. Autoregressive Decoding

A single multimodal LLM predicts SVG tokens directly. No raster denoiser or vector post-optimizer is required.
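The single-pass decoding reduces to a plain greedy loop over SVG tokens. In the sketch below, `next_token` is a scripted stand-in for the multimodal LLM, and the explicit `</svg>` end-of-glyph token is an assumption.

```python
def decode_glyph(next_token, prompt, max_tokens=2048):
    """Greedy, single-pass emission of SVG tokens: no raster
    intermediate, no vector post-optimizer, just next-token prediction."""
    tokens = []
    while len(tokens) < max_tokens:
        tok = next_token(prompt, tokens)
        tokens.append(tok)
        if tok == "</svg>":  # assumed explicit end-of-glyph token
            break
    return " ".join(tokens)

# Scripted stand-in "model" that always draws the same closed triangle.
script = iter(['<svg>', '<path', 'd="M 0 0 L 10 0 L 5 9 Z"', '/>', '</svg>'])
svg = decode_glyph(lambda prompt, prefix: next(script),
                   'style="Active, Cute, Vintage" char="V"')
```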

Figure 2. Paradigm comparisons: prior pipelines versus unified VecGlypher formulation.
Figure 3. VecGlypher pipeline and two-stage training recipe.

Data Scale

After Filtering

Google Fonts: 2,497 fonts

Envato: 39,497 fonts

Font Families

Google: 1,117 families

Envato: 23,543 families

Glyph Instances

Google: 157,899

Envato: 2,495,363
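The counts above imply a consistent per-font glyph coverage; a quick check using only the numbers in this section:

```python
# Average glyphs per font implied by the filtered counts above.
google_avg = 157_899 / 2_497      # Google Fonts
envato_avg = 2_495_363 / 39_497   # Envato
print(round(google_avg, 1), round(envato_avg, 1))  # -> 63.2 63.2
```

Both averages land near the size of a basic Latin alphanumeric set (26 + 26 + 10 = 62), consistent with per-character glyph instances.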

Table 5. Dataset statistics and vocabulary/length distribution analysis.

Quantitative Results (Cross-Family OOD)

VecGlypher outperforms general-purpose LLMs on text-referenced generation and dedicated vector-font baselines on image-referenced generation.

Text-Referenced Comparison (Table 8)

Model R-ACC ↑ CD ↓ DINO ↑ FID ↓
Claude Sonnet 4.5 46.65 5.28 88.31 19.59
GPT-5 43.98 6.12 86.92 29.00
VecGlypher 27B (T,I,A) 100.5 1.72 94.22 3.46
VecGlypher 70B (T,A) 100.4 1.68 94.28 3.34

Image-Referenced Comparison (Table 9)

Model R-ACC ↑ CD ↓ DINO ↑ FID ↓
DeepVecFont-v2 37.86 14.58 79.41 115.5
DualVector 49.20 16.45 79.57 105.5
VecGlypher 27B (T,I,A) 99.12 1.18 95.82 2.32
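CD in both tables denotes Chamfer distance between generated and ground-truth outlines. A minimal symmetric variant over 2-D point sets, for intuition only; the paper's exact sampling and normalization may differ.

```python
import math

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two 2-D point sets: average
    nearest-neighbor distance in each direction, summed."""
    def one_way(src, dst):
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)
    return one_way(a, b) + one_way(b, a)

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
shifted = [(x + 0.1, y) for x, y in square]
print(round(chamfer_distance(square, shifted), 3))  # -> 0.2
```

Lower is better: identical outlines score 0, and small rigid shifts contribute proportionally small penalties.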
Figure 6. Text-referenced qualitative comparisons against general LLMs.
Figure 7. Image-referenced qualitative comparisons against vector-font baselines.

Citation

@article{VecGlypher,
  title     = {VecGlypher: Unified Vector Glyph Generation with Language Models},
  author    = {Huang, Xiaoke and Gauri, Bhavul and Ng, Kam Woh and Ng, Tony and Xu, Mengmeng and Liu, Zhiheng and Ren, Weiming and An, Zhaochong and Zhou, Zijian and Qiu, Haonan and Zhou, Yuyin and He, Sen and Wang, Ziheng and Xiang, Tao and Han, Xiao},
  journal   = {arXiv preprint},
  year      = {2026}
}