awei669/DiffInk

DiffInk: Glyph- and Style-Aware Latent Diffusion Transformer for Text to Online Handwriting Generation

Updates

  • [2026/6/19] DiffMath, which focuses on structured content generation, is now on arXiv.
  • [2026/3/21] Code and pretrained weights are released.
  • [2026/1/29] DiffInk was accepted to ICLR 2026 🎉🎉🎉.
  • [2025/10/1] The DiffInk paper is available on arXiv.

Overview of TOHG

Text-to-Online Handwriting Generation (TOHG) is the task of synthesizing realistic pen trajectories $G_i$ conditioned on textual content $T$ and a style reference $S_i$.
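
To make the task definition concrete, here is a minimal sketch of how the three quantities could be represented in Python. The (x, y, pen_up) point layout, the TOHGSample class, and the num_strokes helper are illustrative assumptions, not the data format used in this repository.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed point layout: (x, y, pen_up), where pen_up == 1 marks the last
# point of a stroke. This is a common convention for online handwriting,
# not necessarily the format DiffInk uses.
Point = Tuple[float, float, int]


@dataclass
class TOHGSample:
    """Hypothetical container for one TOHG example."""
    text: str               # textual content T
    style_ref: List[Point]  # style reference S_i (a trajectory by the target writer)
    target: List[Point]     # pen trajectory G_i to be generated


def num_strokes(trajectory: List[Point]) -> int:
    """Count strokes by counting pen-up markers."""
    return sum(1 for _, _, pen_up in trajectory if pen_up == 1)


sample = TOHGSample(
    text="hi",
    style_ref=[(0.0, 0.0, 0), (1.0, 2.0, 1)],
    target=[(0.0, 0.0, 0), (0.0, 2.0, 1), (1.0, 0.0, 0), (1.0, 1.0, 1)],
)
print(num_strokes(sample.target))  # prints 2
```

In this framing, a TOHG model receives text and style_ref and is expected to produce a trajectory like target.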

DiffInk vs. Character–Layout Decoupled Approaches

(a) A two-stage pipeline that combines handwritten font generation with layout post-processing; (b) DiffInk (ours), which takes text and a style reference and directly outputs complete text lines. Unlike the two-stage pipeline, which mechanically stitches character bounding boxes together, DiffInk generates more natural connections between characters.

Usage

Install

  • Create environment: conda create -n diffink python=3.8 -y
  • Install dependencies: conda activate diffink && pip install -r requirements.txt

Data and Pretrained Weights

Download the dataset and pretrained weights from Google Drive or Baidu Cloud.

Training

  • Train the InkVAE model with: bash scripts/train_vae_ddp.sh

  • Then, train the InkDiT model with: bash scripts/train_dit_ddp.sh

  • Finally, fine-tune the model on real data with: bash scripts/tune_dit_ddp.sh
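
The three stages above are listed as a sequence, so a small wrapper can run them in order and stop at the first failure. The following is a minimal sketch under that assumption: the script paths come from this README, while TRAINING_STAGES, run_stages, shell_runner, and dry_runner are hypothetical names introduced for illustration and are not part of the repository.

```python
import subprocess
from typing import Callable, List, Sequence

# Script paths as listed in the Training section, in the stated order.
TRAINING_STAGES: List[List[str]] = [
    ["bash", "scripts/train_vae_ddp.sh"],  # 1. InkVAE
    ["bash", "scripts/train_dit_ddp.sh"],  # 2. InkDiT
    ["bash", "scripts/tune_dit_ddp.sh"],   # 3. fine-tuning on real data
]


def run_stages(
    stages: Sequence[Sequence[str]],
    runner: Callable[[Sequence[str]], int],
) -> int:
    """Run stages in order; return how many completed before the first failure."""
    completed = 0
    for cmd in stages:
        if runner(cmd) != 0:
            break
        completed += 1
    return completed


def shell_runner(cmd: Sequence[str]) -> int:
    """Execute a command in the current directory and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def dry_runner(cmd: Sequence[str]) -> int:
    """Print the command instead of running it; always reports success."""
    print("would run:", " ".join(cmd))
    return 0


# Dry run only: nothing is launched here.
run_stages(TRAINING_STAGES, dry_runner)
```

Swapping dry_runner for shell_runner, and calling the function from the repository root, would launch the actual scripts.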

Inference

Run inference with: CUDA_VISIBLE_DEVICES=0 python val_dit.py

Acknowledgements

Citation

If you find this work useful or use this code in your research, please consider citing the following paper.

@inproceedings{pan2026diffink,
  title={DiffInk: Glyph- and Style-Aware Latent Diffusion Transformer for Text to Online Handwriting Generation},
  author={Wei Pan and Huiguo He and Hiuyi Cheng and Yilin Shi and Lianwen Jin},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026},
  url={https://openreview.net/forum?id=XKOEQFKFdL}
}

License

Our code is released under the MIT License. The pretrained weights, which were trained on part of the CASIA-OLHWDB dataset, follow the original license of that dataset.

About

[ICLR 2026] DiffInk: Glyph- and Style-Aware Latent Diffusion Transformer for Text to Online Handwriting Generation
