How was the `tokenizer.json` created?

#3 · opened by VaishalBusiness

Hi Xenova team,

I'm trying to understand how you generated the tokenizer.json file used in your models. Was it directly exported from a SentencePiece model, converted via Hugging Face’s transformers tools, or created through a custom process?

Specifically, I'm interested in reproducing the same structure for a custom SentencePiece model so it works with your ONNX/transformers.js pipelines.

Could you please share how you built or converted it — and which tools or scripts were used?

Thanks in advance!

VaishalBusiness changed discussion status to closed
VaishalBusiness changed discussion status to open

Hi Xenova team,

I hope you’re doing well. I sent the message above on October 22 regarding how the tokenizer.json file was generated for your models, but I haven’t heard back yet.

I’m still very interested in understanding whether it was exported directly from a SentencePiece model, converted via Hugging Face tools, or created through a custom process — and any guidance for reproducing the same structure for a custom SentencePiece model to work with your ONNX/transformers.js pipelines.

I’d greatly appreciate any insight or pointers whenever you have a chance.

Thank you very much!

Hi Xenova,
I have been working on this for quite a long time; it would be great if you could answer my question. Thank you.

Hi there. Sorry for the delayed response. I wrote a very simple conversion script a couple of years ago for it: https://github.com/huggingface/transformers.js/blob/b125e82b86f62cf9f0f77217601e04a5ff7c6e7f/scripts/extra/marian.py

I haven't tested or used it in a long time, so hopefully it still works 😅
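For anyone trying to reproduce the structure by hand: a `tokenizer.json` file is just the Hugging Face `tokenizers` library's JSON serialization of a fast tokenizer, and for a SentencePiece model the `model` section is typically a Unigram model holding `(piece, score)` pairs. The sketch below is *not* the `marian.py` script linked above; it is a minimal illustration with hypothetical vocab entries (in a real conversion you would read the pieces and scores out of the `.spm` model's protobuf), and the exact field names in the `normalizer`/`pre_tokenizer` sections vary between `tokenizers` versions, so treat it as a starting point rather than a spec.

```python
import json

# Hedged sketch of a minimal tokenizer.json for a SentencePiece Unigram
# model, in (approximately) the Hugging Face `tokenizers` serialization
# format. The vocab entries here are hypothetical placeholders; a real
# converter would extract them from the SentencePiece .spm file.
vocab = [
    ["<unk>", 0.0],     # unk_id below points at this entry
    ["▁Hello", -2.5],   # "▁" (U+2581) marks a word boundary in SentencePiece
    ["▁world", -3.1],
]

tokenizer_json = {
    "version": "1.0",
    "added_tokens": [],
    # Field names in these sections differ across tokenizers versions.
    "pre_tokenizer": {"type": "Metaspace", "replacement": "▁"},
    "model": {"type": "Unigram", "unk_id": 0, "vocab": vocab},
}

with open("tokenizer.json", "w", encoding="utf-8") as f:
    json.dump(tokenizer_json, f, ensure_ascii=False, indent=2)

# Round-trip check: the file parses and the model section is intact.
with open("tokenizer.json", encoding="utf-8") as f:
    loaded = json.load(f)
print(loaded["model"]["type"])  # Unigram
```

The quickest way to see what the format should actually look like for your model family is to open an existing `tokenizer.json` from a similar model on the Hub and mirror its sections.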

Hi Xenova,

Thank you so much for sharing the script — I tried it out and it worked perfectly for generating the tokenizer.json.

Best regards,
Vaishal

VaishalBusiness changed discussion status to closed
