fix: update `generation_config.json` to default to stochastic sampling (temp 0.15)
Hello,
This is a mirror PR for Devstral-Small, based on: https://huggingface.co/mistralai/Devstral-2-123B-Instruct-2512/discussions/18
It adds the required hyperparameter arguments to `generation_config.json` to enable stochastic sampling (temperature 0.15) rather than greedy decoding.
That way, when users load the mistralai/Devstral-Small-2-24B-Instruct-2512 model, they automatically get the default sampling settings intended by Mistral.
Motivation: not all users know about these sampling hyperparameters and what they do; defaulting to what Mistral recommends could reduce complaints about potentially poor generations/model performance.
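For reference, the relevant fields end up looking something like this in `generation_config.json` (a sketch; `do_sample` and `temperature` are standard generation-config fields):

```json
{
  "do_sample": true,
  "temperature": 0.15
}
```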
I opened this as a separate PR in case you want to keep greedy decoding by default; it originally stems from: https://huggingface.co/mistralai/Devstral-2-123B-Instruct-2512/discussions/9
(It's true that a low temperature is close to greedy, but it is still stochastic nonetheless.)
Thanks!
In my tests, this change makes no difference in model output.
The model still randomly decides to ignore my tasks (even with planning).
Hey,
Just to check whether my theory is right, could you run the official transformers snippet from the model card (included below) with a fixed seed (e.g. `torch.manual_seed(123)`) and a higher temperature (I changed it to `temperature=1.4`)?
If you then see even the slightest difference in tokens (vs. greedy/temp=0.15), I believe the low precision combined with the low temperature makes the model behave too close to greedy, which could explain why you see no difference in trajectories, e.g. a 100% vs. 99% chance of choosing the argmax token.
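To illustrate the intuition with made-up logits (a toy sketch, not from the model):

```python
import torch

# Toy next-token logits where the top candidate leads by ~2 nats.
logits = torch.tensor([10.0, 8.0, 7.5, 5.0])

for temp in (0.15, 1.0, 1.4):
    probs = torch.softmax(logits / temp, dim=-1)
    print(f"T={temp}: argmax prob = {probs.max().item():.4f}")

# T=0.15 pushes the argmax prob to ~1.0 (effectively greedy);
# T=1.4 leaves a noticeably flatter distribution.
```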
We could even retrieve and print the actual probabilities from the model to confirm that's the culprit.
Otherwise that would be weird and would imply something is wrong with the hyperparameters and temperature. The config pins transformers 5.0 (not stable yet), but I haven't seen any breaking changes listed in the repo concerning this.
Hope that helps.
import torch
from transformers import (
Mistral3ForConditionalGeneration,
MistralCommonBackend,
)
model_id = "mistralai/Devstral-Small-2-24B-Instruct-2512"
tokenizer = MistralCommonBackend.from_pretrained(model_id)
model = Mistral3ForConditionalGeneration.from_pretrained(model_id, device_map="auto")
SP = """You are operating as and within Mistral Vibe, a CLI coding-agent built by Mistral AI and powered by default by the Devstral family of models. It wraps Mistral's Devstral models to enable natural language interaction with a local codebase. Use the available tools when helpful.
You can:
- Receive user prompts, project context, and files.
- Send responses and emit function calls (e.g., shell commands, code edits).
- Apply patches, run commands, based on user approvals.
Answer the user's request using the relevant tool(s), if they are available. Check that all the required parameters for each tool call are provided or can reasonably be inferred from context. IF there are no relevant tools or there are missing values for required parameters, ask the user to supply these values; otherwise proceed with the tool calls. If the user provides a specific value for a parameter (for example provided in quotes), make sure to use that value EXACTLY. DO NOT make up values for or ask about optional parameters. Carefully analyze descriptive terms in the request as they may indicate required parameter values that should be included even if not explicitly quoted.
Always try your hardest to use the tools to answer the user's request. If you can't use the tools, explain why and ask the user for more information.
Act as an agentic assistant, if a user asks for a long task, break it down and do it step by step.
When you want to commit changes, you will always use the 'git commit' bash command. It will always be suffixed with a line telling it was generated by Mistral Vibe with the appropriate co-authoring information. The format you will always use is the following heredoc.
```bash
git commit -m "<Commit message here>
Generated by Mistral Vibe.
Co-Authored-By: Mistral Vibe <[email protected]>"
```"""
input = {
"messages": [
{
"role": "system",
"content": SP,
},
{
"role": "user",
"content": [
{
"type": "text",
"text": "Can you implement in Python a method to compute the fibonnaci sequence at the `n`th element with `n` a parameter passed to the function ? You should start the sequence from 1, previous values are invalid.\nThen run the Python code for the function for n=5 and give the answer.",
}
],
},
],
"tools": [
{
"type": "function",
"function": {
"name": "add_number",
"description": "Add two numbers.",
"parameters": {
"type": "object",
"properties": {
"a": {"type": "string", "description": "The first number."},
"b": {"type": "string", "description": "The second number."},
},
"required": ["a", "b"],
},
},
},
{
"type": "function",
"function": {
"name": "multiply_number",
"description": "Multiply two numbers.",
"parameters": {
"type": "object",
"properties": {
"a": {"type": "string", "description": "The first number."},
"b": {"type": "string", "description": "The second number."},
},
"required": ["a", "b"],
},
},
},
{
"type": "function",
"function": {
"name": "substract_number",
"description": "Substract two numbers.",
"parameters": {
"type": "object",
"properties": {
"a": {"type": "string", "description": "The first number."},
"b": {"type": "string", "description": "The second number."},
},
"required": ["a", "b"],
},
},
},
{
"type": "function",
"function": {
"name": "write_a_story",
"description": "Write a story about science fiction and people with badass laser sabers.",
"parameters": {},
},
},
{
"type": "function",
"function": {
"name": "terminal",
"description": "Perform operations from the terminal.",
"parameters": {
"type": "object",
"properties": {
"command": {
"type": "string",
"description": "The command you wish to launch, e.g `ls`, `rm`, ...",
},
"args": {
"type": "string",
"description": "The arguments to pass to the command.",
},
},
"required": ["command"],
},
},
},
{
"type": "function",
"function": {
"name": "python",
"description": "Call a Python interpreter with some Python code that will be ran.",
"parameters": {
"type": "object",
"properties": {
"code": {
"type": "string",
"description": "The Python code to run",
},
"result_variable": {
"type": "string",
"description": "Variable containing the result you'd like to retrieve from the execution.",
},
},
"required": ["code", "result_variable"],
},
},
},
],
}
tokenized = tokenizer.apply_chat_template(
conversation=input["messages"],
tools=input["tools"],
return_tensors="pt",
return_dict=True,
)
input_ids = tokenized["input_ids"].to(device="cuda")
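# Fixed seed so runs are comparable (added per the suggestion above).
torch.manual_seed(123)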
output = model.generate(
input_ids,
max_new_tokens=200,
do_sample=True,
temperature=1.4,
)[0]
decoded_output = tokenizer.decode(output[len(tokenized["input_ids"][0]) :])
print(decoded_output)
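And to retrieve the probabilities mentioned above, something along these lines should work, reusing `model`, `tokenizer`, and `input_ids` from the snippet (`output_scores` and `return_dict_in_generate` are standard `generate` arguments):

```python
gen = model.generate(
    input_ids,
    max_new_tokens=20,
    do_sample=True,
    temperature=0.15,
    return_dict_in_generate=True,
    output_scores=True,
)

# Each entry of gen.scores is a (batch, vocab) tensor of logits for one
# generated step, with temperature already applied by the logits processors.
for step, step_scores in enumerate(gen.scores):
    probs = torch.softmax(step_scores[0], dim=-1)
    top_p = probs.max().item()
    print(f"step {step}: top-token prob = {top_p:.6f}")
```

If the top-token probability sits at ~1.0 for every step at temp 0.15, that would confirm sampling is effectively greedy here.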
I'm not using the model like that at all. I'm running it with vLLM (256k context, 0.15 temp) and using that via Open-WebUI, Cline, and Kilo.
Another thing I notice is that even with an AGENTS.md file, the model will not follow the instructions defined there.
I also had cases where it told me "Task is done" while the previous step showed tests still failing.
Once it even deleted all the failing tests, just so it could tell me the test suite was now passing.
I could try "mistral-vibe", but there's no IDE extension for that.
I see; my PR only concerns usage with transformers. In any case, it wouldn't really have helped given your problem description.