You've probably heard of Stable Diffusion for image creation. Now imagine the same method applied to text or code generation.
Instead of generating token by token (autoregressively), the model produces the entire output at once, then "refines" the answer over several passes. The result:
- Generation up to 10x faster (over 1,000 tokens/sec)
- Lower inference costs
- More global corrections (the model sees and reviews the whole answer)
Why is this a major leap?
- AI agents, which link tasks and reasoning, can work significantly faster and avoid long waits for generation
- Developers gain in productivity thanks to virtually instantaneous code generation, with easy corrections
- Companies could deploy these models on a larger scale, without blowing their budget
The secret?
The approach is inspired by the diffusion technique used for image generation: start from a rough textual "draft" and gradually refine it until a coherent answer emerges. The model is no longer constrained to a strictly linear, left-to-right progression.
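To make the idea concrete, here is a minimal toy sketch of that refinement loop. It is purely illustrative: a real diffusion language model would use a trained neural network to predict tokens given the entire current sequence, whereas this sketch fills masked positions with random words just to show the control flow (start fully masked, refine in a few parallel passes). All names (`VOCAB`, `MASK`, `toy_denoise_step`, `generate`) are invented for this example.

```python
import random

random.seed(0)

VOCAB = ["the", "quick", "brown", "fox", "jumps"]
MASK = "<mask>"

def toy_denoise_step(tokens, fraction):
    """Fill in a fraction of the remaining masked positions.

    In a real diffusion LM, a model would predict each token while
    looking at the WHOLE current sequence; here we draw random words
    purely to illustrate the iterative-refinement control flow.
    """
    masked = [i for i, t in enumerate(tokens) if t == MASK]
    if not masked:
        return tokens
    n_fill = max(1, int(len(masked) * fraction))
    for i in random.sample(masked, min(n_fill, len(masked))):
        tokens[i] = random.choice(VOCAB)
    return tokens

def generate(length=8, passes=4):
    # Start from an all-masked "draft" and refine it over a few
    # passes, instead of emitting tokens one by one left to right.
    tokens = [MASK] * length
    for _ in range(passes):
        tokens = toy_denoise_step(tokens, fraction=0.5)
    # Final cleanup: fill anything still masked.
    return [t if t != MASK else random.choice(VOCAB) for t in tokens]

print(generate())
```

Note that each pass can revisit the whole sequence at once, which is what enables both the parallel speedup and the "global corrections" mentioned above.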
This could open up as-yet unexplored use cases.
The challenge
The key is to ensure ethical alignment and impeccable output quality, especially at such high speed. Speed must not come at the expense of reliability.
A philosophical note
We humans often write word after word, much like autoregressive GPT-style models. Recent approaches, where AI takes the time to "think" before revealing a more complete answer, recall the way we write a draft and then correct it. But diffusion goes further: it treats the whole text (or code) as a single object and refines it layer by layer. It's a fascinating mechanism, perhaps opening the way to other ways of structuring thought... and "thinking". Worth exploring!
Discover our AI expertise and contact our experts to discuss your specific projects and needs.
By Jérémy BRON, AI Director, Silamir Group