Google Releases Open-Source DiffusionGemma Text Model

Simon Willison · June 11, 04:00 UTC · By Simon Willison

Key Information

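Google describes DiffusionGemma as a 26-billion-parameter Mixture of Experts model that activates only 3.8 billion parameters at inference time. The company says it runs up to 4x faster than a typical autoregressive LLM and, once quantized, reaches 1000+ tokens/second on an NVIDIA H100 and 700+ tokens/second on a GeForce RTX 5090.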

Summary

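Google has introduced DiffusionGemma, an experimental open model that explores "text diffusion" as a way of generating text. Unlike a traditional autoregressive LLM, which decodes one token at a time, it can generate whole blocks of text in parallel, which reduces latency. The model is released under the Apache 2.0 license as part of the Gemma family, which Google positions as open models built on the same research as Gemini. Google says DiffusionGemma is better suited to local workflows that need speed and interactivity than to serving as a drop-in replacement for the autoregressive Gemma 4 models in ordinary production settings. The company highlights a new diffusion head as the source of the faster generation.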

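According to the official description, the 26B MoE architecture activates only 3.8B parameters at inference time, and the quantized model can run on a high-end consumer GPU in roughly 18GB of VRAM. Google also claims a speedup of up to 4x on GPUs, with 1000+ tokens/second on an NVIDIA H100 and 700+ tokens/second on a GeForce RTX 5090. Simon Willison tested it through the NIM API that NVIDIA hosts for free: generating 2,409 tokens took 4.4 seconds, which works out to at least 500 tokens/second. Overall, the release brings the earlier Gemini Diffusion research back into public view, this time as an open model and a directly callable API.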
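The parameter and memory figures above can be related with some quick arithmetic. The sketch below is only a rough illustration: it assumes "18GB" means 18 × 10^9 bytes and that every expert's weights stay resident in VRAM, neither of which the source states. The variable names are made up for this example.

```python
# Figures quoted in the summary above.
TOTAL_PARAMS = 26e9    # total parameters in the MoE model
ACTIVE_PARAMS = 3.8e9  # parameters activated at inference time
VRAM_BYTES = 18e9      # ASSUMPTION: "18GB" read as 18 * 10**9 bytes

# Share of the model that does work at inference time.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# Upper bound on average bits per weight, ASSUMING all 26B weights are
# resident in VRAM and ignoring activations and other runtime buffers.
max_bits_per_param = VRAM_BYTES * 8 / TOTAL_PARAMS

print(f"active fraction: {active_fraction:.1%}")
print(f"at most {max_bits_per_param:.2f} bits per parameter")
```

Under those assumptions, about one parameter in seven is active at inference time, and the memory budget leaves room for fewer than 6 bits per weight on average, which fits the statement that the 18GB figure applies to a quantized model.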

Full Text

DiffusionGemma

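Last May, Google briefly released an experimental Gemini Diffusion model. I tried the preview at the time and recorded it running at 857 tokens/second. It was an exciting model, but Google made no further announcements about it afterwards.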

That research has now returned in the best possible way: as a new open-weights (Apache 2 licensed) Gemma model, <a href="https://huggingface.co/google/diffusiongemma-26B-A4B-it">google/diffusiongemma-26B-A4B-it</a>.

NVIDIA is currently hosting the model for free on its NIM cloud API. I used that API to <a href="https://tools.simonwillison.net/markdown-svg-renderer#url=https%3A%2F%2Fgist.github.com%2Fsimonw%2Fe5e234a6dc6eef61e209ce1629620042">generate this pelican</a>; according to <code>time uv run generate.py</code>, it took 4.4 seconds to return 2,409 tokens, so the speed was at least 500 tokens/second.
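The throughput estimate is a single division, sketched below with the two numbers from the text. The function name is made up for this example. The result is a lower bound because `time` measures the whole process, so the 4.4 seconds also includes script startup and network round trips, not just generation.

```python
def tokens_per_second(tokens: int, wall_seconds: float) -> float:
    """Return tokens divided by total wall-clock time.

    This is a lower bound on generation speed when wall_seconds
    includes overhead such as process startup and network latency.
    """
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return tokens / wall_seconds


# Numbers reported in the text: 2,409 tokens returned in 4.4 seconds.
rate = tokens_per_second(2409, 4.4)
print(f"{rate:.1f} tokens/second")
```

With these inputs the rate comes out a little under 550 tokens/second, consistent with the "at least 500" figure.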

Via <a href="https://news.ycombinator.com/item?id=48478471">Hacker News</a>.

Tags: <a href="https://simonwillison.net/tags/google">google</a>, <a href="https://simonwillison.net/tags/ai">ai</a>, <a href="https://simonwillison.net/tags/generative-ai">generative-ai</a>, <a href="https://simonwillison.net/tags/llms">llms</a>, <a href="https://simonwillison.net/tags/nvidia">nvidia</a>, <a href="https://simonwillison.net/tags/pelican-riding-a-bicycle">pelican-riding-a-bicycle</a>, <a href="https://simonwillison.net/tags/gemma">gemma</a>, <a href="https://simonwillison.net/tags/llm-release">llm-release</a>, <a href="https://simonwillison.net/tags/llm-performance">llm-performance</a>

Sources and References

Indexed on 2026-06-11