Alibaba Drops Qwen3-Coder-Next: An 80B MoE Model That Runs Coding Agents on a Single GPU

Alibaba's Tongyi Lab released Qwen3-Coder-Next, an open-weights mixture-of-experts model purpose-built for autonomous coding agents. Only 3 billion of its 80 billion parameters are active per inference pass, the whole model fits in 46GB of RAM, and its benchmarks suggest closed-model competitors should be nervous.

Alibaba's Tongyi Lab released Qwen3-Coder-Next on Tuesday, an open-weights model explicitly designed not just for code generation but for running autonomous coding agents end-to-end. As @Ali_TongyiLab announced, the model handles complex multi-step tasks, integrates with frameworks like OpenClaw, and is optimized for the kind of sustained tool-use loops that define modern agentic workflows. It is, in effect, Alibaba's bid to become the default brain inside every open-source coding assistant.
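
To make "sustained tool-use loops" concrete, here is a bare-bones sketch of the loop an agent framework runs: the model proposes an action, the host executes it, and the result is fed back until the model stops asking for tools. The message schema and tool names below are hypothetical stand-ins, not OpenClaw's actual interface.

```python
import subprocess

def run_tool(name: str, args: dict) -> str:
    """Execute a whitelisted tool and return its output as text."""
    if name == "run_shell":
        proc = subprocess.run(
            args["cmd"], shell=True, capture_output=True, text=True, timeout=60
        )
        return proc.stdout + proc.stderr
    if name == "read_file":
        with open(args["path"]) as f:
            return f.read()
    return f"unknown tool: {name}"

def agent_loop(model_call, task: str, max_steps: int = 20) -> str:
    """Drive the model until it stops requesting tools or the budget runs out.

    model_call(messages) -> assistant message dict, e.g. a wrapper around a
    local Qwen3-Coder-Next endpoint (hypothetical schema).
    """
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = model_call(messages)          # ask the model for its next action
        messages.append(reply)
        if "tool_call" not in reply:          # no tool requested: model is done
            return reply["content"]
        call = reply["tool_call"]
        result = run_tool(call["name"], call["args"])
        messages.append({"role": "tool", "content": result})  # feed result back
    return "step budget exhausted"
```

Every multi-step task resolves through dozens or hundreds of round-trips through a loop like this, which is why a model tuned for it behaves very differently from a plain code generator.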

The architecture is the story. Qwen3-Coder-Next is an 80-billion-parameter mixture-of-experts model, but only 3 billion parameters are active during any given inference pass. As @UnslothAI detailed, a quantized build of the model can run on a machine with just 46GB of RAM, within reach of a single high-end workstation GPU or a modest cloud instance. For developers who have been renting expensive API access to GPT-5-class models for agentic coding, the economics shift dramatically: you can now run a competitive coding agent locally, with no data leaving your machine.
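
The 46GB figure is easy to sanity-check: at 4 bits per parameter, 80 billion weights come to roughly 40GB before runtime overhead. Below is a minimal local-inference sketch under those assumptions; the Hugging Face repo ID is illustrative, and 4-bit bitsandbytes loading is one way to reach that footprint, not necessarily the exact quantization behind @UnslothAI's figure.

```python
# Why 46GB is plausible: 80e9 params * 4 bits / 8 ≈ 40 GB of weights, plus
# activations and KV cache. Only ~3B parameters fire per token, so per-token
# compute stays small even though every expert must stay resident in memory.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "Qwen/Qwen3-Coder-Next"  # illustrative repo ID; check the actual release

tok = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    device_map="auto",  # shard weights across GPU VRAM and system RAM as needed
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,                      # quantize weights to 4-bit
        bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for quality
    ),
)

prompt = "Write a Python function that reverses the words in a sentence."
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256)
print(tok.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

The key design point is that MoE trades memory for compute: the full expert set must sit in RAM, but each token only pays for the 3B parameters its router selects, which is what makes single-machine agentic inference tractable.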
