X-trader NEWS
Open your market's potential
The strongest open-source model! Alibaba releases and open-sources Qwen3, seamlessly integrating two thinking modes, supporting 119 languages, and making agent calls easy
Source: Wall Street Insights
Alibaba released and open-sourced the Tongyi Qianwen 3.0 (Qwen3) series of models on Monday, claiming it rivals DeepSeek in areas such as mathematics and programming. Compared with other mainstream models, Qwen3 also significantly reduces deployment costs. Alibaba stated that Qwen3 seamlessly integrates two thinking modes, supports 119 languages, and is easy to call from agents.
Performance Comparable to DeepSeek R1 and OpenAI o1, All Open-Sourced
The Qwen3 series comprises two Mixture-of-Experts (MoE) models and six Dense models. Alibaba said that the newly released flagship model, Qwen3-235B-A22B, delivers highly competitive results against top models such as DeepSeek-R1, o1, o3-mini, Grok-3, and Gemini-2.5-Pro in benchmarks covering coding, mathematics, and general capabilities.
In addition, Qwen3-30B-A3B, one of the Mixture-of-Experts (MoE) models, activates only about 10% as many parameters as QwQ-32B yet performs even better, and even a small model like Qwen3-4B can rival the performance of Qwen2.5-72B-Instruct. An MoE architecture mimics the way humans divide up work: each input is routed to a small set of specialized expert sub-networks, much like assigning different parts of a task to a group of experts with different specialties, which improves overall efficiency.
Meanwhile, Alibaba also open-sourced the weights of two MoE models: Qwen3-235B-A22B with over 235 billion total parameters and more than 22 billion activated parameters, and the small MoE model Qwen3-30B-A3B with about 30 billion total parameters and 3 billion activated parameters. In addition, six Dense models have also been open-sourced, including Qwen3-32B, Qwen3-14B, Qwen3-8B, Qwen3-4B, Qwen3-1.7B, and Qwen3-0.6B, all open-sourced under the Apache 2.0 license.
"Hybrid" Model, Two Thinking Modes, and Greatly Reduced Deployment Cost
Alibaba stated that one of the major innovations of the Qwen3 series is its "hybrid" design, which integrates two thinking modes: Qwen3 can either spend time "reasoning" through complex problems (thinking mode) or answer simple requests quickly (non-thinking mode). The reasoning ability of the thinking mode lets the model effectively fact-check itself, similar to OpenAI's o3 model, but at the cost of higher latency.
The Qwen team wrote in a blog post:
This flexibility allows users to control the degree to which the model "thinks" according to specific tasks. For example, complex problems can be solved by expanding the reasoning steps, while simple problems can be answered directly and quickly without delay.
Crucially, the combination of these two modes greatly enhances the model's ability to achieve stable and efficient control of the "thinking budget." As mentioned above, Qwen3 demonstrates scalable and smooth performance improvement, which is directly related to the allocated computational reasoning budget.
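In practice, this switch is exposed when building the prompt. Below is a minimal sketch of how the mode toggle is typically driven through the Hugging Face transformers chat template; the model name, prompt, and the enable_thinking flag follow the usage pattern published alongside Qwen3, but treat the details as indicative rather than authoritative.

```python
# Minimal sketch: switching Qwen3 between thinking and non-thinking mode
# via the Hugging Face chat template (model name and prompt are illustrative).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-30B-A3B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "How many prime numbers are there below 100?"}]

# enable_thinking=True lets the model emit an explicit reasoning block before
# its final answer; set it to False for fast, direct replies to simple requests.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)

inputs = tokenizer([text], return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```

This is how the "thinking budget" described in the blog post is controlled in practice: allow more reasoning for hard problems, and none for trivial ones.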
Such a design lets users set their own "thinking cost" and makes it easier to configure a specific budget for each task, striking a better balance between cost-effectiveness and reasoning quality. Compared with other large models of comparable performance, Qwen3 also significantly lowers the deployment threshold. According to the cost comparison:
The full-sized 671B-parameter DeepSeek-R1 requires eight H20 GPUs (approximately 1 million yuan) to run, with a recommended configuration of sixteen H20 GPUs (approximately 2 million yuan).
The Qwen3 flagship model requires only three H20 GPUs (approximately 360,000 yuan) to run, with a recommended configuration of four H20 GPUs (approximately 500,000 yuan).
In other words, the Qwen3 flagship's deployment cost is roughly 25% to 35% of that of the full-sized R1, a reduction of about 65% to 75%.
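As a quick sanity check, those ratios follow directly from the figures quoted above. The sketch below uses only the numbers in this article; the hardware prices themselves are the article's estimates.

```python
# Deployment-cost ratios implied by the figures quoted above (prices in yuan).
r1_min, r1_rec = 1_000_000, 2_000_000      # DeepSeek-R1: 8x H20 minimum, 16x H20 recommended
qwen3_min, qwen3_rec = 360_000, 500_000    # Qwen3 flagship: 3x H20 minimum, 4x H20 recommended

print(f"Minimum config:     {qwen3_min / r1_min:.0%} of R1's cost")   # ~36%
print(f"Recommended config: {qwen3_rec / r1_rec:.0%} of R1's cost")   # 25%
```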
The Training Data Volume is Twice That of Qwen2.5, Convenient for Agent Calls
Alibaba stated that the Qwen3 series supports 119 languages and was trained on roughly 36 trillion tokens, twice the amount of data used for Qwen2.5. A token is the basic unit of data a model processes; about 1 million tokens correspond to roughly 750,000 English words. Alibaba says Qwen3's training data includes content such as textbooks, question-answer pairs, and code snippets.
According to the introduction, Qwen3's pre-training was carried out in three stages. In the first stage (S1), the model was pre-trained on over 30 trillion tokens with a context length of 4K tokens, giving it basic language skills and general knowledge.
In the second stage (S2), the dataset was improved by raising the proportion of knowledge-intensive data (such as STEM, programming, and reasoning tasks), and the model was then pre-trained on roughly 5 trillion additional tokens. In the final stage, high-quality long-context data was used to extend the context length to 32K tokens, ensuring the model can handle longer inputs effectively.
Alibaba said that thanks to improvements in model architecture, a larger training corpus, and more effective training methods, the Qwen3 Dense base models overall match Qwen2.5 base models with more parameters: Qwen3-1.7B/4B/8B/14B/32B-Base performs on par with Qwen2.5-3B/7B/14B/32B/72B-Base respectively. In STEM, coding, and reasoning in particular, the Qwen3 Dense base models even exceed larger Qwen2.5 models. The Qwen3 MoE base models, meanwhile, achieve performance similar to the Qwen2.5 Dense base models while activating only about 10% of the parameters, which substantially cuts training and inference costs.
In the post-training stage, Alibaba fine-tuned the model on diverse long chain-of-thought data covering tasks and domains such as mathematics, code, logical reasoning, and STEM problems, equipping it with basic reasoning capabilities. Large-scale reinforcement learning with rule-based rewards was then used to strengthen the model's exploration and deep-research abilities.
Alibaba stated that Qwen3 excels at tool calling, instruction following, and reproducing specific data formats, and recommends using Qwen-Agent to get the most out of Qwen3's agent capabilities. Qwen-Agent internally encapsulates the tool-calling template and the tool-calling parser, greatly reducing code complexity.
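For illustration, here is a minimal sketch of that Qwen-Agent pattern; the model name, local server endpoint, and tool list are placeholders, and the exact interface should be checked against the Qwen-Agent repository.

```python
# Minimal sketch: driving a locally served Qwen3 model through Qwen-Agent,
# which encapsulates the tool-calling template and parser.
# Model name, endpoint, and tool list below are placeholders.
from qwen_agent.agents import Assistant

llm_cfg = {
    "model": "Qwen3-30B-A3B",
    "model_server": "http://localhost:8000/v1",  # any OpenAI-compatible endpoint serving the model
    "api_key": "EMPTY",
}

# code_interpreter is one of Qwen-Agent's built-in tools; custom tools can be added.
bot = Assistant(llm=llm_cfg, function_list=["code_interpreter"])

messages = [{"role": "user", "content": "Use Python to compute the 40th Fibonacci number."}]
responses = []
for responses in bot.run(messages=messages):  # run() streams incremental response lists
    pass
print(responses[-1])
```

The point, per the article, is that the tool-call plumbing lives inside Qwen-Agent, so application code stays short.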
In addition to providing a downloadable version, Qwen3 can also be used through cloud service providers such as Fireworks AI and Hyperbolic.
The Goal Remains AGI
OpenAI, Google, and Anthropic have also launched several new models recently. OpenAI recently said it, too, plans to release a more "open" model that imitates the way humans reason in the next few months, marking a shift in its strategy; DeepSeek and Alibaba had already taken the lead in launching open-source AI systems.
Currently, Alibaba is building its AI landscape around Qwen. In February this year, CEO Wu Yongming said the company's current "top priority" is to achieve Artificial General Intelligence (AGI), that is, to create AI systems with human-level intelligence.
Alibaba said Qwen3 marks an important milestone on the company's journey toward Artificial General Intelligence (AGI) and Artificial Superintelligence (ASI). Looking ahead, Alibaba plans to improve the models along several dimensions, including optimizing model architecture and training methods, with key goals of expanding data scale, increasing model size, extending context length, broadening modality coverage, and advancing reinforcement learning with environmental feedback for long-horizon reasoning.
The Open-Source Community is Excited
The release of Alibaba's Qwen3 has excited the AI community, and some netizens posted classic memes.
Some netizens said:
"In my tests, the 235B model's performance on high-dimensional tensor operations is equivalent to that of Sonnet. This is a really excellent model. Thank you all."
Some netizens praised Qwen3:
If I hadn't seen the tokens generated in real-time on the screen with my own eyes, I wouldn't have believed those benchmark test results at all. It's just like magic.
And supporters of open-source AI are even more excited. Some netizens said:
"With an open-source 32B large model, its performance is on par with Gemini 2.5 Pro."
"We're back with a vengeance!"
Netizens also thanked Alibaba for actively promoting open source.
Disclaimer: The views in this article represent only the personal views of the author and do not constitute investment advice from this platform. This platform makes no guarantee as to the accuracy, completeness, originality, or timeliness of the information in this article, and assumes no liability for any loss caused by the use of or reliance on it.
Contact: Sarah
Phone: +1 6269975768
Email: xttrader777@gmail.com
Address: Lee Garden One, 33 Hysan Avenue, Causeway Bay, Hong Kong.