
Z.AI (formerly Zhipu AI) has launched the GLM 4.5 series, comprising the flagship GLM 4.5 model and the lighter GLM 4.5 Air. Positioned as a significant open-source release in 2025, the models emphasize a balance of performance, efficiency, agent capabilities, and cost.
Model Architecture and Efficiency
- GLM 4.5: A 355-billion-parameter foundation model using a mixture-of-experts (MoE) architecture. During inference, only 32 billion parameters are active per token, a design that aims for high performance while containing cost and hardware demands.
- GLM 4.5 Air: A smaller variant designed for accessibility, with 106 billion total parameters and 12 billion active per token. It targets hardware with 32GB to 64GB of VRAM.
- Core Focus: Both models are engineered specifically for agent tasks, enabling step-by-step reasoning, tool usage, multi-turn planning, API calling, and interface control.
- Operational Modes: Users can toggle between a “thinking mode” for complex reasoning and a “fast response mode” for speed.
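Since Z.AI ships an OpenAI-compatible API (noted under accessibility below), the mode toggle is presumably selected per request. A minimal sketch of what building such a request could look like; the `thinking` field and the model name are illustrative assumptions, not confirmed parameter names:

```python
# Sketch: request payloads for GLM 4.5's two operational modes via an
# OpenAI-style chat-completions API. Field names here are assumptions.

def build_request(prompt: str, thinking: bool) -> dict:
    """Return a chat-completion payload toggling the reasoning mode."""
    return {
        "model": "glm-4.5",
        "messages": [{"role": "user", "content": prompt}],
        # Hypothetical switch: "thinking mode" for step-by-step reasoning,
        # disabled for the faster direct-response mode.
        "thinking": {"type": "enabled" if thinking else "disabled"},
    }

slow = build_request("Prove that sqrt(2) is irrational.", thinking=True)
fast = build_request("What is 2 + 2?", thinking=False)
```

In practice the payload would be sent to the provider's chat-completions endpoint with any OpenAI-compatible client; only the extra mode field differs between the two calls.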
Performance and Speed
- The models employ speculative decoding and multi-token prediction layers.
- GLM 4.5 generates over 100 tokens per second via its high-speed API, with observed peaks near 200 tokens per second.
- It supports a context window of 128,000 tokens for input and 96,000 tokens for output.
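The speed figures above rest partly on speculative decoding: a cheap multi-token prediction (MTP) head drafts several tokens ahead, and the full model verifies them in a single pass, so each expensive forward pass can yield multiple tokens. A toy sketch of the accept/verify loop; the draft and verify functions are stand-ins, not GLM 4.5 internals:

```python
# Toy sketch of speculative decoding with a multi-token prediction head:
# a cheap draft proposes k tokens, the full model checks them in one pass,
# and the agreed prefix is accepted "for free".

def speculative_step(draft_fn, verify_fn, context, k=4):
    """Propose k tokens with draft_fn, keep the prefix verify_fn confirms."""
    proposal = draft_fn(context, k)             # k cheap guesses
    target = verify_fn(context, len(proposal))  # one full-model pass
    accepted = []
    for d, t in zip(proposal, target):
        if d != t:
            accepted.append(t)  # take the model's correction, then stop
            break
        accepted.append(d)
    return accepted

# Stand-in "models": the draft gets the first 3 tokens right, then drifts.
draft = lambda ctx, k: ["the", "cat", "sat", "down"][:k]
verify = lambda ctx, k: ["the", "cat", "sat", "on"][:k]
print(speculative_step(draft, verify, []))  # → ['the', 'cat', 'sat', 'on']
```

Here one full-model pass yields four tokens instead of one, which is the kind of gain behind 100+ tokens-per-second generation.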
Training and Development
- Pre-training utilized 15 trillion tokens of general-purpose data.
- An additional 7-8 trillion tokens were used for fine-tuning focused on code, reasoning, and autonomous agent tasks.
- Z.AI developed “Slime,” a custom reinforcement learning infrastructure. Slime uses a hybrid setup decoupling training and data generation across hardware. It supports synchronous training and asynchronous rollouts, employing FP8 mixed precision to maintain efficiency during long agent tasks.
- Architectural choices include increased model depth (more layers) over width, grouped query attention, partial rotary positional embeddings, 96 attention heads, and a hidden size of 5,120. A multi-token prediction layer aids speculative decoding.
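The efficiency claim behind the MoE design is simple arithmetic: per token, only the routed experts fire, so active parameters are a small fraction of the total. A back-of-envelope sketch, with a hypothetical shared/expert split chosen only to reproduce the published 355B-total / 32B-active figures, not the actual GLM 4.5 internals:

```python
# Back-of-envelope MoE parameter accounting. The split below (shared
# weights, expert size, expert counts) is illustrative, picked to land
# near the published 355B total / 32B active; it is not official.

def moe_params(shared_b, expert_b, n_experts, k_active):
    """Total vs per-token active parameters (in billions) for an MoE."""
    total = shared_b + expert_b * n_experts   # all experts stored
    active = shared_b + expert_b * k_active   # only routed experts fire
    return total, active

total, active = moe_params(shared_b=12.7, expert_b=2.14,
                           n_experts=160, k_active=9)
print(round(total), round(active))  # → 355 32
```

The same arithmetic explains GLM 4.5 Air's profile: a smaller shared core and fewer or smaller experts bring 106B total down to 12B active per token.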
Benchmark Results
- Across 12 major evaluations spanning reasoning, math, coding, and agent behavior, GLM 4.5 ranked third globally, behind OpenAI’s GPT-4 and xAI’s Grok 4, and ahead of models like Claude 4 Opus, DeepSeek-R1, and Gemini 2.5 Flash in most areas.
- Key scores:
- Reasoning: 91% on AIME 24, 98.2% on MATH 500.
- Coding: 64.2% on SWE-bench verified, 37.5% on Terminal Bench.
- Agent Tasks: a 53.9% win rate over Kimi K2 across 52 tasks, and a 90.6% tool-calling success rate, ahead of Claude 4 Sonnet, Kimi K2, and DeepSeek-R1 on the same test.
- Web Research (BrowseComp): 26.4% accuracy.
Pricing and Accessibility
- API pricing is approximately ¥0.8 (about $0.11 USD) per million input tokens and ¥2 (about $0.28 USD) per million output tokens; combined, one million input plus one million output tokens costs roughly ¥2.8 ($0.39 USD).
- This pricing is significantly lower than competitors like Claude 4 (roughly ¥30 per million tokens at 100K context) and OpenAI’s GPT-4.
- The models are released under the MIT license, making them open source and commercially usable.
- Weights are available on Hugging Face and ModelScope.
- Z.AI provides OpenAI-compatible APIs and compatibility with existing agent frameworks like LangChain for easy integration.
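At these rates, budgeting a long agent run is straightforward arithmetic. A small sketch using the listed prices; the token counts in the example are illustrative assumptions:

```python
# Cost estimate from the listed GLM 4.5 API prices:
# ¥0.8 per million input tokens, ¥2 per million output tokens.

CNY_PER_M_INPUT = 0.8
CNY_PER_M_OUTPUT = 2.0

def request_cost_cny(input_tokens: int, output_tokens: int) -> float:
    """Cost in yuan for a given input/output token volume."""
    return (input_tokens * CNY_PER_M_INPUT
            + output_tokens * CNY_PER_M_OUTPUT) / 1e6

# Hypothetical agent session: 2M input tokens (context re-sent each
# turn) and 500K generated tokens.
cost = request_cost_cny(2_000_000, 500_000)
print(f"¥{cost:.2f}")  # → ¥2.60
```

Even a context-heavy multi-turn session stays under ¥3, which is the crux of the cost argument against the closed competitors cited above.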
Demonstrated Capabilities
- Live demos showcased GLM 4.5 acting as a research assistant (performing web searches, analyzing results, compiling answers with sources).
- It built and controlled a Flappy Bird game from scratch (logic and animation).
- It generated a complete HTML-based PowerPoint slide deck based on a single prompt (layout, content, images, text).
- It built a full-stack web application (frontend, backend, database, deployment) through multi-turn conversation, resulting in a functional site.
Context and Strategy
- Z.AI belongs to a group of Chinese AI startups (including Moonshot, Step AI, and Baichuan) referred to locally as the “AI Six Tigers,” known for releasing advanced open models.
- This release positions GLM 4.5 among the largest open-weight models to date (355B parameters).
- Z.AI’s open-source, low-cost strategy contrasts with closed, expensive US models like GPT-4 and Claude 3, aiming for adoption through network effects.
- The company has secured substantial funding: $1.5 billion from investors including Tencent, Alibaba, and Chinese local governments, plus a recent ~$140 million investment from Shanghai’s Pudong venture capital arm.
- Z.AI is preparing for a potential Hong Kong IPO and plans to raise an additional $300 million, focusing on foundation models. GLM 5 and multimodal capabilities are in development.