
Z.AI (formerly Zhipu AI) has launched the GLM 4.5 series, comprising the flagship GLM 4.5 model and the lighter GLM 4.5 Air. Positioned as a significant open-source release in 2025, the models emphasize a balance of performance, efficiency, agent capabilities, and cost.
Model Architecture and Efficiency
- GLM 4.5: A 355-billion-parameter foundation model using a mixture-of-experts (MoE) architecture. During inference, only 32 billion parameters are active per token, a design that aims for high performance while containing cost and hardware demands.
- GLM 4.5 Air: A smaller variant designed for accessibility, with 106 billion total parameters and 12 billion active per token. It targets hardware with 32GB to 64GB of VRAM.
- Core Focus: Both models are engineered specifically for agent tasks, enabling step-by-step reasoning, tool usage, multi-turn planning, API calling, and interface control.
- Operational Modes: Users can toggle between a “thinking mode” for complex reasoning and a “fast response mode” for speed.
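Since Z.AI ships an OpenAI-compatible API (noted under accessibility below), the mode toggle is presumably selected per request. A minimal sketch of what building such a request could look like; the `thinking` field and the model name are illustrative assumptions, not confirmed parameter names:

```python
# Sketch: request payloads for GLM 4.5's two operational modes via an
# OpenAI-style chat-completions API. Field names here are assumptions.

def build_request(prompt: str, thinking: bool) -> dict:
    """Return a chat-completion payload toggling the reasoning mode."""
    return {
        "model": "glm-4.5",
        "messages": [{"role": "user", "content": prompt}],
        # Hypothetical switch: "thinking mode" for step-by-step reasoning,
        # disabled for the faster direct-response mode.
        "thinking": {"type": "enabled" if thinking else "disabled"},
    }

slow = build_request("Prove that sqrt(2) is irrational.", thinking=True)
fast = build_request("What is 2 + 2?", thinking=False)
```

In practice the payload would be sent to the provider's chat-completions endpoint with any OpenAI-compatible client; only the extra mode field differs between the two calls.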
Performance and Speed
- The models employ speculative decoding and multi-token prediction layers.
- GLM 4.5 generates over 100 tokens per second via its high-speed API, with observed peaks near 200 tokens per second.
- It supports a context window of 128,000 tokens for input and 96,000 tokens for output.
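The speed figures above rest partly on speculative decoding: a cheap multi-token prediction (MTP) head drafts several tokens ahead, and the full model verifies them in a single pass, so each expensive forward pass can yield multiple tokens. A toy sketch of the accept/verify loop; the draft and verify functions are stand-ins, not GLM 4.5 internals:

```python
# Toy sketch of speculative decoding with a multi-token prediction head:
# a cheap draft proposes k tokens, the full model checks them in one pass,
# and the agreed prefix is accepted "for free".

def speculative_step(draft_fn, verify_fn, context, k=4):
    """Propose k tokens with draft_fn, keep the prefix verify_fn confirms."""
    proposal = draft_fn(context, k)             # k cheap guesses
    target = verify_fn(context, len(proposal))  # one full-model pass
    accepted = []
    for d, t in zip(proposal, target):
        if d != t:
            accepted.append(t)  # take the model's correction, then stop
            break
        accepted.append(d)
    return accepted

# Stand-in "models": the draft gets the first 3 tokens right, then drifts.
draft = lambda ctx, k: ["the", "cat", "sat", "down"][:k]
verify = lambda ctx, k: ["the", "cat", "sat", "on"][:k]
print(speculative_step(draft, verify, []))  # → ['the', 'cat', 'sat', 'on']
```

Here one full-model pass yields four tokens instead of one, which is the kind of gain behind 100+ tokens-per-second generation.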
Training and Development
- Pre-training utilized 15 trillion tokens of general-purpose data.
- An additional 7-8 trillion tokens were used for fine-tuning focused on code, reasoning, and autonomous agent tasks.
- Z.AI developed “Slime,” a custom reinforcement learning infrastructure. Slime uses a hybrid setup decoupling training and data generation across hardware. It supports synchronous training and asynchronous rollouts, employing FP8 mixed precision to maintain efficiency during long agent tasks.
- Architectural choices include increased model depth (more layers) over width, grouped query attention, partial rotary positional embeddings, 96 attention heads, and a hidden size of 5,120. A multi-token prediction layer aids speculative decoding.
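The efficiency claim behind the MoE design is simple arithmetic: per token, only the routed experts fire, so active parameters are a small fraction of the total. A back-of-envelope sketch, with a hypothetical shared/expert split chosen only to reproduce the published 355B-total / 32B-active figures, not the actual GLM 4.5 internals:

```python
# Back-of-envelope MoE parameter accounting. The split below (shared
# weights, expert size, expert counts) is illustrative, picked to land
# near the published 355B total / 32B active; it is not official.

def moe_params(shared_b, expert_b, n_experts, k_active):
    """Total vs per-token active parameters (in billions) for an MoE."""
    total = shared_b + expert_b * n_experts   # all experts stored
    active = shared_b + expert_b * k_active   # only routed experts fire
    return total, active

total, active = moe_params(shared_b=12.7, expert_b=2.14,
                           n_experts=160, k_active=9)
print(round(total), round(active))  # → 355 32
```

The same arithmetic explains GLM 4.5 Air's profile: a smaller shared core and fewer or smaller experts bring 106B total down to 12B active per token.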
Benchmark Results
- Across 12 major evaluations spanning reasoning, math, coding, and agent behavior, GLM 4.5 ranked third globally, behind OpenAI’s GPT-4 and xAI’s Grok 4, and ahead of models like Claude 4 Opus, DeepSeek-R1, and Gemini 2.5 Flash in most areas.
- Key scores:
- Reasoning: 91% on AIME 24, 98.2% on MATH 500.
- Coding: 64.2% on SWE-bench verified, 37.5% on Terminal Bench.
- Agent Tasks: a 53.9% win rate over Kimi K2 across 52 tasks, and a 90.6% tool-calling success rate, ahead of Claude 4 Sonnet, Kimi K2, and DeepSeek-R1 on the same test.
- Web Research (BrowseComp): 26.4% accuracy.
Pricing and Accessibility
- API pricing is approximately ¥0.8 (about $0.11 USD) per million input tokens and ¥2 (about $0.28 USD) per million output tokens; combined, one million input plus one million output tokens costs roughly ¥2.8 ($0.39 USD).
- This pricing is significantly lower than competitors like Claude 4 (roughly ¥30 per million tokens at 100K context) and OpenAI’s GPT-4.
- The models are released under the MIT license, making them open source and commercially usable.
- Weights are available on Hugging Face and ModelScope.
- Z.AI provides OpenAI-compatible APIs and compatibility with existing agent frameworks like LangChain for easy integration.
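At these rates, budgeting a long agent run is straightforward arithmetic. A small sketch using the listed prices; the token counts in the example are illustrative assumptions:

```python
# Cost estimate from the listed GLM 4.5 API prices:
# ¥0.8 per million input tokens, ¥2 per million output tokens.

CNY_PER_M_INPUT = 0.8
CNY_PER_M_OUTPUT = 2.0

def request_cost_cny(input_tokens: int, output_tokens: int) -> float:
    """Cost in yuan for a given input/output token volume."""
    return (input_tokens * CNY_PER_M_INPUT
            + output_tokens * CNY_PER_M_OUTPUT) / 1e6

# Hypothetical agent session: 2M input tokens (context re-sent each
# turn) and 500K generated tokens.
cost = request_cost_cny(2_000_000, 500_000)
print(f"¥{cost:.2f}")  # → ¥2.60
```

Even a context-heavy multi-turn session stays under ¥3, which is the crux of the cost argument against the closed competitors cited above.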
Demonstrated Capabilities
- Live demos showcased GLM 4.5 acting as a research assistant (performing web searches, analyzing results, compiling answers with sources).
- It built and controlled a Flappy Bird game from scratch (logic and animation).
- It generated a complete HTML-based PowerPoint slide deck based on a single prompt (layout, content, images, text).
- It built a full-stack web application (frontend, backend, database, deployment) through multi-turn conversation, resulting in a functional site.
Context and Strategy
- Z.AI belongs to a group of Chinese AI startups (including Moonshot, Step AI, and Baichuan) referred to locally as the “AI Six Tigers,” known for releasing advanced open models.
- This release positions GLM 4.5 among the largest open-weight models to date (355B parameters).
- Z.AI’s open-source, low-cost strategy contrasts with closed, expensive US models like GPT-4 and Claude 3, aiming for adoption through network effects.
- The company has secured substantial funding: $1.5 billion from investors including Tencent, Alibaba, and Chinese local governments, plus a recent ~$140 million investment from Shanghai’s Pudong venture capital arm.
- Z.AI is preparing for a potential Hong Kong IPO and plans to raise an additional $300 million, focusing on foundation models. GLM 5 and multimodal capabilities are in development.