
Artificial intelligence is no longer confined to powerful servers and cloud platforms; TinyML and Edge AI now bring capable machine learning models onto tiny, battery-powered devices. In 2025, a model is just as likely to be running on a device the size of a coin, powered by a small battery and connected to a sensor.
By some estimates, tens of billions of connected devices are now part of everyday life, from smart speakers to industrial sensors. What has changed is the way these devices process information. Instead of always sending data to the cloud, many are now able to analyze it on the spot.
This shift comes from TinyML and Edge AI. Think of them as ways to squeeze machine learning into gadgets that barely have any memory or power. We’re talking kilobytes instead of gigabytes.
The pace of development has been rapid. Over the past year, new chips, software runtimes, and community standards have matured. Industry groups are consolidating, benchmark suites are setting common baselines, and developers are finding practical ways to deploy models on devices that once seemed too limited.
Key Takeaways:
- TinyML is evolving into Edge AI, reflecting a broader scope and industry adoption.
- Benchmarking frameworks such as MLPerf Tiny are setting reproducible standards for performance and energy efficiency.
- Hardware such as the Arm Cortex-M55, Arm Ethos-U micro-NPUs, Syntiant NDP processors, and GreenWaves GAP9 is pushing low-power AI forward.
- Software ecosystems including TensorFlow Lite for Microcontrollers, LiteRT, Edge Impulse, ONNX Runtime, and Apache TVM are maturing.
- Model compression techniques such as quantization, pruning, and distillation are essential to fitting AI into tiny footprints.
- Developers now have clearer pathways from prototyping to production, with better community support and more stable tooling.
What is TinyML?
TinyML is short for tiny machine learning. It means running machine learning models directly on very small, low-power devices. These devices often run on coin-cell batteries, have only a few kilobytes of memory, and are built to do simple but useful tasks. Instead of relying on a big server somewhere far away, the device itself can process data and make decisions on the spot.

What makes TinyML special is its efficiency. A traditional machine learning model might need a powerful computer with lots of memory to run. But TinyML trims those models down, compresses them, and optimizes the code so they fit on microcontrollers and sensors. That means devices can keep working even with limited resources, while using very little energy.
Everyday objects can become “smart” without being bulky or constantly connected to the internet. A fitness tracker that counts your steps, a security sensor that detects glass breaking, or a farm sensor that spots changes in soil moisture are all examples of TinyML in action. It takes artificial intelligence out of the data center and puts it into the smallest corners of daily life.
From TinyML to Edge AI: A Growing Identity
A notable change in the past year has been the rebranding of the TinyML Foundation as the Edge AI Foundation. The new name reflects how the community has moved from experimental “tiny” projects to industry-scale deployments at the edge.
Companies no longer see these tools as niche. Instead, the technology is being integrated into medical wearables, smart agriculture systems, and predictive maintenance sensors. The terminology shift also highlights the role of micro-NPUs and specialized accelerators, which expand the scope beyond microcontrollers alone.
The research community has followed this trend. Where earlier conferences focused on proving that neural networks could even run on a microcontroller, today’s tracks explore optimized compilers, dedicated silicon, and reproducible power measurements.
Benchmarks That Set the Standard for TinyML
When comparing chips and software, marketing claims are rarely enough. This is where MLPerf Tiny has become central. The benchmark suite includes common tasks such as keyword spotting, image recognition at low resolution, and anomaly detection from sensor data.

Version 1.2 of MLPerf Tiny was published in 2024, and Version 1.3 results were released in 2025. These reports allow developers, vendors, and researchers to compare devices on a level playing field. Instead of vague claims of “fast” or “efficient,” MLPerf Tiny provides numbers for latency, throughput, and energy use.
The impact is simple: benchmarks make it easier to trust that a device can really run inferences for days or weeks on a coin-cell battery, and that results will be consistent across vendors.
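To make the coin-cell claim concrete, here is a back-of-the-envelope lifetime estimate for a duty-cycled sensor. Every number in this sketch (the 225 mAh CR2032 capacity, the sleep and active currents, the 10 ms inference time) is an illustrative assumption rather than a measured or vendor-published figure:

```python
# Back-of-the-envelope battery-life estimate for a duty-cycled TinyML sensor.
# All values below are illustrative assumptions, not vendor specifications.

CELL_CAPACITY_MAH = 225.0    # typical CR2032 coin cell
SLEEP_CURRENT_MA = 0.002     # 2 uA deep-sleep draw
ACTIVE_CURRENT_MA = 5.0      # MCU + sensor draw while running an inference
INFERENCE_TIME_S = 0.010     # 10 ms per inference
INFERENCES_PER_HOUR = 3600   # one inference per second

# Fraction of each hour spent awake.
active_s_per_hour = INFERENCES_PER_HOUR * INFERENCE_TIME_S   # 36 s
duty_cycle = active_s_per_hour / 3600.0                      # 1%

# Average current is the duty-cycle-weighted mix of active and sleep draw.
avg_current_ma = duty_cycle * ACTIVE_CURRENT_MA + (1.0 - duty_cycle) * SLEEP_CURRENT_MA

lifetime_hours = CELL_CAPACITY_MAH / avg_current_ma
print(f"average current: {avg_current_ma * 1000:.1f} uA")     # ~52.0 uA
print(f"estimated lifetime: {lifetime_hours / 24:.0f} days")  # ~180 days
```

Under these assumptions the device averages about 52 µA and the cell lasts roughly six months, which is exactly the kind of figure standardized benchmarks let teams verify instead of guess.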
Hardware Driving the Change
The progress of TinyML would not be possible without specialized hardware. Several platforms now stand out:
- Arm Cortex-M55 with Helium: This core brings advanced DSP and ML instructions into the microcontroller world. It delivers performance uplifts over previous Cortex-M generations, while keeping the low-power profile developers expect.
- Arm Ethos-U Series (U55, U65, U85): These are micro-NPUs built to sit alongside Cortex-M cores. They offload the heavy lifting of neural inference, making voice and vision models practical on ultra-low-power chips.
- Syntiant Neural Decision Processors (NDP120, NDP250): Designed for always-on audio and sensor applications, these processors are extremely efficient at tasks like wake-word detection and environmental sound classification.
- GreenWaves GAP9: Based on RISC-V, this processor is optimized for embedded audio and sensor fusion. It offers a strong balance of performance and efficiency, with a focus on hearables and edge audio analytics.
Software Ecosystem Maturing
Hardware alone is not enough. To run AI models efficiently, developers need runtimes and compilers that can translate neural networks into optimized instructions for tiny chips. Over the past year, several ecosystems have matured:
- TensorFlow Lite for Microcontrollers (TFLM) remains the most widely used open-source runtime. It supports many MCUs and integrates well with existing TensorFlow workflows; the same .tflite models it runs on-device can be smoke-tested on a host first, as sketched after this list.
- LiteRT for Microcontrollers is Google’s new runtime designed for extremely constrained devices, capable of fitting into as little as 16 KB of memory for basic models.
- Edge Impulse provides a developer-friendly environment that combines data collection, model training, and deployment into a single pipeline. Its EON compiler helps optimize memory and latency.
- ONNX Runtime and ONNX Runtime Mobile are increasingly important for cross-platform portability. Arm and Microsoft have collaborated to bring optimizations for Arm devices through KleidiAI.
- Apache TVM and microTVM continue to evolve as compiler stacks, though discussions in 2024–2025 show that microcontroller support requires ongoing community work. Still, TVM remains valuable for advanced optimization and ahead-of-time compilation.
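One practical consequence of this shared ecosystem is that the same .tflite flatbuffer you flash to a microcontroller can first be exercised on a development machine. Here is a minimal host-side smoke test using the tf.lite Python interpreter; keyword_model_int8.tflite is a hypothetical placeholder for whatever your own pipeline exports:

```python
import numpy as np
import tensorflow as tf

# Hypothetical model file; substitute the output of your own conversion step.
interpreter = tf.lite.Interpreter(model_path="keyword_model_int8.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]

# Feed a random tensor matching the model's expected shape and dtype
# (int8 here, because the model was fully integer-quantized).
dummy = np.random.randint(-128, 128, size=input_details["shape"], dtype=np.int8)
interpreter.set_tensor(input_details["index"], dummy)
interpreter.invoke()

scores = interpreter.get_tensor(output_details["index"])
print("output shape:", scores.shape, "dtype:", scores.dtype)
```

If shapes, dtypes, and outputs look sane here, debugging on the target device becomes far easier.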
Techniques That Make AI Fit
Bringing a neural network into a few hundred kilobytes of memory requires more than just clever hardware. Researchers and practitioners rely on a set of techniques that shrink and accelerate models without sacrificing too much accuracy; two of them are sketched in code after this list:
- Quantization: Converting model weights from 32-bit floating point to 8-bit integers, and in some cases even fewer bits, drastically reduces memory and computation.
- Pruning and sparsity: Removing redundant weights and connections streamlines the model. Structured pruning can align with hardware capabilities, boosting efficiency further.
- Knowledge distillation: Training a small “student” model to imitate a larger “teacher” model allows compact networks to retain much of the accuracy of their bigger counterparts.
- Neural architecture search: Automated methods can design architectures specifically for small devices, often outperforming manually reduced models.
- Ahead-of-time compilation and operator fusion: Compilers generate lean C code or platform-specific binaries that eliminate runtime overhead.
Together, these methods ensure that models can run continuously on devices with minimal memory and power.
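Two of these techniques are compact enough to show in code. First, post-training int8 quantization through the standard tf.lite converter flow; the toy model and the random representative data are stand-ins for a real network and real calibration samples:

```python
import numpy as np
import tensorflow as tf

# A deliberately tiny stand-in for a real keyword-spotting network.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(49, 10, 1)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(4, activation="softmax"),
])

def representative_data():
    # In practice, yield a few hundred real preprocessed input windows;
    # random data just keeps the sketch self-contained.
    for _ in range(100):
        yield [np.random.rand(1, 49, 10, 1).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
# Force full int8 so the model runs on integer-only MCUs and micro-NPUs.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_model = converter.convert()
with open("keyword_model_int8.tflite", "wb") as f:
    f.write(tflite_model)
print(f"int8 model size: {len(tflite_model)} bytes")
```

Second, a minimal knowledge-distillation training step. The toy teacher/student pair and the temperature of 4.0 are illustrative choices; real setups usually also blend in a hard-label cross-entropy term:

```python
import numpy as np
import tensorflow as tf

# Toy teacher/student pair; in practice the teacher is a large pretrained
# network and the student is the compact model headed for the device.
def make_mlp(width):
    return tf.keras.Sequential([
        tf.keras.Input(shape=(64,)),
        tf.keras.layers.Dense(width, activation="relu"),
        tf.keras.layers.Dense(4),  # raw logits
    ])

teacher, student = make_mlp(256), make_mlp(16)
temperature = 4.0
kld = tf.keras.losses.KLDivergence()
optimizer = tf.keras.optimizers.Adam(1e-3)

@tf.function
def distill_step(x):
    # Soften the teacher's logits so the student sees inter-class structure.
    soft_targets = tf.nn.softmax(teacher(x, training=False) / temperature)
    with tf.GradientTape() as tape:
        soft_preds = tf.nn.softmax(student(x, training=True) / temperature)
        # The T^2 factor keeps gradient magnitudes comparable across temperatures.
        loss = kld(soft_targets, soft_preds) * temperature ** 2
    grads = tape.gradient(loss, student.trainable_variables)
    optimizer.apply_gradients(zip(grads, student.trainable_variables))
    return loss

loss = distill_step(np.random.rand(32, 64).astype(np.float32))
print("distillation loss:", float(loss))
```

Pruning follows a similar wrapper-based pattern through the TensorFlow Model Optimization Toolkit (tfmot.sparsity.keras.prune_low_magnitude), and the pruned, distilled, quantized result is what an ahead-of-time compiler finally turns into firmware for the target chip.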
Use Cases Emerging in the Real World
The combination of efficient hardware, optimized runtimes, and compact models is leading to practical deployments across sectors:
- Smart homes: Always-on voice detection and gesture recognition can run without cloud connectivity, improving privacy and responsiveness.
- Healthcare wearables: Continuous monitoring of heart rate, breathing patterns, or movement can be analyzed locally, reducing battery drain and ensuring faster feedback.
- Industrial monitoring: Vibration and sound sensors equipped with TinyML models can detect anomalies in machinery early, minimizing downtime.
- Agriculture: Low-power devices in the field can monitor soil, crops, and weather, making farming smarter without requiring constant internet access.
These examples show how moving intelligence directly onto devices is not just a technical exercise but a practical enabler of new applications.

Gaps and Challenges Ahead
Despite the progress, several challenges remain:
- Fragmented tooling: With multiple runtimes, compilers, and frameworks, portability is not guaranteed. Developers often need to validate workflows across platforms.
- Energy measurement: While MLPerf Tiny helps, energy profiling is still inconsistent across vendors. For dependable battery-life estimates, teams must still measure power in-house.
- Compiler maturity: Projects like microTVM highlight how community-driven work can fluctuate. Stability and long-term support remain important considerations.
These gaps are not insurmountable, but they show that edge AI is still evolving.
The Revolution of TinyML and Edge AI
TinyML and Edge AI are making intelligence local. Instead of thinking of AI as something locked away in distant servers, we’re now seeing it inside the tiniest of devices, on our desks, in our pockets, and out in the field.
The reason this shift is happening is steady, incremental progress in the background. Benchmarks like MLPerf Tiny give engineers an honest way to compare devices, and new Cortex-M microcontrollers and tiny NPUs show that real products can run reliably on low power. It’s not theory anymore: the tools exist, and people are building with them.
The future of AI won’t depend only on massive models and supercomputers. Sometimes it will come down to a small model in a sensor on the wall, the kind of quiet change that spreads outward in ways we hardly notice until it becomes normal.