What are small language models — and why do smart homes want them?

A small language model (SLM) is a compact AI model, usually under three billion parameters, designed to run directly on a device rather than in a remote data centre. Through techniques like quantisation (shrinking model weights from 16-bit to 4-bit or even 2-bit precision), pruning, and distillation, these models fit inside chips with limited memory and tight power budgets.
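The arithmetic behind quantisation is simple enough to sketch. A back-of-envelope calculation (weights only, ignoring activations and runtime overhead) shows why 4-bit precision is often the difference between fitting in a hub's memory and not:

```python
def model_size_gb(params: float, bits: int) -> float:
    """Approximate weight-storage footprint: parameters x bits, in gigabytes.
    Ignores activations, KV cache, and runtime overhead."""
    return params * bits / 8 / 1e9

# A 3-billion-parameter model at different precisions:
for bits in (16, 4, 2):
    print(f"{bits}-bit: {model_size_gb(3e9, bits):.2f} GB")
# 16-bit: 6.00 GB, 4-bit: 1.50 GB, 2-bit: 0.75 GB
```

Six gigabytes of weights is out of reach for a cheap hub; 1.5 GB is plausible, which is why 4-bit quantisation features so heavily in edge deployments.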

For smart homes, the appeal is straightforward. A local model means your voice command never leaves the house. Response times drop below 100 milliseconds because there is no round trip to a server. The system keeps working when your internet goes down. And there is no per-query cloud cost eating into the margin of a $50 thermostat.

Today, most smart speakers and hubs use a split approach: a tiny on-device model handles wake-word detection ("Hey Google," "Alexa"), then everything else ships off to a cloud-based large language model. SLMs promise to move far more of that processing onto the device itself. Not free-form conversation about the meaning of life — but understanding what you mean when you say "turn off the downstairs lights except the hallway" and dispatching the right commands without waiting for a server.

That distinction matters. The most useful small language models for smart homes are not chat models. They are function-calling models — trained to interpret a natural language request and output a structured command. Google's Gemma 3 270M, a 270-million-parameter release, is aimed at exactly this kind of task-specific tool dispatch. That is a different, more tractable problem than general conversation, and it needs far less compute to solve well.
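To make the distinction concrete, here is a sketch of what a function-calling flow looks like on the dispatch side. The JSON schema and the set_lights handler are illustrative inventions, not any vendor's actual API:

```python
import json

# Hypothetical structured output a function-calling model might emit for
# "turn off the downstairs lights except the hallway".
model_output = json.dumps({
    "function": "set_lights",
    "arguments": {"zone": "downstairs", "exclude": ["hallway"], "state": "off"},
})

def dispatch(raw: str) -> str:
    """Parse the model's JSON and route it to a device handler."""
    call = json.loads(raw)
    if call["function"] == "set_lights":
        args = call["arguments"]
        targets = f'{args["zone"]} minus {", ".join(args["exclude"])}'
        return f'lights {args["state"]}: {targets}'
    raise ValueError(f'unknown function: {call["function"]}')

print(dispatch(model_output))  # lights off: downstairs minus hallway
```

The model only has to emit a small, constrained vocabulary of structured calls, which is why a sub-billion-parameter model can do this job well.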

What's actually shipping right now

For all the conference-stage demos and chip announcements, the list of products you can buy today with a real on-device language model is still short. But it is no longer empty.

SwitchBot AI Hub went on sale in January 2026 for $259.99. It runs a local vision-language model for camera analysis and automation triggers, bridges to Apple Home, Google Home, Alexa, and SmartThings via Matter, and embeds a Home Assistant Core container directly on the hub — no Raspberry Pi required. It handles up to 100 SwitchBot devices and eight concurrent 2K camera feeds locally. There is a subscription behind the advanced AI features after a one-month trial, which reveals something about the economics of edge AI: local inference eliminates per-query cost but does not eliminate the manufacturer's need for recurring revenue.

Xiaomi Miloco (Xiaomi Local Copilot) debuted at MWC 2026. Built on Xiaomi's own MiMo foundation model, it handles multimodal smart home automation: detecting clutter and dispatching the robot vacuum, adjusting room temperature based on a person's comfort and sleep state, orchestrating multi-device scenarios. Xiaomi's separate open-source voice model, MiDashengLM-7B, is already embedded in over 30 shipping Xiaomi products, powering wake-up systems, gesture controls, and sound monitoring. Xiaomi ships more smart home devices annually than Amazon, Google, and Apple combined, so this is not a niche experiment.

The open-source stack works today. Home Assistant with Piper (local neural text-to-speech), Whisper (local speech-to-text), and any local LLM via Ollama forms a complete, fully offline voice assistant. Home Assistant reports over a million active installations. The community project "Home LLM" fine-tunes small models specifically for smart home device control — function-calling, not chat. It runs on a Raspberry Pi 4. Not a polished consumer product, but proof that every layer of the local voice AI pipeline works on commodity hardware right now.
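As a taste of the Ollama layer of that stack, the sketch below asks a locally served model to turn a spoken command into JSON via Ollama's default HTTP endpoint. It assumes a running Ollama instance with a small model already pulled; the model name and prompt wording are placeholders:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_request(model: str, command: str) -> dict:
    """Build a non-streaming Ollama request asking the model to turn a
    spoken command into a JSON device action."""
    prompt = ('Convert this smart home command to JSON with keys '
              '"device" and "action": ' + command)
    return {"model": model, "prompt": prompt, "stream": False}

def ask_local_model(model: str, command: str) -> str:
    """POST to the local Ollama server and return the raw model response.
    Requires a running Ollama instance with the named model pulled."""
    data = json.dumps(build_request(model, command)).encode()
    req = urllib.request.Request(OLLAMA_URL, data=data,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (needs a running Ollama instance):
# print(ask_local_model("llama3.2:1b", "turn off the kitchen lights"))
```

No cloud key, no account, no network egress: the request never leaves localhost, which is the whole point of the local stack.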

Robot vacuums are worth watching as a stealth beachhead. They already carry cameras, vision processors, and enough battery for inference. Models like the iRobot j7+ and Roborock S8 run vision transformers for obstacle avoidance. Adding a small function-calling language model to interpret natural commands ("clean under the dining table, avoid the kids' toys") is a short step from where they already are — and Xiaomi's MWC demo featured exactly this.

The silicon that makes it possible

The hardware story in 2026 is broader than most coverage suggests. It is not just Qualcomm and Arm.

Google Coral NPU is an open-source, RISC-V-based neural processing unit co-designed with Google Research and DeepMind. It delivers 512 billion operations per second at roughly 6 milliwatts. Read that again: six milliwatts. That power profile means always-on AI inference in a device that runs on a coin cell or a thin power cable. The Synaptics Astra SL2610, which integrates the Coral NPU, is sampling now with general availability planned for Q2 2026. Because it is RISC-V (open instruction set, no per-unit licensing fees), it changes the BOM arithmetic for thin-margin appliances.
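Those two numbers imply a striking efficiency figure. Taking the stated specs at face value:

```python
ops_per_sec = 512e9   # Coral NPU: 512 billion operations per second (from the text)
power_watts = 6e-3    # roughly 6 milliwatts

tops_per_watt = ops_per_sec / power_watts / 1e12
print(f"{tops_per_watt:.0f} TOPS/W")  # ~85 TOPS/W
```

Around 85 TOPS per watt, if the headline figures hold under real workloads, is what makes coin-cell-powered, always-on inference thinkable at all.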

Nordic Semiconductor's nRF54L with Axon NPU is perhaps the most overlooked announcement. Nordic's nRF chips are already inside millions of smart home sensors, locks, thermostats, and trackers. Adding an on-chip NPU that runs TensorFlow Lite models up to 15x faster than CPU-only execution — with native Matter-over-Thread support — means existing product lines can gain AI capabilities in a hardware revision, not a full redesign. Sampling Q2 2026.

Qualcomm Snapdragon Wear Elite is the first wearable platform with an NPU, supporting models up to two billion parameters on-device. That is the strongest signal yet that smartwatches and fitness bands will get real local language features, not just forwarded-to-phone queries. And Arm's Ethos-U85 brings transformer support to IoT-class devices through the ExecuTorch runtime, giving appliance makers a familiar deployment path.

Dark horse: neuromorphic chips. Intel's Loihi 3 (early 2026) packs 8 million neurons and delivers up to 1,000x better energy efficiency than conventional processors on state-space model workloads. Neuromorphic chips activate only when input changes — perfect for a security camera watching an empty room or a smoke detector listening for anomalies. Not a replacement for SLMs, but a complementary architecture for always-on sensing that uses almost no power.

What's holding it back

Memory bandwidth, not compute, is the bottleneck. Mobile devices offer 50–90 GB/s of memory bandwidth; data-centre GPUs deliver 2–3 TB/s. Language model inference is memory-bound because the full model weights stream through memory for every single token generated. The TOPS (trillions of operations per second) numbers that chip vendors advertise are misleading for this workload — they reflect idealised batch scenarios, not the autoregressive decode that an SLM actually does when answering your question. Real-world benchmarks show a Hailo-10H NPU getting about 6.9 tokens per second on a 1.5-billion-parameter model, nowhere near what its 40 TOPS rating might suggest. Google's David Patterson published a January 2026 paper documenting that over the past decade AI compute grew 80x while memory bandwidth grew only 17x. That gap is structural.
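The bandwidth bound is easy to work out. Because each generated token streams the full weights through memory once, tokens per second cannot exceed bandwidth divided by model size. A rough sketch (weights only, ignoring the KV cache and activations):

```python
def max_tokens_per_sec(params: float, bits: int, bandwidth_gbps: float) -> float:
    """Rough decode ceiling: every generated token streams all weights once,
    so throughput <= memory bandwidth / model size in bytes.
    Ignores KV cache, activations, and compute limits."""
    model_gb = params * bits / 8 / 1e9
    return bandwidth_gbps / model_gb

# A 1.5-billion-parameter model at 4-bit on a 50 GB/s mobile memory bus:
print(f"{max_tokens_per_sec(1.5e9, 4, 50):.0f} tokens/s ceiling")  # 67 tokens/s ceiling
```

Even this theoretical ceiling of roughly 67 tokens per second assumes perfect bandwidth utilisation; the Hailo figure of 6.9 tokens per second shows how far real systems fall short once scheduling, precision, and memory-access overheads enter.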

Thermal throttling kills "always-on." A March 2026 benchmark found that an iPhone 16 Pro loses 44% of its inference throughput within two iterations under sustained LLM load, and a Samsung Galaxy S24 Ultra hits an OS-enforced GPU frequency floor that terminates inference entirely after six iterations. A dedicated low-power NPU at under 5 watts sustained near-zero-variance performance indefinitely. The implication: purpose-built edge silicon will outperform repurposed phone chips for always-on agent workloads, even with less peak compute.

Security is the uncharted problem. A smart home hub running a local SLM that controls locks, cameras, and alarms is a fundamentally different security target than a thermostat running simple rules. Recent research documents specific attack vectors: prompt injection through adversarial sensor inputs, model extraction from physical devices, and privacy leakage through model responses grounded in personal data. The industry has barely begun to address these. It is worth watching how early products handle this — and whether it slows adoption.
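One obvious mitigation, sketched below, is to treat the model's output as untrusted input and validate every proposed action against an allowlist before anything reaches an actuator. The device names and schema here are hypothetical:

```python
# Model output is untrusted: check every proposed action against an allowlist
# before dispatching a device command. Names and schema are illustrative.
ALLOWED_ACTIONS = {
    "light": {"on", "off", "dim"},
    "thermostat": {"set_temp"},
    # deliberately absent: "lock", "alarm" -> require explicit user confirmation
}

def validate(call: dict) -> bool:
    """Reject any model-proposed action outside the allowlist."""
    device = call.get("device")
    action = call.get("action")
    return device in ALLOWED_ACTIONS and action in ALLOWED_ACTIONS[device]

print(validate({"device": "light", "action": "off"}))  # True
print(validate({"device": "lock", "action": "open"}))  # False
```

An allowlist does not stop prompt injection itself, but it bounds the blast radius: even a fully compromised model cannot open a lock it was never permitted to touch.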

What to watch over the next 12 months

The infrastructure has arrived. The products are starting. Here is what separates "early adopter curiosity" from "your next smart home upgrade" over the next year.

Chip availability. Nordic's nRF54L and Google's Coral-based Synaptics SL2610 both target broad availability in mid-2026. If both ship on time with good developer tooling, expect a wave of smaller manufacturers adding local AI features to sensors, hubs, and cameras by year-end.

Subscription pricing disclosures. SwitchBot charges for advanced AI features after a trial. If other edge-AI products follow the same pattern, subscription pricing will reveal the real cost-per-user of running on-device models — and whether it is low enough for mass-market devices.

Whether any major Western appliance brand responds to Xiaomi. Xiaomi's voice model is already embedded in 30+ shipping products, and Miloco now builds on that installed base. Samsung has its Bespoke AI fridge with Gemini. Amazon's Alexa+ is smarter but still cloud-dependent. The competitive pressure is real. If a tier-one Western brand announces a local-SLM product line in 2026, the category inflects.

Cross-border compliance pressure is also accelerating the shift. Advanced-market privacy and data governance rules increasingly favour architectures that keep personal data on-device. For appliance makers selling into multiple jurisdictions, local inference may become the path of least regulatory friction, not just a privacy feature.

The next question is not whether small language models work in smart homes. The SwitchBot hub, Xiaomi's ecosystem, and the Home Assistant community have answered that. The question is whether manufacturers can make the economics work at the $50–$100 device tier, not just the $260 premium hub tier. The silicon arriving in mid-2026 suggests they might.

Frequently asked questions

What is a small language model?

A compact AI model — typically under three billion parameters — designed to run directly on a device instead of needing a cloud server. Think of it as a scaled-down version of ChatGPT or Gemini that trades broad knowledge for speed, privacy, and the ability to work offline.

Can small language models run on smart home devices today?

Yes. The SwitchBot AI Hub ($259.99) and Xiaomi Miloco both ship with on-device language models in 2026. The open-source Home Assistant stack can also run a local voice assistant on a Raspberry Pi 4 with no cloud dependency.

What is the difference between an SLM and a large language model for smart homes?

SLMs run locally on the device itself, which means faster responses, offline operation, and data that stays private. Large language models (LLMs) typically run on cloud servers, offering broader capabilities but requiring an internet connection and introducing latency and privacy trade-offs.

Do I need special hardware for on-device AI in my smart home?

New NPU-equipped chips (Google Coral, Nordic Axon, Qualcomm Snapdragon Wear Elite) make it easier, but you can run a basic local voice assistant today on a Raspberry Pi 4 with open-source software.