Wi-Fi Sensing is changing the way we understand motion inside homes and buildings — from enabling security systems that detect presence without cameras to powering health and wellness applications that can monitor daily activity patterns. But none of this works without a steady stream of real-time data flowing from devices in the home to processing systems in the cloud. As these sensing applications grow in complexity and reach, the underlying infrastructure that supports them must evolve too.
At Cognitive Systems, our original backend — built around a traditional MQTT setup — was well-suited for the early days of Wi-Fi Sensing. At that stage, deployments were smaller, and real-time motion data didn’t yet require the scale or speed it does today. But as adoption grew and user expectations around responsiveness and reliability increased, it became clear we needed a new approach to support the next stage of performance. (For a closer look at how our cloud architecture supports everything from live motion views to app alerts, read our behind-the-scenes blog on cloud infrastructure powering Wi-Fi Sensing at scale.) That’s why we built CoreV4: a next-generation backend architecture purpose-built to scale reliably, efficiently, and cost-effectively — not just for Wi-Fi Sensing, but for any high-volume, real-time IoT application.
What is CoreV4?
CoreV4 is the latest evolution of our cloud backend — the part of the system that acts as the bridge between a user’s sensing devices and the mobile application they rely on. While Wi-Fi sensing devices like smart plugs collect raw motion data locally, the backend is where that data is processed, translated, and delivered in ways that are meaningful to users. This includes generating live motion views, issuing timely alerts for unexpected activity, and storing motion timelines that help users review past events. Essentially, the backend turns raw signals into useful, actionable insights that are easy for people to interact with. To achieve this, CoreV4 brings together several cloud services: the Motion Notification Service for issuing alerts, Graphite for logging and visualization, Funnel for managing data ingestion, and Gatekeeper for secure, intelligent device routing. The most significant innovation in CoreV4 is its adoption of “sharding”: a message bus partitioning technique that breaks backend traffic into separate, self-contained lanes (called “shards”).
A Smarter MQTT Backend: Why We Chose Message Bus Partitioning Over Clustering
Most real-time device communication backends, like those built on MQTT (a lightweight messaging protocol used for connecting devices over the internet), scale using a method called clustering. In clustering, multiple brokers—which are servers that manage messages between devices—work together and share data. While clustering helps distribute the load, it comes with a major downside: broadcast storms — a flood of internal messages as brokers repeatedly relay updates to stay in sync. As more brokers are added, this cross-broker chatter grows rapidly, clogging bandwidth, slowing performance, and increasing the risk of outages.
To avoid this problem, we use message bus partitioning instead. In this cloud architecture, each device is assigned to a specific broker (a “shard”), and importantly, these brokers don’t communicate with each other. This isolation removes all the internal traffic that causes broadcast storms, dramatically reducing complexity and improving performance.
Each shard handles only a subset of devices, preventing overload on any single system. By isolating traffic in this way, message bus partitioning greatly improves reliability and performance: if one part of the system slows down or experiences issues, the others continue operating smoothly — which is critical when users depend on fast notifications and real-time motion views. This containment not only makes the system more reliable but also makes it easier to scale as more homes and devices come online. Think of it like organizing highway traffic: clustering is like a giant roundabout with cars weaving in and out of every lane, while message bus partitioning is like a multi-lane highway with traffic lights guiding each lane independently—no collisions, no chaos.
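To make the idea concrete, here is a minimal sketch of what pinning a device to a single shard can look like. The hash-based policy, shard count, and hostname pattern below are illustrative stand-ins; Gatekeeper’s actual assignment logic (region, load balancing, device ID) is described in the next section.

```python
import hashlib

NUM_SHARDS = 8  # illustrative; in CoreV4 the shard count is managed dynamically

def shard_for_device(device_id: str) -> str:
    """Deterministically pin a device to one shard so its traffic never
    touches any other broker (and brokers never need to talk to each other)."""
    digest = hashlib.sha256(device_id.encode("utf-8")).hexdigest()
    shard_index = int(digest, 16) % NUM_SHARDS
    return f"mqtt-{shard_index}.yourcloud.com"  # hypothetical per-shard hostname

# The same device always lands on the same shard:
print(shard_for_device("device-abc123"))
```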
How We Made Message Bus Partitioning Work
Our approach to message bus partitioning solves performance and reliability issues, but it introduces a new question: how does each device know which broker (or shard) it should connect to?
We built a smart routing system to manage this using two tools: Gatekeeper and HAProxy. Here’s how it works:
- Gatekeeper, our IoT device provisioning service, decides which broker a device should use based on predefined logic (like region, load balancing, or device ID). It gives the device the domain for its assigned shard (like mqtt-4.yourcloud.com), which is shared by all devices connected to that broker.
- The device then connects to an AWS Network Load Balancer (NLB), a service that helps manage and distribute traffic efficiently. The NLB forwards the connection to HAProxy, a fast and flexible proxy server that acts like a smart traffic controller. HAProxy reads the Server Name Indication (SNI), which is part of the device’s initial encrypted connection and tells us which subdomain (and thus which shard) the device is trying to reach.
- HAProxy then uses that SNI value to route the connection to the right broker, as the sketch below illustrates.
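Here is a simplified, device-side view of that handshake, using only Python’s standard library to show where the SNI value comes from. The hostname is the example shard domain from above; the port and certificate handling are assumptions and will differ in a real deployment.

```python
import socket
import ssl

SHARD_HOST = "mqtt-4.yourcloud.com"  # hostname handed out by Gatekeeper (example above)
MQTT_TLS_PORT = 8883                 # assumed MQTT-over-TLS port; deployment-specific

context = ssl.create_default_context()
# A real device would also present its client certificate here, e.g.:
# context.load_cert_chain(certfile="device.crt", keyfile="device.key")

raw_sock = socket.create_connection((SHARD_HOST, MQTT_TLS_PORT))

# server_hostname is what the TLS ClientHello carries as SNI. Because the
# device connects using its shard's hostname, HAProxy can read this field
# and forward the session to the matching broker before any MQTT traffic flows.
tls_sock = context.wrap_socket(raw_sock, server_hostname=SHARD_HOST)
```

The MQTT session then runs over that TLS connection; the important point is that the routing decision can be made from the SNI alone, before any application data is exchanged.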
To ensure stability, we implemented sticky sessions — a technique that ensures each device reconnects to the same broker every time, rather than being sent to a different one. This prevents disruptions like dropped messages or session errors that can happen when devices bounce between servers. Sticky sessions work using the proxy protocol, which passes key connection details through the load balancer so each device can be consistently routed to the correct broker. This consistent mapping also improves observability in production, making it easier to monitor connections, isolate issues, and troubleshoot specific shards or broker pods. In simple terms, the approach we designed is like making sure a traveler always returns to the same hotel room — not a random one each time.
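For reference, the human-readable v1 form of a proxy protocol header is a single line prepended to the TCP stream; the sketch below parses it to show exactly which connection details survive the hop through the load balancer. (AWS NLB actually emits the binary v2 encoding; v1 is shown here only because it is easier to read.)

```python
def parse_proxy_v1(header: bytes) -> dict:
    """Parse a PROXY protocol v1 line such as
    b'PROXY TCP4 203.0.113.7 10.0.0.12 54321 8883\\r\\n'.
    (The 'UNKNOWN' variant and the binary v2 format are not handled here.)"""
    fields = header.rstrip(b"\r\n").decode("ascii").split(" ")
    if fields[0] != "PROXY" or len(fields) != 6:
        raise ValueError("not a PROXY protocol v1 header")
    _, family, src_ip, dst_ip, src_port, dst_port = fields
    return {
        "family": family,            # TCP4 or TCP6
        "client_ip": src_ip,         # the device's real source address
        "client_port": int(src_port),
        "server_ip": dst_ip,         # the address the device dialed
        "server_port": int(dst_port),
    }

print(parse_proxy_v1(b"PROXY TCP4 203.0.113.7 10.0.0.12 54321 8883\r\n"))
```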

Visualization of motion data flow in Cognitive’s MQTT system
Scaling Smarter, Not Harder — And Why It Matters
CoreV4 was purpose-built for intelligent growth and operational efficiency. Traditional systems often rely on overprovisioning, which means preemptively adding a surplus of servers to handle possible future demand. This wastes resources and drives up costs. Instead, CoreV4 takes a smarter approach. It uses a dynamic scaler — a background process that constantly monitors how many networks are connected. New shards (isolated brokers that handle groups of devices) are spun up only when needed. This means the system scales automatically based on real demand, avoiding unnecessary overhead while maintaining fast performance.

Updates also become safer and faster. In the past, deploying new code or features often meant taking the whole system offline — sometimes for nearly an hour. CoreV4 updates are now rolled out shard by shard, so only a small portion of the system is affected at a time.

We’ve also chosen to run our own infrastructure rather than depend on cloud-managed MQTT platforms, which charge per message. That cost model can get expensive fast at scale. By building and maintaining our own backend, we keep operating costs low while gaining the flexibility to optimize for performance and reliability — all on our terms.
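As a rough illustration of the dynamic scaler described above, the loop below polls demand and adds shards only when the connected-network count calls for it. The capacity target, polling interval, and helper functions (get_connected_networks, get_active_shards, spin_up_shard) are hypothetical placeholders for CoreV4’s real orchestration.

```python
import math
import time

NETWORKS_PER_SHARD = 5000   # illustrative capacity target, not a real CoreV4 figure
CHECK_INTERVAL_S = 60       # illustrative polling interval

def desired_shard_count(connected_networks: int) -> int:
    """Number of shards needed for the current demand."""
    return max(1, math.ceil(connected_networks / NETWORKS_PER_SHARD))

def scaler_loop(get_connected_networks, get_active_shards, spin_up_shard):
    """Scale on real demand instead of overprovisioning up front."""
    while True:
        needed = desired_shard_count(get_connected_networks())
        while get_active_shards() < needed:
            spin_up_shard()  # e.g. launch a new broker and register it with Gatekeeper
        time.sleep(CHECK_INTERVAL_S)
```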
For anyone building systems at scale, the lessons behind CoreV4 offer a clear blueprint:
- Use HAProxy with sticky sessions and the proxy protocol to ensure stable, consistent MQTT connections.
- Tune your kernel and timeout settings, because out-of-the-box defaults won’t withstand the pressure of real-world loads.
- Leverage TLS SNI-based routing for fast, secure traffic direction.
- And most importantly, instrument everything — from live connections to auth rates and resource usage — so you can detect and resolve issues before they escalate.
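On that last point, even a small amount of glue code goes a long way. Since Graphite already handles logging and visualization in CoreV4, the sketch below pushes a few per-shard data points over Carbon’s plaintext protocol; the endpoint and metric names are made up for illustration.

```python
import socket
import time

CARBON_HOST = "graphite.internal"  # hypothetical Carbon endpoint
CARBON_PORT = 2003                 # Carbon's default plaintext-protocol port

def send_metric(path: str, value: float, timestamp: int | None = None) -> None:
    """Send one data point using Carbon's plaintext format:
    '<metric path> <value> <timestamp>\\n'."""
    timestamp = timestamp or int(time.time())
    line = f"{path} {value} {timestamp}\n"
    with socket.create_connection((CARBON_HOST, CARBON_PORT), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))

# Hypothetical per-shard signals of the kind worth watching:
send_metric("corev4.shard4.mqtt.connected_devices", 1234)
send_metric("corev4.shard4.mqtt.auth_failures_per_minute", 2)
send_metric("corev4.shard4.broker.memory_bytes", 512 * 1024 * 1024)
```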
CoreV4 proves that scaling isn’t just about adding capacity — it’s about building smarter, with purpose and precision.