Reference / Case Study

Combotto contributes to optimizing a secure edge IoT gateway

Thomas Bonderup 25 Nov 2025 Security & Reliability Audit

A comprehensive reliability and security audit of Combotto's secure edge IoT Gateway, identifying strengths, architectural bottlenecks, and a 90-day roadmap toward production-grade resilience.

Why this reference matters

Real evidence, not abstract claims

These references show the kind of architecture, delivery pressure, and proof Combotto works with in practice.

Findings should lead somewhere

A strong audit produces a concrete backlog, not a vague list of concerns that dies after the meeting.

Implementation follows the evidence

The Sprint is where the highest-value fixes get done and verified before posture drifts again.

1. Background

Modern IoT systems often run in environments where devices must operate reliably even when the network is unstable, security requirements are strict, and large volumes of sensor data need to be processed safely. To demonstrate Combotto’s capabilities in secure and reliable IoT infrastructure, we created a reference IoT Edge Gateway built on modern technologies designed for safety, robustness, and transparency.

The gateway functions as the “local brain” at the edge of an IoT system: it receives data from devices at the edge, stores it safely, and forwards it to the cloud when the network is available - all while enforcing strict security and providing full operational visibility.

Below is a context diagram of the gateway:

Secure IoT edge gateway cloud context diagram

To achieve this, the gateway is built on:

Rust, a modern programming language designed to eliminate entire classes of bugs and security vulnerabilities.
SQLite with write-ahead logging (WAL), which stores sensor data locally until it can be delivered, so that committed data survives power loss and network outages.
MQTT, the industry-standard protocol for IoT devices, enabling lightweight and reliable communication across sensors, machines, and cloud platforms.
TLS/mTLS, securing communication and ensuring only trusted devices and servers can exchange data.
OpenTelemetry + Tempo tracing, giving full, real-time insight into how data flows through the gateway and where performance bottlenecks occur.
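
The store-and-forward role of the SQLite layer can be illustrated with a minimal sketch. It is written in Python for brevity (the gateway itself is built in Rust), and the table name `outbox`, its columns, and the function names are hypothetical, not taken from the gateway's codebase.

```python
import sqlite3

def open_outbox(path):
    """Open (or create) the local buffer with write-ahead logging enabled."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    # Sync on every commit: trades throughput for durability (an assumed setting).
    conn.execute("PRAGMA synchronous=FULL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS outbox ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " topic TEXT NOT NULL,"
        " payload BLOB NOT NULL,"
        " delivered INTEGER NOT NULL DEFAULT 0)"
    )
    conn.commit()
    return conn

def enqueue(conn, topic, payload):
    """Persist one message before it is acknowledged to the device."""
    with conn:  # commits on success, rolls back on exception
        cur = conn.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)", (topic, payload)
        )
    return cur.lastrowid

def pending(conn, limit=100):
    """Return undelivered messages in arrival order."""
    return conn.execute(
        "SELECT id, topic, payload FROM outbox"
        " WHERE delivered = 0 ORDER BY id LIMIT ?",
        (limit,),
    ).fetchall()

def mark_delivered(conn, row_id):
    """Flag a message as delivered once the cloud has accepted it."""
    with conn:
        conn.execute("UPDATE outbox SET delivered = 1 WHERE id = ?", (row_id,))
```

A forwarder would call `pending()` when connectivity returns and `mark_delivered()` only after the cloud confirms receipt, so an interrupted upload leaves the rows in place for the next attempt.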

2. Scope of the Audit

To prepare the gateway for upcoming customer requirements and future fleet scaling, the audit focused on identifying potential security gaps, reliability risks, and weaknesses in observability. The analysis followed Combotto’s structured IoT Audit Framework and covered both the codebase and the operational maturity of the system.

The scope included:

Architecture Review (Edge Device -> Gateway -> Cloud -> UI): Evaluation of system boundaries, data flow, trust zones, and failure isolation between components.
MQTT Pipeline Analysis (Ingress, QoS handling, topic isolation, error paths): Assessment of MQTT ingress, message validation, topic hygiene, and robustness under network instability and high-volume ingestion.
Security Posture & TLS/mTLS (High-risk zones in IoT): Inspection of cryptographic configuration, certificate lifecycle, identity trust model, and vulnerabilities common to IoT deployments.
Storage & Persistence Layer (WAL durability, replay logic): Analysis of SQLite WAL mode, durability guarantees under power loss, replay logic, and offline-first behaviour.
Observability (metrics, logs, traces, failure visibility): Review of metrics, logging strategy, trace coverage, missing signals, and bottleneck detection.
Deployment & Update Strategy: Evaluation of provisioning, versioning, OTA strategy, secure update flow, and operational risks during rollout.

3. Key Strengths Observed

Strong foundation built on Rust ensures memory safety and predictable execution.
WAL-based local buffering enables reliable offline-first behavior.
Clean separation between ingest -> persistence -> cloud sink.
Observability instrumentation (metrics + traces) introduced early.
Simple, composable architecture that can scale horizontally.
Schema versioning for telemetry.

4. Critical Findings

Missing Certificate Rotation for Gateway & MQTT Broker: Long-lived certificates introduce long-term security exposure.
Lack of Topic Governance & ACL Restrictions: Insufficient MQTT access control increases the risk of data injection or privilege escalation.
Limited Observability: Incomplete tracing, inconsistent metrics, and no end-to-end latency visibility.
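
For the certificate rotation finding, the core of an automated policy is a check for when a certificate is due for renewal. The sketch below (Python, for illustration only) assumes a rule of renewing once two thirds of the validity period has elapsed; that fraction and the function name `rotation_due` are assumptions, not part of the audit.

```python
from datetime import datetime, timedelta, timezone

def rotation_due(not_before, not_after, now, renew_fraction=2 / 3):
    """Return True once `renew_fraction` of the certificate lifetime has elapsed.

    `not_before` and `not_after` are the certificate's validity bounds;
    the 2/3 default is an assumed policy, not a standard.
    """
    lifetime = not_after - not_before
    renew_at = not_before + lifetime * renew_fraction
    return now >= renew_at

# Example: a 90-day certificate issued on 1 January 2025.
issued = datetime(2025, 1, 1, tzinfo=timezone.utc)
expires = issued + timedelta(days=90)
due_on_day_30 = rotation_due(issued, expires, issued + timedelta(days=30))
due_on_day_75 = rotation_due(issued, expires, issued + timedelta(days=75))
```

Renewing well before expiry leaves room for retries if the issuing service or the network is unavailable when the first attempt runs.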

5. Architecture Overview

Data from edge devices is ingested via MQTT (QoS 1), persisted locally in SQLite (WAL mode) for durability, and forwarded to the cloud ingestion API. The gateway’s reliability depends on robust local buffering, connection management, and cloud retry behavior.
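
Cloud retry behavior is commonly implemented as exponential backoff with jitter. The following is a minimal sketch under that assumption; the function names, the default delays, and the use of `ConnectionError` to signal a failed upload are all illustrative and do not describe the gateway's actual retry logic.

```python
import random

def backoff_delays(base=1.0, cap=60.0, attempts=6, rng=random.random):
    """Yield 'full jitter' delays: uniform in [0, min(cap, base * 2**n))."""
    for n in range(attempts):
        yield rng() * min(cap, base * 2 ** n)

def forward_with_retry(send, batch, sleep, **backoff_args):
    """Call send(batch) until it succeeds or the attempts are used up.

    Returns True on success. On False the caller keeps the batch in the
    local buffer and tries again later.
    """
    for delay in backoff_delays(**backoff_args):
        try:
            send(batch)
            return True
        except ConnectionError:
            sleep(delay)  # wait before the next attempt
    return False
```

Passing `sleep` and `rng` as parameters keeps the function testable without real waiting; in production they would be `time.sleep` and the default random source.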

6. Security Assessment

Missing Certificate Rotation
• Why it matters: Long-lived certs create persistent attack windows.
• Evidence: Certificates are months old, no rotation policy in place.
• Impact: Gateway compromise -> full device-fleet exposure.
• Severity: High
• Effort: Medium (1–2 weeks to automate rotation)
MQTT ACLs Not Enforced
• Why it matters: Any device can publish/subscribe across topics.
• Evidence: No ACL file, wildcard permissions in broker config.
• Impact: Privilege escalation; rogue device may inject or manipulate telemetry.
• Severity: High
• Effort: Low
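
To make the ACL finding concrete, the sketch below shows a deny-by-default check built on MQTT topic-filter matching (`+` matches one level, `#` matches the remainder). The rule format, client names, and topic layout are hypothetical; a real broker expresses the same idea in its own ACL configuration.

```python
def topic_matches(topic_filter, topic):
    """MQTT topic filter match: '+' is one level, '#' is the remaining levels.

    Simplified: the special handling of topics starting with '$' is omitted.
    """
    f_levels = topic_filter.split("/")
    t_levels = topic.split("/")
    for i, f in enumerate(f_levels):
        if f == "#":
            return i == len(f_levels) - 1  # '#' is only valid as the last level
        if i >= len(t_levels):
            return False
        if f != "+" and f != t_levels[i]:
            return False
    return len(f_levels) == len(t_levels)

def allowed(acl, client_id, action, topic):
    """Deny by default; allow only if a rule for this client and action matches.

    `action` is "write" (publishing) or "read" (delivery to a subscriber).
    """
    for rule_action, topic_filter in acl.get(client_id, []):
        if rule_action == action and topic_matches(topic_filter, topic):
            return True
    return False

# Hypothetical least-privilege rules: a device writes only under its own
# namespace; the gateway reads telemetry from every device on the site.
ACL = {
    "dev-42": [("write", "site1/dev-42/telemetry/#")],
    "gateway": [("read", "site1/+/telemetry/#")],
}
```

With these rules, `dev-42` cannot publish into another device's namespace, which is the injection path the finding describes.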

7. Reliability & Observability Assessment

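Message delivery is reliable under normal conditions, but offline buffering is incomplete. MQTT QoS 1 guarantees at-least-once delivery, so retries and replays can send the same message more than once. There is currently no idempotency mechanism at the cloud API layer, which increases the risk of duplicate ingestion.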
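
One common way to make ingestion idempotent is to key each message on a stable identifier and ignore repeats. The sketch below uses SQLite and a composite key of `gateway_id` and `message_id`; the schema and names are assumptions for illustration, and the actual cloud API may use a different store and key.

```python
import sqlite3

def open_ingest_db(path=":memory:"):
    """Create a telemetry table whose primary key doubles as the dedup key."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS telemetry ("
        " gateway_id TEXT NOT NULL,"
        " message_id INTEGER NOT NULL,"
        " payload TEXT NOT NULL,"
        " PRIMARY KEY (gateway_id, message_id))"
    )
    return conn

def ingest(conn, gateway_id, message_id, payload):
    """Store a message once; a repeated key is accepted but not stored again.

    Returns True if the row was newly stored, False if it was a duplicate.
    """
    with conn:
        cur = conn.execute(
            "INSERT OR IGNORE INTO telemetry (gateway_id, message_id, payload)"
            " VALUES (?, ?, ?)",
            (gateway_id, message_id, payload),
        )
    return cur.rowcount == 1
```

Because a duplicate is acknowledged rather than rejected, the gateway can safely replay its buffer after an interrupted upload without inflating the dataset.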

Observability Overview:
Metrics are incomplete. Gateway uptime is monitored, but end-to-end latency and device drop rates are not. Trace coverage is limited.

Area	Status	Notes
Durable delivery	Partial	QoS1 ok; WAL replay incomplete
Local caching	Partial	Limited offline queue behavior
Health checks	Yes	Basic liveness/readiness only
Tracing	Partial	OTel basics, missing correlation
Prometheus metrics	Partial	Throughput missing
Log aggregation	Partial	Gateway logs local only

Reliability score: 2/5
Observability score: 1/5

8. 30/60/90 Day Improvement Roadmap

30 Days (High Priority - Security & Baseline Reliability)
Focus: Close the most critical security gaps and establish baseline visibility.

Enable TLS for device -> gateway and gateway -> cloud
Introduce MQTT ACLs, topic namespaces, and least-privilege policies
Implement certificate rotation for gateway + MQTT broker
Add baseline Prometheus metrics
- Ingress throughput
- Drop rate
- Queue/backlog size
- End-to-end latency (gateway -> cloud)

Outcome: Secure communication, controlled message flows, and first-level visibility.
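
The baseline metrics in the 30-day list could be exposed in the Prometheus text exposition format roughly as follows. This is a hand-rolled sketch to show the shape of the output; the metric names are hypothetical, and a real gateway would normally use a Prometheus client library rather than formatting strings.

```python
def render_metrics(ingress_total, dropped_total, backlog, latency_sum, latency_count):
    """Render counters, a gauge, and a summary in Prometheus text format."""
    lines = [
        "# HELP gateway_ingress_messages_total Messages accepted from devices.",
        "# TYPE gateway_ingress_messages_total counter",
        f"gateway_ingress_messages_total {ingress_total}",
        "# HELP gateway_dropped_messages_total Messages lost before persistence.",
        "# TYPE gateway_dropped_messages_total counter",
        f"gateway_dropped_messages_total {dropped_total}",
        "# HELP gateway_backlog_messages Messages buffered and not yet delivered.",
        "# TYPE gateway_backlog_messages gauge",
        f"gateway_backlog_messages {backlog}",
        "# HELP gateway_delivery_latency_seconds Time from ingest to cloud ack.",
        "# TYPE gateway_delivery_latency_seconds summary",
        f"gateway_delivery_latency_seconds_sum {latency_sum}",
        f"gateway_delivery_latency_seconds_count {latency_count}",
    ]
    return "\n".join(lines) + "\n"

sample = render_metrics(
    ingress_total=1200, dropped_total=3, backlog=42,
    latency_sum=18.5, latency_count=1197,
)
```

Throughput and drop rate are then derived at query time from the two counters, for example with `rate(gateway_dropped_messages_total[5m]) / rate(gateway_ingress_messages_total[5m])`, so the gateway only needs to export monotonically increasing totals.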

60 Days (Medium Priority - Observability & Edge Intelligence)
Focus: Add deeper system insight, diagnostics, and fleet-level awareness.

Implement full OpenTelemetry trace propagation across ingest -> WAL -> cloud sink
Add device health monitoring service
- Online/offline status
- Last-seen timestamps
- Gateway health
Introduce telemetry schema governance
- Version tagging
- Validation rules
- Backward/forward compatibility checks

Outcome: End-to-end traceability, system-level health monitoring, and controlled data evolution.

90 Days (Low Priority - Lifecycle, Recovery & Production Hardening)
Focus: Build long-term stability and secure lifecycle management.

Add automated firmware signing + integration into CI/CD
Design and test a disaster-recovery workflow
- Cold start
- WAL corruption
- Cloud downtime
- Connectivity churn
Implement long-term storage tier and data retention policies
- Local WAL archival
- Cloud bulk storage
- Compliance-oriented retention windows

Outcome: Production-grade device lifecycle, structured recovery processes, and scalable data management.
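
One step of the disaster-recovery workflow, handling a corrupted local buffer at cold start, might look like the sketch below. It uses SQLite's `PRAGMA integrity_check` and sets a damaged file aside rather than deleting it. The quarantine naming scheme and the decision to start with an empty buffer are assumptions; the right recovery policy depends on how much data loss is acceptable.

```python
import os
import sqlite3
import time

def open_or_quarantine(path):
    """Open the buffer; if it fails the integrity check, set it aside.

    Returns (connection, quarantined_path). quarantined_path is None when
    the existing database was healthy.
    """
    conn = sqlite3.connect(path)
    try:
        ok = conn.execute("PRAGMA integrity_check").fetchone()[0] == "ok"
    except sqlite3.DatabaseError:
        ok = False  # e.g. the file is not a valid database at all
    if ok:
        return conn, None
    conn.close()
    quarantined = f"{path}.corrupt-{int(time.time())}"
    # Move the main file and any WAL/shared-memory sidecar files together
    # so they can be inspected or salvaged later.
    for suffix in ("", "-wal", "-shm"):
        if os.path.exists(path + suffix):
            os.replace(path + suffix, quarantined + suffix)
    return sqlite3.connect(path), quarantined
```

Keeping the damaged file makes it possible to attempt a later salvage of undelivered messages, and the returned path gives the caller something to log and alert on.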

9. Proposed Next Steps

We recommend a 2-week Reliability & Security Sprint focused on implementing TLS, certificate rotation, MQTT ACLs, and baseline observability. This sprint includes:

Hands-on implementation
Validation tests
Updated diagrams and documentation
Performance and failure-mode testing

10. Recommended Implementation Sprint - Deliverables

Hardened gateway configuration
Updated broker ACLs
Certificate rotation scripts & lifecycle documentation
New Prometheus dashboards
End-to-end trace map
Verification tests (drop-rate, latency, failover)

11. Result & Impact

The audit provides a clear path toward a production-grade, secure, and observable IoT Edge Gateway architecture. Critical gaps were identified early, enabling Combotto to harden the system before scaling to real customer deployments. With the recommended improvements, the gateway will be well-positioned for use in industrial IoT, telecom, and energy-sector environments where reliability and security are essential.

How engagements usually move

References should make the Audit to Sprint path easier to understand.

1. Audit the system under pressure

Baseline the selected assets, message paths, and operational risks with evidence leadership can act on.

2. Run a focused Sprint on the highest-impact findings

Fix the security, reliability, or telemetry gaps that are most likely to create downtime, review friction, or expensive rework.

3. Keep posture from drifting

Use a light retainer rhythm when the architecture is changing or customer pressure keeps moving.

Need this kind of evidence for your own IoT system?

Send the system slice you want reviewed and what is creating urgency. I’ll reply with a focused recommendation on audit scope, expected outputs, and whether a Sprint should follow.

Fastest direct route: +45 22 39 34 91 or tb@combotto.io.