1. Background
Modern IoT systems often run in environments where devices must operate reliably even when the network is unstable, security requirements are strict, and large volumes of sensor data need to be processed safely. To demonstrate Combotto’s capabilities within secure and reliable IoT infrastructure, we created a reference IoT Edge Gateway built on modern technologies designed for safety, robustness, and transparency.
The gateway functions as the “local brain” at the edge of an IoT system: it receives data from devices at the edge, ensures it is stored safely, and forwards it to the cloud when the network is available - all while enforcing strict security and providing full operational visibility.
Below is a context diagram of the gateway:

To achieve this, the gateway is built on:
- Rust, a modern programming language designed to eliminate entire classes of bugs and security vulnerabilities.
- SQLite with write-ahead logging (WAL), which protects against data loss during power loss or network outages by storing sensor data locally until it can be delivered safely.
- MQTT, the industry-standard protocol for IoT devices, enabling lightweight and reliable communication across sensors, machines, and cloud platforms.
- TLS/mTLS, securing communication and ensuring only trusted devices and servers can exchange data.
- OpenTelemetry + Tempo tracing, giving full, real-time insight into how data flows through the gateway and where performance bottlenecks occur.
2. Scope of the Audit
To prepare the gateway for upcoming customer requirements and future fleet scaling, the audit focused on identifying potential security gaps, reliability risks, and weaknesses in observability. The analysis followed Combotto’s structured IoT Audit Framework and covered both the codebase and the operational maturity of the system.
The scope included:
- Architecture Review (Edge Device -> Gateway -> Cloud -> UI): Evaluation of system boundaries, data flow, trust zones, and failure isolation between components.
- MQTT Pipeline Analysis (Ingress, QoS handling, topic isolation, error paths): Assessment of MQTT ingress, message validation, topic hygiene, and robustness under network instability and high-volume ingestion.
- Security Posture & TLS/mTLS (High-risk zones in IoT): Inspection of cryptographic configuration, certificate lifecycle, identity trust model, and vulnerabilities common to IoT deployments.
- Storage & Persistence Layer (WAL durability, replay logic): Analysis of SQLite WAL mode, durability guarantees under power loss, replay logic, and offline-first behaviour.
- Observability (metrics, logs, traces, failure visibility): Review of metrics, logging strategy, trace coverage, missing signals, and bottleneck detection.
- Deployment & Update Strategy: Evaluation of provisioning, versioning, OTA strategy, secure update flow, and operational risks during rollout.
3. Key Strengths Observed
- Strong foundation built on Rust ensures memory safety and predictable execution.
- WAL-based local buffering enables reliable offline-first behavior.
- Clean separation between ingest -> persistence -> cloud sink.
- Observability instrumentation (metrics + traces) introduced early.
- Simple, composable architecture that can scale horizontally.
- Schema versioning for telemetry.
4. Critical Findings
- Missing Certificate Rotation for Gateway & MQTT Broker: Long-lived certificates introduce long-term security exposure.
- Lack of Topic Governance & ACL Restrictions: Insufficient MQTT access control increases the risk of data injection or privilege escalation.
- Limited Observability: Missing tracing, inconsistent metrics, and no end-to-end latency visibility.
5. Architecture Overview
Data from edge devices is ingested via MQTT (QoS1), persisted locally in SQLite (WAL mode) for durability, and forwarded to the cloud ingestion API. The gateway’s reliability depends on robust local buffering, connection management, and cloud retry behavior.
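The ingest -> persist -> forward flow described above can be illustrated with a minimal store-and-forward sketch. It uses Python's standard `sqlite3` module rather than the gateway's Rust code. The `outbox` table, its columns, and the `send` callback are hypothetical names chosen for illustration. Setting `synchronous=FULL` is also an assumption: it trades write throughput for durability of each commit in WAL mode.

```python
import sqlite3


def open_buffer(path):
    """Open (or create) the local buffer database in WAL mode."""
    conn = sqlite3.connect(path)
    # WAL mode only applies to file-backed databases, not ":memory:".
    conn.execute("PRAGMA journal_mode=WAL")
    # Assumption: sync the WAL on every commit so a commit survives power loss.
    conn.execute("PRAGMA synchronous=FULL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS outbox ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " topic TEXT NOT NULL,"
        " payload BLOB NOT NULL,"
        " sent INTEGER NOT NULL DEFAULT 0)"
    )
    conn.commit()
    return conn


def enqueue(conn, topic, payload):
    """Persist a message; call this before acknowledging it to the device."""
    with conn:  # commits on success, rolls back on exception
        conn.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)", (topic, payload)
        )


def forward_pending(conn, send, batch_size=100):
    """Deliver unsent rows in insertion order; stop at the first failure.

    `send(topic, payload)` is a hypothetical callback that returns True
    when the cloud accepted the message. Returns the number delivered.
    """
    rows = conn.execute(
        "SELECT id, topic, payload FROM outbox WHERE sent = 0 ORDER BY id LIMIT ?",
        (batch_size,),
    ).fetchall()
    delivered = 0
    for row_id, topic, payload in rows:
        if not send(topic, payload):
            break  # keep the row for the next retry cycle
        with conn:
            conn.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (row_id,))
        delivered += 1
    return delivered
```

Because a row is marked as sent only after `send` succeeds, a crash between delivery and the update causes a resend on restart. This gives at-least-once delivery, so the receiving side has to tolerate duplicates.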
6. Security Assessment
- Missing Certificate Rotation
  • Why it matters: Long-lived certs create persistent attack windows.
  • Evidence: Certificates are months old, no rotation policy in place.
  • Impact: Gateway compromise -> full device-fleet exposure.
  • Severity: High
  • Effort: Medium (1–2 weeks to automate rotation)
- MQTT ACLs Not Enforced
  • Why it matters: Any device can publish/subscribe across topics.
  • Evidence: No ACL file, wildcard permissions in broker config.
  • Impact: Privilege escalation; rogue device may inject or manipulate telemetry.
  • Severity: High
  • Effort: Low
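To make the ACL finding concrete, the sketch below shows what a least-privilege publish check could look like. It implements MQTT topic-filter matching (`+` for one level, `#` for the remaining levels) and applies it to a per-device namespace. The `devices/<id>/...` topic layout and the function names are assumptions for illustration, not the gateway's actual scheme. In practice the broker's own ACL mechanism would enforce the policy, and this sketch does not handle the special rules for topics beginning with `$`.

```python
def topic_matches(topic_filter, topic):
    """Return True if an MQTT topic name matches a topic filter.

    '+' matches exactly one level; '#' matches all remaining levels
    (including none) and is only honoured as the last filter level.
    """
    f_levels = topic_filter.split("/")
    t_levels = topic.split("/")
    for i, f in enumerate(f_levels):
        if f == "#":
            return i == len(f_levels) - 1
        if i >= len(t_levels):
            return False
        if f != "+" and f != t_levels[i]:
            return False
    return len(f_levels) == len(t_levels)


def allowed_publish_filters(device_id):
    """Hypothetical policy: a device may only publish under its own namespace."""
    return [
        f"devices/{device_id}/telemetry/#",
        f"devices/{device_id}/status",
    ]


def may_publish(device_id, topic):
    """Check a publish request against the device's allowed filters."""
    # Wildcard characters are not valid in a topic name being published to.
    if "+" in topic or "#" in topic:
        return False
    return any(topic_matches(f, topic) for f in allowed_publish_filters(device_id))
```

With this policy, `may_publish("gw-01", "devices/gw-02/telemetry/temp")` is rejected because the topic lies outside the device's own namespace. The sketch assumes `device_id` comes from a trusted source such as the client certificate identity and contains no `/` or wildcard characters.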
7. Reliability & Observability Assessment
Message delivery is reliable under normal conditions, but offline buffering is incomplete. There is currently no idempotency mechanism at the cloud API layer. Because QoS1 delivery and WAL replay are both at-least-once, any retransmitted message can be ingested twice.
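One common way to close the duplicate-ingestion gap is an idempotency key derived from fields that stay the same when a message is resent. The sketch below assumes each message carries a device ID and a per-device sequence number (`seq`). Both fields, and the in-memory `IngestStore` class, are hypothetical stand-ins for the real cloud ingestion API and its storage.

```python
import hashlib


def idempotency_key(device_id, seq, payload):
    """Derive a stable key: a replayed message yields the same key."""
    h = hashlib.sha256()
    h.update(device_id.encode("utf-8"))
    h.update(b"\x00")  # separator so field boundaries stay unambiguous
    h.update(str(seq).encode("ascii"))
    h.update(b"\x00")
    h.update(payload)
    return h.hexdigest()


class IngestStore:
    """In-memory stand-in for the cloud ingestion API's storage."""

    def __init__(self):
        self._seen = set()
        self.records = []

    def ingest(self, device_id, seq, payload):
        """Store a message once; return False if it was a duplicate."""
        key = idempotency_key(device_id, seq, payload)
        if key in self._seen:
            return False  # duplicate: safe to acknowledge, nothing stored
        self._seen.add(key)
        self.records.append((device_id, seq, payload))
        return True
```

A production version would keep the seen keys in durable storage with a retention window, for example a unique index on the key column, rather than an unbounded in-memory set.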
Observability Overview:
Metrics are incomplete. Gateway uptime is monitored, but end-to-end latency and device drop rates are not. Trace coverage is limited.
| Area | Status | Notes |
|---|---|---|
| Durable delivery | Partial | QoS1 ok; WAL replay incomplete |
| Local caching | Partial | Limited offline queue behavior |
| Health checks | Yes | Basic liveness/readiness only |
| Tracing | Partial | OTel basics, missing correlation |
| Prometheus metrics | Partial | Throughput missing |
| Log aggregation | Partial | Gateway logs local only |
Reliability score: 2/5. Observability score: 1/5.
8. 30/60/90 Day Improvement Roadmap
30 Days (High Priority - Security & Baseline Reliability)
Focus: Close the most critical security gaps and establish baseline visibility.
- Enable TLS for device -> gateway and gateway -> cloud
- Introduce MQTT ACLs, topic namespaces, and least-privilege policies
- Implement certificate rotation for gateway + MQTT broker
- Add baseline Prometheus metrics
  - Ingress throughput
  - Drop rate
  - Queue/backlog size
  - End-to-end latency (gateway -> cloud)
Outcome: Secure communication, controlled message flows, and first-level visibility.
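For the certificate rotation item above, an automated job needs a rule for when a certificate is due for renewal. The sketch below renews once a fixed fraction of the validity period has elapsed. The two-thirds threshold and the function name are assumptions for illustration. The validity timestamps would come from the certificate itself, which this sketch does not parse.

```python
from datetime import datetime, timedelta, timezone


def rotation_due(not_before, not_after, now, renew_fraction=2 / 3):
    """Return True once `renew_fraction` of the validity period has elapsed.

    All arguments are timezone-aware datetimes; `not_before` and
    `not_after` are the certificate's validity bounds.
    """
    lifetime = not_after - not_before
    renew_at = not_before + lifetime * renew_fraction
    return now >= renew_at


# Example: a 90-day certificate is flagged once about 60 days have passed.
issued = datetime(2025, 1, 1, tzinfo=timezone.utc)
expires = issued + timedelta(days=90)
due = rotation_due(issued, expires, issued + timedelta(days=61))
```

Renewing well before expiry leaves time to retry a failed rotation while the old certificate is still valid.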
60 Days (Medium Priority - Observability & Edge Intelligence)
Focus: Add deeper system insight, diagnostics, and fleet-level awareness.
- Implement full OpenTelemetry trace propagation across ingest -> WAL -> cloud sink
- Add device health monitoring service
  - Online/offline status
  - Last-seen timestamps
  - Gateway health
- Introduce telemetry schema governance
  - Version tagging
  - Validation rules
  - Backward/forward compatibility checks
Outcome: End-to-end traceability, system-level health monitoring, and controlled data evolution.
90 Days (Low Priority - Lifecycle, Recovery & Production Hardening)
Focus: Build long-term stability and secure lifecycle management.
- Add automated firmware signing + integration into CI/CD
- Design and test a disaster-recovery workflow
  - Cold start
  - WAL corruption
  - Cloud downtime
  - Connectivity churn
- Implement long-term storage tier and data retention policies
  - Local WAL archival
  - Cloud bulk storage
  - Compliance-oriented retention windows
Outcome: Production-grade device lifecycle, structured recovery processes, and scalable data management.
9. Proposed Next Steps
We recommend a 2-week Reliability & Security Sprint focused on implementing TLS, certificate rotation, MQTT ACLs, and baseline observability. This sprint includes:
- Hands-on implementation
- Validation tests
- Updated diagrams and documentation
- Performance and failure-mode testing
10. Recommended Implementation Sprint - Deliverables
- Hardened gateway configuration
- Updated broker ACLs
- Certificate rotation scripts & lifecycle documentation
- New Prometheus dashboards
- End-to-end trace map
- Verification tests (drop-rate, latency, failover)
11. Result & Impact
The audit provides a clear path toward a production-grade, secure, and observable IoT Edge Gateway architecture. Critical gaps were identified early, enabling Combotto to harden the system before scaling to real customer deployments. With the recommended improvements, the gateway will be well-positioned for use in industrial IoT, telecom, and energy-sector environments where reliability and security are essential.
