HubbleStack Hubble — Cloud Security Compliance
Contributor to HubbleStack Hubble — an open-source security compliance framework deployed on cloud Linux machines for profile-based auditing, real-time security event detection, and centralized reporting. Re-architected the security rules module, removed the SaltStack dependency, introduced engineering best practices, and built the data pipeline for centralized log collection via Splunk and Databricks.
Category
Enterprise
Year
2022
Status
Shipped
What Is HubbleStack Hubble
HubbleStack Hubble is an open-source, modular security compliance framework built in Python. It installs as a lightweight agent on cloud Linux machines and provides on-demand, profile-based auditing, real-time security event detection, alerting, and reporting.
Hubble’s core modules:
- Nova (Audit) — Profile-based security auditing. Runs configurable audit profiles against the host and reports compliance status. Checks things like file permissions, package versions, service states, kernel parameters, and CIS benchmark controls.
- Nebula (Osquery) — Leverages osquery to gather system-level statistics and telemetry — running processes, open ports, installed packages, user accounts, network connections, scheduled jobs, and more.
- Pulsar (FIM) — File integrity monitoring. Watches critical system files and directories for unauthorized changes and raises real-time alerts.
- Quasar (Reporting) — Collects results from Nova, Nebula, and Pulsar and ships them to centralized logging destinations (Splunk, Logstash, etc.) via configurable returners.
The framework was originally built on top of SaltStack, using Salt’s module system, grains, and file client for configuration management and remote execution. Hubble agents run as a daemon on each machine, executing audits and scans on a configurable schedule, and pushing results to a centralized collection point.
My Role
I am a contributor to this open-source project — not the original creator. I worked on Hubble as part of a cloud security team, where it was deployed across a large fleet of Linux machines in AWS. My contributions focused on re-architecting a core module, removing technical debt, improving engineering quality, and building the data pipeline that made Hubble’s output actionable at scale.
What I Did
Re-Architected the Security Rules Module:
- Hubble’s audit module (Nova) allowed teams to write security rules — compliance checks that run against hosts and report pass/fail status. The original implementation tightly coupled rule definitions with their execution logic, making it difficult to add new rule types without modifying core code
- Re-architected the module to cleanly separate rule definitions (declarative profiles in YAML) from the execution engine. New rules could be added by writing a profile without touching the audit engine code
- Introduced a pluggable comparator system — rule authors define what to check (a file permission, a config value, a running process) and how to compare (equals, greater than, regex match, contains). The engine handles execution, error handling, and result formatting
- This made it significantly easier for security teams to write and maintain rules without needing deep knowledge of Hubble’s internals
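The separation described above can be sketched in a few lines. This is an illustrative registry, not Hubble's actual API — the function and key names are assumptions — but it shows the shape of the design: profiles declare *what* to compare and *how*, and the engine dispatches to a registered comparison function.

```python
# Minimal sketch of a pluggable comparator registry (illustrative names,
# not Hubble's real internals).
import re

COMPARATORS = {}

def comparator(name):
    """Decorator registering a comparison function under a profile-facing name."""
    def register(fn):
        COMPARATORS[name] = fn
        return fn
    return register

@comparator("equals")
def _equals(actual, expected):
    return actual == expected

@comparator("regex_match")
def _regex_match(actual, expected):
    return re.search(expected, str(actual)) is not None

def evaluate(check, actual):
    """Apply one declarative check to a value collected from the host."""
    passed = COMPARATORS[check["comparator"]](actual, check["expected"])
    return {"check": check["name"],
            "status": "pass" if passed else "fail",
            "expected": check["expected"], "actual": actual}
```

Under this scheme a check parsed from a YAML profile is plain data — e.g. `{"name": "shadow_perms", "comparator": "equals", "expected": "0640"}` — and adding a new rule type means registering one function rather than editing the engine.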
Removed SaltStack Dependency:
- Hubble was originally built on SaltStack, inheriting Salt’s module loader, grains system, file client, and configuration management. This created a heavy dependency — Salt is a large framework, and Hubble only used a fraction of it
- Worked on removing the Salt dependency from key modules, replacing Salt’s loader with Hubble’s own lightweight module loading system
- This reduced the installation footprint, eliminated Salt-related version conflicts and security vulnerabilities, and simplified deployment. Hubble could now run as a standalone agent without requiring a Salt master or minion infrastructure
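A lightweight module loader of the kind that can stand in for Salt's `__salt__` loader is straightforward with the standard library. This is an assumed design for illustration, not Hubble's actual loader: scan a package, import each module, and expose its public functions under `"module.function"` keys.

```python
# Sketch of a minimal replacement for Salt's module loader
# (function and package names are assumptions, not Hubble's real code).
import importlib
import pkgutil

def load_modules(package):
    """Map 'modname.funcname' -> callable for every public function in a package."""
    funcs = {}
    pkg = importlib.import_module(package)
    for info in pkgutil.iter_modules(pkg.__path__):
        mod = importlib.import_module(f"{package}.{info.name}")
        for attr in dir(mod):          # coarse scan; fine for a sketch
            obj = getattr(mod, attr)
            if callable(obj) and not attr.startswith("_"):
                funcs[f"{info.name}.{attr}"] = obj
    return funcs
```

The appeal of this approach is that modules stay plain Python files with no framework base class, so the agent needs no Salt master or minion machinery to discover them.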
Engineering Best Practices:
- Introduced unit testing for the audit modules — the project had minimal test coverage when I started contributing. Wrote tests for the rule evaluation engine, comparators, and profile parsing logic
- Added proper error handling and logging throughout the modules I worked on — previously, failures in rule execution could silently skip checks without reporting them as errors
- Improved code structure — broke monolithic modules into smaller, testable components with clear interfaces
- Added pre-commit hooks and linting configuration to catch issues before they reached code review
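The comparator logic lends itself to small, fast unit tests of the following shape. The functions here are defined inline so the sketch is self-contained; the real tests exercised Hubble's own modules.

```python
# Illustrative pytest-style tests for comparator edge cases
# (inline hubble-like comparators, not the project's actual code).
import re

def regex_match(actual, expected):
    # re.MULTILINE so a pattern can anchor on any line of command output
    return re.search(expected, str(actual), re.MULTILINE) is not None

def numeric_gte(actual, expected):
    # Coerce mixed-type values ("10" vs 10) before comparing
    return float(actual) >= float(expected)

def test_regex_matches_any_line_of_multiline_output():
    output = "PermitRootLogin no\nPasswordAuthentication no"
    assert regex_match(output, r"^PasswordAuthentication no$")

def test_numeric_compare_handles_string_values():
    assert numeric_gte("10", 9)
    assert not numeric_gte("8", 9)
```

Because comparators are pure functions, these tests need no mocking and run in milliseconds, which made them a natural first target for raising coverage.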
Bug Fixes:
- Fixed issues in the audit execution path where certain rule types would fail silently on specific OS configurations
- Fixed edge cases in the comparator logic — regex comparisons on multi-line output, numeric comparisons with mixed-type values, and handling of missing/null data from system queries
- Fixed issues with the daemon’s scheduling system where audit runs could overlap if a previous run took longer than expected
Centralized Log Collection & Data Pipeline
Hubble agents across the fleet produce a massive volume of audit results, compliance reports, osquery telemetry, and file integrity alerts. Making this data useful required a robust collection and analysis pipeline.
Log Collection Architecture:
- Each Hubble agent ships results to a centralized log collector using Hubble’s returner system
- Results are structured JSON — every audit check includes the host identity, timestamp, profile name, check name, pass/fail status, actual vs expected values, and severity
- The log collector aggregates results from all agents across the fleet
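A single audit result record has roughly this shape (field names and values here are illustrative, not Hubble's exact schema):

```json
{
  "host": "ip-10-0-12-34.ec2.internal",
  "timestamp": "2022-06-01T12:00:00Z",
  "profile": "cis_ubuntu_20_04",
  "check": "ensure_permissions_on_etc_shadow",
  "status": "fail",
  "expected": "0640",
  "actual": "0644",
  "severity": "high"
}
```

Keeping every record self-describing like this is what lets downstream consumers (Splunk searches, Databricks jobs) slice by host, profile, check, or severity without joining against agent-side state.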
Splunk Integration:
- Hubble audit results and security events are ingested into Splunk for real-time monitoring, alerting, and dashboarding
- Built Splunk dashboards for: fleet-wide compliance posture (percentage of hosts passing each CIS benchmark control), trending compliance over time, hosts with the most failures, and new failures (regressions)
- Configured Splunk alerts for critical security events — a host failing a high-severity check, file integrity violations on sensitive system files, unexpected processes or open ports detected by osquery
- Security analysts could drill down from a fleet-wide view to a specific host’s full audit history
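On the shipping side, delivering a result to Splunk's HTTP Event Collector (HEC) amounts to wrapping the record in HEC's envelope and POSTing it. The URL, token, index, and sourcetype below are placeholders — Hubble's real Splunk returner adds batching, retries, and richer configuration.

```python
# Sketch of a Splunk HEC returner (placeholder endpoint/token/index;
# not Hubble's actual returner code).
import json
import urllib.request

def build_hec_payload(event, index="hubble"):
    """Wrap one structured audit result in Splunk's HEC envelope."""
    return {"index": index, "sourcetype": "hubble:audit", "event": event}

def splunk_return(event, hec_url, token):
    """POST a single event, e.g. to https://splunk:8088/services/collector/event."""
    data = json.dumps(build_hec_payload(event)).encode("utf-8")
    req = urllib.request.Request(
        hec_url, data=data,
        headers={"Authorization": f"Splunk {token}",
                 "Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:   # network call
        return resp.status
```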
Databricks Pipeline:
- For deeper analysis and long-term trend tracking, Hubble data flowed into Databricks via a data pipeline
- Raw audit results stored in a data lake for historical analysis — compliance trends over months, correlation between configuration changes and security incidents, identification of systemic issues across the fleet
- Databricks notebooks for security team analysis — which security controls fail most often, which teams/regions have the lowest compliance, and what is the average time-to-remediation after a failure is detected
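The kind of aggregation those notebooks run can be sketched in plain Python. In practice this was Spark over the data lake; the records and field names here are illustrative.

```python
# Plain-Python sketch of a notebook-style aggregation:
# per-check failure rate across the fleet (illustrative schema).
from collections import Counter

def failure_rates(results):
    """Map each check name to the fraction of results that failed."""
    totals, fails = Counter(), Counter()
    for r in results:
        totals[r["check"]] += 1
        if r["status"] == "fail":
            fails[r["check"]] += 1
    return {check: fails[check] / totals[check] for check in totals}
```

Ranking the output of a query like this is what surfaces "which controls fail most often" — the systemic issues worth fixing at the image or configuration-management layer rather than host by host.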
Technical Challenges
- Salt dependency removal complexity — Salt’s module loader was deeply intertwined with Hubble’s code. Modules used Salt grains for host identification, Salt’s file client for profile distribution, and Salt’s configuration system for settings. Removing Salt required building lightweight replacements for each of these while maintaining backward compatibility with existing audit profiles. This was an incremental process — replacing one Salt dependency at a time, testing thoroughly at each step.
- Rule engine extensibility — The re-architected rule engine needed to support existing profiles without breaking changes while also being extensible for new rule types. Designed the comparator system to be backward-compatible — old-style rules still worked, but new rules could use the cleaner declarative syntax. Migration was gradual, not a flag-day switch.
- Testing a system-level tool — Hubble checks system state (file permissions, running services, kernel parameters). Unit testing these checks requires mocking system calls extensively. Built a test harness that could simulate various OS states so rule evaluation could be tested without running on actual hosts. Integration tests ran in Docker containers representing different OS configurations.
- Log volume at scale — Hundreds of machines each running dozens of audit checks on a schedule generate enormous log volume. The Splunk ingestion pipeline needed careful tuning — index sizing, source type configuration, and retention policies to keep costs manageable while retaining enough history for trend analysis. Databricks handled the long-term archival and heavy analytical queries that would be too expensive to run in Splunk.
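Simulating OS state boils down to patching the system calls a check touches. The check and expected mode below are illustrative, not Hubble's actual rule code, but they show how a permission check can be unit-tested with no real host.

```python
# Sketch of mocking system state for a file-permission check
# (illustrative check function, not Hubble's real code).
import os
import stat
from unittest import mock

def check_mode(path, expected_octal):
    """Pass if the file's permission bits match the expected octal string."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return format(mode, "04o") == expected_octal

def test_shadow_permissions_without_a_real_host():
    fake = mock.Mock(st_mode=0o100640)   # regular file with mode 0640
    with mock.patch("os.stat", return_value=fake):
        assert check_mode("/etc/shadow", "0640")
        assert not check_mode("/etc/shadow", "0600")
```

A harness built from fixtures like this can replay many OS states quickly; the Docker-based integration tests then confirm behavior against real filesystems.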
Architecture
- Hubble Agent — Python daemon installed on each Linux machine. Runs audit profiles (Nova), osquery queries (Nebula), and file integrity monitoring (Pulsar) on configurable schedules. Ships results via returners.
- Audit Engine (Nova) — Re-architected module with pluggable comparators. Reads YAML audit profiles, executes checks against the host, and produces structured JSON results with pass/fail status and evidence.
- Osquery Integration (Nebula) — Executes osquery packs to gather system telemetry — processes, ports, packages, users, network connections.
- File Integrity (Pulsar) — Monitors critical files and directories using inotify. Raises real-time alerts on unauthorized changes.
- Log Collector — Aggregates structured JSON results from all agents across the fleet.
- Splunk — Real-time ingestion of audit results and security events. Dashboards for compliance posture, trending, and per-host drill-down. Alerts for critical violations.
- Databricks — Data pipeline for long-term storage and deep analysis. Historical compliance trends, fleet-wide pattern detection, and analytical notebooks for the security team.
- Infrastructure — Agents deployed across AWS Linux fleet. Centralized collection and pipeline infrastructure on AWS.
Results & Impact
- Re-architected rule engine — security teams could write new audit rules in YAML without modifying Hubble’s core code, dramatically increasing the speed of new compliance check development
- Salt dependency removal — lighter installation footprint, fewer security vulnerabilities from transitive dependencies, and simpler deployment without Salt infrastructure
- Unit test coverage — introduced testing discipline to a project that had minimal coverage, catching regressions and enabling confident refactoring
- Fleet-wide compliance visibility — Splunk dashboards gave the security team real-time, fleet-wide compliance posture for the first time, replacing manual spot-checks with continuous monitoring
- Long-term trend analysis — Databricks pipeline enabled historical compliance analysis and identification of systemic security patterns across hundreds of machines
- Open-source contributor — all contributions merged upstream into the HubbleStack project, benefiting the broader community
Stack Deep Dive
- Python for the Hubble agent and all modules — audit engine, comparators, returners, and daemon
- SaltStack (partially removed) — original framework dependency, incrementally replaced with lightweight alternatives
- Osquery for system-level telemetry collection — processes, ports, packages, network state
- Splunk for real-time log ingestion, dashboarding, alerting, and compliance monitoring
- Databricks for long-term data pipeline, historical analysis, and security analytics notebooks
- AWS for fleet infrastructure — Linux machines running Hubble agents across multiple accounts and regions
- YAML for declarative audit profile definitions — the interface between security teams and the audit engine
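A profile in the declarative style described above might look like this — rule, module, and comparator names here are hypothetical, not a real Hubble profile:

```yaml
# Hypothetical audit profile sketch (illustrative names only)
cis_shadow_permissions:
  description: Ensure permissions on /etc/shadow are configured
  severity: high
  module: stat
  items:
    - name: /etc/shadow
      comparator:
        type: file_permission
        match: "0640"
```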
Interested in working together?
Get in Touch →