Architecture
S4E is built on a microservices architecture where each service is responsible for a specific domain of functionality. Services communicate through a combination of REST APIs and an event-driven message bus, enabling the platform to scale horizontally, process scans in parallel, and remain resilient under heavy workloads.
This page provides a technical overview of the major services and data stores that make up the S4E platform.
Architecture Overview
At a high level, the S4E platform consists of:
- A frontend application that users interact with through the browser
- An API gateway that handles authentication, routing, and request orchestration
- A set of backend services responsible for data management, scan orchestration, job execution, and event processing
- Message queues for asynchronous, event-driven communication between services
- Data stores for persistent storage, caching, and document management
Core Services
s4e-core - API Gateway
The central entry point for all client interactions. s4e-core exposes the REST API consumed by the frontend, partner portal, CLI tools, and third-party integrations. It handles:
- User authentication and session management
- Request routing and response orchestration
- API key management and rate limiting
- Aggregating data from backend services to serve API responses
All external traffic flows through s4e-core, which then delegates to the appropriate backend services.
s4e-base - Database Service
The primary data access layer for the platform. s4e-base owns the PostgreSQL schema and provides a service interface for reading and writing structured data. It manages:
- All database migrations and schema evolution (using Alembic)
- CRUD operations for assets, findings, users, organizations, and configuration
- Complex queries, bulk operations, and data aggregation
- Transaction management and data integrity enforcement
Other services do not connect to PostgreSQL directly - they interact with data through s4e-base.
s4e-scan - Scan Manager
Responsible for the definition, configuration, and lifecycle management of security scans. s4e-scan:
- Stores scan definitions, categories, and service configurations
- Tracks scan execution state and results
- Coordinates with s4e-dispatcher to queue scan jobs for execution
- Provides APIs for creating, updating, and querying scans
s4e-crawler - Web Crawling Pipeline
A specialized service that performs web crawling and reconnaissance using industry-standard tools:
- Katana - A Go-based web crawler used for deep crawling of web applications, discovering pages, endpoints, and linked resources
- ffuf - A fast web fuzzer used for directory and file discovery (fuzzing)
The crawler pipeline processes tasks through multiple stages:
- Directory Fuzzing (ffuf) - Discovers hidden directories and files
- Deep Crawl (Katana) - Crawls discovered and known pages to map the full site structure
- API Document Parsing - Identifies and parses API documentation (OpenAPI/Swagger)
- URL Unification - Deduplicates and normalizes discovered URLs
- PII Parsing - Detects potential personally identifiable information exposure
- Enrichment - Augments discovered data with additional context
- Finisher - Consolidates results and publishes them for downstream consumption
Each stage publishes results to RabbitMQ, enabling parallel processing and pipeline resilience.
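The staged flow above can be sketched as a chain of functions, where each stage consumes the output of the previous one. This is a minimal in-memory illustration rather than S4E's implementation: the function names, the normalization rules in `unify_urls`, and the sample URLs are all assumptions, and the real pipeline hands results between stages through RabbitMQ instead of direct calls.

```python
from urllib.parse import urlsplit, urlunsplit


def unify_urls(urls):
    """Normalize and deduplicate URLs (hypothetical rules, for illustration)."""
    seen = set()
    unified = []
    for url in urls:
        parts = urlsplit(url)
        # Lowercase scheme and host, drop the fragment, strip a trailing slash.
        path = parts.path.rstrip("/") or "/"
        normalized = urlunsplit(
            (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
        )
        if normalized not in seen:
            seen.add(normalized)
            unified.append(normalized)
    return unified


def fake_fuzz(urls):
    """Hypothetical stand-in for the fuzzing stage; a real stage would run ffuf."""
    return urls + [u.rstrip("/") + "/admin" for u in urls]


def run_pipeline(seed_urls, stages):
    """Feed the output of each stage into the next one, in order."""
    data = seed_urls
    for stage in stages:
        data = stage(data)
    return data


discovered = run_pipeline(
    ["https://Example.com/", "https://example.com"],
    [fake_fuzz, unify_urls],
)
print(discovered)
```

Because each stage only depends on the data handed to it, a stage can be replaced or scaled independently, which is the property the queue-based design relies on.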
s4e-dispatcher - Job Dispatcher
Consumes scan jobs from the message queue and dispatches them to the appropriate executor or worker. s4e-dispatcher:
- Reads pending scan tasks from RabbitMQ queues
- Routes tasks to the correct scan executor based on scan type
- Monitors job execution status and handles retries on failure
- Manages concurrency to prevent worker overload
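The routing and retry behavior described above might look like the following sketch. It is an assumption-laden illustration, not the dispatcher's actual code: the task fields (`id`, `scan_type`, `attempts`), the `MAX_ATTEMPTS` limit, and the executor functions are hypothetical, an in-memory deque stands in for RabbitMQ queues, and concurrency control is left out for brevity.

```python
from collections import deque

MAX_ATTEMPTS = 3  # assumed retry limit, for illustration only


def dispatch_all(tasks, executors):
    """Route each task by scan type; retry failures, then dead-letter them."""
    pending = deque({"attempts": 0, **task} for task in tasks)
    completed, dead_letter = [], []
    while pending:
        task = pending.popleft()
        executor = executors.get(task["scan_type"])
        if executor is None:
            # No executor is registered for this scan type, so retrying cannot help.
            dead_letter.append(task)
            continue
        task["attempts"] += 1
        try:
            executor(task)
            completed.append(task)
        except Exception:
            if task["attempts"] < MAX_ATTEMPTS:
                pending.append(task)  # requeue for another attempt
            else:
                dead_letter.append(task)
    return completed, dead_letter


# Hypothetical executors used only to exercise the sketch.
def port_scan(task):
    return None  # always succeeds


def flaky_scan(task):
    if task["attempts"] < 2:
        raise RuntimeError("transient failure")


def broken_scan(task):
    raise RuntimeError("permanent failure")


completed, dead_letter = dispatch_all(
    [
        {"id": 1, "scan_type": "port"},
        {"id": 2, "scan_type": "flaky"},
        {"id": 3, "scan_type": "unknown"},
        {"id": 4, "scan_type": "broken"},
    ],
    {"port": port_scan, "flaky": flaky_scan, "broken": broken_scan},
)
```

In this run, the flaky task succeeds on its second attempt, while the task with an unknown scan type and the task that always fails end up in the dead-letter list.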
s4e-scheduler - Time-Based Scheduler
Triggers scan execution based on time-based schedules. s4e-scheduler:
- Manages cron-like schedules defined by users for recurring scans
- Publishes scan trigger events to RabbitMQ at the scheduled time
- Supports flexible scheduling intervals (hourly, daily, weekly, custom)
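As a rough sketch of interval-based scheduling, the snippet below computes the next run time for a schedule and selects the schedules that are due. The interval names mirror the list above, but the data layout (`id`, `last_run`, `interval`, `custom`) and the function names are assumptions; the actual service may store and evaluate schedules differently, for example with cron expressions.

```python
from datetime import datetime, timedelta

# Assumed mapping from interval names to durations.
INTERVALS = {
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
}


def next_run(last_run, interval, custom=None):
    """Return the next run time; "custom" requires an explicit positive timedelta."""
    if interval == "custom":
        if custom is None or custom <= timedelta(0):
            raise ValueError("custom interval requires a positive timedelta")
        return last_run + custom
    return last_run + INTERVALS[interval]


def due_schedules(schedules, now):
    """Return the ids of schedules whose next run time has arrived."""
    return [
        s["id"]
        for s in schedules
        if next_run(s["last_run"], s["interval"], s.get("custom")) <= now
    ]


now = datetime(2024, 1, 8, 12, 0)
schedules = [
    {"id": "a", "interval": "hourly", "last_run": datetime(2024, 1, 8, 11, 0)},
    {"id": "b", "interval": "daily", "last_run": datetime(2024, 1, 8, 0, 0)},
    {"id": "c", "interval": "weekly", "last_run": datetime(2024, 1, 1, 12, 0)},
    {
        "id": "d",
        "interval": "custom",
        "custom": timedelta(minutes=15),
        "last_run": datetime(2024, 1, 8, 11, 50),
    },
]
due = due_schedules(schedules, now)
```

A scheduler loop would publish one trigger message per due schedule and then record the new `last_run` value.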
s4e-trigger - Event Trigger Service
Handles event-driven triggering of scans and actions. Unlike the scheduler (which is time-based), the trigger service responds to external events and conditions:
- Webhook-initiated scan triggers
- Condition-based automation rules
- Integration event processing (e.g., new asset detected, finding status change)
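Condition-based automation can be pictured as a set of rules, each pairing an event type with a predicate and an action. The sketch below is illustrative only: the `Rule` structure, the event type strings, and the action names are hypothetical and are not taken from the S4E API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    event_type: str                    # event this rule listens for
    condition: Callable[[dict], bool]  # predicate evaluated against the event
    action: str                        # hypothetical action identifier


def match_rules(event, rules):
    """Return the actions of all rules that apply to the given event."""
    return [
        rule.action
        for rule in rules
        if rule.event_type == event["type"] and rule.condition(event)
    ]


rules = [
    Rule(
        name="scan new assets",
        event_type="asset.detected",
        condition=lambda event: True,
        action="start_discovery_scan",
    ),
    Rule(
        name="rescan reopened findings",
        event_type="finding.status_changed",
        condition=lambda event: event.get("new_status") == "reopened",
        action="rescan_asset",
    ),
]

actions = match_rules(
    {"type": "finding.status_changed", "new_status": "reopened"}, rules
)
```

Keeping the predicate separate from the action lets the same event feed several rules without the event producer knowing about any of them.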
Data Stores
S4E uses multiple data stores, each chosen for the workload it serves best:
PostgreSQL
The primary relational database for all structured data:
- Assets, organizations, users, and permissions
- Scan definitions, schedules, and execution history
- Findings, severity data, and remediation tracking
- Actions, playbooks, and audit logs
All access goes through the s4e-base service, which manages schema migrations and query optimization.
Redis
An in-memory data store used for:
- Session caching and temporary state
- Rate limiting counters
- Real-time scan status tracking
- Short-lived task coordination between services
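To illustrate the rate limiting counters, here is a fixed-window counter sketch. This is one common technique and an assumption on our part; the document does not state which algorithm S4E uses. A plain dictionary stands in for Redis here. With Redis, the same idea is typically expressed by incrementing a per-window key with `INCR` and giving it a time-to-live with `EXPIRE`.

```python
class FixedWindowLimiter:
    """Allow at most `limit` requests per key in each fixed time window."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        # Stand-in for Redis keys: (api_key, window index) -> request count.
        # Old windows are never removed here; in Redis, key expiry would do that.
        self.counters = {}

    def allow(self, api_key, now):
        """Count the request and report whether it is within the limit."""
        window = int(now // self.window_seconds)
        key = (api_key, window)
        self.counters[key] = self.counters.get(key, 0) + 1
        return self.counters[key] <= self.limit


limiter = FixedWindowLimiter(limit=2, window_seconds=60)
decisions = [
    limiter.allow("key-1", now=0),   # first request in window 0
    limiter.allow("key-1", now=10),  # second request in window 0
    limiter.allow("key-1", now=59),  # third request in window 0, over the limit
    limiter.allow("key-1", now=60),  # first request in window 1
]
```

Fixed windows are simple but allow short bursts at window boundaries; sliding-window or token-bucket variants trade extra bookkeeping for smoother limits.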
RabbitMQ
The message broker that enables asynchronous, event-driven communication:
- Scan job queues between s4e-scan, s4e-dispatcher, and scan workers
- Crawler pipeline stage transitions (ffuf, Katana, enrichment, etc.)
- Event notifications between services (finding created, scan completed, etc.)
- Retry and dead-letter queue management for fault tolerance
MongoDB
A document store used for:
- Large, semi-structured scan output data
- Crawl results and raw response storage
- Flexible schema data that does not fit the relational model
Communication Patterns
Services interact using two primary patterns:
Synchronous (REST APIs)
Used for request-response interactions where the caller needs an immediate result. For example, the frontend requests asset data from s4e-core, which queries s4e-base and returns the response.
Asynchronous (RabbitMQ)
Used for fire-and-forget operations and pipeline processing. For example, when a scheduled scan is triggered, s4e-scheduler publishes a message to RabbitMQ, s4e-dispatcher picks it up and routes it to a scan worker, and results flow back through subsequent queue stages.
Note
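The asynchronous pattern is central to S4E's scalability. Adding more workers (consumers) to a queue increases throughput without requiring changes to other services, and the gain is roughly proportional to the number of workers as long as shared resources such as the broker or the databases are not the bottleneck.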
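The worker-scaling idea can be sketched with several consumers draining one shared queue. This is an in-process illustration under stated assumptions: Python's `queue.Queue` and threads stand in for a RabbitMQ queue and separate worker processes, and `run_workers` and the sample handler are hypothetical names.

```python
import queue
import threading


def run_workers(jobs, worker_count, handler):
    """Process all jobs with `worker_count` consumers reading one shared queue."""
    job_queue = queue.Queue()
    for job in jobs:
        job_queue.put(job)

    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                job = job_queue.get_nowait()
            except queue.Empty:
                return  # queue drained, this consumer stops
            outcome = handler(job)
            with lock:
                results.append(outcome)

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


# Results arrive in completion order, so sort before comparing.
squares = sorted(run_workers(range(10), worker_count=4, handler=lambda j: j * j))
```

The producer code does not change when `worker_count` changes, which mirrors how consumers can be added to a queue without touching the publishing service.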
Deployment Topology
In a production deployment, services are containerized and run on Kubernetes. Helm charts define the deployments, and ArgoCD provides GitOps-based continuous delivery. The platform supports:
- Horizontal scaling of individual services based on workload
- Node affinity and resource management for compute-intensive scan workers
- Persistent volume claims for database storage
- Ingress controllers for external traffic routing
On-premises deployments use the same Kubernetes-based architecture, which keeps cloud and on-premises environments consistent.
Tip
For details on deploying S4E in your own infrastructure, see the On-Prem Deployment guide.