Architecture

S4E is built on a microservices architecture where each service is responsible for a specific domain of functionality. Services communicate through a combination of REST APIs and an event-driven message bus, enabling the platform to scale horizontally, process scans in parallel, and remain resilient under heavy workloads.

This page provides a technical overview of the major services and data stores that make up the S4E platform.

Architecture Overview

At a high level, the S4E platform consists of:

A frontend application that users interact with through the browser
An API gateway that handles authentication, routing, and request orchestration
A set of backend services responsible for data management, scan orchestration, job execution, and event processing
Message queues for asynchronous, event-driven communication between services
Data stores for persistent storage, caching, and document management

Core Services

s4e-core - API Gateway

The central entry point for all client interactions. s4e-core exposes the REST API consumed by the frontend, partner portal, CLI tools, and third-party integrations. It handles:

User authentication and session management
Request routing and response orchestration
API key management and rate limiting
Aggregating data from backend services to serve API responses

All external traffic flows through s4e-core, which then delegates to the appropriate backend services.

s4e-base - Database Service

The primary data access layer for the platform. s4e-base owns the PostgreSQL schema and provides a service interface for reading and writing structured data. It manages:

All database migrations and schema evolution (using Alembic)
CRUD operations for assets, findings, users, organizations, and configuration
Complex queries, bulk operations, and data aggregation
Transaction management and data integrity enforcement

Other services do not connect to PostgreSQL directly; they read and write data through s4e-base.

s4e-scan - Scan Manager

Responsible for the definition, configuration, and lifecycle management of security scans. s4e-scan:

Stores scan definitions, categories, and service configurations
Tracks scan execution state and results
Coordinates with s4e-dispatcher to queue scan jobs for execution
Provides APIs for creating, updating, and querying scans

s4e-crawler - Web Crawling Pipeline

A specialized service that performs web crawling and reconnaissance using industry-standard tools:

Katana - A Go-based web crawler used for deep crawling of web applications, discovering pages, endpoints, and linked resources
ffuf - A fast web fuzzer used to discover directories and files

The crawler pipeline processes tasks through multiple stages:

Directory Fuzzing (ffuf) - Discovers hidden directories and files
Deep Crawl (Katana) - Crawls discovered and known pages to map the full site structure
API Document Parsing - Identifies and parses API documentation (OpenAPI/Swagger)
URL Unification - Deduplicates and normalizes discovered URLs
PII Parsing - Detects potential personally identifiable information exposure
Enrichment - Augments discovered data with additional context
Finisher - Consolidates results and publishes them for downstream consumption

Each stage publishes results to RabbitMQ, enabling parallel processing and pipeline resilience.
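As an illustration of the URL Unification stage, the following is a minimal sketch of how discovered URLs might be normalized and deduplicated. The normalization rules (lowercasing the host, dropping default ports and fragments, sorting query parameters) and the function names `normalize_url` and `unify` are assumptions made for this example; they are not taken from the s4e-crawler implementation.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Default ports that can be dropped without changing the target (assumed rule).
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(url: str) -> str:
    """Return a canonical form of `url` (illustrative rules only)."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    # Keep the port only when it differs from the scheme's default.
    if port is None or DEFAULT_PORTS.get(scheme) == port:
        netloc = host
    else:
        netloc = f"{host}:{port}"
    path = parts.path or "/"
    # Sort query parameters so their order does not create duplicates.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    # Drop the fragment: it is never sent to the server.
    return urlunsplit((scheme, netloc, path, query, ""))


def unify(urls):
    """Deduplicate URLs by canonical form, preserving first-seen order."""
    seen = set()
    unique = []
    for url in urls:
        canonical = normalize_url(url)
        if canonical not in seen:
            seen.add(canonical)
            unique.append(canonical)
    return unique
```

With these assumed rules, `http://EXAMPLE.com:80/a?b=2&a=1#top` and `http://example.com/a?a=1&b=2` collapse into a single entry. A real pipeline would likely need further rules, for example for trailing slashes or tracking parameters.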

s4e-dispatcher - Job Dispatcher

Consumes scan jobs from the message queue and dispatches them to the appropriate executor or worker. s4e-dispatcher:

Reads pending scan tasks from RabbitMQ queues
Routes tasks to the correct scan executor based on scan type
Monitors job execution status and handles retries on failure
Manages concurrency to prevent worker overload
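The routing and retry behavior listed above can be sketched in a few lines. This is a simplified, in-memory illustration and not the s4e-dispatcher code: the `Dispatcher` class, the `scan_type` task field, the `max_attempts` parameter, and the `dead_letters` list are hypothetical names chosen for the example, and concurrency control is left out.

```python
from typing import Callable, Dict, List, Optional


class Dispatcher:
    """Route tasks to executors by scan type, retrying failures a bounded number of times."""

    def __init__(self, max_attempts: int = 3):
        self.max_attempts = max_attempts
        self.executors: Dict[str, Callable[[dict], str]] = {}
        self.dead_letters: List[dict] = []  # tasks that could not be completed

    def register(self, scan_type: str, executor: Callable[[dict], str]) -> None:
        """Associate a scan type with the callable that executes it."""
        self.executors[scan_type] = executor

    def dispatch(self, task: dict) -> Optional[str]:
        """Run the task on its executor; return the result, or None if it was dead-lettered."""
        executor = self.executors.get(task["scan_type"])
        if executor is None:
            self.dead_letters.append(task)  # no executor is registered for this scan type
            return None
        for _attempt in range(self.max_attempts):
            try:
                return executor(task)
            except Exception:
                continue  # retry until the attempts are exhausted
        self.dead_letters.append(task)
        return None
```

In this sketch a task that keeps failing is set aside rather than retried forever, which mirrors the idea of a dead-letter queue. A production dispatcher would typically also add a delay between attempts.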

s4e-scheduler - Time-Based Scheduler

Triggers scan execution based on time-based schedules. s4e-scheduler:

Manages cron-like schedules defined by users for recurring scans
Publishes scan trigger events to RabbitMQ at the scheduled time
Supports flexible scheduling intervals (hourly, daily, weekly, custom)
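To make the interval options concrete, here is a minimal sketch of how a scheduler could decide which recurring scans are due. It assumes simple fixed intervals rather than full cron expressions; the `INTERVALS` table, the `next_run` and `due_scans` functions, and the schedule fields (`scan_id`, `last_run`, `interval`, `custom`) are hypothetical and do not describe the s4e-scheduler data model.

```python
from datetime import datetime, timedelta
from typing import Optional

# Named intervals from the list above; "custom" takes an explicit timedelta.
INTERVALS = {
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
}


def next_run(last_run: datetime, interval: str, custom: Optional[timedelta] = None) -> datetime:
    """Return the next time a schedule should fire after `last_run`."""
    if interval == "custom":
        if custom is None or custom <= timedelta(0):
            raise ValueError("custom interval requires a positive timedelta")
        return last_run + custom
    return last_run + INTERVALS[interval]


def due_scans(schedules, now: datetime):
    """Return the scan_id of each schedule whose next run is at or before `now`."""
    return [
        s["scan_id"]
        for s in schedules
        if next_run(s["last_run"], s["interval"], s.get("custom")) <= now
    ]
```

In a setup like the one described above, each due scan would then be published as a trigger event to the message broker instead of being executed by the scheduler itself.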

s4e-trigger - Event Trigger Service

Handles event-driven triggering of scans and actions. Unlike the scheduler (which is time-based), the trigger service responds to external events and conditions:

Webhook-initiated scan triggers
Condition-based automation rules
Integration event processing (e.g., new asset detected, finding status change)
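A condition-based automation rule can be thought of as an event type, a predicate, and an action to take when both match. The sketch below illustrates that idea only; the `Rule` dataclass, the `match_actions` function, and the event and action names are invented for the example and are not part of the s4e-trigger API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    event_type: str                    # event this rule listens for
    condition: Callable[[dict], bool]  # predicate evaluated against the event payload
    action: str                        # name of the action to trigger on a match


def match_actions(event: dict, rules: List[Rule]) -> List[str]:
    """Return the actions of all rules whose event type and condition match `event`."""
    return [
        rule.action
        for rule in rules
        if rule.event_type == event["type"] and rule.condition(event)
    ]


# Hypothetical rules: scan every new asset, and rescan when a critical finding changes.
rules = [
    Rule("asset.detected", lambda e: True, "baseline_scan"),
    Rule("finding.status_changed", lambda e: e.get("severity") == "critical", "rescan_asset"),
]
```

For example, `match_actions({"type": "asset.detected", "asset": "app.example.com"}, rules)` selects only the `baseline_scan` action.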

Data Stores

S4E uses multiple data stores, each chosen for the workload it serves best:

PostgreSQL

The primary relational database for all structured data:

Assets, organizations, users, and permissions
Scan definitions, schedules, and execution history
Findings, severity data, and remediation tracking
Actions, playbooks, and audit logs

All access goes through the s4e-base service, which manages schema migrations and query optimization.

Redis

An in-memory data store used for:

Session caching and temporary state
Rate limiting counters
Real-time scan status tracking
Short-lived task coordination between services
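As an example of the rate limiting counters mentioned above, the following sketch implements a fixed-window limiter. It keeps counters in a Python dictionary so that it runs on its own; in a Redis-backed setup the same idea is usually expressed as an incremented key with an expiry. The `FixedWindowRateLimiter` class and its parameters are assumptions for illustration, not the platform's implementation.

```python
import time
from collections import defaultdict


class FixedWindowRateLimiter:
    """Allow at most `limit` requests per API key in each `window_seconds` window."""

    def __init__(self, limit: int, window_seconds: int, clock=time.time):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested deterministically
        self.counters = defaultdict(int)  # (api_key, window index) -> request count

    def allow(self, api_key: str) -> bool:
        """Count the request and return True, or return False if the key is over its limit."""
        window = int(self.clock() // self.window_seconds)
        key = (api_key, window)
        if self.counters[key] >= self.limit:
            return False
        self.counters[key] += 1
        return True
```

This in-memory version never evicts old windows, so it is suitable only as a demonstration. A shared store with key expiry avoids that growth and lets several gateway instances enforce the same limit.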

RabbitMQ

The message broker that enables asynchronous, event-driven communication:

Scan job queues between s4e-scan, s4e-dispatcher, and scan workers
Crawler pipeline stage transitions (ffuf, Katana, enrichment, etc.)
Event notifications between services (finding created, scan completed, etc.)
Retry and dead-letter queue management for fault tolerance

MongoDB

A document store used for:

Large, semi-structured scan output data
Crawl results and raw response storage
Flexible schema data that does not fit the relational model

Communication Patterns

Services interact using two primary patterns:

Synchronous (REST APIs)

Used for request-response interactions where the caller needs an immediate result. For example, the frontend requests asset data from s4e-core, which queries s4e-base and returns the response.

Asynchronous (RabbitMQ)

Used for fire-and-forget operations and pipeline processing. For example, when a scheduled scan is triggered, s4e-scheduler publishes a message to RabbitMQ, s4e-dispatcher picks it up and routes it to a scan worker, and results flow back through subsequent queue stages.

Note

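The asynchronous pattern is central to S4E's scalability. Adding more workers (consumers) to a queue increases throughput roughly in proportion to the number of workers, without requiring changes to other services, until a shared resource such as the message broker or the database becomes the bottleneck.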
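The competing-consumers idea behind this can be demonstrated with the Python standard library. In the sketch below an in-process `queue.Queue` stands in for a RabbitMQ queue and threads stand in for scan workers; `run_workers` and its parameters are hypothetical names, and the handler is assumed not to raise.

```python
import queue
import threading


def run_workers(tasks, handler, worker_count):
    """Process `tasks` with `worker_count` consumer threads sharing one queue."""
    q = queue.Queue()
    results = []
    lock = threading.Lock()  # protects `results` across worker threads

    def worker():
        while True:
            task = q.get()
            if task is None:  # sentinel: no more work for this worker
                q.task_done()
                return
            outcome = handler(task)
            with lock:
                results.append(outcome)
            q.task_done()

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for t in threads:
        t.start()
    for task in tasks:
        q.put(task)
    for _ in threads:
        q.put(None)  # one sentinel per worker
    q.join()
    for t in threads:
        t.join()
    return results
```

The producer code is identical whether one worker or four are consuming; only `worker_count` changes. The order of results may differ between runs, which is why queue-based pipelines generally avoid depending on processing order.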

Deployment Topology

In a production deployment, services are containerized and orchestrated with Kubernetes, using Helm charts and Argo CD for GitOps-based continuous delivery. The platform supports:

Horizontal scaling of individual services based on workload
Node affinity and resource management for compute-intensive scan workers
Persistent volume claims for database storage
Ingress controllers for external traffic routing

On-premises deployments use the same Kubernetes-based architecture, which keeps cloud and on-premises environments consistent.

Tip

For details on deploying S4E in your own infrastructure, see the On-Prem Deployment guide.