S4E is built on a microservices architecture where each service is responsible for a specific domain of functionality. Services communicate through a combination of REST APIs and an event-driven message bus, enabling the platform to scale horizontally, process scans in parallel, and remain resilient under heavy workloads.

This page provides a technical overview of the major services and data stores that make up the S4E platform.


Architecture Overview

At a high level, the S4E platform consists of:

  • A frontend application that users interact with through the browser
  • An API gateway that handles authentication, routing, and request orchestration
  • A set of backend services responsible for data management, scan orchestration, job execution, and event processing
  • Message queues for asynchronous, event-driven communication between services
  • Data stores for persistent storage, caching, and document management

Core Services

s4e-core - API Gateway

The central entry point for all client interactions. s4e-core exposes the REST API consumed by the frontend, partner portal, CLI tools, and third-party integrations. It handles:

  • User authentication and session management
  • Request routing and response orchestration
  • API key management and rate limiting
  • Aggregating data from backend services to serve API responses

All external traffic flows through s4e-core, which then delegates to the appropriate backend services.
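The delegation step can be pictured as a lookup from request path to backend service. The sketch below is a minimal illustration, not s4e-core's actual routing code: the path prefixes, the `ROUTES` table, and the `resolve_backend` helper are hypothetical. In a real gateway, authentication and rate limiting would run before this lookup.

```python
# Hypothetical route table mapping API path prefixes to backend services.
ROUTES = {
    "/api/assets": "s4e-base",
    "/api/findings": "s4e-base",
    "/api/scans": "s4e-scan",
}


def resolve_backend(path: str) -> str:
    """Return the backend service that should handle a request path."""
    # Try longer prefixes first so a more specific route wins over a general one.
    for prefix in sorted(ROUTES, key=len, reverse=True):
        # Match the prefix exactly or as whole path segments ("/api/scans/42"),
        # so "/api/scansx" does not match "/api/scans".
        if path == prefix or path.startswith(prefix + "/"):
            return ROUTES[prefix]
    raise LookupError(f"no backend registered for {path!r}")


print(resolve_backend("/api/scans/42/results"))  # s4e-scan
```

Keeping the table in one place is what lets clients see a single API while backend services are split, moved, or scaled independently.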

s4e-base - Database Service

The primary data access layer for the platform. s4e-base owns the PostgreSQL schema and provides a service interface for reading and writing structured data. It manages:

  • All database migrations and schema evolution (using Alembic)
  • CRUD operations for assets, findings, users, organizations, and configuration
  • Complex queries, bulk operations, and data aggregation
  • Transaction management and data integrity enforcement

Other services do not connect to PostgreSQL directly; they read and write data through s4e-base.
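Routing all writes through one service is what makes transaction management enforceable: a bulk operation either commits completely or not at all. The sketch below shows that guarantee using Python's built-in `sqlite3` as a stand-in for PostgreSQL; the `findings` table and the `bulk_insert_findings` function are hypothetical and do not reflect s4e-base's real schema or interface.

```python
import sqlite3


def bulk_insert_findings(conn: sqlite3.Connection, rows: list[tuple[int, str]]) -> None:
    """Insert (asset_id, title) rows atomically: all are kept or none are."""
    # Using the connection as a context manager commits on success and
    # rolls back the whole transaction if any statement raises.
    with conn:
        conn.executemany(
            "INSERT INTO findings (asset_id, title) VALUES (?, ?)", rows
        )


# sqlite3 stands in for PostgreSQL here; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE findings ("
    " asset_id INTEGER NOT NULL,"
    " title TEXT NOT NULL,"
    " UNIQUE (asset_id, title))"  # integrity rule: no duplicate finding per asset
)

bulk_insert_findings(conn, [(1, "Open port 22"), (1, "Weak TLS cipher")])

try:
    # The second row duplicates an existing finding, so the whole batch is rejected,
    # including the valid (2, "Missing HSTS") row.
    bulk_insert_findings(conn, [(2, "Missing HSTS"), (1, "Open port 22")])
except sqlite3.IntegrityError:
    pass

print(conn.execute("SELECT COUNT(*) FROM findings").fetchone()[0])  # 2
```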

s4e-scan - Scan Manager

Responsible for the definition, configuration, and lifecycle management of security scans. s4e-scan:

  • Stores scan definitions, categories, and service configurations
  • Tracks scan execution state and results
  • Coordinates with s4e-dispatcher to queue scan jobs for execution
  • Provides APIs for creating, updating, and querying scans
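Tracking execution state is easiest to reason about as a small state machine that rejects illegal jumps. The state names and allowed transitions below are assumptions for illustration; this page does not specify the states s4e-scan actually uses.

```python
from enum import Enum


class ScanState(Enum):
    PENDING = "pending"      # created, not yet handed to the dispatcher
    QUEUED = "queued"        # published to a queue, waiting for a worker
    RUNNING = "running"      # picked up by a worker
    COMPLETED = "completed"
    FAILED = "failed"


# Hypothetical transition table: the states each state may move to.
TRANSITIONS = {
    ScanState.PENDING: {ScanState.QUEUED},
    ScanState.QUEUED: {ScanState.RUNNING},
    ScanState.RUNNING: {ScanState.COMPLETED, ScanState.FAILED},
    ScanState.FAILED: {ScanState.QUEUED},  # a failed scan may be re-queued
    ScanState.COMPLETED: set(),            # terminal state
}


def advance(current: ScanState, target: ScanState) -> ScanState:
    """Validate and apply a state transition."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target


state = ScanState.PENDING
for step in (ScanState.QUEUED, ScanState.RUNNING, ScanState.COMPLETED):
    state = advance(state, step)
print(state.value)  # completed
```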

s4e-crawler - Web Crawling Pipeline

A specialized service that performs web crawling and reconnaissance using industry-standard tools:

  • Katana - A Go-based web crawler used for deep crawling of web applications, discovering pages, endpoints, and linked resources
  • ffuf - A fast web fuzzer used for directory and file discovery

The crawler pipeline processes tasks through multiple stages:

  1. Directory Fuzzing (ffuf) - Discovers hidden directories and files
  2. Deep Crawl (Katana) - Crawls discovered and known pages to map the full site structure
  3. API Document Parsing - Identifies and parses API documentation (OpenAPI/Swagger)
  4. URL Unification - Deduplicates and normalizes discovered URLs
  5. PII Parsing - Detects potential personally identifiable information exposure
  6. Enrichment - Augments discovered data with additional context
  7. Finisher - Consolidates results and publishes them for downstream consumption
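Stage 4 (URL Unification) is a normalize-then-deduplicate step. The sketch below applies one common set of normalization rules using only the standard library; the exact rules S4E uses are not documented here, so treat these (lowercased scheme and host, default ports dropped, fragments removed, query parameters sorted) as assumptions.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(url: str) -> str:
    """Canonicalize a URL so trivially different spellings compare equal."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    # .hostname is already lowercased and drops any user:password@ prefix;
    # IPv6 literals are not handled in this sketch.
    host = parts.hostname or ""
    # Keep the port only when it differs from the scheme's default.
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        host = f"{host}:{parts.port}"
    path = parts.path or "/"
    # Sort query parameters so "?b=2&a=1" and "?a=1&b=2" collapse together.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    # Drop the fragment (#...), which is never sent to the server.
    return urlunsplit((scheme, host, path, query, ""))


def unify(urls: list[str]) -> list[str]:
    """Normalize and deduplicate URLs, preserving first-seen order."""
    return list(dict.fromkeys(normalize_url(u) for u in urls))


print(unify([
    "HTTPS://Example.com:443/login#top",
    "https://example.com/login",
    "https://example.com/search?b=2&a=1",
    "https://example.com/search?a=1&b=2",
]))
# ['https://example.com/login', 'https://example.com/search?a=1&b=2']
```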

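Each stage publishes its results to RabbitMQ for the next stage to consume. Because the stages are decoupled by queues, they can work on different tasks in parallel, and a failure in one stage does not take down the rest of the pipeline.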
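Chaining stages through queues means each stage only needs to know which queue comes next. The sketch below models that with in-memory `queue.Queue` objects standing in for RabbitMQ queues and runs one task through the stages in order; the queue names, the `PIPELINE` list, and the `publish_result` helper are hypothetical.

```python
import queue
from typing import Optional

# Stage order follows the pipeline above; the queue names are hypothetical.
PIPELINE = [
    "crawler.ffuf",
    "crawler.katana",
    "crawler.apidoc",
    "crawler.unify",
    "crawler.pii",
    "crawler.enrich",
    "crawler.finisher",
]

# In-memory stand-ins for RabbitMQ queues, one per stage.
queues = {name: queue.Queue() for name in PIPELINE}


def publish_result(stage: str, payload: dict) -> Optional[str]:
    """Hand a stage's output to the next stage's queue and return its name."""
    idx = PIPELINE.index(stage)
    if idx + 1 == len(PIPELINE):
        return None  # the finisher is the last stage; nothing to forward
    next_stage = PIPELINE[idx + 1]
    queues[next_stage].put(payload)
    return next_stage


queues[PIPELINE[0]].put({"target": "https://example.com", "history": []})
stage = PIPELINE[0]
while stage is not None:
    task = queues[stage].get()
    # A real worker would run ffuf, Katana, parsing, etc. here.
    task["history"].append(stage)
    stage = publish_result(stage, task)

print(len(task["history"]))  # 7
```

In a deployment, each stage would be a separate consumer process, so several tasks can be at different stages at the same time.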

s4e-dispatcher - Job Dispatcher

Consumes scan jobs from the message queue and dispatches them to the appropriate executor or worker. s4e-dispatcher:

  • Reads pending scan tasks from RabbitMQ queues
  • Routes tasks to the correct scan executor based on scan type
  • Monitors job execution status and handles retries on failure
  • Manages concurrency to prevent worker overload
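The dispatcher's responsibilities (route by scan type, retry on failure, cap concurrency) can be sketched together. Executor names, `MAX_ATTEMPTS`, `MAX_CONCURRENCY`, and the `dead_letters` list are assumptions; a thread pool's `max_workers` stands in for whatever concurrency control s4e-dispatcher actually uses.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_ATTEMPTS = 3     # hypothetical retry budget per job
MAX_CONCURRENCY = 4  # hypothetical cap on jobs running at the same time


def run_port_scan(target: str) -> str:
    return f"port scan of {target} done"


def run_web_scan(target: str) -> str:
    return f"web scan of {target} done"


# Hypothetical mapping from scan type to the executor that handles it.
EXECUTORS = {"port": run_port_scan, "web": run_web_scan}

dead_letters: list[dict] = []  # jobs that could not be routed or kept failing
_dead_letter_lock = threading.Lock()


def _dead_letter(job: dict) -> None:
    with _dead_letter_lock:
        dead_letters.append(job)


def dispatch(job: dict):
    """Route a job by scan type and retry it up to MAX_ATTEMPTS times."""
    executor = EXECUTORS.get(job["scan_type"])
    if executor is None:
        _dead_letter(job)  # no executor registered for this scan type
        return None
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            return executor(job["target"])
        except Exception:
            # A production dispatcher would usually back off before retrying.
            if attempt == MAX_ATTEMPTS:
                _dead_letter(job)
    return None


jobs = [
    {"scan_type": "port", "target": "10.0.0.5"},
    {"scan_type": "web", "target": "example.com"},
    {"scan_type": "unknown", "target": "example.org"},
]
# The pool size bounds how many jobs execute at once, protecting workers from overload.
with ThreadPoolExecutor(max_workers=MAX_CONCURRENCY) as pool:
    results = list(pool.map(dispatch, jobs))
```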

s4e-scheduler - Time-Based Scheduler

Triggers scan execution based on time-based schedules. s4e-scheduler:

  • Manages cron-like schedules defined by users for recurring scans
  • Publishes scan trigger events to RabbitMQ at the scheduled time
  • Supports flexible scheduling intervals (hourly, daily, weekly, custom)
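For fixed intervals, the core of a recurring schedule is computing the next fire time. The sketch below covers the hourly, daily, and weekly cases with plain `datetime` arithmetic in UTC (avoiding daylight-saving edge cases); the `INTERVALS` table and `next_run` function are hypothetical, and custom cron-style expressions would need a cron parser, which is not shown.

```python
from datetime import datetime, timedelta, timezone

INTERVALS = {
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
}


def next_run(anchor: datetime, interval: str, now: datetime) -> datetime:
    """Return the next run time after `now` on the grid that starts at `anchor`."""
    step = INTERVALS[interval]
    if now < anchor:
        return anchor  # the schedule has not started yet
    # Skip whole periods at once instead of looping, so runs missed during
    # downtime collapse into a single next fire time.
    periods = (now - anchor) // step + 1
    return anchor + periods * step


anchor = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)  # first scheduled run
now = datetime(2024, 1, 3, 14, 30, tzinfo=timezone.utc)
print(next_run(anchor, "daily", now))  # 2024-01-04 09:00:00+00:00
```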

s4e-trigger - Event Trigger Service

Handles event-driven triggering of scans and actions. Unlike the scheduler (which is time-based), the trigger service responds to external events and conditions:

  • Webhook-initiated scan triggers
  • Condition-based automation rules
  • Integration event processing (e.g., new asset detected, finding status change)
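Condition-based rules can be modeled as predicates evaluated against incoming events. In the sketch below, the event shape, the event type strings, the scan identifiers, and the `matching_scans` helper are hypothetical; it only illustrates how an event such as a new asset or a finding status change could map to the scans it should trigger.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    event_type: str                                    # e.g. "asset.created"
    scan_id: str                                       # scan to trigger when the rule fires
    condition: Callable[[dict], bool] = lambda event: True  # extra filter on the event


# Hypothetical automation rules.
RULES = [
    Rule("asset.created", "full-recon"),
    Rule(
        "finding.status_changed",
        "verify-fix",
        condition=lambda e: e["payload"].get("new_status") == "resolved",
    ),
]


def matching_scans(event: dict) -> list[str]:
    """Return the scans that every matching rule asks to trigger for this event."""
    return [
        rule.scan_id
        for rule in RULES
        if rule.event_type == event["type"] and rule.condition(event)
    ]


print(matching_scans({"type": "asset.created", "payload": {"host": "api.example.com"}}))
# ['full-recon']
```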

Data Stores

S4E uses multiple data stores, each chosen for the workload it serves best:

PostgreSQL

The primary relational database for all structured data:

  • Assets, organizations, users, and permissions
  • Scan definitions, schedules, and execution history
  • Findings, severity data, and remediation tracking
  • Actions, playbooks, and audit logs

All access goes through the s4e-base service, which manages schema migrations and query optimization.

Redis

An in-memory data store used for:

  • Session caching and temporary state
  • Rate limiting counters
  • Real-time scan status tracking
  • Short-lived task coordination between services
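Rate limiting counters are often implemented as a fixed-window counter: increment a per-client key for the current time window and let the key expire when the window ends. In Redis this maps to the `INCR` and `EXPIRE` commands. The sketch below reproduces the same logic with an in-memory dict so it runs without a Redis server; the key format, limit, and window size are assumptions, not S4E's actual settings.

```python
import time
from typing import Callable


class FixedWindowLimiter:
    """In-memory stand-in for a Redis INCR + EXPIRE fixed-window limiter."""

    def __init__(self, limit: int, window_s: int, clock: Callable[[], float] = time.time):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self._counts: dict[str, int] = {}

    def allow(self, api_key: str) -> bool:
        window = int(self.clock() // self.window_s)
        key = f"ratelimit:{api_key}:{window}"  # hypothetical key format
        # Redis equivalent: INCR key, then EXPIRE key window_s when the count
        # is 1 so stale windows are deleted. This version never evicts keys.
        self._counts[key] = self._counts.get(key, 0) + 1
        return self._counts[key] <= self.limit


fake_now = [0.0]
limiter = FixedWindowLimiter(limit=2, window_s=60, clock=lambda: fake_now[0])
print([limiter.allow("key-a") for _ in range(3)])  # [True, True, False]
fake_now[0] = 60.0  # the next window starts with a fresh counter
print(limiter.allow("key-a"))  # True
```

Fixed windows are simple and cheap, at the cost of allowing short bursts of up to twice the limit across a window boundary.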

RabbitMQ

The message broker that enables asynchronous, event-driven communication:

  • Scan job queues between s4e-scan, s4e-dispatcher, and scan workers
  • Crawler pipeline stage transitions (ffuf, Katana, enrichment, etc.)
  • Event notifications between services (finding created, scan completed, etc.)
  • Retry and dead-letter queue management for fault tolerance
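In RabbitMQ, retry and dead-letter handling is usually configured through queue arguments. The helper below builds the argument dictionaries for one common pattern: a work queue dead-letters rejected messages into a delay queue, and that queue's message TTL sends them back to the work queue for another attempt. `x-dead-letter-exchange`, `x-dead-letter-routing-key`, and `x-message-ttl` are standard RabbitMQ queue arguments; the queue names and the 30-second delay are hypothetical, and S4E's actual topology is not documented here.

```python
def retry_topology(work_queue: str, delay_ms: int = 30_000) -> dict:
    """Queue arguments for a work queue paired with a delayed-retry queue."""
    retry_queue = f"{work_queue}.retry"
    return {
        work_queue: {
            # Messages rejected with requeue=False are re-published through
            # the default exchange ("") to the retry queue.
            "x-dead-letter-exchange": "",
            "x-dead-letter-routing-key": retry_queue,
        },
        retry_queue: {
            # After delay_ms, expired messages are dead-lettered back to the
            # work queue for another attempt.
            "x-message-ttl": delay_ms,
            "x-dead-letter-exchange": "",
            "x-dead-letter-routing-key": work_queue,
        },
    }


topology = retry_topology("scan.jobs")
```

With the pika client, each entry could be declared as `channel.queue_declare(queue=name, durable=True, arguments=args)`. To stop retrying after a fixed number of attempts, a consumer typically inspects the `x-death` header and moves exhausted messages to a separate parking queue.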

MongoDB

A document store used for:

  • Large, semi-structured scan output data
  • Crawl results and raw response storage
  • Flexible schema data that does not fit the relational model

Communication Patterns

Services interact using two primary patterns:

Synchronous (REST APIs)

Used for request-response interactions where the caller needs an immediate result. For example, the frontend requests asset data from s4e-core, which queries s4e-base and returns the response.

Asynchronous (RabbitMQ)

Used for fire-and-forget operations and pipeline processing. For example, when a scheduled scan is triggered, s4e-scheduler publishes a message to RabbitMQ, s4e-dispatcher picks it up and routes it to a scan worker, and results flow back through subsequent queue stages.

Note

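The asynchronous pattern is central to S4E's scalability. Adding more workers (consumers) to a queue increases throughput without requiring changes to other services; the gain is roughly proportional until a shared resource, such as the broker or a data store, becomes the bottleneck.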
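This scaling behavior comes from the competing-consumers pattern: all workers read from the same queue, and the broker hands each message to one consumer at a time, so capacity is added by starting more consumers. The sketch below uses `queue.Queue` and threads as an in-memory stand-in for a RabbitMQ queue and its consumers; the worker count and job IDs are arbitrary.

```python
import queue
import threading
from collections import Counter

NUM_WORKERS = 4  # scaling out means changing only this number

jobs = queue.Queue()
processed = Counter()   # job id -> times processed
handled_by = Counter()  # worker name -> jobs handled
lock = threading.Lock()


def worker(name: str) -> None:
    while True:
        job = jobs.get()
        if job is None:  # sentinel: no more work for this worker
            jobs.task_done()
            return
        with lock:
            processed[job] += 1
            handled_by[name] += 1
        jobs.task_done()


threads = [threading.Thread(target=worker, args=(f"worker-{i}",)) for i in range(NUM_WORKERS)]
for t in threads:
    t.start()
for job_id in range(100):
    jobs.put(job_id)
for _ in threads:
    jobs.put(None)  # one sentinel per worker
for t in threads:
    t.join()

print(len(processed), max(processed.values()))  # 100 1
```

Producers are unaffected by the number of workers: they keep publishing to the same queue whether one consumer or many are attached.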


Deployment Topology

In a production deployment, services are containerized and orchestrated with Kubernetes using Helm charts and ArgoCD for GitOps-based continuous delivery. The platform supports:

  • Horizontal scaling of individual services based on workload
  • Node affinity and resource management for compute-intensive scan workers
  • Persistent volume claims for database storage
  • Ingress controllers for external traffic routing

For on-premises deployments, the same Kubernetes-based architecture is used, ensuring consistency between Cloud and On-Prem environments.

Tip

For details on deploying S4E in your own infrastructure, see the On-Prem Deployment guide.