System Flow
System Architecture Overview
The system uses a modular, event-driven architecture where agents run independently on their own schedules, communicating through a shared data store and event queue. This design provides:
- Scalability: Agents can be scaled independently based on load
- Reliability: Failures in one agent don't block others
- Flexibility: Each agent runs on its optimal schedule
- Modularity: Agents are loosely coupled and can be updated independently
- Extensibility: Plugin-based architecture allows dynamic addition/removal of capabilities (e.g., analysis types) without code changes
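As a minimal sketch of what this model looks like in practice, assuming one BullMQ worker per agent (the `db` object and `fetchArticles` helper below are illustrative stand-ins, not the project's real data layer):

```typescript
import { Worker } from "bullmq";

// Illustrative stand-in for the shared data store; the real client is the
// project's database layer.
const db = {
  async save(table: string, row: Record<string, unknown>): Promise<void> {
    console.log(`write to ${table}`, row);
  },
};

// Stand-in for source-specific collection logic.
async function fetchArticles(ticker: string): Promise<string[]> {
  return [`placeholder article for ${ticker}`];
}

// Each agent is an independent worker on its own queue. It never calls
// another agent directly: it reads its inputs from the job payload or the
// shared store and writes its outputs back for other agents to pick up
// on their own schedules.
const dataCollectionWorker = new Worker(
  "data-collection",
  async (job) => {
    const { ticker } = job.data as { ticker: string };
    const articles = await fetchArticles(ticker);
    await db.save("CollectedData", { ticker, articles, collectedAt: new Date() });
  },
  { connection: { host: "localhost", port: 6379 } }
);
```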
Agent Schedules & Independence
Independent Agent Flows
Query Strategy Agent
Triggers:
- Schedule
- Manual trigger
Schedule:
- Schedules are created by admins via the admin interface
- Each schedule references a `JobTemplate` that defines the agent and parameters (see the sketch after this list)
- Parameter expansion can be configured (e.g., expand to all tickers)
- Examples:
- Entity Graph Refresh: Weekly schedule with ticker expansion
- Query Optimization: Daily schedule with ticker expansion
- Query Generation: On-demand or triggered by events
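The authoritative schema lives in the data model; as a rough TypeScript sketch with assumed field names, a schedule points at a job template, and the template names the agent, its default parameters, and an optional expansion rule:

```typescript
// Illustrative shapes only; the real schema is the database model.
interface JobTemplate {
  id: string;
  agent: "query-strategy" | "data-collection" | "newsletter-pipeline";
  parameters: Record<string, unknown>; // default job parameters
  expansion?: "all-tickers" | "all-user-ticker-subscriptions" | null;
}

interface Schedule {
  id: string;
  jobTemplateId: string; // references a JobTemplate
  timing:
    | { type: "cron"; expression: string }
    | { type: "interval"; everyMs: number }
    | { type: "on-demand" };
  enabled: boolean;
}

// Example: weekly entity-graph refresh, expanded to every tracked ticker.
const entityGraphRefresh: JobTemplate = {
  id: "tpl-entity-graph",
  agent: "query-strategy",
  parameters: { task: "entity-graph-refresh" },
  expansion: "all-tickers",
};

const weekly: Schedule = {
  id: "sch-entity-graph-weekly",
  jobTemplateId: entityGraphRefresh.id,
  timing: { type: "cron", expression: "0 3 * * 1" }, // Mondays at 03:00
  enabled: true,
};
```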
Data Collection Agent
Triggers:
- Schedule
- Manual trigger
Schedule:
- Schedules are created by admins via the admin interface
- Each schedule references a `JobTemplate` for the Data Collection agent
- Parameter expansion can be configured to process multiple tickers (sketched after this list)
- Runs independently - doesn't wait for Query Strategy
- Example: Interval-based schedule (e.g., every 4 hours) with ticker expansion
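Parameter expansion, sketched with the same assumed shapes (the ticker list and queue name are illustrative): when the schedule fires, one template becomes one job per ticker.

```typescript
import { Queue } from "bullmq";

const queue = new Queue("data-collection", {
  connection: { host: "localhost", port: 6379 },
});

// When an expanded schedule fires, the orchestrator enqueues one job per
// ticker instead of a single monolithic job, so failures and retries stay
// isolated per ticker.
async function expandAndEnqueue(tickers: string[]): Promise<void> {
  await Promise.all(tickers.map((ticker) => queue.add("collect", { ticker })));
}

expandAndEnqueue(["AAPL", "MSFT", "NVDA"]).catch(console.error);
```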
Newsletter Generation Pipeline
Triggers:
- Schedule
- Manual trigger
Schedule:
- Pipeline is defined in the `Pipeline` table by admins (see the sketch after this list)
- Schedule references the pipeline (stored in the `Schedule` table)
- Admin-configurable timing (cron, interval, or on-demand)
- Parameter expansion can be configured (e.g., expand to all user-ticker subscriptions)
- User-defined schedules can be created per user-ticker subscription
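A rough sketch of how a pipeline row and its schedule might relate, with field names assumed as in the earlier sketch (the stage names are taken from the error-handling section below):

```typescript
// Illustrative only; the real Pipeline/Schedule tables are defined in the schema.
interface Pipeline {
  id: string;
  stages: string[]; // ordered agent stages
}

interface PipelineSchedule {
  id: string;
  pipelineId: string; // references a row in the Pipeline table
  timing: { type: "cron"; expression: string } | { type: "on-demand" };
  expansion?: "all-user-ticker-subscriptions";
}

const newsletterPipeline: Pipeline = {
  id: "pipe-newsletter",
  stages: ["analysis", "content-generation", "qa", "delivery"],
};

// Daily 06:00 run, expanded to one pipeline execution per subscription.
const daily: PipelineSchedule = {
  id: "sch-newsletter-daily",
  pipelineId: newsletterPipeline.id,
  timing: { type: "cron", expression: "0 6 * * *" },
  expansion: "all-user-ticker-subscriptions",
};
```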
Event-Driven Communication
Data Flow & Storage
Note: All agent outputs written to the database include an `agentVersion` field that specifies the semantic version (e.g., "1.2.3") of the agent that generated the output. This enables traceability and debugging, and supports the agent versioning and experimentation system. Agents read their active version from the `AgentVersionDeployment` table during initialization and include it in all outputs.
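In code terms the stamping might look like this (the lookup function and field names are assumed for illustration):

```typescript
// Sketch of version stamping; shapes are illustrative.
interface AgentOutput {
  agentVersion: string; // semantic version of the producing agent, e.g. "1.2.3"
  producedAt: Date;
  payload: unknown;
}

class Agent {
  private version = "0.0.0";

  // On startup, read the active version from AgentVersionDeployment.
  async init(lookupActiveVersion: (agent: string) => Promise<string>) {
    this.version = await lookupActiveVersion("data-collection");
  }

  // Every output row carries the version for traceability and experiments.
  stamp(payload: unknown): AgentOutput {
    return { agentVersion: this.version, producedAt: new Date(), payload };
  }
}
```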
Key Design Principles
1. Loose Coupling
- Agents don't directly call each other
- Agents don't receive data from other agents
- All communication through shared database
- Agents read from and write to the database independently
- Agents can be updated/deployed independently
2. Independent Scheduling
- The Scheduler/Orchestrator manages all schedules (it is not an agent)
- Each agent has its own optimal schedule (admin-configurable)
- Schedules are stored in the database (`Schedule` table)
- Job templates define agent parameters and expansion rules
- Query Strategy: Admin-configurable schedules (e.g., weekly entity graph, daily optimization)
- Data Collection: Admin-configurable intervals (e.g., every 1-4 hours)
- Newsletter Generation: Admin-configurable pipeline schedules
- Learning: Admin-configurable schedule (e.g., daily at midnight)
3. Data Freshness Management
- Data Collection Agent maintains fresh data in database
- Newsletter pipeline checks data freshness before analysis
- Can trigger immediate collection if data is stale
- Timestamps on all data for freshness tracking
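A minimal freshness check, assuming a `collectedAt` timestamp on collected rows and an illustrative four-hour staleness threshold:

```typescript
const MAX_AGE_MS = 4 * 60 * 60 * 1000; // illustrative: data older than 4h is stale

interface FreshnessRow {
  ticker: string;
  collectedAt: Date;
}

// Before analysis, the pipeline compares the latest collection timestamp
// against the threshold and triggers an immediate collection if stale.
async function ensureFresh(
  latest: FreshnessRow | null,
  triggerCollection: (ticker: string) => Promise<void>,
  ticker: string
): Promise<void> {
  const stale =
    latest === null || Date.now() - latest.collectedAt.getTime() > MAX_AGE_MS;
  if (stale) {
    await triggerCollection(ticker); // enqueue an immediate collection job
  }
}
```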
4. Scalability
- Agent Instances: Multiple instances of the same agent version can run in parallel for horizontal scaling
- Job Distribution: Orchestrator distributes jobs across available agent instances using load balancing
- Job Division: Large jobs (e.g., 100 keywords) can be divided into smaller sub-jobs and distributed across instances (see the sketch after this list)
- Instance Management: Each agent instance registers itself and reports capacity/load for intelligent job distribution
- Database connection pooling
- Caching layer for frequently accessed data
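The job-division point above, sketched with an assumed chunk size of 20: a 100-keyword job becomes five independent sub-jobs that any available instance can claim.

```typescript
// Split a large job's keyword list into fixed-size chunks, one sub-job each.
// A chunk size of 20 is illustrative; in practice it would come from config.
function divideJob(keywords: string[], chunkSize = 20): string[][] {
  const chunks: string[][] = [];
  for (let i = 0; i < keywords.length; i += chunkSize) {
    chunks.push(keywords.slice(i, i + chunkSize));
  }
  return chunks;
}

// 100 keywords -> 5 sub-jobs, each enqueued separately so any available
// agent instance can claim it.
const subJobs = divideJob(Array.from({ length: 100 }, (_, i) => `kw-${i}`));
console.log(subJobs.length); // 5
```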
5. Reliability
- Failed jobs are retried with exponential backoff
- Agents continue running even if others fail
- Data is never lost (persisted before processing)
- Circuit breakers for external API calls
6. Modularity
- Each agent is a separate service/worker
- Can be deployed independently
- Configuration stored in database (hot-reloadable)
- Versioned agent deployments
- Plugin-based architecture: Analysis types are plugins registered in the database, allowing addition/removal without code changes
- Dynamic component loading: Agents discover and load components (analysis types, sections) at runtime
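A sketch of that plugin pattern, assuming registry rows carry a module path (all names illustrative): the agent loads whatever is enabled at runtime, so adding an analysis type is a database insert rather than a deploy.

```typescript
// Illustrative registry row; the real table is part of the schema.
interface AnalysisPluginRow {
  name: string;       // e.g. "sentiment", "technical"
  modulePath: string; // module resolved at runtime
  enabled: boolean;
}

interface AnalysisPlugin {
  analyze(input: unknown): Promise<unknown>;
}

// Load every enabled plugin via dynamic import; no code change is needed
// to add or remove an analysis type, only a registry row.
async function loadPlugins(rows: AnalysisPluginRow[]): Promise<AnalysisPlugin[]> {
  const active = rows.filter((r) => r.enabled);
  return Promise.all(
    active.map(async (r) => (await import(r.modulePath)).default as AnalysisPlugin)
  );
}
```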
Error Handling & Resilience
Per-Agent Error Handling
- Query Strategy: Retries entity discovery, falls back to cached entity graph, logs errors to database
- Data Collection: Continues with other sources if one fails, retries with exponential backoff, marks failed sources for health monitoring
- Analysis: Returns partial results if some analysis types fail, logs errors per plugin, continues with successful analyses (sketched after this list)
- Content Generation: Retries with different prompts on failure (up to max retries), falls back to simpler templates if needed
- QA: Flags for manual review if repeated failures, writes detailed error information to database
- Delivery: Retries with exponential backoff, dead-letter queue for permanently failed deliveries, tracks bounce rates
- Learning: Continues analysis even if some metrics are unavailable, logs warnings for incomplete data
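The partial-results behavior for Analysis, sketched: each plugin runs in its own try/catch, failures are logged, and whatever succeeded is still returned.

```typescript
interface AnalysisPlugin {
  name: string;
  analyze(input: unknown): Promise<unknown>;
}

// Run every analysis plugin; a failing plugin is logged and skipped,
// so the pipeline still gets whatever analyses succeeded.
async function runAnalyses(
  plugins: AnalysisPlugin[],
  input: unknown
): Promise<Record<string, unknown>> {
  const results: Record<string, unknown> = {};
  for (const plugin of plugins) {
    try {
      results[plugin.name] = await plugin.analyze(input);
    } catch (err) {
      // Stand-in for the per-plugin error log written to the database.
      console.error(`analysis plugin ${plugin.name} failed`, err);
    }
  }
  return results;
}
```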
Job Queue Error Handling
- BullMQ Retry Logic: All jobs have configurable retry attempts with exponential backoff (see the example after this list)
- Failed Jobs: Jobs that exceed max retries are moved to dead-letter queue for manual review
- Job Dependencies: Failed jobs don't block dependent jobs indefinitely; dependencies can time out
- Error Logging: All errors are logged to database with context for debugging
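The retry policy maps directly onto BullMQ's job options; for example, five attempts with exponential backoff starting at one second (the numbers are illustrative, not the system's actual settings):

```typescript
import { Queue } from "bullmq";

const queue = new Queue("analysis", {
  connection: { host: "localhost", port: 6379 },
});

// attempts and exponential backoff are native BullMQ job options; a job
// that exhausts its attempts lands in BullMQ's failed set, from which it
// can be swept into a dead-letter queue for manual review.
async function enqueueAnalysis(ticker: string): Promise<void> {
  await queue.add(
    "analyze",
    { ticker },
    { attempts: 5, backoff: { type: "exponential", delay: 1000 } } // 1s, 2s, 4s, ...
  );
}

enqueueAnalysis("AAPL").catch(console.error);
```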
System-Level Resilience
- Database Failures: Agents queue jobs for retry when DB is available, use connection pooling with retries
- Queue Failures: Jobs persisted to database as backup, Redis replication for high availability
- Agent Failures: Other agents continue operating independently, failed agent jobs are retried automatically
- Data Staleness: Newsletter pipeline triggers fresh collection if needed, can proceed with existing data if collection fails
- Partial Failures: System continues operating with partial data (e.g., some analysis types fail but others succeed)
- Circuit Breakers: External API calls use circuit breakers to prevent cascade failures
- Health Monitoring: All agents report health status, failed agents are automatically restarted
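A minimal circuit-breaker sketch for those external API calls (thresholds illustrative): after a run of consecutive failures the breaker opens and calls fail fast until a cool-down elapses.

```typescript
// Minimal circuit breaker: opens after `threshold` consecutive failures,
// then rejects immediately until `cooldownMs` has passed.
class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;

  constructor(private threshold = 5, private cooldownMs = 30_000) {}

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.failures >= this.threshold) {
      if (Date.now() - this.openedAt < this.cooldownMs) {
        throw new Error("circuit open: failing fast"); // skip the external call
      }
      this.failures = 0; // cool-down over: close the breaker and try again
    }
    try {
      const result = await fn();
      this.failures = 0;
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.failures >= this.threshold) this.openedAt = Date.now();
      throw err;
    }
  }
}
```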
Configuration & Monitoring
Agent Configuration
- All configs stored in the `AgentConfig` table
- Hot-reloadable without restart (sketched after this list)
- Per-ticker, per-user, and system-wide configs
- Versioned for rollback capability
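Hot reload can be as simple as polling the `AgentConfig` table and swapping the in-memory copy when the version changes (poll interval and shape assumed):

```typescript
// Sketch of hot-reloadable config: the agent polls AgentConfig and swaps
// its in-memory copy when the version changes; no restart required.
interface AgentConfig {
  version: number;
  values: Record<string, unknown>;
}

function watchConfig(
  load: () => Promise<AgentConfig>,
  onChange: (cfg: AgentConfig) => void,
  pollMs = 30_000 // illustrative poll interval
): NodeJS.Timeout {
  let current = -1;
  return setInterval(async () => {
    const cfg = await load();
    if (cfg.version !== current) {
      current = cfg.version;
      onChange(cfg); // agent picks up new config without restarting
    }
  }, pollMs);
}
```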
Monitoring & Observability
- Each agent emits metrics (execution time, success rate, data quality)
- Metrics stored in the `AgentMetrics` table
- Learning Agent analyzes metrics and optimizes configs
- Dashboard shows agent health, data freshness, pipeline status
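Metric emission, sketched with assumed column names: wrapping each run records timing and outcome either way, giving the Learning Agent rows to aggregate.

```typescript
// Illustrative AgentMetrics row; the real columns live in the schema.
interface AgentMetricRow {
  agent: string;
  agentVersion: string;
  executionMs: number;
  success: boolean;
  recordedAt: Date;
}

// Wrap a job execution so timing and outcome are recorded on both the
// success and failure paths.
async function withMetrics<T>(
  agent: string,
  agentVersion: string,
  write: (row: AgentMetricRow) => Promise<void>,
  run: () => Promise<T>
): Promise<T> {
  const started = Date.now();
  try {
    const result = await run();
    await write({
      agent, agentVersion, success: true,
      executionMs: Date.now() - started, recordedAt: new Date(),
    });
    return result;
  } catch (err) {
    await write({
      agent, agentVersion, success: false,
      executionMs: Date.now() - started, recordedAt: new Date(),
    });
    throw err;
  }
}
```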
Benefits of This Architecture
- Scalability: Scale Data Collection independently from Newsletter Generation
- Reliability: Data Collection failures don't block Newsletter Generation (uses cached data)
- Efficiency: Data Collection runs frequently to keep data fresh, Newsletter uses latest data
- Flexibility: Easy to add new agents or modify schedules
- Performance: Parallel execution where possible, caching for speed
- Maintainability: Clear separation of concerns, easy to debug and update