System Flow
System Architecture Overview
The system uses a modular, event-driven architecture where agents run independently on their own schedules, communicating through a shared data store and event queue. This design provides:
- Scalability: Agents can be scaled independently based on load
- Reliability: Failures in one agent don't block others
- Flexibility: Each agent runs on its optimal schedule
- Modularity: Agents are loosely coupled and can be updated independently
- Extensibility: Plugin-based architecture allows dynamic addition/removal of capabilities (e.g., analysis types) without code changes
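As a minimal sketch of what this model looks like in practice, assuming one BullMQ worker per agent (the `db` object and `fetchArticles` helper below are illustrative stand-ins, not the project's real data layer):

```typescript
import { Worker } from "bullmq";

// Illustrative stand-in for the shared data store; the real client is the
// project's database layer.
const db = {
  async save(table: string, row: Record<string, unknown>): Promise<void> {
    console.log(`write to ${table}`, row);
  },
};

// Stand-in for source-specific collection logic.
async function fetchArticles(ticker: string): Promise<string[]> {
  return [`placeholder article for ${ticker}`];
}

// Each agent is an independent worker on its own queue. It never calls
// another agent directly: it reads its inputs from the job payload or the
// shared store and writes its outputs back for other agents to pick up
// on their own schedules.
const dataCollectionWorker = new Worker(
  "data-collection",
  async (job) => {
    const { ticker } = job.data as { ticker: string };
    const articles = await fetchArticles(ticker);
    await db.save("CollectedData", { ticker, articles, collectedAt: new Date() });
  },
  { connection: { host: "localhost", port: 6379 } }
);
```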
Agent Schedules & Independence
Independent Agent Flows
Query Strategy Agent
Triggers:
- Schedule
- Manual trigger
Schedule:
- Schedules are created by admins via the admin interface
- Each schedule references a `JobTemplate` that defines the agent and parameters (see the sketch after this list)
- Parameter expansion can be configured (e.g., expand to all tickers)
- Examples:
- Entity Graph Refresh: Weekly schedule with ticker expansion
- Query Optimization: Daily schedule with ticker expansion
- Query Generation: On-demand or triggered by events
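The authoritative schema lives in the data model; as a rough TypeScript sketch with assumed field names, a schedule points at a job template, and the template names the agent, its default parameters, and an optional expansion rule:

```typescript
// Illustrative shapes only; the real schema is the database model.
interface JobTemplate {
  id: string;
  agent: "query-strategy" | "data-collection" | "newsletter-pipeline";
  parameters: Record<string, unknown>; // default job parameters
  expansion?: "all-tickers" | "all-user-ticker-subscriptions" | null;
}

interface Schedule {
  id: string;
  jobTemplateId: string; // references a JobTemplate
  timing:
    | { type: "cron"; expression: string }
    | { type: "interval"; everyMs: number }
    | { type: "on-demand" };
  enabled: boolean;
}

// Example: weekly entity-graph refresh, expanded to every tracked ticker.
const entityGraphRefresh: JobTemplate = {
  id: "tpl-entity-graph",
  agent: "query-strategy",
  parameters: { task: "entity-graph-refresh" },
  expansion: "all-tickers",
};

const weekly: Schedule = {
  id: "sch-entity-graph-weekly",
  jobTemplateId: entityGraphRefresh.id,
  timing: { type: "cron", expression: "0 3 * * 1" }, // Mondays at 03:00
  enabled: true,
};
```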
Data Collection Agent
Triggers:
- Schedule
- Manual trigger
Schedule:
- Schedules are created by admins via the admin interface
- Each schedule references a `JobTemplate` for the Data Collection agent
- Parameter expansion can be configured to process multiple tickers (sketched after this list)
- Runs independently - doesn't wait for Query Strategy
- Example: Interval-based schedule (e.g., every 4 hours) with ticker expansion
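Parameter expansion, sketched with the same assumed shapes (the ticker list and queue name are illustrative): when the schedule fires, one template becomes one job per ticker.

```typescript
import { Queue } from "bullmq";

const queue = new Queue("data-collection", {
  connection: { host: "localhost", port: 6379 },
});

// When an expanded schedule fires, the orchestrator enqueues one job per
// ticker instead of a single monolithic job, so failures and retries stay
// isolated per ticker.
async function expandAndEnqueue(tickers: string[]): Promise<void> {
  await Promise.all(tickers.map((ticker) => queue.add("collect", { ticker })));
}

expandAndEnqueue(["AAPL", "MSFT", "NVDA"]).catch(console.error);
```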
Newsletter Generation Pipeline
Triggers:
- Schedule
- Manual trigger
Schedule:
- Pipeline is defined in the `Pipeline` table by admins (see the sketch after this list)
- Schedule references the pipeline (stored in the `Schedule` table)
- Admin-configurable timing (cron, interval, or on-demand)
- Parameter expansion can be configured (e.g., expand to all user-ticker subscriptions)
- User-defined schedules can be created per user-ticker subscription
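A rough sketch of how a pipeline row and its schedule might relate, with field names assumed as in the earlier sketch (the stage names are taken from the error-handling section below):

```typescript
// Illustrative only; the real Pipeline/Schedule tables are defined in the schema.
interface Pipeline {
  id: string;
  stages: string[]; // ordered agent stages
}

interface PipelineSchedule {
  id: string;
  pipelineId: string; // references a row in the Pipeline table
  timing: { type: "cron"; expression: string } | { type: "on-demand" };
  expansion?: "all-user-ticker-subscriptions";
}

const newsletterPipeline: Pipeline = {
  id: "pipe-newsletter",
  stages: ["analysis", "content-generation", "qa", "delivery"],
};

// Daily 06:00 run, expanded to one pipeline execution per subscription.
const daily: PipelineSchedule = {
  id: "sch-newsletter-daily",
  pipelineId: newsletterPipeline.id,
  timing: { type: "cron", expression: "0 6 * * *" },
  expansion: "all-user-ticker-subscriptions",
};
```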
Event-Driven Communication
Data Flow & Storage
Note: All agent outputs written to the database include an `agentVersion` field that specifies the semantic version (e.g., "1.2.3") of the agent that generated the output. This enables traceability and debugging, and supports the agent versioning and experimentation system. Agents read their active version from the `AgentVersionDeployment` table during initialization and include it in all outputs.
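In code terms the stamping might look like this (the lookup function and field names are assumed for illustration):

```typescript
// Sketch of version stamping; shapes are illustrative.
interface AgentOutput {
  agentVersion: string; // semantic version of the producing agent, e.g. "1.2.3"
  producedAt: Date;
  payload: unknown;
}

class Agent {
  private version = "0.0.0";

  // On startup, read the active version from AgentVersionDeployment.
  async init(lookupActiveVersion: (agent: string) => Promise<string>) {
    this.version = await lookupActiveVersion("data-collection");
  }

  // Every output row carries the version for traceability and experiments.
  stamp(payload: unknown): AgentOutput {
    return { agentVersion: this.version, producedAt: new Date(), payload };
  }
}
```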
Key Design Principles
1. Loose Coupling
- Agents don't directly call each other
- Agents don't receive data from other agents
- All communication through shared database
- Agents read from and write to the database independently
- Agents can be updated/deployed independently
2. Independent Scheduling
- The Scheduler/Orchestrator manages all schedules (it is not an agent)
- Each agent has its own optimal schedule (admin-configurable)
- Schedules are stored in the database (`Schedule` table)
- Job templates define agent parameters and expansion rules
- Query Strategy: Admin-configurable schedules (e.g., weekly entity graph, daily optimization)
- Data Collection: Admin-configurable intervals (e.g., every 1-4 hours)
- Newsletter Generation: Admin-configurable pipeline schedules
- Learning: Admin-configurable schedule (e.g., daily at midnight)
3. Data Freshness Management
- Data Collection Agent maintains fresh data in database
- Newsletter pipeline checks data freshness before analysis
- Can trigger immediate collection if data is stale
- Timestamps on all data for freshness tracking
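A minimal freshness check, assuming a `collectedAt` timestamp on collected rows and an illustrative four-hour staleness threshold:

```typescript
const MAX_AGE_MS = 4 * 60 * 60 * 1000; // illustrative: data older than 4h is stale

interface FreshnessRow {
  ticker: string;
  collectedAt: Date;
}

// Before analysis, the pipeline compares the latest collection timestamp
// against the threshold and triggers an immediate collection if stale.
async function ensureFresh(
  latest: FreshnessRow | null,
  triggerCollection: (ticker: string) => Promise<void>,
  ticker: string
): Promise<void> {
  const stale =
    latest === null || Date.now() - latest.collectedAt.getTime() > MAX_AGE_MS;
  if (stale) {
    await triggerCollection(ticker); // enqueue an immediate collection job
  }
}
```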
4. Scalability
- Agent Instances: Multiple instances of the same agent version can run in parallel for horizontal scaling
- Job Distribution: Orchestrator distributes jobs across available agent instances using load balancing
- Job Division: Large jobs (e.g., 100 keywords) can be divided into smaller sub-jobs and distributed across instances (see the sketch after this list)
- Instance Management: Each agent instance registers itself and reports capacity/load for intelligent job distribution
- Database connection pooling
- Caching layer for frequently accessed data
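The job-division point above, sketched with an assumed chunk size of 20: a 100-keyword job becomes five independent sub-jobs that any available instance can claim.

```typescript
// Split a large job's keyword list into fixed-size chunks, one sub-job each.
// A chunk size of 20 is illustrative; in practice it would come from config.
function divideJob(keywords: string[], chunkSize = 20): string[][] {
  const chunks: string[][] = [];
  for (let i = 0; i < keywords.length; i += chunkSize) {
    chunks.push(keywords.slice(i, i + chunkSize));
  }
  return chunks;
}

// 100 keywords -> 5 sub-jobs, each enqueued separately so any available
// agent instance can claim it.
const subJobs = divideJob(Array.from({ length: 100 }, (_, i) => `kw-${i}`));
console.log(subJobs.length); // 5
```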
5. Reliability
- Failed jobs are retried with exponential backoff
- Agents continue running even if others fail
- Data is never lost (persisted before processing)
- Circuit breakers for external API calls
6. Modularity
- Each agent is a separate service/worker
- Can be deployed independently
- Configuration stored in database (hot-reloadable)
- Versioned agent deployments
- Plugin-based architecture: Analysis types are plugins registered in the database, allowing addition/removal without code changes
- Dynamic component loading: Agents discover and load components (analysis types, sections) at runtime
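A sketch of that plugin pattern, assuming registry rows carry a module path (all names illustrative): the agent loads whatever is enabled at runtime, so adding an analysis type is a database insert rather than a deploy.

```typescript
// Illustrative registry row; the real table is part of the schema.
interface AnalysisPluginRow {
  name: string;       // e.g. "sentiment", "technical"
  modulePath: string; // module resolved at runtime
  enabled: boolean;
}

interface AnalysisPlugin {
  analyze(input: unknown): Promise<unknown>;
}

// Load every enabled plugin via dynamic import; no code change is needed
// to add or remove an analysis type, only a registry row.
async function loadPlugins(rows: AnalysisPluginRow[]): Promise<AnalysisPlugin[]> {
  const active = rows.filter((r) => r.enabled);
  return Promise.all(
    active.map(async (r) => (await import(r.modulePath)).default as AnalysisPlugin)
  );
}
```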
Error Handling & Resilience
Per-Agent Error Handling
- Query Strategy: Retries entity discovery, falls back to cached entity graph, logs errors to database
- Data Collection: Continues with other sources if one fails, retries with exponential backoff, marks failed sources for health monitoring
- Analysis: Returns partial results if some analysis types fail, logs errors per plugin, continues with successful analyses (sketched after this list)
- Content Generation: Retries with different prompts on failure (up to max retries), falls back to simpler templates if needed
- QA: Flags for manual review if repeated failures, writes detailed error information to database
- Delivery: Retries with exponential backoff, dead-letter queue for permanently failed deliveries, tracks bounce rates
- Learning: Continues analysis even if some metrics are unavailable, logs warnings for incomplete data
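The partial-results behavior for Analysis, sketched: each plugin runs in its own try/catch, failures are logged, and whatever succeeded is still returned.

```typescript
interface AnalysisPlugin {
  name: string;
  analyze(input: unknown): Promise<unknown>;
}

// Run every analysis plugin; a failing plugin is logged and skipped,
// so the pipeline still gets whatever analyses succeeded.
async function runAnalyses(
  plugins: AnalysisPlugin[],
  input: unknown
): Promise<Record<string, unknown>> {
  const results: Record<string, unknown> = {};
  for (const plugin of plugins) {
    try {
      results[plugin.name] = await plugin.analyze(input);
    } catch (err) {
      // Stand-in for the per-plugin error log written to the database.
      console.error(`analysis plugin ${plugin.name} failed`, err);
    }
  }
  return results;
}
```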
Job Queue Error Handling
- BullMQ Retry Logic: All jobs have configurable retry attempts with exponential backoff (see the example after this list)
- Failed Jobs: Jobs that exceed max retries are moved to dead-letter queue for manual review
- Job Dependencies: Failed jobs don't block dependent jobs indefinitely; dependencies can time out
- Error Logging: All errors are logged to database with context for debugging
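The retry policy maps directly onto BullMQ's job options; for example, five attempts with exponential backoff starting at one second (the numbers are illustrative, not the system's actual settings):

```typescript
import { Queue } from "bullmq";

const queue = new Queue("analysis", {
  connection: { host: "localhost", port: 6379 },
});

// attempts and exponential backoff are native BullMQ job options; a job
// that exhausts its attempts lands in BullMQ's failed set, from which it
// can be swept into a dead-letter queue for manual review.
async function enqueueAnalysis(ticker: string): Promise<void> {
  await queue.add(
    "analyze",
    { ticker },
    { attempts: 5, backoff: { type: "exponential", delay: 1000 } } // 1s, 2s, 4s, ...
  );
}

enqueueAnalysis("AAPL").catch(console.error);
```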
System-Level Resilience
- Database Failures: Agents queue jobs for retry when DB is available, use connection pooling with retries
- Queue Failures: Jobs persisted to database as backup, Redis replication for high availability
- Agent Failures: Other agents continue operating independently, failed agent jobs are retried automatically
- Data Staleness: Newsletter pipeline triggers fresh collection if needed, can proceed with existing data if collection fails
- Partial Failures: System continues operating with partial data (e.g., some analysis types fail but others succeed)
- Circuit Breakers: External API calls use circuit breakers to prevent cascade failures
- Health Monitoring: All agents report health status, failed agents are automatically restarted
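A minimal circuit-breaker sketch for those external API calls (thresholds illustrative): after a run of consecutive failures the breaker opens and calls fail fast until a cool-down elapses.

```typescript
// Minimal circuit breaker: opens after `threshold` consecutive failures,
// then rejects immediately until `cooldownMs` has passed.
class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;

  constructor(private threshold = 5, private cooldownMs = 30_000) {}

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.failures >= this.threshold) {
      if (Date.now() - this.openedAt < this.cooldownMs) {
        throw new Error("circuit open: failing fast"); // skip the external call
      }
      this.failures = 0; // cool-down over: close the breaker and try again
    }
    try {
      const result = await fn();
      this.failures = 0;
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.failures >= this.threshold) this.openedAt = Date.now();
      throw err;
    }
  }
}
```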
Configuration & Monitoring
Agent Configuration
- All configs stored in the `AgentConfig` table
- Hot-reloadable without restart (sketched after this list)
- Per-ticker, per-user, and system-wide configs
- Versioned for rollback capability
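Hot reload can be as simple as polling the `AgentConfig` table and swapping the in-memory copy when the version changes (poll interval and shape assumed):

```typescript
// Sketch of hot-reloadable config: the agent polls AgentConfig and swaps
// its in-memory copy when the version changes; no restart required.
interface AgentConfig {
  version: number;
  values: Record<string, unknown>;
}

function watchConfig(
  load: () => Promise<AgentConfig>,
  onChange: (cfg: AgentConfig) => void,
  pollMs = 30_000 // illustrative poll interval
): NodeJS.Timeout {
  let current = -1;
  return setInterval(async () => {
    const cfg = await load();
    if (cfg.version !== current) {
      current = cfg.version;
      onChange(cfg); // agent picks up new config without restarting
    }
  }, pollMs);
}
```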
Monitoring & Observability
- Each agent emits metrics (execution time, success rate, data quality)
- Metrics stored in the `AgentMetrics` table
- Learning Agent analyzes metrics and optimizes configs
- Dashboard shows agent health, data freshness, pipeline status
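Metric emission, sketched with assumed column names: wrapping each run records timing and outcome either way, giving the Learning Agent rows to aggregate.

```typescript
// Illustrative AgentMetrics row; the real columns live in the schema.
interface AgentMetricRow {
  agent: string;
  agentVersion: string;
  executionMs: number;
  success: boolean;
  recordedAt: Date;
}

// Wrap a job execution so timing and outcome are recorded on both the
// success and failure paths.
async function withMetrics<T>(
  agent: string,
  agentVersion: string,
  write: (row: AgentMetricRow) => Promise<void>,
  run: () => Promise<T>
): Promise<T> {
  const started = Date.now();
  try {
    const result = await run();
    await write({
      agent, agentVersion, success: true,
      executionMs: Date.now() - started, recordedAt: new Date(),
    });
    return result;
  } catch (err) {
    await write({
      agent, agentVersion, success: false,
      executionMs: Date.now() - started, recordedAt: new Date(),
    });
    throw err;
  }
}
```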
Benefits of This Architecture
- Scalability: Scale Data Collection independently from Newsletter Generation
- Reliability: Data Collection failures don't block Newsletter Generation (uses cached data)
- Efficiency: Data Collection runs frequently to keep data fresh, Newsletter uses latest data
- Flexibility: Easy to add new agents or modify schedules
- Performance: Parallel execution where possible, caching for speed
- Maintainability: Clear separation of concerns, easy to debug and update