Project Planning
Milestone 6 - Enhanced Data Collection
Summary
Add multiple data sources, improve deduplication, and add relevance scoring.
Timeline
Weeks 13-14
Goal
Expand data collection to multiple sources and improve data quality through better deduplication and relevance scoring.
Deliverables
Data Collection Agent (Enhanced)
- ✅ Independent Scheduling (Maintained):
- Continues running independently every 1-4 hours (configurable per ticker)
- High-priority tickers: Every 1 hour
- Low-priority tickers: Every 6-12 hours
- Reads latest queries from database (from Query Strategy Agent)
- ✅ Additional News Sources:
- Google News API integration
- RSS feed parsing (if available)
- At least 2-3 news sources total
- ✅ Social Media Integration (Basic):
- Twitter/X API integration (v2) - basic search
- Simple post collection (10-20 posts per ticker)
- ✅ Improved Deduplication:
- AI-powered content similarity (for near-duplicates)
- Title similarity with fuzzy matching
- URL normalization
- ✅ Relevance Scoring:
- AI-powered relevance scoring
- Filter low-relevance content (threshold-based)
- Score-based prioritization
- ✅ Source Management:
- Source health monitoring
- Graceful degradation when sources fail
- Source-specific rate limiting
Query Strategy Agent (Minor Enhancement)
- ✅ Better query optimization:
- Track query performance
- Prioritize high-performing queries
Agent Versioning System (Enhanced)
- ✅ Version Creation and Tracking:
- Uses
AgentVersiontable (already created in Milestone 1) - Version creation for agent configs and prompts (configs include prompts embedded within them)
- Version metadata (changelog, rationale, expected impact)
- Version status tracking (draft, experimental, testing, production, deprecated)
- Uses
- ✅ Enhanced Deployment System:
- Uses
AgentVersionDeploymenttable (already created in Milestone 1) - Only one version per agentId can be active in production
- Version promotion flow:
- Update
AgentVersionDeploymentto mark new version as active - Update
AgentConfigtable to match version's configuration fromAgentVersion.config - Agents read from
AgentConfigat runtime (hot-reload without restart)
- Update
- Rollback capability to previous production version:
- Update
AgentVersionDeploymentto previous version - Update
AgentConfigto match previous version's config - Agents automatically reload configuration
- Update
- Uses
- ✅ Basic Experimentation:
- Create experimental versions from production
- Run test executions on experimental versions
- Compare experimental vs production performance
- Basic validation checks before promotion (stored in
AgentValidationtable)
- ✅ Admin Dashboard:
- Basic version management UI (
/admin/agents/versions) - Version comparison tools
- Deployment controls (promote to production, rollback)
- Simple validation dashboard
- Basic version management UI (
Note: The Agent Versioning System is a foundational infrastructure feature that enables safe iteration on agent improvements. While it's introduced in this milestone, it will be fully utilized in later milestones (particularly Milestone 14) when the Learning Agent creates optimized versions automatically.
Hermes Orchestrator (Instance Scaling Enhancement)
- ✅ Automatic Instance Scaling:
- Monitors job queue size and instance load via
AgentInstancetable - Spawns new instances when queue grows or instances are at capacity
- Scales down instances when load decreases (terminates idle instances)
- Configurable scaling policies (min/max instances per agent type)
- Instance lifecycle fully managed by orchestrator
- Monitors job queue size and instance load via
- ✅ Instance Health Monitoring:
- Monitors instance heartbeats via Agent Registry API
- Automatically replaces failed or unhealthy instances (instances with stale heartbeats)
- Tracks instance metrics (success rate, average execution time)
- Queries
AgentInstancetable for health status
Database Updates
- ✅ Agent version tables already exist (
AgentVersion,AgentVersionDeployment- created in Milestone 1) - ✅ Basic experimentation tables (
AgentExperiment,AgentValidation) - ✅ Version tracking in agent configs (configs stored in
AgentConfigtable, snapshots inAgentVersiontable)
Task Timeline
Limitations (Acceptable for This Milestone)
- Limited social media data (Twitter only, basic search)
- No earnings calls or SEC filings yet
- Basic relevance scoring
- No advanced data enrichment
- Basic versioning (no advanced analytics or automated optimization)
- Manual version promotion (no automated promotion workflows)
Success Criteria
- ✅ Collects data from 3+ sources
- ✅ Collects 50+ news articles per ticker per day
- ✅ Collects 20+ social media posts per ticker per day
- ✅ Deduplication removes 80%+ of duplicates
- ✅ Relevance scoring filters low-quality content effectively
- ✅ System handles multiple sources gracefully
- ✅ Agent versioning system works correctly (version creation, deployment, rollback)
- ✅ Only one version per agent can be active in production at a time
- ✅ Version promotion updates both
AgentVersionDeploymentandAgentConfigtables - ✅ Agents hot-reload configuration when version is promoted (no restart required)
- ✅ Admins can create and test experimental versions
- ✅ Version comparison and rollback function properly
- ✅ Orchestrator can automatically scale instances up/down based on load
- ✅ Orchestrator replaces failed instances automatically
Next Steps
After this milestone, data collection is robust with multiple sources and agent versioning enables safe iteration on improvements. Milestone 7 will add comprehensive analysis (competitive, sentiment, and event/context analysis).