Q2 - Collect & Analyze
Milestone 2.1
Summary: Query Strategy Agent
Goal: Implement the Query Strategy Agent that discovers entities and generates intelligent search queries.
Deliverables
- ✅ Query Strategy Agent implementation (packages/agents/query-strategy/)
- ✅ Entity discovery system:
- SEC EDGAR API integration for filings (10-K, 10-Q, 8-K)
- Company website scraping for executive information
- AI-powered entity extraction from documents
- Entity graph database structure
- ✅ Relationship extraction:
- Competitor identification
- Supplier/customer relationship mapping
- Executive tracking
- Industry peer identification
- ✅ Dynamic query generation:
- Base query construction from ticker/entities
- AI-enhanced query generation
- Query prioritization and scoring
- Multi-variant query creation
- ✅ Keyword tracking system:
- Trend detection
- Emerging keyword identification
- Keyword relevance scoring
- ✅ Query optimization based on historical performance
- ✅ Entity graph storage and retrieval
- ✅ Comprehensive unit and integration tests
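Query prioritization and optimization based on historical performance could be folded into one score per query. The sketch below is a minimal illustration under stated assumptions: each query has a hypothetical base priority in [0, 1] from generation, and "performance" is tracked as an exponential moving average of the fraction of a query's results later judged relevant. The `QueryStats` shape, the 0.3 smoothing factor, and the 50/50 blend are illustrative choices, not part of this plan.

```typescript
// Hypothetical per-query record; field names are illustrative.
interface QueryStats {
  query: string;
  basePriority: number; // assumed initial score in [0, 1] from query generation
  yieldEma: number | null; // moving average of relevant/total results; null = never run
}

const ALPHA = 0.3; // assumed smoothing factor: weight given to the newest run
const HISTORY_WEIGHT = 0.5; // assumed blend between base priority and observed yield

// Fold one run (`relevant` out of `total` results) into the moving average.
function recordRun(stats: QueryStats, relevant: number, total: number): QueryStats {
  const ratio = total > 0 ? relevant / total : 0;
  const yieldEma =
    stats.yieldEma === null ? ratio : ALPHA * ratio + (1 - ALPHA) * stats.yieldEma;
  return { ...stats, yieldEma };
}

// Queries with no history fall back to their base priority.
function score(stats: QueryStats): number {
  if (stats.yieldEma === null) return stats.basePriority;
  return (1 - HISTORY_WEIGHT) * stats.basePriority + HISTORY_WEIGHT * stats.yieldEma;
}

// Top `n` queries by score, highest first; the input array is not mutated.
function prioritize(all: QueryStats[], n: number): QueryStats[] {
  return [...all].sort((a, b) => score(b) - score(a)).slice(0, n);
}
```

An exponential moving average keeps only one number per query while still letting recent runs dominate, which suits queries whose usefulness drifts as news cycles change.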
Success Criteria
- Agent discovers entities for test tickers (AAPL, TSLA, MSFT)
- Generates 20+ relevant queries per ticker
- Entity graph contains accurate relationships
- Query performance improves over time with optimization
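Entity discovery from SEC filings (10-K, 10-Q, 8-K) could start from EDGAR's public submissions endpoint, `https://data.sec.gov/submissions/CIK##########.json`, where the CIK is zero-padded to 10 digits; the SEC expects automated clients to send a descriptive `User-Agent` header. The sketch below covers only the offline part: building the URL and filtering the column-oriented `filings.recent` arrays by form type. The `FilingRef` shape and helper names are hypothetical, and the response fields should be checked against the SEC's EDGAR API documentation before relying on them.

```typescript
// Subset of the EDGAR submissions JSON used here; `filings.recent` stores
// parallel arrays where index i across all arrays describes one filing.
interface EdgarSubmissions {
  cik: string;
  name: string;
  filings: {
    recent: {
      accessionNumber: string[];
      filingDate: string[];
      form: string[];
      primaryDocument: string[];
    };
  };
}

// Hypothetical flattened record handed to downstream entity extraction.
interface FilingRef {
  form: string;
  filingDate: string;
  accessionNumber: string;
  documentUrl: string;
}

const TARGET_FORMS = new Set(["10-K", "10-Q", "8-K"]);

// Submissions URL with the CIK zero-padded to 10 digits.
function submissionsUrl(cik: number | string): string {
  return `https://data.sec.gov/submissions/CIK${String(cik).padStart(10, "0")}.json`;
}

// Keep only the target form types and build each primary document's archive URL.
function targetFilings(sub: EdgarSubmissions): FilingRef[] {
  const r = sub.filings.recent;
  const cikNoPad = String(Number(sub.cik)); // archive paths use the CIK without leading zeros
  const out: FilingRef[] = [];
  for (let i = 0; i < r.form.length; i++) {
    if (!TARGET_FORMS.has(r.form[i])) continue;
    const acc = r.accessionNumber[i];
    out.push({
      form: r.form[i],
      filingDate: r.filingDate[i],
      accessionNumber: acc,
      documentUrl: `https://www.sec.gov/Archives/edgar/data/${cikNoPad}/${acc.replace(/-/g, "")}/${r.primaryDocument[i]}`,
    });
  }
  return out;
}
```

With Node 18 or later, the JSON could be fetched with the global `fetch`, passing a header such as `{ "User-Agent": "YourCompany research contact@yourcompany.example" }`, and the parsed body handed to `targetFilings`. Apple's CIK, for example, is 320193.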
Milestone 2.2
Summary: Data Collection Agent - Core Sources
Goal: Implement data collection from news sources and social media.
Deliverables
- ✅ Data Collection Agent implementation (packages/agents/data-collection/)
- ✅ Search source configuration system:
- Admin-configurable search sources (e.g., Serper.dev, Google Search API)
- Search source authentication and API key management
- Response mapping configuration per source
- Rate limiting per search source
- ✅ Query-based data collection:
- Retrieves queries generated by the Query Strategy Agent (stored in the database)
- Queries admin-configured search sources with generated queries
- Extracts URLs from search results
- ✅ Web page fetching:
- Fetches HTML content from search result URLs
- Extracts main content (removing ads and navigation)
- Handles JavaScript-rendered pages when necessary
- Respects robots.txt and rate limits
- ✅ Rate limiting and retry mechanisms per source
- ✅ Data deduplication system:
- Title similarity matching
- Content similarity (AI-powered)
- URL matching
- ✅ Relevance scoring system (AI-powered)
- ✅ Data storage in the DataSource table
- ✅ Source health monitoring
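Two of the three deduplication signals above, URL matching and title similarity, are cheap enough to run before any AI-powered content comparison. The sketch below assumes a simple URL normalization (drop the scheme, fragment, `www.` prefix, trailing slashes, and `utm_*` tracking parameters) and token-level Jaccard similarity on titles with an illustrative 0.8 threshold. The normalization rules, the threshold, and the `Article` shape are assumptions to tune against real collected data.

```typescript
interface Article {
  url: string;
  title: string;
}

const TITLE_THRESHOLD = 0.8; // assumed Jaccard cutoff for "same story"

// Normalize a URL so trivially different links to the same page compare equal.
function normalizeUrl(raw: string): string {
  const u = new URL(raw);
  u.hash = "";
  const tracking: string[] = [];
  u.searchParams.forEach((_value, key) => {
    if (key.startsWith("utm_")) tracking.push(key);
  });
  for (const key of tracking) u.searchParams.delete(key);
  const host = u.hostname.toLowerCase().replace(/^www\./, "");
  const path = u.pathname.replace(/\/+$/, "");
  const query = u.searchParams.toString();
  return `${host}${path}${query ? "?" + query : ""}`; // scheme intentionally dropped
}

// Lowercased alphanumeric tokens of a title.
function titleTokens(title: string): Set<string> {
  return new Set(title.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0));
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 1;
  let inter = 0;
  a.forEach((t) => {
    if (b.has(t)) inter++;
  });
  return inter / (a.size + b.size - inter);
}

// Keep the first article of each duplicate group, preserving input order.
function dedupe(articles: Article[]): Article[] {
  const seenUrls = new Set<string>();
  const keptTitles: Set<string>[] = [];
  const kept: Article[] = [];
  for (const a of articles) {
    const url = normalizeUrl(a.url);
    if (seenUrls.has(url)) continue;
    const tokens = titleTokens(a.title);
    if (keptTitles.some((t) => jaccard(t, tokens) >= TITLE_THRESHOLD)) continue;
    seenUrls.add(url);
    keptTitles.push(tokens);
    kept.push(a);
  }
  return kept;
}
```

The pairwise title check is O(n²) per batch, which is fine for a daily per-company batch; at larger volumes a technique such as MinHash with locality-sensitive hashing would avoid comparing every pair.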
Success Criteria
- Collects 50+ news articles per company per day
- Collects 100+ social media posts per company per day
- Deduplication removes 90%+ of duplicates
- Relevance scoring filters low-quality content effectively
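The per-source rate limiting in the Milestone 2.2 deliverables could be implemented as one token bucket per admin-configured search source: each bucket allows a burst up to `capacity` requests and refills at `refillPerSec`. This is a minimal in-process sketch; the class names and the `"serper"` source id are hypothetical, and a deployment with several workers would need the bucket state shared (for example, in a database or cache) rather than held in memory.

```typescript
class TokenBucket {
  private tokens: number;
  private last: number;
  private readonly capacity: number; // maximum burst size
  private readonly refillPerSec: number; // steady-state requests per second

  constructor(capacity: number, refillPerSec: number, now: number = Date.now()) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.tokens = capacity; // start full
    this.last = now;
  }

  // Refill based on elapsed time, then consume one token if available.
  tryTake(now: number = Date.now()): boolean {
    const elapsedSec = Math.max(0, now - this.last) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.last = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// One bucket per configured search source.
class SourceRateLimiter {
  private readonly buckets = new Map<string, TokenBucket>();

  configure(sourceId: string, capacity: number, refillPerSec: number, now: number = Date.now()): void {
    this.buckets.set(sourceId, new TokenBucket(capacity, refillPerSec, now));
  }

  tryAcquire(sourceId: string, now: number = Date.now()): boolean {
    const bucket = this.buckets.get(sourceId);
    if (!bucket) throw new Error(`No rate limit configured for source ${sourceId}`);
    return bucket.tryTake(now);
  }
}
```

Usage: call `limiter.configure("serper", 5, 1)` when a source is loaded from admin configuration, then check `limiter.tryAcquire("serper")` before each search request and defer the query when it returns `false`.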
Milestone 2.3
Summary: Data Collection Agent - Advanced Features
Goal: Complete data collection with earnings calls, SEC filings, and advanced processing.
Deliverables
- ✅ Earnings call integration:
- SEC EDGAR API for earnings filings
- Transcript fetching from multiple sources
- AI-powered transcript summarization
- Key metrics extraction (revenue, EPS, guidance)
- ✅ SEC filings processing:
- 8-K, 10-Q, 10-K filing retrieval
- Entity relationship extraction from filings
- ✅ Advanced data enrichment:
- Sentiment pre-calculation
- Entity extraction from content
- Language detection
- Metadata extraction
- ✅ Source configuration management:
- Dynamic source enable/disable
- Source-specific rate limits
- Source health status tracking
- ✅ Error handling and recovery:
- Graceful degradation when sources fail
- Automatic source disabling after repeated failures
- Error notification system
- ✅ Data collection metrics and monitoring dashboard
- ✅ End-to-end data collection pipeline tests
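Automatic source disabling after repeated failures, combined with graceful degradation and health status tracking, behaves much like a circuit breaker. The sketch below assumes a hypothetical threshold of consecutive failures and a cool-down after which a single trial ("probing") request is allowed; the `SourceHealth` class, its defaults (5 failures, 15 minutes), and the state names are illustrative, and the error notification system is represented only by an `onDisable` callback.

```typescript
type SourceState = "enabled" | "disabled" | "probing";

class SourceHealth {
  state: SourceState = "enabled";
  private consecutiveFailures = 0;
  private disabledAt = 0;
  private readonly sourceId: string;
  private readonly maxFailures: number;
  private readonly cooldownMs: number;
  private readonly onDisable: (sourceId: string) => void;

  constructor(
    sourceId: string,
    maxFailures = 5, // assumed threshold
    cooldownMs = 15 * 60 * 1000, // assumed cool-down
    onDisable: (sourceId: string) => void = () => {},
  ) {
    this.sourceId = sourceId;
    this.maxFailures = maxFailures;
    this.cooldownMs = cooldownMs;
    this.onDisable = onDisable;
  }

  // Whether the collector may send a request to this source at `now` (ms).
  canRequest(now: number = Date.now()): boolean {
    if (this.state === "disabled" && now - this.disabledAt >= this.cooldownMs) {
      this.state = "probing"; // allow exactly one trial request after the cool-down
      return true;
    }
    return this.state === "enabled";
  }

  recordSuccess(): void {
    this.consecutiveFailures = 0;
    this.state = "enabled";
  }

  recordFailure(now: number = Date.now()): void {
    this.consecutiveFailures++;
    // A failed probe, or too many consecutive failures, (re)disables the source.
    if (this.state === "probing" || this.consecutiveFailures >= this.maxFailures) {
      const wasEnabled = this.state === "enabled";
      this.state = "disabled";
      this.disabledAt = now;
      if (wasEnabled) this.onDisable(this.sourceId); // notify only on the enabled -> disabled transition
    }
  }
}
```

While a source is disabled, the pipeline skips it and continues with the remaining sources, which is one way to get the graceful degradation listed above.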
Success Criteria
- Successfully retrieves earnings transcripts for test tickers
- Extracts key metrics from earnings calls with 95%+ accuracy
- Processes SEC filings and extracts entity relationships
- System handles source failures gracefully
- Data collection dashboard shows real-time metrics
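The retry mechanisms listed in Milestone 2.2 could use exponential backoff with "full jitter": before retry attempt n, wait a random delay in [0, min(maxDelay, base · 2^n)) so that many workers retrying the same failing source do not hit it in lockstep. This is a generic sketch; `withRetry`, `backoffDelay`, and the option names are hypothetical, and a real collector would likely retry only on transient errors (timeouts, HTTP 429/5xx) rather than on every exception as this version does.

```typescript
interface RetryOptions {
  maxAttempts: number; // total attempts, including the first
  baseDelayMs: number;
  maxDelayMs: number;
}

// Full-jitter delay before the retry that follows attempt `attempt` (0-based).
function backoffDelay(attempt: number, opts: RetryOptions, rand: () => number = Math.random): number {
  const cap = Math.min(opts.maxDelayMs, opts.baseDelayMs * 2 ** attempt);
  return Math.floor(rand() * cap);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Run `fn`, retrying on any thrown error until `maxAttempts` is exhausted.
async function withRetry<T>(fn: () => Promise<T>, opts: RetryOptions): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < opts.maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < opts.maxAttempts - 1) await sleep(backoffDelay(attempt, opts));
    }
  }
  throw lastError; // surface the last failure so health tracking can record it
}
```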
Milestone 2.4
Summary: Analysis Agent Implementation
Timeline: Weeks 25-28
Goal: Build comprehensive analysis capabilities (competitive, sentiment, event/context).
Deliverables
- ✅ Analysis Agent implementation (packages/agents/analysis/)
- ✅ Competitive analysis module:
- Peer group identification
- Media coverage comparison across competitors
- Industry trend analysis in media coverage
- Competitive positioning assessment
- AI-powered competitive summary
- ✅ Sentiment analysis module:
- News sentiment analysis (AI-powered)
- Social media sentiment analysis
- Weighted sentiment calculation with time decay
- Sentiment trend detection
- AI-powered sentiment summary
- ✅ Event/context analysis module:
- External event identification (natural disasters, political changes, regulatory updates, economic shifts, social movements)
- Relevance scoring for events
- Impact assessment (operations, reputation, strategy)
- Early warning signal detection
- Opportunity identification
- AI-powered event summary
- ✅ Insight extraction system:
- Cross-analysis insight identification
- Priority scoring for insights
- AI-powered insight generation
- ✅ Analysis result storage and caching
- ✅ Comprehensive test suite
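The weighted sentiment calculation with time decay could use an exponential half-life: each item's weight is w_i = sourceWeight_i · 0.5^(age_i / halfLife), and the aggregate is Σ w_i · s_i / Σ w_i. The sketch below assumes per-item scores in [-1, 1] from the news and social sentiment modules and an optional per-item source weight; the 24-hour default half-life and the `SentimentItem` shape are illustrative assumptions, not values fixed by this plan.

```typescript
interface SentimentItem {
  score: number; // assumed range [-1, 1]
  publishedAt: number; // epoch milliseconds
  sourceWeight?: number; // hypothetical credibility weight; defaults to 1
}

const HOUR_MS = 60 * 60 * 1000;

// Weighted mean of scores where each item's weight halves every `halfLifeHours`.
// Returns null when there are no items to aggregate.
function decayedSentiment(items: SentimentItem[], now: number, halfLifeHours = 24): number | null {
  let weightedSum = 0;
  let totalWeight = 0;
  for (const item of items) {
    const ageHours = Math.max(0, now - item.publishedAt) / HOUR_MS; // future timestamps count as age 0
    const w = (item.sourceWeight ?? 1) * Math.pow(0.5, ageHours / halfLifeHours);
    weightedSum += w * item.score;
    totalWeight += w;
  }
  return totalWeight > 0 ? weightedSum / totalWeight : null;
}
```

For example, a fresh positive item (score 1) and a day-old negative item (score -1) with the default half-life get weights 1 and 0.5, giving an aggregate of 0.5 / 1.5 ≈ 0.33, so recent coverage dominates without discarding older items outright.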
Success Criteria
- Identifies peer groups and performs media coverage comparisons
- Sentiment scores align with manual review
- Events are identified accurately with appropriate relevance scores
- Insights are relevant and actionable