Project Planning
Milestone 5 - Query Strategy (Basic)
Summary
Implement a basic query strategy that discovers entities and generates better search queries.
Timeline
Weeks 11-12
Goal
Add entity discovery and intelligent query generation to improve data collection quality.
Deliverables
Query Strategy Agent (Basic)
- ✅ Agent Versioning:
- Reads active version from AgentVersionDeployment table during initialization
- Includes agentVersion field in all outputs
- Version information stored in AgentVersion table
- ✅ Full Agent Registration:
- Registers agent type metadata via Agent Registry API (POST /api/registry/register/)
- Registers instance via Agent Registry API (POST /api/register/) when spawned by orchestrator
- Reports heartbeat via Agent Registry API (POST /api/heartbeat/) with current load, capacity, and status
- Updates capacity and load information in real-time
- Supports multiple instances running in parallel (orchestrator manages instances)
- ✅ Orchestrator-Triggered Execution:
- Runs on schedule created by admin:
- Entity Refresh: Weekly (configurable per ticker) - updates basic entity list
- Query Optimization: Daily (analyzes previous day's query performance) - basic performance tracking only in this milestone, full optimization in later milestones
- Query Generation: On-demand or when entity list updates
- Orchestrator invokes agent HTTP endpoint with job parameters
- Operates independently - Data Collection Agent reads queries from database
- ✅ Basic Entity Discovery:
- Company name extraction from ticker
- Simple competitor identification (hardcoded list or basic API)
- No entity relationship graph yet (simple flat entity storage only)
- ✅ Query Generation:
- Generate queries from ticker + basic entities
- Multiple query variants (5+ per ticker)
- Query prioritization (simple scoring based on entity relevance)
- ✅ Database Integration via Agent Data API:
- Writes queries and entities via POST /api/query-strategy/ endpoint
- Store queries in database (available for Data Collection Agent)
- Store discovered entities in database (simple flat structure, not a graph)
- Link entities to tickers
- Includes agentVersion in all outputs
- Data Collection Agent reads latest queries from database (replaces hardcoded queries from Milestone 3)
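The Query Strategy Agent deliverables above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the COMPANY_NAMES and COMPETITORS tables, the query templates, the relevance weights, and every payload field name other than agentVersion are hypothetical, and the active version is passed in as an argument rather than read from the AgentVersionDeployment table.

```python
from dataclasses import dataclass

# Hypothetical lookup tables standing in for "hardcoded list or basic API".
COMPANY_NAMES = {"AAPL": "Apple Inc."}
COMPETITORS = {"AAPL": ["Microsoft", "Samsung"]}

# Assumed relevance weights per entity type (simple scoring, not tuned).
RELEVANCE = {"company": 1.0, "competitor": 0.6}

# Assumed query templates; "{name}" is replaced by the entity name.
TEMPLATES = ["{name} news", "{name} earnings", "{name} stock"]


@dataclass(frozen=True)
class Entity:
    ticker: str
    name: str
    kind: str  # "company" or "competitor"


def discover_entities(ticker: str) -> list[Entity]:
    """Return a flat entity list for a ticker (no relationship graph)."""
    entities = []
    if ticker in COMPANY_NAMES:
        entities.append(Entity(ticker, COMPANY_NAMES[ticker], "company"))
    for name in COMPETITORS.get(ticker, []):
        entities.append(Entity(ticker, name, "competitor"))
    return entities


def generate_queries(ticker: str, entities: list[Entity]) -> list[dict]:
    """Build query variants and sort them by a simple relevance score."""
    queries = [{"text": f"{ticker} stock news", "priority": 1.0}]
    for entity in entities:
        for template in TEMPLATES:
            queries.append({
                "text": template.format(name=entity.name),
                "priority": RELEVANCE[entity.kind],
            })
    # sorted() is stable, so equal-priority queries keep insertion order.
    return sorted(queries, key=lambda q: q["priority"], reverse=True)


def build_payload(ticker: str, agent_version: str) -> dict:
    """Assemble a body for POST /api/query-strategy/ (field names assumed)."""
    entities = discover_entities(ticker)
    return {
        "ticker": ticker,
        "agentVersion": agent_version,
        "entities": [{"name": e.name, "type": e.kind} for e in entities],
        "queries": generate_queries(ticker, entities),
    }


payload = build_payload("AAPL", agent_version="1.0.0")
```

With these sample tables the sketch produces ten queries for AAPL, which meets the 5+ variants target. A ticker with no known entities would fall back to a single ticker-only query, so a real implementation would likely need additional ticker-level templates.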
Data Collection Agent (Enhanced)
- ✅ Replace hardcoded queries from Milestone 3 with queries from Query Strategy Agent
- ✅ Execute multiple queries per ticker (reads from database)
- ✅ Merge results from different queries
- ✅ Basic query performance tracking (tracks which queries return results, stores feedback for future optimization)
- ✅ Continues to use Agent Data API for writing outputs (already implemented in Milestone 3)
- ✅ Continues agent versioning and registration (already implemented in Milestone 3)
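The multi-query execution, result merging, and basic performance tracking listed above might look something like the sketch below. It assumes each result is a dict with a "url" key that can serve as the merge key, and that the search backend is supplied as a run_query callable; both are assumptions made for illustration, as is the FAKE_INDEX stand-in.

```python
from typing import Callable


def collect_for_ticker(
    queries: list[str],
    run_query: Callable[[str], list[dict]],
) -> tuple[list[dict], dict[str, int]]:
    """Run every query, merge results by URL, and count results per query."""
    merged: dict[str, dict] = {}
    result_counts: dict[str, int] = {}
    for query in queries:
        results = run_query(query)
        result_counts[query] = len(results)
        for article in results:
            # Keep the first article seen for each URL (assumed unique key).
            merged.setdefault(article["url"], article)
    return list(merged.values()), result_counts


# Stand-in search backend used only to exercise the sketch.
FAKE_INDEX = {
    "Apple Inc. news": [
        {"url": "https://example.com/a"},
        {"url": "https://example.com/b"},
    ],
    "Apple Inc. earnings": [{"url": "https://example.com/b"}],
    "Apple Inc. rumors": [],
}

articles, counts = collect_for_ticker(list(FAKE_INDEX), lambda q: FAKE_INDEX[q])
```

The per-query counts are the kind of feedback that could be stored for the later optimization milestones, for example to flag queries that return no results.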
Task Timeline
Limitations (Acceptable for This Milestone)
- Simple entity discovery (no SEC filings parsing yet)
- Hardcoded competitor lists or basic API
- No entity relationship graph (simple flat entity storage only)
- Basic query optimization schedule (daily runs, but only tracks performance; full optimization logic in later milestones)
- Basic query generation only (no AI-enhanced query variants yet)
Success Criteria
- ✅ Agent versioning is functional for Query Strategy Agent (agent reads active version, includes in outputs)
- ✅ Full agent registration is functional (agent registers type and instance, reports heartbeat with load)
- ✅ Orchestrator can spawn and manage multiple Query Strategy Agent instances
- ✅ Orchestrator distributes jobs across instances using load balancing
- ✅ Query Strategy Agent discovers basic entities for test tickers
- ✅ Generates 5+ relevant queries per ticker
- ✅ Queries and entities are stored via Agent Data API with agentVersion field
- ✅ Data collection uses queries and collects more relevant data
- ✅ Newsletter quality improves (more relevant articles)
- ✅ Entities are stored and can be retrieved
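The registration and load-balancing criteria above depend on the orchestrator using the load and capacity reported in heartbeats. The source does not specify a load-balancing policy, so the sketch below assumes one possible choice: pick the healthy instance with the lowest load-to-capacity ratio. The InstanceStatus fields and the "healthy" and "draining" status values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InstanceStatus:
    """Latest heartbeat from one agent instance (field names are assumed)."""
    instance_id: str
    load: int       # jobs currently running
    capacity: int   # maximum concurrent jobs
    status: str     # assumed values: "healthy" or "draining"


def pick_instance(instances: list[InstanceStatus]) -> Optional[InstanceStatus]:
    """Choose the healthy instance with the lowest load/capacity ratio."""
    candidates = [
        i for i in instances
        if i.status == "healthy" and i.capacity > 0 and i.load < i.capacity
    ]
    if not candidates:
        return None  # no instance can accept the job right now
    return min(candidates, key=lambda i: i.load / i.capacity)


fleet = [
    InstanceStatus("qs-1", load=3, capacity=4, status="healthy"),
    InstanceStatus("qs-2", load=1, capacity=4, status="healthy"),
    InstanceStatus("qs-3", load=0, capacity=4, status="draining"),
]
chosen = pick_instance(fleet)
```

Here qs-2 is selected: qs-3 has the lowest load but is not healthy, and qs-1 is closer to its capacity.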
Next Steps
After this milestone, data collection is driven by generated, prioritized queries rather than the hardcoded queries from Milestone 3. Milestone 6 will add more data sources and better deduplication.