MediaPulse
Project Planning

Milestone 5 - Query Strategy (Basic)

Summary

Implement a basic query strategy that discovers entities and generates better search queries.

Timeline

Weeks 11-12

Goal

Add entity discovery and intelligent query generation to improve data collection quality.

Deliverables

Query Strategy Agent (Basic)

  • Agent Versioning:
    • Reads active version from AgentVersionDeployment table during initialization
    • Includes agentVersion field in all outputs
    • Version information stored in AgentVersion table
  • Full Agent Registration:
    • Registers agent type metadata via Agent Registry API (POST /api/registry/register/)
    • Registers instance via Agent Registry API (POST /api/register/) when spawned by orchestrator
    • Reports heartbeat via Agent Registry API (POST /api/heartbeat/) with current load, capacity, and status
    • Updates capacity and load information in real time
    • Supports multiple instances running in parallel (orchestrator manages instances)
  • Orchestrator-Triggered Execution:
    • Runs on schedules created by an admin:
      • Entity Refresh: Weekly (configurable per ticker); updates the basic entity list
      • Query Optimization: Daily; analyzes the previous day's query performance (this milestone implements basic performance tracking only; full optimization comes in later milestones)
      • Query Generation: On demand, or whenever the entity list updates
    • Orchestrator invokes the agent's HTTP endpoint with job parameters
    • Operates independently of the Data Collection Agent, which reads the generated queries from the database
  • Basic Entity Discovery:
    • Company name lookup from ticker (e.g., AAPL → Apple)
    • Simple competitor identification (hardcoded list or basic API)
    • No entity relationship graph yet (simple flat entity storage only)
  • Query Generation:
    • Generate queries from ticker + basic entities
    • Multiple query variants (5+ per ticker)
    • Query prioritization (simple scoring based on entity relevance)
  • Database Integration via Agent Data API:
    • Writes queries and entities via the POST /api/query-strategy/ endpoint
    • Stores queries in the database (available to the Data Collection Agent)
    • Stores discovered entities in the database (simple flat structure, not a graph)
    • Links entities to tickers
    • Includes agentVersion in all outputs
    • Data Collection Agent reads latest queries from database (replaces hardcoded queries from Milestone 3)
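The schedule above could be evaluated per ticker roughly as follows. This is a minimal sketch, assuming the orchestrator keeps the last run time of each job type per ticker plus a flag that is set when the entity list changes; `TickerSchedule`, `due_jobs`, `DEFAULT_INTERVALS`, and the job-type strings are hypothetical names, not part of the existing design.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Hypothetical default intervals mirroring the admin-created schedule.
DEFAULT_INTERVALS = {
    "entity_refresh": timedelta(weeks=1),
    "query_optimization": timedelta(days=1),
}


@dataclass
class TickerSchedule:
    """Per-ticker schedule state kept by the orchestrator (hypothetical)."""

    ticker: str
    # Per-ticker interval overrides, e.g. a shorter entity refresh.
    overrides: dict[str, timedelta] = field(default_factory=dict)
    # Last completed run of each scheduled job type.
    last_run: dict[str, datetime] = field(default_factory=dict)
    # Set when the entity list has changed since the last query generation.
    entities_changed: bool = False


def due_jobs(s: TickerSchedule, now: datetime, on_demand: bool = False) -> list[str]:
    """Return the job types the orchestrator should dispatch for one ticker."""
    jobs = []
    for job, default in DEFAULT_INTERVALS.items():
        interval = s.overrides.get(job, default)
        last = s.last_run.get(job)
        if last is None or now - last >= interval:
            jobs.append(job)
    # Query generation is event-driven rather than periodic.
    if on_demand or s.entities_changed:
        jobs.append("query_generation")
    return jobs
```

Each returned job type would then be sent to an agent instance's HTTP endpoint as job parameters.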
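Basic entity discovery and query generation might look like the following minimal sketch. It assumes hardcoded lookup tables for company names and competitors (one of the two options allowed in this milestone) and scores each query by a fixed relevance weight for the type of entity it came from. `Entity`, `Query`, `COMPANY_NAMES`, `COMPETITORS`, `TYPE_WEIGHTS`, `discover_entities`, `generate_queries`, and the query templates are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical hardcoded lookup tables (stand-ins for a list or basic API).
COMPANY_NAMES = {"AAPL": "Apple"}
COMPETITORS = {"AAPL": ["Samsung", "Google"]}

# Hypothetical relevance weights per entity type (higher = more relevant).
TYPE_WEIGHTS = {"ticker": 1.0, "company": 0.9, "competitor": 0.5}


@dataclass(frozen=True)
class Entity:
    """Flat entity record linked to a ticker (no relationship graph)."""

    ticker: str
    name: str
    entity_type: str  # "ticker", "company", or "competitor"


@dataclass(frozen=True)
class Query:
    ticker: str
    text: str
    priority: float
    agent_version: str  # included in every output


def discover_entities(ticker: str) -> list[Entity]:
    """Look up the company name and competitors for a ticker."""
    entities = [Entity(ticker, ticker, "ticker")]
    company = COMPANY_NAMES.get(ticker)
    if company:
        entities.append(Entity(ticker, company, "company"))
    for name in COMPETITORS.get(ticker, []):
        entities.append(Entity(ticker, name, "competitor"))
    return entities


def generate_queries(ticker: str, entities: list[Entity], agent_version: str) -> list[Query]:
    """Build query variants per entity and rank them by entity relevance."""
    company = next((e.name for e in entities if e.entity_type == "company"), ticker)
    scores: dict[str, float] = {}
    for e in entities:
        weight = TYPE_WEIGHTS.get(e.entity_type, 0.1)
        if e.entity_type == "competitor":
            # Competitor queries pair the competitor with the company.
            variants = [f"{company} vs {e.name}", f"{e.name} {company} market share"]
        else:
            variants = [f"{e.name} news", f"{e.name} earnings", f"{e.name} stock"]
        for text in variants:
            # Keep the highest score if two entities produce the same text.
            scores[text] = max(scores.get(text, 0.0), weight)
    # Highest priority first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [Query(ticker, text, score, agent_version) for text, score in ranked]
```

For AAPL this yields 10 ranked queries, with the ticker-only variants first. A ticker missing from the lookup tables falls back to three ticker-only queries, so a real implementation would need more templates or an API-backed lookup to meet the 5+ target.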

Data Collection Agent (Enhanced)

  • ✅ Replaces hardcoded queries from Milestone 3 with queries from the Query Strategy Agent
  • ✅ Executes multiple queries per ticker (read from the database)
  • ✅ Merges results from different queries
  • ✅ Basic query performance tracking (tracks which queries return results, stores feedback for future optimization)
  • ✅ Continues to use Agent Data API for writing outputs (already implemented in Milestone 3)
  • ✅ Continues agent versioning and registration (already implemented in Milestone 3)
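Merging results and basic performance tracking could be done in a single pass, as in this minimal sketch. It assumes each query's results arrive as a list of article dicts with a `url` key and that duplicates are detected by exact URL match only (better deduplication is planned for Milestone 6). `QueryStats` and `merge_results` are hypothetical names, and the stored feedback fields are illustrative.

```python
from dataclasses import dataclass


@dataclass
class QueryStats:
    """Per-query feedback stored for future optimization (hypothetical shape)."""

    query: str
    returned: int = 0  # number of results the query returned
    unique_added: int = 0  # results not already found by an earlier query

    @property
    def had_results(self) -> bool:
        return self.returned > 0


def merge_results(
    results_by_query: dict[str, list[dict]],
) -> tuple[list[dict], list[QueryStats]]:
    """Merge results from several queries, deduplicating by exact URL.

    Queries are processed in dict order (e.g. priority order), so an article
    found by several queries is credited to the first query that found it.
    """
    merged: dict[str, dict] = {}  # URL -> article; dicts keep insertion order
    stats: list[QueryStats] = []
    for query, articles in results_by_query.items():
        s = QueryStats(query=query, returned=len(articles))
        for article in articles:
            url = article["url"]
            if url not in merged:
                # Tag each article with the query that first found it.
                merged[url] = {**article, "source_query": query}
                s.unique_added += 1
        stats.append(s)
    return list(merged.values()), stats
```

Tracking `unique_added` alongside `returned` distinguishes queries that return nothing from queries that only duplicate results already found by higher-priority queries.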

Limitations (Acceptable for This Milestone)

  • Simple entity discovery (no SEC filings parsing yet)
  • Hardcoded competitor lists or basic API
  • No entity relationship graph (simple flat entity storage only)
  • Basic query optimization schedule (daily runs, but only tracks performance; full optimization logic in later milestones)
  • Basic query generation only (no AI-enhanced query variants yet)

Success Criteria

  • ✅ Agent versioning is functional for the Query Strategy Agent (the agent reads the active version and includes it in all outputs)
  • ✅ Full agent registration is functional (the agent registers its type and instance and reports heartbeats with load)
  • ✅ Orchestrator can spawn and manage multiple Query Strategy Agent instances
  • ✅ Orchestrator distributes jobs across instances using load balancing
  • ✅ Query Strategy Agent discovers basic entities for test tickers
  • ✅ Generates 5+ relevant queries per ticker
  • ✅ Queries and entities are stored via Agent Data API with agentVersion field
  • ✅ The Data Collection Agent uses the generated queries and collects more relevant data than with the hardcoded Milestone 3 queries
  • ✅ Newsletter quality improves (more relevant articles)
  • ✅ Entities are stored and can be retrieved

Next Steps

After this milestone, data collection is driven by generated queries instead of hardcoded ones. Milestone 6 will add more data sources and better deduplication.