MediaPulse

Query Strategy Agent

ID: query-strategy

Purpose

Dynamically discover entities, maintain entity relationships, generate intelligent search queries, and optimize query strategies based on results. This agent ensures the Data Collection Agent knows what to search for and adapts to changing contexts over time.

What the Agent Does

  1. Entity Discovery & Management:

    • Discovers and maintains entity relationships: competitors, suppliers, customers, executives, industry peers, partners
    • Builds and updates the entity graph in the database
    • Extracts entities from regulatory filings, company websites, and industry databases
    • Tracks entity changes over time (new competitors, executive changes, partnerships)
    • Identifies key personnel (CEO, CFO, CTO, board members, key engineers)
  2. Dynamic Query Generation:

    • Generates search queries based on current context (company identifier, time period, recent events)
    • Creates queries for use with admin-configured search sources (e.g., Serper.dev, Google Search API)
    • Categorizes queries by type (news, socialMedia, web) for flexible routing
    • Adapts queries based on what's trending or relevant
    • Generates multi-variant queries (different phrasings, synonyms)
    • Takes past query effectiveness into account
  3. Keyword & Trend Evolution:

    • Tracks keyword popularity and evolution over time
    • Identifies emerging trends and topics
    • Discovers new relevant keywords through AI analysis
    • Monitors industry jargon and terminology changes
    • Tracks hashtag trends on social media
  4. Query Optimization:

    • Analyzes which queries returned high-relevance results
    • Learns from failed or low-yield queries
    • Adjusts query strategies based on collection results
    • A/B tests different query formulations
    • Optimizes query frequency and timing
  5. Context-Aware Query Building:

    • Adjusts queries to recent events (earnings, product launches, news) and current trends
    • Incorporates seasonal/cyclical patterns
    • Adapts to breaking news or significant events

Inputs

interface QueryStrategyInput {
  tickerId: number; // Required: Foreign key to the Ticker table
  timeWindow?: {
    start: Date;
    end: Date;
  }; // Optional: Time window for query generation
  jobId?: string; // Optional: Job ID for tracking this run
}
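A minimal validation sketch for this input follows. It assumes rules the spec only implies: tickerId must be a positive integer because it is a foreign key, a supplied timeWindow must not end before it starts, and a supplied jobId must not be blank. The helper name validateQueryStrategyInput is hypothetical.

```typescript
// Mirrors the QueryStrategyInput interface above.
interface QueryStrategyInput {
  tickerId: number;
  timeWindow?: { start: Date; end: Date };
  jobId?: string;
}

// Hypothetical helper: returns a list of problems (empty when the input is valid).
// The rules are assumptions inferred from the field descriptions, not a documented contract.
function validateQueryStrategyInput(input: QueryStrategyInput): string[] {
  const errors: string[] = [];
  // tickerId is a foreign key, so assume it must be a positive integer.
  if (!(input.tickerId > 0) || Math.floor(input.tickerId) !== input.tickerId) {
    errors.push("tickerId must be a positive integer");
  }
  if (input.timeWindow) {
    const { start, end } = input.timeWindow;
    if (isNaN(start.getTime()) || isNaN(end.getTime())) {
      errors.push("timeWindow dates must be valid");
    } else if (start.getTime() > end.getTime()) {
      errors.push("timeWindow.start must not be after timeWindow.end");
    }
  }
  if (input.jobId !== undefined && input.jobId.trim() === "") {
    errors.push("jobId, when provided, must be non-empty");
  }
  return errors;
}

// Usage: a run scoped to the first week of March 2024.
const exampleInput: QueryStrategyInput = {
  tickerId: 42,
  timeWindow: { start: new Date("2024-03-01"), end: new Date("2024-03-08") },
  jobId: "job-123",
};
console.log(validateQueryStrategyInput(exampleInput)); // []
```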

Note: The agent reads all necessary information from the database independently:

  • Company name, identifier, and market information are looked up from the Ticker table based on the tickerId
  • Previous query performance and collection results are read from the database (from Query and DataSource tables)
  • Recent events and context are determined by the agent based on the time window (if provided) and data in the database
  • The agent does not receive data from other agents as input parameters; all communication is via the database

Configurations

These are stored in the AgentConfig table under the key query-strategy.

{
  entityDiscovery: {
    enabled: boolean,
    sources: {
      regulatoryFilings: {
        enabled: boolean,
        providers: Array<{
          name: string,                 // 'sec-edgar', 'companies-house', 'sedar', 'asx', etc.
          region: string,                // 'US', 'UK', 'CA', 'AU', etc.
          enabled: boolean,
          apiEndpoint?: string,          // Custom API endpoint if needed
          apiKey?: string,               // API key if required
          lookbackDays: number,          // How far back to analyze
          filingTypes: string[]         // Region-specific filing types
          // US: ['10-K', '10-Q', '8-K', 'DEF 14A']
          // UK: ['annual-return', 'confirmation-statement']
          // CA: ['annual-information-form', 'management-discussion']
          // etc.
        }>
      },
      companyWebsite: {
        enabled: boolean,
        sections: string[]             // ['about', 'investors', 'press', 'careers']
      },
      industryDatabases: {
        enabled: boolean,
        providers: string[]             // ['crunchbase', 'pitchbook', 'custom']
      },
      aiExtraction: {
        enabled: boolean,
        model: string,                  // 'gpt-4'
        extractionPrompt: string       // Template for entity extraction
      }
    },
    entityTypes: {
      competitors: {
        enabled: boolean,
        maxCount: number,              // Max competitors to track
        similarityThreshold: number    // Industry/sector similarity
      },
      suppliers: {
        enabled: boolean,
        minSignificance: 'high' | 'medium' | 'low'
      },
      customers: {
        enabled: boolean,
        minSignificance: 'high' | 'medium' | 'low'
      },
      executives: {
        enabled: boolean,
        roles: string[]                // ['CEO', 'CFO', 'CTO', 'Board Member']
        // Note: Role names may vary by region/language
      },
      industryPeers: {
        enabled: boolean,
        sameSector: boolean,
        sameRegion?: boolean           // Optional: limit to same geographic region
      }
    },
    refreshInterval: {
      entityGraph: number,              // Days between full refresh
      relationships: number,            // Days between relationship updates
      executives: number                // Days between executive list updates
    }
  },

  queryGeneration: {
    enabled: boolean,
    strategies: {
      news: {
        baseQueries: string[],          // Template queries
        queryVariants: number,          // How many variants to generate
        includeSynonyms: boolean,
        includeRelatedEntities: boolean,
        maxQueriesPerSource: number
      },
      socialMedia: {
        baseQueries: string[],
        includeHashtags: boolean,
        includeMentions: boolean,
        includeTrending: boolean,
        maxQueriesPerPlatform: number
      },
      web: {
        baseQueries: string[],
        includeLongTail: boolean,
        maxQueries: number
      }
    },
    aiGeneration: {
      enabled: boolean,
      model: string,                    // 'gpt-4'
      temperature: number,              // 0.7
      maxQueries: number,               // Per category
      generationPrompt: string          // Template
    },
  },

  keywordTracking: {
    enabled: boolean,
    sources: {
      newsTrends: boolean,
      socialTrends: boolean,
      industryReports: boolean
    },
    trackingWindow: number,             // Days to track trends
    minFrequency: number,                // Minimum mentions to track
    aiAnalysis: {
      enabled: boolean,
      model: string,
      analysisPrompt: string
    }
  },

  queryOptimization: {
    enabled: boolean,
    minResultsThreshold: number,         // Queries with fewer results are optimized
    maxResultsThreshold: number,        // Queries with too many results are narrowed
    learningWindow: number,              // Days of history to consider
    metrics: {
      relevanceScore: number,            // Weight for relevance
      resultCount: number,               // Weight for result count
      coverageScore: number              // Weight for topic coverage
    },
    optimizationFrequency: 'daily' | 'weekly' | 'monthly',
    minDataPoints: number,              // Min queries to analyze before optimizing
    aiOptimization: {
      enabled: boolean,
      model: string,
      optimizationPrompt: string
    }
  },

  ai: {
    entityExtractionModel: string,    // e.g., 'gpt-4'
    queryGenerationModel: string,     // e.g., 'gpt-4'
    optimizationModel: string,        // e.g., 'gpt-4'
    temperature: number,              // e.g., 0.7
    maxTokens: number                 // e.g., 2000
  }
}
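Several fields in this schema constrain one another, which makes a load-time check worthwhile. The sketch below validates a subset of queryOptimization and entityDiscovery.refreshInterval plus ai.temperature. The rules (minResultsThreshold below maxResultsThreshold, non-negative metric weights, positive day counts, and a 0 to 2 temperature range as used by OpenAI-style chat APIs) are assumptions read from the field comments, and checkConfigInvariants is a hypothetical name.

```typescript
// A subset of the query-strategy AgentConfig, typed after the schema above.
interface QueryOptimizationConfig {
  enabled: boolean;
  minResultsThreshold: number;
  maxResultsThreshold: number;
  learningWindow: number;
  metrics: { relevanceScore: number; resultCount: number; coverageScore: number };
  minDataPoints: number;
}

interface RefreshIntervalConfig {
  entityGraph: number;
  relationships: number;
  executives: number;
}

// Hypothetical load-time check; returns human-readable problems (empty when consistent).
function checkConfigInvariants(
  opt: QueryOptimizationConfig,
  refresh: RefreshIntervalConfig,
  temperature: number
): string[] {
  const problems: string[] = [];
  // A query cannot be both "too few" and "too many" results at once.
  if (opt.minResultsThreshold >= opt.maxResultsThreshold) {
    problems.push("minResultsThreshold must be less than maxResultsThreshold");
  }
  const w = opt.metrics;
  // Metric weights are treated as non-negative and not all zero.
  if (w.relevanceScore < 0 || w.resultCount < 0 || w.coverageScore < 0) {
    problems.push("metric weights must be non-negative");
  } else if (w.relevanceScore + w.resultCount + w.coverageScore === 0) {
    problems.push("at least one metric weight must be positive");
  }
  // Windows and intervals are in days; zero or negative values make no sense.
  const dayFields: Array<[string, number]> = [
    ["queryOptimization.learningWindow", opt.learningWindow],
    ["refreshInterval.entityGraph", refresh.entityGraph],
    ["refreshInterval.relationships", refresh.relationships],
    ["refreshInterval.executives", refresh.executives],
  ];
  for (const [field, days] of dayFields) {
    if (!(days > 0)) problems.push(`${field} must be a positive number of days`);
  }
  // Assumes an OpenAI-style temperature range of 0 to 2.
  if (temperature < 0 || temperature > 2) {
    problems.push("ai.temperature must be between 0 and 2");
  }
  return problems;
}

// Usage with illustrative values.
const okOpt: QueryOptimizationConfig = {
  enabled: true,
  minResultsThreshold: 3,
  maxResultsThreshold: 200,
  learningWindow: 30,
  metrics: { relevanceScore: 0.6, resultCount: 0.2, coverageScore: 0.2 },
  minDataPoints: 20,
};
const okRefresh: RefreshIntervalConfig = { entityGraph: 30, relationships: 7, executives: 14 };
console.log(checkConfigInvariants(okOpt, okRefresh, 0.7)); // []
```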

Outputs

{
  agentId: 'query-strategy',
  agentVersion: string,                    // Semantic version (e.g., "1.2.3") of the agent that generated this output
  jobId: string,                     // Job ID from orchestrator
  tickerId: number,
  timestamp: Date,
  executionTime: number,
  entityGraph: {
    companyName: string,
    entities: {
      competitors: Array<{
        identifier?: {
          type: 'ticker' | 'isin' | 'cusip' | 'sedol' | 'custom',
          value: string
        },
        companyName: string,
        similarityScore: number,        // 0-1
        relationshipStrength: 'strong' | 'medium' | 'weak',
        lastUpdated: Date
      }>,
      suppliers: Array<{
        name: string,
        type: 'manufacturing' | 'software' | 'services' | 'other',
        significance: 'high' | 'medium' | 'low',
        lastUpdated: Date
      }>,
      customers: Array<{
        name: string,
        type: 'enterprise' | 'consumer' | 'government',
        significance: 'high' | 'medium' | 'low',
        lastUpdated: Date
      }>,
      executives: Array<{
        name: string,
        role: string,
        tenure: number,                  // Years
        publicProfile: boolean,
        lastUpdated: Date
      }>,
      industryPeers: Array<{
        identifier?: {
          type: 'ticker' | 'isin' | 'cusip' | 'sedol' | 'custom',
          value: string
        },
        companyName: string,
        similarityFactors: string[],    // ['sector', 'revenue', 'industry']
        lastUpdated: Date
      }>
    },
    relationships: Array<{
      from: string,                     // Entity ID
      to: string,                       // Entity ID
      type: 'competitor' | 'supplier' | 'customer' | 'partner' | 'executive',
      strength: number,                  // 0-1
      sources: string[],                 // Where relationship was discovered
      lastUpdated: Date
    }>,
    metadata: {
      graphVersion: number,
      lastFullRefresh: Date,
      entitiesCount: number,
      relationshipsCount: number
    }
  },
  queries: {
    news: Array<{
      id: string,
      query: string,
      source?: string,                   // Optional: suggested search source ID (e.g., 'serper-dev')
      priority: 'high' | 'medium' | 'low',
      rationale: string                  // Why this query was generated
    }>,
    socialMedia: Array<{
      id: string,
      query: string,
      platform: 'twitter' | 'reddit' | 'linkedin',
      priority: 'high' | 'medium' | 'low',
      queryType: 'company' | 'competitor' | 'executive' | 'trend' | 'event',
      hashtags?: string[],
      mentions?: string[],
      rationale: string
    }>,
    web: Array<{
      id: string,
      query: string,
      searchEngine?: string,             // 'google', 'bing', etc.
      priority: 'high' | 'medium' | 'low',
      rationale: string
    }>
  },
  keywords: {
    primary: string[],                   // Core keywords for this company/identifier
    trending: Array<{
      keyword: string,
      trend: 'rising' | 'falling' | 'stable',
      relevanceScore: number,
      sources: string[],
      firstSeen: Date,
      lastSeen: Date
    }>,
    emerging: Array<{
      keyword: string,
      relevanceScore: number,
      confidence: number,
      sources: string[]
    }>
  },
  metadata: {
    queriesGenerated: number,
    entitiesDiscovered: number,
    entitiesUpdated: number,
    keywordsTracked: number,
    optimizationApplied: boolean
  }
}
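Some metadata fields are counts derivable from the rest of the payload, and several scores are documented as 0-1. The sketch below derives the counts and flags out-of-range scores, assuming queriesGenerated is the total across the three query categories and entitiesCount the total across the five entity lists (the spec defines neither). The output types are trimmed to the fields the check reads, and summarizeOutput and the entity IDs in the example are hypothetical.

```typescript
// Trimmed view of the output above: only the fields this check reads.
interface QsOutputFragment {
  entityGraph: {
    entities: {
      competitors: Array<{ companyName: string; similarityScore: number }>;
      suppliers: unknown[];
      customers: unknown[];
      executives: unknown[];
      industryPeers: unknown[];
    };
    relationships: Array<{ from: string; to: string; strength: number }>;
  };
  queries: { news: unknown[]; socialMedia: unknown[]; web: unknown[] };
}

interface QsSummary {
  queriesGenerated: number;
  entitiesCount: number;
  relationshipsCount: number;
  outOfRange: string[]; // descriptions of scores outside the documented 0-1 range
}

// Hypothetical helper: derives counts and flags scores outside 0-1.
function summarizeOutput(out: QsOutputFragment): QsSummary {
  const e = out.entityGraph.entities;
  const outOfRange: string[] = [];
  const inUnitRange = (x: number): boolean => x >= 0 && x <= 1;

  e.competitors.forEach((c) => {
    if (!inUnitRange(c.similarityScore)) {
      outOfRange.push(`competitor ${c.companyName}: similarityScore ${c.similarityScore}`);
    }
  });
  out.entityGraph.relationships.forEach((r) => {
    if (!inUnitRange(r.strength)) {
      outOfRange.push(`relationship ${r.from}->${r.to}: strength ${r.strength}`);
    }
  });

  return {
    queriesGenerated: out.queries.news.length + out.queries.socialMedia.length + out.queries.web.length,
    entitiesCount:
      e.competitors.length + e.suppliers.length + e.customers.length +
      e.executives.length + e.industryPeers.length,
    relationshipsCount: out.entityGraph.relationships.length,
    outOfRange,
  };
}

const sampleOutput: QsOutputFragment = {
  entityGraph: {
    entities: {
      competitors: [{ companyName: "Globex", similarityScore: 0.82 }],
      suppliers: [{}],
      customers: [],
      executives: [{}, {}],
      industryPeers: [],
    },
    relationships: [{ from: "acme", to: "globex", strength: 1.4 }],
  },
  queries: { news: [{}, {}], socialMedia: [{}], web: [] },
};
// Returns queriesGenerated 3, entitiesCount 4, relationshipsCount 1, and one out-of-range strength.
console.log(summarizeOutput(sampleOutput));
```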

Process (Step-by-Step)

Initialization Phase

  • Load agent configuration from database (AgentConfig table, key: query-strategy)
  • Look up company information from database (Ticker table) based on tickerId:
    • Company name
    • Market region, exchange, currency
  • Check whether an entity graph already exists in the database for this tickerId
  • Load previous query performance data from the database:
    • Read previous queries and their performance metrics from the Query table
    • Read collection results and relevance feedback from the DataSource table
  • Initialize AI clients (models from config)
  • Select the regulatory filing providers (entityDiscovery.sources.regulatoryFilings.providers) whose region matches the company's market region
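Matching filing providers to the company's market region can be a plain filter over the configured providers. This sketch assumes the Ticker table stores the region with the same codes as the provider region field ('US', 'UK', and so on) and that a provider is used only when both it and the regulatoryFilings source are enabled; selectFilingProviders is a hypothetical name.

```typescript
// Shape of one entry in entityDiscovery.sources.regulatoryFilings.providers.
interface FilingProvider {
  name: string;
  region: string;
  enabled: boolean;
  apiEndpoint?: string;
  apiKey?: string;
  lookbackDays: number;
  filingTypes: string[];
}

interface RegulatoryFilingsConfig {
  enabled: boolean;
  providers: FilingProvider[];
}

// Hypothetical helper: keep enabled providers whose region matches the company's market region.
function selectFilingProviders(cfg: RegulatoryFilingsConfig, marketRegion: string): FilingProvider[] {
  if (!cfg.enabled) return [];
  const region = marketRegion.toUpperCase(); // case-insensitive comparison (assumption)
  return cfg.providers.filter((p) => p.enabled && p.region.toUpperCase() === region);
}

const filingsConfig: RegulatoryFilingsConfig = {
  enabled: true,
  providers: [
    { name: "sec-edgar", region: "US", enabled: true, lookbackDays: 365, filingTypes: ["10-K", "10-Q", "8-K", "DEF 14A"] },
    { name: "companies-house", region: "UK", enabled: true, lookbackDays: 730, filingTypes: ["confirmation-statement"] },
    { name: "sedar", region: "CA", enabled: false, lookbackDays: 365, filingTypes: ["annual-information-form"] },
  ],
};

console.log(selectFilingProviders(filingsConfig, "us").map((p) => p.name)); // [ 'sec-edgar' ]
```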

Entity Graph Management

Entity Discovery

This step runs if no entity graph exists for the tickerId or if the refresh interval (entityDiscovery.refreshInterval.entityGraph, in days) has elapsed since the last full refresh.

  • Query regulatory filing APIs based on market region (configured in entityDiscovery.sources.regulatoryFilings):
    • SEC EDGAR for US companies
    • Companies House for UK companies
    • SEDAR for Canadian companies
    • ASX for Australian companies
    • Other region-specific providers as configured
  • Extract entity mentions from filings using AI (model from ai.entityExtractionModel):
    • Competitors mentioned in risk factors or competitive sections
    • Suppliers and customers from business relationships
    • Industry peers from market discussions
  • Scrape company website sections (configured in entityDiscovery.sources.companyWebsite.sections):
    • Extract executive information from "About" or "Leadership" pages
    • Extract partner information from "Partners" or "Customers" pages
  • Query industry databases (if entityDiscovery.sources.industryDatabases.enabled):
    • Crunchbase, Pitchbook, or custom providers as configured
  • Use AI (model from ai.entityExtractionModel) to extract and structure entities from all sources
  • Build or update the entity graph in the database
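The trigger for this step (no stored entity graph, or the refresh interval reached) reduces to a small predicate. A sketch, assuming the stored graph's metadata.lastFullRefresh is compared with entityDiscovery.refreshInterval.entityGraph in days; needsEntityDiscovery is hypothetical, and the same check could be applied to the relationships and executives intervals.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical predicate: run discovery when no graph is stored or the refresh interval has elapsed.
// lastFullRefresh is undefined when no entity graph exists for the tickerId.
function needsEntityDiscovery(
  lastFullRefresh: Date | undefined,
  refreshIntervalDays: number,
  now: Date = new Date()
): boolean {
  if (lastFullRefresh === undefined) return true;
  const elapsedDays = (now.getTime() - lastFullRefresh.getTime()) / MS_PER_DAY;
  return elapsedDays >= refreshIntervalDays;
}

const checkTime = new Date("2024-06-30T00:00:00Z");
console.log(needsEntityDiscovery(undefined, 30, checkTime)); // true (no graph yet)
console.log(needsEntityDiscovery(new Date("2024-06-20T00:00:00Z"), 30, checkTime)); // false (10 days old)
console.log(needsEntityDiscovery(new Date("2024-05-01T00:00:00Z"), 30, checkTime)); // true (60 days old)
```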

Relationship Extraction

  • Analyze regulatory filings for relationship descriptions:
    • Risk factor sections for competitor mentions
    • Business relationship sections for supplier/customer relationships
    • Management discussion sections for industry context
  • Extract relationship descriptions from earnings call transcripts (if available for the region and configured)
  • Use AI (model from ai.entityExtractionModel) to identify relationship types and strengths
  • Update the relationship graph in the database
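When one relationship turns up in several sources, such as a filing and an earnings call transcript, the observations have to collapse into a single entry of the relationships output. A sketch, assuming edges are keyed by from, to, and type, and that strengths combine with a noisy-OR rule (one minus the product of the complements). Noisy-OR is a choice made for this sketch, not something the spec prescribes, and mergeRelationships and the observation shape are hypothetical.

```typescript
type RelationshipType = "competitor" | "supplier" | "customer" | "partner" | "executive";

// Hypothetical per-source observation produced by the AI extraction step.
interface RelationshipObservation {
  from: string;
  to: string;
  type: RelationshipType;
  strength: number; // 0-1, from a single source
  source: string;   // e.g. "sec-edgar:10-K"
  observedAt: Date;
}

// Matches the relationships entry in the output schema.
interface Relationship {
  from: string;
  to: string;
  type: RelationshipType;
  strength: number;
  sources: string[];
  lastUpdated: Date;
}

// Hypothetical merge: one edge per (from, to, type); strengths combined with noisy-OR.
function mergeRelationships(observations: RelationshipObservation[]): Relationship[] {
  const byKey: { [key: string]: { edge: Relationship; missProb: number } } = Object.create(null);
  const order: string[] = []; // preserve first-seen order of edges

  for (const o of observations) {
    const key = `${o.from}|${o.to}|${o.type}`;
    let entry = byKey[key];
    if (!entry) {
      entry = {
        edge: { from: o.from, to: o.to, type: o.type, strength: 0, sources: [], lastUpdated: o.observedAt },
        missProb: 1,
      };
      byKey[key] = entry;
      order.push(key);
    }
    // Noisy-OR: the edge is "missed" only if every source misses it.
    entry.missProb *= 1 - Math.min(1, Math.max(0, o.strength));
    if (entry.edge.sources.indexOf(o.source) < 0) entry.edge.sources.push(o.source);
    if (o.observedAt.getTime() > entry.edge.lastUpdated.getTime()) entry.edge.lastUpdated = o.observedAt;
  }

  return order.map((key) => {
    const { edge, missProb } = byKey[key];
    return { ...edge, strength: 1 - missProb };
  });
}

const merged = mergeRelationships([
  { from: "acme", to: "globex", type: "competitor", strength: 0.5, source: "sec-edgar:10-K", observedAt: new Date("2024-01-10") },
  { from: "acme", to: "globex", type: "competitor", strength: 0.5, source: "earnings-call:Q4", observedAt: new Date("2024-02-01") },
  { from: "acme", to: "initech", type: "supplier", strength: 0.3, source: "sec-edgar:10-K", observedAt: new Date("2024-01-10") },
]);
console.log(merged.map((r) => `${r.to}:${r.strength.toFixed(2)}`)); // [ 'globex:0.75', 'initech:0.30' ]
```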

Query Generation

Base Query Construction

  • Generate base queries from the company's identifier and name (looked up from the Ticker table via tickerId) and its industry
  • Include entity-based queries (competitors, executives) from the entity graph
  • Determine recent events and context from the database (earnings dates, product launches, news from the DataSource table)
  • Generate query variants (synonyms, phrasings) based on configuration
  • Adapt queries to regional language and terminology based on the company's market region
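Template expansion is one way to implement the base-query steps above. A sketch, assuming baseQueries entries use {company} and {identifier} placeholders and that synonyms come from a caller-supplied map; the placeholder syntax and expandBaseQueries are hypothetical. queryVariants caps the synonym variants per template, and includeRelatedEntities toggles the entity-based queries, as in the news strategy config.

```typescript
interface BaseQueryContext {
  companyName: string;       // looked up from the Ticker table
  identifier: string;        // e.g. the ticker symbol
  relatedEntities: string[]; // competitor or executive names from the entity graph
}

// Hypothetical expansion: fill placeholders, add synonym variants, then entity-based queries.
function expandBaseQueries(
  templates: string[],
  ctx: BaseQueryContext,
  synonyms: { [term: string]: string[] },
  queryVariants: number,
  includeRelatedEntities: boolean
): string[] {
  const out: string[] = [];
  const add = (q: string): void => {
    if (out.indexOf(q) < 0) out.push(q); // skip exact duplicates
  };

  for (const template of templates) {
    // A replacer function avoids special handling of "$" in company names.
    const base = template
      .replace(/\{company\}/g, () => ctx.companyName)
      .replace(/\{identifier\}/g, () => ctx.identifier);
    add(base);

    // Each synonym substitution is one variant, up to queryVariants per template.
    let variants = 0;
    for (const term of Object.keys(synonyms)) {
      if (base.indexOf(term) < 0) continue;
      for (const alt of synonyms[term]) {
        if (variants >= queryVariants) break;
        add(base.split(term).join(alt));
        variants++;
      }
    }
  }

  if (includeRelatedEntities) {
    // Most web search engines treat quoted text as an exact phrase.
    for (const entity of ctx.relatedEntities) add(`"${ctx.companyName}" "${entity}"`);
  }
  return out;
}

const generated = expandBaseQueries(
  ["{company} earnings", "{identifier} stock news"],
  { companyName: "Acme Corp", identifier: "ACME", relatedEntities: ["Globex"] },
  { earnings: ["quarterly results", "revenue"] },
  1,
  true
);
// generated: ["Acme Corp earnings", "Acme Corp quarterly results",
//             "ACME stock news", "\"Acme Corp\" \"Globex\""]
console.log(generated);
```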

AI-Enhanced Query Generation

  • Use AI (model from ai.queryGenerationModel) to generate additional creative queries
  • Consider trending topics and keywords from keyword tracking
  • Generate context-aware queries based on the time period and recent events
  • Create platform-specific queries (news, social media, web) with appropriate formatting
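The AI step needs glue code on both sides of the model call: filling the generationPrompt template and validating the reply. The sketch below omits the model call and assumes a {{name}} placeholder syntax and a JSON-array reply of { category, query, rationale } objects. Both formats and both helper names (fillPrompt, parseAiQueries) are assumptions, not part of the spec.

```typescript
type QueryCategory = "news" | "socialMedia" | "web";

interface AiQuery {
  category: QueryCategory;
  query: string;
  rationale: string;
}

// Type guard for the three documented query categories.
function isQueryCategory(x: unknown): x is QueryCategory {
  return x === "news" || x === "socialMedia" || x === "web";
}

// Hypothetical template filler: replaces {{key}} with vars[key]; unknown keys are left intact.
function fillPrompt(template: string, vars: { [key: string]: string }): string {
  return template.replace(/\{\{(\w+)\}\}/g, (match: string, key: string) =>
    Object.prototype.hasOwnProperty.call(vars, key) ? vars[key] : match
  );
}

// Hypothetical parser: keeps well-formed items and caps each category at maxPerCategory,
// mirroring queryGeneration.aiGeneration.maxQueries ("per category").
function parseAiQueries(raw: string, maxPerCategory: number): AiQuery[] {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return []; // the model did not return valid JSON
  }
  if (!Array.isArray(data)) return [];

  const counts: { [category: string]: number } = { news: 0, socialMedia: 0, web: 0 };
  const result: AiQuery[] = [];
  for (const item of data) {
    if (typeof item !== "object" || item === null) continue;
    const fields = item as { [k: string]: unknown };
    const category = fields["category"];
    const query = fields["query"];
    const rationaleField = fields["rationale"];
    if (!isQueryCategory(category)) continue;
    if (typeof query !== "string" || query.trim() === "") continue;
    if (counts[category] >= maxPerCategory) continue;
    counts[category]++;
    const rationale = typeof rationaleField === "string" ? rationaleField : "";
    result.push({ category, query: query.trim(), rationale });
  }
  return result;
}

const filledPrompt = fillPrompt(
  "Generate search queries about {{company}} ({{identifier}}) for {{period}}.",
  { company: "Acme Corp", identifier: "ACME", period: "March 2024" }
);
// filledPrompt === "Generate search queries about Acme Corp (ACME) for March 2024."

const parsedQueries = parseAiQueries(
  '[{"category":"news","query":"Acme Corp supply chain","rationale":"recent 8-K"},' +
    '{"category":"blog","query":"Acme Corp opinion"}]',
  5
);
// parsedQueries holds one item: "blog" is not a recognized category.
```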

Query Prioritization

  • Score each query by expected relevance, based on entity importance, recent events (determined from the database), and performance history (previous collection results from the DataSource table)
  • Assign each query a priority (high, medium, or low) from its score
  • Limit the number of queries per category using the configured caps: queryGeneration.strategies.news.maxQueriesPerSource, socialMedia.maxQueriesPerPlatform, and web.maxQueries
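The prioritization rules can be folded into one score per query. A sketch, assuming each signal is already normalized to 0-1, equal weights, and priority cut-offs at 0.7 and 0.4; the weights, cut-offs, signal names, and the prioritize helper are illustrative, not values from the spec.

```typescript
type Priority = "high" | "medium" | "low";

interface CandidateQuery {
  query: string;
  entityImportance: number; // 0-1, e.g. derived from relationship strength
  eventRelevance: number;   // 0-1, proximity to recent events found in the database
  historicalYield: number;  // 0-1, past relevance from DataSource results (0.5 if unseen)
}

interface ScoredQuery {
  query: string;
  score: number;
  priority: Priority;
}

// Hypothetical prioritizer: equal-weight score, bucketed into priorities, capped at maxQueries.
function prioritize(candidates: CandidateQuery[], maxQueries: number): ScoredQuery[] {
  const scored = candidates.map((c) => {
    const score = (c.entityImportance + c.eventRelevance + c.historicalYield) / 3;
    const priority: Priority = score >= 0.7 ? "high" : score >= 0.4 ? "medium" : "low";
    return { query: c.query, score, priority };
  });
  // Highest score first, then keep only the configured number of queries.
  scored.sort((a, b) => b.score - a.score);
  return scored.slice(0, maxQueries);
}

const ranked = prioritize(
  [
    { query: "Acme Corp earnings", entityImportance: 1, eventRelevance: 0.9, historicalYield: 0.8 },
    { query: "Acme Corp Globex lawsuit", entityImportance: 0.6, eventRelevance: 0.3, historicalYield: 0.5 },
    { query: "Acme Corp careers", entityImportance: 0.2, eventRelevance: 0.1, historicalYield: 0.2 },
  ],
  2
);
console.log(ranked.map((r) => r.priority)); // [ 'high', 'medium' ]
```

Because the caps differ by category, a caller would run prioritize once per category, passing maxQueriesPerSource for news, maxQueriesPerPlatform for socialMedia, and maxQueries for web.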

Keyword Tracking

This step runs only if keywordTracking.enabled is true.

  • Analyze recent news and social media for trending keywords (from configured sources in keywordTracking.sources)
  • Use AI (model from keywordTracking.aiAnalysis.model) to identify emerging keywords related to the company
  • Track keyword evolution over time (within keywordTracking.trackingWindow)
  • Score keywords by relevance to the company and identifier
  • Filter keywords by minimum frequency threshold (keywordTracking.minFrequency)
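One simple way to label a keyword rising, falling, or stable (the trend values in the output) is to compare mention counts in the older and newer halves of the tracking window. A sketch, assuming daily counts are already aggregated, a 20% band either way for stable, and minFrequency applied to the total over the window; the band and classifyKeywords are assumptions.

```typescript
type Trend = "rising" | "falling" | "stable";

interface KeywordSeries {
  keyword: string;
  dailyCounts: number[]; // oldest first, one entry per day in keywordTracking.trackingWindow
}

interface TrackedKeyword {
  keyword: string;
  total: number;
  trend: Trend;
}

// Hypothetical classifier: drop rare keywords, then compare the newer half of the window
// with the older half. For odd-length windows the newer half gets the extra day.
function classifyKeywords(series: KeywordSeries[], minFrequency: number, band = 0.2): TrackedKeyword[] {
  const result: TrackedKeyword[] = [];
  for (const s of series) {
    const total = s.dailyCounts.reduce((sum, n) => sum + n, 0);
    if (total < minFrequency) continue; // below keywordTracking.minFrequency
    const mid = Math.floor(s.dailyCounts.length / 2);
    const older = s.dailyCounts.slice(0, mid).reduce((sum, n) => sum + n, 0);
    const newer = s.dailyCounts.slice(mid).reduce((sum, n) => sum + n, 0);
    let trend: Trend = "stable";
    if (newer > older * (1 + band)) trend = "rising";
    else if (newer < older * (1 - band)) trend = "falling";
    result.push({ keyword: s.keyword, total, trend });
  }
  return result;
}

const tracked = classifyKeywords(
  [
    { keyword: "solid-state battery", dailyCounts: [0, 1, 1, 3, 5, 8] },
    { keyword: "layoffs", dailyCounts: [6, 5, 4, 1, 1, 0] },
    { keyword: "dividend", dailyCounts: [2, 2, 2, 2, 2, 2] },
    { keyword: "obscure term", dailyCounts: [0, 0, 1, 0, 0, 0] },
  ],
  5
);
console.log(tracked.map((k) => `${k.keyword}:${k.trend}`));
// [ 'solid-state battery:rising', 'layoffs:falling', 'dividend:stable' ]
```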

Query Optimization

This step runs if queryOptimization.enabled is true and the database holds enough previous results (at least queryOptimization.minDataPoints queries).

  • Query database for previous query performance:
    • Read query performance metrics from the Query table (linked to DataSource items)
    • Analyze collection results from DataSource table (relevance scores, result counts)
  • Analyze query performance:
    • Identify high-performing queries (high relevance scores, good result counts)
    • Identify low-performing queries (below minResultsThreshold or above maxResultsThreshold)
  • Use AI (model from ai.optimizationModel) to generate optimized query variants for underperforming queries
  • Update query generation strategies based on optimization results
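The performance analysis can be sketched as a per-query classification. It assumes each row summarizes one query's history within learningWindow, that result count is normalized against maxResultsThreshold, and that queries scoring below 0.5 go to the AI rewrite step. The normalization, the 0.5 cut-off, the verdict labels, and classifyQueryPerformance are all hypothetical.

```typescript
interface QueryStats {
  queryId: string;
  avgRelevance: number;   // 0-1, averaged from DataSource relevance scores
  avgResultCount: number; // results per run within learningWindow
  coverage: number;       // 0-1, share of tracked topics the results touched
}

interface OptimizationSettings {
  minResultsThreshold: number;
  maxResultsThreshold: number;
  minDataPoints: number;
  metrics: { relevanceScore: number; resultCount: number; coverageScore: number };
}

type Verdict = "keep" | "broaden" | "narrow" | "rewrite" | "insufficient-data";

// Hypothetical classifier mapping each query to an optimization action.
// Assumes at least one metric weight is positive.
function classifyQueryPerformance(
  stats: QueryStats[],
  s: OptimizationSettings
): { [queryId: string]: Verdict } {
  const verdicts: { [queryId: string]: Verdict } = {};
  // Too few queries to learn from: leave everything untouched this cycle.
  if (stats.length < s.minDataPoints) {
    for (const q of stats) verdicts[q.queryId] = "insufficient-data";
    return verdicts;
  }

  const w = s.metrics;
  const weightSum = w.relevanceScore + w.resultCount + w.coverageScore;
  for (const q of stats) {
    if (q.avgResultCount < s.minResultsThreshold) {
      verdicts[q.queryId] = "broaden"; // too few results
    } else if (q.avgResultCount > s.maxResultsThreshold) {
      verdicts[q.queryId] = "narrow"; // too many results
    } else {
      // Weighted 0-1 score; result count is normalized against maxResultsThreshold.
      const countScore = q.avgResultCount / s.maxResultsThreshold;
      const score =
        (w.relevanceScore * q.avgRelevance + w.resultCount * countScore + w.coverageScore * q.coverage) /
        weightSum;
      verdicts[q.queryId] = score >= 0.5 ? "keep" : "rewrite";
    }
  }
  return verdicts;
}

const settings: OptimizationSettings = {
  minResultsThreshold: 3,
  maxResultsThreshold: 100,
  minDataPoints: 3,
  metrics: { relevanceScore: 0.6, resultCount: 0.2, coverageScore: 0.2 },
};
const outcome = classifyQueryPerformance(
  [
    { queryId: "q1", avgRelevance: 0.9, avgResultCount: 40, coverage: 0.7 },
    { queryId: "q2", avgRelevance: 0.5, avgResultCount: 1, coverage: 0.1 },
    { queryId: "q3", avgRelevance: 0.8, avgResultCount: 450, coverage: 0.9 },
    { queryId: "q4", avgRelevance: 0.2, avgResultCount: 20, coverage: 0.1 },
  ],
  settings
);
// outcome: q1 keep, q2 broaden, q3 narrow, q4 rewrite
console.log(outcome);
```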

Output Generation

  • Structure the entity graph
  • Format the queries for each source/platform
  • Include metadata and rationale for each query

Storage

  • Save the entity graph to the database (available for other agents to read independently)
  • Store the generated queries in the database (Data Collection Agent reads these asynchronously)
  • Update the query performance metrics (used by Learning Agent and future optimization cycles)
  • Log the optimization decisions
  • No direct agent communication: All outputs are persisted to the database; other agents read independently when they run
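The database-mediated handoff can be illustrated with an in-memory stand-in for the Query table. The sketch assumes a row shape with a state column that the Data Collection Agent flips when it claims a query; the class, its methods, and the state values are hypothetical, and a real deployment would rely on database transactions rather than an in-process array.

```typescript
type QueryRowState = "pending" | "claimed";

// Hypothetical row shape for the Query table.
interface QueryRow {
  id: string;
  tickerId: number;
  jobId: string;
  category: "news" | "socialMedia" | "web";
  query: string;
  priority: "high" | "medium" | "low";
  state: QueryRowState;
  createdAt: Date;
}

// Hypothetical in-memory stand-in for the Query table.
class InMemoryQueryTable {
  private rows: QueryRow[] = [];

  // Writer side (Query Strategy Agent): persist generated queries and return immediately.
  insertMany(rows: Array<Omit<QueryRow, "state">>): void {
    for (const r of rows) this.rows.push({ ...r, state: "pending" });
  }

  // Reader side (Data Collection Agent): claim pending queries for a ticker on its own
  // schedule, highest priority first.
  claimPending(tickerId: number, limit: number): QueryRow[] {
    const rank = { high: 0, medium: 1, low: 2 };
    const pending = this.rows
      .filter((r) => r.tickerId === tickerId && r.state === "pending")
      .sort((a, b) => rank[a.priority] - rank[b.priority])
      .slice(0, limit);
    pending.forEach((r) => {
      r.state = "claimed"; // mutates the stored row, so it is not handed out twice
    });
    return pending;
  }
}

const table = new InMemoryQueryTable();
table.insertMany([
  { id: "q1", tickerId: 42, jobId: "job-1", category: "news", query: "Acme Corp earnings", priority: "medium", createdAt: new Date() },
  { id: "q2", tickerId: 42, jobId: "job-1", category: "web", query: "Acme Corp 10-K", priority: "high", createdAt: new Date() },
]);
// Later, independently, the collector polls the table.
const firstBatch = table.claimPending(42, 10);  // q2, then q1
const secondBatch = table.claimPending(42, 10); // empty: both rows were already claimed
console.log(firstBatch.map((r) => r.id), secondBatch.length);
```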

Sequence Diagram