Query Strategy Agent
ID: query-strategy
Purpose
Dynamically discover entities, maintain entity relationships, generate intelligent search queries, and optimize query strategies based on results. This agent ensures the Data Collection Agent knows what to search for and adapts to changing contexts over time.
What the Agent Does
- Entity Discovery & Management:
  - Discovers and maintains entity relationships: competitors, suppliers, customers, executives, industry peers, partners
  - Builds and updates entity graph database
  - Extracts entities from regulatory filings, company websites, industry databases
  - Tracks entity changes over time (new competitors, executive changes, partnerships)
  - Identifies key personnel (CEO, CFO, CTO, board members, key engineers)
- Dynamic Query Generation:
  - Generates search queries based on current context (company identifier, time period, recent events)
  - Creates queries for use with admin-configured search sources (e.g., Serper.dev, Google Search API)
  - Categorizes queries by type (news, socialMedia, web) for flexible routing
  - Adapts queries based on what's trending or relevant
  - Generates multi-variant queries (different phrasings, synonyms)
  - Considers query effectiveness history
- Keyword & Trend Evolution:
  - Tracks keyword popularity and evolution over time
  - Identifies emerging trends and topics
  - Discovers new relevant keywords through AI analysis
  - Monitors industry jargon and terminology changes
  - Tracks hashtag trends on social media
- Query Optimization:
  - Analyzes which queries returned high-relevance results
  - Learns from failed or low-yield queries
  - Adjusts query strategies based on collection results
  - A/B tests different query formulations
  - Optimizes query frequency and timing
- Context-Aware Query Building:
  - Considers recent events (earnings, product launches, news)
  - Adjusts queries based on current events and trends
  - Incorporates seasonal/cyclical patterns
  - Adapts to breaking news or significant events
Inputs
interface QueryStrategyInput {
  tickerId: number; // Required: Foreign key to the Ticker table
  timeWindow?: {
    start: Date;
    end: Date;
  }; // Optional: Time window for query generation
  jobId?: string; // Optional: Job ID for tracking this run
}
Note: The agent reads all necessary information from the database independently:
- Company name, identifier, and market information are looked up from the Ticker table based on the tickerId
- Previous query performance and collection results are read from the database (from the Query and DataSource tables)
- Recent events and context are determined by the agent based on the time window (if provided) and data in the database
- The agent does not receive data from other agents as input parameters; all communication is via the database
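Because the input is so small, it can be checked cheaply before any database lookups happen. The following is a minimal sketch of such a check, not the agent's actual implementation. The helper name validateInput is hypothetical, and the rules that tickerId must be a positive integer and that the window start must not come after its end are assumptions that this document does not state.

```typescript
// Mirrors the QueryStrategyInput interface described above.
interface QueryStrategyInput {
  tickerId: number;
  timeWindow?: { start: Date; end: Date };
  jobId?: string;
}

// Hypothetical helper: returns a list of problems, empty when the input looks usable.
// Assumption: tickerId is a positive integer because it is a foreign key.
function validateInput(input: QueryStrategyInput): string[] {
  const problems: string[] = [];
  if (input.tickerId <= 0 || Math.floor(input.tickerId) !== input.tickerId) {
    problems.push('tickerId must be a positive integer');
  }
  if (input.timeWindow) {
    const start = input.timeWindow.start.getTime();
    const end = input.timeWindow.end.getTime();
    if (isNaN(start) || isNaN(end)) {
      problems.push('timeWindow contains an invalid date');
    } else if (start > end) {
      // Assumption: an inverted window is treated as a caller error.
      problems.push('timeWindow.start must not be after timeWindow.end');
    }
  }
  return problems;
}
```

An empty result means the run can proceed; anything else could be logged against the jobId before the run is rejected.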
Configurations
These are stored in the AgentConfig table, key: query-strategy
{
entityDiscovery: {
enabled: boolean,
sources: {
regulatoryFilings: {
enabled: boolean,
providers: Array<{
name: string, // 'sec-edgar', 'companies-house', 'sedar', 'asx', etc.
region: string, // 'US', 'UK', 'CA', 'AU', etc.
enabled: boolean,
apiEndpoint?: string, // Custom API endpoint if needed
apiKey?: string, // API key if required
lookbackDays: number, // How far back to analyze
filingTypes: string[] // Region-specific filing types
// US: ['10-K', '10-Q', '8-K', 'DEF 14A']
// UK: ['annual-return', 'confirmation-statement']
// CA: ['annual-information-form', 'management-discussion']
// etc.
}>
},
companyWebsite: {
enabled: boolean,
sections: string[] // ['about', 'investors', 'press', 'careers']
},
industryDatabases: {
enabled: boolean,
providers: string[] // ['crunchbase', 'pitchbook', 'custom']
},
aiExtraction: {
enabled: boolean,
model: string, // 'gpt-4'
extractionPrompt: string // Template for entity extraction
}
},
entityTypes: {
competitors: {
enabled: boolean,
maxCount: number, // Max competitors to track
similarityThreshold: number // Industry/sector similarity
},
suppliers: {
enabled: boolean,
minSignificance: 'high' | 'medium' | 'low'
},
customers: {
enabled: boolean,
minSignificance: 'high' | 'medium' | 'low'
},
executives: {
enabled: boolean,
roles: string[] // ['CEO', 'CFO', 'CTO', 'Board Member']
// Note: Role names may vary by region/language
},
industryPeers: {
enabled: boolean,
sameSector: boolean,
sameRegion?: boolean // Optional: limit to same geographic region
}
},
refreshInterval: {
entityGraph: number, // Days between full refresh
relationships: number, // Days between relationship updates
executives: number // Days between executive list updates
}
},
queryGeneration: {
enabled: boolean,
strategies: {
news: {
baseQueries: string[], // Template queries
queryVariants: number, // How many variants to generate
includeSynonyms: boolean,
includeRelatedEntities: boolean,
maxQueriesPerSource: number
},
socialMedia: {
baseQueries: string[],
includeHashtags: boolean,
includeMentions: boolean,
includeTrending: boolean,
maxQueriesPerPlatform: number
},
web: {
baseQueries: string[],
includeLongTail: boolean,
maxQueries: number
}
},
aiGeneration: {
enabled: boolean,
model: string, // 'gpt-4'
temperature: number, // 0.7
maxQueries: number, // Per category
generationPrompt: string // Template
},
},
keywordTracking: {
enabled: boolean,
sources: {
newsTrends: boolean,
socialTrends: boolean,
industryReports: boolean
},
trackingWindow: number, // Days to track trends
minFrequency: number, // Minimum mentions to track
aiAnalysis: {
enabled: boolean,
model: string,
analysisPrompt: string
}
},
queryOptimization: {
enabled: boolean,
minResultsThreshold: number, // Queries with fewer results are optimized
maxResultsThreshold: number, // Queries with too many results are narrowed
learningWindow: number, // Days of history to consider
metrics: {
relevanceScore: number, // Weight for relevance
resultCount: number, // Weight for result count
coverageScore: number // Weight for topic coverage
},
optimizationFrequency: 'daily' | 'weekly' | 'monthly',
minDataPoints: number, // Min queries to analyze before optimizing
aiOptimization: {
enabled: boolean,
model: string,
optimizationPrompt: string
}
},
ai: {
entityExtractionModel: 'gpt-4',
queryGenerationModel: 'gpt-4',
optimizationModel: 'gpt-4',
temperature: 0.7,
maxTokens: 2000
}
}
Outputs
{
agentId: 'query-strategy',
agentVersion: string, // Semantic version (e.g., "1.2.3") of the agent that generated this output
jobId: string, // Job ID from orchestrator
tickerId: number,
timestamp: Date,
executionTime: number,
entityGraph: {
companyName: string,
entities: {
competitors: Array<{
identifier?: {
type: 'ticker' | 'isin' | 'cusip' | 'sedol' | 'custom',
value: string
},
companyName: string,
similarityScore: number, // 0-1
relationshipStrength: 'strong' | 'medium' | 'weak',
lastUpdated: Date
}>,
suppliers: Array<{
name: string,
type: 'manufacturing' | 'software' | 'services' | 'other',
significance: 'high' | 'medium' | 'low',
lastUpdated: Date
}>,
customers: Array<{
name: string,
type: 'enterprise' | 'consumer' | 'government',
significance: 'high' | 'medium' | 'low',
lastUpdated: Date
}>,
executives: Array<{
name: string,
role: string,
tenure: number, // Years
publicProfile: boolean,
lastUpdated: Date
}>,
industryPeers: Array<{
identifier?: {
type: 'ticker' | 'isin' | 'cusip' | 'sedol' | 'custom',
value: string
},
companyName: string,
similarityFactors: string[], // ['sector', 'revenue', 'industry']
lastUpdated: Date
}>
},
relationships: Array<{
from: string, // Entity ID
to: string, // Entity ID
type: 'competitor' | 'supplier' | 'customer' | 'partner' | 'executive',
strength: number, // 0-1
sources: string[], // Where relationship was discovered
lastUpdated: Date
}>,
metadata: {
graphVersion: number,
lastFullRefresh: Date,
entitiesCount: number,
relationshipsCount: number
}
},
queries: {
news: Array<{
id: string,
query: string,
source?: string, // Optional: suggested search source ID (e.g., 'serper-dev')
priority: 'high' | 'medium' | 'low',
rationale: string // Why this query was generated
}>,
socialMedia: Array<{
id: string,
query: string,
platform: 'twitter' | 'reddit' | 'linkedin',
priority: 'high' | 'medium' | 'low',
queryType: 'company' | 'competitor' | 'executive' | 'trend' | 'event',
hashtags?: string[],
mentions?: string[],
rationale: string
}>,
web: Array<{
id: string,
query: string,
searchEngine?: string, // 'google', 'bing', etc.
priority: 'high' | 'medium' | 'low',
rationale: string
}>
},
keywords: {
primary: string[], // Core keywords for this company/identifier
trending: Array<{
keyword: string,
trend: 'rising' | 'falling' | 'stable',
relevanceScore: number,
sources: string[],
firstSeen: Date,
lastSeen: Date
}>,
emerging: Array<{
keyword: string,
relevanceScore: number,
confidence: number,
sources: string[]
}>
},
metadata: {
queriesGenerated: number,
entitiesDiscovered: number,
entitiesUpdated: number,
keywordsTracked: number,
optimizationApplied: boolean
}
}
Process (Step-by-Step)
Initialization Phase
- Load agent configuration from database (AgentConfig table, key: query-strategy)
- Look up company information from database (Ticker table) based on tickerId:
  - Company name
  - Market region, exchange, currency
- Check if entity graph exists for company tickerId in database
- Load previous query performance data from database:
  - Query the Query table for previous queries and their performance metrics
  - Query the DataSource table for collection results and relevance feedback
- Initialize AI clients (models from config)
- Determine regulatory filing sources based on market region from database
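The last initialization step, choosing regulatory filing sources for a market region, can be sketched as a filter over the providers array from the configuration. This is an illustration only. The function name selectFilingProviders is hypothetical, and matching regions case-insensitively is an assumption rather than documented behavior.

```typescript
// Subset of the provider shape from entityDiscovery.sources.regulatoryFilings.providers.
interface FilingProvider {
  name: string;
  region: string;
  enabled: boolean;
  lookbackDays: number;
  filingTypes: string[];
}

// Hypothetical helper: keep only enabled providers whose region matches the ticker's market region.
// Assumption: region codes are compared case-insensitively after trimming.
function selectFilingProviders(providers: FilingProvider[], marketRegion: string): FilingProvider[] {
  const wanted = marketRegion.trim().toUpperCase();
  return providers.filter((p) => p.enabled && p.region.trim().toUpperCase() === wanted);
}

// Example configuration values (illustrative, not real settings).
const exampleProviders: FilingProvider[] = [
  { name: 'sec-edgar', region: 'US', enabled: true, lookbackDays: 365, filingTypes: ['10-K', '10-Q', '8-K'] },
  { name: 'companies-house', region: 'UK', enabled: true, lookbackDays: 365, filingTypes: ['annual-return'] },
  { name: 'sedar', region: 'CA', enabled: false, lookbackDays: 365, filingTypes: ['annual-information-form'] },
];
```

With these example values, a US ticker resolves to the sec-edgar provider, while a Canadian ticker resolves to nothing because sedar is disabled.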
Entity Graph Management
Entity Discovery
This is run if the entity graph is not found or if the refresh interval has been reached.
- Query regulatory filing APIs based on market region (configured in entityDiscovery.sources.regulatoryFilings):
  - SEC EDGAR for US companies
  - Companies House for UK companies
  - SEDAR for Canadian companies
  - ASX for Australian companies
  - Other region-specific providers as configured
- Extract entity mentions from filings using AI (model from ai.entityExtractionModel):
  - Competitors mentioned in risk factors or competitive sections
  - Suppliers and customers from business relationships
  - Industry peers from market discussions
- Scrape company website sections (configured in entityDiscovery.sources.companyWebsite.sections):
  - Extract executive information from "About" or "Leadership" pages
  - Extract partner information from "Partners" or "Customers" pages
- Query industry databases (if entityDiscovery.sources.industryDatabases.enabled):
  - Crunchbase, Pitchbook, or custom providers as configured
- Use AI (model from ai.entityExtractionModel) to extract and structure entities from all sources
- Build/update the entity graph in database
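Entity discovery only runs when the graph is missing or a refresh interval has elapsed. A minimal sketch of that gate follows, assuming the intervals in entityDiscovery.refreshInterval are whole days and that a missing graph is represented by a null timestamp. The function name isRefreshDue is hypothetical.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical helper: decides whether discovery should run.
// lastRefresh is null when no entity graph exists yet (assumption about representation).
function isRefreshDue(lastRefresh: Date | null, intervalDays: number, now: Date): boolean {
  if (lastRefresh === null) {
    return true; // No graph found: always discover.
  }
  const elapsedDays = (now.getTime() - lastRefresh.getTime()) / MS_PER_DAY;
  return elapsedDays >= intervalDays;
}
```

The same check could be applied separately to the entityGraph, relationships, and executives intervals, each with its own last-updated timestamp.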
Relationship Extraction
- Analyze regulatory filings for relationship descriptions:
  - Risk factor sections for competitor mentions
  - Business relationship sections for supplier/customer relationships
  - Management discussion sections for industry context
- Extract relationship descriptions from earnings call transcripts (if available for the region and configured)
- Use AI (model from ai.entityExtractionModel) to identify relationship types and strengths
- Update the relationship graph in database
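Updating the relationship graph implies reconciling newly extracted relationships with those already stored. The sketch below shows one plausible merge, under assumptions this document does not confirm: an edge is identified by its (from, to, type) triple, the stronger of two strength values wins, sources are unioned, and the later lastUpdated is kept. The function name mergeRelationships is hypothetical.

```typescript
// Matches the relationship shape in the Outputs section.
interface Relationship {
  from: string;
  to: string;
  type: 'competitor' | 'supplier' | 'customer' | 'partner' | 'executive';
  strength: number; // 0-1
  sources: string[];
  lastUpdated: Date;
}

// Hypothetical merge: one edge per (from, to, type).
// Assumptions: max strength wins, sources are unioned, newest lastUpdated is kept.
function mergeRelationships(existing: Relationship[], discovered: Relationship[]): Relationship[] {
  const byKey: Record<string, Relationship> = {};
  const order: string[] = [];
  for (const rel of existing.concat(discovered)) {
    const key = JSON.stringify([rel.from, rel.to, rel.type]);
    const prev = byKey[key];
    if (prev === undefined) {
      byKey[key] = { ...rel, sources: rel.sources.slice() };
      order.push(key);
      continue;
    }
    const sources = prev.sources.slice();
    for (const source of rel.sources) {
      if (sources.indexOf(source) === -1) {
        sources.push(source);
      }
    }
    const newer = rel.lastUpdated.getTime() > prev.lastUpdated.getTime();
    byKey[key] = {
      ...prev,
      strength: Math.max(prev.strength, rel.strength),
      sources,
      lastUpdated: newer ? rel.lastUpdated : prev.lastUpdated,
    };
  }
  // Preserve first-seen order so repeated runs produce stable output.
  return order.map((key) => byKey[key]);
}
```

Other policies, such as averaging strengths or decaying old edges, would fit the same structure; the choice depends on how the AI-assigned strengths are meant to be interpreted.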
Query Generation
Base Query Construction
- Generate base queries from the company tickerId, company name (from database), and industry
- Include entity-based queries (competitors, executives) from the entity graph
- Determine recent events and context from the database (earnings dates, product launches, news from DataSource table)
- Generate query variants (synonyms, phrasings) based on configuration
- Adapt queries to regional language and terminology based on market region from database
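Base query construction can be pictured as expanding the baseQueries templates with names from the database and the entity graph. The sketch below assumes a {placeholder} syntax such as {company} and {competitor}; this document does not specify the template format, so the syntax and the function names expandTemplate and buildBaseQueries are hypothetical.

```typescript
// Replace each {name} token in a template with every supplied value.
// A template whose placeholder has no values yields no queries.
function expandTemplate(template: string, values: Record<string, string[]>): string[] {
  let results: string[] = [template];
  for (const name of Object.keys(values)) {
    const token = '{' + name + '}';
    const next: string[] = [];
    for (const partial of results) {
      if (partial.indexOf(token) === -1) {
        next.push(partial);
        continue;
      }
      for (const value of values[name]) {
        next.push(partial.split(token).join(value));
      }
    }
    results = next;
  }
  return results;
}

// Expand all templates, drop duplicates and unresolved placeholders, then cap the list.
function buildBaseQueries(templates: string[], values: Record<string, string[]>, maxQueries: number): string[] {
  const queries: string[] = [];
  const seen: string[] = [];
  for (const template of templates) {
    for (const candidate of expandTemplate(template, values)) {
      const query = candidate.trim();
      const key = query.toLowerCase();
      if (query === '' || query.indexOf('{') !== -1 || seen.indexOf(key) !== -1) {
        continue;
      }
      seen.push(key);
      queries.push(query);
    }
  }
  return queries.slice(0, maxQueries);
}
```

Supplying several values for one placeholder, for example a legal name and a common short name for {company}, is one simple way to obtain the phrasing variants mentioned above.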
AI-Enhanced Query Generation
- Use AI (model from ai.queryGenerationModel) to generate additional creative queries
- Consider trending topics and keywords from keyword tracking
- Generate context-aware queries based on the time period and recent events
- Create platform-specific queries (news, social media, web) with appropriate formatting
Query Prioritization
- Score queries by expected relevance (based on entity importance, context, and performance history)
- Prioritize queries based on entity importance and recent events (determined from database)
- Consider query performance history from database (previous collection results from DataSource table)
- Limit queries per category based on the configured cap: queryGeneration.strategies.news.maxQueriesPerSource, queryGeneration.strategies.socialMedia.maxQueriesPerPlatform, or queryGeneration.strategies.web.maxQueries
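The prioritization steps above amount to scoring each candidate, mapping the score to the high/medium/low priority used in the output, and truncating to the configured cap. The sketch below is illustrative only: the weights (0.4, 0.3, 0.3), the neutral default of 0.5 for queries without history, and the priority cut-offs are all assumptions, as are the names scoreCandidate, toPriority, and prioritize.

```typescript
type Priority = 'high' | 'medium' | 'low';

interface QueryCandidate {
  query: string;
  entityImportance: number; // 0-1, e.g. derived from relationship strength
  eventBoost: number; // 0-1, higher when tied to a recent event
  historicalRelevance?: number; // 0-1, absent for queries never run before
}

interface PrioritizedQuery {
  query: string;
  priority: Priority;
  score: number;
}

// Assumed weights; the real agent may weigh these factors differently.
const WEIGHTS = { entity: 0.4, event: 0.3, history: 0.3 };
const NEUTRAL_HISTORY = 0.5;

function scoreCandidate(c: QueryCandidate): number {
  const history = c.historicalRelevance === undefined ? NEUTRAL_HISTORY : c.historicalRelevance;
  return WEIGHTS.entity * c.entityImportance + WEIGHTS.event * c.eventBoost + WEIGHTS.history * history;
}

// Assumed cut-offs for mapping a score to the output's priority field.
function toPriority(score: number): Priority {
  if (score >= 0.7) {
    return 'high';
  }
  return score >= 0.4 ? 'medium' : 'low';
}

// Score, sort descending, and keep at most maxQueries entries.
function prioritize(candidates: QueryCandidate[], maxQueries: number): PrioritizedQuery[] {
  return candidates
    .map((c): PrioritizedQuery => {
      const score = scoreCandidate(c);
      return { query: c.query, priority: toPriority(score), score };
    })
    .sort((a, b) => b.score - a.score)
    .slice(0, maxQueries);
}
```

Giving unseen queries a neutral history score keeps new formulations from being starved by established ones, which matters if the agent is also A/B testing phrasings.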
Keyword Tracking
This is run if keywordTracking.enabled is true.
- Analyze recent news and social media for trending keywords (from configured sources in keywordTracking.sources)
- Use AI (model from keywordTracking.aiAnalysis.model) to identify emerging keywords related to the company
- Track keyword evolution over time (within keywordTracking.trackingWindow)
- Score keywords by relevance to the company and identifier
- Filter keywords by minimum frequency threshold (keywordTracking.minFrequency)
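The window and frequency steps can be sketched without any AI involvement. The example below counts mentions inside the tracking window, drops keywords below minFrequency, and labels each survivor rising, falling, or stable. Comparing the older half of the window with the recent half is an assumption made for illustration; this document does not say how trend direction is computed. KeywordMention and summarizeKeywords are hypothetical names.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

type Trend = 'rising' | 'falling' | 'stable';

interface KeywordMention {
  keyword: string;
  seenAt: Date;
}

interface KeywordTrend {
  keyword: string;
  count: number;
  trend: Trend;
}

// Assumption: trend = mentions in the recent half of the window versus the older half.
function summarizeKeywords(
  mentions: KeywordMention[],
  now: Date,
  trackingWindowDays: number,
  minFrequency: number,
): KeywordTrend[] {
  const end = now.getTime();
  const start = end - trackingWindowDays * MS_PER_DAY;
  const midpoint = (start + end) / 2;
  // Null-prototype object so keywords like "constructor" cannot collide with built-ins.
  const counts: Record<string, { older: number; recent: number }> = Object.create(null);
  for (const mention of mentions) {
    const t = mention.seenAt.getTime();
    const key = mention.keyword.trim().toLowerCase();
    if (t < start || t > end || key === '') {
      continue; // Outside the tracking window or empty keyword.
    }
    const entry = counts[key] || (counts[key] = { older: 0, recent: 0 });
    if (t >= midpoint) {
      entry.recent += 1;
    } else {
      entry.older += 1;
    }
  }
  return Object.keys(counts)
    .map((keyword): KeywordTrend => {
      const { older, recent } = counts[keyword];
      const trend: Trend = recent > older ? 'rising' : recent < older ? 'falling' : 'stable';
      return { keyword, count: older + recent, trend };
    })
    .filter((k) => k.count >= minFrequency)
    .sort((a, b) => b.count - a.count);
}
```

The resulting list maps naturally onto the keywords.trending array in the output, with firstSeen and lastSeen obtainable from the same pass if the earliest and latest timestamps are also recorded.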
Query Optimization
This is run if previous results are available in the database and queryOptimization.enabled is true.
- Query database for previous query performance:
  - Read query performance metrics from Query table (linked to DataSource items)
  - Analyze collection results from DataSource table (relevance scores, result counts)
- Analyze query performance:
  - Identify high-performing queries (high relevance scores, good result counts)
  - Identify low-performing queries (below minResultsThreshold or above maxResultsThreshold)
- Use AI (model from ai.optimizationModel) to generate optimized query variants for underperforming queries
- Update query generation strategies based on optimization results
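The threshold and weighting settings in queryOptimization suggest a simple triage before any AI rewriting takes place. The following sketch is one possible reading, not the documented algorithm. It assumes the three metrics weights combine as a normalized weighted average, that result counts are scaled against maxResultsThreshold, and that queries with too few results should be broadened (the configuration comments only say they are "optimized"). All type and function names are hypothetical.

```typescript
interface QueryPerformance {
  queryId: string;
  resultCount: number;
  relevanceScore: number; // 0-1
  coverageScore: number; // 0-1
}

// Subset of the queryOptimization configuration.
interface OptimizationSettings {
  minResultsThreshold: number;
  maxResultsThreshold: number;
  minDataPoints: number;
  metrics: { relevanceScore: number; resultCount: number; coverageScore: number };
}

type Action = 'keep' | 'broaden' | 'narrow';

interface OptimizationDecision {
  queryId: string;
  action: Action;
  score: number;
}

// Assumption: weights form a normalized weighted average of three 0-1 components.
function weightedScore(perf: QueryPerformance, settings: OptimizationSettings): number {
  const w = settings.metrics;
  const totalWeight = w.relevanceScore + w.resultCount + w.coverageScore;
  if (totalWeight <= 0) {
    return 0;
  }
  const countScore =
    settings.maxResultsThreshold > 0 ? Math.min(perf.resultCount / settings.maxResultsThreshold, 1) : 0;
  const weighted =
    w.relevanceScore * perf.relevanceScore + w.resultCount * countScore + w.coverageScore * perf.coverageScore;
  return weighted / totalWeight;
}

function chooseAction(perf: QueryPerformance, settings: OptimizationSettings): Action {
  if (perf.resultCount < settings.minResultsThreshold) {
    return 'broaden'; // Assumed remedy for low-yield queries.
  }
  if (perf.resultCount > settings.maxResultsThreshold) {
    return 'narrow';
  }
  return 'keep';
}

// Returns no decisions until enough history exists (minDataPoints).
function planOptimization(history: QueryPerformance[], settings: OptimizationSettings): OptimizationDecision[] {
  if (history.length < settings.minDataPoints) {
    return [];
  }
  return history.map((perf) => ({
    queryId: perf.queryId,
    action: chooseAction(perf, settings),
    score: weightedScore(perf, settings),
  }));
}
```

Queries marked broaden or narrow would then be the ones handed to the optimization model for rewriting, while the score gives a basis for comparing variants across runs.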
Output Generation
- Structure the entity graph
- Format the queries for each source/platform
- Include metadata and rationale for each query
Storage
- Save the entity graph to the database (available for other agents to read independently)
- Store the generated queries in the database (Data Collection Agent reads these asynchronously)
- Update the query performance metrics (used by Learning Agent and future optimization cycles)
- Log the optimization decisions
- No direct agent communication: All outputs are persisted to the database; other agents read independently when they run