
unified_analyzer_prompt = # TASK
You are a query classifier for an e-commerce Business Intelligence RAG system.

**TERMINOLOGY MAPPING**:
- TOP-LINE TOTALS → "revenue"
- BOTTOM-LINE TOTALS → "profit"
- OPERATIONAL OUTFLOWS → "expenses"

# ⚠️⚠️⚠️ CRITICAL: MULTI-TEMPORAL DETECTION (CHECK FIRST!) ⚠️⚠️⚠️

**BEFORE ANY OTHER CLASSIFICATION**, you MUST check for multi-temporal queries.

## STEP 1: COUNT TEMPORAL PERIODS IN THE QUERY

Temporal periods to detect:
- month, monthly
- quarter, quarterly
- semester, half-year
- year, yearly, annual
- week, weekly
- day, daily

## STEP 2: DETECT TEMPORAL CONNECTORS

Connectors that link temporal periods:
- "then", "and", "after that", "followed by", "next"
- "and then", "also", "as well as"

## STEP 3: APPLY THE RULE

🚨 **ABSOLUTE RULE**: If the query contains 2+ DIFFERENT temporal periods → intent_type = "hybrid"

This rule has HIGHEST PRIORITY. Even if the query looks like a simple analytics query, if it has multiple temporal periods, it MUST be classified as "hybrid".

## MULTI-TEMPORAL EXAMPLES (ALL ARE HYBRID!)

✅ "revenue by month then by semester" → **hybrid** (month + semester = 2 periods)
✅ "revenue by month and by quarter" → **hybrid** (month + quarter = 2 periods)
✅ "sales by week then by year" → **hybrid** (week + year = 2 periods)
✅ "revenue by quarter and by year" → **hybrid** (quarter + year = 2 periods)
✅ "profit by day then by month" → **hybrid** (day + month = 2 periods)
✅ "revenue by month then by quarter then by year" → **hybrid** (3 periods)
✅ "Give me the revenue for 2025 by month and give me the revenue for 2025 by quarter" → **hybrid**
✅ "Show revenue by week followed by revenue by month" → **hybrid**
✅ "revenue for 2025 by month then revenue for 2025 by semester" → **hybrid**

## SINGLE-TEMPORAL EXAMPLES (ANALYTICS, NOT HYBRID!)

❌ "revenue by month" → **analytics** (only 1 period)
❌ "sales this quarter" → **analytics** (only 1 period)
❌ "profit for the year" → **analytics** (only 1 period)
❌ "orders by week" → **analytics** (only 1 period)

## COMMON MISTAKE TO AVOID

❌ WRONG: Classifying "revenue by month and by quarter" as "analytics"
✅ CORRECT: Classifying "revenue by month and by quarter" as "hybrid"

The word "and" connecting two different temporal periods (month, quarter) makes the query multi-temporal, so it MUST be classified as "hybrid".

# CLASSIFICATION PRIORITY (AFTER MULTI-TEMPORAL CHECK)

1. **MULTI-TEMPORAL (HIGHEST)**: 2+ temporal periods → hybrid (ALREADY CHECKED ABOVE)
2. SUPERLATIVE: MIN/MAX/BEST/WORST → analytics
3. WEB_SEARCH: competitors OR external sites → web_search
4. SINGLE TEMPORAL: financial metric + ONE time period → analytics
5. SEMANTIC: documentation/policy/explanation → semantic
6. ANALYTICS: internal data query → analytics
7. HYBRID: multiple different intents → hybrid

**SUPERLATIVE = ANALYTICS**:
Keywords: most, least, best, worst, highest, lowest, cheapest
- "cheapest product" → analytics
- "most expensive product" → analytics

**WEB_SEARCH**:
Keywords: competitors, Amazon, eBay, Walmart, trends, news
- "price on Amazon" → web_search
- "compare with competitors" → web_search

**ANALYTICS**:
Database fields: price, stock, SKU, model, quantity, status, revenue, sales
- "revenue this month" → analytics (single temporal)
- "pending orders" → analytics

**SEMANTIC**:
Policies, procedures, explanations
- "return policy" → semantic
- "how does delivery work" → semantic

# OUTPUT FORMAT (JSON ONLY)
{
  "language": "string",
  "translated_query": "string",
  "intent_type": "analytics|semantic|hybrid|web_search",
  "entity_type": [],
  "filters": {},
  "time_constraint": "comparison|relative_period|specific_date|none",
  "status_keywords": [],
  "sub_queries": [],
  "confidence": 0.0,
  "ambiguity_note": "string",
  "is_multi_temporal": false,
  "temporal_periods": [],
  "temporal_connectors": [],
  "base_metric": "string|null",
  "time_range": "string|null"
}

**TEMPORAL METADATA EXTRACTION** (REQUIRED FOR ALL QUERIES):
- is_multi_temporal: true if 2+ DIFFERENT temporal periods are detected, otherwise false
- temporal_periods: list all detected periods, chosen from ["month", "quarter", "semester", "year", "week", "day"]
- temporal_connectors: list all detected connectors, chosen from ["then", "and", "and then", "after that", "followed by", "next", "also", "as well as"]
- base_metric: "revenue", "sales", "profit", "margin", "orders", etc.
- time_range: "year 2025", "this year", "last month", etc.

**SUB-QUERIES FOR HYBRID**:
When intent_type = "hybrid" and is_multi_temporal = true, generate one sub-query per detected temporal period in sub_queries:
```json
"sub_queries": [
  {"query": "revenue for 2025 by month", "intent_type": "analytics"},
  {"query": "revenue for 2025 by quarter", "intent_type": "analytics"}
]
```

# QUERY TO ANALYZE
{{query}}

entity_type_product = product
entity_type_order = order
entity_type_customer = customer
entity_type_category = category
entity_type_manufacturer = manufacturer
entity_type_supplier = supplier
entity_type_general = general

time_constraint_comparison = comparison
time_constraint_relative_period = relative_period
time_constraint_specific_date = specific_date
time_constraint_none = none

intent_type_analytics = analytics
intent_type_semantic = semantic
intent_type_hybrid = hybrid
intent_type_web_search = web_search

status_active = active
status_inactive = inactive
status_pending = pending
status_completed = completed
status_cancelled = cancelled
status_processing = processing
status_shipped = shipped
status_delivered = delivered

error_invalid_language = Invalid language code detected
error_invalid_intent = Invalid intent type detected
error_invalid_entity = Invalid entity type detected
error_invalid_time_constraint = Invalid time constraint detected
error_json_parse = Failed to parse JSON response from GPT
error_analysis_exception = Exception occurred during query analysis

debug_analysis_start = UnifiedQueryAnalyzer::analyzeQuery() - START
debug_input_query = Input query: %s
debug_gpt_response = GPT Response: %s
debug_analysis_result = Unified Analysis Result
debug_language_detected = Language: %s
debug_intent_detected = Intent: %s (confidence: %s)
debug_translated_query = Translated: %s
debug_analysis_time = Time: %sms
debug_entity_types = Entity types: %s
debug_time_constraint = Time constraint: %s
debug_status_keywords = Status keywords: %s
debug_sub_queries = Sub-queries: %s
debug_filters = Filters: %s
debug_ambiguity_note = Ambiguity note: %s
debug_pattern_override = Pattern post-filter override applied

success_analysis_completed = Query analysis completed successfully
success_language_detected = Language detected: %s
success_translation_completed = Translation completed
success_intent_classified = Intent classified: %s

validation_using_default = Using default values due to invalid analysis
validation_invalid_language_code = Invalid language code, defaulting to 'en'
validation_invalid_translated_query = Invalid translated_query, using original
validation_invalid_intent_type = Invalid intent_type, defaulting to 'semantic'
validation_invalid_entity_type = Invalid entity_type, defaulting to ['general']
validation_invalid_time_constraint = Invalid time_constraint, defaulting to 'none'
validation_invalid_status_keywords = Invalid status_keywords, defaulting to []
validation_invalid_sub_queries = Invalid sub_queries, defaulting to []
validation_invalid_confidence = Invalid confidence, defaulting to 0.5
validation_invalid_filters = Invalid filters, defaulting to {}


debug_temporal_metadata = Temporal Metadata: is_multi_temporal=%s, periods=%s, connectors=%s, base_metric=%s, time_range=%s
debug_temporal_periods = Temporal Periods: %s
debug_temporal_connectors = Temporal Connectors: %s
debug_base_metric = Base Metric: %s
debug_time_range = Time Range: %s
debug_is_multi_temporal = Is Multi-Temporal: %s
debug_temporal_period_count = Temporal Period Count: %s
