Hybrid Retrieval System for Earnings Call Interpretation

Combining vector retrieval, knowledge graphs, and agentic planning to improve financial due-diligence workflows.


The Decision That Triggered This Build

Earnings calls are core inputs for institutional due diligence, but each transcript can span 50 to 300 pages and mix executive narratives, analyst Q&A, and structured tables. Analysts need high-confidence answers quickly across firms and quarters.

The central product decision was whether retrieval could move beyond chunk relevance into reliable, evidence-grounded financial reasoning for summaries, Q&A, and peer comparisons.

What We Needed to Compare

Single-company Q&A

Answer targeted call questions with grounded transcript and metric evidence.

Quarter-over-quarter narrative

Compare guidance, risk posture, and key metric movements against prior periods.

Peer benchmarking

Contrast companies in the same period without inventing unsupported numeric claims.

Why this mattered

Retrieval-only systems can fetch relevant snippets, but often fail to combine semantic, numeric, and relational evidence into one coherent answer path.

Why Baseline RAG Alone Was Not Enough

Context isolation

Top chunks capture local semantics but miss cross-document and cross-quarter structure.

Weak numeric grounding

Without table retrieval, generated outputs drift on values and temporal attribution.

No tool policy

Single-pass retrieval cannot adapt to whether a question needs metrics, risks, or transcript tone.

The Data Behind the System

Three evidence layers were integrated to support semantic retrieval and symbolic reasoning in one pipeline.

Vector Store (ChromaDB)

  • Transcript chunks embedded with all-MiniLM-L6-v2
  • Metadata filters by ticker, filing type, and quarter
  • Quarter-relaxed fallback retrieval when exact match is missing
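The filtered retrieval with quarter-relaxed fallback could be sketched as follows. The helper assumes a ChromaDB-style collection exposing `query(...)`; the metadata field names ("ticker", "quarter") are illustrative, not the project's actual schema.

```python
def retrieve_chunks(collection, question, ticker, quarter, k=5):
    """Filtered vector search; relax the quarter filter if nothing matches."""
    # Strict pass: exact ticker AND exact quarter (ChromaDB `where` syntax)
    strict = {"$and": [{"ticker": {"$eq": ticker}},
                       {"quarter": {"$eq": quarter}}]}
    res = collection.query(query_texts=[question], n_results=k, where=strict)
    if res["documents"] and res["documents"][0]:
        return res["documents"][0]
    # Quarter-relaxed fallback: same ticker, any quarter
    relaxed = {"ticker": {"$eq": ticker}}
    res = collection.query(query_texts=[question], n_results=k, where=relaxed)
    return res["documents"][0] if res["documents"] else []
```

The two-pass design keeps exact-quarter answers preferred while still returning evidence when a quarter is missing from the index.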

Structured Parquet Tables

  • Metrics, segment entries, and risk indicators
  • Shared quarter normalization across modules
  • Deterministic filters to anchor numeric outputs
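A minimal sketch of the deterministic table lookup, assuming a pandas DataFrame loaded from the Parquet files; the column names ("ticker", "quarter", "metric", "value") and the quarter variants handled are illustrative assumptions.

```python
import pandas as pd

def normalize_quarter(q):
    """Map variants like 'Q1 2024' or '2024-Q1' to a shared '2024Q1' key."""
    parts = q.upper().replace("-", " ").strip().split()
    if len(parts) == 1:
        return parts[0]          # already in '2024Q1' form
    if parts[0].startswith("Q"):
        parts = parts[::-1]      # 'Q1 2024' -> ['2024', 'Q1']
    return parts[0] + parts[1]

def lookup_metric(df, ticker, quarter, metric):
    """Exact filter match or nothing -- never a guessed value."""
    key = normalize_quarter(quarter)
    rows = df[(df.ticker == ticker)
              & (df.quarter.map(normalize_quarter) == key)
              & (df.metric == metric)]
    return None if rows.empty else float(rows.value.iloc[0])
```

Returning `None` on a miss (rather than a nearest match) is what lets the response model say "not available" instead of drifting on values.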

Graph Layer

  • Company-quarter-metric entity schema
  • NetworkX implementation with Neo4j-compatible design
  • Foundation for richer temporal and peer reasoning
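The company-quarter-metric schema could look like the following NetworkX sketch; node naming and relationship labels are illustrative, mirroring a Neo4j-style property graph rather than reproducing the project's exact schema.

```python
import networkx as nx

g = nx.DiGraph()
g.add_node("ACME", kind="company")
for q in ("2023Q4", "2024Q1"):
    g.add_node(f"ACME:{q}", kind="quarter")
    g.add_edge("ACME", f"ACME:{q}", rel="REPORTED")
# Metric values live on HAS_METRIC edges from quarter nodes
g.add_edge("ACME:2023Q4", "metric:revenue", rel="HAS_METRIC", value=10.5)
g.add_edge("ACME:2024Q1", "metric:revenue", rel="HAS_METRIC", value=11.2)

def metric_delta(graph, company, q_prev, q_cur, metric):
    """Quarter-over-quarter change read directly off graph edges."""
    prev = graph.edges[f"{company}:{q_prev}", f"metric:{metric}"]["value"]
    cur = graph.edges[f"{company}:{q_cur}", f"metric:{metric}"]["value"]
    return cur - prev
```

Traversals like this are what make quarter-over-quarter and peer comparisons symbolic lookups rather than free-form generation.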

Technical Approach

A planner model selects tools per query and returns a JSON policy (use_vector_search, use_metrics, use_segments, use_risks, plus vector_query). Retrieved evidence is fused into one JSON blob passed to the final response model with explicit grounding instructions.

Planner + Tool Routing

Intent-aware retrieval selection with safe fallback to all-tools mode when planner output is invalid.
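The fail-safe policy parsing described above can be sketched as follows. The flag names match the policy fields listed earlier; `ALL_TOOLS` and the validation rules are illustrative assumptions.

```python
import json

ALL_TOOLS = {
    "use_vector_search": True,
    "use_metrics": True,
    "use_segments": True,
    "use_risks": True,
    "vector_query": None,  # None -> fall back to the raw user question
}

def parse_policy(raw, question):
    """Validate the planner's JSON; on any defect, fall back to all-tools mode."""
    try:
        policy = json.loads(raw)
        if not isinstance(policy, dict):
            raise ValueError("policy must be a JSON object")
        if not any(policy.get(k) for k in ALL_TOOLS if k.startswith("use_")):
            raise ValueError("policy selects no tools")
    except (json.JSONDecodeError, ValueError):
        policy = dict(ALL_TOOLS)   # safe fallback: run every tool
    if policy.get("vector_query") is None:
        policy["vector_query"] = question
    return policy
```

Degrading to all-tools mode trades latency for coverage, which is the right failure direction for a due-diligence assistant.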

Evidence Fusion

Combines transcript snippets, metrics, segment data, and risk rows into a single context package.
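A minimal fusion sketch, assuming the four evidence types above arrive as Python lists; the section names and the per-snippet "provenance" tag are illustrative additions.

```python
import json

def fuse_evidence(chunks, metrics, segments, risks):
    """Bundle all retrieved evidence into one JSON context package."""
    package = {
        "transcript_snippets": [
            {"text": c, "provenance": "vector_store"} for c in chunks
        ],
        "metrics": metrics,      # rows from the Parquet metric table
        "segments": segments,
        "risks": risks,
    }
    return json.dumps(package, indent=2)
```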

Hallucination Controls

Low temperature and explicit prompts to avoid guessing numbers when data is absent.
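These controls could be wired into the generation request along these lines; the exact rule wording, temperature value, and request shape are assumptions, not the project's code.

```python
GROUNDING_RULES = (
    "Answer only from the evidence JSON below. "
    "If a number or quarter is not present in the evidence, say "
    "'not available in the retrieved data' instead of estimating."
)

def build_request(evidence_json, question):
    """Chat-style request with low temperature and explicit grounding rules."""
    return {
        "temperature": 0.1,  # low temperature to reduce numeric drift
        "messages": [
            {"role": "system", "content": GROUNDING_RULES},
            {"role": "user",
             "content": f"Evidence:\n{evidence_json}\n\nQuestion: {question}"},
        ],
    }
```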

Design choice

Intent handling is prompt-driven instead of hard-coded, making the system easier to extend with new tools and workflows.

Evaluation and Benchmarks

LLM-as-a-judge pilots showed strong tone/guidance synthesis but weaker reliability on strict numeric and temporal fact alignment.

Case                Overall   Relevance   Factual   Grounded   Complete   Clear
Tone and guidance   1.0       1.0         1.0       1.0        1.0        1.0
Risk relevance      0.1       1.0         0.0       0.0        0.2        1.0
Main weakness

Numeric grounding and quarter attribution remain the dominant error source in difficult queries.

Recommendation

Deploy as analyst-assist with guardrails

Use the system for exploratory analysis, summaries, and benchmarking, while requiring citation validation and confidence checks before high-stakes numeric decisions.

Next Iteration Priorities