Combining vector retrieval, knowledge graphs, and agentic planning to improve financial due-diligence workflows.
Earnings calls are core inputs for institutional due diligence, but each transcript can span 50 to 300 pages and mix executive narratives, analyst Q&A, and structured tables. Analysts need high-confidence answers quickly across firms and quarters.
The central product decision was whether retrieval could move beyond chunk relevance into reliable, evidence-grounded financial reasoning for summaries, Q&A, and peer comparisons.
- Answer targeted call questions with grounded transcript and metric evidence.
- Compare guidance, risk posture, and key metric movements against prior periods.
- Contrast companies in the same period without inventing unsupported numeric claims.
Retrieval-only systems can fetch relevant snippets, but they often fail to combine semantic, numeric, and relational evidence into one coherent answer path.
- Top chunks capture local semantics but miss cross-document and cross-quarter structure.
- Without table retrieval, generated outputs drift on values and temporal attribution.
- Single-pass retrieval cannot adapt to whether a question needs metrics, risks, or transcript tone.
Three evidence layers were integrated to support semantic retrieval and symbolic reasoning in one pipeline.
A planner model selects tools per query and returns a JSON policy (`use_vector_search`, `use_metrics`, `use_segments`, `use_risks`, plus `vector_query`). Retrieved evidence is fused into one JSON blob passed to the final response model with explicit grounding instructions.
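The planner's JSON policy can be parsed defensively, with a safe fallback to all-tools mode when the planner emits invalid output. A minimal sketch, assuming this helper and its defaults (only the flag names come from the pipeline description):

```python
import json

# Tool flags named in the planner's JSON policy (from the pipeline description).
POLICY_FLAGS = ["use_vector_search", "use_metrics", "use_segments", "use_risks"]

def parse_policy(raw: str) -> dict:
    """Parse the planner's JSON policy; fall back to all-tools mode if invalid."""
    try:
        policy = json.loads(raw)
        if not isinstance(policy, dict):
            raise ValueError("policy must be a JSON object")
        return {
            **{flag: bool(policy.get(flag, False)) for flag in POLICY_FLAGS},
            "vector_query": str(policy.get("vector_query", "")),
        }
    except (json.JSONDecodeError, ValueError):
        # Safe fallback: enable every evidence tool rather than risk dropping evidence.
        return {**{flag: True for flag in POLICY_FLAGS}, "vector_query": ""}
```

The fallback trades retrieval cost for coverage: a malformed policy widens the evidence set instead of silently narrowing it.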
- Intent-aware retrieval selection with safe fallback to all-tools mode when planner output is invalid.
- Combines transcript snippets, metrics, segment data, and risk rows into a single context package.
- Low temperature and explicit prompts to avoid guessing numbers when data is absent.
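The fusion step can be sketched as a single context package; the field names here are illustrative assumptions, not the production schema:

```python
import json

def fuse_evidence(snippets, metrics, segments, risks) -> str:
    """Fuse the four evidence layers into one JSON blob for the response model."""
    package = {
        "transcript_snippets": snippets,  # vector-retrieved passages
        "metrics": metrics,               # numeric rows from tables
        "segments": segments,             # per-segment breakdowns
        "risks": risks,                   # risk-factor rows
    }
    # Drop empty layers so the response model is not prompted with absent data,
    # which supports the "do not guess numbers when data is absent" instruction.
    return json.dumps({k: v for k, v in package.items() if v}, indent=2)
```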
Intent handling is prompt-driven instead of hard-coded, making the system easier to extend with new tools and workflows.
LLM-as-a-judge pilots showed strong tone/guidance synthesis but weaker reliability on strict numeric and temporal fact alignment.
| Case | Overall | Relevance | Factuality | Grounding | Completeness | Clarity |
|---|---|---|---|---|---|---|
| Tone and guidance | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| Risk relevance | 0.1 | 1.0 | 0.0 | 0.0 | 0.2 | 1.0 |
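Given per-dimension judge scores like the rows above, failing dimensions can be flagged mechanically; the 0.5 cutoff below is an illustrative assumption, not part of the rubric:

```python
def failing_dimensions(scores: dict, threshold: float = 0.5) -> list:
    """Return judge dimensions scoring below the threshold, sorted by name.
    The 0.5 cutoff is an illustrative assumption."""
    return sorted(dim for dim, score in scores.items() if score < threshold)
```

Applied to the risk-relevance row, this isolates factuality, grounding, and completeness as the weak dimensions, consistent with the numeric-alignment finding.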
Numeric grounding and quarter attribution remain the dominant error sources in difficult queries.
Use the system for exploratory analysis, summaries, and benchmarking, while requiring citation validation and confidence checks before high-stakes numeric decisions.
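One form of the citation validation recommended here is a string-level check that every number in a generated answer also appears in the retrieved evidence. This sketch is illustrative only; a production validator would also normalize units, scale, and formatting:

```python
import re

def validate_numeric_claims(answer: str, evidence: str) -> list:
    """Return numeric tokens in the answer that do not appear in the evidence.
    A crude substring check -- illustrative of citation validation, not the
    production validator."""
    numbers = re.findall(r"\d+(?:\.\d+)?%?", answer)
    return [n for n in numbers if n not in evidence]
```

An empty result means every numeric claim was found in the evidence; any returned tokens should trigger a confidence check before the answer is used for a high-stakes decision.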