feat(trace): generalize the RAG trace to all agents and export it to a file by pjmalandrino · Pull Request #42 · docling-project/docling-agent

pjmalandrino · 2026-06-26T09:11:12Z

Summary

Implements #37 ("Logs: export session logs") by generalizing the run_with_trace pattern from #39 to every agent and letting the orchestrator compose the sub-agent traces into a tree that exports to a single JSON file for debugging.

This is the "similar tracing capability across all agents, exportable to a file" follow-up suggested on #39 — built as a value object (no global state, no coupling to the logging subsystem).

Design doc: docs/design/37-export-session-trace.md.

The logical path from #39

#39 gave RAG three layers (RAGIteration → RAGResult → RAGTrace) + run_with_trace(). This PR lifts that shape to the base class:

- AgentStep / AgentTrace (new agent_trace.py): generic counterparts. AgentTrace holds ordered steps, nested children (sub-agent traces), timing, model id, and the produced document on output (kept in-memory, excluded from serialization; result_name is the persisted pointer). children uses SerializeAsAny so nested subclass fields (e.g. RAG) survive serialization.
- BaseDoclingAgent.run_with_trace(): concrete default that times run() and wraps the result, so every agent exposes a trace (timing + model + result) with no bespoke code. run() stays the source of truth for the document.
- RAGTrace becomes a subclass of AgentTrace: #39 is fully preserved (same fields, same construction, covariant return); a RAG run nests straight into the tree. RAG now builds the answer doc onto output and run() returns it.
- DoclingOrchestratorAgent.run_task_with_trace(): composes the tree by recording each dispatched sub-agent's trace. run_task() is unchanged and incurs no overhead when tracing is off.
- LoggingConfig.trace_path + CLI export; public re-exports AgentTrace / AgentStep.
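
For readers who have not seen #39, the "concrete default" described above can be pictured roughly as follows. This is a minimal sketch, not the PR's code: SketchTrace, SketchAgent, EchoAgent, the field names, and the use of a plain string in place of a document are all hypothetical, and stdlib dataclasses stand in for the real models.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class SketchTrace:
    """Hypothetical, simplified stand-in for AgentTrace."""
    agent: str
    model_id: str
    duration_s: float = 0.0
    result_name: Optional[str] = None
    output: Any = None  # in-memory result; the PR excludes it from export
    children: list["SketchTrace"] = field(default_factory=list)


class SketchAgent:
    """Hypothetical base class; only the run_with_trace() idea follows the PR text."""

    def __init__(self, model_id: str) -> None:
        self.model_id = model_id

    def run(self, task: str) -> str:
        raise NotImplementedError

    def run_with_trace(self, task: str) -> SketchTrace:
        # Default behaviour: time run() and wrap whatever it returns.
        start = time.perf_counter()
        result = self.run(task)
        elapsed = time.perf_counter() - start
        return SketchTrace(
            agent=type(self).__name__,
            model_id=self.model_id,
            duration_s=elapsed,
            result_name=f"{type(self).__name__}-result",  # assumed naming scheme
            output=result,
        )


class EchoAgent(SketchAgent):
    """Toy agent that needs no tracing code of its own."""

    def run(self, task: str) -> str:
        return task.upper()


agent = EchoAgent(model_id="demo-model")
trace = agent.run_with_trace("summarize")
print(trace.agent, trace.model_id, trace.output)
```

Because the default only wraps run(), an agent that wants finer-grained steps would override run_with_trace() while run() keeps producing the same result.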

tree = orchestrator.run_task_with_trace(task)
tree.save("run.json")          # orchestrator → children: [enricher, rag, ...]
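
To make the export behaviour concrete, here is a rough, stdlib-only sketch of a trace tree that drops output, keeps subclass fields on nested children, and writes one JSON file. The PR itself relies on SerializeAsAny for the subclass part; this sketch swaps in dataclasses with a hand-written to_dict(), and TraceNode, RagNode, iterations, and the result names are all hypothetical.

```python
import json
import tempfile
from dataclasses import dataclass, field, fields
from pathlib import Path
from typing import Any, Optional


@dataclass
class TraceNode:
    """Hypothetical stand-in for AgentTrace (dataclass, not the PR's model)."""
    agent: str
    result_name: Optional[str] = None
    output: Any = None  # kept in memory, deliberately left out of the export
    children: list["TraceNode"] = field(default_factory=list)

    def to_dict(self) -> dict:
        data = {}
        for f in fields(self):  # fields() also returns subclass fields
            if f.name == "output":
                continue
            value = getattr(self, f.name)
            if f.name == "children":
                value = [child.to_dict() for child in value]
            data[f.name] = value
        return data

    def save(self, path) -> None:
        Path(path).write_text(json.dumps(self.to_dict(), indent=2), encoding="utf-8")


@dataclass
class RagNode(TraceNode):
    """Hypothetical stand-in for RAGTrace: adds one subclass-only field."""
    iterations: int = 0


tree = TraceNode(
    agent="orchestrator",
    children=[
        TraceNode(agent="enricher", result_name="enriched-doc", output=object()),
        RagNode(agent="rag", result_name="answer-doc", output=object(), iterations=3),
    ],
)

out_path = Path(tempfile.mkdtemp()) / "run.json"
tree.save(out_path)
loaded = json.loads(out_path.read_text(encoding="utf-8"))
print(json.dumps(loaded, indent=2))
```

The output objects here are not JSON-serializable, so the file only gets written because they are skipped; result_name is what survives as the pointer to each result.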

Why this shape (vs the first revision)

The first revision captured raw LLM I/O via an ambient global recorder hooked into the logging functions. It worked for the CLI but introduced global mutable state and concurrency hazards for a server consumer (Docling Studio), and coupled tracing to logging. This revision is a pure value-object tree — same generic reach, none of those problems — and is literally "the same as #39, generalized". Raw prompt/response capture is deferred and becomes two optional fields on AgentStep if wanted later.
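
The concurrency argument can be illustrated with a small sketch. It is not taken from the PR: StepLog and handle_request are hypothetical names, and the thread pool only imitates a server handling several requests at once. Each call builds and returns its own trace, so nothing is shared between requests.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class StepLog:
    """Hypothetical per-call trace: created and returned by each call."""
    request_id: str
    steps: list[str] = field(default_factory=list)


def handle_request(request_id: str) -> StepLog:
    # The trace is local to this call, so no other request can write to it.
    log = StepLog(request_id=request_id)
    for i in range(3):
        log.steps.append(f"{request_id}:step-{i}")
    return log


# Simulate several requests being served concurrently.
with ThreadPoolExecutor(max_workers=4) as pool:
    logs = list(pool.map(handle_request, ["a", "b", "c", "d"]))

for log in logs:
    print(log.request_id, log.steps)
```

With a single ambient recorder, the same four requests would append to one shared list and their steps could interleave; returning the trace as a value avoids that without any locking.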

Scope / honesty

All agents get a useful trace now (timing, model, result, tree composition). Fine-grained steps for enricher/writer/editor/extractor are an additive follow-up (RAG already has the richest form).
No behaviour change to run() / run_task().

Tests

tests/test_agent_trace.py: base default, run/run_with_trace equivalence, RAG-as-AgentTrace, output excluded from serialization, save round-trip, nested RAGTrace subclass fields preserved, orchestrator tree composition, no-op when off. #39's test_rag_trace.py untouched and green; orchestrator fakes widened to the real run signature.

uv run pytest → 116 passed
uv run ruff check docling_agent tests → clean
uv run mypy docling_agent → clean (30 files)

Closes #37

github-actions · 2026-06-26T09:11:23Z

✅ DCO Check Passed

Thanks @pjmalandrino, all your commits are properly signed off. 🎉

mergify · 2026-06-26T09:11:47Z

Merge Protections

🔴 1 of 2 protections blocking · waiting on 👀 reviews

	Protection	Waiting on
🔴	Require two reviewer for test updates	👀 reviews
🟢	Enforce conventional commit	—

🔴 Require two reviewer for test updates

Waiting for

#approved-reviews-by >= 1

This rule is failing.

When test data is updated, we require two reviewers

#approved-reviews-by >= 1

🟢 Enforce conventional commit

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:
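
To check a title locally before pushing, the pattern above can be tried in Python. This is only a sketch: it assumes that Python's re.search is a close enough approximation of how the ~= operator applies the regex, and is_conventional is a hypothetical helper name.

```python
import re

# Pattern copied verbatim from the protection rule above.
CONVENTIONAL_TITLE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:"
)


def is_conventional(title: str) -> bool:
    """Return True if the title starts with a conventional-commit prefix."""
    return CONVENTIONAL_TITLE.search(title) is not None


pr_title = "feat(trace): generalize the RAG trace to all agents and export it to a file"
print(is_conventional(pr_title))
print(is_conventional("Export session trace"))
```

The scope in parentheses and the breaking-change marker are both optional, so "fix!: ..." passes while a title such as "feature: ..." does not.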

…a file

Generalize the run_with_trace pattern from docling-project#39 into a generic, typed AgentTrace that every agent produces, and let the orchestrator compose the traces of the sub-agents it dispatches to into a tree that can be exported to a single JSON file for debugging.

- New AgentStep / AgentTrace value objects (agent_trace.py). AgentTrace carries ordered steps, nested children (sub-agent traces), timing, model id and the produced document on `output` (excluded from serialization; result_name is the persisted pointer). SerializeAsAny on children preserves subclass fields.
- BaseDoclingAgent.run_with_trace(): concrete default so every agent exposes a trace (timing + model + result) with no bespoke code; agents override to add steps. run() stays the source of truth for the document.
- RAGTrace becomes a subclass of AgentTrace, so docling-project#39 is preserved (same fields, same construction, covariant return) and a RAG run nests into the tree. DoclingRAGAgent builds the answer doc onto output; run() returns it.
- DoclingOrchestratorAgent.run_task_with_trace() composes the tree by recording each dispatched sub-agent trace; run_task() is unchanged and incurs no overhead.
- LoggingConfig.trace_path + CLI export. Public re-exports: AgentTrace, AgentStep.

No global state, no logging coupling: the trace is a returned value object. Design doc: docs/design/37-export-session-trace.md.

Closes docling-project#37

Signed-off-by: Pier-Jean Malandrino <pierjean.malandrino@scub.net>

pjmalandrino marked this pull request as draft June 26, 2026 09:12

pjmalandrino force-pushed the dev/export-session-trace branch from 7cb2ec2 to 518f071 Compare June 26, 2026 10:36

pjmalandrino changed the title ~~feat(trace): export a complete agentic session trace to a file~~ feat(trace): generalize the RAG trace to all agents and export it to a file Jun 26, 2026

pjmalandrino wants to merge 1 commit into docling-project:main from pjmalandrino:dev/export-session-trace
