fix(agents): dedupe ReAct draft text per-round while preserving live streaming (#421) by MarioCadenas · Pull Request #446 · databricks/appkit

MarioCadenas · 2026-06-12T16:23:04Z

Problem

With OpenAI-compatible Claude on Databricks Model Serving, the model emits a full draft answer ALONGSIDE its tool calls on every ReAct round, so the answer appeared 3-4 times in the output. The earlier approach (branch fix/421-adapter-text-dup) buffered text and flushed it only on the terminal turn, but that made the final answer arrive all at once instead of streaming, which is unacceptable. This PR closes and supersedes that approach and references issue #421.

Root cause

The duplication was never in the adapter or the wire protocol. AgentEventTranslator already closes the current message item (response.output_item.done) when a tool_call/tool_result arrives and opens a NEW message item for the next round's text — so each round's text is already a distinct Responses-API output item with its own output_index/item_id. The duplication came from two consumers that concatenated deltas across those item boundaries.
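To make the failure mode concrete, here is a minimal sketch of the two consumption patterns. The `TextDelta` shape, the item ids, and the sample text are hypothetical and heavily simplified; they are not AppKit's real wire types.

```typescript
// Hypothetical, simplified text-delta event: only the fields this illustration needs.
type TextDelta = { outputIndex: number; itemId: string; delta: string };

const events: TextDelta[] = [
  { outputIndex: 0, itemId: "msg_0", delta: "Paris is " },
  { outputIndex: 0, itemId: "msg_0", delta: "sunny." },
  // (a tool_call / tool_result pair would occupy output_index 1 and 2)
  { outputIndex: 3, itemId: "msg_3", delta: "Paris is " },
  { outputIndex: 3, itemId: "msg_3", delta: "sunny." },
];

// Buggy pattern: ignore item boundaries and concatenate every delta.
function flatten(evts: TextDelta[]): string {
  return evts.map((e) => e.delta).join("");
}

// Boundary-respecting pattern: one text buffer per item id.
function groupByItem(evts: TextDelta[]): Record<string, string> {
  const byItem: Record<string, string> = {};
  for (const e of evts) {
    byItem[e.itemId] = (byItem[e.itemId] ?? "") + e.delta;
  }
  return byItem;
}

const flat = flatten(events); // "Paris is sunny.Paris is sunny."
const grouped = groupByItem(events); // { msg_0: "Paris is sunny.", msg_3: "Paris is sunny." }
```

The flattened string repeats the answer once per round, while the grouped form keeps each round's text as its own entry.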

The three-layer fix (respect the item boundaries)

- Adapter (databricks.ts) streams every round's text live (already matched main; no change needed).
- consumeAdapterStream keeps only the terminal round: a draft followed by a tool_call/tool_result is set aside as superseded, and the message open at end-of-stream (or the last draft if maxSteps is exhausted mid-tool-calling) is returned. This centrally dedupes thread history, the non-streaming JSON fullContent, runAgent, and sub-agents.
- useAgentChat reduces the wire events into an ordered, per-output-item list while streaming live. `content` now tracks only the LAST message item (the terminal answer), so flat-API consumers that only read `content` dedupe too.
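The "keep only the terminal round" rule can be sketched roughly as follows. This is not the actual consumeAdapterStream code: the `AdapterEvent` union and the `terminalText` helper are hypothetical, and the sketch is synchronous for brevity where the real stream is presumably asynchronous.

```typescript
// Hypothetical, simplified adapter events; the real AppKit event types may differ.
type AdapterEvent =
  | { type: "text_delta"; text: string }
  | { type: "tool_call"; name: string }
  | { type: "tool_result"; output: unknown };

// Returns the text of the terminal round only.
function terminalText(events: readonly AdapterEvent[]): string {
  let open = ""; // text of the message currently being streamed
  let superseded = ""; // most recent draft that was followed by tool activity
  for (const ev of events) {
    if (ev.type === "text_delta") {
      open += ev.text;
    } else if (open !== "") {
      // A tool_call/tool_result closes the current draft: set it aside.
      superseded = open;
      open = "";
    }
  }
  // Prefer the message still open at end-of-stream; otherwise fall back to
  // the last draft (the maxSteps-exhausted case).
  return open !== "" ? open : superseded;
}

const multiRound: AdapterEvent[] = [
  { type: "text_delta", text: "It is sunny." },
  { type: "tool_call", name: "get_weather" },
  { type: "tool_result", output: { sky: "clear" } },
  { type: "text_delta", text: "It is sunny." },
];
// Same stream cut off after the tool result, as if maxSteps ran out.
const exhausted: AdapterEvent[] = multiRound.slice(0, 3);

const finalAnswer = terminalText(multiRound); // "It is sunny." (once, not twice)
const fallback = terminalText(exhausted); // "It is sunny." (the last draft)
```

Doing this in one central place is what lets thread history, the non-streaming response, runAgent, and sub-agents all see the deduplicated text without separate fixes.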

New public hook API

export type AgentTurnItem =
  | { kind: "message"; id: string; text: string; status: "in_progress" | "completed" }
  | { kind: "tool_call"; id: string; callId: string; name: string; args: unknown; status: "in_progress" | "completed" }
  | { kind: "tool_result"; id: string; callId: string; output: unknown; error?: string };

UseAgentChatResult gains items: AgentTurnItem[] (ordered by output_index). The dev-playground agent route uses it to render a collapsible "Steps" disclosure (intermediate drafts + tool-call/result chips) above the prominent, live-streaming answer.
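As a rough illustration of how a consumer might use `items`, the sketch below splits an item list into intermediate steps and the terminal answer, which is the kind of split the "Steps" disclosure needs. The `splitForDisplay` helper and the sample data are hypothetical and not part of this PR; only the `AgentTurnItem` type is copied from the description above.

```typescript
type AgentTurnItem =
  | { kind: "message"; id: string; text: string; status: "in_progress" | "completed" }
  | { kind: "tool_call"; id: string; callId: string; name: string; args: unknown; status: "in_progress" | "completed" }
  | { kind: "tool_result"; id: string; callId: string; output: unknown; error?: string };

type MessageItem = Extract<AgentTurnItem, { kind: "message" }>;

// Hypothetical helper: the last message item is the answer, everything else is a step.
function splitForDisplay(items: AgentTurnItem[]): {
  steps: AgentTurnItem[];
  answer: MessageItem | undefined;
} {
  let answerIndex = -1;
  items.forEach((item, i) => {
    if (item.kind === "message") answerIndex = i;
  });
  const candidate = answerIndex >= 0 ? items[answerIndex] : undefined;
  const answer =
    candidate !== undefined && candidate.kind === "message" ? candidate : undefined;
  const steps = items.filter((_, i) => i !== answerIndex);
  return { steps, answer };
}

// Sample turn: one draft, one tool round, then the streaming answer.
const sampleItems: AgentTurnItem[] = [
  { kind: "message", id: "m1", text: "Let me check.", status: "completed" },
  { kind: "tool_call", id: "t1", callId: "c1", name: "get_weather", args: { city: "Paris" }, status: "completed" },
  { kind: "tool_result", id: "r1", callId: "c1", output: { sky: "clear" } },
  { kind: "message", id: "m2", text: "It is sunny.", status: "in_progress" },
];

const view = splitForDisplay(sampleItems);
// view.steps holds m1, t1, r1; view.answer is m2, still streaming.
```

Because `kind` is a discriminant, a renderer can switch on it and get the right fields for each variant without casts.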

Tests

consume-adapter-stream.test.ts: multi-round draft+tool_call returns the terminal text once; maxSteps-exhaustion returns the last draft; LangChain single message still replaces; mixed.
use-agent-chat.test.ts: two message items separated by a function_call → items is [message, tool_call, tool_result, message], content === the LAST message text (not concatenated), deltas stream into the right item live.
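The behavior the use-agent-chat test pins down can be approximated with a small pure reducer. The `WireEvent` union, `applyEvent`, and `lastMessageText` below are hypothetical stand-ins, not the hook's real internals, and the sketch assumes events arrive in output_index order.

```typescript
type AgentTurnItem =
  | { kind: "message"; id: string; text: string; status: "in_progress" | "completed" }
  | { kind: "tool_call"; id: string; callId: string; name: string; args: unknown; status: "in_progress" | "completed" }
  | { kind: "tool_result"; id: string; callId: string; output: unknown; error?: string };

// Hypothetical, simplified wire events (assumed to arrive in output_index order).
type WireEvent =
  | { type: "item_added"; item: AgentTurnItem }
  | { type: "text_delta"; itemId: string; delta: string }
  | { type: "item_done"; itemId: string };

// Applies one event without mutating the previous list.
function applyEvent(items: AgentTurnItem[], ev: WireEvent): AgentTurnItem[] {
  switch (ev.type) {
    case "item_added":
      return [...items, ev.item];
    case "text_delta":
      // Route the delta to the message it belongs to, never across items.
      return items.map((it) =>
        it.kind === "message" && it.id === ev.itemId
          ? { ...it, text: it.text + ev.delta }
          : it,
      );
    case "item_done":
      return items.map((it) =>
        it.kind !== "tool_result" && it.id === ev.itemId
          ? { ...it, status: "completed" as const }
          : it,
      );
  }
}

// `content` equivalent: the text of the LAST message item only.
function lastMessageText(items: AgentTurnItem[]): string {
  let text = "";
  for (const it of items) {
    if (it.kind === "message") text = it.text;
  }
  return text;
}

const wire: WireEvent[] = [
  { type: "item_added", item: { kind: "message", id: "m1", text: "", status: "in_progress" } },
  { type: "text_delta", itemId: "m1", delta: "Draft" },
  { type: "item_done", itemId: "m1" },
  { type: "item_added", item: { kind: "tool_call", id: "t1", callId: "c1", name: "get_weather", args: {}, status: "completed" } },
  { type: "item_added", item: { kind: "tool_result", id: "r1", callId: "c1", output: "clear" } },
  { type: "item_added", item: { kind: "message", id: "m2", text: "", status: "in_progress" } },
  { type: "text_delta", itemId: "m2", delta: "Final " },
  { type: "text_delta", itemId: "m2", delta: "answer" },
];

const finalItems = wire.reduce(applyEvent, [] as AgentTurnItem[]);
const content = lastMessageText(finalItems); // "Final answer", not "DraftFinal answer"
```

Keeping the reducer pure makes the streaming case easy to test: each intermediate list is a valid snapshot of the turn so far.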

All appkit (2147) and appkit-ui (319) tests pass; build, typecheck, biome, and knip clean.

This pull request and its description were written by Isaac.

…streaming (#421)

With OpenAI-compatible Claude on Databricks Model Serving, the model emits a full draft answer ALONGSIDE its tool calls on every ReAct round, so the answer appeared 3-4x in the output. An earlier approach buffered text and flushed it only on the terminal turn, but that made the final answer arrive all-at-once instead of streaming.

Root cause: the duplication was never in the adapter or wire protocol. The AgentEventTranslator already closes the current message item and opens a new one when a tool_call/tool_result arrives, so each round's text is a distinct Responses-API output item. Two CONSUMERS flattened those items by concatenating deltas across item boundaries.

Three-layer fix that respects the existing item boundaries:

- The Databricks adapter streams every round's text live again (this already matched main; no change needed).
- consumeAdapterStream keeps only the terminal round: a draft followed by a tool_call is set aside as superseded, and the open message at end-of-stream (or the last draft if maxSteps exhausted mid-tool-calling) is returned. This centrally dedupes thread history, the non-streaming JSON fullContent, runAgent, and sub-agents.
- use-agent-chat reduces the wire events into an ordered, per-output-item AgentTurnItem list while streaming live; `content` now tracks only the LAST message item (the terminal answer), so flat-API consumers dedupe too.

New public hook API: the AgentTurnItem discriminated union (message / tool_call / tool_result) plus `items` on UseAgentChatResult. The dev-playground agent route uses it to render a collapsible "Steps" disclosure (intermediate drafts + tool-call/result chips) above the prominent, live-streaming answer.

Supersedes the buffer-until-terminal approach in branch fix/421-adapter-text-dup.

Co-authored-by: Isaac
Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

MarioCadenas requested a review from a team as a code owner June 12, 2026 16:23

MarioCadenas requested a review from ditadi June 12, 2026 16:23

MarioCadenas mentioned this pull request Jun 12, 2026

fix(agents): stop DatabricksAdapter duplicating answer text across tool calls (#421) #436

Closed

MarioCadenas marked this pull request as draft June 12, 2026 16:30

MarioCadenas wants to merge 1 commit into main from fix/421-agent-stream-items
