LCORE-1613: RAG tool calls loop indefinitely by arin-deloatch · Pull Request #1998 · lightspeed-core/lightspeed-stack

arin-deloatch · 2026-06-25T18:13:48Z

Description

When using small models with limited context windows (e.g., Llama 3.1 8B on vLLM), RAG tool calls can loop indefinitely. The model repeatedly searches for a specific piece of text, each iteration appending tokens to the context until the model's context window is exceeded, resulting in a 400 error:

Error code: 400 - This model's maximum context length is 113920 tokens. However, your request has 116873 input tokens.

The max_infer_iters and max_tool_calls fields already existed as optional per-request parameters on the Responses API, but both defaulted to None (no limit). There was no server-side mechanism for operators to set safety defaults, so unless a client explicitly set these values, nothing prevented the runaway loop.
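To make the failure mode concrete, here is a hypothetical sketch of a tool-calling loop guarded by the two caps. It is not the Llama Stack implementation. `run_agent_turn` and the `step` callback are illustrative names, and this sketch simply returns a stop reason, whereas the real server may end the turn differently. With both caps set to None (the old default), a model that keeps asking for another search never exits. With the new defaults, the turn stops after 10 inference iterations.

```python
from typing import Callable, Optional, Tuple


def run_agent_turn(
    step: Callable[[int], Optional[str]],
    max_infer_iters: Optional[int] = 10,
    max_tool_calls: Optional[int] = 30,
) -> Tuple[str, int, int]:
    """Hypothetical agent loop: returns (stop_reason, iterations, tool_calls).

    `step(i)` stands in for one model inference: it returns a tool name to call,
    or None when the model produces a final answer. A cap of None means "no limit".
    """
    iters = 0
    tool_calls = 0
    while True:
        # Cap on inference rounds within one turn.
        if max_infer_iters is not None and iters >= max_infer_iters:
            return "max_infer_iters", iters, tool_calls
        iters += 1
        tool = step(iters)
        if tool is None:
            return "answer", iters, tool_calls
        # Cap on the total number of tool invocations.
        if max_tool_calls is not None and tool_calls >= max_tool_calls:
            return "max_tool_calls", iters, tool_calls
        # In a real loop the tool output is appended to the context here,
        # which is what eventually overflows a small context window.
        tool_calls += 1


def always_search(i: int) -> Optional[str]:
    """A model that never stops asking for another search."""
    return "file_search"
```

For example, `run_agent_turn(always_search)` returns `("max_infer_iters", 10, 10)`, while a model that answers on its second step stops with `"answer"` long before either cap applies.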

Changes

Added max_infer_iters (default 10) and max_tool_calls (default 30) as configurable fields on InferenceConfiguration in src/models/config.py. Operators can override these in their YAML config or set them to None to disable the limit. Per-request values from clients always take precedence.
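The validation rules covered by the new tests (defaults of 10 and 30, positive integers accepted, zero or negative rejected, None accepted) can be sketched with the standard library. The real fields live on the Pydantic model `InferenceConfiguration`; presumably they are typed along the lines of an optional positive int, but that exact typing is an assumption. `InferenceLimits` below is a hypothetical stand-in. In the YAML config, `null` is what loads as Python None and disables a limit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InferenceLimits:
    """Hypothetical stdlib stand-in for the two new InferenceConfiguration fields."""

    max_infer_iters: Optional[int] = 10
    max_tool_calls: Optional[int] = 30

    def __post_init__(self) -> None:
        for name in ("max_infer_iters", "max_tool_calls"):
            value = getattr(self, name)
            if value is None:
                # None disables the limit entirely.
                continue
            # Only positive integers are accepted (this sketch also rejects booleans).
            if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
                raise ValueError(f"{name} must be a positive integer or None, got {value!r}")
```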

Applied config defaults in prepare_responses_params() for the /v1/query, /v1/streaming_query, and A2A endpoints.

Applied config defaults in responses_endpoint_handler() for the /v1/responses endpoint, only when the client omits the values.
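The precedence rule ("only when the client omits the values") hinges on whether a field was sent, not on whether it is None. The review snippet below checks `original_request.model_fields_set` for this. The following hypothetical helper mimics that check with a plain dict of the keys the client actually sent; `apply_limit_defaults` and `LIMIT_FIELDS` are illustrative names, not project code.

```python
from typing import Any, Dict, Optional

LIMIT_FIELDS = ("max_infer_iters", "max_tool_calls")


def apply_limit_defaults(
    request_fields: Dict[str, Any], server_defaults: Dict[str, Optional[int]]
) -> Dict[str, Any]:
    """Fill limit fields the client omitted; `request_fields` plays the role of
    the explicitly set fields (Pydantic's model_fields_set)."""
    params = dict(request_fields)  # never mutate the caller's request
    for name in LIMIT_FIELDS:
        if name not in request_fields:
            # Omitted by the client: fall back to the server-side default.
            params[name] = server_defaults.get(name)
    return params
```

One consequence of a membership check like this is that a client that explicitly sends `null` counts as having set the field, so the limit is disabled for that request rather than replaced by the server default.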

Added 10 unit tests for the new config fields (default values, positive int acceptance, zero/negative rejection, None acceptance).

Configuration example

  inference:
    default_model: "meta-llama/Llama-3.1-8B-Instruct"
    default_provider: "vllm"
    max_infer_iters: 10
    max_tool_calls: 30

Type of change

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

Assisted-by: Claude Code (Claude Opus 4.6)

Related Tickets & Documents

Closes LCORE-1613

Checklist before requesting a review

I have performed a self-review of my code.
PR has passed all pre-merge test jobs.
If it is a core feature, I have added thorough tests.

Testing

Run config tests: uv run pytest tests/unit/models/config/test_inference_configuration.py -v
Run responses endpoint tests: uv run pytest tests/unit/app/endpoints/test_responses.py -v
Run responses utils tests: uv run pytest tests/unit/utils/test_responses.py -v
All 286 tests pass across the 3 affected test files.
uv run make format and uv run make verify pass; the only reported issues are pre-existing mypy errors in test_container_lifecycle.py.

Test Regressions

8 existing tests in tests/unit/utils/test_responses.py set mock_config.inference = None, which caused AttributeError when the new code accessed configuration.inference.max_infer_iters. Updated these mocks to use a real InferenceConfiguration() instance.
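The regression is easy to reproduce with `unittest.mock`: a mock whose `inference` attribute is None fails as soon as the new code dereferences `configuration.inference.max_infer_iters`. The sketch below uses a minimal dataclass stand-in for `InferenceConfiguration` (the real class lives in src/models/config.py), and `read_limits` is a hypothetical function standing in for the code path that reads the limits.

```python
from dataclasses import dataclass
from typing import Optional, Tuple
from unittest.mock import MagicMock


@dataclass
class InferenceConfiguration:
    """Minimal stand-in for the real Pydantic model."""

    max_infer_iters: Optional[int] = 10
    max_tool_calls: Optional[int] = 30


def read_limits(configuration) -> Tuple[Optional[int], Optional[int]]:
    # Mirrors the new code path: it dereferences configuration.inference.
    return (
        configuration.inference.max_infer_iters,
        configuration.inference.max_tool_calls,
    )


broken = MagicMock()
broken.inference = None  # old test setup: raises AttributeError in read_limits

fixed = MagicMock()
fixed.inference = InferenceConfiguration()  # updated test setup: real defaults
```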

Summary by CodeRabbit

New Features
- Added configurable limits for inference loops and tool usage, with sensible default values when not set.
Bug Fixes
- Requests now consistently apply server-side defaults before processing, improving reliability when limit values are omitted.
Tests
- Expanded automated test coverage for the new limit settings, including default behavior and validation rules.

coderabbitai · 2026-06-25T18:14:03Z

Warning

Review limit reached

@arin-deloatch, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 35 minutes and 34 seconds. Learn how PR review limits work.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 3ad605b3-fa87-418b-8735-073f6776d9f6

📥 Commits

Reviewing files that changed from the base of the PR and between c16c5d3 and de9d863.

📒 Files selected for processing (6)

docs/openapi.json
src/app/endpoints/responses.py
src/models/config.py
src/utils/responses.py
tests/unit/models/config/test_inference_configuration.py
tests/unit/utils/test_responses.py

Walkthrough

The PR adds optional inference limit fields to InferenceConfiguration, passes them into Responses request parameters, and applies server-side defaults in the responses endpoint when requests omit them.

Changes

Responses inference limits

Layer	File(s)	Summary
InferenceConfiguration limits	`src/models/config.py`, `tests/unit/models/config/test_inference_configuration.py`	`InferenceConfiguration` gains optional `max_infer_iters` and `max_tool_calls` fields, and tests cover defaults, positive values, zero/negative rejection, and `None` handling.
Responses param defaults	`src/utils/responses.py`, `tests/unit/utils/test_responses.py`	`prepare_responses_params` adds `max_infer_iters` and `max_tool_calls` to `ResponsesApiParams`, and tests initialize `mock_config.inference` for the updated request-parameter paths.
Endpoint request defaults	`src/app/endpoints/responses.py`	The responses endpoint fills missing `max_infer_iters` and `max_tool_calls` on `api_params` from server configuration after request validation.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

max-svistunov
tisnik

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title is specific and related to the core fix: preventing indefinite RAG tool-call loops.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


tisnik

LGTM

asimurka · 2026-06-26T11:37:09Z

+        api_params.max_infer_iters = configuration.inference.max_infer_iters
+    if "max_tool_calls" not in original_request.model_fields_set:
+        api_params.max_tool_calls = configuration.inference.max_tool_calls
+


It's just a nit, but could you please follow the same pattern as the other parameter overrides? Namely, override the attributes of updated_request (if not explicitly set) before the model_validate call.
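The ordering the reviewer asks for (fill defaults on the request first, then validate once into the API params) can be sketched with frozen dataclasses. `Request`, `ApiParams`, and `build_params` are hypothetical names, `fields_set` stands in for Pydantic's `model_fields_set`, and the final constructor call stands in for the `model_validate` step. This is not the merged code.

```python
from dataclasses import dataclass, replace
from typing import Dict, FrozenSet, Optional


@dataclass(frozen=True)
class Request:
    """Hypothetical request model; fields_set lists the fields the client sent."""

    input: str
    max_infer_iters: Optional[int] = None
    max_tool_calls: Optional[int] = None
    fields_set: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class ApiParams:
    input: str
    max_infer_iters: Optional[int]
    max_tool_calls: Optional[int]


def build_params(original: Request, defaults: Dict[str, Optional[int]]) -> ApiParams:
    updated = original
    # 1. Override omitted fields on a copy of the request, next to the other overrides.
    for name in ("max_infer_iters", "max_tool_calls"):
        if name not in original.fields_set:
            updated = replace(updated, **{name: defaults[name]})
    # 2. Convert once (standing in for model_validate); params are never patched afterwards.
    return ApiParams(updated.input, updated.max_infer_iters, updated.max_tool_calls)
```

Keeping every override ahead of validation means the validated params object is never modified after the fact, and all defaulting logic sits in one place.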


tisnik requested a review from asimurka June 26, 2026 11:29

tisnik approved these changes Jun 26, 2026


asimurka requested changes Jun 26, 2026


arin-deloatch added 4 commits June 26, 2026 10:39

LCORE-1613: Add max_infer_iters and max_tool_calls to InferenceConfiguration

96771a8

LCORE-1613: Apply inference defaults in query and responses endpoints

2842b6b

LCORE-1613: Fix test mocks broken by inference defaults

59c9c0a

LCORE-1613: generate OpenAPI spec; address code pattern nit

de9d863

arin-deloatch force-pushed the bug/LCORE-1613 branch from c16c5d3 to de9d863 June 26, 2026 17:55

arin-deloatch wants to merge 4 commits into lightspeed-core:main from arin-deloatch:bug/LCORE-1613

3 participants
