LCORE-1613: RAG tool calls loop indefinitely #1998

Open
arin-deloatch wants to merge 4 commits into lightspeed-core:main from arin-deloatch:bug/LCORE-1613

Conversation

arin-deloatch commented Jun 25, 2026

Description

When using small models with limited context windows (e.g., Llama 3.1 8B on vLLM), RAG tool calls can loop indefinitely. The model repeatedly searches for a specific piece of text, each iteration appending tokens to the context until the model's context window is exceeded, resulting in a 400 error:

Error code: 400 - This model's maximum context length is 113920 tokens. However, your request has 116873 input tokens.

The max_infer_iters and max_tool_calls fields already existed as optional per-request parameters on the Responses API, but both defaulted to None (no limit). There was no server-side mechanism for operators to set safety defaults, so unless a client explicitly set these values, nothing prevented the runaway loop.
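To make the failure mode concrete, here is a toy simulation of the runaway loop. It is only a sketch: `run_tool_loop`, the 2,000-token prompt, and the 4,000-token cost per search are made-up values for illustration, and the real inference loop lives in the serving stack, not in this code. Only the context window size comes from the error message above.

```python
from typing import Optional

CONTEXT_WINDOW = 113_920      # token limit quoted in the error message
TOKENS_PER_TOOL_CALL = 4_000  # hypothetical cost of one RAG search round trip


def run_tool_loop(prompt_tokens: int, max_infer_iters: Optional[int]) -> str:
    """Simulate a model that requests another RAG search on every turn."""
    context_tokens = prompt_tokens
    iterations = 0
    while True:
        # A configured limit ends the loop before the context overflows.
        if max_infer_iters is not None and iterations >= max_infer_iters:
            return f"stopped after {iterations} iterations at {context_tokens} tokens"
        # Without a limit, the only thing that stops the loop is the 400 error.
        if context_tokens > CONTEXT_WINDOW:
            return f"400 error after {iterations} iterations at {context_tokens} tokens"
        context_tokens += TOKENS_PER_TOOL_CALL
        iterations += 1


print(run_tool_loop(2_000, None))  # limit disabled: runs until the window is exceeded
print(run_tool_loop(2_000, 10))    # limit of 10: stops well inside the window
```

With these assumed numbers, the unlimited run fails at 114,000 tokens after 28 iterations, while a limit of 10 stops at 42,000 tokens.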

Changes

Added max_infer_iters (default 10) and max_tool_calls (default 30) as configurable fields on InferenceConfiguration in src/models/config.py. Operators can override these in their YAML config or set them to None to disable the limit. Per-request values from clients always take precedence.

Applied config defaults in prepare_responses_params() for the /v1/query, /v1/streaming_query, and A2A endpoints.

Applied config defaults in responses_endpoint_handler() for the /v1/responses endpoint, only when the client omits the values.

Added 10 unit tests for the new config fields (default values, positive int acceptance, zero/negative rejection, None acceptance).
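The validation and precedence rules described above can be sketched roughly as follows. This is a stdlib-only stand-in, not the code in this PR: `InferenceLimits` and `resolve_limits` are hypothetical names, and a plain dict stands in for the request object. The "key absent from the dict" check mirrors the "client omitted the value" condition, so an explicit None from the client is kept.

```python
from dataclasses import dataclass
from typing import Optional

LIMIT_FIELDS = ("max_infer_iters", "max_tool_calls")


@dataclass(frozen=True)
class InferenceLimits:
    """Hypothetical stand-in for the two new InferenceConfiguration fields."""

    max_infer_iters: Optional[int] = 10
    max_tool_calls: Optional[int] = 30

    def __post_init__(self) -> None:
        # Accept positive integers or None (limit disabled); reject zero/negatives.
        for name in LIMIT_FIELDS:
            value = getattr(self, name)
            if value is not None and value <= 0:
                raise ValueError(f"{name} must be a positive integer or None")


def resolve_limits(request: dict, limits: InferenceLimits) -> dict:
    """Fill in server defaults only for the fields the client left out."""
    resolved = dict(request)
    for name in LIMIT_FIELDS:
        if name not in request:  # omitted by the client, so use the config value
            resolved[name] = getattr(limits, name)
    return resolved


print(resolve_limits({}, InferenceLimits()))
print(resolve_limits({"max_tool_calls": 5}, InferenceLimits()))
```

The first call picks up both server defaults; the second keeps the client's `max_tool_calls` and fills in only `max_infer_iters`.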

Configuration example

  inference:
    default_model: "meta-llama/Llama-3.1-8B-Instruct"
    default_provider: "vllm"
    max_infer_iters: 10
    max_tool_calls: 30

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement
  • Benchmarks improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: Claude Code (Claude Opus 4.6)

Related Tickets & Documents

  • Closes LCORE-1613

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  1. Run config tests: uv run pytest tests/unit/models/config/test_inference_configuration.py -v
  2. Run responses endpoint tests: uv run pytest tests/unit/app/endpoints/test_responses.py -v
  3. Run responses utils tests: uv run pytest tests/unit/utils/test_responses.py -v
  4. All 286 tests pass across the 3 affected test files.
  5. uv run make format and uv run make verify pass (only pre-existing mypy errors in test_container_lifecycle.py).

Test Regressions

8 existing tests in tests/unit/utils/test_responses.py set mock_config.inference = None, which caused AttributeError when the new code accessed configuration.inference.max_infer_iters. Updated these mocks to use a real InferenceConfiguration() instance.
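A minimal reproduction of that regression, for reviewers who want to see why the mocks had to change. `read_limit` is a hypothetical helper standing in for the new attribute access, and `SimpleNamespace` stands in for the real InferenceConfiguration() instance so the snippet runs without the project installed.

```python
from types import SimpleNamespace
from unittest.mock import MagicMock


def read_limit(configuration):
    """Hypothetical stand-in for the new code path that reads the config field."""
    return configuration.inference.max_infer_iters


# Old mock setup: inference is None, so the attribute lookup fails.
broken_config = MagicMock()
broken_config.inference = None
try:
    read_limit(broken_config)
    outcome = "no error"
except AttributeError:
    outcome = "AttributeError"
print(outcome)

# Updated setup: inference is an object that carries the limit fields.
fixed_config = MagicMock()
fixed_config.inference = SimpleNamespace(max_infer_iters=10, max_tool_calls=30)
print(read_limit(fixed_config))
```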

Summary by CodeRabbit

  • New Features

    • Added configurable limits for inference loops and tool usage, with sensible default values when not set.
  • Bug Fixes

    • Requests now consistently apply server-side defaults before processing, improving reliability when limit values are omitted.
  • Tests

    • Expanded automated test coverage for the new limit settings, including default behavior and validation rules.

coderabbitai Bot commented Jun 25, 2026
Contributor

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 3ad605b3-fa87-418b-8735-073f6776d9f6

📥 Commits

Reviewing files that changed from the base of the PR and between c16c5d3 and de9d863.

📒 Files selected for processing (6)
  • docs/openapi.json
  • src/app/endpoints/responses.py
  • src/models/config.py
  • src/utils/responses.py
  • tests/unit/models/config/test_inference_configuration.py
  • tests/unit/utils/test_responses.py

Walkthrough

The PR adds optional inference limit fields to InferenceConfiguration, passes them into Responses request parameters, and applies server-side defaults in the responses endpoint when requests omit them.

Changes

Responses inference limits

Layer: InferenceConfiguration limits
File(s): src/models/config.py, tests/unit/models/config/test_inference_configuration.py
Summary: InferenceConfiguration gains optional max_infer_iters and max_tool_calls fields, and tests cover defaults, positive values, zero/negative rejection, and None handling.

Layer: Responses param defaults
File(s): src/utils/responses.py, tests/unit/utils/test_responses.py
Summary: prepare_responses_params adds max_infer_iters and max_tool_calls to ResponsesApiParams, and tests initialize mock_config.inference for the updated request-parameter paths.

Layer: Endpoint request defaults
File(s): src/app/endpoints/responses.py
Summary: The responses endpoint fills missing max_infer_iters and max_tool_calls on api_params from server configuration after request validation.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • max-svistunov
  • tisnik
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title is specific and related to the core fix: preventing indefinite RAG tool-call loops.
Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request.

tisnik requested a review from asimurka June 26, 2026 11:29

tisnik left a comment

Contributor

LGTM

    api_params.max_infer_iters = configuration.inference.max_infer_iters
if "max_tool_calls" not in original_request.model_fields_set:
    api_params.max_tool_calls = configuration.inference.max_tool_calls

Contributor

It's just a nit, but could you please follow the same pattern as the other parameter overrides? Namely, override the attributes of updated_request (if not explicitly set) above the model_validate call.

3 participants