LCORE-1613: RAG tool calls loop indefinitely #1998
    api_params.max_infer_iters = configuration.inference.max_infer_iters
if "max_tool_calls" not in original_request.model_fields_set:
    api_params.max_tool_calls = configuration.inference.max_tool_calls
It's just a nit, but could you please follow the same pattern as the other parameter overrides? Namely, override the attributes of updated_request (if not explicitly set) above the model_validate call.
c16c5d3 to de9d863
Description
When using small models with limited context windows (e.g., Llama 3.1 8B on vLLM), RAG tool calls can loop indefinitely. The model repeatedly searches for a specific piece of text, each iteration appending tokens to the context until the model's context window is exceeded, resulting in a 400 error:
Error code: 400 - This model's maximum context length is 113920 tokens. However, your request has 116873 input tokens.

The max_infer_iters and max_tool_calls fields already existed as optional per-request parameters on the Responses API, but both defaulted to None (no limit). There was no server-side mechanism for operators to set safety defaults, so unless a client explicitly set these values, nothing prevented the runaway loop.

Changes
Added max_infer_iters (default 10) and max_tool_calls (default 30) as configurable fields on InferenceConfiguration in src/models/config.py. Operators can override these in their YAML config or set them to None to disable the limit. Per-request values from clients always take precedence.
Applied config defaults in prepare_responses_params() for the /v1/query, /v1/streaming_query, and A2A endpoints.
Applied config defaults in responses_endpoint_handler() for the /v1/responses endpoint, only when the client omits the values.
Added 10 unit tests for the new config fields (default values, positive int acceptance, zero/negative rejection, None acceptance).
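The precedence rule described in the changes above (a client value wins, otherwise the operator default applies, and None disables a limit) can be sketched in isolation. This is a minimal illustration, not the PR's code: InferenceLimits and apply_limit_defaults are hypothetical names, a plain dataclass stands in for the real InferenceConfiguration, and a dictionary's keys stand in for the model_fields_set check that appears in the review diff.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional

LIMIT_FIELDS = ("max_infer_iters", "max_tool_calls")


@dataclass(frozen=True)
class InferenceLimits:
    """Hypothetical stand-in for the limit fields on InferenceConfiguration."""

    max_infer_iters: Optional[int] = 10  # default stated in the PR description
    max_tool_calls: Optional[int] = 30   # default stated in the PR description

    def __post_init__(self) -> None:
        # None disables a limit; any other value must be a positive integer.
        for name in LIMIT_FIELDS:
            value = getattr(self, name)
            if value is None:
                continue
            if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
                raise ValueError(f"{name} must be a positive integer or None")


def apply_limit_defaults(
    request_fields: Mapping[str, Any], limits: InferenceLimits
) -> dict[str, Any]:
    """Return request parameters with server-side defaults filled in.

    A key present in request_fields counts as explicitly set by the client,
    even when its value is None (assumption: this mirrors a
    model_fields_set style membership check).
    """
    params = dict(request_fields)
    for name in LIMIT_FIELDS:
        if name not in params:
            params[name] = getattr(limits, name)
    return params
```

With the defaults above, a request that sets only max_tool_calls to 5 keeps that value and picks up max_infer_iters of 10 from the configuration, while a request that explicitly sends None for a field keeps the limit disabled.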
Configuration example
Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
uv run pytest tests/unit/models/config/test_inference_configuration.py -v
uv run pytest tests/unit/app/endpoints/test_responses.py -v
uv run pytest tests/unit/utils/test_responses.py -v
uv run make format and uv run make verify pass (only pre-existing mypy errors in test_container_lifecycle.py).

Test Regressions
8 existing tests in tests/unit/utils/test_responses.py set mock_config.inference = None, which caused AttributeError when the new code accessed configuration.inference.max_infer_iters. Updated these mocks to use a real InferenceConfiguration() instance.
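The test regression described in this PR can be reproduced in a few lines. The sketch below is illustrative only: the InferenceConfiguration dataclass is a hypothetical minimal stand-in for the real class in src/models/config.py, and read_limits is an assumed helper that performs the same attribute access the description mentions.

```python
from dataclasses import dataclass
from typing import Optional, Tuple
from unittest.mock import MagicMock


@dataclass
class InferenceConfiguration:
    """Hypothetical minimal stand-in for the real configuration class."""

    max_infer_iters: Optional[int] = 10
    max_tool_calls: Optional[int] = 30


def read_limits(configuration) -> Tuple[Optional[int], Optional[int]]:
    # Same style of access as configuration.inference.max_infer_iters.
    return (
        configuration.inference.max_infer_iters,
        configuration.inference.max_tool_calls,
    )


def fails_with_none_inference() -> bool:
    """Old test setup: inference is None, so the attribute lookup fails."""
    mock_config = MagicMock()
    mock_config.inference = None
    try:
        read_limits(mock_config)
    except AttributeError:
        return True
    return False


def works_with_real_instance() -> Tuple[Optional[int], Optional[int]]:
    """Updated test setup: a real instance supplies the default limits."""
    mock_config = MagicMock()
    mock_config.inference = InferenceConfiguration()
    return read_limits(mock_config)
```

Using a real instance rather than a nested mock keeps the tests exercising actual default values, so a later change to the defaults shows up in the test results.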