app_queue.c: Index member device states to avoid scanning on every event by PujaGediya · Pull Request #1973 · asterisk/asterisk

PujaGediya · 2026-06-04T07:55:27Z

device_state_cb() iterated every queue and every member for each device
state message on the devicestate:all topic. Reloading a queue with
thousands of members floods that single subscription's taskprocessor
(stasis/m:devicestate:all) with the per-member pause/avail hints
app_queue publishes, tripping the 500 high water mark and raising the
global taskprocessor congestion alert.

Maintain a reference-counted index of the device-state identifiers that
queue members actually watch (via state_interface) and consult it at the
top of device_state_cb(). Device states no member watches are dropped in
O(1) instead of triggering an O(queues * members) scan, and the
Queue:..._avail hints the callback republishes no longer re-enter it.
Behavior for watched devices is unchanged.

Also fix a race in rt_handle_member_record(): when a realtime reload
changes a member's state_interface, start watching the new device before
storing it on the member (and before unwatching the old). Previously the
member was pointed at the new interface first and only then added to the
watcher set, leaving a brief window where m->state_interface referred to
a device not yet watched. Watching before publishing closes the window:
any device_state_cb() that passes the watch check then serializes on the
queue lock and observes the committed state_interface.

Resolves: #1972

UserNote: app_queue now handles device-state changes efficiently when
reloading queues with large member counts, avoiding a flood of the
stasis/m:devicestate:all taskprocessor past its high water mark.

Co-authored-by: Thomas 1258170+ThomasSevestre@users.noreply.github.com

sangoma-oss-cla · 2026-06-04T07:55:42Z

All committers have signed the CLA.

github-actions

Pull Request Checklist Complete

github-actions · 2026-06-04T08:20:11Z

Workflow Check failed
master-pjs5-check-506: FAILED TEST: channels/pjsip/subscriptions/rls/lists_of_lists/nominal/mwi/batched

jcolp · 2026-06-04T08:31:44Z

Additionally, if AI was used in the creation of patches or issues its usage must be disclosed.

device_state_cb() iterated every queue and every member for each device state message on the devicestate:all topic. Reloading a queue with thousands of members floods that single subscription's taskprocessor (stasis/m:devicestate:all) with the per-member pause/avail hints app_queue publishes, tripping the 500 high water mark and raising the global taskprocessor congestion alert. Maintain a reference-counted index of the device-state identifiers that queue members actually watch (via state_interface) and consult it at the top of device_state_cb(). Device states no member watches are dropped in O(1) instead of triggering an O(queues * members) scan, and the Queue:..._avail hints the callback republishes no longer re-enter it. Behavior for watched devices is unchanged. Also fix a race in rt_handle_member_record(): when a realtime reload changes a member's state_interface, start watching the new device before storing it on the member (and before unwatching the old). Previously the member was pointed at the new interface first and only then added to the watcher set, leaving a brief window where m->state_interface referred to a device not yet watched. Watching before publishing closes the window: any device_state_cb() that passes the watch check then serializes on the queue lock and observes the committed state_interface. Resolves: asterisk#1972 UserNote: app_queue now handles device-state changes efficiently when reloading queues with large member counts, avoiding a flood of the stasis/m:devicestate:all taskprocessor past its high water mark. Co-authored-by: Thomas <1258170+ThomasSevestre@users.noreply.github.com>

PujaGediya · 2026-06-04T08:41:45Z

cherry-pick-to: 20

PujaGediya · 2026-06-04T08:57:52Z

AI usage disclosure

This contribution was prepared with AI assistance (Claude Code / Anthropic Claude).
AI was used to:

investigate the root cause in device_state_cb() and the device-state stasis path, and confirm the taskprocessor high-water behavior;
draft the code change in apps/app_queue.c (the reference-counted device_state_watchers index and the device_state_cb fast-path early return);

I have reviewed the change in full and understand it, and I can debug and own it.
It was built against [master / 20.19] and tested on my test server a reload of a queue with
5000 members no longer pushes the stasis/m:devicestate:all taskprocessor past
its high water mark.

github-actions · 2026-06-04T09:06:01Z

Workflow Check completed successfully

Pull Request Checklist Complete

jcolp · 2026-06-18T16:31:07Z

Instead of this approach, I wonder if instead app_queue should subscribe each queue member using a pool subscription to get the respective device or hint state. This would eliminate the processing of updates that app_queue is not interested in, and spread out updates across the pool instead of via a single thread.

PujaGediya · 2026-06-23T09:19:34Z

Thanks for the suggestion. I want to make sure I understand the proposed approach correctly before pivoting.

Current state: the flood is caused by the reload publishing ~5000 Queue:%s_pause_%s
cachable hints onto devicestate:all synchronously. With the index, each of those events
hits an O(1) lookup and returns; no member watches those hint identifiers so they're
discarded immediately.

For the per-member pool subscription approach, a few questions:

Overhead at scale: 5000 members would mean 5000 stasis_subscribe_pool() calls
creating 5000 ao2 objects and 5000 taskprocessors. Is that considered acceptable? The
flip side is that Queue:..._pause_... hints would naturally not match any member's
subscription topic, so the flood would be invisible at the stasis routing layer.
Reload churn: On a reload, would we unsubscribe all existing members and
resubscribe? The reload path that originally caused the flood would instead do
5000 unsubscribe + 5000 subscribe operations — each of those involves taskprocessor
teardown/create. Is there a lighter-weight way to hand off the subscription across
a member replacement (e.g. reuse the subscription if state_interface doesn't change)?
Hint vs device topic: Members can have a state_interface like hint:Hint/1001 —
should those subscribe to ast_hint_topic() and SIP members to ast_device_state_topic()?

Happy to rewrite to the suggestion if it's preferred. I just want to understand the expected
behaviour for the reload case before re-rolling.

jcolp · 2026-06-23T09:29:18Z

I don't know the impact at that scale, it has to be tested out. The most heavy part would be if each taskprocessor had a thread, which using a pool it won't. I think the actual cost would be only a little more than the hash table approach.
I'm not that familiar with app_queue reload so I can't comment on that, but there's nothing from a stasis perspective that prohibits keeping the subscription if it's unchanged.
I would leave hints untouched for now until the impact of the change is understood.

github-actions Bot added pr-submit-testing-in-progress pr-submit-tests-failed and removed pr-submit-testing-in-progress labels Jun 4, 2026

github-actions Bot previously requested changes Jun 4, 2026

View reviewed changes

github-actions Bot added the has-pr-checklist A PR Checklist is present on the PR label Jun 4, 2026

PujaGediya force-pushed the app_queue-devicestate-high-water-mark-issue-1972 branch from 0db05ad to f71e605 Compare June 4, 2026 08:41

github-actions Bot added pr-submit-testing-in-progress and removed pr-submit-tests-failed labels Jun 4, 2026

jcolp added the ai-usage label Jun 4, 2026

github-actions Bot added pr-submit-tests-passed and removed pr-submit-testing-in-progress labels Jun 4, 2026

seifzadeh mentioned this pull request Jun 4, 2026

app_queue: Avoid full scans for irrelevant device states #1974

Closed

gtjoseph changed the title ~~app queue devicestate high water mark issue 1972~~ app_queue.c: Index member device states to avoid scanning on every event Jun 9, 2026

gtjoseph removed the has-pr-checklist A PR Checklist is present on the PR label Jun 9, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

app_queue.c: Index member device states to avoid scanning on every event#1973

app_queue.c: Index member device states to avoid scanning on every event#1973
PujaGediya wants to merge 1 commit into
asterisk:masterfrom
PujaGediya:app_queue-devicestate-high-water-mark-issue-1972

PujaGediya commented Jun 4, 2026 •

edited by gtjoseph

Loading

Uh oh!

sangoma-oss-cla Bot commented Jun 4, 2026 •

edited

Loading

Uh oh!

github-actions Bot left a comment •

edited by gtjoseph

Loading

Uh oh!

github-actions Bot commented Jun 4, 2026

Uh oh!

jcolp commented Jun 4, 2026

Uh oh!

PujaGediya commented Jun 4, 2026

Uh oh!

PujaGediya commented Jun 4, 2026

Uh oh!

github-actions Bot commented Jun 4, 2026

Uh oh!

jcolp commented Jun 18, 2026

Uh oh!

PujaGediya commented Jun 23, 2026

Uh oh!

jcolp commented Jun 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

PujaGediya commented Jun 4, 2026 • edited by gtjoseph Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sangoma-oss-cla Bot commented Jun 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot left a comment • edited by gtjoseph Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Jun 4, 2026

Uh oh!

jcolp commented Jun 4, 2026

Uh oh!

PujaGediya commented Jun 4, 2026

Uh oh!

PujaGediya commented Jun 4, 2026

AI usage disclosure

Uh oh!

github-actions Bot commented Jun 4, 2026

Uh oh!

jcolp commented Jun 18, 2026

Uh oh!

PujaGediya commented Jun 23, 2026

Uh oh!

jcolp commented Jun 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

PujaGediya commented Jun 4, 2026 •

edited by gtjoseph

Loading

sangoma-oss-cla Bot commented Jun 4, 2026 •

edited

Loading

github-actions Bot left a comment •

edited by gtjoseph

Loading