Pace bulk refresh against the GitHub API quota by sctice-ifixit · Pull Request #473 · iFixit/pulldasher

sctice-ifixit · 2026-06-23T18:18:25Z

Connects #467

Why

Stacked on #472 (review/merge that first). The bulk refreshes (bin/refresh-* and the startup open-pulls refresh) share one GitHub token, and one hourly quota, with the live webhook/socket refreshes that keep Pulldasher current. A backfill is a tight loop, so it drains the quota to zero and starves normal operation, taking Pulldasher down. Pulldasher already had @octokit/plugin-throttling, and #472 added @octokit/plugin-retry, but both only react at exhaustion (429/5xx), which is too late. This adds a proactive pacer that reserves quota headroom for live traffic and spreads bulk work across the rate-limit reset window.

What changed

New lib/pacer.js, a process-wide pacer. observe(headers) records x-ratelimit-remaining/-reset/-used from every response (live calls included, so its view stays current), fed by the after/error hooks in git-manager.js. gate() waits before a bulk request. The pure computePace even-spreads over the reset window (interval = (reset - now) / (remaining - reserve)) and hard-pauses below a configurable reserve floor (config.github.bulkReserve, default 1000, 20% of the 5000/hr token). The token-bucket cursor advances by the x-ratelimit-used delta between gates, so a pull's multi-call fan-out (and any live spike) pushes bulk's next slot out. Pacing tracks real consumption, and bulk yields automatically.
Gating sits off the shared work queue: at enqueue (pushAllOnQueue) and between fetch pages (pacedPaginate in git-manager.js), never in the NotifyQueue consumer. A paged or floor-paused backfill can't block the single consumer that webhook refreshes also use, so live traffic keeps draining throughout. The single-item path (pushOnQueue, used by webhook/socket refreshes) is unchanged and never paces.
config.example.js documents bulkReserve.

I gated at enqueue rather than in the queue's pop handler on purpose. A gate() wait in the pop handler blocks the single shared NotifyQueue consumer, so every webhook item queued behind a paused bulk item stalls too (up to the roughly 1h floor-pause). The one queue exists (commit 041e000, 2016) to serialize refreshes and to never refresh the same issue twice concurrently (racing DB writes). A separate bulk queue would reintroduce that hazard, so enqueue-side gating keeps both guarantees and keeps the wait off the queue.

Tests

5 unit tests for the pure computePace (node --test): unknown quota allows, a stale reset window allows, at/below the reserve pauses to reset, the first gate records the usage baseline without delaying, and the cursor advances by the requests actually spent since the last gate (a 1-call gate vs an 8-call gate).

Validation (QA)

Deployed to pulldasher-dev. The startup open-pulls refresh (456 pulls) paced and spread across the reset window (pulldasher:pacer logs), peaking around a 22s inter-slot delay during the initial enqueue burst and then settling. remaining held between 4913 and 4930 throughout, never near the 1000 reserve, with 0 floor pauses. Live traffic kept flowing alongside the bulk work (581 getPull/webhook events in a 2-minute window), 0 errors, restarts=0. The prod-style token reads a 5000/hr ceiling.

The refresh-* backfills and the startup open-pulls refresh share one GitHub token — and one hourly quota — with the live webhook/socket refreshes that keep Pulldasher current. A bulk run is a tight loop, so it drains the quota to zero and starves normal operation, taking Pulldasher down. The throttle plugin only reacts once the quota is already exhausted, which is too late to protect live traffic. Add a proactive pacer (lib/pacer.js). It observes x-ratelimit-* on every response (live calls included, so its view stays fresh) and gates only the bulk path: requests are spread across the rate-limit reset window, and bulk pauses entirely below a configurable reserve floor (config.github.bulkReserve, default 1000). The token-bucket cursor advances by the x-ratelimit-used delta between gates, so a pull's multi-call fan-out — and any live-traffic spike — pushes bulk's next slot out; bulk yields automatically on both signals. Gate at enqueue (pushAllOnQueue) and between fetch pages (pacedPaginate), never inside the shared queue consumer. The wait happens off the queue, so a paced or floor-paused backfill can't park the single consumer that webhook refreshes also depend on — live traffic keeps draining throughout. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

djmetzle

Cool. CR ⏱️

Picks up master (merged into the base branch) under the stacked pacing work. Clean auto-merge. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

ianrohde

CR ⚪

The base branch was changed.

sctice-ifixit · 2026-06-25T19:34:15Z

QA 🌷 per the description.

djmetzle

CR 🏃

danielbeardsley · 2026-06-25T21:09:11Z

+ *
+ * This is the bulk path, so it paces enqueueing against the GitHub quota:
+ * `pacer.gate()` waits for each item's slot *before* pushing it, keeping the
+ * shared queue shallow. The wait happens here, off the queue — so a paced or
+ * floor-paused backfill never blocks the single consumer, and live
+ * webhook/socket refreshes keep draining promptly.


I don't understand..

A backfill is going to happen from the CLI, not the running app. i.e. they are in separate processes. They share the API key of course, but not the queue.

Is the idea that pushOnQueue() is given higher priority than pushAllOnQueue()?

A backfill is going to happen from the CLI, not the running app. i.e. they are in separate processes.

True except for refresh.openPulls(), which gets called by the running app on startup. It pushes to the same queue as web hooks and uses pushAllOnQueue() to do its work.

Is the idea that pushOnQueue() is given higher priority than pushAllOnQueue()?

Via a crude mechanism that sits in front of adding to the queue. pushAllOnQueue() waits for the rate limit before adding to the queue, so its waiting happens async independent of the queue processing, which itself happens async of the queue pushes. While it's waiting, it isn't adding things to the queue while web hooks are free to add to the queue during that time. If the bulk producer coroutine isn't rate-limited at all, it will add many items to the queue, the processing of which will be in direct competition with the web hook items.

danielbeardsley · 2026-06-25T21:16:40Z

I had a feeling that I remember doing this before.. or something very similar (with a quota buffer). Hmm... Looks like we deleted it when we upgraded the octokit api: #211

https://github.com/iFixit/pulldasher/blob/811a7a222b38c43e2a85009f4c282141aba4713e/lib/rate-limit.js

Anyway, the octokit plugin supports throttling across instances but it requires a common storage medium (redis or similar) to coordinate.

sctice-ifixit · 2026-06-25T22:00:19Z

Anyway, the octokit plugin supports throttling across instances but it requires a common storage medium (redis or similar) to coordinate.

I believe this pull's approach should work across instances using the headers we get back from GH as the coordination mechanism.

danielbeardsley · 2026-06-25T23:10:35Z

+  return async function (githubResponses) {
+    for (const githubResponse of githubResponses) {
+      await pacer.gate();
+      queue.push({ response: githubResponse, onFailure: onFailure });


It still feels quite weird to be rate-limiting pushing things onto the queue.

i.e. pulling from the queue and executing the api call is what updates the knowledge of the world (the headers) and pushing to the queue is about future commitment.

I haven't hard time imagining if it work work out to have a queue that you're throttling additions to, but draining the queue is actually what gives you the feedback of allowable rates.

Maybe instead we throttle the actual thing that needs throttling to, but have different bulkReserve depending on the source of the action (smaller reserve for things triggered by the web-app and big reserve for cli-triggered actions). The refreshOpenPulls on load doesn't need special treatment as it lasts less than a minute and only happens on startup.

djmetzle previously approved these changes Jun 23, 2026

View reviewed changes

Merge backfill-transient-error-tolerance into backfill-quota-pacing

751c600

Picks up master (merged into the base branch) under the stacked pacing work. Clean auto-merge. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

ianrohde previously approved these changes Jun 23, 2026

View reviewed changes

Base automatically changed from backfill-transient-error-tolerance to master June 25, 2026 16:29

ianrohde approved these changes Jun 25, 2026

View reviewed changes

djmetzle approved these changes Jun 25, 2026

View reviewed changes

danielbeardsley reviewed Jun 25, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Pace bulk refresh against the GitHub API quota#473

Pace bulk refresh against the GitHub API quota#473
sctice-ifixit wants to merge 2 commits into
masterfrom
backfill-quota-pacing

sctice-ifixit commented Jun 23, 2026 •

edited

Loading

Uh oh!

djmetzle left a comment

Uh oh!

ianrohde left a comment

Uh oh!

sctice-ifixit commented Jun 25, 2026

Uh oh!

djmetzle left a comment

Uh oh!

danielbeardsley Jun 25, 2026

Uh oh!

danielbeardsley Jun 25, 2026

Uh oh!

sctice-ifixit Jun 25, 2026

Uh oh!

danielbeardsley commented Jun 25, 2026

Uh oh!

sctice-ifixit commented Jun 25, 2026

Uh oh!

danielbeardsley Jun 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Uh oh!

Conversation

sctice-ifixit commented Jun 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Why

What changed

Tests

Validation (QA)

Uh oh!

djmetzle left a comment

Choose a reason for hiding this comment

Uh oh!

ianrohde left a comment

Choose a reason for hiding this comment

Uh oh!

sctice-ifixit commented Jun 25, 2026

Uh oh!

djmetzle left a comment

Choose a reason for hiding this comment

Uh oh!

danielbeardsley Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

danielbeardsley Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

sctice-ifixit Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

danielbeardsley commented Jun 25, 2026

Uh oh!

sctice-ifixit commented Jun 25, 2026

Uh oh!

danielbeardsley Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

sctice-ifixit commented Jun 23, 2026 •

edited

Loading