Fix three reliability issues: request hangs, infinite retries, malformed PDFs #1146
Open
chen3feng wants to merge 3 commits into
The OpenAI client was created without a timeout, so a single request that silently hangs (network stall or unresponsive endpoint) blocks the whole document indefinitely. Add a timeout (default 120s, override via PDF2ZH_OPENAI_TIMEOUT) plus max_retries so a stalled request fails fast and is retried instead of hanging forever.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

worker() was decorated with @retry(wait=wait_fixed(1)) and no stop condition, so a permanently failing paragraph was retried forever and the whole run got stuck. Cap attempts (default 5, override via PDF2ZH_TRANSLATE_RETRIES) and fall back to the source text once they are exhausted, so the document still completes.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Some structurally broken PDFs make MuPDF raise an exception during the initial doc_en.save(), aborting before translation starts. Catch that failure and repair the input with pikepdf (already a dependency) before retrying.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
Three independent robustness fixes, found while translating real-world PDFs with the `deepseek`/`openai` services. Each issue could make a run hang forever or crash outright. All three are validated end to end: a 114-page and a 47-page paper now translate cleanly, and the 47-page one previously hung on its last page 100% of the time.

Changes
1. `translator.py`: add a request timeout to the OpenAI client

`openai.OpenAI(...)` was created without a `timeout`, so a single request that silently stalls (network blip or unresponsive endpoint) blocks the entire document indefinitely (0% CPU, no error, no progress). Added `timeout` (default `120`s, override via `PDF2ZH_OPENAI_TIMEOUT`) and `max_retries=2`, so a stalled request fails fast and is retried instead of hanging forever.

2. `converter.py`: bound paragraph translation retries

`worker()` used `@retry(wait=wait_fixed(1))` with no stop condition, so a permanently failing paragraph was retried every second forever and the whole run got stuck. Added `stop_after_attempt` (default `5`, override via `PDF2ZH_TRANSLATE_RETRIES`) and a `retry_error_callback` that falls back to the source text once attempts are exhausted, so the document still completes instead of stalling.

3. `high_level.py`: auto-repair malformed input PDFs

Some structurally broken PDFs make MuPDF raise an exception during the initial `doc_en.save()` (`cannot parse object (N 0 R)`), aborting before translation even starts. That failure is now caught, and the input is repaired with `pikepdf` (already a dependency, already imported in this module) before the save is retried.
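For change 1, a minimal sketch of how the timeout setting could be read, assuming `PDF2ZH_OPENAI_TIMEOUT` holds a number of seconds. The helper name `openai_client_kwargs` and the fallback to the default on unset, malformed, or non-positive values are hypothetical; the PR only states the defaults (`120`s, `max_retries=2`) and the variable name. The returned dict is meant to be passed into `openai.OpenAI(...)`, whose `timeout` and `max_retries` keyword arguments exist in the 1.x SDK.

```python
import math
import os

DEFAULT_TIMEOUT_S = 120.0  # default stated in the PR
DEFAULT_MAX_RETRIES = 2    # value stated in the PR


def openai_client_kwargs(env=None):
    """Return timeout/max_retries kwargs for openai.OpenAI(...).

    Hypothetical helper: reads PDF2ZH_OPENAI_TIMEOUT as seconds and falls
    back to the default when the variable is unset, malformed, or not a
    positive finite number.
    """
    env = os.environ if env is None else env
    raw = env.get("PDF2ZH_OPENAI_TIMEOUT", "").strip()
    try:
        timeout = float(raw) if raw else DEFAULT_TIMEOUT_S
    except ValueError:
        timeout = DEFAULT_TIMEOUT_S  # ignore values like "abc"
    if not math.isfinite(timeout) or timeout <= 0:
        timeout = DEFAULT_TIMEOUT_S  # reject nan/inf/zero/negative
    return {"timeout": timeout, "max_retries": DEFAULT_MAX_RETRIES}
```

Usage would look like `client = openai.OpenAI(api_key=..., **openai_client_kwargs())`. The SDK retries timed-out requests on its own, so `max_retries` caps how often a stall can repeat before the error reaches the caller.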
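Change 2 relies on tenacity (`@retry(wait=wait_fixed(1), stop=stop_after_attempt(n), retry_error_callback=...)`, where the callback's return value is returned instead of raising once attempts run out). The sketch below reproduces that behavior with the standard library only, so it runs without tenacity. `bounded_retry`, `translate_retries`, and the demo translator are hypothetical names, and passing the paragraph text as the first argument is an assumption about `worker()`.

```python
import functools
import os
import time


def translate_retries(env=None, default=5):
    """Read the attempt cap from PDF2ZH_TRANSLATE_RETRIES (assumed integer)."""
    env = os.environ if env is None else env
    try:
        attempts = int(env.get("PDF2ZH_TRANSLATE_RETRIES", default))
    except ValueError:
        attempts = default  # hypothetical: ignore malformed values
    return max(1, attempts)  # always make at least one attempt


def bounded_retry(max_attempts, wait_s=1.0, fallback=None):
    """Retry fn up to max_attempts times, sleeping wait_s between tries.

    When every attempt fails, return fallback(*args, **kwargs) if a fallback
    is given, otherwise re-raise the last exception.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            last_exc = None
            for attempt in range(1, max_attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception as exc:  # tenacity also retries any Exception by default
                    last_exc = exc
                    if attempt < max_attempts:
                        time.sleep(wait_s)  # fixed wait, like wait_fixed
            if fallback is None:
                raise last_exc
            return fallback(*args, **kwargs)

        return wrapper

    return decorator


# Demo: a translator that always fails returns the source text after 3 tries.
attempt_log = []


@bounded_retry(max_attempts=3, wait_s=0.0, fallback=lambda text: text)
def always_failing_translate(text):
    attempt_log.append(text)
    raise RuntimeError("simulated permanent API error")
```

Falling back to the untranslated paragraph trades completeness for progress: a failed paragraph stays in the source language, but one bad paragraph can no longer stall the whole run.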
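The recovery path in change 3 can be sketched as "try the save, repair once, try again". The callables are injected so the control flow can be exercised without a real PDF; `save_with_repair` and `repair_with_pikepdf` are hypothetical names, and in `high_level.py` the `save` step would be MuPDF's `doc_en.save()`. The repair relies on qpdf (via `pikepdf.open`), which attempts to reconstruct damaged cross-reference tables when opening a file; whether that succeeds depends on how badly the file is broken.

```python
import io
from typing import Callable


def repair_with_pikepdf(pdf_bytes: bytes) -> bytes:
    """Round-trip the bytes through pikepdf, letting qpdf's recovery rebuild them."""
    import pikepdf  # imported lazily; pdf2zh already depends on it

    out = io.BytesIO()
    with pikepdf.open(io.BytesIO(pdf_bytes)) as pdf:
        pdf.save(out)
    return out.getvalue()


def save_with_repair(
    pdf_bytes: bytes,
    save: Callable[[bytes], bytes],
    repair: Callable[[bytes], bytes] = repair_with_pikepdf,
) -> bytes:
    """Try save(); if it raises, repair the input once and save again.

    A failure on the repaired input propagates, so files that cannot be
    recovered still surface an error.
    """
    try:
        return save(pdf_bytes)
    except Exception:
        repaired = repair(pdf_bytes)  # only runs when the first save failed
    return save(repaired)
```

Re-raising on the second failure is deliberate: a file that even qpdf cannot recover still fails loudly instead of yielding empty or partial output.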
Notes

`pikepdf` was already required).

🤖 Generated with Claude Code