fix: prevent audio base64 from being written to console logs #8748
Hey - I've found 1 issue, and left some high level feedback:
- Consider precompiling the data URL regex at module scope (e.g., `_DATA_URL_RE = re.compile(...)`) and reusing it in `_redact_data_url_for_log` to avoid recompiling the pattern on every call in hot paths.
- It may be safer for `_redact_data_url_for_log` to accept `Any` and early-return non-string values (or explicitly cast to `str`), so that it can be reused more broadly in logging without risking type errors.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Consider precompiling the data URL regex at module scope (e.g., `_DATA_URL_RE = re.compile(...)`) and reusing it in `_redact_data_url_for_log` to avoid recompiling the pattern on every call in hot paths.
- It may be safer for `_redact_data_url_for_log` to accept `Any` and early-return non-string values (or explicitly cast to `str`), so that it can be reused more broadly in logging without risking type errors.
## Individual Comments
### Comment 1
<location path="astrbot/core/provider/sources/openai_source.py" line_range="85-86" />
<code_context>
         return None
+    @staticmethod
+    def _redact_data_url_for_log(value: str) -> str:
+        match = re.match(r"^(data:[^;,]+;base64,)(.*)$", value, flags=re.IGNORECASE)
+        if not match:
+            return value
</code_context>
<issue_to_address>
**issue (bug_risk):** The data URL regex may miss valid data URLs that have additional parameters before `;base64`.
The pattern `r"^(data:[^;,]+;base64,)(.*)$"` only matches when `;base64` directly follows the media type. Valid data URLs like `data:audio/wav;codec=opus;rate=48000;base64,...` won’t be redacted and will be logged in full. Consider a pattern such as `r"^(data:[^,]*;base64,)(.*)$"` (or a variant allowing multiple `;key=value` segments) so base64 payloads in data URLs with extra parameters are still redacted.
</issue_to_address>
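The difference between the two patterns can be checked directly. This sketch compares the PR's pattern with the one proposed in the comment; the names `STRICT` and `RELAXED` and the sample URL are illustrative only, and the payload is a placeholder rather than real audio.

```python
import re

# Pattern from the PR: the text between "data:" and ";base64," may not contain ";" or ",".
STRICT = re.compile(r"^(data:[^;,]+;base64,)(.*)$", re.IGNORECASE)
# Pattern proposed in the review: anything except "," may precede ";base64,".
RELAXED = re.compile(r"^(data:[^,]*;base64,)(.*)$", re.IGNORECASE)

# Illustrative data URL with extra parameters before ";base64".
url = "data:audio/wav;codec=opus;rate=48000;base64,QUJDRA=="

strict_match = STRICT.match(url)    # no match, so the URL would be logged in full
relaxed_match = RELAXED.match(url)  # matches; group(2) is the payload to redact

if relaxed_match is not None:
    prefix, payload = relaxed_match.groups()
    redacted = f"{prefix}<redacted {len(payload)} chars>"
```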
Code Review
This pull request introduces a helper method _redact_data_url_for_log to redact base64-encoded data URLs in warning logs when audio preprocessing fails. The reviewer identified an issue where the regular expression used for matching data URLs may fail if the URL contains additional parameters (such as charset) and suggested a more robust regex along with type checking to prevent potential runtime errors.
    @staticmethod
    def _redact_data_url_for_log(value: str) -> str:
        match = re.match(r"^(data:[^;,]+;base64,)(.*)$", value, flags=re.IGNORECASE)
        if not match:
            return value
        prefix, payload = match.groups()
        return f"{prefix}<redacted {len(payload)} chars>"
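The regular expression `r"^(data:[^;,]+;base64,)(.*)$"` fails to match data URLs that carry additional parameters.
Per RFC 2397, the media type portion of a data URL may include further parameters (for example `data:text/plain;charset=utf-8;base64,YQ==`). Because `[^;,]+` stops at the first semicolon `;`, the pattern never reaches the later `;base64,`, so redaction is skipped and the full base64 payload is still written to the log.
Consider changing the pattern to `r"^(data:.*?;base64,)(.*)$"`, which safely and non-greedily matches every parameter before `;base64,`. In addition, to keep `re.match` from raising a `TypeError` when a non-string value is passed in, consider adding a type check.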
-    @staticmethod
-    def _redact_data_url_for_log(value: str) -> str:
-        match = re.match(r"^(data:[^;,]+;base64,)(.*)$", value, flags=re.IGNORECASE)
-        if not match:
-            return value
-        prefix, payload = match.groups()
-        return f"{prefix}<redacted {len(payload)} chars>"
+    @staticmethod
+    def _redact_data_url_for_log(value: Any) -> str:
+        if not isinstance(value, str):
+            return str(value)
+        match = re.match(r"^(data:.*?;base64,)(.*)$", value, flags=re.IGNORECASE)
+        if not match:
+            return value
+        prefix, payload = match.groups()
+        return f"{prefix}<redacted {len(payload)} chars>"
Fixes #8676
Motivation
When audio preprocessing fails in the OpenAI-compatible provider, the warning log currently prints the original `audio_ref` directly.
If the audio reference is a `data:audio/...;base64,...` URL, the full base64 payload is written to the console log. This can flood logs and may expose sensitive audio data.
This PR redacts base64 payloads in data URLs before logging them.
Modifications
- Modified `astrbot/core/provider/sources/openai_source.py`.
- Added `_redact_data_url_for_log()` to redact `data:*;base64,...` payloads in log output.
- Applied the redaction helper to the audio preprocessing failure warning.
- Kept existing audio preprocessing and STT behavior unchanged.
This is NOT a breaking change.
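The sketch below shows one way the helper might be applied at a warning call site. It is an illustration under assumptions, not the code in `openai_source.py`: the standard-library `logging` logger stands in for the project's logger, and `warn_audio_preprocess_failed` and the message wording are hypothetical.

```python
import logging
import re

logger = logging.getLogger("example")  # stand-in for the project's logger


def redact_data_url_for_log(value: str) -> str:
    # Same logic as the helper added in this PR, as a plain function.
    match = re.match(r"^(data:[^;,]+;base64,)(.*)$", value, flags=re.IGNORECASE)
    if not match:
        return value
    prefix, payload = match.groups()
    return f"{prefix}<redacted {len(payload)} chars>"


def warn_audio_preprocess_failed(audio_ref: str, exc: Exception) -> str:
    # Hypothetical wrapper: the message is built from the redacted reference only,
    # so the raw payload never reaches the log handler.
    message = (
        f"Audio preprocessing failed for {redact_data_url_for_log(audio_ref)}: {exc}"
    )
    logger.warning(message)
    return message
```

Redacting before the message is formatted, rather than in a log filter, keeps the change local to the one warning that this PR targets.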
Screenshots or Test Results
Verification command:
Result:
Before this change, logs could include full audio data URLs:
After this change, the base64 payload is redacted:
Checklist
- 😊 If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.
- 👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.
- 🤓 I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in `requirements.txt` and `pyproject.toml`.
- 😮 My changes do not introduce malicious code.
Summary by Sourcery
Bug Fixes: