fix: truncate base64 data in Image.__repr__ to avoid log flooding #8760
ScarletPupil wants to merge 3 commits into
Include Reply component content in _format_message so the LLM can see which message was quoted when someone replies to the bot.

- Add Reply import
- Handle Reply in _format_message with message_str or chain fallback
- Add _describe_chain helper for non-text quoted content (images, etc.)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
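The sketch below is a rough illustration of the fallback this commit describes. It prefers the quoted message's `message_str` and otherwise summarizes the quoted component chain. The `Plain`, `Image`, and `Reply` classes are simplified stand-ins, not AstrBot's real components. `sender_name`, `format_reply`, and the `[Reply to ...]` output format are hypothetical.

```python
from dataclasses import dataclass, field


# Hypothetical stand-ins for AstrBot's message components; the real classes differ.
@dataclass
class Plain:
    text: str


@dataclass
class Image:
    file: str


@dataclass
class Reply:
    sender_name: str = ""  # hypothetical field
    message_str: str = ""  # plain-text form of the quoted message, if available
    chain: list = field(default_factory=list)  # quoted message components


def describe_chain(chain: list) -> str:
    """Summarize quoted components so non-text content is still visible to the LLM."""
    parts = []
    for c in chain:
        if isinstance(c, Plain) and c.text:
            parts.append(c.text)
        elif isinstance(c, Image):
            parts.append("[Image]")
    return " ".join(parts)


def format_reply(reply: Reply) -> str:
    """Prefer message_str; fall back to describing the component chain."""
    quoted = reply.message_str or describe_chain(reply.chain)
    if not quoted:
        return ""
    who = reply.sender_name or "unknown"
    return f"[Reply to {who}: {quoted}]"
```

With this design, a text-only quote never pays the cost of walking the chain, and image-only quotes still produce a placeholder instead of an empty string.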
Hey - I've found 2 issues, and left some high level feedback:
- Consider extracting the base64 truncation length (currently 60/57) into a named constant used by `Image.__repr__`, so the behavior is clear and easier to adjust consistently.
- In `_describe_chain`, consider applying an overall length cap similar to `_truncate_reply_text` to avoid very long quoted-chain descriptions when many `Plain` components or large texts are present.
## Individual Comments
### Comment 1
<location path="astrbot/builtin_stars/astrbot/group_chat_context.py" line_range="236-245" />
<code_context>
+def _describe_chain(chain: list) -> str:
</code_context>
<issue_to_address>
**suggestion (performance):** Consider truncating or otherwise limiting chain descriptions to avoid extremely long quoted content.
This function currently concatenates all `Plain` texts in the chain with no length cap, so large or forwarded chains can produce very long quote summaries that bloat logs or exceed downstream limits. Consider reusing `_truncate_reply_text` on the final description or enforcing a maximum length in `_describe_chain` (e.g., stop appending once a threshold is reached).
</issue_to_address>
### Comment 2
<location path="astrbot/core/message/components.py" line_range="456-461" />
<code_context>
     def __init__(self, file: str | None, **_) -> None:
         super().__init__(file=file, **_)
+    def __repr__(self) -> str:
+        """Truncate base64 data to avoid printing the full image data in logs."""
+        f = self.file or ""
+        if f.startswith("base64://") and len(f) > 60:
+            f = f[:57] + "..."
+        return f"Image(file={f!r})"
+
     @staticmethod
</code_context>
<issue_to_address>
**suggestion:** The custom Image.__repr__ hides other potentially useful attributes passed via **_.
Focusing `repr` on `file` avoids large base64 dumps, but it also hides any other fields passed via `**_` that may be important for debugging (IDs, URLs, metadata, etc.). Consider including a minimal set of safe, non-sensitive fields, or at least signaling that additional attributes are present, so `repr(Image(...))` remains informative without exposing large or sensitive data.
```suggestion
    def __repr__(self) -> str:
        """Truncate base64 data to avoid printing full image data in logs, while still giving extra debugging info without leaking large fields."""
        # Handle file, avoiding printing the full base64 image data in logs
        f = getattr(self, "file", None) or ""
        if isinstance(f, str) and f.startswith("base64://") and len(f) > 60:
            f = f[:57] + "..."
        # Pull fields other than file from the instance attributes to provide some debugging info
        extras: dict[str, str] = {}
        try:
            attrs = vars(self)
        except TypeError:
            # Some implementations may not support vars(self); fall back to file only
            attrs = {}
        for key, value in attrs.items():
            if key == "file":
                continue
            # Only show simple scalar fields, to avoid printing large or sensitive structures
            if isinstance(value, (str, int, float, bool)):
                # Avoid accidentally including base64 fields
                if isinstance(value, str) and value.startswith("base64://"):
                    continue
                preview = repr(value)
                if len(preview) > 60:
                    preview = preview[:57] + "..."
                extras[key] = preview
        parts: list[str] = [f"file={f!r}"]
        if extras:
            extras_str = ", ".join(f"{k}={v}" for k, v in extras.items())
            parts.append(extras_str)
        # Count the extra fields that were not shown, to signal hidden attributes
        hidden_count = max(len(attrs) - 1 - len(extras), 0)
        if hidden_count > 0:
            parts.append(f"... (+{hidden_count} hidden)")
        return f"Image({', '.join(parts)})"
```
</issue_to_address>
def _describe_chain(chain: list) -> str:
    """Briefly describe the content of a message chain, for displaying quoted messages."""
    desc = []
    for c in chain:
        if isinstance(c, Plain) and getattr(c, "text", None):
            desc.append(c.text)
        elif isinstance(c, Image):
            desc.append("[Image]")
        elif isinstance(c, At):
            name = getattr(c, "name", "") or getattr(c, "qq", "")
suggestion (performance): Consider truncating or otherwise limiting chain descriptions to avoid extremely long quoted content.
This function currently concatenates all Plain texts in the chain with no length cap, so large or forwarded chains can produce very long quote summaries that bloat logs or exceed downstream limits. Consider reusing _truncate_reply_text on the final description or enforcing a maximum length in _describe_chain (e.g., stop appending once a threshold is reached).
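Here is one way to apply the suggested cap, sketched under assumptions. It works on the per-component strings that `_describe_chain` collects in `desc` and stops scanning once a character budget is exceeded. It then truncates the joined result. `MAX_QUOTE_CHARS` and `truncate` are hypothetical. The real limit and behavior of `_truncate_reply_text` are not shown in this PR.

```python
MAX_QUOTE_CHARS = 200  # hypothetical budget; tune to the prompt's needs


def truncate(text: str, limit: int = MAX_QUOTE_CHARS) -> str:
    """Cut text to at most `limit` characters, marking the cut with '...'."""
    if len(text) <= limit:
        return text
    return text[: limit - 3] + "..."


def describe_chain_capped(parts: list[str], limit: int = MAX_QUOTE_CHARS) -> str:
    """Join component descriptions, stopping early once the budget is exceeded."""
    desc: list[str] = []
    total = 0
    for part in parts:
        desc.append(part)
        total += len(part) + 1  # +1 for the joining space
        if total > limit:
            break  # no need to walk the rest of a huge forwarded chain
    # The last appended part may overshoot, so enforce the hard limit here.
    return truncate(" ".join(desc), limit)
```

Breaking early bounds the work for very long chains. The final `truncate` call guarantees that the output never exceeds `limit` characters.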
Code Review
This pull request adds support for formatting and describing message replies (including message chains with various components like AtAll, Face, File, etc.) in group chat contexts. It also implements a custom __repr__ for the Image component to truncate base64 data in logs. The feedback points out two main issues: first, in _describe_chain, the type check for At is placed before AtAll (which inherits from At), making the AtAll check dead code; second, the custom __repr__ for Image should retain other fields like url and path to prevent losing useful debugging information.
        elif isinstance(c, At):
            name = getattr(c, "name", "") or getattr(c, "qq", "")
            desc.append(f"[At: {name}]")
        elif isinstance(c, Record):
            desc.append("[Voice]")
        elif isinstance(c, Video):
            desc.append("[Video]")
        elif isinstance(c, File):
            desc.append(f"[File: {getattr(c, 'name', '') or ''}]")
        elif isinstance(c, Forward):
            desc.append("[Forward]")
        elif isinstance(c, AtAll):
            desc.append("[At: All]")
AtAll inherits from At. In the if-elif branches of _describe_chain, the isinstance(c, At) check comes before isinstance(c, AtAll), so every AtAll component matches the At branch first. This makes the isinstance(c, AtAll) branch unreachable dead code.
To fix this, move the AtAll type check before the At check.
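The following is a minimal, self-contained reproduction of the ordering issue. It uses stand-in `At`/`AtAll` classes. Only the subclass relationship comes from the review; the constructor fields are assumptions. When `At` is checked first, an `AtAll` instance never reaches its own branch.

```python
class At:
    """Stand-in mention component (fields are assumptions)."""

    def __init__(self, qq: str = "", name: str = "") -> None:
        self.qq = qq
        self.name = name


class AtAll(At):
    """Mention-everyone component; a subclass of At, as the review states."""


def describe_buggy(c: At) -> str:
    if isinstance(c, At):  # also true for AtAll, so the next branch never runs
        return f"[At: {c.name or c.qq}]"
    elif isinstance(c, AtAll):
        return "[At: All]"
    return ""


def describe_fixed(c: At) -> str:
    if isinstance(c, AtAll):  # check the more specific subclass first
        return "[At: All]"
    elif isinstance(c, At):
        return f"[At: {c.name or c.qq}]"
    return ""
```

The general rule for `isinstance` chains is to order branches from the most specific subclass to the most general base class.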
Suggested change:
+        elif isinstance(c, AtAll):
+            desc.append("[At: All]")
         elif isinstance(c, At):
             name = getattr(c, "name", "") or getattr(c, "qq", "")
             desc.append(f"[At: {name}]")
         elif isinstance(c, Record):
             desc.append("[Voice]")
         elif isinstance(c, Video):
             desc.append("[Video]")
         elif isinstance(c, File):
             desc.append(f"[File: {getattr(c, 'name', '') or ''}]")
         elif isinstance(c, Forward):
             desc.append("[Forward]")
-        elif isinstance(c, AtAll):
-            desc.append("[At: All]")
    def __repr__(self) -> str:
        """Truncate base64 data to avoid printing the full image data in logs."""
        f = self.file or ""
        if f.startswith("base64://") and len(f) > 60:
            f = f[:57] + "..."
        return f"Image(file={f!r})"
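If the overridden __repr__ returns only Image(file=...), other useful fields such as url and path disappear from debug logs. Consider keeping these fields in __repr__ to make troubleshooting easier.
In addition, the Record component also accepts base64 data via Record.fromBase64, so it carries the same log-flooding risk. Consider implementing a similar truncating __repr__ for Record, either in this PR or in a follow-up.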
Suggested change:
     def __repr__(self) -> str:
         """Truncate base64 data to avoid printing the full image data in logs."""
         f = self.file or ""
         if f.startswith("base64://") and len(f) > 60:
             f = f[:57] + "..."
-        return f"Image(file={f!r})"
+        return f"Image(file={f!r}, url={self.url!r}, path={self.path!r})"
When message sending fails (e.g., ActionFailed timeout), the error log prints the full MessageChain, including Image components with raw base64 data. For a typical 1024x1024 image this can be several thousand characters of base64, which completely floods the logs. Override __repr__ on Image to truncate base64:// data to ~57 chars, keeping the log readable while the value stays identifiable as base64 data.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…repr__

- Extract truncation thresholds as class constants
- Include url, path, _type in repr output when non-empty
- Preserve debugging info while still truncating base64 data

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
eab7dc3 to 61992f2
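Problem
When message sending fails (e.g., an ActionFailed timeout), respond.stage logs the full MessageChain, and the file field of the Image component in it contains the complete base64 image data. For a 1024x1024 image, this is a base64 string thousands of characters long, which completely floods the logs.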
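The logs balloon because standard base64 turns every 3 input bytes into 4 output characters, with padding. As a result, the logged string grows with the encoded file size rather than the pixel dimensions. The file sizes in the loop below are illustrative assumptions, not measurements from this PR.

```python
import base64


def base64_len(n_bytes: int) -> int:
    """Length of standard (padded) base64 output for n_bytes of input."""
    return 4 * ((n_bytes + 2) // 3)


# Sanity-check the formula against the standard library.
sample = bytes(range(256)) * 4  # 1024 bytes of arbitrary data
assert len(base64.b64encode(sample)) == base64_len(len(sample))

# Illustrative file sizes (assumed): a small JPEG vs. a larger PNG.
for size in (50_000, 500_000):
    logged = base64_len(size) + len("base64://")
    print(f"{size:>7} bytes -> {logged:>7} logged chars")
```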
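Fix
Override Image.__repr__ to truncate base64:// data to about 57 characters. This keeps logs readable while the value remains recognizable as a base64 image.
Scope of impact
Only the string representation (repr) of the Image component is affected; no functional code is changed.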
Summary by Sourcery
Improve logging readability for messages and image components by summarizing quoted content and truncating embedded base64 image data.
Bug Fixes:
Enhancements: