
fix: truncate base64 data in Image.__repr__ to avoid log flooding #8760

Open
ScarletPupil wants to merge 3 commits into AstrBotDevs:master from ScarletPupil:fix/image-repr-truncate-base64

Conversation

@ScarletPupil ScarletPupil commented Jun 13, 2026

Contributor

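Problem

When sending a message fails (for example, an ActionFailed timeout), respond.stage logs the full MessageChain in its "发送消息链失败" ("failed to send message chain") line. The file field of the Image component in that chain holds the complete base64 image data. For a 1024x1024 image this is a base64 string several thousand characters long, which completely floods the log.

[v4.25.5] [respond.stage:287]: 发送消息链失败: chain = MessageChain(chain=[...,
Image(file='base64://iVBORw0KGgoAAAANSUhEUgA...several thousand characters...'))])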
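
To get a feel for how long such a value becomes, recall that base64 encodes every 3 bytes as 4 characters, so an n-byte payload yields 4·⌈n/3⌉ characters before the `base64://` prefix is added. The sketch below checks this with random stand-in bytes; the 3,000-byte size and the helper name `base64_length` are assumptions for illustration, not measurements or code from this PR.

```python
import base64
import math
import os


def base64_length(n_bytes: int) -> int:
    """Length of the padded base64 encoding of n_bytes bytes."""
    return 4 * math.ceil(n_bytes / 3)


# Assumed payload size for illustration only (not taken from the PR).
payload = os.urandom(3_000)
encoded = "base64://" + base64.b64encode(payload).decode("ascii")

# Every character of this string would appear in an untruncated repr.
print(len(encoded))
```

For the assumed 3,000-byte payload this prints 4009: 4,000 base64 characters plus the 9-character prefix.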

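Fix

Override Image.__repr__ to truncate base64:// data to about 57 characters, keeping the log readable while the value can still be recognized as a base64 image.

Scope of impact

Only the string representation (repr) of the Image component is affected; no functional code is changed.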

Summary by Sourcery

Improve logging readability for messages and image components by summarizing quoted content and truncating embedded base64 image data.

Bug Fixes:

  • Prevent log flooding by truncating long base64-encoded image data in Image.__repr__.

Enhancements:

  • Summarize Reply components in group chat context using truncated text or a compact description of the quoted message chain to keep logs concise.

Include Reply component content in _format_message so the LLM can
see what message was quoted when someone replies to the bot.

- Add Reply import
- Handle Reply in _format_message with message_str or chain fallback
- Add _describe_chain helper for non-text quoted content (images, etc.)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@dosubot dosubot Bot added the size:M (This PR changes 30-99 lines, ignoring generated files.) and area:core (The bug / feature is about astrbot's core, backend) labels Jun 13, 2026

@sourcery-ai sourcery-ai Bot left a comment

Contributor

Hey - I've found 2 issues, and left some high level feedback:

  • Consider extracting the base64 truncation length (currently 60/57) into a named constant shared by Image.__repr__ so the behavior is clear and easier to adjust consistently.
  • In _describe_chain, you may want to apply an overall length cap similar to _truncate_reply_text to avoid very long quoted-chain descriptions when many Plain components or large texts are present.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- Consider extracting the base64 truncation length (currently 60/57) into a named constant shared by `Image.__repr__` so the behavior is clear and easier to adjust consistently.
- In `_describe_chain`, you may want to apply an overall length cap similar to `_truncate_reply_text` to avoid very long quoted-chain descriptions when many `Plain` components or large texts are present.

## Individual Comments

### Comment 1
<location path="astrbot/builtin_stars/astrbot/group_chat_context.py" line_range="236-245" />
<code_context>
+def _describe_chain(chain: list) -> str:
</code_context>
<issue_to_address>
**suggestion (performance):** Consider truncating or otherwise limiting chain descriptions to avoid extremely long quoted content.

This function currently concatenates all `Plain` texts in the chain with no length cap, so large or forwarded chains can produce very long quote summaries that bloat logs or exceed downstream limits. Consider reusing `_truncate_reply_text` on the final description or enforcing a maximum length in `_describe_chain` (e.g., stop appending once a threshold is reached).
</issue_to_address>

### Comment 2
<location path="astrbot/core/message/components.py" line_range="456-461" />
<code_context>
     def __init__(self, file: str | None, **_) -> None:
         super().__init__(file=file, **_)

+    def __repr__(self) -> str:
+        """Truncate base64 data to avoid printing the full image data in logs."""
+        f = self.file or ""
+        if f.startswith("base64://") and len(f) > 60:
+            f = f[:57] + "..."
+        return f"Image(file={f!r})"
+
     @staticmethod
</code_context>
<issue_to_address>
**suggestion:** The custom Image.__repr__ hides other potentially useful attributes passed via **_.

Focusing `repr` on `file` avoids large base64 dumps, but it also hides any other fields passed via `**_` that may be important for debugging (IDs, URLs, metadata, etc.). Consider including a minimal set of safe, non-sensitive fields—or at least signaling that additional attributes are present—so `repr(Image(...))` remains informative without exposing large or sensitive data.

```suggestion
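    def __repr__(self) -> str:
        """Truncate base64 data so the full image data is not printed in logs, while still providing extra debugging information without exposing large fields."""
        # Handle file so the full base64 image data is not printed in logs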
        f = getattr(self, "file", None) or ""
        if isinstance(f, str) and f.startswith("base64://") and len(f) > 60:
            f = f[:57] + "..."

        # Extract fields other than file from the instance attributes to provide some debugging information
        extras: dict[str, str] = {}
        try:
            attrs = vars(self)
        except TypeError:
            # Some implementations may not support vars(self); in that case fall back to file only
            attrs = {}

        for key, value in attrs.items():
            if key == "file":
                continue

            # Only show simple scalar fields, to avoid printing oversized or sensitive structures
            if isinstance(value, (str, int, float, bool)):
                # Avoid accidentally including base64 fields
                if isinstance(value, str) and value.startswith("base64://"):
                    continue
                preview = repr(value)
                if len(preview) > 60:
                    preview = preview[:57] + "..."
                extras[key] = preview

        parts: list[str] = [f"file={f!r}"]

        if extras:
            extras_str = ", ".join(f"{k}={v}" for k, v in extras.items())
            parts.append(extras_str)

            # Count the extra fields that are not shown, to signal that hidden attributes remain
            hidden_count = max(len(attrs) - 1 - len(extras), 0)
            if hidden_count > 0:
                parts.append(f"... (+{hidden_count} hidden)")

        return f"Image({', '.join(parts)})"
```
</issue_to_address>


Comment on lines +236 to +245
def _describe_chain(chain: list) -> str:
    """Briefly describe the message chain content, for displaying quoted messages"""
    desc = []
    for c in chain:
        if isinstance(c, Plain) and getattr(c, "text", None):
            desc.append(c.text)
        elif isinstance(c, Image):
            desc.append("[Image]")
        elif isinstance(c, At):
            name = getattr(c, "name", "") or getattr(c, "qq", "")

Contributor

suggestion (performance): Consider truncating or otherwise limiting chain descriptions to avoid extremely long quoted content.

This function currently concatenates all Plain texts in the chain with no length cap, so large or forwarded chains can produce very long quote summaries that bloat logs or exceed downstream limits. Consider reusing _truncate_reply_text on the final description or enforcing a maximum length in _describe_chain (e.g., stop appending once a threshold is reached).
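
One way to apply such a cap is to stop appending once a character budget is used up and then hard-limit the final string. The sketch below is only an illustration of that idea: `Plain` and `Image` are minimal stand-ins for the real components, `MAX_DESC_LEN` and `describe_chain_capped` are hypothetical names, and joining the parts with a space is an assumption, since the excerpt above does not show how the PR joins them.

```python
from dataclasses import dataclass

MAX_DESC_LEN = 100  # hypothetical cap; the PR does not specify a value


@dataclass
class Plain:  # stand-in for the real Plain component
    text: str


@dataclass
class Image:  # stand-in for the real Image component
    file: str


def describe_chain_capped(chain: list, limit: int = MAX_DESC_LEN) -> str:
    """Describe a chain, stopping once the description reaches `limit` chars."""
    parts = []
    used = 0
    for c in chain:
        if isinstance(c, Plain) and c.text:
            piece = c.text
        elif isinstance(c, Image):
            piece = "[Image]"
        else:
            piece = f"[{type(c).__name__}]"
        parts.append(piece)
        used += len(piece) + 1  # +1 for the joining space
        if used >= limit:
            break  # skip the rest of a very long chain
    desc = " ".join(parts)
    # Hard cap on the final string, marking the cut with an ellipsis.
    return desc if len(desc) <= limit else desc[: limit - 3] + "..."
```

Breaking out of the loop early avoids walking a long forwarded chain, while the final slice guarantees the bound even when a single Plain component is larger than the budget.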

Comment thread astrbot/core/message/components.py Outdated

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request adds support for formatting and describing message replies (including message chains with various components like AtAll, Face, File, etc.) in group chat contexts. It also implements a custom __repr__ for the Image component to truncate base64 data in logs. The feedback points out two main issues: first, in _describe_chain, the type check for At is placed before AtAll (which inherits from At), making the AtAll check dead code; second, the custom __repr__ for Image should retain other fields like url and path to prevent losing useful debugging information.

Comment on lines +244 to +256
        elif isinstance(c, At):
            name = getattr(c, "name", "") or getattr(c, "qq", "")
            desc.append(f"[At: {name}]")
        elif isinstance(c, Record):
            desc.append("[Voice]")
        elif isinstance(c, Video):
            desc.append("[Video]")
        elif isinstance(c, File):
            desc.append(f"[File: {getattr(c, 'name', '') or ''}]")
        elif isinstance(c, Forward):
            desc.append("[Forward]")
        elif isinstance(c, AtAll):
            desc.append("[At: All]")

Contributor

medium

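AtAll inherits from At. In the if-elif chain of _describe_chain, the isinstance(c, At) check comes before the isinstance(c, AtAll) check, so every AtAll component matches the At branch first and the isinstance(c, AtAll) branch becomes dead code that can never run.

To fix this, move the AtAll type check before the At check.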

Suggested change
-        elif isinstance(c, At):
-            name = getattr(c, "name", "") or getattr(c, "qq", "")
-            desc.append(f"[At: {name}]")
-        elif isinstance(c, Record):
-            desc.append("[Voice]")
-        elif isinstance(c, Video):
-            desc.append("[Video]")
-        elif isinstance(c, File):
-            desc.append(f"[File: {getattr(c, 'name', '') or ''}]")
-        elif isinstance(c, Forward):
-            desc.append("[Forward]")
-        elif isinstance(c, AtAll):
-            desc.append("[At: All]")
+        elif isinstance(c, AtAll):
+            desc.append("[At: All]")
+        elif isinstance(c, At):
+            name = getattr(c, "name", "") or getattr(c, "qq", "")
+            desc.append(f"[At: {name}]")
+        elif isinstance(c, Record):
+            desc.append("[Voice]")
+        elif isinstance(c, Video):
+            desc.append("[Video]")
+        elif isinstance(c, File):
+            desc.append(f"[File: {getattr(c, 'name', '') or ''}]")
+        elif isinstance(c, Forward):
+            desc.append("[Forward]")
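
The shadowing described in this comment can be reproduced in isolation. The sketch below uses minimal stand-in classes rather than the real components: it assumes, as the review states, that `AtAll` subclasses `At`, and the `qq="all"` default and both function names are hypothetical.

```python
class At:  # stand-in for the real At component
    def __init__(self, qq: str, name: str = "") -> None:
        self.qq = qq
        self.name = name


class AtAll(At):  # stand-in; assumed to subclass At as the review states
    def __init__(self) -> None:
        super().__init__(qq="all")


def describe_at_first(c: object) -> str:
    """Original order: the At check shadows the AtAll check."""
    if isinstance(c, At):
        return f"[At: {c.name or c.qq}]"
    elif isinstance(c, AtAll):  # unreachable for AtAll instances
        return "[At: All]"
    return "[Unknown]"


def describe_atall_first(c: object) -> str:
    """Suggested order: test the subclass before its base class."""
    if isinstance(c, AtAll):
        return "[At: All]"
    elif isinstance(c, At):
        return f"[At: {c.name or c.qq}]"
    return "[Unknown]"


print(describe_at_first(AtAll()))     # falls into the At branch
print(describe_atall_first(AtAll()))  # reaches the AtAll branch
```

Because `isinstance` returns True for instances of subclasses, the more specific class has to be tested first whenever both appear in the same if-elif chain.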

Comment thread astrbot/core/message/components.py Outdated
Comment on lines +456 to +461
    def __repr__(self) -> str:
        """Truncate base64 data to avoid printing the full image data in logs."""
        f = self.file or ""
        if f.startswith("base64://") and len(f) > 60:
            f = f[:57] + "..."
        return f"Image(file={f!r})"

Contributor

medium

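If the overridden __repr__ returns only Image(file=...), other useful fields such as url and path are lost from debug logs. Consider keeping these fields in __repr__ to make troubleshooting easier.

In addition, the Record component also accepts base64 data via Record.fromBase64, so it carries the same risk of log flooding. Consider implementing a similar truncating __repr__ for the Record component, either in this PR or in a follow-up.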

Suggested change
-    def __repr__(self) -> str:
-        """截断 base64 数据,避免日志中打印完整的图片数据。"""
-        f = self.file or ""
-        if f.startswith("base64://") and len(f) > 60:
-            f = f[:57] + "..."
-        return f"Image(file={f!r})"
+    def __repr__(self) -> str:
+        """截断 base64 数据,避免日志中打印完整的图片数据。"""
+        f = self.file or ""
+        if f.startswith("base64://") and len(f) > 60:
+            f = f[:57] + "..."
+        return f"Image(file={f!r}, url={self.url!r}, path={self.path!r})"
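
If Record were given the same treatment, one option would be a small module-level helper that both components call from `__repr__`. The following is a rough sketch under that assumption: the `Image` and `Record` classes are simplified stand-ins, and `truncate_base64`, `BASE64_PREFIX`, `MAX_REPR_LEN`, and `KEEP_LEN` are hypothetical names that do not appear in the PR.

```python
BASE64_PREFIX = "base64://"
MAX_REPR_LEN = 60  # values longer than this are shortened
KEEP_LEN = 57      # characters kept before the "..." marker


def truncate_base64(value):
    """Shorten long base64:// strings; return anything else unchanged."""
    if (
        isinstance(value, str)
        and value.startswith(BASE64_PREFIX)
        and len(value) > MAX_REPR_LEN
    ):
        return value[:KEEP_LEN] + "..."
    return value


class Image:  # stand-in for the real component
    def __init__(self, file):
        self.file = file

    def __repr__(self):
        return f"Image(file={truncate_base64(self.file)!r})"


class Record:  # stand-in for the real component
    def __init__(self, file):
        self.file = file

    def __repr__(self):
        return f"Record(file={truncate_base64(self.file)!r})"
```

Keeping the thresholds in one place means a later change to the cut-off applies to both components at once.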

ScarletPupil and others added 2 commits June 13, 2026 20:30
When message sending fails (e.g., ActionFailed timeout), the error log
prints the full MessageChain including Image components with raw base64
data. For a typical 1024x1024 image this can be several thousand characters
of base64, completely flooding the logs.

Override __repr__ on Image to truncate base64:// data to ~57 chars, keeping
the log readable while still identifiable.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…repr__

- Extract truncation thresholds as class constants
- Include url, path, _type in repr output when non-empty
- Preserve debugging info while still truncating base64 data

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
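
The updated code from this commit is not shown in the conversation. As a rough illustration of what its message describes, the sketch below keeps the thresholds as class constants and appends `url`, `path`, and `_type` only when they are non-empty. The class is a simplified stand-in, and the constant names `_REPR_MAX_LEN` and `_REPR_KEEP_LEN` are hypothetical.

```python
class Image:  # stand-in; the real component has more fields and behavior
    # Hypothetical constant names; the commit does not show the real ones.
    _REPR_MAX_LEN = 60
    _REPR_KEEP_LEN = 57

    def __init__(self, file=None, url=None, path=None, _type=None):
        self.file = file
        self.url = url
        self.path = path
        self._type = _type

    def __repr__(self):
        f = self.file or ""
        if f.startswith("base64://") and len(f) > self._REPR_MAX_LEN:
            f = f[: self._REPR_KEEP_LEN] + "..."
        parts = [f"file={f!r}"]
        # Append optional fields only when they carry a value.
        for name in ("url", "path", "_type"):
            value = getattr(self, name)
            if value:
                parts.append(f"{name}={value!r}")
        return f"Image({', '.join(parts)})"
```

Skipping empty fields keeps the common case as short as the original `Image(file=...)` form while still surfacing the extra attributes when they are set.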
@ScarletPupil ScarletPupil force-pushed the fix/image-repr-truncate-base64 branch from eab7dc3 to 61992f2 June 13, 2026 12:31