Skip to content

fix: unify media reference handling#8764

Open
Soulter wants to merge 8 commits into
masterfrom
fix/unify-media-resolution
Open

fix: unify media reference handling#8764
Soulter wants to merge 8 commits into
masterfrom
fix/unify-media-resolution

Conversation

@Soulter

@Soulter Soulter commented Jun 13, 2026

Copy link
Copy Markdown
Member

Summary

  • Centralize media reference materialization and base64 resolution for local paths, http(s), base64://, data URIs, and legacy bare base64 payloads.
  • Normalize incoming Record audio to wav and Image media to temporary jpg during preprocess, with event-scoped cleanup.
  • Reuse the shared media resolver across OpenAI, Gemini, Anthropic, MiMo, DeerFlow, STT, and platform media paths while sanitizing logs and cleaning temporary conversion outputs.
  • Ensure generated TTS audio is tracked for cleanup after the event finishes.

Fixes #8676

Tests

  • uv run pytest tests/test_media_utils.py tests/test_agent_runner_media_resolver.py tests/test_platform_audio_media_resolver.py tests/test_openai_source.py tests/test_deerflow_agent_runner.py tests/test_discord_adapter.py tests/test_mattermost_adapter.py tests/test_kook/test_kook_client.py tests/test_whisper_api_source.py tests/test_preprocess_stage.py tests/unit/test_astr_main_agent.py
  • git diff --cached --name-only -- "*.py" | xargs uv run ruff check
  • git diff --cached --name-only -- "*.py" | xargs uv run ruff format --check
  • git diff --check

@dosubot dosubot Bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label Jun 13, 2026

@sourcery-ai sourcery-ai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry @Soulter, your pull request is larger than the review limit of 150000 diff characters

@dosubot dosubot Bot added area:core The bug / feature is about astrbot's core, backend area:platform The bug / feature is about IM platform adapter, such as QQ, Lark, Telegram, WebChat and so on. area:provider The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner. labels Jun 13, 2026
Comment thread tests/test_media_utils.py Fixed

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a centralized MediaResolver utility to unify and normalize media resolution, format conversion, and base64 encoding across all platform adapters, agent runners, and LLM providers. It also updates the pipeline preprocess stage to normalize media files early and adds extensive unit tests. The review feedback highlights several key improvement opportunities, including resolving redundant base64 encoding/decoding in the Coze runner, broadening overly restrictive data: URI checks in message components, preventing a potential runtime crash in the Discord adapter due to empty media references, avoiding double-resolution of images in the QQOfficial adapter, and robustly stripping internal whitespace during base64 padding calculations.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +355 to +361
image_data = await MediaResolver(
image_url,
media_type="image",
).to_base64_data(strict=True)
if image_data is None:
raise ValueError("invalid image data")
file_id = await self.api_client.upload_file(image_data.to_bytes())

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using to_base64_data(strict=True) followed by to_bytes() is inefficient because it performs redundant base64 encoding and decoding. You can use to_bytes() directly on the MediaResolver to retrieve the raw bytes efficiently.

            image_bytes = await MediaResolver(
                image_url,
                media_type="image",
            ).to_bytes()
            file_id = await self.api_client.upload_file(image_bytes)

Comment on lines 163 to 183
if (
self.file.startswith("file:///")
self.file.startswith("file://")
or self.file.startswith("http")
or self.file.startswith("base64://")
or self.file.startswith("data:audio/")
or os.path.exists(self.file)
):
return self.file

# 2) 尝试 url(可能是 file:/// 或 http 链接)
if self.url:
if (
self.url.startswith("file:///")
self.url.startswith("file://")
or self.url.startswith("http")
or self.url.startswith("data:audio/")
or os.path.exists(self.url)
or (
self.url.startswith("file:///")
self.url.startswith("file://")
and os.path.exists(self._decode_file_uri(self.url))
)
):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Checking for data:audio/ is overly restrictive. A data URI can have other audio-related MIME types (e.g., data:audio/mpeg, data:audio/ogg) or generic ones like data:application/octet-stream. Since MediaResolver can handle any valid data URI, checking for data: is more robust and future-proof.

            if (
                self.file.startswith("file://")
                or self.file.startswith("http")
                or self.file.startswith("base64://")
                or self.file.startswith("data:")
                or os.path.exists(self.file)
            ):
                return self.file

        # 2) 尝试 url(可能是 file:/// 或 http 链接)
        if self.url:
            if (
                self.url.startswith("file://")
                or self.url.startswith("http")
                or self.url.startswith("data:")
                or os.path.exists(self.url)
                or (
                    self.url.startswith("file://")
                    and os.path.exists(self._decode_file_uri(self.url))
                )
            ):

Comment on lines +278 to +284
if (
candidate.startswith("file://")
or candidate.startswith("http")
or candidate.startswith("base64://")
or candidate.startswith("data:video/")
or os.path.exists(candidate)
):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Checking for data:video/ is overly restrictive. A video data URI can have other MIME types or generic ones like data:application/octet-stream. Checking for data: is more robust and consistent with MediaResolver's capabilities.

Suggested change
if (
candidate.startswith("file://")
or candidate.startswith("http")
or candidate.startswith("base64://")
or candidate.startswith("data:video/")
or os.path.exists(candidate)
):
if (
candidate.startswith("file://")
or candidate.startswith("http")
or candidate.startswith("base64://")
or candidate.startswith("data:")
or os.path.exists(candidate)
):

Comment thread astrbot/core/platform/sources/discord/discord_platform_adapter.py Outdated
Comment on lines +684 to +694
if not i.file:
raise ValueError("Unsupported image file format")
image_base64 = image_base64.removeprefix("base64://")
image_base64 = await MediaResolver(
i.file,
media_type="image",
).to_base64()
if i.file.startswith("file://") or os.path.exists(i.file):
image_file_path = await MediaResolver(
i.file,
media_type="image",
).to_path()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The current implementation resolves the same image twice (once for to_base64() and once for to_path()). If the image is a local file, we can resolve it to a path first, and then read the file directly to get the base64 data, avoiding redundant resolution lifecycles.

                if not i.file:
                    raise ValueError("Unsupported image file format")
                resolver = MediaResolver(i.file, media_type="image")
                if i.file.startswith("file://") or os.path.exists(i.file):
                    image_file_path = await resolver.to_path()
                    with open(image_file_path, "rb") as f:
                        image_base64 = base64.b64encode(f.read()).decode("utf-8")
                else:
                    image_base64 = await resolver.to_base64()

Comment thread astrbot/core/utils/media_utils.py Outdated
validate: bool = False,
) -> bytes:
"""Decode base64 payloads while tolerating omitted padding."""
payload = payload.strip()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using payload.strip() only removes leading and trailing whitespace. If the base64 payload contains internal whitespace (such as newlines from MIME base64), len(payload) % 4 can calculate the wrong padding length, leading to incorrect padding errors. Stripping all whitespace is much safer.

Suggested change
payload = payload.strip()
payload = "".join(payload.split())

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:core The bug / feature is about astrbot's core, backend area:platform The bug / feature is about IM platform adapter, such as QQ, Lark, Telegram, WebChat and so on. area:provider The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner. size:XXL This PR changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]私信语音占用太多日志

2 participants