fix: unify media reference handling#8764
Conversation
There was a problem hiding this comment.
Sorry @Soulter, your pull request is larger than the review limit of 150000 diff characters
There was a problem hiding this comment.
Code Review
This pull request introduces a centralized MediaResolver utility to unify and normalize media resolution, format conversion, and base64 encoding across all platform adapters, agent runners, and LLM providers. It also updates the pipeline preprocess stage to normalize media files early and adds extensive unit tests. The review feedback highlights several key improvement opportunities, including resolving redundant base64 encoding/decoding in the Coze runner, broadening overly restrictive data: URI checks in message components, preventing a potential runtime crash in the Discord adapter due to empty media references, avoiding double-resolution of images in the QQOfficial adapter, and robustly stripping internal whitespace during base64 padding calculations.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| image_data = await MediaResolver( | ||
| image_url, | ||
| media_type="image", | ||
| ).to_base64_data(strict=True) | ||
| if image_data is None: | ||
| raise ValueError("invalid image data") | ||
| file_id = await self.api_client.upload_file(image_data.to_bytes()) |
There was a problem hiding this comment.
Using to_base64_data(strict=True) followed by to_bytes() is inefficient because it performs redundant base64 encoding and decoding. You can use to_bytes() directly on the MediaResolver to retrieve the raw bytes efficiently.
image_bytes = await MediaResolver(
image_url,
media_type="image",
).to_bytes()
file_id = await self.api_client.upload_file(image_bytes)| if ( | ||
| self.file.startswith("file:///") | ||
| self.file.startswith("file://") | ||
| or self.file.startswith("http") | ||
| or self.file.startswith("base64://") | ||
| or self.file.startswith("data:audio/") | ||
| or os.path.exists(self.file) | ||
| ): | ||
| return self.file | ||
|
|
||
| # 2) 尝试 url(可能是 file:/// 或 http 链接) | ||
| if self.url: | ||
| if ( | ||
| self.url.startswith("file:///") | ||
| self.url.startswith("file://") | ||
| or self.url.startswith("http") | ||
| or self.url.startswith("data:audio/") | ||
| or os.path.exists(self.url) | ||
| or ( | ||
| self.url.startswith("file:///") | ||
| self.url.startswith("file://") | ||
| and os.path.exists(self._decode_file_uri(self.url)) | ||
| ) | ||
| ): |
There was a problem hiding this comment.
Checking for data:audio/ is overly restrictive. A data URI can have other audio-related MIME types (e.g., data:audio/mpeg, data:audio/ogg) or generic ones like data:application/octet-stream. Since MediaResolver can handle any valid data URI, checking for data: is more robust and future-proof.
if (
self.file.startswith("file://")
or self.file.startswith("http")
or self.file.startswith("base64://")
or self.file.startswith("data:")
or os.path.exists(self.file)
):
return self.file
# 2) 尝试 url(可能是 file:/// 或 http 链接)
if self.url:
if (
self.url.startswith("file://")
or self.url.startswith("http")
or self.url.startswith("data:")
or os.path.exists(self.url)
or (
self.url.startswith("file://")
and os.path.exists(self._decode_file_uri(self.url))
)
):| if ( | ||
| candidate.startswith("file://") | ||
| or candidate.startswith("http") | ||
| or candidate.startswith("base64://") | ||
| or candidate.startswith("data:video/") | ||
| or os.path.exists(candidate) | ||
| ): |
There was a problem hiding this comment.
Checking for data:video/ is overly restrictive. A video data URI can have other MIME types or generic ones like data:application/octet-stream. Checking for data: is more robust and consistent with MediaResolver's capabilities.
| if ( | |
| candidate.startswith("file://") | |
| or candidate.startswith("http") | |
| or candidate.startswith("base64://") | |
| or candidate.startswith("data:video/") | |
| or os.path.exists(candidate) | |
| ): | |
| if ( | |
| candidate.startswith("file://") | |
| or candidate.startswith("http") | |
| or candidate.startswith("base64://") | |
| or candidate.startswith("data:") | |
| or os.path.exists(candidate) | |
| ): |
| if not i.file: | ||
| raise ValueError("Unsupported image file format") | ||
| image_base64 = image_base64.removeprefix("base64://") | ||
| image_base64 = await MediaResolver( | ||
| i.file, | ||
| media_type="image", | ||
| ).to_base64() | ||
| if i.file.startswith("file://") or os.path.exists(i.file): | ||
| image_file_path = await MediaResolver( | ||
| i.file, | ||
| media_type="image", | ||
| ).to_path() |
There was a problem hiding this comment.
The current implementation resolves the same image twice (once for to_base64() and once for to_path()). If the image is a local file, we can resolve it to a path first, and then read the file directly to get the base64 data, avoiding redundant resolution lifecycles.
if not i.file:
raise ValueError("Unsupported image file format")
resolver = MediaResolver(i.file, media_type="image")
if i.file.startswith("file://") or os.path.exists(i.file):
image_file_path = await resolver.to_path()
with open(image_file_path, "rb") as f:
image_base64 = base64.b64encode(f.read()).decode("utf-8")
else:
image_base64 = await resolver.to_base64()| validate: bool = False, | ||
| ) -> bytes: | ||
| """Decode base64 payloads while tolerating omitted padding.""" | ||
| payload = payload.strip() |
There was a problem hiding this comment.
Using payload.strip() only removes leading and trailing whitespace. If the base64 payload contains internal whitespace (such as newlines from MIME base64), len(payload) % 4 can calculate the wrong padding length, leading to incorrect padding errors. Stripping all whitespace is much safer.
| payload = payload.strip() | |
| payload = "".join(payload.split()) |
…ring sanitization' Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Summary
Fixes #8676
Tests