Support large target repos with bind-mount option. #577
- Change `RuntimeError` to `TypeError` for type validation in report/writer.py
- Update pyupgrade to v3.21.2 for Python 3.14 compatibility
Mirror the layout introduced on feature/438-token_budget: pytest + pytest-asyncio dev deps, asyncio_mode auto, a tests.* mypy override, and pytest in the mypy pre-commit hook deps so the tests/ package type-checks.
…ix#492)
Large local targets were copied into the sandbox file-by-file via the SDK LocalDir entry, which stalls on big repos and could leave /workspace empty.
- `--mount <path>` bind-mounts a host directory read-only at `/workspace/<subdir>` instead of copying it, bypassing the per-file stream.
- A size pre-flight (`STRIX_MAX_LOCAL_COPY_MB`, default 1024) fails fast with a clear message suggesting `--mount` when a non-mounted local target is too big.
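One way to picture the bind-mount path is the Docker SDK for Python's `volumes` mapping, which has the shape `{host_path: {"bind": container_path, "mode": "ro"}}`. The sketch below builds that mapping for a single `--mount` value. The helper name `mount_volume_spec` and the choice of the directory's basename as `<subdir>` are assumptions for illustration, not necessarily what this PR does.

```python
from pathlib import Path


def mount_volume_spec(host_dir: str, workspace: str = "/workspace") -> dict[str, dict[str, str]]:
    """Build a docker-py style ``volumes`` entry that bind-mounts ``host_dir`` read-only.

    Hypothetical helper: the subdirectory under ``workspace`` is assumed to be
    the basename of the resolved host path.
    """
    resolved = Path(host_dir).expanduser().resolve()
    if not resolved.is_dir():
        raise ValueError(f"--mount path is not a directory: {resolved}")
    container_path = f"{workspace}/{resolved.name}"
    # "ro" asks Docker for a read-only bind mount; nothing is streamed file-by-file.
    return {str(resolved): {"bind": container_path, "mode": "ro"}}
```

In this sketch the result would be passed as `volumes=` to `client.containers.run(...)`. Because the directory is mounted rather than copied, startup cost no longer scales with the number of files in the repo.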
An empty or whitespace-only --mount value resolves to the current working directory and would silently bind-mount it into the sandbox. Reject it.
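A validator along these lines rejects the empty case before it ever reaches `Path.resolve()`, where `Path("")` resolves to the current working directory. It is written as an `argparse` `type=` callback; both the name `parse_mount_path` and the use of `argparse` are assumptions, since the commit message does not say how the CLI parses `--mount`.

```python
import argparse
from pathlib import Path


def parse_mount_path(value: str) -> Path:
    """argparse ``type=`` callback for --mount (hypothetical name).

    Rejects empty or whitespace-only values, which would otherwise end up
    bind-mounting the current working directory.
    """
    if not value.strip():
        raise argparse.ArgumentTypeError("--mount requires a non-empty directory path")
    return Path(value).expanduser().resolve()


parser = argparse.ArgumentParser(prog="example")
# "append" lets --mount be given several times.
parser.add_argument("--mount", action="append", type=parse_mount_path, default=[])
```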
If the same directory is passed via --target and --mount (or as duplicate values), it previously produced two targets — copied AND bind-mounted, and the copied one could trip the size pre-flight. Dedupe by resolved path, preferring the bind mount.
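Deduplicating by resolved path while letting the bind mount win could be sketched as follows. The `Target` tuple and the function name `dedupe_local_targets` are hypothetical; only the rule itself (one target per resolved directory, preferring the mount) comes from the commit message.

```python
from pathlib import Path
from typing import NamedTuple


class Target(NamedTuple):
    path: Path
    mounted: bool  # True for --mount (bind mount), False for --target (copy)


def dedupe_local_targets(copy_paths: list[str], mount_paths: list[str]) -> list[Target]:
    """Collapse targets that resolve to the same directory, preferring the bind mount."""
    by_path: dict[Path, Target] = {}
    # Mounts are visited first, so a --target naming the same directory is
    # dropped instead of also being copied (and tripping the size pre-flight).
    for raw, mounted in [(p, True) for p in mount_paths] + [(p, False) for p in copy_paths]:
        resolved = Path(raw).expanduser().resolve()
        by_path.setdefault(resolved, Target(resolved, mounted))
    return list(by_path.values())
```

Resolving before comparing is what makes `repo`, `./repo` and `repo/` count as the same target.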
Previously a `STRIX_MAX_LOCAL_COPY_MB` value of 0 (or a negative value) made every local target count as oversized, aborting all local scans. Now a value <= 0 disables the pre-flight.
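Reading the limit with the "<= 0 disables" rule might look like this. The helper name `max_local_copy_bytes` is hypothetical, "MB" is assumed to mean 1024 × 1024 bytes, and in this sketch a non-integer value simply raises `ValueError`; the real code may handle these cases differently.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

DEFAULT_MAX_LOCAL_COPY_MB = 1024


def max_local_copy_bytes(env: Mapping[str, str] = os.environ) -> int | None:
    """Return the pre-flight byte limit, or None when the check is disabled.

    Hypothetical helper: a value <= 0 turns the pre-flight off instead of
    flagging every local target as oversized.
    """
    raw = env.get("STRIX_MAX_LOCAL_COPY_MB", "").strip()
    mb = int(raw) if raw else DEFAULT_MAX_LOCAL_COPY_MB
    if mb <= 0:
        return None  # caller skips the size check entirely
    return mb * 1024 * 1024
```

Returning `None` rather than `0` keeps "disabled" distinct from a numeric limit, so a caller cannot accidentally compare sizes against zero.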
os.walk silently swallowed directory-listing errors, so a permission-denied subtree could make a large repo under-count and slip past the pre-flight. Surface such omissions via an onerror warning.
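The underlying behavior is easy to reproduce: by default `os.walk` yields nothing for a directory it cannot list, whereas an `onerror` callback receives the `OSError`. The demo below uses a just-deleted directory as a stand-in for a permission-denied subtree, because permission checks depend on who runs it (root can read almost anything). `walk_errors` is a hypothetical name.

```python
import os
import tempfile


def walk_errors(top: str) -> tuple[int, list[OSError]]:
    """Count files under ``top`` and collect the listing errors os.walk would hide."""
    errors: list[OSError] = []
    n_files = 0
    for _root, _dirs, files in os.walk(top, onerror=errors.append):
        n_files += len(files)
    return n_files, errors


missing = tempfile.mkdtemp()
os.rmdir(missing)  # now guaranteed not to exist
# Without onerror the walk is silently empty; nothing signals that data was skipped.
print(list(os.walk(missing)))  # []
print(walk_errors(missing)[1])
```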
Add CLI reference + example for --mount, document the size pre-flight env var, note the read-only-is-not-a-hard-boundary caveat and that remote repos are not size-checked, and clarify the backends docstring on when bind mounts apply.
Greptile Summary
This PR adds a
Confidence Score: 5/5
Safe to merge. The bind-mount path is well-isolated, and the size pre-flight has no effect on correctness, only on UX for large trees. The core logic (manifest split, Docker mount injection, deduplication, path resolution) is correct and comprehensively tested. The one improvement opportunity, adding an early exit to the directory-size walk, is a performance nicety, not a correctness issue.
strix/interface/utils.py (directory_size_bytes early-exit suggestion)
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 2
strix/interface/utils.py:1220-1245
`directory_size_bytes` walks the entire tree even when the cumulative size already exceeds the caller's limit. For a 100 GB repository with the default 1 GB cap, the CLI will stat every file before `find_oversized_local_targets` can reject it. Adding a `limit` parameter with an early-return makes the worst-case proportional to the limit, not the total size.
```suggestion
def directory_size_bytes(path: Path, limit: int = 0) -> int:
    """Total size in bytes of regular files under ``path`` (symlinks not followed).

    Best-effort: files that disappear or can't be stat'd mid-walk are skipped.
    Used as a cheap (stat-only) pre-flight to estimate the cost of streaming a
    local target into the sandbox before we actually try to copy it.

    Directories that can't be listed (e.g. permission denied) are logged and
    skipped rather than silently dropped, so an under-count is at least
    visible, but the returned total then excludes their contents.

    When ``limit`` is positive, the walk stops as soon as the running total
    exceeds it. The returned value is then only a lower bound on the true
    size, but because it already exceeds ``limit`` it is enough for an
    "is this too big?" pre-flight check.
    """

    def _on_walk_error(error: OSError) -> None:
        logger.warning("Could not read %s while measuring size: %s", error.filename, error)

    total = 0
    for root, _dirs, files in os.walk(path, followlinks=False, onerror=_on_walk_error):
        for name in files:
            file_path = os.path.join(root, name)  # noqa: PTH118
            try:
                if os.path.islink(file_path):  # noqa: PTH114
                    continue
                total += os.path.getsize(file_path)  # noqa: PTH202
            except OSError:
                continue
            # Early exit: once over the limit, the exact total no longer matters.
            if limit > 0 and total > limit:
                return total
    return total
```
### Issue 2 of 2
strix/interface/utils.py:1269-1270
Pass `max_bytes` as the early-exit `limit` so the walk stops as soon as it is clear the target is oversized, rather than always traversing the full tree.
```suggestion
size = directory_size_bytes(Path(target_path), limit=max_bytes)
if size > max_bytes:
```
Reviews (2): Last reviewed commit: "Update strix/runtime/docker_client.py"
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
@greptile
Local `--target` dirs are copied into the sandbox file-by-file, which stalls on large repos and can leave `/workspace` empty.

What's new:
- `--mount <path>` bind-mounts a local dir into the sandbox read-only instead of copying it, which is ideal for big monorepos.
- Non-mounted local targets larger than the size limit (`STRIX_MAX_LOCAL_COPY_MB`, default 1024) exit early and suggest `--mount`.

Tests and docs included. Fixes #492.