
Support large target repos with bind-mount option. #577

Open
mhspektr wants to merge 10 commits into usestrix:main from mhspektr:feature/492-support-large-repos

Conversation

@mhspektr


Local --target dirs are copied into the sandbox file-by-file, which stalls on large repos and can leave /workspace empty.

What's new:

  • --mount <path> bind-mounts a local dir into the sandbox read-only instead of copying it — ideal for big monorepos.
  • A size pre-flight: oversized non-mounted targets (STRIX_MAX_LOCAL_COPY_MB, default 1024) exit early and suggest --mount.

Tests and docs included. Fixes #492.
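For readers skimming the description, the size pre-flight can be pictured roughly as below. This is only a sketch of the idea, not the code in this PR: `tree_size_bytes` and `oversized_targets` are hypothetical names, the `(path, is_mounted)` tuple shape is invented for illustration, and treating 1 MB as 1024 * 1024 bytes is an assumption.

```python
import os
from pathlib import Path

# Assumption: the cap is expressed in MiB (1024 * 1024 bytes).
BYTES_PER_MB = 1024 * 1024


def tree_size_bytes(root: Path) -> int:
    """Sum the sizes of regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirs, files in os.walk(root, followlinks=False):
        for name in files:
            file_path = Path(dirpath) / name
            try:
                if not file_path.is_symlink():
                    total += file_path.stat().st_size
            except OSError:
                # Best-effort: files that vanish mid-walk are ignored.
                continue
    return total


def oversized_targets(targets, max_mb):
    """Return (path, size) for every copied target larger than max_mb.

    targets is a list of (path, is_mounted) tuples (hypothetical shape).
    Bind-mounted targets are never measured, and a cap <= 0 disables
    the check entirely.
    """
    if max_mb <= 0:
        return []
    max_bytes = max_mb * BYTES_PER_MB
    too_big = []
    for path, is_mounted in targets:
        if is_mounted:
            continue
        size = tree_size_bytes(Path(path))
        if size > max_bytes:
            too_big.append((path, size))
    return too_big
```

A caller would abort with a message suggesting `--mount` whenever the returned list is non-empty.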

mhspektr added 8 commits June 18, 2026 22:45
- Change RuntimeError to TypeError for type validation in report/writer.py
- Update pyupgrade to v3.21.2 for Python 3.14 compatibility

Mirror the layout introduced on feature/438-token_budget: pytest +
pytest-asyncio dev deps, asyncio_mode auto, a tests.* mypy override, and
pytest in the mypy pre-commit hook deps so the tests/ package type-checks.

…ix#492)

Large local targets were copied into the sandbox file-by-file via the SDK
LocalDir entry, which stalls on big repos and could leave /workspace empty.

- --mount <path> bind-mounts a host directory read-only at /workspace/<subdir>
  instead of copying it, bypassing the per-file stream.
- A size pre-flight (STRIX_MAX_LOCAL_COPY_MB, default 1024) fails fast with a
  clear message suggesting --mount when a non-mounted local target is too big.

An empty or whitespace-only --mount value resolves to the current working
directory and would silently bind-mount it into the sandbox. Reject it.

If the same directory is passed via --target and --mount (or as duplicate
values), it previously produced two targets (copied AND bind-mounted), and
the copied one could trip the size pre-flight. Dedupe by resolved path,
preferring the bind mount.

Previously a value of 0 (or negative) made every local target count as
oversized, aborting all local scans. Now <= 0 disables the pre-flight.

os.walk silently swallowed directory-listing errors, so a permission-denied
subtree could make a large repo under-count and slip past the pre-flight.
Surface such omissions via an onerror warning.

Add CLI reference + example for --mount, document the size pre-flight env var,
note the read-only-is-not-a-hard-boundary caveat and that remote repos are not
size-checked, and clarify the backends docstring on when bind mounts apply.
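The empty-value rejection and the dedupe rule from the commits above could look roughly like this. It is a minimal sketch under assumptions, not the PR's `build_mount_targets_info` or `dedupe_local_targets`: the function names `resolve_mount` and `dedupe_targets` and the `{"path", "mount"}` dict shape are hypothetical.

```python
from pathlib import Path


def resolve_mount(value: str) -> Path:
    """Resolve a --mount value, refusing empty or whitespace-only input.

    Without this guard an empty string would resolve to the current
    working directory, which would then be bind-mounted silently.
    """
    if not value or not value.strip():
        raise ValueError("--mount requires a non-empty path")
    return Path(value.strip()).expanduser().resolve()


def dedupe_targets(targets):
    """Keep one entry per resolved path, preferring the bind mount.

    targets is a list of dicts with hypothetical keys "path" and "mount".
    The first occurrence fixes the output position; a later mounted
    duplicate replaces an earlier copied one in place.
    """
    by_path = {}
    for target in targets:
        key = Path(target["path"]).resolve()
        existing = by_path.get(key)
        if existing is None or (target["mount"] and not existing["mount"]):
            by_path[key] = target
    return list(by_path.values())
```

Keying on the resolved path means `.` and the absolute path of the same directory collapse into one target, so the copied duplicate never reaches the size pre-flight.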
@greptile-apps

greptile-apps Bot commented Jun 19, 2026

Contributor

Greptile Summary

This PR adds a --mount option that bind-mounts a local directory into the sandbox read-only instead of streaming it file-by-file, and a size pre-flight check (STRIX_MAX_LOCAL_COPY_MB, default 1024 MB) that rejects oversized non-mounted targets before the copy begins.

  • build_mount_targets_info / dedupe_local_targets / find_oversized_local_targets in utils.py handle CLI parsing, deduplication (mount wins over copy for the same path), and size enforcement.
  • build_session_entries in session_manager.py splits sources into SDK LocalDir manifest entries (copied) and host bind-mount specs passed to the Docker backend at container-create time via DockerSDKMount.
  • Test coverage is solid across all new helpers.
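As a rough picture of the copied-versus-mounted split described for build_session_entries, consider the sketch below. It deliberately avoids the real SDK types: `BindMount`, `split_sources`, and the `source_path` / `workspace_subdir` / `mount` keys are all hypothetical, and skipping incomplete entries is an assumption about the intended behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BindMount:
    """Hypothetical stand-in for a host bind-mount spec."""

    host_path: str
    container_path: str
    read_only: bool = True


def split_sources(sources, workspace="/workspace"):
    """Split local sources into copied entries and read-only bind mounts.

    Returns (copied, mounts): copied maps workspace subdir -> host path
    for targets that are streamed into the sandbox, and mounts lists the
    targets that are bind-mounted at <workspace>/<subdir> instead.
    """
    copied = {}
    mounts = []
    for source in sources:
        host_path = source.get("source_path")
        subdir = source.get("workspace_subdir")
        if not host_path or not subdir:
            # Assumption: incomplete entries are skipped, not fatal.
            continue
        if source.get("mount"):
            mounts.append(BindMount(host_path, f"{workspace}/{subdir}"))
        else:
            copied[subdir] = host_path
    return copied, mounts
```

Keeping the two outputs separate lets the copy path stay in the manifest while the mount specs are handed to the container backend at create time.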

Confidence Score: 5/5

Safe to merge. The bind-mount path is well-isolated and the size pre-flight has no effect on correctness — only on UX for large trees.

The core logic (manifest split, Docker mount injection, deduplication, path resolution) is correct and comprehensively tested. The one improvement opportunity — adding an early-exit to the directory-size walk — is a performance nicety, not a correctness issue.

strix/interface/utils.py (directory_size_bytes early-exit suggestion)

Important Files Changed

| Filename | Overview |
| --- | --- |
| strix/interface/utils.py | Adds new helpers directory_size_bytes, find_oversized_local_targets, build_mount_targets_info, and dedupe_local_targets, and extends collect_local_sources with a mount flag. Logic is correct; directory_size_bytes has no early-exit once the limit is exceeded. |
| strix/interface/main.py | Adds --mount CLI argument, wires it into argument validation (resume guard, required-args check), runs dedup then size pre-flight before parsing completes. Flow and error handling are correct. |
| strix/runtime/session_manager.py | Extracts build_session_entries to split local sources into copied manifest entries and bind mounts; passes bind_mounts to the backend. Logic is clean and well-tested. |
| strix/runtime/docker_client.py | Adds strix_bind_mounts class attribute and appends DockerSDKMount objects to create_kwargs before container creation. Correctly uses setdefault to coexist with manifest-declared mounts. |
| strix/runtime/backends.py | Adds bind_mounts parameter to _docker_backend and assigns it to client.strix_bind_mounts before create(). Clean and minimal change. |
| tests/test_local_sources.py | Comprehensive unit tests for directory_size_bytes, find_oversized_local_targets, build_mount_targets_info, dedupe_local_targets, and collect_local_sources. Good edge-case coverage including POSIX permissions and empty-path rejection. |
| tests/test_session_entries.py | Tests build_session_entries split logic for copied vs mounted sources; covers mixed, incomplete, and pure-mount inputs. |
| strix/config/settings.py | Adds max_local_copy_mb field to RuntimeSettings with sensible 1024 MB default and STRIX_MAX_LOCAL_COPY_MB alias. |
| strix/core/inputs.py | Appends ", read-only mount" suffix to the local-codebase task description when the target is bind-mounted; minor and correct. |
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
strix/interface/utils.py:1220-1245
`directory_size_bytes` walks the entire tree even when the cumulative size already exceeds the caller's limit. For a 100 GB repository with the default 1 GB cap, the CLI will stat every file before `find_oversized_local_targets` can reject it. Adding a `limit` parameter with an early-return makes the worst-case proportional to the limit, not the total size.

```suggestion
def directory_size_bytes(path: Path, limit: int = 0) -> int:
    """Total size in bytes of regular files under ``path`` (symlinks not followed).

    Best-effort: files that disappear or can't be stat'd mid-walk are skipped.
    Used as a cheap (stat-only) pre-flight to estimate the cost of streaming a
    local target into the sandbox before we actually try to copy it.

    Directories that can't be listed (e.g. permission denied) are logged and
    skipped rather than silently dropped — so an under-count is at least
    visible — but the returned total then excludes their contents.

    When ``limit`` is positive the walk stops as soon as the running total
    exceeds it. The returned value is then only a lower bound on the true
    size, which is fine for an "is this too big?" pre-flight check.
    """

    def _on_walk_error(error: OSError) -> None:
        logger.warning("Could not read %s while measuring size: %s", error.filename, error)

    total = 0
    for root, _dirs, files in os.walk(path, followlinks=False, onerror=_on_walk_error):
        for name in files:
            file_path = os.path.join(root, name)  # noqa: PTH118
            try:
                if os.path.islink(file_path):  # noqa: PTH114
                    continue
                total += os.path.getsize(file_path)  # noqa: PTH202
            except OSError:
                continue
        if limit > 0 and total > limit:
            return total
    return total
```

### Issue 2 of 2
strix/interface/utils.py:1269-1270
Pass `max_bytes` as the early-exit `limit` so the walk stops as soon as it is clear the target is oversized, rather than always traversing the full tree.

```suggestion
        size = directory_size_bytes(Path(target_path), limit=max_bytes)
        if size > max_bytes:
```
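To make the effect of the early exit concrete, here is a small self-contained demo. It re-implements the same walk under the hypothetical name `size_with_limit` (logging omitted) and measures a throwaway tree, so it is an illustration of the suggested behavior rather than the code under review.

```python
import os
import tempfile
from pathlib import Path


def size_with_limit(path: Path, limit: int = 0) -> int:
    """Stat-only size walk that stops once the running total exceeds limit."""
    total = 0
    for root, _dirs, files in os.walk(path, followlinks=False):
        for name in files:
            file_path = os.path.join(root, name)
            try:
                if os.path.islink(file_path):
                    continue
                total += os.path.getsize(file_path)
            except OSError:
                continue
        # Checked once per directory, so the walk may overshoot the limit
        # by at most the files of the directory it just finished.
        if limit > 0 and total > limit:
            return total
    return total


def demo() -> tuple[int, int]:
    """Build five 1000-byte files in separate dirs; measure with and without a limit."""
    with tempfile.TemporaryDirectory() as tmp:
        for i in range(5):
            sub = Path(tmp) / f"pkg{i}"
            sub.mkdir()
            (sub / "data.bin").write_bytes(b"\0" * 1000)
        full = size_with_limit(Path(tmp))
        limited = size_with_limit(Path(tmp), limit=1500)
        return full, limited
```

With a 1500-byte limit the walk returns after the second directory, so the limited result is a partial total that still exceeds the limit, which is all that the `size > max_bytes` comparison needs.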

Reviews (2): Last reviewed commit: "Update strix/runtime/docker_client.py"

Comment thread strix/interface/main.py Outdated
Comment thread strix/runtime/docker_client.py Outdated
mhspektr and others added 2 commits June 19, 2026 07:34
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
@bearsyankees

Collaborator

@greptile


Development

Successfully merging this pull request may close these issues.

[BUG] Local-code targets silently fail to copy when target dir is large (/workspace/ ends up empty)

2 participants