[AMD] Add MiniMax-M3-FP4 MI355X ATOMESH update 0623#1940

Open
seungrokj wants to merge 10 commits into main from amd/m3_atom_pd_fp4_0623

@seungrokj

Collaborator

Summary

  • Add minimaxm3-fp4-mi355x-atom-disagg CI recipe: multi-node disaggregated PD on MI355X via ATOM for MiniMax-M3-MXFP4
  • Refactor server_atom.sh to eliminate all hardcoded MODEL_NAME == "DeepSeek-V4-Pro" / per-model checks — all model-specific config (env vars, parallel flags, MTP flags, KV cache flags, HF overrides) now driven from models_atom.yaml, matching the server_vllm.sh pattern
  • Update models_atom.yaml schema with new fields for env, tp_dp_flags, tp_dp_env, ep_dp_flags, ep_dp_env, mtp_flags, kv_cache_flags, hf_overrides; add entries for MiniMax-M3-MXFP4 and MiniMax-M3-MXFP8 with EAGLE3 MTP flags
  • Fix model HuggingFace path: amd/MiniMax-M3-MXFP8 → MiniMaxAI/MiniMax-M3-MXFP8 in minimaxm3-fp8-mi355x-atom-disagg
  • Image bump for both FP4 and FP8 MI355X ATOM recipes: rocm/atom-dev:MiniMax-M3-20260623

Fields added to models_atom.yaml

| Field | Purpose |
| --- | --- |
| env | Space-separated KEY=VALUE pairs exported unconditionally |
| tp_dp_flags | Parallel flags for TP+DPA mode |
| tp_dp_env | Env vars exported only in TP+DPA mode |
| ep_dp_flags | Parallel flags for EP+DPA mode |
| ep_dp_env | Env vars exported only in EP+DPA mode |
| mtp_flags | Flags prepended to SPEC_ARGS before $DECODE_MTP_SIZE |
| kv_cache_flags | Full --kv_cache_dtype flag string |
| hf_overrides | JSON string passed to --hf-overrides |
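
As a rough illustration of how a launch script might consume these fields, here is a minimal sketch. It is not the code in server_atom.sh: the helper names (export_pairs, select_parallel_mode), the variable names, and every DEMO_ value and --demo- flag are hypothetical stand-ins for one models_atom.yaml entry, and the way SPEC_ARGS is assembled is only one reading of the mtp_flags description.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: helper and variable names are illustrative only.

# Export every KEY=VALUE token of a space-separated string.
export_pairs() {
  local pair
  for pair in $1; do   # unquoted on purpose: split on whitespace
    export "$pair"
  done
}

# Pick the parallel flags and mode-specific env for the active mode.
# $1 is "true" for EP+DPA; any other value selects TP+DPA.
select_parallel_mode() {
  if [ "$1" = "true" ]; then
    PARALLEL_FLAGS="$EP_DP_FLAGS"
    export_pairs "$EP_DP_ENV"
  else
    PARALLEL_FLAGS="$TP_DP_FLAGS"
    export_pairs "$TP_DP_ENV"
  fi
}

# Made-up values standing in for one model entry.
MODEL_ENV="DEMO_A=1 DEMO_B=two"   # env
TP_DP_FLAGS="--demo-tp-flag"      # tp_dp_flags
TP_DP_ENV="DEMO_TP=1"             # tp_dp_env
EP_DP_FLAGS="--demo-ep-flag"      # ep_dp_flags
EP_DP_ENV="DEMO_EP=1"             # ep_dp_env
MTP_FLAGS="--demo-spec-flag"      # mtp_flags
DECODE_MTP_SIZE=3

export_pairs "$MODEL_ENV"         # exported unconditionally
select_parallel_mode "true"       # EP+DPA in this example
SPEC_ARGS="$MTP_FLAGS $DECODE_MTP_SIZE"

echo "parallel flags: $PARALLEL_FLAGS"
echo "spec args:      $SPEC_ARGS"
```

Keeping the mode-specific env separate from the unconditional env means a TP+DPA run never sees variables that only make sense for EP+DPA, which is the split the table describes.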

minimaxm3-fp4-mi355x-atom-disagg Recipe Details

  • Image: rocm/atom-dev:MiniMax-M3-20260623
  • Model: amd/MiniMax-M3-MXFP4
  • Framework: atom-disagg, multi-node disaggregated PD
  • Search space: ISL=8192 and ISL=1024, OSL=1024, 1P1D TP4, conc 1–512

PR Review Checklist

  • Verified that as of the moment of typing this, this is the latest version of PR_REVIEW_CHECKLIST.md
  • Verified that the general code quality meets the InferenceX standard and does not make the code quality any worse.
  • Verified that this PR has passed PR validation. Please link to GitHub Action workflow that shows this.
  • Verified that this PR passes evals. Please link to GitHub Action workflow that shows this.
  • Verified that speculative decoding PRs use chat templates to align the AL distribution with real-world usage
  • If a company claims that they support vLLM/SGLang as first class LLM inference engines on their hardware, I have verified that the respective vLLM/SGLang submission has been made before additional frameworks (TRT-LLM, ATOM, etc.). The only exceptions are for new hardware, such as MI455X UALoE72, Vera Rubin NVL72, Rubin NVL8, etc., and for new model architectures where there is an actual reason why vLLM/SGLang does not fundamentally support them yet.
  • Verified that the single-node recipes are similar to the official vLLM recipes and/or the SGLang cookbook:
    • If they are not, I have verified that a PR has been opened in vLLM recipe repo or SGLang repo and linked it below in the additional detail section:
  • If any of the above criteria cannot reasonably be satisfied, I have provided additional reasoning below.

🤖 Generated with Claude Code

@github-actions

Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.


seungrokj added a commit that referenced this pull request Jun 26, 2026
…nd server_atom.sh refactor (PR #1940)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
seungrokj changed the title from "[AMD] Add MiniMax-M3-FP4 MI355X ATOM disagg + refactor server_atom.sh for YAML-driven model config" to "[AMD] Add MiniMax-M3-FP4 MI355X ATOMESH update 0623" Jun 26, 2026
seungrokj added the evals-only (Suppress throughput and run only eval jobs; combine with all-evals to expand selection) and full-sweep-enabled labels Jun 26, 2026
Comment thread benchmarks/multi_node/amd_utils/models_atom.yaml
Comment thread benchmarks/multi_node/amd_utils/server_atom.sh Outdated
Comment thread benchmarks/multi_node/amd_utils/server_atom.sh

@functionstackx

Collaborator

@seungrokj plz rebase this PR now that 9f02343 is merged to master, so that we can avoid delays from resolving conflicts after u do a full perf sweep

seungrokj and others added 5 commits June 26, 2026 12:05
… ATOM config; add minimaxm3-fp4-mi355x-atom-disagg

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…nd server_atom.sh refactor (PR #1940)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…sagg launch script

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…, SPEC_DECODING guard

- Replace fragile eval "$(python3 -c "...")" with heredoc + source tempfile to
  avoid nested quote escaping issues that caused MODEL_ENVS to be empty at runtime
- Fix PREFILL/DECODE_ENABLE_EP comparison from numeric -gt 1 to string = "true"
  to match the "true"/"false" values set by launch scripts
- Fix SPEC_DECODING guard from hardcoded "mtp" to any non-none/non-empty value
  so EAGLE3 and future methods also activate SPEC_ARGS from models_atom.yaml
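
The three fixes in this commit message can be sketched roughly as follows. This is an illustration under assumptions, not the actual server_atom.sh code: the function names are hypothetical, and the Python stub prints a single made-up MODEL_ENVS value where a real script would read models_atom.yaml.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; helper names and emitted values are illustrative only.

# 1. Heredoc + sourced tempfile instead of eval "$(python3 -c "...")".
#    The quoted 'PY' delimiter stops the shell from expanding anything in the
#    Python body, so no nested quote escaping is needed.
load_model_env() {
  local tmp
  tmp="$(mktemp)"
  python3 - "$1" > "$tmp" <<'PY'
import shlex
import sys

model = sys.argv[1]
# A real script would look the model up in models_atom.yaml here.
print("MODEL_ENVS=" + shlex.quote("DEMO_MODEL=" + model))
PY
  . "$tmp"      # assignments in the tempfile land in the calling shell
  rm -f "$tmp"
}

# 2. The EP switch carries the strings "true"/"false", so compare it as a
#    string; a numeric test like -gt 1 does not succeed for "true".
ep_enabled() {
  [ "$1" = "true" ]
}

# 3. Any non-empty value other than "none" enables speculative decoding,
#    so methods besides "mtp" are also accepted.
spec_decoding_enabled() {
  [ -n "$1" ] && [ "$1" != "none" ]
}

# Demo; the python3 step is skipped when no interpreter is installed.
if command -v python3 >/dev/null 2>&1; then
  load_model_env "MiniMax-M3-MXFP4"
  echo "MODEL_ENVS=$MODEL_ENVS"
fi

for method in mtp eagle3 none ""; do
  if spec_decoding_enabled "$method"; then
    echo "SPEC_DECODING='$method' -> spec args enabled"
  else
    echo "SPEC_DECODING='$method' -> spec args disabled"
  fi
done
```

Writing the generated assignments to a file before sourcing them also leaves something to inspect when a variable such as MODEL_ENVS unexpectedly comes out empty.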

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ewline in models_atom.yaml

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
seungrokj force-pushed the amd/m3_atom_pd_fp4_0623 branch from baaf92b to 21e9281 June 26, 2026 03:06

…niMax-M3 ATOM recipes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ages to 20260623

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…agg image to 20260622

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Labels

AMD, evals-only (Suppress throughput and run only eval jobs; combine with all-evals to expand selection)

2 participants