Sam Foreman's personal site.


🍋 ezpz: distributed PyTorch across any hardware

A history and overview of ezpz, with AMD and Intel PyTorch enablement timelines and why portable distributed training across GPU vendors is finally possible.

For most of PyTorch’s first decade, “running PyTorch” effectively meant “running PyTorch on NVIDIA”. Every distributed training script, every profiler, every example notebook assumed CUDA. If you wanted to run the same code on AMD or Intel hardware, you were either going to rewrite a launch script, port a kernel, or maintain a vendor-specific fork — often all three.

That picture has changed faster than most people realize. In the last two years, PyTorch gained native Intel GPU support, AMD shipped day-zero ROCm builds for every PyTorch release, and Intel’s out-of-tree extension is now finishing its phased shutdown.[1] You can write one PyTorch script today and run it across NVIDIA, AMD, and Intel hardware with no code changes — if you handle the launch / environment / device-init differences.

That last “if” is what ezpz exists to absorb. This post is mostly about how the vendor landscape got here, and a little about what that means for the launcher.

The two timelines

The clearest way to see the shift is side-by-side: AMD’s gradual ROCm-everywhere strategy, and Intel’s faster but later push to merge IPEX into upstream PyTorch.

Figure: Gantt chart, "AMD and Intel PyTorch Enablement Timeline" (axis: year).

AMD ROCm and PyTorch

  • 2012–2016: Torch7 era and early CUDA to HIP ports
  • 2016–2020: ROCm 1.0 and HIPIFY tooling
  • 2021–2022: Official PyTorch ROCm Python packages
  • 2022–2023: PyTorch Foundation governance participation
  • 2023–2024: Triton ecosystem support
  • 2024: MI300X PyTorch guidance

Intel and PyTorch

  • 2018–2019: Initial PyTorch contributions
  • 2020–2024: Intel Extension for PyTorch launch
  • 2022 (milestone): VTune ITT API integration in PyTorch
  • 2023 (milestone): PyTorch Foundation Premier membership
  • 2024 (milestone): Prototype native Intel GPU support
  • 2025 (milestone): Solid native Intel GPU support
  • 2025 (milestone): IPEX feature upstreaming completion
  • 2026 (critical milestone): Intel Extension for PyTorch end of life

Lining the AMD and Intel work up against the actual PyTorch release cadence is illuminating — most of the integration milestones land on specific PyTorch versions:

Figure: Gantt chart, "PyTorch Vendor Integration Timeline AMD vs Intel" (axis: date).

AMD

  • 2021-03-04: Installable PyTorch ROCm Python packages
  • 2022-06-28: ROCm marked stable

PyTorch Releases

  • 1.8: 2021-03-04
  • 1.12: 2022-06-28
  • 2.0: 2023-03-15
  • 2.4: 2024-07-24
  • 2.5: 2024-10-17
  • 2.6: 2025-01-29
  • 2.7: 2025-04-23
  • 2.8: 2025-08-06
  • 2.9: 2025-10-15
  • 2.10: 2026-01-15

Intel

  • 2024-07-24: Intel GPU improvements begin
  • 2024-10-17: Native Intel GPU support in 2.5
  • 2025-04-23: Intel GPU eager/compile parity in 2.7
  • 2025-08-06: Intel XCCL backend in 2.8
  • 2025-08-06 to 2026-03-31: IPEX discontinued
  • 2026-03-31 (critical milestone): IPEX end of life

Heads up: Intel’s separate IPEX project reaches end-of-life in March 2026 — by then, native PyTorch is the only supported path on Intel GPUs.

AMD: a long, quiet build-up

AMD’s path to first-class PyTorch support is a 14-year project that mostly happened out of view. The pre-history goes back to the Torch7 era — well before PyTorch existed in its current form — and it’s not an accident that ROCm landed on Caffe and Torch7 first. AMD was building the porting story (HIP, HIPIFY, the C++ dialect, the toolchain) on the previous generation of frameworks before the new one became production-default.

That patience paid off in three big jumps:

  • 2021 — installable wheels. Before March 2021, you couldn’t just pip install torch and get an AMD-compatible build. Once the ROCm Python packages went official, AMD became a one-line install on supported Linux systems — the same UX as CUDA. PyTorch 1.8 was the first release with that working out of the box.
  • 2022 — governance. AMD joined the PyTorch Foundation as a founding member when the project moved under the Linux Foundation. This was the point at which AMD’s integration stopped being “a vendor patch” and started being a co-owned roadmap.
  • 2023 — day-zero. With PyTorch 2.0, AMD shipped same-day ROCm support, including TorchDynamo / TorchInductor on AMD hardware. This was the first release where you could pick up a fresh PyTorch and have AMD work immediately — no lag, no porting window.

The rest of the timeline is filling in the corners: OpenAI Triton support arrived in 2023, MI300X guidance in mid-2024, native PyTorch on Windows for consumer Radeon cards in late 2025. The overall trajectory is clear: AMD is no longer playing catch-up on the framework. The remaining gaps are about specific kernels, FlashAttention variants, custom collectives — work that lives in extensions, not in PyTorch itself.

Intel: a much faster, much later push

Intel’s story is compressed into a much shorter window — basically four years vs AMD’s fourteen — because Intel arrived after the framework had already standardized. Instead of a slow, parallel ROCm-style stack, Intel went the out-of-tree extension route first (IPEX, 2020) and only started the upstream merge in earnest with PyTorch 2.4 in 2024.

The integration cadence has been remarkably tight:

  • 2.4 (Jul 2024) — first prototype native Intel GPU support
  • 2.5 (Oct 2024) — solid native Intel GPU support landed
  • 2.7 (Apr 2025) — eager + torch.compile parity on Intel GPUs
  • 2.8 (Aug 2025) — XCCL collective backend; IPEX active development ceases
  • 2.10 / Mar 2026 — IPEX project reaches end-of-life
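
One practical use of this list is gating features on the installed PyTorch version. The sketch below is a minimal illustration, not part of ezpz or PyTorch: `INTEL_GPU_MILESTONES` and both function names are hypothetical, and the version thresholds are simply copied from the bullets above.

```python
from __future__ import annotations

import re

# (minimum (major, minor), milestone) pairs, copied from the list above.
# Hypothetical table for illustration only.
INTEL_GPU_MILESTONES = [
    ((2, 4), "prototype native Intel GPU support"),
    ((2, 5), "solid native Intel GPU support"),
    ((2, 7), "eager + torch.compile parity"),
    ((2, 8), "XCCL collective backend"),
]


def parse_major_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from strings like '2.7.0+xpu' or '2.10.0.dev1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def intel_gpu_milestones(version: str) -> list[str]:
    """Return the milestones from the list that this version has reached."""
    installed = parse_major_minor(version)
    return [label for minimum, label in INTEL_GPU_MILESTONES if installed >= minimum]


if __name__ == "__main__":
    for v in ("2.3.1", "2.7.0+xpu", "2.10.0"):
        print(v, intel_gpu_milestones(v))
```

In a real script you would pass `torch.__version__`. Comparing `(major, minor)` tuples rather than raw strings is what keeps 2.10 sorted after 2.8.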

Notable to me: Intel chose to finish upstreaming before retiring the extension. The IPEX EOL date isn’t where the work stops — it’s where the redundancy stops. The features have already moved.

What this means in practice

If you’re writing a new training script today (early 2026), the boilerplate problem has shifted. Most of the heavy lifting used to go into:

  1. Picking the right torch.distributed backend (nccl, gloo, xccl, …; on ROCm builds, nccl is backed by RCCL).
  2. Knowing which environment variables your launcher expects on this particular cluster (MASTER_ADDR, WORLD_SIZE, LOCAL_RANK, PALS_*, PMI_*, OMPI_*, SLURM_*…).
  3. Handling per-vendor device init quirks (torch.cuda.set_device vs torch.xpu.set_device; ROCm builds reuse the torch.cuda namespace).
  4. Then, finally, the model code.
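
As a rough sketch of what steps 1 and 3 look like once they are written vendor-agnostically (an illustration under stated assumptions, not how ezpz implements it): the helper names and the `BACKEND_FOR_DEVICE` table are hypothetical, the `xccl` backend name assumes PyTorch 2.8 or newer, and the code falls back to `cpu` / `gloo` when no accelerator (or no PyTorch) is present.

```python
from __future__ import annotations

# Device type -> torch.distributed backend name (hypothetical lookup table).
# ROCm builds expose AMD GPUs through the "cuda" device type, and their
# "nccl" backend is backed by RCCL, so AMD needs no separate entry.
BACKEND_FOR_DEVICE = {"cuda": "nccl", "xpu": "xccl", "cpu": "gloo"}


def detect_device_type() -> str:
    """Return "cuda", "xpu", or "cpu" depending on what this process can see."""
    try:
        import torch
    except ImportError:
        return "cpu"  # no PyTorch installed; keeps the sketch runnable anywhere
    if torch.cuda.is_available():  # NVIDIA, or AMD via ROCm
        return "cuda"
    xpu = getattr(torch, "xpu", None)  # module is absent on older builds
    if xpu is not None and xpu.is_available():
        return "xpu"
    return "cpu"


def pick_backend(device_type: str) -> str:
    """Map a device type to a collective backend, defaulting to gloo."""
    return BACKEND_FOR_DEVICE.get(device_type, "gloo")


if __name__ == "__main__":
    device_type = detect_device_type()
    print(f"device={device_type} backend={pick_backend(device_type)}")
```

The returned backend string is what you would hand to `torch.distributed.init_process_group(backend=...)`. Keeping the mapping in a plain dictionary means a new device type only needs one extra entry.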

Steps 1–3 are now almost the same across vendors. The collective backends mostly map to the right thing automatically. The device abstraction is unified under torch.accelerator (in 2.7+). What’s left is mostly the launch boilerplate — which is what 🍋 ezpz takes care of:

  • ezpz launch figures out the launcher (mpiexec, srun, torchrun, deepspeed) from the environment.
  • ezpz_setup_* shell helpers normalize the rank/size variables across PBS / SLURM / standalone.
  • ezpz yeet distributes your environment to every node so you don’t pay the Lustre-import tax — covered in Running 50k Python Processes on Aurora.
  • The Python entry points stay vendor-agnostic; device init goes through one helper that picks cuda / xpu / hip based on what’s actually available.
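
To make the launcher and rank/size bullets concrete, here is a small standalone sketch. It is not ezpz's actual logic: `guess_launcher`, `normalize_ranks`, and the variable-name tuples are hypothetical, and the environment variables listed are the ones commonly exported by torchrun, Open MPI, PMI/PALS, and Slurm, so treat them as assumptions to verify on your own cluster.

```python
from __future__ import annotations

import os
from typing import Mapping

# Candidate variable names, checked in order (assumed, not exhaustive).
RANK_KEYS = ("RANK", "PMI_RANK", "PALS_RANKID", "OMPI_COMM_WORLD_RANK", "SLURM_PROCID")
SIZE_KEYS = ("WORLD_SIZE", "PMI_SIZE", "OMPI_COMM_WORLD_SIZE", "SLURM_NTASKS")
LOCAL_KEYS = ("LOCAL_RANK", "PALS_LOCAL_RANKID", "OMPI_COMM_WORLD_LOCAL_RANK", "SLURM_LOCALID")


def guess_launcher(env: Mapping[str, str]) -> str:
    """Pick a launcher from scheduler hints (a simple heuristic, nothing more)."""
    if "PBS_JOBID" in env:
        return "mpiexec"
    if "SLURM_JOB_ID" in env:
        return "srun"
    return "torchrun"  # standalone fallback


def _first_int(env: Mapping[str, str], keys: tuple[str, ...], default: int) -> int:
    """Return the first variable in `keys` that is set, parsed as an int."""
    for key in keys:
        if key in env:
            return int(env[key])
    return default


def normalize_ranks(env: Mapping[str, str]) -> dict[str, int]:
    """Collapse launcher-specific variables into RANK / WORLD_SIZE / LOCAL_RANK."""
    return {
        "RANK": _first_int(env, RANK_KEYS, 0),
        "WORLD_SIZE": _first_int(env, SIZE_KEYS, 1),
        "LOCAL_RANK": _first_int(env, LOCAL_KEYS, 0),
    }


if __name__ == "__main__":
    print(guess_launcher(os.environ), normalize_ranks(os.environ))
```

Passing the environment in as a mapping, rather than reading `os.environ` inside each function, keeps the helpers easy to test against a fake Slurm or PBS environment.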

The point isn’t that ezpz is doing anything magical — it’s that the framework finally caught up enough that a small, vendor-agnostic launcher can exist at all. Five years ago, this post would have been about writing per-vendor shims. Today it’s about deleting them.

Detailed timelines

For reference, the full chronology:

AMD

  • Pre-2021 — Torch7 era and CUDA→HIP ports. Torch7 was released in 2012 as a precursor to PyTorch (C++ + CUDA). With ROCm 1.0, AMD demonstrated CUDA→HIP conversion using HIPIFY, including ports of Caffe and Torch7.
  • March 2021 — PyTorch for AMD ROCm becomes officially available as a Python package on supported Linux systems.
  • September 2022 — PyTorch joins the Linux Foundation; AMD is a founding member of the PyTorch Foundation governing board.
  • April 2023 — AMD ships day-zero support for PyTorch 2.0 within the ROCm ecosystem, including TorchDynamo/TorchInductor.
  • 2023 — OpenAI Triton support extended to AMD GPUs.
  • June 2024 — MI300X PyTorch guidance published, with near drop-in compatibility for code written for NVIDIA GPUs.
  • October 2024 — How-to guide for Torchtune (PyTorch LLM fine-tuning library) on AMD GPUs.
  • September 2025 — Public preview of PyTorch on Windows for select consumer Radeon RX 7000/9000 series GPUs and Ryzen AI APUs (no WSL2 needed).
  • November 2025 — AMD Software: PyTorch on Windows Edition 7.1.1 with ROCm 7.1.1.
  • 2026 / post-2026 — MI450X rack-scale solution targeting NVIDIA high-end parity in H2 2026; MI500 series in development.

Intel

  • 2018 — Intel begins contributing to upstream PyTorch.
  • 2020 — Intel Extension for PyTorch (IPEX) launches as a separate package for Intel CPUs and GPUs.
  • October 2022[2] — PyTorch 1.13 ships with integrated Intel VTune ITT API support.
  • August 2023[3] — Intel joins the PyTorch Foundation as a Premier member.
  • July 2024 — PyTorch 2.4 with prototype native Intel GPU support (client + data center).
  • April 2025 — PyTorch 2.7 establishes solid Intel GPU support in both eager and graph modes (torch.compile) on Windows and Linux.
  • August 2025 — IPEX active development ceases following the PyTorch 2.8 release; most features are upstreamed.
  • End of March 2026 (planned) — IPEX reaches end-of-life. Use native PyTorch directly.

Footnotes

  1. Even now, in 2026, plenty of code is still NVIDIA-centric and is rarely designed with multi-platform support in mind — but the framework no longer is.

  2. PyTorch 1.13 release

  3. Intel Joins the PyTorch Foundation

sam.onl/posts/2026/01/10