If your workstation or cloud instance has two or more NVIDIA GPUs, Build 5572 lets you pick exactly which one runs each ONNX session. Eight new flags — ONNX_GPU_DEVICE_0 through ONNX_GPU_DEVICE_7 — cover up to an 8-GPU rig. This article covers the two practical use cases for them: pinning for reproducibility, and load-distributing across multiple EAs.

The default behavior (no flag)

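If you pass ONNX_DEFAULT on a multi-GPU system, the runtime picks a device for you. That is usually device 0, but it is not guaranteed. On a single-GPU box this is fine. On a multi-GPU box, two problems appear: you cannot be sure which card a given session lands on, so timing and memory profiles are hard to compare across runs, and several EAs can end up sharing one card while another sits idle.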

Pinning a session to a specific device

// pin to GPU 1
ExtHandle = OnnxCreateFromBuffer(ExtModel, ONNX_GPU_DEVICE_1);

To verify, enable ONNX_LOGLEVEL_VERBOSE and check the Experts log: you'll see CUDA device 1 selected: <card name>. Alternatively, run nvidia-smi while the EA is running; the terminal64.exe process should show GPU memory usage on device 1.
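The verification step can be sketched as follows. This is a minimal illustration, not a confirmed recipe: it assumes the log-level flag can be combined with the device flag using bitwise OR, and the resource file name model.onnx is a placeholder that does not come from this article.

```mql5
// Sketch only. Assumptions: ONNX_LOGLEVEL_VERBOSE can be OR-ed with a device
// flag, and "model.onnx" is a placeholder name for your own model file.
#resource "model.onnx" as uchar ExtModel[]

long ExtHandle = INVALID_HANDLE;

int OnInit()
  {
   // Pin the session to GPU 1 and request verbose runtime logging.
   ulong flags = ONNX_GPU_DEVICE_1 | ONNX_LOGLEVEL_VERBOSE;
   ExtHandle = OnnxCreateFromBuffer(ExtModel, flags);

   // Stop the EA if the session could not be created.
   if(ExtHandle == INVALID_HANDLE)
     {
      Print("OnnxCreateFromBuffer failed, error ", GetLastError());
      return(INIT_FAILED);
     }
   return(INIT_SUCCEEDED);
  }

void OnDeinit(const int reason)
  {
   // Release the session so its GPU memory is freed when the EA is removed.
   if(ExtHandle != INVALID_HANDLE)
      OnnxRelease(ExtHandle);
  }
```

Once the device line shows up in the Experts log, the verbose flag can be dropped again to keep the log quiet.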

Use case 1: reproducibility on dual-GPU dev box

You have one workstation with two RTX 4070s. When you're debugging an EA's inference behavior, you want the session to land on the same physical card every time, so timing and memory profiles are comparable across runs.

// always device 0 for dev
#define DEV_GPU ONNX_GPU_DEVICE_0
ExtHandle = OnnxCreateFromBuffer(ExtModel, DEV_GPU);

Now every run uses the same card. If you upgrade one card to an RTX 4090, you can flip DEV_GPU to that card's flag (ONNX_GPU_DEVICE_1 if the 4090 is device 1) and ride the faster one.

Use case 2: load-distribute across multiple EAs on one host

You run four EAs simultaneously on a dual-GPU server. Two should land on device 0, two on device 1, to avoid memory contention and to use both cards.

// distribute by EA input parameter
input int InpGpuDevice = 0; // 0 or 1

int OnInit()
{
   // any value other than 1 falls back to device 0
   ulong gpu_flag = (InpGpuDevice == 1) ? ONNX_GPU_DEVICE_1 : ONNX_GPU_DEVICE_0;
   ExtHandle = OnnxCreateFromBuffer(ExtModel, gpu_flag);
   // ... rest of init ...
}

Set InpGpuDevice in each EA's input parameters at attach time. EAs 1–2 use device 0, EAs 3–4 use device 1.
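On a host with more than two cards, the same input-parameter approach could be extended to all eight flags. The sketch below is one possible way to do that. GpuFlagFromIndex is a hypothetical helper introduced here for illustration, and the resource name model.onnx is a placeholder. The helper rejects out-of-range values instead of silently falling back to device 0.

```mql5
// Sketch only. GpuFlagFromIndex is a hypothetical helper, not a built-in,
// and "model.onnx" is a placeholder name for your own model file.
#resource "model.onnx" as uchar ExtModel[]

input int InpGpuDevice = 0;   // GPU index, 0 to 7

long ExtHandle = INVALID_HANDLE;

// Maps a 0-7 index to the matching device flag.
// Returns false when the index is out of range.
bool GpuFlagFromIndex(const int index, ulong &flag)
  {
   switch(index)
     {
      case 0: flag = ONNX_GPU_DEVICE_0; return(true);
      case 1: flag = ONNX_GPU_DEVICE_1; return(true);
      case 2: flag = ONNX_GPU_DEVICE_2; return(true);
      case 3: flag = ONNX_GPU_DEVICE_3; return(true);
      case 4: flag = ONNX_GPU_DEVICE_4; return(true);
      case 5: flag = ONNX_GPU_DEVICE_5; return(true);
      case 6: flag = ONNX_GPU_DEVICE_6; return(true);
      case 7: flag = ONNX_GPU_DEVICE_7; return(true);
     }
   return(false);
  }

int OnInit()
  {
   ulong gpu_flag = 0;

   // Refuse to start on a bad input rather than guessing a device.
   if(!GpuFlagFromIndex(InpGpuDevice, gpu_flag))
     {
      Print("InpGpuDevice must be between 0 and 7, got ", InpGpuDevice);
      return(INIT_PARAMETERS_INCORRECT);
     }

   ExtHandle = OnnxCreateFromBuffer(ExtModel, gpu_flag);
   if(ExtHandle == INVALID_HANDLE)
     {
      Print("OnnxCreateFromBuffer failed, error ", GetLastError());
      return(INIT_FAILED);
     }
   return(INIT_SUCCEEDED);
  }

void OnDeinit(const int reason)
  {
   // Release the session when the EA is removed.
   if(ExtHandle != INVALID_HANDLE)
      OnnxRelease(ExtHandle);
  }
```

Failing fast on a bad index makes a mistyped input visible at attach time. The helper only checks the 0 to 7 range, so an index that is valid but higher than the number of cards installed would still need to be caught by the handle check.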

Edge cases and rules

One EA, one device — not two

The flags pin a session to a device. They do not split a single model across multiple GPUs (that's "model parallelism," and ONNX Runtime doesn't expose it through MQL5). Each EA gets one GPU. Use multiple GPUs to run multiple EAs in parallel, not to accelerate a single one.

If you genuinely need a model that's too big for one GPU, the answer is a bigger GPU — rent an A100 or H100 hourly. See the GPU cloud guide.