You created your ONNX session with ONNX_DEFAULT — meaning "use GPU if available." The EA loads without errors. Inference runs. The model produces predictions. Everything looks fine.
But how do you know the GPU is actually being used?
The default behavior of ONNX Runtime is to silently fall back to CPU when the CUDA execution provider can't handle something — an unsupported operator, a memory constraint, a missing kernel. The EA keeps running, just at CPU speed. Unless you check, you might be paying for GPU hardware that does nothing.
Three methods, from fastest to most thorough.
Method 1: nvidia-smi while EA runs (10 seconds)
The crudest but fastest check: while your EA is running, open a command prompt and run nvidia-smi.
If terminal64.exe appears in the process list with non-zero GPU memory, MT5 has loaded something onto the GPU. That's a strong signal, but not a definitive one. It shows the session was initialized on the GPU, while individual nodes of the model can still be assigned to the CPU.
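You can also script this check instead of reading the table by eye. Below is a minimal Python sketch. It assumes nvidia-smi is on PATH and uses its --query-compute-apps query. The helper names (parse_compute_apps, mt5_gpu_memory, query_compute_apps) are hypothetical. On Windows the driver may report per-process memory as [N/A]. The sketch keeps that as None rather than treating it as zero.

```python
import subprocess


def parse_compute_apps(csv_text):
    """Parse output of
    nvidia-smi --query-compute-apps=process_name,used_memory --format=csv,noheader,nounits
    into (process_name, mib) pairs; mib is None when no number is reported (e.g. "[N/A]")."""
    rows = []
    for line in csv_text.splitlines():
        if not line.strip():
            continue
        # Split on the last comma: the process path comes first, memory last.
        name, _, mem = line.rpartition(",")
        mem = mem.strip()
        rows.append((name.strip(), int(mem) if mem.isdigit() else None))
    return rows


def mt5_gpu_memory(rows, exe="terminal64.exe"):
    """Total MiB held by MT5 processes: 0 if MT5 is not listed,
    None if it is listed but the driver reports no memory figure."""
    matches = [mib for name, mib in rows if name.lower().endswith(exe)]
    if not matches:
        return 0
    if all(mib is None for mib in matches):
        return None
    return sum(mib for mib in matches if mib is not None)


def query_compute_apps():
    """Run nvidia-smi (must be on PATH) and parse its compute-process list."""
    result = subprocess.run(
        ["nvidia-smi", "--query-compute-apps=process_name,used_memory",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_compute_apps(result.stdout)
```

Call mt5_gpu_memory(query_compute_apps()) while the EA is running. A positive result is the same signal as seeing terminal64.exe in the table, with the same caveat.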
Better: watch in real time
Run nvidia-smi -l 1, which refreshes the output every second. Then trigger an inference and watch GPU utilization spike. If utilization stays at 0% across a steady stream of inferences, CPU fallback is almost certainly occurring.
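The same watch can be automated. This Python sketch runs the nvidia-smi loop for a fixed number of one-second samples and reports whether utilization ever rose above zero. It assumes a single GPU at index 0; change gpu_index on multi-GPU machines. The function names are hypothetical.

```python
import subprocess


def parse_utilization(lines):
    """Turn nvidia-smi utilization.gpu lines (nounits) into ints, skipping blanks and [N/A]."""
    values = []
    for line in lines:
        text = line.strip()
        if text.isdigit():
            values.append(int(text))
    return values


def verdict(samples):
    """Summarize utilization samples taken while inferences were running."""
    if not samples:
        return "no samples"
    return "GPU active" if max(samples) > 0 else "possible CPU fallback"


def sample_utilization(n_samples=10, gpu_index=0):
    """Collect about one utilization sample per second from one GPU (nvidia-smi on PATH)."""
    cmd = ["nvidia-smi", "-i", str(gpu_index), "--query-gpu=utilization.gpu",
           "--format=csv,noheader,nounits", "-l", "1"]
    lines = []
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        try:
            for line in proc.stdout:
                lines.append(line)
                if len(lines) >= n_samples:
                    break
        finally:
            proc.terminate()  # "-l 1" loops forever, so stop it explicitly
    return parse_utilization(lines)
```

Start verdict(sample_utilization()) while the EA is actively running inferences. A result of "possible CPU fallback" is your cue to move on to the methods below.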
Method 2: verbose logging (one line of code)
Add ONNX_LOGLEVEL_VERBOSE to the flags you pass to OnnxCreate, for example ONNX_DEFAULT | ONNX_LOGLEVEL_VERBOSE.
On session creation, the Experts log will fill with the runtime's internal decisions. Two lines to look for:
If you see the second pattern, GPU initialization failed. Check the lines above it for the underlying reason, usually a GPU compute capability that is too low or a driver issue.
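The provider lines are easy to lose among hundreds of verbose entries. The small Python filter below pulls out every line that names an execution provider. It makes two assumptions. First, the verbose output spells providers as CUDAExecutionProvider and CPUExecutionProvider, the same names the profiling JSON uses. Second, the MT5 log file is UTF-16 encoded. The helper names are hypothetical.

```python
from pathlib import Path

PROVIDERS = ("CUDAExecutionProvider", "CPUExecutionProvider")


def read_log(path, encoding="utf-16"):
    """Read an MT5 log file. Assumption: UTF-16 with a BOM; pass another encoding if needed."""
    return Path(path).read_text(encoding=encoding, errors="replace")


def provider_lines(log_text, providers=PROVIDERS):
    """Map each provider name to the log lines that mention it, in file order."""
    found = {name: [] for name in providers}
    for line in log_text.splitlines():
        for name in providers:
            if name in line:
                found[name].append(line.strip())
    return found
```

After a verbose run, provider_lines(read_log(path)) shows the entries that matter without scrolling through the rest. Here path is whichever log file you want to inspect.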
Important: remove ONNX_LOGLEVEL_VERBOSE for production. It writes hundreds of lines per session. Keep it only during integration.
Method 3: profiling JSON (definitive)
The verbose log tells you which execution provider was added. The profiling JSON tells you which execution provider actually ran each node. It is the only way to confirm that execution happened on the GPU node by node, not just at the session level.
Run the EA for a minute, then stop it. The runtime writes a JSON to MQL5\Files\OnnxProfileReports\<EA name>_<date>_<time>.json. Open it in any text editor and search for "args":{"provider":. Each occurrence is a node execution — tagged with either "CUDAExecutionProvider" or "CPUExecutionProvider".
If all nodes are CUDAExecutionProvider: clean GPU execution. Optimal.
If most are CUDA, some are CPU: partial fallback, with the Memcpy overhead we covered in the Memcpy nodes article. Often acceptable.
If all are CPU: total fallback. The GPU isn't being used. Diagnose below.
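Searching by hand works for small models. For larger ones, a short Python sketch can tally the providers for you. It assumes the report uses ONNX Runtime's trace format: a JSON array of events, with the provider stored under args.provider on node events. The function names and classification labels are illustrative; the labels mirror the three cases above.

```python
import json
from collections import Counter

CUDA = "CUDAExecutionProvider"
CPU = "CPUExecutionProvider"


def node_events(profile_text):
    """Yield profiling events that carry an execution provider in their args."""
    for event in json.loads(profile_text):
        args = event.get("args") or {}
        if "provider" in args:
            yield event


def provider_counts(profile_text):
    """Count node executions per execution provider."""
    return Counter(e["args"]["provider"] for e in node_events(profile_text))


def cpu_ops(profile_text):
    """Count operator types that ran on the CPU provider ("?" if op_name is missing)."""
    return Counter(e["args"].get("op_name", "?") for e in node_events(profile_text)
                   if e["args"]["provider"] == CPU)


def classify(counts):
    """Map provider counts onto the three outcomes: clean, partial, total fallback."""
    cuda, cpu = counts.get(CUDA, 0), counts.get(CPU, 0)
    if cuda == 0 and cpu == 0:
        return "no node events found"
    if cpu == 0:
        return "clean GPU execution"
    if cuda == 0:
        return "total CPU fallback"
    return f"partial fallback ({cpu / (cuda + cpu):.0%} of executions on CPU)"
```

Pass it the text of the report file, for example read with Path(...).read_text(encoding="utf-8"). In the partial and total cases, cpu_ops lists which operator types ran on the CPU. That is a useful starting point for the diagnosis that follows.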
If CPU fallback is happening, why?
Three causes, in order of frequency:
- Your GPU is below Turing (compute capability 7.5). This is the most common cause. Diagnosis: nvidia-smi --query-gpu=name,compute_cap --format=csv. Fix: see the CUBLAS error guide.
- You set ONNX_USE_CPU_ONLY explicitly somewhere, perhaps in older code you copied. Search your .mq5 for the flag and remove it from OnnxCreate and OnnxRun if present.
- The model has operators the CUDA execution provider doesn't support. The runtime places those nodes on the CPU execution provider without raising an error. Depending on how many there are, the result ranges from partial to near-total fallback. Verbose logging will show the unsupported op. Fix: re-export with simpler ops, or use onnx-simplifier.
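The first two checks can be scripted. The Python sketch below parses the output of the nvidia-smi compute-capability query shown above and flags any GPU under compute capability 7.5. It then scans .mq5 source text for ONNX_USE_CPU_ONLY. The helper names are hypothetical. The scan is a plain text match, so it also reports the flag inside comments.

```python
import re

TURING = (7, 5)  # minimum compute capability, per the first cause above
CPU_ONLY = re.compile(r"\bONNX_USE_CPU_ONLY\b")


def parse_compute_caps(csv_text):
    """Parse `nvidia-smi --query-gpu=name,compute_cap --format=csv` output
    into (gpu_name, (major, minor)) pairs; the first line is the CSV header."""
    gpus = []
    for line in csv_text.splitlines()[1:]:
        if not line.strip():
            continue
        name, _, cap = line.rpartition(",")
        major, _, minor = cap.strip().partition(".")
        gpus.append((name.strip(), (int(major), int(minor or 0))))
    return gpus


def below_turing(gpus):
    """Names of GPUs whose compute capability is lower than 7.5."""
    return [name for name, cap in gpus if cap < TURING]


def cpu_only_lines(mq5_source):
    """1-based line numbers in an .mq5 source that mention ONNX_USE_CPU_ONLY."""
    return [i for i, line in enumerate(mq5_source.splitlines(), 1)
            if CPU_ONLY.search(line)]
```

An empty result from both below_turing and cpu_only_lines points toward the third cause, unsupported operators.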