On January 30, 2026, MetaQuotes shipped MetaTrader 5 Build 5572 — the first stable release that exposes CUDA-accelerated ONNX inference directly inside MQL5. For developers running PyTorch or Keras models against tick data in an Expert Advisor, this is the most consequential platform update since native ONNX support landed in 2023.

But the release notes leave a lot unsaid. Which GPUs actually work. Which flags were renamed (or quietly removed). Where the profiling JSON lands. What happens when OnnxRun silently falls back to CPU. This article walks through every change in Build 5572 that touches the ONNX subsystem, with the gotchas the notes don't mention.

The headline change, in one paragraph

Before Build 5572, every OnnxRun call was a CPU operation. The runtime existed inside the MT5 process, but it had no path to the GPU. As of January 30, 2026, MQL5 ships with a CUDA-enabled ONNX Runtime binary, picks up your NVIDIA driver, and routes the compatible nodes of your graph to the GPU automatically. For a transformer-sized model or a long-context LSTM, the speedup is roughly an order of magnitude. For a small gradient-boosted tree, the speedup is approximately nothing — the data-transfer overhead eats the gain. The point isn't that GPU is always faster; the point is that you finally have the choice.

Flags that were renamed, added, removed

The ENUM_ONNX_FLAGS enum was reshuffled in Build 5572. If you copy a snippet from an article written before January 2026, it may not compile. Here's the full delta:

Renamed

Added

Unchanged but worth re-reading

flag cheat-sheet — Build 5572
// CPU / GPU selection
ONNX_DEFAULT              // pick GPU if available, else CPU
ONNX_USE_CPU_ONLY         // force CPU (was: ONNX_CUDA_DISABLE)
ONNX_GPU_DEVICE_0 ... _7  // pick a specific CUDA device
// Logging
ONNX_LOGLEVEL_VERBOSE     // every line (use during integration)
ONNX_LOGLEVEL_WARNING     // warnings + errors
ONNX_LOGLEVEL_ERROR       // errors only (use in production)
// Profiling
ONNX_ENABLE_PROFILING     // dumps JSON to /Files/OnnxProfileReports/
// Runtime hint
ONNX_NO_CONVERSION        // skip auto type conversion (faster if types match)

Which GPUs work — and which throw on load

This is the section that catches people. Build 5572's CUDA backend is compiled against a specific minimum compute capability, and any card below that floor will fail to load the ONNX session with a hard error.

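The floor is the Turing architecture (compute capability 7.5). That means GTX 16-series and RTX 20-series cards, and anything newer, clear the bar, while Pascal (GTX 10-series) and older cards do not.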

If your card is below Turing, the session creation throws a very specific error: CUBLAS failure 8: CUBLAS_STATUS_ARCH_MISMATCH. The error message itself reads cryptically, but it always means the same thing: your GPU's compute capability is below 7.5 (Turing) and the cuBLAS shipped with MT5 refuses to initialize. We documented the full diagnosis and fix in a dedicated error page.
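You can check a machine against that floor before MT5 ever tries to create a session. The sketch below is in Python and assumes a driver recent enough that `nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader` is available; it only parses text in that shape and compares it to 7.5. The function names are ours, not part of any MetaQuotes or NVIDIA tooling.

```python
import csv
import io

# Floor stated in this article: Turing, compute capability 7.5.
MIN_COMPUTE_CAPABILITY = (7, 5)


def parse_gpu_rows(csv_text):
    """Parse 'name, compute_cap' rows into (name, (major, minor)) tuples."""
    gpus = []
    for row in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        name, cap = row[0].strip(), row[1].strip()
        major, _, minor = cap.partition(".")
        gpus.append((name, (int(major), int(minor or 0))))
    return gpus


def meets_floor(capability, floor=MIN_COMPUTE_CAPABILITY):
    """Tuple comparison, so (8, 6) passes and (6, 1) or (7, 0) does not."""
    return capability >= floor


# Sample text in the shape nvidia-smi prints for the query above.
sample = "NVIDIA GeForce GTX 1080 Ti, 6.1\nNVIDIA GeForce RTX 3060, 8.6\n"
for name, cap in parse_gpu_rows(sample):
    verdict = "ok" if meets_floor(cap) else "below the 7.5 floor"
    print(f"{name}: compute capability {cap[0]}.{cap[1]} -> {verdict}")
```

To use it on a real machine, feed the command's output into `parse_gpu_rows` instead of the sample string. Comparing tuples rather than floats avoids surprises with two-digit minor versions.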

Heads up

"Has an NVIDIA driver installed" is not enough. The driver can be current, the card can show up in nvidia-smi, and the ONNX session can still fail with the architecture error. The constraint is on the silicon, not the driver.

The new profiling JSON

Adding ONNX_ENABLE_PROFILING to the flags you pass to OnnxCreate turns on session-level profiling. When the EA stops (or you call OnnxRelease), a JSON file is written to /MQL5/Files/OnnxProfileReports/. The filename pattern is <EA name>_<date>_<time>.json — one report per session.
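Since every session leaves its own file, the folder fills up quickly. Here is a minimal Python sketch for picking the newest report for one EA. It relies only on the EA-name prefix described above and on file modification times, because the exact date and time format in the filename is not something we want to guess at. The function name is hypothetical.

```python
import glob
from pathlib import Path


def latest_profile_report(reports_dir, ea_name):
    """Return the most recently modified '<ea_name>_*.json' file, or None."""
    # glob.escape guards against EA names containing characters like '['.
    pattern = f"{glob.escape(ea_name)}_*.json"
    candidates = list(Path(reports_dir).glob(pattern))
    if not candidates:
        return None
    # Sort by modification time instead of parsing the date in the name.
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

Point `reports_dir` at the `MQL5/Files/OnnxProfileReports` folder inside your terminal's data directory.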

The file is structured as an array of events, in the same format used by Chrome's chrome://tracing. Each node in your graph gets a row showing the execution provider it ran on (CUDAExecutionProvider or CPUExecutionProvider), the time it took, and where it sits in the timeline. Two practical uses:

  1. Confirming the GPU is actually being used. If you wanted CUDA but every row says CPUExecutionProvider, the runtime fell back. Common cause: an unsupported op in the model graph.
  2. Finding the bottleneck. Long rows are slow nodes. If a single op is eating 60% of inference time, it's usually a candidate for replacement (e.g. an LSTM cell that could be a GRU, or a layer that could be folded into its neighbor at export time).

We walk through reading the JSON line-by-line in the profiling guide.
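Both checks can be scripted. The Python sketch below totals node time per execution provider and lists the slowest nodes. It assumes the report follows ONNX Runtime's usual trace layout, where each node event carries `dur` in microseconds and an `args.provider` field; if the file MT5 writes differs, adjust the key names. The three-event sample is made up so the code runs as-is.

```python
import json
from collections import defaultdict


def summarize_profile(events):
    """Aggregate per-node durations by execution provider.

    Assumed layout: each node event is a dict with 'dur' (microseconds),
    'name', and args['provider']. Events without a provider are skipped.
    """
    by_provider = defaultdict(int)
    nodes = []
    for ev in events:
        args = ev.get("args") or {}
        provider = args.get("provider")
        if provider is None or "dur" not in ev:
            continue  # session-level events carry no provider
        by_provider[provider] += ev["dur"]
        nodes.append((ev["dur"], ev.get("name", "?"), provider))
    total = sum(by_provider.values())
    slowest = sorted(nodes, reverse=True)[:5]
    return dict(by_provider), total, slowest


# Hypothetical three-event report, inlined so the sketch is self-contained.
sample = json.loads("""[
  {"cat": "Session", "name": "model_run", "dur": 900, "args": {}},
  {"cat": "Node", "name": "lstm_1_kernel_time", "dur": 600,
   "args": {"provider": "CPUExecutionProvider", "op_name": "LSTM"}},
  {"cat": "Node", "name": "dense_1_kernel_time", "dur": 200,
   "args": {"provider": "CUDAExecutionProvider", "op_name": "MatMul"}}
]""")

by_provider, total, slowest = summarize_profile(sample)
for provider, dur in by_provider.items():
    print(f"{provider}: {dur} us ({100 * dur / total:.0f}% of node time)")
for dur, name, provider in slowest:
    print(f"{dur:>8} us  {name}  [{provider}]")
```

For a real report, replace the inline sample with `json.load(open(path))`. If `CUDAExecutionProvider` is missing from the first list entirely, the session fell back to CPU.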

The 1 GB embedded-model limit

MQL5 lets you embed an .onnx file as a resource of the compiled EA, using the #resource directive. In Build 5572, the per-resource size cap rose to 1 GB. For practical purposes, that ceiling is gone — a 1 GB model is enormous for any trading workload, large enough to embed BERT-scale transformers without external file management.

The implication is workflow rather than capability: you can now ship a self-contained .ex5 compiled binary that includes the entire model, with no separate file to copy to /MQL5/Files/ on the production VPS. This matters for prop-firm setups where you may not have full filesystem access on the broker's terminal.

One catch

If you use #resource, you have to recompile the EA every time the model changes. For active development, loading the model from /MQL5/Files/ at runtime is more iterative. For production, embed it.

Why your first run is slow now

Up to Build 5571, the ONNX Runtime DLL shipped inside the MT5 installer. Starting with Build 5572, the runtime is fetched on demand — the first time an EA on the machine calls any ONNX function, the platform downloads the appropriate runtime binary. The advantage is that MetaQuotes can push runtime updates (security fixes, new operator support) without releasing a full platform update.

The practical consequence: your first OnnxCreate on a fresh MT5 install may stall for 5–20 seconds while the runtime downloads. After that, the binary is cached and subsequent calls are instant. If you're deploying to a new VPS, run a smoke-test EA once before the live one starts. Otherwise the live EA's OnInit() may stall or time out while the download completes, and the EA will miss its first ticks.

What to change in your code today

If you have an existing ONNX-using EA from 2024 or 2025, here's the minimum migration:

  1. Replace ONNX_CUDA_DISABLE with ONNX_USE_CPU_ONLY if you were using it. Code that still references the old name will not compile in 5572.
  2. Replace ONNX_DEBUG_LOGS with ONNX_LOGLEVEL_* at the desired verbosity. Production EAs should use ONNX_LOGLEVEL_WARNING (or ONNX_LOGLEVEL_ERROR if you only want errors) to keep the Experts log clean.
  3. Decide your default: GPU or CPU. If your model is small (gradient-boosted trees, small MLPs), the CPU is faster after warm-up — add ONNX_USE_CPU_ONLY and stop worrying about GPU compatibility. If your model has any sequence layers (LSTM, GRU, transformer), benchmark both with ONNX_ENABLE_PROFILING before deciding.
  4. On a multi-GPU box, pin the device explicitly. ONNX_GPU_DEVICE_0 beats relying on the runtime's auto-pick — it makes the choice reproducible across restarts.
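Steps 1 and 2 are mechanical enough to script across a folder of .mq5 files. The Python sketch below rewrites the two old flag names and reports each hit with its line number. The mapping of ONNX_DEBUG_LOGS to ONNX_LOGLEVEL_WARNING is only a placeholder default, since the right verbosity is your call, and the helper name is ours.

```python
import re

# Old name -> replacement, as listed in the migration steps.
# ONNX_DEBUG_LOGS has no single successor; WARNING is a placeholder choice.
RENAMES = {
    "ONNX_CUDA_DISABLE": "ONNX_USE_CPU_ONLY",
    "ONNX_DEBUG_LOGS": "ONNX_LOGLEVEL_WARNING",
}

# \b keeps longer identifiers such as MY_ONNX_DEBUG_LOGS_2 untouched.
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, RENAMES)) + r")\b")


def migrate_flags(source):
    """Return (new_source, hits) where hits lists (line_no, old, new)."""
    hits = []
    out_lines = []
    for line_no, line in enumerate(source.splitlines(keepends=True), start=1):
        def swap(match):
            old = match.group(1)
            hits.append((line_no, old, RENAMES[old]))
            return RENAMES[old]
        out_lines.append(_PATTERN.sub(swap, line))
    return "".join(out_lines), hits


old_code = 'long h = OnnxCreate("model.onnx", ONNX_DEBUG_LOGS | ONNX_CUDA_DISABLE);\n'
new_code, hits = migrate_flags(old_code)
print(new_code, end="")
for line_no, old, new in hits:
    print(f"line {line_no}: {old} -> {new}")
```

Run it in report-only mode first (ignore `new_code`, read `hits`) so you can review each change before writing files back.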

For the full working pattern — OnnxCreateFromBuffer, OnnxSetInputShape, OnnxRun, OnnxRelease — refer to the API cheat-sheet, updated for Build 5572.

Where to run it

You can't add a GPU to a forex VPS.

Forex VPS providers (QuantVPS, ForexVPS, NYCServers) sell network proximity to broker matching engines — they're CPU-only by design. To run a CUDA-enabled ONNX session, you need a local NVIDIA workstation or a GPU cloud instance. We compare four cloud providers honestly:

Affiliate disclosure: some links above are affiliate links. We earn a commission if you sign up — recommendations are independent.


The bottom line

Build 5572 doesn't change what you can do in MQL5 with ONNX — you could load and run models before. It changes how fast and at what model size. Architectures that were impractical on CPU (a few hundred milliseconds per inference) are now sub-10 ms on a Turing-or-newer GPU. The cost is the new constraint: your hardware has to be 2018-or-newer NVIDIA, your code has to use the new flag names, and your first run on a fresh machine will stall while the runtime downloads.

For anyone shipping a serious ML-backed Expert Advisor in 2026, this build is the new baseline.