Fix "CUBLAS_STATUS_ARCH_MISMATCH" when loading an ONNX model in MT5

You compiled your Expert Advisor, attached it to a chart, and instead of the model running, the Experts log filled with this:

MetaTrader 5 — Experts log

    ONNX: create session failed with 1 status 'Exception during initialization: ...cuda_call.cc... CUBLAS failure 8: CUBLAS_STATUS_ARCH_MISMATCH ; GPU=0 ; hostname=... ; expr=cublasCreate_v2(&cublas_handle_)
    ERROR[5800] Failed to load .../MyModel.onnx
    WARNING: ONNX models failed to load — EA will not trade

Short answer: your GPU is too old. The ONNX Runtime shipped with MetaTrader 5 Build 5572 requires an NVIDIA card with compute capability 7.5 or higher (the Turing architecture, released in 2018). Your card is from an earlier generation (Volta, Pascal, Maxwell, or Kepler, all below 7.5), and the cuBLAS library refuses to initialize against it.

The fix has two paths. Either force the runtime to skip the GPU entirely (one-line code change, ships in 30 seconds), or move to compatible hardware. Both are documented below, plus how to confirm the diagnosis before you start.

What's in this article

Confirm the diagnosis (1 minute)
Fix 1: force CPU with ONNX_USE_CPU_ONLY
Fix 2: check your driver (rarely the cause)
Fix 3: move to a Turing-or-newer GPU
Why this error exists (background)

Confirm the diagnosis (1 minute)

Before changing code or buying hardware, verify your GPU's architecture. Open a command prompt and run:

Command Prompt

    > nvidia-smi --query-gpu=name,compute_cap --format=csv
    name, compute_cap
    NVIDIA GeForce RTX 3060, 8.6       // 8.6 = Ampere: works
    NVIDIA GeForce GTX 1660, 7.5       // 7.5 = Turing: works (the floor)
    NVIDIA GeForce GTX 1080 Ti, 6.1    // 6.1 = Pascal: FAILS
    NVIDIA Tesla M40, 5.2              // 5.2 = Maxwell: FAILS

If the compute_cap column shows a number below 7.5, you've confirmed the cause. Build 5572's CUDA backend is compiled against compute capability 7.5 as its minimum, and the runtime hard-fails at cublasCreate_v2() for anything below that floor.

What if nvidia-smi isn't installed?

Open Device Manager → Display adapters, find your NVIDIA card, and Google "<exact name> compute capability." Anything from GTX 16-series, RTX 20/30/40/50, Quadro RTX, Tesla T4, or A100/H100/H200 is compute 7.5+. Anything older isn't.

Fix 1: force CPU with ONNX_USE_CPU_ONLY

The fastest fix — the one that lets you ship today — is to tell MQL5 to skip the GPU entirely. The ONNX Runtime works perfectly well on CPU; for many models (anything sub-1M parameters), it's only modestly slower than GPU once you account for data-transfer overhead. Sometimes it's faster.

The flag you want is ONNX_USE_CPU_ONLY. Pass it to OnnxCreate or OnnxCreateFromBuffer:

EA.mq5 — force CPU on session creation

    // BEFORE: tries GPU, fails with CUBLAS_STATUS_ARCH_MISMATCH
    ExtHandle = OnnxCreateFromBuffer(ExtModel, ONNX_DEFAULT);

    // AFTER: skips GPU; loads on CPU; works on any hardware
    ExtHandle = OnnxCreateFromBuffer(ExtModel, ONNX_USE_CPU_ONLY);

    // If you'd rather force CPU per-call only, you can also do this
    // (buffers are named inputData/outputData because `input` is a reserved word in MQL5):
    OnnxRun(ExtHandle, ONNX_USE_CPU_ONLY, inputData, outputData);

Recompile, attach to chart, and check the Experts log. The create session failed line is gone, the model loads, and inference runs — on the CPU.
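
If the same EA has to run on machines with and without a supported GPU, one option is to try the default session first and retry on CPU only when creation fails. The following is a minimal sketch, not a definitive implementation. It assumes the model is embedded as a resource named MyModel.onnx, that a failed GPU initialization makes OnnxCreateFromBuffer return INVALID_HANDLE, and that ONNX_USE_CPU_ONLY is available as described above. CreateSessionWithFallback is a hypothetical helper name.

```mql5
// Assumption: MyModel.onnx sits next to the EA source and is embedded at compile time.
#resource "MyModel.onnx" as uchar ExtModel[]

long ExtHandle = INVALID_HANDLE;

// Hypothetical helper: try the default execution providers first,
// then retry with the CPU-only flag if session creation fails.
long CreateSessionWithFallback(const uchar &model[])
  {
   ResetLastError();
   long handle = OnnxCreateFromBuffer(model, ONNX_DEFAULT);
   if(handle != INVALID_HANDLE)
      return(handle);

   PrintFormat("Default ONNX session failed (error %d), retrying on CPU", GetLastError());

   ResetLastError();
   handle = OnnxCreateFromBuffer(model, ONNX_USE_CPU_ONLY);
   if(handle == INVALID_HANDLE)
      PrintFormat("CPU-only ONNX session failed as well (error %d)", GetLastError());
   return(handle);
  }

int OnInit()
  {
   ExtHandle = CreateSessionWithFallback(ExtModel);
   return(ExtHandle == INVALID_HANDLE ? INIT_FAILED : INIT_SUCCEEDED);
  }

void OnDeinit(const int reason)
  {
   // Release the session so the runtime can free its memory.
   if(ExtHandle != INVALID_HANDLE)
     {
      OnnxRelease(ExtHandle);
      ExtHandle = INVALID_HANDLE;
     }
  }
```

On an unsupported card the first attempt may still write its create session failed line to the Experts log before the CPU retry succeeds. If that noise matters, pass ONNX_USE_CPU_ONLY unconditionally instead.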

The flag name changed in Build 5572

If you've seen older articles use ONNX_CUDA_DISABLE, that flag no longer exists. Build 5572 renamed it to ONNX_USE_CPU_ONLY. Code that still references the old name will not compile. See our Build 5572 explainer for the full list of renamed flags.

Will running on CPU be slow enough to matter?

It depends entirely on the model:

Gradient-boosted trees (LightGBM, XGBoost): CPU is the right execution provider anyway. No GPU benefit. Ship with ONNX_USE_CPU_ONLY forever.
Small MLPs (under 100k parameters): CPU is typically 2–5 ms per inference. Fine for an EA that only predicts on bar close (every minute or hour, not every tick).
LSTM / GRU sequence models (typical retail size, 100k–1M parameters): CPU is 10–50 ms. Acceptable for bar-close inference, marginal for tick-by-tick.
Transformer-class or larger LSTMs (multi-million parameters): CPU may be 100+ ms. This is where the GPU genuinely matters — upgrade the hardware.

For most ONNX-in-MQL5 use cases, the model is in the first three categories and the CPU is fine.
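
The ranges above are rough guides; the number that matters is what your own model takes on your own machine. Here is a minimal sketch of measuring it, assuming a session that has already been created and has its input and output shapes set. AverageInferenceMs and its parameter names are hypothetical, and the matrixf/vectorf buffer types are an assumption about your model's input and output.

```mql5
// Hypothetical helper: average wall-clock time of OnnxRun over `runs` calls, in milliseconds.
// Assumes `handle` is a valid session whose shapes were set with
// OnnxSetInputShape / OnnxSetOutputShape, and that x and y match those shapes.
// Returns -1.0 if any call fails or `runs` is not positive.
double AverageInferenceMs(const long handle, matrixf &x, vectorf &y, const int runs = 100)
  {
   if(runs <= 0)
      return(-1.0);

   // Warm-up call so one-time initialization is not counted in the average.
   if(!OnnxRun(handle, ONNX_DEFAULT, x, y))
      return(-1.0);

   ulong start = GetMicrosecondCount();
   for(int i = 0; i < runs; i++)
     {
      if(!OnnxRun(handle, ONNX_DEFAULT, x, y))
         return(-1.0);
     }
   ulong elapsedUs = GetMicrosecondCount() - start;

   // Microseconds -> milliseconds, averaged over the timed runs.
   return((double)elapsedUs / 1000.0 / runs);
  }
```

Calling it once from OnInit and printing the result, for example with PrintFormat("avg inference: %.3f ms", AverageInferenceMs(ExtHandle, x, y)), gives a figure to compare against how often the EA actually predicts.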

Fix 2: check your driver (rarely the cause)

Before declaring the GPU dead, eliminate one outside chance: an outdated NVIDIA driver. The official MetaQuotes recommendation is to keep the driver current, and in theory a very old driver against a current CUDA runtime can throw confusing errors. In practice, when the underlying cause is "compute capability too low," updating the driver doesn't fix it — the error returns. But it's a 5-minute check before you commit to one of the harder fixes.

Download the latest driver from nvidia.com/Download for your card.
Install. Restart MetaTrader 5 (full restart, not just chart reattach).
Re-run the EA. If the same CUBLAS_STATUS_ARCH_MISMATCH appears, you've ruled out the driver — the hardware is the constraint.

Fix 3: move to a Turing-or-newer GPU

If your model genuinely needs GPU acceleration — you've benchmarked CPU and it's too slow — the only real fix is hardware. Two paths:

Local hardware

The lowest-cost Turing-or-newer card that gives you a working CUDA backend is the GTX 1660 Super (compute 7.5, ~$200 used in 2026). The lowest-cost new card is the RTX 4060 (compute 8.9, ~$300). Either is more than enough for any typical EA inference workload. Above that, the RTX 4070 / 4080 / 4090 buy you headroom for training, not just inference.

Cloud GPU

If you don't want to commit to hardware, or if your trading workstation is a laptop, GPU cloud providers rent CUDA-compatible GPUs by the hour. RunPod and Vast.ai run from a few cents per hour for the cheapest cards; DigitalOcean and Vultr offer flat monthly pricing. Important caveat: MT5 requires Windows, so when you spin up a cloud instance, make sure to select a Windows image (not the default Ubuntu).

GPU cloud with CUDA-compatible cards

Rent a Turing-or-newer GPU by the hour.

All four providers below offer cards above the compute 7.5 floor (T4, RTX 4090, A100, H100). The right choice depends on your throughput needs and how comfortable you are with their interfaces:

RunPod, Vast.ai, DigitalOcean, Vultr

Affiliate disclosure: links above are affiliate links. We compare them honestly in our GPU cloud guide.

Why this error exists (background)

The acronyms are doing a lot of work in the error message, so this is the short version of what's happening underneath.

cuBLAS is NVIDIA's implementation of BLAS (Basic Linear Algebra Subprograms): the matrix-multiplication primitives that GPU ML frameworks on NVIDIA hardware ultimately call. ONNX Runtime uses cuBLAS for the matmul-heavy ops in your model.

"Arch mismatch" means the cuBLAS binary was compiled against a target compute capability that your GPU doesn't meet. NVIDIA's CUDA toolchain compiles GPU code to a virtual ISA called PTX, then specializes it for specific generations. Build 5572's runtime was compiled with Turing (compute 7.5) as the lowest target — making the binary smaller and faster than a "compile for everything" build, at the cost of dropping support for older cards. When cuBLAS tries to initialize on a Pascal card (compute 6.1), there's no compatible code path, and it returns failure code 8 (CUBLAS_STATUS_ARCH_MISMATCH).

The error is not a bug. It's an explicit decision by MetaQuotes (and reasonable, given that Turing is now 8 years old). The cost is that anyone trading on hardware from before 2018 has to either fall back to CPU or replace the GPU.

Summary

Cause: GPU compute capability below 7.5 (Turing).
Confirm: nvidia-smi --query-gpu=name,compute_cap --format=csv
Fastest fix: ONNX_USE_CPU_ONLY in OnnxCreate.
Right fix if you actually need GPU: upgrade to Turing-or-newer hardware, or rent one from a GPU cloud.
