This is the page you bookmark. Every working ONNX session in MQL5 follows the same four-step lifecycle: load → configure → execute → release. The functions, their flags, and their failure modes are documented here as one coherent reference, with code that compiles against Build 5572.
The four-step lifecycle
The MQL5 ONNX API mirrors the underlying ONNX Runtime session pattern. You create a session from a model file (or embedded buffer), tell it the exact shape of inputs and outputs, run it against real data, then release the resources. Every working EA does these four steps in order: OnnxCreate (or OnnxCreateFromBuffer), OnnxSetInputShape and OnnxSetOutputShape, OnnxRun, OnnxRelease.
Skip step 2 and OnnxRun will fail with an error about undefined input shape. Skip step 4 and you leak GPU memory until MT5 restarts. Every step matters.
Step 1 — OnnxCreate / OnnxCreateFromBuffer
You have two ways to load a model: from a file on disk, or from a byte buffer embedded in the EA itself. Both return a session handle of type long, or INVALID_HANDLE on failure.
From a file
Two things to know about the filename:
- It's relative to <Terminal Data Folder>\MQL5\Files\. If your model is at MQL5\Files\models\eurusd.onnx, pass "models\eurusd.onnx".
- If MQL5 doesn't find the file, it automatically retries with .onnx appended. So "eurusd" and "eurusd.onnx" both work if the actual file is eurusd.onnx.
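A minimal sketch of loading from disk, written as a script so it can be run on its own. The path models\eurusd.onnx is the hypothetical example from the list above; note that inside an MQL5 string literal each backslash has to be doubled.

```mql5
// Script sketch: load a model from MQL5\Files, report the result, release it.
// MODEL_FILE is a hypothetical path; replace it with your own file.
#define MODEL_FILE "models\\eurusd.onnx"   // backslashes are doubled in string literals

void OnStart()
  {
   // Step 1: create the session with default flags
   long session = OnnxCreate(MODEL_FILE, ONNX_DEFAULT);
   if(session == INVALID_HANDLE)
     {
      Print("OnnxCreate failed, error ", GetLastError());
      return;
     }
   Print("Model loaded, handle = ", session);

   // Step 4: release what was created
   OnnxRelease(session);
  }
```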
From a buffer (embedded as a resource)
For production EAs, embed the model inside the compiled .ex5 file using the #resource directive. This produces a self-contained binary that doesn't depend on filesystem state on the deploy target.
Build 5572 raised the resource size limit to 1 GB, so you can embed transformers, large LSTMs, or anything else that fits in a gigabyte. The trade-off is iteration speed: any change to the .onnx file requires recompiling the EA. For active development, load from disk. For production, embed.
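A minimal sketch of the embedded variant, again as a standalone script. The resource path and the array name ExtModel are assumptions for illustration; a path that starts with a backslash is resolved relative to the MQL5 folder, so this one points at MQL5\Files\models\eurusd.onnx at compile time.

```mql5
// Script sketch: embed the model in the .ex5 and create a session from the buffer.
// The path and the name ExtModel are hypothetical.
#resource "\\Files\\models\\eurusd.onnx" as uchar ExtModel[]

void OnStart()
  {
   // Step 1: create the session from the embedded bytes
   long session = OnnxCreateFromBuffer(ExtModel, ONNX_DEFAULT);
   if(session == INVALID_HANDLE)
     {
      Print("OnnxCreateFromBuffer failed, error ", GetLastError());
      return;
     }
   Print("Embedded model loaded, ", ArraySize(ExtModel), " bytes");

   // Step 4: release the session
   OnnxRelease(session);
  }
```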
The flags argument
The second parameter to both functions is a bitmask of flags. The common combinations:
| Combination | What it does |
|---|---|
| ONNX_DEFAULT | Use GPU if available; fall back to CPU. The right default. |
| ONNX_USE_CPU_ONLY | Force CPU. Use when your GPU is below Turing or when CPU is faster for your model. |
| ONNX_GPU_DEVICE_0 | Pin to GPU 0 (multi-GPU systems). _1 ... _7 for other devices. |
| ONNX_USE_CPU_ONLY \| ONNX_LOGLEVEL_VERBOSE | CPU only, with full debug logging. Use during integration to see what the runtime is doing. |
| ONNX_ENABLE_PROFILING | Dumps execution-provider trace to JSON. Combine with any of the above. |
For the complete flag reference and what changed in Build 5572, see the Build 5572 explainer.
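Because the flags are a bitmask, you can build them up with bitwise OR before creating the session. The sketch below assumes the flag name ONNX_USE_CPU_ONLY is available in your build, as listed in the table; the input InpForceCpu and the model path are hypothetical.

```mql5
// Script sketch: choose session flags at run time.
#property script_show_inputs

input bool InpForceCpu = false;            // hypothetical input: true forces CPU execution

#define MODEL_FILE "models\\eurusd.onnx"   // hypothetical path under MQL5\Files

void OnStart()
  {
   uint flags = ONNX_DEFAULT;
   if(InpForceCpu)
      flags |= ONNX_USE_CPU_ONLY;          // flag name taken from the table above

   long session = OnnxCreate(MODEL_FILE, flags);
   if(session == INVALID_HANDLE)
     {
      Print("OnnxCreate failed with flags ", flags, ", error ", GetLastError());
      return;
     }
   Print("Session created with flags ", flags);
   OnnxRelease(session);
  }
```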
Step 2 — OnnxSetInputShape / OnnxSetOutputShape
When you exported the model from Python, you almost certainly declared some dimensions as dynamic — typically the batch dimension. ONNX stores this as a symbolic name ("batch") rather than a number. MQL5's runtime needs that number resolved before OnnxRun can execute. This is the step everyone forgets.
Both functions take the same three arguments:
- handle: the session from step 1.
- index: which input or output you're configuring. Most models have one input (index 0) and one output (index 0). If yours has multiple, call OnnxSetInputShape for each.
- shape array: a const long[] with one element per dimension, in the same order your Python code uses.
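A minimal sketch of step 2 for a model with one input and one output. The shapes are assumptions for illustration: a (batch, seq, features) input of 1 x 10 x 4 and a 1 x 1 output. Check your own model before copying them. The resource path and the names ExtModel, SAMPLE_SIZE, and N_FEATURES are hypothetical.

```mql5
// Script sketch: resolve the dynamic dimensions before running the model.
#resource "\\Files\\models\\eurusd.onnx" as uchar ExtModel[]   // hypothetical path

#define SAMPLE_SIZE 10   // hypothetical sequence length
#define N_FEATURES  4    // hypothetical feature count

const long ExtInputShape[]  = {1, SAMPLE_SIZE, N_FEATURES};    // (batch, seq, features)
const long ExtOutputShape[] = {1, 1};                          // (batch, outputs)

void OnStart()
  {
   long session = OnnxCreateFromBuffer(ExtModel, ONNX_DEFAULT);
   if(session == INVALID_HANDLE)
     {
      Print("OnnxCreateFromBuffer failed, error ", GetLastError());
      return;
     }

   // Index 0 is the first (here: only) input and output
   if(!OnnxSetInputShape(session, 0, ExtInputShape))
      Print("OnnxSetInputShape failed, error ", GetLastError());
   else if(!OnnxSetOutputShape(session, 0, ExtOutputShape))
      Print("OnnxSetOutputShape failed, error ", GetLastError());
   else
      Print("Shapes set");

   OnnxRelease(session);
  }
```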
If you don't know your model's exact shape, open the .onnx file in Netron — the visualization tool shows every input and output with its declared shape. Dynamic dimensions appear with their symbolic name; concrete dimensions appear as numbers.
Common mistake
If your model was trained with batch_first=True on a PyTorch LSTM, the input shape is (batch, seq, features). If you set it to (seq, batch, features) in MQL5, the runtime accepts it but the values are misaligned — the model predicts garbage. Match the Python order exactly.
Step 3 — OnnxRun
The actual inference. You pass the session handle, runtime flags, input data, and an output container for the result.
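A minimal sketch of a single inference call, assuming a float32 model with a 1 x 10 x 4 input and a 1 x 1 output. The input is filled with zeros as a placeholder, so the printed value only proves the call works; the resource path and names are hypothetical.

```mql5
// Script sketch: one OnnxRun call with placeholder input.
#resource "\\Files\\models\\eurusd.onnx" as uchar ExtModel[]   // hypothetical path

#define SAMPLE_SIZE 10   // hypothetical sequence length

void OnStart()
  {
   long session = OnnxCreateFromBuffer(ExtModel, ONNX_DEFAULT);
   if(session == INVALID_HANDLE)
     {
      Print("OnnxCreateFromBuffer failed, error ", GetLastError());
      return;
     }

   const long in_shape[]  = {1, SAMPLE_SIZE, 4};
   const long out_shape[] = {1, 1};
   if(OnnxSetInputShape(session, 0, in_shape) && OnnxSetOutputShape(session, 0, out_shape))
     {
      matrixf x(SAMPLE_SIZE, 4);   // float32 input container
      x.Fill(0.0f);                // placeholder features, not real data
      vectorf y(1);                // float32 output container

      if(OnnxRun(session, ONNX_DEFAULT, x, y))
         Print("Model output: ", y[0]);
      else
         Print("OnnxRun failed, error ", GetLastError());
     }
   else
      Print("Setting shapes failed, error ", GetLastError());

   OnnxRelease(session);
  }
```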
Type matching matters
Your model was exported with a specific input dtype, usually float32. MQL5 matches that with matrixf (float32 matrix) or vectorf (float32 vector). If you pass a double-based matrix to a float32 model, MQL5 silently does the conversion — unless you set the ONNX_NO_CONVERSION flag.
The flag tells the runtime "I promise the types match; skip the check." When they actually match, it's a few percent faster on the hot path. When they don't, you get an immediate error instead of silent miscalculation, which is usually what you want.
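One way to keep the types honest is to convert explicitly and then pass ONNX_NO_CONVERSION. The sketch below assumes a float32 model with the same hypothetical shapes as before; RunFloat32 is an illustrative helper, not part of the API.

```mql5
// Script sketch: explicit double -> float conversion, then a run without implicit conversion.
#resource "\\Files\\models\\eurusd.onnx" as uchar ExtModel[]   // hypothetical path

#define SAMPLE_SIZE 10   // hypothetical sequence length

// Hypothetical helper: converts double features to float32 and runs the model.
bool RunFloat32(const long session, const matrix &features, vectorf &result)
  {
   matrixf x;
   if(!x.Assign(features))          // copies the values as float32
      return(false);
   return(OnnxRun(session, ONNX_NO_CONVERSION, x, result));
  }

void OnStart()
  {
   long session = OnnxCreateFromBuffer(ExtModel, ONNX_DEFAULT);
   if(session == INVALID_HANDLE)
      return;

   const long in_shape[]  = {1, SAMPLE_SIZE, 4};
   const long out_shape[] = {1, 1};
   if(OnnxSetInputShape(session, 0, in_shape) && OnnxSetOutputShape(session, 0, out_shape))
     {
      matrix features(SAMPLE_SIZE, 4);   // double-based data, as most MQL5 code produces
      features.Fill(0.0);                // placeholder values
      vectorf y(1);
      if(RunFloat32(session, features, y))
         Print("Model output: ", y[0]);
      else
         Print("Run failed, error ", GetLastError());
     }
   OnnxRelease(session);
  }
```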
OnnxRun runtime flags
The flags passed to OnnxRun are per-call and override session-level flags from OnnxCreate. Common uses:
- Per-call CPU override: session created with ONNX_DEFAULT (using GPU), but one inference forced to CPU with ONNX_USE_CPU_ONLY. Useful for benchmarking.
- Skip type conversion: ONNX_NO_CONVERSION when input types already match.
- Default behavior: just pass ONNX_DEFAULT if you have nothing special to set.
Step 4 — OnnxRelease
Every session you create must be released. Call OnnxRelease in OnDeinit, or any time the EA is shutting down a model session.
If you forget this, the model stays loaded in GPU (or CPU) memory until MT5 itself shuts down. For a single session per EA, the leak is bounded. For an EA that creates sessions in a loop — don't ask why anyone would, but it happens — you'll exhaust GPU memory in minutes.
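A minimal sketch of the release pattern in an EA. The global name g_session and the model path are hypothetical; resetting the handle after release is a defensive habit, not something the API requires.

```mql5
// EA sketch: create in OnInit, release in OnDeinit.
#define MODEL_FILE "models\\eurusd.onnx"   // hypothetical path under MQL5\Files

long g_session = INVALID_HANDLE;           // hypothetical global holding the session

int OnInit()
  {
   g_session = OnnxCreate(MODEL_FILE, ONNX_DEFAULT);
   if(g_session == INVALID_HANDLE)
     {
      Print("OnnxCreate failed, error ", GetLastError());
      return(INIT_FAILED);
     }
   return(INIT_SUCCEEDED);
  }

void OnDeinit(const int reason)
  {
   if(g_session != INVALID_HANDLE)
     {
      OnnxRelease(g_session);
      g_session = INVALID_HANDLE;          // avoids releasing the same handle twice
     }
  }

void OnTick()
  {
   // inference would go here
  }
```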
Full working EA template
Putting it all together: the template is an EA that loads an embedded model, sets shapes, runs inference on each new bar, and cleanly releases on shutdown.
This template compiles against Build 5572. Drop your .onnx file into MQL5\Files\models\, set SAMPLE_SIZE to match your model's sequence length, and you have a working ONNX-backed EA.
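A minimal sketch of what such a template can look like, under stated assumptions: a float32 model that takes the last SAMPLE_SIZE bars of OHLC data as a (1, SAMPLE_SIZE, 4) input and returns a single value. The file name, the Ext* names, and the feature choice are hypothetical, and normalization is deliberately left as a comment because it has to match whatever your training code did.

```mql5
// EA sketch: embedded model, shapes, inference on each new bar, release on shutdown.
#resource "\\Files\\models\\eurusd.onnx" as uchar ExtModel[]   // hypothetical file name

#define SAMPLE_SIZE 10   // set to your model's sequence length
#define N_FEATURES  4    // assumption: open, high, low, close

const long ExtInputShape[]  = {1, SAMPLE_SIZE, N_FEATURES};
const long ExtOutputShape[] = {1, 1};

long     ExtHandle  = INVALID_HANDLE;
datetime ExtLastBar = 0;

int OnInit()
  {
   // Step 1: load
   ExtHandle = OnnxCreateFromBuffer(ExtModel, ONNX_DEFAULT);
   if(ExtHandle == INVALID_HANDLE)
     {
      Print("OnnxCreateFromBuffer failed, error ", GetLastError());
      return(INIT_FAILED);
     }
   // Step 2: configure
   if(!OnnxSetInputShape(ExtHandle, 0, ExtInputShape) ||
      !OnnxSetOutputShape(ExtHandle, 0, ExtOutputShape))
     {
      Print("Setting shapes failed, error ", GetLastError());
      OnnxRelease(ExtHandle);
      ExtHandle = INVALID_HANDLE;
      return(INIT_FAILED);
     }
   return(INIT_SUCCEEDED);
  }

void OnTick()
  {
   // Run once per bar: skip ticks that belong to an already processed bar
   datetime bar_time = iTime(_Symbol, _Period, 0);
   if(bar_time == ExtLastBar)
      return;
   ExtLastBar = bar_time;

   // Last SAMPLE_SIZE closed bars; CopyRates returns one row per series
   matrix rates;
   if(!rates.CopyRates(_Symbol, _Period, COPY_RATES_OHLC, 1, SAMPLE_SIZE))
      return;
   matrix x = rates.Transpose();            // now SAMPLE_SIZE rows, 4 columns

   // Apply the same normalization as in training here (omitted in this sketch)

   static matrixf x_f(SAMPLE_SIZE, N_FEATURES);
   x_f.Assign(x);                           // explicit double -> float32
   vectorf y(1);

   // Step 3: execute
   if(!OnnxRun(ExtHandle, ONNX_NO_CONVERSION, x_f, y))
     {
      Print("OnnxRun failed, error ", GetLastError());
      return;
     }
   Print("Model output: ", y[0]);
  }

void OnDeinit(const int reason)
  {
   // Step 4: release
   if(ExtHandle != INVALID_HANDLE)
     {
      OnnxRelease(ExtHandle);
      ExtHandle = INVALID_HANDLE;
     }
  }
```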
When something goes wrong
Common failures by step:
| Symptom | Cause | Fix |
|---|---|---|
| OnnxCreate returns INVALID_HANDLE, error 5800 | File not found or invalid ONNX | ERROR 5800 guide |
| CUBLAS_STATUS_ARCH_MISMATCH in log | GPU below Turing | CUBLAS error fix |
| OnnxSetInputShape returns false | Shape array wrong length, or session is invalid | Verify shape with Netron; check handle is valid |
| OnnxRun returns false silently | Type mismatch, or NaN in input | Check input dtype matches model; check for NaN/Inf in features |
| Inference returns wrong values | Normalization mismatch between training and runtime | Normalization fix |
| Runs on CPU even with ONNX_DEFAULT | Silent CUDA fallback | Verify GPU is being used |
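For the NaN/Inf row, a small guard before OnnxRun can save a debugging session. The sketch below uses a hypothetical helper, AllFinite, built on MathIsValidNumber; the script around it only demonstrates the helper on placeholder data.

```mql5
// Script sketch: reject inputs that contain NaN or Inf before calling OnnxRun.

// Hypothetical helper: true when every element is a valid finite number.
bool AllFinite(const matrixf &m)
  {
   for(ulong r = 0; r < m.Rows(); r++)
      for(ulong c = 0; c < m.Cols(); c++)
         if(!MathIsValidNumber(m[r][c]))
            return(false);
   return(true);
  }

void OnStart()
  {
   matrixf x(10, 4);
   x.Fill(0.0f);                         // placeholder features
   Print("Clean input passes check: ", AllFinite(x));

   x[0][0] = (float)MathSqrt(-1.0);      // MathSqrt of a negative number yields NaN
   Print("Input with NaN passes check: ", AllFinite(x));
  }
```

In an EA you would call the helper on the input matrix right before OnnxRun and skip the bar (with a log line) when it returns false.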
You'll want a GPU somewhere in the pipeline.
The EA above runs on CPU just fine for most retail-sized models. If your model is larger and needs CUDA, or you're training before you ship, you can rent a CUDA-compatible GPU by the hour.
Affiliate disclosure: some links above are affiliate links. We compare them in the cloud guide.