CLI

The nex subcommands — tokens, parse, elaborate, run, test, compile — and what each one does.

Subcommands

nex tokens    <file>                # lex; print the token stream
nex parse     <file>                # lex + parse; print the AST
nex elaborate <file>                # parse + name resolution; print the typed AST
nex run       <file>                # parse, elaborate, execute via the tree-walking interpreter
nex test      <file>                # discover and run every @test function reachable from <file>'s project root
nex compile [--backend B] <file>    # emit IR and invoke a backend toolchain to produce a native binary

<file> is always a single .nex source. For test and compile, the directory containing <file> is the project root — the import graph is discovered from there.
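The subcommand comments above describe a cumulative pipeline, where each subcommand runs the earlier stages plus one more. A minimal Python sketch of that reading follows. The stage names and the helpers `stages_for` and `project_root` are illustrative only and are not taken from the compiler's source.

```python
from pathlib import PurePosixPath

# Hypothetical stage names, inferred from the subcommand comments above.
STAGES = {
    "tokens": ["lex"],
    "parse": ["lex", "parse"],
    "elaborate": ["lex", "parse", "elaborate"],
    "run": ["lex", "parse", "elaborate", "interpret"],
}


def stages_for(subcommand):
    """Return the stages a subcommand would run, in order."""
    return STAGES[subcommand]


def project_root(source_file):
    """For test and compile, the project root is the directory holding <file>."""
    return str(PurePosixPath(source_file).parent)
```

Under these assumptions, `project_root("examples/fft/main.nex")` gives `examples/fft`, which is where import discovery would start.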

--backend (only for compile) selects the codegen pipeline:

llvm (default) — emits LLVM IR text and invokes clang -O1. Covers the full language surface.
mlir — emits MLIR via the linalg / arith / scf / tensor / func dialects and runs mlir-opt → mlir-translate → clang -O1. Experimental, and growing on the strangler-fig pattern. The following all parity-check against the interpreter: scalar arithmetic (+ - * / div % ^, unary -); comparisons (scalar and array-element-wise, returning [bool]); bool with and / or / not; min / max / abs; the libm transcendentals (sqrt / exp / log / sin / cos / …); the explicit conversions to_real / to_integer; array literals, length, and indexing; array-array element-wise binops (matching element types, or mixed integer / real promoting up the lattice); scalar–array broadcasts (arithmetic and comparison); and the @ operator (matmul, matvec, dot).

Control flow lowers via the scf dialect. Scalar if-expressions, statement-position if and if/else (lowered to a no-result scf.if with an optional else region), while cond do ... loops, and for loops in both forms (for i in lo..hi do ... over an integer range, for x in xs do ... over a rank-1 array) are wired up.

Mutable scalar var bindings lower to memref.alloca slots, so a counter-style var n = 0; while ... do n = n + 1 works end-to-end. Mutable tensor var bindings rebind through fresh SSA via tensor.insert / tensor.insert_slice / scf.for accumulators, covering whole-tensor reassign, scalar index-assign at rank-1 and rank-2 (negative-index wrap + OOB trap), rank-1 slice-assign (open bounds, inclusive, stride > 1), and rank-2 row/column replace (m[i, :] = rhs / m[:, j] = rhs). Auto-clone insertions (spec §8.3) become emit-inner identities because SSA tensor semantics already diverge the bindings on the first mutation.

Rank-1 and rank-2 slicing (xs[2..5], m[:, 1], m[0..2, 1..3], etc.) routes through tensor.extract_slice for both literal and runtime bounds, including the strided forms (xs[0..n by 2], m[..r by k, c]). The rank-reducing form drops a dimension whenever an axis collapses to a scalar index.
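The negative-index wrap and out-of-bounds trap on scalar index-assign can be modelled in a few lines of Python. This is a sketch under assumptions, not the backend's lowering: it assumes the wrap adds the length once to a negative index (Python-style), and `NexTrap`, `normalize_index`, and `index_assign` are hypothetical names. Returning a new tuple stands in for tensor.insert producing a fresh SSA value.

```python
class NexTrap(Exception):
    """Stand-in for a runtime trap (hypothetical name)."""


def normalize_index(i, length):
    """Wrap a negative index once, then trap if it is still out of range."""
    wrapped = i + length if i < 0 else i
    if not 0 <= wrapped < length:
        raise NexTrap(f"index {i} out of bounds for length {length}")
    return wrapped


def index_assign(xs, i, value):
    """Return a new tuple with slot i replaced; the input tuple is not modified."""
    k = normalize_index(i, len(xs))
    return xs[:k] + (value,) + xs[k + 1:]
```

Because `index_assign` never mutates its argument, two bindings that started from the same value diverge on the first write, which is the property the text relies on when it calls auto-clone an identity.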
Rank-1 filter(arr, pred) and flatMap(arr, x -> [...]) lower to two-pass tensor.empty + tensor.insert builders so the output shape is sized exactly. Index-style traps match the interpreter: negative-index wrap on TIndex, plus out-of-bounds traps on both indexing and slice bounds.

The MLIR C runtime now carries the same setjmp/longjmp trap-catching infrastructure the LLVM backend has had since milestone 1: a thread-local nex_trap_buf plus a nex_trap_with(msg) shim that longjmps when an assert_traps is on the stack and aborts to stderr otherwise. On top of that, assert lowers to scf.if !cond { call nex_assert_failed } with the message extracted from the optional nex_str at compile time; assert_traps(fn, sub?) lowers to a nex_assert_traps call that pulls (fn_addr, env_addr) out of the closure descriptor via the existing nex_closure_fn / nex_closure_env helpers (with the thunk cast to int64_t (*)(int64_t) so unit / int / pointer return types all flow); and integer div/%, real /, and int-div-int-to-real / get an scf.if-guarded nex_trap_int_div_zero check before the divide. assert_traps cases that close over an array value inside the closure are still deferred until the tensor-capture closure ABI lands, and assert_approx (rank-1, rank-2, and complex variants) is also deferred pending dedicated emitters.

User-defined def functions lower to func.func @nex_user_<name>_<id> at module level. Scalar (i64 / f64 / i1) and string parameters and return types are supported, plus tensor params / tensor returns (including tensor-returning if-expressions) and unit-returning defs (def scale(v: mut [real], k: real) = ...). Recursion, mutual recursion, forward references, and prelude consts (pi / e) referenced from inside a user def all flow naturally because module-level ops in MLIR are order-independent. Value-returning defs with early return inside a loop or branch are desugared to an equivalent if-expression chain before codegen, so return doesn’t need a dedicated MLIR shape.
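The two-pass shape described for filter can be illustrated in Python. This is only a model of the idea (count first, allocate exactly, then fill). The name `filter_two_pass` is hypothetical, the fixed-size list stands in for tensor.empty, and the slot writes stand in for tensor.insert. It also assumes the predicate is pure, since it is evaluated twice per element.

```python
def filter_two_pass(arr, pred):
    """Filter arr with pred using a count pass followed by a fill pass."""
    # Pass 1: count the matches so the output can be sized exactly.
    count = 0
    for x in arr:
        if pred(x):
            count += 1

    # Stand-in for tensor.empty: a buffer of exactly `count` slots.
    out = [None] * count

    # Pass 2: write each match into the next free slot (stand-in for tensor.insert).
    j = 0
    for x in arr:
        if pred(x):
            out[j] = x
            j += 1
    return out
```

The trade-off in this sketch is one extra traversal in exchange for never over-allocating or resizing the output.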
mut scalar parameters lower to memref<T> ref slots: the callee binds the param as a varSlot, reads and assignments flow through memref.load / memref.store on the same memref the caller’s slot uses, and the call site forwards its slot register directly so writes are visible on both sides without a copy.

mut rank-1 and rank-2 tensor parameters take a different shape: they are admitted as plain tensor<...> by-value rather than boxed through memref<tensor<...>>, and the callee seeds them into its var-tensor map so the same index-assign / slice-assign machinery covers writes inside the function body. The elaborator’s NexLifetime auto-clone pass (spec §8.3) wraps the call-site arg in a TClone whenever the caller’s var is read after the mut call, so visible-aliasing cases are diverged at the language level before MLIR ever sees them.

Compound assignment (+= -= *= /= %=) rides the existing var-slot and index-assign machinery for free, because the elaborator already desugars n += k to TAssign(n, n + k) before the backend sees it. Top-level const X = ... inits flow through the same emitter as ordinary expressions, so const X = sqrt(2.0) and recursive-fact / cos(pi/2) style initializers compile without a dedicated shape.

Closures lower end-to-end: each TLambda whose signature and captures fit i64-shaped LLVM-compatible slots becomes a synthetic top-level llvm.func @nex_lambda_<N>(%env, args...). Free-var captures are classified ByVal (immutable) or ByRef (the source is a var), and any var referenced inside any lambda is “boxed” out of its parent’s memref.alloca into a heap cell so the box outlives the parent frame and mutations through either side stay visible. Closure construction sites build a heap descriptor {fn_ptr, env_ptr} via llvm.mlir.addressof + llvm.ptrtoint, the closure type lowers as an opaque i64, and indirect call extracts (fn, env) from the descriptor and dispatches through llvm.call.
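The boxing and descriptor scheme for closures can be modelled in Python to show why mutations stay visible on both sides. This is a conceptual sketch only. `Box`, `make_counter`, `lambda_0`, and `call_closure` are hypothetical names, a dict stands in for the environment record, and a Python tuple stands in for the {fn_ptr, env_ptr} descriptor.

```python
class Box:
    """Heap cell holding a captured var, so it can outlive the defining frame."""

    def __init__(self, value):
        self.value = value


def lambda_0(env, step):
    """Lifted lambda: takes its environment explicitly as the first argument."""
    env["n"].value += step  # n is captured ByRef, so the write goes through the box
    return env["n"].value


def make_counter():
    """Model of: var n = 0, then a lambda that captures n and mutates it."""
    n = Box(0)                      # boxed because a lambda references the var
    closure = (lambda_0, {"n": n})  # descriptor: (function, environment)
    return closure, n


def call_closure(closure, *args):
    """Indirect call: unpack (fn, env) from the descriptor and dispatch."""
    fn, env = closure
    return fn(env, *args)
```

In this model the parent frame and the lambda share one `Box`, so a write made through either is seen by the other, even after `make_counter` has returned.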
Top-level defs reached in value position (passed to a HOF, bound to a val, returned) lower to a closure descriptor with a synthetic thunk function pointer and a null env, so a top-level def can be passed wherever a closure is expected. Arrays of closures work ([f, g, h] and for f in fs do f(x)), and map(xs, f) accepts a val- or arg-bound closure value rather than only an inline lambda: the rank-1 driver extracts (fn, env) once outside the scf.for and dispatches per-iteration. if-expressions can return string values (if cond then "yes" else "no") since the i64 string-descriptor encoding matches the closure-style boundary type the scf.if regions already use.

User-defined generic functions (Spec §6.10) flow through to MLIR transparently: NexMonomorphize runs in the elaborator before any backend sees the typed IR, so identity / numeric-constrained / ord-constrained / two-type-param / generic-lambda / overload-resolved call sites all reach codegen as concrete monomorphized defs without any MLIR-side dispatch logic.

Structs and tuples lower to !llvm.struct<(...)> from the LLVM dialect. The pass pipeline already runs convert-func-to-llvm, so LLVM-dialect types flow through func-boundary signatures unchanged. Struct construction is an llvm.mlir.undef of the struct type folded with llvm.insertvalue per declared slot (int → real promoted at the slot boundary); field reads emit llvm.extractvalue at the declared field index; nested-field writes (o.i.v = 99) walk the TField chain back to its TVarRef root, recursively extractvalue along the path, perform the leaf insertvalue, then insertvalue the updated sub-struct back out, with the final SSA rebound at the var. var-bound structs and tuples track per-var SSA registers parallel to the var-tensor scheme, so auto-clone falls out for free: var c = b copies the SSA register but the first write through either binding produces a fresh value, leaving the other binding unaffected.
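The nested-field write (extract along the path, insert at the leaf, insert each updated sub-struct back out) is easy to model with immutable Python tuples. This sketch is illustrative: `insert_path` is a hypothetical helper, tuple indexing stands in for llvm.extractvalue, and tuple rebuilding stands in for llvm.insertvalue.

```python
def insert_path(value, path, new_leaf):
    """Return a copy of `value` with the slot addressed by `path` replaced."""
    head, rest = path[0], path[1:]
    if rest:
        # Extract the sub-struct, update it recursively, then insert it back.
        updated = insert_path(value[head], rest, new_leaf)
    else:
        # Leaf insert.
        updated = new_leaf
    return value[:head] + (updated,) + value[head + 1:]


# Model of: o = {a: 1, i: {u: 2, v: 3}}, then o.i.v = 99 (field i is slot 1, v is slot 1).
b = (1, (2, 3))
c = b                            # var c = b copies the value handle
b = insert_path(b, [1, 1], 99)   # the write rebinds b to a fresh value
```

After the write, `b` holds the updated struct and `c` still holds the original, which mirrors the auto-clone behavior described for var-bound structs.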
Tuples are the positional twin (no field names); TTupleProj lowers to extractvalue and the print(tuple) / s"$tup" paths share an emitTupleToString formatter. Tuple-returning and struct-returning if-expressions flow through the typed-if scaffold via parallel MTuple / MStruct arms.

Scalar complex values lower as !llvm.struct<(f64, f64)> from the same LLVM-dialect path. A pack/unpack helper pair feeds per-component lowerings for + - * / (with a zero-denominator trap on /), equality and unary minus, the field projections .re and .im, the to_complex conversion, the modulus |z|, complex-arm if-expressions, and print / s"..." formatting. A branchless sign(x) built-in is also included; it unlocks the prelude’s source-defined complex extensions for sqrt / exp / log / sin / cos / tan, which lower through the same monomorphized-def machinery as any other prelude function.

User-defined generic structs (Spec §3.6) flip transparently for the same monomorphization reason: Box[T], Pair[A, B], nested generic structs, generic-into-generic-fn, ord-constrained generic struct, explicit-annotation construction, and generic-struct field write all reach codegen as concrete MStruct(...) lowerings.

String values are represented as opaque i64 descriptor pointers. Literals share a module-level memref.global pool, the runtime ships refcounted concat / eq / from_i64 / from_bool / from_double helpers, and the refcount discipline (-1 immortal for literals, +1 heap for concat / value-to-string results, dec on every consume site) mirrors the LLVM backend exactly. print("..."), string == / != / +, s"..." interpolation with integer / bool / real, and f"..." format-spec values (%d / %f / %e / %g / %x / %X / %o / %b with width / 0 / - flags and .precision) all parity-check against the interpreter, and the real-value branch uses a shortest-round-trip helper that matches Java’s Double.toString byte-for-byte.

The array builders range / zeros / ones / linspace plus transpose round out the rank-1 / rank-2 surface.
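The string refcount discipline can be sketched as a small Python model. Several details here are assumptions rather than facts from the runtime: that concat counts as a consume site for both operands, that a heap string is freed when its count reaches zero, and all of the names (`NexStr`, `literal`, `release`, `concat`, `IMMORTAL`).

```python
IMMORTAL = -1  # refcount value used for literals, which are never freed


class NexStr:
    """Model of a string descriptor with a reference count."""

    def __init__(self, text, refcount):
        self.text = text
        self.refcount = refcount
        self.freed = False


def literal(text):
    """Literals live in a shared pool and are immortal."""
    return NexStr(text, IMMORTAL)


def release(s):
    """Decrement at a consume site; immortal strings are left alone."""
    if s.refcount == IMMORTAL:
        return
    s.refcount -= 1
    if s.refcount == 0:
        s.freed = True  # stand-in for returning the heap cell


def concat(a, b):
    """Produce a fresh heap string with refcount 1 and consume both operands."""
    result = NexStr(a.text + b.text, 1)
    release(a)
    release(b)
    return result
```

In this model, chaining `concat(concat(literal("a"), literal("b")), literal("c"))` frees the intermediate heap string while every literal stays untouched.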
The corpus opts cases in one at a time via a per-case mlir = true flag in NexProgramCorpus; see NexCorpusMlirParityTests for what’s currently covered.

User-defined enums (Spec §3.7 / §3.8) lower as tagged unions !llvm.struct<(i32, !llvm.array<N x i64>)>, where N is the max field count across variants and each field occupies one i64 payload slot: int as identity, real ↔ i64 via llvm.bitcast, bool ↔ i64 via llvm.zext / llvm.trunc, string as the i64 descriptor it already is. match walks arms in source order as a chain of scf.if -> (T) (or no-result scf.if for unit-typed arms), each variant pattern testing arith.cmpi eq %tag, idx and binding sub-pattern fields by llvm.extractvalue [1, i] + per-type decode; var- and wildcard-patterns terminate as the default arm. A per-enum func.func @nex_enum_str_<Name> flushed after @main powers print(enum) and s"${enum}". Generic enums (Opt[T], Result[T, E], ord-constrained generic enum) flip transparently for the same monomorphization reason generic structs already do.

Aggregate field types inside a variant (struct- or tensor-typed fields) stay rejected at codegen; that is a separate lowering sub-project. Arrays of structs are also deferred (a separate struct-element tensor sub-project).

The elaborator’s NexFusion pass is now wired into the LLVM compile path (so 2 * a + b lowers to a single fused loop on the default backend), but is intentionally not run before MLIR codegen yet: the MLIR TFusedLoop lowering only covers rank-1 fused loops with statically-known lengths, so any program reaching MLIR with fused IR (a broadcast inside a def, any rank-2 fusion) would raise the “not yet supported” diagnostic. Picking those up is a strangler-fig follow-up.

Anything outside the opted-in surface raises a compile-time “not yet supported” diagnostic.
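The enum payload encoding (every field squeezed into one i64 slot) can be mimicked in Python with the struct module. This is a sketch of the encoding idea, not the emitted IR. `encode_slot`, `decode_slot`, and `make_variant` are hypothetical names, and padding unused slots with 0 is an assumption made here so the values are comparable.

```python
import struct


def encode_slot(value):
    """Encode one field as a signed 64-bit integer payload."""
    if isinstance(value, bool):   # test bool first: bool is a subclass of int
        return int(value)         # like zext i1 -> i64
    if isinstance(value, int):
        return value              # identity
    if isinstance(value, float):  # like bitcast f64 -> i64
        return struct.unpack("<q", struct.pack("<d", value))[0]
    raise TypeError(f"unsupported field type: {type(value).__name__}")


def decode_slot(bits, kind):
    """Decode a payload slot back to a value of the named kind."""
    if kind == "bool":
        return bool(bits & 1)     # like trunc i64 -> i1
    if kind == "int":
        return bits
    if kind == "real":            # like bitcast i64 -> f64
        return struct.unpack("<d", struct.pack("<q", bits))[0]
    raise ValueError(f"unknown kind: {kind}")


def make_variant(tag, fields, max_fields):
    """Build (tag, payload), with the payload padded to the widest variant."""
    payload = [encode_slot(f) for f in fields]
    payload += [0] * (max_fields - len(payload))  # padding value is an assumption
    return (tag, tuple(payload))
```

A match arm in this model compares the tag against the variant's index and, on a hit, decodes the slots it binds, for example `decode_slot(payload[0], "real")`.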

Locating the source prelude

The scalar transcendentals plus their complex extensions live as .nex files in a prelude/ directory and are auto-imported on every elaboration (see §10 of the spec). The CLI locates that directory at startup, trying the following sources in precedence order:

--prelude-path <dir> — an explicit CLI flag (highest priority).
NEX_HOME environment variable — looks at $NEX_HOME/lib/prelude/.
-Dnex.prelude.path=<dir> JVM system property. build.sbt sets this automatically so sbt run and sbt test use the in-repo prelude/ with no extra wiring.
Walk up from the current working directory — at each ancestor, check for a prelude/ sibling and then lib/prelude/.

If none of the four locates a directory, the compiler falls back to its built-in name table and proceeds silently. This was the design during the Stage 1/2 migration and remains the fallback today for environments without a sysroot.
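The precedence order above can be sketched as a single lookup function. This is a model of the documented order, not the CLI's actual code. `locate_prelude` and its parameters are hypothetical, and two behaviors are assumed because the text does not state them: a configured path that does not exist falls through to the next source, and the walk-up includes the working directory itself.

```python
from pathlib import Path


def locate_prelude(cli_flag, env, sys_props, cwd):
    """Return the first existing prelude directory, or None for the built-in fallback."""
    candidates = []
    if cli_flag:                                   # 1. --prelude-path <dir>
        candidates.append(Path(cli_flag))
    if env.get("NEX_HOME"):                        # 2. $NEX_HOME/lib/prelude/
        candidates.append(Path(env["NEX_HOME"]) / "lib" / "prelude")
    if sys_props.get("nex.prelude.path"):          # 3. -Dnex.prelude.path=<dir>
        candidates.append(Path(sys_props["nex.prelude.path"]))
    start = Path(cwd)                              # 4. walk up from the working directory
    for ancestor in [start, *start.parents]:
        candidates.append(ancestor / "prelude")
        candidates.append(ancestor / "lib" / "prelude")

    for candidate in candidates:
        if candidate.is_dir():
            return candidate
    return None  # caller would fall back to the built-in name table
```

Passing the environment and system properties in as plain dicts keeps the sketch easy to exercise without touching the real process environment.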

Running the CLI from sbt

There is no shipped binary yet; the canonical entry point is sbt’s runMain:

sbt "nexJVM/runMain io.github.edadma.nex.run <subcommand> <file>"

Examples:

sbt "nexJVM/runMain io.github.edadma.nex.run run     examples/fft/main.nex"
sbt "nexJVM/runMain io.github.edadma.nex.run compile examples/fft/main.nex"
./examples/fft/main

run is also a published @main entry point on the JVM JAR — once packaged, java -jar nex.jar <subcommand> <file> does the same thing.

Requirements

JVM 17+ to invoke the CLI.
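clang on $PATH for the compile subcommand. With the default llvm backend, the compiler writes LLVM IR text to a temp file and shells out to clang -O1 to produce a native binary.
No other native dependencies for the default backend. The experimental mlir backend additionally runs mlir-opt and mlir-translate before clang, so it needs those tools as well.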
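As a rough illustration of the final step on the default backend, the sketch below builds a clang command line that places the binary next to the source, at the source path minus its extension. Only clang and -O1 come from this page. The exact argument list, the .ll temp-file name, and the helpers `output_binary_path` and `clang_command` are assumptions.

```python
from pathlib import PurePosixPath


def output_binary_path(source_file):
    """Drop the .nex extension: examples/fft/main.nex becomes examples/fft/main."""
    return str(PurePosixPath(source_file).with_suffix(""))


def clang_command(ir_file, source_file):
    """Build (but do not run) a clang invocation for the emitted IR file."""
    return ["clang", "-O1", ir_file, "-o", output_binary_path(source_file)]
```

A caller could hand the resulting list to `subprocess.run` and treat a non-zero return code as a compile failure.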

What each subcommand prints

tokens — one token per line, with source position.
parse — the untyped AST as a pretty-printed tree.
elaborate — the elaborated (typed, name-resolved, lowered) AST. Useful for debugging type inference or fusion-pass output.
run — the program’s stdout (whatever it prints); exit status reflects whether a trap occurred.
test — one line per discovered test, with a pass/fail marker; a summary at the bottom; non-zero exit on any failure.
compile — diagnostics during code generation, then either a native binary at <file-without-extension> or a non-zero exit on error.