CLI
The nex subcommands — tokens, parse, elaborate, run, test, compile — and what each one does.
Subcommands
nex tokens <file> # lex; print the token stream
nex parse <file> # lex + parse; print the AST
nex elaborate <file> # parse + name resolution; print the typed AST
nex run <file> # parse, elaborate, execute via the tree-walking interpreter
nex test <file> # discover and run every @test function reachable from <file>'s project root
nex compile [--backend B] <file> # emit IR and invoke a backend toolchain to produce a native binary
<file> is always a single .nex source. For test and compile, the directory containing <file> is the project root — the import graph is discovered from there.
--backend (only for compile) selects the codegen pipeline:
- llvm (default): emits LLVM IR text and invokes clang -O1. Covers the full language surface.
- mlir: emits MLIR via the linalg/arith/scf/tensor/func dialects and runs mlir-opt → mlir-translate → clang -O1. Experimental and growing on the strangler-fig pattern: scalar arithmetic (+ - * / div % ^, unary -), comparisons (scalar and array-element-wise, returning [bool]) plus bool with and/or/not, min/max/abs, the libm transcendentals (sqrt/exp/log/sin/cos/…), the explicit conversions to_real/to_integer, array literals, length, and indexing, array-array element-wise binops (matching element types, or mixed integer/real promoting up the lattice), scalar–array broadcasts (arithmetic and comparison), and the @ operator (matmul, matvec, dot) all parity-check against the interpreter.
Control flow lowers via the scf dialect: scalar if-expressions, statement-position if and if/else (lowered to a no-result scf.if with an optional else region), while cond do ... loops, and for loops in both forms (for i in lo..hi do ... over an integer range, for x in xs do ... over a rank-1 array) are wired up.
Mutable scalar var bindings lower to memref.alloca slots, so a counter-style var n = 0; while ... do n = n + 1 works end-to-end. Mutable tensor var bindings rebind through fresh SSA via tensor.insert / tensor.insert_slice / scf.for accumulators, covering whole-tensor reassign, scalar index-assign at rank-1 and rank-2 (negative-index wrap + OOB trap), rank-1 slice-assign (open bounds, inclusive, stride > 1), and rank-2 row/column replace (m[i, :] = rhs and m[:, j] = rhs); auto-clone insertions (spec §8.3) become emit-inner identities because SSA tensor semantics already diverge the bindings on the first mutation.
Rank-1 and rank-2 slicing (xs[2..5], m[:, 1], m[0..2, 1..3], etc.) routes through tensor.extract_slice for both literal and runtime bounds, including the strided forms (xs[0..n by 2], m[..r by k, c]). The rank-reducing form drops a dimension whenever an axis collapses to a scalar index.
Rank-1 filter(arr, pred) and flatMap(arr, x -> [...]) lower to two-pass tensor.empty + tensor.insert builders so the output shape is sized exactly. Index-style traps match the interpreter: negative-index wrap on TIndex, plus out-of-bounds traps on both indexing and slice bounds.
The MLIR C runtime now carries the same setjmp/longjmp trap-catching infrastructure the LLVM backend has had since milestone 1: a thread-local nex_trap_buf plus a nex_trap_with(msg) shim that longjmps when an assert_traps is on the stack and aborts to stderr otherwise. With that in place, assert lowers to scf.if !cond { call nex_assert_failed } with the message extracted from the optional nex_str at compile time; assert_traps(fn, sub?) lowers to a nex_assert_traps call that pulls (fn_addr, env_addr) out of the closure descriptor via the existing nex_closure_fn / nex_closure_env helpers (with the thunk cast to int64_t (*)(int64_t) so unit / int / pointer return types all flow); and integer div and %, real /, and int-div-int-to-real / each get an scf.if-guarded nex_trap_int_div_zero check before the divide. assert_traps cases that close over an array value inside the closure are still deferred until the tensor-capture closure ABI lands, and assert_approx (rank-1, rank-2, and complex variants) is also deferred pending dedicated emitters.
User-defined def functions lower to func.func @nex_user_<name>_<id> at module level. Scalar (i64 / f64 / i1) and string parameters and return types are supported, plus tensor params and tensor returns (including tensor-returning if-expressions) and unit-returning defs (def scale(v: mut [real], k: real) = ...); recursion, mutual recursion, forward references, and prelude consts (pi / e) referenced from inside a user def all flow naturally because module-level ops in MLIR are order-independent.
Value-returning defs with early return inside a loop or branch are desugared to an equivalent if-expression chain before codegen, so return doesn’t need a dedicated MLIR shape.
mut scalar parameters lower to memref<T> ref slots: the callee binds the param as a varSlot, reads and assignments flow through memref.load / memref.store on the same memref the caller’s slot uses, and the call site forwards its slot register directly so writes are visible on both sides without a copy. mut rank-1 and rank-2 tensor parameters take a different shape: they are admitted as plain tensor<...> by-value rather than boxed through memref<tensor<...>>, and the callee seeds them into its var-tensor map so the same index-assign / slice-assign machinery covers writes inside the function body. The elaborator’s NexLifetime auto-clone pass (spec §8.3) wraps the call-site arg in a TClone whenever the caller’s var is read after the mut call, so visible-aliasing cases are diverged at the language level before MLIR ever sees them.
Compound assignment (+= -= *= /= %=) rides the existing var-slot and index-assign machinery for free, because the elaborator already desugars n += k to TAssign(n, n + k) before the backend sees it. Top-level const X = ... inits flow through the same emitter as ordinary expressions, so const X = sqrt(2.0) and recursive-fact / cos(pi/2) style initializers compile without a dedicated shape.
Closures lower end-to-end: each TLambda whose signature and captures fit i64-shaped LLVM-compatible slots becomes a synthetic top-level llvm.func @nex_lambda_<N>(%env, args...); free-var captures are classified ByVal (immutable) or ByRef (the source is a var), and any var referenced inside any lambda is “boxed” out of its parent’s memref.alloca into a heap cell so the box outlives the parent frame and mutations through either side stay visible.
Closure construction sites build a heap descriptor {fn_ptr, env_ptr} via llvm.mlir.addressof + llvm.ptrtoint, the closure type lowers as an opaque i64, and an indirect call extracts (fn, env) from the descriptor and dispatches through llvm.call. Top-level defs reached in value position (passed to a HOF, bound to a val, returned) lower to a closure descriptor with a synthetic thunk function pointer and a null env, so a top-level def can be passed wherever a closure is expected. Arrays of closures work ([f, g, h] and for f in fs do f(x)), and map(xs, f) accepts a val- or arg-bound closure value rather than only an inline lambda: the rank-1 driver extracts (fn, env) once outside the scf.for and dispatches per-iteration.
if-expressions can return string values (if cond then "yes" else "no") since the i64 string-descriptor encoding matches the closure-style boundary type the scf.if regions already use.
User-defined generic functions (Spec §6.10) flow through to MLIR transparently: NexMonomorphize runs in the elaborator before any backend sees the typed IR, so identity / numeric-constrained / ord-constrained / two-type-param / generic-lambda / overload-resolved call sites all reach codegen as concrete monomorphized defs without any MLIR-side dispatch logic.
Structs and tuples lower to !llvm.struct<(...)> from the LLVM dialect. The pass pipeline already runs convert-func-to-llvm, so LLVM-dialect types flow through func-boundary signatures unchanged.
Struct construction is an llvm.mlir.undef of the struct type folded with llvm.insertvalue per declared slot (int → real promoted at the slot boundary); field reads emit llvm.extractvalue at the declared field index; nested-field writes (o.i.v = 99) walk the TField chain back to its TVarRef root, recursively extractvalue along the path, perform the leaf insertvalue, then insertvalue the updated sub-struct back out, with the final SSA rebound at the var. var-bound structs and tuples track per-var SSA registers parallel to the var-tensor scheme, so auto-clone falls out for free: var c = b copies the SSA register but the first write through either binding produces a fresh value, leaving the other binding unaffected.
Tuples are the positional twin (no field names); TTupleProj lowers to extractvalue and the print(tuple) / s"$tup" paths share an emitTupleToString formatter. Tuple-returning and struct-returning if-expressions flow through the typed-if scaffold via parallel MTuple / MStruct arms.
Scalar complex values lower as !llvm.struct<(f64, f64)> from the same LLVM-dialect path. A pack/unpack helper pair feeds per-component lowerings for + - * / (with a zero-denominator trap on /), equality and unary minus, the field projections .re and .im, the to_complex conversion, the modulus |z|, complex-arm if-expressions, and print / s"..." formatting; a branchless sign(x) built-in is along for the ride and unlocks the prelude’s source-defined complex extensions for sqrt / exp / log / sin / cos / tan, which lower through the same monomorphized-def machinery as any other prelude function.
User-defined generic structs (Spec §3.6) flip transparently for the same monomorphization reason: Box[T], Pair[A, B], nested generic structs, generic-into-generic-fn, ord-constrained generic struct, explicit-annotation construction, and generic-struct field write all reach codegen as concrete MStruct(...) lowerings.
String values are represented as opaque i64 descriptor pointers: literals share a module-level memref.global pool, the runtime ships refcounted concat / eq / from_i64 / from_bool / from_double helpers, and refcount discipline (-1 immortal for literals, +1 heap for concat / value-to-string results, dec on every consume site) mirrors the LLVM backend exactly. print("..."), string == / != / +, s"..." interpolation with integer / bool / real, and f"..." format-spec values (%d / %f / %e / %g / %x / %X / %o / %b with width, 0 and - flags, and .precision) all parity-check against the interpreter, and the real-value branch uses a shortest-round-trip helper that matches Java’s Double.toString byte-for-byte.
The array builders range / zeros / ones / linspace plus transpose round out the rank-1 / rank-2 surface. The corpus opts cases in one at a time via a per-case mlir = true flag in NexProgramCorpus; see NexCorpusMlirParityTests for what’s currently covered.
User-defined enums (Spec §3.7 / §3.8) lower as tagged unions !llvm.struct<(i32, !llvm.array<N x i64>)>, where N is the max field count across variants and each field occupies one i64 payload slot: int as identity, real ↔ i64 via llvm.bitcast, bool ↔ i64 via llvm.zext / llvm.trunc, string as the i64 descriptor it already is. match walks arms in source order as a chain of scf.if -> (T) (or no-result scf.if for unit-typed arms), each variant pattern testing arith.cmpi eq %tag, idx and binding sub-pattern fields by llvm.extractvalue [1, i] + per-type decode; var- and wildcard-patterns terminate as the default arm. A per-enum func.func @nex_enum_str_<Name> flushed after @main powers print(enum) and s"${enum}". Generic enums (Opt[T], Result[T, E], ord-constrained generic enum) flip transparently for the same monomorphization reason generic structs already do.
Aggregate field types inside a variant (struct- or tensor-typed fields) stay rejected at codegen (a separate lowering sub-project). Arrays of structs are also deferred (a separate struct-element tensor sub-project).
The elaborator’s NexFusion pass is now wired into the LLVM compile path (so 2 * a + b lowers to a single fused loop on the default backend), but is intentionally not run before MLIR codegen yet: the MLIR TFusedLoop lowering only covers rank-1 fused loops with statically-known lengths, so any program reaching MLIR with fused IR (a broadcast inside a def, any rank-2 fusion) would raise the “not yet supported” diagnostic. Picking those up is a strangler-fig follow-up.
Anything outside the opted-in surface raises a compile-time “not yet supported” diagnostic.
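The parity claims above can be spot-checked by hand for a single program. The following is a minimal sketch, not the project's own test harness: NEX and parity_check are hypothetical names, and the sketch assumes NEX points at some executable wrapper for the CLI, since no binary ships yet. It relies only on the run and compile subcommands and on compile writing its binary next to the source without the .nex extension.

```sh
# NEX is a hypothetical variable naming an executable that launches the CLI
# (no binary ships yet, so in practice this would be a local wrapper script).
NEX=${NEX:-nex}

# parity_check <file.nex> [backend]
# Exit 0 when the compiled binary prints exactly what the interpreter prints.
parity_check() (
  src=$1
  backend=${2:-llvm}
  bin=${src%.nex}                 # compile writes <file-without-extension>
  case $bin in
    */*) ;;                       # already has a directory component
    *) bin=./$bin ;;              # avoid a $PATH lookup for bare names
  esac
  expected=$("$NEX" run "$src") || exit 2
  "$NEX" compile --backend "$backend" "$src" >/dev/null || exit 3
  actual=$("$bin") || exit 4
  [ "$expected" = "$actual" ]
)
```

For example, parity_check examples/fft/main.nex mlir would exit 3 if the program uses a feature outside the opted-in MLIR surface, because compile fails with the "not yet supported" diagnostic. Programs that deliberately trap are out of scope for this sketch, since it treats a non-zero exit from run as a failure.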
Locating the source prelude
The scalar transcendentals plus their complex extensions live in a prelude/*.nex directory and are auto-imported on every elaboration (see §10 of the spec). The CLI locates that directory at startup; the precedence order is:
- --prelude-path <dir>: an explicit CLI flag (highest priority).
- NEX_HOME environment variable: looks at $NEX_HOME/lib/prelude/.
- -Dnex.prelude.path=<dir> JVM system property. build.sbt sets this automatically so sbt run and sbt test use the in-repo prelude/ with no extra wiring.
- Walk up from the current working directory: at each ancestor, check for a prelude/ sibling and then lib/prelude/.
If none of the four locate a directory, the compiler falls back to the compiler-built-in name table and proceeds silently. This was the design during the Stage 1/2 migration and remains the fallback today for environments without a sysroot.
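The precedence order can be modelled in a few lines of shell. This is an illustrative sketch only, not the CLI's actual lookup code: locate_prelude is a hypothetical function, its first argument stands in for --prelude-path and its second for the -Dnex.prelude.path system property, and it assumes a candidate only counts when it exists as a directory.

```sh
# locate_prelude [flag_dir] [sysprop_dir]
# Prints the first prelude directory found, in the documented order.
# Both parameters are stand-ins invented for this sketch.
locate_prelude() (
  flag_dir=${1:-}
  sysprop_dir=${2:-}
  # 1. Explicit flag.
  if [ -n "$flag_dir" ] && [ -d "$flag_dir" ]; then
    printf '%s\n' "$flag_dir"; exit 0
  fi
  # 2. NEX_HOME environment variable.
  if [ -n "${NEX_HOME:-}" ] && [ -d "$NEX_HOME/lib/prelude" ]; then
    printf '%s\n' "$NEX_HOME/lib/prelude"; exit 0
  fi
  # 3. JVM system property.
  if [ -n "$sysprop_dir" ] && [ -d "$sysprop_dir" ]; then
    printf '%s\n' "$sysprop_dir"; exit 0
  fi
  # 4. Walk up from the working directory, checking prelude/ then lib/prelude/.
  dir=$(pwd -P)
  while :; do
    for cand in "${dir%/}/prelude" "${dir%/}/lib/prelude"; do
      if [ -d "$cand" ]; then printf '%s\n' "$cand"; exit 0; fi
    done
    [ "$dir" = / ] && break
    dir=$(dirname "$dir")
  done
  exit 1   # nothing found: the compiler would use its built-in name table
)
```

A non-zero return corresponds to the silent fallback described above.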
Running the CLI from sbt
There is no shipped binary yet; the canonical entry point is sbt’s runMain:
sbt "nexJVM/runMain io.github.edadma.nex.run <subcommand> <file>"
Examples:
sbt "nexJVM/runMain io.github.edadma.nex.run run examples/fft/main.nex"
sbt "nexJVM/runMain io.github.edadma.nex.run compile examples/fft/main.nex"
./examples/fft/main
run is also a published @main entry point on the JVM JAR — once packaged, java -jar nex.jar <subcommand> <file> does the same thing.
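Until a binary ships, a small shell function can keep the long runMain string out of day-to-day use. This is a convenience sketch: nex and nex_sbt_cmd are hypothetical names, and the function assumes it is called from the repository root where sbt can find the build.

```sh
# Both function names are hypothetical conveniences, not part of the project.

# nex_sbt_cmd prints the sbt command line for the given CLI arguments.
nex_sbt_cmd() {
  printf 'sbt "nexJVM/runMain io.github.edadma.nex.run %s"\n' "$*"
}

# nex forwards its arguments to the runMain entry point through sbt.
nex() {
  sbt "nexJVM/runMain io.github.edadma.nex.run $*"
}
```

With this in place, nex run examples/fft/main.nex expands to the first example above. Arguments are joined with spaces before sbt re-parses them, so paths containing spaces are not preserved.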
Requirements
- JVM 17+ to invoke the CLI.
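- clang on $PATH for the compile subcommand. The compiler writes LLVM IR text to a temp file and shells out to clang -O1 to produce a native binary.
- No other native dependencies for the default llvm backend. The experimental mlir backend additionally runs mlir-opt and mlir-translate before clang.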
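A quick preflight check can confirm the external tools are reachable before compiling. This is a hedged sketch: missing_tools is a hypothetical helper, and it only checks that each command is on $PATH, not that the JVM is version 17 or newer.

```sh
# missing_tools <tool>...
# Prints each named tool that is not on $PATH, one per line, and returns
# non-zero if any are missing. Hypothetical helper, not part of the CLI.
missing_tools() (
  status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
      status=1
    fi
  done
  exit "$status"
)
```

For the default backend, missing_tools java clang covers the list above; for the mlir backend, add mlir-opt and mlir-translate to the arguments.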
What each subcommand prints
- tokens: one token per line, with source position.
- parse: the untyped AST as a pretty-printed tree.
- elaborate: the elaborated (typed, name-resolved, lowered) AST. Useful for debugging type inference or fusion-pass output.
- run: the program’s stdout (whatever it prints); exit status reflects whether a trap occurred.
- test: one line per discovered test, with a pass/fail marker; a summary at the bottom; non-zero exit on any failure.
- compile: diagnostics during code generation, then either a native binary at <file-without-extension> or a non-zero exit on error.
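Because test exits non-zero on any failure, it can gate a script or CI step directly. The sketch below assumes a launcher executable for the CLI is available; run_suite and its launcher parameter are hypothetical names introduced here for illustration.

```sh
# run_suite <launcher> <file.nex>
# Runs "<launcher> test <file.nex>" and turns the exit status into a
# one-line verdict. The launcher is whatever executable starts the CLI.
run_suite() (
  launcher=$1
  src=$2
  if "$launcher" test "$src"; then
    echo "tests passed"
  else
    echo "tests failed"
    exit 1
  fi
)
```

The per-test lines and the summary printed by test pass through unchanged ahead of the verdict, so the output stays readable in CI logs.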