Infrastructure — API Reference

Autogenerated reference for evaluators, novelty search / MAP-Elites archives, structured run logs, checkpoint / resume, constant optimization, the AST sanitizer, code-generation primitives, the protected arithmetic operators used in symbolic regression, and shared run-utility helpers (train_test_split, summarize, run_multi_seed).

Types

Arborist.TableFitnessEvaluator — Type

TableFitnessEvaluator <: AbstractEvaluator

Evaluates a function against a table of input/output examples. Fitness is mean squared error over rows where execution succeeded. Returns Inf if more than 50% of rows throw exceptions or exceed the time limit.

Fields

input_cols::Dict{Symbol, DataType}: input variable names and types
output_cols::Dict{Symbol, DataType}: output variable names and types
input_rows::Vector{Dict{Symbol, Any}}: input data rows
output_rows::Vector{Dict{Symbol, Any}}: expected output data rows
time_limit_ns::Int: per-call time limit in nanoseconds (default: 1,000,000, i.e. 1 ms)

Arborist.TableFitnessEvaluator — Method

TableFitnessEvaluator(input_cols, output_cols, input_rows, output_rows; time_limit_ns=1_000_000)

Construct a TableFitnessEvaluator with an optional time limit per function call.

Arborist.NoveltyArchive — Type

NoveltyArchive{B}

A thread-safe, append-only collection of behavioral fingerprints used by NoveltySearchEvaluator for k-nearest-neighbor novelty scoring.

Fields

entries::Vector{B}: stored fingerprints, in insertion order.
max_size::Int: cap on the number of entries. Once the archive is full, further insertions are silently dropped. Oldest-out eviction would change novelty scores for already-evaluated genomes; bounded growth is the standard Lehman-Stanley behavior.
add_threshold::Float64: minimum novelty (mean k-NN distance) at which a new fingerprint is added. Setting to 0.0 means add every evaluated fingerprint (saturates fast); setting too high prevents archive growth and starves later evaluations of references.
lock::ReentrantLock: guards entries for parallel=true evaluation.

Arborist.NoveltySearchEvaluator — Type

NoveltySearchEvaluator{F,D,B} <: AbstractEvaluator

Behavior-based evaluator. Returns the negative mean of the k nearest neighbor distances between the current genome's fingerprint and the archive — lower is better, matching the framework's convention.

Type parameters

F: type of the fingerprint function genome -> B.
D: type of the distance function (B, B) -> Float64.
B: type of a single behavioral fingerprint.

Fields

fingerprint_fn::F: extracts a behavioral descriptor from a genome. Typically performs the same rollout the base evaluator would, but records a behavior summary rather than a fitness scalar.
distance_fn::D: distance metric over fingerprints. Should return 0.0 for identical behaviors and positive for different ones.
archive::NoveltyArchive{B}: behavioral memory.
k::Int: number of nearest neighbors used in the novelty score.

Arborist.Checkpoint — Type

Checkpoint{G}

Opaque snapshot of a mid-run single-objective GP evolution. Holds everything _run_evolution! needs to resume: population, fitnesses, generation counter, RNG state, fitness/mean histories, wall-time, and a signature derived from the algorithm config.

Not meant for direct construction — save_checkpoint is invoked internally by solve(...; checkpoint_every, checkpoint_path). Users construct one only when implementing a custom solve loop.

Fields

format_version::Int: internal version of the Checkpoint layout (bumped on incompatible field changes).
arborist_version::VersionNumber: project version from Project.toml.
julia_version::VersionNumber: VERSION at save time.
generation::Int: generation just completed. Resume starts at generation + 1.
population::Vector{G}: final population of the completed generation, sorted best-first.
fitnesses::Vector{Float64}: aligned with population.
rng_state::Any: copy(rng) at save time, so resumed runs draw the same random stream the interrupted one would have.
best_genome::G: the best genome seen across the whole run so far (may differ from population[1] if the elite was lost during breeding).
best_fitness::Float64: paired with best_genome.
fitness_history::Vector{Float64}: per-generation best-fitness trajectory.
mean_history::Vector{Float64}: per-generation mean-finite-fitness trajectory.
wall_time::Float64: cumulative seconds elapsed (does not include pre-resume idle time).
algorithm_signature::UInt64: hash of the algorithm config (see _algorithm_signature) — checked on resume so the user can't hot-swap hyperparameters silently.
hall_of_fame::Any: optional HallOfFame{G} archive at checkpoint time, or nothing when disabled.
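
The rng_state field relies on copy(rng) capturing the full generator state. A minimal sketch of that property using only the Random stdlib (the seed and draw counts are arbitrary illustration values, and nothing here calls Arborist itself):

```julia
using Random

rng = MersenneTwister(2024)
rand(rng, 5)                 # advance the stream, as earlier generations would
saved = copy(rng)            # the kind of snapshot a checkpoint would store
expected = rand(rng, 3)      # what the uninterrupted run draws next
resumed = rand(saved, 3)     # what a resumed run draws from the saved copy
println(expected == resumed)
```

Because the copy is independent of the original, continuing to draw from rng after the snapshot does not disturb the saved state.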

Arborist.GenerationLog — Type

GenerationLog

One record per generation in a RunLog. record! populates every field except those deferred to a later phase (Phase F.5), which F.0 leaves as empty containers.

Fields

generation::Int: 1-based generation number.
best_fitness::Float64: minimum finite fitness in the generation. Inf if no individual evaluated successfully.
mean_fitness::Float64: mean of finite fitnesses.
median_fitness::Float64: median of finite fitnesses.
worst_fitness::Float64: maximum finite fitness. Inf if no finite fitnesses at all.
n_species::Int: number of active species. 1 when speciation is NoSpeciation or speciation state is unavailable.
species_sizes::Vector{Int}: member counts per species, in the same order as the speciation state's internal list. Empty when speciation is inactive or the solve path does not thread a SpeciationSnapshot.
operator_success::Dict{Symbol,Int}: count of offspring per mutation/crossover operator that beat the parent's fitness. Populated in F.5.
operator_attempted::Dict{Symbol,Int}: count of times each operator was invoked. Populated in F.5.
unique_structures::Int: number of distinct genomes in the generation by serialize-hash. Coarse genotypic diversity proxy.
wall_time::Float64: cumulative wall-clock seconds elapsed since t0 (run start).

Arborist.RunLog — Type

RunLog

A vector-like container of GenerationLog entries. Callers construct one as RunLog() and pass it to solve(...; log=log). Iteration and indexed access are supported via entries(log), length(log), and log[i].

RunLog is mutable: record! appends one entry per generation.

Arborist.SpeciationSnapshot — Type

SpeciationSnapshot

Mutable carrier passed as a kwarg to _apply_speciation!. Populated with the post-culling species count and per-species member sizes so that record! can record them without the solve path re-computing speciation.

Constructed fresh per generation and discarded; not part of the public API.

Arborist.ConstantOptimization — Type

ConstantOptimization(; frequency=25, top_k=5, max_iter=50, tol=1e-8, fd_step=1e-3)

Configuration for the periodic constant-optimization pass. Enable by passing GeneticProgramming(; constant_optimization=ConstantOptimization(), ...).

Fields

frequency::Int: generations between optimization passes (default: 25). The pass runs at the end of each Nth generation — gen 25, 50, 75, ...
top_k::Int: number of top (lowest-fitness) individuals to optimize per pass (default: 5). Applying to all individuals would double evaluation cost every generation; applying only to elites refines the best candidates.
max_iter::Int: maximum BFGS iterations per individual (default: 50).
tol::Float64: gradient-norm convergence tolerance (default: 1e-8).
fd_step::Float64: half-step size for central finite differences (default: 1e-3). Too small amplifies roundoff; too large linearizes too coarsely.
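
The role of fd_step as a half-step can be illustrated with a standalone central-difference gradient. This is a sketch under assumptions: sketch_fd_gradient and the toy loss are hypothetical names, and the optimizer's actual gradient code is not shown in this reference.

```julia
# Hypothetical helper: central-difference gradient with half-step h.
# Each partial derivative is (f(c + h*e_i) - f(c - h*e_i)) / (2h).
function sketch_fd_gradient(loss, c::Vector{Float64}; h=1e-3)
    grad = similar(c)
    for i in eachindex(c)
        up = copy(c)
        up[i] += h
        down = copy(c)
        down[i] -= h
        grad[i] = (loss(up) - loss(down)) / (2h)
    end
    return grad
end

# Toy loss over two constants; its analytic gradient at [0, 0] is [-6, 4].
toy_loss(c) = (c[1] - 3.0)^2 + 2.0 * (c[2] + 1.0)^2
g = sketch_fd_gradient(toy_loss, [0.0, 0.0])
println(g)
```

The evaluation points are 2h apart, which is why the field is described as a half-step rather than a step.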

Arborist.ASTSanitizer — Type

ASTSanitizer

Validates ExprGenome expression trees against a whitelist of permitted function calls before @eval compilation. Rejects any expression containing calls to functions outside the whitelist.

This is a defense-in-depth measure for use with LLMMutationOperator. For purely classical GP (no LLM operator), the function set already constrains what can appear, but sanitization adds an explicit check.

Fields

allowed_calls::Set{Symbol}: whitelist of permitted function call symbols
allow_literals::Bool: whether to allow literal values (default: true)
allow_variables::Bool: whether to allow variable references (default: true)

Arborist.ASTSanitizer — Method

ASTSanitizer(; allowed_calls=DEFAULT_SAFE_CALLS, allow_literals=true, allow_variables=true)

Construct an ASTSanitizer with the default mathematical/logical whitelist.

Arborist.FunctionDetails — Type

FunctionDetails(name, args, return_type)

Signature record for a single primitive available to evolved programs. name is the Julia Symbol the evolved code will call, args is the ordered vector of argument types, and return_type is the type produced by the call. FunctionSet collects these into the palette from which create_random_rvalue draws.

Arborist.FunctionSet — Type

FunctionSet(funcs::Set{FunctionDetails})

Container for the set of primitives that evolved expression-tree programs are permitted to call. Populated via add! or by constructing the Set directly, and passed into GenState / GPProblem to define the search space. See default_function_set and boolean_function_set for prebuilt palettes.

Arborist.GenState — Type

GenState(rng, fset, inputs, outputs, num_temps)

Code-generation state shared across the construction and mutation of a single expression-tree genome. Holds the rng used for every random choice (no global state), the FunctionSet palette, the input/output/temp variable dictionaries typed by Symbol => DataType, the set of types in use, and a cached union of all addressable variables. All stochastic helpers in codegen.jl take a GenState and draw exclusively from state.rng, so seeded runs reproduce.

Arborist.LoopLimitExceeded — Type

Custom exception thrown when a loop exceeds its iteration limit.

Functions

Arborist.evaluate — Method

evaluate(fe::TableFitnessEvaluator, f::Function) -> Float64

Evaluate f against the table of examples. Returns mean squared error over rows where execution succeeded. Returns Inf if more than 50% of rows fail, where a failure is a thrown exception or a call that exceeds the per-call time limit.

Uses Base.invokelatest to handle world-age issues from @eval-defined functions.
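
The scoring rule above can be sketched without the package. This is an illustrative stand-in, not the evaluator's code: sketch_table_fitness, the (input, expected) tuple layout, and max_fail_fraction are hypothetical, and the per-call time limit is omitted for brevity.

```julia
# Hypothetical stand-in for the scoring rule described above.
# `rows` is a vector of (input, expected) pairs; `f` maps an input to a prediction.
function sketch_table_fitness(f, rows; max_fail_fraction=0.5)
    sq_errors = Float64[]
    failures = 0
    for (x, expected) in rows
        try
            # invokelatest sidesteps world-age errors for freshly @eval'd functions
            y = Base.invokelatest(f, x)
            push!(sq_errors, (Float64(y) - Float64(expected))^2)
        catch
            failures += 1
        end
    end
    # More than half the rows failed: treat the candidate as unusable.
    failures > max_fail_fraction * length(rows) && return Inf
    isempty(sq_errors) && return Inf
    return sum(sq_errors) / length(sq_errors)
end

rows = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
good = sketch_table_fitness(x -> 2x, rows)
flaky = sketch_table_fitness(x -> x > 3 ? error("boom") : 2x + 1, rows)
broken = sketch_table_fitness(x -> error("boom"), rows)
println((good, flaky, broken))
```

The flaky candidate fails on one row of four, so it is still scored, but only over the three rows that ran.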

Arborist.evaluate_cases — Method

evaluate_cases(e::TableFitnessEvaluator, f::Function) -> Vector{Float64}

Return per-row squared error (Inf for rows that raised, timed out, or produced non-finite output). One entry per input row, in row order. Used by lexicase selection.

Arborist.input_signature — Method

input_signature(fe::TableFitnessEvaluator) -> Dict{Symbol, DataType}

Return the input variable names and types expected by this evaluator.

Arborist.output_signature — Method

output_signature(fe::TableFitnessEvaluator) -> Dict{Symbol, DataType}

Return the output variable names and types expected by this evaluator.

Arborist.evaluate_genome — Method

evaluate_genome(g::AbstractGenome, e::NoveltySearchEvaluator) -> Float64

Compute the novelty score: take the genome's behavioral fingerprint, find the k nearest fingerprints in the archive, return the negative mean of those distances. The fingerprint may be added to the archive (under lock) when its novelty exceeds archive.add_threshold and the archive isn't full.

Returns Inf if fingerprint_fn raises. Returns 0.0 when the archive is empty (first genome of the run) — there's nothing to be novel against yet, but adding to the archive seeds it for subsequent calls.
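
The score computation itself can be sketched in a few lines. The names sketch_novelty and euclid are hypothetical, the archive is a plain vector, and locking and the add-to-archive step are left out.

```julia
# Hypothetical helper: negative mean distance to the k nearest archive entries.
function sketch_novelty(fp, archive, distance_fn, k)
    isempty(archive) && return 0.0            # nothing to compare against yet
    dists = sort!([distance_fn(fp, other) for other in archive])
    nearest = dists[1:min(k, length(dists))]
    return -sum(nearest) / length(nearest)    # negated so that lower is better
end

euclid(a, b) = sqrt(sum(abs2, a .- b))
archive = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]

# Distances to the archive are 0, 5 and 10; the two nearest average 2.5.
score = sketch_novelty([0.0, 0.0], archive, euclid, 2)
empty_score = sketch_novelty([0.0, 0.0], Vector{Float64}[], euclid, 2)
println((score, empty_score))
```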

Arborist.load_checkpoint — Method

load_checkpoint(path::AbstractString) -> Checkpoint

Load a checkpoint previously written by save_checkpoint. Throws ArgumentError if the file's Julia version or checkpoint format version differs from that of the current process, because Julia's Serialization format is not stable across minor versions.

Arborist.save_checkpoint — Method

save_checkpoint(ckpt::Checkpoint, path::AbstractString)

Atomically write ckpt to path. Uses Julia's Serialization stdlib. Writes to path * ".tmp" then renames, so a partial file never clobbers an older good checkpoint.
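
The write-then-rename pattern can be sketched with the Serialization stdlib alone. sketch_atomic_serialize is a hypothetical helper and the NamedTuple stands in for a real Checkpoint; the rename is only atomic when the temporary file and the target sit on the same filesystem.

```julia
using Serialization

# Hypothetical helper mirroring the write-then-rename pattern described above.
function sketch_atomic_serialize(obj, path::AbstractString)
    tmp = path * ".tmp"
    open(tmp, "w") do io
        serialize(io, obj)          # a crash here leaves `path` untouched
    end
    mv(tmp, path; force=true)       # swap the finished file into place
    return path
end

dir = mktempdir()
ckpt_path = joinpath(dir, "run.ckpt")
sketch_atomic_serialize((generation=10, best_fitness=0.25), ckpt_path)
restored = open(deserialize, ckpt_path)
println(restored)
```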

Arborist.entries — Method

entries(log::RunLog) -> Vector{GenerationLog}

Return the vector of GenerationLog entries recorded so far.

Arborist.record! — Method

record!(log::RunLog, gen, fitnesses, genomes, wall_time;
        snapshot=nothing)

Append one GenerationLog to log with aggregate fitness statistics, optional speciation snapshot, structural diversity, and wall-clock time.

gen::Integer: 1-based generation index.
fitnesses::AbstractVector: raw fitness per individual (may contain Inf for failed evaluations).
genomes::AbstractVector: population genomes, parallel to fitnesses.
wall_time::Real: seconds since run start.
snapshot::Union{Nothing, SpeciationSnapshot}: if provided, its n_species / sizes fields are copied into the entry. If nothing, the entry records n_species=1 and species_sizes=[length(genomes)] (the NoSpeciation case).

Arborist.sanitize — Method

sanitize(san::ASTSanitizer, expr::Expr) -> Bool

Return true if the expression tree is safe (all function calls are in the whitelist), false if it contains any unsafe call. Walks the entire AST recursively.

Flags as unsafe:

:call nodes where args[1] is a Symbol not in allowed_calls
:call nodes where args[1] is a qualified name (e.g., Base.run)
:macrocall nodes
:quote or :$ interpolation nodes

Does NOT flag: assignment, block, if, while, for, literal values, variable symbols.
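
The rules above amount to a recursive walk over Expr nodes. The following is a rough sketch, assuming a hypothetical sketch_is_safe helper and a three-symbol whitelist; it treats every non-Expr node as a literal or variable and ignores the allow_literals / allow_variables switches.

```julia
# Hypothetical whitelist check following the rules listed above.
function sketch_is_safe(ex, allowed::Set{Symbol})
    ex isa Expr || return true                        # literals, symbols, line nodes
    ex.head in (:$, :macrocall, :quote) && return false
    if ex.head == :call
        callee = ex.args[1]
        # Qualified names such as Base.run parse as an Expr, not a Symbol.
        (callee isa Symbol && callee in allowed) || return false
        return all(a -> sketch_is_safe(a, allowed), ex.args[2:end])
    end
    # Assignment, block, if, while, for: recurse into every child.
    return all(a -> sketch_is_safe(a, allowed), ex.args)
end

allowed = Set([:+, :*, :sin])
ok        = sketch_is_safe(Meta.parse("y = sin(x) + 2 * x"), allowed)
bad_call  = sketch_is_safe(Meta.parse("run(cmd)"), allowed)
qualified = sketch_is_safe(Meta.parse("Base.run(cmd)"), allowed)
macro_use = sketch_is_safe(Meta.parse("@eval x"), allowed)
println((ok, bad_call, qualified, macro_use))
```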

Arborist.sanitize — Method

sanitize(san::ASTSanitizer, body::Vector{Expr}) -> Bool

Check all statements in a genome body.

Arborist.add! — Method

add!(fset, f, nargs, input_type, return_type)

Add a primitive f taking nargs arguments of input_type and returning return_type to fset. Shorthand for building homogeneous-signature entries; for mixed argument types, push a FunctionDetails directly into fset.funcs.

Arborist.add_loop_checks — Method

add_loop_checks(body; limit=10_000)

Instrument a vector of body expressions with loop iteration checks. Returns a new vector (the original is not modified).

Arborist.add_loop_checks_expr — Method

add_loop_checks_expr(expr, limit)

Recursively instrument an expression tree, wrapping each :for and :while node with an iteration counter and a check that throws LoopLimitExceeded if the counter exceeds limit.
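
One way to picture the instrumentation is a rewrite that puts a counter in front of each loop and a guard at the top of its body. This is a sketch only: SketchLoopLimit, sketch_add_loop_checks, and the two demo functions are hypothetical, and the package's actual rewrite may differ in shape.

```julia
struct SketchLoopLimit <: Exception end   # stand-in for LoopLimitExceeded

function sketch_add_loop_checks(ex, limit::Int)
    ex isa Expr || return ex                      # symbols, literals, line nodes
    args = Any[sketch_add_loop_checks(a, limit) for a in ex.args]
    ex.head in (:for, :while) || return Expr(ex.head, args...)
    counter = gensym("iters")                     # fresh name per loop
    guard = quote
        $counter += 1
        $counter > $limit && throw($SketchLoopLimit())
    end
    loop = Expr(ex.head, args[1], Expr(:block, guard, args[2]))
    return quote
        $counter = 0
        $loop
    end
end

src = quote
    function sketch_sum()
        total = 0
        for i in 1:10
            total += i
        end
        return total
    end
    function sketch_spin()
        n = 0
        while true
            n += 1
        end
        return n
    end
end

# Define both instrumented functions with a limit of 100 iterations per loop.
eval(sketch_add_loop_checks(src, 100))
```

With this limit, sketch_sum still runs its ten iterations normally, while sketch_spin throws SketchLoopLimit instead of looping forever.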

Arborist.construct_and_define_function — Method

Construct a Julia function from a signature, body expressions, return expression, and return type. Evaluates the function into the current scope via @eval.

The return_expr may or may not be wrapped in :return; if it is, the value is extracted and re-wrapped with a type assertion.

Arborist.create_harness — Method

Create a function expression wrapping generated body code in a typed, callable function skeleton with initialized temps and outputs.

Returns an Expr that can be @eval'd to define the function.

Arborist.create_random_assignment — Method

create_random_assignment(s::GenState) -> Expr

Generate a random :(lhs = rhs) expression, rejecting self-assignments like x = x. The lvalue is drawn from get_lvalues(s) and the rvalue is built by create_random_rvalue matched to the lvalue's type. Used as the leaf case of random program construction.

Arborist.create_random_block — Method

Create a random block of 1-3 statements. Depth limits nesting recursion.

Arborist.create_random_for_loop — Method

Create a random for loop that iterates over an Int32 range. Body is a random block; depth limits nesting recursion.

Arborist.create_random_function_call — Method

Create a random function call which returns the required type.

Arborist.create_random_if_statement — Method

Create a random if-else statement with a Bool-typed condition. Both branches are random blocks; depth limits nesting recursion.

Arborist.create_random_statement — Method

Create a random statement of any supported type. Depth limits nesting recursion; at depth 0, returns an assignment.

Arborist.create_random_while_loop — Method

Create a random while loop with a Bool-typed condition. Body is a random block; depth limits nesting recursion.

Arborist.default_value — Method

Return a default zero-equivalent value for the given type.

Arborist.get_functions — Method

Return all functions with the given name and argument types.

Arborist.get_functions — Method

Return all functions with the given name.

Arborist.get_similar_random_function — Method

Return a function with the same argument types and return type as the given function call expression.

Arborist.unravel — Function

unravel(tree, expressions=[])

Flatten an Expr tree into a list of all sub-expressions via pre-order traversal.
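
A pre-order flatten can be sketched as follows. sketch_unravel is a hypothetical name, and collecting only Expr nodes (not leaf symbols or literals) is an assumption; the reference does not say whether leaves are included.

```julia
# Hypothetical pre-order flatten: visit the node first, then its children left to right.
function sketch_unravel(tree::Expr, acc=Expr[])
    push!(acc, tree)
    for child in tree.args
        child isa Expr && sketch_unravel(child, acc)
    end
    return acc
end

subexprs = sketch_unravel(:(sin(x) + b * c))
foreach(println, subexprs)
```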

Arborist.wrap_rvalue — Method

Wrap an rvalue in a function call, if possible.

Arborist.crossover — Method

crossover(s::GenState, parent_a::Expr, parent_b::Expr) -> Tuple{Expr, Expr}

Perform subtree crossover between two parent expression trees.

Strategy:

Flatten both trees with unravel().
Find pairs of sub-expressions with matching types (via get_rvalue_type).
Pick a random compatible pair, deepcopy both parents, and swap the subtrees.
If no compatible pair exists, return deepcopy of both parents unchanged.

Arborist.replace_subtree! — Method

replace_subtree!(tree::Expr, target::Expr, replacement::Expr) -> Bool

Replace the first occurrence of target (by object identity) in tree with replacement. Returns true if a replacement was made, false otherwise.
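
Matching by object identity matters when the same sub-expression appears more than once. A minimal sketch, with sketch_replace_subtree! as a hypothetical stand-in:

```julia
# Hypothetical identity-based replacement: `===` matches the exact node, not look-alikes.
function sketch_replace_subtree!(tree::Expr, target::Expr, replacement::Expr)
    for (i, child) in enumerate(tree.args)
        if child === target
            tree.args[i] = replacement
            return true
        elseif child isa Expr && sketch_replace_subtree!(child, target, replacement)
            return true
        end
    end
    return false
end

tree = :(sin(x) + sin(x))
second = tree.args[3]                         # the second sin(x) node
hit = sketch_replace_subtree!(tree, second, :(cos(y)))
miss = sketch_replace_subtree!(tree, :(sin(x)), :(z))   # equal in structure, different object
println((tree, hit, miss))
```

Only the second sin(x) is swapped; a freshly built :(sin(x)) is structurally equal to the first one but is a different object, so it is never found.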

Arborist.boolean_function_set — Method

boolean_function_set() -> FunctionSet

Return a function set containing boolean operators suitable for boolean GP problems (e.g., even parity).

Includes: AND (&), OR (|), NOT (!), NAND (gp_nand), NOR (gp_nor), XOR (xor), all operating on Bool.

Arborist.default_function_set — Method

default_function_set() -> FunctionSet

Return a default function set containing basic arithmetic, transcendental, and comparison operators suitable for numerical symbolic regression.

Includes:

Binary arithmetic (+, -, *, /, ^) for Float32 and Int32
Unary transcendentals (cos, sin, tanh, exp, sign) for Float32
Binary comparisons (>, <, ==, !=, >=, <=) for Float32 and Int32, returning Bool

Arborist.gp_nand — Method

gp_nand(a::Bool, b::Bool) -> Bool

Boolean NAND: returns !(a & b).

Arborist.gp_nor — Method

gp_nor(a::Bool, b::Bool) -> Bool

Boolean NOR: returns !(a | b).

Arborist.default_protected_function_set — Method

default_protected_function_set() -> FunctionSet

Return a symbolic-regression FunctionSet built around the protected operators. Suitable for ExprGenome-based symbolic regression where evolved programs must evaluate without raising domain errors.

Contents:

Binary arithmetic: +, -, * for Float32 and Int32; pdiv for Float32.
Unary transcendentals: plog, psqrt, pexp, sin, cos for Float32.

The set follows the Nguyen/Keijzer convention used in the modern symbolic regression literature (McDermott et al., 2012). pinv is not included by default; add it in place of pdiv(1.0, x) for problems where an explicit inverse primitive is desired. The two differ near zero: for |x| < 1.0e-10, pdiv(1.0, x) returns 1.0 while pinv(x) returns 0.0.

TreeGenome users do not need this helper: pass the raw functions directly to DynamicExpressions.OperatorEnum, e.g. OperatorEnum(; binary_operators=[+, -, *, pdiv], unary_operators=[plog, psqrt, pexp, sin, cos]).

Arborist.pdiv — Method

pdiv(a, b)

Protected division. Returns one(a) when |b| < 1.0e-10; otherwise a / b. The canonical Koza-style guard against division by zero.

Arborist.pexp — Method

pexp(x)

Protected exponential, exp(clamp(x, -50, 50)). Prevents Inf overflow for large positive x while leaving exp(x) unchanged for x in [-50, 50]. The upper bound gives exp(50) ≈ 5.18e21, comfortably within both Float32 and Float64 range.

Arborist.pinv — Method

pinv(x)

Protected multiplicative inverse. Returns zero(x) when |x| < 1.0e-10, otherwise one(x) / x.

Arborist.plog — Method

plog(x)

Protected natural logarithm, log(|x| + 1.0e-10). Always finite and real-valued; tracks log|x| away from zero and saturates near zero at log(PROTECTED_EPS), which is log(1.0e-10) ≈ -23.03 for the epsilon shown in the formula.

Arborist.psqrt — Method

psqrt(x)

Protected square root, sqrt(|x|). Always finite and real-valued.
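
The five protected operators documented above can be restated as one-line sketches directly from their descriptions. The sketch_ prefix and the eps keyword are hypothetical; they are used so the example does not claim to be the package's definitions.

```julia
# Hypothetical restatements of the documented formulas, with the 1.0e-10 guard as a keyword.
sketch_pdiv(a, b; eps=1.0e-10) = abs(b) < eps ? one(a) : a / b
sketch_pinv(x; eps=1.0e-10)    = abs(x) < eps ? zero(x) : one(x) / x
sketch_plog(x; eps=1.0e-10)    = log(abs(x) + eps)
sketch_psqrt(x)                = sqrt(abs(x))
sketch_pexp(x)                 = exp(clamp(x, -50, 50))

# Inputs that would error or overflow with the raw operators stay finite here.
results = (
    sketch_pdiv(3.0, 0.0),
    sketch_pinv(0.0),
    sketch_plog(0.0),
    sketch_psqrt(-4.0),
    sketch_pexp(1000.0),
)
println(results)
```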

Arborist.run_multi_seed — Method

run_multi_seed(f, seeds::AbstractVector{Int}; parallel=false) -> Vector

Call f(seed) for each integer in seeds and return a vector of the results. When parallel=true and Threads.nthreads() > 1, runs concurrently via Threads.@threads — callers must ensure f is thread-safe (no shared mutable state without locking).

Typical use:

fitnesses = run_multi_seed([1, 2, 3, 4, 5]) do seed
    problem = GPProblem(evaluator, TreeGenome{Float32}; seed=seed)
    result = solve(problem, alg)
    result.best_fitness
end
println(summarize(fitnesses))

Arborist.summarize — Method

summarize(xs::AbstractVector{<:Real}) -> NamedTuple

Return (; mean, std, median, min, max, q25, q75, n) for a vector of real values. Non-finite entries are excluded from every statistic so a single Inf fitness does not poison the summary. n reports the number of finite entries used.

Matches the shape most benchmark reporting expects: mean ± std for a quick headline, quartiles for distribution shape. Does not pull in Statistics, even as a weak dependency, so Pkg.test works in a sandboxed environment.
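
A Statistics-free summary of this shape might look like the sketch below. sketch_summarize is hypothetical, and both the linear-interpolation quantile rule and the n - 1 denominator for std are assumptions; the reference does not state which conventions are used.

```julia
# Hypothetical summary over finite entries only, using no packages beyond Base.
function sketch_summarize(xs::AbstractVector{<:Real})
    v = sort!(Float64[x for x in xs if isfinite(x)])
    n = length(v)
    n == 0 && return (; mean=NaN, std=NaN, median=NaN, min=NaN, max=NaN, q25=NaN, q75=NaN, n=0)
    # Linear-interpolation quantile on the sorted data (assumed rule).
    function q(p)
        pos = 1 + (n - 1) * p
        lo = floor(Int, pos)
        hi = ceil(Int, pos)
        return v[lo] + (pos - lo) * (v[hi] - v[lo])
    end
    m = sum(v) / n
    s = n > 1 ? sqrt(sum(abs2, v .- m) / (n - 1)) : 0.0
    return (; mean=m, std=s, median=q(0.5), min=v[1], max=v[end], q25=q(0.25), q75=q(0.75), n=n)
end

# The Inf entry is dropped before any statistic is computed.
stats = sketch_summarize([1.0, 2.0, 3.0, 4.0, Inf])
println(stats)
```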

Arborist.train_test_split — Method

train_test_split(X::AbstractMatrix, y::AbstractVector;
                 test_size=0.2,
                 rng=Random.default_rng(),
                 stratify=nothing
                ) -> (X_train, y_train, X_test, y_test)

Split a feature matrix X (features × samples) and matching target vector y into train and test partitions.

Arguments

X::AbstractMatrix: features-by-samples (columns are samples).
y::AbstractVector: one target per column of X.

Keyword arguments

test_size::Real (default 0.2): fraction of samples to place in the test set. Must be in (0, 1).
rng::AbstractRNG (default Random.default_rng()): RNG used for the permutation. Pass a MersenneTwister(seed) for reproducibility.
stratify::Union{Nothing, AbstractVector} (default nothing): when supplied, a class-label vector of the same length as y. Sampling is done class-wise so the class proportions in both partitions match the input distribution as closely as integer rounding allows.

Returns a 4-tuple (X_train, y_train, X_test, y_test) of sub-matrices and sub-vectors.
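
The column-wise layout and return order can be illustrated with a simplified, non-stratified split. sketch_split is a hypothetical helper, and rounding test_size * n to the nearest integer is an assumption about how the test count is derived.

```julia
using Random

# Hypothetical non-stratified split; columns of X are samples.
function sketch_split(X::AbstractMatrix, y::AbstractVector; test_size=0.2, rng=Random.default_rng())
    n = size(X, 2)
    n == length(y) || throw(ArgumentError("X must have one column per entry of y"))
    0 < test_size < 1 || throw(ArgumentError("test_size must be in (0, 1)"))
    perm = randperm(rng, n)
    n_test = clamp(round(Int, test_size * n), 1, n - 1)   # keep both partitions non-empty
    test_idx, train_idx = perm[1:n_test], perm[n_test+1:end]
    return X[:, train_idx], y[train_idx], X[:, test_idx], y[test_idx]
end

X = reshape(collect(1.0:20.0), 2, 10)    # 2 features, 10 samples
y = collect(1:10)
X_train, y_train, X_test, y_test = sketch_split(X, y; rng=MersenneTwister(42))
println((size(X_train), size(X_test)))
```

Passing a seeded rng makes the permutation, and therefore the partition, reproducible across runs.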

Constants

Arborist.DEFAULT_SAFE_CALLS — Constant

Default whitelist of safe function calls — mathematical and logical operations only.