Infrastructure — API Reference
Autogenerated reference for evaluators, novelty search / MAP-Elites archives, structured run logs, checkpoint / resume, constant optimization, the AST sanitizer, code-generation primitives, the protected arithmetic operators used in symbolic regression, and shared run-utility helpers (train_test_split, summarize, run_multi_seed).
Types
Arborist.TableFitnessEvaluator — Type
TableFitnessEvaluator <: AbstractEvaluator
Evaluates a function against a table of input/output examples. Fitness is the mean squared error over rows where execution succeeded. Returns Inf if more than 50% of rows throw exceptions or exceed the time limit.
Fields
- input_cols::Dict{Symbol, DataType}: input variable names and types
- output_cols::Dict{Symbol, DataType}: output variable names and types
- input_rows::Vector{Dict{Symbol, Any}}: input data rows
- output_rows::Vector{Dict{Symbol, Any}}: expected output data rows
- time_limit_ns::Int: per-call time limit in nanoseconds (default: 1,000,000, i.e. 1 ms)
Arborist.TableFitnessEvaluator — Method
TableFitnessEvaluator(input_cols, output_cols, input_rows, output_rows; time_limit_ns=1_000_000)
Construct a TableFitnessEvaluator with an optional per-call time limit.
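The scoring rule above can be illustrated with a small standalone sketch. It is not Arborist's implementation: `table_fitness_sketch` is a hypothetical name, each row is assumed to hold one scalar input and one scalar output, and the per-call time limit is omitted. A row that throws counts as a failure and is excluded from the mean.

```julia
# Minimal sketch of the documented TableFitnessEvaluator scoring rule.
# Assumptions (not Arborist's code): one scalar input and one scalar output
# per row, and no per-call time limit.
function table_fitness_sketch(f, xs::AbstractVector, ys::AbstractVector)
    sq_errors = Float64[]
    failures = 0
    for (x, y) in zip(xs, ys)
        try
            push!(sq_errors, (Float64(f(x)) - Float64(y))^2)
        catch
            failures += 1          # a throwing row is excluded from the mean
        end
    end
    # More than half of the rows failing disqualifies the candidate.
    failures > length(xs) / 2 && return Inf
    isempty(sq_errors) && return Inf
    return sum(sq_errors) / length(sq_errors)
end
```

For example, `table_fitness_sketch(x -> 2x, [1.0, 2.0], [2.0, 4.0])` scores a perfect fit as `0.0`.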
Arborist.NoveltyArchive — Type
NoveltyArchive{B}
A thread-safe, append-only collection of behavioral fingerprints used by NoveltySearchEvaluator for k-nearest-neighbor novelty scoring.
Fields
- entries::Vector{B}: stored fingerprints, in insertion order.
- max_size::Int: cap on the number of entries. Once the archive is full, new fingerprints are silently dropped rather than evicting old ones (oldest-out eviction would change novelty scores for already-evaluated genomes; bounded growth is the standard Lehman-Stanley behavior).
- add_threshold::Float64: minimum novelty (mean k-NN distance) at which a new fingerprint is added. Setting it to 0.0 adds every evaluated fingerprint (the archive saturates quickly); setting it too high prevents archive growth and starves later evaluations of reference points.
- lock::ReentrantLock: guards entries for parallel=true evaluation.
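A minimal sketch of the bounded, lock-guarded insertion policy described by these fields. `ArchiveSketch` and `try_add!` are hypothetical names, and the caller is assumed to have already computed the candidate's novelty; the real archive may organize this differently.

```julia
# Hypothetical mirror of the documented NoveltyArchive fields (not Arborist's code).
struct ArchiveSketch{B}
    entries::Vector{B}
    max_size::Int
    add_threshold::Float64
    lock::ReentrantLock
end

ArchiveSketch{B}(max_size::Int, add_threshold::Real) where {B} =
    ArchiveSketch{B}(B[], max_size, Float64(add_threshold), ReentrantLock())

# Append `fp` only if its novelty reaches the threshold and there is room;
# a full archive silently drops the candidate. Returns true when stored.
function try_add!(a::ArchiveSketch{B}, fp::B, novelty::Real) where {B}
    lock(a.lock) do
        if novelty >= a.add_threshold && length(a.entries) < a.max_size
            push!(a.entries, fp)
            return true
        end
        return false
    end
end
```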
Arborist.NoveltySearchEvaluator — Type
NoveltySearchEvaluator{F,D,B} <: AbstractEvaluator
Behavior-based evaluator. Returns the negative mean of the k nearest-neighbor distances between the current genome's fingerprint and the archive, so lower is better, matching the framework's convention.
Type parameters
- F: type of the fingerprint function genome -> B.
- D: type of the distance function (B, B) -> Float64.
- B: type of a single behavioral fingerprint.
Fields
- fingerprint_fn::F: extracts a behavioral descriptor from a genome. Typically performs the same rollout the base evaluator would, but records a behavior summary rather than a fitness scalar.
- distance_fn::D: distance metric over fingerprints. Should return 0.0 for identical behaviors and a positive value for different ones.
- archive::NoveltyArchive{B}: behavioral memory.
- k::Int: number of nearest neighbors used in the novelty score.
Arborist.Checkpoint — Type
Checkpoint{G}
Opaque snapshot of a mid-run single-objective GP evolution. Holds everything _run_evolution! needs to resume: population, fitnesses, generation counter, RNG state, fitness/mean histories, wall-time, and a signature derived from the algorithm config.
Not meant for direct construction — save_checkpoint is invoked internally by solve(...; checkpoint_every, checkpoint_path). Users construct one only when implementing a custom solve loop.
Fields
- format_version::Int: internal version of the Checkpoint layout (bumped on incompatible field changes).
- arborist_version::VersionNumber: project version from Project.toml.
- julia_version::VersionNumber: VERSION at save time.
- generation::Int: generation just completed. Resume starts at generation + 1.
- population::Vector{G}: final population of the completed generation, sorted best-first.
- fitnesses::Vector{Float64}: aligned with population.
- rng_state::Any: copy(rng) at save time, so resumed runs draw the same random stream the interrupted one would have.
- best_genome::G: the single best genome across the whole run so far (may differ from population[1] if elitism lost it through breeding).
- best_fitness::Float64: paired with best_genome.
- fitness_history::Vector{Float64}: per-generation best-fitness trajectory.
- mean_history::Vector{Float64}: per-generation mean-finite-fitness trajectory.
- wall_time::Float64: cumulative seconds elapsed (does not include pre-resume idle time).
- algorithm_signature::UInt64: hash of the algorithm config (see _algorithm_signature), checked on resume so the user can't hot-swap hyperparameters silently.
- hall_of_fame::Any: optional HallOfFame{G} archive at checkpoint time, or nothing when disabled.
Arborist.GenerationLog — Type
GenerationLog
One record per generation in a RunLog. record! fills in every field; fields whose data arrives in a later phase (Phase F.5) are left as empty containers in F.0.
Fields
- generation::Int: 1-based generation number.
- best_fitness::Float64: minimum finite fitness in the generation; Inf if no individual evaluated successfully.
- mean_fitness::Float64: mean of finite fitnesses.
- median_fitness::Float64: median of finite fitnesses.
- worst_fitness::Float64: maximum finite fitness; Inf if there are no finite fitnesses at all.
- n_species::Int: number of active species; 1 when speciation is NoSpeciation or speciation state is unavailable.
- species_sizes::Vector{Int}: member counts per species, in the same order as the speciation state's internal list. Empty when speciation is inactive or the solve path does not thread a SpeciationSnapshot.
- operator_success::Dict{Symbol,Int}: count of offspring per mutation/crossover operator that beat the parent's fitness. Populated in F.5.
- operator_attempted::Dict{Symbol,Int}: count of times each operator was invoked. Populated in F.5.
- unique_structures::Int: number of distinct genomes in the generation by serialize-hash. Coarse genotypic diversity proxy.
- wall_time::Float64: cumulative wall-clock seconds elapsed since t0 (run start).
Arborist.RunLog — Type
RunLog
A vector-like container of GenerationLog entries. Callers construct one as RunLog() and pass it to solve(...; log=log). Iteration and indexed access are supported via entries(log), length(log), and log[i].
RunLog is mutable: record! appends one entry per generation.
Arborist.SpeciationSnapshot — Type
SpeciationSnapshot
Mutable carrier passed as a kwarg to _apply_speciation!. Populated with the post-culling species count and per-species member sizes so that record! can store them without the solve path re-computing speciation.
Constructed fresh per generation and discarded; not part of the public API.
Arborist.ConstantOptimization — Type
ConstantOptimization(; frequency=25, top_k=5, max_iter=50, tol=1e-8, fd_step=1e-3)
Configuration for the periodic constant-optimization pass. Enable by passing GeneticProgramming(; constant_optimization=ConstantOptimization(), ...).
Fields
- frequency::Int: generations between optimization passes (default: 25). The pass runs at the end of every Nth generation: gen 25, 50, 75, ...
- top_k::Int: number of top (lowest-fitness) individuals to optimize per pass (default: 5). Applying the pass to all individuals would double evaluation cost every generation; applying it only to the elites refines the best candidates.
- max_iter::Int: maximum BFGS iterations per individual (default: 50).
- tol::Float64: gradient-norm convergence tolerance (default: 1e-8).
- fd_step::Float64: half-step size for central finite differences (default: 1e-3). Too small a step amplifies roundoff error; too large a step increases truncation error.
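The central-difference gradient that fd_step controls, and the every-Nth-generation schedule, can be sketched as follows. Only the gradient step is shown (no BFGS update), and `fd_gradient_sketch` and `runs_constant_pass` are hypothetical helper names, not Arborist functions.

```julia
# Hypothetical sketch: central finite-difference gradient with half-step `h`,
# i.e. g[i] = (f(c + h*e_i) - f(c - h*e_i)) / (2h). Not Arborist's implementation.
function fd_gradient_sketch(f, c::AbstractVector{Float64}; h::Float64 = 1e-3)
    g = similar(c)
    cp = copy(c)
    for i in eachindex(c)
        cp[i] = c[i] + h
        fplus = f(cp)
        cp[i] = c[i] - h
        fminus = f(cp)
        cp[i] = c[i]               # restore before moving to the next constant
        g[i] = (fplus - fminus) / (2h)
    end
    return g
end

# The pass fires at the end of generations frequency, 2*frequency, ...
runs_constant_pass(gen::Integer, frequency::Integer) = gen % frequency == 0
```

Central differences are exact for quadratics up to roundoff, so the gradient of `sum(abs2, c)` at `[1.0, 2.0]` comes out as approximately `[2.0, 4.0]`.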
Arborist.ASTSanitizer — Type
ASTSanitizer
Validates ExprGenome expression trees against a whitelist of permitted function calls before @eval compilation. Rejects any expression containing calls to functions outside the whitelist.
This is a defense-in-depth measure for use with LLMMutationOperator. For purely classical GP (no LLM operator), the function set already constrains what can appear, but sanitization adds an explicit check.
Fields
- allowed_calls::Set{Symbol}: whitelist of permitted function-call symbols
- allow_literals::Bool: whether to allow literal values (default: true)
- allow_variables::Bool: whether to allow variable references (default: true)
Arborist.ASTSanitizer — Method
ASTSanitizer(; allowed_calls=DEFAULT_SAFE_CALLS, allow_literals=true, allow_variables=true)
Construct an ASTSanitizer with the default mathematical/logical whitelist.
Arborist.FunctionDetails — Type
FunctionDetails(name, args, return_type)
Signature record for a single primitive available to evolved programs. name is the Julia Symbol the evolved code will call, args is the ordered vector of argument types, and return_type is the type produced by the call. FunctionSet collects these into the palette from which create_random_rvalue draws.
Arborist.FunctionSet — Type
FunctionSet(funcs::Set{FunctionDetails})
Container for the set of primitives that evolved expression-tree programs are permitted to call. Populated via add! or by constructing the Set directly, and passed into GenState / GPProblem to define the search space. See default_function_set and boolean_function_set for prebuilt palettes.
Arborist.GenState — Type
GenState(rng, fset, inputs, outputs, num_temps)
Code-generation state shared across the construction and mutation of a single expression-tree genome. Holds the rng used for every random choice (no global state), the FunctionSet palette, the input/output/temp variable dictionaries typed by Symbol => DataType, the set of types in use, and a cached union of all addressable variables. All stochastic helpers in codegen.jl take a GenState and draw exclusively from state.rng, so seeded runs reproduce.
Arborist.LoopLimitExceeded — Type
Custom exception thrown when a loop exceeds its iteration limit.
Functions
Arborist.evaluate — Method
evaluate(fe::TableFitnessEvaluator, f::Function) -> Float64
Evaluate f against the table of examples. Returns the mean squared error over rows where execution succeeded, or Inf if more than 50% of rows raise an exception or exceed the per-call time limit.
Uses Base.invokelatest to handle world-age issues from @eval-defined functions.
Arborist.evaluate_cases — Method
evaluate_cases(e::TableFitnessEvaluator, f::Function) -> Vector{Float64}
Return per-row squared error (Inf for rows that raised, timed out, or produced non-finite output). One entry per input row, in row order. Used by lexicase selection.
Arborist.input_signature — Method
input_signature(fe::TableFitnessEvaluator) -> Dict{Symbol, DataType}
Return the input variable names and types expected by this evaluator.
Arborist.output_signature — Method
output_signature(fe::TableFitnessEvaluator) -> Dict{Symbol, DataType}
Return the output variable names and types expected by this evaluator.
Arborist.evaluate_genome — Method
evaluate_genome(g::AbstractGenome, e::NoveltySearchEvaluator) -> Float64
Compute the novelty score: take the genome's behavioral fingerprint, find the k nearest fingerprints in the archive, and return the negative mean of those distances. The fingerprint may be added to the archive (under lock) when its novelty exceeds archive.add_threshold and the archive is not full.
Returns Inf if fingerprint_fn raises. Returns 0.0 when the archive is empty (first genome of the run) — there's nothing to be novel against yet, but adding to the archive seeds it for subsequent calls.
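A standalone sketch of the scoring step described above. `novelty_score_sketch` is a hypothetical name; the archive is passed as a plain vector, archive insertion and the Inf-on-exception path are omitted, and using every entry when the archive holds fewer than k fingerprints is an assumption.

```julia
# Hypothetical sketch of the documented novelty score: the negative mean
# distance to the k nearest archived fingerprints (lower is better).
function novelty_score_sketch(fp, archive::AbstractVector, distance_fn, k::Int)
    isempty(archive) && return 0.0          # nothing to be novel against yet
    dists = sort!([distance_fn(fp, other) for other in archive])
    kk = min(k, length(dists))              # assumption: fall back to all entries
    return -sum(@view dists[1:kk]) / kk
end
```

With `abs(a - b)` as the distance, a fingerprint `0.0` against the archive `[1.0, 2.0, 10.0]` and `k = 2` scores `-1.5`.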
Arborist.load_checkpoint — Method
load_checkpoint(path::AbstractString) -> Checkpoint
Load a checkpoint previously written by save_checkpoint. Raises ArgumentError if the file's Julia version or checkpoint format version differs from the current process's, because Julia's Serialization format is not stable across minor versions.
Arborist.save_checkpoint — Method
save_checkpoint(ckpt::Checkpoint, path::AbstractString)
Atomically write ckpt to path using Julia's Serialization stdlib. Writes to path * ".tmp" and then renames it, so a partially written file never clobbers an older good checkpoint.
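The write-to-temporary-then-rename pattern, together with a version check on load, can be sketched with the Serialization stdlib. `save_sketch`, `load_sketch`, and `FORMAT_VERSION_SKETCH` are hypothetical, and comparing the full `VERSION` may be stricter than Arborist's check. Whether the final replacement is strictly atomic depends on the filesystem and on how `mv(...; force=true)` replaces an existing file in your Julia version; what the pattern does guarantee is that a failure during serialization leaves the previous file untouched.

```julia
using Serialization

const FORMAT_VERSION_SKETCH = 1   # hypothetical layout tag

# Write to `path * ".tmp"`, then move it over `path`. If serialize throws,
# the previous file at `path` is never touched.
function save_sketch(obj, path::AbstractString)
    tmp = path * ".tmp"
    open(tmp, "w") do io
        serialize(io, (FORMAT_VERSION_SKETCH, VERSION, obj))
    end
    mv(tmp, path; force=true)
    return path
end

# Refuse files written with a different layout tag or Julia version.
function load_sketch(path::AbstractString)
    fmt, jl, obj = open(deserialize, path)
    (fmt == FORMAT_VERSION_SKETCH && jl == VERSION) || throw(ArgumentError(
        "checkpoint has format $fmt / Julia $jl; expected $FORMAT_VERSION_SKETCH / Julia $VERSION"))
    return obj
end
```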
Arborist.entries — Method
entries(log::RunLog) -> Vector{GenerationLog}
Return the vector of GenerationLog entries recorded so far.
Arborist.record! — Method
record!(log::RunLog, gen, fitnesses, genomes, wall_time; snapshot=nothing)
Append one GenerationLog to log with aggregate fitness statistics, an optional speciation snapshot, structural diversity, and wall-clock time.
- gen::Integer: 1-based generation index.
- fitnesses::AbstractVector: raw fitness per individual (may contain Inf for failed evaluations).
- genomes::AbstractVector: population genomes, parallel to fitnesses.
- wall_time::Real: seconds since run start.
- snapshot::Union{Nothing, SpeciationSnapshot}: if provided, its n_species/sizes fields are copied into the entry. If nothing, the entry records n_species=1 and species_sizes=[length(genomes)] (the NoSpeciation case).
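The aggregate statistics can be computed roughly as below. This is a hypothetical sketch, not record! itself: it assumes NaN for the mean and median when no fitness is finite, and it approximates the serialize-hash diversity measure by hashing each genome's serialized bytes.

```julia
using Serialization, Statistics

# Hash of a genome's serialized bytes: a coarse structural fingerprint.
function serialized_hash(g)
    io = IOBuffer()
    serialize(io, g)
    return hash(take!(io))
end

# Hypothetical sketch of the per-generation aggregates (not Arborist's record!).
function generation_stats_sketch(fitnesses::AbstractVector{<:Real}, genomes::AbstractVector)
    finite = filter(isfinite, Float64.(fitnesses))   # Inf marks failed evaluations
    no_finite = isempty(finite)
    return (best   = no_finite ? Inf : minimum(finite),
            mean   = no_finite ? NaN : mean(finite),     # NaN is an assumption
            median = no_finite ? NaN : median(finite),
            worst  = no_finite ? Inf : maximum(finite),
            unique_structures = length(unique(serialized_hash.(genomes))))
end
```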
Arborist.sanitize — Method
sanitize(san::ASTSanitizer, expr::Expr) -> Bool
Return true if the expression tree is safe (all function calls are in the whitelist), false if it contains any unsafe call. Walks the entire AST recursively.
Flags as unsafe:
- :call nodes where args[1] is a Symbol not in allowed_calls
- :call nodes where args[1] is a qualified name (e.g., Base.run)
- :macrocall nodes
- :quote or :$ interpolation nodes
Does NOT flag: assignment, block, if, while, for, literal values, variable symbols.
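A recursive walk implementing these rules might look like the sketch below. `is_safe_sketch` is a hypothetical name; it treats any non-Symbol callee as unsafe, an assumption at least as strict as the documented qualified-name rule, and it does not implement the allow_literals / allow_variables switches.

```julia
# Hypothetical whitelist walk following the documented rules (not Arborist's code).
function is_safe_sketch(ex, allowed::Set{Symbol})::Bool
    ex isa Expr || return true                  # literals, variable symbols, line nodes
    ex.head in (:macrocall, :quote, Symbol("\$")) && return false
    if ex.head === :call
        callee = ex.args[1]
        # Qualified names such as Base.run parse as Expr(:., ...) and fail here;
        # any other non-Symbol callee is rejected too (an assumption).
        (callee isa Symbol && callee in allowed) || return false
    end
    # Assignments, blocks, if/while/for are not flagged themselves; check children.
    return all(a -> is_safe_sketch(a, allowed), ex.args)
end
```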
Arborist.sanitize — Method
sanitize(san::ASTSanitizer, body::Vector{Expr}) -> Bool
Check all statements in a genome body.
Arborist.add! — Method
add!(fset, f, nargs, input_type, return_type)
Add a primitive f taking nargs arguments of input_type and returning return_type to fset. Shorthand for building homogeneous-signature entries; for mixed argument types, push a FunctionDetails directly into fset.funcs.
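A hypothetical mirror of FunctionDetails, FunctionSet, and the homogeneous-signature shorthand, useful for seeing how a return-type filter selects call candidates. All names here are illustrative, the primitive is identified by its Symbol, and custom `==`/`hash` methods are defined so that equal signatures deduplicate inside a `Set` (the default struct comparison would compare the `args` vectors by identity).

```julia
# Hypothetical signature record (not Arborist's FunctionDetails).
struct FunctionDetailsSketch
    name::Symbol
    args::Vector{DataType}
    return_type::DataType
end

# Structural equality and hashing so equal signatures collapse in a Set.
Base.:(==)(a::FunctionDetailsSketch, b::FunctionDetailsSketch) =
    a.name == b.name && a.args == b.args && a.return_type == b.return_type
Base.hash(d::FunctionDetailsSketch, h::UInt) = hash((d.name, d.args, d.return_type), h)

struct FunctionSetSketch
    funcs::Set{FunctionDetailsSketch}
end
FunctionSetSketch() = FunctionSetSketch(Set{FunctionDetailsSketch}())

# Homogeneous-signature shorthand, analogous to the documented add!.
function add_sketch!(fset::FunctionSetSketch, f::Symbol, nargs::Int,
                     input_type::DataType, return_type::DataType)
    push!(fset.funcs, FunctionDetailsSketch(f, fill(input_type, nargs), return_type))
    return fset
end

# Candidates whose return type matches `T`, as a random-call generator would need.
returning(fset::FunctionSetSketch, T::DataType) =
    [d for d in fset.funcs if d.return_type == T]
```

Mixed-signature primitives go straight into `fset.funcs`, for example `push!(fs.funcs, FunctionDetailsSketch(:ifelse, [Bool, Float32, Float32], Float32))`.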
Arborist.add_loop_checks — Method
add_loop_checks(body; limit=10_000)
Instrument a vector of body expressions with loop iteration checks. Returns a new vector (the original is not modified).
Arborist.add_loop_checks_expr — Method
add_loop_checks_expr(expr, limit)
Recursively instrument an expression tree, wrapping each :for and :while node with an iteration counter and a check that throws LoopLimitExceeded if the counter exceeds limit.
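The instrumentation can be sketched as an Expr rewrite. `add_loop_checks_sketch` and `LoopLimitSketch` are hypothetical stand-ins for the documented function and LoopLimitExceeded. Each loop gets a fresh `gensym` counter initialized just before it. The rewritten code is meant to run inside a function body (as a generated harness would); at global scope, the counter update inside the loop would be subject to Julia's soft-scope rules.

```julia
# Hypothetical exception standing in for LoopLimitExceeded.
struct LoopLimitSketch <: Exception
    limit::Int
end

# Return a rewritten copy of `ex` in which every :for / :while loop carries its
# own iteration counter and throws LoopLimitSketch once the count exceeds `limit`.
function add_loop_checks_sketch(ex, limit::Int)
    ex isa Expr || return ex                                      # leaves are kept as-is
    args = Any[add_loop_checks_sketch(a, limit) for a in ex.args] # children first
    if ex.head === :for || ex.head === :while
        counter = gensym(:iters)
        guarded_body = quote
            $counter += 1
            $counter > $limit && throw(LoopLimitSketch($limit))
            $(args[2])
        end
        return quote
            $counter = 0
            $(Expr(ex.head, args[1], guarded_body))
        end
    end
    return Expr(ex.head, args...)
end
```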
Arborist.construct_and_define_function — Method
Construct a Julia function from a signature, body expressions, return expression, and return type. Evaluates the function into the current scope via @eval.
The return_expr may or may not be wrapped in :return; if it is, the value is extracted and re-wrapped with a type assertion.
Arborist.create_harness — Method
Create a function expression wrapping generated body code in a typed, callable function skeleton with initialized temps and outputs.
Returns an Expr that can be @eval'd to define the function.
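A rough sketch of such a harness, built directly as an Expr. `harness_sketch` is a hypothetical name, temps are zero-initialized with `zero(T)` as a stand-in for default_value, and the returned variable gets a type assertion. Calling through `Base.invokelatest` avoids world-age errors when a function is defined and called from the same running code.

```julia
# Hypothetical harness builder (not Arborist's create_harness): typed arguments,
# zero-initialized temps, the evolved body, and a type-asserted return value.
function harness_sketch(name::Symbol,
                        inputs::Vector{Pair{Symbol,DataType}},
                        temps::Vector{Pair{Symbol,DataType}},
                        body::Vector{Expr},
                        ret::Symbol, ret_type::DataType)
    sig = Expr(:call, name, [Expr(:(::), s, T) for (s, T) in inputs]...)
    inits = [:($s = zero($T)) for (s, T) in temps]   # stand-in for default_value
    return Expr(:function, sig,
                Expr(:block, inits..., body..., :(return $ret::$ret_type)))
end
```

For example, a body of `t = x * x; out = t + x` over a `Float32` input `x` yields a function that maps `2.0f0` to `6.0f0`.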
Arborist.create_random_assignment — Method
create_random_assignment(s::GenState) -> Expr
Generate a random :(lhs = rhs) expression, rejecting self-assignments like x = x. The lvalue is drawn from get_lvalues(s) and the rvalue is built by create_random_rvalue matched to the lvalue's type. Used as the leaf case of random program construction.
Arborist.create_random_block — Method
Create a random block of 1-3 statements. Depth limits nesting recursion.
Arborist.create_random_for_loop — Method
Create a random for loop that iterates over an Int32 range. Body is a random block; depth limits nesting recursion.
Arborist.create_random_function_call — Method
Create a random function call which returns the required type.
Arborist.create_random_if_statement — Method
Create a random if-else statement with a Bool-typed condition. Both branches are random blocks; depth limits nesting recursion.
Arborist.create_random_statement — Method
Create a random statement of any supported type. Depth limits nesting recursion; at depth 0, returns an assignment.
Arborist.create_random_while_loop — Method
Create a random while loop with a Bool-typed condition. Body is a random block; depth limits nesting recursion.
Arborist.default_value — Method
Return a default zero-equivalent value for the given type.
Arborist.get_functions — Method
Return all functions with the given name and argument types.
Arborist.get_functions — Method
Return all functions with the given name.
Arborist.get_similar_random_function — Method
Return a function with the same argument types and return type as the given function call expression.
Arborist.unravel — Function
unravel(tree, expressions=[])
Flatten an Expr tree into a list of all sub-expressions via pre-order traversal.
Arborist.wrap_rvalue — Method
Wrap an rvalue in a function call, if possible.
Arborist.crossover — Method
crossover(s::GenState, parent_a::Expr, parent_b::Expr) -> Tuple{Expr, Expr}
Perform subtree crossover between two parent expression trees.
Strategy:
- Flatten both trees with unravel().
- Find pairs of sub-expressions with matching types (via get_rvalue_type).
- Pick a random compatible pair, deepcopy both parents, and swap the subtrees.
- If no compatible pair exists, return deep copies of both parents unchanged.
Arborist.replace_subtree! — Method
replace_subtree!(tree::Expr, target::Expr, replacement::Expr) -> Bool
Replace the first occurrence of target (by object identity) in tree with replacement. Returns true if a replacement was made, false otherwise.
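The crossover strategy and identity-based replacement can be sketched together. The names are hypothetical: `flatten_sketch` plays the role of unravel but keeps only Expr nodes, and `kind` is a caller-supplied stand-in for get_rvalue_type that returns `nothing` for nodes that should not be swapped. Only proper subtrees are considered, because the root cannot be replaced in place.

```julia
using Random

# Pre-order list of Expr nodes (root first); a stand-in for unravel that skips leaves.
function flatten_sketch(ex::Expr, acc::Vector{Expr} = Expr[])
    push!(acc, ex)
    for a in ex.args
        a isa Expr && flatten_sketch(a, acc)
    end
    return acc
end

# Replace the first node that is identical (===) to `target`; true on success.
function replace_subtree_sketch!(tree::Expr, target::Expr, replacement::Expr)
    for (i, a) in enumerate(tree.args)
        if a === target
            tree.args[i] = replacement
            return true
        elseif a isa Expr && replace_subtree_sketch!(a, target, replacement)
            return true
        end
    end
    return false
end

# Swap one random pair of same-kind proper subtrees between copies of the parents.
function crossover_sketch(rng::AbstractRNG, a::Expr, b::Expr, kind)
    ca, cb = deepcopy(a), deepcopy(b)
    na = flatten_sketch(ca)[2:end]
    nb = flatten_sketch(cb)[2:end]
    pairs = [(x, y) for x in na for y in nb if kind(x) !== nothing && kind(x) == kind(y)]
    isempty(pairs) && return ca, cb      # no compatible pair: unchanged copies
    x, y = rand(rng, pairs)
    replace_subtree_sketch!(ca, x, y)    # y now lives in ca
    replace_subtree_sketch!(cb, y, x)    # x takes y's old slot in cb
    return ca, cb
end
```

Because matching is by identity, a structurally equal but distinct Expr is not replaced; that is why the swap locates nodes from the flattened copies rather than by value.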
Arborist.boolean_function_set — Method
boolean_function_set() -> FunctionSet
Return a function set containing boolean operators suitable for boolean GP problems (e.g., even parity).
Includes: AND (&), OR (|), NOT (!), NAND (gp_nand), NOR (gp_nor), XOR (xor), all operating on Bool.
Arborist.default_function_set — Method
default_function_set() -> FunctionSet
Return a default function set containing basic arithmetic, transcendental, and comparison operators suitable for numerical symbolic regression.
Includes:
- Binary arithmetic (+, -, *, /, ^) for Float32 and Int32
- Unary functions (cos, sin, tanh, exp, sign) for Float32
- Binary comparisons (>, <, ==, !=, >=, <=) for Float32 and Int32, returning Bool
Arborist.gp_nand — Method
gp_nand(a::Bool, b::Bool) -> Bool
Boolean NAND: returns !(a & b).
Arborist.gp_nor — Method
gp_nor(a::Bool, b::Bool) -> Bool
Boolean NOR: returns !(a | b).
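As a small illustration of what the boolean palette can express, XOR can be built from NAND alone, and the even-parity target simply counts true inputs. `nand_sketch`, `nor_sketch`, `xor_from_nand`, and `even_parity` are hypothetical names restating the documented semantics; they are not part of Arborist.

```julia
# Standalone restatements of the documented primitives (hypothetical names).
nand_sketch(a::Bool, b::Bool) = !(a & b)
nor_sketch(a::Bool, b::Bool)  = !(a | b)

# XOR from NAND alone: the classic four-gate construction.
function xor_from_nand(a::Bool, b::Bool)
    n = nand_sketch(a, b)
    return nand_sketch(nand_sketch(a, n), nand_sketch(b, n))
end

# Even-parity target: true when an even number of inputs are true.
even_parity(bits::AbstractVector{Bool}) = iseven(count(bits))
```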
Arborist.default_protected_function_set — Method
default_protected_function_set() -> FunctionSet
Return a symbolic-regression FunctionSet built around the protected operators. Suitable for ExprGenome-based symbolic regression where evolved programs must evaluate without raising domain errors.
Contents:
- Binary arithmetic: +, -, * for Float32 and Int32; pdiv for Float32.
- Unary functions: plog, psqrt, pexp, sin, cos for Float32.
The set follows the Nguyen/Keijzer convention used in the modern symbolic regression literature (McDermott et al., 2012). pinv is not included by default; add it in place of pdiv(1.0, x) in problems where an explicit inverse primitive is desired. Note that the two differ near zero: pdiv(1.0, x) returns 1.0 there, while pinv(x) returns 0.0.
TreeGenome users do not need this helper: pass the raw functions directly to DynamicExpressions.OperatorEnum, e.g. OperatorEnum(; binary_operators=[+, -, *, pdiv], unary_operators=[plog, psqrt, pexp, sin, cos]).
Arborist.pdiv — Method
pdiv(a, b)
Protected division. Returns one(a) when |b| < 1.0e-10; otherwise a / b. The canonical Koza-style guard against division by zero.
Arborist.pexp — Method
pexp(x)
Protected exponential, exp(clamp(x, -50, 50)). Prevents overflow to Inf for large positive x and leaves exp unchanged for x in [-50, 50]. The upper bound gives exp(50) ≈ 5.18e21, comfortably within Float64 (and Float32) range.
Arborist.pinv — Method
pinv(x)
Protected multiplicative inverse. Returns zero(x) when |x| < 1.0e-10, otherwise one(x) / x.
Arborist.plog — Method
plog(x)
Protected natural logarithm, log(|x| + 1.0e-10). Always finite and real-valued; tracks log|x| away from zero and saturates at log(PROTECTED_EPS) near zero.
Arborist.psqrt — Method
psqrt(x)
Protected square root, sqrt(|x|). Always finite and real-valued.
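The protected operators can be restated as one-liners from the definitions above. This is a sketch with hypothetical `_sketch` names rather than Arborist's source. `EPS_SKETCH` is assumed to equal PROTECTED_EPS (1.0e-10), and `plog_sketch` is restricted to floating-point inputs so that `Float32` arguments stay `Float32`.

```julia
const EPS_SKETCH = 1.0e-10   # assumed to match PROTECTED_EPS

pdiv_sketch(a, b) = abs(b) < EPS_SKETCH ? one(a) : a / b          # Koza-style guard
pinv_sketch(x)    = abs(x) < EPS_SKETCH ? zero(x) : one(x) / x
pexp_sketch(x)    = exp(clamp(x, -50, 50))                        # no overflow to Inf
plog_sketch(x::T) where {T<:AbstractFloat} = log(abs(x) + T(EPS_SKETCH))
psqrt_sketch(x)   = sqrt(abs(x))
```

Each call stays finite for any finite input, which is the property the protected function set relies on.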
Arborist.run_multi_seed — Method
run_multi_seed(f, seeds::AbstractVector{Int}; parallel=false) -> Vector
Call f(seed) for each integer in seeds and return a vector of the results. When parallel=true and Threads.nthreads() > 1, runs concurrently via Threads.@threads; callers must ensure f is thread-safe (no shared mutable state without locking).
Typical use:
fitnesses = run_multi_seed([1, 2, 3, 4, 5]) do seed
    problem = GPProblem(evaluator, TreeGenome{Float32}; seed=seed)
    result = solve(problem, alg)
    result.best_fitness
end
println(summarize(fitnesses))
Arborist.summarize — Method
summarize(xs::AbstractVector{<:Real}) -> NamedTuple
Return (; mean, std, median, min, max, q25, q75, n) for a vector of real values. Non-finite entries are excluded from every statistic so a single Inf fitness does not poison the summary. n reports the number of finite entries used.
Matches the shape most benchmark reporting expects: mean ± std for a quick headline, quartiles for distribution shape. It does not depend on Statistics, not even as a weak dependency, so Pkg.test works in a sandboxed environment.
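A dependency-free version of this summary might look like the sketch below. `summarize_sketch` and `quantile_sketch` are hypothetical. The sample standard deviation (n - 1 denominator), linear-interpolation quantiles, and NaN for an input with no finite entries are assumptions that may not match Arborist's conventions.

```julia
# Linear-interpolation quantile of an already sorted, non-empty vector.
function quantile_sketch(sorted::Vector{Float64}, p::Float64)
    h = (length(sorted) - 1) * p + 1         # 1-based fractional index
    lo = floor(Int, h)
    hi = min(lo + 1, length(sorted))
    return sorted[lo] + (h - lo) * (sorted[hi] - sorted[lo])
end

# Hypothetical summary without Statistics; non-finite entries are dropped first.
function summarize_sketch(xs::AbstractVector{<:Real})
    v = sort!(Float64[x for x in xs if isfinite(x)])
    n = length(v)
    n == 0 && return (; mean=NaN, std=NaN, median=NaN, min=NaN, max=NaN,
                        q25=NaN, q75=NaN, n=0)
    m = sum(v) / n
    s = n > 1 ? sqrt(sum(abs2, v .- m) / (n - 1)) : NaN   # sample std
    return (; mean=m, std=s, median=quantile_sketch(v, 0.5), min=v[1], max=v[end],
              q25=quantile_sketch(v, 0.25), q75=quantile_sketch(v, 0.75), n=n)
end
```

With this convention the median is the 0.5 quantile, which averages the two middle values when n is even.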
Arborist.train_test_split — Method
train_test_split(X::AbstractMatrix, y::AbstractVector;
    test_size=0.2,
    rng=Random.default_rng(),
    stratify=nothing
) -> (X_train, y_train, X_test, y_test)
Split a feature matrix X (features × samples) and a matching target vector y into train and test partitions.
Arguments
- X::AbstractMatrix: features-by-samples (columns are samples).
- y::AbstractVector: one target per column of X.
Keyword arguments
- test_size::Real (default 0.2): fraction of samples to place in the test set. Must be in (0, 1).
- rng::AbstractRNG (default Random.default_rng()): RNG used for the permutation. Pass a MersenneTwister(seed) for reproducibility.
- stratify::Union{Nothing, AbstractVector} (default nothing): when supplied, a class-label vector of the same length as y. Sampling is done class-wise so the class proportions in both partitions match the input distribution as closely as integer rounding allows.
Returns a 4-tuple (X_train, y_train, X_test, y_test) of sub-matrices and sub-vectors.
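A sketch of the splitting logic, including stratification, follows. `split_sketch` is hypothetical, and rounding each group's test count with `round(Int, ...)` is an assumption about how integer rounding is handled.

```julia
using Random

# Hypothetical sketch (not Arborist's train_test_split). Columns of X are samples.
function split_sketch(X::AbstractMatrix, y::AbstractVector; test_size=0.2,
                      rng::AbstractRNG=Random.default_rng(), stratify=nothing)
    0 < test_size < 1 || throw(ArgumentError("test_size must be in (0, 1)"))
    n = size(X, 2)
    length(y) == n || throw(DimensionMismatch("y needs one entry per column of X"))
    stratify === nothing || length(stratify) == n ||
        throw(DimensionMismatch("stratify must have the same length as y"))
    # One group for plain splitting, or one group of column indices per class.
    groups = stratify === nothing ? [collect(1:n)] :
             [findall(==(c), stratify) for c in unique(stratify)]
    test_idx = Int[]
    for g in groups
        k = round(Int, test_size * length(g))   # rounding rule is an assumption
        append!(test_idx, shuffle(rng, g)[1:k])
    end
    sort!(test_idx)
    train_idx = setdiff(1:n, test_idx)
    return X[:, train_idx], y[train_idx], X[:, test_idx], y[test_idx]
end
```

Selecting whole columns keeps each sample's features aligned with its target in both partitions.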
Constants
Arborist.DEFAULT_SAFE_CALLS — Constant
Default whitelist of safe function calls — mathematical and logical operations only.