Changelog
All notable changes to Arborist.jl will be documented in this file. Format follows Keep a Changelog.
[0.1.2] — 2026-05-08
Documentation
- Add Zenodo DOI (10.5281/zenodo.20076827) to
CITATION.cff, the README bibtex block, and a Zenodo badge in the README header.
[0.1.1] — 2026-05-07
Security
ASTSanitizerwhitelist bypass via indirect calls. Strengthen the sanitizer to require that the callable in any:callexpression is a plainSymbolpresent inallowed_calls. Previously, indirect calls built from arrays, tuples, or refs (e.g.[open][1](),(run,)[1]()) could route around the whitelist and reach unsafe functions through@eval. The fix only affects theExprGenome@evalpath with sanitization enabled;TreeGenomeandGraphGenomedo not use@evaland were not affected.
Fixed
- Track the true all-time best genome from the evaluated initial population across
ExprGenome,TreeGenome, andGraphGenomesolve paths, including runs where later generations regress orelitism=0. - Apply
bloat_penaltyconsistently inGraphGenomesolves. - Validate public evolutionary algorithm configuration earlier, including population size, generations, rates, elitism, bloat penalty, island counts, and migration settings; operator compatibility checks now respect zero mutation or crossover rates.
- Seed
HallOfFamearchives from the initial population and preserve them through checkpoint/resume.
[0.1.0] — 2026-04-24
Initial public release. The scope expanded between the original pre-registration readiness cut (2026-04-06) and the final registration date as the framework continued to land research-grade capabilities — notably six classical GP benchmark families (Phase E), ten neuroevolution benchmarks (Phases A–D), and nine infrastructure additions (Phase F.0 through F.8). All are folded into this single 0.1.0 entry below.
Added
Core framework
- Problem/Algorithm/Solve API following SciML conventions
GeneticProgrammingalgorithm with tournament selection, elitism, subtree crossover, subtree/point/hoist/expansion mutationGPProblem,GPResultwith per-generation best/mean fitness history- Explicit RNG threading for reproducible runs (
parallel=false) - Parallel population evaluation via
Threads.@threads convergence_thresholdparameter for early termination on minimization problems
Genome types
ExprGenome— typed AST-based genome with@evalcompilation; supports loops, conditionals, mutable state, side-effectful primitivesTreeGenome{T}— DynamicExpressions.jl-backed expression-tree genome for fast vectorized symbolic regression (8.4× faster thanExprGenomeon the Koza suite); now part of the core module rather than a package extensionGraphGenome— NEAT-style neural topology genome with innovation numbers, structural mutation (add node, add connection), innovation-number-aligned crossover, and compatibility distanceAntGenome— side-effectful imperative genome for agent control problems (Santa Fe ant trail and similar)
Multi-objective GP
NSGAIIalgorithm — non-dominated sorting + crowding distance with (μ+λ) survivor selectionAbstractMultiObjectiveEvaluatorinterface (evaluate_multi,objective_names)ParsimonyEvaluator{E}— wraps any single-objectiveAbstractEvaluatorinto a two-objective(fitness, complexity)problem; replacesbloat_penaltywith a real Pareto frontNSGAIIResultwith the Pareto front, full final population, generation count, wall time, objective names, and per-generation 2D hypervolume- Example:
examples/nsga2_regression.jl
Speciation
ThresholdSpeciation— NEAT-style speciation by genomic distance with stagnation culling and configurable fitness sharingBehavioralSpeciation— speciation by behavioral fingerprint (probe- based), groups syntactically distinct programs that make the same decisions; empirically the most reliable variant on the bin packing benchmark (lowest cross-seed variance)- Configurable
sharing_formula::none,:linear,:sqrt,:log2(adapted for minimization:shared = raw × penalty(species_size)) apply_sharing(raw, size, formula)helper for custom speciation implementations
Island model
IslandModelalgorithm with three execution backends: sequential (single-process), synchronous distributed, and asynchronous distributed (Distributed.jlworkers)RingTopology,CompleteTopology,RandomTopologyfor migration routingMigrantGenome+to_migrant/from_migrantfor cross-process serialization
LLM operator
LLMMutationOperator— FunSearch/AlphaEvolve-style LLM-as-mutation operator that serializes anExprGenometo source, prompts an LLM for a meaningfully modified variant, and parses + type-checks the response. Silent fallback to a classical operator on any failure (API error, timeout, parse failure, type-check failure)- Backends: Anthropic, OpenAI, and local Ollama (any OpenAI-compatible endpoint). Uses
Downloads.jl(Julia stdlib) — no third-party HTTP dependency _http_postRef{Function}hook for mocking in tests
Security
ASTSanitizer— opt-in function call whitelist for@eval-based genomes; defense-in-depth against code injection in evolved programs- Default whitelist (
DEFAULT_SAFE_CALLS) covers all standard math and control-flow operators; users can extend it for domain primitives
Evaluators
TableFitnessEvaluator— input/output table evaluation forExprGenomewith per-row time limitsTreeFitnessEvaluator— vectorized data-matrix evaluation forTreeGenomeGraphEvaluator— forward-propagation evaluation forGraphGenomeAntEvaluator— Santa Fe ant trail simulatorSymbolicRegressionEvaluator— convenience wrapper that builds aTreeFitnessEvaluatorfrom a target function and a sampling domain
Examples and benchmarks
examples/feynman_regression.jl— Feynman symbolic regression benchmark (Udrescu & Tegmark)examples/lorenz_recovery.jl— Lorenz attractor recovery benchmarkexamples/sorting.jl+examples/sorting_distributed.jl— sorting algorithm evolution with curriculum learning, comparison-count penalty, and seed template ablationexamples/bin_packing.jl— bin packing with classical / LLM / behavioral-speciation modes; results synthesized inexamples/bin_packing_overnight_results.mdandexamples/bin_packing_combined_results.mdexamples/nsga2_regression.jl— multi-objective symbolic regression- Test benchmarks: Koza-1/2/3, 4-bit boolean parity, XOR (NEAT), Max Ones, x² symbolic regression, Santa Fe ant trail (smoke test)
Operators
SubtreeMutation,PointMutation,HoistMutation,ExpansionMutationSubtreeCrossoverTournamentSelectionFitnessProportionateSelection— classical roulette-wheel selection adapted to the lower-is-better convention (weight1 / (ε + f_i − f_min))RankSelection(; selection_pressure=1.5)— linear-ranking selection withselection_pressure ∈ [1.0, 2.0]TruncationSelection(; ratio=0.5)— keep the topratiofraction by fitness, then sample uniformly
Tooling
- Tiered test execution (
ARBORIST_RUN_BENCHMARKS=true/GENPROG_RUN_BENCHMARKS=truefor the slower benchmark tier) - GitHub Actions CI on Ubuntu, Windows, and macOS for Julia 1.10, 1.11, and nightly
- Documenter.jl-based documentation site
Research-grade infrastructure (Phase F)
- F.0 Per-case fitness API + structured RunLog.
evaluate_cases(g, e)generic with opt-in per evaluator (raisesMethodErrorwhen unimplemented).RunLog+GenerationLog+record!capturing per-generation best / mean / median / worst fitness, species count and sizes, unique-structure hash diversity, per-operator attempt and success tallies, and wall time.SpeciationSnapshotcarrier threads through_apply_speciation!so the solve path can log without recomputing.log::Union{Nothing, RunLog}kwarg on every single-objectivesolve. - F.1 Lexicase + epsilon-lexicase selection.
LexicaseSelectionandEpsilonLexicaseSelection(; epsilon=0.0). Polymorphicselect_parent(strategy, sel_fits, case_fits, rng)dispatch replaces the hardcoded_tournament_select.needs_cases(selection)trait — whentrue, solve materializes the per-case matrix viaevaluate_cases. Auto-epsilon uses MAD per case (La Cava et al. 2016). - F.2 GraphGenome deserialize + LLM-on-NEAT.
deserialize(::Type{GraphGenome}, s, n_inputs, n_outputs; reassign_innovations=false)is implemented (was anothing-returning stub). Tolerates LLM preambles and code fences.reassign_innovations=trueissues fresh innovation IDs via_next_innovation!()so LLM-generated IDs cannot collide with the parent pool.mutate(::LLMMutationOperator, ::GraphGenome, rng)dispatches for LLM-driven NEAT mutation. - F.3 Constant optimization for TreeGenome SR. Periodic BFGS pass on top-K individuals every N generations. Central finite-difference gradients; dense BFGS with Armijo line search and a rollback guard so fitness cannot regress.
ConstantOptimization(; frequency, top_k, max_iter, tol, fd_step)field onGeneticProgramming.optimize_constants!(g, e; kwargs)entry point for manual use. - F.4 Novelty Search + MAP-Elites.
NoveltyArchive{B}thread-safe bounded-size store,NoveltySearchEvaluator{F,D,B}wrapping a fingerprint function and distance metric (returns-mean_knn_distance).MAPElites <: AbstractEvolutionaryAlgorithmwith grid-based archive and its own solve loop sampling parents from the filled archive;MAPElitesArchive{G},MAPElitesResult{G},coverage(archive),qd_score(archive). - F.5 Checkpoint / resume + operator-success tracking.
Checkpoint{G}+save_checkpoint(ckpt, path)(atomic via tmp+rename) +load_checkpoint(path)(format-version + Julia-version stamp, rejects mismatches).solve(...; checkpoint_every, checkpoint_path, resume_from, allow_signature_mismatch)onExprGenomeandTreeGenomepaths. All-time-best tracking added so the best genome survives elitism loss. Per-operator attempt / success tallies populated via a newoperator_name(op) -> Symbolgeneric. - F.6 Plots.jl recipes via package extension.
ext/ArboristRecipesBaseExt.jldefines@reciperules forGPResult(fitness trajectory),NSGAIIResult(Pareto front, 2D and 3D),RunLog(3-subplot fitness / species / structures), andMAPElitesResult(coverage + QD histories).@userplotsplothypervolumetrajectory(nsga2_result)andplotarchive(map_elites_archive). Weak dep onRecipesBase; zero runtime cost without Plots. - F.7 Automatically Defined Functions (Koza-style ADFs).
ADFGenome{T}with a main tree plus N Koza-style ADF subroutines at fixed binary arity. Macro-expansion viaexpand_adfs(g) -> Node{T}: walks the main tree, replacing ADF-slot ops with copies of the ADF body and substituting ARG references. FullAbstractGenomeinterface (mutation, crossover, distance, complexity, serialize). - F.8 CMA-ES. Full (μw/μ, λ)-CMA-ES with rank-1 and rank-μ covariance updates, evolution-path step-size control,
hsigHeaviside step.CMAES <: AbstractEvolutionaryAlgorithmwith auto-λ (Hansen 2016 default). `flattenweights(g)/unflatten_weights!(g, w)interface — opt-in per genome;GraphGenomeadaptor sorts by innovation number for determinism.LinearAlgebra` (stdlib) added as a direct dependency.
Neuroevolution benchmarks (Phases A–D)
- GraphGenome framework integration.
NEATDefaultMutation+ five individual mutation operators (WeightPerturbMutation,WeightReplaceMutation,AddConnectionMutation,AddNodeMutation,ToggleConnectionMutation) andNEATCrossover. Works with NSGA-II (Pareto of fitness vs enabled-connection count), sequentialIslandModel, and distributedIslandModel(per-worker disjoint innovation ID ranges viainit_innovation_range!).GraphEvaluatorsupports recurrent networks viaallow_recurrent=truewithrelaxation_passes=N; samples are treated as a time sequence with persistent node state.neat_defaults()returns(mutation_ops, crossover_ops)for drop-in use. EpisodicEvaluator. Closed-loop control-task evaluator forGraphGenome. Declarative API: the user provides(initial_state, dynamics, reward, done, observe, decode_action)callables; the RNG is threaded throughinitial_state(rng)for reproducibility;n_episodesrollouts are averaged per fitness evaluation. Returns-mean_reward(lower-is-better).- Phase A: classification benchmarks. 3-bit / 5-bit parity, two-spirals (plus an NSGA-II variant).
- Phase B: control-task benchmarks. Single-pole cart-pole, double-pole (Markovian), mountain car — all with
GraphGenome+EpisodicEvaluator. Benchstone-registered. - Phase C: non-Markovian stress benchmarks. Variable-delay sequence recall (test-only).
- Phase D: modularity, time series, multi-objective showcases. Retina left-and-right modularity benchmark, Mackey-Glass τ=17 time-series prediction, NSGA-II retina with (error, modularity_proxy) objectives.
Classical GP benchmarks (Phase E)
- Nguyen-1..10 (
nguyen_regression.jl): canonical modern SR suite per McDermott et al. 2012.TreeGenome+ protected pdiv / plog / psqrt. 3/5 seeds gate at fitness < 0.01 (Nguyen-7 relaxed to 2/5). - Keijzer-4, Keijzer-11 (
keijzer_extrapolation.jl): extrapolation benchmarks. K-4 reports train fitness plus interior and out-of-range test RMSE as diagnostics; K-11 uses 20 sparse training points with a 20×20 test grid. - 6-bit / 11-bit multiplexer (
multiplexer.jl): Koza's Boolean scaler using Boolean-complete {AND, OR, NAND, NOR, XOR, NOT}. - UCI Iris (
iris_classification.jl): Fisher's 3-class dataset, 150 rows embedded inline (no network dependency). One-vs-rest with threeTreeGenomeevolutions and argmax at classification. Gate 4/5 seeds ≥ 90% test accuracy on a fixed 80/20 split. - Acrobot swing-up (
acrobot_neat.jl): Sutton 1996 two-link under-actuated pendulum, RK4 integration at dt=0.2.
Bin packing research enhancements
behavioral_initialize: MAP-Elites-style diverse population seeding. Generates apool_sizerandom-program pool, evaluates, fingerprints, greedy-bins by fingerprint distance, and returns the best-fitness representative from each distinct bin.- NSGA-II bin packing (canonical):
BPMultiObjectiveEvaluatorwith two objectives(fitness, failed_placements). The 2026-04-14 ablation study found that a thirdsuccess_rateobjective is redundant givenfailed_placements; the historical 3-objective formulation is reproducible via thethree_objectiveablation. - Configurable prompt enrichment (
MutationContext,AbstractPromptSection,FitnessSection,ElitesSection,GenerationSection,render_enrichment). Populated once per generation by the solve loop; sections render into the LLM user message. LLMCallStats: per-operator accumulator tracking call counts, API successes / failures / fallback skips, input / output tokens (from API usage fields), input / output character counts, and cumulative latency.debug_log::Union{IO, Nothing}field onLLMMutationOperator: when set, writes the full user message, raw LLM response, and outcome for each call.- "Container" rename.
binrenamed tocontaineracross the bin-packing example and extension primitives to improve clarity for the generalized heuristic-discovery framing.
TreeGenome improvements
IslandModelsupport forTreeGenome. Migration transportsNode{T}directly across workers so op indices remain valid against every island's sharedOperatorEnum.serialize/deserializeround-trip fixed via aMeta.parsewalker instead of string-based substitution. Prefix-notation and unary-op round-trips now succeed.
Run-management additions
- All-time-best Hall of Fame. New
HallOfFame{G}archive type, opt-in viasolve(...; hall_of_fame_size=K). Returns a top-K best-first archive of distinct fitnesses observed across all generations (de-duplicated within1e-12to avoid storing structurally equivalent solutions). The archive survives elitism loss and is exposed asresult.hall_of_fame.K=0(default) disables the archive and leavesresult.hall_of_fame === nothing; the older 8-argGPResultconstructor is preserved for backward compatibility. - Warm-starting
solvefrom a seed pool. Newinitial_populationkwarg on the single-objectiveGeneticProgrammingand on the sequentialIslandModelsolve paths. Element type and length must matchalgorithm.pop_size(orn_islands * pop_sizeforIslandModel). Mutually exclusive withresume_from. - Ephemeral Random Constants (Koza-style).
GeneticProgrammingnow carries an optionalconstant_sampler::Union{Function, Nothing}field threaded intoTreeGenomerandom-tree creation and point-mutation leaf creation via task-local storage.erc_uniform(lo, hi)is the canonical helper, returning a(rng) -> Tcallable that samples uniformly. Legacy default (T(randn(rng))) is preserved when the field isnothing.
Run utilities
Arborist.Benchmarkssubmodule. Canonical GP and neuroevolution benchmark problems as reusable data generators consumed by user code: symbolic regression (nguyen(n),keijzer(variant),koza(name),pagie()), classification (iris(),two_spirals()), Boolean (multiplexer(address_bits),parity(n_bits)), control (cartpole(),mountain_car(),acrobot(),double_pole(; markovian)), sequence/memory (sequence_memory(length),sequence_recall(; delay)), and classic neuroevolution (xor_env()). Each generator returns aNamedTuplecarrying dataset (or environment callables), shape metadata, target descriptions, and sensible success thresholds — decoupled from evaluator choice.- Cross-cutting helpers.
train_test_split(X, y; test_size, rng, stratify)for deterministic train/test partition with optional class-stratified sampling;summarize(xs)returning(; mean, std, median, min, max, q25, q75, n)(non-finite entries excluded so a singleInfdoes not poison the report); andrun_multi_seed(f, seeds; parallel=false)for multi-seed runs with optionalThreads.@threadsparallelism.
Visualization, inspection, and bloat control
- Graphviz DOT export.
to_dot(g)andto_dot(io, g)produce a self-contained DOT document for every genome type —TreeGenome,ExprGenome,ADFGenome,AntGenome, andGraphGenome. Thedotbinary is not a runtime dependency;to_dotonly emits the source document. Tree-shaped genomes render as a top-down DAG;GraphGenomerenders left-to-right with role-distinguished node shapes and weight-labeled edges (disabled connections drawn dashed). Base.showmethods forGPResult,NSGAIIResult,MAPElitesResult,RunLog,Checkpoint,HallOfFame, and every concrete genome — so the REPL no longer dumps internal field structure for the most common return types.tree_depth(g) -> Intgeneric exported alongsidecomplexity(g). Implemented forTreeGenome,ExprGenome,AntGenome, andADFGenome; reports the longest root-to-leaf path.- Per-operator
max_depth/max_sizecaps.SubtreeMutation,PointMutation,HoistMutation,ExpansionMutation, andSubtreeCrossovernow acceptmax_depthandmax_sizekeyword arguments. When set, the operator returns the parent unchanged rather than producing an offspring that exceeds the caps. Both default tonothing(no cap). These supersede the dead struct-levelGeneticProgramming.max_depthfield that was removed during cleanup (see Removed below) — the new caps are per-operator, so different operators in the samemutation_opsvector can carry different limits. default_protected_function_set()plus the corresponding Koza protected primitivespdiv,plog,psqrt,pexp,pinv(and the sharedPROTECTED_EPSconstant). The set is the standard SR starting point and is what the Phase E benchmarks (Nguyen, Keijzer) evolve against.- Extended
ACTIVATION_FNSwith the canonical CPPN / HyperNEAT activation set —gauss,sin,abs,step— joining the existingsigmoid,tanh,relu,identity.AddNodeMutation/NEATDefaultMutationstill defaulthidden_activationsto{sigmoid, tanh, relu}, so the new activations are opt-in via the keyword argument.
Changed
TreeGenomeis now part of the core module rather than a package extension. DynamicExpressions.jl is a hard dep of Arborist; users no longer need to callBase.get_extension(...)to accessTreeGenome,TreeFitnessEvaluator, orSymbolicRegressionEvaluator. Same forLLMMutationOperator, which was never an extension despite the prior documentation claiming otherwise.LLMMutationOperatorswitched from HTTP.jl toDownloads.jl(Julia stdlib). Eliminates a third-party HTTP dependency and matches the convention used by other parts of the framework.- README and
docs/src/*.mdrewritten to document the actual exported API. Previous code blocks referenced an extension-based API (Base.get_extension(Arborist, :DynExprExt),:LLMOperatorExt,using HTTP) that did not exist in the codebase; copy-pasting them producedMethodErrorornothingfromget_extension. _breed_next_generation!extracted to deduplicate the breeding loop shared byGeneticProgramming,IslandModel, andNSGAII.Project.toml: removed duplicateDynamicExpressionsentry from[extras](already in[deps]).Manifest.tomlremoved from version control (libraries should not pin downstream environments).examples/reorganized: research apparatus (tuning harness, ablation drivers, shell orchestration, 11 files) moved underexamples/research/. Tutorial-level scripts remain at the top ofexamples/. TwoREADME.mdindex files added.- Session notes migrated to refstore. Per-session progress logs no longer live in the repo; new updates go to the refstore project
genprog_jlviamcp__refstore__write_session_note. SeeCLAUDE.md's Session Notes section.
Fixed
- GraphGenome non-deterministic Dict iteration (6 sites). All iteration over node/connection Dicts now goes through sorted helpers for reproducibility.
- GraphGenome crossover now produces two distinct children instead of two copies of the same offspring.
- NEAT compatibility distance: disjoint and excess genes are now counted separately, matching the original NEAT formulation.
- Async island deadlock:
put!on a full channel could deadlock the worker pool; replaced with non-blocking offer + retry. - ExprGenome migrant naturalization: migrants from other islands are now re-typed against the receiving island's
GenStateso that cross-island crossover does not produce type-inconsistent ASTs. FunctionDetails.isequal: was reflexively true for unrelated primitives with the same name; now compares the full type signature.crossover_rate + mutation_rate > 1.0is now caught atGeneticProgrammingconstruction with a clearArgumentErrorrather than silently producing invalid offspring distributions.- LLM operator JSON escape/unescape: control characters and backslash escapes are now round-trip safe; previously a backslash-n in source would deserialize as a literal newline.
- LLM operator robustness:
deserializeandsanitizeare wrapped intry/catch; any exception falls back silently to the configured classical operator instead of aborting the generation. - Five defects found by code audit (
8d1f3d4) plus 290 additional test assertions covering the gaps that allowed them through. - Eight findings from a 3-iteration adversarial research loop covering correctness, reproducibility, and edge cases in the solve loop, distance metrics, and migration paths.
docs/src/quickstart.mdXOR example: passedVectortoGraphEvaluator(which requiresMatrix{Float64}) and was missingreset_innovation_counter!().- Original README LLM example: used
TreeGenome{Float32}withLLMMutationOperator, but the operator did not (and still does not) dispatch onTreeGenome. Anyone copy-pasting the example would have hit aMethodError. The corrected example usesExprGenome. - LLM operator JSON extraction:
\uXXXXunicode escapes in the response text are now decoded correctly before being handed to the deserializer. - LLM operator literal type coercion: numeric literals in deserialized output are now coerced to the parent genome's expected type before the type-check pass, preventing spurious rejection of LLM output that returned
1.0where the parent had1.0f0. - Windows CI flake: LLM latency measurement now uses
time_ns()instead oftime()to avoid millisecond-granularity round-trip glitches on Windows runners. ASTSanitizerfalse rejection: domain-specific function calls from LLM output (bin-packing extension primitives) are no longer rejected as unsafe when the caller has registered them in the allow-list.- Docs-build size limit: the single-page
api.mdwas a 220 KiB@autodocsblock over Documenter's 200 KiB threshold, failing every docs-CI run. Split into an overview + four per-topic sub-pages underdocs/src/api/.
Removed
- Dead
max_depthfield fromGeneticProgramming(was never read). The replacement is per-operatormax_depth/max_sizekeyword arguments onSubtreeMutation,PointMutation,HoistMutation,ExpansionMutation, andSubtreeCrossover(see Visualization, inspection, and bloat control above). - Adversarial-loop scratch artifacts (after all 8 findings were addressed and committed).
- Legacy imperative API:
Individual,Population,evaluate_individual!,evolve!, and the supportingtournament_select/mutate_individual/crossover_individualshelpers are removed. Callers should use the Problem/Algorithm/Solve path:solve(::GPProblem, ::GeneticProgramming). The corresponding tests intest/unit/test_coverage_gaps.jlare removed — the replacement API is covered by the other unit tests. - Pre-migration session progress logs (
progress-YYYYMMDD.md, 20 files) from the repo tree. Project migrated to refstore session notes on 2026-04-22. Logs remain recoverable via git history. - Internal planning and paper-draft artifacts (
archive/,arborist_paper_observations.md,plan_treegenome_parser_fix.md). - Per-run bin packing ablation result files (50 tracked
.mdfiles, mostlybin_packing_results_scaling_*_seed*.mdand_unseeded_*_seed*.md). Canonical synthesis files (bin_packing_overnight_results.md,bin_packing_combined_results.md) are retained. - Statistics.jl dropped as a direct dependency (Phase E housekeeping).