Algorithms
GeneticProgramming
The standard GP algorithm with tournament selection, elitism, and configurable genetic operators.
using Arborist
algorithm = GeneticProgramming(
pop_size = 100, # population size
generations = 200, # number of generations
mutation_rate = 0.3, # probability of mutation per offspring
crossover_rate = 0.3, # probability of crossover per pair
elitism = 2, # top individuals carried forward
bloat_penalty = 0.0, # coefficient on complexity(g)
parallel = true, # threaded evaluation
speciation = NoSpeciation(),
mutation_ops = [SubtreeMutation(), PointMutation()],
crossover_ops = [SubtreeCrossover()],
selection = TournamentSelection(3), # tournament size set here
)
IslandModel
Multiple independent populations with periodic ring-topology migration.
using Arborist
algorithm = IslandModel(
n_islands = 4,
island_algorithm = GeneticProgramming(pop_size=50, generations=100),
migration_interval = 10, # generations between migrations
migration_size = 2, # individuals migrated per event
)
IslandModel (sequential, synchronous distributed, and asynchronous distributed) dispatches on ExprGenome, TreeGenome, and GraphGenome. For TreeGenome, the evaluator must be a TreeFitnessEvaluator; migration transports Node{T} directly across workers so the op indices remain valid against every island's shared OperatorEnum. For GraphGenome under distributed=true, each worker is assigned a disjoint innovation-ID range via init_innovation_range!((island_id - 1) * INNOVATION_STRIDE) so cross-process IDs cannot collide; the trade-off is that structurally identical mutations on different workers receive different IDs and are treated as disjoint by NEAT crossover. AntGenome must use the single-population GeneticProgramming solver — the simulator is not thread-safe and has no _initialize_population method on the IslandModel path.
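A minimal end-to-end sketch of the TreeGenome path described above, assuming SymbolicRegressionEvaluator satisfies the TreeFitnessEvaluator requirement (as in the NSGAII and Hall of Fame examples on this page); all parameter values are illustrative:

```julia
using Arborist

# Sketch: run a TreeGenome symbolic-regression problem on four islands.
evaluator = SymbolicRegressionEvaluator(x -> x^2 + x, domain=(-1f0, 1f0), points=15)
problem = GPProblem(evaluator, TreeGenome{Float32}; seed=42)
algorithm = IslandModel(
    n_islands = 4,
    island_algorithm = GeneticProgramming(pop_size=50, generations=100),
    migration_interval = 10,   # ring-topology migration every 10 generations
    migration_size = 2,        # two individuals migrated per event
)
result = solve(problem, algorithm)
```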
NSGAII
Multi-objective GP using non-dominated sorting and crowding distance with (μ+λ) survivor selection. Requires an evaluator that implements evaluate_multi(evaluator, genome) -> Vector{Float64} (i.e., a subtype of AbstractMultiObjectiveEvaluator). The included ParsimonyEvaluator wraps any single-objective AbstractEvaluator into the standard (fitness, complexity) two-objective problem.
using Arborist, DynamicExpressions
inner = SymbolicRegressionEvaluator(
x -> x^2 + x, domain=(-2f0, 2f0), points=30
)
evaluator = ParsimonyEvaluator(inner)
algorithm = NSGAII(
pop_size = 200, # must be even, ≥ 4
generations = 100,
mutation_rate = 0.3,
crossover_rate = 0.3,
parallel = true,
mutation_ops = [SubtreeMutation(), PointMutation()],
crossover_ops = [SubtreeCrossover()],
)
result = solve(GPProblem(evaluator, TreeGenome{Float32}; seed=42), algorithm)
println("Pareto front size: ", length(result.pareto_front))
Pareto front size: 200
solve(::GPProblem, ::NSGAII) returns an NSGAIIResult with these fields:
pareto_front::Vector{G} — non-dominated genomes from the final population
pareto_fitnesses::Vector{Vector{Float64}} — objective vectors for the Pareto-front genomes
population::Vector{G} and all_fitnesses::Vector{Vector{Float64}} — full final population and its objective vectors
hypervolume_history::Vector{Float64} — hypervolume of front 1 per generation (2D only; problems with three or more objectives return 0.0)
generations_run::Int, wall_time::Float64, objective_names::Vector{String}
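The fields above can be read off the result directly. A sketch continuing from the solve call in the NSGAII example, using the (fitness, complexity) objectives produced by ParsimonyEvaluator:

```julia
# Sketch: inspect an NSGAIIResult (field names per the list above).
for (g, fs) in zip(result.pareto_front, result.pareto_fitnesses)
    println("fitness = ", fs[1], "  complexity = ", fs[2])
end
println("final hypervolume: ", last(result.hypervolume_history))
println("ran ", result.generations_run, " generations in ", result.wall_time, " s")
```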
See examples/nsga2_regression.jl for a complete runnable example.
NSGAII dispatches on ExprGenome and TreeGenome. AntGenome and GraphGenome are not yet supported — attempting to solve an NSGA-II problem with those genome types raises a MethodError from _nsga2_init_population. Extending NSGA-II to the remaining genome types is planned for a future release.
CMAES
Covariance Matrix Adaptation Evolution Strategy ((μ_w/μ, λ)-CMA-ES, Hansen 2016) for continuous-parameter optimization. Used to refine the weight vector of a genome whose topology is fixed — for example, a GraphGenome whose connections you want to tune after a NEAT search has discovered the structure. The algorithm itself does not mutate topology; pair it with a topology search if both are needed.
using Arborist
algorithm = CMAES(
generations = 200,
pop_size = 0, # 0 → auto λ = 4 + ⌊3·ln(n)⌋ (Hansen 2016 default)
sigma0 = 0.5, # initial step size
seed_genome = true, # start the mean from the problem's initial genome
)
The genome must opt in by implementing flatten_weights(g) -> Vector{Float64} and unflatten_weights!(g, w) -> g. The GraphGenome adaptor sorts connections by innovation number for deterministic packing; other genomes can implement these two methods to participate. See the Infrastructure API reference for details.
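A sketch of the opt-in for a custom genome; MyGenome is hypothetical, and the methods are assumed to be extended by qualifying with the Arborist module:

```julia
using Arborist

# Hypothetical fixed-topology genome: only its weight vector is tunable.
struct MyGenome
    weights::Vector{Float64}
end

# Pack the tunable parameters into a flat vector for CMA-ES.
Arborist.flatten_weights(g::MyGenome) = copy(g.weights)

# Write an optimized vector back into the genome in place.
function Arborist.unflatten_weights!(g::MyGenome, w::Vector{Float64})
    copyto!(g.weights, w)
    return g
end
```

The packing order must be deterministic — the same genome must always flatten to the same layout, as GraphGenome guarantees by sorting on innovation number.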
MAPElites
Quality-Diversity search via grid-based behavioral archives (Mouret & Clune, 2015). Replaces single-best convergence with a coverage goal: fill as many distinct behavior cells as possible, keeping the fittest representative in each. MAPElites runs its own solve loop that samples parents from the filled archive — there is no notion of "population" carried across generations.
algorithm = MAPElites(
feature_fn = g -> (my_behavior_x(g), my_behavior_y(g)),
feature_bounds = [(0.0, 1.0), (0.0, 1.0)],
n_bins = [20, 20],
generations = 200,
batch_size = 50, # children produced per generation
n_init = 200, # seed pool of random genomes (gen 0)
mutation_ops = [SubtreeMutation()],
)
result = solve(GPProblem(evaluator, ExprGenome; ...), algorithm)
feature_fn returns an NTuple{N, Float64} (or any indexable length-N collection of Real) describing where the genome lands in behavior space. feature_bounds and n_bins discretize that space — features outside [lo, hi] clamp to the extreme bin. coverage and qd_score are queryable on the archive, and the result type is MAPElitesResult{G} with archive, coverage_history, and qd_score_history. The Plots.jl recipe plotarchive(result) produces the canonical 2D heatmap (currently 2D archives only).
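A sketch of querying the result, using the accessor and field names stated above:

```julia
# Sketch: inspect a MAPElitesResult after solve.
println("coverage: ", coverage(result.archive))       # fraction of cells filled
println("QD score: ", qd_score(result.archive))       # summed elite quality
println("generations logged: ", length(result.coverage_history))
```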
Novelty Search
Replaces the fitness signal with behavioral novelty: fitness becomes the negative mean distance to the k nearest neighbors in a thread-safe, bounded archive of past behaviors. Useful on deceptive problems where the fitness gradient points away from the global optimum.
archive = NoveltyArchive(Vector{Float64};
max_size = 2000,
add_threshold = 0.0)
evaluator = NoveltySearchEvaluator(
g -> behavioral_signature(g), # fingerprint_fn :: genome → fingerprint
(a, b) -> norm(a - b), # distance_fn :: (fp, fp) → Float64
archive;
k = 15,
)
algorithm = GeneticProgramming(
pop_size = 200,
generations = 200,
mutation_ops = [SubtreeMutation(), PointMutation()],
)
result = solve(GPProblem(evaluator, ExprGenome; ...), algorithm)
NoveltySearchEvaluator returns -mean_knn_distance, so the standard minimization solve loop drives the population toward novelty. To combine novelty with task fitness, build a multi-objective evaluator and use NSGAII with both signals.
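One way the novelty-plus-fitness combination could look; this is a hypothetical sketch that assumes a single-objective entry point evaluate(evaluator, genome) — adapt to Arborist's actual evaluator API:

```julia
using Arborist

# Hypothetical combined evaluator implementing the evaluate_multi
# interface described in the NSGAII section above.
struct NoveltyAndFitness{T,N} <: AbstractMultiObjectiveEvaluator
    task::T       # e.g. a SymbolicRegressionEvaluator
    novelty::N    # a NoveltySearchEvaluator as configured above
end

function Arborist.evaluate_multi(ev::NoveltyAndFitness, genome)
    # Objective 1: task error; objective 2: -novelty.
    # Both are minimized, so NSGAII trades accuracy against novelty.
    return [evaluate(ev.task, genome), evaluate(ev.novelty, genome)]
end
```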
Constant optimization
GeneticProgramming accepts an optional constant_optimization field that runs a periodic BFGS pass over the numeric constants of the top-K genomes:
using Arborist
algorithm = GeneticProgramming(
pop_size = 100,
generations = 200,
constant_optimization = ConstantOptimization(
frequency = 25, # run every 25 generations
top_k = 5, # refine the top 5 individuals
max_iter = 50,
tol = 1e-8,
fd_step = 1e-3, # central finite-difference gradient step
),
)
Currently implemented for TreeGenome symbolic regression. The pass is rollback-guarded: a refined genome's fitness can only improve. To optimize a single genome's constants by hand, call optimize_constants!(g, evaluator; kwargs...).
Warm-starting from a seed population
solve(::GPProblem, ::GeneticProgramming) accepts an initial_population keyword carrying a user-provided seed pool. The solve loop replaces the random initial population with these genomes — useful for resuming from a hand-curated set, transferring solutions between problem variants, or seeding a refinement run from a previous best:
seed_pool = [g1, g2, g3, ...] # length must equal algorithm.pop_size
@assert length(seed_pool) == algorithm.pop_size
result = solve(problem, algorithm; initial_population = seed_pool)
Element type must match the problem's genome type and length must equal algorithm.pop_size. initial_population is mutually exclusive with resume_from — use one or the other.
All-time-best Hall of Fame
By default GPResult reports only the single best genome at the end of the run. Pass hall_of_fame_size = K to attach a top-K archive of the distinct best individuals seen across all generations, so the best solutions survive even when elitism fails to carry them to the final population:
using Arborist, DynamicExpressions
evaluator = SymbolicRegressionEvaluator(x -> x^2 + x, domain=(-1f0, 1f0), points=15)
problem = GPProblem(evaluator, TreeGenome{Float32}; seed=42)
algorithm = GeneticProgramming(pop_size=40, generations=20)
result = solve(problem, algorithm; hall_of_fame_size = 5)
println("HoF size: ", length(result.hall_of_fame), " / capacity ",
result.hall_of_fame.capacity)
for i in 1:min(3, length(result.hall_of_fame))
println("[$i] fitness=", result.hall_of_fame.fitnesses[i])
end
HoF size: 5 / capacity 5
[1] fitness=0.0035834189504926933
[2] fitness=0.003961255149946844
[3] fitness=0.005436814407233899
The archive de-duplicates by fitness within 1e-12 (a cheap structural proxy — two genomes with identical fitness are treated as the same solution). hall_of_fame_size = 0 (default) disables the archive and leaves result.hall_of_fame === nothing. The archive composes with resume_from: a fresh archive is allocated on resume, since the checkpoint format does not yet persist the archive.
Checkpoint and resume
Long single-objective GeneticProgramming runs (on ExprGenome or TreeGenome) can checkpoint to disk and resume:
result = solve(problem, algorithm;
checkpoint_every = 10, # generations between writes
checkpoint_path = "run.checkpoint",
)
# Later, in a separate process:
result = solve(problem, algorithm;
resume_from = "run.checkpoint",
)
save_checkpoint writes atomically (write-tmp + rename) and load_checkpoint rejects mismatched format-version or Julia-version stamps. By default, resuming refuses to load a checkpoint produced by a different _algorithm_signature (different hyperparameters or operator types); pass allow_signature_mismatch=true to bypass the check. IslandModel, NSGAII, MAPElites, AntGenome, and GraphGenome are not yet wired for checkpoint/resume.
Structured run logs
For research-grade reproducibility every single-objective solve path accepts an optional RunLog:
using Arborist, DynamicExpressions
evaluator = SymbolicRegressionEvaluator(x -> x^2 + x, domain=(-1f0, 1f0), points=15)
problem = GPProblem(evaluator, TreeGenome{Float32}; seed=42)
algorithm = GeneticProgramming(pop_size=40, generations=10)
log = RunLog()
result = solve(problem, algorithm; log = log)
# log.entries :: Vector{GenerationLog} — one entry per generation
e = entries(log)[end]
println("generations recorded: ", length(entries(log)))
println("last gen best/mean/worst: ",
round(e.best_fitness; sigdigits=3), " / ",
round(e.mean_fitness; sigdigits=3), " / ",
round(e.worst_fitness; sigdigits=3))
generations recorded: 10
last gen best/mean/worst: 0.0385 / 14.1 / 210.0
The Plots.jl recipe renders three stacked subplots — fitness trajectory, species count, and unique-structure count — directly from the log:
using Plots
plot(log)