Algorithms — API Reference

Autogenerated reference for every exported algorithm type (GeneticProgramming, IslandModel, NSGAII, CMAES, MAPElites), the speciation strategies, migration topologies, and the shared solve entry points.

Types

Arborist.GeneticProgrammingType
GeneticProgramming <: AbstractEvolutionaryAlgorithm

Standard genetic programming algorithm configuration. All fields are set via keyword arguments with sensible defaults.

Fields

  • pop_size::Int: population size (default: 100)
  • generations::Int: number of generations to run (default: 200)
  • mutation_rate::Float64: probability of mutation per offspring (default: 0.3)
  • crossover_rate::Float64: probability of crossover per offspring pair (default: 0.3)
  • elitism::Int: number of top individuals carried forward unchanged (default: 2)
  • bloat_penalty::Float64: coefficient on complexity(g) added to fitness (default: 0.0)
  • parallel::Bool: enable threaded population evaluation (default: true)
  • speciation::AbstractSpeciation: speciation strategy (default: NoSpeciation())
  • mutation_ops::Vector{AbstractMutationOperator}: mutation operators
  • crossover_ops::Vector{AbstractCrossoverOperator}: crossover operators
  • selection::AbstractSelectionStrategy: selection strategy
  • convergence_threshold::Float64: best fitness below this value sets converged=true in result (default: Inf, meaning never converged)
  • constant_sampler::Union{Function, Nothing}: callable (rng) -> T used by TreeGenome when a new numeric-literal leaf is created (Koza-style Ephemeral Random Constants). nothing preserves the default T(randn(rng)). See erc_uniform for a helper that builds the canonical uniform-over-range sampler.
source
Arborist.GeneticProgrammingMethod
GeneticProgramming(; kwargs...) -> GeneticProgramming

Construct a GeneticProgramming algorithm with keyword arguments and sensible defaults.

source
Arborist.IslandModelType
IslandModel <: AbstractEvolutionaryAlgorithm

Island model that runs multiple independent populations (islands) with periodic migration. Supports pluggable topologies and optional distributed execution where each island runs in its own worker process.

Fields

  • n_islands::Int: number of islands (default: 4)
  • island_algorithm::GeneticProgramming: algorithm for each island
  • migration_interval::Int: generations between migrations (default: 10)
  • migration_size::Int: number of individuals to migrate per event (default: 2)
  • topology::AbstractTopology: migration topology (default: RingTopology())
  • distributed::Bool: run each island in a separate worker process (default: false)
  • async::Bool: asynchronous evolution — islands evolve independently (default: false, requires distributed=true)
source
Arborist.IslandModelMethod
IslandModel(; n_islands=4, island_algorithm=GeneticProgramming(),
              migration_interval=10, migration_size=2,
              topology=RingTopology(), distributed=false, async=false)

Construct an IslandModel with keyword arguments and sensible defaults. When distributed=true, each island runs in its own worker process via Distributed.jl, eliminating @eval contention between islands.

source
Arborist.AbstractMultiObjectiveEvaluatorType
AbstractMultiObjectiveEvaluator <: AbstractEvaluator

Base type for multi-objective fitness evaluators. All objectives are minimized (lower is better), consistent with the rest of Arborist.

Concrete subtypes must implement:

  • evaluate_multi(e, genome::AbstractGenome) -> Vector{Float64}
  • objective_names(e) -> Vector{String}
  • input_signature(e) and output_signature(e) (inherited from AbstractEvaluator)
source
Arborist.NSGAIIType
NSGAII <: AbstractEvolutionaryAlgorithm

NSGA-II (Non-dominated Sorting Genetic Algorithm II) for multi-objective genetic programming. Uses Pareto non-dominated sorting and crowding distance for selection, with a (mu+lambda) replacement strategy.

No elitism field — NSGA-II's combined parent+offspring sorting IS the elitism mechanism. No speciation — crowding distance maintains diversity. No bloat_penalty — complexity should be an explicit objective via ParsimonyEvaluator.

Fields

  • pop_size::Int: population size, must be even (default: 100)
  • generations::Int: number of generations (default: 200)
  • mutation_rate::Float64: probability of mutation (default: 0.3)
  • crossover_rate::Float64: probability of crossover (default: 0.3)
  • parallel::Bool: enable threaded evaluation (default: true)
  • mutation_ops::Vector{AbstractMutationOperator}: mutation operators
  • crossover_ops::Vector{AbstractCrossoverOperator}: crossover operators
source
Arborist.NSGAIIMethod
NSGAII(; kwargs...) -> NSGAII

Construct an NSGAII algorithm with keyword arguments and sensible defaults.

source
Arborist.NSGAIIResultType
NSGAIIResult{G<:AbstractGenome} <: AbstractEvolutionResult

Result returned by solve with NSGAII. Contains the Pareto front, full population, fitness vectors, and convergence history.

Fields

  • pareto_front::Vector{G}: non-dominated genomes from the final population
  • pareto_fitnesses::Vector{Vector{Float64}}: fitness vectors for the Pareto front
  • population::Vector{G}: full final population
  • all_fitnesses::Vector{Vector{Float64}}: fitness vectors for all individuals
  • hypervolume_history::Vector{Float64}: hypervolume of front 1 per generation
  • generations_run::Int: number of generations completed
  • wall_time::Float64: elapsed wall-clock time in seconds
  • objective_names::Vector{String}: human-readable names for each objective
source
Arborist.ParsimonyEvaluatorType
ParsimonyEvaluator{E<:AbstractEvaluator} <: AbstractMultiObjectiveEvaluator

Wraps a single-objective evaluator to produce two objectives:

  1. The original fitness (from the wrapped evaluator)
  2. Genome complexity (from complexity(genome))

This is the canonical GP multi-objective setup: accuracy vs. parsimony.

Example

inner = TreeFitnessEvaluator(X, y, operators)
evaluator = ParsimonyEvaluator(inner)
problem = GPProblem(evaluator, TreeGenome{Float32}; seed=42)
result = solve(problem, NSGAII(pop_size=100, generations=50))
source
Arborist.MAPElitesType
MAPElites{F, FB, MO, CO} <: AbstractEvolutionaryAlgorithm

Illuminates a user-defined feature space by evolving a grid-based archive.

Fields (constructor kwargs with defaults)

  • generations::Int = 100: number of outer iterations (each generation samples and evaluates batch_size children from the archive).
  • batch_size::Int = 50: mutations produced per generation.
  • n_init::Int = 50: random genomes generated at generation 0 to seed the archive before any sampling happens.
  • feature_fn::F: genome -> NTuple{N, Float64}. Behavior descriptor used to place genomes into the discretized grid.
  • feature_bounds::FB: Vector{Tuple{Float64,Float64}} of per-dimension (low, high) intervals. Features outside the range clamp to the extreme bin.
  • n_bins::Vector{Int}: per-dimension bin count. Total archive capacity is prod(n_bins).
  • mutation_ops::MO: vector of mutation operators (as in GeneticProgramming).
  • crossover_ops::CO: optional vector of crossover operators. If empty, every child is produced by mutation only.
  • crossover_rate::Float64 = 0.0: probability of picking crossover over mutation when both are available.
  • parallel::Bool = true: threaded evaluation of a batch.
  • elitism::Int = 0: placeholder for interface parity; MAP-Elites has archive-based elitism intrinsically, so this is unused.
source
Arborist.MAPElitesArchiveType
MAPElitesArchive{G}

Grid-based archive for MAP-Elites. Maps a cell index (an N-tuple of integers) to a (genome, fitness) pair. Fitness is lower-is-better, matching the framework convention.

Not thread-safe — MAP-Elites runs its own evaluation batches and serializes archive updates; parallel evaluation happens within a batch, insertion after the batch synchronizes.

source
Arborist.MAPElitesResultType
MAPElitesResult{G}

Result returned by solve(problem, ::MAPElites). Carries the final archive and per-generation coverage/QD-score trajectories.

Fields

  • archive::MAPElitesArchive{G}: final grid.
  • coverage_history::Vector{Float64}: coverage fraction per generation.
  • qd_score_history::Vector{Float64}: qd_score per generation.
  • best_genome::G: the single best (lowest-fitness) genome in the archive.
  • best_fitness::Float64: its fitness.
  • wall_time::Float64: seconds elapsed.
  • generations_run::Int: outer iterations completed.
source
Arborist.CMAESType
CMAES <: AbstractEvolutionaryAlgorithm

Covariance Matrix Adaptation Evolution Strategy. Suitable for continuous parameter optimization where the search dimension is fixed at solve time. Pairs with GraphGenome via flatten_weights / unflatten_weights! to optimize connection weights under a frozen topology.

Fields (all kwargs, sensible defaults)

  • generations::Int = 100: outer iteration count.
  • pop_size::Int = 0: population size λ. 0 means auto: 4 + floor(3 ln n), computed at solve time once n (the parameter count) is known.
  • sigma0::Float64 = 1.0: initial step size.
  • parallel::Bool = true: thread the per-generation evaluations.
  • convergence_threshold::Float64 = -Inf: stop early when best fitness drops below this (lower-is-better convention; default never triggers).
  • seed_genome::Bool = true: when true (default), the starting mean is taken from the problem's initial genome; when false, drawn from N(0, I) scaled by sigma0.
source
Arborist.AsyncStatusMessageType
AsyncStatusMessage

Status update sent from an async island worker to the coordinator. Concrete struct (not NamedTuple) to ensure reliable cross-process serialization.

source
Arborist.CompleteTopologyType
CompleteTopology <: AbstractTopology

All-to-all migration: island i sends to every other island. Fast dissemination of good solutions but rapidly reduces diversity.

source
Arborist.RandomTopologyType
RandomTopology <: AbstractTopology

Random migration: island i sends to n_targets randomly chosen islands. Good pairing with asynchronous evolution where the randomness of timing provides a natural diversity mechanism.

Fields

  • n_targets::Int: number of destination islands per migration event
source
Arborist.RingTopologyType
RingTopology <: AbstractTopology

Ring migration: island i sends to island (i % n) + 1. This is the default and most common topology in the literature. Sparse connectivity preserves diversity across islands.

source
Arborist.BehavioralSpeciationType
BehavioralSpeciation <: AbstractSpeciation

Speciation based on behavioral similarity rather than genomic distance. Two individuals are in the same species if their behavioral fingerprints are within threshold of each other under the provided distance function.

This is useful for behavioral optimization problems (e.g., bin packing, game playing) where multiple syntactically distinct programs can implement the same strategy. Genomic distance treats them as different; behavioral distance correctly collapses them.

Fields

  • fingerprint_fn::Function: (genome) -> fingerprint — computes a behavioral fingerprint. The return type can be anything that distance_fn can compare.
  • distance_fn::Function: (a, b) -> Float64 — behavioral distance between two fingerprints. Should return 0.0 for identical behavior.
  • threshold::Float64: behavioral distance below which individuals are same-species (default: 0.15)
  • min_species_size::Int: minimum species size before culling (default: 2)
  • stagnation_limit::Int: generations without improvement before culling (default: 15)
  • sharing_formula::Symbol: fitness sharing strength — :linear, :sqrt, :log2, or :none (default: :sqrt)
source
Arborist.NoSpeciationType
NoSpeciation <: AbstractSpeciation

Trivial speciation strategy: all individuals belong to a single species.

source
Arborist.ThresholdSpeciationType
ThresholdSpeciation <: AbstractSpeciation

Species individuals by compatibility distance. Individuals are assigned to the first species whose representative has distance(g1, g2) <= threshold. Tracks stagnation per species and culls stagnant species below min_species_size.

The default threshold of 10.0 is appropriate for ExprGenome where distance is a symmetric-difference node count.

Fitness sharing penalizes members of large species to protect structural innovations in small species.

Fields

  • threshold::Float64: compatibility distance cutoff (default: 10.0)
  • min_species_size::Int: minimum species size to avoid culling (default: 2)
  • stagnation_limit::Int: generations without improvement before culling (default: 15)
  • sharing_formula::Symbol: fitness sharing strength — :linear, :sqrt, :log2, or :none (default: :log2)
source
Arborist.ThresholdSpeciationMethod
ThresholdSpeciation(; threshold=10.0, min_species_size=2, stagnation_limit=15, sharing_formula=:log2)

Construct a ThresholdSpeciation with keyword arguments and sensible defaults.

source

Functions

Arborist.evaluate_multiFunction
evaluate_multi(e::AbstractMultiObjectiveEvaluator, genome::AbstractGenome) -> Vector{Float64}

Evaluate a genome on all objectives. Returns a vector of fitness values, one per objective, all minimized (lower is better). Returns a vector of Inf values on evaluation failure.

source
Arborist.objective_namesFunction
objective_names(e::AbstractMultiObjectiveEvaluator) -> Vector{String}

Return human-readable names for each objective.

source
CommonSolve.solveMethod
solve(problem::GPProblem{G,E}, algorithm::NSGAII; verbose=false, callback=nothing) -> NSGAIIResult{G}

Run NSGA-II multi-objective evolution. Returns an NSGAIIResult containing the Pareto front, fitness vectors, and hypervolume history.

Arguments

  • problem::GPProblem{G,E}: problem with a multi-objective evaluator
  • algorithm::NSGAII: NSGA-II configuration

Keyword Arguments

  • verbose::Bool=false: print generation statistics
  • callback: optional (gen::Int, front_size::Int, hypervolume::Float64) -> nothing
source
Arborist.coverageMethod
coverage(a::MAPElitesArchive) -> Float64

Fraction of bins in the grid that hold a genome. Ranges [0, 1].

source
Arborist.qd_scoreMethod
qd_score(a::MAPElitesArchive; worst_fitness=0.0) -> Float64

Quality-Diversity score: sum over filled cells of (worst_fitness - f_cell). Rewards archives that are both deeply filled and broad. Since the framework uses lower-is-better fitness, larger QD score means the archive holds better solutions across more bins. Pass worst_fitness = maximum of the task's plausible loss range for a meaningful score.

source
CommonSolve.solveMethod
solve(problem::GPProblem{G,E}, alg::MAPElites; verbose=false, log=nothing) -> MAPElitesResult{G}

Run MAP-Elites. Requires problem.evaluator to implement evaluate_genome(g, e). The feature function and grid are supplied via alg. Seeds the archive with alg.n_init random individuals via the problem's _initialize_population path, then alternates generations of (sample parent from archive → mutate/crossover → evaluate → maybe insert).

log::Union{Nothing, RunLog} records per-generation coverage and qd_score alongside the standard fitness stats (best from archive as a scalar).

source
Arborist.flatten_weightsFunction
flatten_weights(g::AbstractGenome) -> Vector{Float64}

Return the genome's continuous parameters as a Float64 vector. Defined for GraphGenome over its enabled connection weights, sorted by innovation number for determinism.

Raises MethodError for genome types that don't expose a continuous parameter vector — those genomes can't be searched by CMA-ES directly.

source
Arborist.unflatten_weights!Function
unflatten_weights!(g::AbstractGenome, w::Vector{Float64}) -> AbstractGenome

Inverse of flatten_weights: writes w back into g's continuous parameters. Mutates g and returns it. Length of w must match length(flatten_weights(g)).

source
CommonSolve.solveMethod
solve(problem::GPProblem{G,E}, alg::CMAES; verbose=false, log=nothing) -> GPResult{G}

Run CMA-ES against the problem's evaluator on a fixed-topology genome. Requires flatten_weights(g) / unflatten_weights!(g, w) to be defined for G. The initial genome (constructed via _initialize_population) provides both the topology and, when alg.seed_genome=true, the starting mean of the search distribution.

Returns a GPResult{G} with the best genome found.

source
Arborist.setup_workersMethod
setup_workers(script_path::AbstractString)

Convenience: load a user script on all workers. Equivalent to:

@everywhere include(abspath("your_script.jl"))

Also ensures Arborist is loaded on all workers. The primary documented approach is @everywhere include(...) directly.

source
Arborist.behavioral_initializeMethod
behavioral_initialize(state, evaluator, fingerprint_fn, distance_fn,
                      target_size; kwargs...) -> Vector{ExprGenome}

Generate a behaviorally diverse initial population using a MAP-Elites-style procedure:

  1. Generate pool_size random programs
  2. Evaluate each, discard crashed/timed-out programs (fitness == Inf)
  3. Compute behavioral fingerprints
  4. Greedily bin by fingerprint distance (threshold-based clustering)
  5. Select the best-fitness representative from each bin
  6. Return target_size genomes drawn from distinct bins

This replaces the standard random initialization with a pool that guarantees behavioral diversity — programs that do different things, not just look different syntactically.

Arguments

  • state::GenState: the genome state (function set, variable types, RNG)
  • evaluator::AbstractEvaluator: for fitness evaluation
  • fingerprint_fn: genome -> fingerprint (any type; same as BehavioralSpeciation)
  • distance_fn: (fp_a, fp_b) -> Float64 (same as BehavioralSpeciation)
  • target_size::Int: desired population size

Keyword Arguments

  • pool_size::Int = target_size * 50: number of random programs to generate
  • bin_threshold::Float64 = 0.1: distance below which two fingerprints are considered the same behavior
  • body_generator = nothing: optional (state) -> Vector{Expr} for custom random program generation. Default: 3-7 random statements via create_random_statement.
  • bloat_penalty::Float64 = 0.0: bloat penalty for evaluation
  • parallel::Bool = false: use threads for evaluation
  • verbose::Bool = false: print progress

Returns

A Vector{ExprGenome} of length target_size with diverse behaviors. If fewer than target_size distinct behaviors are found in the pool, remaining slots are filled with the best-fitness programs from the pool.

source
CommonSolve.solveMethod
solve(problem::GPProblem{G,E}, algorithm::GeneticProgramming;
      verbose=false, callback=nothing, log=nothing,
      checkpoint_every=0, checkpoint_path=nothing,
      resume_from=nothing, allow_signature_mismatch=false,
      initial_population=nothing, hall_of_fame_size=0) -> GPResult{G}

Run a genetic programming evolution using the specified problem and algorithm configuration. Returns a GPResult containing the best genome, fitness history, and run metadata.

Arguments

  • problem::GPProblem{G,E}: the problem specification (evaluator, genome type, function set)
  • algorithm::GeneticProgramming: the algorithm configuration

Keyword Arguments

  • verbose::Bool=false: if true, print generation statistics
  • callback: optional callback function (gen::Int, best_fitness::Float64, best_genome::G) -> nothing
  • log::Union{Nothing, RunLog}=nothing: optional structured per-generation log. When provided, record! is called once per generation with aggregate fitness, speciation snapshot, structural diversity, and wall-time.
  • checkpoint_every::Int=0: write a checkpoint every N generations. 0 disables checkpointing. When > 0, checkpoint_path must also be set. Supported on ExprGenome and TreeGenome only.
  • checkpoint_path::Union{Nothing, AbstractString}=nothing: file path the periodic checkpoints (and the final-generation checkpoint) are written to via save_checkpoint. Writes are atomic (write-tmp + rename).
  • resume_from::Union{Nothing, AbstractString}=nothing: path to a checkpoint produced by a prior solve. Loads the population, generation counter, RNG state, fitness history, and all-time best from the checkpoint and continues from there. Mutually exclusive with initial_population.
  • allow_signature_mismatch::Bool=false: by default, resume_from rejects a checkpoint whose _algorithm_signature does not match the current algorithm. Pass true to override intentionally (e.g., changing hyperparameters mid-run).
  • initial_population::Union{Nothing, Vector{<:AbstractGenome}}=nothing: warm- start the GA from a user-provided seed pool. Length must equal algorithm.pop_size and element type must match G. Mutually exclusive with resume_from.
  • hall_of_fame_size::Int=0: when > 0, attach a HallOfFame{G} archive of this capacity to the result. The archive tracks the top-K distinct fitnesses observed across all generations (best-first), surviving elitism loss. The archive is exposed as result.hall_of_fame. 0 (default) disables the archive and leaves result.hall_of_fame === nothing.
source
CommonSolve.solveMethod
solve(problem::GPProblem{G,E}, algorithm::IslandModel; verbose=false, callback=nothing) -> GPResult{G}

Run an island-model evolution with multiple independent populations and periodic migration. Islands run sequentially (no threading). Migration uses a ring topology.

Returns a GPResult containing the global best genome across all islands.

source
Arborist.migration_targetsMethod
migration_targets(topology, i, n_islands, rng) -> Vector{Int}

Return destination island indices for emigrants from island i. The rng argument is required by the interface but ignored by deterministic topologies (Ring, Complete).

source
Arborist.apply_sharingMethod
apply_sharing(raw_fitness::Float64, species_size::Int, formula::Symbol) -> Float64

Apply fitness sharing to penalize members of large species (for minimization: lower = better). A larger penalty means more diversity pressure.

Formula options:

  • :none — no sharing (raw fitness unchanged)
  • :linear — multiply by species_size (strong; NEAT default for topology protection)
  • :sqrt — multiply by sqrt(species_size) (moderate; good for behavioral optimization)
  • :log2 — multiply by (1 + log2(species_size)) (gentle; singletons unpenalized)
source

Constants