Reproducibility
Fully Reproducible Runs
Arborist.jl supports fully reproducible GP runs via explicit RNG seeding:
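```julia
using Arborist

# evaluator is your fitness evaluator, defined elsewhere
problem = GPProblem(evaluator, ExprGenome; seed=42)
algorithm = GeneticProgramming(parallel=false) # evaluate serially, no threading
result = solve(problem, algorithm)
```

With parallel=false and a fixed seed: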
- Population initialization is deterministic
- All random operations use the seeded RNG
- Collection ordering is canonicalized (Dict/Set items are sorted)
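- Results are identical across runs and Julia sessions on the same Julia version (a minor Julia release may change the random stream that a given seed produces)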
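These guarantees rest on two ingredients: every random draw comes from one explicitly seeded RNG, and unordered collections are sorted before anything iterates over them. The sketch below illustrates both using only Julia's standard library. It does not call Arborist.jl, and the helper names random_population and canonical_items are hypothetical.

```julia
using Random

# Hypothetical helper: build a "population" of random integer vectors,
# drawing every random number from the RNG passed in (never the default RNG).
function random_population(rng::AbstractRNG, n::Int, len::Int)
    return [rand(rng, 1:100, len) for _ in 1:n]
end

# Hypothetical helper: return a Dict's entries in a canonical (sorted) order,
# so downstream code does not depend on hash-table layout.
canonical_items(d::AbstractDict) = sort!(collect(pairs(d)); by = first)

pop_a = random_population(MersenneTwister(42), 5, 3)
pop_b = random_population(MersenneTwister(42), 5, 3)
println(pop_a == pop_b)  # true: same seed, same sequence of draws

counts = Dict("sin" => 3, "add" => 7, "mul" => 1)
println(canonical_items(counts))  # entries ordered by key
```

Passing an explicit RNG object around, rather than relying on the default RNG, means unrelated code cannot consume draws in between and shift the sequence.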
Parallel Evaluation and Reproducibility
When parallel=true (the default), population evaluation uses Threads.@threads, which introduces non-deterministic evaluation order:
```julia
algorithm = GeneticProgramming(parallel=true) # default
```

With parallel evaluation:
- Population initialization is still deterministic from the seed
- Evaluation ORDER within a generation is non-deterministic
- As a result, fitness values may differ between runs if evaluation has side effects on shared state (rare for most problems)
- For ExprGenome, @eval acquires a global lock, so the actual speedup from parallel evaluation is limited
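Why evaluation order matters only when evaluation touches shared state can be seen in a small, self-contained sketch. It uses plain Julia rather than Arborist.jl internals, and fitness, evaluate_indexed and evaluate_shared are hypothetical names. Writing each result into the slot indexed by the individual yields the same vector however threads are scheduled, while appending to a shared vector (even under a lock) records results in whatever order threads finish.

```julia
using Base.Threads

# Hypothetical pure fitness function: depends only on its input.
fitness(x) = sum(abs2, x)

# Order-independent: each iteration writes only to its own slot.
function evaluate_indexed(pop)
    scores = Vector{Float64}(undef, length(pop))
    @threads for i in eachindex(pop)
        scores[i] = fitness(pop[i])
    end
    return scores
end

# Order-dependent: results are appended to shared state in completion order.
function evaluate_shared(pop)
    record = Tuple{Int,Float64}[]
    lk = ReentrantLock()
    @threads for i in eachindex(pop)
        s = fitness(pop[i])
        lock(lk) do
            push!(record, (i, s))  # position in `record` depends on scheduling
        end
    end
    return record
end

pop = [rand(3) for _ in 1:1000]
serial = [fitness(p) for p in pop]
println(evaluate_indexed(pop) == serial)  # true for any thread count
shared = evaluate_shared(pop)
println(issorted(first.(shared)))         # may be false with more than one thread
```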
Recommendation: Use parallel=false for research requiring exact reproducibility. Use parallel=true (default) for production runs where speed matters more than exact reproducibility.
TreeGenome Reproducibility
TreeGenome evaluation is fully deterministic (no @eval, no global state). With parallel=false and a fixed seed, TreeGenome runs are perfectly reproducible.
With parallel=true, TreeGenome evaluation is thread-safe and individual evaluations are independent, but the order in which threads schedule those evaluations is still non-deterministic.
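A tree genome can be evaluated by plain recursion over immutable nodes, with no code generation and no global state, which is what makes this kind of evaluation deterministic and safe to run from many threads at once. The sketch below illustrates the idea; the node types ConstNode, VarNode and CallNode and the evaluate function are hypothetical and are not Arborist.jl's actual TreeGenome representation.

```julia
# Hypothetical immutable expression-tree node types.
abstract type Node end
struct ConstNode <: Node
    value::Float64
end
struct VarNode <: Node
    index::Int              # which input variable to read
end
struct CallNode <: Node
    f::Function             # e.g. +, *, sin
    args::Vector{Node}
end

# Pure recursive evaluator: the result depends only on the tree and the
# inputs, so concurrent calls cannot interfere with one another.
evaluate(n::ConstNode, x) = n.value
evaluate(n::VarNode, x) = x[n.index]
evaluate(n::CallNode, x) = n.f((evaluate(a, x) for a in n.args)...)

# Tree for (x1 * x1) + sin(2.0)
tree = CallNode(+, Node[CallNode(*, Node[VarNode(1), VarNode(1)]),
                        CallNode(sin, Node[ConstNode(2.0)])])

# Evaluate the same tree on many inputs in parallel; each thread writes
# only its own slot of `out`.
xs = [[Float64(i)] for i in 1:100]
out = Vector{Float64}(undef, length(xs))
Threads.@threads for i in eachindex(xs)
    out[i] = evaluate(tree, xs[i])
end
println(evaluate(tree, [3.0]))
```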
GraphGenome Reproducibility
GraphGenome uses a global innovation counter for NEAT-style crossover. The counter is reset at the start of each solve() call via reset_innovation_counter!(). With parallel=false and a fixed seed, GraphGenome runs are reproducible.
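Why the reset matters can be illustrated with a minimal process-global counter in plain Julia. This is a hypothetical sketch, not Arborist.jl's implementation: INNOVATION, next_innovation! and reset_counter! are made-up names that only mirror the idea behind reset_innovation_counter!(). Without a reset, a second run in the same process continues numbering where the first left off, so its innovation IDs differ even when everything else is identical.

```julia
# Hypothetical process-global innovation counter (illustration only).
const INNOVATION = Ref(0)

# Hand out the next innovation ID.
next_innovation!() = (INNOVATION[] += 1)

# Restart numbering, as happens at the start of each solve() call.
reset_counter!() = (INNOVATION[] = 0; nothing)

# Stand-in for a run that records three structural mutations.
function run_once(; reset::Bool)
    reset && reset_counter!()
    return [next_innovation!() for _ in 1:3]
end

println(run_once(reset = true) == run_once(reset = true))    # true
println(run_once(reset = false) == run_once(reset = false))  # false
```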
The innovation counter is process-local, so in a distributed run each worker, left to itself, would assign overlapping innovation numbers to structurally distinct genes. IslandModel(distributed=true) works around this by giving each worker a disjoint innovation-ID range via init_innovation_range!((island_id - 1) * INNOVATION_STRIDE) at initialization, so IDs from different processes cannot collide as long as no worker issues more than INNOVATION_STRIDE of them. The trade-off: structurally identical mutations on different workers receive different IDs, so NEAT crossover treats them as disjoint genes rather than aligning them. Per-generation cross-worker innovation deduplication is not implemented; in practice this is a soft cost to diversity, not a correctness bug.
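The disjoint-range scheme reduces to simple arithmetic: island k starts numbering at (k - 1) * STRIDE, so its IDs stay inside its own block as long as it issues at most STRIDE of them. The sketch below checks that property and shows the trade-off described above. The stride value, the island_ids helper, and the convention that the first issued ID is offset + 1 are all assumptions for illustration, not Arborist.jl's actual constants or behavior.

```julia
# Assumed stride for illustration; the real INNOVATION_STRIDE may differ.
const STRIDE = 1_000_000

# Hypothetical helper: IDs island `island_id` would issue for its first `n`
# new genes, assuming numbering starts at offset + 1 after an
# init_innovation_range!-style offset of (island_id - 1) * STRIDE.
function island_ids(island_id::Int, n::Int)
    n <= STRIDE || error("island $island_id exhausted its innovation range")
    offset = (island_id - 1) * STRIDE
    return collect(offset .+ (1:n))
end

ids = [island_ids(k, 500) for k in 1:4]

# Every ID across all islands is distinct.
println(length(unique(vcat(ids...))) == sum(length, ids))

# Trade-off: the first new gene on islands 1 and 2 gets different IDs,
# even if both islands made the structurally identical mutation.
println((first(ids[1]), first(ids[2])))
```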
Starting Julia with Multiple Threads
To benefit from parallel evaluation:
```shell
julia -t auto                  # use all available cores
julia -t 4                     # use exactly 4 threads
JULIA_NUM_THREADS=auto julia   # via environment variable
```
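Once Julia is running, a quick check confirms how many threads the session actually received. With a single thread, Threads.@threads runs its loop serially, so parallel=true gains nothing from multithreading. A minimal snippet using only Base.Threads:

```julia
using Base.Threads

# Number of threads available to Threads.@threads in this session.
n = nthreads()
println("Julia is running with $n thread(s)")
n == 1 && println("Only one thread: restart with julia -t auto to parallelize evaluation")
```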