Multistart
Nonlinear models frequently have multiple local optima, making the estimated parameters sensitive to initialization. The Multistart wrapper addresses this by running multiple fits from different initial parameter values and selecting the best result. This strategy is especially important for complex models where a single optimization run provides limited confidence that the global optimum has been found.
Multistart operates as a method-agnostic wrapper around fit_model: it samples many candidate starting points, screens them with a cheap log-likelihood evaluation to identify the most promising ones, then fully optimizes only the top candidates. Because each optimization is independent, the runs can be executed in parallel.
The call pattern is:
res_ms = fit_model(ms, dm, method; kwargs...)where ms is a NoLimits.Multistart(...) object and method is any fitting method.
Supported Methods
Multistart wraps any FittingMethod and has been tested with:
MLEMAPLaplaceLaplaceMAPFOCEIFOCEIMAPMCEMSAEMVI(see note below)MCMC(supported, but usually not recommended as a primary restart strategy)
Recommendation
Multistart is most beneficial for optimization- and EM-based methods (MLE, MAP, Laplace, LaplaceMAP, FOCEI, FOCEIMAP, MCEM, SAEM, VI), where the choice of starting values strongly influences which local optimum is found.
For MCMC, multistart is technically supported but generally not recommended as the primary strategy. In most Bayesian workflows, tuning sampler settings and chain diagnostics is more effective than varying initial values across restarts.
Constructor
using NoLimits
using Random
using SciMLBase
ms = NoLimits.Multistart(;
dists=NamedTuple(),
n_draws_requested=100,
n_draws_used=50,
sampling=:random, # :random or :lhs
serialization=EnsembleSerial(), # controls parallelism across starts
rng=Random.default_rng(),
)Two-Phase Workflow
Multistart uses a two-phase approach to avoid running full optimizations from all sampled candidates:
Phase 1 — Screening. All n_draws_requested candidates are evaluated cheaply by computing the marginal log-likelihood at η = 0 (no random-effects perturbation) using a pre-compiled ODE and covariate cache. Candidates are ranked by this screening log-likelihood and the top n_draws_used are selected via a partial sort — no full optimization is performed during this phase.
Phase 2 — Optimization. The selected n_draws_used candidates are passed to the wrapped fitting method as independent starting points. Each run produces a full FitResult. The run with the best final objective is returned as the best result.
If n_draws_requested == n_draws_used, Phase 1 is skipped entirely — no screening cache is built and all candidates proceed directly to optimization.
Progress Logging
At the start of each fit_model call, a summary is logged:
┌ Info: Multistart
│ candidates = 20
│ selected = 5
│ varying = "a, σ"
│ best_screening_ll = -12.4
└ worst_screening_ll = -18.7candidates— total starting points generated (=n_draws_requested, after any automatic adjustment).selected— candidates forwarded to full optimization (=n_draws_used).varying— names of parameters whose values differ across starts (those with a prior or an entry indists).best_screening_ll/worst_screening_ll— log-likelihood range of the selected candidates at η = 0. Omitted when screening is skipped (candidates == selected) or when all candidates produce non-finite likelihoods.
Screening and Method Direction
Screening always ranks candidates by the marginal log-likelihood (higher is better). This is the correct direction for every supported method:
- MLE / MAP / Laplace / FOCEI / MCEM / SAEM / VI — all internally minimize the negative log-likelihood (or a penalized variant). Selecting candidates with the highest screening LL puts the optimizer in the most promising region.
- MCMC — though MCMC does not optimize, starting from a high-likelihood region improves early mixing and reduces warm-up cost.
No sign adjustment is made per-method; the screening criterion is uniform.
How Starts Are Built
Starting points are constructed as follows:
- Start 1 is always the model's default initial fixed-effect values (obtained via
get_θ0_untransformed). - Subsequent starts are sampled independently for each parameter, drawing from:
- the distribution specified in
ms.distsfor that parameter name, if provided; - the fixed-effect prior, if one is defined;
- otherwise, the parameter retains its default value across all starts.
- the distribution specified in
All sampled values are validated against the natural-scale bounds of each parameter. If any sampled value violates its bounds, an error is raised before fitting begins.
Distribution Inputs
The dists argument is a NamedTuple keyed by fixed-effect names. The expected distribution type depends on the parameter structure:
- Scalar parameters: a univariate distribution (e.g.,
Normal(...)). - Vector or matrix parameters: either a single multivariate/matrix-variate distribution, or an array of element-wise univariate distributions.
For square-matrix parameters, Multistart symmetrizes the sampled matrix and applies a small diagonal perturbation if needed to ensure numerical stability.
Sampling Modes
Two sampling strategies are available:
sampling=:random– Draws are taken directly from the specified distributions.sampling=:lhs– Latin Hypercube Sampling is used when the distribution supports quantile-based inversion, producing more uniform coverage of the parameter space. For distributions where a direct LHS quantile path is unavailable, the method falls back to random draws.
Requested vs. Used Draws
The n_draws_requested and n_draws_used parameters control the two phases:
| Parameter | Phase | Effect |
|---|---|---|
n_draws_requested | Screening | Total candidates sampled. More candidates give better coverage but increase screening cost. |
n_draws_used | Optimization | Candidates forwarded to full optimization. Larger values improve coverage but increase runtime proportionally. |
If n_draws_used exceeds n_draws_requested, the number of requested draws is automatically increased to match and a warning is emitted.
Setting n_draws_requested == n_draws_used disables screening: all candidates proceed directly to optimization without any pre-evaluation.
Scoring and Best-Run Selection
After all fits complete, successful runs are ranked by the following scoring rule:
- The objective value from
get_objective, if finite. - Otherwise, the negative log-likelihood from
-get_loglikelihood, if finite. - Otherwise,
Inf(effectively deprioritizing the run).
The run with the lowest score is selected as the best result:
best = get_multistart_best(res_ms)
best_idx = get_multistart_best_index(res_ms)If all runs fail, Multistart raises an error reporting the first recorded failure.
Parallelism and RNG Behavior
The ms.serialization field controls execution of the individual fits:
EnsembleSerial(): Fits run sequentially in a single thread.EnsembleThreads(): Fits run in parallel across available threads.
Note that screening (Phase 1) always runs serially regardless of serialization; only Phase 2 (the full optimizations) is parallelized.
Random number generator behavior depends on how rng is supplied:
- If
rngis not passed tofit_model(ms, ...), each start receives an internally spawned child RNG to ensure independence. - If
rngis explicitly provided in the fit keywords, that generator is forwarded to every underlying fit call.
Fit Keyword Forwarding
All fit keywords are forwarded to the wrapped method, with one exception:
theta_0_untransformedis ignored (with a warning), becauseMultistartmanages starting points internally.
All other keywords – such as constants, constants_re, ode_args, ode_kwargs, serialization, and store_data_model – are passed through unchanged.
Multistart Result Accessors
The MultistartFitResult provides detailed access to both successful and failed runs:
ok_runs = get_multistart_results(res_ms)
ok_starts = get_multistart_starts(res_ms)
failed_runs = get_multistart_failed_results(res_ms)
failed_starts = get_multistart_failed_starts(res_ms)
failed_errors = get_multistart_errors(res_ms)
best_run = get_multistart_best(res_ms)
best_idx = get_multistart_best_index(res_ms)Standard fit accessors also work directly on a MultistartFitResult, dispatching to the best run:
theta_best = get_params(res_ms; scale=:untransformed)
obj_best = get_objective(res_ms)Example: Fixed-Effects MLE with Screening
The following example generates 20 candidates via LHS, screens them to the top 5 by log-likelihood, and runs 5 full optimizations:
using NoLimits
using DataFrames
using Distributions
model = @Model begin
@covariates begin
t = Covariate()
end
@fixedEffects begin
a = RealNumber(0.2)
sigma = RealNumber(0.5, scale=:log)
end
@formulas begin
y ~ Laplace(a, sigma)
end
end
df = DataFrame(
ID = [:A, :A, :B, :B],
t = [0.0, 1.0, 0.0, 1.0],
y = [0.1, 0.2, 0.0, -0.1],
)
dm = DataModel(model, df; primary_id=:ID, time_col=:t)
ms = NoLimits.Multistart(;
dists = (; a=Normal(0.0, 1.0)),
n_draws_requested = 20,
n_draws_used = 5,
sampling = :lhs,
)
res_ms = fit_model(ms, dm, NoLimits.MLE(; optim_kwargs=(maxiters=80,)))
best = get_multistart_best(res_ms)
theta_best = get_params(res_ms; scale=:untransformed)The logged output will look similar to:
┌ Info: Multistart
│ candidates = 20
│ selected = 5
│ varying = "a"
│ best_screening_ll = -2.1
└ worst_screening_ll = -8.4Example: Variational Inference with Multistart
VI benefits from multistart when the ELBO landscape is multimodal. The usage is identical to any other method:
using NoLimits
using DataFrames
using Distributions
using Random
model = @Model begin
@covariates begin
t = Covariate()
end
@fixedEffects begin
a = RealNumber(0.0, prior=Normal(0.0, 1.0))
sigma = RealNumber(0.5, scale=:log, prior=LogNormal(0.0, 0.5))
end
@formulas begin
y ~ Normal(a, sigma)
end
end
df = DataFrame(
ID = [:A, :A, :B, :B],
t = [0.0, 1.0, 0.0, 1.0],
y = [0.1, 0.2, 0.0, -0.1],
)
dm = DataModel(model, df; primary_id=:ID, time_col=:t)
ms = NoLimits.Multistart(;
n_draws_requested = 10,
n_draws_used = 3,
sampling = :lhs,
rng = Random.Xoshiro(42),
)
res_ms = fit_model(
ms, dm,
NoLimits.VI(; turing_kwargs=(max_iter=300, family=:meanfield, progress=false)),
rng=Random.Xoshiro(1),
)
posterior = get_variational_posterior(res_ms)
objective = get_objective(res_ms) # final ELBO of the best runOptional: MCMC with Multistart (Supported, Usually Not Recommended)
While multistart is primarily designed for optimization-based methods, it can be used with MCMC when a restart-style sampling workflow is explicitly desired:
using NoLimits
using DataFrames
using Distributions
using Turing
model = @Model begin
@covariates begin
t = Covariate()
end
@fixedEffects begin
a = RealNumber(0.2, prior=Normal(0.0, 1.0))
sigma = RealNumber(0.5, scale=:log, prior=LogNormal(0.0, 0.2))
end
@formulas begin
y ~ LogNormal(a, sigma)
end
end
df = DataFrame(
ID = [:A, :A, :B, :B],
t = [0.0, 1.0, 0.0, 1.0],
y = [1.0, 1.1, 0.9, 1.0],
)
dm = DataModel(model, df; primary_id=:ID, time_col=:t)
ms = NoLimits.Multistart(; n_draws_requested=6, n_draws_used=3)
res_ms = fit_model(
ms, dm,
NoLimits.MCMC(; sampler=MH(), turing_kwargs=(n_samples=200, n_adapt=0, progress=false)),
)
chain_best = get_chain(res_ms)Use this pattern only when a restart-style MCMC workflow is explicitly needed.