MCEM
The Monte Carlo Expectation-Maximization (MCEM) algorithm is a likelihood-based approach for fitting nonlinear mixed-effects models when the marginal likelihood cannot be computed in closed form. It alternates between two steps:
- an E-step that draws samples from (or forms a weighted approximation of) the conditional distribution of random effects given the current fixed-effect estimates, and
- an optimization-based M-step that updates the fixed effects by maximizing the Monte Carlo approximation of the expected complete-data log-likelihood.
This formulation accommodates arbitrary nonlinear observation models, including those defined through ordinary differential equations, without requiring analytically tractable integrals over the random effects.
E-Step Strategies
MCEM supports two E-step implementations, controlled by the e_step argument.
MCMC E-step (MCEM_MCMC)
The default strategy. Draws samples from the exact conditional distribution p(b | y, θ) using Turing.jl.
NoLimits.MCEM_MCMC(;
sampler = Turing.NUTS(0.75),
turing_kwargs = NamedTuple(),
sample_schedule = 250,
warm_start = true,
)sampler— Turing sampler. Common choices:MH(),NUTS(...).turing_kwargs— forwarded to Turing; the keysn_samplesandn_adaptare interpreted explicitly.sample_schedule— number of MCMC samples per iteration; accepts an integer, a vector (iteration-indexed), or a functioniter -> n_samples.warm_start— whentrue, reuses previous latent-state values as chain initialization.
Importance Sampling E-step (MCEM_IS)
An alternative E-step based on self-normalized importance sampling (IS). Samples are drawn from a tractable proposal q(b) and reweighted by log p(y, b | θ) − log q(b). The weighted Q-function replaces the uniform average used by the MCMC E-step.
IS is often substantially faster per iteration than MCMC because it requires only independent forward passes through the log-likelihood rather than a sequential Markov chain.
NoLimits.MCEM_IS(;
n_samples = 200,
proposal = :prior,
adapt = true,
warm_start_mcmc_iters = 0,
mcmc_warmup = nothing,
)n_samples— number of IS draws per E-step iteration.proposal— proposal distribution (see Proposal Modes below).adapt— whentrue, updates the Gaussian proposal blocks after each IS iteration using IS-weighted statistics of the current samples.warm_start_mcmc_iters— number of initial iterations to run with the MCMC E-step before switching to IS. Use this to pre-warm the Gaussian proposal.mcmc_warmup— anMCEM_MCMCconfiguration used during the warm-up phase (required whenwarm_start_mcmc_iters > 0).
Proposal Modes
proposal | Behavior |
|---|---|
:prior | Each RE level is sampled independently from its prior p(η | θ). No adaptation. |
:gaussian | Adaptive block-diagonal Gaussian in the bijected (unconstrained) space. Falls back to the prior in the first iteration. The blocks are updated after each iteration using IS-weighted posterior statistics. |
Function | User-supplied function with signature (θ, batch_info, re_dists, rng, n_samples) → (samples, log_qs). |
The :gaussian proposal approximates the per-individual posterior p(η | yᵢ, θ) in the bijected space. Because all samples are independent, the effective sample size (ESS) is typically close to n_samples, and the proposal improves with each iteration.
MCMC Warm-up then IS
When the Gaussian proposal needs a cold start, a short MCMC warm-up can be used to initialize the blocks before switching to IS:
es = NoLimits.MCEM_IS(
n_samples = 200,
proposal = :gaussian,
adapt = true,
warm_start_mcmc_iters = 5,
mcmc_warmup = NoLimits.MCEM_MCMC(
sampler = MH(),
turing_kwargs = (n_samples=50, n_adapt=0, progress=false),
),
)Iterations 1–5 use the MCMC E-step; the Gaussian proposal is initialized from those MCMC samples (with uniform weights). From iteration 6 onward the IS E-step is used.
User-Supplied Proposal
function my_proposal(θ, batch_info, re_dists, rng, n_samples)
nb = batch_info.n_b
samples = randn(rng, nb, n_samples) .* 2.0
log_qs = vec(sum(logpdf.(Normal(0.0, 2.0), samples); dims=1))
return samples, log_qs
end
es = NoLimits.MCEM_IS(n_samples=100, proposal=my_proposal)The function must return:
samples— matrix of shape(n_b, n_samples)wheren_bis the total dimension of all RE levels in the batch.log_qs— vector of lengthn_samplescontaininglog q(bₘ)for each draw.
Applicability
MCEM is designed for models that include both fixed and random effects:
- The model must declare at least one random effect and at least one free fixed effect.
- Multiple random-effect grouping columns and multivariate random effects are fully supported.
If fixed-effect priors are defined in the model, MCEM ignores them in its objective. To incorporate priors, use LaplaceMAP or MCMC instead.
Basic Usage
using NoLimits
using DataFrames
using Distributions
using Turing
model = @Model begin
@fixedEffects begin
a = RealNumber(0.2)
b = RealNumber(0.1)
sigma = RealNumber(0.3, scale=:log)
end
@covariates begin
t = Covariate()
end
@randomEffects begin
eta = RandomEffect(Normal(0.0, 1.0); column=:ID)
end
@formulas begin
mu = a + b * t + exp(eta)
y ~ Normal(mu, sigma)
end
end
df = DataFrame(
ID = [:A, :A, :B, :B, :C, :C],
t = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0],
y = [1.0, 1.3, 0.9, 1.2, 1.1, 1.5],
)
dm = DataModel(model, df; primary_id=:ID, time_col=:t)MCMC E-step (classic):
res = fit_model(dm, NoLimits.MCEM(
e_step = NoLimits.MCEM_MCMC(
sampler = MH(),
turing_kwargs = (n_samples=50, n_adapt=0, progress=false),
),
maxiters = 30,
))IS E-step with prior proposal:
res = fit_model(dm, NoLimits.MCEM(
e_step = NoLimits.MCEM_IS(n_samples=200, proposal=:prior),
maxiters = 30,
))IS E-step with adaptive Gaussian proposal:
res = fit_model(dm, NoLimits.MCEM(
e_step = NoLimits.MCEM_IS(n_samples=200, proposal=:gaussian, adapt=true),
maxiters = 30,
))Constructor Options
The full set of constructor arguments is shown below. All arguments have defaults and are keyword-only.
using Optimization
using OptimizationOptimJL
using LineSearches
using Turing
method = NoLimits.MCEM(;
# E-step (new unified interface)
e_step=NoLimits.MCEM_MCMC(), # or MCEM_IS(...)
# Backward-compatible MCMC shorthand (passed to MCEM_MCMC internally)
# sampler=Turing.NUTS(0.75),
# turing_kwargs=NamedTuple(),
# sample_schedule=250,
# warm_start=true,
# M-step
optimizer=OptimizationOptimJL.LBFGS(linesearch=LineSearches.BackTracking()),
optim_kwargs=NamedTuple(),
adtype=Optimization.AutoForwardDiff(),
# EM convergence
verbose=false,
progress=true,
maxiters=100,
rtol_theta=1e-4,
atol_theta=1e-6,
rtol_Q=1e-4,
atol_Q=1e-6,
consecutive_params=3,
# Final EB estimation
ebe_optimizer=OptimizationOptimJL.LBFGS(linesearch=LineSearches.BackTracking()),
ebe_optim_kwargs=NamedTuple(),
ebe_adtype=Optimization.AutoForwardDiff(),
ebe_grad_tol=:auto,
ebe_multistart_n=50,
ebe_multistart_k=10,
ebe_multistart_max_rounds=5,
ebe_multistart_sampling=:lhs,
ebe_rescue_on_high_grad=true,
ebe_rescue_multistart_n=128,
ebe_rescue_multistart_k=32,
ebe_rescue_max_rounds=8,
ebe_rescue_grad_tol=:auto,
ebe_rescue_multistart_sampling=:lhs,
# Bounds
lb=nothing,
ub=nothing,
)Option Groups
| Group | Keywords | What they control |
|---|---|---|
| E-step | e_step | Sampling or IS strategy for the random effects. |
| M-step optimizer | optimizer, optim_kwargs, adtype | Optimization of fixed effects using Optimization.jl. |
| EM stopping | maxiters, rtol_theta, atol_theta, rtol_Q, atol_Q, consecutive_params | Iteration limit and convergence checks. |
| Logging | verbose, progress | Diagnostic output and progress bar. |
| Final EB estimation | ebe_* and ebe_rescue_* options | Post-fit empirical Bayes mode computation used by random-effects accessors and diagnostics. |
| Bounds | lb, ub | Optional transformed-scale bounds for free fixed effects in M-step optimization. |
Constructor Input Reference
M-step Optimization Inputs
optimizer— optimizer for fixed effects in each MCEM iteration. Default:LBFGSwith backtracking line search.optim_kwargs— keyword arguments forwarded toOptimization.solvefor the M-step optimizer.adtype— AD backend for the M-step objective.
EM Convergence Inputs
MCEM monitors both fixed-effect parameter stability and Q-function stability.
maxiters— maximum number of MCEM iterations.rtol_theta,atol_theta— relative and absolute tolerances for fixed-effect stabilization.rtol_Q,atol_Q— relative and absolute tolerances for Q-function stabilization.consecutive_params— number of consecutive iterations that must simultaneously satisfy both criteria before convergence is declared.verbose— enables iteration-level logging ofQ,dtheta,dQ, and sample count.progress— enables or disables the progress bar.
Final EB Mode Inputs
After the EM iterations complete, MCEM computes empirical Bayes (EB) modal estimates of the random effects used by downstream accessors.
ebe_optimizer,ebe_optim_kwargs,ebe_adtype— optimizer, solve arguments, and AD backend for EB mode optimization.ebe_grad_tol— gradient tolerance for EB optimization (:autoselects a data-adaptive value).ebe_multistart_n,ebe_multistart_k,ebe_multistart_max_rounds,ebe_multistart_sampling— multistart configuration for EB optimization.ebe_rescue_on_high_grad— enables a rescue multistart if the final EB gradient norm remains above threshold.ebe_rescue_multistart_n,ebe_rescue_multistart_k,ebe_rescue_max_rounds,ebe_rescue_grad_tol,ebe_rescue_multistart_sampling— configuration for the rescue strategy.
Bound Inputs
lb,ub— optional transformed-scale bounds for free fixed effects. Parameters held constant via theconstantsfit keyword are excluded automatically.
Diagnostics
The notes field of the fit result contains iteration-level diagnostics:
diag = get_notes(res).diagnostics
diag.Q_hist # Vector{Float64} — Q-function value per iteration
diag.ess_hist # Vector{Float64} — effective sample size per iteration
# IS iterations: ESS = 1/Σwₘ² ∈ [1, n_samples]
# MCMC iterations: NaNThe ESS is a per-iteration summary of proposal quality. Values close to n_samples indicate that the proposal closely approximates the posterior; low values indicate weight degeneracy (proposal too wide or misaligned).
using Plots
plot(diag.Q_hist, label="Q", xlabel="iteration")
plot(diag.ess_hist, label="ESS", xlabel="iteration")Backward Compatibility
The legacy MCMC-only keyword interface is fully preserved. Existing code that does not use e_step continues to work:
# Old API — still works
method = NoLimits.MCEM(
sampler = MH(),
turing_kwargs = (n_samples=50, n_adapt=0, progress=false),
maxiters = 20,
)When sampler or turing_kwargs are passed without an explicit e_step, they are forwarded to an internally constructed MCEM_MCMC.
Fit Keywords
res = fit_model(
dm,
method;
constants=(a=0.2,),
constants_re=(; eta=(; A=0.0,)),
penalty=NamedTuple(),
ode_args=(),
ode_kwargs=NamedTuple(),
serialization=EnsembleSerial(),
rng=Random.default_rng(),
theta_0_untransformed=nothing,
store_eb_modes=true,
)The constants_re argument allows specific random-effect levels to be fixed at known values while the remaining levels are estimated.
Optimization.jl Interface (M-step and EB)
MCEM uses Optimization.jl in two places:
- The M-step optimization over fixed effects at each iteration.
- The EB mode optimization (final or recomputed EB modes) through the
ebe_*configuration.
Tested M-step optimizers include:
OptimizationOptimJL.LBFGS(...)(default)OptimizationOptimisers.Adam(...)OptimizationBBO.BBO_adaptive_de_rand_1_bin_radiuslimited()
When using derivative-free optimizers such as those from OptimizationBBO, finite bounds must be supplied:
using OptimizationBBO
lb, ub = default_bounds_from_start(dm; margin=1.0)
method_bbo = NoLimits.MCEM(;
optimizer=OptimizationBBO.BBO_adaptive_de_rand_1_bin_radiuslimited(),
optim_kwargs=(iterations=20,),
e_step=NoLimits.MCEM_IS(n_samples=100, proposal=:gaussian),
maxiters=5,
lb=lb,
ub=ub,
)Accessing Results
After fitting, results are accessed through the standard accessor interface.
theta_u = NoLimits.get_params(res; scale=:untransformed)
obj = get_objective(res)
ok = get_converged(res)
re_df = get_random_effects(res)