@formulas
The @formulas block specifies the observation model that links latent quantities to measured data. It defines deterministic intermediate expressions and the statistical distributions assumed for each observed outcome. Within @Model, this block is required.
Core Syntax
Two statement forms are supported inside @formulas:
- Deterministic assignment: name = expression defines an intermediate quantity computed from model components.
- Observation definition: name ~ distribution_expression declares a likelihood contribution for a named outcome column.
    form = @formulas begin
        lin = a + eta^2 + x.Age
        y ~ Laplace(lin, sigma)
    end

Parsing and Validation Rules
The macro enforces the following constraints at model-construction time:
- The block must use begin ... end syntax.
- Only assignments (=) and observation statements (~) are permitted.
- The left-hand side of each statement must be a plain symbol.
- The reserved symbols t and ξ cannot appear on the left-hand side.
- Duplicate names within deterministic definitions, within observation definitions, or across both categories are rejected.
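The constraints above can be illustrated with a small stand-alone checker written against Julia's expression objects. This is a sketch under stated assumptions, not the macro's actual implementation: check_formulas, RESERVED, and the error messages are hypothetical names chosen for illustration.

```julia
# Hypothetical validator mirroring the listed rules; not NoLimits' own code.
const RESERVED = (:t, :ξ)

function check_formulas(block)
    block isa Expr && block.head == :block ||
        error("@formulas expects a begin ... end block")
    deterministic = Symbol[]
    observed = Symbol[]
    for stmt in block.args
        stmt isa LineNumberNode && continue      # skip source-location markers
        if stmt isa Expr && stmt.head == :(=)
            lhs, target = stmt.args[1], deterministic
        elseif stmt isa Expr && stmt.head == :call && stmt.args[1] == :~
            lhs, target = stmt.args[2], observed
        else
            error("only `=` and `~` statements are permitted, got: $stmt")
        end
        lhs isa Symbol || error("left-hand side must be a plain symbol, got: $lhs")
        lhs in RESERVED && error("reserved symbol on the left-hand side: $lhs")
        (lhs in deterministic || lhs in observed) && error("duplicate name: $lhs")
        push!(target, lhs)
    end
    return (deterministic = deterministic, observed = observed)
end

# A block that satisfies every rule.
ok = check_formulas(quote
    lin = a + eta^2
    y ~ Laplace(lin, sigma)
end)
```

Passing a block that assigns to t, repeats a name, or uses a call such as f(x) on the left-hand side makes this sketch raise an error, which is the behaviour the rules describe at model-construction time.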
Symbol Resolution and Namespace Rules
Expressions inside @formulas can reference symbols drawn from several model namespaces:
- Fixed effects
- Random effects
- PreDE outputs
- Constant covariates
- Varying covariates
- Helper functions (@helpers)
- Model functions from learned parameter blocks (e.g., neural networks, soft trees)
If the same symbol name appears in more than one of the fixed-effect, random-effect, PreDE-output, and covariate namespaces, the macro raises an ambiguity error rather than resolving the name silently.
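As a rough illustration of the ambiguity rule, the following sketch reports every name that is declared in more than one namespace. The function ambiguous_symbols and the namespace labels are hypothetical; how the macro stores its namespaces internally is not described here.

```julia
# Hypothetical helper: map each clashing name to the namespaces that declare it.
function ambiguous_symbols(namespaces)
    owners = Dict{Symbol,Vector{Symbol}}()
    for (space, names) in namespaces, name in names
        push!(get!(owners, name, Symbol[]), space)
    end
    return Dict(name => sort(spaces) for (name, spaces) in owners if length(spaces) > 1)
end

# Illustrative declarations; :a is deliberately declared twice.
namespaces = Dict(
    :fixed_effects  => [:a, :sigma],
    :random_effects => [:eta],
    :covariates     => [:t, :x, :a],
)
clashes = ambiguous_symbols(namespaces)
```

A non-empty result corresponds to the situation in which the macro would refuse to build the model.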
State and Signal Access from ODE Models
When the model includes a differential equation system, ODE states and derived signals can be referenced in formulas subject to the following rules:
- States and signals must be called with an explicit time argument, e.g., x1(t) or s(t).
- Bare references without a time argument (e.g., x1, s) are rejected.
- The set of required states and signals is inferred automatically and stored as required_states and required_signals.
At runtime, formulas that depend on ODE outputs receive solution accessor functions generated by get_de_accessors_builder(...).
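The inference of required states and signals can be pictured as a walk over the formula expression. The sketch below is an assumption about how such a scan could work, not the library's implementation; scan! and called_names are hypothetical names.

```julia
# Hypothetical scan: collect states/signals used as calls, reject bare references.
function scan!(found::Set{Symbol}, ex, names)
    if ex isa Symbol
        ex in names && error("`$ex` must be called with a time argument, e.g. $ex(t)")
    elseif ex isa Expr
        if ex.head == :call && ex.args[1] isa Symbol && ex.args[1] in names
            push!(found, ex.args[1])                       # x1(...) counts as a use
            foreach(a -> scan!(found, a, names), ex.args[2:end])
        else
            foreach(a -> scan!(found, a, names), ex.args)  # recurse into sub-expressions
        end
    end
    return found
end

called_names(ex, names) = scan!(Set{Symbol}(), ex, names)

states, signals = (:x1,), (:s,)
used = called_names(:(x1(t) + s(t) + x1(t + 0.25)), (states..., signals...))
required_states = filter(in(used), collect(states))
required_signals = filter(in(used), collect(signals))
```

With this scan, an expression such as x1 + 1 triggers the error branch because x1 appears as a bare symbol rather than as a call.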
Time Offsets in State and Signal Calls
Formulas support constant time offsets when accessing ODE states and signals, enabling evaluation at shifted time points:
- x1(t + 0.25)
- x1(t - 0.5)
- x1(t + (1/4))
These offsets are exposed through get_formulas_time_offsets(...) and handled automatically during DataModel construction:
- Constant offsets extend the integration window and save grid as needed.
- Negative offsets that would require evaluation before the start of an individual's trajectory are rejected.
- Non-constant offsets (e.g., expressions involving covariates) require dense ODE solving mode.
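The first two rules above can be sketched in plain Julia. This is a minimal illustration that assumes the start of a trajectory is the individual's first observation time; extended_grid and the field names tspan and saveat are hypothetical and do not describe the DataModel internals.

```julia
# Hypothetical helper: combine observation times and constant offsets
# into an integration window and a save grid.
function extended_grid(obs_times, offsets)
    t0 = minimum(obs_times)                       # assumed trajectory start
    eval_times = [t + o for t in obs_times for o in offsets]
    any(<(t0), eval_times) &&
        error("an offset requires evaluation before the trajectory start t0 = $t0")
    grid = sort(unique(vcat(obs_times, eval_times)))
    return (tspan = (t0, last(grid)), saveat = grid)
end

# Observations at 0, 1, 2 with the offset from x1(t + 0.25).
plan = extended_grid([0.0, 1.0, 2.0], [0.25])
```

Here the window grows from (0.0, 2.0) to (0.0, 2.25) and the shifted time points join the save grid, while an offset of -0.5 with the same observation times would be rejected.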
Example: Multiple Deterministic Nodes and Outcomes
The following model illustrates the use of multiple intermediate deterministic expressions feeding into two distinct outcome distributions.
    using NoLimits
    using Distributions

    model = @Model begin
        @fixedEffects begin
            a = RealNumber(0.2)
            b = RealNumber(0.1)
            s1 = RealNumber(0.3, scale=:log)
            s2 = RealNumber(0.4, scale=:log)
        end
        @covariates begin
            t = Covariate()
            x = ConstantCovariateVector([:Age, :BMI]; constant_on=:ID)
        end
        @randomEffects begin
            eta = RandomEffect(TDist(6.0); column=:ID)
        end
        @formulas begin
            d1 = a + 0.01 * x.Age^2 + tanh(eta)
            d2 = d1 + b * log1p(x.BMI^2) + eta^2
            y1 ~ LogNormal(d2, s1)
            y2 ~ Gamma(d1^2 + abs(eta) + 1e-6, s2)
        end
    end

Example: Helper Functions and Learned Model Functions
User-defined helpers and learned parameter blocks (such as neural networks) can be called directly within formula expressions.
    using NoLimits
    using Distributions
    using Lux

    chain = Chain(Dense(2, 4, tanh), Dense(4, 1))

    model = @Model begin
        @helpers begin
            softplus(u) = log1p(exp(u))
        end
        @fixedEffects begin
            sigma = RealNumber(0.4, scale=:log)
            z = NNParameters(chain; function_name=:NN1, calculate_se=false)
        end
        @covariates begin
            t = Covariate()
            x = ConstantCovariateVector([:Age, :BMI]; constant_on=:ID)
        end
        @randomEffects begin
            eta = RandomEffect(SkewNormal(0.0, 1.0, 0.8); column=:ID)
        end
        @formulas begin
            lin = NN1([x.Age, x.BMI], z)[1] + softplus(eta^2)
            y ~ Gamma(lin + 1e-6, sigma)
        end
    end

Example: ODE State and Signal Access
When a model includes @DifferentialEquation, states and derived signals are accessed in formulas using explicit time arguments. Time offsets such as x1(t + 0.25) enable evaluation at shifted time points.
    using NoLimits
    using Distributions

    model = @Model begin
        @fixedEffects begin
            a = RealNumber(0.2)
            sigma = RealNumber(0.3, scale=:log)
        end
        @covariates begin
            t = Covariate()
        end
        @DifferentialEquation begin
            s(t) = sin(t)
            D(x1) ~ -a * x1 + s(t)
        end
        @initialDE begin
            x1 = 1.0
        end
        @formulas begin
            mu = x1(t) + s(t) + x1(t + 0.25)
            y ~ Exponential(log1p(abs(mu)) + 1e-6)
        end
    end

Example: Hidden Markov Observation Model
For data generated by regime-switching processes, @formulas supports hidden Markov model likelihoods via ContinuousTimeDiscreteStatesHMM. The ContinuousTransitionMatrix parameter type provides an AD-compatible rate matrix in which off-diagonal entries are constrained to be non-negative and each diagonal entry is derived as minus the sum of the off-diagonal entries in its row.
    using NoLimits
    using Distributions

    model = @Model begin
        @fixedEffects begin
            Q = ContinuousTransitionMatrix([-0.2 0.2; 0.3 -0.3])
            p1 = RealNumber(0.25)
            p2 = RealNumber(0.75)
        end
        @covariates begin
            t = Covariate()
            delta_t = Covariate()
        end
        @formulas begin
            outcome ~ ContinuousTimeDiscreteStatesHMM(
                Q,
                (Bernoulli(p1), Bernoulli(p2)),
                Categorical([0.6, 0.4]),
                delta_t
            )
        end
    end

Example: Multivariate Hidden Markov Observation Model
When each observation consists of several outcome variables that share the same hidden state, use MVDiscreteTimeDiscreteStatesHMM or MVContinuousTimeDiscreteStatesHMM. The observable column in the DataFrame should contain Vector{<:Real} values, one vector per observation time.
Two emission modes are supported:
- Conditionally independent: the emission for state
kis aTupleof M scalar distributions, one per outcome. Missing entries in the observation vector are skipped (contribute zero log-likelihood). - Joint
MvNormal: the emission for statekis a singleMvNormal. Partial missings are handled by marginalising analytically over the observed indices.
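The two missing-data rules can be written out directly with Distributions. The sketch below is illustrative only: emission_logpdf is a hypothetical name, and returning zero for a fully missing vector in the joint case is an assumption. It relies on the fact that a marginal of a multivariate normal is obtained by subsetting the mean vector and covariance matrix.

```julia
using Distributions

# Conditionally independent mode: one scalar distribution per outcome,
# missing entries contribute nothing to the sum.
function emission_logpdf(dists::Tuple, y)
    total = 0.0
    for (d, yi) in zip(dists, y)
        ismissing(yi) || (total += logpdf(d, yi))
    end
    return total
end

# Joint mode: marginalise the MvNormal over the observed indices.
function emission_logpdf(d::MvNormal, y)
    obs = findall(!ismissing, y)
    isempty(obs) && return 0.0            # assumption: nothing observed, no contribution
    μ, Σ = mean(d), Matrix(cov(d))
    return logpdf(MvNormal(μ[obs], Σ[obs, obs]), Float64[y[i] for i in obs])
end

independent = emission_logpdf((Normal(1.0, 0.5), Normal(0.0, 1.0)), [1.2, missing])
joint = emission_logpdf(MvNormal([0.0, 2.0], [1.0 0.5; 0.5 2.0]), [missing, 1.0])
```

In the second call only the second outcome is observed, so the result equals the log-density of a univariate normal with mean 2.0 and variance 2.0 evaluated at 1.0.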
Discrete time, conditionally independent emissions
The transition matrix is row-stochastic (each row sums to one). The outer Tuple contains one inner Tuple of scalar distributions per state. DiscreteTransitionMatrix handles the row-stochastic constraint automatically via the logistic stick-breaking transform.
    using NoLimits
    using Distributions

    model = @Model begin
        @fixedEffects begin
            P = DiscreteTransitionMatrix([0.7 0.3; 0.2 0.8])
            mu1 = RealNumber(1.0)
            mu2 = RealNumber(3.0)
            sigma = RealNumber(0.5, scale=:log)
        end
        @covariates begin
            t = Covariate()
        end
        @formulas begin
            outcome ~ MVDiscreteTimeDiscreteStatesHMM(
                P,
                (
                    (Normal(mu1, sigma), Bernoulli(0.2)),  # state 1: (continuous, binary)
                    (Normal(mu2, sigma), Bernoulli(0.8)),  # state 2: (continuous, binary)
                ),
                Categorical([0.5, 0.5])
            )
        end
    end

The outcome column in the DataFrame must hold two-element vectors (one Float64 and one 0.0/1.0) matching the M=2 outcomes declared by the emission tuples.
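For intuition about the row-stochastic constraint used in this section, the sketch below shows one common form of logistic stick-breaking, which maps K-1 unconstrained reals to a probability vector of length K. It is an assumption that this matches the general idea behind DiscreteTransitionMatrix; the exact parameterisation used by the library may differ, and stick_breaking is a hypothetical name.

```julia
logistic(x) = 1 / (1 + exp(-x))

# Map K-1 unconstrained reals to K probabilities that sum to one.
function stick_breaking(θ::AbstractVector{<:Real})
    p = zeros(float(eltype(θ)), length(θ) + 1)
    remaining = one(eltype(p))
    for (k, θk) in enumerate(θ)
        p[k] = remaining * logistic(θk)   # take a fraction of the stick that is left
        remaining -= p[k]
    end
    p[end] = remaining                    # the last state receives the remainder
    return p
end

# One unconstrained vector per row of a 2-state transition matrix.
rows = [stick_breaking([0.8]), stick_breaking([-1.4])]
P = permutedims(hcat(rows...))
```

Every row of P is strictly positive and sums to one for any real inputs, so an optimiser can move freely in the unconstrained space without leaving the set of valid transition matrices.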
Continuous time, joint MvNormal emissions
The transition matrix is a rate matrix (generator): off-diagonal entries ≥ 0, each row sums to zero. State propagation uses the matrix exponential exp(Q · Δt). Joint MvNormal emissions enable full cross-outcome correlation within each state. ContinuousTransitionMatrix enforces all rate-matrix constraints and exposes the off-diagonal rates as log-transformed free parameters.
    using NoLimits
    using Distributions
    using LinearAlgebra

    model = @Model begin
        @fixedEffects begin
            Q = ContinuousTransitionMatrix([-0.2 0.2; 0.3 -0.3])
            mu1 = RealNumber(0.0)
            mu2 = RealNumber(2.0)
        end
        @covariates begin
            t = Covariate()
            delta_t = Covariate()
        end
        @formulas begin
            outcome ~ MVContinuousTimeDiscreteStatesHMM(
                Q,
                (
                    MvNormal([mu1, mu1], I(2)),
                    MvNormal([mu2, mu2], I(2)),
                ),
                Categorical([0.5, 0.5]),
                delta_t
            )
        end
    end

Here delta_t is a time-varying covariate holding the elapsed time since the previous observation for each row. The outcome column must hold two-element vectors matching the dimension of the MvNormal emissions. Partially-missing vectors (e.g. [1.2, missing]) are handled by marginalising the MvNormal over the observed index.
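To make the role of the rate matrix and of delta_t concrete, the following sketch builds a generator from log-transformed off-diagonal rates and propagates it over one time gap with the matrix exponential. The function rate_matrix and its row-major ordering of the free parameters are hypothetical; only the constraints (non-negative off-diagonal rates, rows summing to zero) and the propagation rule exp(Q · Δt) come from the text above.

```julia
using LinearAlgebra

# Hypothetical construction of a K-state generator from log off-diagonal rates,
# stored row by row with the diagonal skipped.
function rate_matrix(log_rates::AbstractVector{<:Real}, K::Integer)
    length(log_rates) == K * (K - 1) || error("expected $(K * (K - 1)) off-diagonal rates")
    Q = zeros(K, K)
    idx = 0
    for i in 1:K, j in 1:K
        i == j && continue
        idx += 1
        Q[i, j] = exp(log_rates[idx])     # exp keeps every off-diagonal rate positive
    end
    for i in 1:K
        Q[i, i] = -sum(Q[i, :])           # diagonal is still zero here, so this is minus the off-diagonal sum
    end
    return Q
end

Q = rate_matrix(log.([0.2, 0.3]), 2)      # same rates as the example matrix
P = exp(Q * 0.5)                          # transition probabilities over Δt = 0.5
```

Each row of P sums to one, and for a two-state chain the entry P[1, 2] agrees with the closed form 0.4 * (1 - exp(-0.25)) implied by rates 0.2 and 0.3.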
Related APIs
The following functions provide programmatic access to the internal representation and evaluation of formulas:
- get_formulas_meta(formulas)
- get_formulas_ir(formulas)
- get_formulas_builders(formulas; ...)
- get_formulas_all(formulas, ctx, sol_accessors, constant_covariates_i, varying_covariates; ...)
- get_formulas_obs(formulas, ctx, sol_accessors, constant_covariates_i, varying_covariates; ...)
- get_formulas_time_offsets(formulas, state_names, signal_names)