`@formulas`

The @formulas block specifies the observation model that links latent quantities to measured data. It defines deterministic intermediate expressions and the statistical distributions assumed for each observed outcome. Within @Model, this block is required.

Core Syntax

Two statement forms are supported inside @formulas:

Deterministic assignment: name = expression defines an intermediate quantity computed from model components.
Observation definition: name ~ distribution_expression declares a likelihood contribution for a named outcome column.

form = @formulas begin
    lin = a + eta^2 + x.Age
    y ~ Laplace(lin, sigma)
end

Parsing and Validation Rules

The macro enforces the following constraints at model-construction time:

The block must use begin ... end syntax.
Only assignments (=) and observation statements (~) are permitted.
The left-hand side of each statement must be a plain symbol.
The reserved symbols t and ξ cannot appear on the left-hand side.
Duplicate names within deterministic definitions, within observation definitions, or across both categories are rejected.

Symbol Resolution and Namespace Rules

Expressions inside @formulas can reference symbols drawn from several model namespaces:

Fixed effects
Random effects
PreDE outputs
Constant covariates
Varying covariates
Helper functions (@helpers)
Model functions from learned parameter blocks (e.g., neural networks, soft trees)

If a symbol name is shared across more than one namespace among fixed effects, random effects, preDE outputs, and covariates, the macro raises an ambiguity error to prevent silent resolution conflicts.

State and Signal Access from ODE Models

When the model includes a differential equation system, ODE states and derived signals can be referenced in formulas subject to the following rules:

States and signals must be called with an explicit time argument, e.g., x1(t) or s(t).
Bare references without a time argument (e.g., x1, s) are rejected.
The set of required states and signals is inferred automatically and stored as required_states and required_signals.

At runtime, formulas that depend on ODE outputs receive solution accessor functions generated by get_de_accessors_builder(...).

Time Offsets in State and Signal Calls

Formulas support constant time offsets when accessing ODE states and signals, enabling evaluation at shifted time points:

x1(t + 0.25)
x1(t - 0.5)
x1(t + (1/4))

These offsets are exposed through get_formulas_time_offsets(...) and handled automatically during DataModel construction:

Constant offsets extend the integration window and save grid as needed.
Negative offsets that would require evaluation before the start of an individual's trajectory are rejected.
Non-constant offsets (e.g., expressions involving covariates) require dense ODE solving mode.

Example: Multiple Deterministic Nodes and Outcomes

The following model illustrates the use of multiple intermediate deterministic expressions feeding into two distinct outcome distributions.

using NoLimits
using Distributions

model = @Model begin
    @fixedEffects begin
        a = RealNumber(0.2)
        b = RealNumber(0.1)
        s1 = RealNumber(0.3, scale=:log)
        s2 = RealNumber(0.4, scale=:log)
    end

    @covariates begin
        t = Covariate()
        x = ConstantCovariateVector([:Age, :BMI]; constant_on=:ID)
    end

    @randomEffects begin
        eta = RandomEffect(TDist(6.0); column=:ID)
    end

    @formulas begin
        d1 = a + 0.01 * x.Age^2 + tanh(eta)
        d2 = d1 + b * log1p(x.BMI^2) + eta^2
        y1 ~ LogNormal(d2, s1)
        y2 ~ Gamma(d1^2 + abs(eta) + 1e-6, s2)
    end
end

Example: Helper Functions and Learned Model Functions

User-defined helpers and learned parameter blocks (such as neural networks) can be called directly within formula expressions.

using NoLimits
using Distributions
using Lux

chain = Chain(Dense(2, 4, tanh), Dense(4, 1))

model = @Model begin
    @helpers begin
        softplus(u) = log1p(exp(u))
    end

    @fixedEffects begin
        sigma = RealNumber(0.4, scale=:log)
        z = NNParameters(chain; function_name=:NN1, calculate_se=false)
    end

    @covariates begin
        t = Covariate()
        x = ConstantCovariateVector([:Age, :BMI]; constant_on=:ID)
    end

    @randomEffects begin
        eta = RandomEffect(SkewNormal(0.0, 1.0, 0.8); column=:ID)
    end

    @formulas begin
        lin = NN1([x.Age, x.BMI], z)[1] + softplus(eta^2)
        y ~ Gamma(lin + 1e-6, sigma)
    end
end

Example: ODE State and Signal Access

When a model includes @DifferentialEquation, states and derived signals are accessed in formulas using explicit time arguments. Time offsets such as x1(t + 0.25) enable evaluation at shifted time points.

using NoLimits
using Distributions

model = @Model begin
    @fixedEffects begin
        a = RealNumber(0.2)
        sigma = RealNumber(0.3, scale=:log)
    end

    @covariates begin
        t = Covariate()
    end

    @DifferentialEquation begin
        s(t) = sin(t)
        D(x1) ~ -a * x1 + s(t)
    end

    @initialDE begin
        x1 = 1.0
    end

    @formulas begin
        mu = x1(t) + s(t) + x1(t + 0.25)
        y ~ Exponential(log1p(abs(mu)) + 1e-6)
    end
end

Example: Hidden Markov Observation Model

For data generated by regime-switching processes, @formulas supports hidden Markov model likelihoods via ContinuousTimeDiscreteStatesHMM. The ContinuousTransitionMatrix parameter type provides an AD-compatible rate matrix where off-diagonal entries are constrained to be non-negative and the diagonal is always derived as minus the row sum.

using NoLimits
using Distributions

model = @Model begin
    @fixedEffects begin
        Q  = ContinuousTransitionMatrix([-0.2  0.2; 0.3  -0.3])
        p1 = RealNumber(0.25)
        p2 = RealNumber(0.75)
    end

    @covariates begin
        t = Covariate()
        delta_t = Covariate()
    end

    @formulas begin
        outcome ~ ContinuousTimeDiscreteStatesHMM(
            Q,
            (Bernoulli(p1), Bernoulli(p2)),
            Categorical([0.6, 0.4]),
            delta_t
        )
    end
end

Example: Multivariate Hidden Markov Observation Model

When each observation consists of several outcome variables that share the same hidden state, use MVDiscreteTimeDiscreteStatesHMM or MVContinuousTimeDiscreteStatesHMM. The observable column in the DataFrame should contain Vector{<:Real} values, one vector per observation time.

Two emission modes are supported:

Conditionally independent: the emission for state k is a Tuple of M scalar distributions, one per outcome. Missing entries in the observation vector are skipped (contribute zero log-likelihood).
Joint MvNormal: the emission for state k is a single MvNormal. Partial missings are handled by marginalising analytically over the observed indices.

Discrete time, conditionally independent emissions

The transition matrix is row-stochastic (each row sums to one). The outer Tuple contains one inner Tuple of scalar distributions per state. DiscreteTransitionMatrix handles the row-stochastic constraint automatically via the logistic stick-breaking transform.

using NoLimits
using Distributions

model = @Model begin
    @fixedEffects begin
        P     = DiscreteTransitionMatrix([0.7 0.3; 0.2 0.8])
        mu1   = RealNumber(1.0)
        mu2   = RealNumber(3.0)
        sigma = RealNumber(0.5, scale=:log)
    end

    @covariates begin
        t = Covariate()
    end

    @formulas begin
        outcome ~ MVDiscreteTimeDiscreteStatesHMM(
            P,
            (
                (Normal(mu1, sigma), Bernoulli(0.2)),  # state 1: (continuous, binary)
                (Normal(mu2, sigma), Bernoulli(0.8)),  # state 2: (continuous, binary)
            ),
            Categorical([0.5, 0.5])
        )
    end
end

The outcome column in the DataFrame must hold two-element vectors (one Float64 and one 0.0/1.0) matching the M=2 outcomes declared by the emission tuples.

Continuous time, joint MvNormal emissions

The transition matrix is a rate matrix (generator): off-diagonal entries ≥ 0, each row sums to zero. State propagation uses the matrix exponential exp(Q · Δt). Joint MvNormal emissions enable full cross-outcome correlation within each state. ContinuousTransitionMatrix enforces all rate-matrix constraints and exposes the off-diagonal rates as log-transformed free parameters.

using NoLimits
using Distributions
using LinearAlgebra

model = @Model begin
    @fixedEffects begin
        Q   = ContinuousTransitionMatrix([-0.2  0.2; 0.3  -0.3])
        mu1 = RealNumber(0.0)
        mu2 = RealNumber(2.0)
    end

    @covariates begin
        t       = Covariate()
        delta_t = Covariate()
    end

    @formulas begin
        outcome ~ MVContinuousTimeDiscreteStatesHMM(
            Q,
            (
                MvNormal([mu1, mu1], I(2)),
                MvNormal([mu2, mu2], I(2)),
            ),
            Categorical([0.5, 0.5]),
            delta_t
        )
    end
end

Here delta_t is a time-varying covariate holding the elapsed time since the previous observation for each row. The outcome column must hold two-element vectors matching the dimension of the MvNormal emissions. Partially-missing vectors (e.g. [1.2, missing]) are handled by marginalising the MvNormal over the observed index.

The following functions provide programmatic access to the internal representation and evaluation of formulas:

get_formulas_meta(formulas)
get_formulas_ir(formulas)
get_formulas_builders(formulas; ...)
get_formulas_all(formulas, ctx, sol_accessors, constant_covariates_i, varying_covariates; ...)
get_formulas_obs(formulas, ctx, sol_accessors, constant_covariates_i, varying_covariates; ...)
get_formulas_time_offsets(formulas, state_names, signal_names)