ALM

class greybox.alm.ALM(distribution='dnorm', loss='likelihood', occurrence='none', scale_formula=None, orders=(0, 0, 0), alpha=None, shape=None, lambda_bc=None, size=None, nu=None, trim=0.0, lambda_l1=None, lambda_l2=None, nlopt_kargs=None, verbose=0)[source]

Bases: object

Augmented Linear Model estimator.

This estimator fits a linear model with various distributions and loss functions, following scikit-learn principles.

Parameters:
  • distribution (str, default="dnorm") – Distribution name. Options: “dnorm”, “dlaplace”, “ds”, “dgnorm”, “dlogis”, “dt”, “dalaplace”, “dlnorm”, “dllaplace”, “dls”, “dlgnorm”, “dbcnorm”, “dfnorm”, “drectnorm”, “dinvgauss”, “dgamma”, “dexp”, “dchisq”, “dgeom”, “dpois”, “dnbinom”, “dbinom”, “dlogitnorm”, “dbeta”, “plogis”, “pnorm”.

  • loss (str, default="likelihood") – Loss function. Options: “likelihood”, “MSE”, “MAE”, “HAM”, “LASSO”, “RIDGE”, “ROLE”.

  • occurrence (str, default="none") – Occurrence model for zero-inflated data. Options: “none”, “plogis”, “pnorm”.

  • scale_formula (array-like or None, default=None) – Formula for scale parameter. If None, scale is constant.

  • orders (tuple, default=(0, 0, 0)) –

    ARIMA orders (p, d, q). Three integers: AR order (p), differencing order (d), and MA order (q). Only AR(p) and differencing (d) are supported. MA(q) raises NotImplementedError.

    • p (AR order): Number of lagged response variables to include.

    • d (Differencing): Order of differencing. Creates AR(p+d) terms internally but only uses first p in the model.

    • q (MA order): Not implemented.

    Examples:
    • orders=(1, 0, 0): AR(1) model

    • orders=(2, 0, 0): AR(2) model

    • orders=(1, 1, 0): ARIMA(1,1,0) with differencing

  • alpha (float, optional) – Additional parameter for Asymmetric Laplace distribution.

  • shape (float, optional) – Shape parameter for Generalized Normal distribution.

  • lambda_bc (float, optional) – Box-Cox lambda parameter for Box-Cox Normal distribution.

  • size (float, optional) – Size parameter for Negative Binomial/Binomial distributions.

  • nu (float, optional) – Degrees of freedom for Student’s t or Chi-squared distributions.

  • trim (float, default=0.0) – Trim proportion for ROLE loss.

  • lambda_l1 (float, optional) – L1 regularization parameter for LASSO.

  • lambda_l2 (float, optional) – L2 regularization parameter for RIDGE.

  • nlopt_kargs (dict, optional) – Dictionary of nlopt parameters. Options:
    • “algorithm”: str, default=”NLOPT_LN_NELDERMEAD”

    • “maxeval”: int, default=40 per parameter

    • “maxtime”: float, default=600 seconds

    • “xtol_rel”: float, default=1e-6

    • “xtol_abs”: float, default=1e-8

    • “ftol_rel”: float, default=1e-4

    • “ftol_abs”: float, default=0

    • “print_level”: int, default=0 (0=none, 3=full)

  • verbose (int, default=0) – Verbosity level.
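As an illustration of the AR terms implied by orders, here is a small NumPy sketch (not the estimator's internal code) of how an AR(p) specification augments the design matrix with lagged copies of the response:

```python
import numpy as np

def add_ar_lags(y, p):
    """Return a matrix whose columns are y lagged by 1..p steps (sketch)."""
    # np.roll wraps around, so drop the first p rows afterwards
    lags = np.column_stack([np.roll(y, k) for k in range(1, p + 1)])
    return lags[p:]

y = np.array([1.0, 2.0, 3.0, 5.0, 8.0])
lagged = add_ar_lags(y, 2)  # row t holds (y[t-1], y[t-2])
```

For orders=(p, d, 0), differencing is applied first, which is why AR(p+d) terms exist internally.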

coef_

Estimated coefficients (excluding intercept).

Type:

ndarray of shape (n_features,)

intercept_

Estimated intercept.

Type:

float

scale_

Estimated scale parameter.

Type:

float

other_

Other estimated parameters (alpha, shape, etc.).

Type:

dict

fitted_values_

Fitted values.

Type:

ndarray of shape (n_samples,)

residuals_

Model residuals.

Type:

ndarray of shape (n_samples,)

loss_value_

Final value of the loss function.

Type:

float

log_lik_

Log-likelihood (only for likelihood-based losses).

Type:

float or None

aic_

Akaike Information Criterion.

Type:

float or None

bic_

Bayesian Information Criterion.

Type:

float or None

n_iter_

Number of optimization iterations.

Type:

int

Examples

>>> from greybox.formula import formula
>>> from greybox.alm import ALM
>>> data = {'y': [1, 2, 3, 4, 5], 'x1': [1, 2, 3, 4, 5], 'x2': [2, 3, 4, 5, 6]}
>>> y, X = formula("y ~ x1 + x2", data)
>>> model = ALM(distribution="dnorm", loss="likelihood")
>>> model.fit(X, y)
>>> print(model.coef_)
>>> # Using nlopt with custom parameters (like R)
>>> model = ALM(
...     distribution="dnorm",
...     loss="likelihood",
...     nlopt_kargs={
...         "algorithm": "NLOPT_LN_SBPLX",
...         "maxeval": 1000,
...         "maxtime": 600,
...         "xtol_rel": 1e-8,
...         "print_level": 1
...     }
... )
>>> model.fit(X, y)
DISTRIBUTIONS = ['dnorm', 'dlaplace', 'ds', 'dgnorm', 'dlogis', 'dt', 'dalaplace', 'dlnorm', 'dllaplace', 'dls', 'dlgnorm', 'dbcnorm', 'dinvgauss', 'dgamma', 'dexp', 'dchisq', 'dfnorm', 'drectnorm', 'dpois', 'dnbinom', 'dbinom', 'dgeom', 'dbeta', 'dlogitnorm', 'plogis', 'pnorm']
LOSS_FUNCTIONS = ['likelihood', 'MSE', 'MAE', 'HAM', 'LASSO', 'RIDGE', 'ROLE']
property actuals: ndarray

Actual values (response variable).

Returns:

actuals – Actual response values from training data.

Return type:

np.ndarray

property aic: float | None

Akaike Information Criterion.

property aicc: float | None

Corrected Akaike Information Criterion.

property bic: float | None

Bayesian Information Criterion.

property bicc: float | None

Corrected Bayesian Information Criterion.

property coef: ndarray

Estimated coefficients (slope parameters, excluding intercept).

Returns:

coef – Coefficient vector (without intercept).

Return type:

np.ndarray

property coefficients: ndarray

All coefficients including intercept as named vector.

Returns:

coefficients – Full coefficient vector with names (intercept + slopes).

Return type:

np.ndarray

confint(parm: int | list[int] | None = None, level: float = 0.95) ndarray[source]

Confidence intervals for parameters.

Parameters:
  • parm (int or list of int, optional) – Which parameters to include. If None, all parameters are included. 0 = intercept, 1, 2, … = coefficients.

  • level (float, optional) – Confidence level. Default is 0.95.

Returns:

confint – Array with shape (n_params, 2) containing lower and upper bounds.

Return type:

np.ndarray
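The intervals follow the usual Wald construction (estimate ± quantile × standard error). A minimal standard-library sketch, assuming normal quantiles (the estimator may use Student-t quantiles depending on the distribution):

```python
from statistics import NormalDist

def wald_confint(estimate, std_error, level=0.95):
    """Two-sided Wald interval: estimate +/- z * SE (normal-quantile sketch)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * std_error, estimate + z * std_error

lo, hi = wald_confint(1.2, 0.3, level=0.95)
```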

property data: ndarray

Alias for actuals.

Returns:

data – Original in-sample observations.

Return type:

np.ndarray

property df_residual_: int | None

Residual degrees of freedom.

property distribution_: str

Distribution name (ADAM convention with trailing _).

fit(X, y, formula=None, feature_names=None)[source]

Fit the ALM model.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Design matrix. Must include an intercept column (a column of ones) as the first column if an intercept is desired.

  • y (array-like of shape (n_samples,)) – Target values.

  • formula (str, optional) – Formula string used to generate X and y. Stored for reference.

  • feature_names (list of str, optional) – Names for feature columns. If provided, used in print output.

Returns:

self – Fitted estimator.

Return type:

ALM
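When the design matrix is built by hand rather than via greybox.formula, the intercept column must be added explicitly. A minimal NumPy sketch:

```python
import numpy as np

# Two raw features; prepend a column of ones so the model fits an intercept
X_raw = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]])
X = np.column_stack([np.ones(len(X_raw)), X_raw])
```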

property fitted: ndarray

Fitted values.

Returns:

fitted – Fitted values (predictions on training data).

Return type:

np.ndarray

property formula: str | None

Formula string used to fit the model.

Returns:

formula – Formula string if provided during fit.

Return type:

str or None

get_params()[source]

Get model parameters.

Returns:

params – Dictionary of model parameters.

Return type:

dict

property log_lik: float | None

Log-likelihood (backward-compatible alias for loglik).

property loglik: float | None

Log-likelihood (ADAM-compatible name).

Returns:

loglik – Log-likelihood value.

Return type:

float or None

property loss_: str

Loss function name (ADAM convention with trailing _).

property loss_value: float | None

Final value of the loss function.

multipliers(parm: str, h: int = 10) dict[source]

Compute dynamic multipliers for an ARDL model.

Combines distributed lag coefficients (B(parm, k) terms) with the ARI polynomial to produce impulse-response multipliers over horizon h.

Parameters:
  • parm (str) – Variable name as it appears in the design matrix.

  • h (int, default 10) – Forecast horizon.

Returns:

Dictionary of dynamic multipliers keyed by horizon: {“h1”: m1, “h2”: m2, …, “hh”: mh}.

Return type:

dict

Raises:

ValueError – If parm is not found in the model.
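The impulse-response recursion behind dynamic multipliers can be sketched in plain Python (a hypothetical illustration, not the estimator's internal code): with distributed-lag coefficients b and AR coefficients phi, each multiplier is m_j = b_j + Σ_i phi_i · m_{j-i}.

```python
def dynamic_multipliers(b, phi, h):
    """Dynamic multipliers over horizon h for an ARDL model (sketch)."""
    m = []
    for j in range(h):
        bj = b[j] if j < len(b) else 0.0              # distributed-lag term
        ar = sum(phi[i] * m[j - 1 - i]                # AR feedback
                 for i in range(min(j, len(phi))))
        m.append(bj + ar)
    return {f"h{j + 1}": m[j] for j in range(h)}

mult = dynamic_multipliers(b=[0.5, 0.2], phi=[0.8], h=4)
```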

property n_param: dict

Parameter count information.

Returns:

n_param – Dictionary containing parameter count information.

Return type:

dict

property nobs: int

Number of observations.

Returns:

nobs – Number of observations used in the model.

Return type:

int

property nparam: int

Number of parameters.

Returns:

nparam – Number of parameters in the model (including intercept and scale).

Return type:

int

predict(X: ndarray, interval: Literal['none', 'confidence', 'prediction'] = 'none', level: float | list[float] = 0.95, side: Literal['both', 'upper', 'lower'] = 'both') PredictionResult[source]

Predict using the fitted model.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Design matrix. Should have same number of features as training data.

  • interval ({"none", "confidence", "prediction"}, default="none") – Type of interval to calculate:
    • “none”: No intervals, return only point forecasts

    • “confidence”: Confidence interval for the mean

    • “prediction”: Prediction interval for new observations

  • level (float or list of float, default=0.95) – Confidence level(s) for intervals. Can be a single float (e.g., 0.95) or a list of floats (e.g., [0.8, 0.9, 0.95]). Default is 0.95 (95%).

  • side ({"both", "upper", "lower"}, default="both") – Side of interval:
    • “both”: Return both lower and upper bounds

    • “upper”: Return only upper bounds

    • “lower”: Return only lower bounds

Returns:

Object with the following attributes:
  • mean : np.ndarray – Predicted values (point forecasts)

  • lower : np.ndarray or None – Lower prediction bounds

  • upper : np.ndarray or None – Upper prediction bounds

Return type:

PredictionResult
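How interval bounds relate to per-observation variances can be sketched under a normal approximation (the actual quantiles depend on the fitted distribution):

```python
import numpy as np
from statistics import NormalDist

mean = np.array([10.0, 12.0])       # point forecasts
variances = np.array([4.0, 9.0])    # per-observation variance estimates
z = NormalDist().inv_cdf(0.975)     # two-sided 95% normal quantile
lower = mean - z * np.sqrt(variances)
upper = mean + z * np.sqrt(variances)
```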

property residuals: ndarray

Model residuals.

Returns:

residuals – Residuals (y - fitted values).

Return type:

np.ndarray

property scale: float | None

Scale parameter.

score(X, y, metric='likelihood')[source]

Calculate model score.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Design matrix.

  • y (array-like of shape (n_samples,)) – True values.

  • metric (str, optional) – Metric to use: “likelihood”, “MSE”, “MAE”, or “R2”.

Returns:

score – Score value.

Return type:

float
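For the “R2” metric, the standard coefficient of determination applies; a NumPy sketch, assuming score(metric="R2") follows the usual definition:

```python
import numpy as np

def r2_score_sketch(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot (standard definition, sketch)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])
```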

set_params(**params)[source]

Set model parameters.

Parameters:

**params (dict) – Parameters to set.

Returns:

self – Model with updated parameters.

Return type:

ALM

property sigma: float

Residual standard error (sigma).

Returns:

sigma – Residual standard error, computed as sqrt(sum(residuals^2) / (n - k)) where n is the number of observations and k is the number of parameters (including the scale parameter). For dinvgauss/dgamma/dexp, uses (residuals - 1) since residuals are on a multiplicative scale (y/mu), matching R’s sigma.alm().

Return type:

float

summary(level: float = 0.95) SummaryResult[source]

Model summary.

Parameters:

level (float, optional) – Confidence level for parameter intervals. Default is 0.95.

Returns:

Summary of the model with coefficient estimates, standard errors, t-statistics, p-values, and confidence intervals.

Return type:

SummaryResult

property time_elapsed: float | None

Time elapsed during model fitting (seconds).

vcov() ndarray[source]

Calculate variance-covariance matrix of parameter estimates.

Uses distribution-specific methods matching R’s vcov.alm():
  • Normal-like + likelihood/MSE: sigma^2 * (X’X)^-1

  • Poisson + likelihood: inverse Fisher information

  • Everything else: inverse numerical Hessian of the cost function

Returns:

vcov_matrix – Covariance matrix of shape (n_params, n_params)

Return type:

np.ndarray
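The normal-likelihood case is the textbook OLS covariance; a NumPy sketch of sigma^2 * (X'X)^-1:

```python
import numpy as np

# Design matrix with intercept column, plus residuals from a fitted model
X = np.column_stack([np.ones(4), [1.0, 2.0, 3.0, 4.0]])
residuals = np.array([0.1, -0.1, 0.1, -0.1])

k = X.shape[1]
sigma2 = residuals @ residuals / (len(residuals) - k)   # residual variance
vcov_matrix = sigma2 * np.linalg.inv(X.T @ X)           # parameter covariance
```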

PredictionResult

class greybox.alm.PredictionResult(mean=None, lower=None, upper=None, level=None, variances=None, side='both', interval='none')[source]

Bases: object

Prediction result object with mean, interval bounds, and metadata.

Supports DataFrame-like access: indexing by column name, len(), iteration, and conversion to pandas DataFrame via to_dataframe().

mean

Predicted values (point forecasts).

Type:

np.ndarray

lower

Lower prediction bounds.

Type:

np.ndarray or None

upper

Upper prediction bounds.

Type:

np.ndarray or None

level

Confidence level(s) used for the intervals.

Type:

float or list[float] or None

variances

Variance estimates for each observation.

Type:

np.ndarray or None

side

Side of interval: “both”, “upper”, or “lower”.

Type:

str

interval

Type of interval: “none”, “confidence”, or “prediction”.

Type:

str

property columns

Column names of the DataFrame representation.

property index

Index of the DataFrame representation.

property shape

Shape of the DataFrame representation.

to_dataframe() DataFrame[source]

Convert to a pandas DataFrame.

Returns:

DataFrame with ‘mean’ column and optional ‘lower’/’upper’ columns (or ‘lower_0’, ‘upper_0’, etc. for multiple levels).

Return type:

pd.DataFrame
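For a single confidence level, the resulting layout resembles the following (a sketch built directly with pandas, not via the class):

```python
import numpy as np
import pandas as pd

# Columns mirror to_dataframe() output for one level: mean, lower, upper
df = pd.DataFrame({
    "mean": np.array([10.0, 12.0]),
    "lower": np.array([8.0, 9.0]),
    "upper": np.array([12.0, 15.0]),
})
```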

property values

Values of the DataFrame representation.
