ALM
- class greybox.alm.ALM(distribution='dnorm', loss='likelihood', occurrence='none', scale_formula=None, orders=(0, 0, 0), alpha=None, shape=None, lambda_bc=None, size=None, nu=None, trim=0.0, lambda_l1=None, lambda_l2=None, nlopt_kargs=None, verbose=0)
Bases:
object
Augmented Linear Model estimator.
This estimator fits a linear model with various distributions and loss functions, following scikit-learn principles.
- Parameters:
distribution (str, default="dnorm") – Distribution name. Options: “dnorm”, “dlaplace”, “ds”, “dgnorm”, “dlogis”, “dt”, “dalaplace”, “dlnorm”, “dllaplace”, “dls”, “dlgnorm”, “dbcnorm”, “dfnorm”, “drectnorm”, “dinvgauss”, “dgamma”, “dexp”, “dchisq”, “dgeom”, “dpois”, “dnbinom”, “dbinom”, “dlogitnorm”, “dbeta”, “plogis”, “pnorm”.
loss (str, default="likelihood") – Loss function. Options: “likelihood”, “MSE”, “MAE”, “HAM”, “LASSO”, “RIDGE”, “ROLE”.
occurrence (str, default="none") – Occurrence model for zero-inflated data. Options: “none”, “plogis”, “pnorm”.
scale_formula (array-like or None, default=None) – Formula for scale parameter. If None, scale is constant.
orders (tuple, default=(0, 0, 0)) –
ARIMA orders (p, d, q). Three integers: AR order (p), differencing order (d), and MA order (q). Only AR(p) and differencing (d) are supported. MA(q) raises NotImplementedError.
p (AR order): Number of lagged response variables to include.
d (Differencing): Order of differencing. Creates AR(p+d) terms internally but uses only the first p of them in the model.
q (MA order): Not implemented.
- Examples:
orders=(1, 0, 0): AR(1) model
orders=(2, 0, 0): AR(2) model
orders=(1, 1, 0): ARIMA(1,1,0) with differencing
alpha (float, optional) – Additional parameter for Asymmetric Laplace distribution.
shape (float, optional) – Shape parameter for Generalized Normal distribution.
lambda_bc (float, optional) – Box-Cox lambda parameter for Box-Cox Normal distribution.
size (float, optional) – Size parameter for Negative Binomial/Binomial distributions.
nu (float, optional) – Degrees of freedom for Student’s t or Chi-squared distributions.
trim (float, default=0.0) – Trim proportion for ROLE loss.
lambda_l1 (float, optional) – L1 regularization parameter for LASSO.
lambda_l2 (float, optional) – L2 regularization parameter for RIDGE.
nlopt_kargs (dict, optional) – Dictionary of nlopt parameters. Options:
- “algorithm”: str, default=“NLOPT_LN_NELDERMEAD”
- “maxeval”: int, default=40 per parameter
- “maxtime”: float, default=600 seconds
- “xtol_rel”: float, default=1e-6
- “xtol_abs”: float, default=1e-8
- “ftol_rel”: float, default=1e-4
- “ftol_abs”: float, default=0
- “print_level”: int, default=0 (0=none, 3=full)
verbose (int, default=0) – Verbosity level.
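To make the orders parameter more concrete, the sketch below builds the lagged-response columns that an AR(p) specification implies. It is an illustration in plain Python, not the library's internal code: the helper name lagged_columns and the choice to drop the first p observations are assumptions made for this example.

```python
def lagged_columns(y, p):
    """Return (rows, targets) where each row holds y[t-1], ..., y[t-p].

    Hypothetical helper: the first p observations are dropped because
    they have no complete lag history.
    """
    rows = []
    targets = []
    for t in range(p, len(y)):
        rows.append([y[t - k] for k in range(1, p + 1)])
        targets.append(y[t])
    return rows, targets


y = [1.0, 2.0, 4.0, 7.0, 11.0]
rows, targets = lagged_columns(y, p=2)
# rows[0] is [y[1], y[0]] and pairs with targets[0] == y[2]
```

With orders=(2, 0, 0), each row would contribute two extra regressors alongside the columns of X. How the estimator treats the initial observations may differ from this sketch.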
- coef_
Estimated coefficients (excluding intercept).
- Type:
ndarray of shape (n_features,)
- intercept_
Estimated intercept.
- Type:
float
- scale_
Estimated scale parameter.
- Type:
float
- other_
Other estimated parameters (alpha, shape, etc.).
- Type:
dict
- fitted_values_
Fitted values.
- Type:
ndarray of shape (n_samples,)
- residuals_
Model residuals.
- Type:
ndarray of shape (n_samples,)
- loss_value_
Final value of the loss function.
- Type:
float
- log_lik_
Log-likelihood (only for likelihood-based losses).
- Type:
float or None
- aic_
Akaike Information Criterion.
- Type:
float or None
- bic_
Bayesian Information Criterion.
- Type:
float or None
- n_iter_
Number of optimization iterations.
- Type:
int
Examples
>>> from greybox.formula import formula
>>> from greybox.alm import ALM
>>> data = {'y': [1, 2, 3, 4, 5], 'x1': [1, 2, 3, 4, 5], 'x2': [2, 3, 4, 5, 6]}
>>> y, X = formula("y ~ x1 + x2", data)
>>> model = ALM(distribution="dnorm", loss="likelihood")
>>> model.fit(X, y)
>>> print(model.coef_)

>>> # Using nlopt with custom parameters (like R)
>>> model = ALM(
...     distribution="dnorm",
...     loss="likelihood",
...     nlopt_kargs={
...         "algorithm": "NLOPT_LN_SBPLX",
...         "maxeval": 1000,
...         "maxtime": 600,
...         "xtol_rel": 1e-8,
...         "print_level": 1
...     }
... )
>>> model.fit(X, y)
- DISTRIBUTIONS = ['dnorm', 'dlaplace', 'ds', 'dgnorm', 'dlogis', 'dt', 'dalaplace', 'dlnorm', 'dllaplace', 'dls', 'dlgnorm', 'dbcnorm', 'dinvgauss', 'dgamma', 'dexp', 'dchisq', 'dfnorm', 'drectnorm', 'dpois', 'dnbinom', 'dbinom', 'dgeom', 'dbeta', 'dlogitnorm', 'plogis', 'pnorm']
- LOSS_FUNCTIONS = ['likelihood', 'MSE', 'MAE', 'HAM', 'LASSO', 'RIDGE', 'ROLE']
- property actuals: ndarray
Actual values (response variable).
- Returns:
actuals – Actual response values from training data.
- Return type:
np.ndarray
- property aic: float | None
Akaike Information Criterion.
- property aicc: float | None
Corrected Akaike Information Criterion.
- property bic: float | None
Bayesian Information Criterion.
- property bicc: float | None
Corrected Bayesian Information Criterion.
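The information criteria above are all functions of the log-likelihood, the number of observations, and the number of estimated parameters. The sketch below uses the textbook definitions of AIC, AICc, and BIC. It is not taken from the library source, the function name information_criteria is hypothetical, and the corrected BIC is left out because its exact form is not stated here.

```python
import math


def information_criteria(log_lik, n_obs, n_param):
    """Textbook AIC, AICc and BIC from a log-likelihood."""
    aic = 2 * n_param - 2 * log_lik
    # Small-sample correction; undefined when n_obs <= n_param + 1.
    aicc = aic + (2 * n_param * (n_param + 1)) / (n_obs - n_param - 1)
    bic = n_param * math.log(n_obs) - 2 * log_lik
    return {"aic": aic, "aicc": aicc, "bic": bic}


# Made-up inputs for illustration only.
ic = information_criteria(log_lik=-10.0, n_obs=20, n_param=3)
```

On a fitted model, the inputs would presumably correspond to loglik, nobs, and nparam.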
- property coef: ndarray
Estimated coefficients (slope parameters, excluding intercept).
- Returns:
coef – Coefficient vector (without intercept).
- Return type:
np.ndarray
- property coefficients: ndarray
All coefficients including intercept as named vector.
- Returns:
coefficients – Full coefficient vector with names (intercept + slopes).
- Return type:
np.ndarray
- confint(parm: int | list[int] | None = None, level: float = 0.95) -> ndarray
Confidence intervals for parameters.
- Parameters:
parm (int or list of int, optional) – Which parameters to include. If None, all parameters are included. 0 = intercept, 1, 2, … = coefficients.
level (float, optional) – Confidence level. Default is 0.95.
- Returns:
confint – Array with shape (n_params, 2) containing lower and upper bounds.
- Return type:
np.ndarray
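As a rough illustration of how such intervals are formed, the sketch below computes two-sided bounds as estimate ± quantile × standard error. It assumes a normal approximation, whereas the library may well use Student's t quantiles. The function name normal_confint and the input numbers are invented for this example. In practice the variances would come from the diagonal of vcov().

```python
import math
from statistics import NormalDist


def normal_confint(estimates, variances, level=0.95):
    """Two-sided intervals: estimate +/- z * standard error."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    bounds = []
    for est, var in zip(estimates, variances):
        half_width = z * math.sqrt(var)
        bounds.append((est - half_width, est + half_width))
    return bounds


# Made-up numbers: intercept first, then one slope (matching parm=0, 1).
bounds = normal_confint([1.0, 0.5], [0.04, 0.01], level=0.95)
```

Each tuple holds the lower and upper bound for one parameter, mirroring the (n_params, 2) layout described above.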
- property data: ndarray
Alias for actuals.
- Returns:
data – Original in-sample observations.
- Return type:
np.ndarray
- property df_residual_: int | None
Residual degrees of freedom.
- property distribution_: str
Distribution name (ADAM convention with trailing _).
- fit(X, y, formula=None, feature_names=None)
Fit the ALM model.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Design matrix. Should include intercept column (column of ones) as first column if you want an intercept.
y (array-like of shape (n_samples,)) – Target values.
formula (str, optional) – Formula string used to generate X and y. Stored for reference.
feature_names (list of str, optional) – Names for feature columns. If provided, used in print output.
- Returns:
self – Fitted estimator.
- Return type:
ALM
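Because fit expects the intercept column to be present in X already, a design matrix built by hand needs a leading column of ones. A minimal sketch in plain Python follows. The helper add_intercept is hypothetical and is not part of greybox.

```python
def add_intercept(rows):
    """Prepend a constant 1.0 to every row of a list-of-lists design matrix."""
    return [[1.0] + list(row) for row in rows]


X_raw = [[1.0, 2.0], [2.0, 3.0], [3.0, 5.0]]
X = add_intercept(X_raw)
# X[0] is now [1.0, 1.0, 2.0]; X_raw itself is left unchanged.
```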
- property fitted: ndarray
Fitted values.
- Returns:
fitted – Fitted values (predictions on training data).
- Return type:
np.ndarray
- property formula: str | None
Formula string used to fit the model.
- Returns:
formula – Formula string if provided during fit.
- Return type:
str or None
- get_params()
Get model parameters.
- Returns:
params – Dictionary of model parameters.
- Return type:
dict
- property log_lik: float | None
Log-likelihood (backward-compatible alias for loglik).
- property loglik: float | None
Log-likelihood (ADAM-compatible name).
- Returns:
loglik – Log-likelihood value.
- Return type:
float or None
- property loss_: str
Loss function name (ADAM convention with trailing _).
- property loss_value: float | None
Final value of the loss function.
- multipliers(parm: str, h: int = 10) -> dict
Compute dynamic multipliers for an ARDL model.
Combines distributed lag coefficients (B(parm, k) terms) with the ARI polynomial to produce impulse-response multipliers over horizon h.
- Parameters:
parm (str) – Variable name as it appears in the design matrix.
h (int, default 10) – Forecast horizon.
- Returns:
{“h1”: m1, “h2”: m2, …, “hh”: mh} of dynamic multipliers.
- Return type:
dict
- Raises:
ValueError – If parm is not found in the model.
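The idea behind dynamic multipliers can be sketched with the standard ARDL recursion: the response at each horizon is the direct distributed-lag effect plus feedback from earlier responses through the autoregressive terms. The code below is an illustration under simplifying assumptions, not the library's implementation. It ignores differencing, the function name dynamic_multipliers and its arguments are hypothetical, and mapping "h1" to the impact multiplier is a guess at the indexing.

```python
def dynamic_multipliers(lag_coefs, ar_coefs, h=10):
    """Impulse-response multipliers of x on y for an ARDL model.

    lag_coefs[j] is the coefficient on x lagged j periods (j = 0 is the
    contemporaneous effect); ar_coefs[i] is the coefficient on y lagged
    i + 1 periods.
    """
    multipliers = []
    for j in range(h):
        direct = lag_coefs[j] if j < len(lag_coefs) else 0.0
        # Feedback from earlier responses through the AR terms.
        feedback = sum(
            phi * multipliers[j - i - 1]
            for i, phi in enumerate(ar_coefs)
            if j - i - 1 >= 0
        )
        multipliers.append(direct + feedback)
    # Key naming mirrors the documented return value; mapping "h1" to the
    # impact multiplier is an assumption of this sketch.
    return {f"h{j + 1}": m for j, m in enumerate(multipliers)}


result = dynamic_multipliers(lag_coefs=[0.5], ar_coefs=[0.5], h=3)
```

With a single contemporaneous coefficient of 0.5 and an AR(1) coefficient of 0.5, the effect halves at every step, which is the geometric decay expected from a stationary AR(1) process.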
- property n_param: dict
Parameter count information.
- Returns:
n_param – Dictionary containing parameter count information.
- Return type:
dict
- property nobs: int
Number of observations.
- Returns:
nobs – Number of observations used in the model.
- Return type:
int
- property nparam: int
Number of parameters.
- Returns:
nparam – Number of parameters in the model (including intercept and scale).
- Return type:
int
- predict(X: ndarray, interval: Literal['none', 'confidence', 'prediction'] = 'none', level: float | list[float] = 0.95, side: Literal['both', 'upper', 'lower'] = 'both') -> PredictionResult
Predict using the fitted model.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Design matrix. Should have same number of features as training data.
interval ({"none", "confidence", "prediction"}, default="none") – Type of interval to calculate:
- “none”: No intervals, return only point forecasts
- “confidence”: Confidence interval for the mean
- “prediction”: Prediction interval for new observations
level (float or list of float, default=0.95) – Confidence level(s) for intervals. Can be a single float (e.g., 0.95) or a list of floats (e.g., [0.8, 0.9, 0.95]).
side ({"both", "upper", "lower"}, default="both") – Side of interval:
- “both”: Return both lower and upper bounds
- “upper”: Return only upper bounds
- “lower”: Return only lower bounds
- Returns:
Object with the following attributes:
- mean : np.ndarray - Predicted values (point forecasts)
- lower : np.ndarray or None - Lower prediction bounds
- upper : np.ndarray or None - Upper prediction bounds
- Return type:
PredictionResult
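To show how level and side interact, the sketch below computes interval bounds for a single prediction under a Normal distribution. This is an assumption-laden illustration. Other distributions need their own quantile functions, the function name normal_interval is hypothetical, and the convention that a one-sided interval places the whole tail probability on one side is assumed rather than confirmed for this library.

```python
import math
from statistics import NormalDist


def normal_interval(mean, variance, level=0.95, side="both"):
    """Return (lower, upper) for one prediction; unused bounds are None."""
    sd = math.sqrt(variance)
    dist = NormalDist()
    if side == "both":
        # Split the tail probability evenly between the two sides.
        z = dist.inv_cdf(0.5 + level / 2)
        return mean - z * sd, mean + z * sd
    if side == "upper":
        return None, mean + dist.inv_cdf(level) * sd
    if side == "lower":
        return mean - dist.inv_cdf(level) * sd, None
    raise ValueError(f"unknown side: {side!r}")


both = normal_interval(10.0, 4.0, level=0.95)
upper_only = normal_interval(10.0, 4.0, level=0.95, side="upper")
```

At the same level, the one-sided upper bound sits closer to the mean than the upper bound of the two-sided interval, because it uses the 95% quantile rather than the 97.5% quantile.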
- property residuals: ndarray
Model residuals.
- Returns:
residuals – Residuals (y - fitted values).
- Return type:
np.ndarray
- property scale: float | None
Scale parameter.
- score(X, y, metric='likelihood')
Calculate model score.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Design matrix.
y (array-like of shape (n_samples,)) – True values.
metric (str, optional) – Metric to use: “likelihood”, “MSE”, “MAE”, or “R2”.
- Returns:
score – Score value.
- Return type:
float
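For reference, the three error-based metrics have the usual definitions, sketched here in plain Python. These are the standard formulas rather than code from the library, and the function names are chosen for this example only.

```python
def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean_y) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 6.0]
scores = {"MSE": mse(y_true, y_pred), "MAE": mae(y_true, y_pred), "R2": r2(y_true, y_pred)}
```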
- set_params(**params)
Set model parameters.
- Parameters:
**params (dict) – Parameters to set.
- Returns:
self – Model with updated parameters.
- Return type:
ALM
- property sigma: float
Residual standard error (sigma).
- Returns:
sigma – Residual standard error, computed as sqrt(sum(residuals^2) / (n - k)) where n is the number of observations and k is the number of parameters (including the scale parameter). For dinvgauss/dgamma/dexp, uses (residuals - 1) since residuals are on a multiplicative scale (y/mu), matching R’s sigma.alm().
- Return type:
float
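The formula above translates directly into code. The sketch below covers both the additive case and the multiplicative case, where residuals are ratios y/mu and 1 is subtracted first. The function names are hypothetical and the inputs are made up.

```python
import math


def residual_std_error(residuals, n_param):
    """sqrt(sum(residuals^2) / (n - k)) for additive residuals."""
    n = len(residuals)
    return math.sqrt(sum(r ** 2 for r in residuals) / (n - n_param))


def residual_std_error_multiplicative(ratios, n_param):
    """Same formula applied to (ratio - 1), for residuals defined as y / mu."""
    return residual_std_error([r - 1.0 for r in ratios], n_param)


sigma_add = residual_std_error([1.0, -1.0, 2.0, -2.0, 0.0], n_param=3)
sigma_mult = residual_std_error_multiplicative([1.5, 0.5, 1.0], n_param=1)
```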
- summary(level: float = 0.95) -> SummaryResult
Model summary.
- Parameters:
level (float, optional) – Confidence level for parameter intervals. Default is 0.95.
- Returns:
Summary of the model with coefficient estimates, standard errors, t-statistics, p-values, and confidence intervals.
- Return type:
SummaryResult
- property time_elapsed: float | None
Time elapsed during model fitting (seconds).
- vcov() -> ndarray
Calculate variance-covariance matrix of parameter estimates.
Uses distribution-specific methods matching R’s vcov.alm():
- Normal-like + likelihood/MSE: sigma^2 * (X’X)^-1
- Poisson + likelihood: inverse Fisher information
- Everything else: inverse numerical Hessian of cost function
- Returns:
vcov_matrix – Covariance matrix of shape (n_params, n_params)
- Return type:
np.ndarray
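The first case, sigma^2 * (X'X)^-1, can be written out by hand for a model with an intercept and one regressor, where X'X is a 2x2 matrix with a closed-form inverse. The sketch below is only meant to make the formula tangible. The function name ols_vcov_simple is hypothetical, and a general implementation would use a linear algebra routine instead.

```python
def ols_vcov_simple(x, sigma2):
    """sigma^2 * (X'X)^-1 for X = [1, x] (intercept plus one regressor)."""
    n = len(x)
    sx = sum(x)
    sxx = sum(v * v for v in x)
    det = n * sxx - sx * sx  # determinant of X'X
    # Closed-form inverse of the 2x2 matrix [[n, sx], [sx, sxx]].
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]
    return [[sigma2 * v for v in row] for row in inv]


# Made-up regressor values and error variance.
vcov = ols_vcov_simple([1.0, 2.0, 3.0, 4.0], sigma2=2.0)
```

The lower-right entry is the slope variance, which equals sigma^2 divided by the centred sum of squares of x.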
PredictionResult
- class greybox.alm.PredictionResult(mean=None, lower=None, upper=None, level=None, variances=None, side='both', interval='none')
Bases:
object
Prediction result object with mean, interval bounds, and metadata.
Supports DataFrame-like access: indexing by column name, len(), iteration, and conversion to pandas DataFrame via to_dataframe().
- mean
Predicted values (point forecasts).
- Type:
np.ndarray
- lower
Lower prediction bounds.
- Type:
np.ndarray or None
- upper
Upper prediction bounds.
- Type:
np.ndarray or None
- level
Confidence level(s) used for the intervals.
- Type:
float or list[float] or None
- variances
Variance estimates for each observation.
- Type:
np.ndarray or None
- side
Side of interval: “both”, “upper”, or “lower”.
- Type:
str
- interval
Type of interval: “none”, “confidence”, or “prediction”.
- Type:
str
- property columns
Column names of the DataFrame representation.
- property index
Index of the DataFrame representation.
- property shape
Shape of the DataFrame representation.
- to_dataframe() -> DataFrame
Convert to a pandas DataFrame.
- Returns:
DataFrame with ‘mean’ column and optional ‘lower’/’upper’ columns (or ‘lower_0’, ‘upper_0’, etc. for multiple levels).
- Return type:
pd.DataFrame
- property values
Values of the DataFrame representation.