Measures

Statistical error measures for model evaluation.

This module provides functions for calculating various statistical error measures for point and interval forecasts. The functions include:

Error measures: - ME - Mean Error - MAE - Mean Absolute Error - MSE - Mean Squared Error - MRE - Mean Root Error (Kourentzes, 2014) - MIS - Mean Interval Score (Gneiting & Raftery, 2007) - MPE - Mean Percentage Error - MAPE - Mean Absolute Percentage Error (See Svetunkov, 2017 for critique) - MASE - Mean Absolute Scaled Error (Hyndman & Koehler, 2006) - RMSSE - Root Mean Squared Scaled Error (used in M5 Competition) - SAME - Scaled Absolute Mean Error (similar to MASE, but measures bias) - rMAE - Relative Mean Absolute Error (Davydenko & Fildes, 2013) - rRMSE - Relative Root Mean Squared Error - rAME - Relative Absolute Mean Error - rMIS - Relative Mean Interval Score - sMSE - Scaled Mean Squared Error (Petropoulos & Kourentzes, 2015) - sPIS - Scaled Periods-In-Stock (Wallstrom & Segerstedt, 2010) - sCE - Scaled Cumulative Error - sMIS - Scaled Mean Interval Score - GMRAE - Geometric Mean Relative Absolute Error

References

Kourentzes N. (2014). The Bias Coefficient: a new metric for forecast bias. https://kourentzes.com/forecasting/2014/12/17/the-bias-coefficient-a-new-metric-for-forecast-bias/
Svetunkov, I. (2017). Naughty APEs and the quest for the holy grail. https://openforecast.org/2017/07/29/naughty-apes-and-the-quest-for-the-holy-grail/
Fildes R. (1992). The evaluation of extrapolative forecasting methods. International Journal of Forecasting, 8, pp.81-98.
Hyndman R.J., Koehler A.B. (2006). Another look at measures of forecast accuracy. International Journal of Forecasting, 22, pp.679-688.
Petropoulos F., Kourentzes N. (2015). Forecast combinations for intermittent demand. Journal of the Operational Research Society, 66, pp.914-924.
Wallstrom P., Segerstedt A. (2010). Evaluation of forecasting error measurements and techniques for intermittent demand. International Journal of Production Economics, 128, pp.625-636.
Davydenko, A., Fildes, R. (2013). Measuring Forecasting Accuracy: The Case Of Judgmental Adjustments To Sku-Level Demand Forecasts. International Journal of Forecasting, 29(3), 510-522.
Gneiting, T., & Raftery, A. E. (2007). Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477), 359-378.

greybox.measures.accuracy(actual: ndarray, forecast: ndarray, na_rm: bool = True) → dict[source]

Calculate multiple accuracy measures at once.

Parameters:

actual (np.ndarray) – Actual (observed) values.
forecast (np.ndarray) – Forecasted values.
na_rm (bool, default=True) – Remove NA values.

Returns:

Dictionary containing: ME, MAE, MSE, RMSE, MPE, MAPE, MASE.

Return type:

dict

Examples

>>> actual = np.array([1, 2, 3, 4, 5])
>>> forecast = np.array([1.1, 2.0, 3.2, 3.9, 5.1])
>>> accuracy(actual, forecast)

greybox.measures.gmrae(actual: ndarray, forecast: ndarray, benchmark: ndarray, na_rm: bool = True) → float[source]

Geometric Mean Relative Absolute Error (GMRAE).

The GMRAE compares the absolute errors of a forecast to a benchmark using geometric mean. It is robust to extreme values.

Formula:: GMRAE = exp(mean(log(|error| / |benchmark_error|)))

Parameters:

actual (np.ndarray) – Actual (observed) values.
forecast (np.ndarray) – Forecasted values.
benchmark (np.ndarray) – Benchmark forecast values.
na_rm (bool, default=True) – Remove NA values.

Returns:

Geometric Mean Relative Absolute Error.

Return type:

float

Examples

>>> actual = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> forecast = np.array([1.1, 2.0, 3.2, 3.9, 5.1,
...                      6.0, 7.1, 8.0, 9.2, 10.1])
>>> benchmark = np.array([1.0, 2.0, 3.0, 4.0, 5.0,
...                      6.0, 7.0, 8.0, 9.0, 10.0])
>>> gmrae(actual, forecast, benchmark)

References

See Davydenko & Fildes (2013).

greybox.measures.mae(actual: ndarray, forecast: ndarray, na_rm: bool = True) → float[source]

Mean Absolute Error (MAE).

The MAE measures the average magnitude of errors without considering direction. It is robust to outliers and gives a direct interpretation in the same units as the data.

Formula:: MAE = mean(|actual - forecast|)

Parameters:

actual (np.ndarray) – Actual (observed) values.
forecast (np.ndarray) – Forecasted values.
na_rm (bool, default=True) – Remove NA values.

Returns:

Mean Absolute Error.

Return type:

float

Examples

>>> actual = np.array([1, 2, 3, 4, 5])
>>> forecast = np.array([1.1, 2.0, 3.2, 3.9, 5.1])
>>> mae(actual, forecast)
0.18

References

See e.g. Hyndman & Koehler (2006).

greybox.measures.mape(actual: ndarray, forecast: ndarray, na_rm: bool = True) → float[source]

Mean Absolute Percentage Error (MAPE).

The MAPE measures the average absolute percentage error. It is scale-independent and expressed as a percentage, making it easy to interpret across different contexts. However, it is undefined when actual values are zero and can be heavily influenced by small actual values.

Formula:: MAPE = mean(|actual - forecast| / actual) * 100

Parameters:

actual (np.ndarray) – Actual (observed) values.
forecast (np.ndarray) – Forecasted values.
na_rm (bool, default=True) – Remove NA values and zero actuals.

Returns:

Mean Absolute Percentage Error (in percentage points).

Return type:

float

Examples

>>> actual = np.array([1, 2, 3, 4, 5])
>>> forecast = np.array([1.1, 2.0, 3.2, 3.9, 5.1])
>>> mape(actual, forecast)
14.933333333333332

References

See Svetunkov, I. (2017) for a critique of MAPE: https://openforecast.org/2017/07/29/naughty-apes-and-the-quest-for-the-holy-grail/

greybox.measures.mase(actual: ndarray, forecast: ndarray, scale: float | floating[Any] | None = None, na_rm: bool = True) → float[source]

Mean Absolute Scaled Error (MASE).

The MASE scales the MAE by the average error of a naive (random walk) forecast. This makes it scale-independent and allows comparison across different time series. A MASE < 1 means the forecast is better than the naive forecast; MASE > 1 means worse.

Formula:: MASE = MAE / scale where scale = mean(|actual[t] - actual[t-1]|) by default

Parameters:

actual (np.ndarray) – Actual (observed) values.
forecast (np.ndarray) – Forecasted values.
scale (float, optional) – Scale parameter. If None, uses mean absolute difference of consecutive actual values (naive forecast error). Typical values include: - mean absolute deviation of in-sample one step ahead naive error - mean absolute value of the in-sample actuals
na_rm (bool, default=True) – Remove NA values.

Returns:

Mean Absolute Scaled Error.

Return type:

float

Examples

>>> actual = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> forecast = np.array([1.1, 2.0, 3.2, 3.9, 5.1,
...                      6.0, 7.1, 8.0, 9.2, 10.1])
>>> mase(actual, forecast)
>>> # With custom scale
>>> mase(actual, forecast, scale=1.0)

References

Hyndman R.J., Koehler A.B. (2006). Another look at measures of forecast accuracy. International Journal of Forecasting, 22, pp.679-688.

greybox.measures.me(actual: ndarray, forecast: ndarray, na_rm: bool = True) → float[source]

Mean Error (ME).

The ME measures the average bias in forecasts. A positive value indicates forecasts are on average too high (over-forecasting), while a negative value indicates forecasts are on average too low (under-forecasting).

Formula:: ME = mean(actual - forecast)

Parameters:

actual (np.ndarray) – Actual (observed) values.
forecast (np.ndarray) – Forecasted values.
na_rm (bool, default=True) – Remove NA values.

Returns:

Mean Error.

Return type:

float

Examples

>>> actual = np.array([1, 2, 3, 4, 5])
>>> forecast = np.array([1.1, 2.0, 3.2, 3.9, 5.1])
>>> me(actual, forecast)
-0.1

References

Basic measure, see e.g. Hyndman & Koehler (2006).

greybox.measures.measures(holdout: ndarray, forecast: ndarray, actual: ndarray, digits: int | None = None, benchmark: Literal['naive', 'mean'] = 'naive') → dict[source]

Calculate comprehensive error measures for forecasts.

This function calculates multiple error measures using holdout (test) data, forecasted values, and in-sample (training) data for scale calculation.

Parameters:

holdout (np.ndarray) – The vector of holdout (test) actual values.
forecast (np.ndarray) – The vector of forecasts produced by a model.
actual (np.ndarray) – The vector of actual in-sample (training) values.
digits (int, optional) – Number of digits for rounding. If None, no rounding is done.
benchmark ({"naive", "mean"}, default="naive") – Benchmark method for relative measures: - “naive”: Use last value repeated (random walk) - “mean”: Use arithmetic mean of the series

Returns:

Dictionary containing error measures: - ME: Mean Error - MAE: Mean Absolute Error - MSE: Mean Squared Error - MPE: Mean Percentage Error - MAPE: Mean Absolute Percentage Error - sCE: Scaled Cumulative Error - sMAE: Scaled Mean Absolute Error (MASE with mean scale) - sMSE: Scaled Mean Squared Error - MASE: Mean Absolute Scaled Error (with diff scale) - RMSSE: Root Mean Squared Scaled Error - SAME: Scaled Absolute Mean Error - rMAE: Relative Mean Absolute Error - rRMSE: Relative Root Mean Squared Error - rAME: Relative Absolute Mean Error - asymmetry: Asymmetry coefficient - sPIS: Scaled Periods-In-Stock

Return type:

dict

Examples

>>> np.random.seed(42)
>>> actual = np.random.normal(10, 2, 100)
>>> forecast = np.full(10, np.mean(actual[:90]))
>>> measures(actual[91:], forecast, actual[:90], digits=5)

References

See individual function references: - Hyndman & Koehler (2006) for MASE, RMSSE - Petropoulos & Kourentzes (2015) for sMSE, sCE, sPIS - Davydenko & Fildes (2013) for rMAE, rRMSE, rAME

greybox.measures.mpe(actual: ndarray, forecast: ndarray, na_rm: bool = True) → float[source]

Mean Percentage Error (MPE).

The MPE measures the average percentage error. Like ME, it can detect bias but expresses it as a percentage. However, it is undefined when actual values are zero.

Formula:: MPE = mean((actual - forecast) / actual) * 100

Parameters:

actual (np.ndarray) – Actual (observed) values.
forecast (np.ndarray) – Forecasted values.
na_rm (bool, default=True) – Remove NA values and zero actuals.

Returns:

Mean Percentage Error (in percentage points).

Return type:

float

Examples

>>> actual = np.array([1, 2, 3, 4, 5])
>>> forecast = np.array([1.1, 2.0, 3.2, 3.9, 5.1])
>>> mpe(actual, forecast)
-4.466666666666667

References

See e.g. Hyndman & Koehler (2006).

greybox.measures.mse(actual: ndarray, forecast: ndarray, na_rm: bool = True) → float[source]

Mean Squared Error (MSE).

The MSE measures the average squared difference between actual and forecast. It penalizes larger errors more heavily than MAE.

Formula:: MSE = mean((actual - forecast)^2)

Parameters:

actual (np.ndarray) – Actual (observed) values.
forecast (np.ndarray) – Forecasted values.
na_rm (bool, default=True) – Remove NA values.

Returns:

Mean Squared Error.

Return type:

float

Examples

>>> actual = np.array([1, 2, 3, 4, 5])
>>> forecast = np.array([1.1, 2.0, 3.2, 3.9, 5.1])
>>> mse(actual, forecast)
0.038

References

See e.g. Hyndman & Koehler (2006).

greybox.measures.rame(actual: ndarray, forecast: ndarray, benchmark: ndarray, na_rm: bool = True) → float[source]

Relative Absolute Mean Error (rAME).

The rAME compares the absolute Mean Error (bias) of a forecast to a benchmark forecast. It measures relative bias.

Formula:: rAME = |ME(forecast)| / |ME(benchmark)|

Parameters:

actual (np.ndarray) – Actual (observed) values.
forecast (np.ndarray) – Forecasted values.
benchmark (np.ndarray) – Benchmark forecast values.
na_rm (bool, default=True) – Remove NA values.

Returns:

Relative Absolute Mean Error.

Return type:

float

Examples

>>> actual = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> forecast = np.array([1.1, 2.0, 3.2, 3.9, 5.1,
...                      6.0, 7.1, 8.0, 9.2, 10.1])
>>> benchmark = np.array([1.0, 2.0, 3.0, 4.0, 5.0,
...                      6.0, 7.0, 8.0, 9.0, 10.0])
>>> rame(actual, forecast, benchmark)

References

Davydenko, A., Fildes, R. (2013). Measuring Forecasting Accuracy.

greybox.measures.rmae(actual: ndarray, forecast: ndarray, benchmark: ndarray, na_rm: bool = True) → float[source]

Relative Mean Absolute Error (rMAE).

The rMAE compares the MAE of a forecast to a benchmark forecast. Values < 1 indicate the forecast is better than the benchmark; values > 1 indicate worse.

Formula:: rMAE = MAE(forecast) / MAE(benchmark)

Parameters:

actual (np.ndarray) – Actual (observed) values.
forecast (np.ndarray) – Forecasted values.
benchmark (np.ndarray) – Benchmark forecast values (e.g., naive or mean forecast).
na_rm (bool, default=True) – Remove NA values.

Returns:

Relative Mean Absolute Error.

Return type:

float

Examples

>>> actual = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> forecast = np.array([1.1, 2.0, 3.2, 3.9, 5.1,
...                      6.0, 7.1, 8.0, 9.2, 10.1])
>>> benchmark = np.array([1.0, 2.0, 3.0, 4.0, 5.0,
...                      6.0, 7.0, 8.0, 9.0, 10.0])
>>> rmae(actual, forecast, benchmark)

References

Davydenko, A., Fildes, R. (2013). Measuring Forecasting Accuracy: The Case Of Judgmental Adjustments To Sku-Level Demand Forecasts. International Journal of Forecasting, 29(3), 510-522.

greybox.measures.rmse(actual: ndarray, forecast: ndarray, na_rm: bool = True) → float[source]

Root Mean Squared Error (RMSE).

The RMSE is the square root of MSE. It is in the same units as the data, making it more interpretable than MSE while still penalizing larger errors.

Formula:: RMSE = sqrt(MSE) = sqrt(mean((actual - forecast)^2))

Parameters:

actual (np.ndarray) – Actual (observed) values.
forecast (np.ndarray) – Forecasted values.
na_rm (bool, default=True) – Remove NA values.

Returns:

Root Mean Squared Error.

Return type:

float

Examples

>>> actual = np.array([1, 2, 3, 4, 5])
>>> forecast = np.array([1.1, 2.0, 3.2, 3.9, 5.1])
>>> rmse(actual, forecast)
0.194...

References

See e.g. Hyndman & Koehler (2006).

greybox.measures.rmsse(actual: ndarray, forecast: ndarray, scale: float | floating[Any] | None = None, na_rm: bool = True) → float[source]

Root Mean Squared Scaled Error (RMSSE).

The RMSSE is similar to MASE but uses MSE instead of MAE. It was developed for the M5 Competition.

Formula:: RMSSE = sqrt(MSE / scale)

Parameters:

actual (np.ndarray) – Actual (observed) values.
forecast (np.ndarray) – Forecasted values.
scale (float, optional) – Scale parameter. If None, uses mean squared difference of consecutive actual values. Typical value: mean(diff(actual)^2).
na_rm (bool, default=True) – Remove NA values.

Returns:

Root Mean Squared Scaled Error.

Return type:

float

Examples

>>> actual = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> forecast = np.array([1.1, 2.0, 3.2, 3.9, 5.1,
...                      6.0, 7.1, 8.0, 9.2, 10.1])
>>> rmsse(actual, forecast)

References

Used in M5 Competition.

greybox.measures.rrmse(actual: ndarray, forecast: ndarray, benchmark: ndarray, na_rm: bool = True) → float[source]

Relative Root Mean Squared Error (rRMSE).

The rRMSE compares the RMSE of a forecast to a benchmark forecast.

Formula:: rRMSE = RMSE(forecast) / RMSE(benchmark)

Parameters:

actual (np.ndarray) – Actual (observed) values.
forecast (np.ndarray) – Forecasted values.
benchmark (np.ndarray) – Benchmark forecast values.
na_rm (bool, default=True) – Remove NA values.

Returns:

Relative Root Mean Squared Error.

Return type:

float

Examples

>>> actual = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> forecast = np.array([1.1, 2.0, 3.2, 3.9, 5.1,
...                      6.0, 7.1, 8.0, 9.2, 10.1])
>>> benchmark = np.array([1.0, 2.0, 3.0, 4.0, 5.0,
...                      6.0, 7.0, 8.0, 9.0, 10.0])
>>> rrmse(actual, forecast, benchmark)

References

Davydenko, A., Fildes, R. (2013). Measuring Forecasting Accuracy.

greybox.measures.same(actual: ndarray, forecast: ndarray, scale: float | floating[Any] | None = None, na_rm: bool = True) → float[source]

Scaled Absolute Mean Error (SAME).

SAME is similar to MASE but uses Mean Error (ME) instead of MAE. It measures bias rather than absolute error.

Formula:: SAME = |ME| / scale

Parameters:

actual (np.ndarray) – Actual (observed) values.
forecast (np.ndarray) – Forecasted values.
scale (float, optional) – Scale parameter. If None, uses mean absolute difference of consecutive actual values.
na_rm (bool, default=True) – Remove NA values.

Returns:

Scaled Absolute Mean Error.

Return type:

float

Examples

>>> actual = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> forecast = np.array([1.1, 2.0, 3.2, 3.9, 5.1,
...                      6.0, 7.1, 8.0, 9.2, 10.1])
>>> same(actual, forecast)

References

See Petropoulos & Kourentzes (2015).

greybox.measures.sce(actual: ndarray, forecast: ndarray, scale: float | floating[Any], na_rm: bool = True) → float[source]

Scaled Cumulative Error (sCE).

The sCE measures the cumulative forecast error scaled by a parameter. It can detect systematic bias over time.

Formula:: sCE = sum(actual - forecast) / scale

Parameters:

actual (np.ndarray) – Actual (observed) values.
forecast (np.ndarray) – Forecasted values.
scale (float) – Scale parameter.
na_rm (bool, default=True) – Remove NA values.

Returns:

Scaled Cumulative Error.

Return type:

float

Examples

>>> actual = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> forecast = np.array([1.1, 2.0, 3.2, 3.9, 5.1,
...                      6.0, 7.1, 8.0, 9.2, 10.1])
>>> scale = np.mean(np.abs(actual[actual != 0]))
>>> sce(actual, forecast, scale)

References

Petropoulos F., Kourentzes N. (2015). Forecast combinations for intermittent demand.

greybox.measures.smse(actual: ndarray, forecast: ndarray, scale: float | floating[Any], na_rm: bool = True) → float[source]

Scaled Mean Squared Error (sMSE).

The sMSE scales the MSE by a scale parameter. Note that the scale should be a squared value (e.g., mean of squared actuals).

Formula:: sMSE = MSE / scale

Parameters:

actual (np.ndarray) – Actual (observed) values.
forecast (np.ndarray) – Forecasted values.
scale (float) – Scale parameter. Should typically be a squared value (e.g., mean(actual)^2 or mean(abs(actual[actual!=0]))^2).
na_rm (bool, default=True) – Remove NA values.

Returns:

Scaled Mean Squared Error.

Return type:

float

Examples

>>> actual = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> forecast = np.array([1.1, 2.0, 3.2, 3.9, 5.1,
...                      6.0, 7.1, 8.0, 9.2, 10.1])
>>> scale = np.mean(np.abs(actual)) ** 2
>>> smse(actual, forecast, scale)

References

Petropoulos F., Kourentzes N. (2015). Forecast combinations for intermittent demand. Journal of the Operational Research Society, 66, pp.914-924.

greybox.measures.spis(actual: ndarray, forecast: ndarray, scale: float | floating[Any], na_rm: bool = True) → float[source]

Scaled Periods-In-Stock (sPIS).

The sPIS measures the cumulative inventory performance over time. It is useful for intermittent demand forecasting.

Formula:: sPIS = sum(cumsum(forecast - actual)) / scale

Parameters:

actual (np.ndarray) – Actual (observed) values.
forecast (np.ndarray) – Forecasted values.
scale (float) – Scale parameter (typically mean of non-zero actuals).
na_rm (bool, default=True) – Remove NA values.

Returns:

Scaled Periods-In-Stock.

Return type:

float

Examples

>>> actual = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> forecast = np.array([1.1, 2.0, 3.2, 3.9, 5.1,
...                      6.0, 7.1, 8.0, 9.2, 10.1])
>>> scale = np.mean(np.abs(actual[actual != 0]))
>>> spis(actual, forecast, scale)

References

Wallstrom P., Segerstedt A. (2010). Evaluation of forecasting error measurements and techniques for intermittent demand. International Journal of Production Economics, 128, pp.625-636.