Benchmarks

Vectrix is benchmarked against the M3 and M4 Competition datasets, the gold standard for time series forecasting evaluation. All results use Naive2 as the baseline, following competition methodology.

M4 Competition Results — DOT-Hybrid Engine

The M4 Competition (Makridakis et al., 2020) contains 100,000 time series across six frequencies. Results below are from DOT-Hybrid (DynamicOptimizedTheta with 8-way auto-select), evaluated on 2,000 randomly sampled series per frequency (seed=42).
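The fixed-seed sampling makes the subset reproducible. A minimal sketch of such a draw (the ID format and per-frequency counts here are illustrative, not taken from the Vectrix code):

```python
import random

def sample_series(series_ids, n=2000, seed=42):
    """Draw a reproducible subset of series IDs using a fixed seed."""
    rng = random.Random(seed)
    # sample without replacement; cap at the population size
    return rng.sample(series_ids, min(n, len(series_ids)))

# e.g. the M4 Yearly subset contains 23,000 series
ids = [f"Y{i}" for i in range(1, 23001)]
subset = sample_series(ids)
print(len(subset))  # 2000
```

Because the seed is fixed, repeated calls return the identical subset, so every run of the benchmark scores the same 2,000 series.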

| Frequency | DOT-Hybrid OWA | M4 Context |
|-----------|----------------|------------|
| Yearly | 0.797 | Near M4 #1 ES-RNN (0.821) |
| Quarterly | 0.894 | Competitive with M4 top methods |
| Monthly | 0.897 | Competitive with M4 top methods |
| Weekly | 0.959 | Beats Naive2 |
| Daily | 0.820 | Strong improvement over Naive2 |
| Hourly | 0.722 | World-class, near M4 winner level |
| AVG | 0.848 | Between M4 #2 FFORMA (0.838) and #3 (0.854) |

M4 Competition Leaderboard Context

| Rank | Method | OWA |
|------|--------|-----|
| 1 | ES-RNN (Smyl) | 0.821 |
| 2 | FFORMA (Montero-Manso) | 0.838 |
| – | Vectrix DOT-Hybrid | 0.848 |
| 3 | Theta (Fiorucci) | 0.854 |
| 11 | 4Theta (Petropoulos) | 0.874 |
| 18 | Theta (Assimakopoulos) | 0.897 |

Vectrix DOT-Hybrid outperforms every pure statistical method in the M4 Competition, including the strongest Theta variants. Only the hybrid ES-RNN (LSTM + ETS) and the FFORMA meta-learning ensemble rank higher.

M3 Competition Results

Evaluated on the first 100 series in each category. Lower is better for all metrics; an OWA below 1.0 beats Naive2.

| Category | Naive2 sMAPE | Vectrix sMAPE | Naive2 MASE | Vectrix MASE | Vectrix OWA |
|----------|--------------|---------------|-------------|--------------|-------------|
| Yearly | 22.675 | 19.404 | 3.861 | 3.246 | 0.848 |
| Quarterly | 12.546 | 10.445 | 1.568 | 1.283 | 0.825 |
| Monthly | 37.872 | 30.731 | 1.214 | 0.856 | 0.758 |
| Other | 6.620 | 5.903 | 2.741 | 2.044 | 0.819 |
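Each OWA value follows directly from the sMAPE and MASE ratios in the same row. For example, the Yearly figure works out as:

```python
# OWA = 0.5 * (sMAPE / sMAPE_naive2) + 0.5 * (MASE / MASE_naive2)
smape_naive2, smape_vectrix = 22.675, 19.404
mase_naive2, mase_vectrix = 3.861, 3.246

owa = 0.5 * (smape_vectrix / smape_naive2) + 0.5 * (mase_vectrix / mase_naive2)
print(round(owa, 3))  # 0.848
```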

Vectrix consistently outperforms Naive2 across all M3 categories, with the strongest performance on Monthly data (OWA 0.758).

Metrics

| Metric | Description |
|--------|-------------|
| sMAPE | Symmetric Mean Absolute Percentage Error. Scale-independent accuracy measure, bounded between 0% and 200%. |
| MASE | Mean Absolute Scaled Error. Compares forecast errors against a naive seasonal benchmark; values below 1.0 indicate the model outperforms the naive method. |
| OWA | Overall Weighted Average. Combines sMAPE and MASE relative to Naive2: OWA = 0.5 × (sMAPE / sMAPE_Naive2) + 0.5 × (MASE / MASE_Naive2). OWA below 1.0 means the model beats Naive2. |
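The three metrics can be sketched in a few lines of plain Python. This is an independent sketch, not the Vectrix implementation; per the M4 convention, MASE is scaled by the in-sample seasonal-naive error at period `m`:

```python
def smape(y, f):
    """Symmetric MAPE in percent, bounded between 0 and 200."""
    return 200.0 / len(y) * sum(abs(a - p) / (abs(a) + abs(p)) for a, p in zip(y, f))

def mase(y, f, insample, m=1):
    """Out-of-sample MAE scaled by the in-sample seasonal-naive MAE (period m)."""
    scale = sum(abs(a - b) for a, b in zip(insample[m:], insample)) / (len(insample) - m)
    return sum(abs(a - p) for a, p in zip(y, f)) / len(y) / scale

def owa(y, f, f_naive2, insample, m=1):
    """0.5 * sMAPE ratio + 0.5 * MASE ratio, both relative to Naive2."""
    return (0.5 * smape(y, f) / smape(y, f_naive2)
            + 0.5 * mase(y, f, insample, m) / mase(y, f_naive2, insample, m))
```

By construction, scoring the Naive2 forecast against itself gives an OWA of exactly 1.0, which is why OWA below 1.0 means a method beats the baseline.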

Reproducing Results

```bash
pip install vectrix
```

Experiment Code

All experiments are fully reproducible Python scripts with results recorded in docstrings.

| Experiment | Description | Source |
|------------|-------------|--------|
| E019 | DOT-Hybrid engine M4 100K verification | 019_dotHybridEngine.py |
| E042 | M4 official OWA verification | 042_m4OfficialOwa.py |
| E043 | Holdout validation + auto period detection | 043_dotAutoPeriodHoldout.py |
| E044 | Daily/Weekly specialist strategies | 044_dailyWeeklySpecialist.py |
| E045 | Integrated improvement verification | 045_integratedImprovement.py |
| E046 | Final integration rule validation | 046_finalIntegration.py |

Full experiment status and research notes: STATUS.md

Test Suite

573 tests, 5 skipped — covering all engines, models, and pipeline components.

```bash
pip install vectrix
pytest tests/ -x -q
```

| Test Module | Count | Coverage |
|-------------|-------|----------|
| test_all_models.py | 112 | All 30+ forecasting models |
| test_new_models.py | 45 | DTSF, ESN, 4Theta engines |
| test_engine_utils.py | 55 | ARIMAX, CV, decomposition |
| test_easy.py | 33 | Easy API (forecast, analyze, regress) |
| test_business.py | 45 | Anomaly, backtesting, metrics, scenarios |
| test_adaptive.py | 20 | Regime, DNA, self-healing, constraints |
| test_regression.py | 22 | OLS, Ridge, Lasso, diagnostics |

Tip: For faster M4 data loading, download the CSV files directly from the M4 Competition repository rather than using M4.load(), which can be slow due to wide-to-long data transformation.
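A minimal loader sketch for the raw M4 train CSVs, assuming the published layout (first column is the series ID, remaining columns are observations padded with blanks for shorter series); this avoids the wide-to-long reshape entirely:

```python
import csv
import io

def load_m4_csv(fileobj):
    """Parse an M4-style train CSV into {series_id: [float, ...]},
    dropping the trailing blank padding on shorter series."""
    reader = csv.reader(fileobj)
    next(reader)  # skip the header row
    series = {}
    for row in reader:
        sid, values = row[0], row[1:]
        series[sid] = [float(v) for v in values if v != ""]
    return series

# toy input in the same shape as the real files
raw = 'id,V1,V2,V3\n"Y1",5.0,6.0,\n"Y2",1.0,2.0,3.0\n'
data = load_m4_csv(io.StringIO(raw))
print(data["Y1"])  # [5.0, 6.0]
```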