Benchmarks
Vectrix is benchmarked against the M3 and M4 Competition datasets, the gold standard for time series forecasting evaluation. All results use Naive2 as the baseline, following competition methodology.
M4 Competition Results — DOT-Hybrid Engine
The M4 Competition (Makridakis et al., 2020) contains 100,000 time series across 6 frequencies. The results below are from DOT-Hybrid (DynamicOptimizedTheta with 8-way auto-select), evaluated on 2,000 randomly sampled series per frequency (seed=42):
| Frequency | DOT-Hybrid OWA | M4 Context |
|---|---|---|
| Yearly | 0.797 | Below the overall OWA of M4 #1 ES-RNN (0.821) |
| Quarterly | 0.894 | Competitive with M4 top methods |
| Monthly | 0.897 | Competitive with M4 top methods |
| Weekly | 0.959 | Beats Naive2 |
| Daily | 0.820 | Strong improvement over Naive2 |
| Hourly | 0.722 | Largest improvement over Naive2 of any frequency |
| AVG | 0.848 | Between M4 #2 FFORMA (0.838) and #3 Theta (0.854) |
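
The AVG row appears to be the unweighted mean of the six per-frequency OWA values. That reading is an assumption, checked by the arithmetic in the sketch below rather than stated by the source. The official M4 overall OWA is computed over all 100,000 series, so frequencies with more series carry more weight there, and the two figures are not strictly equivalent.

```python
# Per-frequency OWA values copied from the table above.
owa_by_frequency = {
    "Yearly": 0.797,
    "Quarterly": 0.894,
    "Monthly": 0.897,
    "Weekly": 0.959,
    "Daily": 0.820,
    "Hourly": 0.722,
}

# Assumption: AVG is a simple (unweighted) arithmetic mean across frequencies.
average_owa = sum(owa_by_frequency.values()) / len(owa_by_frequency)

print(round(average_owa, 3))  # 0.848
```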
M4 Competition Leaderboard Context
| Rank | Method | OWA |
|---|---|---|
| 1 | ES-RNN (Smyl) | 0.821 |
| 2 | FFORMA (Montero-Manso) | 0.838 |
| 3 | Theta (Fiorucci) | 0.854 |
| 11 | 4Theta (Petropoulos) | 0.874 |
| 18 | Theta (Assimakopoulos) | 0.897 |
| — | Vectrix DOT-Hybrid | 0.848 |
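In the table above, Vectrix DOT-Hybrid's average OWA of 0.848 falls between FFORMA (0.838, a meta-learning ensemble) and Theta by Fiorucci (0.854), and ahead of the other Theta variants listed. Both the hybrid ES-RNN (LSTM + ETS) and FFORMA achieve a lower, and therefore better, OWA.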
M3 Competition Results
Evaluated on the first 100 series per category. Lower is better for all metrics. An OWA below 1.0 beats Naive2.
| Category | Naive2 sMAPE | Vectrix sMAPE | Naive2 MASE | Vectrix MASE | Vectrix OWA |
|---|---|---|---|---|---|
| Yearly | 22.675 | 19.404 | 3.861 | 3.246 | 0.848 |
| Quarterly | 12.546 | 10.445 | 1.568 | 1.283 | 0.825 |
| Monthly | 37.872 | 30.731 | 1.214 | 0.856 | 0.758 |
| Other | 6.620 | 5.903 | 2.741 | 2.044 | 0.819 |
Vectrix outperforms Naive2 in all four M3 categories on these samples, with the strongest result on Monthly data (OWA 0.758).
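
The OWA column can be re-derived from the sMAPE and MASE columns using the OWA formula given in the Metrics section. The sketch below does that for the four rows above. The helper name `owa` and the dictionary layout are illustrative only and are not part of the Vectrix API.

```python
def owa(smape, smape_naive2, mase, mase_naive2):
    """Equal-weight average of the sMAPE and MASE ratios against Naive2."""
    return 0.5 * (smape / smape_naive2) + 0.5 * (mase / mase_naive2)

# (Naive2 sMAPE, Vectrix sMAPE, Naive2 MASE, Vectrix MASE), copied from the table.
m3_rows = {
    "Yearly": (22.675, 19.404, 3.861, 3.246),
    "Quarterly": (12.546, 10.445, 1.568, 1.283),
    "Monthly": (37.872, 30.731, 1.214, 0.856),
    "Other": (6.620, 5.903, 2.741, 2.044),
}

recomputed = {
    category: round(owa(v_smape, n_smape, v_mase, n_mase), 3)
    for category, (n_smape, v_smape, n_mase, v_mase) in m3_rows.items()
}

for category, value in recomputed.items():
    print(f"{category}: {value}")
```

The recomputed values match the published OWA column to three decimal places.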
Metrics
| Metric | Description |
|---|---|
| sMAPE | Symmetric Mean Absolute Percentage Error. Scale-independent accuracy measure, bounded between 0% and 200%. |
| MASE | Mean Absolute Scaled Error. Scales the forecast's mean absolute error by the in-sample mean absolute error of a seasonal naive forecast. Values below 1.0 mean the forecast errors are smaller than that in-sample naive error. |
| OWA | Overall Weighted Average. Combines sMAPE and MASE relative to Naive2: OWA = 0.5 × (sMAPE/sMAPE_naive2) + 0.5 × (MASE/MASE_naive2). OWA below 1.0 means the model beats Naive2. |
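
As a minimal sketch of the first two metrics, assuming the standard M4-style definitions, the functions below compute sMAPE and MASE for a single series in plain Python. The function names, the `period` parameter, and the toy data are illustrative assumptions and do not describe how Vectrix implements these metrics internally.

```python
def smape(actual, forecast):
    """sMAPE in percent: mean of 200 * |y - f| / (|y| + |f|).

    Undefined when an actual value and its forecast are both zero.
    """
    terms = [
        200.0 * abs(y - f) / (abs(y) + abs(f))
        for y, f in zip(actual, forecast)
    ]
    return sum(terms) / len(terms)


def mase(insample, actual, forecast, period=1):
    """Forecast MAE divided by the in-sample MAE of a seasonal naive forecast."""
    naive_errors = [
        abs(insample[t] - insample[t - period])
        for t in range(period, len(insample))
    ]
    scale = sum(naive_errors) / len(naive_errors)
    mae = sum(abs(y - f) for y, f in zip(actual, forecast)) / len(actual)
    return mae / scale


# Toy data, invented purely for illustration.
history = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0]
actual = [13.0, 15.0]
forecast = [14.0, 14.0]

smape_value = smape(actual, forecast)
mase_value = mase(history, actual, forecast, period=1)
print(smape_value, mase_value)
```

With `period=1` the scaling term is the one-step naive error; for seasonal data, `period` would be the seasonal length (for example 12 for monthly series).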
Reproducing Results
pip install vectrix

Experiment Code
All experiments are fully reproducible Python scripts with results recorded in docstrings.
| Experiment | Description | Source |
|---|---|---|
| E019 | DOT-Hybrid engine M4 100K verification | 019_dotHybridEngine.py |
| E042 | M4 official OWA verification | 042_m4OfficialOwa.py |
| E043 | Holdout validation + auto period detection | 043_dotAutoPeriodHoldout.py |
| E044 | Daily/Weekly specialist strategies | 044_dailyWeeklySpecialist.py |
| E045 | Integrated improvement verification | 045_integratedImprovement.py |
| E046 | Final integration rule validation | 046_finalIntegration.py |
Full experiment status and research notes: STATUS.md
Test Suite
573 tests, 5 skipped — covering all engines, models, and pipeline components.
pip install vectrix
pytest tests/ -x -q

| Test Module | Count | Coverage |
|---|---|---|
| test_all_models.py | 112 | All 30+ forecasting models |
| test_new_models.py | 45 | DTSF, ESN, 4Theta engines |
| test_engine_utils.py | 55 | ARIMAX, CV, decomposition |
| test_easy.py | 33 | Easy API (forecast, analyze, regress) |
| test_business.py | 45 | Anomaly, backtesting, metrics, scenarios |
| test_adaptive.py | 20 | Regime, DNA, self-healing, constraints |
| test_regression.py | 22 | OLS, Ridge, Lasso, diagnostics |
Tip: For faster M4 data loading, download the CSV files directly from the M4 Competition repository rather than using M4.load(), which can be slow because of its wide-to-long data transformation.
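
One way to read such files without a wide-to-long transformation is to keep each row as its own series. The sketch below assumes a wide layout with a header row, the series ID in the first column, and trailing empty cells padding shorter series. That layout, the helper name `read_wide_series`, and the sample values are assumptions for illustration, so check them against the files you actually download.

```python
import csv
import io


def read_wide_series(handle):
    """Return {series_id: [values, ...]} from a wide CSV with one series per row."""
    reader = csv.reader(handle)
    next(reader)  # Skip the header row (assumed to hold column names only).
    series = {}
    for row in reader:
        if not row:
            continue
        series_id, *cells = row
        # Drop padding cells so each series keeps only its real observations.
        series[series_id] = [
            float(cell) for cell in cells if cell.strip() not in ("", "NA", "NaN")
        ]
    return series


# In-memory stand-in for a downloaded file, with invented values.
sample = io.StringIO(
    '"V1","V2","V3","V4"\n'
    '"Y1",5172.1,5133.5,5186.9\n'
    '"Y2",2070,2104,\n'
)
series = read_wide_series(sample)
print(series)
```

For a real file, pass an open handle instead of the in-memory sample, for example `open(path, newline="")` where `path` points at the downloaded CSV.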