Tutorial 02 — Analysis & DNA

Open in Colab

Understand your data before you forecast it. Every time series has a unique statistical fingerprint — its “DNA.” Vectrix extracts 65+ features from your data to build a profile that reveals trend direction, seasonal patterns, volatility regime, structural breaks, and forecasting difficulty. This profile drives automatic model selection and helps you make better decisions.

Basic Analysis

The analyze() function accepts the same input formats as forecast() — lists, arrays, DataFrames, Series, or CSV paths

import pandas as pd
from vectrix import analyze

df = pd.read_csv("sales.csv")
report = analyze(df, date="date", value="sales")

print(report.summary())

Expected output:

=== Vectrix Analysis Report ===

DNA Profile:
  Fingerprint: a3f7c2d1
  Difficulty: medium (42/100)
  Category: seasonal
  Recommended Models: ['AutoETS', 'MSTL', 'Theta']

Data Characteristics:
  Length: 365 observations
  Period: 7 (weekly)
  Trend: Yes (upward, strength 0.72)
  Seasonality: Yes (strength 0.85)
  Volatility: low (0.0312)

Changepoints: [89, 201]
Anomalies: [15, 156, 298]

You can also pass raw lists or arrays

from vectrix import analyze

data = [120, 135, 148, 130, 155, 170, 162, 180, 195, 185, 200, 215]
report = analyze(data)
print(report.summary())

DNA Profile

The DNA profile is the core of Vectrix’s intelligence — it determines which models to prioritize, how to set hyperparameters, and what difficulty level to expect. Access it through report.dna

dna = report.dna

print(f"Fingerprint: {dna.fingerprint}")
print(f"Difficulty: {dna.difficulty} ({dna.difficultyScore:.0f}/100)")
print(f"Category: {dna.category}")
print(f"Recommended: {dna.recommendedModels[:5]}")

Expected output:

Fingerprint: a3f7c2d1
Difficulty: medium (42/100)
Category: seasonal
Recommended: ['AutoETS', 'MSTL', 'Theta', 'DOT', 'AutoCES']

DNA Attributes

AttributeTypeDescription
fingerprintstrDeterministic hash — identical data always produces the same value
difficultystreasy, medium, hard, or very_hard
difficultyScorefloatNumeric difficulty score (0-100, higher = harder)
categorystrseasonal, trending, volatile, intermittent, stationary
recommendedModelslist[str]Ordered list of optimal models for this data

The fingerprint is deterministic: the same data always produces the same hash. This makes it useful for caching and reproducibility.

Changepoints

Changepoints are locations where the underlying statistical properties of the time series shift — a sudden change in mean level, variance, or trend slope. Detecting these is critical because data before a changepoint may follow different dynamics than data after it

print(f"Changepoints at indices: {report.changepoints}")

Expected output:

Changepoints at indices: [89, 201]

These correspond to positions in your data where structural breaks occurred — for example, a policy change, a product launch, or a market shift.

Anomalies

Anomalies are individual observations that deviate significantly from the expected pattern. Unlike changepoints (which are structural shifts), anomalies are isolated spikes or dips — often caused by data errors, one-time events, or external shocks

print(f"Anomaly indices: {report.anomalies}")
print(f"Number of anomalies: {len(report.anomalies)}")

Expected output:

Anomaly indices: [15, 156, 298]
Number of anomalies: 3

Data Characteristics

The characteristics object provides a comprehensive statistical profile of your data — trend direction and strength, seasonal patterns, volatility level, and predictability score

c = report.characteristics

print(f"Length: {c.length}")
print(f"Period: {c.period}")
print(f"Trend: {c.hasTrend} ({c.trendDirection}, strength {c.trendStrength:.2f})")
print(f"Seasonality: {c.hasSeasonality} (strength {c.seasonalStrength:.2f})")
print(f"Volatility: {c.volatilityLevel} ({c.volatility:.4f})")
print(f"Predictability: {c.predictabilityScore}/100")
print(f"Outliers: {c.outlierCount} ({c.outlierRatio:.1%})")

Expected output:

Length: 365
Period: 7
Trend: True (upward, strength 0.72)
Seasonality: True (strength 0.85)
Volatility: low (0.0312)
Predictability: 78/100
Outliers: 3 (0.8%)

Characteristics Reference

AttributeTypeDescription
lengthintNumber of observations
periodintDetected seasonal period
hasTrendboolWhether trend is present
trendDirectionstrupward, downward, or none
trendStrengthfloatTrend strength (0-1)
hasSeasonalityboolWhether seasonality is present
seasonalStrengthfloatSeasonal strength (0-1)
volatilityfloatCoefficient of variation
volatilityLevelstrlow, medium, or high
predictabilityScoreint0-100, higher = more predictable
outlierCountintNumber of detected outliers
outlierRatiofloatFraction of outliers

Full Summary

The summary() method combines all analysis components — DNA profile, characteristics, changepoints, anomalies, and model recommendations — into a single formatted report

full_report = report.summary()
print(full_report)

This includes the DNA profile, characteristics, changepoints, anomalies, and model recommendations in a human-readable format.

Extracted Features

Access the raw 65+ statistical features as a dictionary. These features include autocorrelation, Hurst exponent, entropy, volatility clustering, and many more — they power the DNA classification system

features = report.features
for key, value in list(features.items())[:10]:
    print(f"  {key}: {value}")

Expected output:

  trendStrength: 0.72
  seasonalStrength: 0.85
  acf1: 0.91
  hurstExponent: 0.78
  volatilityClustering: 0.15
  nonlinearAutocorr: 0.23
  demandDensity: 1.0
  seasonalPeakPeriod: 7
  entropy: 2.34
  stability: 0.88

Practical Usage: Analyze Before Forecasting

A powerful pattern is to analyze first, then adjust your forecasting strategy based on what the DNA reveals. This lets you handle edge cases gracefully

from vectrix import analyze, forecast

report = analyze(df, date="date", value="sales")

if report.dna.difficulty == "very_hard":
    print("Warning: This series is very hard to forecast.")
    print(f"Consider using: {report.dna.recommendedModels[:3]}")

if len(report.changepoints) > 0:
    print(f"Structural breaks detected at {report.changepoints}.")
    print("Recent data may be more relevant than older data.")

if report.characteristics.hasSeasonality:
    period = report.characteristics.period
    print(f"Seasonal period: {period}")
    result = forecast(df, date="date", value="sales", steps=period)
else:
    result = forecast(df, date="date", value="sales", steps=14)

Direct ForecastDNA Access

For lower-level control — such as building custom model selection logic or caching DNA profiles — use the ForecastDNA class directly

from vectrix import ForecastDNA
import numpy as np

dna = ForecastDNA()
data = np.random.randn(200).cumsum() + 100
profile = dna.analyze(data, period=7)

print(f"Fingerprint: {profile.fingerprint}")
print(f"Difficulty: {profile.difficulty} ({profile.difficultyScore:.0f}/100)")
print(f"Recommended: {profile.recommendedModels}")

Complete Example

import pandas as pd
from vectrix import analyze

df = pd.read_csv("monthly_revenue.csv")
report = analyze(df, date="month", value="revenue")

print("=== DNA ===")
print(f"Category: {report.dna.category}")
print(f"Difficulty: {report.dna.difficulty}")
print(f"Top models: {report.dna.recommendedModels[:3]}")

print()
print("=== Characteristics ===")
c = report.characteristics
print(f"Trend: {c.trendDirection} (strength {c.trendStrength:.2f})")
print(f"Seasonal: period={c.period}, strength={c.seasonalStrength:.2f}")
print(f"Predictability: {c.predictabilityScore}/100")

print()
print("=== Structural Breaks ===")
print(f"Changepoints: {report.changepoints}")
print(f"Anomalies: {report.anomalies}")