3.3.3. Time-Series Feature Engineering
First Principle: Time-series feature engineering fundamentally extracts temporal patterns and contextual information from sequential data, transforming raw time series into features that enable ML models to understand trends, seasonality, and dependencies over time.
Time series data (e.g., stock prices, sensor readings, sales data over time) requires specialized feature engineering to capture its inherent temporal characteristics (trends, seasonality, cycles, lags) that are crucial for prediction.
Key Time-Series Feature Engineering Techniques:
- Extracting Date/Time Components (the pandas sketch after this list covers this and the next two techniques):
  - Cyclical Features: Day of week, month of year, hour of day (e.g., sin(2*pi*day_of_week/7)).
  - Ordinal Features: Day of year, week of year, quarter, year.
  - Binary Indicators: is_weekend, is_holiday, is_quarter_end.
- Lag Features (Shifted Features):
  - Method: Create new features by taking the value of a variable at a previous time step (e.g., sales_yesterday, temperature_last_hour).
  - Use Cases: Capture autocorrelation and dependencies over time. Essential for autoregressive models.
- Window-based Features (Rolling Aggregations):
  - Method: Calculate summary statistics over a rolling window of past observations (e.g., moving average, rolling sum, min, max, standard deviation over the last 7 days).
  - Use Cases: Smooth out noise, capture trends, detect changes over time.
- Time-Series Decomposition: Separating a time series into trend, seasonal, and residual components (see the decomposition sketch after this list).
- Fourier Transforms: Decompose the series into sinusoidal components whose dominant frequencies reveal seasonal cycles.
- External/Calendar Features: Incorporating information about holidays, promotions, or external events that impact the time series.
- Domain-Specific Features: E.g., for financial data, features like volatility or momentum.
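Below is a minimal pandas sketch of the first three techniques (date/time components with cyclical encoding, lag features, and rolling aggregations). The DataFrame, its DatetimeIndex, and the sales column are hypothetical placeholders rather than a required schema:

```python
import numpy as np
import pandas as pd

# Hypothetical daily sales series; in practice this would be loaded from S3, a database, etc.
idx = pd.date_range("2023-01-01", periods=365, freq="D")
rng = np.random.default_rng(0)
df = pd.DataFrame({"sales": rng.poisson(100, size=len(idx))}, index=idx)

# 1. Date/time components: ordinal, binary, and cyclical (sin/cos pair) encodings.
df["day_of_week"] = df.index.dayofweek
df["month"] = df.index.month
df["is_weekend"] = (df.index.dayofweek >= 5).astype(int)
df["dow_sin"] = np.sin(2 * np.pi * df["day_of_week"] / 7)  # sin alone maps two different days to one value,
df["dow_cos"] = np.cos(2 * np.pi * df["day_of_week"] / 7)  # so pair it with cos to keep days distinct

# 2. Lag features: yesterday's and last week's sales.
df["sales_lag_1"] = df["sales"].shift(1)
df["sales_lag_7"] = df["sales"].shift(7)

# 3. Rolling aggregations over the previous 7 days
#    (shift(1) first so the window excludes the current day and avoids target leakage).
df["sales_roll_mean_7"] = df["sales"].shift(1).rolling(window=7).mean()
df["sales_roll_std_7"] = df["sales"].shift(1).rolling(window=7).std()

# The earliest rows contain NaNs from shifting/rolling; drop or impute them before training.
df = df.dropna()
```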
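A short, self-contained sketch of decomposition and Fourier-based seasonality detection follows; statsmodels' seasonal_decompose is one common choice, and the weekly period of 7 is an assumption about the data, not a general rule:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# Hypothetical daily series with a weekly pattern plus noise.
idx = pd.date_range("2023-01-01", periods=365, freq="D")
rng = np.random.default_rng(0)
sales = 100 + 10 * np.sin(2 * np.pi * idx.dayofweek / 7) + rng.normal(0, 3, len(idx))
series = pd.Series(sales, index=idx, name="sales")

# Additive decomposition into trend, seasonal, and residual components (weekly period assumed).
result = seasonal_decompose(series, model="additive", period=7)
trend, seasonal, resid = result.trend, result.seasonal, result.resid

# Fourier analysis: the strongest frequencies point to seasonal cycles.
spectrum = np.abs(np.fft.rfft(series - series.mean()))
freqs = np.fft.rfftfreq(len(series), d=1)          # cycles per day
strongest = freqs[np.argsort(spectrum)[::-1][:3]]  # three largest spectral peaks
print("Dominant periods (days):", [round(1 / f, 1) for f in strongest if f > 0])
```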
AWS Tools:
- SageMaker Processing Jobs or Glue ETL Jobs (especially with Spark for window functions) are ideal for large-scale time-series feature engineering; a sketch of launching such a job appears after this list.
- SageMaker Notebook Instances / Studio Notebooks for interactive development using Python libraries like Pandas (for shift(), rolling()) and NumPy.
- Amazon Forecast (an AI Service) automates many time-series feature engineering aspects by design, but you have less control over specific feature creation.
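The pandas logic above can be packaged as a script and run at scale; the following is a hedged sketch using the SageMaker Python SDK's SKLearnProcessor. The IAM role ARN, S3 paths, script name, and container version shown here are placeholders and assumptions, not required values:

```python
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.processing import ProcessingInput, ProcessingOutput

# Processor that runs a Python/pandas script inside a managed scikit-learn container.
processor = SKLearnProcessor(
    framework_version="1.2-1",                                # assumed container version
    role="arn:aws:iam::111122223333:role/SageMakerRole",      # placeholder IAM role
    instance_type="ml.m5.xlarge",
    instance_count=1,
)

# build_time_series_features.py is a hypothetical script containing the feature logic above.
processor.run(
    code="build_time_series_features.py",
    inputs=[ProcessingInput(source="s3://my-bucket/raw-sales/",            # placeholder input path
                            destination="/opt/ml/processing/input")],
    outputs=[ProcessingOutput(source="/opt/ml/processing/output",
                              destination="s3://my-bucket/sales-features/")],  # placeholder output path
)
```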
Scenario: You are building a model to predict future product sales using historical daily sales data. You need to capture daily, weekly, and monthly sales trends, as well as the impact of previous days' sales on current sales.
Reflection Question: How do time-series feature engineering techniques (e.g., extracting day of week from timestamps, creating lag features, computing rolling averages) fundamentally transform raw time series into features that enable ML models to understand trends, seasonality, and dependencies over time for improved predictive accuracy?