Building the prediction engine required solving a fundamental data problem first: raw NBA statistics are weak predictors on their own. A player who scored 18 points per game last season will probably score about 18 next season, but that baseline says almost nothing about whether they will land at 15 or 21. The signal is in the context.
So before training a single model, we built a 12-layer feature engineering pipeline. It produces 347 features per player-season, capturing everything we believe drives year-to-year performance changes: momentum, opportunity, context, age, environment, matchups, and archetype. A 150-player integration test ran with zero database errors, and 9 of 10 pipeline checks are passing.
Each layer contributes a distinct signal class. Together they give the model enough context to understand not just what a player did, but why, and whether those conditions will persist.
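One way to picture how independent layers could combine into a single feature row is sketched below. This is an assumption about structure, not the pipeline's actual code: `build_features`, `talent_layer`, and `schedule_layer` are hypothetical names, and the record fields are invented for illustration.

```python
from typing import Callable, Dict, List

# A "layer" here is any function from a raw player-season record to named features.
Layer = Callable[[dict], Dict[str, float]]


def build_features(record: dict, layers: List[Layer]) -> Dict[str, float]:
    """Run each layer in order and merge the outputs into one feature dict."""
    features: Dict[str, float] = {}
    for layer in layers:
        out = layer(record)
        # Refuse silent overwrites: two layers must not emit the same feature name.
        clash = features.keys() & out.keys()
        if clash:
            raise ValueError(f"duplicate feature names: {sorted(clash)}")
        features.update(out)
    return features


# Two toy layers (hypothetical, for illustration only).
def talent_layer(record: dict) -> Dict[str, float]:
    return {"pts_per_game": record["pts"] / record["games"]}


def schedule_layer(record: dict) -> Dict[str, float]:
    return {"home_game_pct": record["home_games"] / record["games"]}


row = build_features(
    {"pts": 1440, "games": 80, "home_games": 40},
    [talent_layer, schedule_layer],
)
```

Keeping each layer as a separate function means a layer can be stubbed or swapped without touching the others.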
Layer 01
Player Talent Baseline
Per-game stats, percentages, and volume metrics from the current and prior seasons. The raw signal the model builds on.
pts_per_game, reb_per_36, usage_rate, true_shooting_pct
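Two of the baseline features have standard definitions, sketched below. True shooting percentage uses the usual formula PTS / (2 × (FGA + 0.44 × FTA)). The function names and zero-handling are assumptions for illustration, not taken from the pipeline.

```python
def per_36(stat_total: float, minutes: float) -> float:
    """Scale a counting stat to a per-36-minute rate."""
    if minutes <= 0:
        return 0.0  # assumption: treat "no minutes" as a zero rate
    return stat_total * 36.0 / minutes


def true_shooting_pct(points: float, fga: float, fta: float) -> float:
    """True shooting percentage: PTS / (2 * (FGA + 0.44 * FTA))."""
    attempts = 2.0 * (fga + 0.44 * fta)
    if attempts <= 0:
        return 0.0  # assumption: no shot attempts maps to 0.0
    return points / attempts


reb_per_36 = per_36(stat_total=300, minutes=1200)
ts = true_shooting_pct(points=1000, fga=700, fta=250)
```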
Layer 02
Rolling Momentum Windows
3-, 5-, 10-, and 20-game rolling averages for every core stat. Captures hot/cold streaks and multi-scale momentum signals.
pts_rolling_10g, reb_trend_5g, fg_pct_rolling_20g, ast_momentum
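A minimal sketch of the rolling windows, using only the standard library. Only `pts_rolling_10g` appears in the feature list above. The other window names follow the same pattern by assumption, and `pts_momentum` (short window minus long window) is a hypothetical definition, not the pipeline's.

```python
from statistics import mean
from typing import Dict, List, Optional

WINDOWS = (3, 5, 10, 20)


def rolling_features(points_by_game: List[float]) -> Dict[str, Optional[float]]:
    """Trailing averages over the most recent games, one per window size."""
    feats: Dict[str, Optional[float]] = {}
    for w in WINDOWS:
        recent = points_by_game[-w:]
        # Emit None until a full window of games is available.
        feats[f"pts_rolling_{w}g"] = mean(recent) if len(recent) == w else None
    short, long_ = feats["pts_rolling_5g"], feats["pts_rolling_20g"]
    # Hypothetical momentum signal: recent form relative to the longer baseline.
    feats["pts_momentum"] = (
        short - long_ if short is not None and long_ is not None else None
    )
    return feats


feats = rolling_features([float(g) for g in range(1, 21)])
```

Returning `None` for incomplete windows leaves the choice of imputation to whatever consumes the features.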
Layer 03
Opportunity Signals
Projected minutes, usage share, shot attempts, pace context. A player with the same talent at higher opportunity projects better.
projected_mpg, shot_opportunity_score, team_pace, usage_delta_yoy
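The claim that equal talent projects better at higher opportunity can be illustrated with simple arithmetic: hold the per-minute scoring rate fixed and vary the minutes. The helpers below are hypothetical and assume production scales linearly with minutes, which is a simplification.

```python
def project_points(pts_per_36: float, projected_mpg: float) -> float:
    """Projected points per game from a fixed per-36 rate and projected minutes."""
    return pts_per_36 / 36.0 * projected_mpg


def usage_delta_yoy(usage_now: float, usage_prev: float) -> float:
    """Year-over-year change in usage rate (positive means a bigger role)."""
    return usage_now - usage_prev


# Same talent (18 points per 36), two different roles.
bench_role = project_points(18.0, projected_mpg=24.0)
starter_role = project_points(18.0, projected_mpg=34.0)
```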
Layer 04
Lineup Context
Role in starting vs bench units, lineup net rating with the player on/off, positional scarcity within the team roster.
on_court_net_rtg, lineup_usage_share, starter_probability, positional_scarcity
Layer 05
Injury Ripple Effects
How teammate injuries affect each player's opportunity. Built with a separate injury ripple model: when a star is injured, role players see opportunity gains.
teammate_injury_score, opportunity_bump, injury_ripple_pts, star_absence_days
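The ripple idea can be sketched with a simple heuristic: hand the injured player's usage share to the remaining teammates in proportion to their existing usage. This is not the separate injury ripple model mentioned above, only a stand-in to show the shape of the signal. The function name and the pro rata rule are assumptions.

```python
from typing import Dict


def opportunity_bump(usage_by_player: Dict[str, float], injured: str) -> Dict[str, float]:
    """Redistribute the injured player's usage share to teammates, pro rata."""
    freed = usage_by_player.get(injured, 0.0)
    remaining = {p: u for p, u in usage_by_player.items() if p != injured}
    total = sum(remaining.values())
    if total <= 0:
        return {p: 0.0 for p in remaining}
    # Each teammate's bump is proportional to their current share of the rest.
    return {p: freed * u / total for p, u in remaining.items()}


usage = {"star": 0.30, "guard": 0.35, "wing": 0.20, "big": 0.15}
bumps = opportunity_bump(usage, injured="star")
```

A learned model would presumably account for position and role overlap. The pro rata split ignores both.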
Layer 06
Team & Coach Context
Team pace, offensive system, coach tendencies, three-point rate, playoff pressure. Some systems produce more points; others produce more assists.
team_off_rating, coach_3pt_emphasis, system_archetype, playoff_urgency_score
Layer 07
Opponent Matchups
Projected opponent difficulty, opponent defensive rating by position, strength of schedule for the projection window.
opp_def_rtg, matchup_difficulty, sos_next_30d, position_def_rank
Layer 08
Schedule Effects
Back-to-backs, rest days, travel load, home/away split, density of the next 30-game window.
rest_days, back_to_back_pct, travel_miles, home_game_pct
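Several schedule features fall out of a plain list of game dates. The sketch below assumes a schedule given as `(game_date, is_home)` pairs sorted by date. `avg_rest_days` is a hypothetical aggregate of the per-game `rest_days` value, and travel load is omitted because it needs arena locations.

```python
from datetime import date
from typing import Dict, List, Tuple


def schedule_features(games: List[Tuple[date, bool]]) -> Dict[str, float]:
    """Summarise rest, back-to-backs, and home share for a window of games."""
    if not games:
        return {"avg_rest_days": 0.0, "back_to_back_pct": 0.0, "home_game_pct": 0.0}
    dates = [d for d, _ in games]
    # Rest = full days off between consecutive games (0 means a back-to-back).
    rests = [(b - a).days - 1 for a, b in zip(dates, dates[1:])]
    second_nights = sum(1 for r in rests if r == 0)
    return {
        "avg_rest_days": sum(rests) / len(rests) if rests else 0.0,
        # Share of games played on the second night of a back-to-back.
        "back_to_back_pct": second_nights / len(games),
        "home_game_pct": sum(1 for _, home in games if home) / len(games),
    }


window = [
    (date(2025, 1, 1), True),
    (date(2025, 1, 2), False),
    (date(2025, 1, 4), True),
    (date(2025, 1, 7), False),
]
sched = schedule_features(window)
```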
Layer 09
Age Curves
Player age relative to position-specific peak years, career trajectory stage, age-adjusted decay rates. Separate curves per position.
age_vs_position_peak, career_stage, age_decay_rate, prime_years_remaining
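A minimal sketch of position-relative age features. The peak age and end-of-prime age are passed in as parameters because the real position-specific values are not given here. The numbers in the usage lines are placeholders, not empirical estimates, and the stage labels are hypothetical.

```python
from typing import Dict, Union


def age_features(age: float, peak_age: float, prime_end: float) -> Dict[str, Union[float, str]]:
    """Express a player's age relative to a position-specific peak window."""
    if age < peak_age:
        stage = "pre_peak"
    elif age <= prime_end:
        stage = "prime"
    else:
        stage = "post_peak"
    return {
        "age_vs_position_peak": age - peak_age,  # negative means still developing
        "career_stage": stage,
        "prime_years_remaining": max(0.0, prime_end - age),
    }


# Placeholder curve parameters (assumed for illustration only).
young = age_features(age=24, peak_age=27, prime_end=30)
veteran = age_features(age=31, peak_age=27, prime_end=30)
```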
Layer 10
Volatility Modelling
Historical stat variance, game-to-game consistency scores, boom/bust probability. High-variance players get wider confidence intervals.
pts_std_dev_10g, consistency_score, boom_bust_flag, volatility_tier
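The link between variance and interval width can be sketched as follows. The sample standard deviation over the last 10 games matches the `pts_std_dev_10g` name. The `consistency_score` formula and the normal-approximation interval for a single-game outcome are assumptions made for illustration.

```python
from statistics import mean, stdev
from typing import Dict, List, Optional, Tuple


def volatility_features(points_by_game: List[float], window: int = 10) -> Optional[Dict[str, float]]:
    """Sample standard deviation and a simple consistency score over recent games."""
    recent = points_by_game[-window:]
    if len(recent) < 2:
        return None  # stdev needs at least two observations
    avg, sd = mean(recent), stdev(recent)
    cv = sd / avg if avg > 0 else 0.0  # coefficient of variation
    # Hypothetical score: 1.0 for perfectly steady output, falling toward 0 as cv grows.
    return {f"pts_std_dev_{window}g": sd, "consistency_score": 1.0 / (1.0 + cv)}


def projection_interval(projection: float, std_dev: float, z: float = 1.96) -> Tuple[float, float]:
    """Symmetric interval around a projection; wider when std_dev is larger."""
    return (projection - z * std_dev, projection + z * std_dev)


steady = volatility_features([20.0] * 10)
streaky = volatility_features([10.0, 30.0] * 5)
steady_ci = projection_interval(20.0, steady["pts_std_dev_10g"])
streaky_ci = projection_interval(20.0, streaky["pts_std_dev_10g"])
```

Both players average 20 points here, but only the streaky one gets a non-trivial interval.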
Layer 11
Market Signals
ADP, expert consensus ranks, waiver wire velocity, ownership trends. Stubbed for Season 1; to be incorporated once market data is live.
adp_rank, our_rank_vs_adp, waiver_add_velocity, expert_consensus
Layer 12
Archetype Embeddings
Player archetype from clustering: Heliocentric Creator, Elite Wing Defender, 3-and-D Specialist, etc. Peer cohort averages used as priors.
archetype_label, archetype_peer_avg_pts, archetype_peer_avg_age, archetype_cluster_id
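One common way to use a cohort average as a prior is shrinkage: blend a player's own average with the peer average, weighting the player's data more as games accumulate. The sketch below assumes that approach. The pipeline may use peer averages differently, and `prior_strength` is a hypothetical tuning parameter.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def peer_averages(players: List[Tuple[str, float]]) -> Dict[str, float]:
    """Average points per archetype label, from (archetype_label, pts) pairs."""
    by_label: Dict[str, List[float]] = defaultdict(list)
    for label, pts in players:
        by_label[label].append(pts)
    return {label: mean(vals) for label, vals in by_label.items()}


def shrink_to_prior(player_avg: float, games_played: int,
                    peer_avg: float, prior_strength: float = 20.0) -> float:
    """Weighted blend: the prior counts as `prior_strength` pseudo-games."""
    return (games_played * player_avg + prior_strength * peer_avg) / (
        games_played + prior_strength
    )


cohort = [
    ("3-and-D Specialist", 10.0),
    ("3-and-D Specialist", 12.0),
    ("Heliocentric Creator", 28.0),
]
priors = peer_averages(cohort)
estimate = shrink_to_prior(20.0, games_played=20, peer_avg=priors["3-and-D Specialist"])
```

With no games played the estimate equals the peer average, and it converges to the player's own average as the sample grows.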