About This Project
A Python framework for registering, running, and comparing stock return prediction models: 24 algorithms across 7 model families. It supports multi-horizon forecasting (1, 5, and 21 days), classification and regression targets, walk-forward cross-validation, SHAP feature importance, macroeconomic feature enrichment via the FRED API, automated strategy optimisation, portfolio construction with four allocators, and realistic backtesting with transaction costs.
Project Details
Stock Model Tester is a personal research framework I built to systematically test and compare machine-learning approaches to stock return prediction. The motivation came from wanting a rigorous, reproducible way to evaluate whether any given model actually adds predictive value over simple baselines, and to do so without the usual pitfalls of data leakage and look-ahead bias.

Architecture (v2)

The framework is built around a registry-everywhere pattern: every extensible category (data loaders, feature transforms, models, metrics, allocators, cost models) uses the same @register_* decorator, so adding a new component never requires changes to existing code (a minimal sketch of the pattern follows at the end of this section). A strict dependency DAG enforces that no lower layer imports from a layer above it, keeping each module independently testable. Results are stored through a ResultsStore interface, so the backend (currently file-based) can be swapped out in a future version without touching strategy code.

Models (24 total)

The 24 registered models span seven families:

- Baselines: NaiveLastValue, RollingMean, HistoricalMean (the Goyal-Welch prevailing-mean benchmark)
- Linear / Regularised: Ridge, Lasso, ElasticNet
- Tree & Kernel Regression: XGBoost, LightGBM, CatBoost, RandomForest, SVR
- Classification: XGBoostClassifier, LightGBMClassifier, RandomForestClassifier, LogisticRegressionClassifier
- Classical Time Series: ARIMA, SARIMA, ETS, GARCH, MonteCarlo (geometric Brownian motion), OrnsteinUhlenbeck
- Regime Models: HMM (3-state Gaussian), MarkovSwitching AR(1)
- Ensemble: equal-weight forecast combination across all models

PyTorch-based models (LSTM, Transformer, TCN) are written but disabled due to an OpenMP conflict on macOS ARM; they can be re-enabled on a CUDA Linux system.

Evaluation & Walk-Forward Validation

All evaluation is strictly out-of-sample. Walk-forward cross-validation (expanding or rolling window) refits the scaler on training data only before each fold, and predictions are aggregated across folds before any metric is computed (see the walk-forward sketch below). The step size adjusts automatically so that test windows do not overlap at longer horizons. Metrics include RMSE, OOS R² (Campbell & Thompson, 2008), directional accuracy, Sharpe, Calmar, Rank IC, and maximum drawdown for regression targets, and AUC-ROC, log loss, Brier score, and F1 for classification targets.

Strategy Layer

Beyond single-model evaluation, the framework includes a full strategy layer:

- Model selection: walk-forward OOS model selection per ticker, writing a strategy_recommendation.yaml that can be passed directly to the run command.
- Portfolio construction: four registered allocators (equal weight, signal-weighted, minimum variance with Ledoit-Wolf shrinkage, and maximum-Sharpe tangency portfolio); the minimum-variance allocator is sketched below.
- Backtesting: realistic simulation in which transaction costs (configurable in basis points) are applied only on trade days, i.e. when the position sign changes, producing gross and net equity curves with associated metrics (see the cost sketch below).

Hyperparameter Tuning

Optuna integration enables automated hyperparameter search with a walk-forward OOS R² objective (sketched below). Studies are persisted to SQLite and can be resumed across sessions. Best parameters are written to ticker-specific config overrides that are deep-merged on top of base configs at runtime.

Macroeconomic Features (FRED API)

The macro_fred transform enriches the feature set with seven FRED series: the 10-year Treasury yield, the 3-month T-bill rate, the term spread, BAA and AAA corporate yields, the default spread, and CPI inflation. Responses are cached locally with a 24-hour TTL, and publication lags are enforced per series. If the API key is absent or FRED is unavailable, the transform logs a warning and the pipeline continues without macro columns.
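To make the registry-everywhere pattern concrete, here is a minimal sketch of how such a decorator factory can work. All names here (make_registry, register_model, MODELS) are illustrative, not the project's actual API:

```python
from typing import Callable, Dict, Tuple

def make_registry(kind: str) -> Tuple[Dict[str, type], Callable]:
    """Build a (registry, decorator) pair for one extensible category."""
    registry: Dict[str, type] = {}

    def register(name: str) -> Callable[[type], type]:
        def decorator(cls: type) -> type:
            if name in registry:
                raise KeyError(f"{kind} '{name}' is already registered")
            registry[name] = cls  # new components plug in without edits elsewhere
            return cls
        return decorator

    return registry, register

MODELS, register_model = make_registry("model")

@register_model("naive_last_value")
class NaiveLastValue:
    def fit(self, X, y): return self
    def predict(self, X): ...

model_cls = MODELS["naive_last_value"]  # components are looked up by name
```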
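The walk-forward loop and the Campbell-Thompson OOS R² fit in a few lines. This is a simplified sketch, assuming numpy arrays, an expanding window, and a Ridge stand-in model; the step equals the horizon so test windows never overlap, and the scaler is refit on each training slice only:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge

def walk_forward_oos_r2(X, y, initial_train=500, step=21, horizon=21):
    preds, bench, actual = [], [], []
    start = initial_train
    while start + horizon <= len(y):
        tr, te = slice(0, start), slice(start, start + horizon)
        scaler = StandardScaler().fit(X[tr])          # fit on training rows only
        model = Ridge().fit(scaler.transform(X[tr]), y[tr])
        preds.append(model.predict(scaler.transform(X[te])))
        bench.append(np.full(horizon, y[tr].mean()))  # prevailing-mean benchmark
        actual.append(y[te])
        start += step                                 # step == horizon -> no overlap
    preds, bench, actual = map(np.concatenate, (preds, bench, actual))
    # Campbell-Thompson OOS R^2: 1 - SSE_model / SSE_benchmark
    return 1 - np.sum((actual - preds) ** 2) / np.sum((actual - bench) ** 2)
```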
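For the minimum-variance allocator, the unconstrained closed-form solution w ∝ Σ⁻¹1 with a Ledoit-Wolf shrunk covariance looks roughly like this (the real allocator's constraint handling may differ):

```python
import numpy as np
from sklearn.covariance import LedoitWolf

def min_variance_weights(returns: np.ndarray) -> np.ndarray:
    """returns: (T, N) array of daily asset returns."""
    cov = LedoitWolf().fit(returns).covariance_  # shrunk covariance estimate
    inv = np.linalg.pinv(cov)
    ones = np.ones(cov.shape[0])
    return inv @ ones / (ones @ inv @ ones)      # weights sum to 1
```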
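The trade-day cost rule is easy to state in code: charge the basis-point cost only on days where the position sign flips. A minimal sketch with illustrative names:

```python
import numpy as np

def net_returns(positions: np.ndarray, asset_returns: np.ndarray,
                cost_bps: float = 5.0) -> np.ndarray:
    """Daily strategy returns net of costs charged only on trade days."""
    gross = positions * asset_returns
    prev = np.concatenate(([0.0], positions[:-1]))
    trade_day = np.sign(positions) != np.sign(prev)   # sign change => trade
    costs = np.where(trade_day, cost_bps / 1e4, 0.0)
    return gross - costs
```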
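The Optuna wiring for resumable, SQLite-backed studies is standard; the search space and the walk_forward_oos_r2_for helper below are hypothetical stand-ins for the framework's real objective:

```python
import optuna

def objective(trial: optuna.Trial) -> float:
    params = {
        "max_depth": trial.suggest_int("max_depth", 2, 8),
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
    }
    return walk_forward_oos_r2_for(params)  # hypothetical: OOS R^2 under these params

study = optuna.create_study(
    study_name="AAPL_xgboost",              # illustrative study name
    storage="sqlite:///optuna_studies.db",  # persisted; re-running resumes the study
    load_if_exists=True,
    direction="maximize",                   # maximise walk-forward OOS R^2
)
study.optimize(objective, n_trials=100)
```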
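Deep-merging ticker-specific overrides onto base configs amounts to a small recursive dictionary merge, along these lines:

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Return base with override applied; nested dicts merge key-by-key."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)  # recurse into nested sections
        else:
            merged[key] = value  # override wins for scalars and lists
    return merged

# e.g. deep_merge(base_cfg, {"model": {"params": {"max_depth": 4}}})
# changes only model.params.max_depth, leaving the rest of base_cfg intact.
```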
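A hedged sketch of the macro_fred fetch path, assuming the fredapi client: responses are cached to disk with a 24-hour TTL, and a missing key or network failure degrades gracefully to an empty series. Paths and names are illustrative:

```python
import logging, os, time
import pandas as pd
from fredapi import Fred

CACHE_DIR, TTL_SECONDS = "cache/fred", 24 * 3600
log = logging.getLogger("macro_fred")

def fetch_series(series_id: str) -> pd.Series:
    path = os.path.join(CACHE_DIR, f"{series_id}.parquet")
    if os.path.exists(path) and time.time() - os.path.getmtime(path) < TTL_SECONDS:
        return pd.read_parquet(path).iloc[:, 0]      # fresh cache hit
    api_key = os.environ.get("FRED_API_KEY")
    if not api_key:
        log.warning("FRED_API_KEY not set; continuing without macro columns")
        return pd.Series(dtype=float)
    try:
        series = Fred(api_key=api_key).get_series(series_id)  # e.g. "DGS10"
    except Exception:
        log.warning("FRED unavailable; continuing without macro columns")
        return pd.Series(dtype=float)
    os.makedirs(CACHE_DIR, exist_ok=True)
    series.to_frame(series_id).to_parquet(path)      # refresh the local cache
    return series
```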
Testing

The project has 89 tests covering the baseline models, all 15 metrics (including the classification metrics), feature-pipeline shapes and no-leakage guarantees (see the sketch below), full-pipeline smoke tests for 4 models, config validation, the CLI commands, registry completeness, DAG enforcement, and the v1/v2 results-store interface.
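As an example of the no-leakage guarantees, a test in this style checks that tampering with future rows cannot change earlier walk-forward predictions; fit_predict_walk_forward is a hypothetical stand-in for the framework's pipeline entry point:

```python
import numpy as np

def test_future_rows_cannot_change_past_predictions():
    rng = np.random.default_rng(0)
    X, y = rng.normal(size=(300, 5)), rng.normal(size=300)
    base = fit_predict_walk_forward(X, y)             # hypothetical helper
    X2, y2 = X.copy(), y.copy()
    X2[250:] = rng.normal(size=(50, 5))               # tamper with the future
    y2[250:] = rng.normal(size=50)
    tampered = fit_predict_walk_forward(X2, y2)
    # predictions made before index 250 must be identical
    np.testing.assert_allclose(base[:250], tampered[:250])
```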
Technologies
Project Info
Category
Machine Learning
Timeframe
May 2026 - Present