Portfolio

My Projects

Data science, machine learning, and AI projects built across my studies and internships

Stock Market Prediction & Evaluation Framework
Machine Learning

May 2026 - Present

Active

A Python framework for registering, running, and comparing stock return prediction models across 24 algorithms and 7 model families. Supports multi-horizon forecasting (1, 5, and 21 days), classification and regression targets, walk-forward cross-validation, SHAP feature importance, macroeconomic feature enrichment via the FRED API, automated strategy optimisation, portfolio construction with four allocators, and realistic backtesting with transaction costs.
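
The walk-forward cross-validation mentioned above can be illustrated with a minimal sketch. This is not the framework's actual code: the function name `walk_forward_splits`, its parameters, and the expanding-window design are assumptions made for illustration. The `gap` parameter is one hypothetical way to keep a multi-day target (for example the 5-day horizon) from leaking into the training rows.

```python
from typing import Iterator, List, Tuple


def walk_forward_splits(
    n_samples: int, n_folds: int, test_size: int, gap: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs with an expanding training window.

    Every test block lies strictly after its training rows. `gap` rows are
    dropped between the two so overlapping forward-looking targets cannot
    leak from the test period into training.
    """
    first_test_start = n_samples - n_folds * test_size
    if first_test_start - gap <= 0:
        raise ValueError("not enough samples for the requested folds")
    for fold in range(n_folds):
        test_start = first_test_start + fold * test_size
        train_idx = list(range(0, test_start - gap))
        test_idx = list(range(test_start, test_start + test_size))
        yield train_idx, test_idx


# Hypothetical usage: 100 daily rows, 3 folds of 10 days, 5-day gap.
splits = list(walk_forward_splits(n_samples=100, n_folds=3, test_size=10, gap=5))
```

Each fold trains only on data that precedes its test block, which is the property that separates walk-forward evaluation from shuffled k-fold on time series.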

Python, XGBoost, LightGBM, +9 more

Project NoCap: AI-Powered Fact-Checking for Instagram
LLMs & Prompt Engineering

September 2024 - Present

An AI-powered fact-checking assistant for Instagram that helps users quickly assess the credibility of posts and reels. Users forward content to the @project_nocap account and receive an automated analysis that flags potential misinformation and bias and links to more reliable sources, making it easier to navigate the information overload on social media.

Python, LLMs, Prompt Engineering, +2 more

Machine Learning Analysis of Diabetes-Related Health Outcomes (University of London Coursework)
Machine Learning

September 2025 - April 2026

A machine learning coursework project for ST3189 (Machine Learning) at the University of London, applying unsupervised learning, classification, and regression to the 2024 CDC Behavioral Risk Factor Surveillance System (BRFSS), a telephone survey of 457,670 US adults. PCA and K-means clustering reveal interpretable health dimensions and identify a high-risk subgroup with 55% diabetes prevalence. Seven classifiers achieve AUC scores of 0.75-0.81 for predicting diabetes status without clinical tests, and gradient boosting predicts physical health burden among confirmed diabetics with R-squared = 0.45.

R, PCA, K-Means Clustering, +10 more

Grandma vs. Data Scientist Student: Information-Theoretic Wordle Solver
Algorithms & Optimization

April 2025 - Present

A Wordle solver that uses information theory to play the New York Times Wordle game. It treats each guess as an information-gathering step and selects the word that maximizes expected information gain, which keeps the number of guesses needed to find the answer low. I originally built it to compete playfully with my grandmother, a retired English professor and lifelong word-game enthusiast. It has become a fun way for us to connect, compare strategies, and talk about language from two very different perspectives: hers as a human expert in words and mine as a data science student building algorithms.
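
The guess-selection idea can be sketched as follows. This is a minimal illustration rather than the solver's actual code: the function names (`feedback`, `expected_information`, `best_guess`) and the tiny word list are hypothetical, and every remaining candidate is assumed to be equally likely. The expected information of a guess is the Shannon entropy of the feedback patterns it would produce across the remaining candidates.

```python
import math
from collections import Counter
from typing import Iterable, Tuple


def feedback(guess: str, answer: str) -> Tuple[int, ...]:
    """Wordle feedback per letter: 2 = green, 1 = yellow, 0 = grey."""
    result = [0] * len(guess)
    unmatched = Counter()  # answer letters not consumed by a green match
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = 2
        else:
            unmatched[a] += 1
    for i, g in enumerate(guess):
        # A letter turns yellow only while unmatched copies remain.
        if result[i] == 0 and unmatched[g] > 0:
            result[i] = 1
            unmatched[g] -= 1
    return tuple(result)


def expected_information(guess: str, candidates: Iterable[str]) -> float:
    """Entropy in bits of the feedback distribution over the candidates."""
    counts = Counter(feedback(guess, answer) for answer in candidates)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def best_guess(guesses: Iterable[str], candidates: Iterable[str]) -> str:
    """Pick the guess whose feedback is expected to be most informative."""
    candidates = list(candidates)
    return max(guesses, key=lambda g: expected_information(g, candidates))


# Hypothetical four-word candidate list, for illustration only.
words = ["cigar", "rebut", "sissy", "humph"]
choice = best_guess(words, words)
```

A guess that splits the candidates into many small, evenly sized feedback groups scores highest, because whichever pattern comes back leaves few words standing.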

Python, NumPy, Selenium, +4 more

Handwritten Digit Recognition with Neural Networks
Machine Learning

December 2023 - January 2024

A neural network for handwritten digit recognition, implemented from scratch. It demonstrates the core concepts of feedforward neural networks, backpropagation, and gradient descent without relying on high-level deep learning frameworks. An interactive graphical user interface lets users draw digits on a canvas and receive real-time predictions from the trained model, making the project both an educational tool and a practical demonstration of what a neural network can do.
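
The three ideas named above (feedforward pass, backpropagation, gradient descent) can be shown in a few lines. The sketch below is not the project's code and is far smaller than a digit recogniser: it assumes one hidden layer, a single sigmoid output, squared-error loss, and a single made-up training example, and all names in it are hypothetical.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def forward(x: Vector, w1: Matrix, w2: Vector) -> Tuple[Vector, float]:
    """Feedforward pass: return (hidden activations, network output)."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w1]
    output = sigmoid(sum(w * h for w, h in zip(w2, hidden)))
    return hidden, output


def train_step(x: Vector, target: float, w1: Matrix, w2: Vector, lr: float) -> float:
    """One gradient-descent update in place; returns the loss before the update."""
    hidden, output = forward(x, w1, w2)
    loss = 0.5 * (output - target) ** 2
    # Backpropagation: apply the chain rule from the output back to each layer.
    delta_out = (output - target) * output * (1.0 - output)
    delta_hidden = [delta_out * w2[j] * h * (1.0 - h) for j, h in enumerate(hidden)]
    # Gradient descent: move each weight against its gradient.
    for j in range(len(w2)):
        w2[j] -= lr * delta_out * hidden[j]
    for j, row in enumerate(w1):
        for i in range(len(row)):
            row[i] -= lr * delta_hidden[j] * x[i]
    return loss


# Made-up example: 3 inputs (the last acts as a bias term), 4 hidden units.
rng = random.Random(0)
w1 = [[rng.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(4)]
w2 = [rng.uniform(-0.5, 0.5) for _ in range(4)]
x, target = [0.5, -0.2, 1.0], 1.0
losses = [train_step(x, target, w1, w2, lr=0.5) for _ in range(200)]
```

A digit recogniser would apply the same update rule to image-sized inputs, ten output classes, and many training examples.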

Python, Matplotlib, NumPy, +3 more

Programming for Data Science Coursework: MCMC Algorithms & Flight Data Analysis (University of London)
Statistical Computing & Data Analysis

September 2024 - April 2025

A comprehensive statistical computing project completed for ST2195 (Programming for Data Science) at the University of London, consisting of two parts: (1) implementation and analysis of the Metropolis-Hastings MCMC algorithm for simulating random numbers from a Laplace distribution, and (2) analysis of commercial flight data from the 2009 ASA Statistical Computing and Graphics Data Expo. The project demonstrates proficiency in both R and Python, covering topics from Bayesian statistics and convergence diagnostics to logistic regression modeling and large-scale data analysis.
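
Part (1) can be illustrated with a short random-walk Metropolis-Hastings sampler. This is a minimal sketch, not the coursework submission: it assumes the standard Laplace target f(x) = 0.5 * exp(-|x|), a normal proposal centred on the current state, and hypothetical names and defaults (`metropolis_hastings`, `step_size`, `seed`).

```python
import math
import random
from typing import List


def log_laplace_density(x: float) -> float:
    """Log of the standard Laplace density f(x) = 0.5 * exp(-|x|)."""
    return math.log(0.5) - abs(x)


def metropolis_hastings(
    n_samples: int, step_size: float = 1.0, x0: float = 0.0, seed: int = 0
) -> List[float]:
    """Random-walk Metropolis-Hastings targeting the standard Laplace."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step_size)
        # The normal proposal is symmetric, so the acceptance ratio
        # reduces to the ratio of target densities.
        log_ratio = log_laplace_density(proposal) - log_laplace_density(x)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = proposal
        samples.append(x)  # a rejected proposal repeats the current state
    return samples


samples = metropolis_hastings(n_samples=20_000)
sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / len(samples)
```

For the standard Laplace the true mean is 0 and the true variance is 2, so comparing `sample_mean` and `sample_var` against those values gives a quick sanity check before running formal convergence diagnostics.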

Python, R, Pandas, +2 more

Interested in collaborating?

I'm always open to discussing new projects, creative ideas, or opportunities to be part of your vision.

Get In Touch