Rehema Kemunto

My work spans behavioral economics, operational research, and product analytics. I've tested pricing psychology across 185,000+ transactions and quantified weather's impact on transit demand, which fed a scheduling model that saved £23K/month.

Technical Skills

Hypothesis Testing
Bayesian Inference
Regression Analysis
Python
R
SQL
Power BI
Looker Studio
Root Cause Analysis
Django

Projects

A record of the questions I asked, the methods I used, and what I actually found.

Beyond the Treadmill: A Multivariate Analysis of Exercise and Mental Health

Does exercise actually help depression, or are we missing the bigger picture in the data?

Finding: Exercise is a statistically significant predictor of depression (p = 0.014), but on its own it explains only 0.25% of the variance. A whole-person model showed that Age and BMI are stronger predictors. Clustering exposed a High-Risk group that reports the highest exercise levels yet shows 67.5% clinical depression prevalence. Three categorical hypothesis tests in a row found nothing. That was not a dead end: it was evidence that bucketing continuous data throws away signal.

Impact: Demolished the simplistic "exercise equals happiness" narrative with real-world NHANES data (n = 2,004). The methodological finding, that forcing continuous data into categories destroys statistical signal, is the most transferable result.

R, K-Means Clustering, Multiple Regression, Hypothesis Testing
View on GitHub

Feature Redundancy in Breast Cancer Diagnostics

Clinical labs record 30 features per tumour sample. Are all 30 doing independent work, or is the dataset carrying a lot of redundant weight?

Finding: Radius, perimeter, and area are near-perfectly correlated (r ≥ 0.98) because they are geometrically linked. Measuring all three is, statistically, measuring one thing three times. PCA showed that 10 components retain 95% of the variance. Welch's t-tests ranked all 30 features by diagnostic power: the "worst concave points" feature led by a wide margin (t = 29.12), while fractal dimension and symmetry failed Bonferroni correction entirely.

Impact: Redundancy in clinical measurement inflates perceived complexity and can obscure which features actually drive diagnosis. This analysis identifies exactly which features earn their place and which don't.

Python, PCA, Welch's t-test, Bonferroni Correction, Dimensionality Reduction
View on GitHub

Weather Impact Analysis: Hypothesis Testing

How much does weather actually affect UK public transit demand?

Finding: Weather accounts for 38.7% of demand variance across 17,400+ hourly observations. The pattern is consistent and predictable enough to schedule around.

Impact: Turned a vague operational problem into a staffing model. Dynamic scheduling based on weather forecasts saved £23K/month. Rigorous assumption-checking is what made the result trustworthy enough to act on.

Python, Welch's t-test, Hypothesis Testing, Operations Research
View on GitHub
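This project lists Welch's t-test among its methods. As a minimal sketch of that test, the function below computes the Welch t statistic and the Welch-Satterthwaite degrees of freedom using only the standard library. The ridership figures are hypothetical placeholders, not the project's data, and the function name `welch_t` is illustrative.

```python
import math
import statistics


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test on two samples."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    # Sample variance of each group divided by its size: squared standard errors.
    se2_a = statistics.variance(a) / len(a)
    se2_b = statistics.variance(b) / len(b)
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


# Hypothetical hourly ridership counts (illustrative only).
dry_hours = [520, 498, 535, 510, 542, 505]
wet_hours = [430, 395, 470, 410, 388, 455, 402]

t_stat, dof = welch_t(dry_hours, wet_hours)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

Unlike Student's t-test, this version does not assume the two groups share a variance, which suits comparisons where one condition is noisier than the other.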

Left-Digit Bias Audit: Pricing Psychology

Does X.99 pricing actually work? I analyzed 185,000+ transactions to find out.

Finding: It works for some products, not others. Phones showed a +5.16% sales lift with X.99 pricing (p < 0.001). Laptops showed no detectable effect. Pricing psychology is category-specific, not a universal lever.

Impact: Retailers applying charm pricing uniformly are leaving money on the table in some categories and wasting it in others. This analysis tells you exactly which is which.

Python, Mann-Whitney U Test, Non-Parametric Statistics, Behavioral Economics
View on GitHub
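For readers unfamiliar with the Mann-Whitney U test named above, here is a minimal standard-library sketch. It is not the project's code: the sales figures are hypothetical, the helper names are illustrative, and the p-value uses a normal approximation without a tie correction, so it is only a rough guide for small or heavily tied samples.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """Return (U for sample a, two-sided p-value via normal approximation)."""
    ranks = average_ranks(list(a) + list(b))
    n_a, n_b = len(a), len(b)
    u_a = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)  # no tie correction
    z = (u_a - mean_u) / sd_u
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return u_a, p_two_sided


# Hypothetical daily unit sales under each price ending (illustrative only).
charm_price_sales = [34, 41, 38, 45, 39, 42, 37, 44]
round_price_sales = [31, 36, 33, 35, 30, 38, 32, 34]

u_stat, p_value = mann_whitney_u(charm_price_sales, round_price_sales)
print(f"U = {u_stat}, p = {p_value:.4f}")
```

The test compares ranks, not raw values, so it does not assume normally distributed sales, which is the usual reason for choosing it over a t-test on skewed transaction data.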

Bayesian A/B Testing: E-Commerce Optimization

Does reducing product density improve conversions? And how confident can I be in the answer?

Finding: 98% probability that lower product density improves conversion rates. The sensitivity analysis held across multiple priors, meaning the result is not fragile to starting assumptions.

Impact: A frequentist test tells you whether a result is statistically significant. The Bayesian analysis tells you the probability that one variant is actually better than the other, which is the quantity a real-time business decision needs.

Python, Bayesian Statistics, Beta-Binomial Model, Monte Carlo Simulation
View on GitHub
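A Beta-Binomial comparison with a prior sensitivity check can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the conversion counts, the three priors, the draw count, and the function name `prob_b_beats_a` are all hypothetical.

```python
import random


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, prior=(1.0, 1.0), draws=20000, seed=0):
    """Monte Carlo estimate of P(rate_b > rate_a) under Beta posteriors."""
    rng = random.Random(seed)
    alpha0, beta0 = prior
    wins = 0
    for _ in range(draws):
        # Beta prior + binomial data gives a Beta posterior (conjugacy).
        rate_a = rng.betavariate(alpha0 + conv_a, beta0 + n_a - conv_a)
        rate_b = rng.betavariate(alpha0 + conv_b, beta0 + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws


# Hypothetical counts: (conversions, visitors) for each page layout.
high_density = (200, 5000)
low_density = (245, 5000)

# Sensitivity check: repeat the estimate under several illustrative priors.
priors = {"uniform": (1.0, 1.0), "Jeffreys": (0.5, 0.5), "skeptical": (5.0, 95.0)}
for name, prior in priors.items():
    p = prob_b_beats_a(*high_density, *low_density, prior=prior)
    print(f"{name:>9} prior: P(low density converts better) = {p:.3f}")
```

If the estimated probability barely moves across reasonable priors, the conclusion is being driven by the data and not by the starting assumptions.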

Retail Operations and Finance Optimization

Where are the bottlenecks in this $66.31M operation? Why are fulfillment times inconsistent?

Finding: Specific warehouses were responsible for 40% of delays. This was not a systemic failure across the operation but an identifiable root cause, isolated through segmentation.

Impact: Reduced manual data processing by 60%. More importantly, the question changed from "why are things slow?" to "warehouse X has a process problem, here is what needs to change."

SQL, Power BI, Google Apps Script, Root Cause Analysis
View on GitHub

Education

BSc in Mathematics

University of Nairobi

Let's Connect