<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Arun Murali</title><link>https://thearunmurali.com/</link><description>Recent content on Arun Murali</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Sun, 31 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://thearunmurali.com/index.xml" rel="self" type="application/rss+xml"/><item><title>I Ditched Obsidian Sync and Built My Own — Here's What Actually Happened</title><link>https://thearunmurali.com/post/2026/05/31/i-ditched-obsidian-sync-and-built-my-own-heres-what-actually-happened/</link><pubDate>Sun, 31 May 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2026/05/31/i-ditched-obsidian-sync-and-built-my-own-heres-what-actually-happened/</guid><description>&lt;h2 id="the-96year-problem"&gt;The $96/Year Problem&lt;/h2&gt;
&lt;p&gt;Obsidian Sync costs $8/month. That&amp;rsquo;s $96/year to sync markdown files — plain text — across your devices.&lt;/p&gt;
&lt;p&gt;I already pay for a VPS. I already run Docker. And I was already staring at a bill for a note-syncing service that, fundamentally, moves &lt;code&gt;.md&lt;/code&gt; files between computers.&lt;/p&gt;
&lt;p&gt;So I asked myself: &lt;em&gt;how hard could this be?&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The answer: harder than it should be. But completely worth it.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="my-first-attempt-was-embarrassingly-wrong"&gt;My First Attempt Was Embarrassingly Wrong&lt;/h2&gt;
&lt;p&gt;My first instinct was to run &lt;strong&gt;Obsidian itself&lt;/strong&gt; in a Docker container on the VPS and access it through a browser. No local app needed — just open a tab, take notes.&lt;/p&gt;</description></item><item><title>I Just Wanted to Review My Chess Games. I Built a Multiplayer App Instead.</title><link>https://thearunmurali.com/post/2026/05/31/i-just-wanted-to-review-my-chess-games.-i-built-a-multiplayer-app-instead./</link><pubDate>Sun, 31 May 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2026/05/31/i-just-wanted-to-review-my-chess-games.-i-built-a-multiplayer-app-instead./</guid><description>&lt;h2 id="it-started-with-a-simple-frustration"&gt;It Started With a Simple Frustration&lt;/h2&gt;
&lt;p&gt;I lose a chess game. I want to know why.&lt;/p&gt;
&lt;p&gt;Lichess and Chess.com both have analysis boards, but they&amp;rsquo;re cluttered, slow to load, and I don&amp;rsquo;t control my data. I just want to paste a PGN — the standard text format chess games are saved in — and step through my moves on a clean board.&lt;/p&gt;
&lt;p&gt;That&amp;rsquo;s it. That&amp;rsquo;s the whole requirement.&lt;/p&gt;
&lt;p&gt;What I shipped six weeks later: a retro terminal-style chess app with 3D WebGPU rendering, a Stockfish bot at three difficulty levels, real-time online multiplayer, JWT authentication, a PostgreSQL database tracking win/loss stats, and an AI lesson generator powered by LiteLLM.&lt;/p&gt;</description></item><item><title>Building a Couple Outfit Configurator with Layered SVG Avatars in React</title><link>https://thearunmurali.com/post/2026/03/06/building-a-couple-outfit-configurator-with-layered-svg-avatars-in-react/</link><pubDate>Fri, 06 Mar 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2026/03/06/building-a-couple-outfit-configurator-with-layered-svg-avatars-in-react/</guid><description>&lt;p&gt;A browser-based side-by-side bride and groom outfit configurator, built as a proof-of-concept to validate layered SVG avatar rendering, outfit switching, and coordinated couple palette application.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Live Demo:&lt;/strong&gt; &lt;a href="https://outfits.anmious.cloud"&gt;outfits.anmious.cloud&lt;/a&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="-what-it-does"&gt;🎯 What It Does&lt;/h2&gt;
&lt;p&gt;The configurator lets you customize two avatars simultaneously:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Skin tone&lt;/strong&gt; — light, medium, dark&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Body type&lt;/strong&gt; — petite to plus-size (bride), lean to stocky (groom)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Height&lt;/strong&gt; — short through very tall, with realistic proportions&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Hair style and color&lt;/strong&gt; — updo, short bob, long straight (bride); multiple buzz cuts and spikes (groom)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Outfits&lt;/strong&gt; — Western, Indian, and Casual categories per avatar&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Glasses toggle&lt;/strong&gt; — overlaid frame layer&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Couple palettes&lt;/strong&gt; — curated coordinated color sets (Classic White, Blush &amp;amp; Rose, Sage Garden, Midnight Blue, Golden Hour)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Manual color override&lt;/strong&gt; — per-avatar primary color picker&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Export to PNG&lt;/strong&gt; — download the full couple preview as a retina-quality image&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="-architecture-why-layered-svgs"&gt;🏗️ Architecture: Why Layered SVGs?&lt;/h2&gt;
&lt;h3 id="the-core-idea"&gt;The Core Idea&lt;/h3&gt;
&lt;p&gt;Each avatar is rendered as a stack of independent SVG layer components, all sharing the same &lt;code&gt;viewBox=&amp;quot;0 0 200 450&amp;quot;&lt;/code&gt; coordinate space:&lt;/p&gt;</description></item><item><title>How I Built Smart Debt Planner with AI Prompts: FastAPI, Docker, and CI/CD Deployment</title><link>https://thearunmurali.com/post/2026/03/01/how-i-built-smart-debt-planner-with-ai-prompts-fastapi-docker-and-ci/cd-deployment/</link><pubDate>Sun, 01 Mar 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2026/03/01/how-i-built-smart-debt-planner-with-ai-prompts-fastapi-docker-and-ci/cd-deployment/</guid><description>&lt;p&gt;If you want to build a production-ready app quickly, this is the workflow I used to create &lt;strong&gt;Smart Debt Planner&lt;/strong&gt;: a FastAPI app for debt payoff simulations with a clean deployment pipeline.&lt;/p&gt;
&lt;p&gt;This guide covers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;How I used prompts to build features faster&lt;/li&gt;
&lt;li&gt;The exact project architecture&lt;/li&gt;
&lt;li&gt;Docker + VPS deployment&lt;/li&gt;
&lt;li&gt;GitHub Actions CI/CD&lt;/li&gt;
&lt;li&gt;Real-world troubleshooting (SSH auth, Docker Compose mismatch, 502 errors)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="-latest-update-march-2026"&gt;🔄 Latest Update (March 2026)&lt;/h2&gt;
&lt;p&gt;New additions were shipped after the initial backend-only deployment:&lt;/p&gt;</description></item><item><title>Build a 3D Chess Replay Viewer with WebGPU in Under 30 Minutes</title><link>https://thearunmurali.com/post/2026/01/24/build-a-3d-chess-replay-viewer-with-webgpu-in-under-30-minutes/</link><pubDate>Sat, 24 Jan 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2026/01/24/build-a-3d-chess-replay-viewer-with-webgpu-in-under-30-minutes/</guid><description>&lt;p&gt;A complete step-by-step guide to creating an interactive 3D chess replay viewer using Babylon.js, React, and TypeScript. Watch chess games come to life with smooth animations, glowing highlights, and WebGPU-accelerated rendering!&lt;/p&gt;
&lt;h2 id="-what-youll-build"&gt;🎯 What You&amp;rsquo;ll Build&lt;/h2&gt;
&lt;p&gt;An interactive 3D chess board that can:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Parse chess games in PGN notation&lt;/li&gt;
&lt;li&gt;Replay games move by move with smooth 3D animations&lt;/li&gt;
&lt;li&gt;Switch between different chess piece sets&lt;/li&gt;
&lt;li&gt;Auto-play games with pause/resume controls&lt;/li&gt;
&lt;li&gt;Support both WebGPU (modern) and WebGL (fallback) rendering&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Live Demo Features:&lt;/strong&gt;&lt;/p&gt;</description></item><item><title>Accuracy</title><link>https://thearunmurali.com/post/2026/01/10/accuracy/</link><pubDate>Sat, 10 Jan 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2026/01/10/accuracy/</guid><description>&lt;h2 id="-1-minute-summary"&gt;🟪 1-Minute Summary&lt;/h2&gt;
&lt;p&gt;Accuracy is the proportion of correct predictions: (TP + TN) / Total. It is simple and intuitive, but misleading for imbalanced datasets. If 95% of emails are not spam, a model that always predicts &amp;ldquo;not spam&amp;rdquo; gets 95% accuracy but is useless. Use accuracy only for balanced datasets; prefer precision, recall, or F1 for imbalanced data.&lt;/p&gt;
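&lt;p&gt;The spam scenario can be reproduced in a few lines. This is a minimal sketch with hypothetical toy data (95 legitimate emails, 5 spam) and a stand-in model that always predicts the majority class; none of the variable names come from a real library.&lt;/p&gt;

```python
# Hypothetical toy data: 95 legitimate emails and 5 spam emails.
actual = ["not spam"] * 95 + ["spam"] * 5

# A stand-in "model" that ignores its input and always predicts the majority class.
predicted = ["not spam"] * len(actual)

# Accuracy = correct predictions / total predictions.
correct = sum(a == p for a, p in zip(actual, predicted))
accuracy = correct / len(actual)

# Recall for the spam class = spam emails caught / actual spam emails.
spam_caught = sum(a == "spam" and p == "spam" for a, p in zip(actual, predicted))
spam_recall = spam_caught / actual.count("spam")

print(f"accuracy = {accuracy:.2f}")        # accuracy = 0.95
print(f"spam recall = {spam_recall:.2f}")  # spam recall = 0.00
```

&lt;p&gt;The accuracy looks strong, yet the model never catches a single spam email, which is why recall on the minority class is the more informative number here.&lt;/p&gt;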
&lt;hr&gt;
&lt;h2 id="-core-notes-must-know"&gt;🟦 Core Notes (Must-Know)&lt;/h2&gt;
&lt;h3 id="formula"&gt;Formula&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;Accuracy = (TP + TN) / (TP + TN + FP + FN)&lt;/code&gt;&lt;/p&gt;
&lt;h3 id="when-to-use-accuracy"&gt;When to Use Accuracy&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;</description></item><item><title>AdaBoost (Adaptive Boosting)</title><link>https://thearunmurali.com/post/2026/01/10/adaboost-adaptive-boosting/</link><pubDate>Sat, 10 Jan 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2026/01/10/adaboost-adaptive-boosting/</guid><description>&lt;h2 id="-1-minute-summary"&gt;🟪 1-Minute Summary&lt;/h2&gt;
&lt;p&gt;AdaBoost builds models sequentially: each new model focuses on the examples the previous models got wrong, which it does by increasing the weights of those examples. It combines weak learners (usually shallow trees or stumps) into a strong learner, and each model&amp;rsquo;s influence is weighted by its accuracy. Pros: simple, works well with weak learners. Cons: sensitive to outliers and noise, and because training is sequential it is typically slower than Random Forest, whose trees can be built in parallel.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="-core-notes-must-know"&gt;🟦 Core Notes (Must-Know)&lt;/h2&gt;
&lt;h3 id="how-adaboost-works"&gt;How AdaBoost Works&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;
&lt;h3 id="the-algorithm"&gt;The Algorithm&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;</description></item><item><title>ANOVA (Analysis of Variance)</title><link>https://thearunmurali.com/post/2026/01/10/anova-analysis-of-variance/</link><pubDate>Sat, 10 Jan 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2026/01/10/anova-analysis-of-variance/</guid><description>&lt;h2 id="-1-minute-summary"&gt;🟪 1-Minute Summary&lt;/h2&gt;
&lt;p&gt;ANOVA (Analysis of Variance) tests whether the means of three or more groups differ significantly. Instead of running multiple t-tests (which inflates the Type I error rate), ANOVA runs one omnibus test that compares the variance between groups to the variance within groups. If the result is significant, use post-hoc tests to find which specific groups differ. One-way ANOVA has one factor; two-way ANOVA has two.&lt;/p&gt;
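&lt;p&gt;To make the between-group versus within-group comparison concrete, here is a minimal sketch of the one-way F statistic in plain Python. The function name &lt;code&gt;one_way_anova_f&lt;/code&gt; and the sample groups are hypothetical; turning F into a p-value additionally needs the F distribution, which a statistics library such as SciPy provides.&lt;/p&gt;

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return the F statistic: between-group variance / within-group variance."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)      # number of groups
    n = len(all_values)  # total number of observations

    # Between-group sum of squares: distance of each group mean from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)  # degrees of freedom: k - 1
    ms_within = ss_within / (n - k)    # degrees of freedom: n - k
    return ms_between / ms_within

# Hypothetical measurements for three groups.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat = one_way_anova_f(groups)
print(round(f_stat, 4))
```

&lt;p&gt;A large F means the group means are spread out relative to the noise inside each group, which is the evidence ANOVA uses against the null hypothesis that all means are equal.&lt;/p&gt;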
&lt;hr&gt;
&lt;h2 id="-core-notes-must-know"&gt;🟦 Core Notes (Must-Know)&lt;/h2&gt;
&lt;h3 id="what-is-anova"&gt;What is ANOVA?&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;</description></item><item><title>Chi-Square Tests (χ² Tests)</title><link>https://thearunmurali.com/post/2026/01/10/chi-square-tests-%CF%87-tests/</link><pubDate>Sat, 10 Jan 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2026/01/10/chi-square-tests-%CF%87-tests/</guid><description>&lt;h2 id="-1-minute-summary"&gt;🟪 1-Minute Summary&lt;/h2&gt;
&lt;p&gt;Chi-square tests work on categorical data by comparing observed frequencies to the frequencies expected under the null hypothesis. Common types: (1) Test of Independence (are two categorical variables related?), (2) Goodness of Fit (does the data match an expected distribution?). Use the test of independence when both variables are categorical; a significant result indicates an association between them.&lt;/p&gt;
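&lt;p&gt;The observed-versus-expected comparison can be sketched as follows for a test of independence. The function name and the 2x2 table are hypothetical, and the sketch applies no continuity correction, so a library routine that does apply one may report a slightly different value for 2x2 tables.&lt;/p&gt;

```python
def chi_square_statistic(observed):
    """Chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (obs - expected) ** 2 / expected
    return chi2

# Hypothetical 2x2 table: rows are two groups, columns are two outcomes.
observed = [[10, 20], [30, 40]]
chi2 = chi_square_statistic(observed)
# Degrees of freedom = (rows - 1) * (columns - 1).
dof = (len(observed) - 1) * (len(observed[0]) - 1)
print(round(chi2, 4), dof)
```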
&lt;hr&gt;
&lt;h2 id="-core-notes-must-know"&gt;🟦 Core Notes (Must-Know)&lt;/h2&gt;
&lt;h3 id="what-is-a-chi-square-test"&gt;What is a Chi-Square Test?&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;
&lt;h3 id="chi-square-test-of-independence"&gt;Chi-Square Test of Independence&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;</description></item><item><title>Classification Report (Precision, Recall, F1-Score)</title><link>https://thearunmurali.com/post/2026/01/10/classification-report-precision-recall-f1-score/</link><pubDate>Sat, 10 Jan 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2026/01/10/classification-report-precision-recall-f1-score/</guid><description>&lt;h2 id="-1-minute-summary"&gt;🟪 1-Minute Summary&lt;/h2&gt;
&lt;p&gt;A classification report provides a comprehensive view of model performance: precision, recall, F1-score, and support for each class. It shows how well the model performs overall and per class. It is essential for imbalanced datasets, where accuracy alone is misleading. Use it to identify which classes the model struggles with.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="-core-notes-must-know"&gt;🟦 Core Notes (Must-Know)&lt;/h2&gt;
&lt;h3 id="whats-in-a-classification-report"&gt;What&amp;rsquo;s in a Classification Report?&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;
&lt;h3 id="metrics-explained"&gt;Metrics Explained&lt;/h3&gt;
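&lt;ul&gt;
&lt;li&gt;Precision: &lt;code&gt;TP / (TP + FP)&lt;/code&gt;, the share of predicted positives that are correct&lt;/li&gt;
&lt;li&gt;Recall: &lt;code&gt;TP / (TP + FN)&lt;/code&gt;, the share of actual positives that are found&lt;/li&gt;
&lt;li&gt;F1-Score: &lt;code&gt;2 * (Prec * Rec) / (Prec + Rec)&lt;/code&gt;, the harmonic mean of precision and recall&lt;/li&gt;
&lt;li&gt;Support: the number of actual samples of each class&lt;/li&gt;
&lt;/ul&gt;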
&lt;h3 id="macro-vs-weighted-averages"&gt;Macro vs Weighted Averages&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;</description></item><item><title>Clustering Overview - Unsupervised Learning</title><link>https://thearunmurali.com/post/2026/01/10/clustering-overview-unsupervised-learning/</link><pubDate>Sat, 10 Jan 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2026/01/10/clustering-overview-unsupervised-learning/</guid><description>&lt;h2 id="-1-minute-summary"&gt;🟪 1-Minute Summary&lt;/h2&gt;
&lt;p&gt;Clustering is unsupervised learning that groups similar data points together without predefined labels. Common algorithms: K-Means (fast, needs K), Hierarchical (produces a dendrogram, no K needed up front), DBSCAN (density-based, finds arbitrarily shaped clusters). Use cases: customer segmentation, anomaly detection, data exploration. Unlike supervised learning, there&amp;rsquo;s no &amp;ldquo;correct&amp;rdquo; answer, so evaluate with the silhouette score, the elbow method, and domain knowledge.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="-core-notes-must-know"&gt;🟦 Core Notes (Must-Know)&lt;/h2&gt;
&lt;h3 id="what-is-clustering"&gt;What is Clustering?&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;
&lt;h3 id="supervised-vs-unsupervised-learning"&gt;Supervised vs Unsupervised Learning&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;</description></item><item><title>Confusion Matrix</title><link>https://thearunmurali.com/post/2026/01/10/confusion-matrix/</link><pubDate>Sat, 10 Jan 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2026/01/10/confusion-matrix/</guid><description>&lt;h2 id="-1-minute-summary"&gt;🟪 1-Minute Summary&lt;/h2&gt;
&lt;p&gt;A confusion matrix shows the performance of a classification model by comparing actual vs predicted labels. Four quadrants: True Positive (TP), False Positive (FP), True Negative (TN), False Negative (FN). All classification metrics (precision, recall, accuracy, F1) derive from these four values. Essential for understanding where your model makes mistakes.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="-core-notes-must-know"&gt;🟦 Core Notes (Must-Know)&lt;/h2&gt;
&lt;h3 id="structure-of-confusion-matrix"&gt;Structure of Confusion Matrix&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;              Predicted
              Pos   Neg
Actual  Pos   TP    FN
        Neg   FP    TN
&lt;/code&gt;&lt;/pre&gt;
&lt;h3 id="the-four-quadrants"&gt;The Four Quadrants&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;</description></item><item><title>Cross Validation (CV)</title><link>https://thearunmurali.com/post/2026/01/10/cross-validation-cv/</link><pubDate>Sat, 10 Jan 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2026/01/10/cross-validation-cv/</guid><description>&lt;h2 id="-1-minute-summary"&gt;🟪 1-Minute Summary&lt;/h2&gt;
&lt;p&gt;Cross-validation (CV) evaluates model performance by splitting the data into K folds, training on K-1 folds, testing on the remaining fold, and repeating K times so that each fold serves as the test set once. Averaging the K results gives a more robust estimate than a single train-test split. Common variants: K-Fold (K=5 or 10), Stratified K-Fold (preserves class distribution), Leave-One-Out. Use it to detect overfitting and compare models fairly.&lt;/p&gt;
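&lt;p&gt;The fold rotation can be sketched without any library. The generator below is a hypothetical helper (not a scikit-learn API) that splits sample indices into K contiguous folds; it does not shuffle or stratify, which real workflows usually want.&lt;/p&gt;

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) once for each of the k folds."""
    indices = list(range(n_samples))
    remainder = n_samples % k
    # Give one extra sample to the first `remainder` folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i in range(remainder) else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

# Hypothetical dataset of 10 samples split into 5 folds.
folds = list(k_fold_indices(n_samples=10, k=5))
for train, test in folds:
    print("train:", train, "test:", test)
```

&lt;p&gt;Each index lands in exactly one test fold, so every sample is used for evaluation once and for training K-1 times.&lt;/p&gt;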
&lt;hr&gt;
&lt;h2 id="-core-notes-must-know"&gt;🟦 Core Notes (Must-Know)&lt;/h2&gt;
&lt;h3 id="what-is-cross-validation"&gt;What is Cross Validation?&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;
&lt;h3 id="types-of-cross-validation"&gt;Types of Cross Validation&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;</description></item><item><title>Data Science Interview Cheat Sheets Index</title><link>https://thearunmurali.com/post/2026/01/10/data-science-interview-cheat-sheets-index/</link><pubDate>Sat, 10 Jan 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2026/01/10/data-science-interview-cheat-sheets-index/</guid><description>&lt;h1 id="data-science-interview-cheat-sheets"&gt;Data Science Interview Cheat Sheets&lt;/h1&gt;
&lt;p&gt;Quick reference guides organized by topic. These are meant for last-minute review before interviews.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="-statistics--probability"&gt;📊 Statistics &amp;amp; Probability&lt;/h2&gt;
&lt;h3 id="cheat-sheet-descriptive-statistics"&gt;Cheat Sheet: Descriptive Statistics&lt;/h3&gt;
&lt;table&gt;
	&lt;thead&gt;
			&lt;tr&gt;
					&lt;th&gt;Metric&lt;/th&gt;
					&lt;th&gt;Formula&lt;/th&gt;
					&lt;th&gt;Use Case&lt;/th&gt;
			&lt;/tr&gt;
	&lt;/thead&gt;
	&lt;tbody&gt;
			&lt;tr&gt;
					&lt;td&gt;Mean&lt;/td&gt;
					&lt;td&gt;&lt;code&gt;Σx / n&lt;/code&gt;&lt;/td&gt;
					&lt;td&gt;Central tendency, continuous data&lt;/td&gt;
			&lt;/tr&gt;
			&lt;tr&gt;
					&lt;td&gt;Median&lt;/td&gt;
					&lt;td&gt;Middle value&lt;/td&gt;
					&lt;td&gt;Skewed distributions, outliers present&lt;/td&gt;
			&lt;/tr&gt;
			&lt;tr&gt;
					&lt;td&gt;Mode&lt;/td&gt;
					&lt;td&gt;Most frequent&lt;/td&gt;
					&lt;td&gt;Categorical data&lt;/td&gt;
			&lt;/tr&gt;
			&lt;tr&gt;
					&lt;td&gt;Variance&lt;/td&gt;
					&lt;td&gt;&lt;code&gt;Σ(x - μ)² / n&lt;/code&gt;&lt;/td&gt;
					&lt;td&gt;Data spread&lt;/td&gt;
			&lt;/tr&gt;
			&lt;tr&gt;
					&lt;td&gt;Std Dev&lt;/td&gt;
					&lt;td&gt;&lt;code&gt;√Variance&lt;/code&gt;&lt;/td&gt;
					&lt;td&gt;Same units as data&lt;/td&gt;
			&lt;/tr&gt;
	&lt;/tbody&gt;
&lt;/table&gt;
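&lt;p&gt;The table above maps directly onto the Python standard library. This is a small sketch with a hypothetical sample; note that &lt;code&gt;pvariance&lt;/code&gt; and &lt;code&gt;pstdev&lt;/code&gt; divide by n as in the table, while &lt;code&gt;variance&lt;/code&gt; and &lt;code&gt;stdev&lt;/code&gt; divide by n - 1 for sample estimates.&lt;/p&gt;

```python
import statistics

# Hypothetical sample; the value 40 acts as an outlier.
data = [2, 4, 4, 5, 7, 8, 40]

mean = statistics.mean(data)           # Σx / n
median = statistics.median(data)       # middle value, robust to the outlier
mode = statistics.mode(data)           # most frequent value
variance = statistics.pvariance(data)  # Σ(x - μ)² / n (population form, as in the table)
std_dev = statistics.pstdev(data)      # √variance, same units as the data

print(mean, median, mode)  # 10 5 4
print(round(variance, 2), round(std_dev, 2))
```

&lt;p&gt;The single outlier pulls the mean to 10 while the median stays at 5, which is why the median is the safer summary for skewed data.&lt;/p&gt;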
&lt;h3 id="cheat-sheet-probability-distributions"&gt;Cheat Sheet: Probability Distributions&lt;/h3&gt;
&lt;table&gt;
	&lt;thead&gt;
			&lt;tr&gt;
					&lt;th&gt;Distribution&lt;/th&gt;
					&lt;th&gt;Type&lt;/th&gt;
					&lt;th&gt;Parameters&lt;/th&gt;
					&lt;th&gt;Use Case&lt;/th&gt;
			&lt;/tr&gt;
	&lt;/thead&gt;
	&lt;tbody&gt;
			&lt;tr&gt;
					&lt;td&gt;Normal&lt;/td&gt;
					&lt;td&gt;Continuous&lt;/td&gt;
					&lt;td&gt;μ, σ&lt;/td&gt;
					&lt;td&gt;Natural phenomena, errors&lt;/td&gt;
			&lt;/tr&gt;
			&lt;tr&gt;
					&lt;td&gt;Binomial&lt;/td&gt;
					&lt;td&gt;Discrete&lt;/td&gt;
					&lt;td&gt;n, p&lt;/td&gt;
					&lt;td&gt;Success/failure trials&lt;/td&gt;
			&lt;/tr&gt;
			&lt;tr&gt;
					&lt;td&gt;Poisson&lt;/td&gt;
					&lt;td&gt;Discrete&lt;/td&gt;
					&lt;td&gt;λ&lt;/td&gt;
					&lt;td&gt;Rare events over time&lt;/td&gt;
			&lt;/tr&gt;
			&lt;tr&gt;
					&lt;td&gt;Uniform&lt;/td&gt;
					&lt;td&gt;Continuous&lt;/td&gt;
					&lt;td&gt;a, b&lt;/td&gt;
					&lt;td&gt;Equal probability&lt;/td&gt;
			&lt;/tr&gt;
	&lt;/tbody&gt;
&lt;/table&gt;
&lt;hr&gt;
&lt;h2 id="-hypothesis-testing-decision-tree"&gt;🔬 Hypothesis Testing Decision Tree&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;Start: Do you have a question about relationships?
 │
 ├─ YES → What type of data?
 │    │
 │    ├─ Categorical vs Categorical → Chi-Square Test
 │    │
 │    ├─ Numerical vs Categorical (2 groups) → t-test
 │    │    ├─ Known population σ? → Z-test
 │    │    └─ Unknown σ, small sample → t-test
 │    │
 │    ├─ Numerical vs Categorical (3+ groups) → ANOVA
 │    │
 │    └─ Numerical vs Numerical → Correlation / Regression
 │
 └─ NO → EDA / Descriptive Statistics
&lt;/code&gt;&lt;/pre&gt;
&lt;h3 id="cheat-sheet-hypothesis-tests-comparison"&gt;Cheat Sheet: Hypothesis Tests Comparison&lt;/h3&gt;
&lt;table&gt;
	&lt;thead&gt;
			&lt;tr&gt;
					&lt;th&gt;Test&lt;/th&gt;
					&lt;th&gt;Data Types&lt;/th&gt;
					&lt;th&gt;Null Hypothesis&lt;/th&gt;
					&lt;th&gt;When to Use&lt;/th&gt;
			&lt;/tr&gt;
	&lt;/thead&gt;
	&lt;tbody&gt;
			&lt;tr&gt;
					&lt;td&gt;&lt;strong&gt;Chi-Square&lt;/strong&gt;&lt;/td&gt;
					&lt;td&gt;Cat vs Cat&lt;/td&gt;
					&lt;td&gt;No association&lt;/td&gt;
					&lt;td&gt;Independence, goodness-of-fit&lt;/td&gt;
			&lt;/tr&gt;
			&lt;tr&gt;
					&lt;td&gt;&lt;strong&gt;t-test&lt;/strong&gt;&lt;/td&gt;
					&lt;td&gt;Num vs Cat (2 groups)&lt;/td&gt;
					&lt;td&gt;Means are equal&lt;/td&gt;
					&lt;td&gt;Compare 2 group means&lt;/td&gt;
			&lt;/tr&gt;
			&lt;tr&gt;
					&lt;td&gt;&lt;strong&gt;Z-test&lt;/strong&gt;&lt;/td&gt;
					&lt;td&gt;Num vs Cat (2 groups)&lt;/td&gt;
					&lt;td&gt;Means are equal&lt;/td&gt;
					&lt;td&gt;Large sample, known σ&lt;/td&gt;
			&lt;/tr&gt;
			&lt;tr&gt;
					&lt;td&gt;&lt;strong&gt;ANOVA&lt;/strong&gt;&lt;/td&gt;
					&lt;td&gt;Num vs Cat (3+ groups)&lt;/td&gt;
					&lt;td&gt;All means equal&lt;/td&gt;
					&lt;td&gt;Compare 3+ group means&lt;/td&gt;
			&lt;/tr&gt;
			&lt;tr&gt;
					&lt;td&gt;&lt;strong&gt;F-test&lt;/strong&gt;&lt;/td&gt;
					&lt;td&gt;Num vs Num&lt;/td&gt;
					&lt;td&gt;Variances equal&lt;/td&gt;
					&lt;td&gt;Compare variances&lt;/td&gt;
			&lt;/tr&gt;
	&lt;/tbody&gt;
&lt;/table&gt;
&lt;hr&gt;
&lt;h2 id="-machine-learning-algorithms"&gt;🤖 Machine Learning Algorithms&lt;/h2&gt;
&lt;h3 id="cheat-sheet-supervised-learning-algorithm-selection"&gt;Cheat Sheet: Supervised Learning Algorithm Selection&lt;/h3&gt;
&lt;table&gt;
	&lt;thead&gt;
			&lt;tr&gt;
					&lt;th&gt;Algorithm&lt;/th&gt;
					&lt;th&gt;Problem Type&lt;/th&gt;
					&lt;th&gt;Pros&lt;/th&gt;
					&lt;th&gt;Cons&lt;/th&gt;
					&lt;th&gt;When to Use&lt;/th&gt;
			&lt;/tr&gt;
	&lt;/thead&gt;
	&lt;tbody&gt;
			&lt;tr&gt;
					&lt;td&gt;&lt;strong&gt;Linear Regression&lt;/strong&gt;&lt;/td&gt;
					&lt;td&gt;Regression&lt;/td&gt;
					&lt;td&gt;Fast, interpretable&lt;/td&gt;
					&lt;td&gt;Assumes linearity&lt;/td&gt;
					&lt;td&gt;Linear relationships&lt;/td&gt;
			&lt;/tr&gt;
			&lt;tr&gt;
					&lt;td&gt;&lt;strong&gt;Logistic Regression&lt;/strong&gt;&lt;/td&gt;
					&lt;td&gt;Classification&lt;/td&gt;
					&lt;td&gt;Interpretable, probabilities&lt;/td&gt;
					&lt;td&gt;Linear boundary&lt;/td&gt;
					&lt;td&gt;Binary/multi-class, need probabilities&lt;/td&gt;
			&lt;/tr&gt;
			&lt;tr&gt;
					&lt;td&gt;&lt;strong&gt;Decision Tree&lt;/strong&gt;&lt;/td&gt;
					&lt;td&gt;Both&lt;/td&gt;
					&lt;td&gt;Non-linear, interpretable&lt;/td&gt;
					&lt;td&gt;Overfits easily&lt;/td&gt;
					&lt;td&gt;Complex patterns, explainability needed&lt;/td&gt;
			&lt;/tr&gt;
			&lt;tr&gt;
					&lt;td&gt;&lt;strong&gt;Random Forest&lt;/strong&gt;&lt;/td&gt;
					&lt;td&gt;Both&lt;/td&gt;
					&lt;td&gt;Reduces overfitting, robust&lt;/td&gt;
					&lt;td&gt;Slow, black box&lt;/td&gt;
					&lt;td&gt;High accuracy needed, interpretability less important&lt;/td&gt;
			&lt;/tr&gt;
			&lt;tr&gt;
					&lt;td&gt;&lt;strong&gt;KNN&lt;/strong&gt;&lt;/td&gt;
					&lt;td&gt;Both&lt;/td&gt;
					&lt;td&gt;Simple, no training&lt;/td&gt;
					&lt;td&gt;Slow prediction, sensitive to scale&lt;/td&gt;
					&lt;td&gt;Small datasets, simple patterns&lt;/td&gt;
			&lt;/tr&gt;
	&lt;/tbody&gt;
&lt;/table&gt;
&lt;h3 id="cheat-sheet-clustering-algorithms"&gt;Cheat Sheet: Clustering Algorithms&lt;/h3&gt;
&lt;table&gt;
	&lt;thead&gt;
			&lt;tr&gt;
					&lt;th&gt;Algorithm&lt;/th&gt;
					&lt;th&gt;Type&lt;/th&gt;
					&lt;th&gt;Pros&lt;/th&gt;
					&lt;th&gt;Cons&lt;/th&gt;
					&lt;th&gt;When to Use&lt;/th&gt;
			&lt;/tr&gt;
	&lt;/thead&gt;
	&lt;tbody&gt;
			&lt;tr&gt;
					&lt;td&gt;&lt;strong&gt;K-Means&lt;/strong&gt;&lt;/td&gt;
					&lt;td&gt;Partitioning&lt;/td&gt;
					&lt;td&gt;Fast, scalable&lt;/td&gt;
					&lt;td&gt;Need to set K, spherical clusters&lt;/td&gt;
					&lt;td&gt;Large datasets, known # clusters&lt;/td&gt;
			&lt;/tr&gt;
			&lt;tr&gt;
					&lt;td&gt;&lt;strong&gt;Hierarchical&lt;/strong&gt;&lt;/td&gt;
					&lt;td&gt;Agglomerative/Divisive&lt;/td&gt;
					&lt;td&gt;No need to set K, dendrogram&lt;/td&gt;
					&lt;td&gt;Slow, memory intensive&lt;/td&gt;
					&lt;td&gt;Small datasets, explore # clusters&lt;/td&gt;
			&lt;/tr&gt;
	&lt;/tbody&gt;
&lt;/table&gt;
&lt;hr&gt;
&lt;h2 id="-model-evaluation-metrics"&gt;📈 Model Evaluation Metrics&lt;/h2&gt;
&lt;h3 id="cheat-sheet-regression-metrics"&gt;Cheat Sheet: Regression Metrics&lt;/h3&gt;
&lt;table&gt;
	&lt;thead&gt;
			&lt;tr&gt;
					&lt;th&gt;Metric&lt;/th&gt;
					&lt;th&gt;Formula&lt;/th&gt;
					&lt;th&gt;Range&lt;/th&gt;
					&lt;th&gt;Interpretation&lt;/th&gt;
					&lt;th&gt;When to Use&lt;/th&gt;
			&lt;/tr&gt;
	&lt;/thead&gt;
	&lt;tbody&gt;
			&lt;tr&gt;
					&lt;td&gt;&lt;strong&gt;RMSE&lt;/strong&gt;&lt;/td&gt;
					&lt;td&gt;&lt;code&gt;√(Σ(y - ŷ)² / n)&lt;/code&gt;&lt;/td&gt;
					&lt;td&gt;[0, ∞)&lt;/td&gt;
					&lt;td&gt;Same units as target&lt;/td&gt;
					&lt;td&gt;Penalize large errors&lt;/td&gt;
			&lt;/tr&gt;
			&lt;tr&gt;
					&lt;td&gt;&lt;strong&gt;MAE&lt;/strong&gt;&lt;/td&gt;
					&lt;td&gt;&lt;code&gt;Σ|y - ŷ| / n&lt;/code&gt;&lt;/td&gt;
					&lt;td&gt;[0, ∞)&lt;/td&gt;
					&lt;td&gt;Same units as target&lt;/td&gt;
					&lt;td&gt;Treat all errors equally&lt;/td&gt;
			&lt;/tr&gt;
			&lt;tr&gt;
					&lt;td&gt;&lt;strong&gt;MAPE&lt;/strong&gt;&lt;/td&gt;
					&lt;td&gt;&lt;code&gt;(100/n) * Σ|y - ŷ|/|y|&lt;/code&gt;&lt;/td&gt;
					&lt;td&gt;[0, ∞)%&lt;/td&gt;
					&lt;td&gt;Percentage error&lt;/td&gt;
					&lt;td&gt;Relative error important&lt;/td&gt;
			&lt;/tr&gt;
			&lt;tr&gt;
					&lt;td&gt;&lt;strong&gt;R²&lt;/strong&gt;&lt;/td&gt;
					&lt;td&gt;&lt;code&gt;1 - (SS_res / SS_tot)&lt;/code&gt;&lt;/td&gt;
					&lt;td&gt;(-∞, 1]&lt;/td&gt;
					&lt;td&gt;Variance explained&lt;/td&gt;
					&lt;td&gt;Model comparison&lt;/td&gt;
			&lt;/tr&gt;
	&lt;/tbody&gt;
&lt;/table&gt;
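&lt;p&gt;As a quick check of the formulas in the table, here is a minimal sketch that computes all four metrics in plain Python. The function name and the sample values are hypothetical; MAPE is undefined when any true value is zero, and R² is undefined when all true values are identical.&lt;/p&gt;

```python
import math

def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, MAPE (in percent), and R² for two equal-length sequences."""
    n = len(y_true)
    errors = [y - y_hat for y, y_hat in zip(y_true, y_pred)]

    rmse = math.sqrt(sum(e ** 2 for e in errors) / n)  # √(Σ(y - ŷ)² / n)
    mae = sum(abs(e) for e in errors) / n               # Σ|y - ŷ| / n
    # (100/n) * Σ|y - ŷ|/|y|; breaks if any true value is zero.
    mape = (100 / n) * sum(abs(e) / abs(y) for e, y in zip(errors, y_true))

    y_mean = sum(y_true) / n
    ss_res = sum(e ** 2 for e in errors)
    ss_tot = sum((y - y_mean) ** 2 for y in y_true)
    r2 = 1 - ss_res / ss_tot                            # 1 - (SS_res / SS_tot)
    return {"rmse": rmse, "mae": mae, "mape": mape, "r2": r2}

# Hypothetical targets and predictions.
y_true = [100, 200, 300, 400]
y_pred = [110, 190, 310, 380]
metrics = regression_metrics(y_true, y_pred)
print({name: round(value, 3) for name, value in metrics.items()})
```

&lt;p&gt;RMSE comes out larger than MAE on this data because squaring gives the single error of 20 more weight than the three errors of 10.&lt;/p&gt;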
&lt;h3 id="cheat-sheet-classification-metrics"&gt;Cheat Sheet: Classification Metrics&lt;/h3&gt;
&lt;table&gt;
	&lt;thead&gt;
			&lt;tr&gt;
					&lt;th&gt;Metric&lt;/th&gt;
					&lt;th&gt;Formula&lt;/th&gt;
					&lt;th&gt;Range&lt;/th&gt;
					&lt;th&gt;When to Use&lt;/th&gt;
			&lt;/tr&gt;
	&lt;/thead&gt;
	&lt;tbody&gt;
			&lt;tr&gt;
					&lt;td&gt;&lt;strong&gt;Accuracy&lt;/strong&gt;&lt;/td&gt;
					&lt;td&gt;&lt;code&gt;(TP + TN) / Total&lt;/code&gt;&lt;/td&gt;
					&lt;td&gt;[0, 1]&lt;/td&gt;
					&lt;td&gt;Balanced classes&lt;/td&gt;
			&lt;/tr&gt;
			&lt;tr&gt;
					&lt;td&gt;&lt;strong&gt;Precision&lt;/strong&gt;&lt;/td&gt;
					&lt;td&gt;&lt;code&gt;TP / (TP + FP)&lt;/code&gt;&lt;/td&gt;
					&lt;td&gt;[0, 1]&lt;/td&gt;
					&lt;td&gt;Minimize false alarms&lt;/td&gt;
			&lt;/tr&gt;
			&lt;tr&gt;
					&lt;td&gt;&lt;strong&gt;Recall&lt;/strong&gt;&lt;/td&gt;
					&lt;td&gt;&lt;code&gt;TP / (TP + FN)&lt;/code&gt;&lt;/td&gt;
					&lt;td&gt;[0, 1]&lt;/td&gt;
					&lt;td&gt;Find all positives (e.g., disease detection)&lt;/td&gt;
			&lt;/tr&gt;
			&lt;tr&gt;
					&lt;td&gt;&lt;strong&gt;F1-Score&lt;/strong&gt;&lt;/td&gt;
					&lt;td&gt;&lt;code&gt;2 * (Prec * Rec) / (Prec + Rec)&lt;/code&gt;&lt;/td&gt;
					&lt;td&gt;[0, 1]&lt;/td&gt;
					&lt;td&gt;Balance precision &amp;amp; recall&lt;/td&gt;
			&lt;/tr&gt;
			&lt;tr&gt;
					&lt;td&gt;&lt;strong&gt;AUC-ROC&lt;/strong&gt;&lt;/td&gt;
					&lt;td&gt;Area under ROC curve&lt;/td&gt;
					&lt;td&gt;[0, 1]&lt;/td&gt;
					&lt;td&gt;Overall classifier performance&lt;/td&gt;
			&lt;/tr&gt;
	&lt;/tbody&gt;
&lt;/table&gt;
&lt;h3 id="confusion-matrix-quick-reference"&gt;Confusion Matrix Quick Reference&lt;/h3&gt;
&lt;pre&gt;&lt;code&gt;              Predicted
              Pos   Neg
Actual  Pos   TP    FN
        Neg   FP    TN
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Precision&lt;/strong&gt; = &amp;ldquo;Of all predicted positives, how many were correct?&amp;rdquo;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Recall&lt;/strong&gt; = &amp;ldquo;Of all actual positives, how many did we find?&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
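&lt;p&gt;All four headline metrics follow from the four confusion-matrix counts. The sketch below uses a hypothetical helper and made-up counts; returning 0.0 when a denominator is zero is one common convention, not the only one.&lt;/p&gt;

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Fall back to 0.0 when a denominator is zero (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts for a binary classifier.
scores = classification_metrics(tp=40, fp=10, tn=45, fn=5)
print({name: round(value, 3) for name, value in scores.items()})
```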
&lt;hr&gt;
&lt;h2 id="-overfitting-vs-underfitting"&gt;🎯 Overfitting vs Underfitting&lt;/h2&gt;
&lt;table&gt;
	&lt;thead&gt;
			&lt;tr&gt;
					&lt;th&gt;Aspect&lt;/th&gt;
					&lt;th&gt;Underfitting&lt;/th&gt;
					&lt;th&gt;Good Fit&lt;/th&gt;
					&lt;th&gt;Overfitting&lt;/th&gt;
			&lt;/tr&gt;
	&lt;/thead&gt;
	&lt;tbody&gt;
			&lt;tr&gt;
					&lt;td&gt;&lt;strong&gt;Training Error&lt;/strong&gt;&lt;/td&gt;
					&lt;td&gt;High&lt;/td&gt;
					&lt;td&gt;Low&lt;/td&gt;
					&lt;td&gt;Very Low&lt;/td&gt;
			&lt;/tr&gt;
			&lt;tr&gt;
					&lt;td&gt;&lt;strong&gt;Validation Error&lt;/strong&gt;&lt;/td&gt;
					&lt;td&gt;High&lt;/td&gt;
					&lt;td&gt;Low&lt;/td&gt;
					&lt;td&gt;High&lt;/td&gt;
			&lt;/tr&gt;
			&lt;tr&gt;
					&lt;td&gt;&lt;strong&gt;Model Complexity&lt;/strong&gt;&lt;/td&gt;
					&lt;td&gt;Too simple&lt;/td&gt;
					&lt;td&gt;Just right&lt;/td&gt;
					&lt;td&gt;Too complex&lt;/td&gt;
			&lt;/tr&gt;
			&lt;tr&gt;
					&lt;td&gt;&lt;strong&gt;What&amp;rsquo;s happening&lt;/strong&gt;&lt;/td&gt;
					&lt;td&gt;Not learning patterns&lt;/td&gt;
					&lt;td&gt;Learning generalizable patterns&lt;/td&gt;
					&lt;td&gt;Memorizing noise&lt;/td&gt;
			&lt;/tr&gt;
			&lt;tr&gt;
					&lt;td&gt;&lt;strong&gt;Fix&lt;/strong&gt;&lt;/td&gt;
					&lt;td&gt;Add features, use a more complex model&lt;/td&gt;
					&lt;td&gt;✓ Good to go&lt;/td&gt;
					&lt;td&gt;Regularization, more data, simpler model&lt;/td&gt;
			&lt;/tr&gt;
	&lt;/tbody&gt;
&lt;/table&gt;
&lt;hr&gt;
&lt;h2 id="-regularization"&gt;🔧 Regularization&lt;/h2&gt;
&lt;table&gt;
	&lt;thead&gt;
			&lt;tr&gt;
					&lt;th&gt;Technique&lt;/th&gt;
					&lt;th&gt;Type&lt;/th&gt;
					&lt;th&gt;Formula&lt;/th&gt;
					&lt;th&gt;Effect&lt;/th&gt;
					&lt;th&gt;When to Use&lt;/th&gt;
			&lt;/tr&gt;
	&lt;/thead&gt;
	&lt;tbody&gt;
			&lt;tr&gt;
					&lt;td&gt;&lt;strong&gt;Ridge (L2)&lt;/strong&gt;&lt;/td&gt;
					&lt;td&gt;Linear&lt;/td&gt;
					&lt;td&gt;&lt;code&gt;+ λΣβ²&lt;/code&gt;&lt;/td&gt;
					&lt;td&gt;Shrinks coefficients&lt;/td&gt;
					&lt;td&gt;Multicollinearity, keep all features&lt;/td&gt;
			&lt;/tr&gt;
			&lt;tr&gt;
					&lt;td&gt;&lt;strong&gt;Lasso (L1)&lt;/strong&gt;&lt;/td&gt;
					&lt;td&gt;Linear&lt;/td&gt;
					&lt;td&gt;&lt;code&gt;+ λΣ|β|&lt;/code&gt;&lt;/td&gt;
					&lt;td&gt;Sets some β to 0&lt;/td&gt;
					&lt;td&gt;Feature selection needed&lt;/td&gt;
			&lt;/tr&gt;
	&lt;/tbody&gt;
&lt;/table&gt;
&lt;hr&gt;
&lt;h2 id="-ensemble-methods"&gt;🎲 Ensemble Methods&lt;/h2&gt;
&lt;table&gt;
	&lt;thead&gt;
			&lt;tr&gt;
					&lt;th&gt;Method&lt;/th&gt;
					&lt;th&gt;Type&lt;/th&gt;
					&lt;th&gt;How it Works&lt;/th&gt;
					&lt;th&gt;Best For&lt;/th&gt;
			&lt;/tr&gt;
	&lt;/thead&gt;
	&lt;tbody&gt;
			&lt;tr&gt;
					&lt;td&gt;&lt;strong&gt;Random Forest&lt;/strong&gt;&lt;/td&gt;
					&lt;td&gt;Bagging&lt;/td&gt;
					&lt;td&gt;Average of many trees&lt;/td&gt;
					&lt;td&gt;Reduce variance, high accuracy&lt;/td&gt;
			&lt;/tr&gt;
			&lt;tr&gt;
					&lt;td&gt;&lt;strong&gt;AdaBoost&lt;/strong&gt;&lt;/td&gt;
					&lt;td&gt;Boosting&lt;/td&gt;
					&lt;td&gt;Sequential, focus on errors&lt;/td&gt;
					&lt;td&gt;Weak learners, binary classification&lt;/td&gt;
			&lt;/tr&gt;
			&lt;tr&gt;
					&lt;td&gt;&lt;strong&gt;Gradient Boosting&lt;/strong&gt;&lt;/td&gt;
					&lt;td&gt;Boosting&lt;/td&gt;
					&lt;td&gt;Sequential, fit residuals&lt;/td&gt;
					&lt;td&gt;High accuracy, regression/classification&lt;/td&gt;
			&lt;/tr&gt;
			&lt;tr&gt;
					&lt;td&gt;&lt;strong&gt;XGBoost&lt;/strong&gt;&lt;/td&gt;
					&lt;td&gt;Boosting&lt;/td&gt;
					&lt;td&gt;Optimized gradient boosting&lt;/td&gt;
					&lt;td&gt;Competition winning, production systems&lt;/td&gt;
			&lt;/tr&gt;
	&lt;/tbody&gt;
&lt;/table&gt;
&lt;hr&gt;
&lt;h2 id="-quick-interview-formulas"&gt;⚡ Quick Interview Formulas&lt;/h2&gt;
&lt;h3 id="must-know-formulas"&gt;Must-Know Formulas&lt;/h3&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;# Standard Error
SE = σ / √n

# Z-Score
z = (x - μ) / σ

# Confidence Interval
CI = x̄ ± (z * SE)

# R² (coefficient of determination)
R² = 1 - (SS_residual / SS_total)

# Bias-Variance Decomposition (expected squared error)
Expected Error = Bias² + Variance + Irreducible Error
&lt;/code&gt;&lt;/pre&gt;
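&lt;p&gt;The formulas above are notation rather than runnable code. A minimal sketch of how they might be evaluated with the Python standard library follows; the data are made up, and the sample standard deviation stands in for σ, which is an assumption of this example.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;# Sketch: the formulas above evaluated on made-up numbers (assumed data).
import math
import statistics

sample = [12.0, 15.0, 11.0, 14.0, 13.0]   # hypothetical measurements
n = len(sample)
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)               # sample std dev, used in place of sigma

se = s / math.sqrt(n)                      # standard error of the mean
z_score = (15.0 - x_bar) / s               # z-score of the value 15.0
z_crit = statistics.NormalDist().inv_cdf(0.975)   # about 1.96 for a 95% interval
ci = (x_bar - z_crit * se, x_bar + z_crit * se)

y_true = [3.0, 5.0, 7.0, 9.0]              # hypothetical targets
y_pred = [2.8, 5.1, 7.3, 8.9]              # hypothetical predictions
y_mean = statistics.mean(y_true)
ss_residual = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
ss_total = sum((t - y_mean) ** 2 for t in y_true)
r_squared = 1 - ss_residual / ss_total

print(f"SE={se:.3f}  z={z_score:.3f}  CI=({ci[0]:.2f}, {ci[1]:.2f})  R2={r_squared:.4f}")
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For a sample this small, a t critical value would normally replace the z critical value in the interval; z is kept here only to mirror the formula as written.&lt;/p&gt;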
&lt;hr&gt;
&lt;h2 id="-navigation"&gt;🗺️ Navigation&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://thearunmurali.com/post/2026/01/10/data-science-roadmap/"&gt;Back to Roadmap&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://thearunmurali.com/post/2026/01/10/start-here/"&gt;How to Study&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://thearunmurali.com/post/2026/01/10/lesson-template/"&gt;Lesson Template&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;p&gt;&lt;strong&gt;Pro Tip&lt;/strong&gt;: Print these cheat sheets and review them the night before your interview!&lt;/p&gt;</description></item><item><title>Data Science Interview Notes (Full-Stack Roadmap)</title><link>https://thearunmurali.com/post/2026/01/10/data-science-interview-notes-full-stack-roadmap/</link><pubDate>Sat, 10 Jan 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2026/01/10/data-science-interview-notes-full-stack-roadmap/</guid><description>&lt;h1 id="data-science-interview-notes-full-stack-roadmap"&gt;Data Science Interview Notes (Full-Stack Roadmap)&lt;/h1&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;How to use this site&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;ul&gt;
&lt;li&gt;🟪 &lt;strong&gt;1-minute Summary&lt;/strong&gt; = skim mode&lt;/li&gt;
&lt;li&gt;🟦 &lt;strong&gt;Core Notes&lt;/strong&gt; = must-know&lt;/li&gt;
&lt;li&gt;🟨 &lt;strong&gt;Interview Triggers&lt;/strong&gt; = what interviewers really test&lt;/li&gt;
&lt;li&gt;🟥 &lt;strong&gt;Common Mistakes&lt;/strong&gt; = traps&lt;/li&gt;
&lt;li&gt;🟩 &lt;strong&gt;Mini Example&lt;/strong&gt; = quick application&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="0-start-here-read-first"&gt;0) Start Here (Read First)&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://thearunmurali.com/post/2026/01/10/start-here/"&gt;How to study this blog for interviews&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://thearunmurali.com/post/2026/01/10/lesson-template/"&gt;Reusable lesson template&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://thearunmurali.com/post/2026/01/10/cheatsheets/"&gt;Cheat sheets index&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="a-statistics-foundations"&gt;A) Statistics Foundations&lt;/h2&gt;
&lt;h3 id="a1-statistics-basics"&gt;A1: Statistics Basics&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://thearunmurali.com/post/2026/01/10/descriptive-vs-inferential/"&gt;Descriptive vs Inferential Statistics&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://thearunmurali.com/post/2026/01/10/central-tendency/"&gt;Mean / Median / Mode&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://thearunmurali.com/post/2026/01/10/spread-dispersion/"&gt;Range / IQR / Variance / Standard Deviation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="a2-probability"&gt;A2: Probability&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://thearunmurali.com/post/2026/01/10/probability-fundamentals/"&gt;Probability Fundamentals&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://thearunmurali.com/post/2026/01/10/probability-types/"&gt;Types of Probability&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="a3-random-variables--distributions"&gt;A3: Random Variables &amp;amp; Distributions&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://thearunmurali.com/post/2026/01/10/random-variables/"&gt;Random Variables (Discrete vs Continuous)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://thearunmurali.com/post/2026/01/10/distributions-overview/"&gt;Distributions Overview&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://thearunmurali.com/post/2026/01/10/normal-distribution/"&gt;Normal Distribution&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://thearunmurali.com/post/2026/01/10/standard-normal-distribution/"&gt;Standard Normal Distribution (Z-score)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="b-statistical-inference--hypothesis-testing"&gt;B) Statistical Inference &amp;amp; Hypothesis Testing&lt;/h2&gt;
&lt;h3 id="b1-hypothesis-testing-core-master-template"&gt;B1: Hypothesis Testing Core (Master Template)&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://thearunmurali.com/post/2026/01/10/hypothesis-testing-framework/"&gt;Hypothesis Testing: General Step-by-Step Framework&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="b2-test-families-each-page-repeats-the-same-framework"&gt;B2: Test Families (Each page repeats the same framework)&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://thearunmurali.com/post/2026/01/10/chi-square-tests/"&gt;Chi-Square Tests&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://thearunmurali.com/post/2026/01/10/t-test-z-test/"&gt;t-test / z-test&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://thearunmurali.com/post/2026/01/10/f-test/"&gt;F-test&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://thearunmurali.com/post/2026/01/10/anova/"&gt;ANOVA&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://thearunmurali.com/post/2026/01/10/standard-error/"&gt;Standard Error (supporting concept)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="c-eda--data-preparation"&gt;C) EDA &amp;amp; Data Preparation&lt;/h2&gt;
&lt;h3 id="c1-eda-workflow"&gt;C1: EDA Workflow&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://thearunmurali.com/post/2026/01/10/eda-general-steps/"&gt;EDA: General Steps (Master Checklist)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://thearunmurali.com/post/2026/01/10/eda-conclusions/"&gt;Draw General Conclusions from Data&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="c2-data-cleaning-modules"&gt;C2: Data Cleaning Modules&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://thearunmurali.com/post/2026/01/10/missing-values/"&gt;Null / Missing Value Treatment&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://thearunmurali.com/post/2026/01/10/duplicates/"&gt;Duplicate Treatment&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://thearunmurali.com/post/2026/01/10/outliers/"&gt;Outlier Treatment&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="d-machine-learning-core"&gt;D) Machine Learning Core&lt;/h2&gt;
&lt;h3 id="d1-unsupervised-learning-clustering"&gt;D1: Unsupervised Learning (Clustering)&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://thearunmurali.com/post/2026/01/10/clustering-overview/"&gt;Clustering Overview&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://thearunmurali.com/post/2026/01/10/dendrogram/"&gt;Dendrogram / Hierarchical Clustering&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://thearunmurali.com/post/2026/01/10/kmeans/"&gt;K-Means Clustering&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="d2-supervised-learning-prediction"&gt;D2: Supervised Learning (Prediction)&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://thearunmurali.com/post/2026/01/10/supervised-learning-overview/"&gt;Supervised Learning Overview&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://thearunmurali.com/post/2026/01/10/linear-regression/"&gt;Linear Regression&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://thearunmurali.com/post/2026/01/10/logistic-regression/"&gt;Logistic Regression&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://thearunmurali.com/post/2026/01/10/knn/"&gt;K-Nearest Neighbors (KNN)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://thearunmurali.com/post/2026/01/10/decision-tree/"&gt;Decision Tree&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="e-model-evaluation--model-selection"&gt;E) Model Evaluation &amp;amp; Model Selection&lt;/h2&gt;
&lt;h3 id="e1-regression-evaluation"&gt;E1: Regression Evaluation&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://thearunmurali.com/post/2026/01/10/regression-metrics-overview/"&gt;Regression Metrics Overview&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://thearunmurali.com/post/2026/01/10/rmse/"&gt;RMSE&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://thearunmurali.com/post/2026/01/10/mape/"&gt;MAPE&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://thearunmurali.com/post/2026/01/10/linear-regression-score/"&gt;Linear Regression Score (R² etc.)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="e2-classification-evaluation"&gt;E2: Classification Evaluation&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://thearunmurali.com/post/2026/01/10/confusion-matrix/"&gt;Confusion Matrix&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://thearunmurali.com/post/2026/01/10/classification-report/"&gt;Classification Report (Precision/Recall/F1)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://thearunmurali.com/post/2026/01/10/accuracy/"&gt;Accuracy&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://thearunmurali.com/post/2026/01/10/precision/"&gt;Precision&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://thearunmurali.com/post/2026/01/10/recall/"&gt;Recall&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://thearunmurali.com/post/2026/01/10/fpr/"&gt;FPR (False Positive Rate)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://thearunmurali.com/post/2026/01/10/roc-auc/"&gt;ROC Curve + AUC&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="e3-model-selection-workflows"&gt;E3: Model Selection Workflows&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://thearunmurali.com/post/2026/01/10/cross-validation/"&gt;Cross Validation (CV) Overview&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://thearunmurali.com/post/2026/01/10/grid-search-cv/"&gt;Grid Search CV&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="f-generalization-regularization-and-fit"&gt;F) Generalization, Regularization, and Fit&lt;/h2&gt;
&lt;h3 id="f1-fit--generalization"&gt;F1: Fit &amp;amp; Generalization&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://thearunmurali.com/post/2026/01/10/overfitting/"&gt;Overfitting&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://thearunmurali.com/post/2026/01/10/underfitting/"&gt;Underfitting&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="f2-regularization-linear-models"&gt;F2: Regularization (Linear Models)&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://thearunmurali.com/post/2026/01/10/regularization-overview/"&gt;What is Regularization?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://thearunmurali.com/post/2026/01/10/ridge-regression/"&gt;Ridge Regression&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://thearunmurali.com/post/2026/01/10/lasso-regression/"&gt;Lasso Regression&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="g-feature-engineering--non-linear-modeling"&gt;G) Feature Engineering &amp;amp; Non-Linear Modeling&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://thearunmurali.com/post/2026/01/10/non-linear-modeling-overview/"&gt;Non-Linear Modeling Overview&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://thearunmurali.com/post/2026/01/10/polynomial-features/"&gt;Polynomial Features&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="h-imbalanced-data-toolkit"&gt;H) Imbalanced Data Toolkit&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://thearunmurali.com/post/2026/01/10/imbalanced-data-overview/"&gt;Imbalanced Data Overview&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://thearunmurali.com/post/2026/01/10/undersampling/"&gt;Undersampling&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="i-ensemble-methods"&gt;I) Ensemble Methods&lt;/h2&gt;
&lt;h3 id="i1-ensemble-overview"&gt;I1: Ensemble Overview&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://thearunmurali.com/post/2026/01/10/ensemble-methods-overview/"&gt;Ensemble Methods Overview&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="i2-bagging--forests"&gt;I2: Bagging &amp;amp; Forests&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://thearunmurali.com/post/2026/01/10/random-forest/"&gt;Random Forest&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="i3-boosting-family"&gt;I3: Boosting Family&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://thearunmurali.com/post/2026/01/10/adaboost/"&gt;AdaBoost&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://thearunmurali.com/post/2026/01/10/gradient-boosting/"&gt;Gradient Boosting&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://thearunmurali.com/post/2026/01/10/xgboost/"&gt;XGBoost&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h1 id="suggested-reading-paths"&gt;Suggested Reading Paths&lt;/h1&gt;
&lt;h2 id="path-1--interview-sprint-fast-catch-up"&gt;Path 1 — Interview Sprint (fast catch-up)&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;A1 → A3 (Stats + distributions)&lt;/li&gt;
&lt;li&gt;B1 (Hypothesis testing steps)&lt;/li&gt;
&lt;li&gt;C1 + C2 (EDA workflow + cleaning)&lt;/li&gt;
&lt;li&gt;D2 (Linear + Logistic + Trees)&lt;/li&gt;
&lt;li&gt;E2 (Confusion matrix + ROC/AUC)&lt;/li&gt;
&lt;li&gt;F1 + F2 (Over/Under + regularization)&lt;/li&gt;
&lt;li&gt;E3 (CV + GridSearch)&lt;/li&gt;
&lt;li&gt;I (Ensembles)&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id="path-2--deep-study-build-mastery"&gt;Path 2 — Deep Study (build mastery)&lt;/h2&gt;
&lt;p&gt;Follow pillars A → I in order and do the mini example + interview questions on every page.&lt;/p&gt;</description></item><item><title>Decision Tree</title><link>https://thearunmurali.com/post/2026/01/10/decision-tree/</link><pubDate>Sat, 10 Jan 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2026/01/10/decision-tree/</guid><description>&lt;h2 id="-1-minute-summary"&gt;🟪 1-Minute Summary&lt;/h2&gt;
&lt;p&gt;Decision trees make predictions by learning decision rules from features, creating a tree structure of if-then conditions. The algorithm splits the data recursively, choosing at each node the split that most increases purity (measured by Gini impurity or entropy). Pros: highly interpretable, handles non-linear relationships, no feature scaling needed. Cons: prone to overfitting and unstable (small changes in the data can produce a very different tree). Limit tree depth or prune the tree to reduce overfitting.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="-core-notes-must-know"&gt;🟦 Core Notes (Must-Know)&lt;/h2&gt;
&lt;h3 id="how-decision-trees-work"&gt;How Decision Trees Work&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;
&lt;h3 id="splitting-criteria"&gt;Splitting Criteria&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;</description></item><item><title>Dendrogram and Hierarchical Clustering</title><link>https://thearunmurali.com/post/2026/01/10/dendrogram-and-hierarchical-clustering/</link><pubDate>Sat, 10 Jan 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2026/01/10/dendrogram-and-hierarchical-clustering/</guid><description>&lt;h2 id="-1-minute-summary"&gt;🟪 1-Minute Summary&lt;/h2&gt;
&lt;p&gt;Hierarchical clustering builds a tree (dendrogram) showing how data points group together at different similarity levels. Two types: Agglomerative (bottom-up: merge) and Divisive (top-down: split). Advantages: no need to specify K beforehand, and the dendrogram visualizes the cluster structure. Disadvantage: slow for large datasets, since standard agglomerative algorithms need at least O(n²) time and memory. Cut the dendrogram at the desired height to get clusters.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="-core-notes-must-know"&gt;🟦 Core Notes (Must-Know)&lt;/h2&gt;
&lt;h3 id="what-is-hierarchical-clustering"&gt;What is Hierarchical Clustering?&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;
&lt;h3 id="agglomerative-vs-divisive"&gt;Agglomerative vs Divisive&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;</description></item><item><title>Descriptive vs Inferential Statistics</title><link>https://thearunmurali.com/post/2026/01/10/descriptive-vs-inferential-statistics/</link><pubDate>Sat, 10 Jan 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2026/01/10/descriptive-vs-inferential-statistics/</guid><description>&lt;h2 id="-1-minute-summary"&gt;🟪 1-Minute Summary&lt;/h2&gt;
&lt;p&gt;Descriptive statistics &lt;strong&gt;summarize and describe&lt;/strong&gt; the features of a dataset (mean, median, charts), while inferential statistics &lt;strong&gt;make predictions or inferences&lt;/strong&gt; about a population based on a sample (hypothesis testing, confidence intervals). Descriptive = &amp;ldquo;What happened?&amp;rdquo; | Inferential = &amp;ldquo;What does this mean for the bigger picture?&amp;rdquo;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="-core-notes-must-know"&gt;🟦 Core Notes (Must-Know)&lt;/h2&gt;
&lt;h3 id="what-are-descriptive-statistics"&gt;What are Descriptive Statistics?&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;
&lt;h3 id="what-are-inferential-statistics"&gt;What are Inferential Statistics?&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;</description></item><item><title>Drawing General Conclusions from Data (EDA)</title><link>https://thearunmurali.com/post/2026/01/10/drawing-general-conclusions-from-data-eda/</link><pubDate>Sat, 10 Jan 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2026/01/10/drawing-general-conclusions-from-data-eda/</guid><description>&lt;h2 id="-1-minute-summary"&gt;🟪 1-Minute Summary&lt;/h2&gt;
&lt;p&gt;After completing EDA, you must synthesize findings into conclusions that guide modeling decisions. Key outputs: (1) Data quality assessment, (2) Feature insights (which matter, which don&amp;rsquo;t), (3) Recommended transformations, (4) Potential model approaches, (5) Known limitations. Good conclusions bridge EDA and modeling.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="-core-notes-must-know"&gt;🟦 Core Notes (Must-Know)&lt;/h2&gt;
&lt;h3 id="what-to-conclude-from-eda"&gt;What to Conclude From EDA&lt;/h3&gt;
&lt;h4 id="data-quality-summary"&gt;Data Quality Summary&lt;/h4&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;
&lt;h4 id="feature-insights"&gt;Feature Insights&lt;/h4&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;
&lt;h4 id="distribution-patterns"&gt;Distribution Patterns&lt;/h4&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;</description></item><item><title>Duplicate Treatment</title><link>https://thearunmurali.com/post/2026/01/10/duplicate-treatment/</link><pubDate>Sat, 10 Jan 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2026/01/10/duplicate-treatment/</guid><description>&lt;h2 id="-1-minute-summary"&gt;🟪 1-Minute Summary&lt;/h2&gt;
&lt;p&gt;Duplicates are rows that appear multiple times in a dataset. Types: (1) Exact duplicates (all columns identical), (2) Partial duplicates (key columns identical). Detection: &lt;code&gt;df.duplicated()&lt;/code&gt;. Treatment depends on context: remove if errors, keep if legitimate (e.g., multiple purchases by same customer). Always investigate before blindly dropping.&lt;/p&gt;
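&lt;p&gt;A minimal sketch of the detection step, assuming pandas is installed; the table and its column names are made up for illustration. It contrasts exact duplicates with partial duplicates on a key column.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;# Sketch: exact vs partial duplicate detection with pandas (hypothetical data).
import pandas as pd

orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 3],
    "item": ["pen", "pen", "ink", "pad", "pen"],
    "amount": [2.5, 2.5, 7.0, 4.0, 2.5],
})

# Exact duplicates: every column matches an earlier row.
exact_mask = orders.duplicated()

# Partial duplicates: only the key column matches; keep=False flags all copies
# so they can be inspected before anything is dropped.
partial_mask = orders.duplicated(subset=["customer_id"], keep=False)

# Drop exact duplicates only, keeping the first occurrence of each row.
deduplicated = orders.drop_duplicates()

print(orders[exact_mask])
print(orders[partial_mask])
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here customer 2 appears twice with different items, which looks like two legitimate purchases rather than an error, so only the exact repeat for customer 1 is removed.&lt;/p&gt;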
&lt;hr&gt;
&lt;h2 id="-core-notes-must-know"&gt;🟦 Core Notes (Must-Know)&lt;/h2&gt;
&lt;h3 id="types-of-duplicates"&gt;Types of Duplicates&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;
&lt;h3 id="why-duplicates-occur"&gt;Why Duplicates Occur&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;
&lt;h3 id="detection-methods"&gt;Detection Methods&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;</description></item><item><title>EDA General Steps - Master Checklist</title><link>https://thearunmurali.com/post/2026/01/10/eda-general-steps-master-checklist/</link><pubDate>Sat, 10 Jan 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2026/01/10/eda-general-steps-master-checklist/</guid><description>&lt;h2 id="-1-minute-summary"&gt;🟪 1-Minute Summary&lt;/h2&gt;
&lt;p&gt;EDA (Exploratory Data Analysis) is the systematic examination of data before modeling. Standard workflow: (1) Load and understand structure, (2) Check data types and memory, (3) Identify missing values, (4) Detect duplicates, (5) Find outliers, (6) Analyze distributions (univariate), (7) Explore relationships (bivariate/multivariate), (8) Document findings. EDA informs cleaning, feature engineering, and model selection.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="-core-notes-must-know"&gt;🟦 Core Notes (Must-Know)&lt;/h2&gt;
&lt;h3 id="the-eda-checklist"&gt;The EDA Checklist&lt;/h3&gt;
&lt;h4 id="step-1-load-and-understand-structure"&gt;Step 1: Load and Understand Structure&lt;/h4&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;</description></item><item><title>Ensemble Methods Overview</title><link>https://thearunmurali.com/post/2026/01/10/ensemble-methods-overview/</link><pubDate>Sat, 10 Jan 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2026/01/10/ensemble-methods-overview/</guid><description>&lt;h2 id="-1-minute-summary"&gt;🟪 1-Minute Summary&lt;/h2&gt;
&lt;p&gt;Ensemble methods combine multiple models to improve performance. Two main types: (1) Bagging (parallel, reduce variance) - Random Forest, (2) Boosting (sequential, reduce bias) - AdaBoost, Gradient Boosting, XGBoost. Generally outperform single models. Trade-off: better performance but less interpretable, slower, more complex.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="-core-notes-must-know"&gt;🟦 Core Notes (Must-Know)&lt;/h2&gt;
&lt;h3 id="what-are-ensemble-methods"&gt;What are Ensemble Methods?&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;
&lt;h3 id="bagging-vs-boosting"&gt;Bagging vs Boosting&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;
&lt;h3 id="common-ensemble-algorithms"&gt;Common Ensemble Algorithms&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;</description></item><item><title>F-test - Comparing Variances</title><link>https://thearunmurali.com/post/2026/01/10/f-test-comparing-variances/</link><pubDate>Sat, 10 Jan 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2026/01/10/f-test-comparing-variances/</guid><description>&lt;h2 id="-1-minute-summary"&gt;🟪 1-Minute Summary&lt;/h2&gt;
&lt;p&gt;The F-test compares the variances of two groups to determine whether they differ significantly. The test statistic is the ratio of the two sample variances: F = variance₁ / variance₂. It is commonly used to check the equal-variance assumption before a t-test and is the foundation of ANOVA. The F-distribution is right-skewed and takes only non-negative values.&lt;/p&gt;
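&lt;p&gt;As a small worked example of the ratio, the sketch below computes F from two made-up groups using only the standard library. The data and variable names are assumptions for illustration; turning F into a p-value needs the F-distribution with the two degrees of freedom, which is left to a statistics library and not shown.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;# Sketch: F statistic as a ratio of two sample variances (made-up data).
import statistics

group_1 = [10.0, 12.0, 14.0, 16.0, 18.0]   # hypothetical group 1
group_2 = [13.0, 14.0, 15.0, 14.0, 14.0]   # hypothetical group 2

var_1 = statistics.variance(group_1)        # sample variance (n - 1 denominator)
var_2 = statistics.variance(group_2)

f_stat = var_1 / var_2                      # F = variance_1 / variance_2
df_numerator = len(group_1) - 1             # degrees of freedom for group 1
df_denominator = len(group_2) - 1           # degrees of freedom for group 2

print(f"F = {f_stat:.1f} with ({df_numerator}, {df_denominator}) degrees of freedom")
&lt;/code&gt;&lt;/pre&gt;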
&lt;hr&gt;
&lt;h2 id="-core-notes-must-know"&gt;🟦 Core Notes (Must-Know)&lt;/h2&gt;
&lt;h3 id="what-is-an-f-test"&gt;What is an F-test?&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;
&lt;h3 id="f-statistic-formula"&gt;F-statistic Formula&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;
&lt;h3 id="f-distribution"&gt;F-distribution&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;</description></item><item><title>FPR (False Positive Rate)</title><link>https://thearunmurali.com/post/2026/01/10/fpr-false-positive-rate/</link><pubDate>Sat, 10 Jan 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2026/01/10/fpr-false-positive-rate/</guid><description>&lt;h2 id="-1-minute-summary"&gt;🟪 1-Minute Summary&lt;/h2&gt;
&lt;p&gt;FPR (False Positive Rate) measures &amp;ldquo;of all actual negatives, how many did we incorrectly predict as positive?&amp;rdquo; Formula: FP / (FP + TN). Also called &amp;ldquo;fall-out&amp;rdquo;. Used in ROC curves (FPR on x-axis, TPR/Recall on y-axis). Lower FPR is better. Complement of specificity (Specificity = 1 - FPR = TN / (TN + FP)).&lt;/p&gt;
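&lt;p&gt;The two formulas can be checked with a few lines of Python. This is a minimal sketch with made-up counts and hypothetical function names; it assumes there is at least one actual negative, otherwise the division is undefined.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;# Sketch: FPR and specificity from confusion-matrix counts (made-up counts).
def false_positive_rate(fp, tn):
    # Share of actual negatives that were predicted positive.
    return fp / (fp + tn)

def specificity(fp, tn):
    # Share of actual negatives that were predicted negative.
    return tn / (tn + fp)

fp, tn = 5, 95   # hypothetical: 100 actual negatives, 5 flagged by mistake
fpr = false_positive_rate(fp, tn)
spec = specificity(fp, tn)
print(f"FPR = {fpr:.2f}, specificity = {spec:.2f}")
&lt;/code&gt;&lt;/pre&gt;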
&lt;hr&gt;
&lt;h2 id="-core-notes-must-know"&gt;🟦 Core Notes (Must-Know)&lt;/h2&gt;
&lt;h3 id="formula"&gt;Formula&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;
&lt;h3 id="interpretation"&gt;Interpretation&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;</description></item><item><title>Gradient Boosting</title><link>https://thearunmurali.com/post/2026/01/10/gradient-boosting/</link><pubDate>Sat, 10 Jan 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2026/01/10/gradient-boosting/</guid><description>&lt;h2 id="-1-minute-summary"&gt;🟪 1-Minute Summary&lt;/h2&gt;
&lt;p&gt;Gradient Boosting builds models sequentially, with each new model fitted to the errors (residuals) of the current ensemble. This amounts to gradient descent on the loss function: each stage fits the negative gradient of the loss, which for squared error is the residual. It is more flexible than AdaBoost because it works with any differentiable loss, covering both regression and classification. Hyperparameters: learning rate (shrinkage), n_estimators, max_depth. Pros: often state-of-the-art performance on tabular data. Cons: prone to overfitting, slow training, many hyperparameters to tune.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="-core-notes-must-know"&gt;🟦 Core Notes (Must-Know)&lt;/h2&gt;
&lt;h3 id="how-gradient-boosting-works"&gt;How Gradient Boosting Works&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;
&lt;h3 id="the-algorithm"&gt;The Algorithm&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;</description></item><item><title>Grid Search CV (Hyperparameter Tuning)</title><link>https://thearunmurali.com/post/2026/01/10/grid-search-cv-hyperparameter-tuning/</link><pubDate>Sat, 10 Jan 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2026/01/10/grid-search-cv-hyperparameter-tuning/</guid><description>&lt;h2 id="-1-minute-summary"&gt;🟪 1-Minute Summary&lt;/h2&gt;
&lt;p&gt;Grid Search CV exhaustively tries all combinations of the hyperparameter values you specify, using cross-validation to evaluate each combination. It returns the best parameters and the best score, automating hyperparameter tuning. Pros: thorough, easy to use. Cons: computationally expensive (the number of combinations grows exponentially with the number of hyperparameters, and each combination is fitted once per CV fold). Alternative: RandomizedSearchCV for a faster search.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="-core-notes-must-know"&gt;🟦 Core Notes (Must-Know)&lt;/h2&gt;
&lt;h3 id="what-is-grid-search"&gt;What is Grid Search?&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;
&lt;h3 id="how-it-works"&gt;How It Works&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;
&lt;h3 id="parameters-vs-hyperparameters"&gt;Parameters vs Hyperparameters&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;</description></item><item><title>How to Study This Blog for Data Science Interviews</title><link>https://thearunmurali.com/post/2026/01/10/how-to-study-this-blog-for-data-science-interviews/</link><pubDate>Sat, 10 Jan 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2026/01/10/how-to-study-this-blog-for-data-science-interviews/</guid><description>&lt;h2 id="-1-minute-summary"&gt;🟪 1-Minute Summary&lt;/h2&gt;
&lt;p&gt;This blog is designed as a &lt;strong&gt;structured interview prep system&lt;/strong&gt;, not a traditional textbook. Each lesson follows a 5-part color-coded template: Summary (skim) → Core Notes (must-know) → Interview Triggers (what&amp;rsquo;s tested) → Common Mistakes (traps) → Mini Example (application). Use tags and reading paths to navigate based on your timeline.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="-core-study-strategies"&gt;🟦 Core Study Strategies&lt;/h2&gt;
&lt;h3 id="strategy-1-sprint-mode-1-2-weeks-before-interview"&gt;Strategy 1: Sprint Mode (1-2 weeks before interview)&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Follow &lt;strong&gt;Path 1&lt;/strong&gt; from the &lt;a href="https://thearunmurali.com/post/2026/01/10/data-science-roadmap/"&gt;roadmap&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Focus on 🟪 Summaries and 🟨 Interview Triggers only&lt;/li&gt;
&lt;li&gt;Skip deep dives; prioritize breadth over depth&lt;/li&gt;
&lt;li&gt;Review all 🟥 Common Mistakes sections&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="strategy-2-deep-study-mode-1-3-months-prep"&gt;Strategy 2: Deep Study Mode (1-3 months prep)&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Follow &lt;strong&gt;Path 2&lt;/strong&gt; from the &lt;a href="https://thearunmurali.com/post/2026/01/10/data-science-roadmap/"&gt;roadmap&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Read every section in sequence (A → I)&lt;/li&gt;
&lt;li&gt;Complete all 🟩 Mini Examples with actual code&lt;/li&gt;
&lt;li&gt;Create flashcards from 🟦 Core Notes&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="strategy-3-topic-specific-review"&gt;Strategy 3: Topic-Specific Review&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Use &lt;strong&gt;tags&lt;/strong&gt; to find related content:
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;statistics&lt;/code&gt; - Statistical foundations&lt;/li&gt;
&lt;li&gt;&lt;code&gt;hypothesis-testing&lt;/code&gt; - All hypothesis tests&lt;/li&gt;
&lt;li&gt;&lt;code&gt;eda&lt;/code&gt; - Exploratory data analysis&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ml-supervised&lt;/code&gt; / &lt;code&gt;ml-clustering&lt;/code&gt; - Machine learning&lt;/li&gt;
&lt;li&gt;&lt;code&gt;evaluation&lt;/code&gt; - Model metrics&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ensembles&lt;/code&gt; - Ensemble methods&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Search by category: &amp;ldquo;Data Science&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="-interview-triggers-when-to-use-this-blog"&gt;🟨 Interview Triggers (When to Use This Blog)&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Use this blog when you need to:&lt;/strong&gt;&lt;/p&gt;</description></item><item><title>Hypothesis Testing - General Step-by-Step Framework</title><link>https://thearunmurali.com/post/2026/01/10/hypothesis-testing-general-step-by-step-framework/</link><pubDate>Sat, 10 Jan 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2026/01/10/hypothesis-testing-general-step-by-step-framework/</guid><description>&lt;h2 id="-1-minute-summary"&gt;🟪 1-Minute Summary&lt;/h2&gt;
&lt;p&gt;Hypothesis testing is a systematic way to determine if observed data provides enough evidence to reject a claim (null hypothesis). The universal framework: (1) State hypotheses (H₀ and H₁), (2) Choose significance level (α), (3) Calculate test statistic, (4) Find p-value or critical value, (5) Make decision (reject or fail to reject H₀), (6) Interpret in context. This same structure applies to all tests.&lt;/p&gt;
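&lt;p&gt;To show how the six steps line up in practice, here is a minimal sketch of a two-sided one-sample z-test using only the standard library. The claim, the sample summary values, and the known σ are all made up for illustration.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;# Sketch: the six steps for a two-sided one-sample z-test (made-up numbers).
import math
from statistics import NormalDist

# Step 1: state the hypotheses. H0: mu = 50 versus H1: mu != 50 (hypothetical claim).
mu_0 = 50.0
# Step 2: choose the significance level.
alpha = 0.05
# Step 3: calculate the test statistic from assumed sample summary values.
sample_mean, sigma, n = 52.0, 10.0, 100
z = (sample_mean - mu_0) / (sigma / math.sqrt(n))
# Step 4: find the two-sided p-value from the standard normal distribution.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
# Step 5: make the decision (reject H0 when the p-value is below alpha).
reject_h0 = alpha > p_value
# Step 6: interpret in context.
decision = "reject H0" if reject_h0 else "fail to reject H0"
print(f"z = {z:.2f}, p = {p_value:.4f}: {decision}")
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Other tests change only steps 3 and 4 (a different statistic and reference distribution); the surrounding structure stays the same.&lt;/p&gt;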
&lt;hr&gt;
&lt;h2 id="-core-notes-must-know"&gt;🟦 Core Notes (Must-Know)&lt;/h2&gt;
&lt;h3 id="the-6-step-framework"&gt;The 6-Step Framework&lt;/h3&gt;
&lt;h4 id="step-1-state-the-hypotheses"&gt;Step 1: State the Hypotheses&lt;/h4&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;</description></item><item><title>Imbalanced Data Overview</title><link>https://thearunmurali.com/post/2026/01/10/imbalanced-data-overview/</link><pubDate>Sat, 10 Jan 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2026/01/10/imbalanced-data-overview/</guid><description>&lt;h2 id="-1-minute-summary"&gt;🟪 1-Minute Summary&lt;/h2&gt;
&lt;p&gt;Imbalanced data occurs when classes have very different frequencies (e.g., 95% no-fraud, 5% fraud). Problems: model biased toward majority class, accuracy misleading. Solutions: (1) Resampling (over/undersample), (2) Different metrics (precision/recall/F1, not accuracy), (3) Class weights, (4) Anomaly detection. Choose based on data size and importance of minority class.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="-core-notes-must-know"&gt;🟦 Core Notes (Must-Know)&lt;/h2&gt;
&lt;h3 id="what-is-imbalanced-data"&gt;What is Imbalanced Data?&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;
&lt;h3 id="why-its-a-problem"&gt;Why It&amp;rsquo;s a Problem&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;</description></item><item><title>K-Means Clustering</title><link>https://thearunmurali.com/post/2026/01/10/k-means-clustering/</link><pubDate>Sat, 10 Jan 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2026/01/10/k-means-clustering/</guid><description>&lt;h2 id="-1-minute-summary"&gt;🟪 1-Minute Summary&lt;/h2&gt;
&lt;p&gt;K-Means partitions data into K clusters by minimizing within-cluster variance. Algorithm: (1) Initialize K centroids randomly, (2) Assign points to nearest centroid, (3) Update centroids, (4) Repeat until convergence. Choose K using elbow method or silhouette score. Pros: fast, scalable. Cons: need to specify K, assumes spherical clusters, sensitive to initialization and outliers.&lt;/p&gt;
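&lt;p&gt;The four algorithm steps can be sketched in plain Python as follows. This is an illustrative implementation under stated assumptions, not a production one: the function name, the seed handling, the empty-cluster rule, and the one-dimensional toy data are all choices made for this example.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;# Sketch: K-Means (Lloyd's algorithm) on made-up data.
import math
import random

def k_means(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Step 1: initialize K centroids by picking K distinct data points at random.
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Step 3: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid, an assumption of this sketch).
        new_centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        # Step 4: stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

points = [(1.0,), (1.2,), (0.8,), (9.0,), (9.2,), (8.8,)]  # two obvious groups
centroids, clusters = k_means(points, k=2)
print(sorted(centroids))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;On less tidy data, different seeds can converge to different clusterings, which is the initialization sensitivity mentioned above; rerunning with several seeds and keeping the lowest within-cluster variance is a common mitigation.&lt;/p&gt;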
&lt;hr&gt;
&lt;h2 id="-core-notes-must-know"&gt;🟦 Core Notes (Must-Know)&lt;/h2&gt;
&lt;h3 id="how-k-means-works"&gt;How K-Means Works&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;
&lt;h3 id="algorithm-steps"&gt;Algorithm Steps&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;</description></item><item><title>K-Nearest Neighbors (KNN)</title><link>https://thearunmurali.com/post/2026/01/10/k-nearest-neighbors-knn/</link><pubDate>Sat, 10 Jan 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2026/01/10/k-nearest-neighbors-knn/</guid><description>&lt;h2 id="-1-minute-summary"&gt;🟪 1-Minute Summary&lt;/h2&gt;
&lt;p&gt;KNN classifies a data point by looking at the K nearest neighbors and taking a majority vote (classification) or average (regression). It&amp;rsquo;s a &amp;ldquo;lazy learner&amp;rdquo; - no training phase, just stores data. Pros: simple, no assumptions, works for multi-class. Cons: slow prediction, sensitive to scale and irrelevant features, needs optimal K. Always scale features first!&lt;/p&gt;
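&lt;p&gt;A minimal sketch of the majority vote, assuming 2-D features that are already on a comparable scale. The helper &lt;code&gt;knn_predict&lt;/code&gt; and the labelled points are hypothetical, chosen only to show the mechanics.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Lazy learner: nothing is fitted up front, all the work happens at prediction time.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

train = [((1.0, 1.0), "a"), ((1.5, 2.0), "a"), ((2.0, 1.5), "a"),
         ((6.0, 6.0), "b"), ((6.5, 7.0), "b"), ((7.0, 6.5), "b")]
label = knn_predict(train, query=(2.0, 2.0), k=3)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Every prediction sorts the full training set, which is where the slow prediction time comes from.&lt;/p&gt;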
&lt;hr&gt;
&lt;h2 id="-core-notes-must-know"&gt;🟦 Core Notes (Must-Know)&lt;/h2&gt;
&lt;h3 id="how-knn-works"&gt;How KNN Works&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;</description></item><item><title>Lasso Regression (L1 Regularization)</title><link>https://thearunmurali.com/post/2026/01/10/lasso-regression-l1-regularization/</link><pubDate>Sat, 10 Jan 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2026/01/10/lasso-regression-l1-regularization/</guid><description>&lt;h2 id="-1-minute-summary"&gt;🟪 1-Minute Summary&lt;/h2&gt;
&lt;p&gt;Lasso Regression adds L1 penalty (sum of absolute coefficients) to linear regression. Can shrink coefficients to EXACTLY zero, performing automatic feature selection. Hyperparameter α controls strength. Use when you have many features and want to identify the important ones. Sparse solutions make model more interpretable. Must scale features first.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="-core-notes-must-know"&gt;🟦 Core Notes (Must-Know)&lt;/h2&gt;
&lt;h3 id="how-lasso-works"&gt;How Lasso Works&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;
&lt;h3 id="formula"&gt;Formula&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;</description></item><item><title>Linear Regression</title><link>https://thearunmurali.com/post/2026/01/10/linear-regression/</link><pubDate>Sat, 10 Jan 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2026/01/10/linear-regression/</guid><description>&lt;h2 id="-1-minute-summary"&gt;🟪 1-Minute Summary&lt;/h2&gt;
&lt;p&gt;Linear regression models the relationship between independent variables (X) and a continuous dependent variable (y) using a straight line: y = β₀ + β₁x₁ + &amp;hellip; + βₙxₙ. Finds coefficients that minimize error (typically using least squares). Assumptions: linearity, independence, homoscedasticity, normality of residuals. Evaluate with R², RMSE, MAE. Simple but powerful baseline model.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="-core-notes-must-know"&gt;🟦 Core Notes (Must-Know)&lt;/h2&gt;
&lt;h3 id="what-is-linear-regression"&gt;What is Linear Regression?&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;</description></item><item><title>Linear Regression Score (R² and Adjusted R²)</title><link>https://thearunmurali.com/post/2026/01/10/linear-regression-score-r-and-adjusted-r/</link><pubDate>Sat, 10 Jan 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2026/01/10/linear-regression-score-r-and-adjusted-r/</guid><description>&lt;h2 id="-1-minute-summary"&gt;🟪 1-Minute Summary&lt;/h2&gt;
&lt;p&gt;R² (coefficient of determination) measures the proportion of variance in the target explained by the model. Range: -∞ to 1 (1 = perfect fit, 0 = model is no better than mean, negative = worse than mean). Formula: 1 - (SS_residual / SS_total). Adjusted R² penalizes adding useless features. Use for model comparison, but not as the only metric.&lt;/p&gt;
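&lt;p&gt;A small worked example of the formula, as a sketch. The function names and the toy numbers are made up for illustration, and the adjusted version uses the usual textbook form 1 - (1 - R²)(n - 1)/(n - p - 1), which the summary above does not spell out.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;def r2_score(y_true, y_pred):
    """R^2 = 1 - SS_residual / SS_total."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot

def adjusted_r2(r2, n_samples, n_features):
    """Textbook adjustment: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)

y_true = [3.0, 5.0, 7.0, 9.0]
good = r2_score(y_true, [2.8, 5.1, 7.2, 8.9])      # close fit
baseline = r2_score(y_true, [6.0, 6.0, 6.0, 6.0])  # always predicting the mean
worse = r2_score(y_true, [9.0, 7.0, 5.0, 3.0])     # worse than the mean
adjusted = adjusted_r2(good, n_samples=4, n_features=1)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here the close fit scores about 0.995, predicting the mean scores 0, and the reversed predictions score -3, which matches the three cases in the range above.&lt;/p&gt;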
&lt;hr&gt;
&lt;h2 id="-core-notes-must-know"&gt;🟦 Core Notes (Must-Know)&lt;/h2&gt;
&lt;h3 id="what-is-r"&gt;What is R²?&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;</description></item><item><title>Logistic Regression</title><link>https://thearunmurali.com/post/2026/01/10/logistic-regression/</link><pubDate>Sat, 10 Jan 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2026/01/10/logistic-regression/</guid><description>&lt;h2 id="-1-minute-summary"&gt;🟪 1-Minute Summary&lt;/h2&gt;
&lt;p&gt;Logistic regression predicts binary outcomes (0/1, yes/no) by estimating probabilities using the sigmoid function. Despite the name, it&amp;rsquo;s CLASSIFICATION, not regression. Outputs probability (0 to 1), use threshold (default 0.5) to make final decision. Pros: interpretable, outputs probabilities, works well for linearly separable data. Evaluate with accuracy, precision, recall, ROC-AUC.&lt;/p&gt;
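&lt;p&gt;To make the probability-then-threshold step concrete, here is a minimal sketch. The weights, bias, and feature values are invented for the example rather than learned from data, and &lt;code&gt;predict&lt;/code&gt; is a hypothetical helper name.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;import math

def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    return 1 / (1 + math.exp(-z))

def predict(features, weights, bias, threshold=0.5):
    """Return (probability, predicted class) for one example."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    probability = sigmoid(z)
    return probability, int(probability >= threshold)

# Made-up coefficients: z = -1.0 + 0.8 * 3.0 - 0.4 * 1.0 = 1.0
probability, label = predict([3.0, 1.0], weights=[0.8, -0.4], bias=-1.0)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With these numbers the probability is about 0.73, so the default 0.5 threshold assigns class 1. Raising the threshold to 0.8 would flip the same example to class 0.&lt;/p&gt;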
&lt;hr&gt;
&lt;h2 id="-core-notes-must-know"&gt;🟦 Core Notes (Must-Know)&lt;/h2&gt;
&lt;h3 id="what-is-logistic-regression"&gt;What is Logistic Regression?&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;
&lt;h3 id="sigmoid-function"&gt;Sigmoid Function&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;</description></item><item><title>MAPE (Mean Absolute Percentage Error)</title><link>https://thearunmurali.com/post/2026/01/10/mape-mean-absolute-percentage-error/</link><pubDate>Sat, 10 Jan 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2026/01/10/mape-mean-absolute-percentage-error/</guid><description>&lt;h2 id="-1-minute-summary"&gt;🟪 1-Minute Summary&lt;/h2&gt;
&lt;p&gt;MAPE expresses error as a percentage of actual values: (100/n) * Σ|actual - predicted|/|actual|. Useful when relative error matters more than absolute error. Pros: scale-independent, interpretable. Cons: undefined when actual = 0, and asymmetric: with non-negative forecasts an under-prediction costs at most 100%, while an over-prediction has no upper bound, so MAPE is biased toward low forecasts. Use for business metrics (sales, revenue) where % error is meaningful.&lt;/p&gt;
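&lt;p&gt;A minimal sketch of the formula, with made-up numbers. The function name &lt;code&gt;mape&lt;/code&gt; is an assumption for illustration, and raising an error on a zero actual is one possible way to handle the undefined case, not the only one.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;def mape(actual, predicted):
    """Mean absolute percentage error; undefined if any actual value is 0."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is 0")
    n = len(actual)
    return 100 / n * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))

actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]
error_pct = mape(actual, predicted)   # (10% + 10% + 0%) / 3

# Why MAPE favours low forecasts: forecasting 0 costs 100%,
# while over-forecasting has no upper bound.
under = mape([100.0], [0.0])
over = mape([100.0], [300.0])
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The first call gives about 6.67%. The last two give 100% and 200% for forecasts that miss the same actual value in opposite directions.&lt;/p&gt;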
&lt;hr&gt;
&lt;h2 id="-core-notes-must-know"&gt;🟦 Core Notes (Must-Know)&lt;/h2&gt;
&lt;h3 id="formula"&gt;Formula&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;
&lt;h3 id="interpretation"&gt;Interpretation&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;</description></item><item><title>Mean, Median, and Mode - Measures of Central Tendency</title><link>https://thearunmurali.com/post/2026/01/10/mean-median-and-mode-measures-of-central-tendency/</link><pubDate>Sat, 10 Jan 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2026/01/10/mean-median-and-mode-measures-of-central-tendency/</guid><description>&lt;h2 id="-1-minute-summary"&gt;🟪 1-Minute Summary&lt;/h2&gt;
&lt;p&gt;Mean (average), median (middle value), and mode (most frequent) are the three ways to describe where the &amp;ldquo;center&amp;rdquo; of your data lies. Mean is sensitive to outliers, median is robust to them, and mode works for categorical data. In interviews, you&amp;rsquo;ll be asked when to use each.&lt;/p&gt;
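&lt;p&gt;A quick illustration of the outlier point, using the standard library &lt;code&gt;statistics&lt;/code&gt; module. The salary figures are invented for the example.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;import statistics

salaries = [40, 42, 45, 45, 48, 50]
with_outlier = salaries + [400]   # one extreme value added

# How far does each measure move when the outlier arrives?
mean_shift = statistics.mean(with_outlier) - statistics.mean(salaries)
median_shift = statistics.median(with_outlier) - statistics.median(salaries)
most_common = statistics.mode(with_outlier)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The mean jumps by roughly 50.7, the median does not move at all, and the mode stays at 45.&lt;/p&gt;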
&lt;hr&gt;
&lt;h2 id="-core-notes-must-know"&gt;🟦 Core Notes (Must-Know)&lt;/h2&gt;
&lt;h3 id="mean-average"&gt;Mean (Average)&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;
&lt;h3 id="median-middle-value"&gt;Median (Middle Value)&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;
&lt;h3 id="mode-most-frequent"&gt;Mode (Most Frequent)&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;</description></item><item><title>Non-Linear Modeling Overview</title><link>https://thearunmurali.com/post/2026/01/10/non-linear-modeling-overview/</link><pubDate>Sat, 10 Jan 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2026/01/10/non-linear-modeling-overview/</guid><description>&lt;h2 id="-1-minute-summary"&gt;🟪 1-Minute Summary&lt;/h2&gt;
&lt;p&gt;Not all relationships are linear. Non-linear modeling captures curves and interactions. Options: (1) Polynomial features (x, x², x³), (2) Interaction terms (x₁*x₂), (3) Non-linear algorithms (trees, neural nets), (4) Transformations (log, sqrt). You can still use linear regression with polynomial features: the model stays linear in its coefficients, just not in the original features.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="-core-notes-must-know"&gt;🟦 Core Notes (Must-Know)&lt;/h2&gt;
&lt;h3 id="when-linear-models-fail"&gt;When Linear Models Fail&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;
&lt;h3 id="approaches-to-non-linearity"&gt;Approaches to Non-Linearity&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;</description></item><item><title>Normal Distribution (Gaussian Distribution)</title><link>https://thearunmurali.com/post/2026/01/10/normal-distribution-gaussian-distribution/</link><pubDate>Sat, 10 Jan 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2026/01/10/normal-distribution-gaussian-distribution/</guid><description>&lt;h2 id="-1-minute-summary"&gt;🟪 1-Minute Summary&lt;/h2&gt;
&lt;p&gt;The normal distribution (bell curve) is a symmetric, continuous probability distribution defined by its mean (μ) and standard deviation (σ). It&amp;rsquo;s the foundation of many statistical methods because many natural phenomena approximate it, and the Central Limit Theorem says sample means tend toward normality. The 68-95-99.7 rule describes how data spreads.&lt;/p&gt;
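&lt;p&gt;The 68-95-99.7 rule can be checked numerically with &lt;code&gt;statistics.NormalDist&lt;/code&gt; from the standard library. The helper name &lt;code&gt;share_within&lt;/code&gt; is an assumption for this sketch.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;from statistics import NormalDist

def share_within(k, mu=0.0, sigma=1.0):
    """Probability that a normal value falls within k standard deviations of the mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

within = {k: round(share_within(k), 4) for k in (1, 2, 3)}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The three shares come out at roughly 0.6827, 0.9545, and 0.9973, whatever values of μ and σ are passed in.&lt;/p&gt;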
&lt;hr&gt;
&lt;h2 id="-core-notes-must-know"&gt;🟦 Core Notes (Must-Know)&lt;/h2&gt;
&lt;h3 id="what-is-the-normal-distribution"&gt;What is the Normal Distribution?&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;
&lt;h3 id="key-properties"&gt;Key Properties&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;</description></item><item><title>Null and Missing Value Treatment</title><link>https://thearunmurali.com/post/2026/01/10/null-and-missing-value-treatment/</link><pubDate>Sat, 10 Jan 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2026/01/10/null-and-missing-value-treatment/</guid><description>&lt;h2 id="-1-minute-summary"&gt;🟪 1-Minute Summary&lt;/h2&gt;
&lt;p&gt;Missing values are inevitable in real datasets. Treatment options: (1) Drop (if &amp;lt; 5% missing or MCAR), (2) Impute (mean/median/mode for numerical, mode for categorical, or advanced methods like KNN/MICE), (3) Create missing indicator (if missingness is informative). Choice depends on missingness mechanism (MCAR, MAR, MNAR) and percentage missing.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="-core-notes-must-know"&gt;🟦 Core Notes (Must-Know)&lt;/h2&gt;
&lt;h3 id="types-of-missingness"&gt;Types of Missingness&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;MCAR (Missing Completely At Random)&lt;/li&gt;
&lt;li&gt;MAR (Missing At Random)&lt;/li&gt;
&lt;li&gt;MNAR (Missing Not At Random)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="detection-strategies"&gt;Detection Strategies&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;</description></item><item><title>Outlier Treatment</title><link>https://thearunmurali.com/post/2026/01/10/outlier-treatment/</link><pubDate>Sat, 10 Jan 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2026/01/10/outlier-treatment/</guid><description>&lt;h2 id="-1-minute-summary"&gt;🟪 1-Minute Summary&lt;/h2&gt;
&lt;p&gt;Outliers are data points significantly different from others. Detection methods: (1) IQR method (below Q1 - 1.5*IQR or above Q3 + 1.5*IQR), (2) Z-score (&amp;gt;3 or &amp;lt;-3), (3) Visual inspection (box plots, scatter plots). Treatment: (1) Remove if errors, (2) Cap/floor (winsorization), (3) Transform (log), (4) Keep if legitimate. NEVER blindly remove - investigate first!&lt;/p&gt;
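&lt;p&gt;A minimal sketch of IQR-based detection followed by capping, using the standard library. The function name &lt;code&gt;iqr_bounds&lt;/code&gt; and the sample data are assumptions, and quartile conventions differ slightly between libraries, so the exact fences may not match other tools.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;import statistics

def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

data = [10, 12, 12, 13, 12, 11, 14, 13, 15, 102]
lower, upper = iqr_bounds(data)

# Cap/floor: pull anything outside the fences back to the nearest fence.
capped = [min(max(x, lower), upper) for x in data]
# A value is flagged as an outlier exactly when capping changed it.
outliers = [x for x, c in zip(data, capped) if x != c]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Only 102 is flagged here. Whether to cap it, drop it, or keep it still depends on what the investigation shows.&lt;/p&gt;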
&lt;hr&gt;
&lt;h2 id="-core-notes-must-know"&gt;🟦 Core Notes (Must-Know)&lt;/h2&gt;
&lt;h3 id="what-are-outliers"&gt;What are Outliers?&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;
&lt;h3 id="types-of-outliers"&gt;Types of Outliers&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;</description></item><item><title>Overfitting</title><link>https://thearunmurali.com/post/2026/01/10/overfitting/</link><pubDate>Sat, 10 Jan 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2026/01/10/overfitting/</guid><description>&lt;h2 id="-1-minute-summary"&gt;🟪 1-Minute Summary&lt;/h2&gt;
&lt;p&gt;Overfitting occurs when a model learns the training data too well, including noise and outliers, performing poorly on new data. Signs: high training accuracy, low validation/test accuracy. Causes: model too complex, too little data, training too long. Solutions: regularization, more data, simpler model, cross-validation, early stopping, dropout (neural nets).&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="-core-notes-must-know"&gt;🟦 Core Notes (Must-Know)&lt;/h2&gt;
&lt;h3 id="what-is-overfitting"&gt;What is Overfitting?&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;
&lt;h3 id="how-to-detect-overfitting"&gt;How to Detect Overfitting&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;</description></item><item><title>Polynomial Features</title><link>https://thearunmurali.com/post/2026/01/10/polynomial-features/</link><pubDate>Sat, 10 Jan 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2026/01/10/polynomial-features/</guid><description>&lt;h2 id="-1-minute-summary"&gt;🟪 1-Minute Summary&lt;/h2&gt;
&lt;p&gt;Polynomial features transform input features into higher-degree terms (x → x, x²) and interactions (x₁, x₂ → x₁, x₂, x₁², x₂², x₁x₂). Allows linear regression to fit curves. Degree 2 = quadratic, degree 3 = cubic. Warning: the feature count grows quickly with both the degree and the number of inputs (2 features, degree 3 = 9 features, not counting the constant term). Use regularization to prevent overfitting. Visualize first to choose appropriate degree.&lt;/p&gt;
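&lt;p&gt;The feature count in the warning above can be verified by listing the terms directly. This is a sketch for counting only: &lt;code&gt;polynomial_terms&lt;/code&gt; is a hypothetical helper that builds term names, not the transformed data.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;from itertools import combinations_with_replacement
from math import comb

def polynomial_terms(names, degree):
    """List every monomial of total degree 1..degree (no constant term)."""
    terms = []
    for d in range(1, degree + 1):
        for combo in combinations_with_replacement(names, d):
            terms.append("*".join(combo))
    return terms

terms = polynomial_terms(["x1", "x2"], degree=3)
# Closed form for the same count: C(n + d, d) - 1, excluding the constant term.
count = comb(2 + 3, 3) - 1
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Both approaches give 9 terms for 2 features at degree 3. The same closed form gives 285 for 10 features at degree 3, which is why regularization matters here.&lt;/p&gt;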
&lt;hr&gt;
&lt;h2 id="-core-notes-must-know"&gt;🟦 Core Notes (Must-Know)&lt;/h2&gt;
&lt;h3 id="what-are-polynomial-features"&gt;What are Polynomial Features?&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;</description></item><item><title>Precision</title><link>https://thearunmurali.com/post/2026/01/10/precision/</link><pubDate>Sat, 10 Jan 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2026/01/10/precision/</guid><description>&lt;h2 id="-1-minute-summary"&gt;🟪 1-Minute Summary&lt;/h2&gt;
&lt;p&gt;Precision measures &amp;ldquo;of all predicted positives, how many were actually positive?&amp;rdquo; Formula: TP / (TP + FP). High precision means low false alarm rate. Use when false positives are costly (e.g., spam filter marking important emails as spam, recommending irrelevant products). Trade-off with recall: being more selective (higher precision) means catching fewer positives (lower recall).&lt;/p&gt;
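&lt;p&gt;A small worked example of the formula, assuming labels are encoded as 1 (positive) and 0 (negative). The function name and the label lists are invented for illustration.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;def precision(y_true, y_pred):
    """TP / (TP + FP): the share of predicted positives that are truly positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if tp + fp else 0.0

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
score = precision(y_true, y_pred)   # 3 true positives, 1 false positive
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The score is 0.75: of the four items flagged as positive, three really were. Returning 0.0 when nothing is predicted positive is a convention chosen for this sketch.&lt;/p&gt;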
&lt;hr&gt;
&lt;h2 id="-core-notes-must-know"&gt;🟦 Core Notes (Must-Know)&lt;/h2&gt;
&lt;h3 id="formula"&gt;Formula&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;
&lt;h3 id="interpretation"&gt;Interpretation&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;</description></item><item><title>Probability Distributions Overview</title><link>https://thearunmurali.com/post/2026/01/10/probability-distributions-overview/</link><pubDate>Sat, 10 Jan 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2026/01/10/probability-distributions-overview/</guid><description>&lt;h2 id="-1-minute-summary"&gt;🟪 1-Minute Summary&lt;/h2&gt;
&lt;p&gt;A probability distribution describes how likely different outcomes are for a random variable. Common discrete distributions include binomial (yes/no trials) and Poisson (rare events). Common continuous distributions include normal (bell curve) and uniform (equal probability). Choosing the right distribution is crucial for modeling and hypothesis testing.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="-core-notes-must-know"&gt;🟦 Core Notes (Must-Know)&lt;/h2&gt;
&lt;h3 id="what-is-a-probability-distribution"&gt;What is a Probability Distribution?&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;
&lt;h3 id="common-discrete-distributions"&gt;Common Discrete Distributions&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;</description></item><item><title>Probability Fundamentals</title><link>https://thearunmurali.com/post/2026/01/10/probability-fundamentals/</link><pubDate>Sat, 10 Jan 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2026/01/10/probability-fundamentals/</guid><description>&lt;h2 id="-1-minute-summary"&gt;🟪 1-Minute Summary&lt;/h2&gt;
&lt;p&gt;Probability measures the likelihood of an event occurring, ranging from 0 (impossible) to 1 (certain). Key concepts include sample space, events, independent vs dependent events, and conditional probability. Understanding probability is foundational for hypothesis testing, Bayes theorem, and machine learning.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="-core-notes-must-know"&gt;🟦 Core Notes (Must-Know)&lt;/h2&gt;
&lt;h3 id="basic-definitions"&gt;Basic Definitions&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;
&lt;h3 id="sample-space-and-events"&gt;Sample Space and Events&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;
&lt;h3 id="probability-rules"&gt;Probability Rules&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;</description></item><item><title>Random Forest</title><link>https://thearunmurali.com/post/2026/01/10/random-forest/</link><pubDate>Sat, 10 Jan 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2026/01/10/random-forest/</guid><description>&lt;h2 id="-1-minute-summary"&gt;🟪 1-Minute Summary&lt;/h2&gt;
&lt;p&gt;Random Forest builds multiple decision trees on random subsets of data and features, then averages predictions (regression) or votes (classification). Bagging reduces variance and overfitting. Pros: high accuracy, handles non-linearity, robust to outliers, feature importance. Cons: less interpretable, slower, memory intensive. Often a go-to algorithm for tabular data.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="-core-notes-must-know"&gt;🟦 Core Notes (Must-Know)&lt;/h2&gt;
&lt;h3 id="how-random-forest-works"&gt;How Random Forest Works&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;
&lt;h3 id="key-hyperparameters"&gt;Key Hyperparameters&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;</description></item><item><title>Random Variables - Discrete vs Continuous</title><link>https://thearunmurali.com/post/2026/01/10/random-variables-discrete-vs-continuous/</link><pubDate>Sat, 10 Jan 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2026/01/10/random-variables-discrete-vs-continuous/</guid><description>&lt;h2 id="-1-minute-summary"&gt;🟪 1-Minute Summary&lt;/h2&gt;
&lt;p&gt;A random variable is a variable whose values are determined by chance. Discrete random variables take countable values (e.g., number of customers, dice rolls), while continuous random variables can take any value within a range (e.g., height, temperature). This distinction determines which probability distributions and statistical methods you use.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="-core-notes-must-know"&gt;🟦 Core Notes (Must-Know)&lt;/h2&gt;
&lt;h3 id="what-is-a-random-variable"&gt;What is a Random Variable?&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;
&lt;h3 id="discrete-random-variables"&gt;Discrete Random Variables&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;</description></item><item><title>Range, IQR, Variance, and Standard Deviation - Measures of Spread</title><link>https://thearunmurali.com/post/2026/01/10/range-iqr-variance-and-standard-deviation-measures-of-spread/</link><pubDate>Sat, 10 Jan 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2026/01/10/range-iqr-variance-and-standard-deviation-measures-of-spread/</guid><description>&lt;h2 id="-1-minute-summary"&gt;🟪 1-Minute Summary&lt;/h2&gt;
&lt;p&gt;Measures of spread tell you how dispersed your data is. Range (max - min) is simple but sensitive to outliers. IQR (interquartile range) is robust. Variance measures average squared deviation from the mean. Standard deviation is the square root of variance and shares the same units as your data, making it most interpretable.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="-core-notes-must-know"&gt;🟦 Core Notes (Must-Know)&lt;/h2&gt;
&lt;h3 id="range"&gt;Range&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;
&lt;h3 id="interquartile-range-iqr"&gt;Interquartile Range (IQR)&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;</description></item><item><title>Recall (Sensitivity, True Positive Rate)</title><link>https://thearunmurali.com/post/2026/01/10/recall-sensitivity-true-positive-rate/</link><pubDate>Sat, 10 Jan 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2026/01/10/recall-sensitivity-true-positive-rate/</guid><description>&lt;h2 id="-1-minute-summary"&gt;🟪 1-Minute Summary&lt;/h2&gt;
&lt;p&gt;Recall measures &amp;ldquo;of all actual positives, how many did we find?&amp;rdquo; Formula: TP / (TP + FN). High recall means low miss rate. Use when false negatives are costly (e.g., disease detection - missing a sick patient is worse than false alarm). Trade-off with precision: being less selective (higher recall) means more false alarms (lower precision).&lt;/p&gt;
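&lt;p&gt;A small worked example of the formula, assuming labels are encoded as 1 (positive) and 0 (negative). The function name and the label lists are invented for illustration.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;def recall(y_true, y_pred):
    """TP / (TP + FN): the share of actual positives the model found."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if tp + fn else 0.0

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
score = recall(y_true, y_pred)   # 3 positives found, 1 missed
miss_rate = 1 - score            # false negative rate
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Recall is 0.75 and the miss rate is 0.25: one of the four actual positives slipped through.&lt;/p&gt;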
&lt;hr&gt;
&lt;h2 id="-core-notes-must-know"&gt;🟦 Core Notes (Must-Know)&lt;/h2&gt;
&lt;h3 id="formula"&gt;Formula&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;
&lt;h3 id="interpretation"&gt;Interpretation&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;</description></item><item><title>Regression Metrics Overview</title><link>https://thearunmurali.com/post/2026/01/10/regression-metrics-overview/</link><pubDate>Sat, 10 Jan 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2026/01/10/regression-metrics-overview/</guid><description>&lt;h2 id="-1-minute-summary"&gt;🟪 1-Minute Summary&lt;/h2&gt;
&lt;p&gt;Regression metrics measure how well your model predicts continuous values. Common metrics: RMSE (penalizes large errors, same units), MAE (average absolute error, more robust to outliers than RMSE), MAPE (percentage error), R² (variance explained, usually 0-1 but negative when the model is worse than predicting the mean). Choose based on context: RMSE for penalizing large errors, MAE for balanced view, MAPE for relative error, R² for overall fit.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="-core-notes-must-know"&gt;🟦 Core Notes (Must-Know)&lt;/h2&gt;
&lt;h3 id="common-regression-metrics"&gt;Common Regression Metrics&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;
&lt;h3 id="when-to-use-each-metric"&gt;When to Use Each Metric&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;</description></item><item><title>Regularization Overview</title><link>https://thearunmurali.com/post/2026/01/10/regularization-overview/</link><pubDate>Sat, 10 Jan 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2026/01/10/regularization-overview/</guid><description>&lt;h2 id="-1-minute-summary"&gt;🟪 1-Minute Summary&lt;/h2&gt;
&lt;p&gt;Regularization adds a penalty term to the loss function to discourage complex models and prevent overfitting. Main types: L1 (Lasso) adds |coefficient| penalty, L2 (Ridge) adds coefficient² penalty. L1 can zero out coefficients (feature selection), L2 shrinks all coefficients. Elastic Net combines both. Hyperparameter λ controls strength.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="-core-notes-must-know"&gt;🟦 Core Notes (Must-Know)&lt;/h2&gt;
&lt;h3 id="what-is-regularization"&gt;What is Regularization?&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;
&lt;h3 id="why-regularization-works"&gt;Why Regularization Works&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;</description></item><item><title>Reusable Lesson Template (Data Science Interview Prep)</title><link>https://thearunmurali.com/post/2026/01/10/reusable-lesson-template-data-science-interview-prep/</link><pubDate>Sat, 10 Jan 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2026/01/10/reusable-lesson-template-data-science-interview-prep/</guid><description>&lt;h1 id="topic-name-here"&gt;[Topic Name Here]&lt;/h1&gt;
&lt;hr&gt;
&lt;h2 id="-1-minute-summary"&gt;🟪 1-Minute Summary&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;A 2-3 sentence explanation that captures the absolute essence. If you only read this section, you&amp;rsquo;d know enough to recognize when the topic is mentioned in an interview.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Example&lt;/strong&gt;: &amp;ldquo;Linear Regression models the relationship between a dependent variable and one or more independent variables using a straight line. The goal is to find the best-fit line that minimizes prediction errors. It&amp;rsquo;s used when you need to predict continuous numerical values.&amp;rdquo;&lt;/p&gt;</description></item><item><title>Ridge Regression (L2 Regularization)</title><link>https://thearunmurali.com/post/2026/01/10/ridge-regression-l2-regularization/</link><pubDate>Sat, 10 Jan 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2026/01/10/ridge-regression-l2-regularization/</guid><description>&lt;h2 id="-1-minute-summary"&gt;🟪 1-Minute Summary&lt;/h2&gt;
&lt;p&gt;Ridge Regression adds L2 penalty (sum of squared coefficients) to linear regression. Shrinks all coefficients toward zero but never exactly zero. Good for multicollinearity. Hyperparameter α controls strength (higher α = more regularization). Must scale features first. Reduces variance at cost of slight bias. Use when you want to keep all features but reduce overfitting.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="-core-notes-must-know"&gt;🟦 Core Notes (Must-Know)&lt;/h2&gt;
&lt;h3 id="how-ridge-works"&gt;How Ridge Works&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;</description></item><item><title>RMSE (Root Mean Squared Error)</title><link>https://thearunmurali.com/post/2026/01/10/rmse-root-mean-squared-error/</link><pubDate>Sat, 10 Jan 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2026/01/10/rmse-root-mean-squared-error/</guid><description>&lt;h2 id="-1-minute-summary"&gt;🟪 1-Minute Summary&lt;/h2&gt;
&lt;p&gt;RMSE measures the average magnitude of prediction errors, penalizing large errors more heavily due to squaring. Formula: √(Σ(actual - predicted)² / n). Same units as target variable. Lower is better. Use when large errors are particularly bad (e.g., price prediction). More sensitive to outliers than MAE.&lt;/p&gt;
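&lt;p&gt;To show the effect of squaring, this sketch compares RMSE with MAE on two sets of predictions that have the same total absolute error. The function names and numbers are made up for the example.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;import math

def rmse(actual, predicted):
    """Square root of the mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error, shown for comparison."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

actual = [10.0, 10.0, 10.0, 10.0]
even_errors = [12.0, 8.0, 12.0, 8.0]      # four errors of size 2
one_big_error = [10.0, 10.0, 10.0, 18.0]  # one error of size 8

rmse_even, rmse_big = rmse(actual, even_errors), rmse(actual, one_big_error)
mae_even, mae_big = mae(actual, even_errors), mae(actual, one_big_error)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;MAE is 2.0 in both cases, while RMSE is 2.0 for the evenly spread errors and 4.0 when the same total error is concentrated in one prediction.&lt;/p&gt;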
&lt;hr&gt;
&lt;h2 id="-core-notes-must-know"&gt;🟦 Core Notes (Must-Know)&lt;/h2&gt;
&lt;h3 id="formula"&gt;Formula&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;
&lt;h3 id="interpretation"&gt;Interpretation&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;
&lt;h3 id="when-to-use-rmse"&gt;When to Use RMSE&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;</description></item><item><title>ROC Curve and AUC (Area Under the Curve)</title><link>https://thearunmurali.com/post/2026/01/10/roc-curve-and-auc-area-under-the-curve/</link><pubDate>Sat, 10 Jan 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2026/01/10/roc-curve-and-auc-area-under-the-curve/</guid><description>&lt;h2 id="-1-minute-summary"&gt;🟪 1-Minute Summary&lt;/h2&gt;
&lt;p&gt;ROC (Receiver Operating Characteristic) curve plots TPR (recall) vs FPR at all classification thresholds. AUC (Area Under Curve) summarizes ROC in one number (0 to 1). AUC = 1 is perfect, 0.5 is random guessing. Use to evaluate model&amp;rsquo;s ability to distinguish classes across all thresholds, independent of class distribution. Better than accuracy for imbalanced data.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="-core-notes-must-know"&gt;🟦 Core Notes (Must-Know)&lt;/h2&gt;
&lt;h3 id="what-is-roc-curve"&gt;What is ROC Curve?&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;</description></item><item><title>Standard Error (SE)</title><link>https://thearunmurali.com/post/2026/01/10/standard-error-se/</link><pubDate>Sat, 10 Jan 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2026/01/10/standard-error-se/</guid><description>&lt;h2 id="-1-minute-summary"&gt;🟪 1-Minute Summary&lt;/h2&gt;
&lt;p&gt;Standard Error (SE) measures the variability of a sample statistic (like the sample mean). Formula: SE = σ / √n. It gets smaller as sample size increases. SE is crucial for calculating confidence intervals and test statistics. Don&amp;rsquo;t confuse with standard deviation: SD describes data spread, SE describes estimate precision.&lt;/p&gt;
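&lt;p&gt;A minimal sketch of the calculation. The population σ is rarely known in practice, so this example assumes the usual substitute, the sample standard deviation s. The function name and data are invented for illustration.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;import math
import statistics

def standard_error(sample):
    """Estimate the SE of the mean as s / sqrt(n), with s the sample standard deviation."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

sample = [4.0, 6.0, 5.0, 7.0, 3.0, 5.0, 6.0, 4.0]
se_small = standard_error(sample)
se_large = standard_error(sample * 4)   # same values repeated: 4x the sample size
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Quadrupling the sample size roughly halves the standard error (about 0.46 versus 0.22 here), even though the spread of the data itself is essentially unchanged.&lt;/p&gt;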
&lt;hr&gt;
&lt;h2 id="-core-notes-must-know"&gt;🟦 Core Notes (Must-Know)&lt;/h2&gt;
&lt;h3 id="what-is-standard-error"&gt;What is Standard Error?&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;
&lt;h3 id="formula"&gt;Formula&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;</description></item><item><title>Standard Normal Distribution and Z-Scores</title><link>https://thearunmurali.com/post/2026/01/10/standard-normal-distribution-and-z-scores/</link><pubDate>Sat, 10 Jan 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2026/01/10/standard-normal-distribution-and-z-scores/</guid><description>&lt;h2 id="-1-minute-summary"&gt;🟪 1-Minute Summary&lt;/h2&gt;
&lt;p&gt;The standard normal distribution is a special case of the normal distribution with mean = 0 and standard deviation = 1. Z-scores transform any normal distribution to this standard form, allowing you to compare values from different distributions and look up probabilities in standard tables. Formula: z = (x - μ) / σ.&lt;/p&gt;
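&lt;p&gt;A short sketch of comparing values from different distributions. The two test scores and their means and standard deviations are hypothetical, and &lt;code&gt;statistics.NormalDist&lt;/code&gt; from the standard library stands in for a printed z-table.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;from statistics import NormalDist

def z_score(x, mu, sigma):
    """z = (x - mu) / sigma."""
    return (x - mu) / sigma

# Hypothetical scores from two tests with different scales.
z_math = z_score(x=85, mu=70, sigma=10)       # 1.5 standard deviations above the mean
z_verbal = z_score(x=620, mu=500, sigma=100)  # 1.2 standard deviations above the mean

# NormalDist() with no arguments is the standard normal (mean 0, sd 1).
percentile_math = NormalDist().cdf(z_math)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The math score is the stronger result in relative terms (z = 1.5 versus 1.2), and it sits at roughly the 93rd percentile if the scores are normally distributed.&lt;/p&gt;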
&lt;hr&gt;
&lt;h2 id="-core-notes-must-know"&gt;🟦 Core Notes (Must-Know)&lt;/h2&gt;
&lt;h3 id="what-is-the-standard-normal-distribution"&gt;What is the Standard Normal Distribution?&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;</description></item><item><title>Supervised Learning Overview</title><link>https://thearunmurali.com/post/2026/01/10/supervised-learning-overview/</link><pubDate>Sat, 10 Jan 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2026/01/10/supervised-learning-overview/</guid><description>&lt;h2 id="-1-minute-summary"&gt;🟪 1-Minute Summary&lt;/h2&gt;
&lt;p&gt;Supervised learning trains models on labeled data (input-output pairs) to predict outcomes for new data. Two types: Regression (continuous target: price, temperature) and Classification (categorical target: yes/no, categories). Process: train on labeled data → validate → test on unseen data. Success requires good features, sufficient data, and appropriate algorithm selection.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="-core-notes-must-know"&gt;🟦 Core Notes (Must-Know)&lt;/h2&gt;
&lt;h3 id="what-is-supervised-learning"&gt;What is Supervised Learning?&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;
&lt;h3 id="regression-vs-classification"&gt;Regression vs Classification&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;</description></item><item><title>t-test and z-test - Comparing Group Means</title><link>https://thearunmurali.com/post/2026/01/10/t-test-and-z-test-comparing-group-means/</link><pubDate>Sat, 10 Jan 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2026/01/10/t-test-and-z-test-comparing-group-means/</guid><description>&lt;h2 id="-1-minute-summary"&gt;🟪 1-Minute Summary&lt;/h2&gt;
&lt;p&gt;t-tests and z-tests compare means between groups or against a known value. Use a z-test when you know the population standard deviation σ and either the population is normal or the sample is large (n &amp;gt; 30). Use a t-test when you don&amp;rsquo;t know the population σ and must estimate it from the sample, which matters most for small samples. Common types: one-sample, two-sample (independent), and paired t-tests.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="-core-notes-must-know"&gt;🟦 Core Notes (Must-Know)&lt;/h2&gt;
&lt;h3 id="when-to-use-t-test-vs-z-test"&gt;When to Use t-test vs z-test&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;
&lt;h3 id="types-of-t-tests"&gt;Types of t-tests&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;</description></item><item><title>Types of Probability</title><link>https://thearunmurali.com/post/2026/01/10/types-of-probability/</link><pubDate>Sat, 10 Jan 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2026/01/10/types-of-probability/</guid><description>&lt;h2 id="-1-minute-summary"&gt;🟪 1-Minute Summary&lt;/h2&gt;
&lt;p&gt;There are three main types of probability: Classical (theoretical, based on equally likely outcomes), Empirical (based on observed data/experiments), and Subjective (based on judgment/belief). Data scientists primarily use empirical probability when working with real-world data.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="-core-notes-must-know"&gt;🟦 Core Notes (Must-Know)&lt;/h2&gt;
&lt;h3 id="classical-probability"&gt;Classical Probability&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;
&lt;h3 id="empirical-probability"&gt;Empirical Probability&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;
&lt;h3 id="subjective-probability"&gt;Subjective Probability&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;
&lt;h3 id="when-to-use-each"&gt;When to Use Each&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;</description></item><item><title>Underfitting</title><link>https://thearunmurali.com/post/2026/01/10/underfitting/</link><pubDate>Sat, 10 Jan 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2026/01/10/underfitting/</guid><description>&lt;h2 id="-1-minute-summary"&gt;🟪 1-Minute Summary&lt;/h2&gt;
&lt;p&gt;Underfitting occurs when a model is too simple to capture underlying patterns in data. Both training and validation performance are poor. Causes: model too simple, insufficient features, over-regularization. Solutions: more complex model, add features, reduce regularization, train longer. Less common than overfitting but equally problematic.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="-core-notes-must-know"&gt;🟦 Core Notes (Must-Know)&lt;/h2&gt;
&lt;h3 id="what-is-underfitting"&gt;What is Underfitting?&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;
&lt;h3 id="how-to-detect-underfitting"&gt;How to Detect Underfitting&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;
&lt;h3 id="causes-of-underfitting"&gt;Causes of Underfitting&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;</description></item><item><title>Undersampling</title><link>https://thearunmurali.com/post/2026/01/10/undersampling/</link><pubDate>Sat, 10 Jan 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2026/01/10/undersampling/</guid><description>&lt;h2 id="-1-minute-summary"&gt;🟪 1-Minute Summary&lt;/h2&gt;
&lt;p&gt;Undersampling reduces the majority class to match minority class size. Simple and fast. Types: random undersampling, Tomek links, NearMiss. Pros: faster training, reduces class imbalance. Cons: loses information, may underfit. Use when you have abundant data and can afford to discard some. Alternative: oversampling (SMOTE) when data is limited.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="-core-notes-must-know"&gt;🟦 Core Notes (Must-Know)&lt;/h2&gt;
&lt;h3 id="what-is-undersampling"&gt;What is Undersampling?&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;
&lt;h3 id="types-of-undersampling"&gt;Types of Undersampling&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;</description></item><item><title>Work Dashboard</title><link>https://thearunmurali.com/work/</link><pubDate>Sat, 10 Jan 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/work/</guid><description>&lt;h1 id="content-status-dashboard"&gt;Content Status Dashboard&lt;/h1&gt;
&lt;p&gt;Enter password to view all posts and their completion status.&lt;/p&gt;</description></item><item><title>XGBoost (Extreme Gradient Boosting)</title><link>https://thearunmurali.com/post/2026/01/10/xgboost-extreme-gradient-boosting/</link><pubDate>Sat, 10 Jan 2026 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2026/01/10/xgboost-extreme-gradient-boosting/</guid><description>&lt;h2 id="-1-minute-summary"&gt;🟪 1-Minute Summary&lt;/h2&gt;
&lt;p&gt;XGBoost is an optimized implementation of gradient boosting with built-in regularization, parallel processing, and tree pruning. Widely used in Kaggle competitions. Key features: handles missing values, L1/L2 regularization, early stopping, feature importance. Typically faster than scikit-learn&amp;rsquo;s GradientBoostingClassifier and GradientBoostingRegressor. Hyperparameters are similar to gradient boosting, plus extras such as reg_alpha (L1) and reg_lambda (L2). A common default choice for structured/tabular data competitions.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="-core-notes-must-know"&gt;🟦 Core Notes (Must-Know)&lt;/h2&gt;
&lt;h3 id="what-makes-xgboost-special"&gt;What Makes XGBoost Special?&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;
&lt;h3 id="key-features"&gt;Key Features&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;[Content to be filled in]&lt;/em&gt;&lt;/p&gt;</description></item><item><title>Schedule a Meeting</title><link>https://thearunmurali.com/schedule/</link><pubDate>Sun, 26 Oct 2025 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/schedule/</guid><description>&lt;h1 id="schedule-a-meeting"&gt;Schedule a Meeting&lt;/h1&gt;
&lt;p&gt;Welcome! Use the calendar below to book a Google Hangout or video call with me. Pick a time that works for you, and you&amp;rsquo;ll receive a confirmation email with the meeting link.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Click the button below to book a meeting:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://calendar.app.google/8ixUg7gZbMrfygpG9" target="_blank" style="display:inline-block;padding:1em 2em;background:#4285F4;color:#fff;font-size:1.2em;border-radius:6px;text-decoration:none;margin:1em 0;"&gt;Book a Meeting&lt;/a&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="my-calendar-booked--upcoming-slots"&gt;My Calendar (Booked &amp;amp; Upcoming Slots)&lt;/h2&gt;
&lt;iframe src="https://calendar.google.com/calendar/embed?src=anmiousdev%40gmail.com&amp;ctz=America%2FLos_Angeles" style="border: 0" width="800" height="600" frameborder="0" scrolling="no"&gt;&lt;/iframe&gt;
&lt;p&gt;If you have any questions or need a different time, please &lt;a href="https://thearunmurali.com/about/"&gt;contact me&lt;/a&gt;.&lt;/p&gt;</description></item><item><title>Setting Up Your Personal AI Playground: OpenWebUI + LiteLLM + Multiple LLM Models</title><link>https://thearunmurali.com/post/2025/09/30/setting-up-your-personal-ai-playground-openwebui--litellm--multiple-llm-models/</link><pubDate>Tue, 30 Sep 2025 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/post/2025/09/30/setting-up-your-personal-ai-playground-openwebui--litellm--multiple-llm-models/</guid><description>&lt;p&gt;&lt;img src="https://thearunmurali.com/images/ai-learning/llm-self-hosted.png" alt="LLM Self Hosted"&gt;&lt;/p&gt;
&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;&amp;ldquo;The best investment you can make is in tools that create leverage for yourself.&amp;rdquo;&lt;/em&gt; - Naval Ravikant&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Have you ever wanted to try out different AI models without paying for multiple subscriptions? Or perhaps share access to these powerful tools with family members without breaking the bank?&lt;/p&gt;
&lt;p&gt;I recently discovered a brilliant solution after watching a NetworkChuck video: using APIs for various LLM (Large Language Model) services and displaying them all in one interface through OpenWebUI. The best part? You can evaluate all these models with just $5 worth of API credits and share access with your entire family!&lt;/p&gt;</description></item><item><title>Arun Murali - Senior Staff Engineer</title><link>https://thearunmurali.com/about/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://thearunmurali.com/about/</guid><description>&lt;h2 id="arun-murali"&gt;Arun Murali&lt;/h2&gt;
&lt;p&gt;I build and scale distributed systems that handle millions of transactions. Currently at Gap Inc, I&amp;rsquo;ve scaled systems 10× in throughput while making them more reliable and faster.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Livermore, CA&lt;/strong&gt; • &lt;a href="mailto:arun.murali@outlook.com"&gt;arun.murali@outlook.com&lt;/a&gt; • &lt;a href="https://linkedin.com/in/arunmurali"&gt;LinkedIn&lt;/a&gt; • &lt;a href="https://github.com/murali-arun"&gt;GitHub&lt;/a&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="what-i-do"&gt;What I Do&lt;/h2&gt;
&lt;p&gt;Backend architecture • Event-driven systems • Performance optimization • Reliability engineering&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Core technologies:&lt;/strong&gt; Java, Spring Boot, Kafka, PostgreSQL, Kubernetes, Python, React&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="gap-inc"&gt;Gap Inc&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;Staff Software Engineer → Senior Staff Engineer, Jul 2018 – Present&lt;/em&gt;&lt;/p&gt;</description></item></channel></rss>