Lab 01: Logistic Regression from Scratch
Sources:
Deep Learning Interviews (Shlomo Kashani, 2nd Edition) – Chapter: Logistic Regression
Speech and Language Processing (Jurafsky & Martin) – Chapter 4: Logistic Regression
Topics covered:
Sigmoid function and its properties
Odds, log-odds, and the logit function
Binary cross-entropy loss
Gradient descent for logistic regression
Evaluation metrics (precision, recall, F1, confusion matrix)
Multinomial logistic regression (softmax)
Regularization (L2)
End-to-end pipeline with decision boundary visualization
import numpy as np
import matplotlib.pyplot as plt
from typing import Tuple
%matplotlib inline
plt.rcParams['figure.figsize'] = (8, 5)
plt.rcParams['font.size'] = 12
PART 1: The Sigmoid Function
Reference: DLI – "The Sigmoid"; SLP Ch 4.2
The sigmoid (logistic) function maps any real number to the interval \((0, 1)\):
\[
\sigma(z) = \frac{1}{1 + e^{-z}}
\]
Key properties

| Property | Formula |
|---|---|
| Value at zero | \(\sigma(0) = 0.5\) |
| Symmetry | \(\sigma(-z) = 1 - \sigma(z)\) |
| Derivative | \(\frac{d}{dz}\sigma(z) = \sigma(z)\,(1 - \sigma(z))\) |
| Range | \((0, 1)\): the output is always a valid probability |
| Monotonicity | Strictly increasing |
The derivative being expressible purely in terms of \(\sigma(z)\) itself makes backpropagation efficient: once you have the forward-pass value, the gradient is essentially free.
def sigmoid(z: np.ndarray) -> np.ndarray:
"""Compute the sigmoid function element-wise.
Uses np.clip to prevent overflow in np.exp for very large
positive or negative inputs.
"""
z = np.clip(z, -500, 500)
return 1.0 / (1.0 + np.exp(-z))
def sigmoid_derivative(z: np.ndarray) -> np.ndarray:
"""Compute the derivative of the sigmoid: sigma(z) * (1 - sigma(z))."""
s = sigmoid(z)
return s * (1.0 - s)
# Quick sanity check
print(f"sigmoid(0) = {sigmoid(np.array([0.0]))[0]:.4f} (expect 0.5)")
print(f"sigmoid(100) = {sigmoid(np.array([100.0]))[0]:.6f} (expect ~1.0)")
print(f"sigmoid(-100)= {sigmoid(np.array([-100.0]))[0]:.6f} (expect ~0.0)")
# Verify the symmetry property: sigma(-z) = 1 - sigma(z)
z_values = np.array([-5.0, -2.0, -1.0, 0.0, 1.0, 2.0, 5.0])
lhs = sigmoid(-z_values) # sigma(-z)
rhs = 1.0 - sigmoid(z_values) # 1 - sigma(z)
print("Verifying sigma(-z) = 1 - sigma(z)")
print(f"{'z':>6s} | {'sigma(-z)':>12s} | {'1-sigma(z)':>12s} | {'match':>6s}")
print("-" * 48)
for z, l, r in zip(z_values, lhs, rhs):
print(f"{z:6.1f} | {l:12.8f} | {r:12.8f} | {np.isclose(l, r)}")
assert np.allclose(lhs, rhs), "Symmetry property failed!"
print("\nAll checks passed.")
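The derivative property can be verified the same way. Below is a minimal standalone sketch (it redefines `sigmoid` locally so the cell runs on its own) that compares the closed-form derivative \(\sigma(z)(1-\sigma(z))\) against a central finite difference:

```python
import numpy as np

def sigmoid(z):
    """Numerically safe sigmoid."""
    z = np.clip(z, -500, 500)
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1.0 - s)

# Central finite difference: (f(z+h) - f(z-h)) / (2h)
z = np.linspace(-5, 5, 11)
h = 1e-6
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2.0 * h)
analytic = sigmoid_derivative(z)

print(f"max |analytic - numeric| = {np.max(np.abs(analytic - numeric)):.2e}")
```

The agreement to roughly 1e-10 confirms the closed form; note the derivative peaks at \(z=0\) with value \(0.25\).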
# Plot sigmoid and its derivative on the same axes
z = np.linspace(-10, 10, 300)
fig, ax = plt.subplots(figsize=(9, 5))
ax.plot(z, sigmoid(z), label=r'$\sigma(z)$', linewidth=2)
ax.plot(z, sigmoid_derivative(z), label=r"$\sigma'(z) = \sigma(z)(1-\sigma(z))$",
linewidth=2, linestyle='--')
ax.axhline(y=0.5, color='gray', linestyle=':', alpha=0.5)
ax.axvline(x=0, color='gray', linestyle=':', alpha=0.5)
ax.set_xlabel('z')
ax.set_ylabel('Value')
ax.set_title('Sigmoid Function and Its Derivative')
ax.legend(fontsize=13)
ax.set_ylim(-0.05, 1.05)
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
PART 2: Odds, Log-Odds, and the Logit Function
Reference: DLI – "Odds, Log-odds"; SLP Ch 4.2
Given a probability \(p \in (0, 1)\):

| Concept | Definition | Range |
|---|---|---|
| Odds | \(\text{odds}(p) = \dfrac{p}{1-p}\) | \((0, \infty)\) |
| Log-odds (logit) | \(\text{logit}(p) = \log\!\left(\dfrac{p}{1-p}\right)\) | \((-\infty, \infty)\) |

Logit is the inverse of sigmoid
\[
\text{logit}(\sigma(z)) = z \qquad \text{and} \qquad \sigma(\text{logit}(p)) = p
\]
This means the sigmoid maps log-odds to probabilities, and the logit maps probabilities back to log-odds. In logistic regression the model computes \(z = \mathbf{w} \cdot \mathbf{x} + b\) (a log-odds score) and then applies \(\sigma\) to obtain a probability.
def logit(p: np.ndarray) -> np.ndarray:
"""Compute the logit (log-odds): log(p / (1-p)).
Input p must be in (0, 1).
"""
p = np.asarray(p, dtype=float)
return np.log(p / (1.0 - p))
# Verify logit is the inverse of sigmoid
z_values = np.array([-3.0, -1.0, 0.0, 1.5, 4.0])
roundtrip = logit(sigmoid(z_values))
print("Verifying logit(sigmoid(z)) = z")
print(f"{'z':>8s} | {'sigmoid(z)':>12s} | {'logit(sigmoid(z))':>18s} | {'match':>6s}")
print("-" * 55)
for z_val, rt in zip(z_values, roundtrip):
print(f"{z_val:8.2f} | {sigmoid(np.array([z_val]))[0]:12.8f} | {rt:18.8f} | {np.isclose(z_val, rt)}")
assert np.allclose(z_values, roundtrip), "Inverse property failed!"
print("\nAll checks passed.")
# Medical odds problem (DLI-style)
# A doctor estimates the probability of disease at p = 0.8.
# 1) What are the odds?
# 2) What are the log-odds?
# 3) If the log-odds increase by 1.5, what is the new probability?
p = 0.8
odds = p / (1.0 - p)
log_odds = logit(np.array([p]))[0]
new_log_odds = log_odds + 1.5
new_probability = sigmoid(np.array([new_log_odds]))[0]
print("Medical Odds Problem")
print("=" * 40)
print(f"Initial probability p = {p}")
print(f"Odds = {odds:.4f} (4 to 1)")
print(f"Log-odds (logit) = {log_odds:.4f}")
print(f"New log-odds (+1.5) = {new_log_odds:.4f}")
print(f"New probability = {new_probability:.4f}")
print(f"\nInterpretation: adding 1.5 to the log-odds raises")
print(f"the probability from {p:.2f} to {new_probability:.4f}.")
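A useful way to read this result: adding \(\Delta\) to the log-odds multiplies the odds by \(e^{\Delta}\), since \(\exp(\log(\text{odds}) + \Delta) = \text{odds} \cdot e^{\Delta}\). A small standalone check (redefining the helpers locally so the cell runs on its own):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logit(p):
    return np.log(p / (1.0 - p))

p = 0.8
delta = 1.5
new_p = sigmoid(logit(p) + delta)                  # probability after the log-odds shift
odds_ratio = (new_p / (1 - new_p)) / (p / (1 - p))  # ratio of new odds to old odds

print(f"new probability = {new_p:.4f}")
print(f"odds ratio      = {odds_ratio:.4f} (expect e^1.5 = {np.exp(1.5):.4f})")
```

This multiplicative reading is why logistic regression coefficients are often reported as odds ratios: a one-unit increase in a feature multiplies the odds by \(e^{w_j}\).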
PART 3: Binary Cross-Entropy Loss
Reference: DLI – "The Logit Function and Entropy"; SLP Ch 4.5
For a single training example with true label \(y \in \{0, 1\}\) and predicted probability \(\hat{p}\):
\[
\mathcal{L}(y, \hat{p}) = -\bigl[\, y \log \hat{p} + (1 - y) \log(1 - \hat{p}) \,\bigr]
\]
For a dataset of \(N\) examples the cost function is the mean loss:
\[
J = -\frac{1}{N} \sum_{i=1}^{N} \bigl[\, y_i \log \hat{p}_i + (1 - y_i) \log(1 - \hat{p}_i) \,\bigr]
\]
Intuition: When \(y=1\) the loss is \(-\log(\hat{p})\), which penalises small \(\hat{p}\) heavily. When \(y=0\) the loss is \(-\log(1-\hat{p})\), which penalises large \(\hat{p}\) heavily. The loss is always non-negative and equals zero only when the prediction is perfect.
def binary_cross_entropy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
"""Compute the mean binary cross-entropy loss over a batch.
Args:
y_true: array of true labels (0 or 1), shape (N,)
y_pred: array of predicted probabilities, shape (N,)
Returns:
Scalar mean loss.
"""
# Clip predictions to avoid log(0)
eps = 1e-15
y_pred = np.clip(y_pred, eps, 1.0 - eps)
loss = -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
return np.mean(loss)
# Quick test
print(f"Loss when y=1, p=0.9: {binary_cross_entropy(np.array([1]), np.array([0.9])):.4f} (low -- good prediction)")
print(f"Loss when y=1, p=0.1: {binary_cross_entropy(np.array([1]), np.array([0.1])):.4f} (high -- bad prediction)")
print(f"Loss when y=0, p=0.1: {binary_cross_entropy(np.array([0]), np.array([0.1])):.4f} (low -- good prediction)")
print(f"Loss when y=0, p=0.9: {binary_cross_entropy(np.array([0]), np.array([0.9])):.4f} (high -- bad prediction)")
# Plot loss curves for y=0 and y=1
p = np.linspace(0.01, 0.99, 200)
loss_y1 = -np.log(p) # Loss when y = 1
loss_y0 = -np.log(1.0 - p) # Loss when y = 0
fig, ax = plt.subplots(figsize=(9, 5))
ax.plot(p, loss_y1, label='y = 1: $-\\log(\\hat{p})$', linewidth=2)
ax.plot(p, loss_y0, label='y = 0: $-\\log(1-\\hat{p})$', linewidth=2, linestyle='--')
ax.set_xlabel('Predicted probability $\\hat{p}$')
ax.set_ylabel('Loss')
ax.set_title('Binary Cross-Entropy Loss')
ax.legend(fontsize=13)
ax.set_ylim(0, 5)
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
PART 4: Logistic Regression – Full Implementation
Reference: DLI – "Truly Understanding Logistic Regression"; SLP Ch 4.4, 4.6
Model
\[
\hat{p} = P(y = 1 \mid \mathbf{x}) = \sigma(\mathbf{w} \cdot \mathbf{x} + b)
\]
Training via gradient descent
We minimize the binary cross-entropy cost \(J\). The gradients are:
\[
\frac{\partial J}{\partial \mathbf{w}} = \frac{1}{N} \mathbf{X}^\top (\hat{\mathbf{p}} - \mathbf{y}),
\qquad
\frac{\partial J}{\partial b} = \frac{1}{N} \sum_{i=1}^{N} (\hat{p}_i - y_i)
\]
Update rule (gradient descent), with learning rate \(\eta\):
\[
\mathbf{w} \leftarrow \mathbf{w} - \eta \, \frac{\partial J}{\partial \mathbf{w}},
\qquad
b \leftarrow b - \eta \, \frac{\partial J}{\partial b}
\]
class LogisticRegression:
"""Binary logistic regression trained with gradient descent."""
def __init__(self, learning_rate: float = 0.01, n_iterations: int = 1000):
self.lr = learning_rate
self.n_iter = n_iterations
self.weights = None
self.bias = None
self.losses = []
def predict_proba(self, X: np.ndarray) -> np.ndarray:
"""Compute P(y=1|X) = sigmoid(X @ w + b)."""
linear = X @ self.weights + self.bias
return sigmoid(linear)
def compute_gradients(
self, X: np.ndarray, y: np.ndarray, predictions: np.ndarray
) -> Tuple[np.ndarray, float]:
"""Compute gradients of cross-entropy w.r.t. weights and bias.
Returns:
(dw, db) -- gradient arrays.
"""
N = X.shape[0]
error = predictions - y # (N,)
dw = (1.0 / N) * (X.T @ error) # (n_features,)
db = (1.0 / N) * np.sum(error) # scalar
return dw, db
def fit(self, X: np.ndarray, y: np.ndarray):
"""Train logistic regression using gradient descent."""
n_samples, n_features = X.shape
self.weights = np.zeros(n_features)
self.bias = 0.0
self.losses = []
for i in range(self.n_iter):
# Forward pass
predictions = self.predict_proba(X)
# Compute loss
loss = binary_cross_entropy(y, predictions)
self.losses.append(loss)
# Compute gradients
dw, db = self.compute_gradients(X, y, predictions)
# Update parameters
self.weights -= self.lr * dw
self.bias -= self.lr * db
return self
def predict(self, X: np.ndarray) -> np.ndarray:
"""Predict class labels (0 or 1) using threshold 0.5."""
probas = self.predict_proba(X)
return (probas >= 0.5).astype(int)
print("LogisticRegression class defined.")
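The analytic gradients are worth verifying against finite differences of the cost before trusting them for training. This is a self-contained sketch that re-implements the forward pass and loss locally (it does not depend on the class above, so it runs on its own):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(w, b, X, y):
    """Mean binary cross-entropy for parameters (w, b)."""
    p = np.clip(sigmoid(X @ w + b), 1e-15, 1 - 1e-15)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = (rng.random(20) > 0.5).astype(float)
w, b = rng.normal(size=3), 0.3

# Analytic gradients (same formulas as compute_gradients)
p = sigmoid(X @ w + b)
dw = X.T @ (p - y) / len(y)
db = np.mean(p - y)

# Finite-difference gradients, one coordinate at a time
h = 1e-6
dw_num = np.zeros_like(w)
for j in range(len(w)):
    e = np.zeros_like(w)
    e[j] = h
    dw_num[j] = (cost(w + e, b, X, y) - cost(w - e, b, X, y)) / (2 * h)
db_num = (cost(w, b + h, X, y) - cost(w, b - h, X, y)) / (2 * h)

print(f"max weight-grad error: {np.max(np.abs(dw - dw_num)):.2e}")
print(f"bias-grad error:       {abs(db - db_num):.2e}")
```

Errors on the order of 1e-8 or smaller indicate the analytic gradient formulas match the cost function.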
# Generate synthetic 2D data and train
def generate_synthetic_data(n_samples=500, n_features=2, seed=42):
"""Generate linearly separable 2D data for testing."""
np.random.seed(seed)
X_pos = np.random.randn(n_samples // 2, n_features) + np.array([2, 2])
X_neg = np.random.randn(n_samples // 2, n_features) + np.array([-2, -2])
X = np.vstack([X_pos, X_neg])
y = np.hstack([np.ones(n_samples // 2), np.zeros(n_samples // 2)])
# Shuffle
idx = np.random.permutation(n_samples)
return X[idx], y[idx]
X_data, y_data = generate_synthetic_data()
print(f"Dataset: {X_data.shape[0]} samples, {X_data.shape[1]} features")
print(f"Class distribution: {int(y_data.sum())} positive, {int(len(y_data) - y_data.sum())} negative")
# Train the model
model = LogisticRegression(learning_rate=0.1, n_iterations=300)
model.fit(X_data, y_data)
print(f"\nFinal loss: {model.losses[-1]:.4f}")
print(f"Learned weights: {model.weights}")
print(f"Learned bias: {model.bias:.4f}")
# Plot the loss curve
fig, ax = plt.subplots(figsize=(9, 5))
ax.plot(model.losses, linewidth=2)
ax.set_xlabel('Iteration')
ax.set_ylabel('Binary Cross-Entropy Loss')
ax.set_title('Training Loss Curve')
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
PART 5: Evaluation Metrics
Reference: SLP Ch 4.9 – Precision, Recall, F-measure
For binary classification with true labels \(y\) and predictions \(\hat{y}\):

| Count | Condition | Meaning |
|---|---|---|
| True Positives (TP) | \(\hat{y}=1\) and \(y=1\) | Correctly predicted positive |
| False Positives (FP) | \(\hat{y}=1\) and \(y=0\) | Incorrectly predicted positive |
| True Negatives (TN) | \(\hat{y}=0\) and \(y=0\) | Correctly predicted negative |
| False Negatives (FN) | \(\hat{y}=0\) and \(y=1\) | Incorrectly predicted negative |

From these counts:
\[
\text{Precision} = \frac{TP}{TP + FP},
\qquad
\text{Recall} = \frac{TP}{TP + FN},
\qquad
F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\]
The confusion matrix summarises all four counts (rows = actual, columns = predicted):
\[
\begin{pmatrix} TN & FP \\ FN & TP \end{pmatrix}
\]
def precision_recall_f1(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
"""Compute precision, recall, and F1-score for binary classification."""
y_true = np.asarray(y_true, dtype=int)
y_pred = np.asarray(y_pred, dtype=int)
tp = np.sum((y_pred == 1) & (y_true == 1))
fp = np.sum((y_pred == 1) & (y_true == 0))
fn = np.sum((y_pred == 0) & (y_true == 1))
precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
f1 = (
2.0 * precision * recall / (precision + recall)
if (precision + recall) > 0
else 0.0
)
return {"precision": precision, "recall": recall, "f1": f1}
def confusion_matrix(y_true: np.ndarray, y_pred: np.ndarray) -> np.ndarray:
"""Compute a 2x2 confusion matrix: [[TN, FP], [FN, TP]]."""
y_true = np.asarray(y_true, dtype=int)
y_pred = np.asarray(y_pred, dtype=int)
tp = np.sum((y_pred == 1) & (y_true == 1))
fp = np.sum((y_pred == 1) & (y_true == 0))
tn = np.sum((y_pred == 0) & (y_true == 0))
fn = np.sum((y_pred == 0) & (y_true == 1))
return np.array([[tn, fp], [fn, tp]])
print("Metric functions defined.")
# Evaluate the trained model from Part 4
y_pred = model.predict(X_data)
accuracy = np.mean(y_pred == y_data)
metrics = precision_recall_f1(y_data, y_pred)
cm = confusion_matrix(y_data, y_pred)
print("Evaluation on Training Data")
print("=" * 35)
print(f"Accuracy: {accuracy:.4f}")
print(f"Precision: {metrics['precision']:.4f}")
print(f"Recall: {metrics['recall']:.4f}")
print(f"F1-score: {metrics['f1']:.4f}")
print(f"\nConfusion Matrix (rows=actual, cols=predicted):")
print(f" Pred 0 Pred 1")
print(f"Actual 0 {cm[0, 0]:>5d} {cm[0, 1]:>5d}")
print(f"Actual 1 {cm[1, 0]:>5d} {cm[1, 1]:>5d}")
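For a hand-checkable example, consider eight predictions with known error counts; this standalone snippet computes the same metrics directly from the counts (it does not reuse the functions above, so it runs on its own):

```python
import numpy as np

y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 1, 0, 0, 0])  # TP=3, FN=1, FP=1, TN=3

tp = np.sum((y_pred == 1) & (y_true == 1))
fp = np.sum((y_pred == 1) & (y_true == 0))
fn = np.sum((y_pred == 0) & (y_true == 1))

precision = tp / (tp + fp)                          # 3/4 = 0.75
recall = tp / (tp + fn)                             # 3/4 = 0.75
f1 = 2 * precision * recall / (precision + recall)  # 0.75

print(f"precision={precision:.2f}, recall={recall:.2f}, f1={f1:.2f}")
```

When precision and recall are equal, the F1 (their harmonic mean) equals that common value, as here.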
PART 6: Multinomial Logistic Regression (Softmax)
Reference: DLI – "Logistic Regression"; SLP Ch 4.7–4.8
When there are \(K > 2\) classes, we generalise the sigmoid to the softmax function. Given a vector of logits \(\mathbf{z} \in \mathbb{R}^K\):
\[
\text{softmax}(\mathbf{z})_k = \frac{e^{z_k}}{\sum_{j=1}^{K} e^{z_j}},
\qquad k = 1, \dots, K
\]
The outputs form a valid probability distribution: every element is in \((0, 1)\) and they sum to \(1\).
Numerical stability: In practice we compute \(\text{softmax}(\mathbf{z} - \max(\mathbf{z}))\) to prevent overflow. This is mathematically equivalent because the constant cancels in the ratio.
The loss function becomes categorical cross-entropy:
\[
J = -\frac{1}{N} \sum_{i=1}^{N} \sum_{k=1}^{K} y_{ik} \log \hat{p}_{ik}
\]
where \(\mathbf{y}_i\) is one-hot encoded.
def softmax(z: np.ndarray) -> np.ndarray:
"""Compute softmax probabilities (numerically stable).
Args:
z: array of shape (N, K) where K is the number of classes.
Returns:
Probabilities of shape (N, K), each row sums to 1.
"""
z = np.atleast_2d(z)
z_shifted = z - np.max(z, axis=-1, keepdims=True) # subtract max for stability
exp_z = np.exp(z_shifted)
return exp_z / np.sum(exp_z, axis=-1, keepdims=True)
# Sanity checks
z_test = np.array([[1.0, 2.0, 3.0]])
probs = softmax(z_test)
print(f"softmax({z_test[0]}) = {probs[0]}")
print(f"Sum = {probs.sum():.6f} (should be 1.0)")
# Test with large values (should not overflow)
z_big = np.array([[1000.0, 1001.0, 1002.0]])
probs_big = softmax(z_big)
print(f"\nsoftmax([1000, 1001, 1002]) = {probs_big[0]}")
print(f"Sum = {probs_big.sum():.6f} (should be 1.0, no overflow)")
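For contrast, a softmax without the max-shift overflows on the same large inputs. This standalone sketch (with its own local definitions) shows the failure mode the shift avoids:

```python
import numpy as np

def naive_softmax(z):
    e = np.exp(z)  # overflows to inf for large z
    return e / e.sum(axis=-1, keepdims=True)

def stable_softmax(z):
    z = z - np.max(z, axis=-1, keepdims=True)  # largest exponent is now 0
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

z_big = np.array([1000.0, 1001.0, 1002.0])
with np.errstate(over='ignore', invalid='ignore'):
    naive = naive_softmax(z_big)  # inf / inf -> nan

print(f"naive:  {naive}")
print(f"stable: {stable_softmax(z_big)}")
```

The naive version produces `nan` (inf divided by inf), while the shifted version returns the same finite probabilities as for `[0, 1, 2]`.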
def categorical_cross_entropy(y_true_onehot: np.ndarray, y_pred: np.ndarray) -> float:
"""Compute categorical cross-entropy loss.
Args:
y_true_onehot: one-hot encoded labels, shape (N, K)
y_pred: predicted probabilities, shape (N, K)
Returns:
Scalar mean loss.
"""
eps = 1e-15
y_pred = np.clip(y_pred, eps, 1.0 - eps)
loss = -np.sum(y_true_onehot * np.log(y_pred), axis=-1)
return np.mean(loss)
print("categorical_cross_entropy defined.")
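A useful consistency check: for \(K = 2\), categorical cross-entropy on one-hot labels reduces to binary cross-entropy on the positive-class probability. This standalone sketch re-defines both losses locally so it runs on its own:

```python
import numpy as np

def binary_ce(y, p):
    p = np.clip(p, 1e-15, 1 - 1e-15)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def categorical_ce(y_onehot, p):
    p = np.clip(p, 1e-15, 1 - 1e-15)
    return -np.mean(np.sum(y_onehot * np.log(p), axis=-1))

y = np.array([1, 0, 1, 1, 0], dtype=float)
p_pos = np.array([0.9, 0.2, 0.7, 0.4, 0.1])   # P(y=1) per sample
p_2col = np.column_stack([1 - p_pos, p_pos])  # columns: [P(y=0), P(y=1)]
y_onehot = np.column_stack([1 - y, y])

bce = binary_ce(y, p_pos)
cce = categorical_ce(y_onehot, p_2col)
print(f"BCE = {bce:.6f}, CCE = {cce:.6f}")
```

Per sample, the one-hot sum picks out \(-\log \hat{p}_1\) when \(y=1\) and \(-\log \hat{p}_0 = -\log(1-\hat{p}_1)\) when \(y=0\), which is exactly the binary loss.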
# Demo: multinomial softmax with 3 classes
np.random.seed(0)
# Simulate logits for 5 samples and 3 classes
logits = np.random.randn(5, 3)
probs = softmax(logits)
# One-hot true labels: classes [0, 1, 2, 0, 1]
y_true_labels = np.array([0, 1, 2, 0, 1])
y_onehot = np.zeros((5, 3))
y_onehot[np.arange(5), y_true_labels] = 1.0
loss = categorical_cross_entropy(y_onehot, probs)
print("Softmax Demo (5 samples, 3 classes)")
print("=" * 50)
print(f"Logits:\n{logits}\n")
print(f"Softmax probabilities:\n{probs}\n")
print(f"Row sums: {probs.sum(axis=1)} (all should be 1.0)\n")
print(f"True labels (one-hot):\n{y_onehot}\n")
print(f"Categorical cross-entropy loss: {loss:.4f}")
PART 7: Regularization
Reference: SLP Ch 4.14, DLI – "Logistic Regression"
Regularization adds a penalty to the loss function to discourage large weights, which helps prevent overfitting.
L2 Regularization (Ridge / Weight Decay)
The regularized cost adds a squared-norm penalty to the cross-entropy cost \(J\):
\[
J_{\text{reg}} = J + \frac{\lambda}{2} \|\mathbf{w}\|_2^2
\]
The gradient of the regularization term with respect to the weights is simply \(\lambda \mathbf{w}\), so the update becomes:
\[
\mathbf{w} \leftarrow \mathbf{w} - \eta \left( \frac{\partial J}{\partial \mathbf{w}} + \lambda \mathbf{w} \right)
\]
Note: the bias \(b\) is typically not regularized.
Effect: L2 regularization shrinks weights toward zero but rarely makes them exactly zero. Larger \(\lambda\) means stronger regularization (simpler model).
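The L2 update can equivalently be read as "weight decay": shrink the weights by a factor \((1 - \eta\lambda)\), then take the ordinary cross-entropy step. A quick numeric check of the algebra, standalone and with made-up values:

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.normal(size=4)
grad_ce = rng.normal(size=4)  # stand-in for the cross-entropy gradient
lr, lam = 0.05, 0.5

# Form 1: gradient step on CE + (lam/2)||w||^2
w1 = w - lr * (grad_ce + lam * w)
# Form 2: "weight decay" -- shrink w first, then take the CE step
w2 = (1 - lr * lam) * w - lr * grad_ce

print(np.allclose(w1, w2))  # the two forms are algebraically identical
```

The decay reading makes the shrinkage explicit: every iteration multiplies each weight by a constant slightly below 1, which is why the regularized model's weight norm ends up smaller.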
class LogisticRegressionL2(LogisticRegression):
"""Logistic regression with L2 regularization."""
def __init__(self, learning_rate=0.01, n_iterations=1000, reg_lambda=0.1):
super().__init__(learning_rate, n_iterations)
self.reg_lambda = reg_lambda
def fit(self, X: np.ndarray, y: np.ndarray):
"""Train with L2-regularized loss."""
n_samples, n_features = X.shape
self.weights = np.zeros(n_features)
self.bias = 0.0
self.losses = []
for i in range(self.n_iter):
# Forward pass
predictions = self.predict_proba(X)
# Compute loss with L2 penalty
ce_loss = binary_cross_entropy(y, predictions)
reg_term = (self.reg_lambda / 2.0) * np.sum(self.weights ** 2)
loss = ce_loss + reg_term
self.losses.append(loss)
# Compute gradients (base gradients + regularization gradient)
dw, db = self.compute_gradients(X, y, predictions)
dw += self.reg_lambda * self.weights # L2 gradient on weights
# bias is NOT regularized
# Update parameters
self.weights -= self.lr * dw
self.bias -= self.lr * db
return self
print("LogisticRegressionL2 class defined.")
# Compare regularized vs unregularized on high-dimensional data
np.random.seed(42)
n_samples = 100
n_features = 50 # many features relative to samples -- overfitting risk
# Only the first 2 features are informative
X_hd = np.random.randn(n_samples, n_features)
y_hd = (X_hd[:, 0] + X_hd[:, 1] > 0).astype(float)
# Train unregularized
model_noreg = LogisticRegression(learning_rate=0.05, n_iterations=500)
model_noreg.fit(X_hd, y_hd)
# Train L2-regularized
model_l2 = LogisticRegressionL2(learning_rate=0.05, n_iterations=500, reg_lambda=0.5)
model_l2.fit(X_hd, y_hd)
print("Weight magnitudes (L2 norm):")
print(f" Unregularized: ||w|| = {np.linalg.norm(model_noreg.weights):.4f}")
print(f" L2 regularized: ||w|| = {np.linalg.norm(model_l2.weights):.4f}")
# Plot loss curves side by side
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
axes[0].plot(model_noreg.losses, linewidth=2, label='No regularization')
axes[0].plot(model_l2.losses, linewidth=2, linestyle='--', label=f'L2 ($\\lambda$={model_l2.reg_lambda})')
axes[0].set_xlabel('Iteration')
axes[0].set_ylabel('Loss')
axes[0].set_title('Loss Curves')
axes[0].legend()
axes[0].grid(True, alpha=0.3)
# Bar chart of weight magnitudes
axes[1].bar(np.arange(n_features) - 0.2, np.abs(model_noreg.weights), 0.4,
label='No regularization', alpha=0.7)
axes[1].bar(np.arange(n_features) + 0.2, np.abs(model_l2.weights), 0.4,
label=f'L2 ($\\lambda$={model_l2.reg_lambda})', alpha=0.7)
axes[1].set_xlabel('Feature index')
axes[1].set_ylabel('|weight|')
axes[1].set_title('Weight Magnitudes by Feature')
axes[1].legend()
axes[1].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
print(f"\nNote: L2 regularization shrinks all weights toward zero,")
print(f"reducing overfitting on the {n_features - 2} noise features.")
PART 8: End-to-End Pipeline
# Full pipeline: generate data, train/test split, train, evaluate, plot decision boundary
np.random.seed(123)
# 1. Generate data
X_all, y_all = generate_synthetic_data(n_samples=600, seed=123)
# 2. Train/test split (80/20)
split_idx = int(0.8 * len(y_all))
indices = np.random.permutation(len(y_all))
train_idx, test_idx = indices[:split_idx], indices[split_idx:]
X_train, y_train = X_all[train_idx], y_all[train_idx]
X_test, y_test = X_all[test_idx], y_all[test_idx]
print(f"Train set: {X_train.shape[0]} samples")
print(f"Test set: {X_test.shape[0]} samples")
# 3. Train logistic regression
pipeline_model = LogisticRegression(learning_rate=0.1, n_iterations=500)
pipeline_model.fit(X_train, y_train)
# 4. Evaluate
y_pred_train = pipeline_model.predict(X_train)
y_pred_test = pipeline_model.predict(X_test)
train_metrics = precision_recall_f1(y_train, y_pred_train)
test_metrics = precision_recall_f1(y_test, y_pred_test)
test_cm = confusion_matrix(y_test, y_pred_test)
train_acc = np.mean(y_pred_train == y_train)
test_acc = np.mean(y_pred_test == y_test)
print(f"\n{'Metric':<12s} {'Train':>8s} {'Test':>8s}")
print("-" * 30)
print(f"{'Accuracy':<12s} {train_acc:>8.4f} {test_acc:>8.4f}")
print(f"{'Precision':<12s} {train_metrics['precision']:>8.4f} {test_metrics['precision']:>8.4f}")
print(f"{'Recall':<12s} {train_metrics['recall']:>8.4f} {test_metrics['recall']:>8.4f}")
print(f"{'F1':<12s} {train_metrics['f1']:>8.4f} {test_metrics['f1']:>8.4f}")
print(f"\nTest Confusion Matrix:")
print(f" Pred 0 Pred 1")
print(f"Actual 0 {test_cm[0, 0]:>5d} {test_cm[0, 1]:>5d}")
print(f"Actual 1 {test_cm[1, 0]:>5d} {test_cm[1, 1]:>5d}")
# 5. Plot decision boundary
fig, axes = plt.subplots(1, 2, figsize=(16, 6))
# Left: loss curve
axes[0].plot(pipeline_model.losses, linewidth=2)
axes[0].set_xlabel('Iteration')
axes[0].set_ylabel('Loss')
axes[0].set_title('Training Loss Curve')
axes[0].grid(True, alpha=0.3)
# Right: decision boundary
ax = axes[1]
# Create mesh grid
x_min, x_max = X_all[:, 0].min() - 1, X_all[:, 0].max() + 1
y_min, y_max = X_all[:, 1].min() - 1, X_all[:, 1].max() + 1
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 200),
np.linspace(y_min, y_max, 200))
grid = np.c_[xx.ravel(), yy.ravel()]
probs_grid = pipeline_model.predict_proba(grid).reshape(xx.shape)
# Plot decision regions
ax.contourf(xx, yy, probs_grid, levels=50, cmap='RdBu', alpha=0.6)
ax.contour(xx, yy, probs_grid, levels=[0.5], colors='black', linewidths=2)
# Plot test data
ax.scatter(X_test[y_test == 0, 0], X_test[y_test == 0, 1],
c='blue', edgecolors='k', s=40, label='Class 0 (test)', alpha=0.7)
ax.scatter(X_test[y_test == 1, 0], X_test[y_test == 1, 1],
c='red', edgecolors='k', s=40, label='Class 1 (test)', alpha=0.7)
ax.set_xlabel('$x_1$')
ax.set_ylabel('$x_2$')
ax.set_title(f'Decision Boundary (Test Acc = {test_acc:.2%})')
ax.legend()
plt.tight_layout()
plt.show()
Summary
In this lab we built logistic regression entirely from scratch, covering:
Sigmoid function – the core nonlinearity that maps real numbers to probabilities, along with its elegant derivative \(\sigma'(z) = \sigma(z)(1 - \sigma(z))\).
Odds and logit – the logit function as the inverse of the sigmoid, connecting probability space to the unrestricted real line.
Binary cross-entropy – the principled loss function derived from maximum likelihood estimation.
Gradient descent training – computing gradients analytically and iteratively updating weights and bias to minimize the loss.
Evaluation metrics – precision, recall, F1-score, and the confusion matrix for assessing classifier performance.
Softmax / multinomial logistic regression – generalising to \(K > 2\) classes using the softmax function and categorical cross-entropy.
L2 regularization – adding a weight penalty to reduce overfitting, especially with many features.
End-to-end pipeline – data generation, train/test split, training, evaluation, and decision boundary visualization.
These foundations are critical building blocks for neural networks, where logistic regression is essentially a single-layer network with a sigmoid output.