Lab 03: Matrix Decompositions
Based on Chapter 4 of Mathematics for Machine Learning (Deisenroth, Faisal, Ong)
This lab covers the core matrix decomposition techniques from MML Chapter 4:
Determinant and Trace: cofactor expansion, Sarrus' rule, key properties
Eigenvalues and Eigenvectors: characteristic polynomial, eigenspaces, visualization
Cholesky Decomposition: from-scratch factorization for SPD matrices
Eigendecomposition and Diagonalization: \(A = PDP^{-1}\), when it exists
Singular Value Decomposition: step-by-step SVD, geometric interpretation
Matrix Approximation: low-rank approximation via truncated SVD, image compression
Matrix Phylogeny: taxonomy of matrix types and their decompositions
Matrix decompositions are the factoring of numbers generalized to matrices (MML, p.98). They reveal structure, enable efficient computation, and underpin dimensionality reduction, PCA, and many other ML techniques.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import FancyArrowPatch
np.set_printoptions(precision=4, suppress=True)
plt.rcParams['figure.figsize'] = (8, 5)
plt.rcParams['font.size'] = 12
print("NumPy version:", np.__version__)
print("Setup complete.")
1. Determinant and Trace (MML 4.1)
The determinant \(\det(A)\) maps a square matrix \(A \in \mathbb{R}^{n \times n}\) to a real number. Geometrically, it measures the signed volume of the parallelepiped spanned by the columns of \(A\).
2x2 Determinant

$$\det(A) = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} - a_{12}a_{21}$$

3x3 Determinant (Sarrus' Rule)

$$\det(A) = a_{11}a_{22}a_{33} + a_{21}a_{32}a_{13} + a_{31}a_{12}a_{23} - a_{31}a_{22}a_{13} - a_{11}a_{32}a_{23} - a_{21}a_{12}a_{33}$$

Cofactor (Laplace) Expansion along row \(j\)

$$\det(A) = \sum_{k=1}^{n} (-1)^{j+k} a_{jk} \det(A_{j,k})$$

where \(A_{j,k}\) is the submatrix obtained by deleting row \(j\) and column \(k\).
The Trace

The trace is the sum of the diagonal entries: \(\text{tr}(A) = \sum_{i=1}^{n} a_{ii}\).

Key properties:
\(\text{tr}(A+B) = \text{tr}(A) + \text{tr}(B)\)
\(\text{tr}(AB) = \text{tr}(BA)\) (cyclic permutation invariance)
\(\text{tr}(I_n) = n\)
# --- 2x2 Determinant by hand ---
A2 = np.array([[4, 2],
               [1, 3]])
det_2x2_manual = A2[0, 0] * A2[1, 1] - A2[0, 1] * A2[1, 0]
det_2x2_numpy = np.linalg.det(A2)
print("=== 2x2 Determinant ===")
print(f"A =\n{A2}")
print(f"Manual: {A2[0,0]}*{A2[1,1]} - {A2[0,1]}*{A2[1,0]} = {det_2x2_manual}")
print(f"NumPy: {det_2x2_numpy:.4f}")
print()
# --- 3x3 Determinant: Sarrus' rule (MML Example 4.3) ---
A3 = np.array([[1, 2, 3],
               [3, 1, 2],
               [0, 0, 1]])
# Sarrus' rule
sarrus = (A3[0,0]*A3[1,1]*A3[2,2] + A3[1,0]*A3[2,1]*A3[0,2] + A3[2,0]*A3[0,1]*A3[1,2]
          - A3[2,0]*A3[1,1]*A3[0,2] - A3[0,0]*A3[2,1]*A3[1,2] - A3[1,0]*A3[0,1]*A3[2,2])
print("=== 3x3 Determinant (Sarrus' Rule, MML Example 4.3) ===")
print(f"A =\n{A3}")
print(f"Sarrus: {sarrus}")
print(f"NumPy: {np.linalg.det(A3):.4f}")
print()
# --- Cofactor expansion along first row ---
def cofactor_det_3x3(M):
    """Compute 3x3 determinant via cofactor expansion along row 0."""
    det = 0
    for j in range(3):
        # Minor: delete row 0, column j
        minor = np.delete(np.delete(M, 0, axis=0), j, axis=1)
        cofactor = (-1)**(0 + j) * M[0, j] * np.linalg.det(minor)
        det += cofactor
        print(f"  j={j}: (-1)^{j} * {M[0,j]} * det({minor.flatten()}) = {cofactor:.4f}")
    return det
print("=== Cofactor Expansion (Laplace, MML Theorem 4.2) ===")
det_cofactor = cofactor_det_3x3(A3)
print(f"Result: {det_cofactor:.4f}")
# --- Key Determinant Properties (MML p.103) ---
A = np.array([[2, 1], [1, 3]])
B = np.array([[1, 2], [0, 4]])
print("=== Determinant Properties ===")
print(f"A =\n{A}")
print(f"B =\n{B}")
print()
# Property 1: det(AB) = det(A) * det(B)
det_AB = np.linalg.det(A @ B)
det_A_det_B = np.linalg.det(A) * np.linalg.det(B)
print(f"det(AB) = {det_AB:.4f}")
print(f"det(A)*det(B) = {det_A_det_B:.4f}")
print(f"Equal? {np.isclose(det_AB, det_A_det_B)}")
print()
# Property 2: det(A) = det(A^T)
print(f"det(A) = {np.linalg.det(A):.4f}")
print(f"det(A^T) = {np.linalg.det(A.T):.4f}")
print(f"Equal? {np.isclose(np.linalg.det(A), np.linalg.det(A.T))}")
print()
# Property 3: det(A^{-1}) = 1/det(A)
A_inv = np.linalg.inv(A)
print(f"det(A^{{-1}}) = {np.linalg.det(A_inv):.4f}")
print(f"1/det(A) = {1/np.linalg.det(A):.4f}")
print()
# --- Trace Properties (MML p.104) ---
print("=== Trace Properties ===")
print(f"tr(A) = {np.trace(A):.4f}")
print(f"tr(B) = {np.trace(B):.4f}")
print(f"tr(A+B) = {np.trace(A+B):.4f}, tr(A)+tr(B) = {np.trace(A)+np.trace(B):.4f}")
print(f"tr(AB) = {np.trace(A@B):.4f}, tr(BA) = {np.trace(B@A):.4f} (cyclic invariance)")
print(f"tr(I_2) = {np.trace(np.eye(2)):.0f}")
2. Eigenvalues and Eigenvectors (MML 4.2)
An eigenvalue \(\lambda\) and eigenvector \(x \neq 0\) of a square matrix \(A\) satisfy:

$$Ax = \lambda x$$

This means \(A\) merely scales \(x\) by the factor \(\lambda\). To find eigenvalues, we solve the characteristic polynomial (MML Definition 4.5):

$$p_A(\lambda) = \det(A - \lambda I) = 0$$
Key relationships (MML Theorems 4.16, 4.17):
\(\det(A) = \prod_{i} \lambda_i\) (determinant is product of eigenvalues)
\(\text{tr}(A) = \sum_{i} \lambda_i\) (trace is sum of eigenvalues)
For symmetric matrices, all eigenvalues are real (Spectral Theorem 4.15).
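A quick numerical illustration of that last fact (a snippet added for this lab, not from MML): symmetrizing an arbitrary square matrix forces all eigenvalues to be real.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))

# A general real matrix may have complex eigenvalues, but its symmetric
# part S = (M + M^T)/2 is symmetric, so the Spectral Theorem guarantees
# that every eigenvalue of S is real.
S = (M + M.T) / 2
sym_eigs = np.linalg.eigvals(S)
print("Eigenvalues of symmetric S:", sym_eigs)
print("All real?", np.all(np.isreal(sym_eigs)))
```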
# --- MML Example 4.5: Eigenvalues of A = [[4,2],[1,3]] ---
A = np.array([[4, 2],
              [1, 3]])
print("=== MML Example 4.5: Finding Eigenvalues from Scratch ===")
print(f"A =\n{A}")
print()
# Step 1: Characteristic polynomial p(lambda) = det(A - lambda*I)
# For 2x2: (a11-l)(a22-l) - a12*a21 = 0
# (4-l)(3-l) - 2*1 = l^2 - 7l + 10 = (l-2)(l-5)
a, b, c = 1, -(A[0,0]+A[1,1]), A[0,0]*A[1,1]-A[0,1]*A[1,0]
print(f"Characteristic polynomial: lambda^2 + ({b})*lambda + ({c}) = 0")
discriminant = b**2 - 4*a*c
lambda1 = (-b + np.sqrt(discriminant)) / (2*a)
lambda2 = (-b - np.sqrt(discriminant)) / (2*a)
print(f"Eigenvalues: lambda_1 = {lambda1}, lambda_2 = {lambda2}")
print()
# Step 2: Find eigenvectors by solving (A - lambda*I)x = 0
print("--- Eigenvector for lambda_1 = 5 ---")
print(f"(A - 5I) = \n{A - 5*np.eye(2)}")
# -x1 + 2*x2 = 0 => x1 = 2*x2 => eigenvector [2, 1]
v1 = np.array([2, 1])
print(f"Eigenvector: {v1} (from -x1 + 2x2 = 0)")
print(f"Verify: A @ v1 = {A @ v1}, 5 * v1 = {5*v1}")
print()
print("--- Eigenvector for lambda_2 = 2 ---")
print(f"(A - 2I) = \n{A - 2*np.eye(2)}")
# 2*x1 + 2*x2 = 0 => x2 = -x1 => eigenvector [1, -1]
v2 = np.array([1, -1])
print(f"Eigenvector: {v2} (from 2x1 + 2x2 = 0)")
print(f"Verify: A @ v2 = {A @ v2}, 2 * v2 = {2*v2}")
print()
# Step 3: Verify with numpy
eigenvalues, eigenvectors = np.linalg.eig(A)
print("=== NumPy Verification ===")
print(f"Eigenvalues: {eigenvalues}")
print(f"Eigenvectors (columns):\n{eigenvectors}")
print()
# Verify det = product, trace = sum (MML Theorems 4.16, 4.17)
print("=== Eigenvalue-Determinant-Trace Relations ===")
print(f"det(A) = {np.linalg.det(A):.4f}, product of eigenvalues = {np.prod(eigenvalues):.4f}")
print(f"tr(A) = {np.trace(A):.4f}, sum of eigenvalues = {np.sum(eigenvalues):.4f}")
# --- Visualize eigenvectors: how A stretches them ---
A = np.array([[4, 2],
              [1, 3]])
eigenvalues, eigenvectors = np.linalg.eig(A)
fig, axes = plt.subplots(1, 2, figsize=(14, 6))
# Left: Show eigenvectors and their images under A
ax = axes[0]
colors = ['#d62728', '#1f77b4']
for i in range(2):
    v = eigenvectors[:, i]
    Av = A @ v
    ax.annotate('', xy=v, xytext=[0, 0],
                arrowprops=dict(arrowstyle='->', color=colors[i], lw=2.5))
    ax.annotate('', xy=Av, xytext=[0, 0],
                arrowprops=dict(arrowstyle='->', color=colors[i], lw=2.5, linestyle='--'))
    ax.text(v[0]*1.1, v[1]*1.1, f'$v_{i+1}$', fontsize=14, color=colors[i])
    ax.text(Av[0]*1.05, Av[1]*1.05, f'$Av_{i+1}={eigenvalues[i]:.0f}v_{i+1}$',
            fontsize=12, color=colors[i])
ax.set_xlim(-1.5, 5.5)
ax.set_ylim(-3, 3)
ax.set_aspect('equal')
ax.grid(True, alpha=0.3)
ax.axhline(y=0, color='k', lw=0.5)
ax.axvline(x=0, color='k', lw=0.5)
ax.set_title(f'Eigenvectors of A (solid) and Av (dashed)\n$\\lambda_1={eigenvalues[0]:.0f}$, $\\lambda_2={eigenvalues[1]:.0f}$')
ax.set_xlabel('$x_1$')
ax.set_ylabel('$x_2$')
# Right: How A transforms a grid of points
ax = axes[1]
theta = np.linspace(0, 2*np.pi, 100)
circle = np.array([np.cos(theta), np.sin(theta)])
ellipse = A @ circle
ax.plot(circle[0], circle[1], 'b-', lw=1.5, label='Unit circle')
ax.plot(ellipse[0], ellipse[1], 'r-', lw=2, label='A(unit circle)')
for i in range(2):
    v = eigenvectors[:, i]
    ax.annotate('', xy=eigenvalues[i]*v, xytext=[0, 0],
                arrowprops=dict(arrowstyle='->', color=colors[i], lw=2))
ax.set_xlim(-6, 6)
ax.set_ylim(-4, 4)
ax.set_aspect('equal')
ax.grid(True, alpha=0.3)
ax.axhline(y=0, color='k', lw=0.5)
ax.axvline(x=0, color='k', lw=0.5)
ax.legend()
ax.set_title('Unit circle mapped to ellipse by A\nAxes align with eigenvectors')
ax.set_xlabel('$x_1$')
ax.set_ylabel('$x_2$')
plt.tight_layout()
plt.show()
3. Cholesky Decomposition (MML 4.3)
For a symmetric, positive definite (SPD) matrix \(A\), the Cholesky decomposition (MML Theorem 4.18) gives:

$$A = LL^\top$$

where \(L\) is a lower-triangular matrix with positive diagonal entries. \(L\) is unique and is called the Cholesky factor.
Computing L element-by-element (MML Eqs. 4.47-4.48)
For a \(3 \times 3\) case:
Diagonal: \(l_{jj} = \sqrt{a_{jj} - \sum_{k=1}^{j-1} l_{jk}^2}\)
Off-diagonal (\(i > j\)): \(l_{ij} = \frac{1}{l_{jj}}\left(a_{ij} - \sum_{k=1}^{j-1} l_{ik} l_{jk}\right)\)
The Cholesky decomposition is used to:
Efficiently compute determinants: \(\det(A) = \prod_i l_{ii}^2\)
Sample from multivariate Gaussians
Solve SPD linear systems efficiently
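The second application can be sketched concretely (an illustrative snippet, not from the lab): if \(z \sim \mathcal{N}(0, I)\), then \(x = \mu + Lz\) has covariance \(LL^\top = A\), so the Cholesky factor turns standard-normal samples into samples from \(\mathcal{N}(\mu, A)\).

```python
import numpy as np

rng = np.random.default_rng(42)

A = np.array([[4.0, 2.0],
              [2.0, 3.0]])    # SPD covariance matrix
mu = np.array([1.0, -1.0])    # target mean
L = np.linalg.cholesky(A)     # A = L L^T

# Transform standard-normal samples: x = mu + L z  =>  Cov(x) = L L^T = A
z = rng.standard_normal((2, 100_000))
samples = mu[:, None] + L @ z

print("Sample mean:", samples.mean(axis=1))     # close to mu
print("Sample covariance:\n", np.cov(samples))  # close to A
```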
def cholesky_decomposition(A):
    """Compute Cholesky decomposition A = LL^T from scratch (MML Eqs. 4.47-4.48)."""
    n = A.shape[0]
    L = np.zeros((n, n))
    for j in range(n):
        # Diagonal element: l_jj = sqrt(a_jj - sum_k l_jk^2)
        s = sum(L[j, k]**2 for k in range(j))
        L[j, j] = np.sqrt(A[j, j] - s)
        # Off-diagonal elements (below the diagonal)
        for i in range(j + 1, n):
            s = sum(L[i, k] * L[j, k] for k in range(j))
            L[i, j] = (A[i, j] - s) / L[j, j]
    return L
# Create a symmetric, positive definite matrix
# Method: A = M^T M guarantees SPD when M is full rank (MML Theorem 4.14)
M = np.array([[2, 1, 0],
              [1, 3, 1],
              [0, 1, 2]])
A = M.T @ M
print("=== Cholesky Decomposition (MML Section 4.3) ===")
print(f"A (SPD matrix) =\n{A}")
print(f"A is symmetric: {np.allclose(A, A.T)}")
print(f"Eigenvalues (all positive for SPD): {np.linalg.eigvalsh(A)}")
print()
# From-scratch Cholesky
L_manual = cholesky_decomposition(A)
print(f"L (from scratch) =\n{L_manual}")
print(f"L @ L^T =\n{L_manual @ L_manual.T}")
print(f"Matches A? {np.allclose(A, L_manual @ L_manual.T)}")
print()
# NumPy verification
L_numpy = np.linalg.cholesky(A)
print(f"L (NumPy) =\n{L_numpy}")
print(f"Our L matches NumPy? {np.allclose(L_manual, L_numpy)}")
print()
# Efficient determinant via Cholesky: det(A) = det(L)^2 = (prod of diagonal of L)^2
det_via_cholesky = np.prod(np.diag(L_manual))**2
print(f"det(A) via Cholesky (prod(diag(L))^2) = {det_via_cholesky:.4f}")
print(f"det(A) via NumPy = {np.linalg.det(A):.4f}")
4. Eigendecomposition and Diagonalization (MML 4.4)
A matrix \(A \in \mathbb{R}^{n \times n}\) is diagonalizable (MML Definition 4.19) if there exists an invertible matrix \(P\) such that:

$$D = P^{-1}AP \quad \Longleftrightarrow \quad A = PDP^{-1}$$
where \(D\) is diagonal with the eigenvalues of \(A\), and the columns of \(P\) are the corresponding eigenvectors.
When does it exist? (MML Theorem 4.20)
\(A\) is diagonalizable iff its eigenvectors form a basis of \(\mathbb{R}^n\).
Every symmetric matrix is diagonalizable (Theorem 4.21), and \(P\) is orthogonal (\(P^{-1} = P^\top\)).
A defective matrix (Definition 4.13) has fewer than \(n\) linearly independent eigenvectors and is NOT diagonalizable.
Geometric intuition (MML Figure 4.7): \(A = PDP^{-1}\) means
\(P^{-1}\): Change basis to eigenbasis
\(D\): Scale along eigenvector axes by eigenvalues
\(P\): Change basis back to standard
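These three steps can be traced on a concrete vector (an illustrative sketch using the symmetric matrix from Example 4.11 that appears in this section):

```python
import numpy as np

A = 0.5 * np.array([[5, -2],
                    [-2, 5]])
eigenvalues, P = np.linalg.eig(A)
P_inv = np.linalg.inv(P)
D = np.diag(eigenvalues)

x = np.array([1.0, 0.0])
step1 = P_inv @ x   # 1. coordinates of x in the eigenbasis
step2 = D @ step1   # 2. scale each coordinate by its eigenvalue
step3 = P @ step2   # 3. map back to the standard basis

print("P^{-1} x     =", step1)
print("D P^{-1} x   =", step2)
print("P D P^{-1} x =", step3)
print("Same as A x?", np.allclose(step3, A @ x))
```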
# --- MML Example 4.11: Eigendecomposition ---
# A = 1/2 * [[5, -2], [-2, 5]]
A = 0.5 * np.array([[5, -2],
                    [-2, 5]])
print("=== MML Example 4.11: Eigendecomposition ===")
print(f"A =\n{A}")
print()
# Step 1: Eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(A)
print(f"Eigenvalues: {eigenvalues}")
print(f"Eigenvectors (columns of P):\n{eigenvectors}")
print()
# Step 2: Construct P and D
P = eigenvectors
D = np.diag(eigenvalues)
P_inv = np.linalg.inv(P)
print(f"P =\n{P}")
print(f"D =\n{D}")
print(f"P^{{-1}} =\n{P_inv}")
print()
# Step 3: Verify A = P D P^{-1}
A_reconstructed = P @ D @ P_inv
print(f"P @ D @ P^{{-1}} =\n{A_reconstructed}")
print(f"Matches A? {np.allclose(A, A_reconstructed)}")
print()
# For symmetric A, P is orthogonal: P^{-1} = P^T
print(f"A is symmetric: {np.allclose(A, A.T)}")
print(f"P is orthogonal (P^T P = I)? {np.allclose(P.T @ P, np.eye(2))}")
print(f"P^{{-1}} == P^T? {np.allclose(P_inv, P.T)}")
# --- Diagonalizable vs Non-Diagonalizable (Defective) matrices ---
print("=== Diagonalizable vs Defective ===")
print()
# Case 1: Diagonalizable - distinct eigenvalues (MML Theorem 4.12)
A_diag = np.array([[4, 2],
                   [1, 3]])
vals, vecs = np.linalg.eig(A_diag)
print(f"A1 = [[4,2],[1,3]]")
print(f"Eigenvalues: {vals} (distinct => diagonalizable)")
print(f"P invertible? det(P) = {np.linalg.det(vecs):.4f}")
print()
# Case 2: Defective (MML Example 4.6) - repeated eigenvalue, geometric multiplicity < algebraic
A_defective = np.array([[2, 1],
                        [0, 2]])
vals_d, vecs_d = np.linalg.eig(A_defective)
print(f"A2 = [[2,1],[0,2]] (MML Example 4.6)")
print(f"Eigenvalues: {vals_d} (repeated lambda=2)")
print(f"Eigenvectors:\n{vecs_d}")
# Only one linearly independent eigenvector [1,0] -> geometric multiplicity 1 < algebraic 2
# Check: can we invert P?
print(f"det(P) = {np.linalg.det(vecs_d):.6f} (near zero => NOT diagonalizable)")
print("This is a defective matrix: algebraic multiplicity 2 but geometric multiplicity 1.")
print()
# Case 3: Symmetric matrices are ALWAYS diagonalizable (MML Theorem 4.21)
A_sym = np.array([[3, 2, 2],
                  [2, 3, 2],
                  [2, 2, 3]])
vals_s, vecs_s = np.linalg.eig(A_sym)
print(f"A3 (symmetric) = [[3,2,2],[2,3,2],[2,2,3]] (MML Example 4.8)")
print(f"Eigenvalues: {vals_s}")
print(f"Eigenvectors are orthogonal (P^T P = I)?")
print(np.round(vecs_s.T @ vecs_s, 4))
print("Symmetric => always diagonalizable with orthogonal eigenvectors (Spectral Theorem).")
# --- Efficient matrix powers via eigendecomposition: A^k = P D^k P^{-1} (MML Eq. 4.62) ---
A = 0.5 * np.array([[5, -2],
                    [-2, 5]])
eigenvalues, P = np.linalg.eig(A)
P_inv = np.linalg.inv(P)
print("=== Matrix Powers via Eigendecomposition (MML Eq. 4.62) ===")
print(f"A =\n{A}")
print(f"Eigenvalues: {eigenvalues}")
print()
for k in [2, 5, 10]:
    D_k = np.diag(eigenvalues**k)
    A_k_eigen = P @ D_k @ P_inv
    A_k_direct = np.linalg.matrix_power(A, k)
    print(f"A^{k} via eigendecomp:\n{A_k_eigen}")
    print(f"A^{k} via direct power:\n{A_k_direct}")
    print(f"Match? {np.allclose(A_k_eigen, A_k_direct)}")
    print()
5. Singular Value Decomposition (MML 4.5)
The SVD (MML Theorem 4.22) is the most general matrix decomposition. For any \(A \in \mathbb{R}^{m \times n}\) of rank \(r\):

$$A = U \Sigma V^\top$$
where:
\(U \in \mathbb{R}^{m \times m}\) is orthogonal (columns are left-singular vectors, eigenvectors of \(AA^\top\))
\(V \in \mathbb{R}^{n \times n}\) is orthogonal (columns are right-singular vectors, eigenvectors of \(A^\top A\))
\(\Sigma \in \mathbb{R}^{m \times n}\) has singular values \(\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > 0\) on the diagonal
How to compute (MML Section 4.5.2)
Compute \(A^\top A\) and find its eigenvalues/eigenvectors \(\Rightarrow\) gives \(V\) and \(\sigma_i^2 = \lambda_i\)
Singular values: \(\sigma_i = \sqrt{\lambda_i}\)
Left-singular vectors: \(u_i = \frac{1}{\sigma_i} A v_i\) (MML Eq. 4.78)
SVD vs Eigendecomposition (MML Section 4.5.3)
SVD exists for any matrix (not just square). Eigendecomposition requires square + basis of eigenvectors.
For symmetric SPD matrices, SVD and eigendecomposition coincide (\(U = V = P\), \(\Sigma = D\)).
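A quick check of that last claim (illustrative snippet; singular vectors and eigenvectors are unique only up to sign, so we compare absolute values):

```python
import numpy as np

A = np.array([[4.0, 2.0],
              [2.0, 3.0]])  # symmetric positive definite

U, s, VT = np.linalg.svd(A)
eigvals, P = np.linalg.eigh(A)          # eigh returns ascending order
eigvals, P = eigvals[::-1], P[:, ::-1]  # sort descending to match SVD

print("Singular values:", s)
print("Eigenvalues:    ", eigvals)
print("sigma_i == lambda_i?", np.allclose(s, eigvals))
# Columns of U and V agree with the eigenvectors up to sign
print("|U| == |V| == |P|?",
      np.allclose(np.abs(U), np.abs(VT.T)) and np.allclose(np.abs(U), np.abs(P)))
```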
# --- Step-by-step SVD computation (following MML Section 4.5.2) ---
A = np.array([[1, 0, 1],
              [-2, 1, 0]])
print("=== Step-by-Step SVD (MML Example 4.13 approach) ===")
print(f"A ({A.shape[0]}x{A.shape[1]}) =\n{A}")
print()
# Step 1: Compute A^T A and its eigen-decomposition
ATA = A.T @ A
print(f"Step 1: A^T A =\n{ATA}")
eigenvalues_ATA, eigenvectors_ATA = np.linalg.eigh(ATA) # eigh for symmetric
# Sort in descending order
idx = np.argsort(eigenvalues_ATA)[::-1]
eigenvalues_ATA = eigenvalues_ATA[idx]
eigenvectors_ATA = eigenvectors_ATA[:, idx]
print(f"Eigenvalues of A^T A: {eigenvalues_ATA}")
print(f"V (eigenvectors of A^T A) =\n{eigenvectors_ATA}")
print()
# Step 2: Singular values
singular_values = np.sqrt(np.maximum(eigenvalues_ATA, 0)) # protect against tiny negatives
print(f"Step 2: Singular values sigma_i = sqrt(lambda_i): {singular_values}")
print()
# Step 3: Left-singular vectors u_i = (1/sigma_i) * A * v_i (MML Eq. 4.78)
V = eigenvectors_ATA
r = np.sum(singular_values > 1e-10) # rank
U_partial = np.zeros((A.shape[0], r))
for i in range(r):
    U_partial[:, i] = (1.0 / singular_values[i]) * A @ V[:, i]
    print(f"Step 3: u_{i+1} = (1/{singular_values[i]:.4f}) * A @ v_{i+1} = {U_partial[:, i]}")
print()
# Build full Sigma
Sigma_manual = np.zeros_like(A, dtype=float)
for i in range(r):
    Sigma_manual[i, i] = singular_values[i]
# Reconstruct
# For the full U, we need to complete the orthonormal basis
# Since m=2 and r=2, U_partial is already complete
U_manual = U_partial
A_reconstructed = U_manual @ Sigma_manual[:r, :] @ V.T
print(f"Reconstructed A = U @ Sigma @ V^T =\n{A_reconstructed}")
print(f"Matches original A? {np.allclose(A, A_reconstructed)}")
print()
# Verify with NumPy
U_np, s_np, VT_np = np.linalg.svd(A)
print("=== NumPy SVD Verification ===")
print(f"Singular values (NumPy): {s_np}")
print(f"Singular values (ours): {singular_values[:r]}")
print(f"Match? {np.allclose(np.sort(s_np)[::-1], np.sort(singular_values[:r])[::-1])}")
# --- Visualize how SVD transforms the unit circle (MML Figure 4.8/4.9 idea) ---
A = np.array([[2, 0.5],
              [0.5, 1.5]])
U, s, VT = np.linalg.svd(A)
V = VT.T
theta = np.linspace(0, 2*np.pi, 200)
circle = np.array([np.cos(theta), np.sin(theta)])
fig, axes = plt.subplots(1, 4, figsize=(20, 5))
# Panel 1: Original unit circle with right-singular vectors
ax = axes[0]
ax.plot(circle[0], circle[1], 'b-', lw=1.5)
for i in range(2):
    ax.annotate('', xy=V[:, i], xytext=[0, 0],
                arrowprops=dict(arrowstyle='->', color=['red', 'green'][i], lw=2.5))
    ax.text(V[0, i]*1.2, V[1, i]*1.2, f'$v_{i+1}$', fontsize=14, color=['red', 'green'][i])
ax.set_aspect('equal')
ax.set_title('1. Unit circle + V\n(right-singular vectors)')
ax.grid(True, alpha=0.3)
ax.set_xlim(-2.5, 2.5)
ax.set_ylim(-2.5, 2.5)
# Panel 2: After V^T (rotation)
rotated = VT @ circle
ax = axes[1]
ax.plot(rotated[0], rotated[1], 'b-', lw=1.5)
for i in range(2):
    vr = VT @ V[:, i]
    ax.annotate('', xy=vr, xytext=[0, 0],
                arrowprops=dict(arrowstyle='->', color=['red', 'green'][i], lw=2.5))
ax.set_aspect('equal')
ax.set_title('2. After $V^T$ (rotate)')
ax.grid(True, alpha=0.3)
ax.set_xlim(-2.5, 2.5)
ax.set_ylim(-2.5, 2.5)
# Panel 3: After Sigma (scale)
S = np.diag(s)
scaled = S @ VT @ circle
ax = axes[2]
ax.plot(scaled[0], scaled[1], 'b-', lw=1.5)
for i in range(2):
    vs = S @ VT @ V[:, i]
    ax.annotate('', xy=vs, xytext=[0, 0],
                arrowprops=dict(arrowstyle='->', color=['red', 'green'][i], lw=2.5))
    ax.text(vs[0]*1.1, vs[1]*1.1, f'$\\sigma_{i+1}={s[i]:.2f}$', fontsize=12)
ax.set_aspect('equal')
ax.set_title('3. After $\\Sigma$ (scale)\nCircle becomes ellipse')
ax.grid(True, alpha=0.3)
ax.set_xlim(-2.5, 2.5)
ax.set_ylim(-2.5, 2.5)
# Panel 4: After U (final rotation) = full A transformation
final = A @ circle
ax = axes[3]
ax.plot(final[0], final[1], 'b-', lw=1.5)
for i in range(2):
    vf = s[i] * U[:, i]
    ax.annotate('', xy=vf, xytext=[0, 0],
                arrowprops=dict(arrowstyle='->', color=['red', 'green'][i], lw=2.5))
    ax.text(vf[0]*1.15, vf[1]*1.15, f'$\\sigma_{i+1} u_{i+1}$', fontsize=12)
ax.set_aspect('equal')
ax.set_title('4. After $U$ (rotate)\n$A = U\\Sigma V^T$')
ax.grid(True, alpha=0.3)
ax.set_xlim(-2.5, 2.5)
ax.set_ylim(-2.5, 2.5)
plt.suptitle('SVD as Sequential Transformations: $V^T$ (rotate), $\\Sigma$ (scale), $U$ (rotate)',
fontsize=14, y=1.02)
plt.tight_layout()
plt.show()
6. Matrix Approximation (MML 4.6)
The SVD allows us to write any rank-\(r\) matrix as a sum of rank-1 outer products (MML Eq. 4.91):

$$A = \sum_{i=1}^{r} \sigma_i u_i v_i^\top = \sum_{i=1}^{r} \sigma_i A_i$$

A rank-\(k\) approximation (\(k < r\)) keeps only the top \(k\) terms (MML Eq. 4.92):

$$\hat{A}(k) = \sum_{i=1}^{k} \sigma_i u_i v_i^\top$$

Eckart-Young Theorem (MML Theorem 4.25)
The rank-\(k\) SVD approximation is the best rank-\(k\) approximation in spectral norm:

$$\hat{A}(k) = \text{argmin}_{\text{rk}(B) = k} \|A - B\|_2, \qquad \|A - \hat{A}(k)\|_2 = \sigma_{k+1}$$
The approximation error is exactly the \((k+1)\)-th singular value.
# --- Rank-k approximation of a matrix ---
A = np.array([[5, 4, 1],
              [5, 5, 0],
              [0, 0, 5],
              [1, 0, 4]])
U, s, VT = np.linalg.svd(A, full_matrices=False)
print("=== Low-Rank Approximation (MML Section 4.6) ===")
print(f"A ({A.shape[0]}x{A.shape[1]}) =\n{A}")
print(f"Singular values: {s}")
print(f"Rank: {np.sum(s > 1e-10)}")
print()
# Show rank-1 outer products A_i = sigma_i * u_i * v_i^T
print("=== Rank-1 Components (MML Eq. 4.90) ===")
for i in range(len(s)):
    Ai = s[i] * np.outer(U[:, i], VT[i, :])
    print(f"A_{i+1} = sigma_{i+1} * u_{i+1} @ v_{i+1}^T (sigma={s[i]:.4f})")
    print(f"{np.round(Ai, 3)}")
    print()
# Rank-k approximation and error (Eckart-Young Theorem)
print("=== Eckart-Young Theorem Verification ===")
for k in range(1, len(s) + 1):
    A_hat_k = np.sum([s[i] * np.outer(U[:, i], VT[i, :]) for i in range(k)], axis=0)
    error = np.linalg.norm(A - A_hat_k, ord=2)  # spectral norm
    expected_error = s[k] if k < len(s) else 0.0
    print(f"Rank-{k}: ||A - A_hat({k})||_2 = {error:.4f}, "
          f"sigma_{k+1} = {expected_error:.4f}, "
          f"match = {np.isclose(error, expected_error, atol=1e-8)}")
# --- Image compression demo using SVD (inspired by MML Figures 4.11, 4.12) ---
# Create a synthetic image: a checkerboard pattern with gradients
np.random.seed(42)
size = 64
x = np.linspace(0, 1, size)
y = np.linspace(0, 1, size)
X, Y = np.meshgrid(x, y)
# Compose: smooth gradients + some structure + small noise
image = (
    0.5 * np.sin(2 * np.pi * X) * np.cos(2 * np.pi * Y)
    + 0.3 * np.exp(-((X - 0.5)**2 + (Y - 0.5)**2) / 0.1)
    + 0.1 * (X + Y)
    + 0.02 * np.random.randn(size, size)
)
U, s, VT = np.linalg.svd(image, full_matrices=False)
ranks = [1, 2, 5, 10, 20, 64]
fig, axes = plt.subplots(2, 3, figsize=(15, 10))
for idx, k in enumerate(ranks):
    ax = axes[idx // 3, idx % 3]
    # Rank-k approximation (MML Eq. 4.92)
    A_hat = U[:, :k] @ np.diag(s[:k]) @ VT[:k, :]
    # Compression ratio: store k left-singular vectors, k right-singular vectors, k singular values
    original_size = size * size
    compressed_size = k * (size + size + 1)
    ratio = compressed_size / original_size * 100
    error = np.linalg.norm(image - A_hat, 'fro') / np.linalg.norm(image, 'fro')
    ax.imshow(A_hat, cmap='viridis')
    ax.set_title(f'Rank {k}\nStorage: {ratio:.1f}%, Error: {error:.3f}')
    ax.axis('off')
plt.suptitle('Image Compression via Truncated SVD (MML Eq. 4.92)', fontsize=14)
plt.tight_layout()
plt.show()
# Plot singular value spectrum
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))
ax1.semilogy(range(1, len(s)+1), s, 'bo-', markersize=4)
ax1.set_xlabel('Index i')
ax1.set_ylabel('$\\sigma_i$ (log scale)')
ax1.set_title('Singular Value Spectrum')
ax1.grid(True, alpha=0.3)
# Cumulative energy: how much of ||A||_F^2 is captured
energy = np.cumsum(s**2) / np.sum(s**2) * 100
ax2.plot(range(1, len(s)+1), energy, 'r-o', markersize=4)
ax2.axhline(y=95, color='gray', linestyle='--', label='95% energy')
ax2.set_xlabel('Rank k')
ax2.set_ylabel('Cumulative energy (%)')
ax2.set_title('Cumulative Energy (fraction of $||A||_F^2$)')
ax2.legend()
ax2.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# --- Another demo: compress a structured pattern with clear low-rank behavior ---
# Create an image with known low-rank structure
rows = np.linspace(0, 1, 80).reshape(-1, 1)
cols = np.linspace(0, 1, 100).reshape(1, -1)
# True rank-3 image + noise
image2 = (
    3.0 * np.sin(np.pi * rows) @ np.cos(2 * np.pi * cols)
    + 2.0 * np.cos(2 * np.pi * rows) @ np.sin(np.pi * cols)
    + 1.0 * rows @ np.ones_like(cols)
    + 0.05 * np.random.randn(80, 100)
)
U2, s2, VT2 = np.linalg.svd(image2, full_matrices=False)
fig, axes = plt.subplots(1, 4, figsize=(18, 4))
for idx, k in enumerate([1, 3, 10, 80]):
    A_hat = U2[:, :k] @ np.diag(s2[:k]) @ VT2[:k, :]
    error_fro = np.linalg.norm(image2 - A_hat, 'fro') / np.linalg.norm(image2, 'fro')
    axes[idx].imshow(A_hat, cmap='RdBu_r', aspect='auto')
    axes[idx].set_title(f'Rank {k} (rel. error={error_fro:.4f})')
    axes[idx].axis('off')
plt.suptitle('Low-rank approximation of a (nearly) rank-3 signal + noise', fontsize=13)
plt.tight_layout()
plt.show()
print(f"Top 5 singular values: {s2[:5]}")
print(f"Notice the sharp drop after sigma_3 -- the true signal has rank 3.")
print(f"sigma_4/sigma_3 ratio: {s2[3]/s2[2]:.4f} (small => rank-3 is a good approximation)")
7. Matrix Phylogeny (MML 4.7)
MML Section 4.7 presents a taxonomy of matrix types and their decompositions. Each class of matrix adds structure that enables stronger decomposition results.
Matrix Type Hierarchy

All matrices (R^{m x n})
 |-- SVD always exists: A = U Sigma V^T
 |
 +-- Square matrices (R^{n x n})
      |-- Determinant, trace, eigenvalues defined
      |
      +-- Non-singular (invertible): det(A) != 0
      |
      +-- Non-defective (diagonalizable): A = PDP^{-1}
      |    |-- n linearly independent eigenvectors
      |    |-- Includes: all matrices with n distinct eigenvalues
      |
      +-- Symmetric (A = A^T)
           |-- Always diagonalizable (Spectral Theorem 4.15)
           |-- Real eigenvalues, orthogonal eigenvectors
           |-- A = P D P^T (P orthogonal)
           |
           +-- Symmetric Positive Definite (SPD)
                |-- All eigenvalues > 0
                |-- Cholesky: A = LL^T
                |-- Always invertible
                |-- SVD = Eigendecomposition (with Sigma = D)
Summary Table

| Matrix Type | Decomposition | Requirements |
|---|---|---|
| Any \(A \in \mathbb{R}^{m \times n}\) | \(A = U\Sigma V^\top\) (SVD) | Always exists |
| Non-defective \(A \in \mathbb{R}^{n \times n}\) | \(A = PDP^{-1}\) | \(n\) lin. indep. eigenvectors |
| Symmetric \(A = A^\top\) | \(A = PDP^\top\) | Always (Spectral Theorem) |
| Symmetric Positive Definite | \(A = LL^\top\) (Cholesky) | \(A = A^\top\), all \(\lambda_i > 0\) |
# --- Demonstrate the matrix phylogeny: verify properties at each level ---
print("=" * 70)
print("MATRIX PHYLOGENY -- Verification of Decompositions at Each Level")
print("=" * 70)
print()
# Level 1: Arbitrary rectangular matrix -> SVD always exists
print("--- Level 1: Any matrix (rectangular) -> SVD ---")
A_rect = np.array([[1, -0.8],
                   [0, 1],
                   [1, 0]])
U, s, VT = np.linalg.svd(A_rect, full_matrices=True)
Sigma = np.zeros_like(A_rect, dtype=float)
for i in range(len(s)):
    Sigma[i, i] = s[i]
print(f"A (3x2) =\n{A_rect}")
print(f"Singular values: {s}")
print(f"U @ Sigma @ V^T = A? {np.allclose(A_rect, U @ Sigma @ VT)}")
print()
# Level 2: Square non-defective -> Eigendecomposition
print("--- Level 2: Square non-defective -> Eigendecomposition ---")
A_sq = np.array([[4, 2],
                 [1, 3]])
vals, vecs = np.linalg.eig(A_sq)
P = vecs
D = np.diag(vals)
P_inv = np.linalg.inv(P)
print(f"A =\n{A_sq}")
print(f"Eigenvalues: {vals.real}")
print(f"P @ D @ P^{{-1}} = A? {np.allclose(A_sq, P @ D @ P_inv)}")
print(f"P orthogonal? {np.allclose(P.T @ P, np.eye(2))} (not required here)")
print()
# Level 3: Symmetric -> Orthogonal eigendecomposition (P^{-1} = P^T)
print("--- Level 3: Symmetric -> A = P D P^T ---")
A_sym = np.array([[3, 1],
                  [1, 3]])
vals_s, vecs_s = np.linalg.eigh(A_sym)
P_s = vecs_s
D_s = np.diag(vals_s)
print(f"A = A^T =\n{A_sym}")
print(f"Eigenvalues (all real): {vals_s}")
print(f"P orthogonal? {np.allclose(P_s.T @ P_s, np.eye(2))}")
print(f"P @ D @ P^T = A? {np.allclose(A_sym, P_s @ D_s @ P_s.T)}")
print()
# Level 4: SPD -> Cholesky + Eigendecomposition + SVD all coincide
print("--- Level 4: Symmetric Positive Definite -> Cholesky + Eigen + SVD ---")
A_spd = np.array([[4, 2],
                  [2, 3]])
vals_spd = np.linalg.eigvalsh(A_spd)
print(f"A (SPD) =\n{A_spd}")
print(f"Eigenvalues (all > 0): {vals_spd}")
# Cholesky
L = np.linalg.cholesky(A_spd)
print(f"Cholesky L L^T = A? {np.allclose(A_spd, L @ L.T)}")
# Eigendecomposition
vals_e, vecs_e = np.linalg.eigh(A_spd)
print(f"Eigen P D P^T = A? {np.allclose(A_spd, vecs_e @ np.diag(vals_e) @ vecs_e.T)}")
# SVD
U_spd, s_spd, VT_spd = np.linalg.svd(A_spd)
print(f"SVD singular values: {s_spd}")
print(f"SVD eigenvalues: {vals_e}")
print(f"For SPD: singular values = eigenvalues? {np.allclose(sorted(s_spd), sorted(vals_e))}")
# --- Visual summary: decomposition hierarchy ---
fig, ax = plt.subplots(figsize=(12, 7))
ax.set_xlim(0, 10)
ax.set_ylim(0, 10)
ax.axis('off')
# Draw nested boxes from outermost to innermost
boxes = [
    (0.5, 0.5, 9, 9, '#E8F5E9', 'All Matrices ($\\mathbb{R}^{m \\times n}$)\nSVD: $A = U\\Sigma V^T$'),
    (1.2, 1.0, 7.6, 7.5, '#E3F2FD', 'Square ($\\mathbb{R}^{n \\times n}$)\ndet, tr, eigenvalues defined'),
    (1.9, 1.5, 6.2, 6.0, '#FFF3E0', 'Non-defective (Diagonalizable)\n$A = PDP^{-1}$'),
    (2.6, 2.0, 4.8, 4.5, '#F3E5F5', 'Symmetric ($A = A^T$)\n$A = PDP^T$ (P orthogonal)'),
    (3.3, 2.5, 3.4, 3.0, '#FFEBEE', 'Symm. Pos. Definite\n$A = LL^T$ (Cholesky)\nAll $\\lambda_i > 0$'),
]
for x, y, w, h, color, label in boxes:
    rect = plt.Rectangle((x, y), w, h, linewidth=2, edgecolor='black',
                         facecolor=color, alpha=0.7, zorder=1)
    ax.add_patch(rect)
    ax.text(x + w/2, y + h - 0.6, label, ha='center', va='top', fontsize=10,
            fontweight='bold', zorder=2)
ax.set_title('Matrix Phylogeny: Hierarchy of Matrix Types and Decompositions\n(MML Section 4.7)',
fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()
Summary
This lab covered the core matrix decomposition techniques from MML Chapter 4:
| Topic | Key Idea | MML Reference |
|---|---|---|
| Determinant | Signed volume; \(\det(A)=0\) iff singular; cofactor expansion | Section 4.1 |
| Trace | Sum of diagonal entries; cyclic invariance \(\text{tr}(AB)=\text{tr}(BA)\); sum of eigenvalues | Section 4.1 |
| Eigenvalues/Eigenvectors | \(Ax=\lambda x\); found via characteristic polynomial \(\det(A-\lambda I)=0\) | Section 4.2 |
| Cholesky | \(A=LL^\top\) for SPD matrices; efficient determinant computation | Section 4.3 |
| Eigendecomposition | \(A=PDP^{-1}\); requires non-defective matrix; symmetric \(\Rightarrow P\) orthogonal | Section 4.4 |
| SVD | \(A=U\Sigma V^\top\) always exists; \(V\) = right-singular (eigvecs of \(A^\top A\)), \(U\) = left-singular | Section 4.5 |
| Matrix Approximation | Rank-\(k\) truncated SVD; Eckart-Young: optimal in spectral norm; \(\|A-\hat{A}(k)\|_2=\sigma_{k+1}\) | Section 4.6 |
| Matrix Phylogeny | All \(\supset\) Square \(\supset\) Non-defective \(\supset\) Symmetric \(\supset\) SPD | Section 4.7 |
Key takeaway: Matrix decompositions reveal hidden structure. The SVD is the most universal tool: it always exists, gives the optimal low-rank approximation, and unifies many other decompositions as special cases.