Lab 01: Linear Algebra

Based on Chapter 2 of Mathematics for Machine Learning (Deisenroth, Faisal, Ong)

This notebook is a hands-on practice lab covering the core linear algebra concepts from MML Chapter 2. We explore systems of linear equations, matrices, Gaussian elimination, vector spaces, linear independence, basis and rank, linear mappings, and affine spaces – all implemented with NumPy and Matplotlib.

Reference: Mathematics for Machine Learning – Chapter 2

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import FancyArrowPatch
import matplotlib.patches as mpatches

np.set_printoptions(precision=4, suppress=True)
%matplotlib inline
plt.rcParams['figure.figsize'] = (8, 5)
plt.rcParams['font.size'] = 12
print('Setup complete.')

1. Systems of Linear Equations (MML 2.1)

A system of linear equations has the general form:

\[a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n = b_1\]
\[\vdots\]
\[a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n = b_m\]

which can be written compactly as \(\mathbf{A}\mathbf{x} = \mathbf{b}\), where \(\mathbf{A} \in \mathbb{R}^{m \times n}\), \(\mathbf{x} \in \mathbb{R}^n\), \(\mathbf{b} \in \mathbb{R}^m\).

For a system of two equations with two unknowns, each equation defines a line in 2D. The solution is the intersection of these lines. There are three possibilities:

  • Exactly one solution (lines intersect at a point)

  • No solution (lines are parallel)

  • Infinitely many solutions (lines are identical)
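The three possibilities can also be detected algebraically, without plotting: compare \(\text{rk}(\mathbf{A})\) with \(\text{rk}([\mathbf{A} \mid \mathbf{b}])\) (the Rouché–Capelli criterion). A minimal sketch; the helper name `classify_system` is illustrative, not from MML:

```python
import numpy as np

def classify_system(A, b):
    """Classify the solution set of Ax = b by comparing rank(A)
    with the rank of the augmented matrix [A|b]."""
    rank_A = np.linalg.matrix_rank(A)
    rank_aug = np.linalg.matrix_rank(np.column_stack([A, b]))
    if rank_A < rank_aug:
        return 'no solution'          # b is not in the column space of A
    if rank_A == A.shape[1]:
        return 'unique solution'      # full column rank: no free variables
    return 'infinitely many solutions'

# The three 2D cases described above:
print(classify_system(np.array([[4, 4], [2, -4]]), np.array([5, 1])))  # unique solution
print(classify_system(np.array([[1, 1], [1, 1]]), np.array([1, 3])))   # no solution
print(classify_system(np.array([[1, 1], [2, 2]]), np.array([1, 2])))   # infinitely many solutions
```

The same test applies unchanged to systems of any size, not just 2D.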

# Example from MML (2.8): 4x1 + 4x2 = 5, 2x1 - 4x2 = 1
# Unique solution: (x1, x2) = (1, 1/4)
A = np.array([[4, 4],
              [2, -4]])
b = np.array([5, 1])

x = np.linalg.solve(A, b)
print(f'System: 4x1 + 4x2 = 5,  2x1 - 4x2 = 1')
print(f'Solution: x1 = {x[0]:.4f}, x2 = {x[1]:.4f}')
print(f'Verification: A @ x = {A @ x}')
# Visualize three cases: unique solution, no solution, infinitely many
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

x_range = np.linspace(-1, 3, 300)

# Case 1: Unique solution -- 4x1 + 4x2 = 5, 2x1 - 4x2 = 1
ax = axes[0]
y1 = (5 - 4*x_range) / 4
y2 = (2*x_range - 1) / 4
ax.plot(x_range, y1, 'b-', linewidth=2, label=r'$4x_1 + 4x_2 = 5$')
ax.plot(x_range, y2, 'r-', linewidth=2, label=r'$2x_1 - 4x_2 = 1$')
ax.plot(1, 0.25, 'ko', markersize=8, zorder=5)
ax.annotate('(1, 0.25)', xy=(1, 0.25), xytext=(1.3, 0.6),
            arrowprops=dict(arrowstyle='->', color='black'), fontsize=10)
ax.set_title('Unique Solution')
ax.set_xlabel(r'$x_1$'); ax.set_ylabel(r'$x_2$')
ax.legend(fontsize=9); ax.grid(True, alpha=0.3)
ax.set_xlim(-0.5, 2.5); ax.set_ylim(-1, 2)

# Case 2: No solution -- parallel lines: x1 + x2 = 1, x1 + x2 = 3
ax = axes[1]
y1 = 1 - x_range
y2 = 3 - x_range
ax.plot(x_range, y1, 'b-', linewidth=2, label=r'$x_1 + x_2 = 1$')
ax.plot(x_range, y2, 'r-', linewidth=2, label=r'$x_1 + x_2 = 3$')
ax.set_title('No Solution (Parallel)')
ax.set_xlabel(r'$x_1$'); ax.set_ylabel(r'$x_2$')
ax.legend(fontsize=9); ax.grid(True, alpha=0.3)
ax.set_xlim(-0.5, 2.5); ax.set_ylim(-1, 2)

# Case 3: Infinitely many -- same line: x1 + x2 = 1, 2x1 + 2x2 = 2
ax = axes[2]
y1 = 1 - x_range
ax.plot(x_range, y1, 'b-', linewidth=4, alpha=0.5, label=r'$x_1 + x_2 = 1$')
ax.plot(x_range, y1, 'r--', linewidth=2, label=r'$2x_1 + 2x_2 = 2$')
ax.set_title('Infinitely Many Solutions')
ax.set_xlabel(r'$x_1$'); ax.set_ylabel(r'$x_2$')
ax.legend(fontsize=9); ax.grid(True, alpha=0.3)
ax.set_xlim(-0.5, 2.5); ax.set_ylim(-1, 2)

plt.tight_layout()
plt.suptitle('Geometric Interpretation of 2D Linear Systems', y=1.03, fontsize=14)
plt.show()
# 3D system: three planes intersecting at a point
# x1 + x2 + x3 = 3,  x1 - x2 + 2x3 = 2,  x2 + x3 = 2  (MML Example 2.2 variant)
A3 = np.array([[1, 1, 1],
               [1, -1, 2],
               [0, 1, 1]])
b3 = np.array([3, 2, 2])
x3 = np.linalg.solve(A3, b3)
print(f'3D System solution: x = {x3}')
print(f'Verification: A @ x = {A3 @ x3}')

# Visualize the three planes
fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection='3d')

xx, yy = np.meshgrid(np.linspace(-1, 3, 20), np.linspace(-1, 3, 20))

# Plane 1: x1 + x2 + x3 = 3  =>  x3 = 3 - x1 - x2
zz1 = 3 - xx - yy
ax.plot_surface(xx, yy, zz1, alpha=0.25, color='blue')

# Plane 2: x1 - x2 + 2x3 = 2  =>  x3 = (2 - x1 + x2) / 2
zz2 = (2 - xx + yy) / 2
ax.plot_surface(xx, yy, zz2, alpha=0.25, color='red')

# Plane 3: x2 + x3 = 2  =>  x3 = 2 - x2
zz3 = 2 - yy
ax.plot_surface(xx, yy, zz3, alpha=0.25, color='green')

ax.scatter(*x3, color='black', s=100, zorder=10)
ax.set_xlabel(r'$x_1$'); ax.set_ylabel(r'$x_2$'); ax.set_zlabel(r'$x_3$')
ax.set_title('Three planes intersecting at a single point')

blue_patch = mpatches.Patch(color='blue', alpha=0.4, label=r'$x_1+x_2+x_3=3$')
red_patch = mpatches.Patch(color='red', alpha=0.4, label=r'$x_1-x_2+2x_3=2$')
green_patch = mpatches.Patch(color='green', alpha=0.4, label=r'$x_2+x_3=2$')
ax.legend(handles=[blue_patch, red_patch, green_patch], fontsize=9)
plt.tight_layout()
plt.show()

2. Matrices (MML 2.2)

A matrix \(\mathbf{A} \in \mathbb{R}^{m \times n}\) is a rectangular array of real numbers with \(m\) rows and \(n\) columns.

Key operations:

  • Addition: \((\mathbf{A} + \mathbf{B})_{ij} = a_{ij} + b_{ij}\) (element-wise, same dimensions required)

  • Multiplication: \((\mathbf{A}\mathbf{B})_{ij} = \sum_{l=1}^{n} a_{il}b_{lj}\) (inner dimensions must match)

  • Transpose: \((\mathbf{A}^\top)_{ij} = a_{ji}\)

  • Inverse: \(\mathbf{A}\mathbf{A}^{-1} = \mathbf{A}^{-1}\mathbf{A} = \mathbf{I}_n\) (only for square, non-singular matrices)

Important properties:

  • Associativity: \((\mathbf{A}\mathbf{B})\mathbf{C} = \mathbf{A}(\mathbf{B}\mathbf{C})\)

  • Distributivity: \(\mathbf{A}(\mathbf{B} + \mathbf{C}) = \mathbf{A}\mathbf{B} + \mathbf{A}\mathbf{C}\)

  • Not commutative in general: \(\mathbf{A}\mathbf{B} \neq \mathbf{B}\mathbf{A}\)

  • \((\mathbf{A}\mathbf{B})^\top = \mathbf{B}^\top \mathbf{A}^\top\)

  • \((\mathbf{A}\mathbf{B})^{-1} = \mathbf{B}^{-1}\mathbf{A}^{-1}\)
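All of these identities can be checked numerically on random matrices; a quick sanity-check sketch (the shapes are arbitrary choices, and the diagonal shift is just to make the square matrices comfortably invertible):

```python
import numpy as np

# Numerical sanity check of the matrix identities above on random matrices.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 2))
C = rng.standard_normal((4, 2))
D = rng.standard_normal((2, 5))

assert np.allclose((A @ B) @ D, A @ (B @ D))      # associativity
assert np.allclose(A @ (B + C), A @ B + A @ C)    # distributivity
assert np.allclose((A @ B).T, B.T @ A.T)          # transpose reverses order

M = rng.standard_normal((3, 3)) + 3 * np.eye(3)   # shift keeps M invertible
N = rng.standard_normal((3, 3)) + 3 * np.eye(3)
assert np.allclose(np.linalg.inv(M @ N),
                   np.linalg.inv(N) @ np.linalg.inv(M))  # inverse reverses order
print('All identities hold numerically.')
```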

# Matrix operations -- using Example 2.3 from MML
A = np.array([[1, 2, 3],
              [3, 2, 1]])

B = np.array([[0, 2],
              [1, -1],
              [0, 1]])

print('A (2x3):')
print(A)
print('\nB (3x2):')
print(B)

AB = A @ B
BA = B @ A
print(f'\nAB (2x2):\n{AB}')
print(f'\nBA (3x3):\n{BA}')
print('\nAB != BA: matrix multiplication is NOT commutative (here AB and BA even have different shapes)')
# Transpose, inverse, and key identities
# Example 2.4 from MML: A and B are inverses
A_inv_ex = np.array([[1, 2, 1],
                     [4, 4, 5],
                     [6, 7, 7]])

B_inv_ex = np.array([[-7, -7, 6],
                     [2, 1, -1],
                     [4, 5, -4]])

print('A @ B (should be identity):')
print(A_inv_ex @ B_inv_ex)

print('\n--- Transpose properties ---')
C = np.array([[1, 2], [3, 4], [5, 6]])  # 3x2
D = np.array([[7, 8, 9], [10, 11, 12]])  # 2x3
print(f'(CD)^T == D^T @ C^T: {np.allclose((C @ D).T, D.T @ C.T)}')
print(f'(A^T)^T == A:        {np.allclose(C.T.T, C)}')

print('\n--- Inverse properties ---')
M = np.array([[2, 1], [1, 3]])
N = np.array([[1, 2], [0, 1]])
M_inv = np.linalg.inv(M)
N_inv = np.linalg.inv(N)
print(f'M @ M_inv == I:       {np.allclose(M @ M_inv, np.eye(2))}')
print(f'(MN)^-1 == N^-1 M^-1: {np.allclose(np.linalg.inv(M @ N), N_inv @ M_inv)}')

# Symmetric matrix check
S = np.array([[1, 2, 3], [2, 5, 6], [3, 6, 9]])
print(f'\nS is symmetric (S == S^T): {np.allclose(S, S.T)}')
print(f'S:\n{S}')

3. Solving Systems of Linear Equations – Gaussian Elimination (MML 2.3)

Gaussian elimination transforms \(\mathbf{A}\mathbf{x} = \mathbf{b}\) into row echelon form (REF) using three elementary row operations:

  1. Swap two rows

  2. Multiply a row by a non-zero scalar

  3. Add a multiple of one row to another

A matrix is in row echelon form if:

  • All zero rows are at the bottom

  • The leading entry (pivot) of each non-zero row is strictly to the right of the pivot above it

A matrix is in reduced row echelon form (RREF) if additionally:

  • Every pivot is 1

  • Each pivot is the only non-zero entry in its column

We build the augmented matrix \([\mathbf{A} \mid \mathbf{b}]\) and then apply row operations.
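The elimination code in this lab stops at row echelon form; for completeness, here is a from-scratch sketch that continues to RREF, normalizing each pivot to 1 and clearing the entries above it as well (the helper name `rref` and its tolerance handling are illustrative choices):

```python
import numpy as np

def rref(M, tol=1e-12):
    """Reduce M to reduced row echelon form (RREF).
    From-scratch sketch; `tol` treats tiny values as zero."""
    M = np.asarray(M, dtype=float).copy()
    rows, cols = M.shape
    pivot_row = 0
    for col in range(cols):
        if pivot_row >= rows:
            break
        # Partial pivoting: largest |entry| at or below the current row
        p = pivot_row + np.argmax(np.abs(M[pivot_row:, col]))
        if np.abs(M[p, col]) < tol:
            continue  # no pivot in this column
        M[[pivot_row, p]] = M[[p, pivot_row]]
        M[pivot_row] /= M[pivot_row, col]  # normalize the pivot to 1
        # Clear ALL other entries in the pivot column -- clearing above
        # the pivot too is what distinguishes RREF from plain REF
        for r in range(rows):
            if r != pivot_row:
                M[r] -= M[r, col] * M[pivot_row]
        pivot_row += 1
    return M

# Augmented matrix of the system x1+x2+x3=3, x1-x2+2x3=2, x2+x3=2
aug = np.array([[1., 1., 1., 3.],
                [1., -1., 2., 2.],
                [0., 1., 1., 2.]])
print(rref(aug))  # solution (1, 1, 1) can be read off the last column
```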

def gaussian_elimination(A, b, verbose=True):
    """
    Solve Ax = b using Gaussian elimination with partial pivoting.
    Returns the solution vector x.
    """
    n = len(b)
    # Build augmented matrix [A|b]
    M = np.hstack([A.astype(float), b.astype(float).reshape(-1, 1)])
    if verbose:
        print('Augmented matrix [A|b]:')
        print(M)
        print()

    # Forward elimination (to row echelon form)
    for col in range(n):
        # Partial pivoting: find the row with the largest absolute value in current column
        max_row = col + np.argmax(np.abs(M[col:, col]))
        if max_row != col:
            M[[col, max_row]] = M[[max_row, col]]
            if verbose:
                print(f'Swap row {col} and row {max_row}')

        if np.abs(M[col, col]) < 1e-12:
            raise ValueError('Matrix is singular or nearly singular.')

        # Eliminate entries below the pivot
        for row in range(col + 1, n):
            factor = M[row, col] / M[col, col]
            M[row, col:] -= factor * M[col, col:]
            if verbose:
                print(f'R{row} <- R{row} - ({factor:.4f}) * R{col}')

    if verbose:
        print(f'\nRow echelon form:')
        print(M)
        print()

    # Back substitution
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (M[i, -1] - M[i, i+1:n] @ x[i+1:n]) / M[i, i]

    return x


# Test: system from MML (2.5): unique solution (1, 1, 1)
A_test = np.array([[1, 1, 1],
                   [1, -1, 2],
                   [0, 1, 1]])
b_test = np.array([3, 2, 2])

print('=== Gaussian Elimination (from scratch) ===')
x_gauss = gaussian_elimination(A_test, b_test)
print(f'Solution: {x_gauss}')
print(f'Verification: A @ x = {A_test @ x_gauss}')
# Compare with np.linalg.solve
x_numpy = np.linalg.solve(A_test, b_test)
print('Comparison: Gaussian elimination vs np.linalg.solve')
print(f'Our solution:   {x_gauss}')
print(f'NumPy solution: {x_numpy}')
print(f'Match: {np.allclose(x_gauss, x_numpy)}')

# Larger random system
np.random.seed(42)
n = 5
A_big = np.random.randn(n, n)
b_big = np.random.randn(n)
x_ours = gaussian_elimination(A_big, b_big, verbose=False)
x_np = np.linalg.solve(A_big, b_big)
print(f'\n5x5 random system:')
print(f'Our solution:   {x_ours}')
print(f'NumPy solution: {x_np}')
print(f'Max absolute error: {np.max(np.abs(x_ours - x_np)):.2e}')

4. Vector Spaces (MML 2.4)

A vector space \(V = (\mathcal{V}, +, \cdot)\) is a set \(\mathcal{V}\) equipped with:

  • An inner operation \(+\) (vector addition), making \((\mathcal{V}, +)\) an Abelian group

  • An outer operation \(\cdot\) (scalar multiplication)

where scalar multiplication distributes over vector and scalar addition, is compatible with multiplication in \(\mathbb{R}\) (associativity), and has \(1\) as neutral element: \(1 \cdot \mathbf{x} = \mathbf{x}\).

A vector subspace \(U \subseteq V\) is a subset that is itself a vector space under the same operations. To verify a subspace, check:

  1. \(\mathbf{0} \in U\) (contains the zero vector)

  2. Closed under addition: \(\mathbf{x}, \mathbf{y} \in U \Rightarrow \mathbf{x} + \mathbf{y} \in U\)

  3. Closed under scalar multiplication: \(\lambda \in \mathbb{R}, \mathbf{x} \in U \Rightarrow \lambda\mathbf{x} \in U\)

The span of a set of vectors \(\{\mathbf{x}_1, \ldots, \mathbf{x}_k\}\) is the set of all their linear combinations:

\[\text{span}[\mathbf{x}_1, \ldots, \mathbf{x}_k] = \left\{\sum_{i=1}^{k} \lambda_i \mathbf{x}_i \;\middle|\; \lambda_i \in \mathbb{R}\right\}\]
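Membership in a span can be tested with rank: \(\mathbf{y} \in \text{span}[\mathbf{x}_1, \ldots, \mathbf{x}_k]\) exactly when appending \(\mathbf{y}\) as an extra column does not increase the rank. A small sketch (the helper name `in_span` is illustrative):

```python
import numpy as np

def in_span(vectors, target, tol=1e-10):
    """True iff `target` lies in span[vectors]: appending it as an
    extra column must not increase the rank."""
    A = np.column_stack(vectors)
    return (np.linalg.matrix_rank(np.column_stack([A, target]), tol=tol)
            == np.linalg.matrix_rank(A, tol=tol))

v1 = np.array([1., 0., 1.])
v2 = np.array([0., 1., 1.])
print(in_span([v1, v2], np.array([2., 3., 5.])))  # 2*v1 + 3*v2 -> True
print(in_span([v1, v2], np.array([0., 0., 1.])))  # off the plane -> False
```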

# Demonstrate subspaces visually in R^2
# A line through the origin IS a subspace; a line NOT through the origin is NOT

fig, axes = plt.subplots(1, 3, figsize=(15, 5))

t = np.linspace(-3, 3, 100)

# Subspace: line through origin y = 2x
ax = axes[0]
ax.plot(t, 2*t, 'b-', linewidth=2)
ax.plot(0, 0, 'ko', markersize=6)
# Show closure: pick two points, show their sum is on the line
p1 = np.array([1, 2])
p2 = np.array([0.5, 1])
p_sum = p1 + p2
ax.plot(*p1, 'ro', markersize=8)
ax.plot(*p2, 'go', markersize=8)
ax.plot(*p_sum, 'ms', markersize=10)
ax.annotate('p1', xy=p1, xytext=(1.3, 1.5), fontsize=10)
ax.annotate('p2', xy=p2, xytext=(0.8, 0.5), fontsize=10)
ax.annotate('p1+p2', xy=p_sum, xytext=(1.8, 3.3), fontsize=10)
ax.axhline(0, color='k', linewidth=0.5)
ax.axvline(0, color='k', linewidth=0.5)
ax.set_title('Subspace: line through origin\n(closed under + and scalar mult.)', fontsize=11)
ax.set_xlim(-3, 3); ax.set_ylim(-3, 4)
ax.grid(True, alpha=0.3)

# NOT a subspace: line y = 2x + 1 (does not pass through origin)
ax = axes[1]
ax.plot(t, 2*t + 1, 'r-', linewidth=2)
ax.plot(0, 0, 'ko', markersize=6)
ax.annotate('0 is NOT on the line', xy=(0, 0), xytext=(0.5, -1.5),
            arrowprops=dict(arrowstyle='->', color='black'), fontsize=10)
ax.axhline(0, color='k', linewidth=0.5)
ax.axvline(0, color='k', linewidth=0.5)
ax.set_title('NOT a subspace: y = 2x + 1\n(does not contain zero vector)', fontsize=11)
ax.set_xlim(-3, 3); ax.set_ylim(-3, 4)
ax.grid(True, alpha=0.3)

# Subspace: span of two vectors = entire R^2
ax = axes[2]
v1 = np.array([1, 0])
v2 = np.array([0.5, 1])
ax.arrow(0, 0, v1[0], v1[1], head_width=0.1, head_length=0.05, fc='blue', ec='blue', linewidth=2)
ax.arrow(0, 0, v2[0], v2[1], head_width=0.1, head_length=0.05, fc='red', ec='red', linewidth=2)
ax.annotate(r'$\mathbf{v}_1$', xy=v1, xytext=(1.1, -0.3), fontsize=12, color='blue')
ax.annotate(r'$\mathbf{v}_2$', xy=v2, xytext=(0.6, 1.1), fontsize=12, color='red')
# Show some linear combinations
alphas = np.linspace(-2, 2, 10)
for a in alphas:
    for bb in alphas:
        pt = a * v1 + bb * v2
        if np.abs(pt[0]) < 3 and np.abs(pt[1]) < 3:
            ax.plot(pt[0], pt[1], 'k.', markersize=2, alpha=0.4)
ax.axhline(0, color='k', linewidth=0.5)
ax.axvline(0, color='k', linewidth=0.5)
ax.set_title(r'span$[\mathbf{v}_1, \mathbf{v}_2] = \mathbb{R}^2$' + '\n(two non-parallel vectors span the plane)', fontsize=11)
ax.set_xlim(-3, 3); ax.set_ylim(-3, 3)
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
# Column space (image) of a matrix = span of its columns
A_cs = np.array([[1, 2],
                 [2, 4],
                 [3, 6]])

print('Matrix A (3x2):')
print(A_cs)
print(f'\nColumn 2 is 2 * Column 1, so columns are linearly dependent.')
print(f'The column space is a 1-dimensional subspace of R^3 (a line through the origin).')
print(f'Rank of A: {np.linalg.matrix_rank(A_cs)}')

# Verify: any Ax is a scalar multiple of [1, 2, 3]
x_test_vec = np.array([3, -1])
result = A_cs @ x_test_vec
print(f'\nA @ [3, -1] = {result}')
print(f'This equals {result[0]} * [1, 2, 3] = {result[0] * np.array([1, 2, 3])}')

5. Linear Independence (MML 2.5)

Vectors \(\mathbf{x}_1, \ldots, \mathbf{x}_k \in V\) are linearly independent if the only solution to

\[\lambda_1 \mathbf{x}_1 + \lambda_2 \mathbf{x}_2 + \cdots + \lambda_k \mathbf{x}_k = \mathbf{0}\]

is \(\lambda_1 = \lambda_2 = \cdots = \lambda_k = 0\).

If a non-trivial solution exists (some \(\lambda_i \neq 0\)), the vectors are linearly dependent.

Practical tests:

  • Write vectors as columns of a matrix \(\mathbf{A}\)

  • If \(\det(\mathbf{A}) \neq 0\) (square case): linearly independent

  • If \(\text{rank}(\mathbf{A}) = k\) (number of vectors): linearly independent

  • Gaussian elimination: independent iff every column is a pivot column
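When vectors are dependent, the rank test says so but does not exhibit the dependency. The SVD can recover one: right-singular vectors belonging to (near-)zero singular values give coefficients \(\lambda\) with \(\sum_i \lambda_i \mathbf{x}_i = \mathbf{0}\). A sketch (the helper name `dependency_coefficients` is illustrative):

```python
import numpy as np

def dependency_coefficients(vectors, tol=1e-10):
    """If the vectors are linearly dependent, return one non-trivial
    coefficient vector lambda with sum(lambda_i * v_i) = 0; else None."""
    A = np.column_stack(vectors).astype(float)
    _, s, Vt = np.linalg.svd(A)
    # Pad singular values so columns beyond min(m, n) count as zero
    s_full = np.zeros(A.shape[1])
    s_full[:len(s)] = s
    null_mask = s_full < tol
    if not null_mask.any():
        return None  # linearly independent
    return Vt[null_mask][0]  # one null-space direction of A

v1 = np.array([1., 2.])
v2 = np.array([2., 4.])  # v2 = 2 * v1, so the vectors are dependent
lam = dependency_coefficients([v1, v2])
print(lam, lam[0] * v1 + lam[1] * v2)  # the combination is ~0
```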

# Example 2.14 from MML: three linearly independent vectors in R^4
x1 = np.array([1, 2, -3, 4])
x2 = np.array([1, 1, 0, 2])
x3 = np.array([-1, -2, 1, 1])

A_indep = np.column_stack([x1, x2, x3])
print('Vectors as columns of A:')
print(A_indep)

rank = np.linalg.matrix_rank(A_indep)
print(f'\nRank of A: {rank}')
print(f'Number of vectors: 3')
print(f'Linearly independent: {rank == 3}')

print('\n--- Dependent example ---')
v1 = np.array([1, 2])
v2 = np.array([2, 4])  # v2 = 2 * v1
A_dep = np.column_stack([v1, v2])
print(f'v1 = {v1}, v2 = {v2}')
print(f'det(A) = {np.linalg.det(A_dep):.4f}')
print(f'Rank = {np.linalg.matrix_rank(A_dep)}')
print(f'Linearly independent: {np.linalg.matrix_rank(A_dep) == 2}')
# Visual: independent vs dependent vectors in R^2
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Independent
ax = axes[0]
v1_i = np.array([2, 1])
v2_i = np.array([1, 3])
ax.arrow(0, 0, v1_i[0], v1_i[1], head_width=0.1, head_length=0.08,
         fc='blue', ec='blue', linewidth=2)
ax.arrow(0, 0, v2_i[0], v2_i[1], head_width=0.1, head_length=0.08,
         fc='red', ec='red', linewidth=2)
ax.text(v1_i[0]+0.1, v1_i[1]-0.3, r'$\mathbf{v}_1$', fontsize=14, color='blue')
ax.text(v2_i[0]+0.1, v2_i[1], r'$\mathbf{v}_2$', fontsize=14, color='red')
det_val = np.linalg.det(np.column_stack([v1_i, v2_i]))
ax.set_title(f'Linearly Independent\ndet = {det_val:.1f} (non-zero)', fontsize=12)
ax.set_xlim(-0.5, 4); ax.set_ylim(-0.5, 4)
ax.grid(True, alpha=0.3)
ax.set_aspect('equal')

# Dependent
ax = axes[1]
v1_d = np.array([1, 2])
v2_d = np.array([2, 4])  # 2 * v1
ax.arrow(0, 0, v1_d[0], v1_d[1], head_width=0.1, head_length=0.08,
         fc='blue', ec='blue', linewidth=2)
ax.arrow(0, 0, v2_d[0], v2_d[1], head_width=0.1, head_length=0.08,
         fc='red', ec='red', linewidth=2)
ax.text(v1_d[0]+0.1, v1_d[1]-0.3, r'$\mathbf{v}_1$', fontsize=14, color='blue')
ax.text(v2_d[0]+0.1, v2_d[1], r'$\mathbf{v}_2 = 2\mathbf{v}_1$', fontsize=14, color='red')
det_val_d = np.linalg.det(np.column_stack([v1_d, v2_d]))
ax.set_title(f'Linearly Dependent\ndet = {det_val_d:.1f} (zero)', fontsize=12)
ax.set_xlim(-0.5, 4); ax.set_ylim(-0.5, 5)
ax.grid(True, alpha=0.3)
ax.set_aspect('equal')

plt.tight_layout()
plt.show()

6. Basis and Rank (MML 2.6)

A basis of a vector space \(V\) is a minimal generating set – equivalently, a maximal set of linearly independent vectors that spans \(V\).

Key facts:

  • Every basis of \(V\) has the same number of elements, called the dimension \(\dim(V)\)

  • The canonical basis of \(\mathbb{R}^n\) is \(\{\mathbf{e}_1, \ldots, \mathbf{e}_n\}\)

The rank of a matrix \(\mathbf{A} \in \mathbb{R}^{m \times n}\) is the number of linearly independent columns (= number of linearly independent rows):

\[\text{rk}(\mathbf{A}) = \dim(\text{Im}(\mathbf{A})) = \dim(\text{column space of } \mathbf{A})\]

The null space (kernel) of \(\mathbf{A}\) is \(\ker(\mathbf{A}) = \{\mathbf{x} \in \mathbb{R}^n : \mathbf{A}\mathbf{x} = \mathbf{0}\}\).

Rank-Nullity Theorem: \(\dim(\ker(\mathbf{A})) + \text{rk}(\mathbf{A}) = n\) (number of columns)

# Finding basis and rank -- inspired by MML Example 2.17
# Vectors that span a subspace U of R^5
x1 = np.array([1, 2, -1, -1, -1])
x2 = np.array([2, -1, 1, 2, -2])
x3 = np.array([3, -4, 3, 5, -3])
x4 = np.array([-1, 8, -5, -6, 1])

A_basis = np.column_stack([x1, x2, x3, x4])
print('Matrix A (vectors as columns):')
print(A_basis)

rank_A = np.linalg.matrix_rank(A_basis)
print(f'\nRank: {rank_A}')
print(f'Number of vectors: 4')
print(f'=> {4 - rank_A} vector(s) are redundant (linearly dependent on others)')

# Use SVD to get column space basis
U, s, Vt = np.linalg.svd(A_basis, full_matrices=False)
col_space_basis = U[:, :rank_A]
print(f'\nOrthonormal basis for column space (via SVD):')
for i in range(rank_A):
    print(f'  basis vector {i+1}: {col_space_basis[:, i]}')
# Null space and Rank-Nullity Theorem
# Example from MML 2.7.3: Phi: R^4 -> R^2, A = [[1, 2, -1, 0], [1, 0, 0, 1]]
A_kern = np.array([[1, 2, -1, 0],
                   [1, 0,  0, 1]])

print('A (2x4):')
print(A_kern)

rank_kern = np.linalg.matrix_rank(A_kern)
n_cols = A_kern.shape[1]
nullity = n_cols - rank_kern

print(f'\nRank: {rank_kern}')
print(f'Nullity (dim of kernel): {nullity}')
print(f'Rank + Nullity = {rank_kern} + {nullity} = {rank_kern + nullity} = n (number of columns)')

# Compute null space using SVD
U_k, s_k, Vt_k = np.linalg.svd(A_kern)
null_space = Vt_k[rank_kern:].T
print(f'\nNull space basis ({nullity} vectors):')
for i in range(nullity):
    ns_vec = null_space[:, i]
    print(f'  n{i+1} = {ns_vec}')
    print(f'  Verification A @ n{i+1} = {A_kern @ ns_vec}  (should be ~0)')
# Visualize rank deficiency: a 3x3 matrix with rank 2
# Column space is a plane in R^3, null space is a line
A_rank2 = np.array([[1, 0, 1],
                    [0, 1, 1],
                    [0, 0, 0]])

print(f'A = \n{A_rank2}')
print(f'Rank: {np.linalg.matrix_rank(A_rank2)}')
print(f'Column 3 = Column 1 + Column 2 (linearly dependent)')

fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection='3d')

# Column space: span of [1,0,0] and [0,1,0] = the x-y plane
s_range = np.linspace(-2, 2, 10)
t_range = np.linspace(-2, 2, 10)
ss, tt = np.meshgrid(s_range, t_range)
# col1 = [1,0,0], col2 = [0,1,0]
xx_cs = ss * 1 + tt * 0
yy_cs = ss * 0 + tt * 1
zz_cs = ss * 0 + tt * 0
ax.plot_surface(xx_cs, yy_cs, zz_cs, alpha=0.3, color='blue')

# Null space: A @ [x1,x2,x3]^T = 0 => x1+x3=0, x2+x3=0 => x = t*[-1,-1,1]
t_null = np.linspace(-2, 2, 50)
ax.plot(-t_null, -t_null, t_null, 'r-', linewidth=3, label='Null space (line)')

# Plot column vectors
cols = A_rank2.T
colors = ['blue', 'green', 'orange']
for i, (c, clr) in enumerate(zip(cols, colors)):
    ax.quiver(0, 0, 0, c[0], c[1], c[2], color=clr, linewidth=2, arrow_length_ratio=0.15)

ax.set_xlabel('$x_1$'); ax.set_ylabel('$x_2$'); ax.set_zlabel('$x_3$')
ax.set_title('Rank-2 matrix: column space (plane) and null space (line)', fontsize=12)

blue_p = mpatches.Patch(color='blue', alpha=0.4, label='Column space (plane)')
red_line = plt.Line2D([0], [0], color='red', linewidth=2, label='Null space (line)')
ax.legend(handles=[blue_p, red_line], fontsize=10)
plt.tight_layout()
plt.show()

7. Linear Mappings (MML 2.7)

A linear mapping \(\Phi: V \to W\) preserves the vector space structure:

\[\Phi(\lambda\mathbf{x} + \psi\mathbf{y}) = \lambda\Phi(\mathbf{x}) + \psi\Phi(\mathbf{y})\]

for all \(\mathbf{x}, \mathbf{y} \in V\) and \(\lambda, \psi \in \mathbb{R}\).

Every linear mapping between finite-dimensional spaces can be represented by a transformation matrix \(\mathbf{A}_\Phi\).

Common 2D transformations:

  • Rotation by angle \(\theta\): \(\mathbf{R}(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}\)

  • Reflection across x-axis: \(\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}\)

  • Shearing: \(\begin{bmatrix} 1 & s \\ 0 & 1 \end{bmatrix}\) (horizontal shear)

  • Scaling: \(\begin{bmatrix} s_x & 0 \\ 0 & s_y \end{bmatrix}\)
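Among these, rotation matrices have special structure worth verifying numerically: they are orthogonal (\(\mathbf{R}^\top\mathbf{R} = \mathbf{I}\)) with determinant \(+1\), so they preserve lengths, and composing two rotations adds their angles. A self-contained check (it redefines the rotation matrix locally):

```python
import numpy as np

def rot(theta):
    """2D rotation matrix, same formula as R(theta) above."""
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

R = rot(np.pi / 4)
assert np.allclose(R.T @ R, np.eye(2))     # orthogonal: R^T R = I
assert np.isclose(np.linalg.det(R), 1.0)   # det +1: rotation, not reflection

v = np.array([3.0, 4.0])
assert np.isclose(np.linalg.norm(R @ v), np.linalg.norm(v))  # lengths preserved

# Composing rotations adds angles: R(a) R(b) = R(a + b)
assert np.allclose(rot(0.3) @ rot(0.5), rot(0.8))
print('Rotation matrix properties verified.')
```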

def make_unit_square():
    """Return vertices of a unit square and a grid of points inside it."""
    # Outline
    outline = np.array([[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]).T  # 2 x 5
    # Grid points
    t = np.linspace(0, 1, 15)
    xx, yy = np.meshgrid(t, t)
    grid = np.vstack([xx.ravel(), yy.ravel()])  # 2 x N
    return outline, grid


def rotation_matrix(theta):
    """2D rotation matrix for angle theta (radians)."""
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])


def reflection_matrix(axis='x'):
    """Reflection across x-axis or y-axis."""
    if axis == 'x':
        return np.array([[1, 0], [0, -1]])
    else:
        return np.array([[-1, 0], [0, 1]])


def shear_matrix(s, direction='horizontal'):
    """Shearing matrix."""
    if direction == 'horizontal':
        return np.array([[1, s], [0, 1]])
    else:
        return np.array([[1, 0], [s, 1]])


def scaling_matrix(sx, sy):
    """Scaling matrix."""
    return np.array([[sx, 0], [0, sy]])


# Define transformations (matches MML Example 2.22 spirit)
theta = np.pi / 4  # 45 degrees
transformations = [
    ('Original', np.eye(2)),
    (f'Rotation ({int(np.degrees(theta))} deg)', rotation_matrix(theta)),
    ('Stretch x by 2', scaling_matrix(2, 1)),
    ('Reflection (x-axis)', reflection_matrix('x')),
    ('Horizontal Shear (s=1)', shear_matrix(1)),
]

outline, grid = make_unit_square()

fig, axes = plt.subplots(1, 5, figsize=(20, 4))
for ax, (name, M) in zip(axes, transformations):
    t_outline = M @ outline
    t_grid = M @ grid
    ax.scatter(t_grid[0], t_grid[1], c='steelblue', s=5, alpha=0.6)
    ax.plot(t_outline[0], t_outline[1], 'k-', linewidth=2)
    ax.set_title(name, fontsize=11)
    ax.set_xlim(-2, 3); ax.set_ylim(-1.5, 2)
    ax.set_aspect('equal')
    ax.grid(True, alpha=0.3)
    ax.axhline(0, color='k', linewidth=0.3)
    ax.axvline(0, color='k', linewidth=0.3)

plt.suptitle('Linear Transformations Applied to a Unit Square', fontsize=14, y=1.02)
plt.tight_layout()
plt.show()
# Composition of linear mappings and transformation on a more complex shape
# Create a house shape
house = np.array([
    [0, 0], [1, 0], [1, 0.8], [0.5, 1.2], [0, 0.8], [0, 0],  # outline
]).T  # 2 x N
# Center the house
house = house - np.array([[0.5], [0.5]])

# Compose: first rotate 30 deg, then scale x by 1.5
R30 = rotation_matrix(np.pi / 6)
S15 = scaling_matrix(1.5, 1)

# Composition: S15 @ R30 (apply R30 first, then S15)
composed = S15 @ R30

fig, axes = plt.subplots(1, 4, figsize=(16, 4))
shapes = [
    ('Original', np.eye(2)),
    ('Rotate 30 deg', R30),
    ('Scale x by 1.5', S15),
    ('Composed (Scale * Rotate)', composed),
]

for ax, (name, M) in zip(axes, shapes):
    t_house = M @ house
    ax.fill(t_house[0], t_house[1], alpha=0.3, color='steelblue')
    ax.plot(t_house[0], t_house[1], 'b-', linewidth=2)
    ax.set_title(name, fontsize=11)
    ax.set_xlim(-2, 2); ax.set_ylim(-1.5, 1.5)
    ax.set_aspect('equal')
    ax.grid(True, alpha=0.3)
    ax.axhline(0, color='k', linewidth=0.3)
    ax.axvline(0, color='k', linewidth=0.3)

plt.suptitle('Composition of Linear Transformations on a "House" Shape', fontsize=14, y=1.02)
plt.tight_layout()
plt.show()

print(f'Rotation matrix R(30 deg):\n{R30}')
print(f'\nScaling matrix S(1.5, 1):\n{S15}')
print(f'\nComposed matrix S @ R:\n{composed}')
print(f'\nNote: A_composed = A_scale @ A_rotate (right-to-left application order)')
# Image and Kernel of a linear mapping
# Phi: R^3 -> R^2, A = [[1, 2, 0], [0, 1, 1]]
A_map = np.array([[1, 2, 0],
                  [0, 1, 1]])

print('Transformation matrix A (2x3):')
print(A_map)

# Image = column space
rank_map = np.linalg.matrix_rank(A_map)
print(f'\nRank (dim of image): {rank_map}')
print(f'The image is span of the columns = R^{rank_map}')

# Kernel (null space): Ax = 0
U_m, s_m, Vt_m = np.linalg.svd(A_map)
null_dim = A_map.shape[1] - rank_map
kernel_basis = Vt_m[rank_map:].T
print(f'\nNullity (dim of kernel): {null_dim}')
print(f'Kernel basis vector: {kernel_basis.ravel()}')
print(f'Verification A @ kernel = {A_map @ kernel_basis.ravel()} (should be ~0)')

# Rank-Nullity check
print(f'\nRank-Nullity: {rank_map} + {null_dim} = {rank_map + null_dim} = {A_map.shape[1]} (n)')

8. Affine Spaces (MML 2.8)

An affine subspace of a vector space \(V\) is a translated subspace:

\[L = \mathbf{x}_0 + U = \{\mathbf{x}_0 + \mathbf{u} : \mathbf{u} \in U\}\]

where \(\mathbf{x}_0 \in V\) is a support point and \(U \subseteq V\) is a vector subspace (the direction space).

Unlike vector subspaces, affine subspaces do not necessarily pass through the origin.

Examples: points, lines, planes, and hyperplanes that may be offset from the origin.

An affine mapping from \(V\) to \(W\) has the form:

\[\phi(\mathbf{x}) = \mathbf{a} + \Phi(\mathbf{x})\]

where \(\Phi: V \to W\) is a linear mapping and \(\mathbf{a} \in W\) is the translation vector.

An affine combination of points \(\mathbf{x}_1, \ldots, \mathbf{x}_k\) is:

\[\mathbf{x} = \sum_{i=1}^{k} \lambda_i \mathbf{x}_i, \quad \text{where } \sum_{i=1}^{k} \lambda_i = 1\]
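A defining property of affine mappings is that they preserve affine combinations: \(\phi\left(\sum_i \lambda_i \mathbf{x}_i\right) = \sum_i \lambda_i \phi(\mathbf{x}_i)\) whenever \(\sum_i \lambda_i = 1\), because the translation \(\mathbf{a}\) picks up total weight \(\sum_i \lambda_i = 1\) on the right-hand side. A quick numerical check on randomly chosen data (the matrices and weights here are illustrative):

```python
import numpy as np

# Affine maps commute with affine combinations: because the weights sum
# to 1, the translation a appears exactly once on both sides.
rng = np.random.default_rng(1)
A = rng.standard_normal((2, 2))   # linear part Phi
a = rng.standard_normal(2)        # translation vector

def phi(x):
    return a + A @ x

points = [rng.standard_normal(2) for _ in range(3)]
lam = np.array([0.2, 0.5, 0.3])   # affine weights: they sum to 1

combo = sum(l * p for l, p in zip(lam, points))
mapped_combo = phi(combo)                                       # phi(sum lam_i x_i)
combo_of_mapped = sum(l * phi(p) for l, p in zip(lam, points))  # sum lam_i phi(x_i)
print(np.allclose(mapped_combo, combo_of_mapped))  # True
```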

# Affine subspaces: lines and planes NOT through the origin
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# --- 2D: Affine line L = x0 + t*d ---
ax = axes[0]
x0 = np.array([1, 2])
d = np.array([2, 1])  # direction vector

t_vals = np.linspace(-2, 2, 100)
line_points = x0[:, None] + d[:, None] * t_vals[None, :]

# Subspace through origin (same direction)
subspace_points = d[:, None] * t_vals[None, :]

ax.plot(subspace_points[0], subspace_points[1], 'b--', linewidth=1.5,
        alpha=0.5, label='Subspace U = span(d)')
ax.plot(line_points[0], line_points[1], 'r-', linewidth=2.5,
        label=r'Affine subspace $L = x_0 + U$')
ax.plot(*x0, 'ko', markersize=8)
ax.annotate(r'$\mathbf{x}_0$ (support point)', xy=x0, xytext=(1.5, 3),
            arrowprops=dict(arrowstyle='->', color='black'), fontsize=11)
ax.arrow(x0[0], x0[1], d[0]*0.5, d[1]*0.5, head_width=0.15, head_length=0.1,
         fc='green', ec='green', linewidth=2)
ax.text(x0[0]+d[0]*0.5+0.2, x0[1]+d[1]*0.5, r'$\mathbf{d}$', fontsize=14, color='green')
ax.plot(0, 0, 'ks', markersize=8)
ax.text(0.1, -0.3, '0', fontsize=12)
ax.set_xlim(-3, 5); ax.set_ylim(-2, 5)
ax.set_xlabel(r'$x_1$'); ax.set_ylabel(r'$x_2$')
ax.set_title('Affine Line in 2D', fontsize=12)
ax.legend(fontsize=10)
ax.grid(True, alpha=0.3)
ax.set_aspect('equal')

# --- Affine combinations ---
ax = axes[1]
p1 = np.array([0, 0])
p2 = np.array([4, 0])
p3 = np.array([2, 3])

# Triangle outline
triangle = np.array([p1, p2, p3, p1]).T
ax.fill(triangle[0], triangle[1], alpha=0.15, color='blue')
ax.plot(triangle[0], triangle[1], 'b-', linewidth=2)

# Plot vertices
for p, name in zip([p1, p2, p3], [r'$\mathbf{p}_1$', r'$\mathbf{p}_2$', r'$\mathbf{p}_3$']):
    ax.plot(*p, 'ko', markersize=8)
    ax.annotate(name, xy=p, xytext=(p[0]+0.15, p[1]+0.15), fontsize=13)

# Affine combinations with lambda1 + lambda2 + lambda3 = 1
np.random.seed(7)
for _ in range(50):
    lam = np.random.dirichlet([1, 1, 1])  # sum to 1 and non-negative => inside triangle
    pt = lam[0] * p1 + lam[1] * p2 + lam[2] * p3
    ax.plot(*pt, 'r.', markersize=4)

# Centroid = (1/3, 1/3, 1/3) affine combination
centroid = (p1 + p2 + p3) / 3
ax.plot(*centroid, 'g*', markersize=15)
ax.annotate('Centroid\n' + r'$(\frac{1}{3}, \frac{1}{3}, \frac{1}{3})$',
            xy=centroid, xytext=(centroid[0]+0.8, centroid[1]+0.5),
            arrowprops=dict(arrowstyle='->', color='green'), fontsize=11)

ax.set_title('Affine Combinations (convex combinations\nwith ' + r'$\sum \lambda_i = 1, \lambda_i \geq 0$)',
             fontsize=12)
ax.set_xlabel(r'$x_1$'); ax.set_ylabel(r'$x_2$')
ax.set_xlim(-0.5, 5); ax.set_ylim(-0.5, 4)
ax.set_aspect('equal')
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
# Affine mappings: phi(x) = a + Phi(x) = a + A @ x
# Demonstrate how affine maps differ from linear maps

# Linear part: rotation by 30 degrees
A_lin = rotation_matrix(np.pi / 6)
# Translation vector
a_trans = np.array([2, 1])

# Apply to the unit square
outline, grid = make_unit_square()
# Center the square at origin for cleaner visualization
outline = outline - np.array([[0.5], [0.5]])
grid = grid - np.array([[0.5], [0.5]])

# Linear mapping: y = A @ x
linear_outline = A_lin @ outline
linear_grid = A_lin @ grid

# Affine mapping: y = a + A @ x
affine_outline = a_trans[:, None] + A_lin @ outline
affine_grid = a_trans[:, None] + A_lin @ grid

fig, axes = plt.subplots(1, 3, figsize=(15, 5))

for ax, (name, ol, gr) in zip(axes, [
    ('Original', outline, grid),
    ('Linear: y = Ax', linear_outline, linear_grid),
    ('Affine: y = a + Ax', affine_outline, affine_grid)
]):
    ax.scatter(gr[0], gr[1], c='steelblue', s=5, alpha=0.6)
    ax.plot(ol[0], ol[1], 'b-', linewidth=2)
    ax.plot(0, 0, 'k+', markersize=15, markeredgewidth=2)
    ax.set_title(name, fontsize=12)
    ax.set_xlim(-2, 4); ax.set_ylim(-2, 3)
    ax.set_aspect('equal')
    ax.grid(True, alpha=0.3)
    ax.axhline(0, color='k', linewidth=0.3)
    ax.axvline(0, color='k', linewidth=0.3)

# Mark the translation vector on the affine plot
axes[2].annotate('', xy=a_trans, xytext=(0, 0),
                 arrowprops=dict(arrowstyle='->', color='red', linewidth=2))
axes[2].text(1, 0.2, r'$\mathbf{a}$ = translation', fontsize=11, color='red')

plt.suptitle('Linear Mapping vs Affine Mapping', fontsize=14, y=1.02)
plt.tight_layout()
plt.show()

print(f'Linear part A (rotation 30 deg):\n{A_lin}')
print(f'\nTranslation vector a: {a_trans}')
print(f'\nKey difference: linear maps fix the origin, affine maps do not.')
print(f'phi(0) = a + A @ 0 = a = {a_trans} (not zero!)')
# Solution of Ax = b as an affine subspace
# The general solution is: x = x_particular + x_homogeneous
# where x_homogeneous is from the null space (a vector subspace)

# System with infinitely many solutions: 2 equations, 3 unknowns
A_aff = np.array([[1, 2, 1],
                  [2, 4, 3]])
b_aff = np.array([4, 9])

print('System Ax = b:')
print(f'A = \n{A_aff}')
print(f'b = {b_aff}')
print(f'Rank of A: {np.linalg.matrix_rank(A_aff)}')
print(f'Rank of [A|b]: {np.linalg.matrix_rank(np.column_stack([A_aff, b_aff]))}')

# Find a particular solution using least-norm (pseudoinverse)
x_part = np.linalg.lstsq(A_aff, b_aff, rcond=None)[0]
print(f'\nParticular solution (least norm): {x_part}')
print(f'Verification: A @ x_part = {A_aff @ x_part}')

# Find the null space
U_a, s_a, Vt_a = np.linalg.svd(A_aff)
rk = np.linalg.matrix_rank(A_aff)
null_basis = Vt_a[rk:].T
print(f'\nNull space basis: {null_basis.ravel()}')
print(f'Verification: A @ null = {A_aff @ null_basis.ravel()}')

print(f'\nGeneral solution: x = x_part + lambda * null_vector')
print(f'  = {x_part} + lambda * {null_basis.ravel()}')
print(f'\nThis is an affine subspace: a line in R^3 offset from the origin.')

# Verify for a few values of lambda
for lam in [-1, 0, 1, 2]:
    x_gen = x_part + lam * null_basis.ravel()
    residual = A_aff @ x_gen - b_aff
    print(f'  lambda={lam:2d}: x = {x_gen}, A@x - b = {residual}')

Summary

This lab covered the foundational linear algebra concepts from MML Chapter 2:

  • 2.1 Systems of Linear Equations – \(\mathbf{A}\mathbf{x} = \mathbf{b}\); geometric interpretation as intersecting lines/planes

  • 2.2 Matrices – Addition, multiplication, transpose, inverse; non-commutativity of multiplication

  • 2.3 Gaussian Elimination – Row echelon form, back substitution, augmented matrix \([\mathbf{A} \mid \mathbf{b}]\)

  • 2.4 Vector Spaces – Subspaces, closure properties, span of vectors

  • 2.5 Linear Independence – Determinant test, rank test; pivot columns in row echelon form

  • 2.6 Basis and Rank – Basis = minimal spanning set; rank-nullity theorem

  • 2.7 Linear Mappings – Rotation, reflection, shear, scaling; image and kernel; composition

  • 2.8 Affine Spaces – Affine subspaces (\(L = \mathbf{x}_0 + U\)); affine mappings (\(\phi = \mathbf{a} + \Phi\)); affine combinations

Key takeaways for ML:

  • Data is represented as vectors and matrices

  • Linear transformations are the building blocks of many ML models (e.g., neural network layers without activation functions)

  • Understanding rank, null space, and linear independence is essential for dimensionality reduction (PCA) and understanding model capacity

  • Affine mappings (linear + translation) appear throughout ML, from linear regression to neural network layers with bias terms