Building Deep Neural Networks

In Week 2, we trained a logistic regression classifier on the cat vs. non-cat dataset. It reached about 70% test accuracy, but logistic regression is fundamentally limited: it can only draw a single linear decision boundary.

This week, we built deep neural networks from scratch and applied them to the same dataset.

  • First: 2-layer network
  • Then: 4-layer network

Both outperformed logistic regression - and the deeper model gave the best results.


Neural Network Architecture

  • 2-Layer Model
    INPUT → LINEAR → RELU → LINEAR → SIGMOID → OUTPUT
  • L-Layer Model (generalized)
    [LINEAR → RELU] × (L-1) → LINEAR → SIGMOID
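
To make the generalized architecture concrete, here is a minimal initialization sketch driven by a list of layer sizes. The helper name initialize_parameters_deep and the 1/√n scaling are illustrative assumptions, not necessarily the exact course code:

import numpy as np

def initialize_parameters_deep(layer_dims):
    """Create W[l], b[l] for l = 1..L-1 from a list of layer sizes."""
    np.random.seed(1)
    parameters = {}
    L = len(layer_dims)  # includes the input layer
    for l in range(1, L):
        # small random weights scaled by the previous layer's size, zero biases
        parameters["W" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) / np.sqrt(layer_dims[l - 1])
        parameters["b" + str(l)] = np.zeros((layer_dims[l], 1))
    return parameters

parameters = initialize_parameters_deep([12288, 20, 7, 5, 1])  # e.g. the 4-layer model below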

Forward Propagation

For each layer \( l \):

$$ Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]} $$

$$ A^{[l]} = g(Z^{[l]}) $$

where \( g \) is ReLU for the hidden layers and sigmoid for the output layer, and \( A^{[0]} = X \) is the input.
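
For reference, here is a minimal NumPy sketch of one forward step. The helper names (relu, sigmoid, linear_activation_forward) and the cache layout are illustrative assumptions, not necessarily the exact course implementation:

import numpy as np

def relu(Z):
    return np.maximum(0, Z)

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

def linear_activation_forward(A_prev, W, b, activation):
    # Z[l] = W[l] A[l-1] + b[l], then apply the layer's non-linearity
    Z = W @ A_prev + b
    A = relu(Z) if activation == "relu" else sigmoid(Z)
    cache = (A_prev, W, b, Z)  # stored for backpropagation
    return A, cache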

Cost Function

We used the cross-entropy cost, averaged over the \( m \) training examples:

$$ J = -\frac{1}{m} \sum_{i=1}^m \Big[ y^{(i)} \log \hat{y}^{(i)} + (1 - y^{(i)}) \log (1 - \hat{y}^{(i)}) \Big] $$
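
A vectorized sketch of this cost, where AL is the sigmoid output of the final layer and Y holds the 0/1 labels. The small epsilon guarding against log(0) is my addition, not part of the formula above:

import numpy as np

def compute_cost(AL, Y, eps=1e-8):
    # average cross-entropy over the m examples (one column per example)
    m = Y.shape[1]
    cost = -np.sum(Y * np.log(AL + eps) + (1 - Y) * np.log(1 - AL + eps)) / m
    return float(np.squeeze(cost))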

Backward Propagation

Gradients for layer \( l \):

$$ dZ^{[l]} = dA^{[l]} \odot g'(Z^{[l]}) $$

$$ dW^{[l]} = \frac{1}{m} \, dZ^{[l]} (A^{[l-1]})^T $$

$$ db^{[l]} = \frac{1}{m} \sum_{i=1}^{m} dZ^{[l](i)} $$

and the gradient propagated back to the previous layer is

$$ dA^{[l-1]} = (W^{[l]})^T dZ^{[l]} $$
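
A sketch of one backward step that mirrors these equations, reusing the cache layout assumed in the forward sketch (again illustrative, not the exact course helpers):

import numpy as np

def linear_activation_backward(dA, cache, activation):
    A_prev, W, b, Z = cache
    m = A_prev.shape[1]
    if activation == "relu":
        dZ = dA * (Z > 0)                    # ReLU derivative: 1 where Z > 0, else 0
    else:
        s = 1 / (1 + np.exp(-Z))
        dZ = dA * s * (1 - s)                # sigmoid derivative
    dW = dZ @ A_prev.T / m                   # dW[l] = (1/m) dZ[l] (A[l-1])^T
    db = np.sum(dZ, axis=1, keepdims=True) / m
    dA_prev = W.T @ dZ                       # gradient handed to layer l-1
    return dA_prev, dW, db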

Update step:

$$ W^{[l]} = W^{[l]} - \alpha \, dW^{[l]} $$

$$ b^{[l]} = b^{[l]} - \alpha \, db^{[l]} $$
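
In code, the update is a loop over the layers. This sketch assumes parameters and grads are dictionaries keyed "W1", "b1", "dW1", ... (a common convention, and an assumption here):

def update_parameters(parameters, grads, learning_rate):
    L = len(parameters) // 2  # number of layers that have weights
    for l in range(1, L + 1):
        parameters["W" + str(l)] -= learning_rate * grads["dW" + str(l)]
        parameters["b" + str(l)] -= learning_rate * grads["db" + str(l)]
    return parameters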


Dataset Setup

We used the cat vs. non-cat dataset (209 training images, 50 test images). Each 64×64 RGB image is flattened into a vector of size 12,288.

train_x_orig, train_y, test_x_orig, test_y, classes = load_data()

# Flatten and normalize
train_x = train_x_orig.reshape(train_x_orig.shape[0], -1).T / 255.
test_x = test_x_orig.reshape(test_x_orig.shape[0], -1).T / 255.

print("train_x shape:", train_x.shape)
print("test_x shape:", test_x.shape)

Output:

train_x shape: (12288, 209)
test_x shape: (12288, 50)

Two-Layer Model

n_x = 12288 # input size
n_h = 7     # hidden layer size
n_y = 1     # output size
layers_dims = (n_x, n_h, n_y)

parameters, costs = two_layer_model(train_x, train_y, layers_dims, 
                                    num_iterations=2500, learning_rate=0.0075, 
                                    print_cost=True)
plot_costs(costs)
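
two_layer_model is a course helper. For intuition, here is a rough sketch of the loop it runs, built from the illustrative helpers sketched earlier in this post (not the exact course implementation; the 1e-8 guard is my addition):

def two_layer_model_sketch(X, Y, layers_dims, learning_rate=0.0075, num_iterations=2500):
    n_x, n_h, n_y = layers_dims
    parameters = initialize_parameters_deep([n_x, n_h, n_y])
    costs = []
    for i in range(num_iterations):
        # forward pass: LINEAR -> RELU -> LINEAR -> SIGMOID
        A1, cache1 = linear_activation_forward(X, parameters["W1"], parameters["b1"], "relu")
        A2, cache2 = linear_activation_forward(A1, parameters["W2"], parameters["b2"], "sigmoid")
        cost = compute_cost(A2, Y)
        # derivative of the cross-entropy cost w.r.t. the sigmoid output A2
        dA2 = -(Y / (A2 + 1e-8) - (1 - Y) / (1 - A2 + 1e-8))
        # backward pass through each layer, then gradient-descent update
        dA1, dW2, db2 = linear_activation_backward(dA2, cache2, "sigmoid")
        _, dW1, db1 = linear_activation_backward(dA1, cache1, "relu")
        grads = {"dW1": dW1, "db1": db1, "dW2": dW2, "db2": db2}
        parameters = update_parameters(parameters, grads, learning_rate)
        if i % 100 == 0:
            costs.append(cost)
    return parameters, costs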

Training Curve:

Results:

  • Training accuracy: ~100%
  • Test accuracy: 72%

Four-Layer Model

layers_dims = [12288, 20, 7, 5, 1]

parameters, costs = L_layer_model(train_x, train_y, layers_dims, 
                                  num_iterations=2500, learning_rate=0.0075, 
                                  print_cost=True)
plot_costs(costs)

Training Curve:

Results:

  • Training accuracy: 98.5%
  • Test accuracy: 80%

Comparison

To see how depth helps, here are the test accuracies side by side:

  • Logistic Regression (Week 2): 70% test accuracy
  • 2-layer NN: 72% test accuracy
  • 4-layer NN: 80% test accuracy

On this dataset, the deeper network clearly generalized better.

Error Analysis

We printed mislabeled images:

pred_test = predict(test_x, test_y, parameters)  # parameters from 4-layer model
print_mislabeled_images(classes, test_x, test_y, pred_test)
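
predict and print_mislabeled_images are course helpers. Conceptually, the mislabeled examples are just the columns where the thresholded prediction disagrees with the label; a quick way to count them (assuming pred_test and test_y both have shape (1, 50)):

import numpy as np

mislabeled_idx = np.where(pred_test[0] != test_y[0])[0]   # column indices of wrong predictions
print("Mislabeled test images:", mislabeled_idx.size, "of", test_y.shape[1])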

Common errors:

  • Cats in unusual poses
  • Background colors similar to the cat
  • Low lighting or overexposure
  • Cats that are too zoomed in or very small

Key Takeaways

  • Logistic regression is essentially a neural network with no hidden layer: a single sigmoid unit.
  • Deep networks stack linear + non-linear transformations to learn richer features.
  • Vectorized implementation makes training feasible.
  • Deeper models improve accuracy, but data and regularization matter.

This marks the end of Course 1 - next come hyperparameter tuning, optimization algorithms, and regularization in Course 2.
