# Puzzle
## Practical Deep Learning
### January 29th, 2024

This notebook contains code for training and testing a neural network to classify images from the CIFAR-10 dataset. However, something has gone wrong: in the output graph, the network achieves much lower loss on the training data than on the test data.

As usual, the code runs without error; a conceptual bug is holding us back from correctly classifying CIFAR-10. See if you can fix the error(s) in the training setup and achieve as high a test accuracy as possible.

<img src="https://storage.googleapis.com/kaggle-competitions/kaggle/3649/media/cifar-10.png" width="400"/>

In [None]:
import numpy as np
import torch
import torchvision
import torchvision.transforms as transforms

transform = transforms.Compose(
    [
      transforms.ToTensor(),
      transforms.Normalize(
          (0.5, 0.5, 0.5),
          (0.5, 0.5, 0.5)
      )
    ]
)

batch_size = 32

train_data = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size,
                                          shuffle=True, num_workers=2)

test_data = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
test_data = torch.utils.data.Subset(
    test_data, np.random.choice(len(test_data), 1000, replace=False))
test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz
Extracting ./data/cifar-10-python.tar.gz to ./data
Files already downloaded and verified

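Because the evaluation set above is a random 1000-image subset of the test split (drawn without a fixed seed, so it changes between runs), the accuracy measured on it is only an estimate of the full test-set accuracy. The sketch below gauges that uncertainty under a simple binomial model. This is an assumption made for illustration: it ignores the finite-population correction, and `accuracy_standard_error` is a hypothetical helper, not part of the notebook.

```python
import math


def accuracy_standard_error(acc: float, n: int) -> float:
    """Binomial standard error of an accuracy estimated from n samples.

    Hypothetical helper for illustration; assumes independent samples.
    """
    return math.sqrt(acc * (1.0 - acc) / n)


n_eval = 1000  # size of the test subset used in the cell above

# Lowest and highest test accuracies printed in this notebook's training log.
for acc in (0.466, 0.520):
    se = accuracy_standard_error(acc, n_eval)
    print(f"acc={acc:.3f}  standard error ~ {se * 100:.2f} percentage points")
```

At accuracies near 50% this works out to roughly 1.6 percentage points, so the subset accuracy can plausibly sit a point or two away from the accuracy on all 10,000 test images.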

In [None]:
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # CIFAR images are 32x32x3 (HxWxC); ToTensor() yields 3x32x32 tensors, flattened to 3072 features
        self.fc1 = nn.Linear(in_features=(3 * 32 * 32), out_features=4096)
        self.fc2 = nn.Linear(in_features=4096, out_features=4096)
        self.fc3 = nn.Linear(in_features=4096, out_features=10)

    def forward(self, x):
        x = x.flatten(start_dim=1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

In [None]:
from typing import Tuple
import tqdm.notebook as tqdm

import torch.optim as optim

net = Net().cuda()
criterion = nn.CrossEntropyLoss()
optimizer = optim.AdamW(net.parameters(), lr=0.001)

def get_test_loss(model: nn.Module, test_loader: torch.utils.data.DataLoader) -> Tuple[float, float]:
    """Return (mean per-batch loss, accuracy) over test_loader."""
    total_loss = 0
    n_batches = 0
    correct = 0
    total = 0
    # since we're not training, we don't need to calculate the gradients for our outputs
    with torch.no_grad():
        for data in tqdm.tqdm(test_loader, colour='green', desc='test', leave=False):
            images, labels = data
            images = images.cuda()
            labels = labels.cuda()
            outputs = model(images)
            total_loss += criterion(outputs, labels).item()

            n_batches += 1
            total += len(labels)

            pred_labels = outputs.argmax(dim=1)
            correct += (pred_labels == labels).sum().item()

    return (total_loss / n_batches), (correct / total)


log_freq = 2000
num_epochs = 20

train_losses = []
test_losses = []
train_accs = []
test_accs = []
total_steps = 0
for epoch in tqdm.trange(num_epochs, desc='Epoch', colour='pink'):  # loop over the dataset multiple times
    running_loss = 0.0
    running_loss_steps = 0
    num_train_predictions_correct = 0
    num_train_predictions_total = 0
    for i, data in enumerate(tqdm.tqdm(train_loader, desc='batch', colour='blue', leave=False), 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data
        inputs = inputs.cuda()
        labels = labels.cuda()

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        # --> Uncomment next line to print the loss at every step:
        # print(f'{total_steps}  loss {loss.item():.2f}')
        optimizer.step()

        # compute train accuracy
        pred_labels = outputs.argmax(dim=1)
        num_train_predictions_correct += (pred_labels == labels).sum().item()
        num_train_predictions_total += len(pred_labels)

        # print statistics
        total_steps += 1
        running_loss += loss.item()
        running_loss_steps += 1
        # log every `log_freq` steps; total_steps was already incremented above,
        # so the first log fires at step 1999
        if (total_steps + 1) % log_freq == 0:
            test_loss, test_acc = get_test_loss(net, test_loader)
            avg_train_loss = running_loss / running_loss_steps
            avg_train_acc = (num_train_predictions_correct / num_train_predictions_total)
            print(f'[Step {total_steps}] train_loss: {avg_train_loss:.3f} || test_loss = {test_loss:.3f}')
            print(f'\t\t train_acc={avg_train_acc*100:.1f}% || test_acc={test_acc*100:.1f}%')

            # record the metrics for plotting
            train_losses.append(avg_train_loss)
            test_losses.append(test_loss)
            train_accs.append(avg_train_acc)
            test_accs.append(test_acc)

            # reset the running train statistics for the next logging window
            num_train_predictions_correct = 0
            num_train_predictions_total = 0
            running_loss = 0.0
            running_loss_steps = 0

print('Finished Training')

[Step 1999] train_loss: 1.578 || test_loss = 1.579
		 train_acc=44.7% || test_acc=45.2%
[Step 3999] train_loss: 1.484 || test_loss = 1.497
		 train_acc=47.8% || test_acc=48.0%
[Step 5999] train_loss: 1.405 || test_loss = 1.503
		 train_acc=51.0% || test_acc=47.2%
[Step 7999] train_loss: 1.220 || test_loss = 1.529
		 train_acc=57.9% || test_acc=51.0%
[Step 9999] train_loss: 1.176 || test_loss = 1.584
		 train_acc=59.1% || test_acc=46.6%
[Step 11999] train_loss: 1.152 || test_loss = 1.570
		 train_acc=59.9% || test_acc=48.1%
[Step 13999] train_loss: 1.113 || test_loss = 1.508
		 train_acc=61.6% || test_acc=50.3%
[Step 15999] train_loss: 0.964 || test_loss = 1.635
		 train_acc=66.9% || test_acc=52.0%

KeyboardInterrupt: 

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt


data = pd.DataFrame.from_dict({ 'train_loss': train_losses, 'test_loss': test_losses})
sns.lineplot(data=data)
plt.yscale('log')
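
The gap visible in the plot can also be read off numerically as test loss minus train loss at each logging point. The following is a minimal sketch: it hard-codes the values printed in the training log above so that it runs on its own (inside the notebook, `train_losses` and `test_losses` could be used instead), and `generalization_gaps` is a hypothetical helper name.

```python
# Values copied from the training log printed earlier in this notebook.
steps = [1999, 3999, 5999, 7999, 9999, 11999, 13999, 15999]
train_loss = [1.578, 1.484, 1.405, 1.220, 1.176, 1.152, 1.113, 0.964]
test_loss = [1.579, 1.497, 1.503, 1.529, 1.584, 1.570, 1.508, 1.635]


def generalization_gaps(train, test):
    """Return test loss minus train loss at each logging point (hypothetical helper)."""
    return [round(te - tr, 3) for tr, te in zip(train, test)]


gaps = generalization_gaps(train_loss, test_loss)

# Index of the logging point with the lowest test loss.
best_idx = min(range(len(test_loss)), key=test_loss.__getitem__)

for step, gap in zip(steps, gaps):
    print(f"step {step:>5}: test - train = {gap:+.3f}")
print(f"lowest test loss {test_loss[best_idx]:.3f} at step {steps[best_idx]}")
```

In this run the gap grows from about 0.001 at step 1999 to about 0.671 at step 15999, while the lowest test loss (1.497) is reached at step 3999: the train loss keeps falling after the test loss has stopped improving.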