PyTorch Tutorial


Install PyTorch

Documentation

conda install pytorch torchvision -c pytorch
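
A quick way to verify the install (output will vary with the installed version and hardware):

import torch
print(torch.__version__)          # prints the installed PyTorch version
print(torch.cuda.is_available())  # True only if a CUDA-capable GPU is set up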

Tensor

What is a Tensor?

  • Tensor: represented as an N-dimensional array of data with certain transformation properties.
  • Tensor factorization: high-order generalization of matrix SVD or PCA
  • Matrix: a linear transformation

Tensor


Methods of data reduction for a data tensor


How to initialize a Tensor?

  • From a list
  • From a numpy array
  • From another tensor. The shape and datatype are retained, unless explicitly overridden.
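
A minimal sketch of the three initialization paths above:

import numpy as np
import torch

# From a list
data = [[1, 2], [3, 4]]
x_data = torch.tensor(data)

# From a numpy array
np_array = np.array(data)
x_np = torch.from_numpy(np_array)

# From another tensor: shape and dtype are retained unless overridden
x_ones = torch.ones_like(x_data)                     # keeps the dtype of x_data
x_rand = torch.rand_like(x_data, dtype=torch.float)  # explicitly overrides the dtype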

What are the attributes of a tensor?

  • Shape: a tuple giving the size of each dimension, which determines the dimensionality of a tensor
  • dtype: the data type of the elements
  • device: the device a tensor is stored on
    • cpu
    • gpu (cuda)
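
Inspecting these attributes on a freshly created tensor (assuming torch is imported):

tensor = torch.rand(3, 4)
print(f"Shape of tensor: {tensor.shape}")               # torch.Size([3, 4])
print(f"Datatype of tensor: {tensor.dtype}")            # torch.float32
print(f"Device tensor is stored on: {tensor.device}")   # cpu (or cuda:0)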

Tensor Operation API

  • Over 100 Operations
  • In-place operations: operations that have a _ suffix are in place
    • tensor.add_(5)
    • x.copy_(y)
  • Transposing
  • Indexing
  • Slicing
  • Mathematical
    • Multiply
      • use *
      • tensor.mul(tensor)
  • Linear Algebra
    • Matrix Multiplication
      • tensor @ tensor.T
      • tensor.matmul(tensor.T)
  • Random Sampling
  • Tensor and Numpy
    • Tensor to Numpy tensor.numpy()
    • Numpy to Tensor torch.from_numpy()
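
A few of these operations in one small, self-contained sketch (variable names are just illustrative):

import numpy as np
import torch

tensor = torch.ones(4, 4)

# Indexing and slicing
tensor[:, 1] = 0                  # zero out the second column

# In-place operation (note the trailing underscore)
tensor.add_(5)

# Element-wise multiplication
elementwise = tensor * tensor     # same as tensor.mul(tensor)

# Matrix multiplication
matmul = tensor @ tensor.T        # same as tensor.matmul(tensor.T)

# Tensor <-> NumPy bridge (shares memory on CPU)
n = tensor.numpy()
t = torch.from_numpy(np.ones(5))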

Autograd

Notebook Tutorial

  • An automatic differentiation engine that powers NN training
  • Training a NN happens in two steps:
    • Forward Propagation: In forward prop, the NN makes its best guess about the correct output. It runs the input data through each of its functions to make this guess.
    • Backward Propagation: In backprop, the NN adjusts its parameters proportionate to the error in its guess. It does this by traversing backwards from the output, collecting the derivatives of the error with respect to the parameters of the functions (gradients), and optimizing the parameters using gradient descent.
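
A single training step showing both passes, sketched roughly along the lines of the notebook tutorial (the data and labels here are random placeholders):

import torch
import torchvision

model = torchvision.models.resnet18(pretrained=True)
data = torch.rand(1, 3, 64, 64)        # a fake image batch
labels = torch.rand(1, 1000)           # fake labels

prediction = model(data)               # forward pass
loss = (prediction - labels).sum()
loss.backward()                        # backward pass: gradients land in .grad

optim = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
optim.step()                           # gradient descent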

Computational Graph

Autograd keeps a record of data (tensors) and all executed operations (along with the resulting new tensors) in a directed acyclic graph (DAG) consisting of Function objects.

  • Tensors
    • Leaves: input tensors
    • Roots: output tensors
    • From leaves to roots: run the requested operations
    • From roots to leaves: compute the gradients using the chain rule
  • Operations
    • Forward pass
      • run the requested operation to compute a resulting tensor
      • maintain the operation’s gradient function in the DAG
    • Backward pass
      • computes the gradients from each .grad_fn
      • accumulates them in the respective tensor’s .grad attributes
      • use the chain rule, propagates all the way to leaf tensors
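
A tiny example that exposes this machinery (Q = 3a² is an arbitrary function chosen for illustration):

import torch

a = torch.tensor([2.0, 3.0], requires_grad=True)
Q = 3 * a ** 2                 # forward pass builds the DAG

print(Q.grad_fn)               # the gradient function recorded for Q
Q.sum().backward()             # backward pass from the root
print(a.grad)                  # dQ/da = 6a -> tensor([12., 18.])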

DAGs

  • What are DAGs?
    • DAGs are dynamic in PyTorch. An important thing to note is that the graph is recreated from scratch; after each .backward() call, autograd starts populating a new graph.
    • This is exactly what allows you to use control flow statements in your model; you can change the shape, size and operations at every iteration if needed.
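
Because the graph is rebuilt on every forward pass, ordinary Python control flow just works; a hypothetical module like this is perfectly valid:

import torch
from torch import nn

class DynamicNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 4)

    def forward(self, x):
        # The number of iterations can change from call to call;
        # autograd simply records whatever operations actually ran.
        for _ in range(torch.randint(1, 4, (1,)).item()):
            x = torch.relu(self.linear(x))
        return x

net = DynamicNet()
out = net(torch.rand(2, 4))
out.sum().backward()           # a fresh graph is built and consumed on each call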

Inclusion and Exclusion from the DAG

  • torch.autograd tracks operations on all tensors which have their requires_grad flag set to True.
  • For tensors that don’t require gradients, setting this attribute to False excludes them from the gradient computation DAG
import torch

x = torch.rand(5, 5)
y = torch.rand(5, 5)
z = torch.rand((5, 5), requires_grad=True)

a = x + y
print(f"Does `a` require gradients? : {a.requires_grad}")
# False

b = x + z
print(f"Does `b` require gradients?: {b.requires_grad}")
# True

Why is exclusion needed?

  • Frozen parameters: parameters that don’t compute gradients
  • Freezing part of your model can offer performance benefits by reducing autograd computations
  • A common use case is finetuning a pretrained network: we freeze most of the model and only modify the classifier layers to make predictions on new labels

How to use the exclusionary functionality?

  • Wrap the code in a torch.no_grad() context manager
  • Set requires_grad=False on a tensor
import torchvision
from torch import nn, optim

model = torchvision.models.resnet18(pretrained=True)

# Freeze all the parameters in the network!!!
for param in model.parameters():
    param.requires_grad = False

#  Finetune the model on a new dataset with 10 labels. 
# In resnet, the classifier is the last linear layer model.fc. 
# We can simply replace it with a new linear layer (unfrozen by default) that acts as our classifier
model.fc = nn.Linear(512, 10)

# Now all parameters in the model, except the parameters of model.fc, are frozen.
# The only parameters that compute gradients are the weights and bias of model.fc.
# Optimize only the classifier
optimizer = optim.SGD(model.fc.parameters(), lr=1e-2, momentum=0.9)
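
The other mechanism mentioned above, torch.no_grad(), disables gradient tracking for everything inside the context; a common pattern for inference:

import torch

x = torch.rand(5, requires_grad=True)

with torch.no_grad():
    y = x * 2
print(y.requires_grad)   # False: operations inside the block are not tracked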

Neural Networks

How to construct a NN?

  • Use torch.nn package
  • Define the network
  • You just have to define the forward function
  • The backward function (where gradients are computed) is automatically defined for you using autograd.
  • You can use any of the Tensor operations in the forward function.
  • The learnable parameters of a model are returned by net.parameters()

convnet

import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 3x3 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 3)
        self.conv2 = nn.Conv2d(6, 16, 3)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 6 * 6, 120)  # 6*6 from image dimension 
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the size is a square you can only specify a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features


net = Net()
print(net)

# params
params = list(net.parameters())
print(len(params))
print(params[0].size())
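
A rough usage sketch: with the layer sizes above, a 32x32 single-channel input works out to the 16 * 6 * 6 features expected by fc1, so a random batch of that shape can be pushed through the network.

input = torch.randn(1, 1, 32, 32)   # batch of 1, 1 channel, 32x32 image
out = net(input)
print(out)                          # 10 raw output scores

net.zero_grad()                     # clear any accumulated gradients
out.backward(torch.randn(1, 10))    # backprop with a dummy gradient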

Reference