In this notebook, I explain how to load an image with PIL and walk through some related operations with PyTorch and NumPy. These steps are part of style transfer with convolutional neural networks, which works with two images: a content image and a style image. The next post shows how to transfer the style of one image onto the content of the other.

This document was prepared using Google Colab and the lecture notes from: https://classroom.udacity.com/courses/ud188

First of all, make sure that you are working in the right directory in your Drive.

from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive

Use GPU if available

# use the GPU, if available
import torch  # torch is also imported below; needed here since this cell runs first

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
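As a quick illustration (not from the original notebook), any tensor can later be moved to this device with .to():

x = torch.zeros(1)
print(x.to(device).device)  # "cuda:0" when a GPU is available, otherwise "cpu"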

Import libraries

# import resources
%matplotlib inline

from PIL import Image
from io import BytesIO
import matplotlib.pyplot as plt
import numpy as np

import torch
import requests
from torchvision import transforms, models

Load Image

The function below is useful for loading images.

  • img_path should be the path to the image in your Drive (or a URL).
  • Since large images slow down processing, max_size defaults to 400.
  • transforms from torchvision is used to resize and normalize the image and convert it to a tensor.

In its simplest form, you can load an image with:

image = Image.open(img_path).convert('RGB')

The full helper below wraps this together with resizing, normalization, and conversion to a tensor:

def load_image(img_path, max_size=400, shape=None):
    ''' Load in and transform an image, making sure the image
        is <= 400 pixels in the x-y dims.'''
    if img_path.startswith('http'):
        response = requests.get(img_path)
        image = Image.open(BytesIO(response.content)).convert('RGB')
    else:
        image = Image.open(img_path).convert('RGB')
    
    # large images will slow down processing
    if max(image.size) > max_size:
        size = max_size
    else:
        size = max(image.size)
    
    if shape is not None:
        size = shape
        
    in_transform = transforms.Compose([
                        transforms.Resize(size),
                        transforms.ToTensor(),
                        transforms.Normalize((0.485, 0.456, 0.406), 
                                             (0.229, 0.224, 0.225))])

    # discard the transparent, alpha channel (that's the :3) and add the batch dimension
    image = in_transform(image)[:3,:,:].unsqueeze(0)
    
    return image
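Since the function also handles URLs, you can load an image straight from the web; the address below is just a hypothetical placeholder:

# hypothetical URL -- replace with a real image address
web_image = load_image('https://example.com/some_image.jpg').to(device)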

Load images by file name

content = load_image('/content/drive/My Drive/Style_Transfer/images/octopus.jpg').to(device)
style = load_image('/content/drive/My Drive/Style_Transfer/images/hockney.jpg').to(device)

print(content.shape)
print(style.shape)
torch.Size([1, 3, 400, 592])
torch.Size([1, 3, 360, 263])

What does it look like? The values can be negative because Normalize has already been applied; here is the first channel of the content tensor:

content[0][0]
tensor([[-1.8953, -1.9467, -1.9295,  ..., -0.1999, -0.1999, -0.3027],
        [-1.9980, -2.0152, -1.9467,  ..., -0.4568, -0.3369, -0.1657],
        [-1.9809, -1.9809, -1.9295,  ..., -0.8164, -0.5596, -0.3027],
        ...,
        [-1.9809, -1.9809, -1.9809,  ..., -1.0904, -1.1075, -1.1589],
        [-1.9809, -1.9809, -1.9809,  ..., -1.0733, -1.1418, -1.2103],
        [-1.9638, -1.9638, -1.9638,  ..., -1.0562, -1.1760, -1.2445]])

Change the shape

The shapes of the two images are different; we can force the style image to be the same size as the content image.

content.shape[-2:]
torch.Size([400, 592])
style = load_image('/content/drive/My Drive/Style_Transfer/images/hockney.jpg', 
                   shape=content.shape[-2:]).to(device)
print(style.shape)
torch.Size([1, 3, 400, 592])
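As a quick sanity check (illustrative, not from the notebook), the spatial dimensions now match:

assert style.shape[-2:] == content.shape[-2:]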

Display the image

# helper function for un-normalizing an image 
# and converting it from a Tensor image to a NumPy image for display
def im_convert(tensor):
    """ Un-normalize a tensor and convert it to a NumPy image for display. """
    
    image = tensor.to("cpu").clone().detach()
    image = image.numpy().squeeze()
    image = image.transpose(1,2,0)
    image = image * np.array((0.229, 0.224, 0.225)) + np.array((0.485, 0.456, 0.406))
    image = image.clip(0, 1)

    return image
# display the images
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(20, 10))
# content and style ims side-by-side
ax1.imshow(im_convert(content))
ax2.imshow(im_convert(style))
<matplotlib.image.AxesImage at 0x7ff3dcbcafd0>
Figure 1: Content image(left) - Style image(right)

Below is an explanation of each step of the im_convert function.

Copy content image & detach

Ref: https://stackoverflow.com/questions/63582590/why-do-we-call-detach-before-calling-numpy-on-a-pytorch-tensor

torch.tensors are designed to be used in the context of gradient descent optimization, and therefore they hold not only a tensor with numeric values, but (and more importantly) the computational graph leading to these values. This computational graph is then used (using the chain rule of derivatives) to compute the derivative of the loss function w.r.t each of the independent variables used to compute the loss.

As mentioned before, np.ndarray object does not have this extra “computational graph” layer and therefore, when converting a torch.tensor to np.ndarray you must explicitly remove the computational graph of the tensor using the detach() command.
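To see why this matters, here is a minimal sketch (not from the original lecture) of the error that detach() prevents:

t = torch.tensor([1.0, 2.0], requires_grad=True)

try:
    t.numpy()              # fails: the tensor still carries its computational graph
except RuntimeError as e:
    print(e)               # "Can't call numpy() on Tensor that requires grad. ..."

print(t.detach().numpy())  # works: detach() drops the graph first -> [1. 2.]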

image = content.to("cpu").clone().detach()
print(image.shape)
print(image.dtype)
torch.Size([1, 3, 400, 592])
torch.float32

Convert to NumPy

image = image.numpy()
print(image.shape)
print(image.dtype)
(1, 3, 400, 592)
float32

Use squeeze

numpy.squeeze() : Remove single-dimensional entries from the shape of an array.

image = image.squeeze()
print(image.shape)
(3, 400, 592)
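A standalone toy example of squeeze (illustrative):

a = np.zeros((1, 3, 4, 5))
print(np.squeeze(a).shape)          # (3, 4, 5) -- every size-1 axis is removed
print(np.squeeze(a, axis=0).shape)  # (3, 4, 5) -- only the chosen axis is removed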

Transpose

Transpose the dimensions into the order: Height, Width, Channel.

  • Original: 3, 400, 592
  • transpose(1,2,0) → 400, 592, 3

image = image.transpose(1,2,0)
image.shape
(400, 592, 3)
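A toy example (illustrative) showing how the axes are reordered:

a = np.arange(24).reshape(3, 4, 2)  # think of it as (C, H, W)
b = a.transpose(1, 2, 0)            # reorder axes to (H, W, C)
print(b.shape)                      # (4, 2, 3)
print(b[1, 0, 2] == a[2, 1, 0])     # True: b[h, w, c] == a[c, h, w]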

Un-normalize

Ref: https://pytorch.org/docs/stable/torchvision/models.html

All pre-trained models expect input images normalized in the same way, i.e. mini-batches of 3-channel RGB images of shape (3 x H x W), where H and W are expected to be at least 224. The images have to be loaded in to a range of [0, 1] and then normalized using mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225].

You can use the following transform to normalize:

    normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                     std=[0.229, 0.224, 0.225])

The line in im_convert inverts this step, mapping each channel back with x * std + mean:

image = image * np.array((0.229, 0.224, 0.225)) + np.array((0.485, 0.456, 0.406))
image.shape
(400, 592, 3)
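A worked example (illustrative) showing that this line is the exact inverse of transforms.Normalize:

mean = np.array((0.485, 0.456, 0.406))
std  = np.array((0.229, 0.224, 0.225))

pixel = np.array((0.5, 0.5, 0.5))    # a pixel in the original [0, 1] range
normalized = (pixel - mean) / std    # what transforms.Normalize does
restored = normalized * std + mean   # what im_convert does
print(np.allclose(restored, pixel))  # True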

Clip the values

Ref: https://numpy.org/doc/stable/reference/generated/numpy.clip.html

Given an interval, values outside the interval are clipped to the interval edges. For example, if an interval of [0, 1] is specified, values smaller than 0 become 0, and values larger than 1 become 1.
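A tiny illustration (not from the notebook) of that behavior:

a = np.array([-0.5, 0.2, 0.8, 1.7])
print(a.clip(0, 1))  # [0.  0.2 0.8 1. ] -- out-of-range values snap to the edges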

image = image.clip(0, 1)
image
array([[[0.05098039, 0.0509804 , 0.05882353],
        [0.03921567, 0.03921569, 0.04705882],
        [0.04313724, 0.04313725, 0.0509804 ],
        ...,
        [0.43921568, 0.43921569, 0.40000002],
        [0.43921568, 0.43921569, 0.40000002],
        [0.41568626, 0.41568628, 0.37647061]],

       [[0.02745098, 0.02745099, 0.03529412],
        [0.02352938, 0.02352943, 0.03137254],
        [0.03921567, 0.03921569, 0.04313725],
        ...,
        [0.38039215, 0.37254903, 0.33725492],
        [0.40784313, 0.40392158, 0.3647059 ],
        [0.44705881, 0.43921569, 0.40392159]],

       [[0.03137252, 0.03137255, 0.03529412],
        [0.03137252, 0.03137255, 0.03529412],
        [0.04313724, 0.04313725, 0.04313725],
        ...,
        [0.29803922, 0.28627453, 0.25098042],
        [0.35686274, 0.34509805, 0.30980394],
        [0.41568626, 0.40392158, 0.36862747]],

       ...,

       [[0.03137252, 0.02352943, 0.02745099],
        [0.03137252, 0.02352943, 0.02745099],
        [0.03137252, 0.02352943, 0.02745099],
        ...,
        [0.23529412, 0.21568628, 0.14117649],
        [0.23137252, 0.21176472, 0.13333336],
        [0.21960783, 0.20000002, 0.12156863]],

       [[0.03137252, 0.02352943, 0.02745099],
        [0.03137252, 0.02352943, 0.02745099],
        [0.03137252, 0.02352943, 0.02745099],
        ...,
        [0.23921566, 0.21960786, 0.14117649],
        [0.2235294 , 0.2039216 , 0.12549024],
        [0.20784311, 0.18823531, 0.10980393]],

       [[0.0352941 , 0.02745099, 0.03137254],
        [0.0352941 , 0.02745099, 0.03137254],
        [0.0352941 , 0.02745099, 0.03137254],
        ...,
        [0.24313724, 0.22352942, 0.14509807],
        [0.21568625, 0.19607846, 0.11764705],
        [0.19999996, 0.1803922 , 0.10196077]]])

Use matplotlib imshow to show the image

plt.imshow(image)
<matplotlib.image.AxesImage at 0x7ff3dae27cc0>
Figure 2: Content image

Summary:

  • Load the image using PIL
  • Convert it to a tensor
  • Change the shape
  • Visualize the image

Questions answered along the way:

  • Why do we use constant terms (the ImageNet mean and std) when normalizing the image?
  • What does squeeze do?
  • How do we transpose a 3-dimensional NumPy array?
  • How can we use clip?