One place for hosting & domains

      Neural

      Introduction to PyTorch: Build a Neural Network to Recognize Handwritten Digits


      The author selected the Code 2040 to receive a donation as part of the Write for DOnations program.

      Introduction

      Machine learning is a field of computer science that finds patterns in data. As of 2021, machine learning practitioners use these patterns to detect lanes for self-driving cars; train a robot hand to solve a Rubik’s cube; or generate images of dubious artistic taste. As machine learning models grow more accurate and performant, we see increasing adoption in mainstream applications and products.

      Deep learning is a subset of machine learning that focuses on particularly complex models, termed neural networks. In later, advanced DigitalOcean articles (like this tutorial on building an Atari bot), we will formally define what “complex” means. Neural networks are the highly accurate and hype-inducing modern-day models your hear about, with applications across a wide range of tasks. In this tutorial, you will focus on one specific task called object recognition, or image classification. Given an image of a handwritten digit, your model will predict which digit is shown.

      You will build, train, and evaluate deep neural networks in PyTorch, a framework developed by Facebook AI Research for deep learning. When compared to other deep learning frameworks, like Tensorflow, PyTorch is a beginner-friendly framework with debugging features that aid in the building process. It’s also highly customizable for advanced users, with researchers and practitioners using it across companies like Facebook and Tesla. By the end of this tutorial, you will be able to:

      • Build, train, and evaluate a deep neural network in PyTorch
      • Understand the risks of applying deep learning

      While you won’t need prior experience in practical deep learning or PyTorch to follow along with this tutorial, we’ll assume some familiarity with machine learning terms and concepts such as training and testing, features and labels, optimization, and evaluation. You can learn more about these concepts in An Introduction to Machine Learning.

      Prerequisites

      To complete this tutorial, you will need a local development environment for Python 3 with at least 1GB of RAM. You can follow How to Install and Set Up a Local Programming Environment for Python 3 to configure everything you need.

      Step 1 — Creating Your Project and Installing Dependencies

      Let’s create a workspace for this project and install the dependencies you’ll need. You’ll call your workspace pytorch:

      Navigate to the pytorch directory:

      Then create a new virtual environment for the project:

      Activate your environment:

      • source pytorch/bin/activate

      Then install PyTorch. On macOS, install PyTorch with the following command:

      • python -m pip install torch==1.4.0 torchvision==0.5.0

      On Linux and Windows, use the following commands for a CPU-only build:

      • pip install torch==1.4.0+cpu torchvision==0.5.0+cpu -f https://download.pytorch.org/whl/torch_stable.html
      • pip install torchvision

      With the dependencies installed, you will now build your first neural network.

      Step 2 — Building a “Hello World” Neural Network

      In this step, you will build your first neural network and train it. You will learn about two sub-libraries in Pytorch, torch.nn for neural network operations and torch.optim for neural network optimizers. To understand what an “optimizer” is, you will also learn about an algorithm called gradient descent. Throughout this tutorial, you will use the following five steps to build and train models:

      1. Build a computation graph
      2. Set up optimizers
      3. Set up criterion
      4. Set up data
      5. Train the model

      In this first section of the tutorial, you will build a small model with a manageable dataset. Start by creating a new file step_2_helloworld.py, using nano or your favorite text editor:

      • nano step_2_helloworld.py

      You will now write a short 18-line snippet that trains a small model. Start by importing several PyTorch utilities:

      step_2_helloworld.py

      import torch
      import torch.nn as nn
      import torch.optim as optim
      

      Here, you alias PyTorch libraries to several commonly used shortcuts:

      • torch contains all PyTorch utilities. However, routine PyTorch code includes a few extra imports. We follow the same convention here, so that you can understand PyTorch tutorials and random code snippets online.
      • torch.nn contains utilities for constructing neural networks. This is often denoted nn.
      • torch.optim contains training utilities. This is often denoted optim.

      Next, define the neural network, training utilities, and the dataset:

      step_2_helloworld.py

      . . .
      net = nn.Linear(1, 1)  # 1. Build a computation graph (a line!)
      optimizer = optim.SGD(net.parameters(), lr=0.1)  # 2. Setup optimizers
      criterion = nn.MSELoss()  # 3. Setup criterion
      x, target = torch.randn((1,)), torch.tensor([0.])  # 4. Setup data
      . . .
      

      Here, you define several necessary parts of any deep learning training script:

      • net = ... defines the “neural network”. In this case, the model is a line of the form y = m * x; the parameter nn.Linear(1, 1) is the slope of your line. This model parameter nn.Linear(1, 1) will be updated during training. Note that torch.nn (aliased with nn) includes many deep learning operations, like the fully connected layers used here (nn.Linear) and convolutional layers (nn.Conv2d).
      • optimizer = ... defines the optimizer. This optimizer determines how the neural network will learn. We will discuss optimizers in more detail after writing a few more lines of code. Note that torch.optim (aliased to optim) includes many such optimizers that you can use.
      • criterion = ... defines the loss. In short, the loss defines what your model is trying to minimize. For your basic model of a line, the goal is to minimize the difference between your line’s predicted y-values and the actual y-values in the training set. Note that torch.nn (aliased with nn) includes many other loss functions you can use.
      • x, target = ... defines your “dataset”. Right now, the dataset is just one coordinate—one x value and one y value. In this case, the torch package itself offers tensor, to create a new tensor, and randn to create a tensor with random values.

      Finally, train the model by iterating over the dataset ten times. Each time, you adjust the model’s parameter:

      step_2_helloworld.py

      . . .
      # 5. Train the model
      for i in range(10):
          output = net(x)
          loss = criterion(output, target)
          print(round(loss.item(), 2))
      
          net.zero_grad()
          loss.backward()
          optimizer.step()
      

      Your general goal is to minimize the loss, by adjusting the slope of the line. To effect this, this training code implements an algorithm called gradient descent. The intuition for gradient descent is as follows: Imagine you’re looking straight down at a bowl. The bowl has many points on it, and each point corresponds to a different parameter value. The bowl itself is the loss surface: the center of the bowl—the lowest point—indicates the best model with the lowest loss. This is the optimum. The fringes of the bowl—the highest points, and the parts of the bowl closest to you—hold the worst models with the highest loss.

      To find the best model with the lowest loss:

      1. With net = nn.Linear(1, 1) you initialize a random model. This is equivalent to picking a random point on the bowl.
      2. In the for i in range(10) loop, you begin training. This is equivalent to stepping closer to the center of the bowl.
      3. The direction of each step is given by the gradient. You will skip a formal proof here, but in summary, the negative gradient points to the lowest point in the bowl.
      4. With lr=0.1 in optimizer = ..., you specify the step size. This determines how large each step can be.

      In just ten steps, you reach the center of the bowl, the best possible model with the lowest possible loss. For a visualization of gradient descent, see Distill’s “Why Momentum Really Works,” first figure at the top of the page.

      The last three lines of this code are also important:

      • net.zero_grad clears all gradients that may have been leftover from the previous step iterate.
      • loss.backward computes new gradients.
      • optimizer.step uses those gradients to take steps. Notice that you didn’t compute gradients yourself. This is because PyTorch, and other deep learning libraries like it, automatically differentiate.

      This now concludes your “hello world” neural network. Save and close your file.

      Double-check that your script matches step_2_helloworld.py. Then, run the script:

      • python step_2_helloworld.py

      Your script will output the following:

      Output

      0.33 0.19 0.11 0.07 0.04 0.02 0.01 0.01 0.0 0.0

      Notice that your loss continually decreases, showing that your model is learning. There are two other implementation details to note, when using PyTorch:

      1. PyTorch uses torch.Tensor to hold all data and parameters. Here, torch.randn generates a tensor with random values, with the provided shape. For example, a torch.randn((1, 2)) creates a 1x2 tensor, or a 2-dimensional row vector.
      2. PyTorch supports a wide variety of optimizers. This features torch.optim.SGD, otherwise known as stochastic gradient descent (SGD). Roughly speaking, this is the algorithm described in this tutorial, where you took steps toward the optimum. There are more-involved optimizers that add extra features on top of SGD. There are also many losses, with torch.nn.MSELoss being just one of them.

      This concludes your very first model on a toy dataset. In the next step, you will replace this small model with a neural network and the toy dataset with a commonly used machine learning benchmark.

      Step 3 — Training Your Neural Network on Handwritten Digits

      In the previous section, you built a small PyTorch model. However, to better understand the benefits of PyTorch, you will now build a deep neural network using torch.nn.functional, which contains more neural network operations, and torchvision.datasets, which supports many datasets you can use, out of the box. In this section, you will build a relatively complex, custom model with a premade dataset.

      You’ll use convolutions, which are pattern-finders. For images, convolutions look for 2D patterns at various levels of “meaning”: Convolutions directly applied to the image are looking for “lower-level” features such as edges. However, convolutions applied to the outputs of many other operations may be looking for “higher-level” features, such as a door. For visualizations and a more thorough walkthrough of convolutions, see part of Stanford’s deep learning course.

      You will now expand on the first PyTorch model you built, by defining a slightly more complex model. Your neural network will now contain two convolutions and one fully connected layer, to handle image inputs.

      Start by creating a new file step_3_mnist.py, using your text editor:

      You will follow the same five-step algorithm as before:

      1. Build a computation graph
      2. Set up optimizers
      3. Set up criterion
      4. Set up data
      5. Train the model

      First, define your deep neural network. Note this is a pared down version of other neural networks you may find on MNIST—this is intentional, so that you can train your neural network on your laptop:

      step_3_mnist.py

      import torch
      import torch.nn as nn
      import torch.optim as optim
      import torch.nn.functional as F
      
      from torchvision import datasets, transforms
      from torch.optim.lr_scheduler import StepLR
      
      # 1. Build a computation graph
      class Net(nn.Module):
          def __init__(self):
              super(Net, self).__init__()
              self.conv1 = nn.Conv2d(1, 32, 3, 1)
              self.conv2 = nn.Conv2d(32, 64, 3, 1)
              self.fc = nn.Linear(1024, 10)
      
          def forward(self, x):
              x = F.relu(self.conv1(x))
              x = F.relu(self.conv2(x))
              x = F.max_pool2d(x, 1)
              x = torch.flatten(x, 1)
              x = self.fc(x)
              output = F.log_softmax(x, dim=1)
              return output
      net = Net()
      . . .
      

      Here, you define a neural network class, inheriting from nn.Module. All operations in the neural network (including the neural network itself) must inherit from nn.Module. The typical paradigm, for your neural network class, is as follows:

      1. In the constructor, define any operations needed for your network. In this case, you have two convolutions and a fully connected layer. (A tip to remember: The constructor always starts with super().__init__().) PyTorch expects the parent class to be initialized before assigning modules (for example, nn.Conv2d) to instance attributes (self.conv1).
      2. In the forward method, run the initialized operations. This method determines the neural network architecture, explicitly defining how the neural network will compute its predictions.

      This neural network uses a few different operations:

      • nn.Conv2d: A convolution. Convolutions look for patterns in the image. Earlier convolutions look for “low-level” patterns like edges. Later convolutions in the network look for “high-level” patterns like legs on a dog, or ears.
      • nn.Linear: A fully connected layer. Fully connected layers relate all input features to all output dimensions.
      • F.relu, F.max_pool2d: These are types of non-linearities. (A non-linearity is any function that is not linear.) relu is the function f(x) = max(x, 0). max_pool takes the maximum value in every patch of values. In this case, you take the maximum value across the entire image.
      • log_softmax: normalizes all of the values in a vector, so that the values sum to 1.

      Second, like before, define the optimizer. This time, you will use a different optimizer and a different hyper-parameter setting. Hyper-parameters configure training, whereas training adjusts model parameters. These hyper-parameter settings are taken from the PyTorch MNIST example:

      step_3_mnist.py

      . . .
      optimizer = optim.Adadelta(net.parameters(), lr=1.)  # 2. Setup optimizer
      . . .
      

      Third, unlike before, you will now use a different loss. This loss is used for classification problems, where the output of your model is a class index. In this particular example, the model will output the digit (possibly any number from 0 to 9) contained in the input image:

      step_3_mnist.py

      . . .
      criterion = nn.NLLLoss()  # 3. Setup criterion
      . . .
      

      Fourth, set up the data. In this case, you will set up a dataset called MNIST, which features handwritten digits. Deep Learning 101 tutorials often use this dataset. Each image is a small 28x28 px image containing a handwritten digit, and the goal is to classify each handwritten digit as 0, 1, 2, … or 9:

      step_3_mnist.py

      . . .
      # 4. Setup data
      transform = transforms.Compose([
          transforms.Resize((8, 8)),
          transforms.ToTensor(),
          transforms.Normalize((0.1307,), (0.3081,))
      ])
      train_dataset = datasets.MNIST(
          'data', train=True, download=True, transform=transform)
      train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=512)
      . . .
      

      Here, you preprocess the images in transform = ... by resizing the image, converting the image to a PyTorch tensor, and normalizing the tensor to have mean 0 and variance 1.

      In the next two lines, you set train=True, as this is the training dataset and download=True so that you download the dataset if it is not already.

      batch_size=512 determines how many images the network is trained on, at once. Barring ridiculously large batch sizes (for example, tens of thousands), larger batches are preferable for roughly faster training.

      Fifth, train the model. In the following code block, you make minimal modifications. Instead of running ten times on the same sample, you will now iterate over all samples in the provided dataset once. By passing over all samples once, the following is training for one epoch:

      step_3_mnist.py

      . . .
      # 5. Train the model
      for inputs, target in train_loader:
          output = net(inputs)
          loss = criterion(output, target)
          print(round(loss.item(), 2))
      
          net.zero_grad()
          loss.backward()
          optimizer.step()
      . . .
      

      Save and close your file.

      Double-check that your script matches step_3_mnist.py. Then, run the script.

      Your script will output the following:

      Output

      2.31 2.18 2.03 1.78 1.52 1.35 1.3 1.35 1.07 1.0 ... 0.21 0.2 0.23 0.12 0.12 0.12

      Notice that the final loss is less than 10% of the initial loss value. This means that your neural network is training correctly.

      That concludes training. However, the loss of 0.12 is difficult to reason about: we don’t know if 0.12 is “good” or “bad”. To assess how well your model is performing, you next compute an accuracy for this classification model.

      Step 4 — Evaluating Your Neural Network

      Earlier, you computed loss values on the train split of your dataset. However, it is good practice to keep a separate validation split of your dataset. You use this validation split to compute the accuracy of your model. However, you can’t use it for training. Following, you set up the validation dataset and evaluate your model on it. In this step, you will use the same PyTorch utilities from before, including torchvision.datasets for the MNIST dataset.

      Start by copying your step_3_mnist.py file into step_4_eval.py. Then, open the file:

      • cp step_3_mnist.py step_4_eval.py
      • nano step_4_eval.py

      First, set up the validation dataset:

      step_4_eval.py

      . . .
      train_loader = ...
      val_dataset = datasets.MNIST(
          'data', train=False, download=True, transform=transform)
      val_loader = torch.utils.data.DataLoader(val_dataset, batch_size=512)
      . . .
      

      At the end of your file, after the training loop, add a validation loop:

      step_4_eval.py

          . . .
          optimizer.step()
      
      correct = 0.
      net.eval()
      for inputs, target in val_loader:
          output = net(inputs)
          _, pred = output.max(1)
          correct += (pred == target).sum()
      accuracy = correct / len(val_dataset) * 100.
      print(f'{accuracy:.2f}% correct')
      

      Here, the validation loop performs a few operations to compute accuracy:

      • Running net.eval() ensures that your neural network is in evaluation mode and ready for validation. Several operations are run differently in evaluation mode than when in training mode.
      • Iterating over all inputs and labels in val_loader.
      • Running the model net(inputs) to obtain probabilities for each class.
      • Finding the class with the highest probability output.max(1). output is a tensor with dimensions (n, k) for n samples and k classes. The 1 means you compute the max along the index 1 dimension.
      • Computing the number of images that were classified correctly: pred == target computes a boolean-valued vector. .sum() casts these booleans to integers and effectively computes the number of true values.
      • correct / len(val_dataset) finally computes the percent of images classified correctly.

      Save and close your file.

      Double-check that your script matches step_4_eval.py. Then, run the script:

      Your script will output the following. Note the specific loss values and final accuracy may vary:

      Output

      2.31 2.21 ... 0.14 0.2 89% correct

      You have now trained your very first deep neural network. You can make further modifications and improvements by tuning hyper-parameters for training: This includes different numbers of epochs, learning rates, and different optimizers. We include a sample script with tuned hyper-parameters; this script trains the same neural network but for 10 epochs, obtaining 97% accuracy.

      Risks of Deep Learning

      One gotcha is that deep learning does not always obtain state-of-the-art results. Deep learning works well in feature-rich, data-rich scenarios but conversely performs poorly in data-sparse, feature-sparse regimes. Whereas there is active research in deep learning’s weak areas, many other machine learning techniques are already well-suited for feature-sparse regimes, such as decision trees, linear regression, or support vector machines (SVM).

      Another gotcha is that deep learning is not well understood. There are no guarantees for accuracy, optimality, or even convergence. On the other hand, classic machine learning techniques are well-studied and are relatively interpretable. Again, there is active research to address this lack of interpretability in deep learning. You can read more in “What Explainable AI fails to explain (and how we fix that)”.

      Most importantly, lack of interpretability in deep learning leads to overlooked biases. For example, researchers from UC Berkeley were able to show a model’s gender bias in captioning (“Women also Snowboard”). Other research efforts focus on societal issues such as “Fairness” in machine learning. Given these issues are undergoing active research, it is difficult to recommend a prescribed diagnosis for biases in models. As a result, it is up to you, the practitioner, to apply deep learning responsibly.

      Conclusion

      PyTorch is deep learning framework for enthusiasts and researchers alike. To get acquainted with PyTorch, you have both trained a deep neural network and also learned several tips and tricks for customizing deep learning.

      You can also use a pre-built neural network architecture instead of building your own. Here is a link to an optional section: Use Existing Neural Network Architecture on Google Colab that you can try. For demonstration purposes, this optional step trains a much larger model with much larger images.

      Check out our other articles to dive deeper into machine learning and related fields:

      • Is your model complex enough? Too complex? Learn about the bias-variance tradeoff in Bias-Variance for Deep Reinforcement Learning: How To Build a Bot for Atari with OpenAI Gym to find out. In this article, we build AI bots for Atari Games and explore a field of research called Reinforcement Learning. Alternatively, find a visual explanation of the bias-variance trade-off in this Understanding the Bias-Variance Trade-off article.
      • How does a machine learning model process images? Learn more in Build an Emotion-Based Dog Filter. In this article, we discuss how models process and classify images, in more detail, exploring a field of research called Computer Vision.
      • Can a neural network be fooled? Learn how to in Tricking a Neural Network. In this article, we explore adversarial machine learning, a field of research that devises both attacks and defenses for neural networks for more robust real-world deep learning deployments.
      • How can we better understand how neural networks work? Read one class of approaches called “Explainable AI” in How To Visualize and Interpret Neural Networks. In this article, we explore explainable AI, and in particular visualize pixels that the neural networks believes are important for its predictions.



      Source link

      How To Visualize and Interpret Neural Networks in Python


      The author selected Open Sourcing Mental Illness to receive a donation as part of the Write for DOnations program.

      Introduction

      Neural networks achieve state-of-the-art accuracy in many fields such as computer vision, natural-language processing, and reinforcement learning. However, neural networks are complex, easily containing hundreds of thousands, or even, millions of operations (MFLOPs or GFLOPs). This complexity makes interpreting a neural network difficult. For example: How did the network arrive at the final prediction? Which parts of the input influenced the prediction? This lack of understanding is exacerbated for high-dimensional inputs like images: What does an explanation for an image classification even look like?

      Research in Explainable AI (XAI) works to answers these questions with a number of different explanations. In this tutorial, you’ll specifically explore two types of explanations: 1. Saliency maps, which highlight the most important parts of the input image; and 2. decision trees, which break down each prediction into a sequence of intermediate decisions. For both of these approaches, you’ll produce code that generates these explanations from a neural network.

      Along the way, you’ll also use deep-learning Python library PyTorch, computer-vision library OpenCV, and linear-algebra library numpy. By following this tutorial, you will gain an understanding of current XAI efforts to understand and visualize neural networks.

      Prerequisites

      To complete this tutorial, you will need the following:

      You can find all the code and assets from this tutorial in this repository.

      Step 1 — Creating Your Project and Installing Dependencies

      Let’s create a workspace for this project and install the dependencies you’ll need. You’ll call your workspace XAI, short for Explainable Artificial Intelligence:

      Navigate to the XAI directory:

      Make a directory to hold all your assets:

      Then create a new virtual environment for the project:

      Activate your environment:

      Then install PyTorch, a deep-learning framework for Python that you’ll use in this tutorial.

      On macOS, install PyTorch with the following command:

      • python -m pip install torch==1.4.0 torchvision==0.5.0

      On Linux and Windows, use the following commands for a CPU-only build:

      • pip install torch==1.4.0+cpu torchvision==0.5.0+cpu -f https://download.pytorch.org/whl/torch_stable.html
      • pip install torchvision

      Now install prepackaged binaries for OpenCV, Pillow, and numpy, which are libraries for computer vision and linear algebra, respectively. OpenCV and Pillow offer utilities such as image rotations, and numpy offers linear algebra utilities, such as a matrix inversion:

      • python -m pip install opencv-python==3.4.3.18 pillow==7.1.0 numpy==1.14.5 matplotlib==3.3.2

      On Linux distributions, you will need to install libSM.so:

      • sudo apt-get install libsm6 libxext6 libxrender-dev

      Finally, install nbdt, a deep-learning library for neural-backed decision trees, which we will discuss in the last step of this tutorial:

      • python -m pip install nbdt==0.0.4

      With the dependencies installed, let’s run an image classifier that has already been trained.

      Step 2 — Running a Pretrained Classifier

      In this step, you will set up an image classifier that has already been trained.

      First, an image classifier accepts images as input, and outputs a predicted class (like Cat or Dog). Second, pretained means this model has already been trained and will be able to predict classes, accurately, straightaway. Your goal will be to visualize and interpret this image classifier: How does it make decisions? Which parts of the image did the model use for its prediction?

      First, download a JSON file to convert neural network output to a human-readable class name:

      • wget -O assets/imagenet_idx_to_label.json https://raw.githubusercontent.com/do-community/tricking-neural-networks/master/utils/imagenet_idx_to_label.json

      Download the following Python script, which will load an image, load a neural network with its weights, and classify the image using the neural network:

      • wget https://raw.githubusercontent.com/do-community/tricking-neural-networks/master/step_2_pretrained.py

      Note: For a more detailed walkthrough of this file step_2_pretrained.py, please see Step 2 — Running a Pretrained Animal Classifier in the How To Trick a Neural Network tutorial.

      Next you’ll download the following image of a cat and dog, as well, to run the image classifier on.

      Image of Cat and dog on sofa

      • wget -O assets/catdog.jpg https://assets.digitalocean.com/articles/visualize_neural_network/step2b.jpg

      Finally, run the pretrained image classifier on the newly downloaded image:

      • python step_2_pretrained.py assets/catdog.jpg

      This will produce the following output, showing your animal classifier works as expected:

      Output

      Prediction: Persian cat

      That concludes running inference with your pretrained model.

      Although this neural network produces predictions correctly, we don’t understand how the model arrived at its prediction. To better understand this, start by considering the cat and dog image that you provided to the image classifier.

      Image of Cat and dog on sofa

      The image classifier predicts Persian cat. One question you can ask is: Was the model looking at the cat on the left? Or the dog on the right? Which pixels did the model use to make that prediction? Fortunately, we have a visualization that answers this exact question. Following is a visualization that highlights pixels that the model used, to determine Persian Cat.

      A visualization that highlights pixels that the model used

      The model classifies the image as Persian cat by looking at the cat. For this tutorial, we will refer to visualizations like this example as saliency maps, which we define to be heatmaps that highlight pixels influencing the final prediction. There are two types of saliency maps:

      1. Model-agnostic Saliency Maps (often called “black-box” methods): These approaches do not need access to the model weights. In general, these methods change the image and observe the changed image’s impact on accuracy. For example, you might remove the center of the image (pictured following). The intuition is: If the image classifier now misclassifies the image, the image center must have been important. We can repeat this and randomly remove parts of the image each time. In this way, we can produce a heatmap like previously, by highlighting the patches that damaged accuracy the most.

      A heatmap highlighting the patches that damaged accuracy the most.

      1. Model-aware Saliency Maps (often called “white-box” methods): These approaches require access to the model’s weights. We will discuss one such method in more detail in the next section.

      This concludes our brief overview of saliency maps. In the next step, you will implement one model-aware technique called a Class Activation Map (CAM).

      Step 3 — Generating Class Activation Maps (CAM)

      Class Activation Maps (CAMs) are a type of model-aware saliency method. To understand how a CAM is computed, we first need to discuss what the last few layers in a classification network do. Following is an illustration of a typical image-classification neural network, for the method in this paper on Learning Deep Features for Discriminative Localization.

      Diagram of an existing image classification neural network.

      The figure describes the following process in a classification neural network. Note the image is represented as a stack of rectangles; for a refresher on how images are represented as a tensor, see How to Build an Emotion-Based Dog Filter in Python 3 (Step 4):

      1. Focus on the second-to-last layer’s outputs, labeled LAST CONV with blue, red, and green rectangles.
      2. This output undergoes a global average pool (denoted as GAP). GAP averages values in each channel (colored rectangle) to produce a single value (corresponding colored box, in LINEAR).
      3. Finally, those values are combined in a weighted sum (with weights denoted by w1, w2, w3) to produce a probability (dark gray box) of a class. In this case, these weights correspond to CAT. In essence, each wi answers: “How important is the ith channel to detecting a Cat?”
      4. Repeat for all classes (light gray circles) to obtain probabilities for all classes.

      We’ve omitted several details that are not necessary to explain CAM. Now, we can use this to compute CAM. Let us revisit an expanded version of this figure, still for the method in the same paper. Focus on the second row.

      Diagram of how class activation maps are computed from an image classification neural network.

      1. To compute a class activation map, take the second-to-last layer’s outputs. This is depicted in the second row, outlined by blue, red, and green rectangles corresponding to the same colored rectangles in the first row.
      2. Pick a class. In this case, we pick “Australian Terrier”. Find the weights w1, w2wn corresponding to that class.
      3. Each channel (colored rectangle) is then weighted by w1, w2wn. Note we do not perform a global average pool (step 2 from the previous figure). Compute the weighted sum, to obtain a class activation map (far right, second row in figure).

      This final weighted sum is the class activation map.

      Next, we will implement class activation maps. This section will be broken into the three steps that we’ve already discussed:

      1. Take the second-to-last layer’s outputs.
      2. Find weights w1, w2wn.
      3. Compute a weighted sum of outputs.

      Start by creating a new file step_3_cam.py:

      First, add the Python boilerplate; import the necessary packages and declare a main function:

      step_3_cam.py

      """Generate Class Activation Maps"""
      import numpy as np
      import sys
      import torch
      import torchvision.models as models
      import torchvision.transforms as transforms
      import matplotlib.cm as cm
      
      from PIL import Image
      from step_2_pretrained import load_image
      
      
      def main():
          pass
      
      
      if __name__ == '__main__':
          main()
      

      Create an image loader that will load, resize, and crop your image, but leave the color untouched. This ensures your image has the correct dimensions. Add this before your main function:

      step_3_cam.py

      . . .
      def load_raw_image():
          """Load raw 224x224 center crop of image"""
          image = Image.open(sys.argv[1])
          transform = transforms.Compose([
            transforms.Resize(224),  # resize smaller side of image to 224
            transforms.CenterCrop(224),  # take center 224x224 crop
          ])
          return transform(image)
      . . .
      

      In load_raw_image, you first access the one argument passed to the script sys.argv[1]. Then, open the image specified using Image.open. Next, you define a number of different transformations to apply to the images that are passed to your neural network:

      • transforms.Resize(224): Resizes the smaller side of the image to 224. For example, if your image is 448 x 672, this operation would downsample the image to 224 x 336.
      • transforms.CenterCrop(224): Takes a crop from the center of the image, of size 224 x 224.
      • transform(image): Applies the sequence of image transformations defined in the previous lines.

      This concludes image loading.

      Next, load the pretrained model. Add this function after your first load_raw_image function, but before the main function:

      step_3_cam.py

      . . .
      def get_model():
          """Get model, set forward hook to save second-to-last layer's output"""
          net = models.resnet18(pretrained=True).eval()
          layer = net.layer4[1].conv2
      
          def store_feature_map(self, _, output):
              self._parameters['out'] = output
          layer.register_forward_hook(store_feature_map)
      
          return net, layer
      . . .
      

      In the get_model function, you:

      1. Instantiate a pretrained model models.resnet18(pretrained=True).
      2. Change the model’s inference mode to eval by calling .eval().
      3. Define layer..., the second-to-last layer, which we will use later.
      4. Add a “forward hook” function. This function will save the layer’s output when the layer is executed. We do this in two steps, first defining a store_feature_map hook and then binding the hook with register_forward_hook.
      5. Return both the network and the second-to-last layer.

      This concludes model loading.

      Next, compute the class activation map itself. Add this function before your main function:

      step_3_cam.py

      . . .
      def compute_cam(net, layer, pred):
          """Compute class activation maps
      
          :param net: network that ran inference
          :param layer: layer to compute cam on
          :param int pred: prediction to compute cam for
          """
      
          # 1. get second-to-last-layer output
          features = layer._parameters['out'][0]
      
          # 2. get weights w_1, w_2, ... w_n
          weights = net.fc._parameters['weight'][pred]
      
          # 3. compute weighted sum of output
          cam = (features.T * weights).sum(2)
      
          # normalize cam
          cam -= cam.min()
          cam /= cam.max()
          cam = cam.detach().numpy()
          return cam
      . . .
      

      The compute_cam function mirrors the three steps outlined at the start of this section and in the section before.

      1. Take the second-to-last layer’s outputs, using the feature maps our forward hook saved in layer._parameters.
      2. Find weights w1, w2wn in the final linear layer net.fc_parameters['weight']. Access the predth row of weights, to obtain weights for our predicted class.
      3. Compute a weighted sum of outputs. (features.T * weights).sum(...). The argument 2 means we compute a sum along the index 2 dimension of the provided tensor.
      4. Normalize the class activation map, so that all values fall in between 0 and 1—cam -= cam.min(); cam /= cam.max().
      5. Detach the PyTorch tensor from the computation graph .detach(). Convert the CAM from a PyTorch tensor object into a numpy array. .numpy().

      This concludes computation for a class activation map.

      Our last helper function is a utility that saves the class activation map. Add this function before your main function:

      step_3_cam.py

      . . .
      def save_cam(cam):
          # save heatmap
          heatmap = (cm.jet_r(cam) * 255.0)[..., 2::-1].astype(np.uint8)
          heatmap = Image.fromarray(heatmap).resize((224, 224))
          heatmap.save('heatmap.jpg')
          print(' * Wrote heatmap to heatmap.jpg')
      
          # save heatmap on image
          image = load_raw_image()
          combined = (np.array(image) * 0.5 + np.array(heatmap) * 0.5).astype(np.uint8)
          Image.fromarray(combined).save('combined.jpg')
          print(' * Wrote heatmap on image to combined.jpg')
      . . .
      

      This utility save_cam performs the following:

      1. Colorize the heatmap cm.jet_r(cam). The output is in the range [0, 1] so multiply by 255.0. Furthermore, the output (1) contains a 4th alpha channel and (2) the color channels are ordered as BGR. We use indexing [..., 2::-1] to solve both problems, dropping the alpha channel and inverting the color channel order to be RGB. Finally, cast to unsigned integers.
      2. Convert the image Image.fromarray into a PIL image and use the image’s image-resize utility .resize(...), then the .save(...) utility.
      3. Load a raw image, using the utility load_raw_image we wrote earlier.
      4. Superimpose the heatmap on top of the image by adding 0.5 weight of each. Like before, cast the result to unsigned integers .astype(...).
      5. Finally, convert the image into PIL, and save.

      Next, populate the main function with some code to run the neural network on a provided image:

      step_3_cam.py

      . . .
      def main():
          """Generate CAM for network's predicted class"""
          x = load_image()
          net, layer = get_model()
      
          out = net(x)
          _, (pred,) = torch.max(out, 1)  # get class with highest probability
      
          cam = compute_cam(net, layer, pred)
          save_cam(cam)
      . . .
      

      In main, run the network to obtain a prediction.

      1. Load the image.
      2. Fetch the pretrained neural network.
      3. Run the neural network on the image.
      4. Find the highest probability with torch.max. pred is now a number with the index of the most likely class.
      5. Compute the CAM using compute_cam.
      6. Finally, save the CAM using save_cam.

      This now concludes our class activation script. Save and close your file. Check that your script matches the step_3_cam.py in this repository.

      Then, run the script:

      • python step_3_cam.py assets/catdog.jpg

      Your script will output the following:

      Output

      * Wrote heatmap to heatmap.jpg * Wrote heatmap on image to combined.jpg

      This will produce a heatmap.jpg and combined.jpg akin to the following images showing the heatmap and the heatmap combined with the cat/dog image.

      Heatmap highlighting
      Saliency map superimposed on top of the original image

      You have produced your first saliency map. We will end the article with more links and resources for generating other kinds of saliency maps. In the meantime, let us now explore a second approach to explainability—namely, making the model itself interpretable.

      Step 4 — Using Neural-Backed Decision Trees

      Decision Trees belong to a family of rule-based models. A decision tree is a data tree that displays possible decision pathways. Each prediction is the result of a series of predictions.

      Decision tree for hot dog, burger, super burger, waffle fries

      Instead of just outputting a prediction, each prediction also comes with justification. For example, to arrive at the conclusion of “Hotdog” for this figure the model must first ask: “Does it have a bun?”, then ask: “Does it have a sausage?” Each of these intermediate decisions can be verified or challenged separately. As a result, classic machine learning calls these rule-based systems “interpretable.”

      One question is: How are these rules created? Decision Trees warrant a far more detailed discussion of its own but in short, rules are created to “split classes as much as possible.” Formally, this is “maximizing information gain.” In the limit, maximizing this split makes sense: If the rules perfectly split classes, then our final predictions will always be correct.

      Now, we move on to using a neural network and decision tree hybrid. For more on decision trees, see Classification and Regression Trees (CART) overview.

      Now, we will run inference on a neural network and decision tree hybrid. As we will find, this gives us a different type of explainability: direct-model interpretability.

      Start by creating a new file called step_4_nbdt.py:

      First, add the Python boilerplate. Import the necessary packages and declare a main function. maybe_install_wordnet sets up a prerequisite that our program may need:

      step_4_nbdt.py

      """Run evaluation on a single image, using an NBDT"""
      
      from nbdt.model import SoftNBDT, HardNBDT
      from pytorchcv.models.wrn_cifar import wrn28_10_cifar10
      from torchvision import transforms
      from nbdt.utils import DATASET_TO_CLASSES, load_image_from_path, maybe_install_wordnet
      import sys
      
      maybe_install_wordnet()
      
      
      def main():
          pass
      
      
      if __name__ == '__main__':
          main()
      

      Start by loading the pretrained model, as before. Add the following before your main function:

      step_4_nbdt.py

      . . .
      def get_model():
          """Load pretrained NBDT"""
          model = wrn28_10_cifar10()
          model = HardNBDT(
            pretrained=True,
            dataset="CIFAR10",
            arch="wrn28_10_cifar10",
            model=model)
          return model
      . . .
      

      This function does the following:

      1. Creates a new model called WideResNet wrn28_10_cifar10().
      2. Next, it creates the neural-backed decision tree variant of that model, by wrapping it with HardNBDT(..., model=model).

      This concludes model loading.

      Next, load and preprocess the image for model inference. Add the following before your main function:

      step_4_nbdt.py

      . . .
      def load_image():
          """Load + transform image"""
          assert len(sys.argv) > 1, "Need to pass image URL or image path as argument"
          im = load_image_from_path(sys.argv[1])
          transform = transforms.Compose([
            transforms.Resize(32),
            transforms.CenterCrop(32),
            transforms.ToTensor(),
            transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
          ])
          x = transform(im)[None]
          return x
      . . .
      

      In load_image, you start by loading the image from the provided URL, using a custom utility method called load_image_from_path. Next, you define a number of different transformations to apply to the images that are passed to your neural network:

      • transforms.Resize(32): Resizes the smaller side of the image to 32. For example, if your image is 448 x 672, this operation would downsample the image to 32 x 48.
      • transforms.CenterCrop(224): Takes a crop from the center of the image, of size 32 x 32.
      • transforms.ToTensor(): Converts the image into a PyTorch tensor. All PyTorch models require PyTorch tensors as input.
      • transforms.Normalize(mean=..., std=...): Standardizes your input by subtracting the mean, then dividing by the standard deviation. This is described more precisely in the torchvision documentation.

      Finally, apply the image transformations to the image transform(im)[None].

      Next, define a utility function to log both the prediction and the intermediate decisions that led up to it. Place this before your main function:

      step_4_nbdt.py

      . . .
      def print_explanation(outputs, decisions):
          """Print the prediction and decisions"""
          _, predicted = outputs.max(1)
          cls = DATASET_TO_CLASSES['CIFAR10'][predicted[0]]
          print('Prediction:', cls, '// Decisions:', ', '.join([
              '{} ({:.2f}%)'.format(info['name'], info['prob'] * 100) for info in decisions[0]
          ][1:]))  # [1:] to skip the root
      . . .
      

      The print_explanations function computes and logs predictions and decisions:

      1. Starts by computing the index of the highest probability class outputs.max(1).
      2. Then, it converts that prediction into a human readable class name using the dictionary DATASET_TO_CLASSES['CIFAR10'][predicted[0]].
      3. Finally, it prints the prediction cls and the decisions info['name'], info['prob']....

      Conclude the script by populating the main with utilities we have written so far:

      step_4_nbdt.py

      . . .
      def main():
          model = get_model()
          x = load_image()
          outputs, decisions = model.forward_with_decisions(x)  # use `model(x)` to obtain just logits
          print_explanation(outputs, decisions)
      

      We perform model inference with explanations in several steps:

      1. Load the model get_model.
      2. Load the image load_image.
      3. Run model inference model.forward_with_decisions.
      4. Finally, print the prediction and explanations print_explanations.

      Close your file, and double-check your file contents matches step_4_nbdt.py. Then, run your script on the photo from earlier of two pets side-by-side.

      • python step_4_nbdt.py assets/catdog.jpg

      This will output the following, both the prediction and the corresponding justifications.

      Output

      Prediction: cat // Decisions: animal (99.34%), chordate (92.79%), carnivore (99.15%), cat (99.53%)

      This concludes the neural-backed decision tree section.

      Conclusion

      You have now run two types of Explainable AI approaches: a post-hoc explanation like saliency maps and a modified interpretable model using a rule-based system.

      There are many explainability techniques not covered in this tutorial. For further reading, please be sure to check out other ways to visualize and interpret neural networks; the utilities number many, from debugging to debiasing to avoiding catastrophic errors. There are many applications for Explainable AI (XAI), from sensitive applications like medicine to other mission-critical systems in self-driving cars.



      Source link

      Cómo engañar a una red neural en Phyton 3


      El autor seleccionó a Dev Color para recibir una donación como parte del programa Write for DOnations.

      ¿Sería posible engañar a una red neural para la clasificación de animales? Engañar a un clasificador de animales puede tener algunas consecuencias, ¿pero si pudiésemos engañar a nuestro autenticador facial? ¿O al software del prototipo de un vehículo autónomo? Afortunadamente, legiones de ingenieros e investigaciones están entre un modelo de visión computarizada de un prototipo y los modelos de calidad de producción, en nuestros dispositivos móviles o en nuestros vehículos. Aun así, estos riesgos tienen implicaciones significativas y es importante tenerlos en cuenta como profesional del aprendizaje automático.

      En este tutorial, intentará “engañar” a un clasificador de animales. A medida que avanza en este tutorial, usará OpenCV, una biblioteca de visión de computadora, y PyTorch, una biblioteca de aprendizaje profundo. Cubrirá los siguientes temas en el campo asociado de aprendizaje automático contradictorio:

      • Cree un ejemplo contradictorio objetivo. Seleccione una imagen, digamos un perro. Seleccione una clase objetivo, por ejemplo un gato. Su objetivo es engañar a la red neural para que crea que el perro de la imagen es un gato.
      • Cree una defensa contradictoria. En resumen, proteja su red neural contra estas imágenes engañosas, sin que sepa cuál es el truco.

      Al final de este tutorial, tendrá una herramienta para engañar a las redes neurales y comprenderá cómo defenderse contra los trucos.

      Requisitos previos

      Para completar este tutorial, necesitará lo siguiente:

      Paso 1: Clonar su proyecto e instalar las dependencias

      Vamos a crear un espacio de trabajo para este proyecto e instalaremos las dependencias que va a necesitar. Llamará a su espacio de trabajo AdversarialML:

      Diríjase al directorio AdversarialML:

      Cree un directorio para albergar sus activos:

      • mkdir ~/AdversarialML/assets

      Luego, cree un nuevo entorno virtual para el proyecto:

      • python3 -m venv adversarialml

      Active su entorno:

      • source adversarialml/bin/activate

      A continuación, instale PyTorch, un marco de trabajo de aprendizaje profundo para Python que utilizaremos en este tutorial.

      En macOS, instale Pytorch con el siguiente comando:

      • python -m pip install torch==1.2.0 torchvision==0.4.0

      En Linux y Windows, utilice los siguientes comandos para una compilación solo de CPU:

      • pip install torch==1.2.0+cpu torchvision==0.4.0+cpu -f https://download.pytorch.org/whl/torch_stable.html
      • pip install torchvision

      Ahora instale los binarios empaquetados previamente para OpenCV y numpy, que son bibliotecas para la visión computarizada y el álgebra lineal, respectivamente. OpenCV ofrece utilidades como las rotaciones de imágenes y numpy ofrece utilidades de álgebra lineal, como la inversión de una matriz:

      • python -m pip install opencv-python==3.4.3.18 numpy==1.14.5

      En las distribuciones de Linux, deberá instalar libSM.so:

      • sudo apt-get install libsm6 libxext6 libxrender-dev

      Con las dependencias instaladas, vamos a ejecutar y clasificador de animales llamado ResNet18, que describiremos a continuación.

      Paso 2: Ejecutar un clasificador de animales preentrenado

      La biblioteca torchvision, la biblioteca de visión computarizada oficial para PyTorch, contiene versiones preentrenadas de redes neurales de visión computarizada usadas comúnmente. Estas redes neurales están entrenadas sobre ImageNet 2012, un conjunto de datos de 1,2 millones de imágenes de entrenamiento con 1000 clases. Estas clases incluyen vehículos, lugares y, sobre todo, animales. En este paso, ejecutará una de estas redes neurales preentrenadas, llamada ResNet18. Nos referiremos a ResNet18 entrenado en ImageNet como un “clasificador de animales”.

      ¿Qué es ResNet18? ResNet18 es la red neural más pequeña en una familia de redes neurales llamada redes neurales residuales, desarrollada por MSR (He et al.). En resumen, He descubrió que una red neural (denominada como una función f, con entrada x, y salida f(x) funcionaría mejor con una “conexión residual” x + f(x). Esta conexión residual se utiliza prolíficamente en redes neurales de última generación, incluso hoy en día. Por ejemplo, FBNetV2, FBNetV3.

      Descargue esta imagen de un perro con el siguiente comando:

      • wget -O assets/dog.jpg https://www.xpresservers.com/wp-content/uploads/2020/06/How-To-Trick-a-Neural-Network-in-Python-3.png

      Imagen de un corgi corriendo cerca de un estanque

      A continuación, descargue un archivo JSON para convertir el resultado de la red neural a un nombre de clase legible por el ser humano:

      • wget -O assets/imagenet_idx_to_label.json https://raw.githubusercontent.com/do-community/tricking-neural-networks/master/utils/imagenet_idx_to_label.json

      A continuación, cree una secuencia de comandos para ejecutar su modelo preentrenado sobre la imagen del perro. Cree un nuevo archivo llamado step_2_pretrained.py:

      • nano step_2_pretrained.py

      Primero, añada el texto estándar de Python importando los paquetes necesarios y declarando una función main:

      step_2_pretrained.py

      from PIL import Image
      import json
      import torchvision.models as models
      import torchvision.transforms as transforms
      import torch
      import sys
      
      def main():
          pass
      
      if __name__ == '__main__':
          main()
      

      A continuación, cargue la asignación desde el resultado de la red neural a nombres de clase legibles por el ser humano. Añada esto directamente tras sus declaraciones de importación y antes de su función main:

      step_2_pretrained.py

      . . .
      def get_idx_to_label():
          with open("assets/imagenet_idx_to_label.json") as f:
              return json.load(f)
      . . .
      

      Cree una función de transformación de imagen que garantizará que primero su imagen de entrada tenga las dimensiones correctas, y segundo que se haya normalizado correctamente. Añada la siguiente función directamente tras la última:

      step_2_pretrained.py

      . . .
      def get_image_transform():
          transform = transforms.Compose([
            transforms.Resize(224),
            transforms.CenterCrop(224),
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])
          ])
          return transform
      . . .
      

      En get_image_transform, define un número de diferentes transformaciones que aplicar a las imágenes que se pasan a su red neural.

      • transforms.Resize(224): cambia el tamaño del lado más pequeño de la imagen a 224. Por ejemplo, si su imagen es 448 x 672, esta operación reduciría la imagen a 224 x 336.
      • transforms.CenterCrop(224): hace un recorte desde el centro de la imagen, de un tamaño de 224 x 224.
      • transforms.ToTensor(): convierte la imagen a un tensor PyTorch. Todos los modelos requieren tensores PyTorch como entrada.
      • transforms.Normalize(mean=..., std=...): normaliza su entrada sustrayendo la media y, luego, dividiendo la desviación estándar. Esto se describe de forma más precisa en la documentación de torchvision.

      Añada una utilidad para predecir la clase animal, dada la imagen. Este método usa las utilidades anteriores para realizar la clasificación de animales:

      step_2_pretrained.py

      . . .
      def predict(image):
          model = models.resnet18(pretrained=True)
          model.eval()
      
          out = model(image)
      
          _, pred = torch.max(out, 1)  
          idx_to_label = get_idx_to_label()  
          cls = idx_to_label[str(int(pred))]  
          return cls
      . . .
      

      Aquí la función predict clasifica la imagen proporcionada usando una red neural preentrenada:

      • models.resnet18(pretrained=True): carga una red neural preentrenada llamada ResNet18.
      • model.eval(): modifica el modelo implementado para que se ejecute en modo “evaluación”. El único otro modo es el modo “entrenamiento”, pero el modo de entrenamiento no es necesario, ya que no está entrenando el modelo (es decir, actualizando los parámetros del modelo) en este tutorial.
      • out = model(image): ejecuta la red neural sobre la imagen transformada que se proporciona.
      • _, pred = torch.max(out, 1): la red neural da como resultado una probabilidad para cada clase posible. Este paso calcula el índice de la clase con la más alta probabilidad. Por ejemplo, si out = [0.4, 0.1, 0.2], entonces pred = 0.
      • idx_to_label = get_idx_to_label(): obtiene una asignación desde el índice de clase a nombres de clase legibles por el ser humano. Por ejemplo, la asignación podría ser {0: cat, 1: dog, 2: fish}.
      • cls = idx_to_label[str(int(pred))]: convierte el índice de clase predicho a un nombre de clase. Los ejemplos proporcionados en los últimos dos puntos arrojarían cls = idx_to_label[0] = 'cat.

      A continuación, tras la última función, añada una utilidad para cargar imágenes:

      step_2_pretrained.py

      . . .
      def load_image():
          assert len(sys.argv) > 1, 'Need to pass path to image'
          image = Image.open(sys.argv[1])
      
          transform = get_image_transform()
          image = transform(image)[None]
          return image
      . . .
      

      Esto cargará una imagen desde la ruta proporcionada en el primer argumento a la secuencia de comandos. transform(image)[None] aplica la secuencia de las transformaciones de la imagen definidas en las líneas anteriores.

      Finalmente, complete su función main con lo siguiente para cargar su imagen y clasificar el animal de la imagen:

      step_2_pretrained.py

      def main():
          x = load_image()
          print(f'Prediction: {predict(x)}')
      

      Compruebe que su archivo coincida con la secuencia de comandos final del paso 2 en step_2_pretrained.py en GitHub. Guarde y salga de su secuencia de comandos, y ejecute el clasificador de animales.

      • python step_2_pretrained.py assets/dog.jpg

      Esto producirá el siguiente resultado, lo que muestra que su clasificador de animales funciona como se espera:

      Output

      Prediction: Pembroke, Pembroke Welsh corgi

      Eso concluye ejecutar la interferencia con su modelo preentrenado. A continuación, verá un ejemplo contradictorio en acción engañando a una red neural con diferencias imperceptibles en la imagen.

      Paso 3: Probar un ejemplo contradictorio

      Ahora, sintetizará un ejemplo contradictorio, y probará la red neural en ese ejemplo. Para este tutorial, creará ejemplos contradictorios en formato x + r, donde x es la imagen original y r es cierta “perturbación”. Eventualmente creará la perturbación r usted mismo, pero, en este paso, descargará una que hemos creado de antemano. Comience descargando la perturbación r:

      • wget -O assets/adversarial_r.npy https://github.com/do-community/tricking-neural-networks/blob/master/outputs/adversarial_r.npy?raw=true

      Ahora componga la imagen con la perturbación. Cree un nuevo archivo llamado step_3_adversarial.py:

      • nano step_3_adversarial.py

      En este archivo, realizará el siguiente proceso de tres pasos para producir un ejemplo contradictorio:

      1. Transformar una imagen
      2. Aplicar la perturbación r
      3. Transformar a la inversa la imagen perturbada

      Al final del paso 3, tendrá una imagen contradictoria. Primero, importe los paquetes necesarios y declare una función main:

      step_3_adversarial.py

      from PIL import Image
      import torchvision.transforms as transforms
      import torch
      import numpy as np
      import os
      import sys
      
      from step_2_pretrained import get_idx_to_label, get_image_transform, predict, load_image
      
      
      def main():
          pass
      
      
      if __name__ == '__main__':
          main()
      

      A continuación, cree una “transformación de imagen” que invierta la transformación de la imagen anterior. Ponga esto tras sus importaciones, antes de la función main:

      step_3_adversarial.py

      . . .
      def get_inverse_transform():
          return transforms.Normalize(
              mean=[-0.485/0.229, -0.456/0.224, -0.406/0.255],  # INVERSE normalize images, according to https://pytorch.org/docs/stable/torchvision/models.html
              std=[1/0.229, 1/0.224, 1/0.255])
      . . .
      

      Como antes, la operación transforms.Normalize sustrae la media y divide por la desviación estándar (es decir, para la imagen original x, y = transforms.Normalize(mean=u, std=o) = (x - u) / o). Haga algo de álgebra y defina una nueva operación que invierta esta función normalizar (transforms.Normalize(mean=-u/o, std=1/o) = (y - -u/o) / 1/o = (y + u/o) o = yo + u = x).

      Como parte de la transformación inversa, añada un método que transforme un tensor PyTorch de vuelta a una imagen PIL. Añada esto tras la última función:

      step_3_adversarial.py

      . . .
      def tensor_to_image(tensor):
          x = tensor.data.numpy().transpose(1, 2, 0) * 255.  
          x = np.clip(x, 0, 255)
          return Image.fromarray(x.astype(np.uint8))
      . . .
      
      • tensor.data.numpy() convierte el tensor PyTorch en una matriz NumPy. .transpose(1, 2, 0) reordena (channels, width, height)en (height, width, channels). Esta matriz NumPy está aproximadamente en el intervalo (0, 1). Finalmente, multiplique por 255 para garantizar que la imagen está ahora en el intervalo (0, 255).
      • np.clip garantiza que todos los valores de la imagen están entre (0, 255).
      • x.asype(np.uint8) garantiza que todos los valores de la imagen sean enteros. Finalmente, Image.fromarray(...) crea un objeto de imagen PIL desde la matriz NumPy.

      A continuación, use estas utilidades para crear el ejemplo contradictorio con lo siguiente:

      step_3_adversarial.py

      . . .
      def get_adversarial_example(x, r):
          y = x + r
          y = get_inverse_transform()(y[0])
          image = tensor_to_image(y)
          return image
      . . .
      

      Esta función genera el ejemplo contradictorio descrito al inicio de la sección:

      1. y = x + r. Tome su perturbación r y añádala a la imagen original x.
      2. get_inverse_transform: obtenga y aplique la transformación de imagen inversa que definió hace varias líneas.
      3. tensor_to_image: por último, convierta el tensor PyTorch de vuelta a un objeto de imagen.

      Finalmente, modifique su función main para cargar la imagen, cargue la perturbación contradictoria r, aplique la perturbación, guarde el ejemplo contradictorio en el disco y ejecute la predicción sobre el ejemplo contradictorio:

      step_3_adversarial.py

      def main():
          x = load_image()
          r = torch.Tensor(np.load('assets/adversarial_r.npy'))
      
          # save perturbed image
          os.makedirs('outputs', exist_ok=True)
          adversarial = get_adversarial_example(x, r)
          adversarial.save('outputs/adversarial.png')
      
          # check prediction is new class
          print(f'Old prediction: {predict(x)}')
          print(f'New prediction: {predict(x + r)}')
      

      Su archivo completado debería coincidir con step_3_adversarial.py en GitHub. Guarde el archivo, salga del editor e inicie su secuencia de comandos con:

      • python step_3_adversarial.py assets/dog.jpg

      Verá este resultado:

      Output

      Old prediction: Pembroke, Pembroke Welsh corgi New prediction: goldfish, Carassius auratus

      Ahora ha creado un ejemplo contradictorio: engañar a la red neural para que crea que un corgi es un pez dorado. En el siguiente paso, creará la perturbación r que utilizó aquí.

      Paso 4: Comprender un ejemplo contradictorio

      Para obtener una preparación sobre la clasificación, consulte “Cómo crear un filtro de perro basado en emociones”.

      Dando un paso atrás, recuerde que su modelo de clasificación produce una probabilidad para cada clase. Durante la inferencia, el modelo predice la clase con la mayor probabilidad. Durante el entrenamiento, actualiza los parámetros del modelo t para maximizar la probabilidad de la clase correcta y, según sus datos x.

      argmax_y P(y|x,t)
      

      Sin embargo, para generar ejemplos contradictorios, ahora modifica su objetivo. En vez de encontrar una clase, su objetivo ahora es encontrar una nueva imagen, x. Tome cualquier clase distinta a la correcta. Vamos a llamar a esta nueva clase w. Su nuevo objetivo es maximizar la probabilidad de tener una clase equivocada.

      argmax_x P(w|x)
      

      Observe que las ponderaciones t de la red neural faltan de la expresión anterior. Esto es porque ahora asume la función de la contradicción: alguien más ha entrenado e implementado un modelo. Solo se le permite crear entradas contradictorias y no se le permite modificar el modelo implementado. Para generar el ejemplo contradictorio x, puede ejecutar “entrenamiento”, excepto que en vez de actualizar las ponderaciones de la red neural, actualiza la imagen de entrada con el nuevo objetivo.

      Como recordatorio, para este tutorial, asume que el ejemplo contradictorio es una transformación afín de x. En otras palabras, su ejemplo contradictorio toma la forma x + r para algunos r. En el siguiente paso, escribirá secuencia de comandos para generar este r.

      Paso 5: Crear un ejemplo contradictorio

      En este paso, aprenderá una perturbación r, de forma que su corgi esté mal clasificado como un pez dorado. Cree un nuevo archivo llamado step_5_adversarial.py:

      Importe los paquetes necesarios y declare una función main:

      step_5_perturb.py

      from torch.autograd import Variable
      import torchvision.models as models
      import torch.nn as nn
      import torch.optim as optim
      import numpy as np
      import torch
      import os
      
      from step_2_pretrained import get_idx_to_label, get_image_transform, predict, load_image
      from step_3_adversarial import get_adversarial_example
      
      
      def main():
          pass
      
      
      if __name__ == '__main__':
          main()
      

      Directamente tras sus importaciones y antes de la función main, defina dos constantes:

      step_5_perturb.py

      . . .
      TARGET_LABEL = 1
      EPSILON = 10 / 255.
      . . .
      

      La primera constante TARGET_LABEL es la clase para clasificar erróneamente al corgi. En este caso, el índice 1 corresponde a “pez dorado”. La segunda constante EPSILON es la cantidad máxima de perturbación permitida para cada valor de imagen. Este límite se introduce de manera que la imagen se altere de forma imperceptible.

      Tras sus dos constantes, añada una función helper para definir una red neural y el parámetro perturbación r:

      step_5_perturb.py

      . . .
      def get_model():
          net = models.resnet18(pretrained=True).eval()
          r = nn.Parameter(data=torch.zeros(1, 3, 224, 224), requires_grad=True)
          return net, r
      . . .
      
      • model.resnet18(pretrained=True) carga una red neural preentrenada, llamada ResNet18, como antes. También como antes, establece el modelo para el modo de evaluación usando .eval.
      • nn.Parameter(...) define una nueva perturbación r, el tamaño de la imagen de entrada. La imagen de entrada también es de tamaño (1, 3, 224, 224). El argumento de palabra clave requires_grad=True garantiza que puede actualizar esta perturbación en líneas posteriores, en este archivo.

      A continuación, comience a modificar su función main. Comience cargando la red del modelo, cargando las entradas x y definiendo la etiqueta label:

      step_5_perturb.py

      . . .
      def main():
          print(f'Target class: {get_idx_to_label()[str(TARGET_LABEL)]}')
          net, r = get_model()
          x = load_image()
          labels = Variable(torch.Tensor([TARGET_LABEL])).long()
        . . .
      

      A continuación, defina tanto el criterio como el optimizador de su función main. El primero le indica a PyTorch cuál es el objetivo: es decir, qué pérdida minimizar. Este último le indica a PyTorch cómo entrenar su parámetro r:

      step_5_perturb.py

      . . .
          criterion = nn.CrossEntropyLoss()
          optimizer = optim.SGD([r], lr=0.1, momentum=0.1)
      . . .
      

      Justo después, añada el bucle de entrenamiento principal para su parámetro r:

      step_5_perturb.py

      . . .
          for i in range(30):
              r.data.clamp_(-EPSILON, EPSILON)
              optimizer.zero_grad()
      
              outputs = net(x + r)
              loss = criterion(outputs, labels)
              loss.backward()
              optimizer.step()
      
              _, pred = torch.max(outputs, 1)
              if i % 5 == 0:
                  print(f'Loss: {loss.item():.2f} / Class: {get_idx_to_label()[str(int(pred))]}')
      . . .
      

      En cada iteración de este bucle de entrenamiento, usted:

      • r.data.clamp_(...): asegúrese de que el parámetro r es pequeño, dentro de EPSILON de 0.
      • optimizer.zero_grad(): borre cualquier gradiente que haya calculado en la iteración anterior.
      • model(x + r): ejecute la inferencia sobre la imagen modificada x + r.
      • Calcule la pérdida.
      • Calcule el gradiente loss.backward.
      • Tome un paso de descenso de gradiente optimizer.step.
      • Calcule la predicción pred.
      • Finalmente, informe de la pérdida y la clase predicha print(...).

      A continuación, guarde la perturbación final r:

      step_5_perturb.py

      def main():
          . . .
          for i in range(30):
              . . .
          . . .
          np.save('outputs/adversarial_r.npy', r.data.numpy())
      

      Justo después, aún en la función main, guarde la imagen perturbada:

      step_5_perturb.py

      . . .
          os.makedirs('outputs', exist_ok=True)
          adversarial = get_adversarial_example(x, r)
      

      Finalmente, ejecute la predicción tanto sobre la imagen original como sobre el ejemplo contradictorio:

      step_5_perturb.py

          print(f'Old prediction: {predict(x)}')
          print(f'New prediction: {predict(x + r)}')
      

      Compruebe que su secuencia de comandos coincide con step_5_perturb.py en GitHub. Guarde, salga y ejecute la secuencia de comandos.

      • python step_5_perturb.py assets/dog.jpg

      El resultado de su secuencia de comandos será la siguiente.

      Output

      Target class: goldfish, Carassius auratus Loss: 17.03 / Class: Pembroke, Pembroke Welsh corgi Loss: 8.19 / Class: Pembroke, Pembroke Welsh corgi Loss: 5.56 / Class: Pembroke, Pembroke Welsh corgi Loss: 3.53 / Class: Pembroke, Pembroke Welsh corgi Loss: 1.99 / Class: Pembroke, Pembroke Welsh corgi Loss: 1.00 / Class: goldfish, Carassius auratus Old prediction: Pembroke, Pembroke Welsh corgi New prediction: goldfish, Carassius auratus

      Las últimas dos líneas indican que ahora ha completado la construcción de un ejemplo contradictorio desde cero. Su red neural ahora clasifica una imagen de corgi perfectamente razonable como un pez dorado.

      Ahora ha demostrado que las redes neurales pueden ser engañadas fácilmente; además, la falta de robustez para los ejemplos contradictorios tiene consecuencias significativas. Una pregunta natural es esta: ¿cómo puede combatir los ejemplos contradictorios? Varias organizaciones han llevado a cabo extensas investigaciones, incluyendo OpenAI. En la siguiente sección, ejecutará una defensa para frustrar este ejemplo contradictorio.

      Paso 6: Defenderse contra ejemplos contradictorios

      En este paso, implementará una defensa contra ejemplos contradictorios. La idea es la siguiente: ahora es el propietario del clasificador de animales implementado a producción. No sabe qué ejemplos contradictorios pueden generarse, pero puede modificar la imagen o el modelo para protegerse contra ataques.

      Antes de defender, debería ver por sí que la manipulación de imágenes es imperceptible. Abra las dos imágenes siguientes:

      1. assets/dog.jpg
      2. outputs/adversarial.png

      Aquí, muestra ambas juntas. Su imagen original tendrá una relación de aspecto diferente. ¿Sabe cuál es el ejemplo contradictorio?

      (izquierda) Corgi como pez dorado, contradictoria; (derecha) Corgi como sí mismo, no contradictoria

      Observe que la nueva imagen parece idéntica a la original. En realidad, la imagen izquierda es su imagen contradictoria. Para estar seguro, descargue la imagen y ejecute su secuencia de comandos de evaluación:

      • wget -O assets/adversarial.png https://github.com/alvinwan/fooling-neural-network/blob/master/outputs/adversarial.png?raw=true
      • python step_2_pretrained.py assets/adversarial.png

      Esto dará como resultado la clase goldfish para demostrar su naturaleza contradictoria:

      Output

      Prediction: goldfish, Carassius auratus

      Ejecutará una defensa bastante ingenua, pero eficaz: comprima la imagen escribiendo a un formato JPEG. Abra la instrucción interactiva Python:

      A continuación, cargue la imagen contradictoria como PNG y guárdela como JPEG.

      • from PIL import Image
      • image = Image.open('assets/adversarial.png')
      • image.save('outputs/adversarial.jpg')

      Escriba CTRL + D para dejar la instrucción interactiva Python. A continuación, ejecute la inferencia con su modelo en el ejemplo contradictorio comprimido:

      • python step_2_pretrained.py outputs/adversarial.jpg

      Ahora dará como resultado la clase corgi, demostrando la eficacia de su defensa ingenua.

      Output

      Prediction: Pembroke, Pembroke Welsh corgi

      Ahora ha completado su primera defensa contradictoria. Observe que esta defensa no requiere saber cómo se generó el ejemplo contradictorio. Esto es lo que hace que una defensa sea efectiva. Existen también muchas otras formas de defensa, muchas de las cuales implican volver a entrenar la red neural. Sin embargo, estos procedimientos de entrenamiento son un tema en sí mismos y están más allá del ámbito de este tutorial. Con esto, concluye su guía sobre el aprendizaje automático contradictorio.

      Conclusión

      Para comprender las implicaciones de su trabajo en este tutorial, vuelva a ver las dos imágenes lado a lado: la original y el ejemplo contradictorio.

      (izquierda) Corgi como pez dorado, contradictoria, (derecha) Corgi como sí mismo, no contradictoria

      A pesar de que ambas imágenes parecen idénticas al ojo humano, la primera ha sido manipulada para engañar a su modelo. Ambas imágenes claramente muestran un corgi, y aun así el modelo tiene total confianza de que el segundo modelo contiene un pez dorado. Esto debería preocuparle y, a medida que finaliza este tutorial, tenga en cuenta la fragilidad de su modelo. Solo aplicando una transformación simple, puede engañarlo. Estos son peligros reales y plausibles que evaden incluso con investigación de vanguardia. La investigación más allá de la seguridad del aprendizaje automático es igual de susceptible a estos defectos, como profesional, depende de usted aplicar el aprendizaje automático de forma segura. Para obtener más información, eche un vistazo a los siguientes enlaces:

      Para obtener más contenido sobre el aprendizaje automático, puede visitar nuestra página Tema de aprendizaje automático.



      Source link