

      How To Scrape Web Pages and Post Content to Twitter with Python 3


      The author selected The Computer History Museum to receive a donation as part of the Write for DOnations program.

      Introduction

      Twitter bots are a powerful way of managing your social media as well as extracting information from the microblogging network. By leveraging Twitter’s versatile APIs, a bot can do a lot of things: tweet, retweet, “favorite-tweet”, follow people with certain interests, reply automatically, and so on. Even though people can, and do, abuse their bot’s power, leading to a negative experience for other users, research shows that people view Twitter bots as a credible source of information. For example, a bot can keep your followers engaged with content even when you’re not online. Some bots even provide critical and helpful information, like @EarthquakesSF. The applications for bots are limitless. As of 2019, it is estimated that bots account for about 24% of all tweets on Twitter.

      In this tutorial, you’ll build a Twitter bot using this Twitter API library for Python. You’ll use API keys from your Twitter account to authorize your bot and build it to be capable of scraping content from two websites. Furthermore, you’ll program your bot to tweet content from these two websites alternately and at set time intervals. Note that you’ll use Python 3 in this tutorial.

      Prerequisites

      You will need the following to complete this tutorial:

      Note: You’ll be setting up a developer account with Twitter, which involves an application review by Twitter before you can access the API keys you require for this bot. Step 1 walks through the specific details for completing the application.

      Step 1 — Setting Up Your Developer Account and Accessing Your Twitter API Keys

      Before you begin coding your bot, you’ll need the API keys for Twitter to recognize the requests of your bot. In this step, you’ll set up your Twitter Developer Account and access your API keys for your Twitter bot.

      To get your API keys, head over to developer.twitter.com and register your bot application with Twitter by clicking on Apply in the top right section of the page.

      Now click on Apply for a developer account.

      Next, click on Continue to associate your Twitter username with your bot application that you’ll be building in this tutorial.

      Twitter Username Association with Bot

      On the next page, for the purposes of this tutorial, you’ll choose the I am requesting access for my own personal use option since you’ll be building a bot for your own personal education use.

      Twitter API Personal Use

      After choosing your Account Name and Country, move on to the next section. For What use case(s) are you interested in?, pick the Publish and curate Tweets and Student project / Learning to code options. These categories are the best representation of why you’re completing this tutorial.

      Twitter Bot Purpose

      Then provide a description of the bot you’re trying to build. Twitter requires this information to protect against bot abuse; they introduced such vetting in 2018. For this tutorial, you’ll be scraping tech-focused content from The New Stack and The Coursera Blog.

      When deciding what to enter into the description box, model your answer on the following lines for the purposes of this tutorial:

      I’m following a tutorial to build a Twitter bot that will scrape content from websites like thenewstack.io (The New Stack) and blog.coursera.org (Coursera’s Blog) and tweet quotes from them. The scraped content will be aggregated and will be tweeted in a round-robin fashion via Python generator functions.

      Finally, choose no for Will your product, service, or analysis make Twitter content or derived information available to a government entity?

      Twitter Bot Intent

      Next, accept Twitter’s terms and conditions, click on Submit application, and then verify your email address. Twitter will send a verification email to you after your submission of this form.

      Once you verify your email, you’ll get an Application under review page with a feedback form for the application process.

      You will also receive another email from Twitter regarding the review:

      Application Review Email

      The timeline for Twitter’s application review process can vary significantly; often Twitter will confirm your application within a few minutes, but if the review takes longer, that is not unusual, and you should receive a decision within a day or two. Once you receive confirmation, Twitter has authorized you to generate your keys. You can access these under the Keys and tokens tab after clicking the details button of your app on developer.twitter.com/apps.

      Finally go to the Permissions tab on your app’s page and set the Access Permission option to Read and Write since you want to write tweet content too. Usually, you would use the read-only mode for research purposes like analyzing trends, data-mining, and so on. The final option allows users to integrate chatbots into their existing apps, since chatbots require access to direct messages.

      Twitter App Permissions Page

      You have access to Twitter’s powerful API, which will be a crucial part of your bot application. Now you’ll set up your environment and begin building your bot.

      Step 2 — Building the Essentials

      In this step, you’ll write code to authenticate your bot with Twitter using the API keys, and make the first programmatic tweet via your Twitter handle. This will serve as a good milestone in your path towards the goal of building a Twitter bot that scrapes content from The New Stack and the Coursera Blog and tweets them periodically.

      First, you’ll set up a project folder and a specific programming environment for your project.

      Create your project folder:

      • mkdir bird

      Move into your project folder:

      • cd bird

      Then create a new Python virtual environment for your project:

      • python3 -m venv bird-env

      Then activate your environment using the following command:

      • source bird-env/bin/activate

      This will attach a (bird-env) prefix to the prompt in your terminal window.

      Now move to your text editor and create a file called credentials.py, which will store your Twitter API keys:

      Add the following content, replacing the highlighted code with your keys from Twitter:

      bird/credentials.py

      
      ACCESS_TOKEN='your-access-token'
      ACCESS_SECRET='your-access-secret'
      CONSUMER_KEY='your-consumer-key'
      CONSUMER_SECRET='your-consumer-secret'
      

      Now, you'll install the main API library for sending requests to Twitter. For this project, you'll require the following libraries: nltk, requests, twitter, lxml, random, and time. random and time are part of Python's standard library, so you don't need to separately install these libraries. To install the remaining libraries, you'll use pip, a package manager for Python.

      Open your terminal, ensure you're in the project folder, and run the following command:

      • pip3 install lxml nltk requests twitter
      • lxml and requests: You will use them for web scraping.
      • twitter: This is the library for making API calls to Twitter's servers.
      • nltk (natural language toolkit): You will use this to split paragraphs of blog posts into sentences.
      • random: You will use this to randomly select parts of an entire scraped blog post.
      • time: You will use this to make your bot sleep periodically after certain actions.

      Once you have installed the libraries, you're all set to begin programming. Now, you'll import your credentials into the main script that will run the bot. Alongside credentials.py, use your text editor to create another file in the bird project directory, and name it bot.py:

      In practice, you would spread the functionality of your bot across multiple files as it grows more and more sophisticated. However, in this tutorial, you'll put all of your code in a single script, bot.py, for demonstration purposes.

      First you'll test your API keys by authorizing your bot. Begin by adding the following snippet to bot.py:

      bird/bot.py

      import random
      import time
      
      from lxml.html import fromstring
      import nltk
      nltk.download('punkt')
      import requests
      from twitter import OAuth, Twitter
      
      import credentials
      

      Here, you import the required libraries; and in a couple of instances you import the necessary functions from the libraries. You will use the fromstring function later in the code to convert the string source of a scraped webpage to a tree structure that makes it easier to extract relevant information from the page. OAuth will help you in constructing an authentication object from your keys, and Twitter will build the main API object for all further communication with Twitter's servers.

      Now extend bot.py with the following lines:

      bird/bot.py

      ...
      tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
      
      oauth = OAuth(
              credentials.ACCESS_TOKEN,
              credentials.ACCESS_SECRET,
              credentials.CONSUMER_KEY,
              credentials.CONSUMER_SECRET
          )
      t = Twitter(auth=oauth)
      

      nltk.download('punkt') downloads a dataset necessary for parsing paragraphs and tokenizing (splitting) them into smaller components. tokenizer is the object you'll use later in the code for splitting paragraphs written in English.

      oauth is the authentication object constructed by feeding the imported OAuth class with your API keys. You authenticate your bot via the line t = Twitter(auth=oauth). CONSUMER_KEY and CONSUMER_SECRET identify your application, while ACCESS_TOKEN and ACCESS_SECRET identify the Twitter handle through which the application interacts with Twitter. You'll use this t object to communicate your requests to Twitter.

      Now save this file and run it in your terminal using the following command:

      • python3 bot.py

      Your output will look similar to the following, which means your authorization was successful:

      Output

      [nltk_data] Downloading package punkt to /Users/binaryboy/nltk_data...
      [nltk_data] Package punkt is already up-to-date!

      If you do receive an error, verify your saved API keys with those in your Twitter developer account and try again. Also ensure that the required libraries are installed correctly. If not, use pip3 again to install them.
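
      As an optional check, you can also ask Twitter which account your keys belong to. The following is a minimal sketch, assuming the twitter library maps the account/verify_credentials endpoint to an attribute chain in the same way it maps statuses/update; in a Python session where bot.py has run (for example, python3 -i bot.py), type:

      >>> me = t.account.verify_credentials()
      >>> print(me['screen_name'])
      your_twitter_handle

      If the keys are invalid, this call will fail with an authorization error instead of printing your handle.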

      Now you can try tweeting something programmatically. Type the same command on the terminal with the -i flag to open the Python interpreter after the execution of your script:

      • python3 -i bot.py

      Next, type the following to send a tweet via your account:

      • t.statuses.update(status="Just setting up my Twttr bot")

      Now open your Twitter timeline in a browser, and you'll see a tweet at the top of your timeline containing the content you posted.

      First Programmatic Tweet

      Close the interpreter by typing quit() or CTRL + D.

      Your bot now has the fundamental capability to tweet. To develop your bot to tweet useful content, you'll incorporate web scraping in the next step.

      Step 3 — Scraping Websites for Your Tweet Content

      To introduce some more interesting content to your timeline, you'll scrape content from the New Stack and the Coursera Blog, and then post this content to Twitter in the form of tweets. Generally, to scrape the appropriate data from your target websites, you have to experiment with their HTML structure. Each tweet coming from the bot you'll build in this tutorial will have a link to a blog post from the chosen websites, along with a random quote from that blog. You'll implement this procedure within a function specific to scraping content from Coursera, so you'll name it scrape_coursera().

      First open bot.py:

      Add the scrape_coursera() function to the end of your file:

      bird/bot.py

      ...
      t = Twitter(auth=oauth)
      
      
      def scrape_coursera():
      

      To scrape information from the blog, you'll first request the relevant webpage from Coursera's servers. For that you will use the get() function from the requests library. get() takes in a URL and fetches the corresponding webpage. So, you'll pass blog.coursera.org as an argument to get(). But you also need to provide a header in your GET request, which will ensure Coursera's servers recognize you as a genuine client. Add the following highlighted lines to your scrape_coursera() function to provide a header:

      bird/bot.py

      def scrape_coursera():
          HEADERS = {
              'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5)'
                            ' AppleWebKit/537.36 (KHTML, like Gecko) Safari/537.36'
              }
      

      This header will contain information pertaining to a defined web browser running on a specific operating system. As long as this information (usually referred to as User-Agent) corresponds to real web browsers and operating systems, it doesn't matter whether the header information aligns with the actual web browser and operating system on your computer. Therefore this header will work fine for all systems.

      Once you have defined the headers, add the following highlighted lines to make a GET request to Coursera by specifying the URL of the blog webpage:

      bird/bot.py

      ...
      def scrape_coursera():
          HEADERS = {
              'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5)'
                            ' AppleWebKit/537.36 (KHTML, like Gecko) Safari/537.36'
              }
          r = requests.get('https://blog.coursera.org', headers=HEADERS)
          tree = fromstring(r.content)
      

      This will fetch the webpage to your machine and save the information from the entire webpage in the variable r. You can access the HTML source code of the webpage using the content attribute of r. Therefore, the value of r.content is the same as what you see when you inspect the webpage in your browser by right clicking on the page and choosing the Inspect Element option.

      Here you've also added the fromstring function. You can pass the webpage's source code to the fromstring function imported from the lxml library to construct the tree structure of the webpage. This tree structure will allow you to conveniently access different parts of the webpage. HTML source code has a particular tree-like structure; every element is enclosed in the <html> tag and nested thereafter.
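
      To make the idea concrete, here is a small standalone sketch that builds a tree from a made-up HTML snippet (not Coursera's real markup) and queries it with XPath:

      from lxml.html import fromstring

      # A tiny, hypothetical HTML document, for illustration only.
      html = '''
      <html>
        <body>
          <div class="recent">
            <div class="title"><a href="https://example.com/post-1">Post 1</a></div>
            <div class="title"><a href="https://example.com/post-2">Post 2</a></div>
          </div>
        </body>
      </html>
      '''

      tree = fromstring(html)
      # XPath lets you address nested elements concisely.
      print(tree.xpath('//div[@class="title"]/a/@href'))
      # ['https://example.com/post-1', 'https://example.com/post-2']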

      Now, open https://blog.coursera.org in a browser and inspect its HTML source using the browser's developer tools. Right click on the page and choose the Inspect Element option. You'll see a window appear at the bottom of the browser, showing part of the page's HTML source code.

      browser-inspect

      Next, right click on the thumbnail of any visible blog post and then inspect it. The HTML source will highlight the relevant HTML lines where that blog thumbnail is defined. You'll notice that all blog posts on this page are defined within a <div> tag with a class of "recent":

      blog-div

      Thus, in your code, you'll use all such blog post div elements via their XPath, which is a convenient way of addressing elements of a web page.

      To do so, extend your function in bot.py as follows:

      bird/bot.py

      ...
      def scrape_coursera():
          HEADERS = {
              'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5)'
                            ' AppleWebKit/537.36 (KHTML, like Gecko) Safari/537.36'
                          }
          r = requests.get('https://blog.coursera.org', headers=HEADERS)
          tree = fromstring(r.content)
          links = tree.xpath('//div[@class="recent"]//div[@class="title"]/a/@href')
          print(links)
      
      scrape_coursera()
      

      Here, the XPath (the string passed to tree.xpath()) communicates that you want div elements from the entire web page source, of class "recent". The // corresponds to searching the whole webpage, div tells the function to extract only the div elements, and [@class="recent"] asks it to only extract those div elements that have the values of their class attribute as "recent".

      However, you don't need these elements themselves; you only need the links they point to, so that you can access the individual blog posts and scrape their content. Therefore, you extract all the links by reading the href attributes of the anchor (a) tags nested within those blog post div elements.

      To test your program so far, you call the scrape_coursera() function at the end of bot.py.

      Save and exit bot.py.

      Now run bot.py with the following command:

      In your output, you'll see a list of URLs like the following:

      Output

      ['https://blog.coursera.org/career-stories-from-inside-coursera/', 'https://blog.coursera.org/unlock-the-power-of-data-with-python-university-of-michigan-offers-new-programming-specializations-on-coursera/', ...]

      After you verify the output, you can remove the last two highlighted lines from the bot.py script:

      bird/bot.py

      ...
      def scrape_coursera():
          ...
          tree = fromstring(r.content)
          links = tree.xpath('//div[@class="recent"]//div[@class="title"]/a/@href')
          ~~print(links)~~
      
      ~~scrape_coursera()~~
      

      Now extend the function in bot.py with the following highlighted line to extract the content from a blog post:

      bird/bot.py

      ...
      def scrape_coursera():
          ...
          links = tree.xpath('//div[@class="recent"]//div[@class="title"]/a/@href')
          for link in links:
              r = requests.get(link, headers=HEADERS)
              blog_tree = fromstring(r.content)
      

      You iterate over each link, fetch the corresponding blog post, extract a random sentence from the post, and then tweet this sentence as a quote, along with the corresponding URL. Extracting a random sentence involves three parts:

      1. Grabbing all the paragraphs in the blog post as a list.
      2. Selecting a paragraph at random from the list of paragraphs.
      3. Selecting a sentence at random from this paragraph.

      You'll execute these steps for each blog post. To fetch a blog post, you make a GET request to its link.

      Now that you have access to the content of a blog, you will introduce the code that executes these three steps to extract the content you want from it. Add the following extension to your scraping function that executes the three steps:

      bird/bot.py

      ...
      def scrape_coursera():
          ...
          for link in links:
              r = requests.get(link, headers=HEADERS)
              blog_tree = fromstring(r.content)
              paras = blog_tree.xpath('//div[@class="entry-content"]/p')
              paras_text = [para.text_content() for para in paras if para.text_content()]
              para = random.choice(paras_text)
              para_tokenized = tokenizer.tokenize(para)
              for _ in range(10):
                  text = random.choice(para_tokenized)
                  if text and 60 < len(text) < 210:
                      break
      

      If you inspect the blog post by opening the first link, you'll notice that all the paragraphs belong to the div tag having entry-content as its class. Therefore, you extract all paragraphs as a list with paras = blog_tree.xpath('//div[@class="entry-content"]/p').

      Div Enclosing Paragraphs

      The list elements aren't literal paragraphs; they are Element objects. To extract the text out of these objects, you use the text_content() method. This line follows Python's list comprehension design pattern, which defines a collection using a loop that is usually written out in a single line. In bot.py, you extract the text for each paragraph element object and store it in a list if the text is not empty. To randomly choose a paragraph from this list of paragraphs, you incorporate the random module.

      Finally, you have to select a sentence at random from this paragraph, which is stored in the variable para. For this task, you first break the paragraph into sentences. One approach is to use Python's split() method. However, this can be difficult, since a sentence can break at multiple points; for example, periods also appear inside abbreviations. Therefore, to simplify your splitting tasks, you leverage natural language processing through the nltk library. The tokenizer object you defined earlier in the tutorial will be useful for this purpose.
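
      As a quick illustration, the following standalone sketch (with a made-up paragraph) shows the difference: a naive split on periods breaks the abbreviation "Dr.", while the punkt tokenizer keeps both sentences intact:

      import nltk

      nltk.download('punkt')
      tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')

      sample = "Dr. Smith teaches online. The course starts in May."

      print(sample.split('. '))
      # ['Dr', 'Smith teaches online', 'The course starts in May.']

      print(tokenizer.tokenize(sample))
      # ['Dr. Smith teaches online.', 'The course starts in May.']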

      Now that you have a list of sentences, you call random.choice() to extract a random sentence. You want this sentence to be a quote for a tweet, so it can't exceed 280 characters. However, for aesthetic reasons, you'll select a sentence that is neither too big nor too small. You designate that your tweet sentence should have a length between 60 and 210 characters. The sentence random.choice() picks might not satisfy this criterion. To identify the right sentence, your script will make ten attempts, checking for the criterion each time. Once a randomly picked sentence satisfies your criterion, you can break out of the loop.

      Although the probability is quite low, it is possible that none of the sentences meet this size condition within ten attempts. In this case, you'll ignore the corresponding blog post and move on to the next one.

      Now that you have a sentence to quote, you can tweet it with the corresponding link. You can do this by yielding a string that contains the randomly picked-up sentence as well as the corresponding blog link. The code that calls this scrape_coursera() function will then post the yielded string to Twitter via Twitter's API.

      Extend your function as follows:

      bird/bot.py

      ...
      def scrape_coursera():
          ...
          for link in links:
              ...
              para_tokenized = tokenizer.tokenize(para)
              for _ in range(10):
                  text = random.choice(para_tokenized)
                  if text and 60 < len(text) < 210:
                      break
              else:
                  yield None
              yield '"%s" %s' % (text, link)
      

      The script only executes the else statement when the preceding for loop doesn't break. Thus, it only happens when the loop is not able to find a sentence that fits your size condition. In that case, you simply yield None so that the code that calls this function is able to determine that there is nothing to tweet. It will then move on to call the function again and get the content for the next blog link. But if the loop does break it means the function has found an appropriate sentence; the script will not execute the else statement, and the function will yield a string composed of the sentence as well as the blog link, separated by a single whitespace.
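
      If Python's for ... else construct is unfamiliar, the following standalone sketch shows the behavior: the else block runs only when the loop completes without hitting break:

      for number in [1, 3, 5]:
          if number % 2 == 0:
              print('Found an even number:', number)
              break
      else:
          # Runs only because the loop above never breaks.
          print('No even number found')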

      The implementation of the scrape_coursera() function is almost complete. If you want to make a similar function to scrape another website, you will have to repeat some of the code you've written for scraping Coursera's blog. To avoid rewriting and duplicating parts of the code and to ensure your bot's script follows the DRY principle (Don't Repeat Yourself), you'll identify and abstract out parts of the code that you will use again and again for any scraper function written later.

      Regardless of the website the function is scraping, you'll have to randomly pick up a paragraph and then choose a random sentence from this chosen paragraph — you can extract out these functionalities in separate functions. Then you can simply call these functions from your scraper functions and achieve the desired result. You can also define HEADERS outside the scrape_coursera() function so that all of the scraper functions can use it. Therefore, in the code that follows, the HEADERS definition should precede that of the scraper function, so that eventually you're able to use it for other scrapers:

      bird/bot.py

      ...
      HEADERS = {
          'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5)'
                        ' AppleWebKit/537.36 (KHTML, like Gecko) Safari/537.36'
          }
      
      
      def scrape_coursera():
          r = requests.get('https://blog.coursera.org', headers=HEADERS)
          ...
      

      Now you can define the extract_paratext() function to extract a random paragraph from a list of paragraph objects. The list of paragraph objects will pass to the function as its paras argument, and the function will return the chosen paragraph's tokenized form, which you'll use later for sentence extraction:

      bird/bot.py

      ...
      HEADERS = {
              'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5)'
                            ' AppleWebKit/537.36 (KHTML, like Gecko) Safari/537.36'
              }
      
      def extract_paratext(paras):
          """Extracts text from <p> elements and returns a clean, tokenized random
          paragraph."""
      
          paras = [para.text_content() for para in paras if para.text_content()]
          para = random.choice(paras)
          return tokenizer.tokenize(para)
      
      
      def scrape_coursera():
          r = requests.get('https://blog.coursera.org', headers=HEADERS)
          ...
      

      Next, you will define a function that will extract a random sentence of suitable length (between 60 and 210 characters) from the tokenized paragraph it receives as an argument, which you can name para. If such a sentence is not discovered after ten attempts, the function returns None instead. Add the following highlighted code to define the extract_text() function:

      bird/bot.py

      ...
      
      def extract_paratext(paras):
          ...
          return tokenizer.tokenize(para)
      
      
      def extract_text(para):
          """Returns a sufficiently-large random text from a tokenized paragraph,
          if such text exists. Otherwise, returns None."""
      
          for _ in range(10):
              text = random.choice(para)
              if text and 60 < len(text) < 210:
                  return text
      
          return None
      
      
      def scrape_coursera():
          r = requests.get('https://blog.coursera.org', headers=HEADERS)
          ...
      

      Once you have defined these new helper functions, you can redefine the scrape_coursera() function to look as follows:

      bird/bot.py

      ...
      def extract_text(para):
          ...
          return None
      
      
      def scrape_coursera():
          """Scrapes content from the Coursera blog."""
      
          url = 'https://blog.coursera.org'
          r = requests.get(url, headers=HEADERS)
          tree = fromstring(r.content)
          links = tree.xpath('//div[@class="recent"]//div[@class="title"]/a/@href')
      
          for link in links:
              r = requests.get(link, headers=HEADERS)
              blog_tree = fromstring(r.content)
              paras = blog_tree.xpath('//div[@class="entry-content"]/p')
              para = extract_paratext(paras)
              text = extract_text(para)
              if not text:
                  continue
      
              yield '"%s" %s' % (text, link)
      

      Save and exit bot.py.

      Here you're using yield instead of return because, for iterating over the links, the scraper function will give you the tweet strings one-by-one in a sequential fashion. This means when you make a first call to the scraper sc defined as sc = scrape_coursera(), you will get the tweet string corresponding to the first link among the list of links that you computed within the scraper function. If you run the following code in the interpreter, you'll get string_1 and string_2 as displayed below, if the links variable within scrape_coursera() holds a list that looks like ["https://thenewstack.io/cloud-native-live-twistlocks-virtual-conference/", "https://blog.coursera.org/unlock-the-power-of-data-with-python-university-of-michigan-offers-new-programming-specializations-on-coursera/", ...].

      Instantiate the scraper and call it sc:

      >>> sc = scrape_coursera()
      

      It is now a generator; it generates or scrapes relevant content from Coursera, one at a time. You can access the scraped content one-by-one by calling next() over sc sequentially:

      >>> string_1 = next(sc)
      >>> string_2 = next(sc)
      

      Now you can print the strings you've defined to display the scraped content:

      >>> print(string_1)
      "Other speakers include Priyanka Sharma, director of cloud native alliances at GitLab and Dan Kohn, executive director of the Cloud Native Computing Foundation." https://thenewstack.io/cloud-native-live-twistlocks-virtual-conference/
      >>>
      >>> print(string_2)
      "You can learn how to use the power of Python for data analysis with a series of courses covering fundamental theory and project-based learning." https://blog.coursera.org/unlock-the-power-of-data-with-python-university-of-michigan-offers-new-programming-specializations-on-coursera/
      >>>
      

      If you use return instead, you will not be able to obtain the strings one-by-one and in a sequence. If you simply replace the yield with return in scrape_coursera(), you'll always get the string corresponding to the first blog post, instead of getting the first one in the first call, second one in the second call, and so on. You can modify the function to simply return a list of all the strings corresponding to all the links, but that is more memory intensive. Also, this kind of program could potentially make a lot of requests to Coursera's servers within a short span of time if you want the entire list quickly. This could result in your bot getting temporarily banned from accessing a website. Therefore, yield is the best fit for a wide variety of scraping jobs, where you only need information scraped one-at-a-time.
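
      To see the difference in isolation, consider this small standalone sketch, which is unrelated to the scraper itself:

      def numbers_with_yield():
          for n in [1, 2, 3]:
              yield n


      def numbers_with_return():
          for n in [1, 2, 3]:
              return n  # exits the function on the first iteration


      gen = numbers_with_yield()
      print(next(gen), next(gen))   # 1 2 -- values arrive one at a time, on demand
      print(numbers_with_return())  # 1  -- always just the first value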

      Step 4 — Scraping Additional Content

      In this step, you'll build a scraper for thenewstack.io. The process is similar to what you've completed in the previous step, so this will be a quick overview.

      Open the website in your browser and inspect the page source. You'll find here that all blog sections are div elements of class normalstory-box.

      HTML Source Inspection of The New Stack website

      Now you'll make a new scraper function named scrape_thenewstack() and make a GET request to thenewstack.io from within it. Next, extract the links to the blogs from these elements and then iterate over each link. Add the following code to achieve this:

      bird/bot.py

      ...
      def scrape_coursera():
          ...
          yield '"%s" %s' % (text, link)
      
      
      def scrape_thenewstack():
          """Scrapes news from thenewstack.io"""
      
          r = requests.get('https://thenewstack.io', verify=False)
      
          tree = fromstring(r.content)
          links = tree.xpath('//div[@class="normalstory-box"]/header/h2/a/@href')
          for link in links:
      

      You use the verify=False flag because websites can sometimes have expired security certificates and it's OK to access them if no sensitive data is involved, as is the case here. The verify=False flag tells the requests.get method to not verify the certificates and continue fetching data as usual. Otherwise, the method throws an error about expired security certificates.
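
      Note that requests will usually print an InsecureRequestWarning for every unverified request. If you prefer a quieter log, one option (a small sketch, not part of the original steps) is to silence that specific warning through urllib3, which requests uses under the hood; you would place these lines near the top of bot.py, after the other imports:

      import urllib3

      # Silence only the warning emitted for unverified HTTPS requests.
      urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)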

      You can now extract the paragraphs of the blog corresponding to each link, and use the extract_paratext() function you built in the previous step to pull out a random paragraph from the list of available paragraphs. Finally, extract a random sentence from this paragraph using the extract_text() function, and then yield it with the corresponding blog link. Add the following highlighted code to your file to accomplish these tasks:

      bird/bot.py

      ...
      def scrape_thenewstack():
          ...
          links = tree.xpath('//div[@class="normalstory-box"]/header/h2/a/@href')
      
          for link in links:
              r = requests.get(link, verify=False)
              tree = fromstring(r.content)
              paras = tree.xpath('//div[@class="post-content"]/p')
              para = extract_paratext(paras)
              text = extract_text(para)  
              if not text:
                  continue
      
              yield '"%s" %s' % (text, link)
      

      You now have an idea of what a scraping process generally encompasses. You can now build your own, custom scrapers that can, for example, scrape the images in blog posts instead of random quotes. For that, you can look for the relevant <img> tags. Once you have the right path for tags, which serve as their identifiers, you can access the information within tags using the names of corresponding attributes. For example, in the case of scraping images, you can access the links of images using their src attributes.
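
      For instance, a sketch of such an image scraper might look like the following; the URL, XPath, and class name here are purely illustrative and not taken from either website:

      import requests
      from lxml.html import fromstring

      # Hypothetical blog post URL, for illustration only.
      r = requests.get('https://example.com/some-blog-post')
      tree = fromstring(r.content)

      # Illustrative XPath: collect the src attribute of every <img> element
      # inside a hypothetical post body container.
      image_urls = tree.xpath('//div[@class="post-body"]//img/@src')
      for image_url in image_urls:
          print(image_url)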

      At this point, you've built two scraper functions for scraping content from two different websites, and you've also built two helper functions to reuse functionalities that are common across the two scrapers. Now that your bot knows how to tweet and what to tweet, you'll write the code to tweet the scraped content.

      Step 5 — Tweeting the Scraped Content

      In this step, you'll extend the bot to scrape content from the two websites and tweet it via your Twitter account. More precisely, you want it to tweet content from the two websites alternately, and at regular intervals of ten minutes, for an indefinite period of time. Thus, you will use an infinite while loop to implement the desired functionality. You'll do this as part of a main() function, which will implement the core high-level process that you'll want your bot to follow:

      bird/bot.py

      ...
      def scrape_thenewstack():
          ...
          yield '"%s" %s' % (text, link)
      
      
      def main():
          """Encompasses the main loop of the bot."""
          print('---Bot started---\n')
          news_funcs = ['scrape_coursera', 'scrape_thenewstack']
          news_iterators = []  
          for func in news_funcs:
              news_iterators.append(globals()[func]())
          while True:
              for i, iterator in enumerate(news_iterators):
                  try:
                      tweet = next(iterator)
                      t.statuses.update(status=tweet)
                      print(tweet, end='\n\n')
                      time.sleep(600)  
                  except StopIteration:
                      news_iterators[i] = globals()[news_funcs[i]]()
      

      You first create a list of the names of the scraping functions you defined earlier, and name it news_funcs. Then you create an empty list that will hold the actual scraper iterators, and name that list news_iterators. You then populate it by going through each name in the news_funcs list and appending the corresponding iterator to the news_iterators list. You're using Python's built-in globals() function. This returns a dictionary that maps variable names to actual variables within your script. An iterator is what you get when you call a scraper function: for example, if you write coursera_iterator = scrape_coursera(), then coursera_iterator will be an iterator on which you can invoke next() calls. Each next() call will return a string containing a quote and its corresponding link, exactly as defined in the scrape_coursera() function's yield statement. Each next() call goes through one iteration of the for loop in the scrape_coursera() function. Thus, you can only make as many next() calls as there are blog links in the scrape_coursera() function. Once you exceed that number, a StopIteration exception will be raised.
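
      If Python's globals() function is new to you, here is a tiny standalone sketch of the look-up-by-name pattern used above:

      def greet():
          return 'hello'

      # globals() returns a dict mapping names to objects in the current module,
      # so a function can be retrieved by its string name and then called.
      say_hello = globals()['greet']
      print(say_hello())  # prints: hello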

      Once both the iterators populate the news_iterators list, the main while loop starts. Within it, you have a for loop that goes through each iterator and tries to obtain the content to be tweeted. After obtaining the content, your bot tweets it and then sleeps for ten minutes. If the iterator has no more content to offer, a StopIteration exception is raised, upon which you refresh that iterator by re-instantiating it, to check for the availability of newer content on the source website. Then you move on to the next iterator, if available. Otherwise, if execution reaches the end of the iterators list, you restart from the beginning and tweet the next available content. This makes your bot tweet content alternately from the two scrapers for as long as you want.

      All that remains now is to make a call to the main() function. You do this when the script is called directly by the Python interpreter:

      bird/bot.py

      ...
      def main():
          print('---Bot started---\n')
          news_funcs = ['scrape_coursera', 'scrape_thenewstack']
          ...
      
      if __name__ == "__main__":  
          main()
      

      The following is a completed version of the bot.py script. You can also view the script on this GitHub repository.

      bird/bot.py

      
      """Main bot script - bot.py
      For the DigitalOcean Tutorial.
      """
      
      
      import random
      import time
      
      
      from lxml.html import fromstring
      import nltk  
      nltk.download('punkt')
      import requests  
      
      from twitter import OAuth, Twitter
      
      
      import credentials
      
      tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
      
      oauth = OAuth(
              credentials.ACCESS_TOKEN,
              credentials.ACCESS_SECRET,
              credentials.CONSUMER_KEY,
              credentials.CONSUMER_SECRET
          )
      t = Twitter(auth=oauth)
      
      HEADERS = {
              'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5)'
                            ' AppleWebKit/537.36 (KHTML, like Gecko) Safari/537.36'
              }
      
      
      def extract_paratext(paras):
          """Extracts text from <p> elements and returns a clean, tokenized random
          paragraph."""
      
          paras = [para.text_content() for para in paras if para.text_content()]
          para = random.choice(paras)
          return tokenizer.tokenize(para)
      
      
      def extract_text(para):
          """Returns a sufficiently-large random text from a tokenized paragraph,
          if such text exists. Otherwise, returns None."""
      
          for _ in range(10):
              text = random.choice(para)
              if text and 60 < len(text) < 210:
                  return text
      
          return None
      
      
      def scrape_coursera():
          """Scrapes content from the Coursera blog."""
          url = 'https://blog.coursera.org'
          r = requests.get(url, headers=HEADERS)
          tree = fromstring(r.content)
          links = tree.xpath('//div[@class="recent"]//div[@class="title"]/a/@href')
      
          for link in links:
              r = requests.get(link, headers=HEADERS)
              blog_tree = fromstring(r.content)
              paras = blog_tree.xpath('//div[@class="entry-content"]/p')
              para = extract_paratext(paras)  
              text = extract_text(para)  
              if not text:
                  continue
      
              yield '"%s" %s' % (text, link)  
      
      
      def scrape_thenewstack():
          """Scrapes news from thenewstack.io"""
      
          r = requests.get('https://thenewstack.io', verify=False)
      
          tree = fromstring(r.content)
          links = tree.xpath('//div[@class="normalstory-box"]/header/h2/a/@href')
      
          for link in links:
              r = requests.get(link, verify=False)
              tree = fromstring(r.content)
              paras = tree.xpath('//div[@class="post-content"]/p')
              para = extract_paratext(paras)
              text = extract_text(para)  
              if not text:
                  continue
      
              yield '"%s" %s' % (text, link)
      
      
      def main():
          """Encompasses the main loop of the bot."""
          print('---Bot started---\n')
          news_funcs = ['scrape_coursera', 'scrape_thenewstack']
          news_iterators = []  
          for func in news_funcs:
              news_iterators.append(globals()[func]())
          while True:
              for i, iterator in enumerate(news_iterators):
                  try:
                      tweet = next(iterator)
                      t.statuses.update(status=tweet)
                      print(tweet, end='\n\n')
                      time.sleep(600)
                  except StopIteration:
                      news_iterators[i] = globals()[news_funcs[i]]()
      
      
      if __name__ == "__main__":  
          main()
      
      

      Save and exit bot.py.

      The following is a sample execution of bot.py:

      • python3 bot.py

      You will receive output showing the content that your bot has scraped, in a similar format to the following:

      Output

      [nltk_data] Downloading package punkt to /Users/binaryboy/nltk_data...
      [nltk_data] Package punkt is already up-to-date!
      ---Bot started---

      "Take the first step toward your career goals by building new skills." https://blog.coursera.org/career-stories-from-inside-coursera/

      "Other speakers include Priyanka Sharma, director of cloud native alliances at GitLab and Dan Kohn, executive director of the Cloud Native Computing Foundation." https://thenewstack.io/cloud-native-live-twistlocks-virtual-conference/

      "You can learn how to use the power of Python for data analysis with a series of courses covering fundamental theory and project-based learning." https://blog.coursera.org/unlock-the-power-of-data-with-python-university-of-michigan-offers-new-programming-specializations-on-coursera/

      "“Real-user monitoring is really about trying to understand the underlying reasons, so you know, ‘who do I actually want to fly with?" https://thenewstack.io/how-raygun-co-founder-and-ceo-spun-gold-out-of-monitoring-agony/

      After a sample run of your bot, you'll see a full timeline of programmatic tweets posted by your bot on your Twitter page. It will look something like the following:

      Programmatic Tweets posted

      As you can see, the bot is tweeting the scraped blog links with random quotes from each blog as highlights. This feed is now an information feed with tweets alternating between blog quotes from Coursera and thenewstack.io. You've built a bot that aggregates content from the web and posts it on Twitter. You can now broaden the scope of this bot as per your wish by adding more scrapers for different websites, and the bot will tweet content coming from all the scrapers in a round-robin fashion, and in your desired time intervals.

      Conclusion

      In this tutorial you built a basic Twitter bot with Python and scraped some content from the web for your bot to tweet. There are many bot ideas to try; you could also implement your own ideas for a bot's utility. You can combine the versatile functionalities offered by Twitter's API and create something more complex. For a version of a more sophisticated Twitter bot, check out chirps, a Twitter bot framework that uses some advanced concepts like multithreading to make the bot do multiple things simultaneously. There are also some bots built around fun ideas, like misheardly. There are no limits on the creativity one can use while building Twitter bots. Finding the right API endpoints to hit for your bot's implementation is essential.

      Finally, bot etiquette, or "botiquette", is important to keep in mind when building your next bot. For example, if your bot incorporates retweeting, make all tweets' text pass through a filter to detect abusive language before retweeting them. You can implement such features using regular expressions and natural language processing. Also, while looking for sources to scrape, follow your judgment and avoid ones that spread misinformation. To read more about botiquette, you can visit this blog post by Joe Mayo on the topic.
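
      As a starting point, such a filter could be as simple as the following sketch, where BLOCKED_WORDS is a hypothetical list that you would maintain yourself:

      import re

      # Hypothetical list of terms you do not want your bot to repeat.
      BLOCKED_WORDS = ['badword1', 'badword2']


      def is_clean(text):
          """Returns False if the text contains any blocked word."""
          pattern = re.compile(r'\b(' + '|'.join(BLOCKED_WORDS) + r')\b', re.IGNORECASE)
          return not pattern.search(text)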






      How To Apply Computer Vision to Build an Emotion-Based Dog Filter in Python 3


      The author selected Girls Who Code to receive a donation as part of the Write for DOnations program.

      Introduction

      Computer vision is a subfield of computer science that aims to extract a higher-order understanding from images and videos. This field includes tasks such as object detection, image restoration (matrix completion), and optical flow. Computer vision powers technologies such as self-driving car prototypes, employee-less grocery stores, fun Snapchat filters, and your mobile device’s face authenticator.

      In this tutorial, you will explore computer vision as you use pre-trained models to build a Snapchat-esque dog filter. For those unfamiliar with Snapchat, this filter will detect your face and then superimpose a dog mask on it. You will then train a face-emotion classifier so that the filter can pick dog masks based on emotion, such as a corgi for happy or a pug for sad. Along the way, you will also explore related concepts in both ordinary least squares and computer vision, which will expose you to the fundamentals of machine learning.

      A working dog filter

      As you work through the tutorial, you’ll use OpenCV, a computer-vision library, numpy for linear algebra utilities, and matplotlib for plotting. You’ll also apply the following concepts as you build a computer-vision application:

      • Ordinary least squares as a regression and classification technique.
      • The basics of neural networks trained with stochastic gradient descent.

      While not necessary to complete this tutorial, you’ll find it easier to understand some of the more detailed explanations if you’re familiar with these mathematical concepts:

      • Fundamental linear algebra concepts: scalars, vectors, and matrices.
      • Fundamental calculus: how to take a derivative.

      You can find the complete code for this tutorial at https://github.com/do-community/emotion-based-dog-filter.

      Let’s get started.

      Prerequisites

      To complete this tutorial, you will need the following:

      Step 1 — Creating The Project and Installing Dependencies

      Let’s create a workspace for this project and install the dependencies we’ll need. We’ll call our workspace DogFilter:

      • mkdir DogFilter

      Navigate to the DogFilter directory:

      • cd DogFilter

      Then create a new Python virtual environment for the project:

      • python3 -m venv dogfilter

      Activate your environment.

      • source dogfilter/bin/activate

      The prompt changes, indicating the environment is active. Now install PyTorch, a deep-learning framework for Python that we'll use in this tutorial. The installation process depends on which operating system you're using.

      On macOS, install Pytorch with the following command:

      • python -m pip install torch==0.4.1 torchvision==0.2.1

      On Linux, use the following commands:

      • pip install http://download.pytorch.org/whl/cpu/torch-0.4.1-cp35-cp35m-linux_x86_64.whl
      • pip install torchvision

      And for Windows, install Pytorch with these commands:

      • pip install http://download.pytorch.org/whl/cpu/torch-0.4.1-cp35-cp35m-win_amd64.whl
      • pip install torchvision

      Now install prepackaged binaries for OpenCV and numpy, which are computer vision and linear algebra libraries, respectively. The former offers utilities such as image rotations, and the latter offers linear algebra utilities such as a matrix inversion.

      • python -m pip install opencv-python==3.4.3.18 numpy==1.14.5
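
      As a quick illustration of what these two libraries provide, here is a standalone sketch (not part of the filter itself; the image path is a placeholder for any image on your disk) that rotates an image with OpenCV and inverts a matrix with numpy:

      import cv2
      import numpy as np

      # Rotate an image by 90 degrees using an affine transform.
      image = cv2.imread('path/to/any-image.png')
      height, width = image.shape[:2]
      M = cv2.getRotationMatrix2D((width / 2, height / 2), 90, 1.0)
      rotated = cv2.warpAffine(image, M, (width, height))
      cv2.imwrite('rotated.png', rotated)

      # Invert a small matrix with numpy.
      A = np.array([[1.0, 2.0], [3.0, 4.0]])
      print(np.linalg.inv(A))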

      Finally, create a directory for our assets, which will hold the images we'll use in this tutorial:

      • mkdir assets

      With the dependencies installed, let's build the first version of our filter: a face detector.

      Step 2 — Building a Face Detector

      Our first objective is to detect all faces in an image. We'll create a script that accepts a single image and outputs an annotated image with the faces outlined with boxes.

      Fortunately, instead of writing our own face detection logic, we can use pre-trained models. We'll set up a model and then load pre-trained parameters. OpenCV makes this easy by providing both.

      OpenCV provides the model parameters in its source code, but we need the absolute path to our locally-installed OpenCV to use these parameters. Since that absolute path may vary, we'll download our own copy instead and place it in the assets folder:

      • wget -O assets/haarcascade_frontalface_default.xml https://github.com/opencv/opencv/raw/master/data/haarcascades/haarcascade_frontalface_default.xml

      The -O option specifies the destination as assets/haarcascade_frontalface_default.xml. The second argument is the source URL.

      We'll detect all faces in the following image from Pexels (CC0, link to original image).

      Picture of children

      First, download the image. The following command saves the downloaded image as children.png in the assets folder:

      • wget -O assets/children.png https://www.xpresservers.com/wp-content/uploads/2019/04/How-To-Apply-Computer-Vision-to-Build-an-Emotion-Based-Dog-Filter-in-Python-3.png

      To check that the detection algorithm works, we will run it on an individual image and save the resulting annotated image to disk. Create an outputs folder for these annotated results:

      • mkdir outputs

      Now create a Python script for the face detector. Create the file step_2_face_detect.py using nano or your favorite text editor:

      • nano step_2_face_detect.py

      Add the following code to the file. This code imports OpenCV, which contains the image utilities and face classifier. The rest of the code is typical Python program boilerplate.

      step_2_face_detect.py

      """Test for face detection"""
      
      import cv2
      
      
      def main():
          pass
      
      if __name__ == '__main__':
          main()
      

      Now replace pass in the main function with this code which initializes a face classifier using the OpenCV parameters you downloaded to your assets folder:

      step_2_face_detect.py

      def main():
          # initialize front face classifier
          cascade = cv2.CascadeClassifier("assets/haarcascade_frontalface_default.xml")
      

      Next, add this line to load the image children.png.

      step_2_face_detect.py

          frame = cv2.imread('assets/children.png')
      

      Then add this code to convert the image to black and white, as the classifier was trained on black-and-white images. To accomplish this, we convert the image to grayscale and then equalize its histogram:

      step_2_face_detect.py

          # Convert to black-and-white
          gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
          blackwhite = cv2.equalizeHist(gray)
      

      Then use OpenCV's detectMultiScale function to detect all faces in the image.

      step_2_face_detect.py

          rects = cascade.detectMultiScale(
              blackwhite, scaleFactor=1.3, minNeighbors=4, minSize=(30, 30),
              flags=cv2.CASCADE_SCALE_IMAGE)
      
      • scaleFactor specifies how much the image is reduced along each dimension.
      • minNeighbors denotes how many neighboring rectangles a candidate rectangle needs in order to be retained.
      • minSize is the minimum allowable detected object size. Objects smaller than this are discarded.

      The return type is a list of tuples, where each tuple has four numbers denoting the minimum x, minimum y, width, and height of the rectangle in that order.

      Iterate over all detected objects and draw them on the image in green using cv2.rectangle:

      step_2_face_detect.py

          for x, y, w, h in rects:
              cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
      
      • The second and third arguments are opposing corners of the rectangle.
      • The fourth argument is the color to use. (0, 255, 0) corresponds to green in OpenCV's BGR color ordering.
      • The last argument denotes the width of our line.

      Finally, write the image with bounding boxes into a new file at outputs/children_detected.png:

      step_2_face_detect.py

          cv2.imwrite('outputs/children_detected.png', frame)
      

      Your completed script should look like this:

      step_2_face_detect.py

      """Tests face detection for a static image."""  
      
      import cv2  
      
      
      def main():  
      
          # initialize front face classifier  
          cascade = cv2.CascadeClassifier(  
              "assets/haarcascade_frontalface_default.xml")  
      
          frame = cv2.imread('assets/children.png')  
      
          # Convert to black-and-white  
          gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  
          blackwhite = cv2.equalizeHist(gray)  
      
          rects = cascade.detectMultiScale(  
              blackwhite, scaleFactor=1.3, minNeighbors=4, minSize=(30, 30),  
              flags=cv2.CASCADE_SCALE_IMAGE)
      
          for x, y, w, h in rects:  
              cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)  
      
          cv2.imwrite('outputs/children_detected.png', frame)  
      
      if __name__ == '__main__':  
          main()
      

      Save the file and exit your editor. Then run the script:

      • python step_2_face_detect.py

      Open outputs/children_detected.png. You'll see the following image that shows the faces outlined with boxes:

      Picture of children with bounding boxes

      At this point, you have a working face detector. It accepts an image as input and draws bounding boxes around all faces in the image, outputting the annotated image. Now let's apply this same detection to a live camera feed.

      Step 3 — Linking the Camera Feed

      The next objective is to link the computer's camera to the face detector. Instead of detecting faces in a static image, you'll detect all faces from your computer's camera. You will collect camera input, detect and annotate all faces, and then display the annotated image back to the user. You'll continue from the script in Step 2, so start by duplicating that script:

      • cp step_2_face_detect.py step_3_camera_face_detect.py

      Then open the new script in your editor:

      • nano step_3_camera_face_detect.py

      You will update the main function by using some elements from this test script from the official OpenCV documentation. Start by initializing a VideoCapture object that is set to capture live feed from your computer's camera. Place this at the start of the main function, before the other code in the function:

      step_3_camera_face_detect.py

      def main():
          cap = cv2.VideoCapture(0)
          ...
      

      Starting from the line defining frame, indent all of your existing code, placing all of the code in a while loop.

      step_3_camera_face_detect.py

          while True:
              frame = cv2.imread('assets/children.png')
              ...
              for x, y, w, h in rects:  
                  cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)  
      
              cv2.imwrite('outputs/children_detected.png', frame)
      

      Replace the line defining frame at the start of the while loop. Instead of reading from an image on disk, you're now reading from the camera:

      step_3_camera_face_detect.py

          while True:
              # frame = cv2.imread('assets/children.png') # DELETE ME
              # Capture frame-by-frame
              ret, frame = cap.read()
      

      Replace the line cv2.imwrite(...) at the end of the while loop. Instead of writing an image to disk, you'll display the annotated image back to the user's screen:

      step_3_camera_face_detect.py

            cv2.imwrite('outputs/children_detected.png', frame)  # DELETE ME
            # Display the resulting frame
            cv2.imshow('frame', frame)
      

      Also, add some code to watch for keyboard input so you can stop the program. Check if the user hits the q character and, if so, quit the application. Right after cv2.imshow(...) add the following:

      step_3_camera_face_detect.py

      ...
              cv2.imshow('frame', frame)
              if cv2.waitKey(1) & 0xFF == ord('q'):
                  break
      ...
      

      The line cv2.waitKey(1) halts the program for 1 millisecond so that the captured image can be displayed back to the user.

      Finally, release the capture and close all windows. Place this outside of the while loop to end the main function.

      step_3_camera_face_detect.py

      ...
      
          while True:
          ...
      
      
          cap.release()
          cv2.destroyAllWindows()
      

      Your script should look like the following:

      step_3_camera_face_detect.py

      """Test for face detection on video camera.
      
      Move your face around and a green box will identify your face.
      With the test frame in focus, hit `q` to exit.
      Note that typing `q` into your terminal will do nothing.
      """
      
      import cv2
      
      
      def main():
          cap = cv2.VideoCapture(0)
      
          # initialize front face classifier
          cascade = cv2.CascadeClassifier(
              "assets/haarcascade_frontalface_default.xml")
      
          while True:
              # Capture frame-by-frame
              ret, frame = cap.read()
      
              # Convert to black-and-white
              gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
              blackwhite = cv2.equalizeHist(gray)
      
              # Detect faces
              rects = cascade.detectMultiScale(
                  blackwhite, scaleFactor=1.3, minNeighbors=4, minSize=(30, 30),
                  flags=cv2.CASCADE_SCALE_IMAGE)
      
              # Add all bounding boxes to the image
              for x, y, w, h in rects:
                  cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
      
              # Display the resulting frame
              cv2.imshow('frame', frame)
              if cv2.waitKey(1) & 0xFF == ord('q'):
                  break
      
          # When everything done, release the capture
          cap.release()
          cv2.destroyAllWindows()
      
      
      if __name__ == '__main__':
          main()
      

      Save the file and exit your editor.

      Now run the test script.

      • python step_3_camera_face_detect.py

      This activates your camera and opens a window displaying your camera's feed. Your face will be boxed by a green square in real time:

      Working face detector

      Note: If you find that you have to hold very still for things to work, the lighting in the room may not be adequate. Try moving to a brightly lit room where you and your background have high contrast. Also, avoid bright lights near your head. For example, if you have your back to the sun, this process might not work very well.

      Our next objective is to take the detected faces and superimpose dog masks on each one.

      Step 4 — Building the Dog Filter

      Before we build the filter itself, let's explore how images are represented numerically. This will give you the background needed to modify images and ultimately apply a dog filter.

      Let's look at an example. We can construct a black-and-white image using numbers, where 0 corresponds to black and 1 corresponds to white.

      Focus on the dividing line between 1s and 0s. What shape do you see?

      0 0 0 0 0 0 0 0 0
      0 0 0 0 1 0 0 0 0
      0 0 0 1 1 1 0 0 0
      0 0 1 1 1 1 1 0 0
      0 0 0 1 1 1 0 0 0
      0 0 0 0 1 0 0 0 0
      0 0 0 0 0 0 0 0 0
      

      The image is a diamond. If we save this matrix of values as an image, we get the following picture:

      Diamond as picture
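
      If you'd like to see this for yourself, here is a minimal sketch (separate from the tutorial's scripts; the output path is arbitrary) that writes the diamond matrix to disk with OpenCV. The 0-1 values are scaled to 0-255, since cv2.imwrite expects 8-bit pixel intensities:

      import cv2
      import numpy as np

      # The diamond matrix from above, stored as an 8-bit array
      diamond = np.array([
          [0, 0, 0, 0, 0, 0, 0, 0, 0],
          [0, 0, 0, 0, 1, 0, 0, 0, 0],
          [0, 0, 0, 1, 1, 1, 0, 0, 0],
          [0, 0, 1, 1, 1, 1, 1, 0, 0],
          [0, 0, 0, 1, 1, 1, 0, 0, 0],
          [0, 0, 0, 0, 1, 0, 0, 0, 0],
          [0, 0, 0, 0, 0, 0, 0, 0, 0],
      ], dtype=np.uint8)

      # Scale 0 -> black (0) and 1 -> white (255), then write the image
      cv2.imwrite('outputs/diamond.png', diamond * 255)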

      We can use any value between 0 and 1, such as 0.1, 0.26, or 0.74391. Numbers closer to 0 are darker and numbers closer to 1 are lighter. This allows us to represent white, black, and any shade of gray. This is great news for us because we can now construct any grayscale image using 0, 1, and any value in between. Consider the following, for example. Can you tell what it is? Again, each number corresponds to the color of a pixel.

      1  1  1  1  1  1  1  1  1  1  1  1
      1  1  1  1  0  0  0  0  1  1  1  1
      1  1  0  0 .4 .4 .4 .4  0  0  1  1
      1  0 .4 .4 .5 .4 .4 .4 .4 .4  0  1
      1  0 .4 .5 .5 .5 .4 .4 .4 .4  0  1
      0 .4 .4 .4 .5 .4 .4 .4 .4 .4 .4  0
      0 .4 .4 .4 .4  0  0 .4 .4 .4 .4  0
      0  0 .4 .4  0  1 .7  0 .4 .4  0  0
      0  1  0  0  0 .7 .7  0  0  0  1  0
      1  0  1  1  1  0  0 .7 .7 .4  0  1
      1  0 .7  1  1  1 .7 .7 .7 .7  0  1
      1  1  0  0 .7 .7 .7 .7  0  0  1  1
      1  1  1  1  0  0  0  0  1  1  1  1
      1  1  1  1  1  1  1  1  1  1  1  1
      

      Re-rendered as an image, you can now tell that this is, in fact, a Poké Ball:

      Pokeball as picture

      You've now seen how black-and-white and grayscale images are represented numerically. To introduce color, we need a way to encode more information. An image has its height and width expressed as h x w.

      Image

      In the current grayscale representation, each pixel is one value between 0 and 1. We can equivalently say our image has dimensions h x w x 1. In other words, every (x, y) position in our image has just one value.

      Grayscale image

      For a color representation, we represent the color of each pixel using three values between 0 and 1. One number corresponds to the "degree of red," one to the "degree of green," and the last to the "degree of blue." We call this the RGB color space. This means that for every (x, y) position in our image, we have three values (r, g, b). As a result, our image is now h x w x 3:

      Color image

      Here, each number ranges from 0 to 255 instead of 0 to 1, but the idea is the same. Different combinations of numbers correspond to different colors, such as dark purple (102, 0, 204) or bright orange (255, 153, 51). The takeaways are as follows:

      1. Each image will be represented as a box of numbers that has three dimensions: height, width, and color channels. Manipulating this box of numbers directly is equivalent to manipulating the image.
      2. We can also flatten this box to become just a list of numbers. In this way, our image becomes a vector, as the short sketch after this list demonstrates. Later on, we will refer to images as vectors.
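
      To make these takeaways concrete, here is a minimal sketch that inspects the children image used in Step 2 (any image on disk behaves the same way):

      import cv2

      # Load the image as a height x width x channels box of numbers
      image = cv2.imread('assets/children.png')
      print(image.shape)      # prints (height, width, 3)

      # Flatten the box into a single vector of numbers
      vector = image.reshape(-1)
      print(vector.shape)     # prints (height * width * 3,)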

      Now that you understand how images are represented numerically, you are well-equipped to begin applying dog masks to faces. To apply a dog mask, you will replace values in the child image with non-white dog mask pixels. To start, you will work with a single image. Download this crop of a face from the image you used in Step 2.

      • wget -O assets/child.png https://www.xpresservers.com/wp-content/uploads/2019/04/1554419826_451_How-To-Apply-Computer-Vision-to-Build-an-Emotion-Based-Dog-Filter-in-Python-3.png

      Cropped face

      Additionally, download the following dog mask. The dog masks used in this tutorial are my own drawings, now released to the public domain under a CC0 License.

      Dog mask

      Download this with wget:

      • wget -O assets/dog.png https://www.xpresservers.com/wp-content/uploads/2019/04/1554419826_685_How-To-Apply-Computer-Vision-to-Build-an-Emotion-Based-Dog-Filter-in-Python-3.png

      Create a new file called step_4_dog_mask_simple.py which will hold the code for the script that applies the dog mask to faces:

      • nano step_4_dog_mask_simple.py

      Add the following boilerplate for the Python script and import the OpenCV and numpy libraries:

      step_4_dog_mask_simple.py

      """Test for adding dog mask"""
      
      import cv2
      import numpy as np
      
      
      def main():
          pass
      
      if __name__ == '__main__':
          main()
      

      Replace pass in the main function with these two lines which load the original image and the dog mask into memory.

      step_4_dog_mask_simple.py

      ...
      def main():
          face = cv2.imread('assets/child.png')
          mask = cv2.imread('assets/dog.png')
      

      Next, fit the dog mask to the child. The logic is more complicated than what we've done previously, so we will create a new function called apply_mask to modularize our code. Directly after the two lines that load the images, add this line which invokes the apply_mask function:

      step_4_dog_mask_simple.py

      ...
          face_with_mask = apply_mask(face, mask)
      

      Create a new function called apply_mask and place it above the main function:

      step_4_dog_mask_simple.py

      ...
      def apply_mask(face: np.array, mask: np.array) -> np.array:
          """Add the mask to the provided face, and return the face with mask."""
          pass
      
      def main():
      ...
      

      At this point, your file should look like this:

      step_4_dog_mask_simple.py

      """Test for adding dog mask"""
      
      import cv2
      import numpy as np
      
      
      def apply_mask(face: np.array, mask: np.array) -> np.array:
          """Add the mask to the provided face, and return the face with mask."""
          pass
      
      
      def main():
          face = cv2.imread('assets/child.png')
          mask = cv2.imread('assets/dog.png')
          face_with_mask = apply_mask(face, mask)
      
      if __name__ == '__main__':
          main()
      

      Let's build out the apply_mask function. Our goal is to apply the mask to the child's face. However, we need to maintain the aspect ratio for our dog mask. To do so, we need to explicitly compute our dog mask's final dimensions. Inside the apply_mask function, replace pass with these two lines which extract the height and width of both images:

      step_4_dog_mask_simple.py

      ...
          mask_h, mask_w, _ = mask.shape
          face_h, face_w, _ = face.shape
      

      Next, determine which dimension needs to be "shrunk more." To be precise, we need the tighter of the two constraints. Add this line to the apply_mask function:

      step_4_dog_mask_simple.py

      ...
      
          # Resize the mask to fit on face
          factor = min(face_h / mask_h, face_w / mask_w)
      

      Then compute the new shape by adding this code to the function:

      step_4_dog_mask_simple.py

      ...
          new_mask_w = int(factor * mask_w)
          new_mask_h = int(factor * mask_h)
          new_mask_shape = (new_mask_w, new_mask_h)
      

      Here we cast the numbers to integers, as the resize function needs integral dimensions.

      Now add this code to resize the dog mask to the new shape:

      step_4_dog_mask_simple.py

      ...
      
          # Add mask to face - ensure mask is centered
          resized_mask = cv2.resize(mask, new_mask_shape)
      

      Finally, write the image to disk so you can double-check that your resized dog mask is correct after you run the script:

      step_4_dog_mask_simple.py

          cv2.imwrite('outputs/resized_dog.png', resized_mask)
      

      The completed script should look like this:

      step_4_dog_mask_simple.py

      """Test for adding dog mask"""
      import cv2
      import numpy as np
      
      def apply_mask(face: np.array, mask: np.array) -> np.array:
          """Add the mask to the provided face, and return the face with mask."""
          mask_h, mask_w, _ = mask.shape
          face_h, face_w, _ = face.shape
      
          # Resize the mask to fit on face
          factor = min(face_h / mask_h, face_w / mask_w)
          new_mask_w = int(factor * mask_w)
          new_mask_h = int(factor * mask_h)
          new_mask_shape = (new_mask_w, new_mask_h)
      
          # Add mask to face - ensure mask is centered
          resized_mask = cv2.resize(mask, new_mask_shape)
          cv2.imwrite('outputs/resized_dog.png', resized_mask)
      
      
      def main():
          face = cv2.imread('assets/child.png')
          mask = cv2.imread('assets/dog.png')
          face_with_mask = apply_mask(face, mask)
      
      if __name__ == '__main__':
          main()
      
      

      Save the file and exit your editor. Run the new script:

      • python step_4_dog_mask_simple.py

      Open the image at outputs/resized_dog.png to double-check the mask was resized correctly. It will match the dog mask shown earlier in this section.

      Now add the dog mask to the child. Open the step_4_dog_mask_simple.py file again and return to the apply_mask function:

      • nano step_4_dog_mask_simple.py

      First, remove the line of code that writes the resized mask from the apply_mask function since you no longer need it:

          cv2.imwrite('outputs/resized_dog.png', resized_mask)  # delete this line
          ...
      

      In its place, apply your knowledge of image representation from the start of this section to modify the image. Start by making a copy of the child image. Add this line to the apply_mask function:

      step_4_dog_mask_simple.py

      ...
          face_with_mask = face.copy()
      

      Next, find all positions where the dog mask is not white or near white. To do this, check if the pixel value is less than 250 across all color channels, as we'd expect a near-white pixel to be near [255, 255, 255]. Add this code:

      step_4_dog_mask_simple.py

      ...
          non_white_pixels = (resized_mask < 250).all(axis=2)
      

      At this point, the dog image is, at most, as large as the child image. We want to center the dog image on the face, so compute the offset needed to center the dog image by adding this code to apply_mask:

      step_4_dog_mask_simple.py

      ...
          off_h = int((face_h - new_mask_h) / 2)  
          off_w = int((face_w - new_mask_w) / 2)
      

      Copy all non-white pixels from the dog image into the child image. Since the child image may be larger than the dog image, we need to take a subset of the child image:

      step_4_dog_mask_simple.py

          face_with_mask[off_h: off_h+new_mask_h, off_w: off_w+new_mask_w][non_white_pixels] = \
                  resized_mask[non_white_pixels]
      

      Then return the result:

      step_4_dog_mask_simple.py

          return face_with_mask
      

      In the main function, add this code to write the result of the apply_mask function to an output image so you can manually double-check the result:

      step_4_dog_mask_simple.py

      ...
          face_with_mask = apply_mask(face, mask)
          cv2.imwrite('outputs/child_with_dog_mask.png', face_with_mask)
      

      Your completed script will look like the following:

      step_4_dog_mask_simple.py

      """Test for adding dog mask"""
      
      import cv2
      import numpy as np
      
      
      def apply_mask(face: np.array, mask: np.array) -> np.array:
          """Add the mask to the provided face, and return the face with mask."""
          mask_h, mask_w, _ = mask.shape
          face_h, face_w, _ = face.shape
      
          # Resize the mask to fit on face
          factor = min(face_h / mask_h, face_w / mask_w)
          new_mask_w = int(factor * mask_w)
          new_mask_h = int(factor * mask_h)
          new_mask_shape = (new_mask_w, new_mask_h)
          resized_mask = cv2.resize(mask, new_mask_shape)
      
          # Add mask to face - ensure mask is centered
          face_with_mask = face.copy()
          non_white_pixels = (resized_mask < 250).all(axis=2)
          off_h = int((face_h - new_mask_h) / 2)  
          off_w = int((face_w - new_mask_w) / 2)
          face_with_mask[off_h: off_h+new_mask_h, off_w: off_w+new_mask_w][non_white_pixels] = \
               resized_mask[non_white_pixels]
      
          return face_with_mask
      
      def main():
          face = cv2.imread('assets/child.png')
          mask = cv2.imread('assets/dog.png')
          face_with_mask = apply_mask(face, mask)
          cv2.imwrite('outputs/child_with_dog_mask.png', face_with_mask)
      
      if __name__ == '__main__':
          main()
      

      Save the script and run it:

      • python step_4_dog_mask_simple.py

      You'll have the following picture of a child with a dog mask in outputs/child_with_dog_mask.png:

      Picture of child with dog mask on

      You now have a utility that applies dog masks to faces. Now let's use what you've built to add the dog mask in real time.

      We'll pick up from where we left off in Step 3. Copy step_3_camera_face_detect.py to step_4_dog_mask.py.

      • cp step_3_camera_face_detect.py step_4_dog_mask.py

      Open your new script in your editor:

      • nano step_4_dog_mask.py

      First, import the NumPy library at the top of the script:

      step_4_dog_mask.py

      import numpy as np
      ...
      

      Then add the apply_mask function from your previous work into this new file above the main function:

      step_4_dog_mask.py

      def apply_mask(face: np.array, mask: np.array) -> np.array:
          """Add the mask to the provided face, and return the face with mask."""
          mask_h, mask_w, _ = mask.shape
          face_h, face_w, _ = face.shape
      
          # Resize the mask to fit on face
          factor = min(face_h / mask_h, face_w / mask_w)
          new_mask_w = int(factor * mask_w)
          new_mask_h = int(factor * mask_h)
          new_mask_shape = (new_mask_w, new_mask_h)
          resized_mask = cv2.resize(mask, new_mask_shape)
      
          # Add mask to face - ensure mask is centered
          face_with_mask = face.copy()
          non_white_pixels = (resized_mask < 250).all(axis=2)
          off_h = int((face_h - new_mask_h) / 2)  
          off_w = int((face_w - new_mask_w) / 2)
          face_with_mask[off_h: off_h+new_mask_h, off_w: off_w+new_mask_w][non_white_pixels] = \
               resized_mask[non_white_pixels]
      
          return face_with_mask
      ...
      

      Second, locate this line in the main function:

      step_4_dog_mask.py

          cap = cv2.VideoCapture(0)
      

      Add this code after that line to load the dog mask:

      step_4_dog_mask.py

          cap = cv2.VideoCapture(0)
      
          # load mask
          mask = cv2.imread('assets/dog.png')
          ...
      

      Next, in the while loop, locate this line:

      step_4_dog_mask.py

              ret, frame = cap.read()
      

      Add this line after it to extract the image's height and width:

      step_4_dog_mask.py

              ret, frame = cap.read()
              frame_h, frame_w, _ = frame.shape
              ...
      

      Next, delete the line in main that draws bounding boxes. You'll find this line in the for loop that iterates over detected faces:

      step_4_dog_mask.py

              for x, y, w, h in rects:
              ...
                  cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2) # DELETE ME
              ...
      

      In its place, add this code which crops the frame. For aesthetic purposes, we crop an area slightly larger than the face.

      step_4_dog_mask.py

              for x, y, w, h in rects:
                  # crop a frame slightly larger than the face
                  y0, y1 = int(y - 0.25*h), int(y + 0.75*h)
                  x0, x1 = x, x + w
      

      Introduce a check in case the detected face is too close to the edge.

      step_4_dog_mask.py

                  # give up if the cropped frame would be out-of-bounds
                  if x0 < 0 or y0 < 0 or x1 > frame_w or y1 > frame_h:
                      continue
      

      Finally, insert the face with a mask into the image.

      step_4_dog_mask.py

                  # apply mask
                  frame[y0: y1, x0: x1] = apply_mask(frame[y0: y1, x0: x1], mask)
      

      Verify that your script looks like this:

      step_4_dog_mask.py

      """Real-time dog filter
      
      Move your face around and a dog filter will be applied to your face if it is not out-of-bounds. With the test frame in focus, hit `q` to exit. Note that typing `q` into your terminal will do nothing.
      """
      
      import numpy as np
      import cv2
      
      
      def apply_mask(face: np.array, mask: np.array) -> np.array:
          """Add the mask to the provided face, and return the face with mask."""
          mask_h, mask_w, _ = mask.shape
          face_h, face_w, _ = face.shape
      
          # Resize the mask to fit on face
          factor = min(face_h / mask_h, face_w / mask_w)
          new_mask_w = int(factor * mask_w)
          new_mask_h = int(factor * mask_h)
          new_mask_shape = (new_mask_w, new_mask_h)
          resized_mask = cv2.resize(mask, new_mask_shape)
      
          # Add mask to face - ensure mask is centered
          face_with_mask = face.copy()
          non_white_pixels = (resized_mask < 250).all(axis=2)
          off_h = int((face_h - new_mask_h) / 2)
          off_w = int((face_w - new_mask_w) / 2)
          face_with_mask[off_h: off_h+new_mask_h, off_w: off_w+new_mask_w][non_white_pixels] = \
               resized_mask[non_white_pixels]
      
          return face_with_mask
      
      def main():
          cap = cv2.VideoCapture(0)
      
          # load mask
          mask = cv2.imread('assets/dog.png')
      
          # initialize front face classifier
          cascade = cv2.CascadeClassifier("assets/haarcascade_frontalface_default.xml")
      
          while(True):
              # Capture frame-by-frame
              ret, frame = cap.read()
              frame_h, frame_w, _ = frame.shape
      
              # Convert to black-and-white
              gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
              blackwhite = cv2.equalizeHist(gray)
      
              # Detect faces
              rects = cascade.detectMultiScale(
                  blackwhite, scaleFactor=1.3, minNeighbors=4, minSize=(30, 30),
                  flags=cv2.CASCADE_SCALE_IMAGE)
      
              # Add mask to faces
              for x, y, w, h in rects:
                  # crop a frame slightly larger than the face
                  y0, y1 = int(y - 0.25*h), int(y + 0.75*h)
                  x0, x1 = x, x + w
      
                  # give up if the cropped frame would be out-of-bounds
                  if x0 < 0 or y0 < 0 or x1 > frame_w or y1 > frame_h:
                      continue
      
                  # apply mask
                  frame[y0: y1, x0: x1] = apply_mask(frame[y0: y1, x0: x1], mask)
      
              # Display the resulting frame
              cv2.imshow('frame', frame)
              if cv2.waitKey(1) & 0xFF == ord('q'):
                  break
      
          # When everything done, release the capture
          cap.release()
          cv2.destroyAllWindows()
      
      
      if __name__ == '__main__':
          main()
      

      Save the file and exit your editor. Then run the script.

      • python step_4_dog_mask.py

      You now have a real-time dog filter running. The script will also work with multiple faces in the picture, so you can get your friends together for some automatic dog-ification.

      GIF for working dog filter

      This concludes our first primary objective in this tutorial, which is to create a Snapchat-esque dog filter. Now let's use facial expression to determine the dog mask applied to a face.

      Step 5 — Building a Basic Face-Emotion Classifier Using Least Squares

      In this section you'll create an emotion classifier to apply different masks based on displayed emotions. If you smile, the filter will apply a corgi mask. If you frown, it will apply a pug mask. Along the way, you'll explore the least-squares framework, which is fundamental to understanding and discussing machine learning concepts.

      To understand how to process our data and produce predictions, we'll first briefly explore machine learning models.

      We need to ask two questions for each model that we consider. For now, these two questions will be sufficient to differentiate between models:

      1. Input: What information is the model given?
      2. Output: What is the model trying to predict?

      At a high level, the goal is to develop a model for emotion classification. The model is:

      1. Input: given images of faces.
      2. Output: predicts the corresponding emotion.
      model: face -> emotion
      

      The approach we'll use is least squares; we take a set of points, and we find a line of best fit. The line of best fit, shown in the following image, is our model.

      Least Squares

      Consider the input and output for our line:

      1. Input: given x coordinates.
      2. Output: predicts the corresponding y coordinate.
      least squares line: x -> y
      

      Our input x must represent faces and our output y must represent emotion, in order for us to use least squares for emotion classification:

      • x -> face: Instead of using one number for x, we will use a vector of values for x. Thus, x can represent images of faces. The article Ordinary Least Squares explains why you can use a vector of values for x.
      • y -> emotion: Each emotion will correspond to a number. For example, "angry" is 0, "sad" is 1, and "happy" is 2. In this way, y can represent emotions. However, our line is not constrained to output the y values 0, 1, and 2. It has an infinite number of possible y values; it could be 1.2, 3.5, or 10003.42. How do we translate those y values to integers corresponding to classes? See the article One-Hot Encoding for more detail and explanation.

      Armed with this background knowledge, you will build a simple least-squares classifier using vectorized images and one-hot encoded labels. You'll accomplish this in three steps:

      1. Preprocess the data: As explained at the start of this section, our samples are vectors where each vector encodes an image of a face. Our labels are integers corresponding to an emotion, and we'll apply one-hot encoding to these labels.
      2. Specify and train the model: Use the closed-form least squares solution, w^* = (X^TX)^{-1}X^Ty.
      3. Run a prediction using the model: Take the argmax of Xw^* to obtain predicted emotions.

      Let's get started.

      First, set up a directory to contain the data:

      • mkdir data

      Then download the data, curated by Pierre-Luc Carrier and Aaron Courville, from a 2013 Face Emotion Classification competition on Kaggle.

      • wget -O data/fer2013.tar https://bitbucket.org/alvinwan/adversarial-examples-in-computer-vision-building-then-fooling/raw/babfe4651f89a398c4b3fdbdd6d7a697c5104cff/fer2013.tar

      Navigate to the data directory and unpack the data.

      • cd data
      • tar -xzf fer2013.tar

      Now we'll create a script to run the least-squares model. Navigate back to the root of your project:

      • cd ..

      Create a new file for the script:

      • nano step_5_ls_simple.py

      Add Python boilerplate and import the packages you will need:

      step_5_ls_simple.py

      """Train emotion classifier using least squares."""
      
      import numpy as np
      
      def main():
          pass
      
      if __name__ == '__main__':
          main()
      

      Next, load the data into memory. Replace pass in your main function with the following code:

      step_5_ls_simple.py

      
          # load data
          with np.load('data/fer2013_train.npz') as data:
              X_train, Y_train = data['X'], data['Y']
      
          with np.load('data/fer2013_test.npz') as data:
              X_test, Y_test = data['X'], data['Y']
      

      Now one-hot encode the labels. To do this, construct the identity matrix with numpy and then index into this matrix using our list of labels:

      step_5_ls_simple.py

          # one-hot labels
          I = np.eye(6)
          Y_oh_train, Y_oh_test = I[Y_train], I[Y_test]
      

      Here, we use the fact that the i-th row in the identity matrix is all zero, except for the i-th entry. Thus, the i-th row is the one-hot encoding for the label of class i. Additionally, we use numpy's advanced indexing, where [a, b, c, d][[1, 3]] = [b, d].
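
      To see this indexing trick in isolation, here is a minimal sketch with three made-up labels (the values are illustrative, not taken from the dataset):

      import numpy as np

      labels = np.array([0, 2, 1])   # hypothetical class labels
      I = np.eye(3)                  # identity matrix for a 3-class toy example
      one_hot = I[labels]            # row i of I is the one-hot encoding of class i
      print(one_hot)
      # [[1. 0. 0.]
      #  [0. 0. 1.]
      #  [0. 1. 0.]]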

      Computing (X^TX)^{-1} would take too long on commodity hardware, as X^TX is a 2304x2304 matrix with over four million values, so we'll reduce this time by selecting only the first 100 features. Add this code:

      step_5_ls_simple.py

      ...
          # select first 100 dimensions
          A_train, A_test = X_train[:, :100], X_test[:, :100]
      

      Next, add this code to evaluate the closed-form least-squares solution:

      step_5_ls_simple.py

      ...
          # train model
          w = np.linalg.inv(A_train.T.dot(A_train)).dot(A_train.T.dot(Y_oh_train))
      

      Then define an evaluation function for training and validation sets. Place this before your main function:

      step_5_ls_simple.py

      def evaluate(A, Y, w):
          Yhat = np.argmax(A.dot(w), axis=1)
          return np.sum(Yhat == Y) / Y.shape[0]
      

      To estimate labels, we compute A.dot(w), which gives one score per class for each sample, and take the index of the highest score with np.argmax. Then we compute the fraction of correct classifications. This final number is your accuracy.
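
      If the argmax step feels abstract, here is a minimal sketch with made-up numbers (the toy matrices are purely illustrative, not data from this tutorial) showing how evaluate turns class scores into an accuracy:

      import numpy as np

      A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])   # 3 samples, 2 features
      w = np.array([[0.9, 0.1, 0.0], [0.1, 0.8, 0.1]])     # 2 features, 3 classes
      Y = np.array([0, 1, 1])                              # true labels

      Yhat = np.argmax(A.dot(w), axis=1)      # predicted labels: [0, 1, 0]
      print(np.sum(Yhat == Y) / Y.shape[0])   # 2 of 3 correct, so about 0.67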

      Finally, add this code to the end of the main function to compute the training and validation accuracy using the evaluate function you just wrote:

      step_5_ls_simple.py

          # evaluate model
          ols_train_accuracy = evaluate(A_train, Y_train, w)
          print('(ols) Train Accuracy:', ols_train_accuracy)
          ols_test_accuracy = evaluate(A_test, Y_test, w)
          print('(ols) Test Accuracy:', ols_test_accuracy)
      

      Double-check that your script matches the following:

      step_5_ls_simple.py

      """Train emotion classifier using least squares."""
      
      import numpy as np
      
      
      def evaluate(A, Y, w):
          Yhat = np.argmax(A.dot(w), axis=1)
          return np.sum(Yhat == Y) / Y.shape[0]
      
      def main():
      
          # load data
          with np.load('data/fer2013_train.npz') as data:
              X_train, Y_train = data['X'], data['Y']
      
          with np.load('data/fer2013_test.npz') as data:
              X_test, Y_test = data['X'], data['Y']
      
          # one-hot labels
          I = np.eye(6)
          Y_oh_train, Y_oh_test = I[Y_train], I[Y_test]
      
          # select first 100 dimensions
          A_train, A_test = X_train[:, :100], X_test[:, :100]
      
          # train model
          w = np.linalg.inv(A_train.T.dot(A_train)).dot(A_train.T.dot(Y_oh_train))
      
          # evaluate model
          ols_train_accuracy = evaluate(A_train, Y_train, w)
          print('(ols) Train Accuracy:', ols_train_accuracy)
          ols_test_accuracy = evaluate(A_test, Y_test, w)
          print('(ols) Test Accuracy:', ols_test_accuracy)
      
      
      if __name__ == '__main__':
          main()
      

      Save your file, exit your editor, and run the Python script.

      • python step_5_ls_simple.py

      You'll see the following output:

      Output

      (ols) Train Accuracy: 0.4748918316507146 (ols) Test Accuracy: 0.45280545359202934

      Our model gives 47.5% train accuracy. We repeat this on the validation set to obtain 45.3% accuracy. For a three-way classification problem, 45.3% is reasonably above random guessing, which is 33%. This is our starting classifier for emotion detection, and in the next step, you'll build off of this least-squares model to improve accuracy. The higher the accuracy, the more reliably your emotion-based dog filter can select the appropriate dog mask for each detected emotion.

      Step 6 — Improving Accuracy by Featurizing the Inputs

      We can use a more expressive model to boost accuracy. To accomplish this, we featurize our inputs.

      The original image tells us that position (0, 0) is red, (1, 0) is brown, and so on. A featurized image may tell us that there is a dog to the top-left of the image, a person in the middle, etc. Featurization is powerful, but its precise definition is beyond the scope of this tutorial.

      We'll use an approximation for the radial basis function (RBF) kernel, using a random Gaussian matrix. We won't go into detail in this tutorial. Instead, we'll treat this as a black box that computes higher-order features for us.

      We'll continue where we left off in the previous step. Copy the previous script so you have a good starting point:

      • cp step_5_ls_simple.py step_6_ls_simple.py

      Open the new file in your editor:

      • nano step_6_ls_simple.py

      We'll start by creating the featurizing random matrix. Again, we'll use only 100 features in our new feature space.

      Locate the following line, defining A_train and A_test:

      step_6_ls_simple.py

          # select first 100 dimensions
          A_train, A_test = X_train[:, :100], X_test[:, :100]
      

      Directly above this definition for A_train and A_test, add a random feature matrix:

      step_6_ls_simple.py

          d = 100
          W = np.random.normal(size=(X_train.shape[1], d))
          # select first 100 dimensions
          A_train, A_test = X_train[:, :100], X_test[:, :100]
          ...
      

      Then replace the definitions for A_train and A_test. We redefine our matrices, called design matrices, using this random featurization.

      step_6_ls_simple.py

          A_train, A_test = X_train.dot(W), X_test.dot(W)
      

      Save your file and run the script.

      • python step_6_ls_simple.py

      You'll see the following output:

      Output

      (ols) Train Accuracy: 0.584174642717 (ols) Test Accuracy: 0.584425799685

      This featurization now offers 58.4% train accuracy and 58.4% validation accuracy, a 13.1% improvement in validation results. Here we used a feature space with d = 100 dimensions, but the choice of 100 was arbitrary. We could also use d = 1000 or d = 50. We can test more values of d by changing d and recomputing a new model.

      Trying more values of d, we find an additional 4.3% improvement in test accuracy to 61.7%. In the following figure, we consider the performance of our new classifier as we vary d. Intuitively, as d increases, the accuracy should also increase, as we use more and more of our original data. Rather than paint a rosy picture, however, the graph exhibits a negative trend:

      Performance of featurized ordinary least squares

      As we keep more of our data, the gap between the training and validation accuracies increases as well. This is clear evidence of overfitting, where our model is learning representations that are no longer generalizable to all data. To combat overfitting, we'll regularize our model by penalizing complex models.

      We amend our ordinary least-squares objective function with a regularization term, giving us a new objective. Our new objective function is called ridge regression and it looks like this:

      min_w |Aw - y|^2 + lambda |w|^2
      

      In this equation, lambda is a tunable hyperparameter. Plug lambda = 0 into the equation and ridge regression becomes least-squares. Plug lambda = infinity into the equation, and you'll find the best w must now be zero, as any non-zero w incurs infinite loss. As it turns out, this objective yields a closed-form solution as well:

      w^* = (A^TA + lambda I)^{-1}A^Ty
      

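      As a quick sanity check of this behavior, here is a minimal sketch (toy data, not the FER dataset) that evaluates the closed-form solution for a small lambda and a very large lambda; the weights shrink toward zero as lambda grows:

      import numpy as np

      # Purely illustrative design matrix and targets
      A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
      y = np.array([1.0, 2.0, 3.0])

      def ridge(A, y, lam):
          """Closed-form ridge solution w* = (A^T A + lam I)^{-1} A^T y."""
          I = np.eye(A.shape[1])
          return np.linalg.inv(A.T.dot(A) + lam * I).dot(A.T.dot(y))

      print(ridge(A, y, 0.01))   # close to the ordinary least-squares solution
      print(ridge(A, y, 1e6))    # nearly zero: heavy regularization shrinks w
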
      Still using the featurized samples, retrain and reevaluate the model once more.

      Open step_6_ls_simple.py again in your editor:

      • nano step_6_ls_simple.py

      This time, increase the dimensionality of the new feature space to d = 1000. Change the value of d from 100 to 1000 as shown in the following code block:

      step_6_ls_simple.py

      ...
          d = 1000
          W = np.random.normal(size=(X_train.shape[1], d))
      ...
      

      Then apply ridge regression using a regularization of lambda = 10^{10}. Replace the line defining w with the following two lines:

      step_6_ls_simple.py

      ...
          # train model
          I = np.eye(A_train.shape[1])
          w = np.linalg.inv(A_train.T.dot(A_train) + 1e10 * I).dot(A_train.T.dot(Y_oh_train))
      

      Then locate this block:

      step_6_ls_simple.py

      ...
          ols_train_accuracy = evaluate(A_train, Y_train, w)
          print('(ols) Train Accuracy:', ols_train_accuracy)
          ols_test_accuracy = evaluate(A_test, Y_test, w)
          print('(ols) Test Accuracy:', ols_test_accuracy)
      

      Replace it with the following:

      step_6_ls_simple.py

      ...
      
          print('(ridge) Train Accuracy:', evaluate(A_train, Y_train, w))
          print('(ridge) Test Accuracy:', evaluate(A_test, Y_test, w))
      

      The completed script should look like this:

      step_6_ls_simple.py

      """Train emotion classifier using least squares."""
      
      import numpy as np
      
      def evaluate(A, Y, w):
          Yhat = np.argmax(A.dot(w), axis=1)
          return np.sum(Yhat == Y) / Y.shape[0]
      
      def main():
          # load data
          with np.load('data/fer2013_train.npz') as data:
              X_train, Y_train = data['X'], data['Y']
      
          with np.load('data/fer2013_test.npz') as data:
              X_test, Y_test = data['X'], data['Y']
      
          # one-hot labels
          I = np.eye(6)
          Y_oh_train, Y_oh_test = I[Y_train], I[Y_test]
          d = 1000
          W = np.random.normal(size=(X_train.shape[1], d))
          # select first 100 dimensions
          A_train, A_test = X_train.dot(W), X_test.dot(W)
      
          # train model
          I = np.eye(A_train.shape[1])
          w = np.linalg.inv(A_train.T.dot(A_train) + 1e10 * I).dot(A_train.T.dot(Y_oh_train))
      
          # evaluate model
          print('(ridge) Train Accuracy:', evaluate(A_train, Y_train, w))
          print('(ridge) Test Accuracy:', evaluate(A_test, Y_test, w))
      
      if __name__ == '__main__':
          main()
      

      Save the file, exit your editor, and run the script:

      • python step_6_ls_simple.py

      You'll see the following output:

      Output

      (ridge) Train Accuracy: 0.651173462698 (ridge) Test Accuracy: 0.622181436812

      There's an additional improvement of 0.4% in validation accuracy to 62.2%, as train accuracy drops to 65.1%. Once again reevaluating across a number of different d, we see a smaller gap between training and validation accuracies for ridge regression. In other words, ridge regression was subject to less overfitting.

      Performance of featurized ols and ridge regression

      With these extra enhancements, the least-squares baseline performs reasonably well. The training and inference times, all together, take no more than 20 seconds for even the best results. In the next section, you'll explore even more complex models.

      Step 7 — Building the Face-Emotion Classifier Using a Convolutional Neural Network in PyTorch

      In this section, you'll build a second emotion classifier using neural networks instead of least squares. Again, our goal is to produce a model that accepts faces as input and outputs an emotion. Eventually, this classifier will then determine which dog mask to apply.

      For a brief neural network visualization and introduction, see the article Understanding Neural Networks. Here, we will use a deep-learning library called PyTorch. There are a number of deep-learning libraries in widespread use, and each has various pros and cons. PyTorch is a particularly good place to start. To implement this neural network classifier, we again take three steps, as we did with the least-squares classifier:

      1. Preprocess the data: Apply one-hot encoding and then apply PyTorch abstractions.
      2. Specify and train the model: Set up a neural network using PyTorch layers. Define optimization hyperparameters and run stochastic gradient descent.
      3. Run a prediction using the model: Evaluate the neural network.

      Create a new file named step_7_fer_simple.py:

      • nano step_7_fer_simple.py

      Import the necessary utilities and create a Python class that will hold your data. For data processing here, you will create the train and test datasets. To do this, implement PyTorch's Dataset interface, which lets you load and use PyTorch's built-in data pipeline for the face-emotion recognition dataset:

      step_7_fer_simple.py

      from torch.utils.data import Dataset
      from torch.autograd import Variable
      import torch.nn as nn
      import torch.nn.functional as F
      import torch.optim as optim
      import numpy as np
      import torch
      import cv2
      import argparse
      
      
      class Fer2013Dataset(Dataset):
          """Face Emotion Recognition dataset.
      
          Utility for loading FER into PyTorch. Dataset curated by Pierre-Luc Carrier
          and Aaron Courville in 2013.
      
          Each sample is 1 x 1 x 48 x 48, and each label is a scalar.
          """
          pass
      

      Delete the pass placeholder in the Fer2013Dataset class. In its place, add a function that will initialize our data holder:

      step_7_fer_simple.py

          def __init__(self, path: str):
              """
              Args:
                  path: Path to `.np` file containing sample nxd and label nx1
              """
              with np.load(path) as data:
                  self._samples = data['X']
                  self._labels = data['Y']
              self._samples = self._samples.reshape((-1, 1, 48, 48))
      
              self.X = Variable(torch.from_numpy(self._samples)).float()
              self.Y = Variable(torch.from_numpy(self._labels)).float()
      ...
      

      This function starts by loading the samples and labels. Then it wraps the data in PyTorch data structures.

      Directly after the __init__ function, add a __len__ function, as this is needed to implement the Dataset interface PyTorch expects:

      step_7_fer_simple.py

      ...
          def __len__(self):
              return len(self._labels)
      

      Finally, add a __getitem__ method, which returns a dictionary containing the sample and the label:

      step_7_fer_simple.py

          def __getitem__(self, idx):
              return {'image': self._samples[idx], 'label': self._labels[idx]}
      

      Double-check that your file looks like the following:

      step_7_fer_simple.py

      from torch.utils.data import Dataset
      from torch.autograd import Variable
      import torch.nn as nn
      import torch.nn.functional as F
      import torch.optim as optim
      import numpy as np
      import torch
      import cv2
      import argparse
      
      
      class Fer2013Dataset(Dataset):
          """Face Emotion Recognition dataset.
          Utility for loading FER into PyTorch. Dataset curated by Pierre-Luc Carrier
          and Aaron Courville in 2013.
          Each sample is 1 x 1 x 48 x 48, and each label is a scalar.
          """
      
          def __init__(self, path: str):
              """
              Args:
                  path: Path to `.np` file containing sample nxd and label nx1
              """
              with np.load(path) as data:
                  self._samples = data['X']
                  self._labels = data['Y']
              self._samples = self._samples.reshape((-1, 1, 48, 48))
      
              self.X = Variable(torch.from_numpy(self._samples)).float()
              self.Y = Variable(torch.from_numpy(self._labels)).float()
      
          def __len__(self):
              return len(self._labels)
      
          def __getitem__(self, idx):
              return {'image': self._samples[idx], 'label': self._labels[idx]}
      

      Next, load the Fer2013Dataset dataset. Add the following code to the end of your file after the Fer2013Dataset class:

      step_7_fer_simple.py

      trainset = Fer2013Dataset('data/fer2013_train.npz')
      trainloader = torch.utils.data.DataLoader(trainset, batch_size=32, shuffle=True)
      
      testset = Fer2013Dataset('data/fer2013_test.npz')
      testloader = torch.utils.data.DataLoader(testset, batch_size=32, shuffle=False)
      

      This code initializes the dataset using the Fer2013Dataset class you created. Then, for both the train and test sets, it wraps the dataset in a DataLoader, which translates the dataset into an iterable to use later.

      As a sanity check, verify that the dataset utilities are functioning. Create a sample dataset loader using DataLoader and print the first element of that loader. Add the following to the end of your file:

      step_7_fer_simple.py

      if __name__ == '__main__':
          loader = torch.utils.data.DataLoader(trainset, batch_size=2, shuffle=False)
          print(next(iter(loader)))
      

      Verify that your completed script looks like this:

      step_7_fer_simple.py

      from torch.utils.data import Dataset
      from torch.autograd import Variable
      import torch.nn as nn
      import torch.nn.functional as F
      import torch.optim as optim
      import numpy as np
      import torch
      import cv2
      import argparse
      
      
      class Fer2013Dataset(Dataset):
          """Face Emotion Recognition dataset.
          Utility for loading FER into PyTorch. Dataset curated by Pierre-Luc Carrier
          and Aaron Courville in 2013.
          Each sample is 1 x 1 x 48 x 48, and each label is a scalar.
          """
      
          def __init__(self, path: str):
              """
              Args:
                  path: Path to `.np` file containing sample nxd and label nx1
              """
              with np.load(path) as data:
                  self._samples = data['X']
                  self._labels = data['Y']
              self._samples = self._samples.reshape((-1, 1, 48, 48))
      
              self.X = Variable(torch.from_numpy(self._samples)).float()
              self.Y = Variable(torch.from_numpy(self._labels)).float()
      
          def __len__(self):
              return len(self._labels)
      
          def __getitem__(self, idx):
              return {'image': self._samples[idx], 'label': self._labels[idx]}
      
      trainset = Fer2013Dataset('data/fer2013_train.npz')
      trainloader = torch.utils.data.DataLoader(trainset, batch_size=32, shuffle=True)
      
      testset = Fer2013Dataset('data/fer2013_test.npz')
      testloader = torch.utils.data.DataLoader(testset, batch_size=32, shuffle=False)
      
      if __name__ == '__main__':
          loader = torch.utils.data.DataLoader(trainset, batch_size=2, shuffle=False)
          print(next(iter(loader)))
      

      Exit your editor and run the script.

      • python step_7_fer_simple.py

      This outputs the following pair of tensors. Our data pipeline outputs two samples and two labels. This indicates that our data pipeline is up and ready to go:

      Output

      {'image': (0 ,0 ,.,.) = 24 32 36 ... 173 172 173 25 34 29 ... 173 172 173 26 29 25 ... 172 172 174 ... ⋱ ... 159 185 157 ... 157 156 153 136 157 187 ... 152 152 150 145 130 161 ... 142 143 142 ⋮ (1 ,0 ,.,.) = 20 17 19 ... 187 176 162 22 17 17 ... 195 180 171 17 17 18 ... 203 193 175 ... ⋱ ... 1 1 1 ... 106 115 119 2 2 1 ... 103 111 119 2 2 2 ... 99 107 118 [torch.LongTensor of size 2x1x48x48] , 'label': 1 1 [torch.LongTensor of size 2] }

      Now that you've verified that the data pipeline works, return to step_7_fer_simple.py to add the neural network and optimizer. Open step_7_fer_simple.py.

      • nano step_7_fer_simple.py

      First, delete the last three lines you added in the previous iteration:

      step_7_fer_simple.py

      # Delete all three lines
      if __name__ == '__main__':
          loader = torch.utils.data.DataLoader(trainset, batch_size=2, shuffle=False)
          print(next(iter(loader)))
      

      In their place, define a PyTorch neural network that includes three convolutional layers, followed by three fully connected layers. Add this to the end of your existing script:

      step_7_fer_simple.py

      class Net(nn.Module):
          def __init__(self):
              super(Net, self).__init__()
              self.conv1 = nn.Conv2d(1, 6, 5)
              self.pool = nn.MaxPool2d(2, 2)
              self.conv2 = nn.Conv2d(6, 6, 3)
              self.conv3 = nn.Conv2d(6, 16, 3)
              self.fc1 = nn.Linear(16 * 4 * 4, 120)
              self.fc2 = nn.Linear(120, 48)
              self.fc3 = nn.Linear(48, 3)
      
          def forward(self, x):
              x = self.pool(F.relu(self.conv1(x)))
              x = self.pool(F.relu(self.conv2(x)))
              x = self.pool(F.relu(self.conv3(x)))
              x = x.view(-1, 16 * 4 * 4)
              x = F.relu(self.fc1(x))
              x = F.relu(self.fc2(x))
              x = self.fc3(x)
              return x
      

      Now initialize the neural network, define a loss function, and define optimization hyperparameters by adding the following code to the end of the script:

      step_7_fer_simple.py

      net = Net().float()
      criterion = nn.CrossEntropyLoss()
      optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
      

      We'll train for two epochs. For now, we define an epoch to be an iteration of training where every training sample has been used exactly once.

      First, extract image and label from the dataset loader and then wrap each in a PyTorch Variable. Second, run the forward pass and then backpropagate through the loss and neural network. Add the following code to the end of your script to do that:

      step_7_fer_simple.py

      for epoch in range(2):  # loop over the dataset multiple times
      
          running_loss = 0.0
          for i, data in enumerate(trainloader, 0):
              inputs = Variable(data['image'].float())
              labels = Variable(data['label'].long())
              optimizer.zero_grad()
      
              # forward + backward + optimize
              outputs = net(inputs)
              loss = criterion(outputs, labels)
              loss.backward()
              optimizer.step()
      
              # print statistics
              running_loss += loss.data[0]
              if i % 100 == 0:
                  print('[%d, %5d] loss: %.3f' % (epoch, i, running_loss / (i + 1)))
      

      Your script should now look like this:

      step_7_fer_simple.py

      from torch.utils.data import Dataset
      from torch.autograd import Variable
      import torch.nn as nn
      import torch.nn.functional as F
      import torch.optim as optim
      import numpy as np
      import torch
      import cv2
      import argparse
      
      
      class Fer2013Dataset(Dataset):
          """Face Emotion Recognition dataset.
      
          Utility for loading FER into PyTorch. Dataset curated by Pierre-Luc Carrier
          and Aaron Courville in 2013.
      
          Each sample is 1 x 1 x 48 x 48, and each label is a scalar.
          """
          def __init__(self, path: str):
              """
              Args:
                  path: Path to `.np` file containing sample nxd and label nx1
              """
              with np.load(path) as data:
                  self._samples = data['X']
                  self._labels = data['Y']
              self._samples = self._samples.reshape((-1, 1, 48, 48))
      
              self.X = Variable(torch.from_numpy(self._samples)).float()
              self.Y = Variable(torch.from_numpy(self._labels)).float()
      
          def __len__(self):
              return len(self._labels)
      
      
          def __getitem__(self, idx):
              return {'image': self._samples[idx], 'label': self._labels[idx]}
      
      
      trainset = Fer2013Dataset('data/fer2013_train.npz')
      trainloader = torch.utils.data.DataLoader(trainset, batch_size=32, shuffle=True)
      
      testset = Fer2013Dataset('data/fer2013_test.npz')
      testloader = torch.utils.data.DataLoader(testset, batch_size=32, shuffle=False)
      
      
      class Net(nn.Module):
          def __init__(self):
              super(Net, self).__init__()
              self.conv1 = nn.Conv2d(1, 6, 5)
              self.pool = nn.MaxPool2d(2, 2)
              self.conv2 = nn.Conv2d(6, 6, 3)
              self.conv3 = nn.Conv2d(6, 16, 3)
              self.fc1 = nn.Linear(16 * 4 * 4, 120)
              self.fc2 = nn.Linear(120, 48)
              self.fc3 = nn.Linear(48, 3)
      
          def forward(self, x):
              x = self.pool(F.relu(self.conv1(x)))
              x = self.pool(F.relu(self.conv2(x)))
              x = self.pool(F.relu(self.conv3(x)))
              x = x.view(-1, 16 * 4 * 4)
              x = F.relu(self.fc1(x))
              x = F.relu(self.fc2(x))
              x = self.fc3(x)
              return x
      
      net = Net().float()
      criterion = nn.CrossEntropyLoss()
      optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
      
      
      for epoch in range(2):  # loop over the dataset multiple times
      
          running_loss = 0.0
          for i, data in enumerate(trainloader, 0):
              inputs = Variable(data['image'].float())
              labels = Variable(data['label'].long())
              optimizer.zero_grad()
      
              # forward + backward + optimize
              outputs = net(inputs)
              loss = criterion(outputs, labels)
              loss.backward()
              optimizer.step()
      
              # print statistics
              running_loss += loss.data[0]
              if i % 100 == 0:
                  print('[%d, %5d] loss: %.3f' % (epoch, i, running_loss / (i + 1)))
      

      Save the file and exit the editor once you've verified your code. Then, launch this proof-of-concept training:

      • python step_7_fer_simple.py

      You'll see output similar to the following as the neural network trains:

      Output

      [0, 0] loss: 1.094 [0, 100] loss: 1.049 [0, 200] loss: 1.009 [0, 300] loss: 0.963 [0, 400] loss: 0.935 [1, 0] loss: 0.760 [1, 100] loss: 0.768 [1, 200] loss: 0.775 [1, 300] loss: 0.776 [1, 400] loss: 0.767

      You can then augment this script using a number of other PyTorch utilities to save and load models, output training and validation accuracies, fine-tune a learning-rate schedule, etc. After training for 20 epochs with a learning rate of 0.01 and momentum of 0.9, our neural network attains an 87.9% train accuracy and a 75.5% validation accuracy, a further 6.8% improvement over the most successful least-squares approach thus far at 66.6%. We'll include these additional bells and whistles in a new script.

      Create a new file called step_7_fer.py to hold the final face emotion detector, which your live camera feed will use. This script contains the code above along with a command-line interface and an easy-to-import version of our code that will be used later. Additionally, it contains the hyperparameters tuned in advance, for a model with higher accuracy.

      • nano step_7_fer.py

      Start with the following imports. These are the same imports used in the previous file, including OpenCV as import cv2.

      step_7_fer.py

      from torch.utils.data import Dataset
      from torch.autograd import Variable
      import torch.nn as nn
      import torch.nn.functional as F
      import torch.optim as optim
      import numpy as np
      import torch
      import cv2
      import argparse
      

      Directly beneath these imports, reuse your code from step_7_fer_simple.py to define the neural network:

      step_7_fer.py

      class Net(nn.Module):
          def __init__(self):
              super(Net, self).__init__()
              self.conv1 = nn.Conv2d(1, 6, 5)
              self.pool = nn.MaxPool2d(2, 2)
              self.conv2 = nn.Conv2d(6, 6, 3)
              self.conv3 = nn.Conv2d(6, 16, 3)
              self.fc1 = nn.Linear(16 * 4 * 4, 120)
              self.fc2 = nn.Linear(120, 48)
              self.fc3 = nn.Linear(48, 3)
      
          def forward(self, x):
              x = self.pool(F.relu(self.conv1(x)))
              x = self.pool(F.relu(self.conv2(x)))
              x = self.pool(F.relu(self.conv3(x)))
              x = x.view(-1, 16 * 4 * 4)
              x = F.relu(self.fc1(x))
              x = F.relu(self.fc2(x))
              x = self.fc3(x)
              return x
      

      Again, reuse the code for the Face Emotion Recognition dataset from step_7_fer_simple.py and add it to this file:

      step_7_fer.py

      class Fer2013Dataset(Dataset):
          """Face Emotion Recognition dataset.
          Utility for loading FER into PyTorch. Dataset curated by Pierre-Luc Carrier
          and Aaron Courville in 2013.
          Each sample is 1 x 1 x 48 x 48, and each label is a scalar.
          """
      
          def __init__(self, path: str):
              """
              Args:
                  path: Path to `.np` file containing sample nxd and label nx1
              """
              with np.load(path) as data:
                  self._samples = data['X']
                  self._labels = data['Y']
              self._samples = self._samples.reshape((-1, 1, 48, 48))
      
              self.X = Variable(torch.from_numpy(self._samples)).float()
              self.Y = Variable(torch.from_numpy(self._labels)).float()
      
          def __len__(self):
              return len(self._labels)
      
          def __getitem__(self, idx):
              return {'image': self._samples[idx], 'label': self._labels[idx]}
      

      Next, define a few utilities to evaluate the neural network's performance. First, add an evaluate function which compares the neural network's predicted emotions to the true emotions for a batch of images:

      step_7_fer.py

      def evaluate(outputs: Variable, labels: Variable, normalized: bool=True) -> float:
          """Evaluate neural network outputs against non-one-hotted labels."""
          Y = labels.data.numpy()
          Yhat = np.argmax(outputs.data.numpy(), axis=1)
          denom = Y.shape[0] if normalized else 1
          return float(np.sum(Yhat == Y) / denom)
      

      Then add a function called batch_evaluate which applies the first function to all images:

      step_7_fer.py

      def batch_evaluate(net: Net, dataset: Dataset, batch_size: int=500) -> float:
          """Evaluate neural network in batches, if dataset is too large."""
          score = 0.0
          n = dataset.X.shape[0]
          for i in range(0, n, batch_size):
              x = dataset.X[i: i + batch_size]
              y = dataset.Y[i: i + batch_size]
              score += evaluate(net(x), y, False)
          return score / n
      

      Now, define a function called get_image_to_emotion_predictor that uses a pretrained model to return a predictor, which takes in an image and outputs a predicted emotion:

      step_7_fer.py

      def get_image_to_emotion_predictor(model_path='assets/model_best.pth'):
          """Returns predictor, from image to emotion index."""
          net = Net().float()
          pretrained_model = torch.load(model_path)
          net.load_state_dict(pretrained_model['state_dict'])
      
          def predictor(image: np.array):
              """Translates images into emotion indices."""
              if image.shape[2] > 1:
                  image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
              frame = cv2.resize(image, (48, 48)).reshape((1, 1, 48, 48))
              X = Variable(torch.from_numpy(frame)).float()
              return np.argmax(net(X).data.numpy(), axis=1)[0]
          return predictor
      

      Finally, add the following code to define the main function to leverage the other utilities:

      step_7_fer.py

      def main():
          trainset = Fer2013Dataset('data/fer2013_train.npz')
          testset = Fer2013Dataset('data/fer2013_test.npz')
          net = Net().float()
      
          pretrained_model = torch.load("assets/model_best.pth")
          net.load_state_dict(pretrained_model['state_dict'])
      
          train_acc = batch_evaluate(net, trainset, batch_size=500)
          print('Training accuracy: %.3f' % train_acc)
          test_acc = batch_evaluate(net, testset, batch_size=500)
          print('Validation accuracy: %.3f' % test_acc)
      
      
      if __name__ == '__main__':
          main()
      

      This loads a pretrained neural network and evaluates its performance on the provided Face Emotion Recognition dataset. Specifically, the script outputs accuracy on the images we used for training, as well as a separate set of images we put aside for testing purposes.

      Double-check that your file matches the following:

      step_7_fer.py

      from torch.utils.data import Dataset
      from torch.autograd import Variable
      import torch.nn as nn
      import torch.nn.functional as F
      import torch.optim as optim
      import numpy as np
      import torch
      import cv2
      import argparse
      
      class Net(nn.Module):
          def __init__(self):
              super(Net, self).__init__()
              self.conv1 = nn.Conv2d(1, 6, 5)
              self.pool = nn.MaxPool2d(2, 2)
              self.conv2 = nn.Conv2d(6, 6, 3)
              self.conv3 = nn.Conv2d(6, 16, 3)
              self.fc1 = nn.Linear(16 * 4 * 4, 120)
              self.fc2 = nn.Linear(120, 48)
              self.fc3 = nn.Linear(48, 3)
      
          def forward(self, x):
              x = self.pool(F.relu(self.conv1(x)))
              x = self.pool(F.relu(self.conv2(x)))
              x = self.pool(F.relu(self.conv3(x)))
              x = x.view(-1, 16 * 4 * 4)
              x = F.relu(self.fc1(x))
              x = F.relu(self.fc2(x))
              x = self.fc3(x)
              return x
      
      
      class Fer2013Dataset(Dataset):
          """Face Emotion Recognition dataset.
          Utility for loading FER into PyTorch. Dataset curated by Pierre-Luc Carrier
          and Aaron Courville in 2013.
          Each sample is 1 x 1 x 48 x 48, and each label is a scalar.
          """
      
          def __init__(self, path: str):
              """
              Args:
                  path: Path to `.np` file containing sample nxd and label nx1
              """
              with np.load(path) as data:
                  self._samples = data['X']
                  self._labels = data['Y']
              self._samples = self._samples.reshape((-1, 1, 48, 48))
      
              self.X = Variable(torch.from_numpy(self._samples)).float()
              self.Y = Variable(torch.from_numpy(self._labels)).float()
      
          def __len__(self):
              return len(self._labels)
      
          def __getitem__(self, idx):
              return {'image': self._samples[idx], 'label': self._labels[idx]}
      
      
      def evaluate(outputs: Variable, labels: Variable, normalized: bool=True) -> float:
          """Evaluate neural network outputs against non-one-hotted labels."""
          Y = labels.data.numpy()
          Yhat = np.argmax(outputs.data.numpy(), axis=1)
          denom = Y.shape[0] if normalized else 1
          return float(np.sum(Yhat == Y) / denom)
      
      
      def batch_evaluate(net: Net, dataset: Dataset, batch_size: int=500) -> float:
          """Evaluate neural network in batches, if dataset is too large."""
          score = 0.0
          n = dataset.X.shape[0]
          for i in range(0, n, batch_size):
              x = dataset.X[i: i + batch_size]
              y = dataset.Y[i: i + batch_size]
              score += evaluate(net(x), y, False)
          return score / n
      
      
      def get_image_to_emotion_predictor(model_path='assets/model_best.pth'):
          """Returns predictor, from image to emotion index."""
          net = Net().float()
          pretrained_model = torch.load(model_path)
          net.load_state_dict(pretrained_model['state_dict'])
      
          def predictor(image: np.array):
              """Translates images into emotion indices."""
              if image.shape[2] > 1:
                  image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
              frame = cv2.resize(image, (48, 48)).reshape((1, 1, 48, 48))
              X = Variable(torch.from_numpy(frame)).float()
              return np.argmax(net(X).data.numpy(), axis=1)[0]
          return predictor
      
      
      def main():
          trainset = Fer2013Dataset('data/fer2013_train.npz')
          testset = Fer2013Dataset('data/fer2013_test.npz')
          net = Net().float()
      
          pretrained_model = torch.load("assets/model_best.pth")
          net.load_state_dict(pretrained_model['state_dict'])
      
          train_acc = batch_evaluate(net, trainset, batch_size=500)
          print('Training accuracy: %.3f' % train_acc)
          test_acc = batch_evaluate(net, testset, batch_size=500)
          print('Validation accuracy: %.3f' % test_acc)
      
      
      if __name__ == '__main__':
          main()
      

      Save the file and exit your editor.

      As before, with the face detector, download pre-trained model parameters and save them to your assets folder with the following command:

      • wget -O assets/model_best.pth https://github.com/alvinwan/emotion-based-dog-filter/raw/master/src/assets/model_best.pth

      Run the script to use and evaluate the pre-trained model:

      • python step_7_fer.py

      This will output the following:

      Output

      Training accuracy: 0.879
      Validation accuracy: 0.755

      At this point, you've built a pretty accurate face-emotion classifier. In essence, our model can correctly disambiguate between faces that are happy, sad, and surprised roughly three out of four times. This is a reasonably good model, so you can now move on to using this face-emotion classifier to determine which dog mask to apply to faces.
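      Because step_7_fer.py exposes get_image_to_emotion_predictor, you can also import it for a quick, optional spot-check before wiring it into the filter. The following sketch is not part of the tutorial; some_face.jpg is a placeholder for any face image you have on disk, and it must be run from the project root so that assets/model_best.pth resolves:

      # Hypothetical spot-check: predict the emotion index (0, 1, or 2) for one image on disk.
      import cv2

      from step_7_fer import get_image_to_emotion_predictor

      predictor = get_image_to_emotion_predictor()
      frame = cv2.imread('some_face.jpg')  # placeholder path to any BGR image of a face
      print('Predicted emotion index:', predictor(frame))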

      Step 8 — Finishing the Emotion-Based Dog Filter

      Before integrating our brand-new face-emotion classifier, we will need animal masks to pick from. We'll use a Dalmatian mask and a Sheepdog mask:

      Dalmatian mask
      Sheepdog mask

      Execute these commands to download both masks to your assets folder:

      • wget -O assets/dalmation.png https://www.xpresservers.com/wp-content/uploads/2019/04/1554419827_591_How-To-Apply-Computer-Vision-to-Build-an-Emotion-Based-Dog-Filter-in-Python-3.png # dalmation
      • wget -O assets/sheepdog.png https://www.xpresservers.com/wp-content/uploads/2019/04/1554419827_102_How-To-Apply-Computer-Vision-to-Build-an-Emotion-Based-Dog-Filter-in-Python-3.png # sheepdog

      Now let's use the masks in our filter. Start by duplicating the step_4_dog_mask.py file:

      • cp step_4_dog_mask.py step_8_dog_emotion_mask.py

      Open the new Python script.

      • nano step_8_dog_emotion_mask.py

      Insert a new line at the top of the script to import the emotion predictor:

      step_8_dog_emotion_mask.py

      from step_7_fer import get_image_to_emotion_predictor
      ...
      

      Then, in the main() function, locate this line:

      step_8_dog_emotion_mask.py

          mask = cv2.imread('assets/dog.png')
      

      Replace it with the following to load the new masks and aggregate all masks into a tuple:

      step_8_dog_emotion_mask.py

          mask0 = cv2.imread('assets/dog.png')
          mask1 = cv2.imread('assets/dalmation.png')
          mask2 = cv2.imread('assets/sheepdog.png')
          masks = (mask0, mask1, mask2)
      

      Add a line break, and then add this code to create the emotion predictor.

      step_8_dog_emotion_mask.py

      
          # get emotion predictor
          predictor = get_image_to_emotion_predictor()
      

      Your main function should now match the following:

      step_8_dog_emotion_mask.py

      def main():
          cap = cv2.VideoCapture(0)
      
          # load mask
          mask0 = cv2.imread('assets/dog.png')
          mask1 = cv2.imread('assets/dalmation.png')
          mask2 = cv2.imread('assets/sheepdog.png')
          masks = (mask0, mask1, mask2)
      
          # get emotion predictor
          predictor = get_image_to_emotion_predictor()
      
          # initialize front face classifier
          ...
      

      Next, locate these lines:

      step_8_dog_emotion_mask.py

      
                  # apply mask
                  frame[y0: y1, x0: x1] = apply_mask(frame[y0: y1, x0: x1], mask)
      

      Insert the following line below the # apply mask line to select the appropriate mask by using the predictor:

      step_8_dog_emotion_mask.py

                  # apply mask
                  mask = masks[predictor(frame[y:y+h, x: x+w])]
                  frame[y0: y1, x0: x1] = apply_mask(frame[y0: y1, x0: x1], mask)
      
      

      The completed file should look like this:

      step_8_dog_emotion_mask.py

      """Test for face detection"""
      
      from step_7_fer import get_image_to_emotion_predictor
      import numpy as np
      import cv2
      
      def apply_mask(face: np.array, mask: np.array) -> np.array:
          """Add the mask to the provided face, and return the face with mask."""
          mask_h, mask_w, _ = mask.shape
          face_h, face_w, _ = face.shape
      
          # Resize the mask to fit on face
          factor = min(face_h / mask_h, face_w / mask_w)
          new_mask_w = int(factor * mask_w)
          new_mask_h = int(factor * mask_h)
          new_mask_shape = (new_mask_w, new_mask_h)
          resized_mask = cv2.resize(mask, new_mask_shape)
      
          # Add mask to face - ensure mask is centered
          face_with_mask = face.copy()
          non_white_pixels = (resized_mask < 250).all(axis=2)
          off_h = int((face_h - new_mask_h) / 2)
          off_w = int((face_w - new_mask_w) / 2)
          face_with_mask[off_h: off_h+new_mask_h, off_w: off_w+new_mask_w][non_white_pixels] = \
              resized_mask[non_white_pixels]
      
          return face_with_mask
      
      def main():
      
          cap = cv2.VideoCapture(0)
          # load mask
          mask0 = cv2.imread('assets/dog.png')
          mask1 = cv2.imread('assets/dalmation.png')
          mask2 = cv2.imread('assets/sheepdog.png')
          masks = (mask0, mask1, mask2)
      
          # get emotion predictor
          predictor = get_image_to_emotion_predictor()
      
          # initialize front face classifier
          cascade = cv2.CascadeClassifier("assets/haarcascade_frontalface_default.xml")
      
          while True:
              # Capture frame-by-frame
              ret, frame = cap.read()
              frame_h, frame_w, _ = frame.shape
      
              # Convert to black-and-white
              gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
              blackwhite = cv2.equalizeHist(gray)
      
              rects = cascade.detectMultiScale(
                  blackwhite, scaleFactor=1.3, minNeighbors=4, minSize=(30, 30),
                  flags=cv2.CASCADE_SCALE_IMAGE)
      
              for x, y, w, h in rects:
                  # crop a frame slightly larger than the face
                  y0, y1 = int(y - 0.25*h), int(y + 0.75*h)
                  x0, x1 = x, x + w
                  # give up if the cropped frame would be out-of-bounds
                  if x0 < 0 or y0 < 0 or x1 > frame_w or y1 > frame_h:
                      continue
                  # apply mask
                  mask = masks[predictor(frame[y:y+h, x: x+w])]
                  frame[y0: y1, x0: x1] = apply_mask(frame[y0: y1, x0: x1], mask)
      
              # Display the resulting frame
              cv2.imshow('frame', frame)
              if cv2.waitKey(1) & 0xFF == ord('q'):
                  break
      
          cap.release()
          cv2.destroyAllWindows()
      
      if __name__ == '__main__':
          main()
      

      Save and exit your editor. Now launch the script:

      • python step_8_dog_emotion_mask.py

      Now try it out! Smiling will register as "happy" and show the original dog. A neutral face or a frown will register as "sad" and yield the Dalmatian. A face of "surprise," with a nice big jaw drop, will yield the sheepdog.

      GIF for emotion-based dog filter

      This concludes our emotion-based dog filter and foray into computer vision.

      Conclusion

      In this tutorial, you built a face detector and dog filter using computer vision and employed machine learning models to apply masks based on detected emotions.

      Machine learning is widely applicable. However, it's up to the practitioner to consider the ethical implications of each application. The application you built in this tutorial was a fun exercise, but remember that you relied on OpenCV and an existing dataset to identify faces, rather than supplying your own data to train the models. The data and models used have significant impacts on how a program works.

      For example, imagine a job search engine where the models were trained with data about candidates, such as race, gender, age, culture, first language, or other factors. Perhaps the developers then trained a model that enforces sparsity, which ends up reducing the feature space to a subspace where gender explains most of the variance. As a result, the model influences candidate job searches and even company selection processes based primarily on gender. Now consider more complex situations where the model is less interpretable and you don't know what a particular feature corresponds to. You can learn more about this in Equality of Opportunity in Machine Learning by Professor Moritz Hardt at UC Berkeley.

      There can be an overwhelming magnitude of uncertainty in machine learning. To understand this randomness and complexity, you'll have to develop both mathematical intuitions and probabilistic thinking skills. As a practitioner, it is up to you to dig into the theoretical underpinnings of machine learning.




      How To Detect and Extract Faces from an Image with OpenCV and Python


      The author selected the Open Internet/Free Speech Fund to receive a donation as part of the Write for DOnations program.

      Introduction

      Images make up a large amount of the data that gets generated each day, which makes the ability to process these images important. One method of processing images is via face detection. Face detection is a branch of image processing that uses machine learning to detect faces in images.

      A Haar Cascade is an object detection method used to locate an object of interest in images. The algorithm is trained on a large number of positive and negative samples, where positive samples are images that contain the object of interest. Negative samples are images that may contain anything but the desired object. Once trained, the classifier can then locate the object of interest in any new images.

      In this tutorial, you will use a pre-trained Haar Cascade model from OpenCV and Python to detect and extract faces from an image. OpenCV is an open-source programming library that is used to process images.

      Prerequisites

      Step 1 — Configuring the Local Environment

      Before you begin writing your code, you will first create a workspace to hold the code and install a few dependencies.

      Create a directory for the project with the mkdir command:

      Change into the newly created directory:

      Next, you will create a virtual environment for this project. Virtual environments isolate different projects so that differing dependencies won't cause any disruptions. Create a virtual environment named face_scrapper to use with this project:

      • python3 -m venv face_scrapper

      Activate the isolated environment:

      • source face_scrapper/bin/activate

      You will now see that your prompt is prefixed with the name of your virtual environment:

      Now that you've activated your virtual environment, use nano or your favorite text editor to create a requirements.txt file. This file indicates the necessary Python dependencies:

      • nano requirements.txt

      Next, you need to install three dependencies to complete this tutorial:

      • numpy: numpy is a Python library that adds support for large, multi-dimensional arrays. It also includes a large collection of mathematical functions to operate on the arrays.
      • opencv-utils: This is the extended library for OpenCV that includes helper functions.
      • opencv-python: This is the core OpenCV module that Python uses.

      Add the following dependencies to the file:

      requirements.txt

      numpy 
      opencv-utils
      opencv-python
      

      Save and close the file.

      Install the dependencies by passing the requirements.txt file to the Python package manager, pip. The -r flag specifies the location of the requirements.txt file.

      • pip install -r requirements.txt

      In this step, you set up a virtual environment for your project and installed the necessary dependencies. You're now ready to start writing the code that detects faces from an input image in the next step.

      Step 2 — Writing and Running the Face Detector Script

      In this section, you will write code that will take an image as input and return two things:

      • The number of faces found in the input image.
      • A new image with a rectangular plot around each detected face.

      Start by creating a new file to hold your code:

      • nano app.py

      In this new file, start writing your code by first importing the necessary libraries. You will import two modules here: cv2 and sys. The cv2 module imports the OpenCV library into the program, and the sys module gives your code access to functionality like command-line arguments via sys.argv.

      app.py

      import cv2
      import sys
      

      Next, you will specify that the input image will be passed as an argument to the script at runtime. The Pythonic way of reading the first argument is to assign the value of sys.argv[1] to a variable:

      app.py

      ...
      imagePath = sys.argv[1]
      

      A common practice in image processing is to first convert the input image to grayscale. This is because detecting luminance, as opposed to color, generally yields better results in object detection. Add the following code to take an input image as an argument and convert it to grayscale:

      app.py

      ...
      image = cv2.imread(imagePath)
      gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
      

      The .imread() function reads the image at the path passed as an argument to the script and loads it as an OpenCV image object. Next, OpenCV's .cvtColor() function converts that image object to a grayscale object.
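      Keep in mind that cv2.imread() returns None instead of raising an error when the file at imagePath cannot be read. The tutorial's script does not guard against this, but if you want clearer failures for a bad path, a minimal optional check might look like the following sketch:

      # Optional sketch (not part of app.py): fail fast if the image path cannot be read.
      image = cv2.imread(imagePath)
      if image is None:
          sys.exit("Could not read the image at {0}".format(imagePath))
      gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)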

      Now that you've added the code to load an image, you will add the code that detects faces in the specified image:

      app.py

      ...
      faceCascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
      faces = faceCascade.detectMultiScale(
              gray,
              scaleFactor=1.3,
              minNeighbors=3,
              minSize=(30, 30)
      ) 
      
      print("Found {0} Faces!".format(len(faces)))
      
      

      This code will create a faceCascade object that will load the Haar Cascade file with the cv2.CascadeClassifier method. This allows Python and your code to use the Haar Cascade.

      Next, the code applies OpenCV's .detectMultiScale() method on the faceCascade object. This generates a list of rectangles for all of the detected faces in the image. The list of rectangles is a collection of pixel locations from the image, in the form of Rect(x,y,w,h).

      Here is a summary of the other parameters your code uses:

      • gray: This specifies the use of the OpenCV grayscale image object that you loaded earlier.
      • scaleFactor: This parameter specifies the rate to reduce the image size at each image scale. Your model has a fixed scale during training, so input images can be scaled down for improved detection. This process stops after reaching a threshold limit, defined by maxSize and minSize.
      • minNeighbors: This parameter specifies how many neighbors, or detections, each candidate rectangle should have in order to retain it. A higher value may result in fewer false positives, but a value that is too high can eliminate true positives.
      • minSize: This allows you to define the minimum possible object size measured in pixels. Objects smaller than this parameter are ignored.

      After generating the list of rectangles, the faces are counted with the len function, and the number of detected faces is printed as output when you run the script.
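      If you'd like to see how these parameters behave on your own images, a short throwaway experiment, not part of app.py, can help. The sketch below assumes gray and faceCascade are defined as above and simply re-runs detection with a few minNeighbors values:

      # Hypothetical experiment: observe how minNeighbors affects the number of detections.
      for neighbors in (1, 3, 5, 7):
          candidates = faceCascade.detectMultiScale(
              gray,
              scaleFactor=1.3,
              minNeighbors=neighbors,
              minSize=(30, 30)
          )
          print("minNeighbors={0}: {1} candidate face(s)".format(neighbors, len(candidates)))

      Lower values tend to admit more, and noisier, detections; higher values keep only well-supported ones.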

      Next, you will use OpenCV's .rectangle() method to draw a rectangle around the detected faces:

      app.py

      ...
      for (x, y, w, h) in faces:
          cv2.rectangle(image, (x, y), (x+w, y+h), (0, 255, 0), 2)
      
      

      This code uses a for loop to iterate through the list of pixel locations returned from faceCascade.detectMultiScale method for each detected object. The rectangle method will take four arguments:

      • image tells the code to draw rectangles on the original input image.
      • (x, y) and (x+w, y+h) are the pixel coordinates of the top-left and bottom-right corners of the detected object. rectangle will use these to locate and draw rectangles around the detected objects in the input image.
      • (0, 255, 0) is the color of the shape. This argument gets passed as a tuple for BGR. For example, you would use (255, 0, 0) for blue. We are using green in this case.
      • 2 is the thickness of the line measured in pixels.
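      If you want to experiment with the annotation style, a small variation, not used in app.py, could draw thicker blue boxes and label each detection with OpenCV's putText method; the values here are only examples:

      # Hypothetical styling tweak: blue boxes, thicker lines, and a text label per face.
      for i, (x, y, w, h) in enumerate(faces):
          cv2.rectangle(image, (x, y), (x + w, y + h), (255, 0, 0), 4)
          cv2.putText(image, 'Face {0}'.format(i + 1), (x, y - 10),
                      cv2.FONT_HERSHEY_SIMPLEX, 0.6, (255, 0, 0), 2)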

      Now that you've added the code to draw the rectangles, use OpenCV's .imwrite() method to write the new image to your local filesystem as faces_detected.jpg. This method will return True if the write was successful and False if it wasn't able to write the new image.

      app.py

      ...
      status = cv2.imwrite('faces_detected.jpg', image)
      

      Finally, add this code to print the True or False return status of the .imwrite() function to the console. This will let you know whether the write was successful after running the script.

      app.py

      ...
      print ("Image faces_detected.jpg written to filesystem: ",status)
      

      The completed file will look like this:

      app.py

      import cv2
      import sys
      
      imagePath = sys.argv[1]
      
      image = cv2.imread(imagePath)
      gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
      
      faceCascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
      faces = faceCascade.detectMultiScale(
          gray,
          scaleFactor=1.3,
          minNeighbors=3,
          minSize=(30, 30)
      )
      
      print("[INFO] Found {0} Faces!".format(len(faces)))
      
      for (x, y, w, h) in faces:
          cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
      
      status = cv2.imwrite('faces_detected.jpg', image)
      print("[INFO] Image faces_detected.jpg written to filesystem: ", status)
      

      Once you've verified that everything is entered correctly, save and close the file.

      Note: This code was sourced from the publicly available OpenCV documentation.

      Your code is complete and you are ready to run the script.

      Step 3 — Running the Script

      In this step, you will use an image to test your script. When you find an image you'd like to use to test, save it in the same directory as your app.py script. This tutorial will use the following image:

      Input Image of four people looking at phones

      If you would like to test with the same image, use the following command to download it:

      • curl -O https://www.xpresservers.com/wp-content/uploads/2019/03/How-To-Detect-and-Extract-Faces-from-an-Image-with-OpenCV-and-Python.png

      Once you have an image to test the script, run the script and provide the image path as an argument:

      • python app.py path/to/input_image

      Once the script finishes running, you will receive output like this:

      Output

      [INFO] Found 4 Faces!
      [INFO] Image faces_detected.jpg written to filesystem:  True

      The True output tells you that the updated image was successfully written to the filesystem. Open the image on your local machine to see the changes in the new file:

      Output Image with detected faces

      You should see that your script detected four faces in the input image and drew rectangles to mark them. In the next step, you will use the pixel locations to extract faces from the image.

      Step 4 — Extracting Faces and Saving them Locally (Optional)

      In the previous step, you wrote code to use OpenCV and a Haar Cascade to detect and draw rectangles around faces in an image. In this section, you will modify your code to extract the detected faces from the image into their own files.

      Start by reopening the app.py file with your text editor:

      • nano app.py

      Next, add the following lines under the cv2.rectangle line:

      app.py

      ...
      for (x, y, w, h) in faces:
          cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
          roi_color = image[y:y + h, x:x + w] 
          print("[INFO] Object found. Saving locally.") 
          cv2.imwrite(str(w) + str(h) + '_faces.jpg', roi_color) 
      ...
      

      The roi_color object uses the pixel locations from the faces list to slice the detected region of interest out of the original input image. The x, y, w, and h variables are the pixel locations for each of the objects detected by the faceCascade.detectMultiScale method. The code then prints output stating that an object was found and will be saved locally.

      Once that is done, the code saves each crop as a new image using the cv2.imwrite method, building the filename from the width and height of the crop. This helps keep the filenames distinct when multiple faces are detected.
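      If you would rather have filenames that stay unique even when two faces happen to share the same width and height, one small variation, not used in the script below, is to number the crops with enumerate:

      # Hypothetical alternative: name each crop by its position in the faces list.
      for i, (x, y, w, h) in enumerate(faces):
          cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
          roi_color = image[y:y + h, x:x + w]
          cv2.imwrite('face_{0}.jpg'.format(i), roi_color)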

      The updated app.py script will look like this:

      app.py

      import cv2
      import sys
      
      imagePath = sys.argv[1]
      
      image = cv2.imread(imagePath)
      gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
      
      faceCascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
      faces = faceCascade.detectMultiScale(
          gray,
          scaleFactor=1.3,
          minNeighbors=3,
          minSize=(30, 30)
      )
      
      print("[INFO] Found {0} Faces.".format(len(faces)))
      
      for (x, y, w, h) in faces:
          cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
          roi_color = image[y:y + h, x:x + w]
          print("[INFO] Object found. Saving locally.")
          cv2.imwrite(str(w) + str(h) + '_faces.jpg', roi_color)
      
      status = cv2.imwrite('faces_detected.jpg', image)
      print("[INFO] Image faces_detected.jpg written to filesystem: ", status)
      

      To summarize, the updated code uses the pixel locations to extract the faces from the image into a new file. Once you have finished updating the code, save and close the file.

      Now that you've updated the code, you are ready to run the script once more:

      • python app.py path/to/image

      You will see similar output once your script has finished processing the image:

      Output

      [INFO] Found 4 Faces.
      [INFO] Object found. Saving locally.
      [INFO] Object found. Saving locally.
      [INFO] Object found. Saving locally.
      [INFO] Object found. Saving locally.
      [INFO] Image faces_detected.jpg written to filesystem:  True

      Depending on how many faces are in your sample image, you may see more or fewer lines of output.

      Looking at the contents of the working directory after the script finishes, you'll see files for the head shots of all the faces found in the input image:

      Directory Listing

      The head shots extracted from the input image appear in the working directory:

      Extracted Faces

      In this step, you modified your script to extract the detected objects from the input image and save them locally.

      Conclusion

      In this tutorial, you wrote a script that uses OpenCV and Python to detect, count, and extract faces from an input image. You can update this script to detect different objects by using a different pre-trained Haar Cascade from the OpenCV library, or you can learn how to train your own Haar Cascade.
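      For example, OpenCV ships several other pre-trained cascades alongside haarcascade_frontalface_default.xml. The sketch below is a variation on app.py, not part of this tutorial, that swaps in the bundled eye cascade to count eyes instead of faces:

      # Hypothetical variation: detect eyes by loading a different bundled Haar Cascade.
      import cv2
      import sys

      image = cv2.imread(sys.argv[1])
      gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

      eyeCascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_eye.xml")
      eyes = eyeCascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=3, minSize=(20, 20))
      print("[INFO] Found {0} eyes.".format(len(eyes)))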


