
      How To Apply CSS Styles to HTML with Cascade and Specificity


      Introduction

Cascading Stylesheets, better known as CSS, is the language for visual style and design on the web. CSS has a long history on the web, dating back to its initial proposal in 1994. In the time since, CSS has become a feature-rich language capable of laying out a webpage, creating complex animations, and much more.

      Since CSS is the web’s styling language, understanding how it works and how to use it is fundamental to web development. It is especially valuable to understand in order to work with Hypertext Markup Language (HTML) and JavaScript effectively. This tutorial will focus on applying CSS to HTML, cascade, and specificity, which are foundational aspects of CSS and will prepare you for using CSS effectively in your web projects.

CSS is not a conventional programming language. While it does have some features found in other programming languages, such as variables and math, CSS is wholly dependent on HTML to work. CSS’s purpose is to provide visual modifications to HTML. The CSS language is more like a to-do list for the browser: you tell the browser, here is a list of things I want you to find. Once the browser finds those things, the CSS instructs it to make the listed changes to them.

The browser follows this list of instructions from top to bottom without question, and CSS needs to be written with that in mind. The cascade part of Cascading Stylesheets refers to how browsers read the list. Since the browser is impartial, it makes the style changes as it encounters them. If the CSS says to make some HTML elements red, and later in the CSS says to make those same elements blue, the result is blue.
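For example, here is a minimal sketch of that behavior, using a p type selector purely for illustration (selectors are covered in detail later in this tutorial):

p {
    color: red;
}

p {
    color: blue; /* read later in the cascade, so paragraphs render blue */
}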

Applying styles to an element gets a little complicated, as there are many ways to tell the browser to find an element in the HTML. Each element in HTML has a set of attributes that can be used to find a specific element. Because the cascade means the browser reads the instructions from top to bottom impartially, the instructions provided must be specific. This is known as specificity: the developer must write precise criteria for the browser to find the exact element they wish to apply the styles to.

      In this tutorial you will work through multiple hands-on examples to understand the different ways styles can be applied to HTML elements and how cascade and specificity affect how styles are written.

      Prerequisites

      Using the HTML Style Attribute

      In this first step, you will apply styles to an HTML element directly with the style attribute. This method, also known as inline styling, uses an HTML element attribute to accept a CSS property as a value, and then applies it directly to the element.

      To familiarize yourself with some concepts of CSS, start by opening the index.html file in your text editor. In that file, set up the base HTML structure of the <html>, <head>, and <body> tags. Inside the <body> tags add a <div> with a short sentence of text:

      index.html

      <!doctype html>
      <html>
          <head>
              <title>Sammy Shark</title>
          </head>
          <body>
              <div>Sammy is the greatest shark in the ocean.</div>
          </body>
      </html>
      

Next, open index.html in your browser. You will see your text in the <div> in the top left portion of the browser window. Visually, the text should appear similar to the following image, with black text on a white background using a serif font, such as Times New Roman:

      Text rendered in black in a serif font–the browser default style.

      To begin styling, add an attribute with an empty value to the opening <div> tag:

      index.html

      ...
      <div style="">Sammy is the greatest shark in the ocean.</div>
      ...
      

      The style attribute is a special attribute for HTML that contains CSS properties and values. The browser will apply those styles to that element.

      In this case, change the color of your sentence to navy using the color property. The format for CSS property and values begins with the property name, followed by the colon symbol :, then the property value, and finally a semicolon symbol ; after the value to tell the browser that’s all for the value:

      index.html

      ...
      <div style="color: navy;">Sammy is the greatest shark in the ocean.</div>
      ...
      

      Save index.html, return to your browser, and refresh. The text has changed from the browser default color of black to navy, as seen in the following image:

      Text rendered in navy blue with the browser default serif font.

      There are many CSS properties you can try in the style attribute, such as background-color or font-family. Typically, a browser’s default font is a serif font, such as Times New Roman. To change the font to a sans serif font, such as Arial or Helvetica, add a space after the semicolon for the color property then type the font-family property, followed by a colon, with sans-serif as the value:

      index.html

      ...
      <div style="color: navy; font-family: sans-serif;">Sammy is the greatest shark in the ocean.</div>
      ...
      

      Save your file and refresh your browser to see how the font for your sentence has changed. The font will now be the browser’s sans-serif font, such as Helvetica or Arial, instead of the default font. The following image shows how the font-family property builds on the color change to navy.

      Text rendered in navy blue with a custom sans serif font.

      Now that you have written a couple of CSS properties, wrap a word in your sentence with the <strong> element and return to your browser:

      index.html

      ...
      <div style="color: navy; font-family: sans-serif;">Sammy is the <strong>greatest</strong> shark in the ocean.</div>
      ...
      

      In your browser, the word inside the <strong> tag will appear bolder than the other words in the sentence, as shown in the following image.

      Text rendered in navy blue with a normal font weight, except the word in the <strong> tag which is bold.

The word with the <strong> element retains the color and font-family properties of the HTML element it is inside, also known as its parent. This is an example of inheritance, where a child element, an HTML element inside another element, inherits styles that are placed on the parent element. The <strong> element also adds a browser default style of font-weight: bold;, making the text bold. The <strong> element can also have its own style attribute to give it a custom look:

      index.html

      ...
      <div style="color: navy; font-family: sans-serif;">Sammy is the <strong style="color: blue;">greatest</strong> shark in the ocean.</div>
      ...
      

Save the file and refresh your browser to see the difference, as the word in the <strong> element is now blue, in contrast to the navy of the rest of the sentence. This change is shown in the following image:

      Text rendered in navy blue with a normal font weight, except the word in the <strong> tag which is bold and a lighter blue.

      In this section you used HTML style attributes to apply styles to a <div> and a <strong> element. In the next section you’ll take the styles you wrote for those specific elements and apply them to all <div> and <strong> elements on the page.

      Using the <style> Tag to Write CSS

      Next you will take what was written in the previous section and apply the styles to all similar elements on the page. You will move from using the style attribute to using the <style> HTML element. <style> is a special element that allows you to write CSS within it and have those styles applied to the whole page.

Using the style attribute on an HTML element can be very handy, but it is limited to only that element or its descendants. To see how this works, add another <div> element with a new sentence:

      index.html

      ...
      <div style="color: navy; font-family: sans-serif;">Sammy is the <strong style="color: blue;">greatest</strong> shark in the ocean.</div>
      <div>They like to swim with their friends.</div>
      ...
      

      Go to your browser and reload the page. As you may notice in the browser or the following image, the first sentence gets all the styles you wrote earlier, but the new sentence uses the browser default styles instead:

      The first sentence text is rendered with custom styles while the second sentence is rendered with browser default styles. The first sentence is a navy blue sans serif font, except for the text in the <strong> tag, which is a lighter blue and bold. The second sentence is black and a serif font.

      You could apply the same style attribute on each individual element, but that becomes very cumbersome if you have many sentences that you want to look the same. What you need is a way to target many of the same kinds of elements simultaneously. This can be done with the HTML <style> element.

      The <style> element is most often placed in the <head> tag of an HTML document. This way the browser reads the styles before reading the HTML, causing the page to load already styled. The inverse can cause a flash as the browser loads the content with browser default styles and then loads the custom styles. However, keep in mind that the <style> tag is not limited to use in the <head> and can be placed anywhere within the <body>, which can be advantageous in some scenarios.

      Add <style> tags to the <head> of your index.html file:

      index.html

      <!doctype html>
      <html>
          <head>
              <title>Sammy Shark</title>
              <style>
              </style>
          </head>
          <body>
              <div style="color: navy; font-family: sans-serif;">Sammy is the <strong style="color: blue;">greatest</strong> shark in the ocean.</div>
              <div>They like to swim with their friends.</div>
          </body>
      </html>
      

      Inside the <style> element, you can define what kind of element you want to target with selectors, which identify which HTML elements to apply styles to. Once the selector is in place, you can then group the styles you wish to apply to that element in what is called a selector block.

      To begin setting that up, look at the example from earlier. Here there is a <div> with two properties, color and font-family.

      index.html

      ...
      <div style="color: navy; font-family: sans-serif;">...</div>
      ...
      

To target all <div> elements on the index.html page, add what is called a type selector within the <style> element, followed by an opening and closing curly brace, which define the selector block. This tells the browser to find all the <div> elements on the page and apply the styles found within the selector block:

      index.html

      <!doctype html>
      <html>
          <head>
              <title>Sammy Shark</title>
              <style>
                  div {
                  }
              </style>
          </head>
          <body>
              <div style="color: navy; font-family: sans-serif;">Sammy is the <strong style="color: blue;">greatest</strong> shark in the ocean.</div>
              <div>They like to swim with their friends.</div>
          </body>
      </html>
      

      Next, take the properties from the style attribute and put them inside the curly braces of the selector block. To make it easier to read, it helps to put each property on an individual line:

      index.html

      <!doctype html>
      <html>
          <head>
              <title>Sammy Shark</title>
              <style>
                  div {
                      color: navy;
                      font-family: sans-serif;
                  }
              </style>
          </head>
          <body>
              <div>Sammy is the <strong style="color: blue;">greatest</strong> shark in the ocean.</div>
              <div>They like to swim with their friends.</div>
          </body>
      </html>
      

      Once you have saved the file, return to the browser and refresh. Now both sentences have the same styles applied, all from a single selector in the <style> element:

The text in both sentences is rendered navy blue and in a sans serif font, except for the text in the one <strong> tag, which is a lighter blue and bold.

      Add a new selector after your div selector block to apply the styles for the <strong> element in the same manner. Also, add a <strong> element around a word in your second sentence to see your new CSS on multiple elements:

      index.html

      <!doctype html>
      <html>
          <head>
              <title>Sammy Shark</title>
              <style>
                  div {
                      color: navy;
                      font-family: sans-serif;
                  }
                  strong {
                      color: blue;
                  }
              </style>
          </head>
          <body>
              <div>Sammy is the <strong>greatest</strong> shark in the ocean.</div>
              <div>They like to swim with their <strong>friends</strong>.</div>
          </body>
      </html>
      

      Save the file and refresh your browser, or look at the following image, to find that now both words using the <strong> element are the color blue:

      The text in both sentences is rendered navy blue and in a sans serif font, except for the text in both <strong> tags, which are a lighter blue and bold.

      In this section, you wrote CSS selectors within a <style> element, which applied the styles to all matching elements on the page. In the next section you will move these styles so that they can be applied on many pages of a website.

      Loading an External CSS Document into HTML

      In this section you will start working with a CSS file that is loaded on multiple HTML pages. You will move the styles from the previous section to the CSS file and create a new HTML page to see how one CSS file can style multiple pages.

Just as the style attribute is limited to styling a single element, the CSS found in a <style> element is limited to styling a single page. Websites are most often a collection of many web pages that share the same styles. If you had multiple pages that all needed to look the same and you used the <style> element to hold your CSS, making changes to the styles would require repeating the same work on each page. This is where the CSS file comes in.

      Create a new file in your editor called styles.css. In that file, copy the contents of the <style> element from index.html and add them to your styles.css file. Be sure to exclude the <style> tags.

      styles.css

      div {
          color: navy;
          font-family: sans-serif;
      }
      strong {
          color: blue;
      }
      

Now that you have an independent CSS file, it’s time to load that file on to the page so the browser can apply the styles. Start by removing the <style> tags from the <head>. Then, inside the <head> tag, write a self-closing <link /> element with two attributes, href and rel. The href value contains the path to the styles.css file so the browser can reference the CSS. The rel attribute should have a value of stylesheet, as it defines the type of relationship between the page and the document being referenced:

      index.html

      <!doctype html>
      <html>
          <head>
              <title>Sammy Shark</title>
              <link rel="stylesheet" href="https://www.digitalocean.com/community/tutorials/styles.css" />
          </head>
          <body>
              <div>Sammy is the <strong>greatest</strong> shark in the ocean.</div>
              <div>They like to swim with their <strong>friends</strong>.</div>
          </body>
      </html>
      

      Now go to your browser and refresh index.html. In this case, you will not find anything changed since all you have done is change where the styles live.

      The text in both sentences remains rendered navy blue and in a sans serif font, except for the text in both <strong> tags, which are a lighter blue and bold.

      To demonstrate how useful a CSS file is, create a new HTML file called about.html. Copy and paste the HTML from index.html and then make changes to the sentences, or create new sentences:

      about.html

      <!doctype html>
      <html>
          <head>
              <title>About Sharks</title>
              <link rel="stylesheet" href="https://www.digitalocean.com/community/tutorials/styles.css" />
          </head>
          <body>
              <div>There are over <strong>500</strong> species of sharks.</div>
              <div>The great white shark can grow up to <strong>6.1 meters</strong> long.</div>
          </body>
      </html>
      

      Next, open about.html in a new browser window so you can view and compare both HTML files simultaneously. This results in about.html having the same styles for both div and strong elements, as shown in the following image.

The text in both sentences on the new page is rendered navy blue and in a sans serif font, except for the text in both <strong> tags, which are a lighter blue and bold.

Return to your text editor, open styles.css, and change the div selector’s color property value to green:

      styles.css

      div {
          color: green;
          font-family: sans-serif;
      }
      strong {
          color: blue;
      }
      

      In your browser, refresh both index.html and about.html to see how changing the styles in the CSS file affects both HTML files. As the following image shows, the text changed from navy to green in both index.html and about.html:

The text in both sentences on the about page is now rendered green and in a sans serif font, except for the text in both <strong> tags, which remains blue and bold.

      Each page has the same styles applied with the green text and blue <strong> elements, all from one central CSS file.

      In this section you created a CSS file and loaded that CSS file on multiple HTML pages. You moved your CSS from the <style> element into the CSS file, which applied the same styles to index.html and the new about.html page. Next you will start working with CSS cascade and specificity.

      Working With the Cascade and Specificity

      This section will get into the depths of the CSS features of cascade and specificity mentioned in the introduction. You will write CSS that exemplifies these concepts, starting with cascade and then specificity. Understanding cascade and specificity can help troubleshoot problems you may find in your code.

With what you have accomplished so far, the cascade is short. As your CSS file grows in size, it becomes more and more necessary to be aware of the order of your CSS selectors and properties. One way to think about the cascade is to picture a cascading river with rapids: it’s advisable to go with the current, as trying to go upstream requires extensive effort to make little progress. The same is true with CSS: if your code is not working as expected, it may be going against the flow of the cascade.

      To see this in practice, open up the files from earlier. Open styles.css in your text editor and index.html in your browser. The <div> elements in the browser will currently be green, with the bold text in blue. After the font-family property in the div selector, add another color property with a value of orange:

      styles.css

      div {
          color: green;
          font-family: sans-serif;
          color: orange;
      }
      strong {
          color: blue;
      }
      

      The browser traverses the cascade and hits the green style, turning the div green. Then the browser hits the orange style, and changes the color from green to orange. Refresh index.html in your browser to see the green text is now orange, as shown in the following image:

The text in both sentences on the about page is now rendered orange and in a sans serif font, except for the text in both <strong> tags, which remains blue and bold.

      In this scenario the browser has been given two color properties, and due to the nature of the cascade, the browser applies the last color property to the element. When a property further down the cascade negates a previous property, this results in a situation called an override. As a CSS file grows in size and scope, overrides can be the source of bugs as well as the solution to problems.

      While the cascade deals with how the browser reads and applies styles to elements, specificity deals with what elements are found and styled.

Open about.html in your browser. Right now both sentences have the same style. Next, you will change the color of the <strong> element in the second sentence to red, but keep the first <strong> color set to blue. Accomplishing this change requires a selector with higher specificity. Right now the selectors have what is called low specificity, as they target all <strong> elements on the page, regardless of their parent.

Higher specificity can be achieved several different ways, but the most common and effective way is a class selector. On the second <strong> element, add a new attribute called class and give that attribute a value of highlight:

      about.html

      <!doctype html>
      <html>
          <head>
              <title>About Sharks</title>
              <link rel="stylesheet" href="https://www.digitalocean.com/community/tutorials/styles.css" />
          </head>
          <body>
              <div>There are over <strong>500</strong> species of sharks.</div>
              <div>The great white shark can grow up to <strong class="highlight">6.1 meters</strong> long.</div>
          </body>
      </html>
      

Next, open styles.css in your text editor to create a class selector. First, remove the color: orange; property you added to the div selector earlier.

In CSS, element selectors are written out without an identifier, but with class selectors a period (.) precedes the value found in the attribute. In this case, use the selector .highlight to apply a color property with a value of red:

      styles.css

      div {
          color: green;
          font-family: sans-serif;
      }
      strong {
          color: blue;
      }
      .highlight {
       color: red;
      }
      

      Save the changes to both styles.css and about.html and refresh about.html in your browser. You will find that the second <strong> element is now red, as seen in the following image:

The text in both sentences on the about page is now rendered green and in a sans serif font, except for the text in both <strong> tags, which is bold with the first <strong> text being blue and the second being red.

To understand the robustness of specificity with regard to the cascade, swap the strong and .highlight selector blocks. Different kinds of selectors carry different levels of specificity; in this case, the class selector has a higher specificity than the element selector:

      styles.css

      div {
          color: green;
          font-family: sans-serif;
      }
      .highlight {
       color: red;
      }
      strong {
       color: blue;
      }
      

      Save and refresh about.html in your browser and you’ll notice no change. The following image shows that there is no visual change despite the reordering of the CSS.

The text in both sentences on the about page remains rendered green and in a sans serif font, except for the text in both <strong> tags, which is bold with the first <strong> text being blue and the second being red.

This is due to the low specificity of element selectors and the high specificity of a class selector. While the browser reads the list from top to bottom impartially, you can tell it to give certain styles more weight when they are applied by using higher-specificity selectors.

      In this section you worked with the CSS features of cascade and specificity. You applied the same property twice to an element, which showed how the cascade works by using the last property in the list. You also created styles using a higher specificity selector called a class selector. Next you’ll learn about a special CSS rule that overrides both cascade and specificity.

      Using the !important Rule

      In this last section you will learn about the CSS !important rule and write an example of how to use it. This example uses a fictional scenario where you would not have control over the HTML and therefore must fix the problem only using CSS.

      Although CSS will often work with the cascade and have good specificity, there are times when a style needs to be forced. This is done by adding an !important flag at the end of a property value, before the semicolon. This is not a rule to be used lightly, and when used it is a good practice to include a code comment explaining the reason for using !important.

      Warning: Due to how the !important rule works, it should be used only if all other methods fail. Using the rule overrides the value on all matching elements, which makes it difficult or impossible to override further. This will make your code less legible to other developers.

      To see how this works, open up index.html in your editor and add a style attribute with a color set to red:

      index.html

      <!doctype html>
      <html>
          <head>
              <title>Sammy Shark</title>
              <link href="https://www.digitalocean.com/community/tutorials/styles.css" rel="stylesheet" />
          </head>
          <body>
              <div>Sammy is the <strong style="color: red;">greatest</strong> shark in the ocean.</div>
              <div>They like to swim with their friends.</div>
          </body>
      </html>
      

      Load index.html in your browser and you will find that the style attribute overrides the blue color with red, since a style attribute has higher specificity than the CSS selector. What is in the browser will look similar to the following image:

      The text in the sentence is green with sans serif, except for the text in the <strong> tag, which is red and bold.

When working with websites, it is common to have JavaScript loaded that applies inline styles like this. Inline styles sit at the end of the cascade with a higher specificity than CSS selectors, meaning that even with styles that turn all <strong> tags blue, this one will be red. In a scenario where JavaScript creates the style attribute, it cannot simply be removed from the HTML.
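For instance, a script might set an inline style directly on an element with something like the following hypothetical snippet (not part of this tutorial’s files), which results in a style attribute just like the one you added above:

document.querySelector('strong').style.color = 'red';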

      To force a style override, open up styles.css in your editor and after the blue property value in your strong selector, add !important:

      styles.css

      ...
      strong {
          color: blue !important;
      }
      

      Now return to your browser and refresh index.html. You will see the blue color again, as in the following image:

      The text in the sentence is green with sans serif font, except for the text in the <strong> tag, which is blue and bold.

Despite the style attribute defining the color as red, it is now blue, thanks to the !important rule telling the browser that this is the more important style to use. It is helpful to add a CSS code comment explaining the reason for the !important so future developers, or your future self, understand why you are using it.

      styles.css

      ...
      strong {
    /* !important used here because JavaScript applies a style attribute to this element */
          color: blue !important;
      }
      

In this section you learned about the !important rule and used it in a real-world scenario. You also learned that the !important rule is a powerful but risky tool that should be used intentionally because of how drastically it overrides the cascade and specificity. Additionally, you wrote a CSS comment, which informs future developers looking at your code and serves as a reminder to you when you return to your code later.

      Conclusion

      CSS is a versatile language made for manipulating and styling HTML. In this tutorial you styled HTML elements through various methods of applying styles. You now have the foundation to begin writing your own styles. If you want to dive further into understanding CSS and how it works, the World Wide Web Consortium (W3C), the governing body for CSS, provides all kinds of information about the language.

      If you would like to see more tutorials on CSS, check out our CSS topic page.




      Understanding This, Bind, Call, and Apply in JavaScript


      The author selected the Open Internet/Free Speech Fund to receive a donation as part of the Write for DOnations program.

      The this keyword is a very important concept in JavaScript, and also a particularly confusing one to both new developers and those who have experience in other programming languages. In JavaScript, this is a reference to an object. The object that this refers to can vary, implicitly based on whether it is global, on an object, or in a constructor, and can also vary explicitly based on usage of the Function prototype methods bind, call, and apply.

      Although this is a bit of a complex topic, it is also one that appears as soon as you begin writing your first JavaScript programs. Whether you’re trying to access an element or event in the Document Object Model (DOM), building classes for writing in the object-oriented programming style, or using the properties and methods of regular objects, you will encounter this.

      In this article, you’ll learn what this refers to implicitly based on context, and you’ll learn how to use the bind, call, and apply methods to explicitly determine the value of this.

      Implicit Context

      There are four main contexts in which the value of this can be implicitly inferred:

      • the global context
      • as a method within an object
      • as a constructor on a function or class
      • as a DOM event handler

      Global

In the global context, this refers to the global object. When you’re working in a browser, the global context is window. When you’re working in Node.js, the global context is global.

      Note: If you are not yet familiar with the concept of scope in JavaScript, please review Understanding Variables, Scope, and Hoisting in JavaScript.

      For the examples, you will practice the code in the browser’s Developer Tools console. Read How to Use the JavaScript Developer Console if you are not familiar with running JavaScript code in the browser.

      If you log the value of this without any other code, you will see what object this refers to.

      console.log(this)
      

      Output

      Window {postMessage: ƒ, blur: ƒ, focus: ƒ, close: ƒ, parent: Window, …}

      You can see that this is window, which is the global object of a browser.

      In Understanding Variables, Scope, and Hoisting in JavaScript, you learned that functions have their own context for variables. You might be tempted to think that this would follow the same rules inside a function, but it does not. A top-level function will still retain the this reference of the global object.

      You write a top-level function, or a function that is not associated with any object, like this:

      function printThis() {
        console.log(this)
      }
      
      printThis()
      

      Output

      Window {postMessage: ƒ, blur: ƒ, focus: ƒ, close: ƒ, parent: Window, …}

      Even within a function, this still refers to the window, or global object.

      However, when using strict mode, the context of this within a function on the global context will be undefined.

      'use strict'
      
      function printThis() {
        console.log(this)
      }
      
      printThis()
      

      Output

      undefined

      Generally, it is safer to use strict mode to reduce the probability of this having an unexpected scope. Rarely will someone want to refer to the window object using this.

      For more information about strict mode and what changes it makes regarding mistakes and security, read the Strict mode documentation on MDN.

      An Object Method

      A method is a function on an object, or a task that an object can perform. A method uses this to refer to the properties of the object.

      const america = {
        name: 'The United States of America',
        yearFounded: 1776,
      
        describe() {
          console.log(`${this.name} was founded in ${this.yearFounded}.`)
        },
      }
      
      america.describe()
      

      Output

      "The United States of America was founded in 1776."

      In this example, this is the same as america.

      In a nested object, this refers to the current object scope of the method. In the following example, this.symbol within the details object refers to details.symbol.

      const america = {
        name: 'The United States of America',
        yearFounded: 1776,
        details: {
          symbol: 'eagle',
          currency: 'USD',
          printDetails() {
            console.log(`The symbol is the ${this.symbol} and the currency is ${this.currency}.`)
          },
        },
      }
      
      america.details.printDetails()
      

      Output

      "The symbol is the eagle and the currency is USD."

      Another way of thinking about it is that this refers to the object on the left side of the dot when calling a method.

      A Function Constructor

      When you use the new keyword, it creates an instance of a constructor function or class. Function constructors were the standard way to initialize a user-defined object before the class syntax was introduced in the ECMAScript 2015 update to JavaScript. In Understanding Classes in JavaScript, you will learn how to create a function constructor and an equivalent class constructor.

      function Country(name, yearFounded) {
        this.name = name
        this.yearFounded = yearFounded
      
        this.describe = function() {
          console.log(`${this.name} was founded in ${this.yearFounded}.`)
        }
      }
      
      const america = new Country('The United States of America', 1776)
      
      america.describe()
      

      Output

      "The United States of America was founded in 1776."

      In this context, this is now bound to the instance of Country, which is contained in the america constant.

      A Class Constructor

      A constructor on a class acts the same as a constructor on a function. Read more about the similarities and differences between function constructors and ES6 classes in Understanding Classes in JavaScript.

      class Country {
        constructor(name, yearFounded) {
          this.name = name
          this.yearFounded = yearFounded
        }
      
        describe() {
          console.log(`${this.name} was founded in ${this.yearFounded}.`)
        }
      }
      
      const america = new Country('The United States of America', 1776)
      
      america.describe()
      

      this in the describe method refers to the instance of Country, which is america.

      Output

      "The United States of America was founded in 1776."

      A DOM Event Handler

      In the browser, there is a special this context for event handlers. In an event handler called by addEventListener, this will refer to event.currentTarget. More often than not, developers will simply use event.target or event.currentTarget as needed to access elements in the DOM, but since the this reference changes in this context, it is important to know.

      In the following example, we’ll create a button, add text to it, and append it to the DOM. When we log the value of this within the event handler, it will print the target.

      const button = document.createElement('button')
      button.textContent = 'Click me'
      document.body.append(button)
      
      button.addEventListener('click', function(event) {
        console.log(this)
      })
      

      Output

      <button>Click me</button>

      Once you paste this into your browser, you will see a button appended to the page that says “Click me”. If you click the button, you will see <button>Click me</button> appear in your console, as clicking the button logs the element, which is the button itself. Therefore, as you can see, this refers to the targeted element, which is the element we added an event listener to.

      Explicit Context

      In all of the previous examples, the value of this was determined by its context—whether it is global, in an object, in a constructed function or class, or on a DOM event handler. However, using call, apply, or bind, you can explicitly determine what this should refer to.

      It is difficult to define exactly when to use call, apply, or bind, as it will depend on the context of your program. bind can be particularly helpful when you want to use events to access properties of one class within another class. For example, if you were to write a simple game, you might separate the user interface and I/O into one class, and the game logic and state into another. Since the game logic would need to access input, such as key press and click, you would want to bind the events to access the this value of the game logic class.
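As a rough sketch of that pattern (the class and property names here are hypothetical, not taken from a specific game), binding a method before registering it as an event handler keeps this pointing at the game-logic instance:

class GameLogic {
  constructor() {
    this.score = 0
    // bind returns a new function whose `this` is permanently set to this instance
    this.handleClick = this.handleClick.bind(this)
    document.addEventListener('click', this.handleClick)
  }

  handleClick() {
    this.score += 1
    console.log(`Score: ${this.score}`)
  }
}

new GameLogic()

Without the bind call, this inside handleClick would refer to event.currentTarget (here, document), as described in the DOM event handler section, rather than to the GameLogic instance.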

      The important part is to know how to determine what object this refers to, which you can do implicitly with what you learned in the previous sections, or explicitly with the three methods you will learn next.

      Call and Apply

      call and apply are very similar—they invoke a function with a specified this context, and optional arguments. The only difference between call and apply is that call requires the arguments to be passed in one-by-one, and apply takes the arguments as an array.

      In this example, we’ll create an object, and create a function that references this but has no this context.

      const book = {
        title: 'Brave New World',
        author: 'Aldous Huxley',
      }
      
      function summary() {
        console.log(`${this.title} was written by ${this.author}.`)
      }
      
      summary()
      

      Output

      "undefined was written by undefined"

      Since summary and book have no connection, invoking summary by itself will only print undefined, as it’s looking for those properties on the global object.

      Note: Attempting this in strict mode would result in Uncaught TypeError: Cannot read property 'title' of undefined, as this itself would be undefined.

      However, you can use call and apply to invoke the this context of book on the function.

      summary.call(book)
      // or:
      summary.apply(book)
      

      Output

      "Brave New World was written by Aldous Huxley."

      There is now a connection between book and summary when these methods are applied. Let’s confirm exactly what this is.

      function printThis() {
        console.log(this)
      }
      
      printThis.call(book)
      // or:
printThis.apply(book)
      

      Output

      {title: "Brave New World", author: "Aldous Huxley"}

      In this case, this actually becomes the object passed as an argument.

      This is how call and apply are the same, but there is one small difference. In addition to being able to pass the this context as the first argument, you can also pass additional arguments through.

      function longerSummary(genre, year) {
        console.log(
          `${this.title} was written by ${this.author}. It is a ${genre} novel written in ${year}.`
        )
      }
      

With call, each additional value you want to pass is sent as an additional argument.

      longerSummary.call(book, 'dystopian', 1932)
      

      Output

      "Brave New World was written by Aldous Huxley. It is a dystopian novel written in 1932."

      If you try to send the exact same arguments with apply, this is what happens:

      longerSummary.apply(book, 'dystopian', 1932)
      

      Output

      Uncaught TypeError: CreateListFromArrayLike called on non-object at <anonymous>:1:15

      Instead, for apply, you have to pass all the arguments in an array.

      longerSummary.apply(book, ['dystopian', 1932])
      

      Output

      "Brave New World was written by Aldous Huxley. It is a dystopian novel written in 1932."

      The difference between passing the arguments individually or in an array is subtle, but it’s important to be aware of. It might be simpler and more convenient to use apply, as it would not require changing the function call if some parameter details changed.
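For instance, if the extra arguments already live in an array (the details variable below is just for illustration), apply passes them through without restructuring the call:

const details = ['dystopian', 1932]
longerSummary.apply(book, details)

Output

"Brave New World was written by Aldous Huxley. It is a dystopian novel written in 1932."

In modern JavaScript, the spread syntax longerSummary.call(book, ...details) achieves the same result.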

      Bind

      Both call and apply are one-time use methods—if you call the method with the this context it will have it, but the original function will remain unchanged.

      Sometimes, you might need to use a method over and over with the this context of another object, and in that case you could use the bind method to create a brand new function with an explicitly bound this.

      const braveNewWorldSummary = summary.bind(book)
      
      braveNewWorldSummary()
      

      Output

      "Brave New World was written by Aldous Huxley"

      In this example, every time you call braveNewWorldSummary, it will always return the original this value bound to it. Attempting to bind a new this context to it will fail, so you can always trust a bound function to return the this value you expect.

      const braveNewWorldSummary = summary.bind(book)
      
      braveNewWorldSummary() // Brave New World was written by Aldous Huxley.
      
      const book2 = {
        title: '1984',
        author: 'George Orwell',
      }
      
      braveNewWorldSummary.bind(book2)
      
      braveNewWorldSummary() // Brave New World was written by Aldous Huxley.
      

      Although this example tries to bind braveNewWorldSummary once again, it retains the original this context from the first time it was bound.

      Arrow Functions

      Arrow functions do not have their own this binding. Instead, they go up to the next level of execution.

      const whoAmI = {
        name: 'Leslie Knope',
        regularFunction: function() {
          console.log(this.name)
        },
        arrowFunction: () => {
          console.log(this.name)
        },
      }
      
      whoAmI.regularFunction() // "Leslie Knope"
      whoAmI.arrowFunction() // undefined
      

      It can be useful to use the arrow function in cases where you really want this to refer to the outer context. For example, if you had an event listener inside of a class, you would probably want this to refer to some value in the class.

In this example, you’ll create and append a button to the DOM like before, but the class will have an event listener that changes the text value of the button when it is clicked.

      const button = document.createElement('button')
      button.textContent = 'Click me'
      document.body.append(button)
      
      class Display {
        constructor() {
          this.buttonText = 'New text'
      
          button.addEventListener('click', event => {
            event.target.textContent = this.buttonText
          })
        }
      }
      
      new Display()
      

      If you click the button, the text content will change to the value of buttonText. If you hadn’t used an arrow function here, this would be equal to event.currentTarget, and you wouldn’t be able to use it to access a value within the class without explicitly binding it. This tactic is often used on class methods in frameworks like React.

      Conclusion

      In this article, you learned about this in JavaScript, and the many different values it might have based on implicit runtime binding, and explicit binding through bind, call, and apply. You also learned about how the lack of this binding in arrow functions can be used to refer to a different context. With this knowledge, you should be able to determine the value of this in your programs.




      How To Apply Computer Vision to Build an Emotion-Based Dog Filter in Python 3


      The author selected Girls Who Code to receive a donation as part of the Write for DOnations program.

      Introduction

      Computer vision is a subfield of computer science that aims to extract a higher-order understanding from images and videos. This field includes tasks such as object detection, image restoration (matrix completion), and optical flow. Computer vision powers technologies such as self-driving car prototypes, employee-less grocery stores, fun Snapchat filters, and your mobile device’s face authenticator.

      In this tutorial, you will explore computer vision as you use pre-trained models to build a Snapchat-esque dog filter. For those unfamiliar with Snapchat, this filter will detect your face and then superimpose a dog mask on it. You will then train a face-emotion classifier so that the filter can pick dog masks based on emotion, such as a corgi for happy or a pug for sad. Along the way, you will also explore related concepts in both ordinary least squares and computer vision, which will expose you to the fundamentals of machine learning.

      A working dog filter

      As you work through the tutorial, you’ll use OpenCV, a computer-vision library, numpy for linear algebra utilities, and matplotlib for plotting. You’ll also apply the following concepts as you build a computer-vision application:

      • Ordinary least squares as a regression and classification technique.
• The basics of neural networks trained with stochastic gradient descent.

      While not necessary to complete this tutorial, you’ll find it easier to understand some of the more detailed explanations if you’re familiar with these mathematical concepts:

      • Fundamental linear algebra concepts: scalars, vectors, and matrices.
      • Fundamental calculus: how to take a derivative.

      You can find the complete code for this tutorial at https://github.com/do-community/emotion-based-dog-filter.

      Let’s get started.

      Prerequisites

      To complete this tutorial, you will need the following:

      Step 1 — Creating The Project and Installing Dependencies

      Let’s create a workspace for this project and install the dependencies we’ll need. We’ll call our workspace DogFilter:
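• mkdir DogFilter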

      Navigate to the DogFilter directory:
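• cd DogFilter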

      Then create a new Python virtual environment for the project:

      • python3 -m venv dogfilter

      Activate your environment.

      • source dogfilter/bin/activate

      The prompt changes, indicating the environment is active. Now install PyTorch, a deep-learning framework for Python that we'll use in this tutorial. The installation process depends on which operating system you're using.

On macOS, install PyTorch with the following command:

      • python -m pip install torch==0.4.1 torchvision==0.2.1

      On Linux, use the following commands:

      • pip install http://download.pytorch.org/whl/cpu/torch-0.4.1-cp35-cp35m-linux_x86_64.whl
      • pip install torchvision

And for Windows, install PyTorch with these commands:

      • pip install http://download.pytorch.org/whl/cpu/torch-0.4.1-cp35-cp35m-win_amd64.whl
      • pip install torchvision

Now install prepackaged binaries for OpenCV and numpy, which are computer vision and linear algebra libraries, respectively. The former offers utilities such as image rotations, and the latter offers linear algebra utilities such as matrix inversion.

      • python -m pip install opencv-python==3.4.3.18 numpy==1.14.5

      Finally, create a directory for our assets, which will hold the images we'll use in this tutorial:
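• mkdir assets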

      With the dependencies installed, let's build the first version of our filter: a face detector.

      Step 2 — Building a Face Detector

      Our first objective is to detect all faces in an image. We'll create a script that accepts a single image and outputs an annotated image with the faces outlined with boxes.

      Fortunately, instead of writing our own face detection logic, we can use pre-trained models. We'll set up a model and then load pre-trained parameters. OpenCV makes this easy by providing both.

OpenCV provides the model parameters in its source code, but we need the absolute path to our locally installed OpenCV to use these parameters. Since that absolute path may vary, we'll download our own copy instead and place it in the assets folder:

      • wget -O assets/haarcascade_frontalface_default.xml https://github.com/opencv/opencv/raw/master/data/haarcascades/haarcascade_frontalface_default.xml

      The -O option specifies the destination as assets/haarcascade_frontalface_default.xml. The second argument is the source URL.

      We'll detect all faces in the following image from Pexels (CC0, link to original image).

      Picture of children

      First, download the image. The following command saves the downloaded image as children.png in the assets folder:

      • wget -O assets/children.png https://www.xpresservers.com/wp-content/uploads/2019/04/How-To-Apply-Computer-Vision-to-Build-an-Emotion-Based-Dog-Filter-in-Python-3.png

      To check that the detection algorithm works, we will run it on an individual image and save the resulting annotated image to disk. Create an outputs folder for these annotated results.
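• mkdir outputs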

Now create a Python script for the face detector. Create the file step_2_face_detect.py using nano or your favorite text editor:

      • nano step_2_face_detect.py

      Add the following code to the file. This code imports OpenCV, which contains the image utilities and face classifier. The rest of the code is typical Python program boilerplate.

      step_2_face_detect.py

      """Test for face detection"""
      
      import cv2
      
      
      def main():
          pass
      
      if __name__ == '__main__':
          main()
      

      Now replace pass in the main function with this code which initializes a face classifier using the OpenCV parameters you downloaded to your assets folder:

      step_2_face_detect.py

      def main():
          # initialize front face classifier
          cascade = cv2.CascadeClassifier("assets/haarcascade_frontalface_default.xml")
      

      Next, add this line to load the image children.png.

      step_2_face_detect.py

          frame = cv2.imread('assets/children.png')
      

Then add this code to convert the image to black and white, as the classifier was trained on black-and-white images. To accomplish this, we convert the image to grayscale and then equalize its histogram:

      step_2_face_detect.py

          # Convert to black-and-white
          gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
          blackwhite = cv2.equalizeHist(gray)
      

      Then use OpenCV's detectMultiScale function to detect all faces in the image.

      step_2_face_detect.py

          rects = cascade.detectMultiScale(
              blackwhite, scaleFactor=1.3, minNeighbors=4, minSize=(30, 30),
              flags=cv2.CASCADE_SCALE_IMAGE)
      
      • scaleFactor specifies how much the image is reduced along each dimension.
• minNeighbors denotes how many neighboring rectangles a candidate rectangle needs in order to be retained.
      • minSize is the minimum allowable detected object size. Objects smaller than this are discarded.

      The return type is a list of tuples, where each tuple has four numbers denoting the minimum x, minimum y, width, and height of the rectangle in that order.

      Iterate over all detected objects and draw them on the image in green using cv2.rectangle:

      step_2_face_detect.py

          for x, y, w, h in rects:
              cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
      
      • The second and third arguments are opposing corners of the rectangle.
• The fourth argument is the color to use. (0, 255, 0) corresponds to green in OpenCV's BGR color order.
      • The last argument denotes the width of our line.

      Finally, write the image with bounding boxes into a new file at outputs/children_detected.png:

      step_2_face_detect.py

          cv2.imwrite('outputs/children_detected.png', frame)
      

      Your completed script should look like this:

      step_2_face_detect.py

      """Tests face detection for a static image."""  
      
      import cv2  
      
      
      def main():  
      
          # initialize front face classifier  
          cascade = cv2.CascadeClassifier(  
              "assets/haarcascade_frontalface_default.xml")  
      
          frame = cv2.imread('assets/children.png')  
      
          # Convert to black-and-white  
          gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  
          blackwhite = cv2.equalizeHist(gray)  
      
    rects = cascade.detectMultiScale(
        blackwhite, scaleFactor=1.3, minNeighbors=4, minSize=(30, 30),
        flags=cv2.CASCADE_SCALE_IMAGE)
      
          for x, y, w, h in rects:  
              cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)  
      
          cv2.imwrite('outputs/children_detected.png', frame)  
      
      if __name__ == '__main__':  
          main()
      

      Save the file and exit your editor. Then run the script:

      • python step_2_face_detect.py

      Open outputs/children_detected.png. You'll see the following image that shows the faces outlined with boxes:

      Picture of children with bounding boxes

      At this point, you have a working face detector. It accepts an image as input and draws bounding boxes around all faces in the image, outputting the annotated image. Now let's apply this same detection to a live camera feed.

      Step 3 — Linking the Camera Feed

      The next objective is to link the computer's camera to the face detector. Instead of detecting faces in a static image, you'll detect all faces from your computer's camera. You will collect camera input, detect and annotate all faces, and then display the annotated image back to the user. You'll continue from the script in Step 2, so start by duplicating that script:

      • cp step_2_face_detect.py step_3_camera_face_detect.py

      Then open the new script in your editor:

      • nano step_3_camera_face_detect.py

      You will update the main function by using some elements from this test script from the official OpenCV documentation. Start by initializing a VideoCapture object that is set to capture live feed from your computer's camera. Place this at the start of the main function, before the other code in the function:

      step_3_camera_face_detect.py

      def main():
          cap = cv2.VideoCapture(0)
          ...
      

      Starting from the line defining frame, indent all of your existing code, placing all of the code in a while loop.

      step_3_camera_face_detect.py

          while True:
              frame = cv2.imread('assets/children.png')
              ...
              for x, y, w, h in rects:  
                  cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)  
      
              cv2.imwrite('outputs/children_detected.png', frame)
      

      Replace the line defining frame at the start of the while loop. Instead of reading from an image on disk, you're now reading from the camera:

      step_3_camera_face_detect.py

          while True:
              # frame = cv2.imread('assets/children.png') # DELETE ME
              # Capture frame-by-frame
              ret, frame = cap.read()
      

      Replace the line cv2.imwrite(...) at the end of the while loop. Instead of writing an image to disk, you'll display the annotated image back to the user's screen:

      step_3_camera_face_detect.py

            cv2.imwrite('outputs/children_detected.png', frame)  # DELETE ME
            # Display the resulting frame
            cv2.imshow('frame', frame)
      

      Also, add some code to watch for keyboard input so you can stop the program. Check if the user hits the q character and, if so, quit the application. Right after cv2.imshow(...) add the following:

      step_3_camera_face_detect.py

      ...
              cv2.imshow('frame', frame)
              if cv2.waitKey(1) & 0xFF == ord('q'):
                  break
      ...
      

The line cv2.waitKey(1) halts the program for 1 millisecond so that the captured image can be displayed back to the user.

      Finally, release the capture and close all windows. Place this outside of the while loop to end the main function.

      step_3_camera_face_detect.py

      ...
      
          while True:
          ...
      
      
          cap.release()
          cv2.destroyAllWindows()
      

      Your script should look like the following:

      step_3_camera_face_detect.py

      """Test for face detection on video camera.
      
      Move your face around and a green box will identify your face.
      With the test frame in focus, hit `q` to exit.
      Note that typing `q` into your terminal will do nothing.
      """
      
      import cv2
      
      
      def main():
          cap = cv2.VideoCapture(0)
      
          # initialize front face classifier
          cascade = cv2.CascadeClassifier(
              "assets/haarcascade_frontalface_default.xml")
      
          while True:
              # Capture frame-by-frame
              ret, frame = cap.read()
      
              # Convert to black-and-white
              gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
              blackwhite = cv2.equalizeHist(gray)
      
              # Detect faces
              rects = cascade.detectMultiScale(
                  blackwhite, scaleFactor=1.3, minNeighbors=4, minSize=(30, 30),
                  flags=cv2.CASCADE_SCALE_IMAGE)
      
              # Add all bounding boxes to the image
              for x, y, w, h in rects:
                  cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
      
              # Display the resulting frame
              cv2.imshow('frame', frame)
              if cv2.waitKey(1) & 0xFF == ord('q'):
                  break
      
          # When everything done, release the capture
          cap.release()
          cv2.destroyAllWindows()
      
      
      if __name__ == '__main__':
          main()
      

      Save the file and exit your editor.

      Now run the test script.

      • python step_3_camera_face_detect.py

      This activates your camera and opens a window displaying your camera's feed. Your face will be boxed by a green square in real time:

      Working face detector

      Note: If you find that you have to hold very still for things to work, the lighting in the room may not be adequate. Try moving to a brightly lit room where you and your background have high contrast. Also, avoid bright lights near your head. For example, if you have your back to the sun, this process might not work very well.

      Our next objective is to take the detected faces and superimpose dog masks on each one.

      Step 4 — Building the Dog Filter

      Before we build the filter itself, let's explore how images are represented numerically. This will give you the background needed to modify images and ultimately apply a dog filter.

      Let's look at an example. We can construct a black-and-white image using numbers, where 0 corresponds to black and 1 corresponds to white.

      Focus on the dividing line between 1s and 0s. What shape do you see?

      0 0 0 0 0 0 0 0 0
      0 0 0 0 1 0 0 0 0
      0 0 0 1 1 1 0 0 0
      0 0 1 1 1 1 1 0 0
      0 0 0 1 1 1 0 0 0
      0 0 0 0 1 0 0 0 0
      0 0 0 0 0 0 0 0 0
      

      The image is a diamond. If we save this matrix of values as an image, we get the following picture:

      Diamond as picture
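
      If you'd like to try this yourself, here is a minimal throwaway sketch, separate from the tutorial's numbered scripts, that writes the diamond matrix above to disk with OpenCV. It assumes the outputs/ directory from the earlier steps exists; the filename outputs/diamond.png is only an example.

      import cv2
      import numpy as np

      # 0 is black and 1 is white; cv2.imwrite expects values from 0 to 255,
      # so scale the matrix and cast it to 8-bit integers before saving.
      diamond = np.array([
          [0, 0, 0, 0, 0, 0, 0, 0, 0],
          [0, 0, 0, 0, 1, 0, 0, 0, 0],
          [0, 0, 0, 1, 1, 1, 0, 0, 0],
          [0, 0, 1, 1, 1, 1, 1, 0, 0],
          [0, 0, 0, 1, 1, 1, 0, 0, 0],
          [0, 0, 0, 0, 1, 0, 0, 0, 0],
          [0, 0, 0, 0, 0, 0, 0, 0, 0],
      ])
      cv2.imwrite('outputs/diamond.png', (diamond * 255).astype(np.uint8))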

      We can use any value between 0 and 1, such as 0.1, 0.26, or 0.74391. Numbers closer to 0 are darker and numbers closer to 1 are lighter. This allows us to represent white, black, and any shade of gray. This is great news for us because we can now construct any grayscale image using 0, 1, and any value in between. Consider the following, for example. Can you tell what it is? Again, each number corresponds to the color of a pixel.

      1  1  1  1  1  1  1  1  1  1  1  1
      1  1  1  1  0  0  0  0  1  1  1  1
      1  1  0  0 .4 .4 .4 .4  0  0  1  1
      1  0 .4 .4 .5 .4 .4 .4 .4 .4  0  1
      1  0 .4 .5 .5 .5 .4 .4 .4 .4  0  1
      0 .4 .4 .4 .5 .4 .4 .4 .4 .4 .4  0
      0 .4 .4 .4 .4  0  0 .4 .4 .4 .4  0
      0  0 .4 .4  0  1 .7  0 .4 .4  0  0
      0  1  0  0  0 .7 .7  0  0  0  1  0
      1  0  1  1  1  0  0 .7 .7 .4  0  1
      1  0 .7  1  1  1 .7 .7 .7 .7  0  1
      1  1  0  0 .7 .7 .7 .7  0  0  1  1
      1  1  1  1  0  0  0  0  1  1  1  1
      1  1  1  1  1  1  1  1  1  1  1  1
      

      Re-rendered as an image, you can now tell that this is, in fact, a Poké Ball:

      Pokeball as picture

      You've now seen how black-and-white and grayscale images are represented numerically. To introduce color, we need a way to encode more information. An image has its height and width expressed as h x w.

      Image

      In the current grayscale representation, each pixel is one value between 0 and 1. We can equivalently say our image has dimensions h x w x 1. In other words, every (x, y) position in our image has just one value.

      Grayscale image

      For a color representation, we represent the color of each pixel using three values between 0 and 1. One number corresponds to the "degree of red," one to the "degree of green," and the last to the "degree of blue." We call this the RGB color space. This means that for every (x, y) position in our image, we have three values (r, g, b). As a result, our image is now h x w x 3:

      Color image

      Here, each number ranges from 0 to 255 instead of 0 to 1, but the idea is the same. Different combinations of numbers correspond to different colors, such as dark purple (102, 0, 204) or bright orange (255, 153, 51). The takeaways are as follows:

      1. Each image will be represented as a box of numbers that has three dimensions: height, width, and color channels. Manipulating this box of numbers directly is equivalent to manipulating the image.
      2. We can also flatten this box to become just a list of numbers. In this way, our image becomes a vector. Later on, we will refer to images as vectors.
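
      As a quick check of both takeaways, the following throwaway snippet, not part of the numbered scripts, loads the image from Step 2, inspects its h x w x 3 shape, and flattens it into a vector:

      import cv2

      image = cv2.imread('assets/children.png')
      print(image.shape)          # (height, width, 3): one box of numbers per image
      vector = image.reshape(-1)  # flatten the box into a single list of numbers
      print(vector.shape)         # (height * width * 3,)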

      Now that you understand how images are represented numerically, you are well-equipped to begin applying dog masks to faces. To apply a dog mask, you will replace values in the child image with non-white dog mask pixels. To start, you will work with a single image. Download this crop of a face from the image you used in Step 2.

      • wget -O assets/child.png https://www.xpresservers.com/wp-content/uploads/2019/04/1554419826_451_How-To-Apply-Computer-Vision-to-Build-an-Emotion-Based-Dog-Filter-in-Python-3.png

      Cropped face

      Additionally, download the following dog mask. The dog masks used in this tutorial are my own drawings, now released to the public domain under a CC0 License.

      Dog mask

      Download this with wget:

      • wget -O assets/dog.png https://www.xpresservers.com/wp-content/uploads/2019/04/1554419826_685_How-To-Apply-Computer-Vision-to-Build-an-Emotion-Based-Dog-Filter-in-Python-3.png

      Create a new file called step_4_dog_mask_simple.py which will hold the code for the script that applies the dog mask to faces:

      • nano step_4_dog_mask_simple.py

      Add the following boilerplate for the Python script and import the OpenCV and numpy libraries:

      step_4_dog_mask_simple.py

      """Test for adding dog mask"""
      
      import cv2
      import numpy as np
      
      
      def main():
          pass
      
      if __name__ == '__main__':
          main()
      

      Replace pass in the main function with these two lines which load the original image and the dog mask into memory.

      step_4_dog_mask_simple.py

      ...
      def main():
          face = cv2.imread('assets/child.png')
          mask = cv2.imread('assets/dog.png')
      

      Next, fit the dog mask to the child. The logic is more complicated than what we've done previously, so we will create a new function called apply_mask to modularize our code. Directly after the two lines that load the images, add this line which invokes the apply_mask function:

      step_4_dog_mask_simple.py

      ...
          face_with_mask = apply_mask(face, mask)
      

      Create a new function called apply_mask and place it above the main function:

      step_4_dog_mask_simple.py

      ...
      def apply_mask(face: np.array, mask: np.array) -> np.array:
          """Add the mask to the provided face, and return the face with mask."""
          pass
      
      def main():
      ...
      

      At this point, your file should look like this:

      step_4_dog_mask_simple.py

      """Test for adding dog mask"""
      
      import cv2
      import numpy as np
      
      
      def apply_mask(face: np.array, mask: np.array) -> np.array:
          """Add the mask to the provided face, and return the face with mask."""
          pass
      
      
      def main():
          face = cv2.imread('assets/child.png')
          mask = cv2.imread('assets/dog.png')
          face_with_mask = apply_mask(face, mask)
      
      if __name__ == '__main__':
          main()
      

      Let's build out the apply_mask function. Our goal is to apply the mask to the child's face. However, we need to maintain the aspect ratio for our dog mask. To do so, we need to explicitly compute our dog mask's final dimensions. Inside the apply_mask function, replace pass with these two lines which extract the height and width of both images:

      step_4_dog_mask_simple.py

      ...
          mask_h, mask_w, _ = mask.shape
          face_h, face_w, _ = face.shape
      

      Next, determine which dimension needs to be "shrunk more." To be precise, we need the tighter of the two constraints. Add this line to the apply_mask function:

      step_4_dog_mask_simple.py

      ...
      
          # Resize the mask to fit on face
          factor = min(face_h / mask_h, face_w / mask_w)
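
      For example, with made-up dimensions: if the face is 500 x 600 pixels and the mask is 200 x 300 pixels, then factor = min(500 / 200, 600 / 300) = min(2.5, 2.0) = 2.0, so the mask is scaled by 2 in both dimensions and still fits inside the face.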
      

      Then compute the new shape by adding this code to the function:

      step_4_dog_mask_simple.py

      ...
          new_mask_w = int(factor * mask_w)
          new_mask_h = int(factor * mask_h)
          new_mask_shape = (new_mask_w, new_mask_h)
      

      Here we cast the numbers to integers, as the resize function needs integral dimensions.

      Now add this code to resize the dog mask to the new shape:

      step_4_dog_mask_simple.py

      ...
      
          # Add mask to face - ensure mask is centered
          resized_mask = cv2.resize(mask, new_mask_shape)
      

      Finally, write the image to disk so you can double-check that your resized dog mask is correct after you run the script:

      step_4_dog_mask_simple.py

          cv2.imwrite('outputs/resized_dog.png', resized_mask)
      

      The completed script should look like this:

      step_4_dog_mask_simple.py

      """Test for adding dog mask"""
      import cv2
      import numpy as np
      
      def apply_mask(face: np.array, mask: np.array) -> np.array:
          """Add the mask to the provided face, and return the face with mask."""
          mask_h, mask_w, _ = mask.shape
          face_h, face_w, _ = face.shape
      
          # Resize the mask to fit on face
          factor = min(face_h / mask_h, face_w / mask_w)
          new_mask_w = int(factor * mask_w)
          new_mask_h = int(factor * mask_h)
          new_mask_shape = (new_mask_w, new_mask_h)
      
          # Add mask to face - ensure mask is centered
          resized_mask = cv2.resize(mask, new_mask_shape)
          cv2.imwrite('outputs/resized_dog.png', resized_mask)
      
      
      def main():
          face = cv2.imread('assets/child.png')
          mask = cv2.imread('assets/dog.png')
          face_with_mask = apply_mask(face, mask)
      
      if __name__ == '__main__':
          main()
      
      

      Save the file and exit your editor. Run the new script:

      • python step_4_dog_mask_simple.py

      Open the image at outputs/resized_dog.png to double-check the mask was resized correctly. It will match the dog mask shown earlier in this section.

      Now add the dog mask to the child. Open the step_4_dog_mask_simple.py file again and return to the apply_mask function:

      • nano step_4_dog_mask_simple.py

      First, remove the line of code that writes the resized mask from the apply_mask function since you no longer need it:

          cv2.imwrite('outputs/resized_dog.png', resized_mask)  # delete this line
          ...
      

      In its place, apply your knowledge of image representation from the start of this section to modify the image. Start by making a copy of the child image. Add this line to the apply_mask function:

      step_4_dog_mask_simple.py

      ...
          face_with_mask = face.copy()
      

      Next, find all positions where the dog mask is not white or near white. To do this, check if the pixel value is less than 250 across all color channels, as we'd expect a near-white pixel to be near [255, 255, 255]. Add this code:

      step_4_dog_mask_simple.py

      ...
          non_white_pixels = (resized_mask < 250).all(axis=2)
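
      To see what the axis=2 argument does, here is a tiny throwaway example with invented values. The comparison produces one boolean per color channel, and .all(axis=2) collapses those three booleans into a single True or False per pixel:

      import numpy as np

      pixels = np.array([[[255, 255, 255], [10, 20, 30]]])  # one white pixel, one dark pixel
      print((pixels < 250).all(axis=2))                     # [[False  True]]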
      

      At this point, the dog image is, at most, as large as the child image. We want to center the dog image on the face, so compute the offset needed to center the dog image by adding this code to apply_mask:

      step_4_dog_mask_simple.py

      ...
          off_h = int((face_h - new_mask_h) / 2)  
          off_w = int((face_w - new_mask_w) / 2)
      

      Copy all non-white pixels from the dog image into the child image. Since the child image may be larger than the dog image, we need to take a subset of the child image:

      step_4_dog_mask_simple.py

          face_with_mask[off_h: off_h+new_mask_h, off_w: off_w+new_mask_w][non_white_pixels] = \
                  resized_mask[non_white_pixels]
      

      Then return the result:

      step_4_dog_mask_simple.py

          return face_with_mask
      

      In the main function, add this code to write the result of the apply_mask function to an output image so you can manually double-check the result:

      step_4_dog_mask_simple.py

      ...
          face_with_mask = apply_mask(face, mask)
          cv2.imwrite('outputs/child_with_dog_mask.png', face_with_mask)
      

      Your completed script will look like the following:

      step_4_dog_mask_simple.py

      """Test for adding dog mask"""
      
      import cv2
      import numpy as np
      
      
      def apply_mask(face: np.array, mask: np.array) -> np.array:
          """Add the mask to the provided face, and return the face with mask."""
          mask_h, mask_w, _ = mask.shape
          face_h, face_w, _ = face.shape
      
          # Resize the mask to fit on face
          factor = min(face_h / mask_h, face_w / mask_w)
          new_mask_w = int(factor * mask_w)
          new_mask_h = int(factor * mask_h)
          new_mask_shape = (new_mask_w, new_mask_h)
          resized_mask = cv2.resize(mask, new_mask_shape)
      
          # Add mask to face - ensure mask is centered
          face_with_mask = face.copy()
          non_white_pixels = (resized_mask < 250).all(axis=2)
          off_h = int((face_h - new_mask_h) / 2)  
          off_w = int((face_w - new_mask_w) / 2)
          face_with_mask[off_h: off_h+new_mask_h, off_w: off_w+new_mask_w][non_white_pixels] = \
               resized_mask[non_white_pixels]
      
          return face_with_mask
      
      def main():
          face = cv2.imread('assets/child.png')
          mask = cv2.imread('assets/dog.png')
          face_with_mask = apply_mask(face, mask)
          cv2.imwrite('outputs/child_with_dog_mask.png', face_with_mask)
      
      if __name__ == '__main__':
          main()
      

      Save the script and run it:

      • python step_4_dog_mask_simple.py

      You'll have the following picture of a child with a dog mask in outputs/child_with_dog_mask.png:

      Picture of child with dog mask on

      You now have a utility that applies dog masks to faces. Now let's use what you've built to add the dog mask in real time.

      We'll pick up from where we left off in Step 3. Copy step_3_camera_face_detect.py to step_4_dog_mask.py.

      • cp step_3_camera_face_detect.py step_4_dog_mask.py

      Open your new script:

      • nano step_4_dog_mask.py

      First, import the NumPy library at the top of the script:

      step_4_dog_mask.py

      import numpy as np
      ...
      

      Then add the apply_mask function from your previous work into this new file above the main function:

      step_4_dog_mask.py

      def apply_mask(face: np.array, mask: np.array) -> np.array:
          """Add the mask to the provided face, and return the face with mask."""
          mask_h, mask_w, _ = mask.shape
          face_h, face_w, _ = face.shape
      
          # Resize the mask to fit on face
          factor = min(face_h / mask_h, face_w / mask_w)
          new_mask_w = int(factor * mask_w)
          new_mask_h = int(factor * mask_h)
          new_mask_shape = (new_mask_w, new_mask_h)
          resized_mask = cv2.resize(mask, new_mask_shape)
      
          # Add mask to face - ensure mask is centered
          face_with_mask = face.copy()
          non_white_pixels = (resized_mask < 250).all(axis=2)
          off_h = int((face_h - new_mask_h) / 2)  
          off_w = int((face_w - new_mask_w) / 2)
          face_with_mask[off_h: off_h+new_mask_h, off_w: off_w+new_mask_w][non_white_pixels] = \
               resized_mask[non_white_pixels]
      
          return face_with_mask
      ...
      

      Second, locate this line in the main function:

      step_4_dog_mask.py

          cap = cv2.VideoCapture(0)
      

      Add this code after that line to load the dog mask:

      step_4_dog_mask.py

          cap = cv2.VideoCapture(0)
      
          # load mask
          mask = cv2.imread('assets/dog.png')
          ...
      

      Next, in the while loop, locate this line:

      step_4_dog_mask.py

              ret, frame = cap.read()
      

      Add this line after it to extract the image's height and width:

      step_4_dog_mask.py

              ret, frame = cap.read()
              frame_h, frame_w, _ = frame.shape
              ...
      

      Next, delete the line in main that draws bounding boxes. You'll find this line in the for loop that iterates over detected faces:

      step_4_dog_mask.py

              for x, y, w, h in rects:
              ...
                  cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2) # DELETE ME
              ...
      

      In its place, add this code which crops the frame. For aesthetic purposes, we crop an area slightly larger than the face.

      step_4_dog_mask.py

              for x, y, w, h in rects:
                  # crop a frame slightly larger than the face
                  y0, y1 = int(y - 0.25*h), int(y + 0.75*h)
                  x0, x1 = x, x + w
      

      Introduce a check in case the detected face is too close to the edge.

      step_4_dog_mask.py

                  # give up if the cropped frame would be out-of-bounds
                  if x0 < 0 or y0 < 0 or x1 > frame_w or y1 > frame_h:
                      continue
      

      Finally, insert the face with a mask into the image.

      step_4_dog_mask.py

                  # apply mask
                  frame[y0: y1, x0: x1] = apply_mask(frame[y0: y1, x0: x1], mask)
      

      Verify that your script looks like this:

      step_4_dog_mask.py

      """Real-time dog filter
      
      Move your face around and a dog filter will be applied to your face if it is not out-of-bounds. With the test frame in focus, hit `q` to exit. Note that typing `q` into your terminal will do nothing.
      """
      
      import numpy as np
      import cv2
      
      
      def apply_mask(face: np.array, mask: np.array) -> np.array:
          """Add the mask to the provided face, and return the face with mask."""
          mask_h, mask_w, _ = mask.shape
          face_h, face_w, _ = face.shape
      
          # Resize the mask to fit on face
          factor = min(face_h / mask_h, face_w / mask_w)
          new_mask_w = int(factor * mask_w)
          new_mask_h = int(factor * mask_h)
          new_mask_shape = (new_mask_w, new_mask_h)
          resized_mask = cv2.resize(mask, new_mask_shape)
      
          # Add mask to face - ensure mask is centered
          face_with_mask = face.copy()
          non_white_pixels = (resized_mask < 250).all(axis=2)
          off_h = int((face_h - new_mask_h) / 2)
          off_w = int((face_w - new_mask_w) / 2)
          face_with_mask[off_h: off_h+new_mask_h, off_w: off_w+new_mask_w][non_white_pixels] = \
               resized_mask[non_white_pixels]
      
          return face_with_mask
      
      def main():
          cap = cv2.VideoCapture(0)
      
          # load mask
          mask = cv2.imread('assets/dog.png')
      
          # initialize front face classifier
          cascade = cv2.CascadeClassifier("assets/haarcascade_frontalface_default.xml")
      
          while True:
              # Capture frame-by-frame
              ret, frame = cap.read()
              frame_h, frame_w, _ = frame.shape
      
              # Convert to black-and-white
              gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
              blackwhite = cv2.equalizeHist(gray)
      
              # Detect faces
              rects = cascade.detectMultiScale(
                  blackwhite, scaleFactor=1.3, minNeighbors=4, minSize=(30, 30),
                  flags=cv2.CASCADE_SCALE_IMAGE)
      
              # Add mask to faces
              for x, y, w, h in rects:
                  # crop a frame slightly larger than the face
                  y0, y1 = int(y - 0.25*h), int(y + 0.75*h)
                  x0, x1 = x, x + w
      
                  # give up if the cropped frame would be out-of-bounds
                  if x0 < 0 or y0 < 0 or x1 > frame_w or y1 > frame_h:
                      continue
      
                  # apply mask
                  frame[y0: y1, x0: x1] = apply_mask(frame[y0: y1, x0: x1], mask)
      
              # Display the resulting frame
              cv2.imshow('frame', frame)
              if cv2.waitKey(1) & 0xFF == ord('q'):
                  break
      
          # When everything done, release the capture
          cap.release()
          cv2.destroyAllWindows()
      
      
      if __name__ == '__main__':
          main()
      

      Save the file and exit your editor. Then run the script.

      • python step_4_dog_mask.py

      You now have a real-time dog filter running. The script will also work with multiple faces in the picture, so you can get your friends together for some automatic dog-ification.

      GIF for working dog filter

      This concludes our first primary objective in this tutorial, which is to create a Snapchat-esque dog filter. Now let's use facial expression to determine the dog mask applied to a face.

      Step 5 — Build a Basic Face Emotion Classifier using Least Squares

      In this section you'll create an emotion classifier to apply different masks based on displayed emotions. If you smile, the filter will apply a corgi mask. If you frown, it will apply a pug mask. Along the way, you'll explore the least-squares framework, which is fundamental to understanding and discussing machine learning concepts.

      To understand how to process our data and produce predictions, we'll first briefly explore machine learning models.

      We need to ask two questions for each model that we consider. For now, these two questions will be sufficient to differentiate between models:

      1. Input: What information is the model given?
      2. Output: What is the model trying to predict?

      At a high-level, the goal is to develop a model for emotion classification. The model is:

      1. Input: given images of faces.
      2. Output: predicts the corresponding emotion.
      model: face -> emotion
      

      The approach we'll use is least squares; we take a set of points, and we find a line of best fit. The line of best fit, shown in the following image, is our model.

      Least Squares

      Consider the input and output for our line:

      1. Input: given x coordinates.
      2. Output: predicts the corresponding $y$ coordinate.
      least squares line: x -> y
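
      To make the line of best fit concrete before moving on to faces, here is a minimal throwaway sketch that fits a line to a handful of invented points using the same closed-form least-squares solution this step applies later. All numbers here are made up for illustration.

      import numpy as np

      # made-up (x, y) points
      x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
      y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])

      # add a column of ones so the line has an intercept: y is roughly w0 + w1 * x
      A = np.stack([np.ones_like(x), x], axis=1)

      # closed-form least squares: w = (A^T A)^{-1} A^T y
      w = np.linalg.inv(A.T.dot(A)).dot(A.T.dot(y))
      print(w)         # [intercept, slope] of the line of best fit
      print(A.dot(w))  # the line's predicted y for each x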
      

      Our input x must represent faces and our output y must represent emotion, in order for us to use least squares for emotion classification:

      • x -> face: Instead of using one number for x, we will use a vector of values for x. Thus, x can represent images of faces. The article Ordinary Least Squares explains why you can use a vector of values for x.
      • y -> emotion: Each emotion will correspond to a number. For example, "angry" is 0, "sad" is 1, and "happy" is 2. In this way, y can represent emotions. However, our line is not constrained to output the y values 0, 1, and 2. It has an infinite number of possible y values: it could be 1.2, 3.5, or 10003.42. How do we translate those y values to integers corresponding to classes? See the article One-Hot Encoding for more detail and explanation.

      Armed with this background knowledge, you will build a simple least-squares classifier using vectorized images and one-hot encoded labels. You'll accomplish this in three steps:

      1. Preprocess the data: As explained at the start of this section, our samples are vectors where each vector encodes an image of a face. Our labels are integers corresponding to an emotion, and we'll apply one-hot encoding to these labels.
      2. Specify and train the model: Use the closed-form least squares solution, w^*.
      3. Run a prediction using the model: Take the argmax of Xw^* to obtain predicted emotions.

      Let's get started.

      First, set up a directory to contain the data:

      • mkdir data

      Then download the data, curated by Pierre-Luc Carrier and Aaron Courville, from a 2013 Face Emotion Classification competition on Kaggle.

      • wget -O data/fer2013.tar https://bitbucket.org/alvinwan/adversarial-examples-in-computer-vision-building-then-fooling/raw/babfe4651f89a398c4b3fdbdd6d7a697c5104cff/fer2013.tar

      Navigate to the data directory and unpack the data.

      • cd data
      • tar -xzf fer2013.tar

      Now we'll create a script to run the least-squares model. Navigate back to the root of your project:

      • cd ..

      Create a new file for the script:

      • nano step_5_ls_simple.py

      Add Python boilerplate and import the packages you will need:

      step_5_ls_simple.py

      """Train emotion classifier using least squares."""
      
      import numpy as np
      
      def main():
          pass
      
      if __name__ == '__main__':
          main()
      

      Next, load the data into memory. Replace pass in your main function with the following code:

      step_5_ls_simple.py

      
          # load data
          with np.load('data/fer2013_train.npz') as data:
              X_train, Y_train = data['X'], data['Y']
      
          with np.load('data/fer2013_test.npz') as data:
              X_test, Y_test = data['X'], data['Y']
      

      Now one-hot encode the labels. To do this, construct the identity matrix with numpy and then index into this matrix using our list of labels:

      step_5_ls_simple.py

          # one-hot labels
          I = np.eye(6)
          Y_oh_train, Y_oh_test = I[Y_train], I[Y_test]
      

      Here, we use the fact that the i-th row in the identity matrix is all zero, except for the i-th entry. Thus, the i-th row is the one-hot encoding for the label of class i. Additionally, we use numpy's advanced indexing, where [a, b, c, d][[1, 3]] = [b, d].
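
      For example, this throwaway snippet shows the same trick on a smaller three-class problem, with made-up labels:

      import numpy as np

      I = np.eye(3)
      labels = np.array([0, 2, 1])
      print(I[labels])
      # [[1. 0. 0.]
      #  [0. 0. 1.]
      #  [0. 1. 0.]]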

      Computing (X^TX)^{-1} would take too long on commodity hardware, as X^TX is a 2304x2304 matrix with over four million values, so we'll reduce this time by selecting only the first 100 features. Add this code:

      step_5_ls_simple.py

      ...
          # select first 100 dimensions
          A_train, A_test = X_train[:, :100], X_test[:, :100]
      

      Next, add this code to evaluate the closed-form least-squares solution:

      step_5_ls_simple.py

      ...
          # train model
          w = np.linalg.inv(A_train.T.dot(A_train)).dot(A_train.T.dot(Y_oh_train))
      

      Then define an evaluation function for training and validation sets. Place this before your main function:

      step_5_ls_simple.py

      def evaluate(A, Y, w):
          Yhat = np.argmax(A.dot(w), axis=1)
          return np.sum(Yhat == Y) / Y.shape[0]
      

      To estimate labels, we take the inner product with each sample and get the indices of the maximum values using np.argmax. Then we compute the fraction of correct classifications. This final number is your accuracy.
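
      As a small worked example of the same computation, with invented scores and labels:

      import numpy as np

      scores = np.array([[0.2, 0.7, 0.1],    # argmax is 1
                         [0.9, 0.0, 0.1]])   # argmax is 0
      Y = np.array([1, 2])                   # true labels

      Yhat = np.argmax(scores, axis=1)       # [1, 0]
      print(np.sum(Yhat == Y) / Y.shape[0])  # 0.5: one of two predictions is correct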

      Finally, add this code to the end of the main function to compute the training and validation accuracy using the evaluate function you just wrote:

      step_5_ls_simple.py

          # evaluate model
          ols_train_accuracy = evaluate(A_train, Y_train, w)
          print('(ols) Train Accuracy:', ols_train_accuracy)
          ols_test_accuracy = evaluate(A_test, Y_test, w)
          print('(ols) Test Accuracy:', ols_test_accuracy)
      

      Double-check that your script matches the following:

      step_5_ls_simple.py

      """Train emotion classifier using least squares."""
      
      import numpy as np
      
      
      def evaluate(A, Y, w):
          Yhat = np.argmax(A.dot(w), axis=1)
          return np.sum(Yhat == Y) / Y.shape[0]
      
      def main():
      
          # load data
          with np.load('data/fer2013_train.npz') as data:
              X_train, Y_train = data['X'], data['Y']
      
          with np.load('data/fer2013_test.npz') as data:
              X_test, Y_test = data['X'], data['Y']
      
          # one-hot labels
          I = np.eye(6)
          Y_oh_train, Y_oh_test = I[Y_train], I[Y_test]
      
          # select first 100 dimensions
          A_train, A_test = X_train[:, :100], X_test[:, :100]
      
          # train model
          w = np.linalg.inv(A_train.T.dot(A_train)).dot(A_train.T.dot(Y_oh_train))
      
          # evaluate model
          ols_train_accuracy = evaluate(A_train, Y_train, w)
          print('(ols) Train Accuracy:', ols_train_accuracy)
          ols_test_accuracy = evaluate(A_test, Y_test, w)
          print('(ols) Test Accuracy:', ols_test_accuracy)
      
      
      if __name__ == '__main__':
          main()
      

      Save your file, exit your editor, and run the Python script.

      • python step_5_ls_simple.py

      You'll see the following output:

      Output

      (ols) Train Accuracy: 0.4748918316507146
      (ols) Test Accuracy: 0.45280545359202934

      Our model gives 47.5% train accuracy. We repeat this on the validation set to obtain 45.3% accuracy. For a three-way classification problem, 45.3% is reasonably above guessing, which is 33%. This is our starting classifier for emotion detection, and in the next step, you'll build off of this least-squares model to improve accuracy. The higher the accuracy, the more reliably your emotion-based dog filter can find the appropriate dog filter for each detected emotion.

      Step 6 — Improving Accuracy by Featurizing the Inputs

      We can use a more expressive model to boost accuracy. To accomplish this, we featurize our inputs.

      The original image tells us that position (0, 0) is red, (1, 0) is brown, and so on. A featurized image may tell us that there is a dog to the top-left of the image, a person in the middle, etc. Featurization is powerful, but its precise definition is beyond the scope of this tutorial.

      We'll use an approximation for the radial basis function (RBF) kernel, using a random Gaussian matrix. We won't go into detail in this tutorial. Instead, we'll treat this as a black box that computes higher-order features for us.

      We'll continue where we left off in the previous step. Copy the previous script so you have a good starting point:

      • cp step_5_ls_simple.py step_6_ls_simple.py

      Open the new file in your editor:

      • nano step_6_ls_simple.py

      We'll start by creating the featurizing random matrix. Again, we'll use only 100 features in our new feature space.

      Locate the following line, defining A_train and A_test:

      step_6_ls_simple.py

          # select first 100 dimensions
          A_train, A_test = X_train[:, :100], X_test[:, :100]
      

      Directly above this definition for A_train and A_test, add a random feature matrix:

      step_6_ls_simple.py

          d = 100
          W = np.random.normal(size=(X_train.shape[1], d))
          # select first 100 dimensions
          A_train, A_test = X_train[:, :100], X_test[:, :100]
          ...
      

      Then replace the definitions for A_train and A_test. We redefine our matrices, called design matrices, using this random featurization.

      step_6_ls_simple.py

          A_train, A_test = X_train.dot(W), X_test.dot(W)
      

      Save your file and run the script.

      • python step_6_ls_simple.py

      You'll see the following output:

      Output

      (ols) Train Accuracy: 0.584174642717
      (ols) Test Accuracy: 0.584425799685

      This featurization now offers 58.4% train accuracy and 58.4% validation accuracy, a 13.1% improvement in validation results. We trimmed the X matrix to be 100 x 100, but the choice of 100 was arbitrary. We could also trim the X matrix to be 1000 x 1000 or 50 x 50. Say the dimension of x is d x d. We can test more values of d by re-trimming X to be d x d and recomputing a new model.
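
      One way to run such a sweep, sketched here for illustration rather than as a required change to the script, is to reuse the names already defined in step_6_ls_simple.py and vary d, the width of the random feature matrix W:

      for d in (50, 100, 500, 1000):
          W = np.random.normal(size=(X_train.shape[1], d))
          A_train, A_test = X_train.dot(W), X_test.dot(W)
          w = np.linalg.inv(A_train.T.dot(A_train)).dot(A_train.T.dot(Y_oh_train))
          print(d, evaluate(A_train, Y_train, w), evaluate(A_test, Y_test, w))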

      Trying more values of d, we find an additional 4.3% improvement in test accuracy to 61.7%. In the following figure, we consider the performance of our new classifier as we vary d. Intuitively, as d increases, the accuracy should also increase, as we use more and more of our original data. Rather than paint a rosy picture, however, the graph exhibits a negative trend:

      Performance of featurized ordinary least squares

      As we keep more of our data, the gap between the training and validation accuracies increases as well. This is clear evidence of overfitting, where our model is learning representations that are no longer generalizable to all data. To combat overfitting, we'll regularize our model by penalizing complex models.

      We amend our ordinary least-squares objective function with a regularization term, giving us a new objective. Our new objective function is called ridge regression and it looks like this:

      min_w |Aw - y|^2 + lambda |w|^2
      

      In this equation, lambda is a tunable hyperparameter. Plug lambda = 0 into the equation and ridge regression becomes least-squares. Plug lambda = infinity into the equation, and you'll find the best w must now be zero, as any non-zero w incurs infinite loss. As it turns out, this objective yields a closed-form solution as well:

      w^* = (A^TA + lambda I)^{-1}A^Ty
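
      This closed form maps directly onto numpy. As a sketch only (the tutorial applies the same expression inline in the next edit), a hypothetical helper could look like this:

      import numpy as np

      def ridge(A, Y_oh, lam):
          """Closed-form ridge regression: w* = (A^T A + lambda * I)^{-1} A^T y."""
          I = np.eye(A.shape[1])
          return np.linalg.inv(A.T.dot(A) + lam * I).dot(A.T.dot(Y_oh))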
      

      Still using the featurized samples, retrain and reevaluate the model once more.

      Open step_6_ls_simple.py again in your editor:

      • nano step_6_ls_simple.py

      This time, increase the dimensionality of the new feature space to d=1000. Change the value of d from 100 to 1000 as shown in the following code block:

      step_6_ls_simple.py

      ...
          d = 1000
          W = np.random.normal(size=(X_train.shape[1], d))
      ...
      

      Then apply ridge regression using a regularization of lambda = 10^{10}. Replace the line defining w with the following two lines:

      step_6_ls_simple.py

      ...
          # train model
          I = np.eye(A_train.shape[1])
          w = np.linalg.inv(A_train.T.dot(A_train) + 1e10 * I).dot(A_train.T.dot(Y_oh_train))
      

      Then locate this block:

      step_6_ls_simple.py

      ...
        ols_train_accuracy = evaluate(A_train, Y_train, w)
        print('(ols) Train Accuracy:', ols_train_accuracy)
        ols_test_accuracy = evaluate(A_test, Y_test, w)
        print('(ols) Test Accuracy:', ols_test_accuracy)
      

      Replace it with the following:

      step_6_ls_simple.py

      ...
      
        print('(ridge) Train Accuracy:', evaluate(A_train, Y_train, w))
        print('(ridge) Test Accuracy:', evaluate(A_test, Y_test, w))
      

      The completed script should look like this:

      step_6_ls_simple.py

      """Train emotion classifier using least squares."""
      
      import numpy as np
      
      def evaluate(A, Y, w):
          Yhat = np.argmax(A.dot(w), axis=1)
          return np.sum(Yhat == Y) / Y.shape[0]
      
      def main():
          # load data
          with np.load('data/fer2013_train.npz') as data:
              X_train, Y_train = data['X'], data['Y']
      
          with np.load('data/fer2013_test.npz') as data:
              X_test, Y_test = data['X'], data['Y']
      
          # one-hot labels
          I = np.eye(6)
          Y_oh_train, Y_oh_test = I[Y_train], I[Y_test]
          d = 1000
          W = np.random.normal(size=(X_train.shape[1], d))
          # select first 100 dimensions
          A_train, A_test = X_train.dot(W), X_test.dot(W)
      
          # train model
          I = np.eye(A_train.shape[1])
          w = np.linalg.inv(A_train.T.dot(A_train) + 1e10 * I).dot(A_train.T.dot(Y_oh_train))
      
          # evaluate model
          print('(ridge) Train Accuracy:', evaluate(A_train, Y_train, w))
          print('(ridge) Test Accuracy:', evaluate(A_test, Y_test, w))
      
      if __name__ == '__main__':
          main()
      

      Save the file, exit your editor, and run the script:

      • python step_6_ls_simple.py

      You'll see the following output:

      Output

      (ridge) Train Accuracy: 0.651173462698
      (ridge) Test Accuracy: 0.622181436812

      There's an additional improvement of 0.4% in validation accuracy to 62.2%, as train accuracy drops to 65.1%. Once again reevaluating across a number of different d, we see a smaller gap between training and validation accuracies for ridge regression. In other words, ridge regression was subject to less overfitting.

      Performance of featurized ols and ridge regression

      With these extra enhancements, least squares performs reasonably well as a baseline. The training and inference times, all together, take no more than 20 seconds for even the best results. In the next section, you'll explore even more complex models.

      Step 7 — Building the Face-Emotion Classifier Using a Convolutional Neural Network in PyTorch

      In this section, you'll build a second emotion classifier using neural networks instead of least squares. Again, our goal is to produce a model that accepts faces as input and outputs an emotion. Eventually, this classifier will then determine which dog mask to apply.

      For a brief neural network visualization and introduction, see the article Understanding Neural Networks. Here, we will use a deep-learning library called PyTorch. There are a number of deep-learning libraries in widespread use, and each has various pros and cons. PyTorch is a particularly good place to start. To implement this neural network classifier, we again take three steps, as we did with the least-squares classifier:

      1. Preprocess the data: Apply one-hot encoding and then apply PyTorch abstractions.
      2. Specify and train the model: Set up a neural network using PyTorch layers. Define optimization hyperparameters and run stochastic gradient descent.
      3. Run a prediction using the model: Evaluate the neural network.

      Create a new file named step_7_fer_simple.py:

      • nano step_7_fer_simple.py

      Import the necessary utilities and create a Python class that will hold your data. For data processing here, you will create the train and test datasets. To do this, implement PyTorch's Dataset interface, which lets you load and use PyTorch's built-in data pipeline for the face-emotion recognition dataset:

      step_7_fer_simple.py

      from torch.utils.data import Dataset
      from torch.autograd import Variable
      import torch.nn as nn
      import torch.nn.functional as F
      import torch.optim as optim
      import numpy as np
      import torch
      import cv2
      import argparse
      
      
      class Fer2013Dataset(Dataset):
          """Face Emotion Recognition dataset.
      
          Utility for loading FER into PyTorch. Dataset curated by Pierre-Luc Carrier
          and Aaron Courville in 2013.
      
          Each sample is 1 x 1 x 48 x 48, and each label is a scalar.
          """
          pass
      

      Delete the pass placeholder in the Fer2013Dataset class. In its place, add a function that will initialize our data holder:

      step_7_fer_simple.py

          def __init__(self, path: str):
              """
              Args:
                  path: Path to `.np` file containing sample nxd and label nx1
              """
              with np.load(path) as data:
                  self._samples = data['X']
                  self._labels = data['Y']
              self._samples = self._samples.reshape((-1, 1, 48, 48))
      
              self.X = Variable(torch.from_numpy(self._samples)).float()
              self.Y = Variable(torch.from_numpy(self._labels)).float()
      ...
      

      This function starts by loading the samples and labels. Then it wraps the data in PyTorch data structures.

      Directly after the __init__ function, add a __len__ function, as this is needed to implement the Dataset interface PyTorch expects:

      step_7_fer_simple.py

      ...
          def __len__(self):
              return len(self._labels)
      

      Finally, add a __getitem__ method, which returns a dictionary containing the sample and the label:

      step_7_fer_simple.py

          def __getitem__(self, idx):
              return {'image': self._samples[idx], 'label': self._labels[idx]}
      

      Double-check that your file looks like the following:

      step_7_fer_simple.py

      from torch.utils.data import Dataset
      from torch.autograd import Variable
      import torch.nn as nn
      import torch.nn.functional as F
      import torch.optim as optim
      import numpy as np
      import torch
      import cv2
      import argparse
      
      
      class Fer2013Dataset(Dataset):
          """Face Emotion Recognition dataset.
          Utility for loading FER into PyTorch. Dataset curated by Pierre-Luc Carrier
          and Aaron Courville in 2013.
          Each sample is 1 x 1 x 48 x 48, and each label is a scalar.
          """
      
          def __init__(self, path: str):
              """
              Args:
                  path: Path to `.np` file containing sample nxd and label nx1
              """
              with np.load(path) as data:
                  self._samples = data['X']
                  self._labels = data['Y']
              self._samples = self._samples.reshape((-1, 1, 48, 48))
      
              self.X = Variable(torch.from_numpy(self._samples)).float()
              self.Y = Variable(torch.from_numpy(self._labels)).float()
      
          def __len__(self):
              return len(self._labels)
      
          def __getitem__(self, idx):
              return {'image': self._samples[idx], 'label': self._labels[idx]}
      

      Next, load the dataset. Add the following code to the end of your file after the Fer2013Dataset class:

      step_7_fer_simple.py

      trainset = Fer2013Dataset('data/fer2013_train.npz')
      trainloader = torch.utils.data.DataLoader(trainset, batch_size=32, shuffle=True)
      
      testset = Fer2013Dataset('data/fer2013_test.npz')
      testloader = torch.utils.data.DataLoader(testset, batch_size=32, shuffle=False)
      

      This code initializes the dataset using the Fer2013Dataset class you created. Then for the train and validation sets, it wraps the dataset in a DataLoader. This translates the dataset into an iterable to use later.

      As a sanity check, verify that the dataset utilities are functioning. Create a sample dataset loader using DataLoader and print the first element of that loader. Add the following to the end of your file:

      step_7_fer_simple.py

      if __name__ == '__main__':
          loader = torch.utils.data.DataLoader(trainset, batch_size=2, shuffle=False)
          print(next(iter(loader)))
      

      Verify that your completed script looks like this:

      step_7_fer_simple.py

      from torch.utils.data import Dataset
      from torch.autograd import Variable
      import torch.nn as nn
      import torch.nn.functional as F
      import torch.optim as optim
      import numpy as np
      import torch
      import cv2
      import argparse
      
      
      class Fer2013Dataset(Dataset):
          """Face Emotion Recognition dataset.
          Utility for loading FER into PyTorch. Dataset curated by Pierre-Luc Carrier
          and Aaron Courville in 2013.
          Each sample is 1 x 1 x 48 x 48, and each label is a scalar.
          """
      
          def __init__(self, path: str):
              """
              Args:
                  path: Path to `.np` file containing sample nxd and label nx1
              """
              with np.load(path) as data:
                  self._samples = data['X']
                  self._labels = data['Y']
              self._samples = self._samples.reshape((-1, 1, 48, 48))
      
              self.X = Variable(torch.from_numpy(self._samples)).float()
              self.Y = Variable(torch.from_numpy(self._labels)).float()
      
          def __len__(self):
              return len(self._labels)
      
          def __getitem__(self, idx):
              return {'image': self._samples[idx], 'label': self._labels[idx]}
      
      trainset = Fer2013Dataset('data/fer2013_train.npz')
      trainloader = torch.utils.data.DataLoader(trainset, batch_size=32, shuffle=True)
      
      testset = Fer2013Dataset('data/fer2013_test.npz')
      testloader = torch.utils.data.DataLoader(testset, batch_size=32, shuffle=False)
      
      if __name__ == '__main__':
          loader = torch.utils.data.DataLoader(trainset, batch_size=2, shuffle=False)
          print(next(iter(loader)))
      

      Exit your editor and run the script.

      • python step_7_fer_simple.py

      This outputs the following pair of tensors. Our data pipeline outputs two samples and two labels. This indicates that our data pipeline is up and ready to go:

      Output

      {'image':
      (0 ,0 ,.,.) =
         24   32   36  ...  173  172  173
         25   34   29  ...  173  172  173
         26   29   25  ...  172  172  174
            ...        ⋱        ...
        159  185  157  ...  157  156  153
        136  157  187  ...  152  152  150
        145  130  161  ...  142  143  142
            ⋮
      (1 ,0 ,.,.) =
         20   17   19  ...  187  176  162
         22   17   17  ...  195  180  171
         17   17   18  ...  203  193  175
            ...        ⋱        ...
          1    1    1  ...  106  115  119
          2    2    1  ...  103  111  119
          2    2    2  ...   99  107  118
      [torch.LongTensor of size 2x1x48x48]
      , 'label':
       1
       1
      [torch.LongTensor of size 2]
      }

      Now that you've verified that the data pipeline works, return to step_7_fer_simple.py to add the neural network and optimizer. Open step_7_fer_simple.py.

      • nano step_7_fer_simple.py

      First, delete the last three lines you added in the previous iteration:

      step_7_fer_simple.py

      # Delete all three lines
      if __name__ == '__main__':
          loader = torch.utils.data.DataLoader(trainset, batch_size=2, shuffle=False)
          print(next(iter(loader)))
      

      In their place, define a PyTorch neural network that includes three convolutional layers, followed by three fully connected layers. Add this to the end of your existing script:

      step_7_fer_simple.py

      class Net(nn.Module):
          def __init__(self):
              super(Net, self).__init__()
              self.conv1 = nn.Conv2d(1, 6, 5)
              self.pool = nn.MaxPool2d(2, 2)
              self.conv2 = nn.Conv2d(6, 6, 3)
              self.conv3 = nn.Conv2d(6, 16, 3)
              self.fc1 = nn.Linear(16 * 4 * 4, 120)
              self.fc2 = nn.Linear(120, 48)
              self.fc3 = nn.Linear(48, 3)
      
          def forward(self, x):
              x = self.pool(F.relu(self.conv1(x)))
              x = self.pool(F.relu(self.conv2(x)))
              x = self.pool(F.relu(self.conv3(x)))
              x = x.view(-1, 16 * 4 * 4)
              x = F.relu(self.fc1(x))
              x = F.relu(self.fc2(x))
              x = self.fc3(x)
              return x
      

      Now initialize the neural network, define a loss function, and define optimization hyperparameters by adding the following code to the end of the script:

      step_7_fer_simple.py

      net = Net().float()
      criterion = nn.CrossEntropyLoss()
      optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
      

      We'll train for two epochs. For now, we define an epoch to be an iteration of training where every training sample has been used exactly once.

      First, extract image and label from the dataset loader and then wrap each in a PyTorch Variable. Second, run the forward pass and then backpropagate through the loss and neural network. Add the following code to the end of your script to do that:

      step_7_fer_simple.py

      for epoch in range(2):  # loop over the dataset multiple times
      
          running_loss = 0.0
          for i, data in enumerate(trainloader, 0):
              inputs = Variable(data['image'].float())
              labels = Variable(data['label'].long())
              optimizer.zero_grad()
      
              # forward + backward + optimize
              outputs = net(inputs)
              loss = criterion(outputs, labels)
              loss.backward()
              optimizer.step()
      
              # print statistics
              running_loss += loss.data[0]
              if i % 100 == 0:
                  print('[%d, %5d] loss: %.3f' % (epoch, i, running_loss / (i + 1)))
      

      Your script should now look like this:

      step_7_fer_simple.py

      from torch.utils.data import Dataset
      from torch.autograd import Variable
      import torch.nn as nn
      import torch.nn.functional as F
      import torch.optim as optim
      import numpy as np
      import torch
      import cv2
      import argparse
      
      
      class Fer2013Dataset(Dataset):
          """Face Emotion Recognition dataset.
      
          Utility for loading FER into PyTorch. Dataset curated by Pierre-Luc Carrier
          and Aaron Courville in 2013.
      
          Each sample is 1 x 1 x 48 x 48, and each label is a scalar.
          """
          def __init__(self, path: str):
              """
              Args:
                  path: Path to `.np` file containing sample nxd and label nx1
              """
              with np.load(path) as data:
                  self._samples = data['X']
                  self._labels = data['Y']
              self._samples = self._samples.reshape((-1, 1, 48, 48))
      
              self.X = Variable(torch.from_numpy(self._samples)).float()
              self.Y = Variable(torch.from_numpy(self._labels)).float()
      
          def __len__(self):
              return len(self._labels)
      
      
          def __getitem__(self, idx):
              return {'image': self._samples[idx], 'label': self._labels[idx]}
      
      
      trainset = Fer2013Dataset('data/fer2013_train.npz')
      trainloader = torch.utils.data.DataLoader(trainset, batch_size=32, shuffle=True)
      
      testset = Fer2013Dataset('data/fer2013_test.npz')
      testloader = torch.utils.data.DataLoader(testset, batch_size=32, shuffle=False)
      
      
      class Net(nn.Module):
          def __init__(self):
              super(Net, self).__init__()
              self.conv1 = nn.Conv2d(1, 6, 5)
              self.pool = nn.MaxPool2d(2, 2)
              self.conv2 = nn.Conv2d(6, 6, 3)
              self.conv3 = nn.Conv2d(6, 16, 3)
              self.fc1 = nn.Linear(16 * 4 * 4, 120)
              self.fc2 = nn.Linear(120, 48)
              self.fc3 = nn.Linear(48, 3)
      
          def forward(self, x):
              x = self.pool(F.relu(self.conv1(x)))
              x = self.pool(F.relu(self.conv2(x)))
              x = self.pool(F.relu(self.conv3(x)))
              x = x.view(-1, 16 * 4 * 4)
              x = F.relu(self.fc1(x))
              x = F.relu(self.fc2(x))
              x = self.fc3(x)
              return x
      
      net = Net().float()
      criterion = nn.CrossEntropyLoss()
      optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
      
      
      for epoch in range(2):  # loop over the dataset multiple times
      
          running_loss = 0.0
          for i, data in enumerate(trainloader, 0):
              inputs = Variable(data['image'].float())
              labels = Variable(data['label'].long())
              optimizer.zero_grad()
      
              # forward + backward + optimize
              outputs = net(inputs)
              loss = criterion(outputs, labels)
              loss.backward()
              optimizer.step()
      
              # print statistics
              running_loss += loss.data[0]
              if i % 100 == 0:
                  print('[%d, %5d] loss: %.3f' % (epoch, i, running_loss / (i + 1)))
      

      Save the file and exit the editor once you've verified your code. Then, launch this proof-of-concept training:

      • python step_7_fer_simple.py

      You'll see output similar to the following as the neural network trains:

      Output

      [0, 0] loss: 1.094
      [0, 100] loss: 1.049
      [0, 200] loss: 1.009
      [0, 300] loss: 0.963
      [0, 400] loss: 0.935
      [1, 0] loss: 0.760
      [1, 100] loss: 0.768
      [1, 200] loss: 0.775
      [1, 300] loss: 0.776
      [1, 400] loss: 0.767

      You can then augment this script using a number of other PyTorch utilities to save and load models, output training and validation accuracies, fine-tune a learning-rate schedule, etc. After training for 20 epochs with a learning rate of 0.01 and momentum of 0.9, our neural network attains an 87.9% train accuracy and a 75.5% validation accuracy, a further 6.8% improvement over the most successful least-squares approach thus far at 66.6%. We'll include these additional bells and whistles in a new script.
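
      For example, one minimal way to save the trained weights and reload them later, using the same dictionary format with a 'state_dict' key that the final script below expects from its pretrained checkpoint, is sketched here; the path outputs/checkpoint.pth is only an example:

      # Sketch: save a checkpoint of the trained network (path is illustrative).
      torch.save({'state_dict': net.state_dict()}, 'outputs/checkpoint.pth')

      # ...and load it back into the network later.
      checkpoint = torch.load('outputs/checkpoint.pth')
      net.load_state_dict(checkpoint['state_dict'])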

      Create a new file to hold the final face emotion detector, which your live camera feed will use. This script contains the code above along with a command-line interface and an easy-to-import version of our code that will be used later. Additionally, it contains the hyperparameters tuned in advance, for a model with higher accuracy.

      • nano step_7_fer.py

      Start with the following imports. These match the imports in our previous file:

      step_7_fer.py

      from torch.utils.data import Dataset
      from torch.autograd import Variable
      import torch.nn as nn
      import torch.nn.functional as F
      import torch.optim as optim
      import numpy as np
      import torch
      import cv2
      import argparse
      

      Directly beneath these imports, reuse your code from step_7_fer_simple.py to define the neural network:

      step_7_fer.py

      class Net(nn.Module):
          def __init__(self):
              super(Net, self).__init__()
              self.conv1 = nn.Conv2d(1, 6, 5)
              self.pool = nn.MaxPool2d(2, 2)
              self.conv2 = nn.Conv2d(6, 6, 3)
              self.conv3 = nn.Conv2d(6, 16, 3)
              self.fc1 = nn.Linear(16 * 4 * 4, 120)
              self.fc2 = nn.Linear(120, 48)
              self.fc3 = nn.Linear(48, 3)
      
          def forward(self, x):
              x = self.pool(F.relu(self.conv1(x)))
              x = self.pool(F.relu(self.conv2(x)))
              x = self.pool(F.relu(self.conv3(x)))
              x = x.view(-1, 16 * 4 * 4)
              x = F.relu(self.fc1(x))
              x = F.relu(self.fc2(x))
              x = self.fc3(x)
              return x
      

      Again, reuse the code for the Face Emotion Recognition dataset from step_7_fer_simple.py and add it to this file:

      step_7_fer.py

      class Fer2013Dataset(Dataset):
          """Face Emotion Recognition dataset.
          Utility for loading FER into PyTorch. Dataset curated by Pierre-Luc Carrier
          and Aaron Courville in 2013.
          Each sample is 1 x 1 x 48 x 48, and each label is a scalar.
          """
      
          def __init__(self, path: str):
              """
              Args:
                  path: Path to `.np` file containing sample nxd and label nx1
              """
              with np.load(path) as data:
                  self._samples = data['X']
                  self._labels = data['Y']
              self._samples = self._samples.reshape((-1, 1, 48, 48))
      
              self.X = Variable(torch.from_numpy(self._samples)).float()
              self.Y = Variable(torch.from_numpy(self._labels)).float()
      
          def __len__(self):
              return len(self._labels)
      
          def __getitem__(self, idx):
              return {'image': self._samples[idx], 'label': self._labels[idx]}
      

      Next, define a few utilities to evaluate the neural network's performance. First, add an evaluate function, which compares the neural network's predicted emotions against the true emotions for a batch of images:

      step_7_fer.py

      def evaluate(outputs: Variable, labels: Variable, normalized: bool=True) -> float:
          """Evaluate neural network outputs against non-one-hotted labels."""
          Y = labels.data.numpy()
          Yhat = np.argmax(outputs.data.numpy(), axis=1)
          denom = Y.shape[0] if normalized else 1
          return float(np.sum(Yhat == Y) / denom)
      

      Then add a function called batch_evaluate, which applies evaluate to the entire dataset in manageable batches:

      step_7_fer.py

      def batch_evaluate(net: Net, dataset: Dataset, batch_size: int=500) -> float:
          """Evaluate neural network in batches, if dataset is too large."""
          score = 0.0
          n = dataset.X.shape[0]
          for i in range(0, n, batch_size):
              x = dataset.X[i: i + batch_size]
              y = dataset.Y[i: i + batch_size]
              score += evaluate(net(x), y, False)
          return score / n
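
      To see how these two utilities fit together, note that evaluate returns the fraction of correct predictions when normalized is True and the raw count of correct predictions when it is False; batch_evaluate accumulates those raw counts and divides by the dataset size once at the end. The following throwaway snippet (not part of step_7_fer.py, and assuming the imports and evaluate definition above) illustrates the difference on two hand-made predictions:

      outputs = Variable(torch.Tensor([
          [0.90, 0.05, 0.05],    # argmax = 0, label is 0 -> correct
          [0.10, 0.20, 0.70]]))  # argmax = 2, label is 1 -> incorrect
      labels = Variable(torch.Tensor([0, 1]))

      print(evaluate(outputs, labels))         # 0.5 (fraction of correct predictions)
      print(evaluate(outputs, labels, False))  # 1.0 (raw count, as used by batch_evaluate)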
      

      Now, define a function called get_image_to_emotion_predictor, which loads a pretrained model and returns a predictor function that takes in an image and outputs a predicted emotion index:

      step_7_fer.py

      def get_image_to_emotion_predictor(model_path='assets/model_best.pth'):
          """Returns predictor, from image to emotion index."""
          net = Net().float()
          pretrained_model = torch.load(model_path)
          net.load_state_dict(pretrained_model['state_dict'])
      
          def predictor(image: np.array):
              """Translates images into emotion indices."""
              if image.shape[2] > 1:
                  image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
              frame = cv2.resize(image, (48, 48)).reshape((1, 1, 48, 48))
              X = Variable(torch.from_numpy(frame)).float()
              return np.argmax(net(X).data.numpy(), axis=1)[0]
          return predictor
      

      Finally, add the following code to define the main function to leverage the other utilities:

      step_7_fer.py

      def main():
          trainset = Fer2013Dataset('data/fer2013_train.npz')
          testset = Fer2013Dataset('data/fer2013_test.npz')
          net = Net().float()
      
          pretrained_model = torch.load("assets/model_best.pth")
          net.load_state_dict(pretrained_model['state_dict'])
      
          train_acc = batch_evaluate(net, trainset, batch_size=500)
          print('Training accuracy: %.3f' % train_acc)
          test_acc = batch_evaluate(net, testset, batch_size=500)
          print('Validation accuracy: %.3f' % test_acc)
      
      
      if __name__ == '__main__':
          main()
      

      This loads a pretrained neural network and evaluates its performance on the provided Face Emotion Recognition dataset. Specifically, the script outputs accuracy on the images used for training, as well as on a separate set of images put aside for testing purposes.

      Double-check that your file matches the following:

      step_7_fer.py

      from torch.utils.data import Dataset
      from torch.autograd import Variable
      import torch.nn as nn
      import torch.nn.functional as F
      import torch.optim as optim
      import numpy as np
      import torch
      import cv2
      import argparse
      
      class Net(nn.Module):
          def __init__(self):
              super(Net, self).__init__()
              self.conv1 = nn.Conv2d(1, 6, 5)
              self.pool = nn.MaxPool2d(2, 2)
              self.conv2 = nn.Conv2d(6, 6, 3)
              self.conv3 = nn.Conv2d(6, 16, 3)
              self.fc1 = nn.Linear(16 * 4 * 4, 120)
              self.fc2 = nn.Linear(120, 48)
              self.fc3 = nn.Linear(48, 3)
      
          def forward(self, x):
              x = self.pool(F.relu(self.conv1(x)))
              x = self.pool(F.relu(self.conv2(x)))
              x = self.pool(F.relu(self.conv3(x)))
              x = x.view(-1, 16 * 4 * 4)
              x = F.relu(self.fc1(x))
              x = F.relu(self.fc2(x))
              x = self.fc3(x)
              return x
      
      
      class Fer2013Dataset(Dataset):
          """Face Emotion Recognition dataset.
          Utility for loading FER into PyTorch. Dataset curated by Pierre-Luc Carrier
          and Aaron Courville in 2013.
          Each sample is 1 x 1 x 48 x 48, and each label is a scalar.
          """
      
          def __init__(self, path: str):
              """
              Args:
                  path: Path to `.np` file containing sample nxd and label nx1
              """
              with np.load(path) as data:
                  self._samples = data['X']
                  self._labels = data['Y']
              self._samples = self._samples.reshape((-1, 1, 48, 48))
      
              self.X = Variable(torch.from_numpy(self._samples)).float()
              self.Y = Variable(torch.from_numpy(self._labels)).float()
      
          def __len__(self):
              return len(self._labels)
      
          def __getitem__(self, idx):
              return {'image': self._samples[idx], 'label': self._labels[idx]}
      
      
      def evaluate(outputs: Variable, labels: Variable, normalized: bool=True) -> float:
          """Evaluate neural network outputs against non-one-hotted labels."""
          Y = labels.data.numpy()
          Yhat = np.argmax(outputs.data.numpy(), axis=1)
          denom = Y.shape[0] if normalized else 1
          return float(np.sum(Yhat == Y) / denom)
      
      
      def batch_evaluate(net: Net, dataset: Dataset, batch_size: int=500) -> float:
          """Evaluate neural network in batches, if dataset is too large."""
          score = 0.0
          n = dataset.X.shape[0]
          for i in range(0, n, batch_size):
              x = dataset.X[i: i + batch_size]
              y = dataset.Y[i: i + batch_size]
              score += evaluate(net(x), y, False)
          return score / n
      
      
      def get_image_to_emotion_predictor(model_path='assets/model_best.pth'):
          """Returns predictor, from image to emotion index."""
          net = Net().float()
          pretrained_model = torch.load(model_path)
          net.load_state_dict(pretrained_model['state_dict'])
      
          def predictor(image: np.array):
              """Translates images into emotion indices."""
              if image.shape[2] > 1:
                  image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
              frame = cv2.resize(image, (48, 48)).reshape((1, 1, 48, 48))
              X = Variable(torch.from_numpy(frame)).float()
              return np.argmax(net(X).data.numpy(), axis=1)[0]
          return predictor
      
      
      def main():
          trainset = Fer2013Dataset('data/fer2013_train.npz')
          testset = Fer2013Dataset('data/fer2013_test.npz')
          net = Net().float()
      
          pretrained_model = torch.load("assets/model_best.pth")
          net.load_state_dict(pretrained_model['state_dict'])
      
          train_acc = batch_evaluate(net, trainset, batch_size=500)
          print('Training accuracy: %.3f' % train_acc)
          test_acc = batch_evaluate(net, testset, batch_size=500)
          print('Validation accuracy: %.3f' % test_acc)
      
      
      if __name__ == '__main__':
          main()
      

      Save the file and exit your editor.

      As you did with the face detector, download the pretrained model parameters and save them to your assets folder with the following command:

      • wget -O assets/model_best.pth https://github.com/alvinwan/emotion-based-dog-filter/raw/master/src/assets/model_best.pth

      Run the script to use and evaluate the pretrained model:

      • python step_7_fer.py

      This will output the following:

      Output

      Training accuracy: 0.879
      Validation accuracy: 0.755

      At this point, you've built a fairly accurate face-emotion classifier: on images it has never seen, the model correctly distinguishes between happy, sad, and surprised faces about three out of four times (75.5% validation accuracy). This is a reasonably good model, so you can now move on to using this face-emotion classifier to determine which dog mask to apply to faces.
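
      Because step_7_fer.py is written to be importable, you can also exercise the predictor on its own in a short throwaway snippet. The image path below is a placeholder for any photo containing a face, and the happy/sad/surprised ordering matches how the masks are selected in the next step:

      from step_7_fer import get_image_to_emotion_predictor
      import cv2

      predictor = get_image_to_emotion_predictor()    # loads assets/model_best.pth
      frame = cv2.imread('path/to/face.png')          # placeholder path for a face photo
      emotion_index = predictor(frame)                # 0, 1, or 2
      print(['happy', 'sad', 'surprised'][emotion_index])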

      Step 8 — Finishing the Emotion-Based Dog Filter

      Before integrating our brand-new face-emotion classifier, we will need animal masks to pick from. We'll use a Dalmatian mask and a Sheepdog mask:

      Dalmatian mask
      Sheepdog mask

      Execute these commands to download both masks to your assets folder:

      • wget -O assets/dalmation.png https://www.xpresservers.com/wp-content/uploads/2019/04/1554419827_591_How-To-Apply-Computer-Vision-to-Build-an-Emotion-Based-Dog-Filter-in-Python-3.png # dalmation
      • wget -O assets/sheepdog.png https://www.xpresservers.com/wp-content/uploads/2019/04/1554419827_102_How-To-Apply-Computer-Vision-to-Build-an-Emotion-Based-Dog-Filter-in-Python-3.png # sheepdog

      Now let's use the masks in our filter. Start by duplicating the step_4_dog_mask.py file:

      • cp step_4_dog_mask.py step_8_dog_emotion_mask.py

      Open the new Python script.

      • nano step_8_dog_emotion_mask.py

      Insert a new line at the top of the script to import the emotion predictor:

      step_8_dog_emotion_mask.py

      from step_7_fer import get_image_to_emotion_predictor
      ...
      

      Then, in the main() function, locate this line:

      step_8_dog_emotion_mask.py

          mask = cv2.imread('assets/dog.png')
      

      Replace it with the following to load the new masks and aggregate all masks into a tuple:

      step_8_dog_emotion_mask.py

          mask0 = cv2.imread('assets/dog.png')
          mask1 = cv2.imread('assets/dalmation.png')
          mask2 = cv2.imread('assets/sheepdog.png')
          masks = (mask0, mask1, mask2)
      

      Add a line break, and then add this code to create the emotion predictor.

      step_8_dog_emotion_mask.py

      
          # get emotion predictor
          predictor = get_image_to_emotion_predictor()
      

      Your main function should now match the following:

      step_8_dog_emotion_mask.py

      def main():
          cap = cv2.VideoCapture(0)
      
          # load mask
          mask0 = cv2.imread('assets/dog.png')
          mask1 = cv2.imread('assets/dalmation.png')
          mask2 = cv2.imread('assets/sheepdog.png')
          masks = (mask0, mask1, mask2)
      
          # get emotion predictor
          predictor = get_image_to_emotion_predictor()
      
          # initialize front face classifier
          ...
      

      Next, locate these lines:

      step_8_dog_emotion_mask.py

      
                  # apply mask
                  frame[y0: y1, x0: x1] = apply_mask(frame[y0: y1, x0: x1], mask)
      

      Insert the following line below the # apply mask line to select the appropriate mask by using the predictor:

      step_8_dog_emotion_mask.py

                  # apply mask
                  mask = masks[predictor(frame[y:y+h, x: x+w])]
                  frame[y0: y1, x0: x1] = apply_mask(frame[y0: y1, x0: x1], mask)
      
      

      The completed file should look like this:

      step_8_dog_emotion_mask.py

      """Test for face detection"""
      
      from step_7_fer import get_image_to_emotion_predictor
      import numpy as np
      import cv2
      
      def apply_mask(face: np.array, mask: np.array) -> np.array:
          """Add the mask to the provided face, and return the face with mask."""
          mask_h, mask_w, _ = mask.shape
          face_h, face_w, _ = face.shape
      
          # Resize the mask to fit on face
          factor = min(face_h / mask_h, face_w / mask_w)
          new_mask_w = int(factor * mask_w)
          new_mask_h = int(factor * mask_h)
          new_mask_shape = (new_mask_w, new_mask_h)
          resized_mask = cv2.resize(mask, new_mask_shape)
      
          # Add mask to face - ensure mask is centered
          face_with_mask = face.copy()
          non_white_pixels = (resized_mask < 250).all(axis=2)
          off_h = int((face_h - new_mask_h) / 2)
          off_w = int((face_w - new_mask_w) / 2)
          face_with_mask[off_h: off_h+new_mask_h, off_w: off_w+new_mask_w][non_white_pixels] = \
              resized_mask[non_white_pixels]
      
          return face_with_mask
      
      def main():
      
          cap = cv2.VideoCapture(0)
          # load mask
          mask0 = cv2.imread('assets/dog.png')
          mask1 = cv2.imread('assets/dalmation.png')
          mask2 = cv2.imread('assets/sheepdog.png')
          masks = (mask0, mask1, mask2)
      
          # get emotion predictor
          predictor = get_image_to_emotion_predictor()
      
          # initialize front face classifier
          cascade = cv2.CascadeClassifier("assets/haarcascade_frontalface_default.xml")
      
          while True:
              # Capture frame-by-frame
              ret, frame = cap.read()
              frame_h, frame_w, _ = frame.shape
      
              # Convert to black-and-white
              gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
              blackwhite = cv2.equalizeHist(gray)
      
              rects = cascade.detectMultiScale(
                  blackwhite, scaleFactor=1.3, minNeighbors=4, minSize=(30, 30),
                  flags=cv2.CASCADE_SCALE_IMAGE)
      
              for x, y, w, h in rects:
                  # crop a frame slightly larger than the face
                  y0, y1 = int(y - 0.25*h), int(y + 0.75*h)
                  x0, x1 = x, x + w
                  # give up if the cropped frame would be out-of-bounds
                  if x0 < 0 or y0 < 0 or x1 > frame_w or y1 > frame_h:
                      continue
                  # apply mask
                  mask = masks[predictor(frame[y:y+h, x: x+w])]
                  frame[y0: y1, x0: x1] = apply_mask(frame[y0: y1, x0: x1], mask)
      
              # Display the resulting frame
              cv2.imshow('frame', frame)
              if cv2.waitKey(1) & 0xFF == ord('q'):
                  break
      
          cap.release()
          cv2.destroyAllWindows()
      
      if __name__ == '__main__':
          main()
      

      Save and exit your editor. Now launch the script:

      • python step_8_dog_emotion_mask.py

      Now try it out! Smiling will register as "happy" and show the original dog. A neutral face or a frown will register as "sad" and yield the Dalmatian. A look of "surprise," with a nice big jaw drop, will yield the sheepdog.

      GIF for emotion-based dog filter

      This concludes our emotion-based dog filter and foray into computer vision.

      Conclusion

      In this tutorial, you built a face detector and dog filter using computer vision and employed machine learning models to apply masks based on detected emotions.

      Machine learning is widely applicable. However, it's up to the practitioner to consider the ethical implications of each application when applying machine learning. The application you built in this tutorial was a fun exercise, but remember that you relied on OpenCV and an existing dataset to identify faces, rather than supplying your own data to train the models. The data and models used have significant impacts on how a program works.

      For example, imagine a job search engine where the models were trained with data about candidates, such as race, gender, age, culture, first language, or other factors. Perhaps the developers also trained a model that enforces sparsity, which ends up reducing the feature space to a subspace where gender explains most of the variance. As a result, the model influences candidate job searches and even company selection processes based primarily on gender. Now consider more complex situations where the model is less interpretable and you don't know what a particular feature corresponds to. You can learn more about this in Equality of Opportunity in Machine Learning by Professor Moritz Hardt at UC Berkeley.

      There can be an overwhelming magnitude of uncertainty in machine learning. To understand this randomness and complexity, you'll have to develop both mathematical intuitions and probabilistic thinking skills. As a practitioner, it is up to you to dig into the theoretical underpinnings of machine learning.


