One place for hosting & domains

      Processing

      How To Build a Data Processing Pipeline Using Luigi in Python on Ubuntu 20.04


      The author selected the Free and Open Source Fund to receive a donation as part of the Write for DOnations program.

      Introduction

      Luigi is a Python package that manages long-running batch processing, which is the automated running of data processing jobs on batches of items. Luigi allows you to define a data processing job as a set of dependent tasks. For example, task B depends on the output of task A. And task D depends on the output of task B and task C. Luigi automatically works out what tasks it needs to run to complete a requested job.

      Overall Luigi provides a framework to develop and manage data processing pipelines. It was originally developed by Spotify, who use it to manage plumbing together collections of tasks that need to fetch and process data from a variety of sources. Within Luigi, developers at Spotify built functionality to help with their batch processing needs including handling of failures, the ability to automatically resolve dependencies between tasks, and visualization of task processing. Spotify uses Luigi to support batch processing jobs, including providing music recommendations to users, populating internal dashboards, and calculating lists of top songs.

      In this tutorial, you will build a data processing pipeline to analyze the most common words from the most popular books on Project Gutenburg. To do this, you will build a pipeline using the Luigi package. You will use Luigi tasks, targets, dependencies, and parameters to build your pipeline.

      Prerequisites

      To complete this tutorial, you will need the following:

      Step 1 — Installing Luigi

      In this step, you will create a clean sandbox environment for your Luigi installation.

      First, create a project directory. For this tutorial luigi-demo:

      Navigate into the newly created luigi-demo directory:

      Create a new virtual environment luigi-venv:

      • python3 -m venv luigi-venv

      And activate the newly created virtual environment:

      • . luigi-venv/bin/activate

      You will find (luigi-venv) appended to the front of your terminal prompt to indicate which virtual environment is active:

      Output

      (luigi-venv) username@hostname:~/luigi-demo$

      For this tutorial, you will need three libraries: luigi, beautifulsoup4, and requests. The requests library streamlines making HTTP requests; you will use it to download the Project Gutenberg book lists and the books to analyze. The beautifulsoup4 library provides functions to parse data from web pages; you will use it to parse out a list of the most popular books on the Project Gutenberg site.

      Run the following command to install these libraries using pip:

      • pip install wheel luigi beautifulsoup4 requests

      You will get a response confirming the installation of the latest versions of the libraries and all of their dependencies:

      Output

      Successfully installed beautifulsoup4-4.9.1 certifi-2020.6.20 chardet-3.0.4 docutils-0.16 idna-2.10 lockfile-0.12.2 luigi-3.0.1 python-daemon-2.2.4 python-dateutil-2.8.1 requests-2.24.0 six-1.15.0 soupsieve-2.0.1 tornado-5.1.1 urllib3-1.25.10

      You’ve installed the dependencies for your project. Now, you’ll move on to building your first Luigi task.

      Step 2 — Creating a Luigi Task

      In this step, you will create a “Hello World” Luigi task to demonstrate how they work.

      A Luigi task is where the execution of your pipeline and the definition of each task’s input and output dependencies take place. Tasks are the building blocks that you will create your pipeline from. You define them in a class, which contains:

      • A run() method that holds the logic for executing the task.
      • An output() method that returns the artifacts generated by the task. The run() method populates these artifacts.
      • An optional input() method that returns any additional tasks in your pipeline that are required to execute the current task. The run() method uses these to carry out the task.

      Create a new file hello-world.py:

      Now add the following code to your file:

      hello-world.py

      import luigi
      
      class HelloLuigi(luigi.Task):
      
          def output(self):
              return luigi.LocalTarget('hello-luigi.txt')
      
          def run(self):
              with self.output().open("w") as outfile:
                  outfile.write("Hello Luigi!")
      
      

      You define that HelloLuigi() is a Luigi task by adding the luigi.Task mixin to it.

      The output() method defines one or more Target outputs that your task produces. In the case of this example, you define a luigi.LocalTarget, which is a local file.

      Note: Luigi allows you to connect to a variety of common data sources including AWS S3 buckets, MongoDB databases, and SQL databases. You can find a complete list of supported data sources in the Luigi docs.

      The run() method contains the code you want to execute for your pipeline stage. For this example you are opening the output() target file in write mode, self.output().open("w") as outfile: and writing "Hello Luigi!" to it with outfile.write("Hello Luigi!").

      To execute the task you created, run the following command:

      • python -m luigi --module hello-world HelloLuigi --local-scheduler

      Here, you run the task using python -m instead of executing the luigi command directly; this is because Luigi can only execute code that is within the current PYTHONPATH. You can alternatively add PYTHONPATH='.' to the front of your Luigi command, like so:

      • PYTHONPATH='.' luigi --module hello-world HelloLuigi --local-scheduler

      With the --module hello-world HelloLuigi flag, you tell Luigi which Python module and Luigi task to execute.

      The --local-scheduler flag tells Luigi to not connect to a Luigi scheduler and, instead, execute this task locally. (We explain the Luigi scheduler in Step 4.) Running tasks using the local-scheduler flag is only recommended for development work.

      Luigi will output a summary of the executed tasks:

      Output

      ===== Luigi Execution Summary ===== Scheduled 1 tasks of which: * 1 ran successfully: - 1 HelloLuigi() This progress looks :) because there were no failed tasks or missing dependencies ===== Luigi Execution Summary =====

      And it will create a new file hello-luigi.txt with content:

      hello-luigi.txt

      Hello Luigi!
      

      You have created a Luigi task that generates a file and then executed it using the Luigi local-scheduler. Now, you’ll create a task that can extract a list of books from a web page.

      In this step, you will create a Luigi task and define a run() method for the task to download a list of the most popular books on Project Gutenberg. You’ll define an output() method to store links to these books in a file. You will run these using the Luigi local scheduler.

      Create a new directory data inside of your luigi-demo directory. This will be where you will store the files defined in the output() methods of your tasks. You need to create the directories before running your tasks—Python throws exceptions when you try to write a file to a directory that does not exist yet:

      • mkdir data
      • mkdir data/counts
      • mkdir data/downloads

      Create a new file word-frequency.py:

      Insert the following code, which is a Luigi task to extract a list of links to the top most-read books on Project Gutenberg:

      word-frequency.py

      import requests
      import luigi
      from bs4 import BeautifulSoup
      
      
      class GetTopBooks(luigi.Task):
          """
          Get list of the most popular books from Project Gutenberg
          """
      
          def output(self):
              return luigi.LocalTarget("data/books_list.txt")
      
          def run(self):
              resp = requests.get("http://www.gutenberg.org/browse/scores/top")
      
              soup = BeautifulSoup(resp.content, "html.parser")
      
              pageHeader = soup.find_all("h2", string="Top 100 EBooks yesterday")[0]
              listTop = pageHeader.find_next_sibling("ol")
      
              with self.output().open("w") as f:
                  for result in listTop.select("li>a"):
                      if "/ebooks/" in result["href"]:
                          f.write("http://www.gutenberg.org{link}.txt.utf-8n"
                              .format(
                                  link=result["href"]
                              )
                          )
      

      You define an output() target of file "data/books_list.txt" to store the list of books.

      In the run() method, you:

      • use the requests library to download the HTML contents of the Project Gutenberg top books page.
      • use the BeautifulSoup library to parse the contents of the page. The BeautifulSoup library allows us to scrape information out of web pages. To find out more about using the BeautifulSoup library, read the How To Scrape Web Pages with Beautiful Soup and Python 3 tutorial.
      • open the output file defined in the output() method.
      • iterate over the HTML structure to get all of the links in the Top 100 EBooks yesterday list. For this page, this is locating all links <a> that are within a list item <li>. For each of those links, if they link to a page that points at a link containing /ebooks/, you can assume it is a book and write that link to your output() file.

      Screenshot of the Project Gutenberg top books web page with the top ebooks links highlighted

      Save and exit the file once you’re done.

      Execute this new task using the following command:

      • python -m luigi --module word-frequency GetTopBooks --local-scheduler

      Luigi will output a summary of the executed tasks:

      Output

      ===== Luigi Execution Summary ===== Scheduled 1 tasks of which: * 1 ran successfully: - 1 GetTopBooks() This progress looks :) because there were no failed tasks or missing dependencies ===== Luigi Execution Summary =====

      In the data directory, Luigi will create a new file (data/books_list.txt). Run the following command to output the contents of the file:

      This file contains a list of URLs extracted from the Project Gutenberg top projects list:

      Output

      http://www.gutenberg.org/ebooks/1342.txt.utf-8 http://www.gutenberg.org/ebooks/11.txt.utf-8 http://www.gutenberg.org/ebooks/2701.txt.utf-8 http://www.gutenberg.org/ebooks/1661.txt.utf-8 http://www.gutenberg.org/ebooks/16328.txt.utf-8 http://www.gutenberg.org/ebooks/45858.txt.utf-8 http://www.gutenberg.org/ebooks/98.txt.utf-8 http://www.gutenberg.org/ebooks/84.txt.utf-8 http://www.gutenberg.org/ebooks/5200.txt.utf-8 http://www.gutenberg.org/ebooks/51461.txt.utf-8 ...

      You’ve created a task that can extract a list of books from a web page. In the next step, you’ll set up a central Luigi scheduler.

      Step 4 — Running the Luigi Scheduler

      Now, you’ll launch the Luigi scheduler to execute and visualize your tasks. You will take the task developed in Step 3 and run it using the Luigi scheduler.

      So far, you have been running Luigi using the --local-scheduler tag to run your jobs locally without allocating work to a central scheduler. This is useful for development, but for production usage it is recommended to use the Luigi scheduler. The Luigi scheduler provides:

      • A central point to execute your tasks.
      • Visualization of the execution of your tasks.

      To access the Luigi scheduler interface, you need to enable access to port 8082. To do this, run the following command:

      To run the scheduler execute the following command:

      • sudo sh -c ". luigi-venv/bin/activate ;luigid --background --port 8082"

      Note: We have re-run the virtualenv activate script as root, before launching the Luigi scheduler as a background task. This is because when running sudo the virtualenv environment variables and aliases are not carried over.

      If you do not want to run as root, you can run the Luigi scheduler as a background process for the current user. This command runs the Luigi scheduler in the background and hides messages from the scheduler background task. You can find out more about managing background processes in the terminal at How To Use Bash’s Job Control to Manage Foreground and Background Processes:

      • luigid --port 8082 > /dev/null 2> /dev/null &

      Open a browser to access the Luigi interface. This will either be at http://your_server_ip:8082, or if you have set up a domain for your server http://your_domain:8082. This will open the Luigi user interface.

      Luigi default user interface

      By default, Luigi tasks run using the Luigi scheduler. To run one of your previous tasks using the Luigi scheduler omit the --local-scheduler argument from the command. Re-run the task from Step 3 using the following command:

      • python -m luigi --module word-frequency GetTopBooks

      Refresh the Luigi scheduler user interface. You will find the GetTopBooks task added to the run list and its execution status.

      Luigi User Interface after running the GetTopBooks Task

      You will continue to refer back to this user interface to monitor the progress of your pipeline.

      Note: If you’d like to secure your Luigi scheduler through HTTPS, you can serve it through Nginx. To set up an Nginx server using HTTPS follow: How To Secure Nginx with Let’s Encrypt on Ubuntu 20.04. See Github - Luigi - Pull Request 2785 for suggestions on a suitable Nginx configuration to connect the Luigi server to Nginx.

      You’ve launched the Luigi Scheduler and used it to visualize your executed tasks. Next, you will create a task to download the list of books that the GetTopBooks() task outputs.

      Step 5 — Downloading the Books

      In this step you will create a Luigi task to download a specified book. You will define a dependency between this newly created task and the task created in Step 3.

      First open your file:

      Add an additional class following your GetTopBooks() task to the word-frequency.py file with the following code:

      word-frequency.py

      . . .
      class DownloadBooks(luigi.Task):
          """
          Download a specified list of books
          """
          FileID = luigi.IntParameter()
      
          REPLACE_LIST = """.,"';_[]:*-"""
      
          def requires(self):
              return GetTopBooks()
      
          def output(self):
              return luigi.LocalTarget("data/downloads/{}.txt".format(self.FileID))
      
          def run(self):
              with self.input().open("r") as i:
                  URL = i.read().splitlines()[self.FileID]
      
                  with self.output().open("w") as outfile:
                      book_downloads = requests.get(URL)
                      book_text = book_downloads.text
      
                      for char in self.REPLACE_LIST:
                          book_text = book_text.replace(char, " ")
      
                      book_text = book_text.lower()
                      outfile.write(book_text)
      

      In this task you introduce a Parameter; in this case, an integer parameter. Luigi parameters are inputs to your tasks that affect the execution of the pipeline. Here you introduce a parameter FileID to specify a line in your list of URLs to fetch.

      You have added an additional method to your Luigi task, def requires(); in this method you define the Luigi task that you need the output of before you can execute this task. You require the output of the GetTopBooks() task you defined in Step 3.

      In the output() method, you define your target. You use the FileID parameter to create a name for the file created by this step. In this case, you format data/downloads/{FileID}.txt.

      In the run() method, you:

      • open the list of books generated in the GetTopBooks() task.
      • get the URL from the line specified by parameter FileID.
      • use the requests library to download the contents of the book from the URL.
      • filter out any special characters inside the book like :,.?, so they don’t get included in your word analysis.
      • convert the text to lowercase so you can compare words with different cases.
      • write the filtered output to the file specified in the output() method.

      Save and exit your file.

      Run the new DownloadBooks() task using this command:

      • python -m luigi --module word-frequency DownloadBooks --FileID 2

      In this command, you set the FileID parameter using the --FileID argument.

      Note: Be careful when defining a parameter with an _ in the name. To reference them in Luigi you need to substitute the _ for a -. For example, a File_ID parameter would be referenced as --File-ID when calling a task from the terminal.

      You will receive the following output:

      Output

      ===== Luigi Execution Summary ===== Scheduled 2 tasks of which: * 1 complete ones were encountered: - 1 GetTopBooks() * 1 ran successfully: - 1 DownloadBooks(FileID=2) This progress looks :) because there were no failed tasks or missing dependencies ===== Luigi Execution Summary =====

      Note from the output that Luigi has detected that you have already generated the output of GetTopBooks() and skipped running that task. This functionality allows you to minimize the number of tasks you have to execute as you can re-use successful output from previous runs.

      You have created a task that uses the output of another task and downloads a set of books to analyze. In the next step, you will create a task to count the most common words in a downloaded book.

      Step 6 — Counting Words and Summarizing Results

      In this step, you will create a Luigi task to count the frequency of words in each of the books downloaded in Step 5. This will be your first task that executes in parallel.

      First open your file again:

      Add the following imports to the top of word-frequency.py:

      word-frequency.py

      from collections import Counter
      import pickle
      

      Add the following task to word-frequency.py, after your DownloadBooks() task. This task takes the output of the previous DownloadBooks() task for a specified book, and returns the most common words in that book:

      word-frequency.py

      class CountWords(luigi.Task):
          """
          Count the frequency of the most common words from a file
          """
      
          FileID = luigi.IntParameter()
      
          def requires(self):
              return DownloadBooks(FileID=self.FileID)
      
          def output(self):
              return luigi.LocalTarget(
                  "data/counts/count_{}.pickle".format(self.FileID),
                  format=luigi.format.Nop
              )
      
          def run(self):
              with self.input().open("r") as i:
                  word_count = Counter(i.read().split())
      
                  with self.output().open("w") as outfile:
                      pickle.dump(word_count, outfile)
      

      When you define requires() you pass the FileID parameter to the next task. When you specify that a task depends on another task, you specify the parameters you need the dependent task to be executed with.

      In the run() method you:

      • open the file generated by the DownloadBooks() task.
      • use the built-in Counter object in the collections library. This provides an easy way to analyze the most common words in a book.
      • use the pickle library to store the output of the Python Counter object, so you can re-use that object in a later task. pickle is a library that you use to convert Python objects into a byte stream, which you can store and restore into a later Python session. You have to set the format property of the luigi.LocalTarget to allow it to write the binary output the pickle library generates.

      Save and exit your file.

      Run the new CountWords() task using this command:

      • python -m luigi --module word-frequency CountWords --FileID 2

      Open the CountWords task graph view in the Luigi scheduler user interface.

      Showing how to view a graph from the Luigi user interface

      Deselect the Hide Done option, and deselect Upstream Dependencies. You will find the flow of execution from the tasks you have created.

      Visualizing the execution of the CountWords task

      You have created a task to count the most common words in a downloaded book and visualized the dependencies between those tasks. Next, you will define parameters that you can use to customize the execution of your tasks.

      Step 7 — Defining Configuration Parameters

      In this step, you will add configuration parameters to the pipeline. These will allow you to customize how many books to analyze and the number of words to include in the results.

      When you want to set parameters that are shared among tasks, you can create a Config() class. Other pipeline stages can reference the parameters defined in the Config() class; these are set by the pipeline when executing a job.

      Add the following Config() class to the end of word-frequency.py. This will define two new parameters in your pipeline for the number of books to analyze and the number of most frequent words to include in the summary:

      word-frequency.py

      class GlobalParams(luigi.Config):
          NumberBooks = luigi.IntParameter(default=10)
          NumberTopWords = luigi.IntParameter(default=500)
      

      Add the following class to word-frequency.py. This class aggregates the results from all of the CountWords() task to create a summary of the most frequent words:

      word-frequency.py

      class TopWords(luigi.Task):
          """
          Aggregate the count results from the different files
          """
      
          def requires(self):
              requiredInputs = []
              for i in range(GlobalParams().NumberBooks):
                  requiredInputs.append(CountWords(FileID=i))
              return requiredInputs
      
          def output(self):
              return luigi.LocalTarget("data/summary.txt")
      
          def run(self):
              total_count = Counter()
              for input in self.input():
                  with input.open("rb") as infile:
                      nextCounter = pickle.load(infile)
                      total_count += nextCounter
      
              with self.output().open("w") as f:
                  for item in total_count.most_common(GlobalParams().NumberTopWords):
                      f.write("{0: <15}{1}n".format(*item))
      
      

      In the requires() method, you can provide a list where you want a task to use the output of multiple dependent tasks. You use the GlobalParams().NumberBooks parameter to set the number of books you need word counts from.

      In the output() method, you define a data/summary.txt output file that will be the final output of your pipeline.

      In the run() method you:

      • create a Counter() object to store the total count.
      • open the file and “unpickle” it (convert it from a file back to a Python object), for each count carried out in the CountWords() method
      • append the loaded count and add it to the total count.
      • write the most common words to target output file.

      Run the pipeline with the following command:

      • python -m luigi --module word-frequency TopWords --GlobalParams-NumberBooks 15 --GlobalParams-NumberTopWords 750

      Luigi will execute the remaining tasks needed to generate the summary of the top words:

      Output

      ===== Luigi Execution Summary ===== Scheduled 31 tasks of which: * 2 complete ones were encountered: - 1 CountWords(FileID=2) - 1 GetTopBooks() * 29 ran successfully: - 14 CountWords(FileID=0,1,10,11,12,13,14,3,4,5,6,7,8,9) - 14 DownloadBooks(FileID=0,1,10,11,12,13,14,3,4,5,6,7,8,9) - 1 TopWords() This progress looks :) because there were no failed tasks or missing dependencies ===== Luigi Execution Summary =====

      You can visualize the execution of the pipeline from the Luigi scheduler. Select the GetTopBooks task in the task list and press the View Graph button.

      Showing how to view a graph from the Luigi user interface

      Deselect the Hide Done and Upstream Dependencies options.

      Visualizing the execution of the TopWords Task

      It will show the flow of processing that is happening in Luigi.

      Open the data/summary.txt file:

      You will find the calculated most common words:

      Output

      the 64593 and 41650 of 31896 to 31368 a 25265 i 23449 in 19496 it 16282 that 15907 he 14974 ...

      In this step, you have defined and used parameters to customize the execution of your tasks. You have generated a summary of the most common words for a set of books.

      Find all the code for this tutorial in this repository.

      Conclusion

      This tutorial has introduced you to using the Luigi data processing pipeline and its major features including tasks, parameters, configuration parameters, and the Luigi scheduler.

      Luigi supports connecting to a large number of common data sources out the box. You can also scale it to run large, complex data pipelines. This provides a powerful framework to start solving your data processing challenges.

      For more tutorials, check out our Data Analysis topic page and Python topic page.



      Source link

      Processing Incoming Request Data in Flask


      In any web app, you’ll have to process incoming request data from users. Flask, like any other web framework, allows you to access the request data easily.

      In this tutorial, we’ll go through how to process incoming data for the most common use cases. The forms of incoming data we’ll cover are: query strings, form data, and JSON objects. To demonstrate these cases, we’ll build a simple example app with three routes that accept either query string data, form data, or JSON data.

      The Request Object

      To access the incoming data in Flask, you have to use the request object. The request object holds all incoming data from the request, which includes the mimetype, referrer, IP address, raw data, HTTP method, and headers, among other things. Although all the information the request object holds can be useful, in this article, we’ll only focus on the data that is normally directly supplied by the caller of our endpoint.

      To gain access to the request object in Flask, you simply import it from the Flask library.

      from flask import request
      

      You then have the ability to use it in any of your view functions.

      Once we get to the section on query strings, you’ll get to see the request object in action.

      The Example App

      To demonstrate the different ways of using request, we’ll start with a simple Flask app. Even though the example app uses a simple organization layout for the view functions and routes, what you learn in this tutorial can be applied to any method of organizing your views like class-based views, blueprints, or an extension like Flask-Via.

      To get started, first we need to install Flask.

      pip install Flask
      

      Then, we can start with the following code.

      #app.py
      
      from flask import Flask, request #import main Flask class and request object
      
      app = Flask(__name__) #create the Flask app
      
      @app.route('/query-example')
      def query_example():
          return 'Todo...'
      
      @app.route('/form-example')
      def formexample():
          return 'Todo...'
      
      @app.route('/json-example')
      def jsonexample():
          return 'Todo...'
      
      if _name == '__main_':
          app.run(debug=True, port=5000) #run app in debug mode on port 5000
      

      Start the app with:

      python app.py
      

      The code supplied sets up three routes with a message telling us ‘Todo…’. The app will start on port 5000, so you can view each route in your browser with the following links:

      http://127.0.0.1:5000/query-example (or localhost:5000/query-example)
      http://127.0.0.1:5000/form-example (or localhost:5000/form-example)
      http://127.0.0.1:5000/json-example (or localhost:5000/json-example)

      For each of the three routes, you’ll see the same thing:

      Query Arguments

      URL arguments that you add to a query string are the simpliest way to pass data to a web app, so let’s start with them.

      A query string looks like the following:

      example.com?arg1=value1&arg2=value2
      

      The query string begins after the question mark (?) and has two key-value pairs separated by an ampersand (&). For each pair, the key is followed by an equals sign (=) and then the value. Even if you’ve never heard of a query string until now, you have definitely seen them all over the web.

      So in that example, the app receives:

      arg1 : value1
      arg2 : value2
      

      Query strings are useful for passing data that doesn’t require the user to take action. You could generate a query string somewhere in your app and append it a URL so when a user makes a request, the data is automatically passed for them. A query string can also be generated by forms that have GET as the method.

      To create our own query string on the query-example route, we’ll start with this simple one:

      http://127.0.0.1:5000/query-example?language=Python

      If you run the app and navigate to that URL, you’ll see that nothing has changed. That’s only because we haven’t handled the query arguments yet.

      To do so, we’ll need to read in the language key by using either request.args.get('language') or request.args['language'].

      By calling request.args.get('language'), our application will continue to run if the language key doesn’t exist in the URL. In that case, the result of the method will be None. If we use request.args['language'], the app will return a 400 error if language key doesn’t exist in the URL. For query strings, I recommend using request.args.get() because of how easy it is for the user to modify the URL. If they remove one of the keys, request.args.get() will prevent the app from failing.

      Let’s read the language key and display it as output. Modify with query-example route the following code.

      @app.route('/query-example')
      def query_example():
          language = request.args.get('language') #if key doesn't exist, returns None
      
          return '''<h1>The language value is: {}</h1>'''.format(language)
      

      Then run the app and navigate to the URL.

      As you can see, the argument from the URL gets assigned to the language variable and then gets returned to the browser. In a real app, you’d probably want to do something with the data other than simply return it.

      To add more query string parameters, we just append ampersands and the new key-value pairs to the end of the URL. So an additional pair would look like this:

      http://127.0.0.1:5000/query-example?language=Python&framework=Flask

      With the new pair being:

      framework : Flask
      

      And if you want more, continue adding ampersands and key-value pairs. Here’s an example:

      http://127.0.0.1:5000/query-example?language=Python&framework=Flask&website=Scotch

      To gain access to those values, we still use either request.args.get() or request.args[]. Let’s use both to demonstrate what happens when there’s a missing key. We’ll assign the value of the results to variables and then display them.

      @app.route('/query-example')
      def query_example():
          language = request.args.get('language') #if key doesn't exist, returns None
          framework = request.args['framework'] #if key doesn't exist, returns a 400, bad request error
          website = request.args.get('website')
      
          return '''<h1>The language value is: {}</h1>
                    <h1>The framework value is: {}</h1>
                    <h1>The website value is: {}'''.format(language, framework, website)
      

      When you run that, you should see:

      If you remove the language from the URL, then you’ll see that the value ends up as None.

      If you remove the framework key, you get an error.

      Now that you undertand query strings, let’s move on to the next type of incoming data.

      Form Data

      Next we have form data. Form data comes from a form that has been sent as a POST request to a route. So instead of seeing the data in the URL (except for cases when the form is submitted with a GET request), the form data will be passed to the app behind the scenes. Even though you can’t easily see the form data that gets passed, your app can still read it.

      To demonstrate this, modify the form-example route to accept both GET and POST requests and to return a simple form.

      @app.route('/form-example', methods=['GET', 'POST']) #allow both GET and POST requests
      def form_example():
          return '''<form method="POST">
                        Language: <input type="text" name="language"><br>
                        Framework: <input type="text" name="framework"><br>
                        <input type="submit" value="Submit"><br>
                    </form>'''
      

      Running the app results in this:

      The most important thing to know about this form is that it performs a POST request to the same route that generated the form. The keys that we’ll read in our app all come from the “name” attributes on our form inputs. In our case, language and framework are the names of the inputs, so we’ll have access to those in our app.

      Inside the view function, we need to check if the request method is GET or POST. If it’s GET, we simply display the form we have. If it’s POST, then we want to process the incoming data.

      To do that, let’s add a simple if statement that checks for a POST request. If the request method isn’t POST, then we know it’s GET, because our route only allows those two types of requests. For GET requests, the form will be generated.

      @app.route('/form-example', methods=['GET', 'POST']) #allow both GET and POST requests
      def form_example():
          if request.method == 'POST': #this block is only entered when the form is submitted
              return 'Submitted form.'
      
          return '''<form method="POST">
                        Language: <input type="text" name="language"><br>
                        Framework: <input type="text" name="framework"><br>
                        <input type="submit" value="Submit"><br>
                    </form>'''
      

      Inside of the block, we’ll read the incoming values with request.args.get('language') and request.form['framework']. Remember that if we have more inputs, then we can read additional data.

      Add the additional code to the form-example route.

      @app.route('/form-example', methods=['GET', 'POST']) #allow both GET and POST requests
      def form_example():
          if request.method == 'POST':  #this block is only entered when the form is submitted
              language = request.form.get('language')
              framework = request.form['framework']
      
              return '''<h1>The language value is: {}</h1>
                        <h1>The framework value is: {}</h1>'''.format(language, framework)
      
          return '''<form method="POST">
                        Language: <input type="text" name="language"><br>
                        Framework: <input type="text" name="framework"><br>
                        <input type="submit" value="Submit"><br>
                    </form>'''
      

      Try running the app and submitting the form.

      Similiar to the query string example before, we can use request.form.get() instead of referencing the key directly with request.form[]. request.form.get() returns None instead of causing a 400 error when the key isn’t found.

      As you can see, handling submitted form data is just as easy as handling query string arguments.

      JSON Data

      Finally, we have JSON data. Like form data, it’s not so easy to see. JSON data is normally constructed by a process that calls our route. An example JSON object looks like this:

      {
          "language" : "Python",
          "framework" : "Flask",
          "website" : "Scotch",
          "version_info" : {
              "python" : 3.4,
              "flask" : 0.12
          },
          "examples" : ["query", "form", "json"],
          "boolean_test" : true
      }
      

      As you can see with JSON, you can pass much more complicated data that you could with query strings or form data. In the example, you see nested JSON objects and an array of items. With Flask, reading all of these values is straightforward.

      First, to send a JSON object, we’ll need a program capable of sending custom requests to URLs. For this, we can use an app called Postman.

      Before we use Postman though, change the method on the route to accept only POST requests.

      @app.route('/json-example', methods=['POST']) #GET requests will be blocked
      def json_example():
          return 'Todo...'
      

      Then in Postman, let’s do a little set up to enable sending POST requests.

      In Postman, add the URL and change the type to POST. On the body tab, change to raw and select JSON (application/json) from the drop down. All this is done so Postman can send JSON data properly and so your Flask app will understand that it’s receiving JSON.

      From there, you can copy the example into the text input. It should look like this:

      To test, just send the request and you should get ‘Todo…’ as the response because we haven’t modified our function yet.

      To read the data, first you must understand how Flask translates JSON data into Python data structures.

      Anything that is an object gets converted to a Python dict. {"key" : "value"} in JSON corresponds to somedict['key'], which returns a value in Python.

      An array in JSON gets converted to a list in Python. Since the syntax is the same, here’s an example list: [1,2,3,4,5]

      Then the values inside of quotes in the JSON object become strings in Python. true and false become True and False in Python. Finally, numbers without quotes around them become numbers in Python.

      Now let’s get on to reading the incoming JSON data.

      First, let’s assign everything from the JSON object into a variable using request.get_json().

      req_data = request.get_json()
      

      request.get_json() converts the JSON object into Python data for us. Let’s assign the incoming request data to variables and return them by making the following changes to our json-example route.

      @app.route('/json-example', methods=['POST']) #GET requests will be blocked
      def json_example():
          req_data = request.get_json()
      
          language = req_data['language']
          framework = req_data['framework']
          python_version = req_data['version_info']['python'] #two keys are needed because of the nested object
          example = req_data['examples'][0] #an index is needed because of the array
          boolean_test = req_data['boolean_test']
      
          return '''
                 The language value is: {}
                 The framework value is: {}
                 The Python version is: {}
                 The item at index 0 in the example list is: {}
                 The boolean value is: {}'''.format(language, framework, python_version, example, boolean_test)
      

      If we run our app and submit the same request using Postman, we will get this:

      Note how you access elements that aren’t at the top level. ['version']['python'] is used because you are entering a nested object. And ['examples'][0] is used to access the 0th index in the examples array.

      If the JSON object sent with the request doesn’t have a key that is accessed in your view function, then the request will fail. If you don’t want it to fail when a key doesn’t exist, you’ll have to check if the key exists before trying to access it. Here’s an example:

      language = None
      if 'language' in req_data:
          language = req_data['language']
      

      Conclusion

      You should now understand how to use the request object in Flask to get the most common forms of input data. To recap, we covered:

      • Query Strings
      • Form Data
      • JSON Data

      With this knowledge, you’ll have no problem with any data that users throw at your app!

      Learn More

      If you like this tutorial and want to learn more about Flask from me, check out my free Intro to Flask video course at my site Pretty Printed, which takes you from knowing nothing about Flask to building a guestbook app. If that course isn’t at your level, then I have other courses on my site that may interest you as well.



      Source link