
      How To Build a Media Processing API in Node.js With Express and FFmpeg.wasm


      The author selected the Electronic Frontier Foundation to receive a donation as part of the Write for DOnations program.

      Introduction

      Handling media assets is becoming a common requirement of modern back-end services. Using dedicated, cloud-based solutions may help when you’re dealing with massive scale or performing expensive operations, such as video transcoding. However, the extra cost and added complexity may be hard to justify when all you need is to extract a thumbnail from a video or check that user-generated content is in the correct format. Particularly at a smaller scale, it makes sense to add media processing capability directly to your Node.js API.

      In this guide, you will build a media API in Node.js with Express and ffmpeg.wasm — a WebAssembly port of the popular media processing tool. You’ll build an endpoint that extracts a thumbnail from a video as an example. You can use the same techniques to add other features supported by FFmpeg to your API.

When you’re finished, you will have a good grasp of handling binary data in Express and processing it with ffmpeg.wasm. You’ll also handle requests made to your API that cannot be processed in parallel.

      Prerequisites

      To complete this tutorial, you will need:

      This tutorial was verified with Node v16.11.0, npm v7.15.1, express v4.17.1, and ffmpeg.wasm v0.10.1.

      Step 1 — Setting Up the Project and Creating a Basic Express Server

In this step, you will create a project directory, initialize a Node.js project, install the required dependencies, and set up a basic Express server.

Start by opening the terminal and creating a new directory for the project:

• mkdir ffmpeg-api

Navigate to the new directory:

• cd ffmpeg-api

Use npm init to create a new package.json file. The -y parameter indicates that you’re happy with the default settings for the project:

• npm init -y

      Finally, use npm install to install the packages required to build the API. The --save flag indicates that you wish to save those as dependencies in the package.json file.

      • npm install --save @ffmpeg/ffmpeg @ffmpeg/core express cors multer p-queue

Now that you have installed ffmpeg.wasm and the other dependencies, you’ll set up a web server that responds to requests using Express.

First, open a new file called server.mjs with nano or your editor of choice:

• nano server.mjs

      The code in this file will register the cors middleware which will permit requests made from websites with a different origin. At the top of the file, import the express and cors dependencies:

      server.mjs

      import express from 'express';
      import cors from 'cors';
      

Then, create an Express app and start the server on port 3000 by adding the following code below the import statements:

      server.mjs

      ...
      const app = express();
      const port = 3000;
      
      app.use(cors());
      
      app.listen(port, () => {
          console.log(`[info] ffmpeg-api listening at http://localhost:${port}`)
      });
      

You can start the server by running the following command:

• node server.mjs

      You’ll see the following output:

      Output

      [info] ffmpeg-api listening at http://localhost:3000

      When you try loading http://localhost:3000 in your browser, you’ll see Cannot GET /. This is Express telling you it is listening for requests.

      With your Express server now set up, you’ll create a client to upload the video and make requests to your Express server.

       Step 2 — Creating a Client and Testing the Server

      In this section, you’ll create a web page that will let you select a file and upload it to the API for processing.

Start by opening a new file called client.html:

• nano client.html

      In your client.html file, create a file input and a Create Thumbnail button. Below, add an empty <div> element to display errors and an image that will show the thumbnail that the API sends back. At the very end of the <body> tag, load a script called client.js. Your final HTML template should look as follows:

      client.html

      <!DOCTYPE html>
      <html lang="en">
      <head>
          <meta charset="UTF-8">
          <title>Create a Thumbnail from a Video</title>
          <style>
              #thumbnail {
                  max-width: 100%;
              }
          </style>
      </head>
      <body>
          <div>
              <input id="file-input" type="file" />
              <button id="submit">Create Thumbnail</button>
              <div id="error"></div>
              <img id="thumbnail" />
          </div>
    <script src="client.js"></script>
      </body>
      </html>
      

      Note that each element has a unique id. You’ll need them when referring to the elements from the client.js script. The styling on the #thumbnail element is there to ensure that the image fits on the screen when it loads.

Save the client.html file and open client.js:

• nano client.js

      In your client.js file, start by defining variables that store references to your HTML elements you created:

      client.js

      const fileInput = document.querySelector('#file-input');
      const submitButton = document.querySelector('#submit');
      const thumbnailPreview = document.querySelector('#thumbnail');
      const errorDiv = document.querySelector('#error');
      

      Then, attach a click event listener to the submitButton variable to check whether you’ve selected a file:

      client.js

      ...
      submitButton.addEventListener('click', async () => {
          const { files } = fileInput;
});
      

      Next, create a function showError() that will output an error message when a file is not selected. Add the showError() function above your event listener:

      client.js

      const fileInput = document.querySelector('#file-input');
      const submitButton = document.querySelector('#submit');
      const thumbnailPreview = document.querySelector('#thumbnail');
      const errorDiv = document.querySelector('#error');
      
      function showError(msg) {
          errorDiv.innerText = `ERROR: ${msg}`;
      }
      
      submitButton.addEventListener('click', async () => {
      ...
      

      Now, you will build a function createThumbnail() that will make a request to the API, send the video, and receive a thumbnail in response. At the top of your client.js file, define a new constant with the URL to a /thumbnail endpoint:

      const API_ENDPOINT = 'http://localhost:3000/thumbnail';
      
      const fileInput = document.querySelector('#file-input');
      const submitButton = document.querySelector('#submit');
      const thumbnailPreview = document.querySelector('#thumbnail');
      const errorDiv = document.querySelector('#error');
      ...
      

      You will define and use the /thumbnail endpoint in your Express server.

      Next, add the createThumbnail() function below your showError() function:

      client.js

      ...
      function showError(msg) {
          errorDiv.innerText = `ERROR: ${msg}`;
      }
      
      async function createThumbnail(video) {
      
      }
      ...
      

Web APIs frequently use JSON to transfer structured data from and to the client. To include a video in a JSON payload, you would have to encode it in base64, which would increase its size by about 30%. You can avoid this by using multipart requests instead. Multipart requests allow you to transfer structured data, including binary files, over HTTP without the unnecessary overhead. You can create one using the FormData() constructor function.
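You can verify the overhead claim yourself. This short Node.js snippet (illustrative only, not part of the project files) compares the size of a binary buffer with its base64 encoding:

```javascript
// Quick sanity check of the base64 overhead: encoding 3 MB of binary
// data as base64 grows it by a third.
const raw = Buffer.alloc(3 * 1024 * 1024); // 3 MB of zero bytes
const encoded = raw.toString('base64');

console.log((encoded.length / raw.length).toFixed(2)); // 1.33
```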

      Inside the createThumbnail() function, create an instance of FormData and append the video file to the object. Then make a POST request to the API endpoint using the Fetch API with the FormData() instance as the body. Interpret the response as a binary file (or blob) and convert it to a data URL so that you can assign it to the <img> tag you created earlier.

      Here’s the full implementation of createThumbnail():

      client.js

      ...
      async function createThumbnail(video) {
          const payload = new FormData();
          payload.append('video', video);
      
          const res = await fetch(API_ENDPOINT, {
              method: 'POST',
              body: payload
          });
      
          if (!res.ok) {
              throw new Error('Creating thumbnail failed');
          }
      
          const thumbnailBlob = await res.blob();
          const thumbnail = await blobToDataURL(thumbnailBlob);
      
          return thumbnail;
      }
      ...
      

You’ll notice createThumbnail() calls a function named blobToDataURL(). This is a helper function that will convert a blob to a data URL.

Above your createThumbnail() function, create the function blobToDataURL() so that it returns a promise:

      client.js

      ...
      async function blobToDataURL(blob) {
          return new Promise((resolve, reject) => {
              const reader = new FileReader();
              reader.onload = () => resolve(reader.result);
              reader.onerror = () => reject(reader.error);
              reader.onabort = () => reject(new Error("Read aborted"));
              reader.readAsDataURL(blob);
          });
      }
      ...
      

      blobToDataURL() uses FileReader to read the contents of the binary file and format it as a data URL.

      With the createThumbnail() and showError() functions now defined, you can use them to finish implementing the event listener:

      client.js

      ...
      submitButton.addEventListener('click', async () => {
          const { files } = fileInput;
      
          if (files.length > 0) {
              const file = files[0];
              try {
                  const thumbnail = await createThumbnail(file);
                  thumbnailPreview.src = thumbnail;
              } catch(error) {
                  showError(error);
              }
          } else {
              showError('Please select a file');
          }
      });
      

      When a user clicks on the button, the event listener will pass the file to the createThumbnail() function. If successful, it will assign the thumbnail to the <img> element you created earlier. In case the user doesn’t select a file or the request fails, it will call the showError() function to display an error.

      At this point, your client.js file will look like the following:

      client.js

      const API_ENDPOINT = 'http://localhost:3000/thumbnail';
      
      const fileInput = document.querySelector('#file-input');
      const submitButton = document.querySelector('#submit');
      const thumbnailPreview = document.querySelector('#thumbnail');
      const errorDiv = document.querySelector('#error');
      
      function showError(msg) {
          errorDiv.innerText = `ERROR: ${msg}`;
      }
      
      async function blobToDataURL(blob) {
          return new Promise((resolve, reject) => {
              const reader = new FileReader();
              reader.onload = () => resolve(reader.result);
              reader.onerror = () => reject(reader.error);
              reader.onabort = () => reject(new Error("Read aborted"));
              reader.readAsDataURL(blob);
          });
      }
      
      async function createThumbnail(video) {
          const payload = new FormData();
          payload.append('video', video);
      
          const res = await fetch(API_ENDPOINT, {
              method: 'POST',
              body: payload
          });
      
          if (!res.ok) {
              throw new Error('Creating thumbnail failed');
          }
      
          const thumbnailBlob = await res.blob();
          const thumbnail = await blobToDataURL(thumbnailBlob);
      
          return thumbnail;
      }
      
      submitButton.addEventListener('click', async () => {
          const { files } = fileInput;
      
          if (files.length > 0) {
              const file = files[0];
      
              try {
                  const thumbnail = await createThumbnail(file);
                  thumbnailPreview.src = thumbnail;
              } catch(error) {
                  showError(error);
              }
          } else {
              showError('Please select a file');
          }
      });
      

Start the server again by running:

• node server.mjs

With your client now set up, uploading a video file will result in an error message because the /thumbnail endpoint doesn’t exist yet. In the next step, you’ll create the /thumbnail endpoint in Express to accept the video file and create the thumbnail.

       Step 3 — Setting Up an Endpoint to Accept Binary Data

      In this step, you will set up a POST request for the /thumbnail endpoint and use middleware to accept multipart requests.

Open server.mjs in an editor:

• nano server.mjs

      Then, import multer at the top of the file:

      server.mjs

      import express from 'express';
      import cors from 'cors';
      import multer from 'multer';
      ...
      

Multer is a middleware that processes incoming multipart/form-data requests before passing them to your endpoint handler. It extracts fields and files from the body and makes them available on Express’s request object. You can configure where to store the uploaded files and set limits on file size and format.

      After importing it, initialize the multer middleware with the following options:

      server.mjs

      ...
      const app = express();
      const port = 3000;
      
      const upload = multer({
          storage: multer.memoryStorage(),
          limits: { fileSize: 100 * 1024 * 1024 }
      });
      
      app.use(cors());
      ...
      

      The storage option lets you choose where to store the incoming files. Calling multer.memoryStorage() will initialize a storage engine that keeps files in Buffer objects in memory as opposed to writing them to disk. The limits option lets you define various limits on what files will be accepted. Set the fileSize limit to 100MB or a different number that matches your needs and the amount of memory available on your server. This will prevent your API from crashing when the input file is too big.

      Note: Due to the limitations of WebAssembly, ffmpeg.wasm cannot handle input files over 2GB in size.
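As a side note, when an upload exceeds the fileSize limit, multer passes an error whose code property is LIMIT_FILE_SIZE to Express. The helper below is a sketch of how you might map that to a friendlier status code; uploadErrorStatus() is a hypothetical name, not part of the tutorial’s server.mjs:

```javascript
// Hypothetical helper mapping an upload error to an HTTP status code.
// multer rejects oversized files with an error whose code property is
// 'LIMIT_FILE_SIZE'.
function uploadErrorStatus(err) {
    if (err && err.code === 'LIMIT_FILE_SIZE') {
        return 413; // Payload Too Large
    }
    return 500; // Anything else is treated as a server error
}

// In Express, you could use it in an error-handling middleware:
//   app.use((err, req, res, next) => res.sendStatus(uploadErrorStatus(err)));
```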

      Next, set up the POST /thumbnail endpoint itself:

      server.mjs

      ...
      app.use(cors());
      
      app.post('/thumbnail', upload.single('video'), async (req, res) => {
          const videoData = req.file.buffer;
      
          res.sendStatus(200);
      });
      
      app.listen(port, () => {
          console.log(`[info] ffmpeg-api listening at http://localhost:${port}`)
      });
      

      The upload.single('video') call will set up a middleware for that endpoint only that will parse the body of a multipart request that includes a single file. The first parameter is the field name. It must match the one you gave to FormData when creating the request in client.js. In this case, it’s video. multer will then attach the parsed file to the req parameter. The content of the file will be under req.file.buffer.

      At this point, the endpoint doesn’t do anything with the data it receives. It acknowledges the request by sending an empty 200 response. In the next step, you’ll replace that with the code that extracts a thumbnail from the video data received.

Step 4 — Processing Media With ffmpeg.wasm

In this step, you’ll use ffmpeg.wasm to extract a thumbnail from the video file received by the POST /thumbnail endpoint.

      ffmpeg.wasm is a pure WebAssembly and JavaScript port of FFmpeg. Its main goal is to allow running FFmpeg directly in the browser. However, because Node.js is built on top of V8 — Chrome’s JavaScript engine — you can use the library on the server too.

      The benefit of using a native port of FFmpeg over a wrapper built on top of the ffmpeg command is that if you’re planning to deploy your app with Docker, you don’t have to build a custom image that includes both FFmpeg and Node.js. This will save you time and reduce the maintenance burden of your service.

      Add the following import to the top of server.mjs:

      server.mjs

      import express from 'express';
      import cors from 'cors';
      import multer from 'multer';
      import { createFFmpeg } from '@ffmpeg/ffmpeg';
      ...
      

      Then, create an instance of ffmpeg.wasm and start loading the core:

      server.mjs

      ...
      import { createFFmpeg } from '@ffmpeg/ffmpeg';
      
      const ffmpegInstance = createFFmpeg({ log: true });
      let ffmpegLoadingPromise = ffmpegInstance.load();
      
      const app = express();
      ...
      

      The ffmpegInstance variable holds a reference to the library. Calling ffmpegInstance.load() starts loading the core into memory asynchronously and returns a promise. Store the promise in the ffmpegLoadingPromise variable so that you can check whether the core has loaded.

Next, define the following helper function that will use ffmpegLoadingPromise to wait for the core to load in case the first request arrives before it’s ready:

      server.mjs

      ...
      let ffmpegLoadingPromise = ffmpegInstance.load();
      
      async function getFFmpeg() {
          if (ffmpegLoadingPromise) {
              await ffmpegLoadingPromise;
              ffmpegLoadingPromise = undefined;
          }
      
          return ffmpegInstance;
      }
      
      const app = express();
      ...
      

      The getFFmpeg() function returns a reference to the library stored in the ffmpegInstance variable. Before returning it, it checks whether the library has finished loading. If not, it will wait until ffmpegLoadingPromise resolves. In case the first request to your POST /thumbnail endpoint arrives before ffmpegInstance is ready to use, your API will wait and resolve it when it can rather than rejecting it.

Now, implement the POST /thumbnail endpoint handler. Replace res.sendStatus(200); at the end of the function with a call to getFFmpeg() to get a reference to ffmpeg.wasm when it’s ready:

      server.mjs

      ...
      app.post('/thumbnail', upload.single('video'), async (req, res) => {
          const videoData = req.file.buffer;
      
          const ffmpeg = await getFFmpeg();
      });
      ...
      

      ffmpeg.wasm works on top of an in-memory file system. You can read and write to it using ffmpeg.FS. When running FFmpeg operations, you will pass virtual file names to the ffmpeg.run function as an argument the same way as you would when working with the CLI tool. Any output files created by FFmpeg will be written to the file system for you to retrieve.

      In this case, the input file is a video. The output file will be a single PNG image. Define the following variables:

      server.mjs

      ...
          const ffmpeg = await getFFmpeg();
      
          const inputFileName = `input-video`;
          const outputFileName = `output-image.png`;
          let outputData = null;
      });
      ...
      

      The file names will be used on the virtual file system. outputData is where you’ll store the thumbnail when it’s ready.

      Call ffmpeg.FS() to write the video data to the in-memory file system:

      server.mjs

      ...
          let outputData = null;
      
          ffmpeg.FS('writeFile', inputFileName, videoData);
      });
      ...
      

      Then, run the FFmpeg operation:

      server.mjs

      ...
          ffmpeg.FS('writeFile', inputFileName, videoData);
      
          await ffmpeg.run(
              '-ss', '00:00:01.000',
              '-i', inputFileName,
              '-frames:v', '1',
              outputFileName
          );
      });
      ...
      

The -i parameter specifies the input file. -ss seeks to the specified time (in this case, 1 second from the beginning of the video). -frames:v limits the number of frames that will be written to the output (a single frame in this scenario). outputFileName at the end indicates where FFmpeg will write the output.

      After FFmpeg exits, use ffmpeg.FS() to read the data from the file system and delete both the input and output files to free up memory:

      server.mjs

      ...
          await ffmpeg.run(
              '-ss', '00:00:01.000',
              '-i', inputFileName,
              '-frames:v', '1',
              outputFileName
          );
      
          outputData = ffmpeg.FS('readFile', outputFileName);
          ffmpeg.FS('unlink', inputFileName);
          ffmpeg.FS('unlink', outputFileName);
      });
      ...
      

      Finally, dispatch the output data in the body of the response:

      server.mjs

      ...
          ffmpeg.FS('unlink', outputFileName);
      
          res.writeHead(200, {
              'Content-Type': 'image/png',
              'Content-Disposition': `attachment;filename=${outputFileName}`,
              'Content-Length': outputData.length
          });
          res.end(Buffer.from(outputData, 'binary'));
      });
      ...
      

Calling res.writeHead() dispatches the response head. The second parameter includes custom HTTP headers with information about the data in the body of the response that will follow. The res.end() function sends the data from its first argument as the body of the response and finalizes it. The outputData variable is a raw array of bytes as returned by ffmpeg.FS(). Passing it to Buffer.from() initializes a Buffer to ensure the binary data will be handled correctly by res.end().

      At this point, your POST /thumbnail endpoint implementation should look like this:

      server.mjs

      ...
      app.post('/thumbnail', upload.single('video'), async (req, res) => {
          const videoData = req.file.buffer;
      
          const ffmpeg = await getFFmpeg();
      
          const inputFileName = `input-video`;
          const outputFileName = `output-image.png`;
          let outputData = null;
      
          ffmpeg.FS('writeFile', inputFileName, videoData);
      
          await ffmpeg.run(
              '-ss', '00:00:01.000',
              '-i', inputFileName,
              '-frames:v', '1',
              outputFileName
          );
      
          outputData = ffmpeg.FS('readFile', outputFileName);
          ffmpeg.FS('unlink', inputFileName);
          ffmpeg.FS('unlink', outputFileName);
      
          res.writeHead(200, {
              'Content-Type': 'image/png',
              'Content-Disposition': `attachment;filename=${outputFileName}`,
              'Content-Length': outputData.length
          });
          res.end(Buffer.from(outputData, 'binary'));
      });
      ...
      

      Aside from the 100MB file limit for uploads, there’s no input validation or error handling. When ffmpeg.wasm fails to process a file, reading the output from the virtual file system will fail and prevent the response from being sent. For the purposes of this tutorial, wrap the implementation of the endpoint in a try-catch block to handle that scenario:

      server.mjs

      ...
      app.post('/thumbnail', upload.single('video'), async (req, res) => {
          try {
              const videoData = req.file.buffer;
      
              const ffmpeg = await getFFmpeg();
      
              const inputFileName = `input-video`;
              const outputFileName = `output-image.png`;
              let outputData = null;
      
              ffmpeg.FS('writeFile', inputFileName, videoData);
      
              await ffmpeg.run(
                  '-ss', '00:00:01.000',
                  '-i', inputFileName,
                  '-frames:v', '1',
                  outputFileName
              );
      
              outputData = ffmpeg.FS('readFile', outputFileName);
              ffmpeg.FS('unlink', inputFileName);
              ffmpeg.FS('unlink', outputFileName);
      
              res.writeHead(200, {
                  'Content-Type': 'image/png',
                  'Content-Disposition': `attachment;filename=${outputFileName}`,
                  'Content-Length': outputData.length
              });
              res.end(Buffer.from(outputData, 'binary'));
          } catch(error) {
              console.error(error);
              res.sendStatus(500);
          }
      ...
      });
      

Second, ffmpeg.wasm cannot handle two requests in parallel. You can try this yourself by launching the server:

      • node --experimental-wasm-threads server.mjs

Note the flag required for ffmpeg.wasm to work. The library depends on WebAssembly threads and bulk memory operations. These have been in V8/Chrome since 2019. However, as of Node.js v16.11.0, WebAssembly threads remain behind a flag in case there might be changes before the proposal is finalized. Bulk memory operations also require a flag in older versions of Node. If you’re running Node.js 15 or lower, add --experimental-wasm-bulk-memory as well.

      The output of the command will look like this:

      Output

[info] use ffmpeg.wasm v0.10.1
[info] load ffmpeg-core
[info] loading ffmpeg-core
[info] fetch ffmpeg.wasm-core script from @ffmpeg/core
[info] ffmpeg-api listening at http://localhost:3000
[info] ffmpeg-core loaded

      Open client.html in a web browser and select a video file. When you click the Create Thumbnail button, you should see the thumbnail appear on the page. Behind the scenes, the site uploads the video to the API, which processes it and responds with the image. However, when you click the button repeatedly in quick succession, the API will handle the first request. The subsequent requests will fail:

      Output

Error: ffmpeg.wasm can only run one command at a time
    at Object.run (.../ffmpeg-api/node_modules/@ffmpeg/ffmpeg/src/createFFmpeg.js:126:13)
    at file://.../ffmpeg-api/server.mjs:54:26
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (internal/process/task_queues.js:95:5)

      In the next section, you’ll learn how to deal with concurrent requests.

      Step 5 — Handling Concurrent Requests

Since ffmpeg.wasm can only execute a single operation at a time, you’ll need a way of serializing requests that come in and processing them one at a time. In this scenario, a promise queue is a perfect solution. Instead of starting to process each request right away, it will be queued up and processed once all the requests that arrived before it have been handled.
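To see what a concurrency limit of one means in practice, here is a minimal hand-rolled queue. The SerialQueue class is an illustrative stand-in only; in the tutorial itself you’ll use the p-queue package rather than this sketch:

```javascript
// A minimal serial promise queue (illustrative stand-in for
// new PQueue({ concurrency: 1 })). Each task starts only after the
// previous one has settled.
class SerialQueue {
    constructor() {
        this.tail = Promise.resolve();
    }

    add(task) {
        const result = this.tail.then(() => task());
        // Keep the chain going even if a task fails.
        this.tail = result.catch(() => {});
        return result;
    }
}

const queue = new SerialQueue();
const order = [];

function job(name, delayMs) {
    return () => new Promise((resolve) => {
        order.push(`${name} started`);
        setTimeout(() => {
            order.push(`${name} finished`);
            resolve();
        }, delayMs);
    });
}

// Even though the second job is faster, it waits for the first to finish:
// first started, first finished, second started, second finished, ...
queue.add(job('first', 50));
queue.add(job('second', 10));
queue.add(job('third', 0)).then(() => console.log(order.join(', ')));
```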

Open server.mjs in your preferred editor:

• nano server.mjs

      Import p-queue at the top of server.mjs:

      server.mjs

import express from 'express';
import cors from 'cors';
import multer from 'multer';
import { createFFmpeg } from '@ffmpeg/ffmpeg';
import PQueue from 'p-queue';
      ...
      

      Then, create a new queue at the top of server.mjs file under the variable ffmpegLoadingPromise:

      server.mjs

      ...
      const ffmpegInstance = createFFmpeg({ log: true });
      let ffmpegLoadingPromise = ffmpegInstance.load();
      
      const requestQueue = new PQueue({ concurrency: 1 });
      ...
      

      In the POST /thumbnail endpoint handler, wrap the calls to ffmpeg in a function that will be queued up:

      server.mjs

      ...
      app.post('/thumbnail', upload.single('video'), async (req, res) => {
          try {
              const videoData = req.file.buffer;
      
              const ffmpeg = await getFFmpeg();
      
              const inputFileName = `input-video`;
const outputFileName = `output-image.png`;
              let outputData = null;
      
              await requestQueue.add(async () => {
                  ffmpeg.FS('writeFile', inputFileName, videoData);
      
                  await ffmpeg.run(
                      '-ss', '00:00:01.000',
                      '-i', inputFileName,
                      '-frames:v', '1',
                      outputFileName
                  );
      
                  outputData = ffmpeg.FS('readFile', outputFileName);
                  ffmpeg.FS('unlink', inputFileName);
                  ffmpeg.FS('unlink', outputFileName);
              });
      
              res.writeHead(200, {
                  'Content-Type': 'image/png',
                  'Content-Disposition': `attachment;filename=${outputFileName}`,
                  'Content-Length': outputData.length
              });
              res.end(Buffer.from(outputData, 'binary'));
          } catch(error) {
              console.error(error);
              res.sendStatus(500);
          }
      });
      ...
      

      Every time a new request comes in, it will only start processing when there’s nothing else queued up in front of it. Note that the final sending of the response can happen asynchronously. Once the ffmpeg.wasm operation finishes running, another request can start processing while the response goes out.

      To test that everything works as expected, start up the server again:

      • node --experimental-wasm-threads server.mjs

      Open the client.html file in your browser and try uploading a file.

      A screenshot of client.html with a thumbnail loaded

      With the queue in place, the API will now respond every time. The requests will be handled sequentially in the order in which they arrive.

      Conclusion

      In this article, you built a Node.js service that extracts a thumbnail from a video using ffmpeg.wasm. You learned how to upload binary data from the browser to your Express API using multipart requests and how to process media with FFmpeg in Node.js without relying on external tools or having to write data to disk.

FFmpeg is an incredibly versatile tool. You can use the knowledge from this tutorial to take advantage of any features that FFmpeg supports and use them in your project. For example, to generate a three-second GIF, change the ffmpeg.run call in the POST /thumbnail endpoint to this:

      server.mjs

      ...
      await ffmpeg.run(
          '-y',
          '-t', '3',
          '-i', inputFileName,
          '-filter_complex', 'fps=5,scale=720:-1:flags=lanczos[x];[x]split[x1][x2];[x1]palettegen[p];[x2][p]paletteuse',
          '-f', 'gif',
          outputFileName
      );
      ...
      

      The library accepts the same parameters as the original ffmpeg CLI tool. You can use the official documentation to find a solution for your use case and test it quickly in the terminal.

      Thanks to ffmpeg.wasm being self-contained, you can dockerize this service using the stock Node.js base images and scale your service up by keeping multiple nodes behind a load balancer. Follow the tutorial How To Build a Node.js Application with Docker to learn more.
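The sketch below shows what such a Dockerfile might look like. The base image tag and the list of copied files are assumptions for illustration, not part of the tutorial:

```dockerfile
# Illustrative only: a stock Node.js image is enough because ffmpeg.wasm
# needs no native FFmpeg binary.
FROM node:16-alpine

WORKDIR /app

# Install dependencies first to take advantage of Docker layer caching.
COPY package*.json ./
RUN npm ci

COPY server.mjs ./

EXPOSE 3000
CMD ["node", "--experimental-wasm-threads", "server.mjs"]
```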

      If your use case requires performing more expensive operations, such as transcoding large videos, make sure that you run your service on machines with enough memory to store them. Due to current limitations in WebAssembly, the maximum input file size cannot exceed 2GB, although this might change in the future.

      Additionally, ffmpeg.wasm cannot take advantage of some x86 assembly optimizations from the original FFmpeg codebase. That means some operations can take a long time to finish. If that’s the case, consider whether this is the right solution for your use case. Alternatively, make requests to your API asynchronous. Instead of waiting for the operation to finish, queue it up and respond with a unique ID. Create another endpoint that the clients can query to find out whether the processing ended and the output file is ready. Learn more about the asynchronous request-reply pattern for REST APIs and how to implement it.




      How To Build a Data Processing Pipeline Using Luigi in Python on Ubuntu 20.04


      The author selected the Free and Open Source Fund to receive a donation as part of the Write for DOnations program.

      Introduction

      Luigi is a Python package that manages long-running batch processing, which is the automated running of data processing jobs on batches of items. Luigi allows you to define a data processing job as a set of dependent tasks. For example, task B depends on the output of task A. And task D depends on the output of task B and task C. Luigi automatically works out what tasks it needs to run to complete a requested job.
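      As a minimal sketch of this idea (using hypothetical task names, not part of the pipeline this tutorial builds), task B can declare its dependency on task A through a requires() method, and Luigi will schedule A first:

```python
import luigi


class TaskA(luigi.Task):
    def output(self):
        return luigi.LocalTarget("task_a.txt")

    def run(self):
        with self.output().open("w") as f:
            f.write("output of A")


class TaskB(luigi.Task):
    # Declares that TaskA must run before TaskB; Luigi resolves this automatically.
    def requires(self):
        return TaskA()

    def output(self):
        return luigi.LocalTarget("task_b.txt")

    def run(self):
        # self.input() is the output target of the required TaskA.
        with self.input().open("r") as infile, self.output().open("w") as outfile:
            outfile.write(infile.read() + " -> processed by B")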

      Overall Luigi provides a framework to develop and manage data processing pipelines. It was originally developed by Spotify, who use it to manage plumbing together collections of tasks that need to fetch and process data from a variety of sources. Within Luigi, developers at Spotify built functionality to help with their batch processing needs including handling of failures, the ability to automatically resolve dependencies between tasks, and visualization of task processing. Spotify uses Luigi to support batch processing jobs, including providing music recommendations to users, populating internal dashboards, and calculating lists of top songs.

      In this tutorial, you will build a data processing pipeline to analyze the most common words from the most popular books on Project Gutenberg. To do this, you will build a pipeline using the Luigi package. You will use Luigi tasks, targets, dependencies, and parameters to build your pipeline.

      Prerequisites

      To complete this tutorial, you will need the following:

      Step 1 — Installing Luigi

      In this step, you will create a clean sandbox environment for your Luigi installation.

      First, create a project directory. For this tutorial, name it luigi-demo:

      Navigate into the newly created luigi-demo directory:

      Create a new virtual environment luigi-venv:

      • python3 -m venv luigi-venv

      And activate the newly created virtual environment:

      • . luigi-venv/bin/activate

      You will find (luigi-venv) appended to the front of your terminal prompt to indicate which virtual environment is active:

      Output

      (luigi-venv) username@hostname:~/luigi-demo$

      For this tutorial, you will need three libraries: luigi, beautifulsoup4, and requests. The requests library streamlines making HTTP requests; you will use it to download the Project Gutenberg book lists and the books to analyze. The beautifulsoup4 library provides functions to parse data from web pages; you will use it to parse out a list of the most popular books on the Project Gutenberg site.

      Run the following command to install these libraries using pip:

      • pip install wheel luigi beautifulsoup4 requests

      You will get a response confirming the installation of the latest versions of the libraries and all of their dependencies:

      Output

      Successfully installed beautifulsoup4-4.9.1 certifi-2020.6.20 chardet-3.0.4 docutils-0.16 idna-2.10 lockfile-0.12.2 luigi-3.0.1 python-daemon-2.2.4 python-dateutil-2.8.1 requests-2.24.0 six-1.15.0 soupsieve-2.0.1 tornado-5.1.1 urllib3-1.25.10

      You’ve installed the dependencies for your project. Now, you’ll move on to building your first Luigi task.

      Step 2 — Creating a Luigi Task

      In this step, you will create a “Hello World” Luigi task to demonstrate how tasks work.

      A Luigi task is where the execution of your pipeline and the definition of each task’s input and output dependencies take place. Tasks are the building blocks that you will create your pipeline from. You define them in a class, which contains:

      • A run() method that holds the logic for executing the task.
      • An output() method that returns the artifacts generated by the task. The run() method populates these artifacts.
      • An optional requires() method that returns any other tasks in your pipeline that must run before the current task. The run() method can read their outputs through input().

      Create a new file hello-world.py:

      Now add the following code to your file:

      hello-world.py

      import luigi
      
      class HelloLuigi(luigi.Task):
      
          def output(self):
              return luigi.LocalTarget('hello-luigi.txt')
      
          def run(self):
              with self.output().open("w") as outfile:
                  outfile.write("Hello Luigi!")
      
      

      You define that HelloLuigi() is a Luigi task by subclassing luigi.Task.

      The output() method defines one or more Target outputs that your task produces. In the case of this example, you define a luigi.LocalTarget, which is a local file.

      Note: Luigi allows you to connect to a variety of common data sources including AWS S3 buckets, MongoDB databases, and SQL databases. You can find a complete list of supported data sources in the Luigi docs.

      The run() method contains the code you want to execute for your pipeline stage. In this example, you open the output() target file in write mode with self.output().open("w") as outfile: and write "Hello Luigi!" to it with outfile.write("Hello Luigi!").

      To execute the task you created, run the following command:

      • python -m luigi --module hello-world HelloLuigi --local-scheduler

      Here, you run the task using python -m instead of executing the luigi command directly; this is because Luigi can only execute code that is within the current PYTHONPATH. You can alternatively add PYTHONPATH='.' to the front of your Luigi command, like so:

      • PYTHONPATH='.' luigi --module hello-world HelloLuigi --local-scheduler

      With the --module hello-world HelloLuigi flag, you tell Luigi which Python module and Luigi task to execute.

      The --local-scheduler flag tells Luigi to not connect to a Luigi scheduler and, instead, execute this task locally. (We explain the Luigi scheduler in Step 4.) Running tasks using the local-scheduler flag is only recommended for development work.

      Luigi will output a summary of the executed tasks:

      Output

      ===== Luigi Execution Summary =====

      Scheduled 1 tasks of which:
      * 1 ran successfully:
          - 1 HelloLuigi()

      This progress looks :) because there were no failed tasks or missing dependencies

      ===== Luigi Execution Summary =====

      And it will create a new file hello-luigi.txt with content:

      hello-luigi.txt

      Hello Luigi!
      

      You have created a Luigi task that generates a file and then executed it using the Luigi local-scheduler. Now, you’ll create a task that can extract a list of books from a web page.

      Step 3 — Creating a Task to Download the Book List

      In this step, you will create a Luigi task and define a run() method for the task to download a list of the most popular books on Project Gutenberg. You’ll define an output() method to store links to these books in a file. You will run these using the Luigi local scheduler.

      Create a new directory data inside of your luigi-demo directory. This will be where you will store the files defined in the output() methods of your tasks. You need to create the directories before running your tasks—Python throws exceptions when you try to write a file to a directory that does not exist yet:

      • mkdir data
      • mkdir data/counts
      • mkdir data/downloads

      Create a new file word-frequency.py:

      Insert the following code, which is a Luigi task to extract a list of links to the top most-read books on Project Gutenberg:

      word-frequency.py

      import requests
      import luigi
      from bs4 import BeautifulSoup
      
      
      class GetTopBooks(luigi.Task):
          """
          Get list of the most popular books from Project Gutenberg
          """
      
          def output(self):
              return luigi.LocalTarget("data/books_list.txt")
      
          def run(self):
              resp = requests.get("http://www.gutenberg.org/browse/scores/top")
      
              soup = BeautifulSoup(resp.content, "html.parser")
      
              pageHeader = soup.find_all("h2", string="Top 100 EBooks yesterday")[0]
              listTop = pageHeader.find_next_sibling("ol")
      
              with self.output().open("w") as f:
                  for result in listTop.select("li>a"):
                      if "/ebooks/" in result["href"]:
                           f.write("http://www.gutenberg.org{link}.txt.utf-8\n"
                              .format(
                                  link=result["href"]
                              )
                          )
      

      You define an output() target of file "data/books_list.txt" to store the list of books.

      In the run() method, you:

      • use the requests library to download the HTML contents of the Project Gutenberg top books page.
      • use the BeautifulSoup library to parse the contents of the page. The BeautifulSoup library allows us to scrape information out of web pages. To find out more about using the BeautifulSoup library, read the How To Scrape Web Pages with Beautiful Soup and Python 3 tutorial.
      • open the output file defined in the output() method.
      • iterate over the HTML structure to get all of the links in the Top 100 EBooks yesterday list. For this page, that means locating all links <a> that are within a list item <li>. For each of those links, if the href contains /ebooks/, you can assume it is a book and write that link to your output() file.

      Screenshot of the Project Gutenberg top books web page with the top ebooks links highlighted

      Save and exit the file once you’re done.

      Execute this new task using the following command:

      • python -m luigi --module word-frequency GetTopBooks --local-scheduler

      Luigi will output a summary of the executed tasks:

      Output

      ===== Luigi Execution Summary =====

      Scheduled 1 tasks of which:
      * 1 ran successfully:
          - 1 GetTopBooks()

      This progress looks :) because there were no failed tasks or missing dependencies

      ===== Luigi Execution Summary =====

      In the data directory, Luigi will create a new file (data/books_list.txt). Run the following command to output the contents of the file:

      This file contains a list of URLs extracted from the Project Gutenberg top projects list:

      Output

      http://www.gutenberg.org/ebooks/1342.txt.utf-8
      http://www.gutenberg.org/ebooks/11.txt.utf-8
      http://www.gutenberg.org/ebooks/2701.txt.utf-8
      http://www.gutenberg.org/ebooks/1661.txt.utf-8
      http://www.gutenberg.org/ebooks/16328.txt.utf-8
      http://www.gutenberg.org/ebooks/45858.txt.utf-8
      http://www.gutenberg.org/ebooks/98.txt.utf-8
      http://www.gutenberg.org/ebooks/84.txt.utf-8
      http://www.gutenberg.org/ebooks/5200.txt.utf-8
      http://www.gutenberg.org/ebooks/51461.txt.utf-8
      ...

      You’ve created a task that can extract a list of books from a web page. In the next step, you’ll set up a central Luigi scheduler.

      Step 4 — Running the Luigi Scheduler

      Now, you’ll launch the Luigi scheduler to execute and visualize your tasks. You will take the task developed in Step 3 and run it using the Luigi scheduler.

      So far, you have been running Luigi using the --local-scheduler tag to run your jobs locally without allocating work to a central scheduler. This is useful for development, but for production usage it is recommended to use the Luigi scheduler. The Luigi scheduler provides:

      • A central point to execute your tasks.
      • Visualization of the execution of your tasks.

      To access the Luigi scheduler interface, you need to enable access to port 8082. To do this, run the following command:

      To run the scheduler execute the following command:

      • sudo sh -c ". luigi-venv/bin/activate ;luigid --background --port 8082"

      Note: We have re-run the virtualenv activate script as root, before launching the Luigi scheduler as a background task. This is because when running sudo the virtualenv environment variables and aliases are not carried over.

      If you do not want to run as root, you can run the Luigi scheduler as a background process for the current user. This command runs the Luigi scheduler in the background and hides messages from the scheduler background task. You can find out more about managing background processes in the terminal at How To Use Bash’s Job Control to Manage Foreground and Background Processes:

      • luigid --port 8082 > /dev/null 2> /dev/null &

      Open a browser to access the Luigi interface. This will either be at http://your_server_ip:8082, or if you have set up a domain for your server http://your_domain:8082. This will open the Luigi user interface.

      Luigi default user interface

      By default, Luigi tasks run using the Luigi scheduler. To run one of your previous tasks using the Luigi scheduler omit the --local-scheduler argument from the command. Re-run the task from Step 3 using the following command:

      • python -m luigi --module word-frequency GetTopBooks

      Refresh the Luigi scheduler user interface. You will find the GetTopBooks task added to the run list and its execution status.

      Luigi User Interface after running the GetTopBooks Task

      You will continue to refer back to this user interface to monitor the progress of your pipeline.

      Note: If you’d like to secure your Luigi scheduler through HTTPS, you can serve it through Nginx. To set up an Nginx server using HTTPS follow: How To Secure Nginx with Let’s Encrypt on Ubuntu 20.04. See Github - Luigi - Pull Request 2785 for suggestions on a suitable Nginx configuration to connect the Luigi server to Nginx.

      You’ve launched the Luigi Scheduler and used it to visualize your executed tasks. Next, you will create a task to download the list of books that the GetTopBooks() task outputs.

      Step 5 — Downloading the Books

      In this step you will create a Luigi task to download a specified book. You will define a dependency between this newly created task and the task created in Step 3.

      First open your file:

      Add an additional class following your GetTopBooks() task to the word-frequency.py file with the following code:

      word-frequency.py

      . . .
      class DownloadBooks(luigi.Task):
          """
          Download a specified list of books
          """
          FileID = luigi.IntParameter()
      
          REPLACE_LIST = """.,"';_[]:*-"""
      
          def requires(self):
              return GetTopBooks()
      
          def output(self):
              return luigi.LocalTarget("data/downloads/{}.txt".format(self.FileID))
      
          def run(self):
              with self.input().open("r") as i:
                  URL = i.read().splitlines()[self.FileID]
      
                  with self.output().open("w") as outfile:
                      book_downloads = requests.get(URL)
                      book_text = book_downloads.text
      
                      for char in self.REPLACE_LIST:
                          book_text = book_text.replace(char, " ")
      
                      book_text = book_text.lower()
                      outfile.write(book_text)
      

      In this task you introduce a Parameter; in this case, an integer parameter. Luigi parameters are inputs to your tasks that affect the execution of the pipeline. Here you introduce a parameter FileID to specify a line in your list of URLs to fetch.

      You have added an additional method to your Luigi task, def requires(); in this method you define the Luigi task that you need the output of before you can execute this task. You require the output of the GetTopBooks() task you defined in Step 3.

      In the output() method, you define your target. You use the FileID parameter to create a name for the file created by this step. In this case, you format data/downloads/{FileID}.txt.

      In the run() method, you:

      • open the list of books generated in the GetTopBooks() task.
      • get the URL from the line specified by parameter FileID.
      • use the requests library to download the contents of the book from the URL.
      • filter out any special characters inside the book like :,.?, so they don’t get included in your word analysis.
      • convert the text to lowercase so you can compare words with different cases.
      • write the filtered output to the file specified in the output() method.

      Save and exit your file.

      Run the new DownloadBooks() task using this command:

      • python -m luigi --module word-frequency DownloadBooks --FileID 2

      In this command, you set the FileID parameter using the --FileID argument.

      Note: Be careful when defining a parameter with an _ in the name. To reference them in Luigi you need to substitute the _ for a -. For example, a File_ID parameter would be referenced as --File-ID when calling a task from the terminal.

      You will receive the following output:

      Output

      ===== Luigi Execution Summary =====

      Scheduled 2 tasks of which:
      * 1 complete ones were encountered:
          - 1 GetTopBooks()
      * 1 ran successfully:
          - 1 DownloadBooks(FileID=2)

      This progress looks :) because there were no failed tasks or missing dependencies

      ===== Luigi Execution Summary =====

      Note from the output that Luigi has detected that you have already generated the output of GetTopBooks() and skipped running that task. This functionality allows you to minimize the number of tasks you have to execute as you can re-use successful output from previous runs.

      You have created a task that uses the output of another task and downloads a set of books to analyze. In the next step, you will create a task to count the most common words in a downloaded book.

      Step 6 — Counting Words and Summarizing Results

      In this step, you will create a Luigi task to count the frequency of words in each of the books downloaded in Step 5. This will be your first task that executes in parallel.

      First open your file again:

      Add the following imports to the top of word-frequency.py:

      word-frequency.py

      from collections import Counter
      import pickle
      

      Add the following task to word-frequency.py, after your DownloadBooks() task. This task takes the output of the previous DownloadBooks() task for a specified book, and returns the most common words in that book:

      word-frequency.py

      class CountWords(luigi.Task):
          """
          Count the frequency of the most common words from a file
          """
      
          FileID = luigi.IntParameter()
      
          def requires(self):
              return DownloadBooks(FileID=self.FileID)
      
          def output(self):
              return luigi.LocalTarget(
                  "data/counts/count_{}.pickle".format(self.FileID),
                  format=luigi.format.Nop
              )
      
          def run(self):
              with self.input().open("r") as i:
                  word_count = Counter(i.read().split())
      
                  with self.output().open("w") as outfile:
                      pickle.dump(word_count, outfile)
      

      When you define requires(), you pass the FileID parameter on to the required task. When you specify that a task depends on another task, you also specify the parameters that the required task should be executed with.

      In the run() method you:

      • open the file generated by the DownloadBooks() task.
      • use the built-in Counter object in the collections library. This provides an easy way to analyze the most common words in a book.
      • use the pickle library to store the output of the Python Counter object, so you can re-use that object in a later task. pickle is a library that you use to convert Python objects into a byte stream, which you can store and restore into a later Python session. You have to set the format property of the luigi.LocalTarget to allow it to write the binary output the pickle library generates.
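      The Counter and pickle round trip that this task relies on can be tried in isolation using only the standard library:

```python
from collections import Counter
import pickle

# Count word frequencies in a small sample, as CountWords does per book.
text = "the quick brown fox jumps over the lazy dog the end"
word_count = Counter(text.split())
print(word_count.most_common(1))  # [('the', 3)]

# Serialize the Counter to a byte stream and restore it; the pipeline
# stores these bytes in a .pickle file between tasks.
payload = pickle.dumps(word_count)
restored = pickle.loads(payload)
assert restored == word_count
```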

      Save and exit your file.

      Run the new CountWords() task using this command:

      • python -m luigi --module word-frequency CountWords --FileID 2

      Open the CountWords task graph view in the Luigi scheduler user interface.

      Showing how to view a graph from the Luigi user interface

      Deselect the Hide Done option, and deselect Upstream Dependencies. You will find the flow of execution from the tasks you have created.

      Visualizing the execution of the CountWords task

      You have created a task to count the most common words in a downloaded book and visualized the dependencies between those tasks. Next, you will define parameters that you can use to customize the execution of your tasks.

      Step 7 — Defining Configuration Parameters

      In this step, you will add configuration parameters to the pipeline. These will allow you to customize how many books to analyze and the number of words to include in the results.

      When you want to set parameters that are shared among tasks, you can create a Config() class. Other pipeline stages can reference the parameters defined in the Config() class; these are set by the pipeline when executing a job.

      Add the following Config() class to the end of word-frequency.py. This will define two new parameters in your pipeline for the number of books to analyze and the number of most frequent words to include in the summary:

      word-frequency.py

      class GlobalParams(luigi.Config):
          NumberBooks = luigi.IntParameter(default=10)
          NumberTopWords = luigi.IntParameter(default=500)
      

      Add the following class to word-frequency.py. This class aggregates the results from all of the CountWords() tasks to create a summary of the most frequent words:

      word-frequency.py

      class TopWords(luigi.Task):
          """
          Aggregate the count results from the different files
          """
      
          def requires(self):
              requiredInputs = []
              for i in range(GlobalParams().NumberBooks):
                  requiredInputs.append(CountWords(FileID=i))
              return requiredInputs
      
          def output(self):
              return luigi.LocalTarget("data/summary.txt")
      
          def run(self):
              total_count = Counter()
              for input in self.input():
                  with input.open("rb") as infile:
                      nextCounter = pickle.load(infile)
                      total_count += nextCounter
      
              with self.output().open("w") as f:
                  for item in total_count.most_common(GlobalParams().NumberTopWords):
                       f.write("{0: <15}{1}\n".format(*item))
      
      

      In the requires() method, you can provide a list where you want a task to use the output of multiple dependent tasks. You use the GlobalParams().NumberBooks parameter to set the number of books you need word counts from.

      In the output() method, you define a data/summary.txt output file that will be the final output of your pipeline.

      In the run() method you:

      • create a Counter() object to store the total count.
      • for each count produced by a CountWords() task, open its file and “unpickle” it (convert it from a byte stream back into a Python Counter object).
      • add each loaded count to the total count.
      • write the most common words to the target output file.

      Run the pipeline with the following command:

      • python -m luigi --module word-frequency TopWords --GlobalParams-NumberBooks 15 --GlobalParams-NumberTopWords 750

      Luigi will execute the remaining tasks needed to generate the summary of the top words:

      Output

      ===== Luigi Execution Summary =====

      Scheduled 31 tasks of which:
      * 2 complete ones were encountered:
          - 1 CountWords(FileID=2)
          - 1 GetTopBooks()
      * 29 ran successfully:
          - 14 CountWords(FileID=0,1,10,11,12,13,14,3,4,5,6,7,8,9)
          - 14 DownloadBooks(FileID=0,1,10,11,12,13,14,3,4,5,6,7,8,9)
          - 1 TopWords()

      This progress looks :) because there were no failed tasks or missing dependencies

      ===== Luigi Execution Summary =====

      You can visualize the execution of the pipeline from the Luigi scheduler. Select the GetTopBooks task in the task list and press the View Graph button.

      Showing how to view a graph from the Luigi user interface

      Deselect the Hide Done and Upstream Dependencies options.

      Visualizing the execution of the TopWords Task

      It will show the flow of processing that is happening in Luigi.

      Open the data/summary.txt file:

      You will find the calculated most common words:

      Output

      the            64593
      and            41650
      of             31896
      to             31368
      a              25265
      i              23449
      in             19496
      it             16282
      that           15907
      he             14974
      ...

      In this step, you have defined and used parameters to customize the execution of your tasks. You have generated a summary of the most common words for a set of books.

      Find all the code for this tutorial in this repository.

      Conclusion

      This tutorial has introduced you to using the Luigi data processing pipeline and its major features including tasks, parameters, configuration parameters, and the Luigi scheduler.

      Luigi supports connecting to a large number of common data sources out of the box. You can also scale it to run large, complex data pipelines. This provides a powerful framework to start solving your data processing challenges.

      For more tutorials, check out our Data Analysis topic page and Python topic page.




      Processing Incoming Request Data in Flask


      In any web app, you’ll have to process incoming request data from users. Flask, like any other web framework, allows you to access the request data easily.

      In this tutorial, we’ll go through how to process incoming data for the most common use cases. The forms of incoming data we’ll cover are: query strings, form data, and JSON objects. To demonstrate these cases, we’ll build a simple example app with three routes that accept either query string data, form data, or JSON data.

      The Request Object

      To access the incoming data in Flask, you have to use the request object. The request object holds all incoming data from the request, which includes the mimetype, referrer, IP address, raw data, HTTP method, and headers, among other things. Although all the information the request object holds can be useful, in this article, we’ll only focus on the data that is normally directly supplied by the caller of our endpoint.

      To gain access to the request object in Flask, you simply import it from the Flask library.

      from flask import request
      

      You then have the ability to use it in any of your view functions.

      Once we get to the section on query strings, you’ll get to see the request object in action.
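      As a quick way to inspect those attributes without running a server, Flask’s test_request_context() lets you examine the request object for a simulated request (the URL here is illustrative):

```python
from flask import Flask, request

app = Flask(__name__)

# Push a simulated request context so request is usable outside a view function.
with app.test_request_context('/query-example?language=Python'):
    print(request.method)            # GET
    print(request.path)              # /query-example
    print(request.args['language'])  # Python
```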

      The Example App

      To demonstrate the different ways of using request, we’ll start with a simple Flask app. Even though the example app uses a simple organization layout for the view functions and routes, what you learn in this tutorial can be applied to any method of organizing your views like class-based views, blueprints, or an extension like Flask-Via.

      To get started, first we need to install Flask.

      pip install Flask
      

      Then, we can start with the following code.

      #app.py
      
      from flask import Flask, request #import main Flask class and request object
      
      app = Flask(__name__) #create the Flask app
      
      @app.route('/query-example')
      def query_example():
          return 'Todo...'
      
      @app.route('/form-example')
      def form_example():
          return 'Todo...'
      
      @app.route('/json-example')
      def json_example():
          return 'Todo...'
      
      if __name__ == '__main__':
          app.run(debug=True, port=5000) #run app in debug mode on port 5000
      

      Start the app with:

      python app.py
      

      The code supplied sets up three routes with a message telling us ‘Todo…’. The app will start on port 5000, so you can view each route in your browser with the following links:

      http://127.0.0.1:5000/query-example (or localhost:5000/query-example)
      http://127.0.0.1:5000/form-example (or localhost:5000/form-example)
      http://127.0.0.1:5000/json-example (or localhost:5000/json-example)

      For each of the three routes, you’ll see the same thing:

      Query Arguments

      URL arguments that you add to a query string are the simplest way to pass data to a web app, so let’s start with them.

      A query string looks like the following:

      example.com?arg1=value1&arg2=value2
      

      The query string begins after the question mark (?) and has two key-value pairs separated by an ampersand (&). For each pair, the key is followed by an equals sign (=) and then the value. Even if you’ve never heard of a query string until now, you have definitely seen them all over the web.

      So in that example, the app receives:

      arg1 : value1
      arg2 : value2
      

      Query strings are useful for passing data that doesn’t require the user to take action. You could generate a query string somewhere in your app and append it to a URL so when a user makes a request, the data is automatically passed for them. A query string can also be generated by forms that have GET as the method.
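      You can build and parse query strings programmatically with Python’s standard library, which is handy when generating such links inside your app:

```python
from urllib.parse import urlencode, parse_qs

# Build a query string from key-value pairs, as a client or a GET form would.
query = urlencode({"arg1": "value1", "arg2": "value2"})
print(query)  # arg1=value1&arg2=value2

# Parse it back; each key maps to a list of values, since keys may repeat.
print(parse_qs(query))  # {'arg1': ['value1'], 'arg2': ['value2']}
```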

      To create our own query string on the query-example route, we’ll start with this simple one:

      http://127.0.0.1:5000/query-example?language=Python

      If you run the app and navigate to that URL, you’ll see that nothing has changed. That’s only because we haven’t handled the query arguments yet.

      To do so, we’ll need to read in the language key by using either request.args.get('language') or request.args['language'].

      By calling request.args.get('language'), our application will continue to run if the language key doesn’t exist in the URL. In that case, the result of the method will be None. If we use request.args['language'], the app will return a 400 error if language key doesn’t exist in the URL. For query strings, I recommend using request.args.get() because of how easy it is for the user to modify the URL. If they remove one of the keys, request.args.get() will prevent the app from failing.
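      The same trade-off exists with a plain Python dict, which is a reasonable mental model for request.args here (the difference is that Flask converts the KeyError into a 400 response for you):

```python
args = {'language': 'Python'}  # stand-in for request.args

# .get() returns None (or a default you choose) for a missing key.
print(args.get('framework'))             # None
print(args.get('framework', 'unknown'))  # unknown

# Indexing raises KeyError, which Flask would turn into a 400 Bad Request.
try:
    args['framework']
except KeyError:
    print('missing key raised KeyError')
```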

      Let’s read the language key and display it as output. Modify the query-example route with the following code.

      @app.route('/query-example')
      def query_example():
          language = request.args.get('language') #if key doesn't exist, returns None
      
          return '''<h1>The language value is: {}</h1>'''.format(language)
      

      Then run the app and navigate to the URL.

      As you can see, the argument from the URL gets assigned to the language variable and then gets returned to the browser. In a real app, you’d probably want to do something with the data other than simply return it.

      To add more query string parameters, we just append ampersands and the new key-value pairs to the end of the URL. So an additional pair would look like this:

      http://127.0.0.1:5000/query-example?language=Python&framework=Flask

      With the new pair being:

      framework : Flask
      

      And if you want more, continue adding ampersands and key-value pairs. Here’s an example:

      http://127.0.0.1:5000/query-example?language=Python&framework=Flask&website=Scotch

      To gain access to those values, we still use either request.args.get() or request.args[]. Let’s use both to demonstrate what happens when there’s a missing key. We’ll assign the value of the results to variables and then display them.

      @app.route('/query-example')
      def query_example():
          language = request.args.get('language') #if key doesn't exist, returns None
          framework = request.args['framework'] #if key doesn't exist, returns a 400, bad request error
          website = request.args.get('website')
      
          return '''<h1>The language value is: {}</h1>
                    <h1>The framework value is: {}</h1>
                    <h1>The website value is: {}'''.format(language, framework, website)
      

      When you run that, you should see:

      If you remove the language from the URL, then you’ll see that the value ends up as None.

      If you remove the framework key, you get an error.

      Now that you understand query strings, let’s move on to the next type of incoming data.

      Form Data

      Next we have form data. Form data comes from a form that has been sent as a POST request to a route. So instead of seeing the data in the URL (except for cases when the form is submitted with a GET request), the form data will be passed to the app behind the scenes. Even though you can’t easily see the form data that gets passed, your app can still read it.

      To demonstrate this, modify the form-example route to accept both GET and POST requests and to return a simple form.

      @app.route('/form-example', methods=['GET', 'POST']) #allow both GET and POST requests
      def form_example():
          return '''<form method="POST">
                        Language: <input type="text" name="language"><br>
                        Framework: <input type="text" name="framework"><br>
                        <input type="submit" value="Submit"><br>
                    </form>'''
      

      Running the app results in this:

      The most important thing to know about this form is that it performs a POST request to the same route that generated the form. The keys that we’ll read in our app all come from the “name” attributes on our form inputs. In our case, language and framework are the names of the inputs, so we’ll have access to those in our app.
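Behind the scenes, the browser encodes those named inputs into an application/x-www-form-urlencoded request body. You can see what that body looks like with the standard library; this is a sketch of the wire format, not anything Flask-specific:

```python
from urllib.parse import urlencode

# What the browser puts in the POST body for the two inputs above
body = urlencode({'language': 'Python', 'framework': 'Flask'})

print(body)  # language=Python&framework=Flask
```

The body uses the same key=value&key=value format as a query string; the only real difference is that it travels in the request body rather than the URL, which is why you don’t see it in the address bar.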

      Inside the view function, we need to check if the request method is GET or POST. If it’s GET, we simply display the form we have. If it’s POST, then we want to process the incoming data.

      To do that, let’s add a simple if statement that checks for a POST request. If the request method isn’t POST, then we know it’s GET, because our route only allows those two types of requests. For GET requests, the form will be generated.

      @app.route('/form-example', methods=['GET', 'POST']) #allow both GET and POST requests
      def form_example():
          if request.method == 'POST': #this block is only entered when the form is submitted
              return 'Submitted form.'
      
          return '''<form method="POST">
                        Language: <input type="text" name="language"><br>
                        Framework: <input type="text" name="framework"><br>
                        <input type="submit" value="Submit"><br>
                    </form>'''
      

      Inside of the block, we’ll read the incoming values with request.form.get('language') and request.form['framework']. Remember that if we have more inputs, then we can read additional data.

      Add the additional code to the form-example route.

      @app.route('/form-example', methods=['GET', 'POST']) #allow both GET and POST requests
      def form_example():
          if request.method == 'POST':  #this block is only entered when the form is submitted
              language = request.form.get('language')
              framework = request.form['framework']
      
              return '''<h1>The language value is: {}</h1>
                        <h1>The framework value is: {}</h1>'''.format(language, framework)
      
          return '''<form method="POST">
                        Language: <input type="text" name="language"><br>
                        Framework: <input type="text" name="framework"><br>
                        <input type="submit" value="Submit"><br>
                    </form>'''
      

      Try running the app and submitting the form.

      Similar to the query string example before, we can use request.form.get() instead of referencing the key directly with request.form[]. request.form.get() returns None instead of causing a 400 error when the key isn’t found.

      As you can see, handling submitted form data is just as easy as handling query string arguments.

      JSON Data

      Finally, we have JSON data. Like form data, it’s not so easy to see. JSON data is normally constructed by a process that calls our route. An example JSON object looks like this:

      {
          "language" : "Python",
          "framework" : "Flask",
          "website" : "Scotch",
          "version_info" : {
              "python" : 3.4,
              "flask" : 0.12
          },
          "examples" : ["query", "form", "json"],
          "boolean_test" : true
      }
      

      As you can see, with JSON you can pass much more complicated data than you could with query strings or form data. In the example, you see nested JSON objects and an array of items. With Flask, reading all of these values is straightforward.

      First, to send a JSON object, we’ll need a program capable of sending custom requests to URLs. For this, we can use an app called Postman.

      Before we use Postman though, change the method on the route to accept only POST requests.

      @app.route('/json-example', methods=['POST']) #GET requests will be blocked
      def json_example():
          return 'Todo...'
      

      Then in Postman, let’s do a little set up to enable sending POST requests.

      In Postman, add the URL and change the type to POST. On the body tab, change to raw and select JSON (application/json) from the drop down. All this is done so Postman can send JSON data properly and so your Flask app will understand that it’s receiving JSON.

      From there, you can copy the example into the text input. It should look like this:

      To test, just send the request and you should get ‘Todo…’ as the response because we haven’t modified our function yet.

      To read the data, first you must understand how Flask translates JSON data into Python data structures.

      Anything that is an object gets converted to a Python dict. {"key" : "value"} in JSON corresponds to somedict['key'], which returns a value in Python.

      An array in JSON gets converted to a list in Python. Since the syntax is the same, here’s an example list: [1,2,3,4,5]

      Then the values inside of quotes in the JSON object become strings in Python. true and false become True and False in Python. Finally, numbers without quotes around them become numbers in Python.
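These are the same rules the standard json module follows, so you can see the translation directly. request.get_json() produces the same kinds of Python values:

```python
import json

raw = '''{
    "language": "Python",
    "version_info": {"python": 3.4},
    "examples": ["query", "form", "json"],
    "boolean_test": true
}'''

data = json.loads(raw)

print(type(data))                      # <class 'dict'>  -- JSON object
print(type(data['examples']))          # <class 'list'>  -- JSON array
print(data['boolean_test'])            # True            -- JSON true
print(data['version_info']['python'])  # 3.4             -- JSON number
```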

      Now let’s get on to reading the incoming JSON data.

      First, let’s assign everything from the JSON object into a variable using request.get_json().

      req_data = request.get_json()
      

      request.get_json() converts the JSON object into Python data for us. Let’s assign the incoming request data to variables and return them by making the following changes to our json-example route.

      @app.route('/json-example', methods=['POST']) #GET requests will be blocked
      def json_example():
          req_data = request.get_json()
      
          language = req_data['language']
          framework = req_data['framework']
          python_version = req_data['version_info']['python'] #two keys are needed because of the nested object
          example = req_data['examples'][0] #an index is needed because of the array
          boolean_test = req_data['boolean_test']
      
          return '''
                 The language value is: {}
                 The framework value is: {}
                 The Python version is: {}
                 The item at index 0 in the example list is: {}
                 The boolean value is: {}'''.format(language, framework, python_version, example, boolean_test)
      

      If we run our app and submit the same request using Postman, we will get this:

      Note how you access elements that aren’t at the top level. ['version_info']['python'] is used because you are entering a nested object. And ['examples'][0] is used to access the 0th index in the examples array.

      If the JSON object sent with the request doesn’t have a key that is accessed in your view function, then the request will fail. If you don’t want it to fail when a key doesn’t exist, you’ll have to check if the key exists before trying to access it. Here’s an example:

      language = None
      if 'language' in req_data:
          language = req_data['language']
      
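A dictionary’s .get() method gives you a more compact version of the same check, and for nested keys you can chain it with an empty-dict default. This is a sketch of the same idea, using an example payload that is missing some keys:

```python
# Example payload missing the 'language' and 'version_info' keys
req_data = {'framework': 'Flask'}

language = req_data.get('language')  # None instead of a KeyError

# For nested access, fall back to an empty dict so the second
# .get() call is always safe.
python_version = req_data.get('version_info', {}).get('python')

print(language, python_version)  # None None
```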

      Conclusion

      You should now understand how to use the request object in Flask to get the most common forms of input data. To recap, we covered:

      • Query Strings
      • Form Data
      • JSON Data

      With this knowledge, you’ll have no problem with any data that users throw at your app!

      Learn More

      If you like this tutorial and want to learn more about Flask from me, check out my free Intro to Flask video course at my site Pretty Printed, which takes you from knowing nothing about Flask to building a guestbook app. If that course isn’t at your level, then I have other courses on my site that may interest you as well.
