
      Kubernetes Data Protection and Recovery Made Easy


      How to Join

      This Tech Talk is free and open to everyone. Register below to get a link to join the live stream or receive the video recording after it airs.

Date: December 7, 2021
Time: 11:00 a.m.–12:00 p.m. ET / 4:00–5:00 p.m. GMT

      About the Talk

Protect or migrate your Kubernetes container environments across any public or hybrid cloud with a 1-Click install that lets you back up and restore your entire app.

      Discover why cloud-native data protection is important for your app, and how you can get enterprise-grade backup and recovery for free, with TrilioVault for Kubernetes.

      What You’ll Learn

      • Differences between an enterprise-grade Kubernetes data protection platform and Velero
• How to back up your entire application, including data, metadata, and any other Kubernetes objects associated with your application
      • How to restore data from any point in time

      This Talk Is Designed For

• Anyone who wants to protect or migrate their Kubernetes container environments across a public or hybrid cloud
      • DigitalOcean Kubernetes users looking to satisfy data protection requirements

      Resources

      1-Click install: TrilioVault (TVK) for Kubernetes
      TVK is a cloud-native, application-centric data protection platform that offers backup and recovery of your entire application, including data, metadata, and any Kubernetes objects associated with the application – all of which can be restored with a single click.

      Technical documentation for TrilioVault

      Video: Installing TrilioVault on DigitalOcean




      Understanding Data Types in PHP


      The author selected Open Sourcing Mental Illness Ltd to receive a donation as part of the Write for DOnations program.

      Introduction

      In PHP, as in all programming languages, data types are used to classify one particular type of data. This is important because the specific data type you use will determine what values you can assign to it and what you can do to it (including what operations you can perform on it).

      In this tutorial, we will go over the important data types native to PHP. This is not an exhaustive investigation of data types, but will help you become familiar with what options you have available to you in PHP.

      One way to think about data types is to consider the different types of data that we use in the real world. Two different types are numbers and words. These two data types work in different ways. We would add 3 + 4 to get 7, while we would combine the words star and fish to get starfish.

      If we start evaluating different data types with one another, such as numbers and words, things start to make less sense. The following equation, for example, has no obvious answer:

      'sky' + 8

      For computers, each data type can be thought of as being quite different, like words and numbers, so we have to be careful about how we use them to assign values and how we manipulate them through operations.

      Working with Data Types

PHP is a loosely typed language. This means that, by default, if a value doesn’t match the expected data type, PHP will attempt to change the value to match the expected type when possible. This is called type juggling. For example, a function that expects a string but instead receives an integer with a value of 2 will change the incoming value into the expected string type with a value of "2".
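To see type juggling in action, consider this minimal sketch (the showString function is hypothetical, introduced here for illustration):

function showString(string $value): string
{
    return $value;
}

// Without strict typing, the integer 2 is juggled into the string "2"
// to satisfy the string type declaration.
var_dump(showString(2));

Which would output:

Output

string(1) "2"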

It is possible, and encouraged, to enable strict mode on a per-file basis. This provides enforcement of data types in the code you control, while allowing the use of additional code packages that may not adhere to strict data types. Strict typing is declared at the top of a file:

      <?php
      declare(strict_types=1);
      ...
      

      In strict mode, only a value corresponding exactly to the type declaration will be accepted; otherwise a TypeError will be thrown. The only exception to this rule is that an int value will pass a float type declaration.
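Continuing the hypothetical showString example from earlier, a minimal sketch of strict mode in action:

<?php
declare(strict_types=1);

function showString(string $value): string
{
    return $value;
}

echo showString("2"); // OK: the argument is already a string
echo showString(2);   // throws a TypeError: the int is not juggled

With strict typing enabled, the second call fails instead of silently converting the integer.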

      Numbers

      Any number you enter in PHP will be interpreted as a number. You are not required to declare what kind of data type you are entering. PHP will consider any number written without decimals as an integer (such as 138) and any number written with decimals as a float (such as 138.0).

      Integers

      Like in math, integers in computer programming are whole numbers that can be positive, negative, or 0 (…, -1, 0, 1, …). An integer can also be known as an int. As with other programming languages, you should not use commas in numbers of four digits or more, so to represent the number 1,000 in your program, write it as 1000.

We can print out an integer like this:

      echo -25;
      

      Which would output:

      Output

      -25

      We can also declare a variable, which in this case is a symbol of the number we are using or manipulating, like so:

      $my_int = -25;
      echo $my_int;
      

      Which would output:

      Output

      -25

      We can do math with integers in PHP, too:

      $int_ans = 116 - 68;
      echo $int_ans;
      

      Which would output:

      Output

      48

      Integers can be used in many ways within PHP programs, and as you continue to learn more about the language you will have a lot of opportunities to work with integers and understand more about this data type.

      Floating-Point Numbers

A floating-point number or float is a real number, meaning that it can be either a rational or an irrational number. Because of this, a floating-point number can contain a fractional part, such as 9.0 or -116.42. For the purposes of a PHP program, a float is a number that contains a decimal point.

      Like we did with the integer, we can print out a floating-point number like this:

      echo 17.3;
      

      Which would output:

      Output

      17.3

      We can also declare a variable that stands in for a float, like so:

      $my_flt = 17.3;
      echo $my_flt;
      

      Which would output:

      Output

      17.3

      And, just like with integers, we can do math with floats in PHP, too:

      $flt_ans = 564.0 + 365.24;
      echo $flt_ans;
      

      Which would output:

      Output

      929.24

      With integers and floating-point numbers, it is important to keep in mind that 3 does not equal 3.0, because 3 refers to an integer while 3.0 refers to a float. This may or may not change the way your program functions.
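A quick sketch shows this difference with PHP’s own comparison operators: loose comparison (==) juggles the types before comparing, while strict comparison (===) also requires the types to match:

var_dump(3 == 3.0);  // bool(true): equal after type juggling
var_dump(3 === 3.0); // bool(false): an int is not the same type as a float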

      Numbers are useful when working with calculations, counting items or money, and the passage of time.

      Strings

      A string is a sequence of one or more characters that may consist of letters, numbers, or symbols. This sequence is enclosed within either single quotes '' or double quotes "":

echo 'This is a 47 character string in single quotes.';
echo "This is a 47 character string in double quotes.";
      

Both lines output their value the same way:

      Output

This is a 47 character string in single quotes.
This is a 47 character string in double quotes.

      You can choose to use either single quotes or double quotes, but whichever you decide on you should be consistent within a program.

      The program “Hello, World!” demonstrates how a string can be used in computer programming, as the characters that make up the phrase Hello, World! are a string:

      echo "Hello, World!";
      

      As with other data types, we can store strings in variables and output the results:

$hw = "Hello, World!";
      echo $hw;
      

      Either way, the output is the same:

      Output

      Hello, World!

      Like numbers, there are many operations that we can perform on strings within our programs in order to manipulate them to achieve the results we are seeking. Strings are important for communicating information to the user, and for the user to communicate information back to the program.
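As a brief, hedged sketch of such operations, the dot operator concatenates strings, and built-in functions such as strlen and strtoupper transform them:

$greeting = 'Hello' . ', ' . 'World!'; // concatenation with the dot operator
echo $greeting;             // Hello, World!
echo strlen($greeting);     // 13
echo strtoupper($greeting); // HELLO, WORLD!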

      Booleans

      The Boolean, or bool, data type can be one of two values, either true or false. Booleans are used to represent the truth values that are associated with the logic branch of mathematics.

      You do not use quotes when declaring a Boolean value; anything in quotes is assumed to be a string. PHP doesn’t care about case when declaring a Boolean; True, TRUE, true, and tRuE all evaluate the same. If you follow the style guide put out by the PHP-FIG, the values should be all lowercase true or false.

      Many operations in math give us answers that evaluate to either True or False:

      • greater than
        • 500 > 100 True
        • 1 > 5 False
      • less than
        • 200 < 400 True
        • 4 < 2 False
      • equal
        • 5 = 5 True
        • 500 = 400 False

As with any other data type, we can store a Boolean value in a variable. Unlike numbers or strings, however, echo does not display a Boolean faithfully: a Boolean true value is converted to the string "1", while a Boolean false is converted to "" (an empty string). This allows “type juggling” to convert a variable back and forth between Boolean and string values. To output the value of a Boolean we have several options. To output the type along with the value of a variable, we use var_dump. To output the string representation of a variable’s value, we use var_export:

      $my_bool = 4 > 3;
      echo $my_bool;
      var_dump($my_bool);
      var_export($my_bool);
      

      Since 4 is greater than 3, we will receive the following output:

      Output

      1 bool(true) true

The echo line converts the true Boolean to the string 1. The var_dump outputs the variable type of bool along with the value true. The var_export outputs the string representation of the value, which is true.

      As you write more programs in PHP, you will become more familiar with how Booleans work and how different functions and operations evaluating to either True or False can change the course of the program.

      Arrays

      An array in PHP is actually an ordered map. A map is a data type that associates or “maps” values to keys. This data type has many different uses; it can be treated as an array, list, hash table, dictionary, collection, and more. Additionally, because array values in PHP can also be other arrays, multidimensional arrays are possible.

      Indexed Arrays

      In its simplest form, an array will have a numeric index or key. If you do not specify a key, PHP will automatically generate the next numeric key for you. By default, array keys are 0-indexed, which means that the first key is 0, not 1. Each element, or value, that is inside of an array can also be referred to as an item.

      An array can be defined in one of two ways. The first is using the array() language construct, which uses a comma-separated list of items. An array of integers would be defined like this:

      array(-3, -2, -1, 0, 1, 2, 3)
      

      The second and more common way to define an array is through the short array syntax using square brackets []. An array of floats would be defined like this:

      [3.14, 9.23, 111.11, 312.12, 1.05]
      

      We can also define an array of strings, and assign an array to a variable, like so:

      $sea_creatures = ['shark', 'cuttlefish', 'squid', 'mantis shrimp'];
      

      Once again, we cannot use echo to output an entire array, but we can use var_export or var_dump:

      var_export($sea_creatures);
      var_dump($sea_creatures);
      

      The output shows that the array uses numeric keys:

      Output

array (
  0 => 'shark',
  1 => 'cuttlefish',
  2 => 'squid',
  3 => 'mantis shrimp',
)
array(4) {
  [0]=>
  string(5) "shark"
  [1]=>
  string(10) "cuttlefish"
  [2]=>
  string(5) "squid"
  [3]=>
  string(13) "mantis shrimp"
}

      Because the array is 0-indexed, the var_dump shows an indexed array with numeric keys between 0 and 3. Each numeric key corresponds with a string value. The first element has a key of 0 and a value of shark. The var_dump function gives us more details about an array: there are 4 items in the array, and the value of the first item is a string with a length of 5.

      The numeric key of an indexed array may be specified when setting the value. However, the key is more commonly specified when using a named key.
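As a short sketch of setting an explicit key (the key 5 here is arbitrary, chosen for illustration):

$sea_creatures[5] = 'octopus'; // set a value with an explicit numeric key
$sea_creatures[] = 'eel';      // PHP assigns the next key automatically: 6
echo $sea_creatures[5];        // octopus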

      Associative Arrays

      Associative arrays are arrays with named keys. They are typically used to hold data that are related, such as the information contained in an ID. An associative array looks like this:

      ['name' => 'Sammy', 'animal' => 'shark', 'color' => 'blue', 'location' => 'ocean']
      

      Notice the double arrow operator => used to separate the strings. The words to the left of the => are the keys. The key can either be an integer or a string. The keys in the previous array are: 'name', 'animal', 'color', 'location'.

The words to the right of the => are the values. Values can be of any data type, including another array. The values in the previous array are: 'Sammy', 'shark', 'blue', 'ocean'.
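As a short sketch of a nested value (the $nested variable and its 'locations' key are hypothetical, added for illustration), chaining square brackets reaches into the inner array:

$nested = [
    'name' => 'Sammy',
    'locations' => ['ocean', 'reef'], // another array as the value
];
echo $nested['locations'][1]; // reef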

      Like the indexed array, let’s store the associative array inside a variable, and output the details:

      $sammy = ['name' => 'Sammy', 'animal' => 'shark', 'color' => 'blue', 'location' => 'ocean'];
      var_dump($sammy);
      

The results will describe this array as having 4 elements. The string for each key is given, but only the values specify the string type along with a character count:

      Output

array(4) {
  ["name"]=>
  string(5) "Sammy"
  ["animal"]=>
  string(5) "shark"
  ["color"]=>
  string(4) "blue"
  ["location"]=>
  string(5) "ocean"
}

      Associative arrays allow us to more precisely access a single element. If we want to isolate Sammy’s color, we can do so by adding square brackets containing the name of the key after the array variable:

      echo $sammy['color'];
      

      The resulting output:

      Output

      blue

      As arrays offer key-value mapping for storing data, they can be important elements in your PHP program.

      Constants

While a constant is not actually a separate data type, it does work differently than other data types. As the name implies, a constant is declared once, after which its value does not change throughout your application. By convention, the name of a constant is written in uppercase, and it does not start with a dollar sign. A constant can be declared using either the define function or the const keyword:

      define('MIN_VALUE', 1);
      const MAX_VALUE = 10;
      

The define function takes two parameters: the first is a string containing the name of the constant, and the second is the value to assign. This could be any of the data type values explained earlier. The const keyword allows the constant to be assigned a value in the same manner as other data types, using a single equal sign. A constant can be used within your application in the same way as other variables, except that constants are not interpolated within a double-quoted string:

      echo "The value must be between MIN_VALUE and MAX_VALUE";
      echo "The value must be between ".MIN_VALUE." and ".MAX_VALUE;
      

Because the constants are not interpolated, the output of these lines is different:

      Output

The value must be between MIN_VALUE and MAX_VALUE
The value must be between 1 and 10

      Conclusion

      At this point, you should have a better understanding of some of the major data types that are available for you to use in PHP. Each of these data types will become important as you develop programming projects in the PHP language.




      How To Test Your Data With Great Expectations


      The author selected the Diversity in Tech Fund to receive a donation as part of the Write for DOnations program.

      Introduction

      In this tutorial, you will set up a local deployment of Great Expectations, an open source data validation and documentation library written in Python. Data validation is crucial to ensuring that the data you process in your pipelines is correct and free of any data quality issues that might occur due to errors such as incorrect inputs or transformation bugs. Great Expectations allows you to establish assertions about your data called Expectations, and validate any data using those Expectations.

      When you’re finished, you’ll be able to connect Great Expectations to your data, create a suite of Expectations, validate a batch of data using those Expectations, and generate a data quality report with the results of your validation.

      Prerequisites

      To complete this tutorial, you will need:

      Step 1 — Installing Great Expectations and Initializing a Great Expectations Project

      In this step, you will install the Great Expectations package in your local Python environment, download the sample data you’ll use in this tutorial, and initialize a Great Expectations project.

To begin, open a terminal and make sure to activate your virtual Python environment. Install the Great Expectations Python package and command-line interface (CLI) with the following command:

      • pip install great_expectations==0.13.35

      Note: This tutorial was developed for Great Expectations version 0.13.35 and may not be applicable to other versions.

To get access to the example data repository, run the following commands to clone it and change into it as your working directory:

      • git clone https://github.com/do-community/great_expectations_tutorial
      • cd great_expectations_tutorial

The repository only contains one folder called data, which contains two example CSV files with data that you will use in this tutorial. Take a look at the contents of the data directory:

• ls data

      You’ll see the following output:

      Output

      yellow_tripdata_sample_2019-01.csv yellow_tripdata_sample_2019-02.csv

      Great Expectations works with many different types of data, such as connections to relational databases, Spark dataframes, and various file formats. For the purpose of this tutorial, you will use these CSV files containing a small set of taxi ride data to get started.

      Finally, initialize your directory as a Great Expectations project by running the following command. Make sure to use the --v3-api flag, as this will switch you to using the most recent API of the package:

      • great_expectations --v3-api init

      When asked OK to proceed? [Y/n]:, press ENTER to proceed.

This will create a folder called great_expectations, which contains the basic configuration for your Great Expectations project, also called the Data Context. You can inspect the contents of the folder:

• ls great_expectations

      You will see the first level of files and subdirectories that were created inside the great_expectations folder:

      Output

      checkpoints great_expectations.yml plugins expectations notebooks uncommitted

      The folders store all the relevant content for your Great Expectations setup. The great_expectations.yml file contains all important configuration information. Feel free to explore the folders and configuration file a little more before moving on to the next step in the tutorial.

      In the next step, you will add a Datasource to point Great Expectations at your data.

      Step 2 — Adding a Datasource

      In this step, you will configure a Datasource in Great Expectations, which allows you to automatically create data assertions called Expectations as well as validate data with the tool.

      While in your project directory, run the following command:

      • great_expectations --v3-api datasource new

      You will see the following output. Enter the options shown when prompted to configure a file-based Datasource for the data directory:

      Output

What data would you like Great Expectations to connect to?
    1. Files on a filesystem (for processing with Pandas or Spark)
    2. Relational database (SQL)
: 1

What are you processing your files with?
    1. Pandas
    2. PySpark
: 1

Enter the path of the root directory where the data files are stored. If files are on local disk enter a path relative to your current working directory or an absolute path.
: data

      After confirming the directory path with ENTER, Great Expectations will open a Jupyter notebook in your web browser, which allows you to complete the configuration of the Datasource and store it to your Data Context. The following screenshot shows the first few cells of the notebook.

      Screenshot of a Jupyter notebook

      The notebook contains several pre-populated cells of Python code to configure your Datasource. You can modify the settings for the Datasource, such as the name, if you like. However, for the purpose of this tutorial, you’ll leave everything as-is and execute all cells using the Cell > Run All menu option. If run successfully, the last cell output will look as follows:

      Output

      [{'data_connectors': {'default_inferred_data_connector_name': {'module_name': 'great_expectations.datasource.data_connector', 'base_directory': '../data', 'class_name': 'InferredAssetFilesystemDataConnector', 'default_regex': {'group_names': ['data_asset_name'], 'pattern': '(.*)'}}, 'default_runtime_data_connector_name': {'module_name': 'great_expectations.datasource.data_connector', 'class_name': 'RuntimeDataConnector', 'batch_identifiers': ['default_identifier_name']}}, 'module_name': 'great_expectations.datasource', 'class_name': 'Datasource', 'execution_engine': {'module_name': 'great_expectations.execution_engine', 'class_name': 'PandasExecutionEngine'}, 'name': 'my_datasource'}]

      This shows that you have added a new Datasource called my_datasource to your Data Context. Feel free to read through the instructions in the notebook to learn more about the different configuration options before moving on to the next step.
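For reference, the Datasource configuration shown in the output corresponds to a YAML document along these lines (a sketch reconstructed from the output above; the notebook may organize the code differently):

name: my_datasource
class_name: Datasource
module_name: great_expectations.datasource
execution_engine:
  class_name: PandasExecutionEngine
data_connectors:
  default_inferred_data_connector_name:
    class_name: InferredAssetFilesystemDataConnector
    base_directory: ../data
    default_regex:
      group_names:
        - data_asset_name
      pattern: (.*)
  default_runtime_data_connector_name:
    class_name: RuntimeDataConnector
    batch_identifiers:
      - default_identifier_name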

Warning: Before moving forward, close the browser tab with the notebook, return to your terminal, and press CTRL+C to shut down the running notebook server.

      You have now successfully set up a Datasource that points at the data directory, which will allow you to access the CSV files in the directory through Great Expectations. In the next step, you will use one of these CSV files in your Datasource to automatically generate Expectations with a profiler.

      Step 3 — Creating an Expectation Suite With an Automated Profiler

      In this step of the tutorial, you will use the built-in Profiler to create a set of Expectations based on some existing data. For this purpose, let’s take a closer look at the sample data that you downloaded:

      • The files yellow_tripdata_sample_2019-01.csv and yellow_tripdata_sample_2019-02.csv contain taxi ride data from January and February 2019, respectively.
• This tutorial assumes that you know the January data is correct, and that you want to ensure that any subsequent data files match the January data in terms of number of rows, columns, and the distributions of certain column values.

      For this purpose, you will create Expectations (data assertions) based on certain properties of the January data and then, in a later step, use those Expectations to validate the February data. Let’s get started by creating an Expectation Suite, which is a set of Expectations that are grouped together:

      • great_expectations --v3-api suite new

      By selecting the options shown in the output below, you specify that you would like to use a profiler to generate Expectations automatically, using the yellow_tripdata_sample_2019-01.csv data file as an input. Enter the name my_suite as the Expectation Suite name when prompted and press ENTER at the end when asked Would you like to proceed? [Y/n]:

      Output

Using v3 (Batch Request) API

How would you like to create your Expectation Suite?
    1. Manually, without interacting with a sample batch of data (default)
    2. Interactively, with a sample batch of data
    3. Automatically, using a profiler
: 3

A batch of data is required to edit the suite - let's help you to specify it.

Which data asset (accessible by data connector "my_datasource_example_data_connector") would you like to use?
    1. yellow_tripdata_sample_2019-01.csv
    2. yellow_tripdata_sample_2019-02.csv
: 1

Name the new Expectation Suite [yellow_tripdata_sample_2019-01.csv.warning]: my_suite

When you run this notebook, Great Expectations will store these expectations in a new Expectation Suite "my_suite" here:

  <path_to_project>/great_expectations_tutorial/great_expectations/expectations/my_suite.json

Would you like to proceed? [Y/n]: <press ENTER>

      This will open another Jupyter notebook that lets you complete the configuration of your Expectation Suite. The notebook contains a fair amount of code to configure the built-in profiler, which looks at the CSV file you selected and creates certain types of Expectations for each column in the file based on what it finds in the data.

      Scroll down to the second code cell in the notebook, which contains a list of ignored_columns. By default, the profiler will ignore all columns, so let’s comment out some of them to make sure the profiler creates Expectations for them. Modify the code so it looks like this:

      ignored_columns = [
      #     "vendor_id"
      # ,    "pickup_datetime"
      # ,    "dropoff_datetime"
      # ,    "passenger_count"
          "trip_distance"
      ,    "rate_code_id"
      ,    "store_and_fwd_flag"
      ,    "pickup_location_id"
      ,    "dropoff_location_id"
      ,    "payment_type"
      ,    "fare_amount"
      ,    "extra"
      ,    "mta_tax"
      ,    "tip_amount"
      ,    "tolls_amount"
      ,    "improvement_surcharge"
      ,    "total_amount"
      ,    "congestion_surcharge"
      ,]
      

      Make sure to remove the comma before "trip_distance". By commenting out the columns vendor_id, pickup_datetime, dropoff_datetime, and passenger_count, you are telling the profiler to generate Expectations for those columns. In addition, the profiler will also generate table-level Expectations, such as the number and names of columns in your data, and the number of rows. Once again, execute all cells in the notebook by using the Cell > Run All menu option.
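For example, a table-level Expectation stored in my_suite.json might look roughly like this sketch (the exact values and metadata fields the profiler writes may differ):

{
  "expectation_type": "expect_table_row_count_to_be_between",
  "kwargs": { "min_value": 10000, "max_value": 10000 }
}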

      When executing all cells in this notebook, two things happen:

      1. The code creates an Expectation Suite using the automated profiler and the yellow_tripdata_sample_2019-01.csv file you told it to use.
      2. The last cell in the notebook is also configured to run validation and open a new browser window with Data Docs, which is a data quality report.

      In the next step, you will take a closer look at the Data Docs that were opened in the new browser window.

      Step 4 — Exploring Data Docs

      In this step of the tutorial, you will inspect the Data Docs that Great Expectations generated and learn how to interpret the different pieces of information. Go to the browser window that just opened and take a look at the page, shown in the screenshot below.

      Screenshot of Data Docs

      At the top of the page, you will see a box titled Overview, which contains some information about the validation you just ran using your newly created Expectation Suite my_suite. It will tell you Status: Succeeded and show some basic statistics about how many Expectations were run. If you scroll further down, you will see a section titled Table-Level Expectations. It contains two rows of Expectations, showing the Status, Expectation, and Observed Value for each row. Below the table Expectations, you will see the column-level Expectations for each of the columns you commented out in the notebook.

Let’s focus on one specific Expectation: The passenger_count column has an Expectation stating “values must belong to this set: 1 2 3 4 5 6.”, which is marked with a green checkmark and has an Observed Value of “0% unexpected”. This tells you that the profiler looked at the values in the passenger_count column in the January CSV file and detected only the values 1 through 6, meaning that all taxi rides had between 1 and 6 passengers. Great Expectations then created an Expectation for this fact. The last cell in the notebook then triggered validation of the January CSV file, and it found no unexpected values. This result is unsurprising, since the same data that was used to create the Expectation was also used for validation.
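In the stored suite, that Expectation is represented roughly like this sketch (the entry in my_suite.json may carry additional metadata):

{
  "expectation_type": "expect_column_values_to_be_in_set",
  "kwargs": {
    "column": "passenger_count",
    "value_set": [1, 2, 3, 4, 5, 6]
  }
}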

      In this step, you reviewed the Data Docs and observed the passenger_count column for its Expectation. In the next step, you’ll see how you can validate a different batch of data.

      Step 5 — Creating a Checkpoint and Running Validation

      In the final step of this tutorial, you will create a new Checkpoint, which bundles an Expectation Suite and a batch of data to execute validation of that data. After creating the Checkpoint, you will then run it to validate the February taxi data CSV file and see whether the file passed the Expectations you previously created. To begin, return to your terminal and stop the Jupyter notebook by pressing CTRL+C if it is still running. The following command will start the workflow to create a new Checkpoint called my_checkpoint:

      • great_expectations --v3-api checkpoint new my_checkpoint

This will open a Jupyter notebook with some pre-populated code to configure the Checkpoint. The second code cell in the notebook will have a random data_asset_name pre-populated from your existing Datasource, which will be one of the two CSV files in the data directory that you saw earlier. Ensure that the data_asset_name is yellow_tripdata_sample_2019-02.csv, and modify the code if needed to use the correct filename.

      my_checkpoint_name = "my_checkpoint" # This was populated from your CLI command.
      
      yaml_config = f"""
      name: {my_checkpoint_name}
      config_version: 1.0
      class_name: SimpleCheckpoint
      run_name_template: "%Y%m%d-%H%M%S-my-run-name-template"
      validations:
        - batch_request:
            datasource_name: my_datasource
            data_connector_name: default_inferred_data_connector_name
            data_asset_name: yellow_tripdata_sample_2019-02.csv
            data_connector_query:
              index: -1
          expectation_suite_name: my_suite
      """
      print(yaml_config)
      """
      

      This configuration snippet configures a new Checkpoint, which reads the data asset yellow_tripdata_sample_2019-02.csv, i.e., your February CSV file, and validates it using the Expectation Suite my_suite. Confirm that you modified the code correctly, then execute all cells in the notebook. This will save the new Checkpoint to your Data Context.

Finally, in order to run this new Checkpoint and validate the February data, scroll down to the last cell in the notebook. Uncomment the code in the cell so that it looks as follows:

      context.run_checkpoint(checkpoint_name=my_checkpoint_name)
      context.open_data_docs()
      

      Select the cell and run it using the Cell > Run Cells menu option or the SHIFT+ENTER keyboard shortcut. This will open Data Docs in a new browser tab.

On the Validation Results overview page, click on the topmost run to navigate to the Validation Result details page. The Validation Result details page will look very similar to the page you saw in the previous step, but it will now show that the Expectation Suite failed when validating the new CSV file. Scroll through the page to see which Expectations have a red X next to them, marking them as failed.

Find the Expectation on the passenger_count column that you looked at in the previous step: “values must belong to this set: 1 2 3 4 5 6”. You will notice that it now shows up as failed, highlighting that 1579 unexpected values were found, approximately 15.79% of the 10000 total rows. The row also displays a sample of the unexpected values that were found in the column, namely the value 0. This means that the February taxi ride data suddenly introduced the unexpected value 0 in the passenger_count column, which looks like a potential data bug. By running the Checkpoint, you validated the new data with your Expectation Suite and detected this issue.

      Note that each time you execute the run_checkpoint method in the last notebook cell, you kick off another validation run. In a production data pipeline environment, you would call the run_checkpoint command outside of a notebook whenever you’re processing a new batch of data to ensure that the new data passes all validations.
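A minimal sketch of such a standalone validation script might look like the following (assuming it runs from the project root, next to the great_expectations folder; how you react to a failure is up to your pipeline):

from great_expectations.data_context import DataContext

# Load the Data Context configured in the great_expectations/ folder.
context = DataContext()

# Run the Checkpoint, which validates the configured batch of data
# against the Expectation Suite it bundles.
result = context.run_checkpoint(checkpoint_name="my_checkpoint")

# Fail loudly if any Expectation did not pass.
if not result["success"]:
    raise SystemExit("Validation failed: check the Data Docs for details.")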

      Conclusion

In this article, you created your first local deployment of the Great Expectations framework for data validation. You initialized a Great Expectations Data Context, created a new file-based Datasource, and automatically generated an Expectation Suite using the built-in profiler. You then created a Checkpoint to run validation against a new batch of data, and inspected the Data Docs to view the validation results.

This tutorial only taught you the basics of Great Expectations. The package contains more options for configuring Datasources to connect to other types of data, for example relational databases. It also comes with a powerful mechanism to automatically recognize new batches of data based on pattern-matching in the table name or filename, which allows you to configure a Checkpoint only once and validate any future data inputs. You can learn more about Great Expectations in the official documentation.


