One place for hosting & domains

      An Overview of Open Source Data Visualization Tools


      Updated by Linode Contributed by Mihalis Tsoukalos

      Creating graphic visualizations for a data set is a powerful way to derive meaning from vast amounts of information. It provides a way to extract meaningful relationships between different aspects of your data depending on how the data is mapped and which graphic representations are chosen. Data visualization is a common practice in many sectors, including various scientific disciplines, business settings, the government sector, and education.

      There are many open source tools available to create sophisticated data visualizations for complex data sets. This guide will provide an introductory exploration of data analysis and 2D graphics rendering packages that can be used with R, Python, and JavaScript to generate data visualizations.

      In this guide you will complete the following steps:

      Before You Begin

      Ensure you have the following programs and packages installed on your computer:

      1. The R programming language
      2. The RStudio Desktop application
      3. Python 3.7.0 or higher
        1. pandas Python package
        2. Matplotlib Python package
        3. wordcloud Python package
      4. The Golang programming language

      Assumptions

      This guide assumes you have some basic familiarity with the following concepts and skills:

      1. Basic programming principles and data structures
      2. Are able to read code written in Go, Python, HTML, CSS, and JavaScript

      Create Your Data Sets

      In this section, you will create a data set using the contents of your Bash history file and optionally, your Zsh history file. You will then create a third data set using a Perl script that will extract information from the first two data sets. In the Create Visualizations for your Data section of the guide, you will use these various data sets to create corresponding visualizations.

      Data Set 1 – Bash History File

      A Bash history file stores all commands executed in your command line interpreter. View your 10 most recently executed commands with the following command:

      head ~/.bash_history
      

      Your output should resemble the following:

        
      git commit -a -m "Fixed Constants links"
      git push
      git diff
      docker images
      brew update; brew upgrade; brew cleanup
      git commit -a -m "Cleanup v2 and v3"
      git push
      git commit -a -m "Added examples section"
      git push
      cat ~/.lenses/lenses-cli.yml
      
      

      Create a new directory named data-setsto store your data and copy your Bash history file to the directory:

      mkdir data-sets && cp ~/.bash_history data-sets/data-1
      

      Data Set 2 – Zsh History File

      If you are using the Zsh shell interpreter, you can use its history file as a second data set. Zsh’s history file format includes data that you will need to exclude from your data set. Use AWK to clean up your Zsh history file and save the output to a new file in the data-sets directory:

      awk -F ";" '{$1=""; print $0}' ~/.zsh_history | sed -e "s/^[ t]*//" -e "/^$/d" > data-sets/data-2
      

      Data Set 3 – Perl Script

      To create your third data set, you will use a Perl script that categorizes the contents of your data set files. The categorization is based on the word count for each line of text in your data files or, in other words, the length of each command stored in your Bash and Zsh history files. The script creates 5 categories of command lengths; 1 – 2 words, 3 – 5 words, 6 – 10 words, 11 – 15 words, and 16 or more words.

      1. Create a file named command_length.pl in your home directory with the following content:

        ~/command_length.pl
         1
         2
         3
         4
         5
         6
         7
         8
         9
        10
        11
        12
        13
        14
        15
        16
        17
        18
        19
        20
        21
        22
        23
        24
        25
        26
        27
        28
        29
        30
        31
        32
        33
        34
        35
        36
        37
        38
        39
        40
        41
        42
        43
        44
        45
        46
        47
        48
        49
        50
        51
        52
        53
        54
        55
        56
        57
        58
        59
        60
        61
        62
        63
        
        #!/usr/bin/perl -w
        
        use strict;
        
        my $directory = "";
        my $filename = "";
        
        my $CAT1 = 0;
        my $CAT2 = 0;
        my $CAT3 = 0;
        my $CAT4 = 0;
        my $CAT5 = 0;
        
        if ( @ARGV != 1 ) {
           die <<Thanatos
              usage info:
                 Please use exactly 1 argument!
        Thanatos
        }
        
        ($directory) = @ARGV;
        opendir(BIN, $directory)
            || die "Error opening directory $directory: $!n";
        
        while (defined ($filename = readdir BIN) ) {
            # The following command does not process . and ..
            next if( $filename =~ /^..?$/ );
            process_file($directory."/".$filename);
        }
        
        print "Category1tt$CAT1n";
        print "Category2tt$CAT2n";
        print "Category3tt$CAT3n";
        print "Category4tt$CAT4n";
        print "Category5tt$CAT5n";
        exit 0;
        
        sub process_file {
            my $file = shift;
            my $line = "";
        
            open (HISTORYFILE, "< $file")
                || die "Cannot open $file: $!n";
        
            while (defined($line = <HISTORYFILE>)) {
                chomp $line;
                next if ( ! defined($line) );
                check_category($line);
            }
        }
        
        sub check_category {
            my $command = shift;
            chomp $command;
            my $length = length($command);
        
            if ( $length <= 2 ) { $CAT1 ++; }
            elsif ( $length <= 5 ) { $CAT2 ++; }
            elsif ( $length <= 10 ) { $CAT3 ++; }
            elsif ( $length <= 15 ) { $CAT4 ++; }
            else { $CAT5 ++; }
        }
            
      2. Run the Perl script. The script expects a single argument; the directory that holds all data files that you want to process. The file’s output will be saved in a new file command_categories.txt. The command_categories.txt file will be used later in this guide to create visualizations using R.

        ./command_length.pl data-sets > ~/command_categories.txt
        

        Note

        Your Perl script must be executable in order to run. To add these permissions, execute the following command:

        chmod +x command_length.pl
        

        Open the .command_categories.txt file to view the categorizations created by your Perl script. Your file should resemble the following example:

        command_categories.txt
        1
        2
        3
        4
        5
        6
        7
        
        "Category Name" "Number of Times"
        Category1 5514
        Category2 2381
        Category3 2624
        Category4 2021
        Category5 11055
            

      You now have three sources of data that you can use to explore data visualization tools in the next sections.

      • ~/data-sets/data-1
      • ~/data-stes/data-2
      • ~/command_categories.txt

      Create Visualizations for your Data

      In this section you will use the data sets you created in the previous section to generate visualizations for them.

      Visualize your Data with R and RStudio

      R is a specialized programming language used for statistical computing and graphics. It is especially good for creating high quality graphs, like density plots, line charts, pie charts, and scatter plots. RStudio is an integrated development environment (IDE) for R that includes debugging and plotting tools that make it easy to write, debug, and run R scripts.

      In this section, you will use the command_categories.txt file created in the Data Set 3 – Perl Script section and RStudio to create a pie chart and simple spreadsheet of your data.

      1. Open RStudio Desktop and create a data frame. In R, a data frame is a table similar to a two-dimensional array.

        DATA <- read.table("~/command_categories.txt", header=TRUE)
        
        • This command will read the command_categories.txt file that was created in the Data Set 3 – Perl Script section of the guide and create a data frame from it that is stored in the DATA variable.

        • The header=TRUE argument indicates that the file’s first row contains variable names for column values. This means that Category Name and Number of Times will be used as variable names for the two columns of values in command_categories.txt.

      2. Next, create a pie chart visualization for each column of values using R’s pie() function.

        pie(DATA$Number.of.Times, DATA$Category.Name)
        
        • The function’s first argument, DATA$Number.of.Times and DATA$Category.Name, provides the x-vector numeric values to use when creating the pie chart visualization.

        • The second argument, DATA$Category.Name, provides the labels for each pie slice.

        RStudio will display a pie chart visualization of your data in the Plots and Files window similar to the following:

        Pie Chart

      3. Visualize your data in a spreadsheet style format using R’s View() function.

        View(DATA, "Command Lengths")
        

        RStudio will display a spreadsheet style viewer in a new window pane tab.

        Spreadsheet

      Explore R’s graphics package to discover additional functions you can use to create more complex visualizations for your data with RStudio.

      Create a Word Cloud using Python

      Word clouds depict text data using varying font sizes and colors to visually demonstrate the frequency and relative importance of each word. A common application for word clouds is to visualize tag or keyword relevancy. In this section you will use Python3 and your Bash and Zsh history files to generate a word cloud of all your shell commands. The Python packages listed below are commonly used in programs for data analysis and scientific computing and you will use them to generate your word cloud.

      1. Create a file named create_wordcloud.py in your home directory with the following content:

        ~/create_wordcloud.py
         1
         2
         3
         4
         5
         6
         7
         8
         9
        10
        11
        12
        13
        14
        15
        16
        17
        18
        19
        20
        21
        22
        23
        24
        25
        26
        27
        28
        29
        30
        31
        32
        33
        34
        35
        36
        37
        
        #!/usr/bin/env python3
        
        import pandas as pd
        import matplotlib.pyplot as plt
        import sys
        import os
        from wordcloud import WordCloud, STOPWORDS
        from random import randrange
        
        path = sys.argv[1]
        
        if os.path.exists(path):
            print("Creating word cloud for file: " + path)
        
            data = pd.read_table(path, header = None, names=["CMD"], encoding = "iso-8859-1")
        
            skipWords = []
            skipWords.extend(STOPWORDS)
        
            words = ''.join(data['CMD'])
        
            w = WordCloud(
                    stopwords=set(skipWords),
                    background_color='gray',
                    width=4000,
                    height=2000
                    ).generate(words)
            filename = "word_cloud_" + str(randrange(100)) + ".png"
            print("Your wordcloud's filename is: " + filename)
        
            plt.imshow(w)
            plt.axis("off")
            plt.savefig(filename, dpi=1000)
        
        else:
            print("File" + path +  "does not exist")
            
      2. Run your Python script and pass the path of one of your data set files as an argument. The script will read the contents of the file using panda’s read_table() function and convert it into a data frame with a column name of CMD. It will then use the data in the CMD column to create a concatenated string representation of the data that can be passed to wordcloud to generate a .png wordcloud image.

        ./create_wordcloud.py ~/data-sets/data-1
        

        You should see a similar output from the Python script:

          
        Creating word cloud for file: /Users/username/data-sets/data-2
        Your word cloud's filename is: word_cloud_58.png
            
        
      3. Open the word_cloud_58.png image file to view your word cloud.

        Word Cloud

      You could use a similar process to create a word cloud visualization for any text you want to analyze.

      Visualize Data using D3.js

      D3.js is a JavaScript library that helps you visualize JSON formatted data using HTML, SVG, and CSS. In this section you will us D3.js to create and embed a pie chart visualization into a web page.

      To convert your data set into JSON, you will create a Golang command line utility that generates JSON formatted plain text output. For more complex data sets, you might consider creating a similar command line utility using Golang’s json package.

      1. Create a file named cToJSON.go in your home directory with the following content:

        ./cToJSON.go
         1
         2
         3
         4
         5
         6
         7
         8
         9
        10
        11
        12
        13
        14
        15
        16
        17
        18
        19
        20
        21
        22
        23
        24
        25
        26
        27
        28
        29
        30
        31
        32
        33
        34
        35
        36
        37
        38
        39
        40
        41
        42
        43
        44
        45
        46
        47
        48
        49
        50
        51
        52
        53
        54
        55
        56
        57
        58
        59
        60
        61
        62
        63
        64
        65
        66
        67
        68
        69
        70
        71
        72
        73
        74
        75
        76
        77
        78
        79
        80
        81
        82
        83
        84
        85
        86
        87
        88
        89
        90
        91
        92
        
        package main
        
        import (
            "bufio"
            "flag"
            "fmt"
            "io"
            "os"
            "regexp"
            "sort"
        )
        
        var DATA = make(map[string]int)
        
        func lineByLine(file string) error {
            var err error
            f, err := os.Open(file)
            if err != nil {
                return err
            }
            defer f.Close()
        
            r := bufio.NewReader(f)
            for {
                line, err := r.ReadString('n')
                if err == io.EOF {
                    break
                } else if err != nil {
                    fmt.Printf("error reading file %s", err)
                    break
                }
        
                r := regexp.MustCompile("[^\s]+")
                words := r.FindAllString(line, -1)
                if len(words) == 0 {
                    continue
                }
        
                if _, ok := DATA[words[0]]; ok {
                    DATA[words[0]]++
                } else {
                    DATA[words[0]] = 1
                }
        
            }
            return nil
        }
        
        func main() {
            flag.Parse()
            if len(flag.Args()) == 0 {
                fmt.Printf("usage: cToJSON <file1> [<file2> ...]n")
                return
            }
        
            for _, file := range flag.Args() {
                err := lineByLine(file)
                if err != nil {
                    fmt.Println(err)
                }
            }
        
            n := map[int][]string{}
            var a []int
            for k, v := range DATA {
                n[v] = append(n[v], k)
            }
        
            for k := range n {
                a = append(a, k)
            }
        
            fmt.Println("[")
            sort.Sort(sort.Reverse(sort.IntSlice(a)))
        
            counter := 0
            for _, k := range a {
                if counter >= 10 {
                    break
                }
        
                for _, s := range n[k] {
                    if counter >= 10 {
                        break
                    }
                    counter++
                    fmt.Printf("{"command":"%s","count":%d},n", s, k)
                }
            }
            fmt.Println("];")
        }
            
        • The utility expects file paths to your Bash and Zsh data sets as arguments.
        • It will then read the files and find the 10 most popular commands and output it as JSON formatted data.
        • Several Golang standard library packages are used in the utility to perform operations liking reading files, using regular expressions, and sorting collections.
      2. Run the command line utility and pass in the paths to each command history data set:

        go run cToJSON.go data-set/data-1 data-set/data-2
        

        Your output should resemble the following:

          
        [
        {"command":"ll","count":1832},
        {"command":"git","count":1567},
        {"command":"cd","count":982},
        {"command":"brew","count":926},
        {"command":"unison","count":916},
        {"command":"gdf","count":478},
        {"command":"ssh","count":474},
        {"command":"rm","count":471},
        {"command":"sync","count":440},
        {"command":"ls","count":421},
        ];
            
        

        You are now ready to create your pie chart visualization and embed it into a web page using D3.js

      3. Create an HTML file named pieChart.html and copy and paste the following content. The DATA variable on line 31 contains the JSON data that was created by the cToJSON.go script in the previous step. Remove the JSON data in the example and replace it with your own JSON data.

        Note

        In this example, your JSON data is hardcoded in pieChart.html for simplicity. Web browser security constraints restrict how a document or script loaded from one origin can interact with a resource from another origin. However, you may consider using the d3-fetch module to fetch your JSON data from a specific URL.
        ~/pieChart.html
          1
          2
          3
          4
          5
          6
          7
          8
          9
         10
         11
         12
         13
         14
         15
         16
         17
         18
         19
         20
         21
         22
         23
         24
         25
         26
         27
         28
         29
         30
         31
         32
         33
         34
         35
         36
         37
         38
         39
         40
         41
         42
         43
         44
         45
         46
         47
         48
         49
         50
         51
         52
         53
         54
         55
         56
         57
         58
         59
         60
         61
         62
         63
         64
         65
         66
         67
         68
         69
         70
         71
         72
         73
         74
         75
         76
         77
         78
         79
         80
         81
         82
         83
         84
         85
         86
         87
         88
         89
         90
         91
         92
         93
         94
         95
         96
         97
         98
         99
        100
        101
        102
        103
        104
        105
        
        <!DOCTYPE html>
        <html lang="en">
          <head>
            <meta charset="utf-8">
            <title>History Visualization</title>
        
            <style type="text/css">
              * { margin: 0; padding: 0; }
        
              #chart {
                background-color: white;
                font: 14px sans-serif;
                margin: 0 auto 50px;
                width: 600px;
                height: 600px;
              }
              #chart .label{
                fill: #404040;
                font-size: 12px;
              }
            </style>
          </head>
        
          <body>
            <div id="chart"></div>
          </body>
        
          <script src="https://d3js.org/d3.v3.min.js"></script>
          <script type="text/javascript">
        
          var DATA = [
              {"command":"ll","count":1832},
              {"command":"git","count":1567},
              {"command":"cd","count":982},
              {"command":"brew","count":926},
              {"command":"unison","count":916},
              {"command":"gdf","count":478},
              {"command":"ssh","count":474},
              {"command":"rm","count":471},
              {"command":"sync","count":440},
              {"command":"ls","count":421}
              ];
        
            var width  = 600;
                height = 600;
                radius = width / 2.5;
        
            var pie = d3.layout.pie()
                        .value(function(d) { return d.count; })
        
            var pieData = pie(DATA);
            var color = d3.scale.category20();
        
            var arc = d3.svg.arc()
                        .innerRadius(0)
                        .outerRadius(radius - 7);
        
            var svg = d3.select("#chart").append("svg")
                        .attr("width", width)
                        .attr("height", height)
                        .append("g")
                        .attr("transform", "translate(" + width / 2 + "," + height / 2 + ")");
        
            var ticks = svg.selectAll("line")
                           .data(pieData)
                           .enter()
                           .append("line");
        
            ticks.attr("x1", 0)
                 .attr("x2", 0)
                 .attr("y1", -radius+4)
                 .attr("y2", -radius-2)
                 .attr("stroke", "black")
                 .attr("transform", function(d) {
                   return "rotate(" + (d.startAngle+d.endAngle)/2 * (180/Math.PI) + ")";
                 });
        
            var labels = svg.selectAll("text")
                            .data(pieData)
                            .enter()
                            .append("text");
        
            labels.attr("class", "label")
                  .attr("transform", function(d) {
                     var dist   = radius + 25;
                         angle  = (d.startAngle + d.endAngle) / 2;
                         x      = dist * Math.sin(angle);
                         y      = -dist * Math.cos(angle);
                     return "translate(" + x + "," + y + ")";
                   })
                  .attr("dy", "0.35em")
                  .attr("text-anchor", "middle")
                  .text(function(d){
                    return d.data.command + " (" + d.data.count + ")";
                  });
        
            var path = svg.selectAll("path")
                          .data(pieData)
                          .enter()
                          .append("path")
                          .attr("fill", function(d, i) { return color(i); })
                          .attr("d", arc);
          </script>
        </html>
        
      4. Navigate to your preferred browser and enter the HTML file’s absolute path to view the pie chart. For a macOS user that has stored the HTML file in their home directory, the path would resemble the following: /Users/username/pieChart.html

        JS Pie Chart

      Next Steps

      Now that you are familiar with some data visualization tools and simple techniques, you can begin to explore more sophisticated approaches using the same tools explored in this guide. Here are a few ideas you can consider:

      • Create a new data set by extracting all git related commands from your history files; analyze and visualize them.
      • Automate some of the techniques discussed in this guide using Cron jobs to generate your data sets automatically.
      • Explore the Python for Data Science eBook’s data visualization section for a deeper dive into using pandas.

      Find answers, ask questions, and help others.

      This guide is published under a CC BY-ND 4.0 license.



      Source link