
      An Overview of Open Source Data Visualization Tools


      Updated by Linode. Contributed by Mihalis Tsoukalos.

      Creating graphic visualizations for a data set is a powerful way to derive meaning from vast amounts of information. It provides a way to extract meaningful relationships between different aspects of your data depending on how the data is mapped and which graphic representations are chosen. Data visualization is a common practice in many sectors, including various scientific disciplines, business settings, the government sector, and education.

      There are many open source tools available to create sophisticated data visualizations for complex data sets. This guide will provide an introductory exploration of data analysis and 2D graphics rendering packages that can be used with R, Python, and JavaScript to generate data visualizations.

      In this guide you will complete the following steps:

      1. Create data sets from your Bash and Zsh history files.
      2. Visualize your data with R and RStudio.
      3. Create a word cloud using Python.
      4. Visualize your data using D3.js.

      Before You Begin

      Ensure you have the following programs and packages installed on your computer:

      1. The R programming language
      2. The RStudio Desktop application
      3. Python 3.7.0 or higher
        1. pandas Python package
        2. Matplotlib Python package
        3. wordcloud Python package
      4. The Golang programming language
      5. Perl (preinstalled on most Linux and macOS systems)

      Assumptions

      This guide assumes you have some basic familiarity with the following concepts and skills:

      1. Basic programming principles and data structures
      2. The ability to read code written in Go, Perl, Python, HTML, CSS, and JavaScript

      Create Your Data Sets

      In this section, you will create a data set using the contents of your Bash history file and optionally, your Zsh history file. You will then create a third data set using a Perl script that will extract information from the first two data sets. In the Create Visualizations for your Data section of the guide, you will use these various data sets to create corresponding visualizations.

      Data Set 1 – Bash History File

      A Bash history file stores the commands executed in your command line interpreter. View the first 10 commands in the file with the following command:

      head ~/.bash_history
      

      Your output should resemble the following:

        
      git commit -a -m "Fixed Constants links"
      git push
      git diff
      docker images
      brew update; brew upgrade; brew cleanup
      git commit -a -m "Cleanup v2 and v3"
      git push
      git commit -a -m "Added examples section"
      git push
      cat ~/.lenses/lenses-cli.yml
      
      

      Create a new directory named data-sets to store your data and copy your Bash history file to the directory:

      mkdir data-sets && cp ~/.bash_history data-sets/data-1
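
      To verify the copy, you can count the commands in the new data set with wc (a quick sanity check; the exact count will differ on your system):

      wc -l data-sets/data-1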
      

      Data Set 2 – Zsh History File

      If you are using the Zsh shell interpreter, you can use its history file as a second data set. Zsh’s history file format can include extra metadata, such as timestamps, that you will need to exclude from your data set. Use AWK to clean up your Zsh history file and save the output to a new file in the data-sets directory:

      awk -F ";" '{$1=""; print $0}' ~/.zsh_history | sed -e "s/^[ \t]*//" -e "/^$/d" > data-sets/data-2
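
      For reference, an entry in a Zsh extended history file resembles the following (the timestamp is illustrative); the pipeline above keeps only the command portion after the semicolon:

      : 1556378126:0;git push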
      

      Data Set 3 – Perl Script

      To create your third data set, you will use a Perl script that categorizes the contents of your data set files. The categorization is based on the length, in characters, of each command stored in your Bash and Zsh history files. The script creates 5 categories of command lengths: 1 – 2 characters, 3 – 5 characters, 6 – 10 characters, 11 – 15 characters, and 16 or more characters.

      1. Create a file named command_length.pl in your home directory with the following content:

        ~/command_length.pl
        
        #!/usr/bin/perl -w

        use strict;

        my $directory = "";
        my $filename = "";

        # Counters for the five command-length categories
        my $CAT1 = 0;
        my $CAT2 = 0;
        my $CAT3 = 0;
        my $CAT4 = 0;
        my $CAT5 = 0;

        if ( @ARGV != 1 ) {
           die <<Thanatos
              usage info:
                 Please use exactly 1 argument!
        Thanatos
        }

        ($directory) = @ARGV;
        opendir(BIN, $directory)
            || die "Error opening directory $directory: $!\n";

        while (defined ($filename = readdir BIN) ) {
            # Skip the . and .. directory entries
            next if( $filename =~ /^\.\.?$/ );
            process_file($directory."/".$filename);
        }

        # Print a header row so that R's read.table(header=TRUE) can use
        # column names, followed by the totals for each category
        print "\"Category Name\" \"Number of Times\"\n";
        print "Category1\t\t$CAT1\n";
        print "Category2\t\t$CAT2\n";
        print "Category3\t\t$CAT3\n";
        print "Category4\t\t$CAT4\n";
        print "Category5\t\t$CAT5\n";
        exit 0;

        sub process_file {
            my $file = shift;
            my $line = "";

            open (HISTORYFILE, "< $file")
                || die "Cannot open $file: $!\n";

            while (defined($line = <HISTORYFILE>)) {
                chomp $line;
                next if ( ! defined($line) );
                check_category($line);
            }
        }

        sub check_category {
            my $command = shift;
            chomp $command;
            my $length = length($command);

            if ( $length <= 2 ) { $CAT1 ++; }
            elsif ( $length <= 5 ) { $CAT2 ++; }
            elsif ( $length <= 10 ) { $CAT3 ++; }
            elsif ( $length <= 15 ) { $CAT4 ++; }
            else { $CAT5 ++; }
        }
            
      2. Run the Perl script. The script expects a single argument: the directory that holds the data files you want to process. The script’s output is saved to a new file, command_categories.txt, which is used later in this guide to create visualizations with R.

        ./command_length.pl data-sets > ~/command_categories.txt
        

        Note

        Your Perl script must be executable in order to run. To add these permissions, execute the following command:

        chmod +x command_length.pl
        

        Open the command_categories.txt file to view the categorizations created by your Perl script. Your file should resemble the following example:

        command_categories.txt
        
        "Category Name" "Number of Times"
        Category1 5514
        Category2 2381
        Category3 2624
        Category4 2021
        Category5 11055
            

      You now have three sources of data that you can use to explore data visualization tools in the next sections.

      • ~/data-sets/data-1
      • ~/data-sets/data-2
      • ~/command_categories.txt

      Create Visualizations for your Data

      In this section, you will use the data sets you created in the previous section to generate visualizations.

      Visualize your Data with R and RStudio

      R is a specialized programming language used for statistical computing and graphics. It is especially good for creating high quality graphs, like density plots, line charts, pie charts, and scatter plots. RStudio is an integrated development environment (IDE) for R that includes debugging and plotting tools that make it easy to write, debug, and run R scripts.

      In this section, you will use the command_categories.txt file created in the Data Set 3 – Perl Script section and RStudio to create a pie chart and simple spreadsheet of your data.

      1. Open RStudio Desktop and create a data frame. In R, a data frame is a table similar to a two-dimensional array.

        DATA <- read.table("~/command_categories.txt", header=TRUE)
        
        • This command will read the command_categories.txt file that was created in the Data Set 3 – Perl Script section of the guide and create a data frame from it that is stored in the DATA variable.

        • The header=TRUE argument indicates that the file’s first row contains variable names for column values. This means that Category Name and Number of Times will be used as variable names for the two columns of values in command_categories.txt.

      2. Next, create a pie chart visualization of your data using R’s pie() function.

        pie(DATA$Number.of.Times, DATA$Category.Name)
        
        • The function’s first argument, DATA$Number.of.Times, provides the numeric vector of values used to size each slice of the pie chart.

        • The second argument, DATA$Category.Name, provides the labels for each pie slice.

        RStudio will display a pie chart visualization of your data in the Plots and Files window similar to the following:

        Pie Chart

      3. Visualize your data in a spreadsheet style format using R’s View() function.

        View(DATA, "Command Lengths")
        

        RStudio will display a spreadsheet style viewer in a new window pane tab.

        Spreadsheet

      Explore R’s graphics package to discover additional functions you can use to create more complex visualizations for your data with RStudio.
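
      For example, the same DATA frame can be rendered as a bar chart with the graphics package’s barplot() function. This is a minimal sketch; the title and axis label are illustrative:

        # Render the category totals as a bar chart, one bar per category
        barplot(DATA$Number.of.Times,
                names.arg = DATA$Category.Name,
                main = "Command Lengths",
                ylab = "Number of Times")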

      Create a Word Cloud using Python

      Word clouds depict text data using varying font sizes and colors to visually demonstrate the frequency and relative importance of each word. A common application for word clouds is to visualize tag or keyword relevancy. In this section you will use Python 3 and your Bash and Zsh history files to generate a word cloud of all your shell commands. The Python packages listed in the Before You Begin section are commonly used in programs for data analysis and scientific computing, and you will use them to generate your word cloud.

      1. Create a file named create_wordcloud.py in your home directory with the following content:

        ~/create_wordcloud.py
        
        #!/usr/bin/env python3

        import pandas as pd
        import matplotlib.pyplot as plt
        import sys
        import os
        from wordcloud import WordCloud, STOPWORDS
        from random import randrange

        path = sys.argv[1]

        if os.path.exists(path):
            print("Creating word cloud for file: " + path)

            # Read the history file into a single-column data frame named CMD
            data = pd.read_table(path, header = None, names=["CMD"], encoding = "iso-8859-1")

            skipWords = []
            skipWords.extend(STOPWORDS)

            # Join the commands with spaces so that words do not run together
            words = ' '.join(data['CMD'])

            w = WordCloud(
                    stopwords=set(skipWords),
                    background_color='gray',
                    width=4000,
                    height=2000
                    ).generate(words)
            # Use a random suffix so repeated runs do not overwrite earlier images
            filename = "word_cloud_" + str(randrange(100)) + ".png"
            print("Your word cloud's filename is: " + filename)

            plt.imshow(w)
            plt.axis("off")
            plt.savefig(filename, dpi=1000)

        else:
            print("File " + path + " does not exist")
            
      2. Run your Python script and pass the path of one of your data set files as an argument. The script reads the contents of the file using pandas’ read_table() function and converts it into a data frame with a column named CMD. It then joins the data in the CMD column into a single string that is passed to wordcloud to generate a .png word cloud image.

        ./create_wordcloud.py ~/data-sets/data-1
        

        You should see a similar output from the Python script:

          
        Creating word cloud for file: /Users/username/data-sets/data-1
        Your word cloud's filename is: word_cloud_58.png
            
        
      3. Open the word_cloud_58.png image file to view your word cloud.

        Word Cloud

      You could use a similar process to create a word cloud visualization for any text you want to analyze.
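
      As a minimal sketch, the following stripped-down version builds a word cloud from any plain text file (the notes.txt path is a hypothetical example):

        #!/usr/bin/env python3

        import matplotlib.pyplot as plt
        from wordcloud import WordCloud

        # Read any text file you want to analyze (hypothetical path)
        text = open("notes.txt", encoding="utf-8").read()

        w = WordCloud(width=2000, height=1000).generate(text)
        plt.imshow(w)
        plt.axis("off")
        plt.savefig("notes_word_cloud.png", dpi=300)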

      Visualize Data using D3.js

      D3.js is a JavaScript library that helps you visualize JSON formatted data using HTML, SVG, and CSS. In this section you will use D3.js to create and embed a pie chart visualization into a web page.

      To convert your data set into JSON, you will create a Golang command line utility that prints JSON formatted plain text output. For more complex data sets, you might consider building a similar command line utility around Golang’s encoding/json package instead of formatting the JSON by hand.
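
      A minimal sketch of that approach, using encoding/json to serialize a slice of structs (the Command type and sample values are illustrative):

        package main

        import (
            "encoding/json"
            "fmt"
        )

        // Command mirrors one entry of the pie chart data
        type Command struct {
            Command string `json:"command"`
            Count   int    `json:"count"`
        }

        func main() {
            top := []Command{{"ll", 1832}, {"git", 1567}}
            out, err := json.MarshalIndent(top, "", "  ")
            if err != nil {
                fmt.Println(err)
                return
            }
            fmt.Println(string(out))
        }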

      1. Create a file named cToJSON.go in your home directory with the following content:

        ~/cToJSON.go
        
        package main

        import (
            "bufio"
            "flag"
            "fmt"
            "io"
            "os"
            "regexp"
            "sort"
        )

        var DATA = make(map[string]int)

        func lineByLine(file string) error {
            f, err := os.Open(file)
            if err != nil {
                return err
            }
            defer f.Close()

            r := bufio.NewReader(f)
            for {
                line, err := r.ReadString('\n')
                if err == io.EOF {
                    break
                } else if err != nil {
                    fmt.Printf("error reading file %s", err)
                    break
                }

                // Split the line into whitespace-separated words
                re := regexp.MustCompile(`[^\s]+`)
                words := re.FindAllString(line, -1)
                if len(words) == 0 {
                    continue
                }

                // Count occurrences of the first word (the command name)
                DATA[words[0]]++
            }
            return nil
        }

        func main() {
            flag.Parse()
            if len(flag.Args()) == 0 {
                fmt.Printf("usage: cToJSON <file1> [<file2> ...]\n")
                return
            }

            for _, file := range flag.Args() {
                err := lineByLine(file)
                if err != nil {
                    fmt.Println(err)
                }
            }

            // Invert the map so commands are grouped by their counts
            n := map[int][]string{}
            var a []int
            for k, v := range DATA {
                n[v] = append(n[v], k)
            }

            for k := range n {
                a = append(a, k)
            }

            fmt.Println("[")
            sort.Sort(sort.Reverse(sort.IntSlice(a)))

            // Print the 10 most frequently used commands as JSON objects
            counter := 0
            for _, k := range a {
                if counter >= 10 {
                    break
                }

                for _, s := range n[k] {
                    if counter >= 10 {
                        break
                    }
                    counter++
                    fmt.Printf("{\"command\":\"%s\",\"count\":%d},\n", s, k)
                }
            }
            fmt.Println("];")
        }
            
        • The utility expects the file paths of your Bash and Zsh data sets as arguments.
        • It reads the files, finds the 10 most frequently used commands, and outputs them as JSON formatted data.
        • Several Golang standard library packages are used in the utility to perform operations like reading files, matching regular expressions, and sorting collections.
      2. Run the command line utility and pass in the paths to each command history data set:

        go run cToJSON.go data-sets/data-1 data-sets/data-2
        

        Your output should resemble the following:

          
        [
        {"command":"ll","count":1832},
        {"command":"git","count":1567},
        {"command":"cd","count":982},
        {"command":"brew","count":926},
        {"command":"unison","count":916},
        {"command":"gdf","count":478},
        {"command":"ssh","count":474},
        {"command":"rm","count":471},
        {"command":"sync","count":440},
        {"command":"ls","count":421},
        ];
            
        

        You are now ready to create your pie chart visualization and embed it into a web page using D3.js.

      3. Create an HTML file named pieChart.html and copy and paste the following content. The DATA variable near the top of the script element contains the JSON data that was created by the cToJSON.go utility in the previous step. Remove the JSON data in the example and replace it with your own JSON data.

        Note

        In this example, your JSON data is hardcoded in pieChart.html for simplicity, because web browser security constraints restrict how a document or script loaded from one origin can interact with a resource from another origin. If you serve your data over HTTP, you can instead use the d3-fetch module to fetch the JSON data from a URL.
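
        A minimal sketch of that approach, assuming D3 v5 or later (the page below loads v3) and a commands.json file served from the same origin:

          // Fetch the JSON data, then build the chart inside the callback
          d3.json("commands.json").then(function(DATA) {
            // Build the pie chart here, exactly as in the pieChart.html script below
          });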
        ~/pieChart.html
        
        <!DOCTYPE html>
        <html lang="en">
          <head>
            <meta charset="utf-8">
            <title>History Visualization</title>
        
            <style type="text/css">
              * { margin: 0; padding: 0; }
        
              #chart {
                background-color: white;
                font: 14px sans-serif;
                margin: 0 auto 50px;
                width: 600px;
                height: 600px;
              }
              #chart .label{
                fill: #404040;
                font-size: 12px;
              }
            </style>
          </head>
        
          <body>
            <div id="chart"></div>
          </body>
        
          <script src="https://d3js.org/d3.v3.min.js"></script>
          <script type="text/javascript">
        
          var DATA = [
              {"command":"ll","count":1832},
              {"command":"git","count":1567},
              {"command":"cd","count":982},
              {"command":"brew","count":926},
              {"command":"unison","count":916},
              {"command":"gdf","count":478},
              {"command":"ssh","count":474},
              {"command":"rm","count":471},
              {"command":"sync","count":440},
              {"command":"ls","count":421}
              ];
        
            var width  = 600,
                height = 600,
                radius = width / 2.5;
        
            var pie = d3.layout.pie()
                        .value(function(d) { return d.count; })
        
            var pieData = pie(DATA);
            var color = d3.scale.category20();
        
            var arc = d3.svg.arc()
                        .innerRadius(0)
                        .outerRadius(radius - 7);
        
            var svg = d3.select("#chart").append("svg")
                        .attr("width", width)
                        .attr("height", height)
                        .append("g")
                        .attr("transform", "translate(" + width / 2 + "," + height / 2 + ")");
        
            var ticks = svg.selectAll("line")
                           .data(pieData)
                           .enter()
                           .append("line");
        
            ticks.attr("x1", 0)
                 .attr("x2", 0)
                 .attr("y1", -radius+4)
                 .attr("y2", -radius-2)
                 .attr("stroke", "black")
                 .attr("transform", function(d) {
                   return "rotate(" + (d.startAngle+d.endAngle)/2 * (180/Math.PI) + ")";
                 });
        
            var labels = svg.selectAll("text")
                            .data(pieData)
                            .enter()
                            .append("text");
        
            labels.attr("class", "label")
                  .attr("transform", function(d) {
                     var dist   = radius + 25,
                         angle  = (d.startAngle + d.endAngle) / 2,
                         x      = dist * Math.sin(angle),
                         y      = -dist * Math.cos(angle);
                     return "translate(" + x + "," + y + ")";
                   })
                  .attr("dy", "0.35em")
                  .attr("text-anchor", "middle")
                  .text(function(d){
                    return d.data.command + " (" + d.data.count + ")";
                  });
        
            var path = svg.selectAll("path")
                          .data(pieData)
                          .enter()
                          .append("path")
                          .attr("fill", function(d, i) { return color(i); })
                          .attr("d", arc);
          </script>
        </html>
        
      4. Navigate to your preferred browser and enter the HTML file’s absolute path to view the pie chart. For a macOS user who has stored the HTML file in their home directory, the path would resemble the following: /Users/username/pieChart.html

        JS Pie Chart

      Next Steps

      Now that you are familiar with some data visualization tools and simple techniques, you can begin to explore more sophisticated approaches using the same tools explored in this guide. Here are a few ideas you can consider:

      • Create a new data set by extracting all git related commands from your history files; analyze and visualize them (see the sketch after this list).
      • Automate some of the techniques discussed in this guide using cron jobs to generate your data sets automatically.
      • Explore the Python for Data Science eBook’s data visualization section for a deeper dive into using pandas.
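
      For the first two ideas, a starting point might look like the following (a minimal sketch; the paths assume the data-sets directory created earlier, and the cron schedule is illustrative):

        # Extract all commands that start with git into a new data set
        grep '^git' ~/data-sets/data-1 > ~/data-sets/git-commands

        # A crontab entry that refreshes the Bash data set every day at midnight
        0 0 * * * cp ~/.bash_history ~/data-sets/data-1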


      This guide is published under a CC BY-ND 4.0 license.




      The Flagship Series: INAP Atlanta Data Center Market Overview


      Atlanta is the Southeast’s leading economic hub. It may be the ninth largest metro area in the U.S., but it ranks in the top five markets for bandwidth access, is in the top three for headquarters of Fortune 500 companies and is home to the world’s busiest airport[i]. It should be no surprise that Atlanta is also a popular and growing destination for data centers.

      There’s plenty about Atlanta that is attractive to businesses looking for data center space. Atlanta is favorable for its low power costs, low risk for natural disasters and pro-business climate. The major industries in Atlanta include technology, mobility and IoT, bioscience, supply chain and manufacturing.

      Low power and high connectivity are strong factors driving data center growth in this area. On average, the cost of power is under five cents per kilowatt-hour, a cost even more favorable than the 5.2 cents per kWh in Northern Virginia—the U.S.’s No. 1 data center market. Atlanta is on an Integrated Transmission System (ITS) for power, meaning all providers have access to the same grid, ensuring that transmission is efficient and reliable, according to Georgia Transmission Corp.

      As noted in the Newmark Knight Frank 2Q18 Atlanta Data Center Market Trends report, the City of Atlanta is invested in becoming a sustainable smart city. Plans include a Smart City Command Center at Georgia Tech’s High Performance Computing Center and a partnership with Google Fiber.

      Economic incentives also aid Atlanta’s data center market growth. According to Bisnow, legislators in Georgia passed House Bill 696, giving “data center operators a sales and use tax break on the expensive equipment installed in their facilities.” This incentive extends to 2028 and providers have taken notice. 451 Research noted that if all the providers who have declared plans to build follow through, Atlanta will overtake Dallas for the No. 3 spot for data center growth.

      Considering Atlanta for a colocation, network or cloud solution? There are several reasons why we’re confident you’ll call INAP your future partner in this growing market.

      The INAP Atlanta Data Center Difference

      With a choice of providers in this thriving metro, why choose INAP?

      Our Atlanta Data Centers—one located downtown and the other on the perimeter of the city—offer a reliable, high-performing backbone connection to Washington, D.C. and Dallas through our private fiber. INAP customers in these flagship data centers avoid single points of failure with our high-capacity metro network rings. Metro Connect provides multiple points of egress for traffic and is Performance IP® enabled. The metro rings are built on dark fiber and use state-of-the-art networking gear from Ciena.

      Atlanta Data Centers ACS Building
      Our downtown Atlanta flagship data center.

      These flagship data centers offer cages, cabinets and private suites, all housed in facilities designed with Tier 3 compliant attributes. For customers looking to manage large-scale deployments, Atlanta is a great fit—our private colocation suites give them the flexibility they need. And for customers looking to reduce their footprint, our high-density data centers allow them to fit more gear into a smaller space.

      To find a best-fit configuration within the data center, customers can work with INAP’s expert support technicians to adopt the optimal architecture specific to the needs of their applications. Engineers are onsite in the Atlanta data centers and are dedicated to keeping your infrastructure online, secure and always operating at peak efficiency.


      INAP’s extensive product portfolio supports businesses and allows customers to customize their infrastructure environments.

      At a glance, our Atlanta Data Centers feature:

      • Power: 8 MW of power capacity, 20+ kW per cabinet
      • Space: Over 200,000 square feet of leased space with 45,000 square feet of raised floor
      • Facilities: Tier 3 compliant attributes, located outside of flood plain and seismic zones
      • Energy Efficient Cooling: 4,205 tons of cooling capacity, N+1 with concurrent maintainability
      • Security: 24/7/365 onsite staff, video surveillance, key card and biometric authentication
      • Network: INAP Performance IP® mix, carrier-neutral connectivity, geographic redundancy
      • Compliance: PCI DSS, HIPAA, SOC 2 Type II, Green Globes and ENERGY STAR

      Download the Atlanta Data Center spec sheet here [PDF].

      INAP’s Content Delivery Network Performs with 18 Edge Locations

      Atlanta is one of INAP’s 18 high-capacity CDN edge locations. Our Content Delivery Network substantially improves end users’ online experience for our customers, regardless of their distance from the content’s origin server. We also have 100 global POPs to support this network.

      Content is delivered along the lowest-latency path—from the origin server, to the edge, to the user—using INAP’s proven and patented route-optimization technology. We also use GeoDNS technology, which identifies a user’s latitude and longitude and directs requests to the nearest INAP CDN cache for seamless delivery.

      INAP’s CDN also gives customers control of all aspects of content caching at their CDN edges—automated purges, cache warming, cache bypass, connection limits, request disabling, URL token stripping and much more.

      Learn more about INAP’s CDN here.

      [i] Newmark Knight Frank, 2Q18 – Atlanta Data Center Market Trends


      Laura Vietmeyer




