

      How To Install and Use GoAccess Web Log Analyzer on Ubuntu 20.04


      The author selected the Internet Archive to receive a donation as part of the Write for Donations program.

      Introduction

      GoAccess is a tool for monitoring web server logs in real time. It’s written in C and uses the popular ncurses library for its dashboard interface, which can be accessed directly from the command line.

      This is great because you’re able to SSH into any web server you control and view or analyze relevant statistics quickly and securely. Apart from the command-line dashboard interface, it’s also capable of displaying the statistics in other formats such as HTML, JSON, and CSV, which you can use in other contexts or share with others.

      GoAccess could also be a great alternative to client-side analytics tools depending on your needs. It analyzes your server logs directly, so you don’t need to load any additional scripts, and your data is completely under your control.

      In this tutorial, you’ll install and configure GoAccess for Apache on an Ubuntu 20.04 web server. You’ll access the Apache log files with GoAccess before reviewing the modules available and navigation shortcuts on the command-line interface.

      Prerequisites

      For this tutorial, you’ll need the following:

      Step 1 — Installing GoAccess

      In this step you’ll install the GoAccess tool and its dependencies.

      Start by ensuring that the package database and system are up to date:

      • sudo apt update
      • sudo apt full-upgrade

      Now it’s time to install GoAccess. A version of the tool is available in the Ubuntu repos, but this is not usually the latest stable version. For example, the latest version of GoAccess at the time of writing is 1.4, while the version available from the Ubuntu 20.04 repos is 1.3.

      To ensure that you have the latest stable version of GoAccess installed on your server, you can compile from source or use the official GoAccess repository on Ubuntu.

      Method 1 — Compiling from source

      First, install the dependencies required to compile GoAccess from source:

      • sudo apt install libncursesw5-dev libgeoip-dev libtokyocabinet-dev build-essential

      This installs the following dependencies:

      • build-essential: installs many packages, including the gcc compilers for C, C++, and other programming languages, and make, which builds GoAccess from its makefile.
      • libncursesw5-dev: installs the ncurses library that GoAccess uses for its command-line interface.
      • libgeoip-dev: includes the necessary files for the GeoIP library.
      • libtokyocabinet-dev: provides database dependencies for higher performance.

      Next, download the latest version of GoAccess from the official website with the following command:

      • wget http://tar.goaccess.io/goaccess-1.4.tar.gz

      Once the download completes, extract the archive with:

      • tar -xzvf goaccess-1.4.tar.gz

      Change into the newly unpacked directory like this:

      • cd goaccess-1.4/

      Run the configure script found inside this directory:

      • ./configure --enable-utf8 --enable-geoip=legacy

      The --enable-utf8 flag ensures GoAccess compiles with wide character support, while --enable-geoip enables GeoLocation support with the original GeoIP databases. You can replace legacy with mmdb to use the enhanced GeoIP2 databases instead. You can find other configuration options on the GoAccess website.

      You’ll receive output similar to the following:

      Output

      . . .
      Your build configuration:

        Prefix         : /usr/local
        Package        : goaccess
        Version        : 1.4
        Compiler flags : -pthread
        Linker flags   : -lnsl -lncursesw -lGeoIP -lpthread
        UTF-8 support  : yes
        Dynamic buffer : no
        Geolocation    : GeoIP Legacy
        Storage method : In-Memory with On-Disk Persitance Storage
        TLS/SSL        : no
        Bugs           : hello@goaccess.io

      Run make to compile GoAccess from the makefile that the configure script generated:

      • make

      Finally, install the compiled program to your system:

      • sudo make install

      Ensure that the program was installed successfully by running:

      • goaccess --version

      You will receive the following output:

      Output

      GoAccess - 1.4.
      For more details visit: http://goaccess.io
      Copyright (C) 2009-2020 by Gerardo Orellana

      Build configure arguments:
        --enable-utf8
        --enable-geoip=legacy

      Method 2 — Using the Official GoAccess Repos

      Another way to install GoAccess is by using the official GoAccess repository for Ubuntu. This method is preferable if you’d like the program to be updated automatically during system upgrades without having to compile from source for each new release. You need to add the repository to your server first:

      • echo "deb http://deb.goaccess.io/ $(lsb_release -cs) main" | sudo tee -a /etc/apt/sources.list.d/goaccess.list

      The $(lsb_release -cs) substitution inserts your Ubuntu release codename (focal on Ubuntu 20.04), and tee -a appends the resulting line to the file /etc/apt/sources.list.d/goaccess.list.

      With the repository in your sources list, you can now download the GPG key to verify the signature:

      • wget -O - https://deb.goaccess.io/gnugpg.key | sudo apt-key --keyring /etc/apt/trusted.gpg.d/goaccess.gpg add -

      Next, update the package database with the following command:

      • sudo apt update

      Finally, install GoAccess:

      • sudo apt install goaccess

      GoAccess is now installed on your Ubuntu server. In the next step, you’ll access and edit its configuration file so that you can make changes to how the program runs.

      Step 2 — Editing the GoAccess Configuration

      GoAccess comes with a configuration file where you can make permanent changes to the behavior of the program. You’ll edit this file to specify the time, date, and log format so that GoAccess knows how to parse the server logs.

      The configuration file may be located at ~/.goaccessrc or %sysconfdir%/goaccess.conf, where %sysconfdir% is either /etc/, /usr/etc/, or /usr/local/etc/. To find out where the config file is located on your server, run the following command:

      • goaccess --dcf

      Sample output

      /etc/goaccess/goaccess.conf

      Edit this config file using nano:

      • sudo nano /etc/goaccess/goaccess.conf

      Note: If this file does not exist on the server, create it first and populate it with the contents of the goaccess.conf file on GitHub.

      Many of the lines in the file are commented out. To enable an option, remove the first # character in front of it. Let’s enable the time-format setting for Apache first. This setting specifies the log-format time and allows GoAccess to parse any plain-text Apache log files that meet the supported formatting criteria.

      /etc/goaccess/goaccess.conf

      # The following time format works with any of the
      # Apache/NGINX's log formats below.
      #
      time-format %H:%M:%S
      

      Next, you’ll uncomment the Apache date-format setting that specifies the log-format date:

      /etc/goaccess/goaccess.conf

      # The following date format works with any of the
      # Apache/NGINX's log formats below.
      #
      date-format %d/%b/%Y
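
To see what these two settings match, here is a minimal sketch that prints a timestamp in the same layout Apache writes inside the square brackets of each access.log line. It assumes GNU date (the default on Ubuntu) and uses the same %d/%b/%Y and %H:%M:%S conversion specifiers as the settings above; the apache_timestamp function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: format a Unix timestamp the way Apache does inside
# "[...]" in access.log. date-format covers the part before the first colon,
# time-format covers the rest. LC_ALL=C keeps %b as an English month name.
apache_timestamp() {
  LC_ALL=C date -u -d "@$1" '+%d/%b/%Y:%H:%M:%S'   # GNU date syntax: -d @SECONDS
}

apache_timestamp 0      # prints 01/Jan/1970:00:00:00
apache_timestamp 3661   # prints 01/Jan/1970:01:01:01
```

In the log-format lines below, [%d:%t %^] joins the two parts with a colon, and %^ skips the time-zone offset that follows.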
      

      Finally, uncomment the log-format setting. Several log-format lines are available, and the one to uncomment depends on how your web server is set up. If you don’t use virtual hosts, uncomment the following log-format line:

      /etc/goaccess/goaccess.conf

      # NCSA Combined Log Format
      log-format %h %^[%d:%t %^] "%r" %s %b "%R" "%u"
      

      Otherwise, if you have virtual hosts set up, uncomment the following line instead:

      /etc/goaccess/goaccess.conf

      # NCSA Combined Log Format with Virtual Host
      log-format %v:%^ %h %^[%d:%t %^] "%r" %s %b "%R" "%u"
      

      At this point, you can save the file and exit the editor. You are now ready to run the GoAccess program and analyze some Apache plain-text log files.
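
If you’d rather script this step than edit the file in nano, a sed sketch like the following can uncomment the same three lines. It assumes GNU sed (for -i) and that the lines appear in your goaccess.conf exactly as shown above, each with a single leading #. The enable_apache_combined function name is hypothetical, and the demo runs on a scratch file rather than the real config.

```shell
#!/usr/bin/env bash
# Hypothetical helper: uncomment the Apache time-format, date-format, and
# NCSA Combined log-format lines in a goaccess.conf-style file, in place.
# Only lines that match exactly (after the leading #) are changed.
enable_apache_combined() {
  sed -i \
    -e 's|^#[[:space:]]*\(time-format %H:%M:%S\)$|\1|' \
    -e 's|^#[[:space:]]*\(date-format %d/%b/%Y\)$|\1|' \
    -e 's|^#[[:space:]]*\(log-format %h %^\[%d:%t %^] "%r" %s %b "%R" "%u"\)$|\1|' \
    "$1"
}

# Try it on a scratch copy before touching the real file.
conf=$(mktemp)
trap 'rm -f "$conf"' EXIT
cat > "$conf" <<'EOF'
#time-format %H:%M:%S
#date-format %d/%b/%Y
#log-format %h %^[%d:%t %^] "%r" %s %b "%R" "%u"
#log-format %v:%^ %h %^[%d:%t %^] "%r" %s %b "%R" "%u"
EOF
enable_apache_combined "$conf"
cat "$conf"   # the virtual host line (last) stays commented out
```

To use it on the server, back up /etc/goaccess/goaccess.conf first and run the function from a root shell. For a virtual hosts setup, change the third expression to match the %v:%^ line instead.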

      Step 3 — Accessing Apache’s Log Files with GoAccess

      The Apache server grants access to your website and keeps an access log for all incoming HTTP traffic. These records, or log files, are stored on the system and can be a valuable source of information about your website’s usage and audience.

      On Ubuntu, the Apache log files are stored in the /var/log/apache2 directory by default. To inspect the contents of this directory, run the following command:

      • ls /var/log/apache2

      Sample output

      access.log error.log other_vhosts_access.log

      If your server has been running for a long time, you may find compressed .gz files in this directory containing past log files as a result of log rotation. The most recent logs are placed in an access.log file. For web servers with virtual hosts, you may have to cd into sub-directories of /var/log/apache2 to locate each host’s log files.

      Let’s go ahead and run GoAccess against the Apache access logs to gain insight into what type of traffic is being handled by the web server. Run the following command to analyze your access.log file with GoAccess:

      • sudo goaccess /var/log/apache2/access.log

      This will launch the GoAccess command-line dashboard.

      GoAccess command-line dashboard interface

      Note: If you see a Log Format Configuration prompt instead, it means that the changes you made to the GoAccess config file in the previous step are not taking effect. Ensure that the config file is in the right place and that you have uncommented the necessary settings.

      As mentioned previously, you will sometimes have several compressed log files on a long-running web server. To run GoAccess on all these files without extracting them first, you can pipe the output of the zcat command to goaccess:

      • zcat /var/log/apache2/access.log.*.gz | goaccess -a
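
The glob above matches only compressed files. On Ubuntu, logrotate typically leaves the most recent rotated file (access.log.1) uncompressed, so a variation worth knowing is zcat -f, which in GNU gzip decompresses .gz input and passes plain files through unchanged. This self-contained sketch demonstrates that behavior on a scratch directory standing in for /var/log/apache2; the file names and contents are made up.

```shell
#!/usr/bin/env bash
# Scratch directory standing in for /var/log/apache2 after a few rotations.
logdir=$(mktemp -d)
trap 'rm -rf "$logdir"' EXIT
echo 'request logged today'     > "$logdir/access.log"
echo 'request logged yesterday' > "$logdir/access.log.1"   # left uncompressed
echo 'request logged last week' > "$logdir/access.log.2"
gzip "$logdir/access.log.2"                                # -> access.log.2.gz

# zcat -f decompresses the .gz file and copies the plain files as-is,
# so a single glob yields one combined stream.
zcat -f "$logdir"/access.log* | wc -l    # prints 3
```

On the server, the same idea applied to the command above would be zcat -f /var/log/apache2/access.log* | goaccess -a, keeping the pipe into goaccess unchanged. If your user can’t read /var/log/apache2, run it from a root shell.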

      Next you’ll learn how to quickly navigate through the dashboard interface with keyboard shortcuts.

      Step 4 — Navigating the Terminal Dashboard

      At the top of the dashboard is a summary of several key metrics. These include total requests for the reporting period, unique visitors, 404 not found errors, requested files, size of the parsed log file, HTTP referrers, name of the log source, time taken to process the log file, and more.

      Summary of dashboard metrics

      Below the top panel, you will find all the available modules which provide more details on the aforementioned metrics and other data points supported by GoAccess. To navigate the interface, use the following keyboard shortcuts:

      • TAB to move forward through the available modules and SHIFT+TAB to move backwards.
      • F5 to refresh the dashboard.
      • g to move to the top of the dashboard screen and G to move to the last item in the dashboard.
      • o or ENTER to expand the selected module.
      • j and k to scroll down and up within the active module.
      • s to display the sort options for the active module.
      • / to search across all modules and n to move to the next match.
      • 0-9 and SHIFT+0 to quickly activate the respective numbered module.
      • ? to view the quick help dialog.
      • q to quit the program.

      Let’s examine each of the available modules on the dashboard next. Each one has a number and a title, and an indication of the total number of lines present. The > character indicates the active panel, which is also reflected at the top of the dashboard.

      Active GoAccess panel demonstration

      Here’s a brief explanation of each of the panels. Each section below corresponds to a panel number and title in the program.

      1 — Unique Visitors per Day

      This panel displays the hits, unique visitors, and cumulative bandwidth for each reported date. Requests that share the same IP address, date, and user agent are counted as a single unique visitor. Web crawlers and spiders are included by default.

      Unique visitors per day panel
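
To make the unique visitor definition concrete, here is a rough approximation in awk that counts distinct (IP address, date, user agent) combinations per day in a Combined Log Format file. It is an illustration only, not how GoAccess computes the panel. It assumes well-formed lines without escaped quotes inside fields, and the function name and sample entries (which use documentation IP ranges) are made up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count unique visitors per day, where a visitor is one
# distinct combination of IP address, date, and user agent.
unique_visitors_per_day() {
  awk '{
    ip  = $1
    day = substr($4, 2, 11)          # "[10/Oct/2020:13:55:36" -> "10/Oct/2020"
    n   = split($0, q, "\"")         # q[2] request, q[4] referrer, q[6] user agent
    ua  = (n >= 6) ? q[6] : "-"
    key = ip SUBSEP day SUBSEP ua
    if (!(key in seen)) { seen[key] = 1; visitors[day]++ }
  }
  END { for (d in visitors) print d, visitors[d] }' "$1" | sort
}

sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
203.0.113.5 - - [10/Oct/2020:13:55:36 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"
203.0.113.5 - - [10/Oct/2020:13:56:01 +0000] "GET /about HTTP/1.1" 200 256 "-" "Mozilla/5.0"
203.0.113.5 - - [10/Oct/2020:14:02:10 +0000] "GET / HTTP/1.1" 200 512 "-" "curl/7.68.0"
198.51.100.7 - - [11/Oct/2020:09:15:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"
EOF
unique_visitors_per_day "$sample"
# 10/Oct/2020 2   (same IP, two different user agents)
# 11/Oct/2020 1
```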

      2 – Requested Files (URLs)

      This panel provides statistics on the most frequently requested non-static files on your web server. It displays the request path, HTTP protocol and method, unique visitors, number of hits, and cumulative bandwidth.

      Requested files

      3 — Static Requests

      This panel provides the same metrics as the previous one, but for static files such as images, CSS, JavaScript, or other file types.

      4 — Not Found URLs (404s)

      This panel displays the same metrics as panels 2 and 3, but for paths that were not found on the server (404s).
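
A quick way to cross-check this panel from the shell is to tally request paths whose status is 404. This awk sketch assumes the default Combined Log Format, where splitting on whitespace puts the request path in field 7 and the status code in field 9; the function name and sample lines are made up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list 404 paths with hit counts, most frequent first.
# Fields (split on whitespace): $7 = request path, $9 = status code.
not_found_urls() {
  awk '$9 == 404 { hits[$7]++ } END { for (p in hits) print hits[p], p }' "$1" | sort -rn
}

sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
192.0.2.10 - - [10/Oct/2020:13:55:36 +0000] "GET /wp-login.php HTTP/1.1" 404 196 "-" "Mozilla/5.0"
192.0.2.10 - - [10/Oct/2020:13:55:37 +0000] "GET /wp-login.php HTTP/1.1" 404 196 "-" "Mozilla/5.0"
192.0.2.11 - - [10/Oct/2020:13:58:02 +0000] "GET /old-page HTTP/1.1" 404 196 "-" "Mozilla/5.0"
192.0.2.12 - - [10/Oct/2020:14:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"
EOF
not_found_urls "$sample"
# 2 /wp-login.php
# 1 /old-page
```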

      5 — Visitor Hostnames and IPs

      This panel provides detailed information on the hosts that connect to your web server. You can find their IP address, the number of visits, and the amount of bandwidth consumed. This is a great way to identify who is eating up all your bandwidth and block them if necessary.

      Visitor hostnames and IPs
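
To spot heavy consumers without opening the dashboard, you can sum response sizes per client IP. This is a minimal sketch assuming the Combined Log Format, where field 1 is the client IP and field 10 is the response size (%b); Apache writes "-" instead of 0 when no body is sent, so those entries are skipped. Unlike GoAccess, it does no reverse DNS or GeoIP lookups. The function name and sample lines are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: total response bytes per client IP, largest first.
# $1 = client IP, $10 = response size; non-numeric sizes ("-") are skipped.
bandwidth_by_host() {
  awk '$10 ~ /^[0-9]+$/ { bytes[$1] += $10 } END { for (h in bytes) print bytes[h], h }' "$1" | sort -rn
}

sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
203.0.113.5 - - [10/Oct/2020:13:55:36 +0000] "GET /video.mp4 HTTP/1.1" 200 5000 "-" "Mozilla/5.0"
203.0.113.5 - - [10/Oct/2020:13:56:01 +0000] "GET /image.png HTTP/1.1" 200 3000 "-" "Mozilla/5.0"
198.51.100.7 - - [10/Oct/2020:14:02:10 +0000] "GET / HTTP/1.1" 200 1200 "-" "Mozilla/5.0"
198.51.100.7 - - [10/Oct/2020:14:05:42 +0000] "GET / HTTP/1.1" 304 - "-" "Mozilla/5.0"
EOF
bandwidth_by_host "$sample"
# 8000 203.0.113.5
# 1200 198.51.100.7
```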

      If you expand this panel by pressing o, you will see more info about each host such as its country of origin, city, and reverse DNS lookup result.

      Visitor hostnames and IPs expanded

      6 — Operating Systems

      This panel reports the different operating systems used by the hosts to connect to your web server. Expanding this panel will display specific versions of each operating system.

      Operating systems

      7 — Browsers

      Similar to the previous panel, this reports the browsers used by each unique visitor to your web server and lists specific versions for each browser once expanded.

      Browsers

      8 — Time distribution

      Here, you will find an hourly report for the number of hits, unique visitors, and bandwidth consumed. This is a great way to spot periods of peak traffic on your server.

      Time distribution panel
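
The hourly breakdown can be approximated by reading the hour out of each timestamp. This sketch counts hits only (not unique visitors or bandwidth) and assumes the Combined Log Format, where field 4 looks like "[10/Oct/2020:13:55:36" and characters 14 to 15 hold the hour as written in the log, in the server’s own time-zone offset. The function name and sample lines are made up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: number of hits per hour of day, in hour order.
# substr($4, 14, 2) extracts "13" from "[10/Oct/2020:13:55:36".
hits_per_hour() {
  awk '{ hits[substr($4, 14, 2)]++ } END { for (h in hits) print h, hits[h] }' "$1" | sort
}

sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
203.0.113.5 - - [10/Oct/2020:13:55:36 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"
203.0.113.5 - - [10/Oct/2020:13:56:01 +0000] "GET /about HTTP/1.1" 200 256 "-" "Mozilla/5.0"
203.0.113.5 - - [10/Oct/2020:14:02:10 +0000] "GET / HTTP/1.1" 200 512 "-" "curl/7.68.0"
198.51.100.7 - - [11/Oct/2020:09:15:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"
EOF
hits_per_hour "$sample"
# 09 1
# 13 2
# 14 1
```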

      9 — Virtual Hosts

      This panel displays the virtual hosts parsed from the log file. It becomes active only if %v is included in the log-format configuration.

      10 — Referrer URLs

      The URLs that referred the visiting hosts to your web server are reflected here. This panel is disabled by default. To enable it, comment out the ignore-panel REFERRERS line in the GoAccess config file so that it reads as follows:

      /etc/goaccess/goaccess.conf

      #ignore-panel VISIT_TIMES
      #ignore-panel VIRTUAL_HOSTS
      #ignore-panel REFERRERS
      #ignore-panel REFERRING_SITES
      

      Referrer URLs panel

      11 — Referring Sites

      This panel displays only the host of each referring site, not the whole URL.

      12 — Keyphrases

      Here, GoAccess reports the keyphrases used on Google search, Google cache, and Google Translate that led visitors to your website. This panel is also disabled by default; enable it by commenting out the ignore-panel KEYPHRASES line in the settings:

      /etc/goaccess/goaccess.conf

      #ignore-panel REFERRERS
      #ignore-panel REFERRING_SITES
      #ignore-panel KEYPHRASES
      #ignore-panel STATUS_CODES
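
Enabling these panels amounts to adding a # in front of the matching ignore-panel line, which can also be scripted. This sketch assumes GNU sed (for -i) and that the config contains the uncommented "ignore-panel NAME" lines described above; the enable_panel function name and the scratch config it edits are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: enable a GoAccess panel by commenting out its
# "ignore-panel NAME" line in a goaccess.conf-style file (GNU sed -i).
# The trailing $ anchor makes NAME match the whole panel name only,
# never a longer name that starts with it.
enable_panel() {
  sed -i "s/^ignore-panel $2\$/#ignore-panel $2/" "$1"
}

conf=$(mktemp)
trap 'rm -f "$conf"' EXIT
printf '%s\n' 'ignore-panel REFERRERS' 'ignore-panel REFERRING_SITES' 'ignore-panel KEYPHRASES' > "$conf"
enable_panel "$conf" REFERRERS
enable_panel "$conf" KEYPHRASES
cat "$conf"
# #ignore-panel REFERRERS
# ignore-panel REFERRING_SITES
# #ignore-panel KEYPHRASES
```

Running it twice is harmless, because an already commented line no longer starts with ignore-panel.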
      

      13 — HTTP Status Codes

      This panel reflects the overall statistics for HTTP status codes returned by your web server when responding to a request. Expanding the panel will display the aggregated stats for each status code.

      HTTP status codes panel
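
For a quick terminal-only view of the same information, this sketch tallies each status code under its class (2xx, 3xx, and so on). It assumes the Combined Log Format, where splitting on whitespace puts the status code in field 9. The function name and sample lines are made up, and the output layout is this example’s own, not GoAccess’s.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "class code count" for every status code seen.
# The class is the first digit of the code followed by "xx".
status_codes() {
  awk '{ n[substr($9, 1, 1) "xx " $9]++ } END { for (k in n) print k, n[k] }' "$1" | sort
}

sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
203.0.113.5 - - [10/Oct/2020:13:55:36 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"
203.0.113.5 - - [10/Oct/2020:13:56:01 +0000] "GET /about HTTP/1.1" 200 256 "-" "Mozilla/5.0"
203.0.113.5 - - [10/Oct/2020:14:02:10 +0000] "GET / HTTP/1.1" 304 - "-" "Mozilla/5.0"
198.51.100.7 - - [11/Oct/2020:09:15:00 +0000] "GET /missing HTTP/1.1" 404 196 "-" "Mozilla/5.0"
EOF
status_codes "$sample"
# 2xx 200 2
# 3xx 304 1
# 4xx 404 1
```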

      14 — Remote User (HTTP Authentication)

      This panel displays the user ID of the person requesting a document on your server, as determined by HTTP authentication. For documents that are not password protected, this field will be a hyphen (-). Note that this panel is only enabled if %e is part of the log-format configuration.

      15 — Cache status

      This panel allows you to determine if a request is being cached and served from the cache. It is enabled if %C is part of the log-format variable, and the status can be MISS, BYPASS, EXPIRED, STALE, UPDATING, REVALIDATED, or HIT.

      16 — Geo Location

      This panel provides a summary of the geographical locations derived from visiting IP addresses. Expanding this panel will display the aggregated stats for each country of origin.

      Geo location panel

      You’ve reviewed the panels available in the dashboard. Next, you’ll generate reports in different formats.

      Step 5 — Generating Reports

      Aside from displaying the data in the terminal, GoAccess also allows you to generate HTML, JSON, or CSV reports. Make sure that you’re in your home directory before running any of the commands in this section:

      • cd ~

      To output the report as static HTML, specify an HTML file as the argument to the -o flag. This flag also accepts filenames that end in .json or .csv.

      • sudo goaccess /var/log/apache2/access.log -o stats.html

      A stats.html file should appear in your home directory. List the directory contents to confirm:

      • ls

      Output

      goaccess-1.4 goaccess-1.4.tar.gz snap stats.html

      You can copy this file to the home directory on your local machine using scp. Run this command from your local machine, and not the remote server:

      • scp user@your_server_ip:stats.html ~/stats.html

      Once the file has been copied over, you can open it in your browser with the open command on macOS:

      • open ~/stats.html

      Or, if you’re using a Linux distribution on your local machine:

      • xdg-open ~/stats.html

      HTML report in Firefox

      You’ve generated an HTML report and viewed it in your browser.

      Conclusion

      In this article, we covered the GoAccess command-line tool and discussed how to use it for analyzing server logs. Although we only considered how GoAccess may be used with Apache logs, the tool also supports other log formats such as Nginx, Amazon S3, Elastic Load Balancing, and CloudFront.

      You can check the full GoAccess documentation or run man goaccess in your terminal.


