      Inspecting Network Information with netstat


      Updated by Linode

      Contributed by Mihalis Tsoukalos

      The netstat command line utility shows information about the network status of a workstation or server. netstat is available on Unix-like and Windows operating systems, with some differences in its usage between these systems.

      netstat is an older utility, and some components of its functionality have been superseded by newer tools, like the ss command. A primary benefit of using netstat is that it is frequently pre-installed on Linux systems, while other tools might not be. Additionally, many (but not all) of netstat's command line options can be run without root privileges, so it can still be useful on a system where you do not have root or sudo access.


      Assumptions

      This guide assumes some basic knowledge of networking in Linux, including network interfaces, routing tables, and network connections and sockets.

      In This Guide

      This guide explores the options available when running netstat on Linux. netstat can be used to inspect:

      - Network connections and sockets
      - Routing tables
      - Network interfaces
      - Per-protocol network statistics
      - Multicast group memberships

      A list of the command line options can be found below, and some advanced examples of using netstat with the awk command are introduced at the end of the guide.

      Note

      This guide is written for a non-root user. Depending on your configuration, some commands might require the help of sudo in order to properly execute. If you are not familiar with the sudo command, see the Users and Groups guide.

      Basic Usage

      Installing netstat

      If netstat is not present on your Linux server or workstation, it can be added by installing the net-tools package:

      sudo apt install net-tools # Debian-based systems
      sudo yum install net-tools # CentOS and RHEL systems
      

      Running netstat without Any Options

      If you execute netstat without any command line arguments and options, the utility will display all open sockets and network connections, formatted in two tables. This will most likely be a relatively long list:

      netstat
      
        
      Active Internet connections (w/o servers)
      Proto Recv-Q Send-Q Local Address           Foreign Address         State
      tcp        0      0 li140-253.members.l:ssh 185.232.67.121:43556    TIME_WAIT
      tcp        0      0 li140-253.members.:smtp 37.252.14.141:64553     SYN_RECV
      tcp        0      0 li140-253.members.l:ssh 37.252.14.141:43860     SYN_RECV
      tcp        0      0 li140-253.members.:smtp 37.252.14.141:44909     SYN_RECV
      tcp        0      0 li140-253.members.l:ssh ppp-2-86-7-61.hom:54757 ESTABLISHED
      tcp        0      0 li140-253.members.l:ssh 37.252.14.141:62736     SYN_RECV
      tcp6       0      0 li140-253.members.:http 37.252.14.141:63805     SYN_RECV
      
      Active UNIX domain sockets (w/o servers)
      Proto RefCnt Flags       Type       State         I-Node   Path
      unix  2      [ ]         DGRAM                    20972    /var/spool/postfix/dev/log
      unix  3      [ ]         DGRAM                    18134    /run/systemd/notify
      unix  3      [ ]         STREAM     CONNECTED     24059
      unix  2      [ ]         DGRAM                    22790
      unix  2      [ ACC ]     STREAM     LISTENING     24523    public/showq
      unix  2      [ ACC ]     STREAM     LISTENING     24526    private/error
      
      

      The first table displays network connections, and the columns of this table are interpreted as follows:

      Column Description
      Proto The protocol of the connection: TCP, UDP, or raw.
      Recv-Q For a TCP connection, this column shows the number of bytes received by the local network interface but not yet read by the connected process.
      Send-Q For a TCP connection, this column shows the number of bytes sent to the other side of the connection but not yet acknowledged by the remote host.
      Local Address The local address and port for the connection. By default, this will display the host name for the address, if it can be resolved. The service name for the port (e.g. SSH for port 22) will also be displayed by default.
      Foreign Address The address and port number for the connected host. The host name and service name will be displayed by default, similar to the behavior for the Local Address column.
      State The state of the connection. Because raw and UDP connections will generally not have state information, this column will usually be blank for those connection types. For TCP connections, the State column will have a value that matches one of the states specified by TCP: SYN_RECV, SYN_SENT, ESTABLISHED, etc. By default, connections in the LISTEN state will not be displayed.
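      The State column lends itself to quick aggregation. The following sketch tallies how many connections are in each state with awk; the sample connection lines are illustrative, not from a live system, and on a real machine you would pipe netstat's output in directly.

```shell
# Tally TCP connections by state. The sample lines below stand in for
# real netstat output; on a live system, pipe it in directly instead:
#   netstat -nt | awk '$1 ~ /^tcp/ { s[$6]++ } END { for (k in s) print k, s[k] }'
printf '%s\n' \
  'tcp 0 0 192.0.2.1:22 198.51.100.7:43556 TIME_WAIT' \
  'tcp 0 0 192.0.2.1:25 198.51.100.8:64553 SYN_RECV' \
  'tcp 0 0 192.0.2.1:22 198.51.100.9:54757 ESTABLISHED' \
  'tcp 0 0 192.0.2.1:22 198.51.100.8:62736 SYN_RECV' |
awk '$1 ~ /^tcp/ { s[$6]++ } END { for (k in s) print k, s[k] }' | sort
# Prints:
#   ESTABLISHED 1
#   SYN_RECV 2
#   TIME_WAIT 1
```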

      The second table displays unix sockets, and the columns of this table are interpreted as follows:

      Column Description
      Proto The protocol of the socket (unix).
      RefCnt The reference count, which is the number of attached processes connected via this socket.
      Flags Any flags associated with the socket. This will most often display ACC, short for SO_ACCEPTCON, which is shown for unconnected sockets whose processes are waiting for connection requests.
      Type The type of the socket: datagram/connectionless (SOCK_DGRAM), stream/connection (SOCK_STREAM), raw (SOCK_RAW), reliably-delivered messages (SOCK_RDM), sequential packet (SOCK_SEQPACKET), or the obsolete SOCK_PACKET.
      State The state of the socket: FREE for unallocated sockets, LISTENING for sockets listening for connections, CONNECTING for sockets that are about to be connected, CONNECTED for connected sockets, and DISCONNECTING for disconnecting sockets. If the state is empty, the socket is not connected. Sockets in the LISTENING state will not be displayed by default.
      I-Node The filesystem inode of the socket.
      Path The filesystem path of the socket.

      Command Line Options

      Some important and frequently-used command line options of netstat are as follows:

      Option Definition
      -v Shows verbose output.
      -r Displays the kernel routing tables.
      -e Displays extended information for network connections.
      -i Displays a table of all network interfaces. When used with -a, the output also includes interfaces that are not up.
      -s Displays summary statistics for each protocol.
      -W Avoids truncating IP addresses and provides as much screen space as needed to display them.
      -n Displays numerical (IP) addresses, instead of resolving them to hostnames.
      -A Allows you to specify the protocol family. Valid values are inet, inet6, unix, ipx, ax25, netrom, econet, ddp and bluetooth.
      -t Displays TCP data only.
      -u Displays UDP data only.
      -4 Displays IPv4 connections only.
      -6 Displays IPv6 connections only.
      -c Displays information continuously (every second).
      -p Displays the process ID and the name of the program that owns the socket. Root privileges are required to see this information for processes owned by other users.
      -o Displays timer information.
      -a Shows both listening and non-listening network connections and unix sockets.
      -l Displays listening network connections and unix sockets, which are not shown by default.
      -C Displays routing information from the route cache.
      -g Displays multicast group membership information for IPv4 and IPv6.

      The rest of this guide will put the most important of these command line options to work in order to help you learn their usage. However, nothing can replace experimenting with netstat on your own.

      Sockets/Network Connections

      Include the LISTENING State

      Run netstat with the -a option to show both listening and non-listening network connections and sockets:

      netstat -a
      
        
      Active Internet connections (servers and established)
      Proto Recv-Q Send-Q Local Address           Foreign Address         State
      tcp        0      0 0.0.0.0:ssh             0.0.0.0:*               LISTEN
      tcp        0    316 li1076-154.members.:ssh 192.0.2.4:51109       ESTABLISHED
      tcp6       0      0 [::]:ssh                [::]:*                  LISTEN
      Active UNIX domain sockets (servers and established)
      Proto RefCnt Flags       Type       State         I-Node   Path
      unix  2      [ ACC ]     STREAM     LISTENING     15668    /run/systemd/private
      unix  6      [ ]         DGRAM                    9340     /run/systemd/journal/dev-log
      unix  3      [ ]         DGRAM                    9096     /run/systemd/notify
      unix  2      [ ]         DGRAM                    9098     /run/systemd/cgroups-agent
      unix  2      [ ACC ]     STREAM     LISTENING     9107     /run/systemd/fsck.progress
      unix  2      [ ACC ]     SEQPACKET  LISTENING     9117     /run/udev/control
      unix  2      [ ]         DGRAM                    9119     /run/systemd/journal/syslog
      unix  2      [ ]         DGRAM                    50340    /run/user/1000/systemd/notify
      unix  2      [ ACC ]     STREAM     LISTENING     50344    /run/user/1000/systemd/private
      ...
      
      

      Only Show the LISTENING State

      Run netstat with the -l option to only show listening network connections and sockets:

      netstat -l
        
      Active Internet connections (only servers)
      Proto Recv-Q Send-Q Local Address           Foreign Address         State
      tcp        0      0 0.0.0.0:ssh             0.0.0.0:*               LISTEN
      tcp6       0      0 [::]:ssh                [::]:*                  LISTEN
      Active UNIX domain sockets (only servers)
      Proto RefCnt Flags       Type       State         I-Node   Path
      unix  2      [ ACC ]     STREAM     LISTENING     15668    /run/systemd/private
      unix  2      [ ACC ]     STREAM     LISTENING     9107     /run/systemd/fsck.progress
      unix  2      [ ACC ]     SEQPACKET  LISTENING     9117     /run/udev/control
      unix  2      [ ACC ]     STREAM     LISTENING     50344    /run/user/1000/systemd/private
      unix  2      [ ACC ]     STREAM     LISTENING     50349    /run/user/1000/gnupg/S.gpg-agent.ssh
      unix  2      [ ACC ]     STREAM     LISTENING     50352    /run/user/1000/gnupg/S.gpg-agent.extra
      unix  2      [ ACC ]     STREAM     LISTENING     50354    /run/user/1000/gnupg/S.gpg-agent.browser
      unix  2      [ ACC ]     STREAM     LISTENING     50356    /run/user/1000/gnupg/S.gpg-agent
      unix  2      [ ACC ]     STREAM     LISTENING     9210     /run/systemd/journal/stdout
      unix  2      [ ACC ]     STREAM     LISTENING     11261    /var/run/dbus/system_bus_socket
      
      

      Show IPv4 Connections Only

      The -A inet, --inet and -4 command line options will all tell netstat to show IPv4 connections (both TCP and UDP) only. Because listening connections are not shown by default, this command displays connections that are in a non-listening state:

      netstat -4
      
        
      Active Internet connections (w/o servers)
      Proto Recv-Q Send-Q Local Address           Foreign Address         State
      tcp        1      0 li140-253.members.:smtp 193.32.160.143:41356    CLOSE_WAIT
      tcp        0    300 li140-253.members.l:ssh athedsl-405473.ho:64917 ESTABLISHED
      tcp        1      0 li140-253.members.:smtp 193.32.160.136:37752    CLOSE_WAIT
      tcp        1      0 li140-253.members.:smtp 193.32.160.136:49900    CLOSE_WAIT
      tcp        1      0 li140-253.members.:smtp 193.32.160.136:49900    CLOSE_WAIT
      
      

      Note

      If you want to display IPv4 connections that are in both listening and non-listening state, add the -a command line option:

      netstat -4a
      

      Show IPv6 Connections Only

      The -A inet6, --inet6 and -6 command line options will all tell netstat to show IPv6 connections (both TCP and UDP) only. Because listening connections are not shown by default, this command displays connections that are in a non-listening state:

      netstat -6
      
        
      Active Internet connections (servers and established)
      Proto Recv-Q Send-Q Local Address           Foreign Address         State
      udp6       0      0 [::]:mdns               [::]:*
      udp6       0      0 [::]:58949              [::]:*
      udp6       0      0 fe80::f03c:91ff:fe6:ntp [::]:*
      udp6       0      0 2a01:7e00::f03c:91f:ntp [::]:*
      udp6       0      0 localhost:ntp           [::]:*
      udp6       0      0 [::]:ntp                [::]:*
      
      

      Note

      If you want to display IPv6 connections that are in both listening and non-listening state, add the -a command line option:

      netstat -6a
      

      Show Listening UNIX Sockets

      The -x option limits netstat to showing Unix sockets. If you want to only display listening UNIX sockets, use the following command:

      netstat -lx
      
        
      Active UNIX domain sockets (only servers)
      Proto RefCnt Flags       Type       State         I-Node   Path
      unix  2      [ ACC ]     STREAM     LISTENING     21569793 /run/user/1000/gnupg/S.gpg-agent.extra
      unix  2      [ ACC ]     STREAM     LISTENING     21569796 /run/user/1000/gnupg/S.gpg-agent.ssh
      unix  2      [ ACC ]     STREAM     LISTENING     21569798 /run/user/1000/gnupg/S.gpg-agent.browser
      unix  2      [ ACC ]     STREAM     LISTENING     21569800 /run/user/1000/gnupg/S.dirmngr
      unix  2      [ ACC ]     STREAM     LISTENING     21569802 /run/user/1000/gnupg/S.gpg-agent
      unix  2      [ ACC ]     STREAM     LISTENING     24485    public/cleanup
      unix  2      [ ACC ]     STREAM     LISTENING     20306    /var/run/dbus/system_bus_socket
      unix  2      [ ACC ]     STREAM     LISTENING     24490    private/tlsmgr
      unix  2      [ ACC ]     STREAM     LISTENING     24493    private/rewrite
      unix  2      [ ACC ]     STREAM     LISTENING     24496    private/bounce
      unix  2      [ ACC ]     STREAM     LISTENING     24499    private/defer
      unix  2      [ ACC ]     STREAM     LISTENING     24502    private/trace
      unix  2      [ ACC ]     STREAM     LISTENING     24505    private/verify
      unix  2      [ ACC ]     STREAM     LISTENING     20319    /var/run/avahi-daemon/socket
      ...
      
      

      Show TCP Connections Only

      The -t option limits netstat to showing TCP network connections. Because listening connections are not shown by default, the following command displays connections that are in a non-listening state:

      netstat -nt
      
        
      Active Internet connections (w/o servers)
      Proto Recv-Q Send-Q Local Address           Foreign Address         State
      tcp        1      0 109.74.193.253:25       193.32.160.143:41356    CLOSE_WAIT
      tcp        0      0 109.74.193.253:22       79.131.135.223:64917    ESTABLISHED
      tcp        1      0 109.74.193.253:25       193.32.160.136:37752    CLOSE_WAIT
      tcp        1      0 109.74.193.253:25       193.32.160.136:49900    CLOSE_WAIT
      tcp6       0      0 109.74.193.253:80       104.18.40.175:26111     SYN_RECV
      tcp6       0      0 109.74.193.253:80       104.18.40.175:47427     SYN_RECV
      tcp6       0      0 109.74.193.253:80       104.18.41.175:24763     SYN_RECV
      tcp6       0      0 109.74.193.253:80       104.18.41.175:32295     SYN_RECV
      tcp6       0      0 109.74.193.253:80       104.18.41.175:53268     SYN_RECV
      tcp6       0      0 109.74.193.253:80       104.18.40.175:4436      SYN_RECV
      tcp6       0      0 109.74.193.253:80       104.18.40.175:17099     SYN_RECV
      tcp6       0      0 109.74.193.253:80       104.18.41.175:12892     SYN_RECV
      
      

      Note

      The -n option in the previous command tells netstat to not resolve IP addresses to hostnames.

      Note

      If you want to display both listening and non-listening TCP connections, add the -a command line option:

      netstat -ta
      

      Show IPv4 TCP Connections Only

      If you are only interested in IPv4 TCP connections, use the -t and -4 options together. Because listening connections are not shown by default, the following command displays connections that are in a non-listening state:

      netstat -nt4
      
        
      Active Internet connections (w/o servers)
      Proto Recv-Q Send-Q Local Address           Foreign Address         State
      tcp        1      0 109.74.193.253:25       193.32.160.143:41356    CLOSE_WAIT
      tcp        0      0 109.74.193.253:22       79.131.135.223:64917    ESTABLISHED
      tcp        1      0 109.74.193.253:25       193.32.160.136:37752    CLOSE_WAIT
      tcp        1      0 109.74.193.253:25       193.32.160.136:49900    CLOSE_WAIT
      
      

      Note

      If you want to display both listening and non-listening IPv4 TCP connections, add the -a command line option:

      netstat -t4a
      

      Show Listening TCP Connections Only

      If you want to display listening TCP connections only, combine -l and -t:

      netstat -lt
      
        
      Active Internet connections (only servers)
      Proto Recv-Q Send-Q Local Address           Foreign Address         State
      tcp        0      0 localhost:mysql         0.0.0.0:*               LISTEN
      tcp        0      0 0.0.0.0:ssh             0.0.0.0:*               LISTEN
      tcp      101      0 0.0.0.0:smtp            0.0.0.0:*               LISTEN
      tcp6       0      0 [::]:http               [::]:*                  LISTEN
      tcp6       0      0 [::]:ssh                [::]:*                  LISTEN
      tcp6       0      0 [::]:https              [::]:*                  LISTEN
      
      
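      Output like this can also be scripted against. The following sketch checks whether a given TCP port is in the LISTEN state; the sample lines are illustrative, and on a live system you would pipe netstat -ltn (numeric addresses) into the same awk program.

```shell
# Report whether anything is listening on a given TCP port (22 here).
# The sample lines stand in for netstat -ltn output (numeric addresses);
# on a live system: netstat -ltn | awk -v port=22 '...'
printf '%s\n' \
  'tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN' \
  'tcp6       0      0 :::80                   :::*                    LISTEN' |
awk -v port=22 '$6 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
                END { print (found ? "open" : "closed") }'
# Prints: open
```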

      Show UDP Connections Only

      If you are only interested in seeing UDP connections, use the -u option:

      netstat -u
      
        
      Active Internet connections (servers and established)
      Proto Recv-Q Send-Q Local Address           Foreign Address         State
      udp        0      0 0.0.0.0:mdns            0.0.0.0:*
      udp        0      0 li140-253.member:syslog 0.0.0.0:*
      udp        0      0 0.0.0.0:60397           0.0.0.0:*
      udp        0      0 0.0.0.0:bootpc          0.0.0.0:*
      udp        0      0 li140-253.members.l:ntp 0.0.0.0:*
      udp        0      0 localhost:ntp           0.0.0.0:*
      udp        0      0 0.0.0.0:ntp             0.0.0.0:*
      udp6       0      0 [::]:mdns               [::]:*
      udp6       0      0 [::]:58949              [::]:*
      udp6       0      0 fe80::f03c:91ff:fe6:ntp [::]:*
      udp6       0      0 2a01:7e00::f03c:91f:ntp [::]:*
      udp6       0      0 localhost:ntp           [::]:*
      udp6       0      0 [::]:ntp                [::]:*
      
      

      Note

      To show only IPv4 or IPv6 UDP connections, combine -u with -4 or -6:

      netstat -u4
      netstat -u6
      

      Show Extended Output

      The -e command line parameter tells netstat to show extended output, which will add the User and Inode columns to the displayed table (but only for network connections, not unix sockets). For example, this command will show extended output for a system’s listening TCP connections:

      netstat -lte
      
        
      Active Internet connections (only servers)
      Proto Recv-Q Send-Q Local Address           Foreign Address         State       User       Inode
      tcp        0      0 localhost:mysql         0.0.0.0:*               LISTEN      mysql      35862475
      tcp        0      0 0.0.0.0:ssh             0.0.0.0:*               LISTEN      root       35572959
      tcp      101      0 0.0.0.0:smtp            0.0.0.0:*               LISTEN      root       35544149
      tcp6       0      0 [::]:http               [::]:*                  LISTEN      root       35577141
      tcp6       0      0 [::]:ssh                [::]:*                  LISTEN      root       35572961
      tcp6       0      0 [::]:https              [::]:*                  LISTEN      root       35577145
      
      

      Show the PID and Program Name

      The -p option displays the process ID and program name that corresponds to a network connection or unix socket.

      Note

      netstat requires root privileges to show the PID and program name of processes that are not owned by your user.

      This command will display the PID and program name for a system’s listening TCP connections:

      sudo netstat -ltp
      
        
      Active Internet connections (only servers)
      Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
      tcp        0      0 localhost:mysql         0.0.0.0:*               LISTEN      24555/mysqld
      tcp        0      0 0.0.0.0:ssh             0.0.0.0:*               LISTEN      1008/sshd
      tcp      101      0 0.0.0.0:smtp            0.0.0.0:*               LISTEN      8576/master
      tcp6       0      0 [::]:http               [::]:*                  LISTEN      1808/apache2
      tcp6       0      0 [::]:ssh                [::]:*                  LISTEN      1008/sshd
      tcp6       0      0 [::]:https              [::]:*                  LISTEN      1808/apache2
      
      

      Note

      The previous example's command is a fast way to learn which networked services are running on your system.
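      To reduce that output to just the distinct service names, the last column (PID/Program name) can be split on the / separator. This is a sketch: the sample lines mirror the output above, and on a live system you would first skip netstat's two header lines (for example with tail -n +3).

```shell
# Reduce netstat -ltp output to the distinct program names by splitting
# the last column (PID/Program name) on "/". The sample lines mirror the
# output above; on a live system, skip the two header lines first:
#   sudo netstat -ltp | tail -n +3 | awk '{ split($NF, p, "/"); print p[2] }' | sort -u
printf '%s\n' \
  'tcp   0 0 localhost:mysql 0.0.0.0:* LISTEN 24555/mysqld' \
  'tcp   0 0 0.0.0.0:ssh     0.0.0.0:* LISTEN 1008/sshd' \
  'tcp6  0 0 [::]:http       [::]:*    LISTEN 1808/apache2' \
  'tcp6  0 0 [::]:ssh        [::]:*    LISTEN 1008/sshd' |
awk '{ split($NF, p, "/"); print p[2] }' | sort -u
# Prints:
#   apache2
#   mysqld
#   sshd
```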

      Combining -p and -e

      Combining -p with -e while having root privileges will simultaneously reveal the user, inode, and PID/program name of your network connections. The following example command will show all of this information for a system’s listening TCP connections:

      sudo netstat -ltpe
      
        
      Active Internet connections (only servers)
      Proto Recv-Q Send-Q Local Address           Foreign Address         State       User       Inode      PID/Program name
      tcp        0      0 localhost:mysql         0.0.0.0:*               LISTEN      mysql      35862475   24555/mysqld
      tcp        0      0 0.0.0.0:ssh             0.0.0.0:*               LISTEN      root       35572959   1008/sshd
      tcp      101      0 0.0.0.0:smtp            0.0.0.0:*               LISTEN      root       35544149   8576/master
      tcp6       0      0 [::]:http               [::]:*                  LISTEN      root       35577141   1808/apache2
      tcp6       0      0 [::]:ssh                [::]:*                  LISTEN      root       35572961   1008/sshd
      tcp6       0      0 [::]:https              [::]:*                  LISTEN      root       35577145   1808/apache2
      
      

      Routing Tables

      One of the most frequent uses of netstat is for showing the routing table of a machine:

      netstat -nr
      
        
      Kernel IP routing table
      Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
      0.0.0.0         109.74.193.1    0.0.0.0         UG        0 0          0 eth0
      109.74.193.0    0.0.0.0         255.255.255.0   U         0 0          0 eth0
      
      

      In this output, the U flag means that the route is in use and the G flag denotes the default gateway. The H flag, which is not displayed here, would mean that the route is to a host and not to a network.
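      For example, the default gateway can be extracted by matching the all-zeros destination and the G flag. The routing table is embedded below for illustration; on a live system you would pipe netstat -nr into the same awk program.

```shell
# Extract the default gateway: the default route has an all-zeros
# destination and the G flag. The table below mirrors the sample output
# above; on a live system, pipe netstat -nr into the same awk program.
printf '%s\n' \
  'Kernel IP routing table' \
  'Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface' \
  '0.0.0.0         109.74.193.1    0.0.0.0         UG        0 0          0 eth0' \
  '109.74.193.0    0.0.0.0         255.255.255.0   U         0 0          0 eth0' |
awk '$1 == "0.0.0.0" && $4 ~ /G/ { print $2 }'
# Prints: 109.74.193.1
```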

      Network Interfaces

      The -i option shows network statistics on a per-interface basis:

      netstat -i
      
        
      Kernel Interface table
      Iface      MTU    RX-OK RX-ERR RX-DRP RX-OVR    TX-OK TX-ERR TX-DRP TX-OVR Flg
      eth0      1500  7075525      0      0 0       6830902      0      0      0 BMRU
      lo       65536   573817      0      0 0        573817      0      0      0 LRU
      
      
      Column Description
      Iface The name of the interface.
      MTU The value of the Maximum Transmission Unit.
      RX-OK The number of error free packets received.
      RX-ERR The number of packets received with errors.
      RX-DRP The number of dropped packets when receiving.
      RX-OVR The number of packets lost due to the overflow when receiving.
      TX-OK The number of error-free packets transmitted.
      TX-ERR The number of transmitted packets with errors.
      TX-DRP The number of dropped packets when transmitting.
      TX-OVR The number of packets lost due to the overflow when transmitting.
      Flg Flag values for the interface.
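      As a worked example of reading these columns, the following sketch totals the RX-OK and TX-OK counters across interfaces, skipping the header lines by requiring a numeric MTU in the second field. The sample table mirrors the output above; on a live system, pipe netstat -i in directly.

```shell
# Total the RX-OK and TX-OK counters across interfaces. Header lines are
# skipped by requiring a numeric MTU in field 2. The sample table mirrors
# the output above; on a live system, pipe netstat -i in directly.
printf '%s\n' \
  'Kernel Interface table' \
  'Iface      MTU    RX-OK RX-ERR RX-DRP RX-OVR    TX-OK TX-ERR TX-DRP TX-OVR Flg' \
  'eth0      1500  7075525      0      0 0       6830902      0      0      0 BMRU' \
  'lo       65536   573817      0      0 0        573817      0      0      0 LRU' |
awk '$2 ~ /^[0-9]+$/ { rx += $3; tx += $7 } END { print "RX:", rx, "TX:", tx }'
# Prints: RX: 7649342 TX: 7404719
```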

      If you combine -a with -i, netstat will also display interfaces that are not up:

      netstat -ia
      
        
      Kernel Interface table
      Iface      MTU    RX-OK RX-ERR RX-DRP RX-OVR    TX-OK TX-ERR TX-DRP TX-OVR Flg
      dummy0    1500        0      0      0 0             0      0      0      0 BO
      erspan0   1450        0      0      0 0             0      0      0      0 BM
      eth0      1500 13128358      0      0 0      15677694      0      0      0 BMRU
      gre0      1476        0      0      0 0             0      0      0      0 O
      gretap0   1462        0      0      0 0             0      0      0      0 BM
      ip6_vti0  1364        0      0      0 0             0      0      0      0 O
      ip6gre0   1448        0      0      0 0             0      0      0      0 O
      ip6tnl0   1452        0      0      0 0             0      0      0      0 O
      ip_vti0   1480        0      0      0 0             0      0      0      0 O
      lo       65536   846097      0      0 0        846097      0      0      0 LRU
      sit0      1480        0      0      0 0             0      0      0      0 O
      teql0     1500        0      0      0 0             0      0      0      0 O
      tunl0     1480        0      0      0 0             0      0      0      0 O
      
      

      Network Protocols

      Use the -s option to see network statistics on a per-protocol basis:

      netstat -s
      
        
      Ip:
          Forwarding: 2
          6775334 total packets received
          11 with invalid addresses
          0 forwarded
          0 incoming packets discarded
          6775323 incoming packets delivered
          7339283 requests sent out
      Icmp:
          10531 ICMP messages received
          4415 input ICMP message failed
          InCsumErrors: 3
          ICMP input histogram:
              destination unreachable: 6035
              timeout in transit: 93
              redirects: 1
              echo requests: 4379
              timestamp request: 20
          16939 ICMP messages sent
          0 ICMP messages failed
          ICMP output histogram:
              destination unreachable: 12540
              echo replies: 4379
              timestamp replies: 20
      IcmpMsg:
              InType3: 6035
              InType5: 1
              InType8: 4379
              InType11: 93
              InType13: 20
              OutType0: 4379
              OutType3: 12540
              OutType14: 20
      Tcp:
          38781 active connection openings
          330301 passive connection openings
          10683 failed connection attempts
          26722 connection resets received
          1 connections established
          6580191 segments received
          10797407 segments sent out
          654603 segments retransmitted
          748 bad segments received
          408640 resets sent
          InCsumErrors: 747
      Udp:
          212303 packets received
          13230 packets to unknown port received
          126 packet receive errors
          213173 packets sent
          0 receive buffer errors
          0 send buffer errors
          InCsumErrors: 126
      UdpLite:
      TcpExt:
          10451 resets received for embryonic SYN_RECV sockets
          9 ICMP packets dropped because they were out-of-window
          41710 TCP sockets finished time wait in fast timer
          294 packetes rejected in established connections because of timestamp
          161285 delayed acks sent
          22 delayed acks further delayed because of locked socket
          Quick ack mode was activated 20984 times
          43 SYNs to LISTEN sockets dropped
          1199311 packet headers predicted
          1851531 acknowledgments not containing data payload received
          919487 predicted acknowledgments
          114 times recovered from packet loss due to fast retransmit
          TCPSackRecovery: 5474
          TCPSACKReneging: 2
          Detected reordering 8235 times using SACK
          Detected reordering 21 times using reno fast retransmit
          Detected reordering 219 times using time stamp
          154 congestion windows fully recovered without slow start
          80 congestion windows partially recovered using Hoe heuristic
          TCPDSACKUndo: 142
          1009 congestion windows recovered without slow start after partial ack
          TCPLostRetransmit: 33008
          68 timeouts after reno fast retransmit
          TCPSackFailures: 302
          599 timeouts in loss state
          57605 fast retransmits
          2647 retransmits in slow start
          TCPTimeouts: 618841
          TCPLossProbes: 35168
          TCPLossProbeRecovery: 12069
          TCPRenoRecoveryFail: 60
          TCPSackRecoveryFail: 668
          TCPBacklogCoalesce: 54624
          TCPDSACKOldSent: 20866
          TCPDSACKOfoSent: 56
          TCPDSACKRecv: 5136
          TCPDSACKOfoRecv: 76
          20881 connections reset due to unexpected data
          1466 connections reset due to early user close
          3960 connections aborted due to timeout
          TCPSACKDiscard: 54
          TCPDSACKIgnoredOld: 28
          TCPDSACKIgnoredNoUndo: 2114
          TCPSpuriousRTOs: 23
          TCPSackShifted: 33515
          TCPSackMerged: 56742
          TCPSackShiftFallback: 24412
          TCPDeferAcceptDrop: 127813
          TCPRcvCoalesce: 258864
          TCPOFOQueue: 24749
          TCPOFOMerge: 56
          TCPChallengeACK: 238
          TCPSYNChallenge: 3
          TCPFastOpenCookieReqd: 6
          TCPFromZeroWindowAdv: 32
          TCPToZeroWindowAdv: 32
          TCPWantZeroWindowAdv: 187
          TCPSynRetrans: 517978
          TCPOrigDataSent: 7250401
          TCPHystartTrainDetect: 102
          TCPHystartTrainCwnd: 15000
          TCPHystartDelayDetect: 1533
          TCPHystartDelayCwnd: 101578
          TCPACKSkippedSynRecv: 16
          TCPACKSkippedPAWS: 160
          TCPACKSkippedSeq: 54
          TCPACKSkippedTimeWait: 140
          TCPACKSkippedChallenge: 91
          TCPWinProbe: 1552
          TCPDelivered: 7093769
          TCPAckCompressed: 12241
          TCPWqueueTooBig: 222
      IpExt:
          InMcastPkts: 104
          OutMcastPkts: 106
          InOctets: 3072954621
          OutOctets: 10300134722
          InMcastOctets: 8757
          OutMcastOctets: 8837
          InNoECTPkts: 6759736
          InECT1Pkts: 312
          InECT0Pkts: 54355
          InCEPkts: 8644
      Sctp:
          0 Current Associations
          0 Active Associations
          0 Passive Associations
          0 Number of Aborteds
          0 Number of Graceful Terminations
          14 Number of Out of Blue packets
          0 Number of Packets with invalid Checksum
          14 Number of control chunks sent
          0 Number of ordered chunks sent
          0 Number of Unordered chunks sent
          14 Number of control chunks received
          0 Number of ordered chunks received
          0 Number of Unordered chunks received
          0 Number of messages fragmented
          0 Number of messages reassembled
          14 Number of SCTP packets sent
          14 Number of SCTP packets received
          SctpInPktSoftirq: 14
          SctpInPktDiscards: 14
      
      

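      Counters like these are useful for derived metrics. As an illustration, the following sketch computes the TCP retransmission rate from the "segments sent out" and "segments retransmitted" counters; the two sample lines mirror the Tcp: section above, and on a live system you would pipe netstat -s into the same awk program.

```shell
# Compute the TCP retransmission rate from the "segments sent out" and
# "segments retransmitted" counters. The two sample lines mirror the Tcp:
# section above; on a live system, pipe netstat -s into the same program.
printf '%s\n' \
  '    10797407 segments sent out' \
  '    654603 segments retransmitted' |
awk '/segments sent out/      { sent = $1 }
     /segments retransmitted/ { retrans = $1 }
     END { printf "%.2f%%\n", 100 * retrans / sent }'
# Prints: 6.06%
```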
      Including the -w option will tell netstat to display raw network statistics:

      sudo netstat -sw
      
        
      Ip:
          Forwarding: 2
          6775954 total packets received
          11 with invalid addresses
          0 forwarded
          0 incoming packets discarded
          6775943 incoming packets delivered
          7339740 requests sent out
      Icmp:
          10531 ICMP messages received
          4415 input ICMP message failed
          InCsumErrors: 3
          ICMP input histogram:
              destination unreachable: 6035
              timeout in transit: 93
              redirects: 1
              echo requests: 4379
              timestamp request: 20
          16942 ICMP messages sent
          0 ICMP messages failed
          ICMP output histogram:
              destination unreachable: 12543
              echo replies: 4379
              timestamp replies: 20
      IcmpMsg:
              InType3: 6035
              InType5: 1
              InType8: 4379
              InType11: 93
              InType13: 20
              OutType0: 4379
              OutType3: 12543
              OutType14: 20
      UdpLite:
      IpExt:
          InMcastPkts: 104
          OutMcastPkts: 106
          InOctets: 3072998471
          OutOctets: 10300305693
          InMcastOctets: 8757
          OutMcastOctets: 8837
          InNoECTPkts: 6760354
          InECT1Pkts: 312
          InECT0Pkts: 54357
          InCEPkts: 8644
      
      

      Multicast Group Membership

      The netstat -g command displays multicast group membership information:

      netstat -g
      
        
      IPv6/IPv4 Group Memberships
      Interface       RefCnt Group
      --------------- ------ ---------------------
      lo              1      all-systems.mcast.net
      eth0            1      224.0.0.251
      eth0            1      all-systems.mcast.net
      lo              1      ip6-allnodes
      lo              1      ff01::1
      dummy0          1      ip6-allnodes
      dummy0          1      ff01::1
      eth0            1      ff02::202
      eth0            1      ff02::fb
      ...
      
      

      Note

      The default output of netstat -g displays both IPv4 and IPv6 data.

      Using AWK to process netstat output

      The AWK programming language can help you process netstat output and generate handy reports.

      Showing the Number of Listening Processes Per Username

      The following command calculates the total number of listening processes per username:

      sudo netstat -lte | awk '{print $7}' | grep -v Address | grep -v "^$" | sort | uniq -c | awk '{print $2 ": " $1}'
      
        
      mysql: 1
      root: 5
      
      
      • The netstat command collects the listening TCP connections and includes the users for the connections’ processes in the output.
      • The first awk command limits the output to the column that displays the user.
      • The first grep command deletes the line with the header information generated by netstat.
      • The second grep command deletes empty lines from the output.
      • The sort command sorts the users alphabetically.
      • After that, the uniq command counts line occurrences while omitting repeated output.
      • Lastly, the second awk command reverses the two columns of the uniq command’s output and prints the data on screen.
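      The same per-user tally can be reproduced in Python when you want to post-process netstat output in a script. The sample lines below are illustrative (not from a real host) and are shaped like netstat -lte output, where the seventh column holds the user:

```python
from collections import Counter

# Illustrative lines shaped like `netstat -lte` output; the 7th column is the user.
sample = """tcp  0  0 localhost:mysql  0.0.0.0:*  LISTEN  mysql  26528
tcp  0  0 0.0.0.0:ssh      0.0.0.0:*  LISTEN  root   14016
tcp  0  0 localhost:smtp   0.0.0.0:*  LISTEN  root   15942"""

# Count occurrences of the user column, mirroring the sort | uniq -c pipeline.
users = Counter(line.split()[6] for line in sample.splitlines() if line.strip())
for user, count in sorted(users.items()):
    print(f"{user}: {count}")
```

      With the sample data above, this prints one line per user with its count, just like the awk pipeline.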

      HTTP Connections

      The following command, which needs root privileges to run, extracts the IP address from all established Apache connections and calculates the number of connections per IP address:

      sudo netstat -anpt | grep apache2 | grep ESTABLISHED | awk -F "[ :]*" '{print $4}' | sort | uniq -c
      
        
            4 109.74.193.253
      
      

      TCP Connections

      The following command calculates the number of TCP connections per IP address, sorted by the number of connections:

      netstat -nt | awk '/^tcp/ {print $5}' | awk -F: '{print $1}' | sort | uniq -c | sort -nr
      
        
            2 193.32.160.136
            1 79.131.135.223
            1 193.32.160.143
            1 106.13.205.251
      
      

      Counting TCP States

      The next command counts the various types of TCP states:

      netstat -ant | awk '{print $6}' | grep -v 'established)' | grep -v Foreign | sort | uniq -c | sort -n
      
        
            2 ESTABLISHED
            3 CLOSE_WAIT
            6 LISTEN
      
      

      Summary

      Even though other, more modern tools exist that can replace netstat, it remains a handy utility that will help you if you ever have networking problems on your Linux machine. However, never forget to check your log files for errors or warnings related to your network problem before troubleshooting.

      More Information

      You may wish to consult the following resources for additional information on this topic. While these are provided in the hope that they will be useful, please note that we cannot vouch for the accuracy or timeliness of externally hosted materials.


      This guide is published under a CC BY-ND 4.0 license.




      INAP Executive Spotlight: Mary Jane Horne, SVP, Global Network Services


      In the INAP Executive Spotlight series, we interview senior leaders across the organization, hearing candid reflections about their careers, the mentors who shaped them and big lessons learned along the way.

      Mary Jane Horne headshot

      Next in the series is Mary Jane Horne, SVP of Global Network Services. With over 25 years of network and operations experience, Horne currently oversees INAP’s network engineering, carrier management, and global support teams, and is responsible for these activities across INAP’s worldwide footprint.

      Horne shares the lessons she’s learned throughout her career, working in the technology, media and telecommunications industries in the U.S. and abroad. Read on to learn what she loves about her role in tech, and the advice that she has for those looking to progress along their career path.

      The interview has been lightly edited for clarity and length.

      How did you get started in network engineering? What inspired you to pursue it?

      Growing up, my dad was an engineer. I started out in college as a computer science major, but switched after my first year to engineering. I spent five years at Northeastern University in Boston studying electrical and computer engineering, and I worked for the federal government while in school.

      After graduation, I went to work for the phone company, and my first job was as a central office design engineer. I was given some of the best advice of my career by my first manager, which was to move around as much as I could at the “doer” level, to figure out how the company worked. I had 10 jobs in the 13 and a half years I worked there, with a variety of roles in field engineering, technical sales support, customer service and corporate development. I learned how interdependent everyone was, and how best to improve important processes.

      After deciding to change companies to a small fiber start-up, I realized the most important part of any company is its foundation. In the roles I held there, we created the strategy for the company, built out the network, thought out of the box for customer solutions and drove sales from $100k in year one to $64.5M in year five. Here is where I truly embraced the role the network plays in driving the success of a company.

      Can you tell us more about your work with the global network services team? What are some challenges with that part of the business?

      Our global network strategy started by going from metro to metro and grooming the network components (both fiber and lit services), which eliminated a lot of unnecessary costs in running the network. We also lit an express 100-gig ring between three key data center locations (Dallas/NY/San Jose) to carry more of our own traffic on-net. We have, since the completion of these first two initiatives, been upgrading a majority of the US and trans-Atlantic backbones to 100-gig as well, to provide much needed additional capacity. We’re deploying new state-of-the-art technology from Ciena on the fiber and bandwidth we are purchasing, allowing us to provide scalability and redundancy, while giving us the opportunity to develop new products in the future. When all is said and done with these three initiatives, the network operating expenses are flat with what they were before; however, our capacity will be three times what it was in the old network.

      We also have the software side of the network. We have CDN, Performance IP®, Managed DNS, as well as other in-house tools supported by the team. They are continuously evaluating where we need to take these products in order to stay competitive, which may include partnering and white labeling. How do we get these products launched across this network that we are deploying and upgrading? Global network services is not just a foundation, but it’s also the product and services that ride across the network. We have infrastructure evolution, as well as product evolution, and that’s where I focus with the team.

      What do you love about your role in tech?

      Learning new things and trying new things is part of who I am. Because tech is ever changing, it’s always been very exciting for me. I think as tech has evolved, some people have fallen off the bandwagon since they don’t keep up with the latest and greatest trends.

      In tech, you must be a person who looks to the future. I look at what’s coming up, not just how I need to design a network for today, and what the customers need today, but what I need three years from now. What should I consider now to prepare for any changes that might come down the road? That’s one of the things that I’ve always been attracted to in the tech industry: looking far enough ahead to say, “I need to do this, but I don’t want to be shortsighted and do it the cheap way just to get done with today. I want to look at how to do it the best way, so we are ready for the future, and we can then move forward faster.” Tech gives me exciting opportunities to do that.

      Of the qualities you possess, which do you think has been the greatest influence on your success?

      The ability to try anything and rise to challenges, even when I have no idea what I’m doing. I credit my boss, Pete Aquino [INAP CEO], for challenging me over the course of our working relationship. He would say, “I have a need for X.” And I’d say, “I’ve never done that before.” He’d respond, “That’s fine. I know you’ll figure it out.”

      I have learned so much because I did things that I never would have done anywhere else in my career, because somebody trusted me to figure it out. The only thing you need to say to me is it’s impossible, or everyone else who tried couldn’t do it, because now I’m sure I’m going to get it done. I love a challenge. I think that’s driven me through my career.

      Who are some of the people that have mentored you in your career?

      Some of the best advice I’ve ever been given came from another female leader in the industry. When I wanted to make that jump from being a manager to the next level, my boss at the time was a female director, and that was considered quite the accomplishment (back then) at a phone company. I said to her, “I’m ready, I’m looking to move up. I’m really excited.” She gave me the second best piece of advice I’ve ever been given: Just because you are really good at what you do today does not mean you’re ready for the next level. She pointed out that, in order to be considered for the next level, you need to continuously demonstrate leadership qualities and focus on how you embrace and lead change.

      That was an eye opening, great piece of advice. That’s when I made some drastic changes and left the big stable environment to go to a risky startup, where you have to lead every day to be successful.

      If you had to pick a piece of advice that you’d give to someone pursuing IT or network engineering as a career path, what would that be?

      I just approved some training for people who want to learn more. Don’t be afraid to ask for that. Always stay current, always stay hungry, always learn as much as you can, and learn across platforms. It’ll make you more valuable.

      Also, tell your boss what you need and what you’re interested in. You must have open communication with your manager. We are not mind readers, so talk about what your plans might be, or ask for help in developing them. We are the ones who have to drive our own careers.

      Are there any other big lessons you’ve learned in your career that you want to share?

      I learned to take a step back and think about things in the big picture, instead of just what I’m doing today. What I decide to do today could affect what other people will be doing well into the future, especially in technology. Ask yourself, am I really making the right choice, or do I need to evaluate other options?

      I also believe we should cross-train people. At a minimum, I think we should have people sit in somebody else’s job for a week or two, and swap chairs. It gives employees appreciation for other roles and responsibilities that they may not truly understand or have misjudged. It also may help folks develop a path to pursue other roles in the future.

      I was lucky enough in my career to be able to move from department to department, so I could get a better view of how a company worked. You can’t always do that in smaller companies, but I think those are valuable lessons to learn. We should spend more time educating one another on how things work at INAP.

      Laura Vietmeyer



      How To Build a Neural Network to Recognize Handwritten Digits with TensorFlow


      Introduction

      Neural networks are used as a method of deep learning, one of the many subfields of artificial intelligence. They were first proposed around 70 years ago as an attempt at simulating the way the human brain works, though in a much more simplified form. Individual ‘neurons’ are connected in layers, with weights assigned to determine how the neuron responds when signals are propagated through the network. Previously, neural networks were limited in the number of neurons they were able to simulate, and therefore the complexity of learning they could achieve. But in recent years, due to advancements in hardware development, we have been able to build very deep networks, and train them on enormous datasets to achieve breakthroughs in machine intelligence.

      These breakthroughs have allowed machines to match and exceed the capabilities of humans at performing certain tasks. One such task is object recognition. Though machines have historically been unable to match human vision, recent advances in deep learning have made it possible to build neural networks which can recognize objects, faces, text, and even emotions.

      In this tutorial, you will implement a small subsection of object recognition—digit recognition. Using TensorFlow, an open-source Python library developed by the Google Brain labs for deep learning research, you will take hand-drawn images of the numbers 0-9 and build and train a neural network to recognize and predict the correct label for the digit displayed.

      While you won’t need prior experience in practical deep learning or TensorFlow to follow along with this tutorial, we’ll assume some familiarity with machine learning terms and concepts such as training and testing, features and labels, optimization, and evaluation. You can learn more about these concepts in An Introduction to Machine Learning.

      Prerequisites

      To complete this tutorial, you’ll need:

      Step 1 — Configuring the Project

      Before you can develop the recognition program, you’ll need to install a few dependencies and create a workspace to hold your files.

      We’ll use a Python 3 virtual environment to manage our project’s dependencies. Create a new directory for your project and navigate to the new directory:

      • mkdir tensorflow-demo
      • cd tensorflow-demo

      Execute the following commands to set up the virtual environment for this tutorial:

      • python3 -m venv tensorflow-demo
      • source tensorflow-demo/bin/activate

      Next, install the libraries you’ll use in this tutorial. We’ll use specific versions of these libraries by creating a requirements.txt file in the project directory which specifies the requirement and the version we need. Create the requirements.txt file:

      • touch requirements.txt

      Open the file in your text editor and add the following lines to specify the Image, NumPy, and TensorFlow libraries and their versions:

      requirements.txt

      image==1.5.20
      numpy==1.14.3
      tensorflow==1.4.0
      

      Save the file and exit the editor. Then install these libraries with the following command:

      • pip install -r requirements.txt

      With the dependencies installed, we can start working on our project.

      Step 2 — Importing the MNIST Dataset

      The dataset we will be using in this tutorial is called the MNIST dataset, and it is a classic in the machine learning community. This dataset is made up of images of handwritten digits, 28x28 pixels in size. Here are some examples of the digits included in the dataset:

      Examples of MNIST images

      Let's create a Python program to work with this dataset. We will use one file for all of our work in this tutorial. Create a new file called main.py:

      • touch main.py

      Now open this file in your text editor of choice and add this line of code to the file to import the TensorFlow library:

      main.py

      import tensorflow as tf
      

      Add the following lines of code to your file to import the MNIST dataset and store the image data in the variable mnist:

      main.py

      from tensorflow.examples.tutorials.mnist import input_data
      mnist = input_data.read_data_sets("MNIST_data/", one_hot=True) # y labels are one-hot encoded
      

      When reading in the data, we are using one-hot-encoding to represent the labels (the actual digit drawn, e.g. "3") of the images. One-hot-encoding uses a vector of binary values to represent numeric or categorical values. As our labels are for the digits 0-9, the vector contains ten values, one for each possible digit. One of these values is set to 1, to represent the digit at that index of the vector, and the rest are set to 0. For example, the digit 3 is represented using the vector [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]. As the value at index 3 is stored as 1, the vector therefore represents the digit 3.
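      The encoding described above can be sketched in a few lines of NumPy. The one_hot helper here is written for illustration only; it is not part of the tutorial's code:

```python
import numpy as np

def one_hot(digit, num_classes=10):
    """Return a vector of zeros with a 1 at the index of the digit."""
    vec = np.zeros(num_classes)
    vec[digit] = 1
    return vec

# The digit 3 becomes a ten-value vector with a 1 at index 3.
print(one_hot(3))
```

      Running this prints the vector for the digit 3, with a 1 at index 3 and 0 everywhere else.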

      To represent the actual images themselves, the 28x28 pixels are flattened into a 1D vector which is 784 pixels in size. Each of the 784 pixels making up the image is stored as a value between 0 and 255. This determines the grayscale of the pixel, as our images are presented in black and white only. So a black pixel is represented by 255, and a white pixel by 0, with the various shades of gray somewhere in between.
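      The flattening step can be demonstrated with a toy image in NumPy (random pixel values here stand in for a real MNIST image):

```python
import numpy as np

# A toy 28x28 "image" of grayscale values between 0 and 255.
image = np.random.randint(0, 256, size=(28, 28))

# Flatten the 2D grid into the 1D, 784-value vector the network consumes.
flat = image.reshape(784)
print(flat.shape)  # (784,)
```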

      We can use the mnist variable to find out the size of the dataset we have just imported. Looking at the num_examples for each of the three subsets, we can determine that the dataset has been split into 55,000 images for training, 5000 for validation, and 10,000 for testing. Add the following lines to your file:

      main.py

      n_train = mnist.train.num_examples # 55,000
      n_validation = mnist.validation.num_examples # 5000
      n_test = mnist.test.num_examples # 10,000
      

      Now that we have our data imported, it’s time to think about the neural network.

      Step 3 — Defining the Neural Network Architecture

      The architecture of the neural network refers to elements such as the number of layers in the network, the number of units in each layer, and how the units are connected between layers. As neural networks are loosely inspired by the workings of the human brain, here the term unit is used to represent what we would biologically think of as a neuron. Like neurons passing signals around the brain, units take some values from previous units as input, perform a computation, and then pass on the new value as output to other units. These units are layered to form the network, starting at a minimum with one layer for inputting values, and one layer to output values. The term hidden layer is used for all of the layers in between the input and output layers, i.e. those "hidden" from the real world.

      Different architectures can yield drastically different results, as the performance can be thought of as a function of the architecture among other things, such as the parameters, the data, and the duration of training.

      Add the following lines of code to your file to store the number of units per layer in global variables. This allows us to alter the network architecture in one place, and at the end of the tutorial you can test for yourself how different numbers of layers and units will impact the results of our model:

      main.py

      n_input = 784   # input layer (28x28 pixels)
      n_hidden1 = 512 # 1st hidden layer
      n_hidden2 = 256 # 2nd hidden layer
      n_hidden3 = 128 # 3rd hidden layer
      n_output = 10   # output layer (0-9 digits)
      

      The following diagram shows a visualization of the architecture we've designed, with each layer fully connected to the surrounding layers:

      Diagram of a neural network

      The term "deep neural network" relates to the number of hidden layers, with "shallow" usually meaning just one hidden layer, and "deep" referring to multiple hidden layers. Given enough training data, a shallow neural network with a sufficient number of units should theoretically be able to represent any function that a deep neural network can. But it is often more computationally efficient to use a smaller deep neural network to achieve the same task that would require a shallow network with exponentially more hidden units. Shallow neural networks also often encounter overfitting, where the network essentially memorizes the training data that it has seen, and is not able to generalize the knowledge to new data. This is why deep neural networks are more commonly used: the multiple layers between the raw input data and the output label allow the network to learn features at various levels of abstraction, making the network itself better able to generalize.

      Other elements of the neural network that need to be defined here are the hyperparameters. Unlike the parameters that will get updated during training, these values are set initially and remain constant throughout the process. In your file, set the following variables and values:

      main.py

      learning_rate = 1e-4
      n_iterations = 1000
      batch_size = 128
      dropout = 0.5
      

      The learning rate represents how much the parameters will adjust at each step of the learning process. These adjustments are a key component of training: after each pass through the network we tune the weights slightly to try and reduce the loss. Larger learning rates can converge faster, but also have the potential to overshoot the optimal values as they are updated. The number of iterations refers to how many times we go through the training step, and the batch size refers to how many training examples we are using at each step. The dropout variable represents a threshold at which we eliminate some units at random. We will be using dropout in our final hidden layer to give each unit a 50% chance of being eliminated at every training step. This helps prevent overfitting.
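      The effect of dropout can be illustrated with a random binary mask. This is a simplified NumPy sketch, not TensorFlow's implementation; note that surviving activations are rescaled by 1/keep_prob so the expected total activation is unchanged (so-called inverted dropout, which tf.nn.dropout also applies):

```python
import numpy as np

rng = np.random.default_rng(0)
activations = np.ones(8)  # toy layer outputs

keep_prob = 0.5
# Each unit survives with probability keep_prob; survivors are scaled up
# so the expected sum of activations stays the same.
mask = rng.random(activations.shape) < keep_prob
dropped = activations * mask / keep_prob
print(dropped)
```

      Roughly half of the units come out as 0, and the survivors are scaled from 1.0 to 2.0.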

      We have now defined the architecture of our neural network, and the hyperparameters that impact the learning process. The next step is to build the network as a TensorFlow graph.

      Step 4 — Building the TensorFlow Graph

      To build our network, we will set up the network as a computational graph for TensorFlow to execute. The core concept of TensorFlow is the tensor, a data structure similar to an array or list. Tensors are initialized with data, manipulated as they are passed through the graph, and updated through the learning process.

      We’ll start by defining three tensors as placeholders, which are tensors that we'll feed values into later. Add the following to your file:

      main.py

      X = tf.placeholder("float", [None, n_input])
      Y = tf.placeholder("float", [None, n_output])
      keep_prob = tf.placeholder(tf.float32) 
      

      The only parameter that needs to be specified at its declaration is the size of the data we will be feeding in. For X we use a shape of [None, 784], where None represents any amount, as we will be feeding in an undefined number of 784-pixel images. The shape of Y is [None, 10] as we will be using it for an undefined number of label outputs, with 10 possible classes. The keep_prob tensor is used to control the dropout rate, and we initialize it as a placeholder rather than an immutable variable because we want to use the same tensor both for training (when dropout is set to 0.5) and testing (when dropout is set to 1.0).

      The parameters that the network will update in the training process are the weight and bias values, so for these we need to set an initial value rather than an empty placeholder. These values are essentially where the network does its learning, as they are used in the activation functions of the neurons, representing the strength of the connections between units.

      Since the values are optimized during training, we could set them to zero for now. But the initial value actually has a significant impact on the final accuracy of the model. We'll use random values from a truncated normal distribution for the weights. We want them to be close to zero, so they can adjust in either a positive or negative direction, and slightly different, so they generate different errors. This will ensure that the model learns something useful. Add these lines:

      main.py

      weights = {
          'w1': tf.Variable(tf.truncated_normal([n_input, n_hidden1], stddev=0.1)),
          'w2': tf.Variable(tf.truncated_normal([n_hidden1, n_hidden2], stddev=0.1)),
          'w3': tf.Variable(tf.truncated_normal([n_hidden2, n_hidden3], stddev=0.1)),
          'out': tf.Variable(tf.truncated_normal([n_hidden3, n_output], stddev=0.1)),
      }
      

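      The behavior of tf.truncated_normal can be approximated in plain NumPy: values falling more than two standard deviations from the mean are redrawn, which is the rule the TensorFlow function follows. The truncated_normal helper below is written for illustration only:

```python
import numpy as np

def truncated_normal(shape, stddev=0.1, seed=0):
    """Sample a normal distribution, redrawing any value that lands more
    than two standard deviations from the mean."""
    rng = np.random.default_rng(seed)
    values = rng.normal(0.0, stddev, size=shape)
    while True:
        outliers = np.abs(values) > 2 * stddev
        if not outliers.any():
            return values
        values[outliers] = rng.normal(0.0, stddev, size=outliers.sum())

# Weights for the first layer: small, centered on zero, all within 2 * stddev.
w1 = truncated_normal((784, 512))
print(w1.shape)  # (784, 512)
```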
      For the bias, we use a small constant value to ensure that the tensors activate in the initial stages and therefore contribute to the propagation. The weights and bias tensors are stored in dictionary objects for ease of access. Add this code to your file to define the biases:

      main.py

      
      biases = {
          'b1': tf.Variable(tf.constant(0.1, shape=[n_hidden1])),
          'b2': tf.Variable(tf.constant(0.1, shape=[n_hidden2])),
          'b3': tf.Variable(tf.constant(0.1, shape=[n_hidden3])),
          'out': tf.Variable(tf.constant(0.1, shape=[n_output]))
      }
      

      Next, set up the layers of the network by defining the operations that will manipulate the tensors. Add these lines to your file:

      main.py

      layer_1 = tf.add(tf.matmul(X, weights['w1']), biases['b1'])
      layer_2 = tf.add(tf.matmul(layer_1, weights['w2']), biases['b2'])
      layer_3 = tf.add(tf.matmul(layer_2, weights['w3']), biases['b3'])
      layer_drop = tf.nn.dropout(layer_3, keep_prob)
      output_layer = tf.matmul(layer_drop, weights['out']) + biases['out']
      

      Each hidden layer will execute matrix multiplication on the previous layer’s outputs and the current layer’s weights, and add the bias to these values. At the last hidden layer, we will apply a dropout operation using our keep_prob value of 0.5.

      The final step in building the graph is to define the loss function that we want to optimize. A popular choice of loss function in TensorFlow programs is cross-entropy, also known as log-loss, which quantifies the difference between two probability distributions (the predictions and the labels). A perfect classification would result in a cross-entropy of 0, with the loss completely minimized.
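      A small numeric example makes the cross-entropy behavior concrete. The cross_entropy helper below is illustrative only (TensorFlow's softmax_cross_entropy_with_logits additionally applies softmax to raw logits first):

```python
import numpy as np

def cross_entropy(predictions, label_vec):
    """Log-loss between a predicted probability distribution and a one-hot label."""
    return -np.sum(label_vec * np.log(predictions))

label = np.array([0, 0, 0, 1, 0, 0, 0, 0, 0, 0])    # one-hot "3"
confident = np.array([0.01]*3 + [0.91] + [0.01]*6)  # most mass on class 3
uncertain = np.full(10, 0.1)                        # uniform guess

# A confident correct prediction yields a much lower loss than a uniform guess.
print(cross_entropy(confident, label) < cross_entropy(uncertain, label))  # True
```

      The closer the predicted distribution is to the one-hot label, the closer the loss gets to 0.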

      We also need to choose the optimization algorithm which will be used to minimize the loss function. A process named gradient descent optimization is a common method for finding the (local) minimum of a function by taking iterative steps along the gradient in a negative (descending) direction. There are several choices of gradient descent optimization algorithms already implemented in TensorFlow, and in this tutorial we will be using the Adam optimizer. This extends upon gradient descent optimization by using momentum to speed up the process through computing an exponentially weighted average of the gradients and using that in the adjustments. Add the following code to your file:

      main.py

      cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=Y, logits=output_layer))
      train_step = tf.train.AdamOptimizer(learning_rate).minimize(cross_entropy)
      
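      The momentum idea that Adam builds on can be illustrated with a toy example: minimizing f(w) = w² with gradient descent whose steps follow an exponentially weighted average of past gradients. This is a sketch only (Adam additionally adapts the step size per parameter); the beta and learning_rate values are illustrative:

```python
# Minimize f(w) = w^2 using gradient descent with momentum.
w, velocity = 5.0, 0.0
learning_rate, beta = 0.1, 0.9

for _ in range(200):
    grad = 2 * w                            # derivative of w^2
    velocity = beta * velocity + (1 - beta) * grad
    w -= learning_rate * velocity           # step along the averaged gradient

print(round(w, 4))  # close to the minimum at 0
```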

      We've now defined the network and built it out with TensorFlow. The next step is to feed data through the graph to train it, and then test that it has actually learnt something.

      Step 5 — Training and Testing

      The training process involves feeding the training dataset through the graph and optimizing the loss function. Every time the network iterates through a batch of training images, it updates the parameters to reduce the loss in order to more accurately predict the digits shown. The testing process involves running our testing dataset through the trained graph, and keeping track of the number of images that are correctly predicted, so that we can calculate the accuracy.

      Before starting the training process, we will define our method of evaluating the accuracy so we can print it out on mini-batches of data while we train. These printed statements will allow us to check that from the first iteration to the last, loss decreases and accuracy increases; they will also allow us to track whether or not we have run enough iterations to reach a consistent and optimal result:

      main.py

      correct_pred = tf.equal(tf.argmax(output_layer, 1), tf.argmax(Y, 1))
      accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
      

      In correct_pred, we use the argmax function to compare which images are being predicted correctly by looking at the output_layer (predictions) and Y (labels), and we use the equal function to return this as a list of [Booleans](https://www.digitalocean.com/community/tutorials/understanding-data-types-in-python-3#booleans). We can then cast this list to floats and calculate the mean to get a total accuracy score.
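      The argmax-equal-mean chain can be reproduced in NumPy with toy data (the predictions and labels below are made up for illustration):

```python
import numpy as np

# Toy predicted probabilities for 4 images over 2 classes, and one-hot labels.
predictions = np.array([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]])
labels      = np.array([[0, 1],     [1, 0],     [1, 0],     [1, 0]])

# argmax picks each row's predicted/true class; comparing them gives booleans,
# and the mean of those booleans (cast to float) is the accuracy.
correct = np.argmax(predictions, axis=1) == np.argmax(labels, axis=1)
accuracy = correct.astype(np.float32).mean()
print(accuracy)  # 0.75
```

      Three of the four rows are predicted correctly, so the accuracy comes out to 0.75.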

      We are now ready to initialize a session for running the graph. In this session we will feed the network with our training examples, and once trained, we feed the same graph with new test examples to determine the accuracy of the model. Add the following lines of code to your file:

      main.py

      init = tf.global_variables_initializer()
      sess = tf.Session()
      sess.run(init)
      

      The essence of the training process in deep learning is to optimize the loss function. Here we are aiming to minimize the difference between the predicted labels of the images, and the true labels of the images. The process involves four steps which are repeated for a set number of iterations:

      • Propagate values forward through the network
      • Compute the loss
      • Propagate values backward through the network
      • Update the parameters

      At each training step, the parameters are adjusted slightly to try and reduce the loss for the next step. As the learning progresses, we should see a reduction in loss, and eventually we can stop training and use the network as a model for testing our new data.

      Add this code to the file:

      main.py

# train on mini batches
for i in range(n_iterations):
    batch_x, batch_y = mnist.train.next_batch(batch_size)
    sess.run(train_step, feed_dict={X: batch_x, Y: batch_y, keep_prob: dropout})

    # print loss and accuracy (per minibatch)
    if i % 100 == 0:
        minibatch_loss, minibatch_accuracy = sess.run([cross_entropy, accuracy], feed_dict={X: batch_x, Y: batch_y, keep_prob: 1.0})
        print("Iteration", str(i), "\t| Loss =", str(minibatch_loss), "\t| Accuracy =", str(minibatch_accuracy))
      

Every 100 iterations of the training step, in which we feed a mini-batch of images through the network, we print out the loss and accuracy of that batch. Note that we should not expect steadily decreasing loss and increasing accuracy here, as the values are per batch, not for the entire model. We use mini-batches of images rather than feeding them through individually to speed up the training process and to allow the network to see a number of different examples before updating the parameters.
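The mini-batching that mnist.train.next_batch performs can be sketched in plain NumPy (the array shapes below are illustrative, not the real MNIST dimensions):

```python
import numpy as np

# Hypothetical stand-in for a dataset: 10 examples with 4 features each
data = np.arange(40).reshape(10, 4)
batch_size = 4

# Slice the dataset into consecutive mini-batches
batches = [data[i:i + batch_size] for i in range(0, len(data), batch_size)]
print([b.shape for b in batches])  # [(4, 4), (4, 4), (2, 4)]
```

Each training step then sees only one of these small slices, which keeps a single parameter update cheap while still averaging the gradient over several examples.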

      Once the training is complete, we can run the session on the test images. This time we are using a keep_prob dropout rate of 1.0 to ensure all units are active in the testing process.

      Add this code to the file:

      main.py

test_accuracy = sess.run(accuracy, feed_dict={X: mnist.test.images, Y: mnist.test.labels, keep_prob: 1.0})
print("\nAccuracy on test set:", test_accuracy)
      

It’s now time to run our program and see how accurately our neural network can recognize these handwritten digits. Save the main.py file and execute the following command in the terminal to run the script:

      python main.py

      You'll see an output similar to the following, although individual loss and accuracy results may vary slightly:

      Output

Iteration 0 | Loss = 3.67079 | Accuracy = 0.140625
Iteration 100 | Loss = 0.492122 | Accuracy = 0.84375
Iteration 200 | Loss = 0.421595 | Accuracy = 0.882812
Iteration 300 | Loss = 0.307726 | Accuracy = 0.921875
Iteration 400 | Loss = 0.392948 | Accuracy = 0.882812
Iteration 500 | Loss = 0.371461 | Accuracy = 0.90625
Iteration 600 | Loss = 0.378425 | Accuracy = 0.882812
Iteration 700 | Loss = 0.338605 | Accuracy = 0.914062
Iteration 800 | Loss = 0.379697 | Accuracy = 0.875
Iteration 900 | Loss = 0.444303 | Accuracy = 0.90625

Accuracy on test set: 0.9206

To try to improve the accuracy of our model, or to learn more about the impact of tuning hyperparameters, we can test the effect of changing the learning rate, the dropout threshold, the batch size, and the number of iterations. We can also change the number of units in our hidden layers, and change the number of hidden layers themselves, to see how different architectures increase or decrease the model accuracy.

      To demonstrate that the network is actually recognizing the hand-drawn images, let's test it on a single image of our own.

First, either download this sample test image or open up a graphics editor and create your own 28x28 pixel image of a digit.

      Open the main.py file in your editor and add the following lines of code to the top of the file to import two libraries necessary for image manipulation.

      main.py

      import numpy as np
      from PIL import Image
      ...
      

      Then at the end of the file, add the following line of code to load the test image of the handwritten digit:

      main.py

      img = np.invert(Image.open("test_img.png").convert('L')).ravel()
      
      

The open function of the Image library loads the test image with four channels per pixel: the three RGB color channels plus the alpha transparency channel. This is not the same representation we used previously when reading in the dataset with TensorFlow, so we'll need to do some extra work to match the format.

First, we use the convert function with the L parameter to reduce the four-channel RGBA representation to a single grayscale channel. We store this as a numpy array and invert it using np.invert, because the current matrix represents black as 0 and white as 255, whereas we need the opposite. Finally, we call ravel to flatten the array.
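The invert-and-flatten part of that pipeline can be demonstrated on a tiny array (a hypothetical 2x2 "image" standing in for the 28x28 grayscale digit):

```python
import numpy as np

# Toy 2x2 grayscale "image" with pixel values in the 0-255 range
pixels = np.array([[0, 255],
                   [128, 0]], dtype=np.uint8)

inverted = np.invert(pixels)   # bitwise NOT on uint8: 0 -> 255, 255 -> 0, 128 -> 127
flat = inverted.ravel()        # flatten the 2D array into a 1D vector
print(flat.tolist())  # [255, 0, 127, 255]
```

The real image goes through the same two operations, producing the 784-element vector that matches the input format the network was trained on.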

      Now that the image data is structured correctly, we can run a session in the same way as previously, but this time only feeding in the single image for testing. Add the following code to your file to test the image and print the outputted label.

      main.py

prediction = sess.run(tf.argmax(output_layer, 1), feed_dict={X: [img]})
print("Prediction for test image:", np.squeeze(prediction))
      

      The np.squeeze function is called on the prediction to return the single integer from the array (i.e. to go from [2] to 2). The resulting output demonstrates that the network has recognized this image as the digit 2.
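On a toy array, np.squeeze behaves like this:

```python
import numpy as np

prediction = np.array([2])       # a length-1 array, like the session's output
value = np.squeeze(prediction)   # drop the length-1 dimension, leaving a scalar
print(int(value))  # 2
```

Without the squeeze, the print statement would show the bracketed array ([2]) rather than the bare digit.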

      Output

      Prediction for test image: 2

You can try testing the network with more complex images (digits that look like other digits, for example, or digits that have been drawn poorly or incorrectly) to see how well it fares.

      Conclusion

      In this tutorial you successfully trained a neural network to classify the MNIST dataset with around 92% accuracy and tested it on an image of your own. Current state-of-the-art research achieves around 99% on this same problem, using more complex network architectures involving convolutional layers. These use the 2D structure of the image to better represent the contents, unlike our method which flattened all the pixels into one vector of 784 units. You can read more about this topic on the TensorFlow website, and see the research papers detailing the most accurate results on the MNIST website.

      Now that you know how to build and train a neural network, you can try and use this implementation on your own data, or test it on other popular datasets such as the Google StreetView House Numbers, or the CIFAR-10 dataset for more general image recognition.


