
      Troubleshooting Kubernetes


      Updated by Linode

      Written by Linode Community

      Troubleshooting issues with Kubernetes can be complex, and it can be difficult to account for all the possible error conditions you may see. This guide tries to equip you with the core tools that can be useful when troubleshooting, and it introduces some situations that you may find yourself in.


      Where to go for help outside this guide

      If your issue is not covered by this guide, we also recommend researching and posting in the Linode Community Questions site and in #linode on the Kubernetes Slack, where other Linode users (and the Kubernetes community) can offer advice.

      If you are running a cluster on Linode’s managed LKE service, and you are experiencing an issue related to your master/control plane components, you can report these issues to Linode by contacting Linode Support. Examples in this category include:

      • Kubernetes’ API server not running. If kubectl does not respond as expected, this can indicate problems with the API server.

      • The CCM, CSI, Calico, or kube-dns pods are not running.

      • Annotations on LoadBalancer services aren’t functioning.

      • PersistentVolumes are not re-attaching.

Please note that the kube-apiserver and etcd pods will not be visible for LKE clusters; this is expected.

      General Troubleshooting Strategies

      To troubleshoot issues with the applications running on your cluster, you can rely on the kubectl command to gather debugging information. kubectl includes a set of subcommands that can be used to research issues with your cluster, and this guide will highlight four of them: get, describe, logs, and exec.

      To troubleshoot issues with your cluster, you may need to directly view the logs that are generated by Kubernetes’ components.

      kubectl get

      Use the get command to list different kinds of resources in your cluster (nodes, pods, services, etc). The output will show the status for each resource returned. For example, this output shows that a pod is in the CrashLoopBackOff status, which means it should be investigated further:

      kubectl get pods
      NAME              READY     STATUS             RESTARTS   AGE
      ghost-0           0/1       CrashLoopBackOff   34         2h
      mariadb-0         1/1       Running            0          2h
      
      • Use the --namespace flag to show resources in a certain namespace:

        # Show pods in the `kube-system` namespace
        kubectl get pods --namespace kube-system
        

        Note

        If you’ve set up Kubernetes using automated solutions like Linode’s Kubernetes Engine, k8s-alpha CLI, or Rancher, you’ll see csi-linode and ccm-linode pods in the kube-system namespace. This is normal as long as they’re in the Running status.
      • Use the -o flag to return the resources as YAML or JSON. The Kubernetes API’s complete description for the returned resources will be shown:

        # Get pods as YAML API objects
        kubectl get pods -o yaml
        
      • Sort the returned resources with the --sort-by flag:

        # Sort by name
        kubectl get pods --sort-by=.metadata.name
        
      • Use the --selector or -l flag to get resources that match a label. This is useful for finding all pods for a given service:

        # Get pods which match the app=ghost selector
        kubectl get pods -l app=ghost
        
      • Use the --field-selector flag to return resources which match different resource fields:

        # Get all pods that are Pending
        kubectl get pods --field-selector status.phase=Pending
        
        # Get all pods that are not in the kube-system namespace
        kubectl get pods --field-selector metadata.namespace!=kube-system
        

      kubectl describe

      Use the describe command to return a detailed report of the state of one or more resources in your cluster. Pass a resource type to the describe command to get a report for each of those resources:

      kubectl describe nodes
      

      Pass the name of a resource to get a report for just that object:

      kubectl describe pods ghost-0
      

      You can also use the --selector (-l) flag to filter the returned resources, as with the get command.

      kubectl logs

      Use the logs command to print logs collected by a pod:

      kubectl logs mariadb-0
      
      • Use the --selector (-l) flag to print logs from all pods that match a selector:

        kubectl logs -l app=ghost
        
      • If a pod’s container was killed and restarted, you can view the previous container’s logs with the --previous or -p flag:

        kubectl logs -p ghost-0
        

      kubectl exec

      You can run arbitrary commands on a pod’s container by passing them to kubectl’s exec command:

      kubectl exec mariadb-0 -- ps aux
      

      The full syntax for the command is:

      kubectl exec ${POD_NAME} -c ${CONTAINER_NAME} -- ${CMD} ${ARG1} ${ARG2} ... ${ARGN}
      

      Note

      The -c flag is optional, and is only needed when the specified pod is running more than one container.

      It is possible to run an interactive shell on an existing pod/container. Pass the -it flags to exec and run the shell:

      kubectl exec -it mariadb-0 -- /bin/bash
      

Enter exit to leave this shell.

      Viewing Master and Worker Logs

      If the Kubernetes API server isn’t working normally, then you may not be able to use kubectl to troubleshoot. When this happens, or if you are experiencing other more fundamental issues with your cluster, you can instead log directly into your nodes and view the logs present on your filesystem.

      Non-systemd systems

If your nodes do not run systemd, the Kubernetes components write their logs directly to files under /var/log. On your master nodes, these are typically:

      /var/log/kube-apiserver.log
      /var/log/kube-scheduler.log
      /var/log/kube-controller-manager.log

      On your worker nodes:

      /var/log/kubelet.log
      /var/log/kube-proxy.log

      systemd systems

      If your nodes run systemd, you can access the logs that kubelet generates with journalctl:

      journalctl --unit kubelet
      

      Logs for your other Kubernetes software components can be found through your container runtime. When using Docker, you can use the docker ps and docker logs commands to investigate. For example, to find the container running your API server:

      docker ps | grep apiserver
      

The output displays columns of information separated by whitespace:

      2f4e6242e1a2    cfdda15fbce2    "kube-apiserver --au…"  2 days ago  Up 2 days   k8s_kube-apiserver_kube-apiserver-k8s-trouble-1-master-1_kube-system_085b2ab3bd6d908bde1af92bd25e5aaa_0
      

      The first entry (in this example: 2f4e6242e1a2) will be an alphanumeric string, and it is the ID for the container. Copy this string and pass it to docker logs to view the logs for your API server:

      docker logs ${CONTAINER_ID}
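
The two steps above can be combined into one line. A minimal sketch, assuming Docker is the container runtime and the API server container is running:

```shell
# Grab the first column (the container ID) of the kube-apiserver row,
# then stream that container's logs
CONTAINER_ID=$(docker ps | grep kube-apiserver | awk '{print $1}')
docker logs "$CONTAINER_ID"
```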
      

      Troubleshooting Examples

      Viewing the Wrong Cluster

      If your kubectl commands are not returning the resources and information you expect, then your client may be assigned to the wrong cluster context. To view all of the cluster contexts on your system, run:

      kubectl config get-contexts
      

      An asterisk will appear next to the active context:

      CURRENT   NAME                                        CLUSTER            AUTHINFO
                my-cluster-kayciZBRO5s@my-cluster           my-cluster         my-cluster-kayciZBRO5s
      *         other-cluster-kaUzJOMWJ3c@other-cluster     other-cluster      other-cluster-kaUzJOMWJ3c
      

      To switch to another context, run:

      kubectl config use-context ${CLUSTER_NAME}
      

For example:

      kubectl config use-context my-cluster-kayciZBRO5s@my-cluster
      

      Can’t Provision Cluster Nodes

      If you are not able to create new nodes in your cluster, you may see an error message similar to:

        
Error creating a Linode Instance: [400] Account Limit reached. Please open a support ticket.

      This is a reference to the total number of Linode resources that can exist on your account. To create new Linode instances for your cluster, you will need to either remove other instances on your account, or request a limit increase. To request a limit increase, contact Linode Support.

      Insufficient CPU or Memory

      If one of your pods requests more memory or CPU than is available on your worker nodes, then one of these scenarios may happen:

      • The pod will remain in the Pending state, because the scheduler cannot find a node to run it on. This will be visible when running kubectl get pods.

        If you run the kubectl describe command on your pod, the Events section may list a FailedScheduling event, along with a message like Failed for reason PodExceedsFreeCPU and possibly others. You can run kubectl describe nodes to view information about the allocated resources for each node.

      • The pod may continually crash. For example, the Ghost pod specified by Ghost’s Helm chart will show the following error in its logs when not enough memory is available:

        kubectl logs ghost --tail=5
        1) SystemError
        
        Message: You are recommended to have at least 150 MB of memory available for smooth operation. It looks like you have ~58 MB available.
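
For the first scenario, you can jump straight to each node's resource summary by filtering the output of kubectl describe nodes. A sketch (the number of context lines shown after the header is a guess; adjust as needed):

```shell
# Print each node's "Allocated resources" section
# (-A 6 shows the 6 lines following each match)
kubectl describe nodes | grep -A 6 "Allocated resources"
```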
        

      If your cluster has insufficient resources for a new pod, you will need to:

      • Reduce the number of other pods/deployments/applications running on your cluster,
      • Resize the Linode instances that represent your worker nodes to a higher-tier plan, or
      • Add a new worker node to your cluster.



      This guide is published under a CC BY-ND 4.0 license.




      Troubleshooting Web Servers, Databases, and Other Services


Updated by Linode

      Written by Linode

      This guide presents troubleshooting strategies for when you can’t connect to your web server, database, or other services running on your Linode. This guide assumes that you have access to SSH. If you can’t log in with SSH, review Troubleshooting SSH and then return to this guide.

      Where to go for help outside this guide

      This guide explains how to use different troubleshooting commands on your Linode. These commands can produce diagnostic information and logs that may expose the root of your connection issues. For some specific examples of diagnostic information, this guide also explains the corresponding cause of the issue and presents solutions for it.

      If the information and logs you gather do not match a solution outlined here, consider searching the Linode Community Site for posts that match your system’s symptoms. Or, post a new question in the Community Site and include your commands’ output.

      Linode is not responsible for the configuration or installation of software on your Linode. Refer to Linode’s Scope of Support for a description of which issues Linode Support can help with.

      General Troubleshooting Strategies

      This section highlights troubleshooting strategies that apply to every service.

      Check if the Service is Running

      The service may not be running. Check the status of the service:

Distribution                                                        Command
      systemd systems (Arch, Ubuntu 16.04+, Debian 8+, CentOS 7+, etc.)   sudo systemctl status <service name> -l
      sysvinit systems (CentOS 6, Ubuntu 14.04, Debian 7, etc.)           sudo service <service name> status

      Restart the Service

      If the service isn’t running, try restarting it:

Distribution       Command
      systemd systems    sudo systemctl restart <service name>
      sysvinit systems   sudo service <service name> restart

      Enable the Service

      If your system was recently rebooted, and the service didn’t start automatically at boot, then it may not be enabled. Enable the service to prevent this from happening in the future:

Distribution       Command
      systemd systems    sudo systemctl enable <service name>
      sysvinit systems   sudo chkconfig <service name> on

      Check your Service’s Bound IP Address and Ports

      Your service may be listening on an unexpected port, or it may not be bound to your public IP address (or whatever address is desirable). To view which address and ports a service is bound on, run the ss command with these options:

      sudo ss -atpu
      

      Review the application’s documentation for help determining the address and port your service should bind to.

      Note

      One notable example is if a service is only bound to a public IPv4 address and not to an IPv6 address. If a user connects to your Linode over IPv6, they will not be able to access the service.
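
For example, to check which addresses a web server on port 80 is bound to, you can filter the ss output. A sketch: an address of 0.0.0.0:80 indicates IPv4 only, while [::]:80 (or *:80) covers IPv6 as well:

```shell
# List listening TCP sockets numerically and filter for port 80
sudo ss -tln | grep ':80'
```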

      Analyze Service Logs

      If your service doesn’t start normally, review your system logs for the service. Your system logs may be in the following locations:

Distribution            System Logs
      systemd systems         Run journalctl
      Ubuntu 14.04, Debian 7  /var/log/syslog
      CentOS 6                /var/log/messages

Your service’s log location varies by application, but logs are often stored in /var/log. The less command is a useful tool for browsing through your logs.

      Try pasting your log messages into a search engine or searching for your messages in the Linode Community Site to see if anyone else has run into similar issues. If you don’t find any results, you can try asking about your issues in a new post on the Linode Community Site. If it becomes difficult to find a solution, you may need to rebuild your Linode.

      Review Firewall Rules

      If your service is running but your connections still fail, your firewall (which is likely implemented by the iptables software) may be blocking the connections. To review your current firewall ruleset, run:

      sudo iptables -L # displays IPv4 rules
      sudo ip6tables -L # displays IPv6 rules
      

      Note

      Your deployment may be running FirewallD or UFW, which are frontends used to more easily manage your iptables rules. Run these commands to find out if you are running either package:

      sudo ufw status
      sudo firewall-cmd --state
      

      Review How to Configure a Firewall with UFW and Introduction to FirewallD on CentOS to learn how to manage and inspect your firewall rules with those packages.

      Firewall rulesets can vary widely. Review the Control Network Traffic with iptables guide to analyze your rules and determine if they are blocking connections. For example, a rule which allows incoming HTTP traffic could look like this:

        
-A INPUT -p tcp -m tcp --dport 80 -m conntrack --ctstate NEW -j ACCEPT

      Disable Firewall Rules

      In addition to analyzing your firewall ruleset, you can also temporarily disable your firewall to test if it is interfering with your connections. Leaving your firewall disabled increases your security risk, so we recommend re-enabling it afterward with a modified ruleset that will accept your connections. Review Control Network Traffic with iptables for help with this subject.

      1. Create a temporary backup of your current iptables:

        sudo iptables-save > ~/iptables.txt
        
2. Set the INPUT, FORWARD, and OUTPUT chain policies to ACCEPT:

        sudo iptables -P INPUT ACCEPT
        sudo iptables -P FORWARD ACCEPT
        sudo iptables -P OUTPUT ACCEPT
        
      3. Flush the nat table that is consulted when a packet that creates a new connection is encountered:

        sudo iptables -t nat -F
        
      4. Flush the mangle table that is used for specialized packet alteration:

        sudo iptables -t mangle -F
        
      5. Flush all the chains in the table:

        sudo iptables -F
        
      6. Delete every non-built-in chain in the table:

        sudo iptables -X
        
      7. Repeat these steps with the ip6tables command to flush your IPv6 rules. Be sure to assign a different name to the IPv6 rules file (e.g. ~/ip6tables.txt).
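
When you are done testing, you can reload the backed-up rules to re-enable your firewall. A sketch using the backup files created in step 1 (these commands require root):

```shell
# Restore the ruleset saved before testing
sudo iptables-restore < ~/iptables.txt
sudo ip6tables-restore < ~/ip6tables.txt
```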

      Troubleshoot Web Servers

      If your web server is not running or if connections are timing out, review the general troubleshooting strategies.

      Note

      If your web server is responding with an error code, your troubleshooting will vary by what code is returned. For more detailed information about each request that’s failing, read your web server’s logs. Here are some commands that can help you find your web server’s logs:

      • Apache:

        grep ErrorLog -r /etc/apache2  # On Ubuntu, Debian
        grep ErrorLog -r /etc/httpd    # On CentOS, Fedora, RHEL
        
      • NGINX:

        grep error_log -r /etc/nginx
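
Once you've located the log file, tail it while reproducing the failing request. A sketch, assuming typical default paths (these vary by distribution and configuration):

```shell
# Watch new entries as they arrive (Ctrl-C to stop)
sudo tail -f /var/log/apache2/error.log   # Apache on Debian/Ubuntu
sudo tail -f /var/log/nginx/error.log     # NGINX
```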
        

      Frequent Error Codes

      • HTTP 401 Unauthorized, HTTP 403 Forbidden

        The requesting user did not have sufficient permission or access to the requested URL. Review your web server authorization and access control configuration:

      • HTTP 404 Not Found

        The URL that a user requested could not be found by the web server. Review your web server configuration and make sure your website files are stored in the right location on your filesystem:

      • HTTP 500, 502, 503, 504

        The web server requested a resource from a process it depends on, but the process did not respond as expected. For example, if a database query needs to be performed for a web request, but the database isn’t running, then a 50X code will be returned. To troubleshoot these issues, investigate the service that the web server depends on.

      Troubleshoot Databases

      Is your Disk Full?

      One common reason that a database may not start is if your disk is full. To check how much disk space you are using, run:

      df -h
      

      Note

This reported disk usage is not the same as the reported storage usage in the Linode Manager. The storage usage in the Linode Manager refers to how much of the disk space you pay for is allocated to your Linode’s disks. The output of df -h shows how full those disks are.

      You have several options for resolving disk space issues:

      • Free up space on your disk by locating and removing files you don’t need, using a tool like ncdu.

      • If you have any unallocated space on your Linode (storage that you pay for already but which isn’t assigned to your disk), resize your disk to take advantage of the space.

      • Upgrade your Linode to a higher-tier resource plan and then resize your disk to use the newly available space. If your Linode has a pending free upgrade for your storage space, you can choose to take this free upgrade to solve the issue.
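
To find what is consuming the space, ncdu gives an interactive view; if it isn't installed, plain du works as a fallback. A sketch:

```shell
# Interactive disk usage browser, staying on one filesystem (-x)
sudo ncdu -x /

# Non-interactive alternative: the ten largest directories under /var
sudo du -xh /var 2>/dev/null | sort -rh | head -n 10
```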

      Database Performance Troubleshooting

      If your database is running but returning slowly, research how to optimize the database software for the resources your Linode has. If you run MySQL or MariaDB, read How to Optimize MySQL Performance Using MySQLTuner.
