As a SQL DBA or a system admin of highly transactional, performance-demanding SQL databases, you may often find yourself perplexed by “strange” performance issues reported by your user base. By strange, I mean any issue where you are out of ideas, having exhausted standard troubleshooting tactics, and where spending money on all-flash storage is just not in the budget.
Working under pressure from customers or clients to resolve performance issues is not easy, especially when C-level executives, sales, and end users are breathing down your neck to solve the problem immediately. Contrary to what many end users believe, these types of issues are not resolved with a magic button or the flip of a switch.
But what if there was a solution that came close?
Let’s review the typical troubleshooting process and an often-overlooked setting that may just be your new “magic button” for resolving unusual SQL Server performance issues.
Resolving SQL Server Performance Issues: The Typical Process
Personally, I find troubleshooting SQL-related performance issues very interesting. In my previous consulting gigs, I participated in many whiteboarding sessions and troubleshooting engagements as a highly paid last-resort option for many clients. When I dug into their troubleshooting process, I found a familiar set of events happening inside an IT department specific to SQL Server performance issues.
Here are the typical steps:
Review monitoring tools for CPU, RAM, IO, Blocks and so on
Start a SQL Profiler to collect possible offending queries and get a live view of the slowness
Check underlying storage for latency per IO, and possible bottle necks
Check if anyone else is running any performance intensive processes during production hours
Find possible offending queries and stop them from executing
DBAs check their SQL indexes and other settings
When nothing is found from the above process, the finger pointing starts. “It’s the query.” “No, it’s the index.” “It’s the storage.” “Nope. It’s the settings in your SQL server.” And so it goes.
An Often-Forgotten Setting to Improve SQL Server Performance
Based on the typical troubleshooting process, IT either implements a solution to prevent identical issues from coming back or hopes to fix the issue by adding all-flash storage and other expensive resources. These solutions have their place and are all worth considering.
There is, however, an often-forgotten setting that you should check first—the block allocation size of your NTFS partition in the Microsoft Windows Server.
The block allocation setting of the NTFS partition is set at format time, which happens very early in the build process and is often performed by a sysadmin building the VM or bare-metal server well before Microsoft SQL Server is installed. In my experience, this setting is left at the default (4K) during the server build and is never looked at again.
Why is 4K a bad setting? A Microsoft SQL Server page is 8KB in size. With 4K blocks, you create two IO operations for every page request. This is a big deal. The Microsoft-recommended block size for SQL Server is 64K. This way, the page is collected in one IO operation.
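The arithmetic behind this can be sketched with a simplified model (an assumption for illustration: one cluster-sized IO per cluster touched, ignoring caching and read-ahead):

```python
# Simplified model of why NTFS cluster size matters for SQL Server's
# fixed 8 KB pages: reading a page touches ceil(8 KB / cluster size) clusters.
PAGE_SIZE = 8 * 1024  # a SQL Server page is always 8 KB


def ios_per_page(cluster_bytes):
    """IO operations needed to fetch one 8 KB page (ceiling division)."""
    return -(-PAGE_SIZE // cluster_bytes)


print(ios_per_page(4 * 1024))   # 4 KB clusters  -> 2 IOs per page
print(ios_per_page(64 * 1024))  # 64 KB clusters -> 1 IO per page
```

With 4K clusters every page read costs two IOs; at 64K (or anything 8K and above in this model) it costs one.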
In bench tests of highly transactional databases on a 64K block allocation in the NTFS partition, I frequently observe database performance improve by as much as 50 percent or more. The more IO intensive your DB is, the more this setting helps. Even when the SQL server’s drive layout was otherwise sound, for many “strange performance” issues this setting was the magic button. So, if you are experiencing unexplained performance issues, this simple formatting setting may be just what you are looking for.
A word of caution: don’t confuse this NTFS block allocation with your underlying storage blocks. The storage should be set to the manufacturer’s recommended block size. For example, Nimble Storage recently recommended an 8K block allocation for best results with medium and large database sizes. This can vary by storage vendor and other factors, so be sure to check with your storage vendor prior to creating LUNs for SQL servers.
How to Check the NTFS Block Allocation Setting
Here is a simple way to check what block allocation is being used by your Windows Server NTFS partition:
Open the command prompt as administrator and run the following command, replacing C: with the drive letter of the drive containing your database data files. Repeat this step for the drives containing your log and TempDB files:
fsutil fsinfo ntfsinfo c:
Look for the reading “Bytes Per Cluster.” If it’s set to 4096, that is the undesirable 4K setting.
The fix is easy but can be time-consuming with large databases. If you have an AlwaysOn SQL cluster, this can be done with no downtime. If you don’t have an AlwaysOn MSSQL cluster, a downtime window will be required. Or, perhaps it’s time to build an AlwaysOn SQL cluster and kill two birds with one stone.
To address the issue, you will want to re-format the disks containing SQL data with 64K blocks.
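As a sketch, the reformat can be done from an elevated Command Prompt with the format utility’s /A switch (E: below is a placeholder for a drive holding your SQL data files, not a drive from your environment):

```shell
:: WARNING: format erases everything on the volume, so move or back up
:: the database files first. /A:64K sets a 64 KB allocation unit size.
format E: /FS:NTFS /A:64K /Q
```

Afterwards, re-run fsutil fsinfo ntfsinfo on the drive to confirm Bytes Per Cluster now reads 65536.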
If your NTFS block setting is at 4K right now, moving the DB files to 64K formatted disks will immediately improve performance. Don’t wait to check into this one.
Redis is an open-source, in-memory key-value data store. It comes with several commands that can help with troubleshooting and debugging issues. Because of Redis’s nature as an in-memory key-value store, many of these commands focus on memory management, but there are others that are valuable for providing an overview of the state of your Redis server. This tutorial will provide details on how to use some of these commands to help diagnose and resolve issues you may run into as you use Redis.
How To Use This Guide
This guide is written as a cheat sheet with self-contained examples. We encourage you to jump to any section that is relevant to the task you’re trying to complete.
The commands and outputs shown in this guide were tested on an Ubuntu 18.04 server running Redis version 4.0.9. To obtain a similar setup, you can follow Step 1 of our guide on How To Install and Secure Redis on Ubuntu 18.04. We will demonstrate how these commands behave by running them with redis-cli, the Redis command line interface. Note that if you’re using a different Redis interface — Redli, for example — the exact outputs of certain commands may differ.
Alternatively, you could provision a managed Redis database instance to test these commands, but note that depending on the level of control allowed by your database provider, some commands in this guide may not work as described. To provision a DigitalOcean Managed Database, follow our Managed Databases product documentation. Then, you must either install Redli or set up a TLS tunnel in order to connect to the Managed Database over TLS.
memory usage tells you how much memory is currently being used by a single key. It takes the name of a key as an argument and outputs the number of bytes it uses:
memory usage key_meaningOfLife
For a more general understanding of how your Redis server is using memory, you can run the memory stats command:
memory stats
This command outputs an array of memory-related metrics and their values. The following are the metrics reported by memory stats:
peak.allocated: The peak number of bytes consumed by Redis
total.allocated: The total number of bytes allocated by Redis
startup.allocated: The initial number of bytes consumed by Redis at startup
replication.backlog: The size of the replication backlog, in bytes
clients.slaves: The total size of all replica overheads (the output and query buffers and connection contexts)
clients.normal: The total size of all client overheads
aof.buffer: The total size of the current and rewrite append-only file buffers
db.0: The overheads of the main and expiry dictionaries for each database in use on the server, reported in bytes
overhead.total: The sum of all overheads used to manage Redis’s keyspace
keys.count: The total number of keys stored in all the databases on the server
keys.bytes-per-key: The ratio of the server’s net memory usage and keys.count
dataset.bytes: The size of the dataset, in bytes
dataset.percentage: The percentage of Redis’s net memory usage taken by dataset.bytes
peak.percentage: The percentage of peak.allocated taken out of total.allocated
fragmentation: The ratio of the amount of memory currently in use divided by the physical memory Redis is actually using
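Some Redis clients return the memory stats reply as a flat array that alternates metric names and values; pairing them up client-side makes the report easier to inspect. A quick sketch (the sample reply below is illustrative, not real server output):

```python
# A memory stats reply, as surfaced by some client libraries: a flat list
# alternating metric names and values. The numbers here are made up.
flat_reply = [
    "peak.allocated", 1045696,
    "total.allocated", 565040,
    "keys.count", 3,
]

# Pair up names (even indexes) with values (odd indexes) into a dict.
stats = dict(zip(flat_reply[::2], flat_reply[1::2]))
print(stats["keys.count"])  # → 3
```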
memory malloc-stats provides an internal statistics report from jemalloc, the memory allocator used by Redis on Linux systems:
memory malloc-stats
If it seems like you’re running into memory-related issues, but parsing the output of the previous commands proves unhelpful, you can try running memory doctor:
memory doctor
This feature will output any memory consumption issues that it can find and suggest potential solutions.
Getting General Information about Your Redis Instance
A debugging command that isn’t directly related to memory management is monitor. This command allows you to see a constant stream of every command processed by the Redis server:
monitor
Another command that is helpful for debugging is info, which returns several blocks of information and statistics about the server:
info
This command returns a lot of information. If you only want to see one info block, you can specify it as an argument to info:
info memory
Note that the information returned by the info command will depend on which version of Redis you’re using.
Using the keys Command
The keys command is helpful in cases where you’ve forgotten the name of a key, or perhaps you’ve created one but accidentally misspelled its name. keys looks for keys that match a pattern. The following glob-style patterns are supported:
? is a wildcard standing for any single character, so s?mmy matches sammy, sommy, and sqmmy
* is a wildcard that stands for any number of characters, including no characters at all, so sa*y matches sammy, say, sammmmmmy, and salmony
You can specify two or more characters that the pattern can include by wrapping them in brackets, so s[ai]mmy will match sammy and simmy, but not summy
To set a wildcard that excludes one or more characters, wrap them in brackets and precede them with a caret (^), so s[^oi]mmy will match sammy and sxmmy, but not sommy or simmy
To set a wildcard that includes a range of letters, separate the beginning and end of the range with a hyphen and wrap it in brackets, so s[a-o]mmy will match sammy, skmmy, and sommy, but not srmmy
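To make the pattern semantics concrete, here is a small Python sketch that approximates Redis’s matching by translating a pattern into a regular expression (Redis uses its own matcher internally, so treat this as an illustration only):

```python
import re


def redis_match(pattern: str, s: str) -> bool:
    """Approximate Redis KEYS glob matching via regex translation."""
    out, i = "", 0
    while i < len(pattern):
        c = pattern[i]
        if c == "?":
            out += "."                 # any single character
        elif c == "*":
            out += ".*"                # any run of characters, incl. none
        elif c == "[":
            j = pattern.index("]", i)  # copy [ae], [^ae], [a-o] through as-is
            out += pattern[i:j + 1]
            i = j
        else:
            out += re.escape(c)
        i += 1
    return re.fullmatch(out, s) is not None


print(redis_match("s?mmy", "sammy"))      # True
print(redis_match("sa*y", "salmony"))     # True
print(redis_match("s[ai]mmy", "summy"))   # False
print(redis_match("s[^oi]mmy", "sommy"))  # False
print(redis_match("s[a-o]mmy", "skmmy"))  # True
```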
Warning: The Redis documentation warns that keys should almost never be used in a production environment, since it can have a major negative impact on performance.
This guide details a number of commands that are helpful for troubleshooting and resolving issues one might encounter as they work with Redis. If there are other related commands, arguments, or procedures you’d like to see outlined in this guide, please ask or make suggestions in the comments below.
For more information on Redis commands, see our tutorial series on How to Manage a Redis Database.
This guide presents troubleshooting strategies for Linodes that are unresponsive to any network access. One reason that a Linode may be unresponsive is if you recently performed a distribution upgrade or other broad software updates to your Linode, as those changes can lead to unexpected problems for your core system components.
Similarly, your server may be unresponsive after maintenance was applied by Linode to your server’s host (frequently, this is correlated with software/distribution upgrades performed on your deployment prior to the host’s maintenance). This guide is designed as a useful resource for either of these scenarios.
If you can ping your Linode, but you cannot access SSH or other services, this guide will not assist with troubleshooting those services. Instead, refer to the Troubleshooting SSH or Troubleshooting Web Servers, Databases, and Other Services guides.
Where to go for help outside this guide
This guide explains how to use different troubleshooting commands on your Linode. These commands can produce diagnostic information and logs that may expose the root of your connection issues. For some specific examples of diagnostic information, this guide also explains the corresponding cause of the issue and presents solutions for it.
If the information and logs you gather do not match a solution outlined here, consider searching the Linode Community Site for posts that match your system’s symptoms. Or, post a new question in the Community Site and include your commands’ output.
Linode is not responsible for the configuration or installation of software on your Linode. Refer to Linode’s Scope of Support for a description of which issues Linode Support can help with.
Before You Begin
There are a few core troubleshooting tools you should familiarize yourself with that are used when diagnosing connection problems.
The Linode Shell (Lish)
Lish is a shell that provides access to your Linode’s serial console. Lish does not establish a network connection to your Linode, so you can use it when your networking is down or SSH is inaccessible. Much of your troubleshooting for basic connection issues will be performed from the Lish console.
To learn about Lish in more detail, and for instructions on how to connect to your Linode via Lish, review the Using the Linode Shell (Lish) guide. In particular, using your web browser is a fast and simple way to access Lish.
MTR
When your network traffic leaves your computer for your Linode, it travels through a series of routers administered by your internet service provider, by Linode’s transit providers, and by the various organizations that form the Internet’s backbone. It is possible to analyze the route your traffic takes for possible service interruptions using a tool called MTR.
MTR is similar to the traceroute tool, in that it will trace and display your traffic’s route. MTR also runs several iterations of its tracing algorithm, which means that it can report statistics like average packet loss and latency over the period that the MTR test runs.
Review the installation instructions in Linode’s Diagnosing Network Issues with MTR guide and install MTR on your computer.
Is your Linode Running?
Log in to the Linode Manager and inspect the Linode’s dashboard. If the Linode is powered off, turn it on.
Inspect the Lish Console
If the Linode is listed as running in the Manager, or after you boot it from the Manager, open the Lish console and look for a login prompt. If a login prompt exists, try logging in with your root user credentials (or any other Linux user credentials that you previously created on the server).
The root user is available in Lish even if root user login is disabled in your SSH configuration.
If you can log in at the Lish console, move on to the diagnose network connection issues section of this guide.
If you see a login prompt, but you have forgotten the credentials for your Linode, follow the instructions for resetting your root password and then attempt to log in at the Lish console again.
If you do not see a login prompt, your Linode may have issues with booting.
Troubleshoot Booting Issues
If your Linode isn’t booting normally, you will not be able to rely on the Lish console to troubleshoot your deployment directly. To continue, you will first need to reboot your Linode into Rescue Mode, which is a special recovery environment that Linode provides.
When you boot into Rescue Mode, you are booting your Linode into the Finnix recovery Linux distribution. This Finnix image includes a working network configuration, and you will be able to mount your Linode’s disks from this environment, which means that you will be able to access your files.
Review the Rescue and Rebuild guide for instructions and boot into Rescue Mode. If your Linode does not reboot into Rescue Mode successfully, please contact Linode Support.
Connect to Rescue Mode via the Lish console as you would normally. You will not be required to enter a username or password to start using the Lish console while in Rescue Mode.
Perform a File System Check
If your Linode can’t boot, then it may have experienced filesystem corruption.
Review the Rescue and Rebuild guide for instructions on running a filesystem check.
Never run a filesystem check on a disk that is mounted.
If your filesystem check reports errors that cannot be fixed, you may need to rebuild your Linode.
If the filesystem check reports errors that it has fixed, try rebooting your Linode under your normal configuration profile. After you reboot, you may find that your connection issues are resolved. If you still cannot connect as normal, restart the troubleshooting process from the beginning of this guide.
If the filesystem check does not report any errors, there may be another reason for your booting issues. Continue to inspecting your system and kernel logs.
Inspect System and Kernel Logs
In addition to being able to mount your Linode’s disks, you can also change root (sometimes abbreviated as chroot) within Rescue Mode. Chrooting will make Rescue Mode’s working environment emulate your normal Linux distribution. This means your files and logs will appear where you normally expect them, and you will be able to work with tools like your standard package manager and other system utilities.
To proceed, review the Rescue and Rebuild guide’s instructions on changing root. Once you have chrooted, you can then investigate your Linode’s logs for messages that may describe the cause of your booting issues.
In systemd Linux distributions (like Debian 8+, Ubuntu 16.04+, CentOS 7+, and recent releases of Arch), you can run the journalctl command to view system and kernel logs. In these and other distributions, you may also find system log messages in the following files:
/var/log/syslog
/var/log/messages
You can use the less command to review the contents of these files (e.g. less /var/log/syslog). Try pasting your log messages into a search engine or searching in the Linode Community Site to see if anyone else has run into similar issues. If you don’t find any results, you can try asking about your issues in a new post on the Linode Community Site. If it becomes difficult to find a solution, you may need to rebuild your Linode.
Quick Tip for Ubuntu and Debian Systems
After you have chrooted inside Rescue Mode, the following command may help with issues related to your package manager’s configuration:
dpkg --configure -a
After running this command, try rebooting your Linode into your normal configuration profile. If your issues persist, you may need to investigate and research your system logs further, or consider rebuilding your Linode.
Diagnose Network Connection Issues
If you can boot your Linode normally and access the Lish console, you can continue investigating network issues. Networking issues may have two causes:
There may be a network routing problem between you and your Linode, or:
If the traffic is properly routed, your Linode’s network configuration may be malfunctioning.
Check for Network Route Problems
To diagnose routing problems, run and analyze an MTR report from your computer to your Linode. For instructions on how to use MTR, review Linode’s MTR guide. It is useful to run your MTR report for 100 cycles in order to get a good sample size (note that running a report with this many cycles will take more time to complete). This recommended command includes other helpful options:
mtr -rwbzc 100 -i 0.2 <Linode's IP address>
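For reference, the options used here are standard mtr flags (198.51.100.0 below is the documentation example address; substitute your Linode’s IP):

```shell
# -r  report mode (print a summary instead of the interactive display)
# -w  wide report (do not truncate hostnames)
# -b  show both hostnames and IP addresses
# -z  display AS numbers for each hop
# -c  number of cycles to run (100 here)
# -i  seconds between probes (sub-second intervals may require root)
mtr -rwbz -c 100 -i 0.2 198.51.100.0
```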
Once you have generated this report, compare it with the following example scenarios.
If you are located in China, and the output of your MTR report shows high packet loss or an improperly configured router, then your IP address may have been blacklisted by the GFW (Great Firewall of China). Linode is not able to change your IP address if it has been blacklisted by the GFW. If you have this issue, review this community post for troubleshooting help.
This example report shows high persistent packet loss starting mid-way through the route at hop 3, which indicates an issue with the router at hop 3. If your report looks like this, open a support ticket with your MTR results for further troubleshooting assistance.
If your route only shows packet loss at certain routers, and not through to the end of the route, then it is likely that those routers are purposefully limiting ICMP responses. This is generally not a problem for your connection. Linode’s MTR guide provides more context for packet loss issues.
If your report resembles the example, open a support ticket with your MTR results for further troubleshooting assistance. Also, consult Linode’s MTR guide for more context on packet loss issues.
If your report shows question marks instead of the hostnames (or IP addresses) of the routers, and if these question marks persist to the end of the route, then the report indicates an improperly configured router. If your report looks like this, open a support ticket with your MTR results for further troubleshooting assistance.
If your route only shows question marks for certain routers, and not through to the end of the route, then it is likely that those routers are purposefully blocking ICMP responses. This is generally not a problem for your connection. Linode’s MTR guide provides more information about router configuration issues.
If your report shows no packet loss or low packet loss (or non-persistent packet loss isolated to certain routers) until the end of the route, and 100% loss at your Linode, then the report indicates that your Linode’s network interface is not configured correctly. If your report looks like this, move down to confirming network configuration issues from Rescue Mode.
If your report does not look like any of the previous examples, read through the MTR guide for other potential scenarios.
Confirm Network Configuration Issues from Rescue Mode
If your MTR indicates a configuration issue within your Linode, you can confirm the problem by using Rescue Mode:
Reboot your Linode into Rescue Mode.
Run another MTR report from your computer to your Linode’s IP address.
As noted earlier, Rescue Mode boots with a working network configuration. If your new MTR report does not show the same packet loss that it did before, this result confirms that your deployment’s network configuration needs to be fixed. Continue to troubleshooting network configuration issues.
If your new MTR report still shows the same packet loss at your Linode, this result indicates issues outside of your configuration. Open a support ticket with your MTR results for further troubleshooting assistance.
Open a Support Ticket with your MTR Results
Before opening a support ticket, you should also generate a reverse MTR report. A reverse MTR is run from your Linode and targets the public IP address of your local machine, whether you’re on your home LAN, for example, or on public WiFi. To run an MTR from your Linode, log in via your Lish console. To find your local machine’s public IP, visit a website like https://www.whatismyip.com/.
Once you have generated your original MTR and your reverse MTR, open a Linode support ticket, and include your reports and a description of the troubleshooting you’ve performed so far. Linode Support will try to help further diagnose the routing issue.
Troubleshoot Network Configuration Issues
If you have determined that your network configuration is the cause of the problem, review the following troubleshooting suggestions. If you make any changes in an attempt to fix the issue, you can test those changes with these steps:
Run another MTR report (or ping the Linode) from your computer to your Linode’s IP.
If the report shows no packet loss but you still can’t access SSH or other services, this result indicates that your network connection is up again, but the other services are still down. Move on to troubleshooting SSH or troubleshooting other services.
If the report still shows the same packet loss, review the remaining troubleshooting suggestions in this section.
If the recommendations in this section do not resolve your issue, try pasting your diagnostic commands’ output into a search engine or searching for your output in the Linode Community Site to see if anyone else has run into similar issues. If you don’t find any results, you can try asking about your issues in a new post on the Linode Community Site. If it becomes difficult to find a solution, you may need to rebuild your Linode.
Try Enabling Network Helper
A quick fix may be to enable Linode’s Network Helper tool. Network Helper will attempt to generate the appropriate static networking configuration for your Linux distribution. After you enable Network Helper, reboot your Linode for the changes to take effect. If Network Helper was already enabled, continue to the remaining troubleshooting suggestions in this section.
Did You Upgrade to Ubuntu 18.04+ From an Earlier Version?
If you performed an inline upgrade from an earlier version of Ubuntu to Ubuntu 18.04+, you may need to enable the systemd-networkd service:
sudo systemctl enable systemd-networkd
Afterwards, reboot your Linode.
Run Diagnostic Commands
To collect more information about your network configuration, collect output from the diagnostic commands appropriate for your distribution:
Network diagnostic commands
Debian 7, Ubuntu 14.04
sudo service network status
sudo ifdown eth0 && sudo ifup eth0
Debian 8 and 9, Ubuntu 16.04
sudo systemctl status networking.service -l
sudo journalctl -u networking --no-pager | tail -20
sudo ifdown eth0 && sudo ifup eth0
Ubuntu 18.04
sudo networkctl status
sudo systemctl status systemd-networkd -l
sudo journalctl -u systemd-networkd --no-pager | tail -20
sudo netplan apply
Arch Linux
sudo systemctl status systemd-networkd -l
sudo journalctl -u systemd-networkd --no-pager | tail -20
CentOS 6
sudo service network status
sudo ifdown eth0 && sudo ifup eth0
CentOS 7, Fedora
sudo systemctl status NetworkManager -l
sudo journalctl -u NetworkManager --no-pager | tail -20
sudo ifdown eth0 && sudo ifup eth0
Inspect Error Messages
Your commands’ output may show error messages, including generic errors like Failed to start Raise network interfaces. More specific errors may also appear. Two common ones are related to Sendmail and iptables:
If you find a message similar to the following, it is likely that a broken Sendmail update is at fault:
/etc/network/if-up.d/sendmail: 44: .: Can't open /usr/share/sendmail/dynamic
run-parts: /etc/network/if-up.d/sendmail exited with return code 2
The Sendmail issue can usually be resolved by running the following command and restarting your Linode:
sudo mv /etc/network/if-up.d/sendmail ~
ifdown -a && ifup -a
Read more about the Sendmail bug here.
Malformed rules in your iptables ruleset can sometimes cause issues for your network scripts. An error similar to the following can appear in your logs if this is the case:
Apr 06 01:03:17 xlauncher ifup: run-parts: failed to exec /etc/network/if-
Apr 06 01:03:17 xlauncher ifup: run-parts: /etc/network/if-up.d/iptables e
Run the following command and restart your Linode to resolve this issue:
sudo mv /etc/network/if-up.d/iptables ~
Please note that your firewall will be down at this point, so you will need to re-enable it manually. Review the Control Network Traffic with iptables guide for help with managing iptables.
Was your Interface Renamed?
In your commands’ output, you might notice that your eth0 interface is missing and replaced with another name (for example, ensp or ensp0). This behavior can be caused by systemd’s Predictable Network Interface Names feature.
Disable the use of Predictable Network Interface Names by masking systemd’s default link policy (the command below is the method documented by systemd):
sudo ln -s /dev/null /etc/systemd/network/99-default.link
Reboot your Linode for the changes to take effect.
Review Firewall Rules
If your interface is up but your networking is still down, your firewall (which is likely implemented by the iptables software) may be blocking all connections, including basic ping requests. To review your current firewall ruleset, run:
sudo iptables -L -nv
Your deployment may be running FirewallD or UFW, which are frontend software packages used to more easily manage your iptables rules. Run these commands to find out if you are running either package:
sudo ufw status
sudo firewall-cmd --state
Review How to Configure a Firewall with UFW and Introduction to FirewallD on CentOS to learn how to manage and inspect your firewall rules with those packages.
Firewall rulesets can vary widely. Review our Control Network Traffic with iptables guide to analyze your rules and determine if they are blocking connections.
Disable Firewall Rules
In addition to analyzing your firewall ruleset, you can also temporarily disable your firewall to test if it is interfering with your connections. Leaving your firewall disabled increases your security risk, so we recommend re-enabling it afterwards with a modified ruleset that will accept your connections. Review Control Network Traffic with iptables for help with this subject.
Create a temporary backup of your current iptables:
sudo iptables-save > ~/iptables.txt
Set the INPUT, FORWARD, and OUTPUT packet policies to ACCEPT:
sudo iptables -P INPUT ACCEPT
sudo iptables -P FORWARD ACCEPT
sudo iptables -P OUTPUT ACCEPT