There are many considerations to take into account when implementing a Microsoft SQL Server in a private cloud environment. Today’s SQL dependent applications have different performance and high availability (HA) requirements. As a solutions architect, my goal is to provide our customers the best performing, highly available designs while managing budgetary concerns, scalability, supportability and total cost of ownership. Like so many tasks in IT infrastructure strategy, success is all about planning. There are many moving pieces and balancing everything to reach our goal becomes a challenge if we don’t ask the right questions up front.
In this two-part series, I’ll share my approach to scoping, sizing and designing private cloud infrastructure capable of migrating or standing up new Microsoft SQL Server environments. In part one, I’ll identify performance considerations and provide real world examples to make sure that your SQL Server environment is ready to meet application, growth and DR requirements of your organization. In part two, I’ll focus on SQL Server deployment options with high availability.
If you need a review on SQL server basics before we dive into the private cloud design, you can brush up here.
Here’s what we’ll cover in this post, with links if you’d prefer to jump ahead:
SQL Server Performance Considerations: RAM, IOPS, CPU and More
What follows are the basic performance considerations to take into account when designing a Microsoft SQL Server environment.
RAM—These requirements are based on database size and developer recommendations. Ideally, you’ll have enough RAM to put the entire database into RAM. However, that’s not always possible with large DB sizes. RAM is delicious to SQL and the server will eat it up, so be generous if your budget allows. Leave 20 percent of RAM reserved on the server for OS and other services.
IOPS—The SQL Server is mostly a read/write machine, and its performance is dependent on disk IO and storage latency. With SSDs becoming more affordable, many SQL DBAs now prefer to run on SSDs due to high IOPS provided by SSD and NVMe drives. In the past, many 15K SAS disks in RAID10 were the norm for data volumes.
Low latency is essential to SQL performance. By today’s standards, keeping latency below 5ms per IO is the norm. Sub 1ms latency is very common with local SSD storage. However, 10ms is still a good response time for most medium performance applications.
CPU—Bare metal CPU allocation is easy. Your server has a number of CPUs and all of them can be allocated to SQL with no negative considerations, other than licensing costs. However, allocating CPU to a SQL Server in a VMWare private cloud environment should be done with caution. Licensing is based on CPU cores. Too much assigned, but unused, vCPU GHz in a VMWare VM can negatively impact performance. Rightsizing is key.
The SQL Server is not normally a CPU hog. Unexpected and prolonged high CPU usage during production time means something is not right with the database or SQL code. Adding CPU in those cases may not solve the issue. These unexpected conditions should be checked by SQL DBAs and developers. Higher than normal CPU utilization is to be expected during maintenance time. In instances where a database may require high CPU utilization as a norm, a developer or SQL DBA should check the database for missing indexes or other issues before adding more vCPUs to virtual servers.
SQL as a VM
Based on the above considerations, when designing a SQL Server environment, budgetary and licensing considerations may start leaning your design toward SQL as a VM on a private cloud with a restricted number of assigned vCPUs. Running SQL on a private cloud is a great way to save on licensing costs. It’s also a good performer for most application loads, and because SQL is more of an IO-dependent system, its performance will greatly depend on the latency and IO availability of your storage system.
Storage Systems IO Availability
SAN storage will normally provide great amounts of IO based on the amount of disks and disk types on the SAN. The norm to expect from many SAN systems is 5ms to 15ms of latency per IO. When your application desires sub 1ms latency, such as gaming, financial or ad-tech systems, a local SSD RAID10 disk array will save the day. These local disk arrays are easy to set up and use with both VMs and bare metal server implementations. You may ask, “What about the HA and extra redundancy features of using a SAN instead of local disk?” I will be discussing High Availability in a performance-demanding environment in Part 2: Deploying Microsoft SQL Servers in a Private Cloud with High Availability. Check back soon.
In the end, your application’s performance requirements and budget will drive your decision whether to run a private cloud or a bare metal SQL deployment. Both VM and bare metal deployments can be configured with high availability and will easily integrate into your private cloud environments.
With these performance considerations in mind, let’s discuss other scoping and sizing matrices that we’ll need to design our high-performing and future-proofed SQL deployment.
Scoping and Sizing for Microsoft SQL Deployments in a Private Cloud
As a Solution Engineer at INAP, I collect numerous measurements during the SQL scoping process. Before delving into numbers, I start by evaluating pain points clients may have with their current SQL Servers. I want to know what hurts.
Asking these and other questions helps me understand how to resolve issues and future proof your next SQL deployment:
- Is it performing to your expectation?
- Are you meeting your SQL database maintenance time window?
- Do your users complain that their SQL dependent application freezes sometimes for no reason?
- When was the last time you restored a database from your backup and how long did that take?
- What’s your failover plan in case your production database server dies?
- What backup software is being used for SQL?
- Are your databases running in simple or full restore mode?
Once I know the exact problem, or if I’m designing for a new application, I start by collecting specific technical details. The most common specific measurements and requirements collected during the scoping phase are:
- DB sizes for all databases per SQL instance
- Instances (Is there more than one SQL instance, and why?)
- Growth requirements (Usually measured in daily change rate to help with other considerations like backups and replication for DR purposes)
- RAM requirements & CPU requirements
- Performance requirements, such as latency/ IO and IOPS requirements
- HA requirements (Can your customer base wait for you to restore your SQL Server in case of an outage? How long does it take to restore?)
- Regulatory/Compliance requirements such as HIPAA and PCI
- Maintenance schedule (Do the maintenance jobs complete on time every day)
- Replication requirements
- Reporting requirements (Does this deployment need a reporting server so as to not interrupt production workloads)
- Backup (What is used today to backup production data? Has it been tested? What are the challenges?)
In environments requiring high performance database response times, we utilize native Windows Server tools, such as perfmon, to collect very detailed performance matrixes from exiting SQL Servers to identify performance bottlenecks and other considerations that help us resolve these issues in current or future database deployments. Because a good SQL Server’s performance is heavily dependent on the disk sub-systems, we will go deeper in disk design recommendations for private clouds, VMs and bare metal deployments.
SQL Server Disk Layout for Performance, HA and DR
Separating database files into different disks is a best practice. It helps performance and helps your DBA easily identify performance issues when troubleshooting. For example, you could have a runaway query beating up your TempDB. If your TempDB is on a separate disk from your prod database files, you can easily identify that the TempDB disk is being thrashed and that same TempDB issue is not stepping on other workloads in your environments.
For high performance requirements, SSDs are recommended. The following is a basic disk layout for performance:
- Data (MDF, NDF files): Fast read/write disk, many drives in an array preferred for best performance
- Index (NDF files, not often used): Fast read/write disk, many drives in an array preferred for best performance
- Log (LDF files): Fast write performance, not much reading happens here
- TempDB (Temp database to crunch numbers and formulas): Should be a fast disk. SSD is preferred. Do not combine with other data or log files.
- Page File (NTFS): Keep this on a separate disk LUN or Array if possible. C: drive is not a good place for a page file on a SQL Server.
In a SQL deployment that requires DR, the above disk layout will allow you to granularly replicate just the SQL data and the OS that needs to be replicated, leaving TempDB and pagefile out of the replication design since those files are reset during reboot. TempDB and pagefile also produce lots of noisy IO. Replicating those files will result in unnecessarily heavy bandwidth and disk utilization.
Check back soon for Part 2: Deploying Microsoft SQL Servers in a Private Cloud with High Availability