      GratITude: Keeping Customers Online In the Face of Disaster


      Today, we’re kicking off our GratITude Series, highlighting the hard-working members of our data center and IT teams in essential roles powering a connected world. Their dedication enables our thousands of global customers to successfully operate their businesses in good times and in times of immense challenge. None of us could do what we do without them, and here we’ll share their stories.

      When Texas was hit with a winter storm this February, nobody imagined it would lead to a regional disaster. But as the power grids failed, pipes froze and burst and gas deliveries ground to a halt, thousands were left scrambling to fulfill their most basic needs. In the aftermath, officials are looking to figure out why the situation escalated to a crisis so rapidly, but for those on the ground during the storm and its aftereffects, what mattered in the moment was finding the way forward. In our Dallas flagship data center, the INAP team put in the hours to keep our customers online. What did it take to get through this crisis without one failure?

      The Dallas Data Center Team

      INAP data centers offer full redundancy and reliable network connectivity, among other benefits, to guarantee uptime to our customers. To meet this guarantee, Dallas Data Center Operations Supervisor Billy Boland and Regional Infrastructure Engineer David Thornton were on-site 24/7 for several days through the storm and the aftermath to keep the Dallas flagship data center up and running. Both men credit the tenure of the majority of the Dallas team and years of cross training as two factors that saw them through the disaster without dropping a single customer.

       

      Boland and Thornton
      Billy Boland, Data Center Operations Supervisor, and David Thornton, Regional Infrastructure Engineer, were on-site 24/7 through the storm and the aftermath to keep customers online.

      “I’m the only facility guy here in Dallas,” said Thornton. “Billy and his team, they’re the data center engineers. They work primarily with the customer, but they support me. Then on top of that, I’ve been cross training them on multiple things for years.”

      That cross training would prove key in getting through the long hours it took to weather the storm. Cross training, and years of experience that gave Boland and Thornton the problem-solving skills to get through an unthinkable crisis.

      Boland has been with INAP since 2011 and will be celebrating his 10-year anniversary in September this year. He’s been working at the Dallas facility since it opened, starting out as a data center engineer and progressing to his current position as the data center operations supervisor. Thornton has been with INAP nearly as long, joining the Dallas team in 2012. Although he self-effacingly calls himself a “glorified maintenance man,” in his role as regional infrastructure engineer the facility is his responsibility: anything to do with the building itself, including the power, cooling, plumbing, paint, carpet, light bulbs and more.

      Prior to joining the INAP team, both men picked up expertise via differing avenues. Boland started out in telecommunications installation, which could have taken him down any number of routes in his career. As luck would have it for INAP, he went the data center engineer route and has been able to stick around and move up in the company.

      Thornton began his career as an electrician in the local Dallas market before going overseas with the military. In Iraq, he learned all about critical infrastructure: how to maintain bases, how to keep equipment running and how to “MacGyver” it when needed. After traveling the world with the military for 10 years, moving through various disciplines into construction management, he settled back down in the States at the behest of his wife and started his position with INAP.

      “I wanted a nice, simple job,” he said with a laugh.

      Of course, the job became anything but simple in February.

      The Disaster, Problem Solving and All the Right Decisions

      Texas has seen its fair share of winter weather over the years. And like any data center, the Dallas team has a backup plan in place for failures, utility outages and cold weather. What was different in this case was that the power grid began to collapse at the outset of the storm. Rolling outages were the biggest issue for Thornton and Boland. Typically, they could run backup generators that function on diesel fuel, but the power outages also affected the refineries and fuel vendors. And because the cold was so unexpected, the fuel vendors didn’t have the antifreeze additive in their fuel.

      The issues compounded with gas stations running out of fuel, hotels filling up because homes didn’t have power (and then half of the hotels not having power) and food and water shortages. Once the storm passed, the region was still dealing with the aftermath for days.

      Boland surveys the exterior of the data center in the days following the storm. Despite clear skies in the aftermath, infrastructure failures impeded gas deliveries and other essential services.

      Fortunately, INAP leadership had the foresight to have Thornton and Boland book hotel rooms ahead of the storm, before space filled up. This allowed them a respite from the data center to rest, dry off and warm up from working outside in the snow and cold.

      “The hotel didn’t lose power,” Boland said. “Insanely, it didn’t lose power.”

      “Having the hotel paid off,” added Thornton. “When the winter storm hit, it was 24/7 for us for two or three days. We were dealing with one issue after the other. The power going on and off didn’t have a major impact to our critical systems because our facility is designed for that, but little things were occurring that we had to deal with.”

      Both their experience and the cross-training Thornton has done with the Dallas team also helped them overcome the difficulties they were facing. Not only was Boland able to back Thornton up in the 24/7 facility work, but they were able to make the right decisions in the moment to keep the data center running. In a rolling blackout scenario, the logical choice would be to switch to the generator full-time to keep the power flow consistent. However, with the fuel vendor issues, this was not a viable option. They realized this early on and worked to conserve the fuel they did have, rather than go on an extended generator run. Several data centers in the area did make the choice to do an extended run, however, and ended up dropping customers.

      “If we had done an extended run, we wouldn’t have made it,” Thornton said.

      Their decision kept the data center up and running. “We never dropped anybody,” Boland added.

      Dallas Flagship Snow
      The exterior of INAP’s Dallas Flagship data center in the days following the storm.

      It’s easy to quip that “teamwork makes the dream work,” but in this case, when the going got really rough, the team was able to see things through.

      “The team being what they were, we were able to lean on each other for different things,” said Thornton. “And we were able to identify upcoming issues and adapt.”

      INAP is endlessly grateful for the Dallas team members and their resilience and problem solving in the face of this unforeseeable situation. They are an asset to the company and our customers who rely on our services to power their infrastructures.

      Laura Vietmeyer



      Backups vs. Disaster Recovery: The Ultimate Guide


      Backups or disaster recovery (DR)?

      If you’re planning a data defense strategy for your company, it’s important to understand which strategy is best for your business needs—backup or disaster recovery.

      The Difference Between Backup and Disaster Recovery

      Backup refers to the process of saving data by copying it to a safe place. Data can then be recovered in the event of infrastructure or service issues. Backups can take many forms, including duplicating data on the cloud or a secondary server in the same production data center, or saving data to a remote data center, etc.

      Disaster recovery involves a set of policies, tools and procedures to enable the recovery or continuation of vital technology infrastructure and systems following a natural or human-induced disaster. Disaster recovery focuses on IT supporting critical business functions as part of business continuity, which involves keeping all essential aspects of a business functioning despite significant disruptive events.

      While both solutions can help protect your data and critical information against unplanned disruptions and outages, sometimes backups alone aren’t enough.

      Here is a breakdown of what you can expect from backups and disaster recovery solutions, so you can ensure your business keeps running even if your primary servers go down.

      Basic Backup Solutions

      Remember back in college or high school when you had to write a big term paper or thesis and you would save your work to a jump drive or CD (yes, those used to be a thing) in case your computer crashed and you lost everything?

      You were running a basic backup of your most critical files.

      How Backups Work

      Backups work by providing quick and easy access to your data in case of smaller disruptions like outages, lost equipment, accidental deletion or hard drive crashes. Backup solutions copy your existing information to a second storage environment. You could choose to simply back up a few important files or your entire database.
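      The copy-and-restore cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a local folder stands in for the second storage environment, and the function names `back_up` and `restore` and the file name are hypothetical, not part of any INAP product.

```python
import shutil
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def back_up(source: Path, backup_root: Path) -> Path:
    """Copy everything under `source` into a new timestamped backup folder."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    target = backup_root / f"backup-{stamp}"
    shutil.copytree(source, target)  # raises if `target` already exists
    return target


def restore(backup: Path, destination: Path) -> None:
    """Copy a backup folder back over the working folder."""
    shutil.copytree(backup, destination, dirs_exist_ok=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        work = Path(tmp) / "work"
        work.mkdir()
        (work / "term_paper.txt").write_text("page 1")

        copy = back_up(work, Path(tmp) / "backups")
        (work / "term_paper.txt").unlink()  # simulate an accidental deletion
        restore(copy, work)

        print((work / "term_paper.txt").read_text())
```

      Anything written after `back_up` runs is not in the copy, which is the limitation the next section discusses.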

      The Cons of Backup Solutions

      There are a few drawbacks to relying on backup solutions as your failsafe. Consider the college term paper example: If you have a sudden inspiration and write three more pages just to have your computer crash before saving your work to your backup source, you’ll have to start from the last moment you backed up. It’s the same with your business files—your data will only be updated to your previous backup.

      Since many companies use backup for smaller-scale outages, in many instances they will keep their backups on-site or close to their primary facility. If these companies are hit by widespread natural disasters like hurricanes or earthquakes, there’s a chance those backups could go offline as well.

      Cloud Backup Solutions

      As a response, cloud-based backup options are becoming more popular because data center providers are able to offer near real-time data replication at off-site locations. In some cases, these cloud backup solutions are more cost-effective and reliable for business needs.



      Disaster Recovery Solutions

      For more large-scale outages, disaster recovery is your best option.

      Disaster recovery solutions cover more than just the major natural disasters that might immediately come to mind. In fact, only about 10 percent of unplanned outages are caused by weather. That’s behind system failure, cyber incidents and human error.

      Disaster recovery solutions replicate your environment, so if there is a major disruption, an automatic failover transfers the management and operation of your infrastructure to a secondary machine and site to keep your applications and business online. Your servers will then run off your disaster recovery site until your primary facility is back online and capable of resuming system functionality.
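      To make the failover idea concrete, here is a toy sketch of the switching logic. It assumes a single health check decides which site serves traffic; the class name `FailoverRouter` and the site labels are hypothetical, and a real DR product would add replication, DNS or network changes and safeguards against flapping.

```python
from typing import Callable


class FailoverRouter:
    """Routes work to the DR site while the primary site is unhealthy."""

    def __init__(self, primary_is_healthy: Callable[[], bool]) -> None:
        self.primary_is_healthy = primary_is_healthy
        self.active = "primary"

    def check(self) -> str:
        """Run one health check and switch sites if needed."""
        if self.primary_is_healthy():
            self.active = "primary"  # run from (or return to) the primary
        else:
            self.active = "dr"       # fail over to the secondary site
        return self.active


if __name__ == "__main__":
    health = {"primary_up": True}
    router = FailoverRouter(lambda: health["primary_up"])

    print(router.check())            # primary
    health["primary_up"] = False     # simulated outage
    print(router.check())            # dr
    health["primary_up"] = True      # primary facility restored
    print(router.check())            # primary
```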

      It’s important to note that disaster recovery options come in all shapes and sizes. Synchronous solutions replicate your data in near real-time. That makes this option one of the most comprehensive, but also generally more expensive. On the other hand, asynchronous solutions have more delayed duplication, which means some of your most recent data may not be recovered.
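      The trade-off between the two modes can be shown with a toy model. This sketch assumes in-memory lists stand in for the primary and replica sites; `ReplicatedStore` and its methods are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicatedStore:
    """Toy primary/replica pair; `synchronous` controls when copies happen."""

    synchronous: bool
    primary: list = field(default_factory=list)
    replica: list = field(default_factory=list)
    pending: list = field(default_factory=list)

    def write(self, record: str) -> None:
        self.primary.append(record)
        if self.synchronous:
            self.replica.append(record)  # copied before the write returns
        else:
            self.pending.append(record)  # queued and copied later

    def flush(self) -> None:
        """Ship queued records to the replica (the asynchronous catch-up)."""
        self.replica.extend(self.pending)
        self.pending.clear()

    def lost_if_primary_fails(self) -> list:
        """Records that currently exist only on the primary."""
        return list(self.pending)


if __name__ == "__main__":
    async_store = ReplicatedStore(synchronous=False)
    async_store.write("order-1")
    async_store.write("order-2")
    async_store.flush()
    async_store.write("order-3")  # not yet replicated
    print(async_store.lost_if_primary_fails())  # ['order-3']

    sync_store = ReplicatedStore(synchronous=True)
    sync_store.write("order-1")
    print(sync_store.lost_if_primary_fails())   # []
```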

      Important Backup and Disaster Recovery Terms

      Understanding a few essential terms can help develop your strategic decisions and enable you to better evaluate backup and disaster recovery solutions.

      • Recovery time objective (RTO) is the target amount of time for restoring normal business operations after an outage. As you look to set your RTO, you’ll need to consider how much time you’re willing to lose and the impact that time will have on your bottom line. The RTO might vary greatly from one type of business to another. For example, if a public library loses its catalog system, it can likely continue to function manually for a few days while the systems are restored. But if a major online retailer loses its inventory system, even 10 minutes of downtime, and the associated loss in revenue, would be unacceptable.
      • Recovery point objective (RPO) refers to the amount of data you can afford to lose in a disaster. You might need to copy data to a remote data center continuously so that an outage will not result in any data loss. Or you might decide that losing five minutes or one hour of data would be acceptable.
      • Failover is the disaster recovery process of automatically offloading tasks to backup systems in a way that is seamless to users. You might fail over from your primary data center to a secondary site, with redundant systems that are ready to take over immediately.
      • Failback is the disaster recovery process of switching back to the original systems. Once the disaster has passed and your primary data center is back up and running, you should be able to fail back seamlessly as well.
      • Restore is the process of transferring backup data to your primary system or data center. The restore process is generally considered part of backup rather than disaster recovery.
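      As a rough illustration of how RPO and RTO turn into a pass/fail check, the sketch below compares a backup interval and an estimated restore time against the two objectives. It assumes the worst-case data loss equals the time between backups. The function name and all figures in the demo are hypothetical, loosely modeled on the library and retailer examples above.

```python
from datetime import timedelta


def meets_objectives(
    backup_interval: timedelta,
    restore_time: timedelta,
    rpo: timedelta,
    rto: timedelta,
) -> tuple:
    """Return (rpo_met, rto_met).

    Assumes the worst-case data loss is one full backup interval and that
    `restore_time` is a measured or estimated time to resume operations.
    """
    return backup_interval <= rpo, restore_time <= rto


if __name__ == "__main__":
    # Library catalog: nightly backups, can work manually for a few days.
    print(meets_objectives(
        backup_interval=timedelta(days=1),
        restore_time=timedelta(hours=8),
        rpo=timedelta(days=1),
        rto=timedelta(days=3),
    ))  # (True, True)

    # Online retailer: hourly backups are not enough for tight objectives.
    print(meets_objectives(
        backup_interval=timedelta(hours=1),
        restore_time=timedelta(minutes=30),
        rpo=timedelta(minutes=5),
        rto=timedelta(minutes=10),
    ))  # (False, False)
```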

      Backups vs. Disaster Recovery: How to Choose the Best Solution for Your Business

      In some cases, a backup alone is enough to protect certain parts of your business from interruptions. For example, computers or mobile devices issued to employees generally do not require a full disaster recovery solution. If an employee’s device is lost or broken, your company is unlikely to be critically affected. You can replace the device and restore your data from a backup.

      On the other hand, disaster recovery is crucial to protecting services and infrastructure that your company depends on to operate on a day-to-day basis. For example, suppose your employees’ PCs run as “thin clients” dependent on a central server to work. In that case, an interruption on that server can critically affect the business’s entire operation, as it will prevent all employees from being able to use their workstations. Such an event is much more severe than the failure of a single workstation.

      In most cases, the best solutions involve both backups and disaster recovery.

      A solid backup plan that keeps your data accessible is helpful for minor disruptions, but relying on it without a larger, more comprehensive strategy can cause all sorts of problems for your company. For instance, if your business collects, stores or transmits information that requires strict PCI DSS or HIPAA compliance, you will want to make sure those files are properly backed up and accessible in the event of a disaster, which might not be possible with basic backup solutions.

      Consider incorporating your basic backup under the umbrella of a larger disaster recovery strategy to ensure you’re fully protected. Third-party providers will offer cloud-based disaster recovery as a service (DRaaS) solutions that are often more cost-effective and appropriate for your business needs.

      Do your homework and determine the best strategy for your company. Because it’s not a question of if, but when you’ll need to recover from an unplanned outage.

      Explore INAP Disaster Recovery as a Service.


      Original version published May 30, 2018

      Thiago Alcantara



      Disaster Recovery (DR) Testing: The Why, What and Who


      The importance of disaster recovery (DR) and business continuity plans can’t be overstated. Here on the ThinkIT blog, we’ve covered how to get started, the basics of making a plan, table-top exercises to help your staff test your DR plan and more. In this piece, I’ll explore the importance of DR testing—why it’s so important, what elements need to be considered and whether you should handle testing on your own or outsource it to a third-party DRaaS provider. We’ll also review the options for running a failover test.

      Why DR Testing: Imagine the Worst-Case Scenario

      Those of us in the DR business on 9/11 remember the devastating, tragic story of Cantor Fitzgerald, a financial services firm that lost 658 of their 960 employees that morning. The company occupied floors 101-105 of the north tower at the World Trade Center. At the time, I’m sure IT disaster recovery and business continuity were not top of mind for those involved in the painful tragedy.

      From an IT and business continuity standpoint, however, Cantor Fitzgerald was able to get systems online 48 hours after the attacks. They used a DR company at that time called Comdisco and were able to get remaining employees up and running, answering phones and emails and trading stocks, within five days of the attacks. Today, Cantor Fitzgerald is still in business with about 10,000 employees. The company also followed through on CEO Howard Lutnick’s promises that were made to families and surviving employees.

      What: DR Testing Essential Elements

      How did Cantor Fitzgerald survive? They had a true, tested and documented DR and business continuity plan. A combination of internal experts and assets and a third-party vendor helped the company create and test a detailed, scripted recovery plan and methodology.

      Let’s briefly explore two key elements that need to be considered in DR testing processes, whether you choose to handle testing yourself, outsource it to a third-party or use a combination of both.

      Recovery Point Objectives

      Recovery Point Objectives (RPOs) define how much of the company’s critical data you can afford to lose, which in turn dictates how you preserve it: point-in-time backups or real-time replication to off-site media, like tape or online storage. Determine the impact of data loss in time (how long since the last good backup) and money (how many transactions and how much revenue will be lost because of it) and preserve your data accordingly. This is the “easier” element of DR testing, although nothing in IT or DR is exactly easy.
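      The time-and-money estimate can be worked through with simple arithmetic. The sketch below assumes a steady transaction rate and a flat average revenue per transaction; the function name and every number in the demo are hypothetical placeholders for your own figures.

```python
def data_loss_exposure(
    hours_since_last_backup: float,
    transactions_per_hour: float,
    revenue_per_transaction: float,
) -> tuple:
    """Return (lost_transactions, lost_revenue) for one outage.

    Assumes a constant transaction rate and average revenue per transaction.
    """
    lost_transactions = hours_since_last_backup * transactions_per_hour
    lost_revenue = lost_transactions * revenue_per_transaction
    return lost_transactions, lost_revenue


if __name__ == "__main__":
    # Last good backup 4 hours ago, 250 transactions/hour at $40 each.
    transactions, revenue = data_loss_exposure(4, 250, 40.0)
    print(transactions, revenue)  # 1000 40000.0
```

      If the resulting figure is more than the business can absorb, that points toward more frequent backups or replication.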

      Recovery Time Objective

      The tougher challenge in testing is determining Recovery Time Objective (RTO), which is how long it will take to restore enough functionality to keep the business running, and how quickly employees, vendors and customers can be tracked and connected in those DR systems. Once that has been established, the next important step is to determine how long it will take to get all that data back into the production environment and re-connect people once the disaster is over.

      Who: DIY and Third-Party DR Considerations

      One of the more logical IT initiatives to throw in the cloud or hand over to a third-party vendor is disaster recovery. After all, the old joke from CIOs down to IT Directors is “DR is #4 on my top three ‘To-Do List’ right now.”

      Personally, however, I would caution against giving all the keys away to a third-party provider. An outside provider is less likely to have the passion for your business that a proud employee has, and they will not know the intricacies of the business, such as revenue drivers and the IT applications.

      On the other side, choosing a good third-party vendor and solution will save you time and probably money, preventing you from buying and owning two environments. A third-party DRaaS provider can also manage the infrastructure behind the scenes, allowing your team to focus on production, customers and vendors.

      My advice? Choose a solution provider that allows some co-management with your team. You should work hand in hand with your third-party DR vendor. Make them an extension of your team and manage them as you would any employee or critical application in your environment. Leverage them for what they are best at, while at the same time holding yourself and your organization accountable for your vendor’s participation in and seamless execution of your DR plan.

      Running a Failover Test

      You have many options of what and how to test. Some companies will do a full-blown failover of the entire environment, while others test subsets of their environment in a crawl, walk, run methodology. In either case, I find it most effective to work with a provider to isolate your test environment from the replication environment. That way, you can continue to replicate valuable information while you are testing. In the event of a disaster while you are testing, you will still be able to achieve your required RPO/RTOs.

      The test environment should be validated with transactions and remote connectivity from users and departments as if the production data center is no longer accessible. Here you will find (and actually may hope to find) holes in your plan and be able to document improvements and changes from your previous test. A DR Plan is an ever-improving, ever-evolving, living, breathing document.

      Failing or not having a perfect DR test is not necessarily a bad thing. You are of course striving for a perfect test leading to a perfect failover in a real disaster. But the idea of testing is to find holes in your plan, update changes in your DR Plan since the previous test and continue to improve your recovery plan.
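      One lightweight way to capture what a test turned up is a short, structured summary. The sketch below is only an assumption about how such a record might look: the `CheckResult` fields, the check names and the timings are hypothetical, and your own runbook will define the real checks.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    note: str = ""


def summarize_test(results: list, measured_minutes: float, rto_minutes: float) -> dict:
    """Summarize one failover test: was the RTO met, and which checks failed?"""
    gaps = [f"{r.name}: {r.note}" for r in results if not r.passed]
    return {
        "rto_met": measured_minutes <= rto_minutes,
        "checks_passed": sum(1 for r in results if r.passed),
        "gaps": gaps,  # items to fix in the DR plan before the next test
    }


if __name__ == "__main__":
    results = [
        CheckResult("order transaction", True),
        CheckResult("remote user login", False, "firewall rule missing at DR site"),
    ]
    summary = summarize_test(results, measured_minutes=95, rto_minutes=120)
    print(summary["rto_met"])  # True
    print(summary["gaps"])     # ['remote user login: firewall rule missing at DR site']
```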

      You should definitely choose a third-party DR provider who bundles in test time (one or two tests per year at a minimum) and who also provides documentation and runbooks back to you following a test. You both need to be in sync should a disaster strike and to make the next test a success. There is nothing worse than re-inventing the wheel with your third-party provider at the start of every new test.

      Always test, keep your DR systems up to date with your critical production systems and test again. Disasters don’t occur very often, but when they do, the effects can be devastating. Be ready.


      Carleton Hall

