Locking Down the Cloud Part 4: Monitoring the Cloud
December 5, 2013
If you’ve been paying attention to this series these past few months, you know by now how to lock down the cloud. In parts I, II and III, we talked about the importance of performance, compliance and security in a successful cloud infrastructure. We looked at best practices to secure your data and provide the speed, scalability and uptime customers want; we also discussed ways to avoid a compliance disaster and the ramifications that come with it, like brand damage and high fines from regulations such as the Payment Card Industry (PCI) and Health Insurance Portability and Accountability Act (HIPAA).
Now it’s time for the final installment: operational excellence. You’ve built and secured your cloud, which is great – but keeping it running at peak efficiency requires its own set of skills. To maintain a cloud environment that stays secure and compliant while adapting to your changing needs, you must stay attentive to the four areas below.
Catastrophes can strike anyone – and whether you’re dealing with a malicious attack or a natural disaster, an outage can wreak long-lasting devastation. Maintaining uptime and reliability in the face of a failure is critical.
If you don’t have a solid disaster prevention plan in place, now is the time to create one.
The first step is evaluating your Recovery Time Objective (RTO) and tolerance for downtime. If your business can tolerate 24 hours or more of downtime, basic data backups might be the right solution. If you need a full restoration of services within minutes or hours, look into a mirror site – a replicated production environment with resources scaled down to minimal levels, hosted at a remote facility with a mechanism for active failover. If you require little-to-no downtime, you’ll need a stronger plan with maximum failure resiliency and additional benefits such as higher capacity, geo-load balancing and fault tolerance. A typical infrastructure would include two or more production environments in isolated datacenters, with full data replication from files to databases.
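The decision process above can be sketched in a few lines of code. This is a minimal illustration only – the tier names and the hour cutoffs are assumptions drawn from the examples in this article, not a standard.

```python
def recommend_dr_tier(rto_hours: float) -> str:
    """Map a Recovery Time Objective (in hours) to an illustrative DR tier.

    Tier names and cutoffs are assumptions based on the article's examples.
    """
    if rto_hours >= 24:
        return "basic backups"      # daily data backups may be enough
    if rto_hours >= 1:
        return "mirror site"        # scaled-down replica with active failover
    return "multi-datacenter"       # two+ live environments, full replication
```

For example, a business that can absorb two days of downtime would land on basic backups, while one that needs recovery in minutes would need the multi-datacenter tier.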
If there’s one overarching rule when it comes to compliance, it’s this: expect change. Your company will evolve over time, industry regulations like PCI or HIPAA will change and that means your compliance practices will need to change as well. So if you’ve been thinking of compliance as a one-time accomplishment, start thinking of it as an ongoing process that requires continuous monitoring and attention.
The first step: assessing your controls and choosing the appropriate monitoring interval for each one. Some examples follow, but remember: not all controls are created equal, and each needs its own monitoring interval.
- Log reviews. Logs should generally be reviewed daily to spot potential trouble and resolve issues before they become serious.
- Patching. Patches should be applied within 30 days of release, and patching should be done on a monthly basis for all applications and plug-ins.
- Malware scans. Real-time alerting and reporting are critical here – but your team will have to decide what constitutes “real-time” for your business.
- Access reviews. These are often overlooked, but it’s recommended to assess privileged accounts monthly and all others on a quarterly basis.
- Vulnerability scans. Run these monthly to check that your vulnerability management program is working optimally.
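The interval table above lends itself to a simple scheduling sketch. The control names, intervals and helper functions here are illustrative assumptions, not part of any specific tool.

```python
from datetime import date, timedelta

# Illustrative monitoring intervals matching the list above (assumed schedule).
CONTROL_INTERVALS = {
    "log review": timedelta(days=1),
    "patching": timedelta(days=30),
    "privileged access review": timedelta(days=30),
    "general access review": timedelta(days=90),
    "vulnerability scan": timedelta(days=30),
}

def next_due(control: str, last_run: date) -> date:
    """Return the next date a control is due, per its interval."""
    return last_run + CONTROL_INTERVALS[control]

def overdue(last_runs: dict, today: date) -> list:
    """List controls whose next due date has already passed."""
    return [c for c, last in last_runs.items() if next_due(c, last) < today]
```

A dashboard like the one described below could be driven by exactly this kind of schedule, flagging any control whose due date has slipped.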
If this sounds like a lot of work, know that tools are available to monitor these controls. Centralize the results in a secure system and use them to build a dashboard that tracks compliance for you. Don’t forget to keep your compliance documentation updated, to prove that your security controls satisfy your industry requirements – and be sure that any third-party providers are staying up to date on your compliance needs and documentation too.
A layered security system uses many tools to serve many functions, which means you must carefully monitor all of them to make sure they’re successful in thwarting attacks. Be thorough and diligent in your efforts; look at all of your security devices, appliances and software and create alerts for any anomalies, slowdowns or degradations. This includes high CPU utilization, disks filling up, hung key processes or memory issues. Any of these can indicate a problem.
Daily security system reports and alerts should be monitored, with particular attention paid to unsuccessful login attempts and administrator changes affecting access control and configurations. Your logging system should correlate changes across multiple sources to identify potential attacks and operational issues. Remember that security is a team effort: not only should your InfoSec team have oversight of all operational processes, but your staff should be trained on proper use of the monitoring tools to reduce troubleshooting time and increase their effectiveness.
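To make the idea of multi-source correlation concrete, here is a minimal sketch that counts failed logins per user across several log sources. The event format and the threshold of 5 are assumptions for this example.

```python
from collections import Counter

def failed_login_suspects(events, threshold=5):
    """Flag users whose failed logins across all log sources meet the threshold.

    events: iterable of (source, user, outcome) tuples from correlated logs.
    The tuple format and default threshold are illustrative assumptions.
    """
    counts = Counter(user for _, user, outcome in events if outcome == "failure")
    return sorted(user for user, n in counts.items() if n >= threshold)
```

A real deployment would feed this from firewall, VPN and application logs; the point is that the same account failing a few times in each source only stands out once the sources are combined.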
To keep your cloud at peak performance, keep a close eye on operational and resource utilization. All infrastructure systems should be monitored, with alerts configured to warn of impending issues across processor, memory and disk usage, disk and network I/O, application performance, the performance of key processes and other parameters.
To realistically evaluate your cloud’s performance, calculate those thresholds based on vendor-published maximums, then fine-tune them over time according to your observations. These thresholds become the guidelines for your monitoring system’s alerts and triggers. Logs should be collected and correlated to provide the information needed for troubleshooting, while your staff should be trained and ready to eliminate any performance obstacles detected.
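The threshold approach described above can be sketched as follows. The metric names, vendor maximums and alert fractions are illustrative assumptions; in practice the fractions are what you fine-tune over time.

```python
# Assumed vendor-published maximums and tuned alert fractions (illustrative).
VENDOR_MAX = {"cpu_pct": 100, "mem_gb": 64, "disk_iops": 5000}
ALERT_FRACTION = {"cpu_pct": 0.85, "mem_gb": 0.80, "disk_iops": 0.75}

def check_thresholds(sample: dict) -> list:
    """Return alert messages for metrics exceeding their tuned thresholds."""
    alerts = []
    for metric, value in sample.items():
        threshold = VENDOR_MAX[metric] * ALERT_FRACTION[metric]
        if value > threshold:
            alerts.append(f"{metric}={value} exceeds {threshold:.0f}")
    return alerts
```

The design point is that thresholds are derived, not hard-coded per alert: adjusting one fraction after observing real workloads retunes every alert that uses it.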
And there you have it – the secrets for successfully maintaining and monitoring a high-performing cloud. When it comes to a thriving virtual infrastructure, performance, security, compliance and monitoring work hand in hand. Build your cloud thoroughly, manage it carefully, and you’ll have a smooth and protected cloud environment that performs beautifully.
Kurt Hagerman, Director of Information Security

As the director of information security at FireHost, Kurt Hagerman oversees all compliance and security initiatives. Hagerman is responsible for helping FireHost attain ISO, PCI, HIPAA and other certifications, which allows FireHost customers to more easily achieve the necessary compliance for their own businesses. His role also includes merging information security and compliance into one organization and building a strong security program in which compliance levels are a by-product.