Why do so many monitoring deployments fail?

By John Richardson
3/25/2013

Monitoring matters more than ever in this era of virtualization and cloud. Good performance data is critical to getting the most out of your servers while keeping them stable, and working without that data is becoming increasingly difficult. As you build out your private cloud, you'll also find monitoring essential for measuring and reporting against the SLAs built into your service catalog.

Yet monitoring tools have their own wing in the Hall of Shelfware. "We're rolling out [product name]" often means that the tool was purchased some time ago, can report only up/down status, and getting it to the next level isn't a top priority. It's a story we hear often.

You were promised a tool that provided pre-failure alerts, captured IP addresses for your CMDB, and kept performance data that could be used to optimize your environment and troubleshoot issues. Yet all you got were alerts telling you that a server is down. As one executive confided, “I don’t need a tool to tell me a server’s down – if anyone cares, my phone rings to let me know!”

Successful monitoring is challenging for both technical and business reasons.

From a technical perspective, the biggest challenge is tuning the alerts. Every monitored device can generate several informational messages a minute, each of which your monitoring tool can pick up and evaluate for action. Tuning means deciding:

  • which messages to pay attention to
  • at what point those messages should generate an alert
  • what severity level the alert should carry
  • which automated responses, if any, the condition should trigger
  • who should be notified
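The five tuning decisions above map naturally onto a small rule table. The sketch below is a minimal illustration in Python, not the configuration format of any particular monitoring product: the `AlertRule` and `Alert` types, metric names such as `disk_used_pct`, and the `purge_temp_files` action are all hypothetical. Each rule field records one decision, and readings that match no rule or stay below their threshold are treated as informational noise rather than alerts.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AlertRule:
    """One tuning decision set (hypothetical structure, for illustration only)."""
    metric: str                    # which messages to pay attention to
    threshold: float               # the point at which a reading becomes an alert
    severity: int                  # 1 = most critical; larger numbers are less urgent
    action: Optional[str] = None   # optional automated response (hypothetical name)
    notify: list[str] = field(default_factory=list)  # who should be notified


@dataclass
class Alert:
    host: str
    metric: str
    value: float
    severity: int
    action: Optional[str]
    notify: list[str]


def evaluate(readings, rules):
    """Turn raw (host, metric, value) readings into alerts.

    Readings with no matching rule, or below their rule's threshold, are
    dropped as informational noise. The remaining alerts are returned
    most critical first, so Severity 1 alerts are not buried.
    """
    by_metric = {rule.metric: rule for rule in rules}
    alerts = []
    for host, metric, value in readings:
        rule = by_metric.get(metric)
        if rule is None or value < rule.threshold:
            continue  # informational only: no alert raised
        alerts.append(Alert(host, metric, value, rule.severity, rule.action, rule.notify))
    alerts.sort(key=lambda a: a.severity)
    return alerts


# Example usage with made-up readings
rules = [
    AlertRule("disk_used_pct", 90.0, severity=2, action="purge_temp_files", notify=["storage-team"]),
    AlertRule("host_unreachable", 1.0, severity=1, notify=["on-call"]),
]
readings = [
    ("web01", "cpu_pct", 55.0),          # no rule: informational only
    ("db01", "disk_used_pct", 72.0),     # below threshold: no alert
    ("db01", "disk_used_pct", 93.5),     # pre-failure warning
    ("web02", "host_unreachable", 1.0),  # outright failure
]
for alert in evaluate(readings, rules):
    print(alert.severity, alert.host, alert.metric, alert.notify)
# prints:
# 1 web02 host_unreachable ['on-call']
# 2 db01 disk_used_pct ['storage-team']
```

Sorting by severity is only one simple way to keep critical alerts visible. Production platforms typically also deduplicate repeated alerts and suppress alerts during maintenance windows, which this sketch leaves out.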

Typical implementations start out generating far too many alerts, often with most (if not all) of them marked as high priority. In this scenario, the critical alerts are buried in informational noise, and soon every alert is ignored. That's usually when everything except failure alerts gets suppressed, leaving you with up/down monitoring.

Because setting alert priorities and notification rules requires input from the business, whoever does the tuning must be both technically strong and able to talk with business users in practical terms. If no one on your team has successfully deployed a monitoring tool before, it's easy to see why these deployments fail so often.

Finding people who already have this experience is very hard, and keeping them is harder. While these people likely enjoy deploying the tools, once the tool is in place it's difficult to find enough work to keep them busy and engaged full-time.

A managed monitoring service is ideal for resolving this dilemma. For a monthly fee, you get the monitoring platform delivered from the cloud, Severity 1 alert notification, and standardized SLA reporting on your infrastructure. The fee also covers the tuning expertise and the ability to customize reporting to meet specific requirements. With a managed monitoring service, you get the data you want without having to buy, implement, maintain, and tune the tool yourself. Problem solved.