Using AWS CloudWatch to Maximize Your Monitoring Coverage

Introduction
Establishing Goals for Monitoring
What Resources Will You Monitor?
How Often Will You Monitor Your Resources?
What Monitoring Tools Will You Use?
Who Should be Notified When Something Goes Wrong?
Summary
Next steps
References

Introduction

AWS CloudWatch is the monitoring service provided by Amazon Web Services (AWS). It allows you to monitor the performance and availability of your resources, such as EC2 instances and RDS databases. With CloudWatch, you can set alarms that notify you when a metric crosses a threshold, and you can use its metrics and logs to gain insight into the behavior of your applications. In this article, we will explore how to use CloudWatch to maximize your monitoring coverage and ensure that you are alerted to potential issues as they arise. We will cover topics such as setting up CloudWatch alarms, analyzing log data, and integrating CloudWatch with other tools. By the end, you will have a solid understanding of how to use CloudWatch to monitor and troubleshoot your AWS resources.

While CloudWatch implementations vary by application, a few best practices apply to almost all of them. Amazon publishes a set of basic monitoring best practices that all developers can implement, and we will expand on them in this article. Good monitoring starts by answering a few questions that are specific to your application.

What are your goals for monitoring?
- Are you detecting errors, missed application tasks, performance, or some combination of the three?
What resources will you monitor?
- What are the parts of your application that need to be observed? How do you prioritize observing parts of your application?
How often will you monitor these resources?
- Do you want to see issues in real-time, or do you set aside a specific time to review logs? How do you prioritize issues which you need to immediately fix over ones that you don’t?
What monitoring tools will you use?
- Do you want to build only with CloudWatch as a monitoring tool, or do you have other downstream services and automations that should consume CloudWatch logs and alerts in order to function?
Who will perform the monitoring tasks?
- Do you have specific support or development personnel who will monitor your logs? Are there specific applications which monitor your logs automatically?
Who should be notified when something goes wrong?
- How do you notify different people and applications based on log context and priority?

Establishing Goals for Monitoring

Depending on your application, your CloudWatch goals may differ. The first step is to establish goals for what exactly you will monitor. Your goals may fall into one or more of the following categories:

Track application performance and/or health both instantaneously and over time.
Respond to specific expected or unexpected application states, generally using alarms.
Trigger automated actions in response to specific events or thresholds.
Collect overall performance and health data for convenient future analysis.
Visualize application metrics.

These core uses are only a starting point, and CloudWatch can do a great deal more. Before getting started with CloudWatch, it’s important to limit the scope of your CloudWatch usage so that you do not get overwhelmed.

What Resources Will You Monitor?

CloudWatch can be used to monitor most services in AWS’s cloud suite. This means that in almost any application, you can use CloudWatch to monitor the individual components. Many AWS services publish metrics to CloudWatch by default, so the principal role of the developer is to organize this (usually default) monitoring into a coherent picture of the whole application.

How Often Will You Monitor Your Resources?

Your unique needs and requirements will determine how frequently you use AWS CloudWatch to monitor your resources. You can watch your resources in near real time or review them at regular intervals, using CloudWatch's alarm settings and metric reporting periods.

For instance, you might want to check on your EC2 instances every minute to make sure they are functioning properly, while you might only need to check the amount of free storage space in your RDS database once an hour.
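The two cadences above can be sketched as alarm definitions. This is a minimal illustration, not a definitive setup: it only builds the keyword arguments for CloudWatch's PutMetricAlarm operation, and the helper names (build_alarm, apply_alarm), the instance ID, the database identifier, and the thresholds are all assumptions made up for the example.

```python
def build_alarm(name, namespace, metric, dimension, period_seconds,
                threshold, comparison, statistic="Average"):
    """Return keyword arguments for CloudWatch's PutMetricAlarm operation."""
    dim_name, dim_value = dimension
    return {
        "AlarmName": name,
        "Namespace": namespace,
        "MetricName": metric,
        "Dimensions": [{"Name": dim_name, "Value": dim_value}],
        "Statistic": statistic,
        "Period": period_seconds,  # length of each evaluation window, in seconds
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": comparison,
    }


# Hypothetical resource identifiers and thresholds; replace with your own.
ec2_health = build_alarm(
    "web-1-status-check", "AWS/EC2", "StatusCheckFailed",
    ("InstanceId", "i-0123456789abcdef0"),
    period_seconds=60,            # check every minute
    threshold=1, comparison="GreaterThanOrEqualToThreshold",
    statistic="Maximum",
)
rds_storage = build_alarm(
    "orders-db-low-storage", "AWS/RDS", "FreeStorageSpace",
    ("DBInstanceIdentifier", "orders-db"),
    period_seconds=3600,          # check once an hour
    threshold=10 * 1024 ** 3,     # 10 GiB, since the metric is reported in bytes
    comparison="LessThanThreshold",
    statistic="Minimum",
)


def apply_alarm(params):
    """Create or update the alarm. Assumes boto3 is installed and credentials are configured."""
    import boto3  # imported lazily so the definitions above work without the SDK
    boto3.client("cloudwatch").put_metric_alarm(**params)
```

Calling apply_alarm(ec2_health) would create the first alarm in your account. Keeping the definitions as plain dictionaries makes it easy to review the chosen periods side by side before anything is sent to AWS.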

This monitoring frequency mainly depends on your application's design. For example, if a certain service is critical for how a web app conveys data to customers, it is beneficial to have an alarm on this resource to detect failures, as well as frequent monitoring to assess system stress that has not yet caused a failure. If a web app relies on a capacity-constrained compute instance, such as in EC2 or Lightsail, you can use CloudWatch to monitor disk utilization, memory metrics, and overall compute utilization. Note that on EC2, memory and disk utilization are not published by default and are typically collected with the CloudWatch agent. You can use alarms in tandem with this monitoring to respond with actions such as auto-scaling.
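To make the difference between a brief spike and sustained stress concrete, here is a simplified model of how an alarm can decide its state from recent datapoints. It is a sketch under stated assumptions: the function name alarm_state and the sample values are made up, and real CloudWatch alarms also handle missing data according to their configuration, which this model ignores.

```python
def alarm_state(datapoints, threshold, evaluation_periods, datapoints_to_alarm):
    """Simplified 'M out of N' alarm evaluation.

    Returns "ALARM" when at least `datapoints_to_alarm` of the most recent
    `evaluation_periods` datapoints are above `threshold`, otherwise "OK".
    Returns "INSUFFICIENT_DATA" when there are too few datapoints to judge.
    """
    if len(datapoints) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    recent = datapoints[-evaluation_periods:]
    breaching = sum(1 for value in recent if value > threshold)
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"


# Made-up CPU utilization samples (percent), one per minute, oldest first.
cpu_percent = [42.0, 55.0, 91.0, 88.0, 93.0]

# Alarm only if 2 of the last 3 minutes were above 80 percent.
state = alarm_state(cpu_percent, threshold=80.0,
                    evaluation_periods=3, datapoints_to_alarm=2)
```

Requiring several breaching datapoints instead of one is a common way to keep a single noisy sample from triggering an action such as scaling out.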

What Monitoring Tools Will You Use?

Oftentimes, especially if you are using an alarm-response framework to respond to application behavior, you will use additional tooling to help manage your application. The simplest example of this is Amazon Simple Notification Service (SNS), AWS’s publish/subscribe messaging service, which can deliver notifications by email, SMS, and other channels. Although SNS is itself an AWS service rather than an external tool, CloudWatch alarms can publish to it to alert real people to CloudWatch activities.
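As a rough sketch of wiring an alarm to SNS, the helpers below create a topic, subscribe email addresses to it, and attach the topic to an alarm definition. The function names (subscribe_team, with_notification) and the topic name are assumptions for illustration. The sns argument is expected to behave like boto3.client("sns"), so boto3 and AWS credentials are needed only when you actually run it against your account.

```python
def subscribe_team(sns, topic_name, emails):
    """Create (or reuse) an SNS topic and subscribe each email address to it.

    `sns` is expected to behave like boto3.client("sns"). Each recipient must
    confirm the subscription from their inbox before they receive messages.
    """
    topic_arn = sns.create_topic(Name=topic_name)["TopicArn"]
    for address in emails:
        sns.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint=address)
    return topic_arn


def with_notification(alarm_params, topic_arn):
    """Return a copy of PutMetricAlarm arguments that notifies the topic on ALARM."""
    updated = dict(alarm_params)
    updated["AlarmActions"] = list(alarm_params.get("AlarmActions", [])) + [topic_arn]
    return updated
```

With boto3 installed, subscribe_team(boto3.client("sns"), "ops-alerts", ["ops@example.com"]) would return a topic ARN that can be passed to with_notification before the alarm is created. Passing the client in as an argument also makes the helpers easy to try out with a stand-in object.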

Sometimes, an application may feed data to or rely on an external service. For example, some third-party services that integrate with CloudWatch in AWS include:

Datadog: Allows users to track, analyze, and alert on critical metrics from their AWS infrastructure. See the Datadog AWS Integration link under References.
New Relic: Offers real-time visibility into AWS applications and infrastructure. See the New Relic AWS Integration link under References.
Sumo Logic: Provides an end-to-end solution for monitoring AWS applications and infrastructure. See the Sumo Logic AWS Integration link under References.
Splunk: Provides analytics and log-management solutions for AWS applications and services. See the Splunk AWS Integration link under References.

Generally, attaching these services is somewhat involved, and the steps depend on the type of external service. See each third-party service's documentation for more information.

Who Should be Notified When Something Goes Wrong?

When all else fails and applications are unable to self-heal or safely continue running, real people may have to intervene. One of CloudWatch’s most effective use cases is its ability to alert the external world to problems. Sometimes, this happens during regular operations monitoring, when developers can assess problems as they occur. However, in the case of component failure, pairing CloudWatch alarms with a messaging framework like SNS allows real people to step in on software problems as they happen.
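One way to decide who hears about which problem is to route alarm notifications by a naming convention. The sketch below assumes alarms are named with a severity prefix and that the notification arrives as the JSON message CloudWatch publishes to SNS, which includes AlarmName and NewStateValue fields. The prefixes, destination names, and the route_alarm function are hypothetical.

```python
import json

# Hypothetical routing table: alarm-name prefix -> who gets notified.
ROUTES = {
    "critical-": "on-call-pager",
    "warning-": "team-email",
}
DEFAULT_ROUTE = "daily-digest"


def route_alarm(sns_message):
    """Pick a destination for a CloudWatch alarm notification, or None to skip it."""
    event = json.loads(sns_message)
    if event.get("NewStateValue") != "ALARM":
        return None  # this sketch ignores OK and INSUFFICIENT_DATA transitions
    name = event.get("AlarmName", "")
    for prefix, destination in ROUTES.items():
        if name.startswith(prefix):
            return destination
    return DEFAULT_ROUTE


# A trimmed-down example message with made-up values.
message = json.dumps({"AlarmName": "critical-api-5xx", "NewStateValue": "ALARM"})
destination = route_alarm(message)
```

A function like this could run in whatever consumes the SNS topic, so that only high-priority alarms interrupt someone immediately while the rest are collected for later review.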

For an in-depth tutorial on how to use notifications with AWS CloudWatch alarms, see the AWS documentation listed under References. This tutorial covers topics such as setting up CloudWatch alarms, configuring notification types and destinations, and troubleshooting. There is also a section on setting up SNS topics and subscriptions for notifications, which is a great way to keep your team informed of any changes or issues in your AWS environment.

Summary

In this general guide, we covered a few important steps toward understanding, at a conceptual level, how to configure AWS CloudWatch for an application.

AWS CloudWatch is a powerful monitoring service provided by Amazon Web Services (AWS) that allows users to monitor the performance and availability of their resources.

Good monitoring starts by answering a few questions that are specific to your application such as goals for monitoring, resources to monitor, frequency of monitoring, and more. Establishing goals for monitoring is the first step to any form of CloudWatch success.

CloudWatch can be used to monitor most services in the AWS cloud suite. Your unique needs and requirements will determine how frequently you use AWS CloudWatch to monitor your resources.

You may use AWS tooling to manage your application, such as SNS for notifications by email or SMS. You may also use external tooling through available integrations for popular software. CloudWatch can be paired with a messaging framework like SNS to alert real people to CloudWatch activities.

Next steps

If you're interested in learning more about the basics of coding and software development, check out our Coding Essentials Guidebook for Developers, where we cover the essential languages, concepts, and tools that you'll need to become a professional developer.

Thanks and happy coding! We hope you enjoyed this article. If you have any questions or comments, feel free to reach out to jacob@initialcommit.io.

References

AWS CloudWatch Documentation - https://docs.aws.amazon.com/cloudwatch/index.html
Datadog AWS Integration - https://docs.datadoghq.com/integrations/amazon_web_services/
New Relic AWS Integration - https://docs.newrelic.com/docs/infrastructure/amazon-integrations/get-started/introduction-aws-integrations/
Sumo Logic AWS Integration - https://www.sumologic.com/solutions/aws-monitoring/
Splunk AWS Integration - https://dev.splunk.com/observability/docs/integrations/aws_integration_overview/
Best practices for monitoring EC2 - https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/monitoring_best_practices.html