re:Invent 2024: What's New in CloudWatch ✨


Hey Reader 👋🏽

re:Invent took place two weeks ago, and there were some amazing launches 👀

CloudWatch got a lot of love at this re:Invent, so we are showing you our top CloudWatch launches of the year. We've worked through all of them, tried to get them working with the example application from the CloudWatch Book, and are now busy updating the book ✍🏽.

Let's dive into CloudWatch.

TL;DR

The launches were categorized into 5 main topics:

  1. More Coverage: Database Insights, Container enhanced visibility, new metrics
  2. Easier Correlation: Unified Navigation
  3. Less silos, more analytics: OpenSearch Integration
  4. Deeper distributed tracing: Transaction Search
  5. Aided investigations: AI Investigations

Infographic

Tired of reading the whole email? Check out the infographic.

CloudWatch Unified Navigation (easier correlation)

Let’s start with something very cool: the CloudWatch Unified Navigation.

This feature aims to integrate CloudWatch into almost every service pane available on AWS. It is basically a new sidebar that you can trigger.

You will mostly see this feature under the name of its button, “Explore related” (naming is hard, yes).

The new feature should help you find things that belong together. Often you will find yourself looking at a certain trace, knowing that something else belongs to it as well, e.g. another trace, a log, or a metric. This is what the feature is meant for.

Finding this feature was harder than I thought. The documentation states that it is available on different pages of CloudWatch. In the launch session, there was also a compass icon with the name “Explore related” available. Somehow, that wasn’t the case for me.

You need to look for it in the top right corner. It is not the compass icon described in the documentation 🤷🏽‍♂️ but a laptop with a wrench. I already submitted feedback.

The pages you can access it from:

  • CloudWatch Metrics (navigation, legend, data points)
  • Console toolbar
  • In different services (e.g. Lambda → Monitoring → … → Explore related)

Once you open up this pane, you will see additional information. This is quite neat! First of all, the tracing overview page got a nice overhaul. Let’s hope this comes to the general trace map as well.

From this pane, you can see all related metrics, logs, and traces. You can also go further by clicking on the connected resources, for example another service or API that these services use. Then you can see the metrics, logs, and traces of that resource.

For everybody who knows how hard it can be to even find the correct log group name, this can be a lifesaver.

Here is a list of supported services within the Explore related page. For some of the listed services, it somehow doesn’t work anyway, for example for our Step Function.

Overall, a very cool feature in our opinion, especially for quickly finding related logs, traces, and components.

Logs Insights News (less silos, more analytics)

We love Logs Insights. And if you use CloudWatch as your main observability solution, you will use Logs Insights daily. There were a couple of launches for Logs Insights itself. I’ll summarize them here.

New Languages to analyze logs - SQL and PPL

You can now use two more languages to analyze logs: Piped Processing Language (PPL) and SQL.

PPL follows a typical pipe approach like you’re used to from Linux:
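To give a feel for the pipe style, here is a minimal sketch in Python that assembles a PPL query from individual stages. The field names (level, requestId, message) and the filter value are made up for illustration, and CloudWatch supports only a subset of PPL commands, so treat this as a sketch and not as a query copied from our account.

```python
# Each stage consumes the output of the previous one, like commands in a shell pipeline.
# Assumption: "level", "requestId" and "message" are hypothetical fields in the log events.
stages = [
    "fields level, requestId, message",  # keep only the fields we care about
    "where level = 'ERROR'",             # filter down to error events
    "head 20",                           # limit the result to 20 rows
]

# PPL chains stages with the pipe character.
ppl_query = " | ".join(stages)
print(ppl_query)
```

You could paste the printed string into the Logs Insights query editor after switching the query language to PPL.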

And SQL, well, is SQL.

In SQL you can use cool SQL functions like

  • join
  • aggregations

and all the other stuff SQL has to offer 😉

Here, for example, we join the logs of a Lambda Log Group with API Access logs on the requestId.
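A join like that could look roughly like the sketch below, which builds the SQL string in Python. The log group names and the message and status fields are hypothetical placeholders, and we assume that log groups are referenced in backticks in the FROM clause, so double-check the exact syntax against the Logs Insights SQL documentation before relying on it.

```python
def build_join_query(lambda_group: str, api_group: str, key: str = "requestId") -> str:
    """Build a SQL query joining two log groups on a shared key.

    Assumption: both log groups emit structured logs that contain `key`.
    """
    return (
        f"SELECT l.{key}, l.message, a.status\n"
        f"FROM `{lambda_group}` AS l\n"
        f"INNER JOIN `{api_group}` AS a\n"
        f"ON l.{key} = a.{key}\n"
        "LIMIT 20"
    )


# Hypothetical log group names, not the ones from our example application.
query = build_join_query(
    "/aws/lambda/dev-example-function",
    "/aws/apigateway/dev-access-logs",
)
print(query)
```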

10,000 Log Groups

Previously, a single query was limited to 50 log groups. That limit no longer applies if you select log groups by a prefix or query all available log groups. A query can then cover up to 10,000 log groups.

Field Indexes

You can now also index fields of the logs you are analyzing. This improves query performance and hence reduces costs.

For example, here I’ve created a new index on all our Lambda log groups (/aws/lambda/dev prefix) on the request ID in our correlation IDs.
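If you prefer to script this instead of clicking through the console, the index is described by a small JSON policy document. The sketch below only builds that document. The field name requestId is a hypothetical stand-in for our real field, and the boto3 call in the comment is our assumption of how you would submit it for a single log group, so verify it against the API reference first.

```python
import json


def build_index_policy(fields: list[str]) -> str:
    """Return a field index policy document as a JSON string."""
    if not fields:
        raise ValueError("at least one field is required")
    return json.dumps({"Fields": list(fields)})


# "requestId" is a hypothetical field name used for illustration.
policy_document = build_index_policy(["requestId"])
print(policy_document)

# Assumption, not executed here: with boto3 installed and credentials configured,
# the policy could be attached to one log group like this:
#   boto3.client("logs").put_index_policy(
#       logGroupIdentifier="<your log group name>",
#       policyDocument=policy_document,
#   )
```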

OpenSearch ❤️ CloudWatch (less silos, more analytics)

OpenSearch now natively integrates with CloudWatch. You can create dashboards for some pre-defined use cases like:

  • VPC Flow Logs
  • CloudTrail Logs
  • WAF Logs

The idea is quite cool. You can use it everywhere you can use OpenSearch Direct Query, which is kind of a serverless variant of OpenSearch. You only pay for the usage (but not too little).

Their pricing still seems a bit harsh and hard to calculate. Here is a pricing example from their landing page:

The total monthly charges = $732
$3 (Direct Query OCU) + $350 (Serverless Indexing) + $29 (Serverless Storage) + $350 (Serverless Search)

This is with a monthly ingest of over 1 TB!
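To see where the money goes, here is the same pricing example as a few lines of Python. The four amounts are taken from the example above. The percentage breakdown is just our own arithmetic on top of it.

```python
# Monthly charges in USD, as listed in the pricing example above.
charges = {
    "Direct Query OCU": 3,
    "Serverless Indexing": 350,
    "Serverless Storage": 29,
    "Serverless Search": 350,
}

total = sum(charges.values())

# Print each component with its share of the total bill.
for name, usd in charges.items():
    print(f"{name:<20} ${usd:>4}  {usd / total:6.1%}")
print(f"{'Total':<20} ${total:>4}")
```

Indexing and search make up almost the entire bill, while the Direct Query part itself is nearly negligible.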

Great feature, especially for getting an ELK stack-like experience. Let’s see if we can build dashboards ourselves soon without the need to use a pre-defined dashboard.

Transaction Search (deeper, distributed tracing)

Transaction Search is another very interesting piece! Once you enable it, it will transform your X-Ray traces into OpenTelemetry spans. These spans help you gain visibility into your application.

For us, this simply looks like distributed tracing for now. But maybe this is AWS’s way of supporting more OpenTelemetry instead of only supporting X-Ray. Maybe this will even replace X-Ray at some point? 🤔

We’ve enabled Transaction Search for our GitHub repository tracker (our example CloudWatch Book application) and got a few spans:

Once you open one of those, you will be redirected to the actual X-Ray trace.

You can also do some basic aggregations:

But for us, some services are missing, so that needs further investigation.

Application Signals

With this one, we needed to think first, because Application Signals already exists as a category of services.

Services like Evidently (RIP), RUM, and Synthetics fall into the category of Application Signals. However, this launch also describes the service or feature of Application Signals. Yes, naming things is hard. This feature already existed and was launched last year at re:Invent.

Application Signals wants to give you an overall view of your application and full visibility into it. The launch post promises three main features for developers:

  1. Developers can answer any question related to performance through an interactive visual editor
  2. Developers can diagnose rarely occurring issues
  3. Logs offer advanced features for transaction spans

With Application Signals, you can also define Service Level Objectives (SLOs). These help you understand whether you meet the goals you’ve set for yourself. Such goals can be based on availability, latency, errors, etc.
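If SLOs are new to you, the basic arithmetic behind an availability goal is simple. The sketch below uses made-up request counts and a made-up 99.9% goal. It only illustrates the general idea and is not meant to describe how Application Signals computes attainment internally.

```python
def slo_attainment(good_requests: int, total_requests: int) -> float:
    """Share of requests that met the objective, between 0 and 1."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return good_requests / total_requests


# Hypothetical numbers for illustration only.
goal = 0.999                                   # 99.9% availability goal
attainment = slo_attainment(99_950, 100_000)   # 50 failed requests out of 100,000

slo_met = attainment >= goal

# The error budget is the share of requests that is allowed to fail.
error_budget = 1 - goal
budget_used = (1 - attainment) / error_budget

print(f"attainment: {attainment:.4%}, goal met: {slo_met}, budget used: {budget_used:.0%}")
```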

Application Signals are there for whole services. You can enable it for:

  • ECS
  • EKS
  • Lambda

But you can also enable it for everything that the CloudWatch agent can run on. You need to enable them by installing the CloudWatch Agent or AWS Distro for OpenTelemetry.

We’ve activated Transaction Search for our example web application for the CloudWatch Book and an Application Signals service was automatically created as well:

The canaries (we have one) are not connected yet, but we already get an overview like that.

If you want to learn more about Application Signals, make sure to check out the amazing One Observability workshop.

X-Ray to OTEL

We think one main insight from all of these launches is that AWS supports more and more OpenTelemetry now! It seems that AWS is basing its new services on OTEL spans instead of its own format. This is quite cool because it allows you to use third-party software for traces as well.

AI Investigation

Investigations is the first 👆🏽 AI feature of CloudWatch at this re:Invent. The idea is to help you debug and investigate any issues you have. You can connect it to your chat applications via SNS. It also allows you to connect your ticketing system like Linear, Jira, or whatever you use.

You can trigger a sample investigation to get an idea of what it looks like:

There are different panes you can see:

  • Feed: The feed is the overview you are often used to from a ticketing system. You can see what your fellow developers posted to this investigation.
  • Suggestions: Suggestions are auto-generated by Q. It looks at recent deployments, configs, and much more to give you an idea of how you can improve. This looks quite nice!

Overall, the idea is amazing. It heavily depends on how well it will work. I’m amazed by it and will make use of it. Let’s see how well it works in a production app with lots of traffic!

Auditing Tracing Configuration

CloudWatch gives you a new overview of your tracing settings. You can turn it on for your whole account or organization. Once activated, it will search for resources in your account.

It then shows you an overview of activated traces of the following resource types:

  • EC2 Instances
  • VPCs
  • Lambda Functions

The idea here is to give you an overview of all the different tracing settings within your infrastructure. You don’t want to miss traces of a crucial application. Especially since AWS clearly recommends sampling 100% of your traces for the OTEL spans, this will help you with that!

Unfortunately, for our accounts, it didn’t work yet and we couldn’t find any resources.

Synthetics

Synthetics also got two minor updates. With Synthetics you can build E2E web tests. Typically, you use a headless browser for that, i.e. a browser that you can control from code. There is now a new runtime for that: Playwright. This is quite nice! With it, you can also store your logs directly in CloudWatch instead of storing them as text files in S3. That’s quite cool!

Synthetics will now also finally delete Lambda resources when canaries are removed. This was always quite a hassle: if you removed a canary, you needed to remove the CloudWatch Log Group, the Lambda function, and everything else yourself. This should now be automated!

New Metrics (more coverage)

CloudWatch announced several new metrics for some services.

Event Source Mapping Metrics for Lambda

There are now metrics available for the actual event source mapping (ESM) in Lambda. This is quite useful. If you connect SQS with a Lambda, for example, the main magic happens within the event source mapping. Until now this was kind of a black box. Now you can see metrics like

  • PolledEventCount (events read by ESM)
  • InvokedEventCount (events invoking Lambda function)
  • FilteredOutEventCount (events filtered out)
  • FailedInvokeEventCount (events failing to invoke)
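Once you have these counts, a couple of simple ratios already tell you a lot about a mapping. The sketch below uses made-up sample numbers and two ratios we defined ourselves (both relative to the polled events). They are not official CloudWatch metrics.

```python
def esm_rates(polled: int, filtered_out: int, failed_invoke: int) -> dict[str, float]:
    """Derive simple ratios from event source mapping counts.

    Both ratios are relative to the number of polled events.
    """
    if polled <= 0:
        raise ValueError("polled must be positive")
    return {
        "filtered_out_rate": filtered_out / polled,
        "failed_invoke_rate": failed_invoke / polled,
    }


# Hypothetical counts for one time period, for illustration only.
sample = {
    "PolledEventCount": 1000,
    "FilteredOutEventCount": 200,
    "FailedInvokeEventCount": 10,
}

rates = esm_rates(
    polled=sample["PolledEventCount"],
    filtered_out=sample["FilteredOutEventCount"],
    failed_invoke=sample["FailedInvokeEventCount"],
)
print(rates)
```

A rising failed invoke rate is a good candidate for an alarm, while the filtered-out rate mostly tells you whether your event filter behaves as expected.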

ECS Container Insights enhanced observability

ECS Container Insights now has an additional mode called enhanced observability. Before, there was only the standard ECS Container Insights. The enhanced observability mode gives you some more metrics.

You can set it up very easily with a single CLI command:

  aws ecs put-account-setting --name containerInsights --value enhanced

Some more metrics are:

  • ContainerMemoryUtilization
  • ContainerCpuUtilization
  • ContainerCpuReserved

Database Insights

Database Insights gives you more insights into your database (🥁). Only Aurora MySQL and Aurora PostgreSQL are supported right now. It will mainly summarize logs and metrics from your DB in a dashboard.

There are two modes: Standard and Advanced.

Network Flow Monitoring

Network Flow Monitoring allows you to get network data into CloudWatch. You need to install an agent for that. If you do that, you get near real-time information about your network traffic. While this is a bit bigger than “we’ve added some new metrics”, in the end, you’ll have new metrics 😉

Summary

This re:Invent had some amazing launches. The CloudWatch launches alone were amazing!

TL;DR

  1. More Coverage: More Metrics
  2. Easier Correlation: CloudWatch Unified Navigation
  3. Less silos, more analytics: OpenSearch integration
  4. Deeper distributed tracing: X-Ray → OTEL spans
  5. Aided investigations: AI Q Developer Assistant

Improving the user experience of CloudWatch should be one of AWS's top priorities in our opinion. CloudWatch is often the only reason developers still log into the console a lot. The unified navigation is a great first step.

Making use of OTEL spans instead of their own X-Ray format is a great idea as well from our perspective. It allows AWS to support more observability tools and gives customers the ability to export them into third-party tools and correlate with more systems.

Let’s see what the future brings!

See you in two weeks ✌🏽

Sandro & Tobi

P.S. Sandro was also interviewed about this topic on the podcast Living in the Cloud. The episode is not out yet, so keep your eyes open.

Tobias Schmidt & Sandro Volpicella from AWS Fundamentals
Cloud Engineers • Fullstack Developers • Educators
