🔎 OpenTelemetry on AWS: Observability at Scale with Open-Source


Hey Reader 👋🏽

Welcome to this edition of the AWS Fundamentals newsletter!

In this issue, we're focusing on observability with open-source tools on AWS.

As most of you already know, we can use Amazon CloudWatch and X-Ray to monitor our application from every angle. But what if we want a hybrid setup where we run certain parts of our ecosystem outside of AWS? Maybe on-premises or on another cloud provider like Azure or GCP. Or we may even want to use an external observability tool instead of just relying on CloudWatch? 🤔

The solution is to make our observability strategy comply with the OpenTelemetry standard! ✨

This way, we can have vendor-agnostic telemetry across our whole stack - even for the parts outside of AWS.

Whether you're working with serverless applications, managing complex infrastructures, or just getting started with AWS and observability, this edition will help you understand how CloudWatch, X-Ray, and OpenTelemetry fit together.

Let's get started! 🙌

Introduction

When running a function in AWS Lambda, the service automatically captures logs created by your function handlers and sends them to Amazon CloudWatch Logs.

You can enable execution tracing in your function configuration, allowing AWS Lambda to upload traces to AWS X-Ray when a function execution is completed.

Using the infrastructure-as-code framework SST, let's see a practical example with a single function in AWS Lambda:
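A minimal sketch of such a handler could look like the following. The name myDatabaseQuery and the 3-second setTimeout come from the description in this issue; the delayMs parameter, the log messages, and the response shape are assumptions made for illustration.

```typescript
// Simulates a slow external call (for example, a database query).
// The 3-second default mirrors the setTimeout window described in this issue;
// delayMs is an assumption added so the delay can be shortened when testing.
export async function myDatabaseQuery(delayMs = 3000): Promise<string> {
  return new Promise<string>((resolve) => {
    setTimeout(() => resolve("query result"), delayMs);
  });
}

// Lambda entry point. Anything written with console.log ends up in
// CloudWatch Logs under /aws/lambda/<function name>.
export async function handler(
  _event: unknown,
): Promise<{ statusCode: number; body: string }> {
  console.log("Starting database query");
  const result = await myDatabaseQuery();
  console.log("Database query finished");
  return { statusCode: 200, body: JSON.stringify({ result }) };
}
```

Keeping the slow call in its own function makes it easy to spot later as a separate step in the trace timeline.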

And the resource configuration in SST:
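The exact props depend on the SST version you use, so the sketch below is deliberately framework-neutral. The TracedFunctionConfig type and its property names are hypothetical and are not the SST API; the handler path and runtime are assumptions too. The sketch only lists the settings this example relies on: active tracing plus permissions for CloudWatch Logs and X-Ray.

```typescript
// Hypothetical, framework-neutral description of the function resource.
// Property names are illustrative and are NOT the SST API.
interface TracedFunctionConfig {
  handler: string;
  runtime: string;
  architecture: "x86_64" | "arm64";
  tracing: "Active" | "PassThrough";
  managedPolicies: string[];
}

export const myFunction: TracedFunctionConfig = {
  handler: "src/handler.handler", // assumed file path and export name
  runtime: "nodejs20.x", // assumed runtime
  architecture: "x86_64",
  // Active tracing lets Lambda sample requests and send trace data to X-Ray.
  tracing: "Active",
  managedPolicies: [
    "AWSLambdaBasicExecutionRole", // write logs to CloudWatch Logs
    "AWSXRayDaemonWriteAccess", // upload trace data to X-Ray
  ],
};
```

Map these settings onto the function construct of your SST version when you write the real resource definition.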

With X-Ray Active Tracing enabled for our function and the correct permissions for CloudWatch and X-Ray, we expect the following output when executing this function:

  1. Invocation metrics are sent to CloudWatch
  2. All logs are forwarded to CloudWatch in /aws/lambda/<function name>
  3. The function's environment variables contain the X-Ray Daemon configuration. The AWS Lambda service injects these for you
  4. Traces for function invocation are forwarded to X-Ray
  5. The trace timeline shows the execution time of myDatabaseQuery, including the 3-second window created by setTimeout, which simulates an external call your application would perform
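To make item 3 more concrete: on the Node.js runtime, one of the injected variables is _X_AMZN_TRACE_ID, which carries the X-Ray tracing header for the current invocation. The sketch below parses a header in that format. The parseTraceHeader helper and the XRayTraceHeader type are hypothetical names used only for illustration, and the sample value is an example in the documented format rather than real output.

```typescript
interface XRayTraceHeader {
  root?: string;
  parent?: string;
  sampled?: boolean;
}

// Parses a header such as "Root=1-...;Parent=...;Sampled=1".
// Keys other than Root, Parent, and Sampled are ignored.
export function parseTraceHeader(header: string): XRayTraceHeader {
  const parsed: XRayTraceHeader = {};
  for (const part of header.split(";")) {
    const separator = part.indexOf("=");
    if (separator === -1) continue;
    const key = part.slice(0, separator).trim().toLowerCase();
    const value = part.slice(separator + 1).trim();
    if (key === "root") parsed.root = value;
    else if (key === "parent") parsed.parent = value;
    else if (key === "sampled") parsed.sampled = value === "1";
  }
  return parsed;
}

// Example value in the tracing header format. Inside a Lambda function you
// would read process.env._X_AMZN_TRACE_ID instead.
export const sampleHeader =
  "Root=1-5759e988-bd862e3fe1be46a994272793;Parent=53995c3f42cd8ad8;Sampled=1";
console.log(parseTraceHeader(sampleHeader));
```

You rarely need to parse this yourself, since the X-Ray and OpenTelemetry tooling reads it for you, but it helps to know where the trace context comes from.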
    ​

That's already pretty cool, as we get a native integration between services in the AWS ecosystem from Day 1. As you transition from infrastructure to managed services, you can leverage fully managed telemetry integrations, reducing your operational overhead in monitoring and troubleshooting.

For AWS fully-managed services, AWS automatically configures the CloudWatch Unified Agent and X-Ray Daemon and injects the necessary environment variables for your application and dependencies, creating a fully managed telemetry platform that correlates your logs, metrics, and traces.

For self-managed compute services like Amazon EC2, Amazon ECS, or Amazon EKS, you may need to configure the CloudWatch Unified Agent and X-Ray Daemon to collect system metrics and application telemetry, which are then forwarded to CloudWatch and X-Ray services.
​

From Native Integration to Open-Source Standards

Now we've covered the native integration. But what about a general approach to observability?

This is where numerous initiatives have been created, discussed, and proposed over the years. These include metrics-focused solutions like OpenMetrics, OpenCensus, or Prometheus and tracing-oriented projects like OpenTracing, OpenZipkin, or Jaeger.

These solutions offer ways to collect, store, and visualize signals emitted by distributed systems. From infrastructure to developer experience, how do we connect them to measure the internal state of our systems?

The native integration works wonders inside AWS! However, over the years, many competing vendors, tools, and formats for collecting telemetry data (metrics, logs, and traces) from applications and infrastructure have emerged.

This fragmentation made it difficult for developers to choose and implement observability solutions. Many observability tools used proprietary formats and protocols, making it challenging for organizations to switch between monitoring and analytics platforms without significant rework.

The industry recognized the benefits of an open-source, community-driven approach to solving observability challenges.

The OpenTelemetry initiative, which merged the OpenTracing and OpenCensus projects, was announced at KubeCon 2019.

The initiative's main objective is to make robust, portable telemetry a built-in feature of cloud-native software!

What's OpenTelemetry?

OpenTelemetry is a comprehensive framework designed for creating and managing telemetry signals in a vendor-agnostic manner, supporting various observability backends.

It focuses on the generation, collection, management, and export of telemetry data, enabling easy instrumentation across diverse applications and systems.

Key components include:

  • a specification for APIs, SDKs, and compatibility
  • a standard protocol (OTLP) for telemetry data
  • semantic conventions for naming telemetry data types
  • language-specific APIs and SDKs for initializing and exporting telemetry
  • a library ecosystem for automatic instrumentation
  • the OpenTelemetry Collector as a central hub for receiving, processing, and exporting telemetry data, supporting multiple formats and destinations

The backend, storage, and visualization phases are intentionally left to other tools.

How ready AWS is to fully adopt OpenTelemetry remains an open question, as doing so requires significant adaptation to align with this evolving standard.

AWS Distro for OpenTelemetry

To enable customers to benefit from OpenTelemetry, the AWS Observability team created the AWS Distro for OpenTelemetry (ADOT).

You can use ADOT to instrument your applications running on AWS App Runner, AWS Lambda, Amazon Elastic Compute Cloud (EC2), and on Amazon Elastic Container Service (ECS) and Amazon Elastic Kubernetes Service (EKS) with EC2 or AWS Fargate, as well as in your on-premises data center.

That sounds cool, but what's the real advantage here? Where does it fit in the observability stack diagram?

The AWS Distro for OpenTelemetry extends the OpenTelemetry Collector and provides secure, production-ready configurations for your receivers, pipeline processors, and exporters to use AWS or AWS Partners monitoring solutions.

The AWS Distro for OpenTelemetry is an official product from AWS, meaning you can use AWS Support and Customer Service to report and resolve technical issues.

Using OpenTelemetry on AWS Lambda

The AWS Distro for OpenTelemetry Lambda Layer configures the following components in your AWS Lambda function:

  1. OpenTelemetry SDK and API for JavaScript
  2. OpenTelemetry AWS Lambda Instrumentation for Node.js
  3. ADOT Collector for AWS Lambda

With X-Ray Active Tracing enabled, the layer detects the injected X-Ray environment variables and converts them into OpenTelemetry spans and context, allowing you to use OpenTelemetry APIs to enhance your request signals.

You can now instrument your code using OpenTelemetry APIs. The layer configures the OpenTelemetry Collector to export data to AWS X-Ray: the Collector's X-Ray exporter converts your OpenTelemetry-instrumented data into the X-Ray format and forwards it to the X-Ray service.
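As a rough sketch of what manual instrumentation looks like, the example below wraps the slow call in its own span. To keep it runnable without any dependency, it uses two small stand-in interfaces (SpanLike and TracerLike) and an in-memory demoTracer; all three are hypothetical and cover only what the sketch needs. In a real function you would get the tracer from the @opentelemetry/api package via trace.getTracer(...) instead of using the demo tracer.

```typescript
// Minimal stand-ins for the span and tracer shapes. They are hypothetical and
// cover only what this sketch needs.
interface SpanLike {
  setAttribute(key: string, value: string | number | boolean): unknown;
  end(): void;
}

interface TracerLike {
  startActiveSpan<T>(name: string, fn: (span: SpanLike) => T): T;
}

// Wraps the slow call in a span so it shows up as its own step in the trace.
export async function tracedDatabaseQuery(
  tracer: TracerLike,
  query: () => Promise<string>,
): Promise<string> {
  return tracer.startActiveSpan("myDatabaseQuery", async (span) => {
    try {
      span.setAttribute("db.system", "example"); // illustrative attribute value
      return await query();
    } finally {
      span.end(); // always close the span, even if the query throws
    }
  });
}

// Tiny in-memory tracer used only to demonstrate the call pattern.
export const endedSpans: string[] = [];
export const demoTracer: TracerLike = {
  startActiveSpan(name, fn) {
    return fn({
      setAttribute: () => undefined,
      end: () => {
        endedSpans.push(name);
      },
    });
  },
};
```

Ending the span in a finally block is the important part of the pattern: a span that is never ended does not get exported.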

While this flow might seem redundant when exporting back to X-Ray, remember that the ADOT Collector, with its receivers and exporters, allows you to route your telemetry data to any supported monitoring solution.

The key advantage? Once your code is instrumented with OpenTelemetry APIs, switching to a different monitoring destination only requires updating the collector configuration - no code changes are needed!
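To illustrate the idea, the sketch below models the relevant parts of a Collector configuration as a TypeScript object and swaps the exporter of the traces pipeline. The Collector itself is configured with YAML; the object only mirrors that structure. The CollectorConfig type, the withTraceExporter helper, and the endpoint are hypothetical, and the example assumes the otlphttp exporter is available in your Collector build.

```typescript
// Shape of the parts of a Collector configuration used in this sketch.
interface CollectorConfig {
  receivers: Record<string, unknown>;
  exporters: Record<string, unknown>;
  service: {
    pipelines: Record<string, { receivers: string[]; exporters: string[] }>;
  };
}

// Traces arrive over OTLP and leave through the X-Ray exporter.
export const xrayConfig: CollectorConfig = {
  receivers: { otlp: { protocols: { grpc: {}, http: {} } } },
  exporters: { awsxray: {} },
  service: {
    pipelines: { traces: { receivers: ["otlp"], exporters: ["awsxray"] } },
  },
};

// Returns a copy of the config whose traces pipeline uses a different exporter.
// The original config object is left untouched.
export function withTraceExporter(
  config: CollectorConfig,
  name: string,
  settings: unknown,
): CollectorConfig {
  const traces = config.service.pipelines["traces"];
  return {
    ...config,
    exporters: { ...config.exporters, [name]: settings },
    service: {
      pipelines: {
        ...config.service.pipelines,
        traces: { receivers: traces?.receivers ?? ["otlp"], exporters: [name] },
      },
    },
  };
}

// Hypothetical endpoint; replace it with the OTLP endpoint of your backend.
export const otherBackend = withTraceExporter(xrayConfig, "otlphttp", {
  endpoint: "https://otel.example.com",
});
```

Note that nothing in the application code appears here: the handler keeps emitting the same spans, and only the pipeline definition changes.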

Here's how to configure the OpenTelemetry Lambda layer in SST:
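The sketch below collects the settings discussed in the following paragraphs in a framework-neutral form. The OtelFunctionConfig type and its property names are hypothetical and are not the SST API, and the handler path is an assumption. The layer ARN is left as a placeholder on purpose: copy the real value for your Region and architecture from the ADOT Lambda Layer documentation.

```typescript
// Hypothetical, framework-neutral description of the function resource.
// Property names are illustrative and are NOT the SST API.
interface OtelFunctionConfig {
  handler: string;
  architecture: "x86_64" | "arm64";
  layers: string[];
  environment: Record<string, string>;
  managedPolicies: string[];
  bundling: { external: string[] };
}

// Placeholder only: use the real ARN for your Region and architecture.
const ADOT_LAYER_ARN = "<ARN of the aws-otel-nodejs-amd64 layer in your Region>";

export const otelFunction: OtelFunctionConfig = {
  handler: "src/handler.handler", // assumed file path and export name
  architecture: "x86_64", // must match the architecture of the layer
  layers: [ADOT_LAYER_ARN],
  environment: {
    // Wrapper script shipped in the layer that applies auto-instrumentation.
    AWS_LAMBDA_EXEC_WRAPPER: "/opt/otel-handler",
  },
  // The layer exports traces to X-Ray by default, so the role needs write access.
  managedPolicies: ["AWSXRayDaemonWriteAccess"],
  // The API package is already provided by the layer, so it is not bundled.
  bundling: { external: ["@opentelemetry/api"] },
};
```

Map each of these settings onto the function props of your SST version.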

In the configuration above, we can find the ARN of the AWS Distro for OpenTelemetry Lambda Layer in layers: […​]. Lambda layers are a regionalized resource, meaning they can only be used in the Region where they are published. Use the layer in the same region as your Lambda functions.

AWS Lambda supports different processor architectures. We are using x86_64 for the function, so we must use the ADOT layer ARN for the matching architecture: aws-otel-nodejs-amd64. We can find detailed instructions in the ADOT Lambda Layer documentation.

To automatically instrument our function with OpenTelemetry, we set the AWS_LAMBDA_EXEC_WRAPPER environment variable to /opt/otel-handler. This wrapper script invokes your Lambda application with the automatic instrumentation applied.

By default, the layer is configured to export traces to AWS X-Ray. That's why we need the AWSXRayDaemonWriteAccess managed policy in the function role.

And for some extras: We mark @opentelemetry/api as an external package to prevent esbuild from bundling it. It is already available in the layer!

And that's it! 🎉

We should end up with our traces being available in X-Ray! While the context group in the X-Ray dashboard is slightly different, we are collecting the correct phases of the AWS Lambda function invocation, application code, and termination!

This Is Just the Beginning!

AWS Distro for OpenTelemetry has broad capabilities, and the learning curve to implement it correctly in your environment can vary.

We are writing more chapters and real-world scenarios using OpenTelemetry!

In the meantime, know your data and learn more about tracing, metrics, and log signals!

The OpenTelemetry API documentation is excellent and shows how to use tracing, contexts, and metrics to enrich your signals in many examples and languages.

This chapter's humble example introduced the OpenTelemetry components and how AWS extends that ecosystem to provide production-ready solutions for its customers with open-source standards.

​
​

P.S.: If you happen to be in the southern part of Germany 🇩🇪, consider dropping by the Karlsruhe and/or Heilbronn AWS User Group Meetings - I'm (Tobi) also there and would love to meet you 👋 💛

Tobias Schmidt & Sandro Volpicella from AWS Fundamentals
Cloud Engineers • Fullstack Developers • Educators
