Join our community of over 8,800 readers delving into AWS. We highlight real-world best practices through easy-to-understand visualizations and one-pagers. Expect a fresh newsletter edition every two weeks.
OpenTelemetry on AWS: Observability at Scale with Open-Source
Reading time: 17 minutes
Main Learning: Observability at Scale with Open-Source
Welcome to this edition of the AWS Fundamentals newsletter!
In this issue, we're focusing on observability with open-source tools on AWS.
As most of you already know, we can use Amazon CloudWatch and X-Ray to monitor our applications from every angle. But what if we want a hybrid setup where we run certain parts of our ecosystem outside of AWS? Maybe on-premises, or on another cloud provider like Azure or GCP. Or we may even want to use an external observability tool instead of relying solely on CloudWatch?
The solution is to align our observability strategy with the OpenTelemetry standard!
This way, we can have vendor-agnostic telemetry across our whole stack - even for the parts outside of AWS.
Whether you're working with serverless applications, managing complex infrastructures, or just getting started with AWS and observability, this edition will help you understand how CloudWatch, X-Ray, and OpenTelemetry fit together.
Let's get started!
Introduction
When running a function in AWS Lambda, the service automatically captures logs created by your function handlers and sends them to Amazon CloudWatch Logs.
You can enable execution tracing in your function configuration, allowing AWS Lambda to upload traces to AWS X-Ray when a function execution is completed.
Using the infrastructure-as-code framework SST, let's see a practical example with a single function in AWS Lambda:
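Here's a minimal sketch of what such a handler could look like. The file path and return shape are assumptions for illustration; myDatabaseQuery with its three-second setTimeout is the simulated external call we will see in the trace later:

```typescript
// packages/functions/src/handler.ts (file path is an assumption)
import type { Handler } from "aws-lambda";

// Simulates an external call (e.g., a database query)
// that takes three seconds to respond.
const myDatabaseQuery = (): Promise<string> =>
  new Promise((resolve) => setTimeout(() => resolve("query result"), 3000));

export const main: Handler = async () => {
  console.log("Starting database query...");
  const result = await myDatabaseQuery();
  console.log("Database query finished");
  return { statusCode: 200, body: result };
};
```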
And the resource configuration in SST:
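A sketch of that configuration with SST v2 constructs follows; the stack and construct names are assumptions:

```typescript
// stacks/ExampleStack.ts: a sketch using SST v2 constructs;
// stack and construct names are assumptions.
import { StackContext, Function } from "sst/constructs";

export function ExampleStack({ stack }: StackContext) {
  new Function(stack, "TracedFunction", {
    handler: "packages/functions/src/handler.main",
    // Enable X-Ray Active Tracing for this function. SST (via CDK)
    // also grants the role the X-Ray write permissions it needs.
    tracing: "active",
  });
}
```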
With X-Ray Active Tracing enabled for our function and the correct permissions for CloudWatch and X-Ray, we expect the following output when executing this function:
- Invocation metrics are sent to CloudWatch
- All logs are forwarded to CloudWatch Logs in /aws/lambda/<function name>
- The function's environment variables contain the X-Ray Daemon configuration; the AWS Lambda service injects these for you
- Traces for the function invocation are forwarded to X-Ray
- The tracing timeline shows our myDatabaseQuery execution with the three-second window created by setTimeout, simulating an external call your application would perform
That's already pretty cool, as we get a native integration between services in the AWS ecosystem from day one. As you transition from self-managed infrastructure to managed services, you can leverage fully managed telemetry integrations, reducing your operational overhead for monitoring and troubleshooting.
For AWS fully-managed services, AWS automatically configures the CloudWatch Unified Agent and X-Ray Daemon and injects the necessary environment variables for your application and dependencies, creating a fully managed telemetry platform that correlates your logs, metrics, and traces.
For self-managed compute services like Amazon EC2, Amazon ECS, or Amazon EKS, you may need to configure the CloudWatch Unified Agent and X-Ray Daemon to collect system metrics and application telemetry, which are then forwarded to CloudWatch and X-Ray services.
From Native Integration to Open-Source Standards
Now we've covered the native integration. But what about a more general approach to observability?
This is where numerous initiatives have been created, discussed, and proposed over the years. These include metrics-focused solutions like OpenMetrics, OpenCensus, or Prometheus and tracing-oriented projects like OpenTracing, OpenZipkin, or Jaeger.
These solutions offer ways to collect, store, and visualize signals emitted by distributed systems. From infrastructure to developer experience, how do we connect them to measure the internal state of our systems?
The native integration works wonders within the AWS ecosystem! However, over the years, many competing vendors, tools, and formats for collecting telemetry data (metrics, logs, and traces) from applications and infrastructure have been developed.
This fragmentation made it difficult for developers to choose and implement observability solutions. Many observability tools used proprietary formats and protocols, making it challenging for organizations to switch between monitoring and analytics platforms without significant rework.
The industry recognized the benefits of an open-source, community-driven approach to solving observability challenges.
This led to the OpenTelemetry initiative, whose main objective is to make robust, portable telemetry a built-in feature of cloud-native software!
What's OpenTelemetry?
OpenTelemetry is a comprehensive framework designed for creating and managing telemetry signals in a vendor-agnostic manner, supporting various observability backends.
It focuses on the generation, collection, management, and export of telemetry data, enabling easy instrumentation across diverse applications and systems.
Key components include:
- a specification for APIs, SDKs, and compatibility
- a standard protocol (OTLP) for telemetry data
- semantic conventions for naming telemetry data types
- language-specific APIs and SDKs for initializing and exporting telemetry
- a library ecosystem for automatic instrumentation
- the OpenTelemetry Collector as a central hub for receiving, processing, and exporting telemetry data, supporting multiple formats and destinations
The backend, storage, and visualization phases are intentionally left to other tools.
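To make the language-specific APIs component concrete, here's a minimal sketch of manual instrumentation with @opentelemetry/api in TypeScript. The tracer name, span name, and attribute are assumptions for illustration:

```typescript
import { trace, SpanStatusCode } from "@opentelemetry/api";

// The simulated external call from our earlier Lambda example.
const myDatabaseQuery = (): Promise<string> =>
  new Promise((resolve) => setTimeout(() => resolve("query result"), 3000));

// The tracer name is an arbitrary label for this example.
const tracer = trace.getTracer("aws-fundamentals-example");

export const instrumentedQuery = (): Promise<string> =>
  // startActiveSpan makes the span current for everything
  // awaited inside the callback.
  tracer.startActiveSpan("myDatabaseQuery", async (span) => {
    try {
      const result = await myDatabaseQuery();
      span.setAttribute("app.result.length", result.length);
      return result;
    } catch (err) {
      span.setStatus({ code: SpanStatusCode.ERROR });
      throw err;
    } finally {
      span.end();
    }
  });
```

A nice property of this design: the API package ships only no-op implementations, so this code is safe to run even before an SDK or collector is wired up.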
Whether AWS will fully adopt OpenTelemetry across its services remains an open question, as it requires significant adaptation to this evolving standard. AWS's answer today is the AWS Distro for OpenTelemetry (ADOT).
You can use ADOT to instrument your applications running on AWS App Runner, AWS Lambda, Amazon Elastic Compute Cloud (EC2), Amazon Elastic Container Service (ECS), and Amazon Elastic Kubernetes Service (EKS) on EC2 and AWS Fargate, as well as in your on-premises data center.
That sounds cool, but what's the real advantage here? Where does it fit in the observability stack diagram?
The AWS Distro for OpenTelemetry extends the OpenTelemetry Collector and provides secure, production-ready configurations for your receivers, pipeline processors, and exporters to use AWS or AWS Partner monitoring solutions.
With X-Ray Active Tracing enabled, the ADOT Lambda layer detects the injected X-Ray environment variables and converts them into OpenTelemetry spans and context, allowing you to use OpenTelemetry APIs to enhance your request signals.
You can now instrument your code using OpenTelemetry APIs. The layer configures the embedded OTEL Collector to export data to AWS X-Ray: the collector converts your OpenTelemetry-instrumented data into the X-Ray format and forwards it to the X-Ray service.
While this flow might seem redundant when exporting back to X-Ray, remember that the ADOT Collector, with its receivers and exporters, allows you to route your telemetry data to any supported monitoring solution.
The key advantage? Once your code is instrumented with OpenTelemetry APIs, switching to a different monitoring destination only requires updating the collector configuration - no code changes are needed!
Here's how to configure the OpenTelemetry Lambda layer in SST:
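A sketch building on our earlier SST v2 example; the layer ARN is a placeholder that you must replace with the ADOT layer ARN for your region and runtime:

```typescript
// stacks/OtelStack.ts: a sketch with SST v2 constructs; the layer
// ARN is a placeholder, and construct names are assumptions.
import { StackContext, Function } from "sst/constructs";
import * as iam from "aws-cdk-lib/aws-iam";

export function OtelStack({ stack }: StackContext) {
  const fn = new Function(stack, "TracedFunction", {
    handler: "packages/functions/src/handler.main",
    tracing: "active",
    // The ADOT Lambda layer ARN for your region and runtime goes here.
    layers: ["arn:aws:lambda:<region>:<account>:layer:<adot-layer>:<version>"],
    environment: {
      // Wrapper script provided by the layer that applies
      // automatic instrumentation on startup.
      AWS_LAMBDA_EXEC_WRAPPER: "/opt/otel-handler",
    },
    nodejs: {
      esbuild: {
        // Provided by the layer at runtime, so don't bundle it.
        external: ["@opentelemetry/api"],
      },
    },
  });

  // The collector needs write access to X-Ray to export traces.
  fn.role?.addManagedPolicy(
    iam.ManagedPolicy.fromAwsManagedPolicyName("AWSXRayDaemonWriteAccess")
  );
}
```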
In the configuration above, the ARN of the AWS Distro for OpenTelemetry Lambda Layer goes into layers: […]. Lambda layers are a regionalized resource, meaning they can only be used in the region where they are published. Use the layer in the same region as your Lambda functions.
To automatically instrument our function with OpenTelemetry, we set the AWS_LAMBDA_EXEC_WRAPPER environment variable to /opt/otel-handler. This wrapper script invokes your Lambda handler with the automatic instrumentation applied.
By default, the layer is configured to export traces to AWS X-Ray. That's why we need the AWSXRayDaemonWriteAccess managed policy in the function role.
One extra: we mark @opentelemetry/api as an external package to prevent esbuild from bundling it, as it is already available in the layer!
And that's already it.
We should end up with our traces available in X-Ray! While the context group in the X-Ray dashboard looks slightly different, we are still collecting the correct phases of the AWS Lambda function: invocation, application code, and termination!
This Is Just the Beginning!
AWS Distro for OpenTelemetry has broad capabilities, and the learning curve to implement it correctly in your environment can vary.
We are writing more chapters and real-world scenarios using OpenTelemetry!
In the meantime, know your data and learn more about tracing, metrics, and log signals!
This chapter's humble example introduced the OpenTelemetry components and how AWS extends that ecosystem to provide production-ready solutions for its customers with open-source standards.