Hi Reader 👋
This newsletter is all about DynamoDB - one of the most famous AWS flagship services around.
💡 A quick question to get you started 🤔
What was DynamoDB's peak load in requests per second during Amazon Prime Day 2022? Scroll down to find the answer. You'll be amazed.
Let's quickly go over the topics we'll cover in this issue: DynamoDB's internals, primary keys, scans & queries, secondary indexes, capacity modes, global tables, backups, and streams.
That's a lot, but it's worth going deep on this great service that's used by so many companies.
Let's go! 🚀
DynamoDB is a fully managed NoSQL database that is able to handle any scale. Additionally, it offers great features for integrating natively with other AWS services. As it's not your common NoSQL storage but comes with a long list of unique characteristics, it's important to understand its internals first.
DynamoDB is a NoSQL database, which means it does not explicitly enforce a schema. You can add or remove fields with every write operation.
This does not mean that there is no schema to follow. It's just not enforced at the database level itself. Your application still implicitly expects your data to be structured in a certain way.
Changing a complex implicit NoSQL schema can be even more difficult than migrating a schema in SQL.
DynamoDB's internals are built around the following major concepts: tables, items, attributes, and types.
With types, DynamoDB enforces its own JSON format. This means each attribute is wrapped into its type identifier.
At the application level, you don't need to work with these nested (rather complex) fields, as there are packages for all popular languages that automatically map from JSON to DynamoDB JSON and vice versa.
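To illustrate the format, here's a minimal sketch of writing an item with the AWS CLI. The table name orders and its attributes are assumptions for this example; note how every value is wrapped in its type identifier.

```bash
# Hypothetical table "orders" with a string partition key "orderId".
# Each value is wrapped in its DynamoDB type: S = string, N = number, BOOL = boolean.
aws dynamodb put-item \
  --table-name orders \
  --item '{
    "orderId":  {"S": "order-123"},
    "quantity": {"N": "2"},
    "shipped":  {"BOOL": false}
  }'
```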
One last important fact: each item in DynamoDB can be up to 400 KB in size, which is a lot and won't be reached in most applications. But as you're often aiming for a single-table design (keeping all related entities in one table, and sometimes embedding a lot of related data in a single item, instead of creating multiple tables that depend on each other), it's possible to hit this limit.
A primary key identifies an item uniquely. In DynamoDB, this key can either be simple (just a partition key) or composite (a partition key plus a sort key).
Composite keys extend the options to query for items, which we'll have a look at in the next paragraph.
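As a sketch (table and attribute names are assumptions for illustration), creating a table with a composite primary key via the CLI could look like this:

```bash
# Composite primary key: "orderId" as partition key (HASH) and "createdAt" as sort key (RANGE).
aws dynamodb create-table \
  --table-name orders \
  --attribute-definitions \
      AttributeName=orderId,AttributeType=S \
      AttributeName=createdAt,AttributeType=S \
  --key-schema \
      AttributeName=orderId,KeyType=HASH \
      AttributeName=createdAt,KeyType=RANGE \
  --billing-mode PAY_PER_REQUEST
```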
There are two ways to retrieve items: scans and queries.
Scans are always the last resort and should be avoided at all costs, as they read through every item in the table.
💡 Looking at our previous paragraph about primary keys: if your partition key is not well-distributed across partitions (e.g. a single partition receives a huge percentage of your items), this can lead to hot partitions.
Hot partitions have a negative impact on overall performance, as those partitions receive more read and/or write operations (more on those later in the capacity paragraph) than other partitions.
Why is this a problem? Because your read and write capacity units are distributed across all partitions. This means one hot partition can lead to throttling way before you reach your overall capacity.
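A simplified example with assumed numbers: if a table has 1,000 RCUs provisioned and 10 partitions, each partition can serve roughly 100 RCUs. If a single hot partition receives reads worth 200 RCUs per second, those requests get throttled, even though the table as a whole only uses a fraction of its total capacity.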
We've seen that you always need the partition key to query for items. This requires a very well-planned schema where you know all your query patterns beforehand.
Often, this is difficult or simply not possible as requirements can change.
But you're not out of options, and you don't have to fall back on scans: DynamoDB also offers secondary indexes, which extend your query capabilities.
There are two different types of secondary indexes (SI): Global Secondary Indexes (GSI), which let you define a completely new partition key (and optionally a sort key), and Local Secondary Indexes (LSI), which keep the table's partition key but add an alternative sort key.
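As a rough sketch (index and attribute names are assumed for illustration), adding a GSI to an existing on-demand table could look like this:

```bash
# Add a GSI so the "orders" table can also be queried by "customerId".
# For a provisioned table, the "Create" block would also need a ProvisionedThroughput section.
aws dynamodb update-table \
  --table-name orders \
  --attribute-definitions AttributeName=customerId,AttributeType=S \
  --global-secondary-index-updates '[
    {
      "Create": {
        "IndexName": "customerId-index",
        "KeySchema": [{"AttributeName": "customerId", "KeyType": "HASH"}],
        "Projection": {"ProjectionType": "ALL"}
      }
    }
  ]'
```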
💡 A Small Dive Into How Partitions Work In DynamoDB: a table is divided into multiple partitions, and each partition is stored on a different server. When an item is added to the table, it is assigned to a partition based on its partition key value. All items with the same partition key value are stored in the same partition and therefore on the same server. This allows DynamoDB to distribute the data across multiple servers, which helps to scale the table as the size of the data grows. If you're interested in more detail about partitions, check out this amazing article by Alex DeBrie.
When you query or update data in DynamoDB with expressions, you pass Expression Attribute Names & Expression Attribute Values. This is rather unintuitive at first, but you'll get used to it. It's also possible to use an abstraction layer like DynamoDBMapper for Java (most popular languages offer a similar mapper) that makes this easier with typed classes for your database items.
But let's have a look at how a normal query would look with names and values in the CLI:
The expression attribute names (which just act as variables) start with a # and the expression attribute values with :. Our partition key (#pk) is mapped to orderId and our target field's name is mapped to quantity. Then we map the values in the same way, only in the block --expression-attribute-values.
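The original command isn't reproduced here, but a query along those lines could look like this sketch (table name and values are assumptions):

```bash
# Query all items of one order and only keep those with a quantity above a threshold.
# "#pk" and "#qty" are expression attribute names (placeholders),
# ":orderId" and ":minQty" are the corresponding expression attribute values.
aws dynamodb query \
  --table-name orders \
  --key-condition-expression "#pk = :orderId" \
  --filter-expression "#qty >= :minQty" \
  --expression-attribute-names '{"#pk": "orderId", "#qty": "quantity"}' \
  --expression-attribute-values '{":orderId": {"S": "order-123"}, ":minQty": {"N": "1"}}'
```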
DynamoDB offers two different capacity modes: On-Demand and Provisioned. With on-demand, there's no need to define how many reads and writes you'll need per second, as it scales immediately. Provisioned capacity requires you to know your traffic patterns and gives you a steady level of available read and write capacity (which can also be scaled via CloudWatch-based auto-scaling, but much more slowly).
In general, these capacity modes define how much read and write throughput your table can serve and how you're billed for it. If you exceed the available capacity, requests are rejected with a ThrottlingException. DynamoDB charges you based on Read Capacity Units (RCU) and Write Capacity Units (WCU).
One read capacity unit corresponds to one strongly consistent read or two eventually consistent reads per second, for an item of up to 4 KB. If the item is larger than 4 KB, you consume more RCUs - e.g. a 5 KB item consumes 2 RCUs. One write capacity unit corresponds to one write per second for an item of up to 1 KB.
The On-Demand capacity mode doesn't require you to define any WCU or RCU. This is a good choice if you're creating a new table with an unknown workload, if your traffic is unpredictable or spiky, or if you simply prefer paying only for what you use.
On-Demand will cover almost any load, as the service limits are immense.
Provisioned capacity is up to 7 times cheaper than on-demand, but requires you to define RCUs and WCUs. Those are billed regardless of whether you actually use them.
Use this mode for predictable traffic. It doesn't need to be perfectly steady, as you can scale RCUs and WCUs with auto-scaling policies.
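As a sketch (the table name and numbers are assumptions), switching a table between the two modes with the CLI could look like this:

```bash
# Switch to provisioned capacity with explicit RCUs and WCUs ...
aws dynamodb update-table \
  --table-name orders \
  --billing-mode PROVISIONED \
  --provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5

# ... or back to on-demand (pay per request).
aws dynamodb update-table \
  --table-name orders \
  --billing-mode PAY_PER_REQUEST
```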
When to use what - A Summary To Remember
Our Suggestion: Don't overthink this from the beginning. Use provisioned capacity with low RCUs and WCUs until you reach the Free Tier limits (25 RCUs/WCUs per month). Afterward, choose on-demand.
With DynamoDB's global tables feature, you can easily synchronize tables across regions, increasing resiliency and following the patterns of the AWS Well-Architected Framework.
Data is not only backed up to another region but is also synchronized bi-directionally. Regardless of the write region, each region within the global table definition receives all updates.
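A minimal sketch (region and table name are assumptions) of adding a replica region to an existing table with the newer global tables version:

```bash
# Add a replica of the "orders" table in eu-west-1 (global tables version 2019.11.21).
aws dynamodb update-table \
  --table-name orders \
  --replica-updates 'Create={RegionName=eu-west-1}'
```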
DynamoDB offers a fully-managed backup solution. Complicated processes of backing up or restoring data are a thing of the past.
In general, DynamoDB differentiates between using the AWS Backup service and using the direct backup functionality of DynamoDB.
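For DynamoDB's built-in on-demand backups, a sketch (table and backup names are assumptions) could look like this:

```bash
# Create an on-demand backup of the "orders" table ...
aws dynamodb create-backup \
  --table-name orders \
  --backup-name orders-backup-before-migration

# ... and later restore it into a new table using the backup's ARN.
aws dynamodb restore-table-from-backup \
  --target-table-name orders-restored \
  --backup-arn <backup-arn>
```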
With DynamoDB streams, you can invoke Lambda functions for item operations in DynamoDB. As an example, we want to send a confirmation email to the user when a new order is saved.
You can activate streams in the DynamoDB console by going to the tab Exports and Streams.
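Alternatively, here's a sketch with the CLI (the function name, view type, and placeholder ARN are assumptions):

```bash
# Enable a stream that emits both the old and the new item image for every change.
aws dynamodb update-table \
  --table-name orders \
  --stream-specification StreamEnabled=true,StreamViewType=NEW_AND_OLD_IMAGES

# Connect the stream to a (hypothetical) Lambda function that sends the confirmation email.
aws lambda create-event-source-mapping \
  --function-name send-order-confirmation \
  --event-source-arn <stream-arn> \
  --starting-position LATEST
```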
That's not everything there is to DynamoDB, but these are the most important facts. It's also a great service to get started with in combination with Lambda, as you can get things up and running really quickly.
Don't hesitate to get your hands dirty and start building!
We wish you a nice rest of the week!
Tobi & Sandro
🕵️ P.S.: The answer to the intro question is 105.2 million requests per second. DynamoDB reached this while still maintaining single-digit millisecond response times! 🔥
If you want to read more, learn why AWS Organizations is your best friend for large-scale projects and what's the difference between CloudWatch and CloudTrail!