Hi Reader 👋🏽
This newsletter is all about DynamoDB - one of the most famous AWS flagship services around.
💡 A guessing question to get you started 🤔
What was the peak number of requests per second for DynamoDB during Amazon Prime Day 2022? Scroll down to find the answer. You'll be amazed.
Let's quickly go over the topics we'll cover in this issue.
That's a lot, but it's worth going deep on this great service, which is used by a lot of companies.
Let's go!
DynamoDB is a fully managed NoSQL database that is able to handle any scale. Additionally, it offers great features for integrating natively with other services. As it's not your common NoSQL store but comes with a long list of unique characteristics, it's important to understand its internals beforehand.
DynamoDB is a NoSQL database, which means it is not explicitly enforcing a schema. You can add or remove fields with every write operation.
This does not mean that there is no schema to follow. It's just not enforced at the database level itself. Your application still implicitly expects your data to be structured in a certain way.
Changing a complex implicit NoSQL schema can be even more difficult than migrating a schema in SQL.
DynamoDB's internals are built around the following major concepts: tables, items, attributes, and types.
With the types, DynamoDB enforces "its own JSON format". This means each attribute value is wrapped in its type identifier.
At the application level, you don't need to work with these nested (rather complex) fields, as there are packages for all the popular languages that automatically map from JSON to DynamoDB JSON and vice versa.
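To make the wrapped format more tangible, here is a minimal sketch of such a mapping in plain Python. It only covers a handful of type identifiers (S, N, BOOL, NULL, L, M), and the function names and the sample order are made up for illustration. In a real project you'd rely on your SDK's own serializer instead of hand-rolling this.

```python
from decimal import Decimal


def to_dynamodb_json(value):
    """Wrap a plain Python value in its DynamoDB type identifier."""
    if value is None:
        return {"NULL": True}
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return {"BOOL": value}
    if isinstance(value, (int, float, Decimal)):
        return {"N": str(value)}  # numbers are transported as strings
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, list):
        return {"L": [to_dynamodb_json(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {k: to_dynamodb_json(v) for k, v in value.items()}}
    raise TypeError(f"unsupported type: {type(value).__name__}")


def from_dynamodb_json(wrapped):
    """Unwrap a single typed value back into plain Python."""
    ((type_tag, inner),) = wrapped.items()
    if type_tag == "NULL":
        return None
    if type_tag in ("BOOL", "S"):
        return inner
    if type_tag == "N":
        return Decimal(inner)
    if type_tag == "L":
        return [from_dynamodb_json(v) for v in inner]
    if type_tag == "M":
        return {k: from_dynamodb_json(v) for k, v in inner.items()}
    raise ValueError(f"unsupported type identifier: {type_tag}")


# Hypothetical order item, used only for illustration.
order = {"orderId": "o-123", "quantity": 2, "paid": True, "tags": ["gift"]}
item = {name: to_dynamodb_json(value) for name, value in order.items()}
print(item)
```

Note that numbers come back as `Decimal` in this sketch, which avoids silently losing precision when converting from the string representation.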
One last important fact: each item in DynamoDB can be up to 400 KB in size, which is a lot and won't be reached in most applications. But as you're often aiming for a single-table design (one item that contains all relevant information, instead of creating multiple tables that depend on each other), it's possible to exceed this.
A primary key identifies an item uniquely. In DynamoDB, this key can be either a simple one or a composite.
Composite keys extend the options to query for items, which we'll look at in the next paragraph.
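As a rough sketch, this is how the two key variants could be expressed as the key schema you pass when creating a table. The attribute names (orderId, customerId, orderDate) are assumptions for illustration only; HASH marks the partition key and RANGE the sort key.

```python
def key_schema(partition_key, sort_key=None):
    """Build a key schema for a simple or a composite primary key."""
    schema = [{"AttributeName": partition_key, "KeyType": "HASH"}]
    if sort_key is not None:
        # A composite key adds a sort key next to the partition key.
        schema.append({"AttributeName": sort_key, "KeyType": "RANGE"})
    return schema


# Simple key: the partition key alone must be unique per item.
simple = key_schema("orderId")

# Composite key: only the combination of both attributes must be unique,
# so many items can share the same partition key value.
composite = key_schema("customerId", "orderDate")

print(simple)
print(composite)
```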
There are two ways to retrieve items: scans and queries.
Scans are always the last resort and should be avoided at all costs.
💡 Looking at our previous paragraph about primary keys: if your partition key is not well-distributed across partitions (e.g. a single partition receives a huge percentage of your items), this can lead to hot partitions.
Hot partitions have a negative impact on overall performance, as those partitions receive more read and/or write operations (more on those later in the capacity paragraph) than other partitions.
Why is this a problem? Because your read & write capacity units are distributed across all partitions. This means one hot partition can lead to throttles way before you reach your overall capacity.
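A small back-of-the-envelope sketch shows why. It assumes the simplified model described above, where capacity is split evenly across partitions; the numbers and the function name are made up, and in practice DynamoDB's adaptive capacity can soften the effect.

```python
def throttle_threshold(table_capacity, partitions, hot_share):
    """Total traffic (units per second) at which the hottest partition is full.

    table_capacity: capacity units provisioned for the whole table
    partitions:     number of partitions the capacity is split across
    hot_share:      fraction of all traffic that hits the hottest partition
    """
    per_partition = table_capacity / partitions
    return per_partition / hot_share


# 1,000 units split across 4 partitions gives 250 units per partition.
evenly_spread = throttle_threshold(1000, 4, hot_share=0.25)
one_hot_partition = throttle_threshold(1000, 4, hot_share=0.5)

print(evenly_spread)      # 1000.0: throttling starts at the full table capacity
print(one_hot_partition)  # 500.0: throttling starts at half the table capacity
```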
We've seen that you always need the partition key to query for items. This requires a very well-planned schema where you know all your query patterns beforehand.
Often, this is difficult or simply not possible as requirements can change.
But you're not out of options and you don't have to fall back on scans, as DynamoDB also offers secondary indexes. With them, your query capabilities can be extended.
There are two different types of secondary indexes (SI):
💡 A Small Dive Into How Partitions Work In DynamoDB: a table is divided into multiple partitions, and each partition is stored on a different server. When an item is added to the table, it is assigned to a partition based on the partition key value. All items with the same partition key value are stored in the same partition and are therefore stored on the same server. This allows DynamoDB to distribute the data across multiple servers, which helps to scale the table as the size of the data grows. If you're interested in more detail about partitions, check out this amazing article by Alex DeBrie.
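The assignment step can be sketched roughly like this. DynamoDB uses its own internal hash function and manages the number of partitions for you, so SHA-256 and the fixed partition count below are stand-ins for illustration only, as are the key values.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 4  # hypothetical; DynamoDB manages the real number itself


def partition_for(partition_key: str) -> int:
    """Map a partition key value to a partition via a hash (stand-in function)."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


# Many distinct key values spread across all partitions.
distinct_keys = [f"order-{i}" for i in range(1000)]
spread = Counter(partition_for(key) for key in distinct_keys)

# One dominant key value always lands on the same partition: a hot partition.
skewed_keys = ["customer-1"] * 900 + [f"customer-{i}" for i in range(2, 102)]
skewed = Counter(partition_for(key) for key in skewed_keys)

print("distinct keys:", dict(spread))
print("skewed keys:  ", dict(skewed))
```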
If you want to read or write data in DynamoDB, you pass Expression Attribute Names and Expression Attribute Values. This is rather unintuitive at first, but you'll get used to it. It's also possible to use another abstraction layer like DynamoDBMapper (similar libraries exist for most popular languages), which makes this easier with typed classes for your database items.
But let's have a look at how a normal query would look with names and values in the CLI:
The expression attribute names (which just act as variables) start with a # and the expression attribute values with a colon (:). The placeholder #pk is mapped to orderId, and our target field's name is mapped to quantity. Then we map the values in the same way, only in the --expression-attribute-values block.
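The same idea, sketched in Python as the parameters of a query request. The table name orders, the #qty and :minQty placeholders, and the filter on quantity are assumptions for illustration; only #pk, orderId, and quantity come from the description above. The small resolve_names helper is not part of any SDK; it just shows that the placeholders behave like variables.

```python
def build_query(order_id: str, min_quantity: int) -> dict:
    """Build query parameters using expression attribute names and values."""
    return {
        "TableName": "orders",  # hypothetical table name
        "KeyConditionExpression": "#pk = :pk",
        "FilterExpression": "#qty >= :minQty",
        # Names: placeholders starting with '#' map to attribute names.
        "ExpressionAttributeNames": {"#pk": "orderId", "#qty": "quantity"},
        # Values: placeholders starting with ':' map to typed values.
        "ExpressionAttributeValues": {
            ":pk": {"S": order_id},
            ":minQty": {"N": str(min_quantity)},
        },
    }


def resolve_names(expression: str, names: dict) -> str:
    """Substitute name placeholders, longest first to avoid prefix clashes."""
    for placeholder in sorted(names, key=len, reverse=True):
        expression = expression.replace(placeholder, names[placeholder])
    return expression


params = build_query("o-123", 2)
print(resolve_names(params["KeyConditionExpression"],
                    params["ExpressionAttributeNames"]))
print(resolve_names(params["FilterExpression"],
                    params["ExpressionAttributeNames"]))
```

A dictionary like this could then be passed to your SDK's low-level query call.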
DynamoDB offers two different capacity modes: On-Demand and Provisioned. With on-demand, there's no need to know how many reads and writes you'll need per second, as it scales immediately. Provisioned capacity requires you to know your traffic patterns, as you define a steady level of available read and write capacity (which can also be scaled based on CloudWatch metrics, but much more slowly).
In general, these capacity modes define two things.
ThrottlingException.
DynamoDB charges you based on Read Capacity Units (RCU) and Write Capacity Units (WCU).
One read capacity unit covers one strongly consistent read per second or two eventually consistent reads per second. Each read can be for an item with a size of up to 4 KB. If the item is larger than 4 KB, you will consume more RCUs: for example, a 5 KB item consumes 2 RCUs for a strongly consistent read.
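The arithmetic can be written down as a short sketch. The function name is made up, and it only models the rule described above (4 KB blocks, rounded up, with eventually consistent reads costing half).

```python
import math


def read_capacity_units(item_size_kb: float, strongly_consistent: bool = True) -> float:
    """RCUs consumed by reading one item once per second."""
    # Item size is rounded up to the next 4 KB block, with a minimum of one block.
    blocks = max(1, math.ceil(item_size_kb / 4))
    # An eventually consistent read costs half of a strongly consistent one.
    return float(blocks) if strongly_consistent else blocks / 2


print(read_capacity_units(4))                             # 1.0
print(read_capacity_units(5))                             # 2.0
print(read_capacity_units(5, strongly_consistent=False))  # 1.0
print(read_capacity_units(3, strongly_consistent=False))  # 0.5
```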
The On-Demand capacity mode doesn't require you to define any WCU or RCU. This is a good choice if you fulfill at least one of the following conditions:
On-Demand will cover almost any load, as the service limits are immense.
Provisioned capacity is up to 7 times cheaper than on-demand, but requires you to define RCUs and WCUs. Those will be billed regardless of whether you actually use them.
Use this mode for predictable traffic. It doesn't need to be steady, as you can scale RCUs and WCUs with auto-scaling policies.
When to use what - A Summary To Remember
Our Suggestion: Don't overthink this from the beginning. Use provisioned capacity with low RCUs and WCUs until you outgrow the Free Tier limits (25 RCUs and 25 WCUs of provisioned capacity). Afterward, choose on-demand.
With DynamoDB's global table feature, you can synchronize tables across regions easily, increasing resiliency and following the patterns of the AWS Well-Architected Framework.
Data is not only backed up to another region but is also synchronized bi-directionally. Regardless of the region you write to, each region within the global table definition will receive all updates.
DynamoDB offers a fully-managed backup solution. Complicated processes of backing up or restoring data are a thing of the past.
In general, DynamoDB distinguishes between backups created via the AWS Backup service and backups created via DynamoDB's own built-in backup functionality.
With DynamoDB streams, you can invoke Lambda functions for item operations in DynamoDB. As an example, we want to send a confirmation email to the user when a new order is saved.
You can activate streams in the DynamoDB console by going to the Exports and streams tab.
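A minimal sketch of a Lambda function for the order confirmation example could look like this. It assumes the stream is configured to include the new item image, and that each order item has orderId and email string attributes; both attribute names and the send_confirmation_email helper are hypothetical stand-ins.

```python
def send_confirmation_email(recipient: str, order_id: str) -> None:
    """Hypothetical stand-in; a real version would call an email service."""
    print(f"Sending confirmation for order {order_id} to {recipient}")


def handler(event, context):
    """Invoked with a batch of stream records; reacts to new items only."""
    sent = 0
    for record in event.get("Records", []):
        # eventName is INSERT, MODIFY, or REMOVE; we only care about new orders.
        if record.get("eventName") != "INSERT":
            continue
        # NewImage holds the item in DynamoDB JSON (typed) format.
        new_image = record["dynamodb"].get("NewImage", {})
        order_id = new_image["orderId"]["S"]
        email = new_image["email"]["S"]
        send_confirmation_email(email, order_id)
        sent += 1
    return {"confirmationsSent": sent}


# Local smoke test with a hand-written sample event.
sample_event = {
    "Records": [
        {
            "eventName": "INSERT",
            "dynamodb": {
                "NewImage": {
                    "orderId": {"S": "o-123"},
                    "email": {"S": "reader@example.com"},
                }
            },
        },
        {"eventName": "REMOVE", "dynamodb": {"Keys": {"orderId": {"S": "o-99"}}}},
    ]
}
result = handler(sample_event, None)
print(result)
```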
That's not everything there is to know about DynamoDB, but these are the most important facts. It's also a great service to get started with in combination with Lambda, as you can get things up and running really quickly.
Don't hesitate to get hands-on and start building!
We wish you a nice rest of the week!
Tobi & Sandro
🕵️‍♂️ P.S.: The answer to the intro question is 105.2 million requests per second. DynamoDB reached this while still maintaining single-digit millisecond response times! 🔥
If you want to read more, learn why AWS Organizations is your best friend for large-scale projects and what the difference is between CloudWatch and CloudTrail!