Hi Reader 👋🏽
This newsletter is all about DynamoDB - one of AWS's most famous flagship services.
💡 A guessing question to get you started 🤔
What was the peak number of requests per second for DynamoDB during Amazon Prime Day 2022? Scroll down to find the answer. You'll be amazed.
Let's quickly go over the topics we'll cover in this issue.
That's a lot, but it's worth going deep on this great service that's used by so many companies.
Let's go! 🚀
DynamoDB is a fully-managed NoSQL database that is able to handle almost any scale. Additionally, it offers great features to integrate natively with other services. As it's not your common NoSQL storage but comes with a long list of unique points, it's important to understand its internals beforehand.
DynamoDB is a NoSQL database, which means it does not explicitly enforce a schema. You can add or remove fields with every write operation.
This does not mean that there is no schema to follow. It's just not enforced at the database level itself. Your application still implicitly expects that your data is structured in a certain way.
Changing a complex implicit NoSQL schema can be even more difficult than migrating a schema in SQL.
DynamoDB's internals are built around the following major concepts: tables, items, attributes, and types.
With types, DynamoDB enforces its own JSON format. This means each attribute is wrapped in its type identifier.
At the application level, you don't need to work with these nested (rather complex) fields, as there are packages for all the popular languages that automatically map from JSON to DynamoDB JSON and vice versa.
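To make that wrapping concrete, here's a minimal sketch of how a plain item maps to DynamoDB JSON (a simplification - in practice, libraries like boto3's TypeSerializer handle this for you, including sets, lists, and maps):

```python
# Every value is wrapped in a type identifier:
# "S" = string, "N" = number, "BOOL" = boolean.
def to_dynamodb_json(item: dict) -> dict:
    wrapped = {}
    for key, value in item.items():
        if isinstance(value, bool):  # check bool before int (bool is an int subclass)
            wrapped[key] = {"BOOL": value}
        elif isinstance(value, (int, float)):
            wrapped[key] = {"N": str(value)}  # numbers are transmitted as strings
        else:
            wrapped[key] = {"S": str(value)}
    return wrapped

print(to_dynamodb_json({"orderId": "o-123", "quantity": 2}))
# → {'orderId': {'S': 'o-123'}, 'quantity': {'N': '2'}}
```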
One last important fact: each item in DynamoDB can be up to 400 KB in size, which is a lot and won't be reached in most applications. But as you're often aiming for a single-table design (one item that contains all relevant information, instead of creating multiple tables that depend on each other), it's possible to exceed this.
A primary key identifies an item uniquely. In DynamoDB, this key can be either a simple one or a composite.
Composite keys extend the options to query for items, which we'll have a look at in the next paragraph.
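As a sketch, the two key types could be defined like this in boto3-style KeySchema definitions (the attribute names are just assumptions for illustration):

```python
# Simple primary key: the partition key alone uniquely identifies an item.
simple_key_schema = [
    {"AttributeName": "orderId", "KeyType": "HASH"},  # partition key
]

# Composite primary key: partition key plus sort key together identify an
# item and enable range queries (e.g. all orders of one customer, by date).
composite_key_schema = [
    {"AttributeName": "customerId", "KeyType": "HASH"},  # partition key
    {"AttributeName": "orderDate", "KeyType": "RANGE"},  # sort key
]
```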
There are two ways to retrieve items: scans and queries.
Scans are always the last resort and should be avoided at all costs.
💡 Looking at our previous paragraph about primary keys: if your partition key is not well-distributed across partitions (e.g., a single partition receives a huge percentage of your items), this can lead to hot partitions.
Hot partitions will have a negative impact on overall performance, as those partitions will receive more read and/or write operations (more on those later in the capacity paragraph) than other partitions.
Why is this a problem? Because your read & write capacity units are distributed across all partitions. This means one hot partition can lead to throttling way before you reach your overall capacity.
We've seen that you always require the partition key to query for items. This requires a very well-planned schema where you know all your query capabilities beforehand.
Often, this is difficult or simply not possible as requirements can change.
But you're not out of options and you don't have to fall back on scans, as DynamoDB also offers secondary indexes. With them, your query capabilities can be extended.
There are two different types of secondary indexes (SI):
💡 A Small Dive Into How Partitions Work In DynamoDB: a table is divided into multiple partitions, and each partition is stored on a different server. When an item is added to the table, it is assigned to a partition based on the partition key value. All items with the same partition key value are stored in the same partition and are therefore stored on the same server. This allows DynamoDB to distribute the data across multiple servers, which helps to scale the table as the size of the data grows. If you're interested in more detail about partitions, check out this amazing article by Alex DeBrie.
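The assignment itself can be sketched as a hash over the partition key. Note that DynamoDB's real partitioning scheme is internal to the service; this is purely illustrative:

```python
import hashlib

# Illustrative only: DynamoDB's actual hashing scheme is not public.
def assign_partition(partition_key: str, num_partitions: int) -> int:
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions

# The same partition key always maps to the same partition (and server).
print(assign_partition("order-1", 4) == assign_partition("order-1", 4))  # → True
```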
If you want to query or update data in DynamoDB, you pass Expression Attribute Names & Expression Attribute Values. This is rather unintuitive at first, but you'll get used to it. It's also possible to just use an abstraction layer like DynamoDBMapper (which is available for the popular languages) that makes this easier with typed classes for your database items.
But let's have a look at how a normal query would look with names and values in the CLI: the expression attribute names (which just act as variables) start with a `#` and the expression attribute values with a `:`. `#pk` is mapped to `orderId`, and our target field's name is mapped to `quantity`. Then we map the values in the same way in the `--expression-attribute-values` block.
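The same query can also be sketched as boto3-style parameters (the table name and the key value are assumptions for illustration):

```python
# "#pk" and "#qty" are expression attribute name placeholders,
# ":pk" is an expression attribute value placeholder (in DynamoDB JSON).
query_params = {
    "TableName": "orders",  # assumed table name
    "KeyConditionExpression": "#pk = :pk",
    "ProjectionExpression": "#qty",
    "ExpressionAttributeNames": {"#pk": "orderId", "#qty": "quantity"},
    "ExpressionAttributeValues": {":pk": {"S": "o-123"}},
}
```

You'd then pass these parameters to `client.query(**query_params)` with a boto3 DynamoDB client.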
DynamoDB offers two different capacity modes: On-Demand and Provisioned. For on-demand, there's no need to define how many reads and writes you'll need per second, as it will scale immediately. Provisioned capacity requires you to know your traffic patterns, as you provide a steady level of available read and write capacity (which can also be scaled via CloudWatch, but much more slowly).
In general, these capacity modes define two things: how you're billed and when your requests get throttled with a ThrottlingException.
DynamoDB charges you based on Read Capacity Units (RCU) and Write Capacity Units (WCU).
One read capacity unit refers to one strongly consistent read or two eventually consistent reads per second. This read can be for an item with a size of up to 4 KB. If the item is larger than 4 KB, you will consume more RCUs - e.g., a 5 KB item will consume 2 RCUs.
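The math behind this can be sketched as follows (a simplification of the actual billing rules, which also cover transactional reads):

```python
import math

# Reads are billed in 4 KB chunks; an eventually consistent read
# costs half of a strongly consistent one.
def rcus_per_read(item_size_kb: float, strongly_consistent: bool = True) -> float:
    chunks = math.ceil(item_size_kb / 4)
    return chunks if strongly_consistent else chunks / 2

print(rcus_per_read(5))         # 5 KB, strongly consistent → 2
print(rcus_per_read(5, False))  # 5 KB, eventually consistent → 1.0
```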
The On-Demand capacity mode doesn't require you to define any WCUs or RCUs. This is a good choice if you fulfill at least one of the following conditions:
On-Demand will cover almost any load, as the service limits are immense.
Provisioned capacity is up to 7 times cheaper than on-demand but requires you to define RCUs and WCUs. Those will be billed regardless of whether you actually use them.
Use this mode for predictable traffic. It doesn't need to be steady, as you can scale RCUs and WCUs with auto-scaling policies.
When to use what - A Summary To Remember 📝
Our Suggestion: Don't overthink this from the beginning. Use provisioned capacity with low RCUs and WCUs until you reach the Free Tier limits (25 RCUs and 25 WCUs). Afterward, choose on-demand.
With DynamoDB's global table feature, you can easily synchronize tables across regions, increasing resiliency and following the patterns of AWS's Well-Architected Framework.
Data is not only backed up to another region but is also synchronized bi-directionally. Regardless of the write region, each region within the global table definition will receive all updates.
DynamoDB offers a fully-managed backup solution. Complicated processes of backing up or restoring data are a thing of the past.
In general, DynamoDB differentiates if you use the AWS Backup service or if you use the direct backup functionality of DynamoDB.
With DynamoDB streams, you can invoke Lambda functions for item operations in DynamoDB. As an example, we want to send a confirmation email to the user when a new order is saved.
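A minimal sketch of such a Lambda handler could look like this (the field names follow the DynamoDB stream event format; the orderId attribute and the send_confirmation_email helper are assumptions for this example):

```python
def send_confirmation_email(order_id: str) -> str:
    # Placeholder: a real function would call SES or another mail service.
    return f"Confirmation sent for order {order_id}"

def handler(event, context):
    results = []
    for record in event["Records"]:
        if record["eventName"] == "INSERT":  # only react to newly saved orders
            # Stream images arrive in DynamoDB JSON (type-wrapped attributes).
            order_id = record["dynamodb"]["NewImage"]["orderId"]["S"]
            results.append(send_confirmation_email(order_id))
    return results

sample_event = {
    "Records": [
        {"eventName": "INSERT",
         "dynamodb": {"NewImage": {"orderId": {"S": "o-123"}}}}
    ]
}
print(handler(sample_event, None))  # → ['Confirmation sent for order o-123']
```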
You can activate streams in the DynamoDB console by going to the tab Exports and Streams.
That's not all there is to DynamoDB, but these are the most important facts. It's also a great service to get started with in combination with Lambda, as you can get things up and running really quickly.
Don't hesitate to get your hands dirty and start building 🚀
We wish you a nice rest of the week! 🙌
Tobi & Sandro
🕵️ P.S.: The answer to the intro question is 105.2 million requests per second. DynamoDB reached this while still maintaining single-digit millisecond response times! 🔥
If you want to read more, learn why AWS Organizations is your best friend for large-scale projects and what's the difference between CloudWatch and CloudTrail!
Join our community of over 8,800 readers delving into AWS. We highlight real-world best practices through easy-to-understand visualizations and one-pagers. Expect a fresh newsletter edition every two weeks.