The release cadence of AWS is impressive. In 2018, there were 1,381 release announcements from AWS on the “What’s New” blog feed. At AWS re:Invent 2018, we saw some amazing announcements like a new fully-managed time-series database, managed Kafka, a whole host of machine learning tools, and even a managed blockchain.
But that’s not what this post is about. This post is about the constant, relentless improvements that AWS makes to existing services. Most of these improvements are unseen and unannounced. After all, it’s hard to make a release announcement for under-the-hood improvements. Yet these consistent improvements are why AWS continues to delight its customers — it’s never happy with the status quo.
Let’s walk through a few of my favorite examples of this from 2018.
During Werner Vogels’ keynote at AWS re:Invent 2018, he dropped this little nugget:
Amazon Redshift, one of AWS’ most popular services and the leading managed data warehouse solution, improved its performance by 350% in the last 6 months, and it got a 30-second, off-hand mention by Werner during his 2-hour keynote. Not a press release. Not an email blast. A quick comment before moving on to the ‘real’ announcements of re:Invent.
Think how many thousands of customers benefited from these improvements. Without paying a dime extra. Without needing to migrate to new hardware. Automatically.
DynamoDB is another example, and it has its quirks. One of those quirks was around partitions. If you didn’t have a great understanding of DynamoDB internals, you could run into performance degradation due to hot keys or throughput dilution. This led to workarounds like write sharding to handle the inherent realities of DynamoDB partitions.
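As a concrete illustration, write sharding usually means appending a random suffix to a hot partition key so writes spread across several partitions. A minimal sketch (the key format and shard count here are hypothetical):

```python
import random

N_SHARDS = 10  # hypothetical number of suffixes to spread a hot key across

def sharded_key(base_key: str) -> str:
    """Append a random shard suffix so writes to a hot key land on
    different DynamoDB partitions."""
    return f"{base_key}#{random.randint(0, N_SHARDS - 1)}"

def all_shard_keys(base_key: str) -> list[str]:
    """Reads must fan out across every possible shard of the key."""
    return [f"{base_key}#{i}" for i in range(N_SHARDS)]
```

The trade-off lands on reads: to fetch the data back, you have to query every shard and merge the results.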
But the DynamoDB team didn’t rest. Education only goes so far; they want partition management to be something you worry about less and less. And so, they implemented adaptive capacity.
To understand the benefits of adaptive capacity, you need to know a little about how DynamoDB works. Data in DynamoDB is assigned to partitions. Assignment to a partition is based on the primary key of each piece of data. Your data access patterns should be spread evenly across the partitions to receive the best performance.
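A rough mental model of that assignment, sketched with a plain hash function (DynamoDB’s actual internal hashing and partition counts are not public, so this is illustrative only):

```python
import hashlib

N_PARTITIONS = 4  # illustrative only; real partition counts grow with table size

def partition_for(primary_key: str) -> int:
    """Map a primary key to a partition by hashing it -- a simplified
    stand-in for DynamoDB's internal hash function."""
    digest = hashlib.md5(primary_key.encode()).hexdigest()
    return int(digest, 16) % N_PARTITIONS
```

The same key always hashes to the same partition, which is exactly why a disproportionately popular key concentrates load on one partition.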
This can be really hard! In reality, you probably have some items that are accessed more frequently than others. Your access patterns probably follow at least a Zipf distribution. It may be even more extreme than that.
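A quick simulation makes that skew concrete. Here, 100 hypothetical keys are requested with Zipf-distributed frequency, so the most popular key absorbs a large share of all traffic:

```python
import collections
import random

# 100 hypothetical keys; the k-th most popular is requested with
# probability proportional to 1/k (a Zipf distribution).
keys = [f"user#{rank}" for rank in range(1, 101)]
weights = [1 / rank for rank in range(1, 101)]

random.seed(42)  # deterministic for illustration
hits = collections.Counter(random.choices(keys, weights=weights, k=10_000))
# The most popular key dwarfs a mid-popularity key in request volume.
```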
Adaptive capacity helps by shifting your provisioned throughput to match your access patterns. If one of your partitions gets significantly more usage than the others, AWS will move throughput from other partitions toward your ‘hot’ partition. This makes the best use of your throughput without requiring you to overprovision or accept diluted performance.
This was announced on the AWS Database blog without much fanfare. But again, it’s making a huge difference to a lot of AWS customers.
Invisible improvements at work.
Next up: Firecracker. I hear your objections already. “But wait, how is this an invisible improvement? This was announced on stage at re:Invent. It received multiple blog posts and press coverage. What’s invisible about this?”
The AWS team did make some noise about Firecracker, but the noise was for a different purpose. Firecracker is a technical achievement and a big contribution to the open source community. The announcement around it was more about displaying the great work that is happening at AWS and showing some commitment to open source.
That’s all well and good, but it affects a tiny fraction of the programming community. Unless you’re a kernel hacker or have solid infrastructure chops, you should not run Firecracker. Ever. You should take advantage of Firecracker by using Lambda and Fargate.
Firecracker is a virtual-machine manager built for the era of ephemeral compute. It can strip out a bunch of utilities that aren’t relevant in a cloud-native, serverless world. The result is quicker virtual machine boot times and lower overhead.
Why does this matter to you, the AWS customer? Faster cold starts for your Lambda functions. Lower prices, as AWS can pack more instances onto a given server.
These are the things you care about. And these are the things that were largely invisible from all the fanfare about Firecracker.
S3 is my favorite AWS service. It’s rock-solid and versatile. During re:Invent, I mentioned that S3 might be the most important innovation over the last 15 years.
Think how many AWS innovations have used S3 as bedrock — EC2 AMIs, EBS and RDS backups, AWS Lambda zip files, everything Athena, and more.
Then think of all the other companies that rely on S3: Instagram was able to store terabytes of images with only three engineers before its acquisition by Facebook. Or think of how much easier data engineering work is without having to manage your own HDFS cluster. Or think of whether the rise of static sites and the JAMstack would have happened without dead-simple static storage.
But we’re not here to gush about S3. We’re here to talk about invisible improvements. And in the middle of July, S3 dropped their own note about performance enhancements.
In a tiny post with under 200 words, AWS noted two things:
1. S3 performance had increased, and
2. Random prefixing for S3 object names is no longer helpful.
The second point is pretty esoteric but will be familiar to many who have done serious work on S3. In the beginning, your S3 performance could be impacted if you had a large number of requests that started with the same prefix. It’s similar to the DynamoDB partitions noted above — certain quirks about the underlying tech could bite you at scale.
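For reference, the old workaround looked something like this: prepend a short hash so keys don’t share a common prefix. The function and key layout here are hypothetical, and the pattern itself is now obsolete:

```python
import hashlib

def randomized_key(object_key: str) -> str:
    """Old-style S3 key naming: prepend a short hash so object keys
    don't share a common prefix. No longer needed after S3's 2018
    performance improvements."""
    prefix = hashlib.md5(object_key.encode()).hexdigest()[:4]
    return f"{prefix}/{object_key}"
```

The cost was real: hashed prefixes destroy the natural sort order of your keys, making listing by date or path far more painful. Being able to drop this trick is a genuine quality-of-life win.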
The pattern is similar — consistent, relentless progress to sand off rough edges and make the experience better.
Launching early and iterating quickly is a common product development tactic for startups. Or as LinkedIn founder Reid Hoffman said, “If you are not embarrassed by the first version of your product, you’ve launched too late.”
You don’t often see this tactic at large, well-established companies. There’s a natural risk-aversion as people worry about damaging the brand with a bad release.
There’s no such caution at AWS. Looking back at the initial releases, some of the core products appear borderline unusable compared to today. (The improvements to Lambda and DynamoDB over the years could fill separate posts of their own…)
More recently, I’ve been bullish on the potential of Serverless Aurora, particularly around the Data API to provide HTTP access to a SQL database. However, neither the Data API nor Serverless Aurora in general is ready for most production use-cases. That said, I have little doubt that the AWS team will continue to iterate and improve in 2019 and beyond.
To close, I love this tongue-in-cheek tweet from Zack Kanter:
“One big concern with using managed services is vendor lock in. How are you supposed to sleep at night when one day you might just wake up and have three-nines availability for free? What if they go to five-nines without warning you or raising the prices?” — Zack Kanter (@zackkanter), October 17, 2018
When you’re relying on managed services from a cloud provider, you’re not just buying into their uptime; you’re buying into their roadmap. How is your managed service provider making things easier for you?