I recently wrote an article comparing three tools that you can use on AWS to analyze large amounts of data: Starburst Presto, Redshift and Redshift Spectrum. I compared performance and cost using data and queries from the TPC-H benchmark, on a 1TB dataset (which adds up to 8.66 billion records!). But as you probably know, there are more data analysis tools that you can use in AWS. One in particular that I take a look at here is Elastic MapReduce (EMR). In this article I compare the following tools: Starburst Presto, EMR Presto, EMR Spark and EMR Hive.
If you're an application owner, sooner or later you'll need to analyze large amounts of data. The good news? Whatever your needs are, you’ll likely be covered. The problem? Handling and analyzing large amounts of data is inherently complicated, which is why it's very important to understand the options out there. In this article, I focus on three very interesting tools designed to analyze large amounts of data: Starburst Presto, Redshift and Redshift Spectrum. I compare performance and cost using data and queries from the TPC-H benchmark, on a 1TB dataset (which adds up to 8.66 billion records!).
Most of us accumulate things over time, whether we want to or not. The same happens with data stored in S3. You might have files that were once popular but are now only filling up space and making your AWS bill higher than it should be. Thanks to the S3 Infrequent Access storage class, you can save money storing files that are not accessed frequently but that you still want to keep accessible. See how you could reduce your S3 cost by about 50% a year.
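A minimal sketch of the storage-savings arithmetic, assuming illustrative per-GB prices (the constants below are assumptions, not figures from the article; check current S3 pricing for your region). It ignores request charges, the Standard-IA 30-day minimum storage duration and its 128 KB minimum billable object size, but it does show how retrieval fees eat into the savings.

```python
# Assumed prices in USD, for illustration only; verify against current S3 pricing.
STANDARD_PER_GB_MONTH = 0.023
STANDARD_IA_PER_GB_MONTH = 0.0125
IA_RETRIEVAL_PER_GB = 0.01


def annual_storage_cost(gb_stored, price_per_gb_month,
                        gb_retrieved_per_month=0.0, retrieval_per_gb=0.0):
    """Yearly cost of keeping gb_stored GB, plus any per-GB retrieval fees."""
    monthly = gb_stored * price_per_gb_month + gb_retrieved_per_month * retrieval_per_gb
    return 12 * monthly


def ia_savings_fraction(gb_stored, gb_retrieved_per_month=0.0):
    """Fraction of the Standard-class bill saved by moving the data to Standard-IA."""
    standard = annual_storage_cost(gb_stored, STANDARD_PER_GB_MONTH)
    infrequent = annual_storage_cost(gb_stored, STANDARD_IA_PER_GB_MONTH,
                                     gb_retrieved_per_month, IA_RETRIEVAL_PER_GB)
    return (standard - infrequent) / standard
```

With these assumed prices, 1,000 GB that is never read saves about 46% a year; retrieving 200 GB of it every month brings the savings down to about 37%.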
Do you want to run your web applications in AWS, but are worried about potential code changes or vendor lock-in? Then AWS Elastic File System (EFS) might be the solution! With EFS you can run your web applications in AWS with minimal or zero code changes and at the same time enjoy all the advantages of using the cloud, such as elasticity, high availability and pay-as-you-go. EFS, however, is a tricky service. It's very easy to run into performance traps. In this article, I show you how to avoid common issues with EFS, based on my own project experience migrating and launching applications using EFS.
Keeping track of and managing AWS cost is a difficult and time-consuming task for AWS customers. Nobody likes doing it, but if you ignore it you could easily get a bad AWS billing surprise. MiserBot is a Slack chatbot designed to make it fun and easy to stay on top of your AWS cost, and save money!
Calculating AWS cost at scale is a critical task before launching an application. Paying a few dollars for a development environment is one thing; paying for a production application your business will depend on is another. That's why I created the AWS Near Real-time Price Calculator tool: an easy, automated way to estimate AWS cost in near real time, using real usage metrics. I just extended this tool's capabilities to serverless applications running on AWS Lambda, Kinesis and DynamoDB.
AWS Lambda is extremely convenient and cheap to get started with, but you have to keep an eye on cost once your applications run at scale. That's why I've built some tools to help with the monitoring and optimization of AWS Lambda cost. If you're planning to run AWS Lambda functions at scale, reading this post can save you thousands of dollars.
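To see why Lambda cost deserves attention at scale, here is a rough monthly estimate built from the two main price components: requests and GB-seconds of compute. This is a sketch, not the tooling described in the post; the prices are assumptions (check current Lambda pricing for your region and architecture), the free tier is ignored, and rounding the *average* duration up to the billing increment only approximates the real per-invocation rounding.

```python
import math

# Assumed on-demand prices in USD; verify against current AWS Lambda pricing.
PRICE_PER_MILLION_REQUESTS = 0.20
PRICE_PER_GB_SECOND = 0.0000166667


def monthly_lambda_cost(requests, avg_duration_ms, memory_mb, billing_increment_ms=1):
    """Estimate a month of Lambda charges, ignoring the free tier."""
    # Round duration up to the billing increment (1 ms today, 100 ms under older pricing).
    billed_ms = math.ceil(avg_duration_ms / billing_increment_ms) * billing_increment_ms
    # Compute is billed in GB-seconds: duration in seconds times memory in GB.
    gb_seconds = requests * (billed_ms / 1000) * (memory_mb / 1024)
    request_cost = requests / 1_000_000 * PRICE_PER_MILLION_REQUESTS
    return request_cost + gb_seconds * PRICE_PER_GB_SECOND
```

For example, 10 million invocations a month at 200 ms and 512 MB come to 1,000,000 GB-seconds, or roughly $18.67 with these assumed prices. Doubling memory or duration doubles the compute part, which is where most of the bill usually grows.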
AWS announced Athena at re:Invent 2016. Athena is a very handy service that lets you query data stored in S3 without having to launch any infrastructure. Just put data files in S3, use SQL syntax and let Athena do its magic. It's awesome, and that's why it's a great tool for doing detailed analysis on AWS Cost and Usage reports. But it's not as straightforward as it sounds, so I wrote some tools to simplify the whole process.
As you probably know, on February 28th, 2017, Amazon S3 suffered a big outage. It affected pretty much all AWS services in the biggest AWS region, N. Virginia, as well as a big portion of the internet. Here are some key takeaways for the rest of us, as application and business owners.
AWS cost optimization is one of the most important tasks for any application owner. It's no secret that AWS pricing can be complicated, but thankfully there are many ways in which you can keep cost under control. AWS QuickSight is a great way to analyze billing reports, understand where your money is going and find ways to cut cost. In this article I take a close look at how to use AWS QuickSight to analyze AWS billing reports.
You've built and launched a great product. Customers like it and you're seeing some nice usage growth. Your revenue targets are looking good. What can go wrong? If you make some of these mistakes, a LOT can go wrong.
As usual, there were a LOT of announcements at this year's re:Invent conference. How can you benefit from them? In this article I describe some ways in which you can use these new features to run optimal applications on AWS.
You're executing load tests using Locust. Wouldn't it be nice to have your test results in a single dashboard, together with system metrics such as CPU usage, memory and disk I/O? In this article I show you how to export your Locust load test results in real time to CloudWatch Logs and CloudWatch Metrics.
You're building a new application that will run on AWS, or migrating an existing application to AWS. Either way, you want to pick the right components to power your application. With more than 100 AWS services available today, how do you choose one with confidence? How do you identify advantages and disadvantages? How do you uncover and address critical gaps, as early as possible? In this article I walk you through essential steps for choosing the right AWS services for your application.
Are you already using AWS Lambda, or planning to launch your next application using AWS Lambda? How do you make sure your application reliably serves your customers? Operating a "serverless" application in a production environment brings some familiar challenges, but also new ones. In this article I cover some points that will make your life easier once your Lambda function runs in a production environment.
Do you want to know as soon as possible when you're heading for a very large AWS bill? Like in 10 minutes, not 6, 12 or 24 hours later, when there's not much you can do about it. Or how about executing performance tests and seeing not only response times but also AWS price metrics, in near real time, without doing any manual calculations? This article describes a way to calculate monthly EC2 pricing in near real time, based on your current usage, using the AWS Price List API, AWS Lambda and CloudWatch Events. CloudFormation template included.
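The core projection is simple once you know what is running and what it costs per hour. The sketch below assumes both inputs are already available as plain dictionaries; in the setup the article describes, the hourly prices would come from the AWS Price List API and the counts from your current usage, but here the instance types and prices are hypothetical placeholders. The 730-hour month is a common approximation (24 × 365 / 12).

```python
HOURS_PER_MONTH = 730  # common approximation: 24 * 365 / 12


def projected_monthly_cost(running_instances, hourly_prices):
    """Project a monthly EC2 bill if the current fleet keeps running all month.

    running_instances: {instance_type: count of running instances}
    hourly_prices: {instance_type: on-demand USD per hour}
    A missing price raises KeyError rather than silently counting as zero.
    """
    total = 0.0
    for instance_type, count in running_instances.items():
        total += count * hourly_prices[instance_type] * HOURS_PER_MONTH
    return total


# Hypothetical fleet and prices, for illustration only.
fleet = {"m4.large": 3, "t2.micro": 10}
prices = {"m4.large": 0.10, "t2.micro": 0.0116}
print(round(projected_monthly_cost(fleet, prices), 2))
```

Re-running this every few minutes against fresh usage data (for example from a scheduled Lambda function) is what turns a static estimate into a near real-time metric you can alarm on.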
Are you hiring engineers to work on your AWS applications? How do you know which candidates are a good fit? Traditional software engineering skills are a must, but there are also specific skills that engineers must have in today's world of cloud development and operations. In this article I write about what to look for when hiring software engineers for your AWS cloud projects.
Choosing an AWS region is not a trivial decision. There are many variables that affect the price, performance and availability of your application as well as the AWS services you can use. If you choose the wrong region you could end up paying more than double and waiting several months before you can take advantage of new products and features.
One of the most important things you should do before working with an external tool or service provider is to make sure you know which operations they are executing on your AWS resources. CloudTrail is AWS' standard auditing mechanism; it logs all API activity that takes place in your account. The problem is that once you have CloudTrail data, it can be difficult to analyze. In this post I show you how to use CloudFormation to automatically set up CloudTrail and Elasticsearch for easy visualization of your activity data.
Ever wondered which EC2 configuration is optimal for your application? Have you tried different configurations and found there are a lot of knobs to turn in AWS? If you want to find a configuration in AWS that will support your business growth, you have to test and iterate. In this post I describe the steps I followed to test different EC2 instance types and determine which one best met my requirements. The steps I describe here can be applied to any application type.
I use t2.nano EC2 instances a lot, mainly for experiments and some development. Eventually I became curious about what types of workloads a t2.nano could handle, so I installed a WordPress blog on a t2.nano and ran load tests. Then I gave my t2.nano a little help from CloudFront. Here are my findings.
In this post I'll show you how to set up a performance test environment that you can use to simulate any number of IoT devices and do performance tests for applications using the AWS IoT platform. We will use EC2, AWS IoT, Locust and the MQTT Paho client.
T2 EC2 instance types are a great way to save money if you run an application that typically is not too busy, but that needs to handle occasional bursts in traffic. That being said, you need to understand CPU credits and make sure your application always has a healthy CPU credit balance. If you run out of credits, the CPU in your instance will be capped, putting your customer experience at risk. The table in this post tells you how much time you have left before you run out of CPU credits.
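A minimal sketch of how a "time left" figure can be estimated, assuming a constant CPU utilization and standard (not Unlimited) mode. One CPU credit equals one vCPU running at 100% for one minute, so a fully busy vCPU spends 60 credits per hour while the instance keeps earning credits at its fixed hourly rate. The function name and the t2.micro figures in the usage note (6 credits earned per hour, 144-credit maximum balance, 1 vCPU) are assumptions to check against current AWS documentation, not values taken from the post's table.

```python
import math


def hours_until_credits_exhausted(balance, earn_rate_per_hour, vcpus, utilization):
    """Hours until the CPU credit balance reaches zero at a constant utilization.

    balance: current CPU credit balance
    earn_rate_per_hour: credits the instance type earns per hour
    vcpus: number of vCPUs being used
    utilization: average utilization per vCPU, from 0.0 to 1.0
    Returns math.inf when credits are earned at least as fast as they are spent.
    """
    # One credit = one vCPU at 100% for one minute, i.e. 60 credits per vCPU-hour.
    spend_per_hour = 60 * vcpus * utilization
    net_drain = spend_per_hour - earn_rate_per_hour
    if net_drain <= 0:
        return math.inf
    return balance / net_drain
```

Under those assumptions, a t2.micro with a full balance of 144 credits running its single vCPU at 50% drains 24 credits per hour net, so it has 6 hours before the CPU is capped; at 100% it has under 3 hours.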
Do you write code for AWS Lambda functions? How do you move your code across development stages? In this post we'll take a look at three methods we can use to decouple code and configuration in AWS Lambda functions. This comes in handy in any agile development cycle, when our code is constantly moving from development to test environments and eventually to Production.
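One common way to decouple code from configuration is to read stage-specific settings from Lambda environment variables, so the same deployment package can move from development to test to Production unchanged. This is only one possible approach and is not necessarily one of the three methods the post covers; the setting names and defaults below are hypothetical.

```python
import json
import os

# Hypothetical settings and development defaults; real names depend on your application.
DEFAULTS = {"TABLE_NAME": "orders-dev", "LOG_LEVEL": "DEBUG"}


def load_config(environ=os.environ):
    """Read stage-specific settings from environment variables, falling back to defaults."""
    return {key: environ.get(key, default) for key, default in DEFAULTS.items()}


def lambda_handler(event, context):
    config = load_config()
    # Business logic uses config values instead of hard-coded, stage-specific strings.
    return {"statusCode": 200, "body": json.dumps({"table": config["TABLE_NAME"]})}
```

Each stage's function is then configured with its own values (for example `TABLE_NAME=orders-prod` in Production) while the code stays identical. Passing `environ` as a parameter also makes the configuration logic easy to test without touching the real process environment.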
Do you want to automate tasks around your JMeter performance tests? If you want to know whether your tests passed or failed, the first thing you need is a set of metrics to monitor. In this post I show you how to feed your JMeter test results into CloudWatch Logs and generate test result metrics in real-time. As a bonus, I'm also including a CloudFormation template.