Querying 8.66 Billion Records, part II - a Performance and Cost Comparison between Starburst Presto and EMR SQL Engines

I recently wrote an article comparing three tools that you can use on AWS to analyze large amounts of data: Starburst Presto, Redshift and Redshift Spectrum. I compared Performance and Cost using data and queries from the TPC-H benchmark, on a 1TB dataset (which adds up to 8.66 billion records!). But as you probably know, there are more data analysis tools that one can use in AWS. One in particular that I take a look at here is Elastic MapReduce (EMR). In this article I compare the following tools: Starburst Presto, EMR Presto, EMR Spark and EMR Hive.

Querying 8.66 Billion Records - a Performance and Cost Comparison between Starburst Presto and Redshift

If you're an application owner, sooner or later you'll need to analyze large amounts of data. The good news? Whatever your needs are, you’ll likely be covered. The problem? Handling and analyzing large amounts of data is inherently complicated, which is why it's very important to understand the options out there. In this article, I will focus on three very interesting tools designed to analyze large amounts of data: Starburst Presto, Redshift and Redshift Spectrum. I compare Performance and Cost using data and queries from the TPC-H benchmark, on a 1TB dataset (which adds up to 8.66 billion records!).

How to Cut your S3 Cost in Half by Using the S3 Infrequent Access Storage Class

Most of us accumulate things over time, whether we want to or not. The same happens with data stored in S3. You might have files that were once popular, but now they are only filling up space and making your AWS bill higher than it should be. Thanks to the S3 Infrequent Access storage class, you can save money storing files that are not accessed frequently but that you still want to keep accessible. See how you could reduce your S3 cost by about 50% a year.

How to use AWS Elastic File System to Finally Migrate your Web Applications to the Cloud

Do you want to run your web applications in AWS, but are worried about potential code changes or vendor lock-in? Then AWS Elastic File System (EFS) might be the solution! With EFS you can run your web applications in AWS with minimal or zero code changes and at the same time enjoy all the advantages of using the cloud, such as elasticity, high availability and pay-as-you-go. EFS, however, is a tricky service. It's very easy to run into performance traps. In this article, I show you how to avoid common issues with EFS, based on my own project experience migrating and launching applications using EFS.

Now you can calculate AWS cost in near real-time for your serverless applications

Calculating AWS cost at scale is a critical task before launching an application. It's one thing to pay a few dollars for a development environment, and quite another to pay for a Production application your business will depend on. That's why I created the AWS Near Real-time Price Calculator tool: an easy, automated way to estimate AWS cost in near real time, using real usage metrics. I just extended this tool's capabilities for serverless applications running on AWS Lambda, Kinesis and DynamoDB.

Use These Tools to Keep your AWS Lambda Cost Under Control.

AWS Lambda is extremely convenient and cheap to get started with, but you have to keep an eye on cost once your applications run at scale. That's why I've built some tools to help with the monitoring and optimization of AWS Lambda cost. If you're planning to run AWS Lambda functions at scale, reading this post can save you thousands of dollars.

Using Athena to Save Money on your AWS Bill

AWS announced Athena back at re:Invent 2016. Athena is a very handy service that lets you query data that is stored in S3, without you having to launch any infrastructure. Just put data files in S3, use SQL syntax and let Athena do its magic. It's awesome. That's why it's a great tool for doing some detailed analysis on AWS Cost and Usage reports. But it's not as straightforward as it sounds, which is why I wrote some tools to simplify the whole process.

Takeaways from the S3 outage on February 28th, 2017.

As you probably know, Amazon S3 suffered a big outage on February 28th, 2017. This affected pretty much all AWS services in its biggest region, N. Virginia, and a big portion of the internet. Here are some key takeaways for the rest of us, as application and business owners.

How to use AWS QuickSight to do AWS Cost Optimization (and save a lot of money)

AWS cost optimization is one of the most important tasks for any application owner. It's no secret that AWS pricing can be complicated, but thankfully there are many ways in which you can keep cost under control. AWS QuickSight is a great way to analyze billing reports, understand where your money is going and find ways to cut cost. In this article I take a close look at how to use AWS QuickSight to analyze AWS billing reports.

Turbocharge your Locust load tests by exporting results to CloudWatch

You're executing load tests using Locust. Wouldn't it be nice to have your test results in a single dashboard, together with system metrics such as CPU usage, memory, disk I/O, etc.? In this article I show you how to export your Locust load test results in real time to CloudWatch Logs and CloudWatch Metrics.

How to know if an AWS service is right for you

You're building a new application that will run on AWS, or migrating an existing application to AWS. If you're considering AWS, you want to pick the right components to power your application. With more than 100 AWS services available today, how do you choose one with confidence? How do you identify advantages and disadvantages? How do you uncover and address critical gaps, as early as possible? In this article I walk you through essential steps for choosing the right AWS services for your application.

How to operate reliable AWS Lambda applications in production

Are you already using AWS Lambda, or planning to launch your next application using AWS Lambda? How do you make sure your application reliably serves your customers? Operating a "serverless" application in a production environment brings some familiar challenges, but also new ones. In this article I cover some points that will make your life easier once your Lambda function runs in a production environment.

Know how much your EC2 application WILL cost you, in near real-time, using this Lambda function.

Do you want to know as soon as possible when you're heading for a very large AWS bill? Like in 10 minutes, not 6, 12 or 24 hours later, when there's not much you can do about it. Or how about executing performance tests and seeing not only response times, but also AWS price metrics, in near real-time, without doing any manual calculations? This article describes a way to calculate monthly EC2 pricing in near real-time, based on your current usage, using the AWS Price List API, AWS Lambda and CloudWatch Events. CloudFormation template included.
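To illustrate the kind of arithmetic involved, here is a minimal sketch in Python that projects a monthly bill from the instances currently running. It is not the Lambda function from the article: the function name, the instance type labels and the hourly prices are all hypothetical, and the 720-hour month is an assumption. In a real setup the prices would come from the AWS Price List API and the usage from CloudWatch.

```python
# Assumption: a 30-day month, i.e. 720 billable hours.
HOURS_PER_MONTH = 720


def projected_monthly_cost(running, hourly_price_usd):
    """Project a monthly cost from the instances running right now.

    running: dict mapping an instance type label to the number of instances.
    hourly_price_usd: dict mapping the same labels to a price per hour.
    Both inputs are hypothetical stand-ins for real usage and pricing data.
    """
    total_hourly = sum(
        count * hourly_price_usd[instance_type]
        for instance_type, count in running.items()
    )
    return total_hourly * HOURS_PER_MONTH


# Made-up labels and prices, for illustration only.
example_running = {"type-a": 2, "type-b": 1}
example_prices = {"type-a": 0.10, "type-b": 0.05}
example_cost = projected_monthly_cost(example_running, example_prices)
```

Recomputing this projection every few minutes is what turns a static price lookup into an early warning.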

Are you hiring AWS cloud engineers? Here are some tips on what to look for...

Are you hiring engineers to work on your AWS applications? How do you know which candidates are a good fit? Traditional software engineering skills are a must, but there are also specific skills that engineers must have in today's world of cloud development and operations. In this article I write about what to look for when hiring software engineers for your AWS cloud projects.

Save yourself a lot of pain (and money) by choosing your AWS Region wisely

Choosing an AWS region is not a trivial decision. There are many variables that affect the price, performance and availability of your application as well as the AWS services you can use. If you choose the wrong region you could end up paying more than double and waiting several months before you can take advantage of new products and features.

Do you grant third parties access to your AWS account... Do you also want to know what's going on? Use CloudTrail and the AWS Elasticsearch Service

One of the most important things you should do before working with an external tool or service provider is to make sure you know which operations they are executing on your AWS resources. CloudTrail is AWS' standard auditing mechanism; it logs all API activity that takes place in your account. But one problem is that once you have CloudTrail data, it's difficult to analyze. In this post I show you how to use CloudFormation to automatically set up CloudTrail and Elasticsearch for easy visualization of your activity data.

How to find an optimal EC2 configuration in 5 steps (with actual performance tests and results)

Ever wondered which EC2 configuration is optimal for your application? Have you ever tried different configurations and found there are a lot of knobs to turn in AWS? If you want to find a configuration in AWS that will support any business growth, you have to test and iterate. In this post I describe the steps I followed to test different EC2 instance types and determine which one best met my requirements. The steps I describe here can be applied to any application type.

How much time do I have left before my instance runs out of CPU credits?

T2 EC2 instance types are a great way to save money if you run an application that typically is not too busy, but that needs to handle occasional bursts in traffic. That being said, you need to understand CPU credits and make sure your application always has a healthy CPU credit balance. If you run out of credits, the CPU in your instance will be capped, putting your customer experience at risk. The table in this post tells you how much time you have left before you run out of CPU credits.
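The reasoning behind such a table can be sketched in a few lines of Python. This is an illustrative sketch, not the calculation from the post. It relies on the fact that one CPU credit equals one vCPU running at 100% utilization for one minute; the function name and the example numbers are hypothetical, and the hourly earn rate must be looked up for your specific instance type.

```python
import math


def hours_until_credits_exhausted(balance, earn_per_hour, vcpus, utilization):
    """Estimate hours left before the CPU credit balance reaches zero.

    balance: current CPU credit balance.
    earn_per_hour: credits the instance earns per hour (depends on type).
    vcpus: number of vCPUs on the instance.
    utilization: average CPU utilization as a fraction between 0 and 1.
    """
    # One credit = one vCPU at 100% for one minute, so a fully busy
    # vCPU spends 60 credits per hour.
    spend_per_hour = vcpus * utilization * 60.0
    net_burn = spend_per_hour - earn_per_hour
    if net_burn <= 0:
        # Earning at least as fast as spending: the balance never runs out.
        return math.inf
    return balance / net_burn


# Hypothetical example: 100 credits left, earning 6 per hour,
# one vCPU at a steady 50% utilization.
example_hours = hours_until_credits_exhausted(100, 6, 1, 0.5)
```

The sketch assumes utilization stays constant; a real workload bursts, so the result is best read as a rough estimate.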

Configure your Lambda functions like a champ and let your code sail smoothly to Production

Do you write code for AWS Lambda functions? How do you move your code across development stages? In this post we'll take a look at three methods we can use to decouple code and configuration in AWS Lambda functions. This comes in handy in any agile development cycle, when our code is constantly moving from development to test environments and eventually to Production.
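As a flavor of what decoupling looks like, here is a minimal Python sketch that reads stage-specific settings from environment variables. This is one common approach and not necessarily one of the three methods covered in the post. The variable names (`STAGE`, `TABLE_NAME`, `TIMEOUT_SECONDS`), the `Config` class and the `load_config` helper are all hypothetical.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    stage: str
    table_name: str
    timeout_seconds: int


def load_config(env=None):
    """Build a Config from environment variables (names are hypothetical).

    Passing a plain dict as env makes the function easy to test
    without touching the real process environment.
    """
    env = os.environ if env is None else env
    return Config(
        stage=env.get("STAGE", "dev"),
        table_name=env["TABLE_NAME"],  # required: fails fast if missing
        timeout_seconds=int(env.get("TIMEOUT_SECONDS", "5")),
    )


def handler(event, context):
    # The same code runs in every stage; only the environment differs.
    cfg = load_config()
    return {"stage": cfg.stage, "table": cfg.table_name}


# Example with an explicit, made-up environment.
example_cfg = load_config({"TABLE_NAME": "orders-test", "STAGE": "test"})
```

Because the handler contains no stage-specific values, promoting it from test to Production means changing configuration only, not code.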

Publish JMeter results to AWS CloudWatch and get ready for performance test automation.

Do you want to automate tasks around your JMeter performance tests? If you want to know whether your tests passed or failed, the first thing you need is a set of metrics to monitor. In this post I show you how to feed your JMeter test results into CloudWatch Logs and generate test result metrics in real time. As a bonus, I'm also including a CloudFormation template.