The easiest way to learn is by doing, but what if doing requires handing over your credit card number first? I've never been comfortable with that, but there is no other way to get hands-on experience with the cloud. Fortunately, it doesn't mean you can't control your expenses. In this article, we'll see how.
To be honest with you, I'm still wondering why cloud providers don't offer a model based on the gift cards you can buy on Amazon or the Play Store. Instead of leaving your credit card number, you would buy $X of credit to use within some period of time. It probably wouldn't suit business users who need continuity, but it could be an easier way to master cloud costs while playing around with the services.
Anyway, as long as that's not possible, the best approach to cost mastery is monitoring and alerting, backed by some event-driven processing. That's the short version of the solution. If you're interested in the detailed one, please keep reading!
The first cost management component you can put in place is a monitoring dashboard. AWS, Azure and GCP all offer built-in cost monitoring services. On AWS you will use AWS Cost Explorer and AWS Budgets, on Azure the Cost Management overview, and on GCP the Billing Reports feature.
All these components have several things in common. First, they show not only the daily usage but also a forecast for the current month. Keep in mind, though, that the forecast is only a prediction, nothing contractual.
Second, they do not provide a real-time view of your expenses. The usage cost is delayed, and the only update guarantee I could deduce from the documentation is at least one update per day. Sometimes it can be more frequent, like 6 times a day on Azure, but it will never be real-time.
The final interesting feature is the possibility of deep-diving into the expenses and analyzing which service or group was the most expensive in the analyzed period. A smart practice here is to tag your resources to organize them into logical groups, so you can easily identify the most expensive ones.
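To make the idea concrete, here is a minimal sketch of tag-based cost grouping. The cost lines, tag key and amounts are hypothetical; in practice this data would come from your provider's cost report, but the aggregation logic is the same:

```python
from collections import defaultdict

# Hypothetical daily cost lines, as a cost report might expose them
# once resources carry a "project" tag.
cost_lines = [
    {"resource": "vm-1", "tags": {"project": "ml-sandbox"}, "cost": 4.20},
    {"resource": "bucket-1", "tags": {"project": "ml-sandbox"}, "cost": 0.15},
    {"resource": "db-1", "tags": {"project": "blog-demo"}, "cost": 1.80},
    {"resource": "vm-2", "tags": {}, "cost": 2.50},  # untagged: easy to miss!
]

def cost_by_tag(lines, key="project", fallback="(untagged)"):
    """Aggregate costs per tag value so the priciest group stands out."""
    totals = defaultdict(float)
    for line in lines:
        totals[line["tags"].get(key, fallback)] += line["cost"]
    return dict(totals)

print(cost_by_tag(cost_lines))
```

Notice the fallback bucket for untagged resources: in my experience, those are precisely the ones that hide surprise costs.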
Fortunately, the services mentioned above don't only have dashboarding features. They also have an alerting system. The idea behind it is straightforward. You define a threshold, and whenever your forecasted billing approaches it, you get notified.
On AWS, this feature integrates with the CloudWatch service, and the notification is delivered through an SNS topic and/or an email. On Azure, you define a Cost alert that fires when your actual or forecasted spending reaches a configurable percentage of the budget. Here, you can send an email but also invoke an Azure Function or a Logic App. On GCP, you can likewise define threshold-based alerts and send them either to an email address or a Pub/Sub topic, meaning that you could trigger anything else from that point.
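The threshold logic shared by all three providers boils down to a few lines. Here is a hedged sketch with made-up numbers; the `thresholds` fractions mirror the multiple alert levels (50%, 80%, 100%) you can typically configure:

```python
def should_alert(forecast, budget, thresholds=(0.5, 0.8, 1.0)):
    """Return the budget fractions crossed by the forecasted spend."""
    ratio = forecast / budget
    return [t for t in thresholds if ratio >= t]

# A $10 budget with an $8.50 forecast has crossed the 50% and 80% marks.
print(should_alert(8.50, 10.0))  # → [0.5, 0.8]
```

In a real setup, each crossed threshold would map to a notification (email, SNS/Pub/Sub message, function invocation) rather than a printed list.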
You've certainly noticed that we went from a passive monitoring strategy, requiring manual checks, to a more reactive approach that notifies us when the budget threshold is reached. The third approach requires even less human involvement. This auto-destruction mode relies on an automatically triggered application that destroys any obsolete stack. By obsolete, I mean a stack that you should have removed before closing your session. To keep the cost of the destroyer itself low, serverless function services are a good fit.
To detect an obsolete stack, you can apply various strategies. The easiest one is a scheduled execution of the destructor, for example at night, when you're sure to be asleep and not working on your cloud project. Of course, if there is nothing to destroy, the run is wasted money. But on the other hand, spending a few dollars per month on empty runs can be cheaper than forgetting to clean up a stack and letting it run for several days or weeks.
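The nightly schedule itself is little more than a string. As a sketch, assuming AWS EventBridge's 6-field cron format (minute, hour, day-of-month, month, day-of-week, year, with `?` where day-of-month and day-of-week would conflict):

```python
def nightly_cron(hour_utc=3):
    """Build an EventBridge-style cron expression firing once a night."""
    return f"cron(0 {hour_utc} * * ? *)"

# Fire the destructor Lambda every night at 03:00 UTC.
print(nightly_cron())  # → cron(0 3 * * ? *)
```

Azure (timer-triggered Functions) and GCP (Cloud Scheduler) accept the classic 5-field cron syntax instead, so drop the trailing year field there.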
Fortunately, we can reduce the empty executions by:
- scheduling the destruction some time after creating the stack - for example, a CloudWatch cron rule could start the stack-destruction Lambda function 5 hours after the initial deployment. Of course, this doesn't protect you from incurring costs if you use the stack for only 1 hour and leave.
- adding expiration tags - every deployed resource could be tagged with an expiration time, and a serverless function (which I keep mentioning because, IMHO, it's the cheapest solution for that kind of work) could scan all resources every X minutes and destroy the expired ones. Unlike the previous proposal, the expiration time doesn't need to be the same for all resources.
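The core of the expiration-tag approach can be sketched in a few lines. The tag name `expires-at`, the resource inventory and the timestamps below are all hypothetical; in a real destructor, the list would come from the provider's API and the matching resources would be deleted rather than returned:

```python
from datetime import datetime, timezone

def expired_resources(resources, now=None):
    """Pick resources whose 'expires-at' tag (ISO 8601) is in the past."""
    now = now or datetime.now(timezone.utc)
    doomed = []
    for res in resources:
        expires = res["tags"].get("expires-at")
        if expires and datetime.fromisoformat(expires) <= now:
            doomed.append(res["id"])
    return doomed

# Hypothetical inventory the scheduled cleanup function could fetch.
now = datetime(2023, 5, 1, 12, 0, tzinfo=timezone.utc)
resources = [
    {"id": "stack-a", "tags": {"expires-at": "2023-05-01T09:00:00+00:00"}},
    {"id": "stack-b", "tags": {"expires-at": "2023-05-01T18:00:00+00:00"}},
    {"id": "stack-c", "tags": {}},  # untagged: left alone (or flagged)
]
print(expired_resources(resources, now))  # → ['stack-a']
```

Whether untagged resources should be ignored, flagged or destroyed by default is a policy decision; destroying by default is the safest for a pure playground account.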
One thing to keep in mind, though: to use this auto-destruction strategy, you should deploy with IaC instead of creating the resources manually. Otherwise, it's easy to forget to add the tag or the schedule and, unfortunately, get a small surprise at the end of the billing period. That said, some managed solutions, like Azure Policies or auto-tagging serverless functions, exist to handle that scenario.
Services and automation strategies aside, the best way to control cloud expenses - especially for playground experience - remains destroying your stacks when you finish. And if you decide to close your account, make sure there are no active resources left. Otherwise, you can still be billed for them!