At EagleAI, we process several terabytes of data every day — both as inputs to our predictive AI models and as outputs to personalize the offers we recommend to our clients. A significant portion of that data is stored on Google Cloud Storage (GCS), and if left unchecked, storage costs can quietly become a major drain on our cloud budget.
In this article, I’ll show you how we identified a misconfigured bucket — replicated across multiple projects — that was responsible for the majority of our GCS costs. By making a few simple adjustments, we managed to reduce those costs by 80% for the affected buckets, and by 50% across all GCS usage!
If your business is running at scale, you probably have multiple GCP projects — different dev environments, clients, or teams — with similar resources replicated across them (BigQuery datasets, Cloud Run apps, GCS buckets, etc.).
Tracking costs in detail can be a real headache, especially with GCS where bucket names must be globally unique. In this context, how do you group similar resources across projects to understand their combined impact?
GCP provides several tools to help, provided you follow a few best practices. Chief among them is consistent labeling: tagging similar buckets with the same key/value pair across projects (for example, gs_bucket: temporary) lets Cloud Billing reports group them and show their combined cost.
Using these practices, we quickly discovered that a single bucket — replicated across multiple projects — was responsible for more than 60% of our total GCS costs.
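As a minimal sketch of that labeling step (the bucket name and label value below are placeholders, not taken from our actual setup), a label can be applied from the CLI:
$ gcloud storage buckets update gs://[BUCKET_NAME] --update-labels=gs_bucket=temporary
Once buckets share a label key, the Cloud Billing reports page and the billing export to BigQuery can filter or group costs by that key, regardless of which project each bucket lives in.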
Before changing anything, you need a more granular view of what’s stored in your bucket — so you can safely reconfigure things without accidentally deleting critical files.
Fortunately, GCP provides the gcloud storage CLI tool (which recently replaced gsutil) to retrieve file metadata and analyze storage usage: you can list objects, inspect their size, age, and storage class, and estimate the size of each folder.
Here's a simplified example to estimate folder sizes:
$ gcloud storage du -a -r -s $(gcloud storage ls gs://[BUCKET_NAME]/)
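To dig one level deeper before touching any lifecycle rule, per-object metadata can also be inspected. The paths below are placeholders:
# Long listing: size, creation time, and name of each object under a prefix
$ gcloud storage ls --long gs://[BUCKET_NAME]/some-folder/
# Full metadata (storage class, generation, etc.) for a single object
$ gcloud storage objects describe gs://[BUCKET_NAME]/some-folder/some-file.parquet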
Then, gather the teams that use the bucket and agree on the appropriate lifecycle for each folder. A few guiding questions make this discussion easier: how long does the data need to be kept, how often is it read after being written, and do older versions still need to be retained?
GCS offers a wide range of configuration options to manage file retention, versioning, and cost. Based on our experience, here are some best practices:
💡 Beware of minimum storage duration requirements! For example, files moved to Archive are billed for at least one year of storage, even if they are deleted earlier. Moreover, operations on those files, such as listing or reading, are more expensive than on Standard storage.
💡 Prefix- and suffix-based lifecycle rules were introduced in August 2022, letting you apply conditions to specific folders and file extensions.
Example configuration we use today:
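The exact rules aren't reproduced here, but as a rough sketch (the prefix and ages below are illustrative, not our real values), a policy combining a storage-class transition with a prefix-scoped deletion rule can be written as a JSON file and applied with gcloud:
$ cat > lifecycle.json <<'EOF'
{
  "rule": [
    {
      "action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
      "condition": {"age": 30}
    },
    {
      "action": {"type": "Delete"},
      "condition": {"age": 90, "matchesPrefix": ["high-volume-folder/"]}
    }
  ]
}
EOF
$ gcloud storage buckets update gs://[BUCKET_NAME] --lifecycle-file=lifecycle.json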
💡 In case of conflict, deletion rules take precedence over class transitions. So in this setup, a file in a high-volume folder will be deleted after 3 months, even if it's eligible for Coldline transition.
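The cost chart below also mentions soft-delete policies. Soft delete keeps deleted objects recoverable for a retention window, and that retained data is billed like regular storage. Assuming a recent gcloud release that exposes the --soft-delete-duration flag, the window can be reduced, or disabled entirely for buckets that only hold reproducible, temporary data:
$ gcloud storage buckets update gs://[BUCKET_NAME] --soft-delete-duration=0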
Legend: Evolution of GCS storage costs before and after applying lifecycle and soft-delete policies.
By applying these configurations to our most expensive buckets, we achieved an 80% reduction in their storage costs, representing a 50% drop in our overall GCS expenses!
While GCS storage might seem cheap compared to other GCP components (such as BigQuery analysis or Compute Engine CPU and RAM usage), it can silently become a major line item over time — especially if left unmanaged.
The good news? With the right tooling and a few easy-to-implement practices, controlling GCS costs is both simple and impactful.
Senior Data Engineer at EagleAI, with a strong background in machine learning and data science. I focus on building scalable, cloud-native data platforms on GCP, with a strong emphasis on performance and cost-efficiency. As a GCP-certified Data Engineer, I develop production-grade systems that power real-time personalized promotions in retail.