At EagleAI, we process several terabytes of data every day — both as inputs to our predictive AI models and as outputs to personalize the offers we recommend to our clients. A significant portion of that data is stored on Google Cloud Storage (GCS), and if left unchecked, storage costs can quietly become a major drain on our cloud budget.
In this article, I’ll show you how we identified a misconfigured bucket — replicated across multiple projects — that was responsible for the majority of our GCS costs. By making a few simple adjustments, we managed to reduce those costs by 80% for the affected buckets, and by 50% across all GCS usage!
If your business is running at scale, you probably have multiple GCP projects — different dev environments, clients, or teams — with similar resources replicated across them (BigQuery datasets, Cloud Run apps, GCS buckets, etc.).
Tracking costs in detail can be a real headache, especially with GCS where bucket names must be globally unique. In this context, how do you group similar resources across projects to understand their combined impact?
GCP provides several tools to help, as long as you follow a few best practices. In particular, applying a consistent label to the replicated buckets (for example gs_bucket: temporary) lets Cloud Billing reports group their costs across projects, regardless of how each individual bucket is named.
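For example, a label can be attached to each replicated bucket with the gcloud CLI, as sketched below (the bucket name is a hypothetical placeholder, and the label flags are worth checking against your gcloud version). Once set, the label becomes a grouping dimension in Cloud Billing reports and in the billing export to BigQuery.

# Attach the grouping label to one instance of the replicated bucket (placeholder name).
$ gcloud storage buckets update gs://my-temporary-bucket-dev \
    --update-labels=gs_bucket=temporary

# Verify the bucket's configuration, including its labels.
$ gcloud storage buckets describe gs://my-temporary-bucket-dev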
Using these practices, we quickly discovered that a single bucket — replicated across multiple projects — was responsible for more than 60% of our total GCS costs.
Before changing anything, you need a more granular view of what’s stored in your bucket — so you can safely reconfigure things without accidentally deleting critical files.
Fortunately, GCP provides the gcloud storage CLI (which recently replaced gsutil) to retrieve file metadata and analyze storage usage. Here's a simplified example to estimate folder sizes:
$ gcloud storage du -a -r -s $(gcloud storage ls gs://[BUCKET_NAME]/)
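It also helps to inspect individual objects to confirm what a folder really contains, such as their size, creation time, and storage class. For example (with hypothetical bucket and object names):

# Long listing of a prefix: size, creation time and URL of each object.
$ gcloud storage ls --long gs://my-analytics-bucket/daily-exports/

# Full metadata of a single object, including its current storage class.
$ gcloud storage objects describe gs://my-analytics-bucket/daily-exports/2024-01-01.parquet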
Then, gather the relevant teams that use the bucket and determine the appropriate lifecycle for each folder. To make things easier, ask a few guiding questions: how often is the data accessed, how long does it need to remain immediately available, are older versions still needed, and when can files safely be archived or deleted?
GCS offers a wide range of configuration options to manage file retention, versioning, and cost. Based on our experience, here are some best practices:
💡 Beware of minimum storage duration requirements! For example, files moved to the Archive class must be stored for at least one year; deleting them earlier incurs an early-deletion charge for the remaining time. Moreover, operations on those files, such as listing or reading, are more expensive.
💡 Prefix- and suffix-based lifecycle rules were introduced in August 2022, letting you apply conditions to specific folders and file extensions.
Example configuration we use today:
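The general shape is a bucket-wide Coldline transition combined with deletion rules targeting high-volume folders and temporary file extensions. The sketch below illustrates it with placeholder prefixes, suffixes, and ages rather than our exact production values; the lifecycle JSON is applied with gcloud storage buckets update:

# Write an illustrative lifecycle configuration (prefixes, suffixes and ages are placeholders).
$ cat > lifecycle.json <<'EOF'
{
  "rule": [
    {
      "action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
      "condition": {"age": 90, "matchesStorageClass": ["STANDARD"]}
    },
    {
      "action": {"type": "Delete"},
      "condition": {"age": 90, "matchesPrefix": ["daily-exports/", "tmp/"]}
    },
    {
      "action": {"type": "Delete"},
      "condition": {"age": 7, "matchesSuffix": [".tmp", ".part"]}
    }
  ]
}
EOF

# Apply the configuration to the bucket (placeholder name).
$ gcloud storage buckets update gs://my-analytics-bucket --lifecycle-file=lifecycle.json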
💡 In case of conflict, deletion rules take precedence over class transitions. So in this setup, a file in a high-volume folder will be deleted after 3 months, even if it's eligible for Coldline transition.
Legend: Evolution of GCS storage costs before and after applying lifecycle and soft-delete policies.
By applying these configurations to our most expensive buckets, we achieved an 80% reduction in their storage costs, representing a 50% drop in our overall GCS expenses!
While GCS storage might seem cheap compared to other GCP components (like BigQuery analysis or Compute Engine CPU and RAM usage), it can silently become a major line item over time — especially if left unmanaged.
The good news? With the right tooling and a few easy-to-implement practices, controlling GCS costs is both simple and impactful.