At Eagle AI, data isn’t just part of our business, it is our business. Our mission is simple but ambitious: build the right promotion for the right customer and tune every parameter (trigger amount, reward, brand, and so on) in a 100% individualized way, all based on purchase history.
So, yeah… data processing isn’t a support function here. It’s the core engine behind everything we do.
When we first built our data platform, Spark (via Dataproc) was the hot thing.
It was the technology everyone in the data world was excited about: powerful, distributed, and heavily hyped at the time.
And for us, it made perfect sense: it was the right choice for a data-driven startup back then, modern, scalable, and expressive.
BigQuery wasn’t new, but around 2021 we started experimenting with it seriously and quickly realized its power:
It felt like the future: simple, fast, and cloud-native.

At some point, we took a major step on the storage side:
we decided to migrate all our data from GCS to BigQuery, a no-brainer for us (and worth its own article 👀).
From that point on, BigQuery became our main data layer, not just a warehouse but the backbone where all our raw and processed data lives.
Now, when it comes to processing, which is what this article is really about, the choice isn’t that obvious.
Even though all our data sits in BigQuery, we still have two ways to process it: run Spark jobs (on Dataproc) that read from and write back to BigQuery, or run the transformations directly in BigQuery with SQL.
And that's where the debate really began.
We gathered the whole data team to decide between two options:
- Option 1: Stay with Spark: flexible, proven, and part of our DNA.
- Option 2: Go all-in on BigQuery: simpler, faster, and cloud-native.
We debated a lot. Each side had solid arguments.
And in the end… we didn’t pick one.
We picked both.
Here’s a visual overview of some key milestones in the evolution of Eagle AI’s data stack, highlighting the introduction of BigQuery into a setup historically built around Spark and GCS.

We decided to keep both frameworks, because they can actually be seen as complementary tools, as long as you choose the right one for each job.
That last part turned out to be the real challenge: making sure everyone (including newcomers) knew which tool to use when.
So we created a simple internal guide to help with that.
Use BigQuery when:
- It’s mainly SQL analytics or transformations (joins, group bys, aggregations).
- Performance matters (big tables, short runtime).
- You’re doing exploration or debugging (query directly in the GCP console).
Use Spark when:
- It’s ML, streaming, or custom logic (API calls, Slack notifications, SFTP...).
- The job has complex business rules: code is clearer and easier to test.
- The job involves multiple logic paths or implementations: these are easier to manage with interfaces, inheritance, and structured control flow (if/else, pattern matching, etc.).
- It’s an extension of an existing Spark job: reuse logic instead of rewriting it.
To make it more concrete, here are two examples from our daily jobs that illustrate this balance: one where BigQuery is clearly much faster, and another where Spark makes development and readability easier.
Let’s take a concrete case. We wanted to aggregate a few sales KPIs by customer segment after joining three tables (sales, customer, and segment).
In total, the query scans around 200 GB of data.
SELECT
  segment.segment_id,
  COUNT(DISTINCT sales.customer_id) AS total_customers,
  COUNT(DISTINCT sales.trx_id) AS total_transactions,
  SUM(sales.amount) AS total_ca,
  APPROX_QUANTILES(sales.amount, 100)[OFFSET(90)] AS p90_transaction_amount
FROM
  `sales` AS sales
INNER JOIN
  `customer` AS customer USING (customer_id)
INNER JOIN
  `segment` AS segment USING (customer_id)
WHERE
  day >= "2024-09-01"
GROUP BY 1;
We ran the exact same job on Spark (with a reasonably large cluster) and BigQuery.
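For reference, a Spark version of the same aggregation looks roughly like the sketch below (the table paths are illustrative and assume the BigQuery connector; the columns mirror the SQL above):
import org.apache.spark.sql.functions._

// Read the three tables through the BigQuery connector (paths are illustrative).
val sales = spark.read.format("bigquery").load("project.dataset.sales")
  .filter(col("day") >= "2024-09-01")
val customer = spark.read.format("bigquery").load("project.dataset.customer")
val segment = spark.read.format("bigquery").load("project.dataset.segment")

// Same joins and aggregations as the SQL query above;
// percentile_approx gives an approximate p90, analogous to APPROX_QUANTILES.
val kpis = sales
  .join(customer, Seq("customer_id"))
  .join(segment, Seq("customer_id"))
  .groupBy("segment_id")
  .agg(
    countDistinct("customer_id").as("total_customers"),
    countDistinct("trx_id").as("total_transactions"),
    sum("amount").as("total_ca"),
    expr("percentile_approx(amount, 0.9)").as("p90_transaction_amount")
  )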
Here’s how they compared:
| Metric | Spark (8 workers × 8 CPU / 64 GB RAM each) | BigQuery |
| --- | --- | --- |
| Runtime | ~1h45 min | ⚡ 2 min |
| Cost | ~$6.80 | ~$1.20 |
| Maintenance | Cluster setup & tuning | None (serverless) |
For pure analytical workloads, BigQuery simply crushed it: (much) faster, cheaper, and effortless.

The maintenance part is non-negligible: I had to test multiple Spark configurations to find the right worker type and parameters just to make the job succeed. This highlights how much optimization and babysitting BigQuery saves you.
From experience, Spark jobs need constant fine-tuning and revalidation after every code change. It’s easy to break optimizations or lineage if you’re not careful.
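To give an idea of what that tuning looks like, here is the kind of configuration we typically end up iterating on (a sketch with illustrative values, not our production settings):
import org.apache.spark.sql.SparkSession

// Illustrative knobs only: executor sizing and shuffle parallelism are the
// parameters that most often need revisiting when a job's data volume changes.
val spark = SparkSession.builder()
  .appName("sales-kpi-aggregation")
  .config("spark.executor.memory", "48g")
  .config("spark.executor.cores", "8")
  .config("spark.sql.shuffle.partitions", "512")
  .config("spark.dynamicAllocation.enabled", "true")
  .getOrCreate()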
BigQuery, on the other hand, takes care of all that behind the scenes.
Now take a job that filters customer promotions based on multiple business rules: keep the top national promotions per customer first, then complete with regional ones whose brands aren’t already covered, up to a fixed number of promotions.
Here’s how it looks in Scala/Spark:
// Read customer promotions from BigQuery and keep only the best ones per customer.
import spark.implicits._

val nbPromotions = 10

val ds: Dataset[CustomerPromotions] = spark.read
  .format("bigquery")
  .load("project.dataset.customer_promotions")
  .as[CustomerPromotions]

ds.map(x => x.customerId -> selectBestPromotions(x.promotions, nbPromotions))

// Top 3 national promotions first, then fill up with regional promotions
// whose brand is not already covered by a national one.
def selectBestPromotions(promotions: Seq[Promotion], nbPromotions: Int): Seq[Promotion] = {
  val sortedPromotions = promotions.sortBy(_.rank)
  val nationalPromotions = sortedPromotions.filter(_.scope == National).take(3)
  val brandsInNationalPromotions = nationalPromotions.map(_.brand_id).toSet
  val regionalPromotions = sortedPromotions
    .filter(_.scope == Regional)
    .filter(promo => !brandsInNationalPromotions.contains(promo.brand_id))
    .take(nbPromotions - nationalPromotions.length)
  nationalPromotions ++ regionalPromotions
}
And here’s the same logic in BigQuery SQL:
DECLARE nb_promotions INT64 DEFAULT 10;

WITH nat AS (
  -- Top 3 National promotions per customer, ordered by rank
  SELECT
    cp.customer_id,
    (SELECT ARRAY_AGG(n ORDER BY n.rank LIMIT 3)
     FROM UNNEST(cp.promotions) AS n
     WHERE n.scope = 'National') AS nat
  FROM `project.dataset.customer_promotions` AS cp
)
SELECT
  cp.customer_id,
  ARRAY_CONCAT(
    COALESCE(n.nat, []),
    (
      -- Complete with Regional promotions, excluding brands already in National
      SELECT ARRAY_AGG(r ORDER BY r.rank
        LIMIT GREATEST(nb_promotions - ARRAY_LENGTH(COALESCE(n.nat, [])), 0))
      FROM UNNEST(cp.promotions) AS r
      WHERE r.scope = 'Regional'
        AND NOT EXISTS (
          SELECT 1 FROM UNNEST(IFNULL(n.nat, [])) AS nn
          WHERE nn.brand_id = r.brand_id
        )
    )
  ) AS promotions
FROM `project.dataset.customer_promotions` AS cp
LEFT JOIN nat AS n USING (customer_id);
The result is the same, but the Spark code is clearer, easier to maintain, and simpler to test, as the main business logic is divided into well-named functions that outline each step clearly.
Once the logic becomes more complex, Spark provides better readability and flexibility.
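For example, selectBestPromotions is a pure function, so it can be unit-tested without spinning up a cluster. Here is a minimal sketch, assuming a simplified domain model (these case classes are illustrative, not our actual schema):
// Simplified domain model assumed for the sketch (illustrative only).
sealed trait Scope
case object National extends Scope
case object Regional extends Scope
case class Promotion(rank: Int, scope: Scope, brand_id: String)

// The regional promotion for brand "A" must be dropped because brand "A"
// is already covered by a national promotion.
val promos = Seq(
  Promotion(rank = 1, scope = National, brand_id = "A"),
  Promotion(rank = 2, scope = Regional, brand_id = "A"),
  Promotion(rank = 3, scope = Regional, brand_id = "B")
)
val selected = selectBestPromotions(promos, nbPromotions = 2)
assert(selected.map(_.brand_id) == Seq("A", "B"))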
At Eagle AI, we’ve stopped thinking in terms of Spark vs BigQuery.
Instead, we ask:
“Which one makes this job simpler, faster, and easier to maintain?”
Sometimes that’s BigQuery, especially for analytics and transformations.
Sometimes it’s Spark, for jobs with richer business logic or external integrations.
And sometimes, you can even combine both worlds in a single job.
For example, we might start with a heavy SQL transformation directly on BigQuery, something like:
val ds = spark.read.format("bigquery").option("query", "SELECT * FROM … JOIN … GROUP BY …").load()
Then, once the data is pre-aggregated and ready, we switch to Spark to apply the more complex, business-specific logic or make external API calls.
This approach lets BigQuery handle the heavy lifting, while Spark focuses on the parts that are harder to express in SQL.
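As a sketch, that mixed pattern can look like the snippet below (the staging dataset, table names, and downstream helpers are illustrative; when reading a query result, the connector needs a dataset where it can materialize the intermediate table):
import spark.implicits._

// BigQuery runs the heavy aggregation; Spark only reads the reduced result.
val preAggregated = spark.read
  .format("bigquery")
  .option("viewsEnabled", "true")
  .option("materializationDataset", "tmp_dataset") // illustrative staging dataset
  .option("query",
    "SELECT customer_id, SUM(amount) AS total_amount " +
    "FROM `project.dataset.sales` GROUP BY customer_id")
  .load()

// Spark then handles what is awkward in SQL: custom business logic and calls to
// external systems (CustomerSpend, applyBusinessRules, and notifyExternalApi are hypothetical).
preAggregated
  .as[CustomerSpend]
  .map(applyBusinessRules)
  .foreachPartition(notifyExternalApi _)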
Because the best data platform isn’t about choosing sides, it’s about choosing wisely.