Stack evaluation
A decision framework for choosing data infrastructure without being sold what you do not need. Comparison tables, a decision flowchart, and a 20-point over-engineering checklist.
The data tooling market is built on a specific asymmetry: vendors know exactly what their products cost you at scale, and most engineering leaders do not. The sales cycle exploits that gap. This document exists to close it before a contract is signed.
What follows is an honest assessment of when the major data infrastructure categories are justified, what they cost at different volumes, and what the vendor is hoping you will not ask. The comparison tables use published pricing as of mid-2025. The decision flowchart applies one criterion: do your actual data volume and latency requirements justify this tool, or are you buying for anticipated scale that may never arrive?
The over-engineering checklist at the end is the fastest diagnostic. Tick more than three boxes and the problem is not tooling: the architecture has stopped serving the business and started serving resumes.
Each comparison shows the actual scale threshold where the more expensive option becomes justified, monthly costs at three volume tiers, operational overhead, vendor lock-in exposure scored from 0 to 10, and an honest recommendation. Read the recommendation column first.
| Comparison | Scale threshold | Monthly cost: <1M rows/day | Monthly cost: 1–10M rows/day | Monthly cost: 10M+ rows/day | Lock-in score (0–10) | Honest recommendation |
|---|---|---|---|---|---|---|
| Snowflake vs DuckDB | 100 GB+ active data, 5+ concurrent writers | Snowflake: ~$50; DuckDB: $0 | Snowflake: ~$300; DuckDB: $0 | Snowflake: $2,000+; DuckDB: $0–$50 (VM) | Snowflake: 8/10; DuckDB: 1/10 | Use DuckDB on a single VM until it strains. You do not need Snowflake for a 10 GB database. |
| BigQuery vs DuckDB + S3 | 1 TB+ querying, complex ML integration | BQ: ~$5; DuckDB: $0 | BQ: ~$50; DuckDB: $0 | BQ: $1,000+; DuckDB: $0 | BQ: 9/10; DuckDB: 1/10 | BigQuery is cheap at low volume and aggressive at scale. Use DuckDB against local Parquet until analysts complain about query times. |
| Airflow vs cron + Python | 20+ interdependent DAGs, frequent backfilling | Airflow: $50 (hosting); Cron: $0 | Airflow: $150; Cron: $0 | Airflow: $500+; Cron: maintenance burden | Airflow: 7/10; Cron: 0/10 | If you have three Python scripts running nightly, use cron. Do not stand up Airflow for three scripts. |
| dbt Core vs raw SQL | 5+ data modellers, deeply nested views | dbt: $0; SQL: $0 | dbt: $0; SQL: $0 | dbt: $0; SQL: maintenance burden | dbt Core: 4/10; Raw SQL: 0/10 | Adopt dbt Core early if you have dedicated analysts. Avoid dbt Cloud until the Snowflake bill is already unavoidable. |
| Databricks vs DuckDB | Real-time streaming, multi-node processing | DBX: $200; DuckDB: $0 | DBX: $800; DuckDB: $0 | DBX: $5,000+; DuckDB: $0 | DBX: 9/10; DuckDB: 1/10 | Spark is a distributed system for distributed data. Most organisations do not have distributed data. DuckDB replaces ninety percent of local Spark use cases. |
| Kafka vs Postgres replication vs batch | Sub-second latency requirements | Kafka: $150+; Postgres: $0 | Kafka: $400; Postgres: $0 | Kafka: $2,000+; Postgres: $0 | Kafka: 6/10; Postgres: 2/10 | Batch is correct for ninety-nine percent of businesses. Real-time requirements are frequently an executive preference, not an operational need. Use Postgres logical replication before Kafka. |
| Redshift vs DuckDB + Parquet | Legacy AWS entrenchment, massive concurrent joins | Redshift: $180; DuckDB: $0 | Redshift: $360; DuckDB: $0 | Redshift: $2,000+; DuckDB: $0 | Redshift: 9/10; DuckDB: 2/10 | Redshift is aging. DuckDB querying Parquet over S3 is fast and effectively free. Migrate if you are not already committed. |
| Looker vs Evidence.dev vs Metabase | Non-technical users who need to build dashboards | Looker: $3,000+; Evidence/Metabase: $0 | Looker: $3,000+; Evidence/Metabase: $0 | Looker: $5,000+; Evidence/Metabase: $0 | Looker: 10/10; Evidence/Metabase: 2/10 | Metabase is the default. Evidence.dev is excellent if your consumers can read markdown. Looker is for when you have run out of better arguments. |
| Fivetran vs custom ETL | 15+ distinct APIs with changing schemas | Fivetran: $500; Custom: $0 | Fivetran: $1,500+; Custom: $0 | Fivetran: $5,000+; Custom: $0 | Fivetran: 8/10; Custom: 0/10 | Fivetran is the cost of not writing API polling scripts. Worth it for Salesforce or Zendesk. Not worth it for your own internal Postgres database that your engineers control. |
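To make the monthly figures easier to compare, the sketch below annualises the 1–10M rows/day column of the table. The dictionary and the `annual_premium` helper are illustrative names, not part of any tool. The `~` and `+` qualifiers are dropped, so the results are rough lower bounds. The alternative defaults to $0 because that is what the table lists at this tier. That default ignores engineering time, which the table only flags as "maintenance burden" at the highest tier.

```python
# Monthly figures (USD) copied from the 1-10M rows/day column of the
# comparison table. "~" and "+" qualifiers are dropped, so treat every
# result as a rough lower bound rather than a quote.
MONTHLY_COST_MID_TIER = {
    "Snowflake": 300,
    "BigQuery": 50,
    "Airflow": 150,
    "Databricks": 800,
    "Kafka": 400,
    "Redshift": 360,
    "Looker": 3000,
    "Fivetran": 1500,
}


def annual_premium(tool: str, alternative_monthly: float = 0.0) -> float:
    """Yearly cost gap between a tool and its cheaper alternative.

    alternative_monthly defaults to 0 because the table lists $0 for
    every alternative at this tier; pass your own VM or staffing
    estimate if that assumption does not hold for you.
    """
    return (MONTHLY_COST_MID_TIER[tool] - alternative_monthly) * 12


if __name__ == "__main__":
    # Print the tools from largest to smallest yearly premium.
    for tool in sorted(MONTHLY_COST_MID_TIER, key=annual_premium, reverse=True):
        print(f"{tool:<11} ${annual_premium(tool):>9,.0f} per year")
```

Passing a non-zero `alternative_monthly`, for example the cost of a single VM, gives a fairer comparison than the default.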
Four questions. Answer them in order. Stop at the first No.
No: You do not need distributed systems. You do not need Spark. You do not need Snowflake. A Postgres database and DuckDB handle your workload.
Yes: Proceed to question 2.
No: You do not need streaming. You do not need Kafka. You need a batch job that runs hourly.
Yes: Are you certain? If genuinely yes, evaluate Postgres logical replication before Kafka. Kafka is a last resort, not a first choice.
No: Buy managed solutions only where engineering time is the genuine bottleneck (complex third-party APIs where Fivetran saves weeks). Otherwise: two or three components maximum.
Yes: Proceed to question 4.
No: Query it directly with DuckDB.
Yes: Query it directly with DuckDB until it reaches one terabyte. Snowflake is justified after that threshold, not before.
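The stop-at-the-first-No logic can be sketched as a small function. The question wording is not reproduced in the code, so answers are passed positionally as four booleans. The outcome strings are abridged from the answers above, and the names `ON_NO`, `Q2_CAVEAT`, `ALL_YES` and `decide` are illustrative.

```python
from typing import List, Sequence

# Outcome for answering No at each question, abridged from the flowchart.
ON_NO = (
    "No distributed systems, Spark, or Snowflake: Postgres and DuckDB handle the workload.",
    "No streaming and no Kafka: run an hourly batch job.",
    "Buy managed tools only where engineering time is the bottleneck; "
    "otherwise two or three components maximum.",
    "Query it directly with DuckDB.",
)
# A Yes at question 2 carries a caveat even if you go on to later questions.
Q2_CAVEAT = "Evaluate Postgres logical replication before Kafka."
ALL_YES = "Query with DuckDB until the data reaches one terabyte; Snowflake is justified after that."


def decide(answers: Sequence[bool]) -> List[str]:
    """Walk the four questions in order and stop at the first No."""
    if len(answers) != 4:
        raise ValueError("expected exactly four yes/no answers")
    notes: List[str] = []
    for index, answer in enumerate(answers):
        if not answer:
            notes.append(ON_NO[index])
            return notes  # later answers are ignored by design
        if index == 1:
            notes.append(Q2_CAVEAT)
    notes.append(ALL_YES)
    return notes


if __name__ == "__main__":
    for line in decide([True, True, False, True]):
        print(line)
```

Returning a list instead of a single string keeps the question 2 caveat visible when the walk continues past it.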
Tick more than three and the architecture is not serving the business. Tick more than seven and the stack is a resume-driven development project.
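The two thresholds can be expressed as a minimal scoring sketch. The function name `diagnose` and the wording of the lowest band are assumptions. The default of 20 items comes from the checklist size stated in the introduction.

```python
def diagnose(ticked: int, total: int = 20) -> str:
    """Map a count of ticked checklist boxes to the stated verdicts."""
    if not 0 <= ticked <= total:
        raise ValueError(f"ticked must be between 0 and {total}")
    if ticked > 7:
        return "resume-driven development project"
    if ticked > 3:
        return "architecture is not serving the business"
    # The text does not name this band; the label is a placeholder.
    return "no over-engineering flag"


if __name__ == "__main__":
    for count in (2, 5, 9):
        print(count, "->", diagnose(count))
```

Both comparisons are strict, so exactly three ticks raises no flag and exactly seven stays in the middle band.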
What the stack should look like before a vendor contract is signed. These are not aspirational: they are the architectures that serve most real workloads at each tier.
At this scale, some managed tooling becomes defensible. Use it selectively.
On the sales call, these phrases mean something different from what is being said. Translate them in real time.
Nauman Shahid builds zero-dependency data infrastructure for organisations in the UAE and the Gulf region. If your current data stack has more vendor dependencies than your workload requires, a diagnostic engagement identifies the exposure: www.mindflex.tech. The Vendor Lock-In Audit is at audit.nauman.cc.