Streamlining the Medallion Architecture

The Challenge

Modern data architectures move data 15 times

In most data architectures, a single piece of data is physically moved and stored up to 15 times. There are many historical reasons for this, but the result is the same everywhere: a solution that is cumbersome and complex to manage, with countless points of failure across scripts, programs, ETL/ELT processes, scheduling and dependencies.

60% of enterprise data engineering effort continues to be spent on data movement and integration — confirming how costly this pattern has become.
— McKinsey, State of Data Engineering 2024

Hand-drawn diagram of a single source piece of data passing through 15 numbered hops before reaching a still-waiting business user — **Fig 1.** From source to business: 15 hops, 15 chances to break, and a user still waiting.

The Warehouse Era

Layers everywhere — by default

In the era of the data warehouse, multiple layers of storage and separation were the norm: a raw / landing zone, staging zone, curated / processed zone, refined zone, consumption zone, and maybe an archive on top.

Each layer had a purpose. Together, they had a price.

The Lake Promise

A new label, the same architecture

Then came the Data Lake, meant to simplify everything by adding flexibility. Most customers followed the same pattern: a raw zone, staging zone, curated zone, refined / trusted zone, discovery / sandbox zone, and an archive.

Many organizations recreated warehouse-style layering inside data lakes — leading to even more complexity and more data movement.
— BARC Germany, Data Management Report 2024

This was no less complex to manage, and arguably more so, given the maturity gap of the tools involved. Even viewed through a lakehouse lens, with lake and warehouse fused, the same customers found it impossible to realize a return on their investment in the architecture, or to support the digital transformations their organizations actually needed.

The biggest blockers to AI and analytics adoption are slow data preparation cycles, fragmented storage, and excessive data duplication — all typical of multi-layered lakehouse environments.
— MIT Technology Review Insights, 2024

The Medallion

Bronze, Silver, Gold — same pattern, new colors

The Medallion Architecture follows a similar pattern, with a Bronze (Raw) Zone, a Silver (Refined) Zone and a Gold (Aggregated) Zone. As an industry, and as data practitioners, we have an opportunity to use this architecture to its full potential. The alternative is to repeat the patterns of the last 30 years and find ourselves disappointed in the next 2–5 years, having failed to deliver the data transformation our organizations depend on.

Three columns showing the Data Warehouse, Data Lake and Medallion architectures side-by-side, each with multiple stacked zones — **Fig 2.** Three eras, one habit. The Medallion is a layout — not, by itself, a Data Mesh.

The Medallion architecture is a structured layering system intended to improve data quality.
— Databricks, Public Documentation

Note that Databricks does not claim the Medallion is a Data Mesh, and that distinction matters. Compatibility is not equivalence.

The Significant Challenge

Six taxes the layered model quietly charges

The most significant challenge with layered data architectures, including the Medallion, is the sheer volume of ETL/ELT processing required to move and reconcile data between layers. It manifests in six familiar ways.

Complexity

As data progresses through layers, it undergoes multiple transformations and checks. The pipelines themselves become the system you spend most of your time managing.

Enterprises continue to struggle with "pipeline sprawl" — even simple use cases create dozens of dependent transformations.
— Gartner, 2024

Resource intensive

Each transformation step consumes compute. With large volumes the bill scales fast, and rarely scales with the value created.

Latency

Multiple stages introduce lag. The longer the chain, the harder real-time analytics becomes, and the less timely the answers reaching the business.

Maintenance overhead

Schemas drift, requirements shift, source systems evolve. Pipelines are the first to feel it, and usually the loudest.

Quality and consistency

Errors at one stage propagate through every downstream layer. Trust erodes in places no one is watching.

Layered architectures magnify data quality issues by allowing errors to compound over multiple hops.
— BARC, 2024
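To make the compounding effect concrete, here is a minimal back-of-the-envelope sketch. It assumes every hop independently preserves a fixed fraction of records correctly; the 99% figure and the `clean_fraction` helper are illustrative assumptions, not measurements from any of the sources cited here.

```python
def clean_fraction(per_hop_success: float, hops: int) -> float:
    """Fraction of records still correct after `hops` hops.

    Assumes each hop fails independently at the same rate, which is a
    simplification: real pipelines have correlated and uneven failures.
    """
    if not 0.0 <= per_hop_success <= 1.0:
        raise ValueError("per_hop_success must be between 0 and 1")
    if hops < 0:
        raise ValueError("hops must not be negative")
    return per_hop_success ** hops


if __name__ == "__main__":
    # 0.99 is a hypothetical per-hop quality level chosen for illustration.
    for hops in (3, 6, 15):
        share = clean_fraction(0.99, hops)
        print(f"{hops:>2} hops at 99% per hop: {share:.1%} of records clean")
```

Under these assumptions, a pipeline that looks 99% reliable at every individual step delivers roughly 86% clean records after 15 hops. Removing hops improves the result without making any single step better.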

Governance and compliance

Lineage gets longer. Access gets harder. Compliance becomes a forensic exercise rather than a built-in property.

The Data Tiles Approach

Stop moving. Start meshing.

To address these challenges, Data Tiles advocates a different posture toward data architecture, including the Medallion. Four shifts make the difference.

Hand-drawn central data mesh hub with four petals: streamline ETL, automation & AI, incremental & real-time, domain-owned products — **Fig 4.** Four shifts that replace the layered factory with a living mesh.

1. Streamline ETL — don't process for process' sake

Use smart storage rules to inform where data should sit, and in what technology. Avoid one-size-fits-all. And don't treat ELT as a fix; it just relocates the complexity downstream onto a different engine, without reducing the amount of physical data created.

ELT does not reduce complexity — it simply relocates it.
— Medium Engineering, 2024

2. Move toward a Data Mesh

Allow users to access and analyze data without dragging it through multiple layers, reducing the need for extensive ETL/ELT, while keeping governance, lineage and quality intact.

Most organizations misinterpret Data Mesh as a technology choice rather than an organizational model — which is why so many implementations fail.
— Gartner, 2024

Databricks themselves note that "the Medallion architecture is compatible with the concept of a data mesh, Bronze and Silver tables can be joined together in a one-to-many fashion." Worth saying clearly: compatibility does not make Databricks a Data Mesh platform. It remains an engineering-first tool, not a domain-oriented data product environment.
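One way to picture access without movement is a logical layer defined over the raw data instead of a second physical copy. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in engine; the table and view names (`bronze_orders`, `silver_orders`) and the cleaning rules are hypothetical, and this is not a description of how Data Tiles or Databricks implement anything.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE bronze_orders (order_id INTEGER, amount TEXT, country TEXT);
    INSERT INTO bronze_orders VALUES
        (1, '10.50', 'au'),
        (2, 'n/a',   'nz'),
        (3, '7.25',  'au');

    -- The "silver" layer is a definition over bronze, not a copied table.
    CREATE VIEW silver_orders AS
    SELECT order_id,
           CAST(amount AS REAL) AS amount,
           UPPER(country)       AS country
    FROM bronze_orders
    WHERE amount GLOB '[0-9]*';   -- keep only rows whose amount starts with a digit
    """
)


def revenue_by_country(connection: sqlite3.Connection) -> dict:
    """Aggregate directly from the view; no intermediate table is written."""
    rows = connection.execute(
        "SELECT country, SUM(amount) FROM silver_orders "
        "GROUP BY country ORDER BY country"
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    print(revenue_by_country(conn))
```

Because the view is evaluated at query time, a new row in `bronze_orders` is visible to consumers immediately, with no pipeline run in between. The trade-off is that the cleaning work is paid on every read, so heavily used or expensive transformations may still justify materializing a result.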

3. Automation and AI

Automating ETL/ELT processes, and using AI to manage pipelines, reduces manual overhead, catches drift early, and improves efficiency where humans currently fight fires.
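Catching drift early can start with something far simpler than AI. The following is a minimal rule-based sketch that compares an expected schema against what a source actually delivered. The `detect_schema_drift` function and the column names are hypothetical and exist only to illustrate the kind of check that automation can run on every load.

```python
from typing import Dict, List


def detect_schema_drift(expected: Dict[str, str], observed: Dict[str, str]) -> List[str]:
    """Return human-readable findings; an empty list means no drift."""
    findings = []
    # Columns the pipeline relies on: check they exist with the same type.
    for column, dtype in expected.items():
        if column not in observed:
            findings.append(f"missing column: {column}")
        elif observed[column] != dtype:
            findings.append(f"type changed: {column} {dtype} -> {observed[column]}")
    # Columns the source added that the pipeline does not know about.
    for column in observed:
        if column not in expected:
            findings.append(f"new column: {column}")
    return findings


if __name__ == "__main__":
    expected = {"order_id": "int", "amount": "float", "country": "str"}
    observed = {"order_id": "int", "amount": "str", "region": "str"}
    for finding in detect_schema_drift(expected, observed):
        print(finding)
```

Run before each load, a check like this turns a silent downstream failure into an explicit finding at the point where the source changed.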

4. Incremental and real-time processing

Process change, not the whole lake. Adopting incremental and real-time techniques cuts latency and gives the business, and AI, the live signal they actually need.

Real-time data availability is one of the top three enablers of successful AI programs.
— MIT Technology Review, 2024
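"Process change, not the whole lake" is commonly implemented with a high-water mark: remember the latest change already handled, and pick up only what is newer. The sketch below shows that idea in plain Python. The `Record` type, the `updated_at` field and the `select_changes` helper are assumptions made for illustration, not part of any specific product.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Record:
    key: str
    updated_at: datetime


def select_changes(
    records: Iterable[Record], watermark: datetime
) -> Tuple[List[Record], datetime]:
    """Return records changed after `watermark`, plus the advanced watermark."""
    changed = [r for r in records if r.updated_at > watermark]
    # If nothing changed, the watermark stays where it was.
    new_watermark = max((r.updated_at for r in changed), default=watermark)
    return changed, new_watermark


if __name__ == "__main__":
    rows = [
        Record("a", datetime(2024, 1, 1)),
        Record("b", datetime(2024, 1, 2)),
        Record("c", datetime(2024, 1, 3)),
    ]
    changed, watermark = select_changes(rows, datetime(2024, 1, 1))
    print([r.key for r in changed], watermark)
```

A second run with the returned watermark selects nothing, so unchanged data is never reprocessed. One caveat of this simple form: records that arrive late with a timestamp at or before the watermark are missed, so real systems usually add an overlap window or rely on change data capture from the source.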

The Path Forward

Balance — not more layers

By adopting these approaches, organizations regain the ability to organize and manage data effectively, especially in complex environments with diverse data types and large volumes.

The key is to balance the need for data processing with efficient architecture and technology choices.

Bronze, Silver, Gold isn't the problem. Treating it as the destination instead of a layout is. Mesh the data, automate the plumbing, process what changed, and the Medallion finally earns its name.


Cameron Price

CEO & Founder

Cameron writes on the architectural choices that quietly compound — like layering for layering's sake — and what it takes to turn Bronze, Silver, Gold from a slogan into a living mesh that actually serves the business.


References