Streamlining the Medallion Architecture:
Optimizing Data Transformation
The Challenge of Modern Data Architectures
In most data architectures, a single piece of data is physically moved and stored up to 15 times. There are many historical reasons for this, but the result is a solution that is cumbersome and complex to manage, with many points of failure across scripts, programs, ETL/ELT processes, scheduling, and dependencies. (McKinsey’s 2024 “State of Data Engineering” report notes that 60% of enterprise data engineering effort continues to be spent on data movement and integration, confirming how costly this pattern has become.)
The Data Warehouse Era
In the era of the data warehouse, multiple layers of data storage and separation were common, such as a raw/landing zone, staging zone, curated/processed zone, refined zone, consumption zone, and perhaps an archive zone.
The Data Lake Promise
Then came the Data Lake, which looked to simplify this architecture by offering more flexibility in how data is stored and accessed. Unfortunately, most customers followed the same pattern, deploying a raw zone, staging zone, curated zone, refined/trusted zone, discovery/sandbox zone, and perhaps an archive zone. (BARC Germany’s 2024 Data Management Report highlights that many organisations “recreated warehouse-style layering inside data lakes,” leading to even more complexity and data movement.)
This was no less complex to manage, and arguably more so, given the level of maturity and complexity of the tools involved.
The lakehouse architecture, which combines a data lake and a data warehouse, carried the same layering forward. Customers who adopted it this way found it impossible to achieve a return on investment or to support critical digital transformation requirements across their organizations. (MIT Technology Review Insights (2024) notes that the biggest blockers to AI and analytics adoption are slow data preparation cycles, fragmented storage, and excessive data duplication, all issues typical of multi-layered lakehouse environments.)
The Medallion Architecture Pattern
Within a Medallion Architecture, the industry is following a similar pattern, with a Bronze (Raw) Zone, a Silver (Refined) Zone, and a Gold (Aggregated) Zone. As an industry and as data practitioners, we have an opportunity to use this architecture to its full potential rather than repeat the patterns of the last 30 years. If we do repeat them, we will be very disappointed in the coming 2-5 years as we fail to deliver the data transformation activities that are so important for our organizations. (Databricks’ official definition of the Medallion Architecture describes it as a structured layering system intended to improve data quality, but does not claim it is a Data Mesh, reinforcing the need for thoughtful application rather than blind adoption.)
The Significant Challenge
The most significant challenge with such layered data architectures, including the Medallion Architecture, is the amount of ETL (Extract, Transform, Load) or ELT (Extract, Load, Transform) processing required to manage and move data between these layers. This challenge manifests in several ways:
Complexity
As data progresses through different layers, it often undergoes multiple transformations and checks, which can make the ETL/ELT processes complex and difficult to manage. (Gartner’s 2024 data engineering trends note that enterprises continue to struggle with “pipeline sprawl,” where even simple use cases create dozens of dependent transformations.)
Resource Intensive
Each transformation step consumes computational resources. With large volumes of data, this can become quite resource-intensive, requiring significant computing power and potentially increasing costs.
Latency
Multiple ETL/ELT stages can introduce latency. The time taken to process and move data from one layer to another can impact the timeliness of the data, which is particularly critical for real-time analytics.
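As a rough illustration of how this latency accumulates, the sketch below adds up the processing time and the scheduling wait of each hop. The stage names and minute values are hypothetical and chosen only to show the arithmetic; they are not measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    processing_minutes: float     # time the job itself runs
    schedule_wait_minutes: float  # average wait until the next scheduled run


def end_to_end_latency(stages: list[Stage]) -> float:
    """Total minutes from arrival in the first layer to availability in the last."""
    return sum(s.processing_minutes + s.schedule_wait_minutes for s in stages)


# Hypothetical three-hop pipeline (illustrative numbers only).
pipeline = [
    Stage("landing_to_bronze", 5, 30),
    Stage("bronze_to_silver", 20, 30),
    Stage("silver_to_gold", 15, 30),
]

total_latency = end_to_end_latency(pipeline)
print(f"End-to-end latency: {total_latency} minutes")
```

In this toy model, every hop that is removed takes both its processing time and its scheduling wait out of the total.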
Maintenance Overhead
Maintaining multiple ETL/ELT pipelines, especially in dynamic environments where data schemas and business requirements change frequently, can be challenging and labour-intensive.
Data Quality and Consistency
Ensuring data quality and consistency across multiple transformations and layers is challenging. Errors or inconsistencies introduced at any stage can propagate through the system. (BARC 2024 also warns that layered architectures magnify data quality issues by allowing errors to compound over multiple hops.)
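One simplified way to reason about this compounding is to assume that each hop independently leaves a fixed fraction of records intact. Under that assumption (which is mine, not taken from the BARC report), the fraction of records that are still correct after n hops is the per-hop rate raised to the power n. The function name and the rates used here are hypothetical.

```python
def clean_fraction_after_hops(per_hop_clean_rate: float, hops: int) -> float:
    """Fraction of records still correct after `hops` independent transformations.

    Assumes every hop independently preserves a record with probability
    `per_hop_clean_rate`; real pipelines will not behave this neatly.
    """
    if not 0.0 <= per_hop_clean_rate <= 1.0:
        raise ValueError("per_hop_clean_rate must be between 0 and 1")
    if hops < 0:
        raise ValueError("hops must not be negative")
    return per_hop_clean_rate ** hops


# Compare a short pipeline with a long one at the same per-hop quality.
for hops in (3, 15):
    fraction = clean_fraction_after_hops(0.99, hops)
    print(f"{hops} hops at 99% per hop: {fraction:.1%} of records still correct")
```

Even a small per-hop error rate adds up quickly once the number of hops grows, which is why reducing the hop count can matter as much as improving any single transformation.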
Governance and Compliance
As data moves through different layers, keeping track of its lineage, ensuring compliance with regulations, and managing access and security become more complex.
Data Tiles Approach to Efficiency
To address these challenges, Data Tiles advocates adopting new approaches that make data architectures, including the Medallion Architecture, more efficient. These include:
Streamlining ETL Processes
Simplifying and optimizing ETL/ELT processes to reduce complexity and resource consumption.
ELT (Extract, Load, Transform)
Don't process for processing's sake. Use smart storage rules to decide where data should be stored and in what technology. Do not use a one-size-fits-all approach. Avoid treating a move to an ELT model as the solution, as this just pushes the challenges downstream to a different technology without reducing the complexity or the amount of physical data that is created. (A 2024 Medium engineering analysis warns that “ELT does not reduce complexity; it simply relocates it,” which mirrors this argument.)
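To make the idea of storage rules more concrete, here is a minimal sketch of a rule-based placement function. The profile fields, thresholds, and storage labels are all hypothetical; the point is only that placement is decided per dataset from stated characteristics rather than by pushing everything through the same layers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetProfile:
    name: str
    reads_per_day: int
    needs_sub_second_queries: bool
    is_structured: bool


def choose_storage(profile: DatasetProfile) -> str:
    """Pick a storage target from simple, ordered rules (illustrative only)."""
    if profile.needs_sub_second_queries and profile.is_structured:
        return "warehouse_table"
    if profile.reads_per_day == 0:
        return "archive_object_storage"
    if profile.is_structured:
        return "lake_table_queried_in_place"
    return "raw_object_storage"


# Hypothetical datasets with different access patterns.
examples = [
    DatasetProfile("daily_sales", 500, True, True),
    DatasetProfile("clickstream_2019", 0, False, True),
    DatasetProfile("support_call_audio", 3, False, False),
]
for ds in examples:
    print(ds.name, "->", choose_storage(ds))
```

Keeping the rules in one small, ordered function makes the placement decision explicit and reviewable, which is the opposite of a one-size-fits-all default.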
Data Mesh
Move towards a data mesh architecture, allowing users to access and analyze data without moving it through multiple layers. This reduces the need for extensive ETL/ELT processes whilst maintaining governance, lineage, and quality. (Gartner (2024) cautions that most organisations misinterpret Data Mesh as a technology choice rather than an organizational model, highlighting the danger of implementing it incorrectly.) For example, Databricks describes the two as compatible: "The Medallion architecture is compatible with the concept of a data mesh, bronze and silver tables can be joined together in a one-to-many fashion" (Databricks public website 2023). However, I emphasize that compatibility does not make Databricks a Data Mesh platform. It remains an engineering-first tool rather than a domain-oriented data product environment.
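The sketch below illustrates only the "access without moving" part of this idea, not a data mesh as an organizational model. It uses Python's built-in sqlite3 module as a stand-in engine and defines the refined layer as a logical view over the raw table, so no second physical copy is created. The table name, view name, and cleaning rules are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Raw (bronze) data is stored once, exactly as it arrived.
conn.execute(
    "CREATE TABLE bronze_orders (order_id INTEGER, amount TEXT, country TEXT)"
)
conn.executemany(
    "INSERT INTO bronze_orders VALUES (?, ?, ?)",
    [(1, "10.50", "de"), (2, "n/a", "DE"), (3, "7.25", "fr")],
)

# The refined (silver) layer is a view: the cleaning logic runs at query time
# instead of writing a second physical copy of the data.
conn.execute(
    """
    CREATE VIEW silver_orders AS
    SELECT order_id,
           CAST(amount AS REAL) AS amount,
           UPPER(country)       AS country
    FROM bronze_orders
    WHERE amount GLOB '[0-9]*'   -- keep only rows whose amount starts with a digit
    """
)

rows = conn.execute(
    "SELECT order_id, amount, country FROM silver_orders ORDER BY order_id"
).fetchall()
print(rows)

# Only one physical table exists; the view adds no stored data.
table_count = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table'"
).fetchone()[0]
print("physical tables:", table_count)
```

The trade-off is that the transformation cost is paid on every read, so this pattern suits cases where avoiding duplication and staleness matters more than raw query speed.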
Automation and AI
Automating ETL/ELT processes and using AI to manage data pipelines can reduce manual overhead and improve efficiency.
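One small, concrete form of automation is deriving the run order of pipeline steps from their declared dependencies instead of maintaining a schedule by hand. The sketch below uses the standard library's graphlib module (Python 3.9+); the step names and the dependency map are hypothetical, and AI-driven pipeline management is outside its scope.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline steps mapped to the steps they depend on.
dependencies = {
    "bronze_orders": set(),
    "bronze_customers": set(),
    "silver_orders": {"bronze_orders"},
    "gold_revenue_by_customer": {"silver_orders", "bronze_customers"},
}


def derive_run_order(deps: dict[str, set[str]]) -> list[str]:
    """Return an execution order in which every step follows its dependencies.

    TopologicalSorter raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(deps).static_order())


run_order = derive_run_order(dependencies)
print(run_order)
```

When a step is added or removed, only the dependency map changes; the ordering is recomputed rather than edited manually.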
Incremental and Real-Time Processing
Incrementally processing data as it changes and adopting real-time data processing techniques can reduce latency and improve data timeliness. (MIT Technology Review (2024) identifies real-time data availability as one of the top three enablers of successful AI programmes.)
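A common way to process data incrementally is to keep a watermark, the timestamp of the latest change already handled, and pick up only records that changed after it. The sketch below is a minimal in-memory version under that assumption; the record layout, the `updated_at` field, and the function name are hypothetical.

```python
from datetime import datetime


def select_changed(records: list[dict], watermark: datetime) -> tuple[list[dict], datetime]:
    """Return the records changed after `watermark` and the advanced watermark."""
    changed = [r for r in records if r["updated_at"] > watermark]
    # If nothing changed, the watermark stays where it was.
    new_watermark = max((r["updated_at"] for r in changed), default=watermark)
    return changed, new_watermark


# Hypothetical source records with a last-modified timestamp.
source = [
    {"id": 1, "updated_at": datetime(2024, 1, 1, 9, 0)},
    {"id": 2, "updated_at": datetime(2024, 1, 1, 10, 0)},
    {"id": 3, "updated_at": datetime(2024, 1, 1, 11, 0)},
]

watermark = datetime(2024, 1, 1, 9, 30)  # everything up to 09:30 was already processed
changed, watermark = select_changed(source, watermark)
print([r["id"] for r in changed], watermark)
```

Running the same function again with the advanced watermark returns no records until the source changes, so each run touches only new work instead of reprocessing the full dataset.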
The Path Forward
By adopting such approaches, organizations gain the ability to organise and manage data effectively, especially in complex environments with diverse data types and large volumes. The key is to balance the need for data processing with efficient architecture and technology choices.
Join a Data Conversation,
Cameron Price.
References: 
BARC Germany (2024). Data Management Survey 2024: Architecture Trends, Data Quality, and Engineering Maturity. BARC Research.
Databricks (2023). Medallion Architecture – Bronze / Silver / Gold Design Pattern. Databricks Public Documentation. Available at: databricks.com.
Gartner (2024). Top Trends in Data Engineering and Data Management. Gartner Research.
McKinsey & Company (2024). The State of Data Engineering 2024: Why Data Movement Still Dominates Enterprise Workflows. McKinsey Digital.
Medium (2024). The Limitations of ELT: Why “Load First” Doesn't Remove Complexity. Medium Engineering Article.
MIT Technology Review Insights (2024). AI Readiness: Why Data Fragmentation and Slow Preparation Cycles Remain the Top Barriers to Enterprise AI.
Medium (2023). De Seta, N. Creating a Mesh with Medallion: Why Architecture and Operating Models Are Not the Same. Medium Data Publication.
Adevinta Tech Blog (2024). From Lakehouse Architecture to Data Mesh: What Organisations Get Wrong. Adevinta Engineering.