The Medallion Architecture is a critical concept I’ve applied once again in a recent project: a standard design pattern that helps logically structure and progressively refine data within a Lakehouse environment.
The Medallion Architecture comprises three distinct layers: Bronze, Silver, and Gold. Each layer represents a step in the data refinement process, with each subsequent layer storing data of higher quality and readiness for analysis.
1️⃣ Bronze Layer: Raw Data Storage
The Bronze layer serves as the landing zone for raw, unprocessed data. This layer ingests data directly from various batch and streaming sources without imposing any structure. The primary focus here is on data ingestion and storage, ensuring that all incoming data is preserved in its original format for further processing.
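As a rough illustration of the Bronze idea, here is a minimal Python sketch. It is not tied to any particular Lakehouse platform, and the function name `ingest_to_bronze`, the field names, and the sample order records are all hypothetical. The point it tries to show is that the payload is stored untouched, with only ingestion metadata added alongside it.

```python
from datetime import datetime, timezone

def ingest_to_bronze(raw_lines, source):
    """Store each raw payload unchanged, adding only ingestion metadata."""
    ingested_at = datetime.now(timezone.utc).isoformat()
    return [
        {"raw_payload": line, "source": source, "ingested_at": ingested_at}
        for line in raw_lines
    ]

# Hypothetical incoming records; note that nothing is validated or parsed here.
raw_lines = [
    '{"order_id": "1", "amount": "19.99", "country": "de"}',
    '{"order_id": "2", "amount": "n/a", "country": "US "}',
    'not even json',
]

bronze = ingest_to_bronze(raw_lines, source="orders_api")
for row in bronze:
    print(row["source"], row["raw_payload"])
```

Even the malformed line is kept, so it can be inspected or reprocessed later if the parsing rules change.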
2️⃣ Silver Layer: Cleaned and Conformed Data
The Silver layer is where the data undergoes significant transformation. In this layer, the raw data from the Bronze layer is cleaned, filtered, and organized into a more structured format. Table structures and column transformations are applied to ensure consistency, quality, and conformance to business rules. The goal here is to create datasets that are ready for broader analytical use.
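A Silver step could look roughly like the sketch below, again in plain Python with hypothetical names (`bronze_to_silver`, the `raw_payload` field, and the sample orders are assumptions made for illustration). It parses the raw payloads, casts columns to proper types, normalizes values, and drops malformed or duplicate records; the specific rules are examples, not a prescribed set.

```python
import json

def bronze_to_silver(bronze_rows):
    """Parse, type, normalize, and deduplicate raw Bronze records."""
    silver, seen_ids = [], set()
    for row in bronze_rows:
        try:
            record = json.loads(row["raw_payload"])
            order_id = int(record["order_id"])
            amount = float(record["amount"])
            country = record["country"].strip().upper()
        except (ValueError, KeyError, TypeError, AttributeError):
            continue  # malformed record: skipped here, but still kept in Bronze
        if order_id in seen_ids or amount < 0:
            continue  # example business rules: no duplicates, no negative amounts
        seen_ids.add(order_id)
        silver.append({"order_id": order_id, "amount": amount, "country": country})
    return silver

# Hypothetical Bronze rows, including a duplicate and two malformed payloads.
bronze_rows = [
    {"raw_payload": '{"order_id": "1", "amount": "19.99", "country": "de"}'},
    {"raw_payload": '{"order_id": "1", "amount": "19.99", "country": "de"}'},
    {"raw_payload": '{"order_id": "2", "amount": "n/a", "country": "US "}'},
    {"raw_payload": 'not even json'},
    {"raw_payload": '{"order_id": "3", "amount": "5.00", "country": " us"}'},
]

silver = bronze_to_silver(bronze_rows)
for row in silver:
    print(row)
```

In a real pipeline the rejected rows would usually be routed to a quarantine table rather than silently skipped; the sketch omits that for brevity.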
3️⃣ Gold Layer: Curated Business-Level Data
The Gold layer contains high-quality, curated datasets that are ready for business intelligence and advanced analytics. Data in the Gold layer is optimized for reporting and decision-making processes, providing valuable insights that drive business strategy.
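To make the Gold step concrete, the sketch below aggregates cleaned records into a small reporting table. The function name `silver_to_gold`, the metric names, and the sample rows are hypothetical; revenue per country simply stands in for whatever business-level view a project actually needs.

```python
from collections import defaultdict

def silver_to_gold(silver_rows):
    """Aggregate cleaned orders into one reporting row per country."""
    totals = defaultdict(lambda: {"order_count": 0, "total_revenue": 0.0})
    for row in silver_rows:
        bucket = totals[row["country"]]
        bucket["order_count"] += 1
        bucket["total_revenue"] += row["amount"]
    return [
        {
            "country": country,
            "order_count": totals[country]["order_count"],
            "total_revenue": round(totals[country]["total_revenue"], 2),
        }
        for country in sorted(totals)
    ]

# Hypothetical Silver rows: already typed, normalized, and deduplicated.
silver_rows = [
    {"order_id": 1, "amount": 19.99, "country": "DE"},
    {"order_id": 3, "amount": 5.0, "country": "US"},
    {"order_id": 4, "amount": 10.5, "country": "US"},
]

gold = silver_to_gold(silver_rows)
for row in gold:
    print(row)
```

Because the Silver rows are already clean, the Gold logic stays short and contains only business aggregation, with no parsing or validation.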
Structuring data this way ensures a clear, logical progression from raw data to actionable insights, and makes data management and analytics work more efficient because each layer has a single, well-defined responsibility.
Image by Eli Ugbomeh

