TOP FIVE DIFFERENCES BETWEEN DATA LAKES AND DATA WAREHOUSES
Data Warehouse

Wikipedia defines data warehouses as: “…central repositories of integrated data from one or more disparate sources. They store current and historical data and are used for creating trending reports for senior management reporting such as annual and quarterly comparisons.” This is a very high-level definition. It describes the purpose of a data warehouse but doesn’t explain how that purpose is achieved. I would add that a data warehouse has the following properties:

- It represents an abstracted picture of the business, organized by subject area.
- It is highly transformed and structured.
- Data is not loaded into the data warehouse until the use for it has been defined.
- It generally follows a methodology such as those defined by Ralph Kimball and Bill Inmon.

Data Lake

Pentaho CTO James Dixon has generally been credited with coining the term “data lake”. He describes a data mart (a subset of a data warehouse) as ak...