Posts

Showing posts from July, 2019

TOP FIVE DIFFERENCES BETWEEN DATA LAKES AND DATA WAREHOUSES

Data Warehouse

Wikipedia defines data warehouses as: “…central repositories of integrated data from one or more disparate sources. They store current and historical data and are used for creating trending reports for senior management reporting such as annual and quarterly comparisons.” This is a very high-level definition that describes the purpose of a data warehouse but doesn’t explain how that purpose is achieved. I would add that a data warehouse has the following properties:

It represents an abstracted picture of the business, organized by subject area.
It is highly transformed and structured.
Data is not loaded into the data warehouse until a use for it has been defined.
It generally follows a methodology such as those defined by Ralph Kimball and Bill Inmon.

Data Lake

Pentaho CTO James Dixon has generally been credited with coining the term “data lake”. He describes a data mart (a subset of a data warehouse) as ak...