Step 2 in Data Science - Prepare



We divide data preparation into two steps based on the nature of the activity.
The first step involves literally looking at the data to understand its nature, what it means, its quality, and its format.
It often takes a preliminary analysis of the data, or of samples of it, to understand this.
This is why this first step is called Explore.
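
To make this first look at the data concrete, here is a minimal sketch in Python using only the standard library. The sample data, the column names, and the `profile` helper are all made up for illustration; a real project would point this at its own file or query result.

```python
import csv
import io
from collections import Counter

# Hypothetical sample; in practice this would come from a file or a query.
SAMPLE_CSV = """id,age,city
1,34,Austin
2,,Boston
3,29,
4,abc,Austin
"""

def profile(csv_text):
    """Return the row count plus per-column counts of missing and non-numeric values."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    missing = Counter()
    non_numeric = Counter()
    for row in rows:
        for column, value in row.items():
            if value is None or value.strip() == "":
                missing[column] += 1
            else:
                try:
                    float(value)
                except ValueError:
                    non_numeric[column] += 1
    return {
        "rows": len(rows),
        "missing": dict(missing),
        "non_numeric": dict(non_numeric),
    }

summary = profile(SAMPLE_CSV)
print(summary)
```

Even a rough summary like this tells us which columns have gaps and which ones hold values that do not match the type we expected, which is exactly what we need to know before pre-processing.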

Once we know more about the data through exploratory analysis, the next step is pre-processing the data for analysis.
This includes cleaning the data, subsetting or filtering it, and putting it into a form that programs can read and understand, either by modeling the raw data into a more defined data model or by packaging it in a specific data format.
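
As a rough illustration of cleaning, filtering, and packaging, the sketch below works on a few invented records. The field names, the rule of dropping records with an unusable age, the age threshold of 18, and the choice of JSON as the output format are all assumptions made for the example.

```python
import json

# Hypothetical raw records, with everything still stored as text.
raw_records = [
    {"id": "1", "age": "34", "city": " Austin "},
    {"id": "2", "age": "", "city": "Boston"},
    {"id": "3", "age": "17", "city": "austin"},
    {"id": "4", "age": "abc", "city": "Austin"},
]

def clean(record):
    """Return a typed record, or None if the age is missing or not an integer."""
    try:
        age = int(record["age"])
    except ValueError:
        return None
    return {
        "id": int(record["id"]),
        "age": age,
        "city": record["city"].strip().title(),
    }

# Cleaning: convert types, normalize city names, drop unusable records.
cleaned = [r for r in map(clean, raw_records) if r is not None]

# Filtering: keep only the subset relevant to the analysis.
adults = [r for r in cleaned if r["age"] >= 18]

# Packaging: serialize into a specific format that other programs can read.
packaged = json.dumps(adults)
print(packaged)
```

Whether a bad record should be dropped, repaired, or flagged depends on the analysis, so treat the rule used here as one option among several.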

If there are multiple datasets involved, this step also includes integrating data from the different data sources or streams.
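
One common way to integrate two datasets is to join them on a shared key. The sketch below assumes two invented sources, `customers` and `orders`, linked by a `customer_id` field; the `integrate` helper is hypothetical and keeps every order even when no matching customer exists.

```python
# Hypothetical data from two different sources.
customers = [
    {"customer_id": 1, "name": "Ana"},
    {"customer_id": 2, "name": "Ben"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "total": 25.0},
    {"order_id": 11, "customer_id": 3, "total": 40.0},
    {"order_id": 12, "customer_id": 1, "total": 5.0},
]

def integrate(orders, customers):
    """Attach the customer name to each order; use None when there is no match."""
    by_id = {c["customer_id"]: c for c in customers}
    merged = []
    for order in orders:
        customer = by_id.get(order["customer_id"])
        merged.append({**order, "name": customer["name"] if customer else None})
    return merged

merged = integrate(orders, customers)
for row in merged:
    print(row)
```

Orders that end up with no name point to a mismatch between the sources, which is worth investigating before the analysis begins.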


