Step 2 in Data Science - Prepare
We divide data preparation into two steps based on the nature of the activity. The first step involves literally looking at the data to understand its nature, what it means, its quality, and its format. It often takes a preliminary analysis of the data, or of samples of the data, to understand this. This is why this step is called explore. Once we know more about the data through exploratory analysis, the next step is pre-processing the data for analysis. Pre-processing includes cleaning data, subsetting or filtering data, and creating data that programs can read and understand, either by modeling raw data into a more defined data model or by packaging it in a specific data format. If multiple datasets are involved, this step also includes integrating data from different data sources or streams.
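To make the two steps concrete, here is a minimal sketch in Python using only the standard library. Everything in it is an assumption made for illustration: the two small CSV sources, the column names (`station_id`, `temp_c`, `city`), and the helper functions `load_rows`, `explore`, and `preprocess` are hypothetical and do not come from any particular dataset or tool. It first explores the raw rows to learn their shape and quality, then pre-processes them by cleaning, filtering, integrating the two sources, and packaging the result as JSON.

```python
import csv
import io
import json

# Hypothetical raw data: two small CSV sources invented for this example.
READINGS_CSV = """station_id,temp_c
A1,21.5
A2,
A1,22.1
A3,not_available
A2,19.0
"""

STATIONS_CSV = """station_id,city
A1,Oslo
A2,Bergen
"""


def load_rows(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def explore(rows):
    """Explore: report row count, column names, and unusable temperature values."""
    unusable = 0
    for row in rows:
        try:
            float(row["temp_c"])
        except ValueError:
            unusable += 1  # empty or non-numeric value
    return {
        "rows": len(rows),
        "columns": list(rows[0].keys()),
        "unusable_temp": unusable,
    }


def preprocess(readings, stations):
    """Pre-process: clean, filter, and integrate readings with station data."""
    city_by_station = {s["station_id"]: s["city"] for s in stations}
    records = []
    for row in readings:
        try:
            temp = float(row["temp_c"])
        except ValueError:
            continue  # cleaning: drop rows whose temperature cannot be parsed
        city = city_by_station.get(row["station_id"])
        if city is None:
            continue  # filtering: keep only readings from known stations
        # integration: combine fields from both sources into one record
        records.append(
            {"station_id": row["station_id"], "city": city, "temp_c": temp}
        )
    return records


readings = load_rows(READINGS_CSV)
stations = load_rows(STATIONS_CSV)

summary = explore(readings)              # step 1: explore
records = preprocess(readings, stations)  # step 2: pre-process
packaged = json.dumps(records, indent=2)  # package in a specific data format

print(summary)
print(packaged)
```

In this sketch the explore step only reads and summarizes the data, while every change to it happens in the pre-process step. Real projects would typically use richer tooling and larger samples, but the division of work is the same.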