Data Pre-processing and Post-processing

Course Objectives:

  1. Build an understanding of the fundamental concepts of data science and of the importance of data collection and pre-processing tasks.
  2. Familiarize the student with various exploratory data analytics techniques.
  3. Introduce the student to model development and evaluation techniques.
  4. Enable the student to learn model evaluation and generalization error techniques.

Course Outcomes (CO)

  • CO1 Understand the fundamental concepts of data science and the importance of data collection and pre-processing tasks.
  • CO2 Explain various exploratory data analytics techniques.
  • CO3 Understand various model development and evaluation techniques.
  • CO4 Apply mechanisms for model evaluation and generalization error techniques.

UNIT-I

Introduction to Data Science – Evolution of Data Science – Data Science Roles – Stages in a Data Science Project – Applications of Data Science in various fields – Data Security Issues.

Data Collection and Data Pre-Processing: Data Collection Strategies – Data Pre-Processing Overview – Data Cleaning – Data Integration and Transformation – Data Reduction – Data Discretization.
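Three of the Unit I pre-processing tasks listed above can be illustrated with a small sketch. It assumes a single numeric column held in a plain Python list, with None marking missing values, and uses only the standard library rather than a data-frame library; the function names and the sample data are hypothetical. It shows cleaning (mean imputation), transformation (min-max normalization to [0, 1]) and discretization (equal-width binning).

```python
from statistics import mean


def fill_missing_with_mean(values):
    """Data cleaning: replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_normalize(values):
    """Data transformation: rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: nothing to rescale
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bins(values, k):
    """Data discretization: label each value with one of k equal-width bins (0..k-1)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        return [0 for _ in values]
    # The maximum value would fall into bin k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


# Hypothetical "age" column with two missing entries.
ages = [23, None, 45, 31, None, 60, 38]
cleaned = fill_missing_with_mean(ages)
scaled = min_max_normalize(cleaned)
bins = equal_width_bins(cleaned, 3)
print(cleaned)
print(scaled)
print(bins)
```

Mean imputation is only one cleaning strategy; median imputation or dropping incomplete rows are common alternatives, and the right choice depends on the data.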

UNIT-II

Exploratory Data Analytics: Descriptive Statistics – Mean, Standard Deviation, Skewness and Kurtosis – Box Plots – Pivot Table – Heat Map – Correlation Statistics – ANOVA.
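Several of the Unit II statistics can be computed directly, as in the minimal sketch below (standard library only; helper names and sample data are hypothetical). Skewness and kurtosis use the plain moment definitions g1 = m3 / m2^1.5 and excess kurtosis g2 = m4 / m2^2 - 3, where mk is the k-th central moment; statistical packages often apply small-sample corrections, so their values can differ slightly. The sketch also computes Pearson's correlation coefficient and the one-way ANOVA F statistic (between-group mean square divided by within-group mean square).

```python
from math import sqrt
from statistics import mean, pstdev


def central_moment(xs, k):
    """k-th central moment: average of (x - mean)**k."""
    m = mean(xs)
    return sum((x - m) ** k for x in xs) / len(xs)


def skewness(xs):
    """Moment-based skewness g1 = m3 / m2**1.5 (no small-sample correction)."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5


def excess_kurtosis(xs):
    """Moment-based excess kurtosis g2 = m4 / m2**2 - 3 (0 for a normal distribution)."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def one_way_anova_f(groups):
    """One-way ANOVA F = (SS_between / (k - 1)) / (SS_within / (N - k))."""
    all_values = [v for g in groups for v in g]
    grand = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
print(mean(scores), pstdev(scores), skewness(scores), excess_kurtosis(scores))
print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))
print(one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

A positive skewness here indicates a longer right tail, and a negative excess kurtosis indicates lighter tails than a normal distribution.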

UNIT-III

Model Development: Simple and Multiple Regression – Model Evaluation using Visualization – Residual Plot – Distribution Plot – Polynomial Regression and Pipelines – Measures for In-sample Evaluation – Prediction and Decision Making.
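Simple regression and in-sample evaluation from Unit III can be sketched with ordinary least squares for a single predictor. This is an illustration under stated assumptions (standard library only, hypothetical data and function names), not the course's reference code: it fits y = b0 + b1*x, returns the residuals that a residual plot would display, and reports two in-sample measures, mean squared error (MSE) and R^2.

```python
from statistics import mean


def fit_simple_linear(xs, ys):
    """Ordinary least squares for y = b0 + b1*x; returns (b0, b1)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx
    return my - b1 * mx, b1


def in_sample_evaluation(xs, ys, b0, b1):
    """Residuals, MSE and R^2, all computed on the same data used for fitting."""
    preds = [b0 + b1 * x for x in xs]
    residuals = [y - p for y, p in zip(ys, preds)]  # the values a residual plot shows
    sse = sum(r ** 2 for r in residuals)            # sum of squared errors
    my = mean(ys)
    sst = sum((y - my) ** 2 for y in ys)            # total sum of squares
    return residuals, sse / len(ys), 1 - sse / sst


# Hypothetical data, roughly linear with small deviations.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]
b0, b1 = fit_simple_linear(xs, ys)
residuals, mse, r2 = in_sample_evaluation(xs, ys, b0, b1)
print(f"y = {b0:.2f} + {b1:.2f}x, MSE = {mse:.4f}, R^2 = {r2:.4f}")
print(residuals)
```

Because these measures reuse the training data, they tend to be optimistic about how the model will do on new data, which is the motivation for the out-of-sample techniques in Unit IV.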

UNIT-IV

Model Evaluation: Generalization Error – Out-of-Sample Evaluation Metrics – Cross Validation – Overfitting – Underfitting and Model Selection – Prediction Using Ridge Regression – Testing Multiple Parameters Using Grid Search.
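The Unit IV ideas fit together in one small sketch: ridge regression with a single feature, k-fold cross-validation to estimate out-of-sample error, and a grid search that keeps the penalty alpha with the lowest cross-validated MSE. It is written with the standard library only (libraries such as scikit-learn provide ready-made equivalents), uses contiguous unshuffled folds for simplicity, and the data, grid values and function names are hypothetical. The ridge fit leaves the intercept unpenalized, which gives the closed form b1 = Sxy / (Sxx + alpha).

```python
from statistics import mean


def fit_ridge_1d(xs, ys, alpha):
    """One-feature ridge regression: minimizes sum((y - b0 - b1*x)**2) + alpha * b1**2.

    The intercept b0 is not penalized, giving b1 = Sxy / (Sxx + alpha)
    and b0 = mean(y) - b1 * mean(x).
    """
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / (sxx + alpha)
    return my - b1 * mx, b1


def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by at most one."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cv_mse(xs, ys, alpha, k):
    """Average held-out MSE over k folds (train on k-1 folds, test on the remaining one)."""
    fold_scores = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        b0, b1 = fit_ridge_1d(train_x, train_y, alpha)
        errors = [(ys[i] - (b0 + b1 * xs[i])) ** 2 for i in test_idx]
        fold_scores.append(mean(errors))
    return mean(fold_scores)


def grid_search_alpha(xs, ys, alphas, k=5):
    """Evaluate every alpha in the grid; return the best one and all CV scores."""
    scores = {alpha: cv_mse(xs, ys, alpha, k) for alpha in alphas}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical data: y = 2x + 1 plus a small fixed perturbation.
xs = list(range(10))
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3, -0.3, 0.1]
ys = [2 * x + 1 + e for x, e in zip(xs, noise)]

best_alpha, cv_scores = grid_search_alpha(xs, ys, [0.0, 0.1, 1.0, 10.0, 100.0], k=5)
print(best_alpha, cv_scores)
```

A very large alpha shrinks the slope toward zero and underfits, while alpha = 0 reduces to ordinary least squares; selecting alpha by held-out error rather than training error is what guards against choosing an overfitted model.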

Textbook(s):

  1. Daniel T. Larose; Chantal D. Larose, "Data Preprocessing," in Discovering Knowledge in Data: An Introduction to Data Mining, Wiley, 2014, pp.16-50, doi: 10.1002/9781118874059.ch2.

References:

  1. Cathy O'Neil and Rachel Schutt, "Doing Data Science," O'Reilly, 2015.
  2. Uma N. Dulhare, Khaleel Ahmad, and Khairol Amali Bin Ahmad, "Machine Learning and Big Data: Concepts, Algorithms, Tools, and Applications," first published 15 July 2020.
