## Python with Data Science: Syllabus

### Data Science using Python

- Difference between Data Analytics and Data Science
- Why and how Python is used in Data Analytics and Data Science
- What is Cubes, the OLAP framework for Python?
- What are Data Warehousing and DBMS?
- Data Analytics vs. Data warehousing, OLAP, Extract Transform Load (ETL), MIS Reporting
- What are the problems and business objectives in different industries?
- How leading companies are using the power of analytics?
- Critical success factors for AI
- What are the analytics tools & their popularity?
- What is the analytics methodology and its framework?
- What are Analytics projects?

### Python Core Concept

- What is Python
- Introduction to installation of Python
- Overview of Python editors and IDEs (Rodeo, Canopy, Jupyter, PyCharm, Python)
- How to work in Jupyter Notebook and customize its settings
- Basic Syntax, Data Types & Data objects/structures (strings, Tuples, Lists, Dictionaries)
- Conditional Statements (if, elif, nested if/else)
- Variable & Labels – Date & Time Values
- Looping with Python (for loop, while loop, nested loops)
- Control Statements in Python (break, continue, pass)
- String Manipulation in Python: basic operators, methods, functions, etc.
- Function and Modules (Importing, Installation, Packages with Version handling, Composition)
- Input and Output Module in Python
- Exception Handling in Python
- OOPs Concept in Python
- Classes, Inheritance, Overloading, Overriding, Data Hiding in Python.
- Regular Expressions in Python: match, search, modifiers, patterns
- CGI Concepts in Python: GET and POST methods, CGI environment variables, cookies, file upload
- Database Handling in Python: connections, transactions, execution, and error handling
- Networking: sockets, methods, Internet modules
- Multithreading in Python: threads, multithreaded programs
- GUI Programming: – Tkinter and Widgets
- Important Packages of Python: NumPy, SciPy, Seaborn, scikit-learn, Pandas, Matplotlib, etc.
- Reading and writing data
- Simple Graph plotting
- Debugging & Code profiling

### Scientific Distribution

- NumPy, SciPy, Seaborn, Pandas, scikit-learn, statsmodels, NLTK, Matplotlib, etc.

### How to Import and Export Data by using python modules

- How to Import Data from different sources?
- Connection setup with database
- Viewing Data objects: subsetting, methods
- Export Data to different formats
- Most important Python modules: Pandas, Beautiful Soup
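
The import, view, and export steps listed above can be sketched with Pandas. This is a minimal illustration, not a prescribed workflow: the CSV text, column names, and values are hypothetical stand-ins for a real file or database extract.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for a file or database extract.
csv_text = """id,city,sales
1,Pune,120
2,Delhi,95
3,Pune,150
"""

# Import: read_csv accepts a path, a URL, or a file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# View and subset: show the first rows, then filter on one column.
print(df.head())
pune = df[df["city"] == "Pune"]

# Export: serialize the subset as CSV text and as JSON records.
csv_out = pune.to_csv(index=False)
json_out = pune.to_json(orient="records")
print(json_out)
```

Passing a file path instead of `io.StringIO(...)` to `read_csv`, or a path to `to_csv`, reads from and writes to disk in the same way.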

### Data Manipulation in Python and its Techniques

- What is data manipulation in Python?
- What are important Modules to Manipulate Data?
- Cleaning and Prepping Data using Python.
- Detecting Missing Values using python during cleaning.
- What are Map and Data Library?
- Data manipulation steps: sorting, filtering, merging, removing duplicates, appending, subsetting, sampling, derived variables, data type conversions, renaming, formatting, etc.
- Data manipulation tools: functions, operators, packages, control structures, loops, arrays, methods, etc.
- Python Built-in Functions
- Python User Defined Functions and Classes.
- How to strip out extraneous information
- Normalizing and Formatting data
- Important Python modules for data.
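
Several of the manipulation steps above (detecting missing values, removing duplicates, type conversion, renaming, derived variables, merging, filtering, sorting) can be sketched in one short Pandas pipeline. The tables, column names, and the choice to fill missing amounts with zero are assumptions made only for illustration.

```python
import pandas as pd

# Hypothetical order and customer tables.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 3, 4],
    "cust_id": [10, 11, 10, 10, 12],
    "amount": [250.0, None, 100.0, 100.0, 75.0],
})
customers = pd.DataFrame({
    "cust_id": [10, 11, 12],
    "name": ["Asha", "Ben", "Chen"],
})

# Detect missing values per column.
missing = orders.isna().sum()

clean = (
    orders.drop_duplicates()                 # remove exact duplicate rows
    .fillna({"amount": 0.0})                 # impute missing amounts (illustrative choice)
    .astype({"order_id": "int64"})           # explicit data type conversion
    .rename(columns={"amount": "order_amount"})
)

# Derived variable, then merge, filter, and sort.
clean["is_large"] = clean["order_amount"] > 90
merged = clean.merge(customers, on="cust_id", how="left")
result = merged[merged["order_amount"] > 0].sort_values(
    "order_amount", ascending=False
)
print(result)
```

Whether to fill, drop, or model missing values depends on the data; zero-filling is used here only to keep the sketch short.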

### Perform Data Visualization in Python

- What is EDA (Exploratory Data Analysis) in Python?
- Descriptive statistics, Frequency Tables and summarizing Data.
- Univariate and Multivariate Analysis (Data Distribution & Graphical Analysis)
- Bivariate Analysis in Statistics (Distributions & Relationships, Cross Tabs, Graphical Analysis)
- How to plot graphs: bar, pie, line chart, histogram, boxplot, scatter, density…
- Important Packages for Exploratory Analysis (NumPy arrays, SciPy, Matplotlib, Seaborn, Pandas, etc.)

### Statistics using Python

- What is statistics in Python? How to calculate mean, median, and mode (measures of central tendency), and R-squared or adjusted R-squared using Python
- Probability distributions in Python: the normal distribution and the Central Limit Theorem
- Statistical concepts and hypothesis testing
- What are the KS test and the Z test? How to calculate a p-value from a distribution
- What are ANOVA, correlation, and chi-square?
- Important modules used for statistical methods: NumPy, SciPy, Pandas
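
As a small illustration of the central-tendency and Z-test topics above, the sketch below uses only Python's standard `statistics` module. The sample values, the hypothesized mean `mu0`, and the assumption that the population standard deviation `sigma` is known are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, median, mode

# Hypothetical sample of ten measurements.
sample = [52, 48, 50, 55, 50, 47, 53, 50, 49, 56]

print("mean:", mean(sample))
print("median:", median(sample))
print("mode:", mode(sample))

# One-sample, two-sided Z test.
# H0: the population mean equals mu0; sigma is assumed known.
mu0, sigma = 48, 3
z = (mean(sample) - mu0) / (sigma / sqrt(len(sample)))

# p-value: probability under the standard normal of a value at least as extreme as z.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
print("z:", round(z, 3), "p-value:", round(p_value, 5))
```

When `sigma` is not known, a t-test (for example from SciPy) is the usual substitute; the Z test is shown here because the syllabus names it.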

### Predictive Analysis using Python Framework

- What is a model in analytics and how is it used?
- What are the best algorithms for prediction?
- What is Data Modeling in Python?
- Common Techniques used for analytics & modeling process
- Most Popular modeling algorithms
- Phases of Predictive Modeling

### Data Exploration

- What is Data Exploration and why is it important?
- EDA methods and how they are used in Machine Learning
- Common EDA framework for exploring the data and identifying problems with the data.
- How to identify missing data?
- How to identify outliers?
- Method to visualize the data trends and patterns.
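
One common way to approach the missing-data and outlier questions above is to count nulls per column and flag values outside the interquartile-range (IQR) fences. The sketch below assumes Pandas; the data, the helper name `iqr_outliers`, and the multiplier `k=1.5` are illustrative choices, and other outlier rules are equally valid.

```python
import pandas as pd

# Hypothetical dataset with one missing value per column and one extreme age.
df = pd.DataFrame({
    "age": [23, 25, 24, None, 26, 25, 90],
    "income": [30, 32, None, 31, 33, 29, 35],
})

# Missing data: count nulls in each column.
missing_counts = df.isna().sum()
print(missing_counts)


def iqr_outliers(series, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    s = series.dropna()
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return s[(s < q1 - k * iqr) | (s > q3 + k * iqr)]


age_outliers = iqr_outliers(df["age"])
print(age_outliers)
```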

### Data Preparation

- Data Preparation for Machine Learning.
- Consolidation and Aggregation, Missing Value Treatment, Dummy Creation, Variable Reduction, Outlier Treatment.
- Variable Reduction, Data Reduction Techniques, Principal Component Analysis, Improve Accuracy.

### Data Exploration for Modeling

- Need for structured exploratory data
- EDA framework for exploring the data and identifying any problems with the data (Data Audit Report)
- Identify missing data
- Identify outliers
- Visualize the data trends and patterns

### Segmentation in Machine Learning

- What is Segmentation?
- Edge Based Segmentation, Image Based Segmentation or Region Based Segmentation.
- Different types of Segmentation (Subjective vs Objective, Heuristic vs. Statistical)
- Heuristic Segmentation Techniques.
- Behavioral Segmentation Techniques (K-Means Cluster Analysis)
- Cluster evaluation and profiling
- Implementation on new data

### Linear Regression using Python

- Introduction & Assumptions on Linear Regression
- How to build Linear Regression Model
- Standard metrics (R-Square/Adjusted R-Square, Global hypothesis, Variable significance, etc.)
- Framework to assess the overall effectiveness of the model
- Statistical Model Validation (Re running Vs. Scoring)
- Business Outputs (Error distribution (histogram), Decile Analysis, Model equation, drivers etc.)
- Interpretation of the Results, Business Validation and New Data Implementation
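
To make the model-building and R-square items above concrete, here is a minimal ordinary-least-squares sketch using NumPy. The data points are hypothetical, and the course may use a different library (for example statsmodels or scikit-learn) for the same task.

```python
import numpy as np

# Hypothetical data: one predictor x and a response y.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.1, 12.9])

# Design matrix with an intercept column; solve the least-squares problem.
X = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = coef

# R-square: share of the variance in y explained by the fitted line.
y_hat = X @ coef
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

# Adjusted R-square penalizes the number of predictors p.
n, p = len(y), 1
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(f"y = {intercept:.3f} + {slope:.3f} * x")
print(f"R2 = {r2:.4f}, adjusted R2 = {adj_r2:.4f}")
```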

### Logistic Regression for Machine Learning

- What is Logistic Regression?
- Difference between Linear Regression, Generalized Linear Models, and Logistic Regression.
- Build Logistic Regression Model (Binary Logistic Model)
- Understand standard model metrics (Variable significance, Concordance, Gini, KS, Misclassification, Hosmer-Lemeshow Test, ROC Curve, etc.)
- Validation of Logistic Regression Models.
- Standard Business Outputs (Lift charts, Model equation, Decile Analysis, ROC Curve, Probability Cut-offs, Drivers or variable importance, etc)
- Interpretation of the Results, Business Validation and New Data Implementation

### Time Series Forecasting

- Intro About Time Series Forecasting
- Components in Time Series – Seasonality, Trend, Cyclicity, Systematic, Level and Decomposition
- Classification of Techniques (Pattern based or Pattern less)
- Basic Techniques for Forecasting: Averages, Smoothing, etc.
- Advanced Techniques: AR Models, Holt-Winters, Holt's Linear, ARIMA, etc.
- Measure Forecasting Accuracy – MAPE, MAD, MSE, etc
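
The averaging technique and the accuracy measures named above can be sketched in plain Python. The demand series and the window size are hypothetical, and MAD is taken here to mean the mean absolute deviation of the forecast errors.

```python
def moving_average_forecast(series, window):
    """Forecast each point as the mean of the previous `window` observations."""
    return [
        sum(series[i - window:i]) / window
        for i in range(window, len(series))
    ]


def mad(actual, forecast):
    """Mean absolute deviation of the forecast errors."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mse(actual, forecast):
    """Mean squared error."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean absolute percentage error, in percent; assumes no actual value is zero."""
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical demand series with a steady upward trend.
demand = [100, 110, 120, 130, 140, 150]
window = 2
forecast = moving_average_forecast(demand, window)
actual = demand[window:]

print("forecast:", forecast)
print("MAD:", mad(actual, forecast))
print("MSE:", mse(actual, forecast))
print("MAPE:", round(mape(actual, forecast), 2))
```

A plain moving average lags behind a trending series, which is why every forecast here falls short of the actual value by the same amount.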

### Machine Learning

- Machine Learning Vs Predictive Modeling
- Types of Business Problems and Mapping of Techniques: Regression vs. Segmentation vs. Classification vs. Forecasting.
- Essential Classes of Learning Algorithms: Supervised vs. Unsupervised Learning
- What are different Phases of Predictive Modeling (Data Pre-processing, Model Building, Sampling, Validation)
- Overfitting & Performance Metrics
- Feature engineering & dimensionality reduction
- Cost function Optimization
- Overview of gradient descent algorithms.
- What is Cross validation (Bootstrapping, K-Fold validation etc)
- Overview of model performance metrics (R-square, Adjusted R-square, precision, sensitivity, specificity, RMSE, MAPE, AUC, ROC curve, recall, confusion matrix)
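
Several of the classification metrics listed above derive directly from the confusion matrix. The sketch below computes them in plain Python; the label vectors are hypothetical, and the function names are illustrative rather than taken from any library.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary classification problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


# Hypothetical true labels and model predictions.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)

precision = tp / (tp + fp)            # of predicted positives, how many are correct
recall = tp / (tp + fn)               # also called sensitivity
specificity = tn / (tn + fp)          # true negative rate
accuracy = (tp + tn) / len(y_true)

print(precision, recall, specificity, accuracy)
```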

### Decision Trees in Machine Learning

- Introduction on Decision Trees.
- Types of the Decision Tree Algorithms
- Construct of Decision Trees using Simplified Examples
- Generalizing Decision Trees
- Pruning a Decision Tree
- Decision Trees with Validation
- Overfitting and best practices for avoiding it.

### Supervised Learning: Ensemble Trees

- Ensembling in Machine Learning
- Difference between Manual Ensembling Vs. Automated Ensembling
- Methods of Ensembling.
- Bagging, Boosting, Stacking and Random Forest (Logic, Practical Applications)
- AdaBoost
- Gradient Boosting Machines (GBM)
- XGBoost

### Unsupervised Learning

- Importance of segmentation & Role of ML in Segmentation?
- Distance in Math – Formulas and Concept.
- K-Means Clustering algorithm
- Expectation Maximization algorithm
- Hierarchical Cluster Analysis
- Sklearn Clustering (DBSCAN)
- Principal Component Analysis (PCA)
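
The distance formulas and the K-Means algorithm above can be sketched in plain Python. To keep the result reproducible, this version takes fixed starting centroids instead of a random initialization; the points and centroids are hypothetical, and library implementations such as scikit-learn's add smarter initialization and stopping rules.

```python
from math import dist  # Euclidean distance (Python 3.8+)


def manhattan(p, q):
    """Manhattan (city-block) distance between two points."""
    return sum(abs(a - b) for a, b in zip(p, q))


def kmeans(points, centroids, iterations=10):
    """Basic K-Means: assign each point to its nearest centroid, then recompute centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)), key=lambda i: dist(p, centroids[i])
            )
            clusters[nearest].append(p)
        # New centroid = mean of the cluster; keep the old one if the cluster is empty.
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical 2-D points forming two visible groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, centroids=[(0, 0), (10, 10)])

print(centroids)
print(clusters)
```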

### ANN (Artificial Neural Network)

- Neural Networks modification and Its Applications
- Single Layer Neural Network (Perceptrons), and Hand Calculations
- Learning In a Multi Layered Neural Net.
- Deep neural Networks for Regression
- Deep neural Networks for Classification
- Interpretation of Outputs and Fine-Tuning the Models with Hyperparameter Tuning
- Validating Artificial Neural Network models
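
A single-layer perceptron is small enough to trace by hand, which makes it a convenient companion to the hand-calculation topic above. The sketch below trains one on the logical AND function using the classic perceptron update rule; the integer learning rate, zero initial weights, and epoch count are illustrative assumptions.

```python
def predict(weights, bias, x):
    """Step activation: output 1 when the weighted sum plus bias is positive."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, lr=1, epochs=10):
    """Perceptron learning rule: nudge weights by lr * error * input after each sample."""
    weights, bias = [0, 0], 0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(weights, bias, x)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


# Truth table of the logical AND function (linearly separable).
and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train_perceptron(and_gate)
for x, target in and_gate:
    print(x, "->", predict(weights, bias, x), "expected", target)
```

A single perceptron can only separate classes with a straight line, so the same code would not converge on XOR; that limitation is what motivates multi-layered networks.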

### KNN (K-Nearest Neighbors)

- What is KNN & Applications?
- KNN for missing Values.
- KNN for solving regression problems in Python.
- KNN for solving classification problems using python.
- How to validate KNN model
- Model fine tuning using hyper parameters
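
The classification use of KNN described above fits in a few lines of plain Python: find the `k` closest training points and take a majority vote. The training points, labels, and the default `k=3` are hypothetical; `k` is the main hyperparameter to tune.

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    neighbors = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical labelled points forming two groups.
train = [
    ((1, 1), "A"), ((1, 2), "A"), ((2, 2), "A"),
    ((6, 6), "B"), ((7, 6), "B"), ((6, 7), "B"),
]

print(knn_predict(train, (2, 1)))
print(knn_predict(train, (6, 5)))
```

For regression, the vote would be replaced by the mean of the neighbors' target values.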

### NAÏVE BAYES

- Conditional Probability in Naïve Bayes.
- Bayes Theorem and Its Probability Theorem.
- Naïve Bayes as a classifier.
- Applications of Naïve Bayes in Classifier
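
Bayes' theorem, the basis of the Naïve Bayes classifier, can be worked through with a single word as evidence. The probabilities below are made-up numbers chosen for easy arithmetic, not estimates from real data.

```python
# Hypothetical probabilities for a spam filter.
p_spam = 0.2                 # prior: P(spam)
p_word_given_spam = 0.6      # likelihood: P(word appears | spam)
p_word_given_ham = 0.05      # likelihood: P(word appears | not spam)

# Total probability of seeing the word in any message.
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# Bayes' theorem: P(spam | word) = P(word | spam) * P(spam) / P(word)
p_spam_given_word = p_word_given_spam * p_spam / p_word

print(round(p_spam_given_word, 4))
```

Naïve Bayes extends this to many words by assuming they are conditionally independent given the class, so the per-word likelihoods are multiplied together.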

### SUPPORT VECTOR MACHINES ALGORITHM

- Applications of Support Vector Machine
- Support Vector Regression – Data Mining Map
- Support vector machine algorithm (Linear & Non-Linear)
- Mathematical Intuition and how to develop.
- Validating SVM Results.

### TEXT DATA MINING

- Taming big Data
- Difference between Structured vs Unstructured vs. Semi-structured Data
- Finding patterns in text: text Analysis, text as a graph
- What is Natural Language processing (NLP)
- Text Analytics – Sentiment Analysis with Python
- Text Analytics – Word cloud analysis with Python
- Text Analytics – Classification (Spam/Not spam)
- Text Analytics – Segmentation using K-Means and Hierarchical Clustering
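
Word-frequency counting underlies several of the text analytics topics above, including word clouds. The following is a minimal sketch using only the standard library; the stop-word list and the review texts are hypothetical, and real projects usually rely on a library such as NLTK for tokenization and stop words.

```python
import re
from collections import Counter

# Hypothetical stop-word list; libraries such as NLTK ship fuller ones.
STOPWORDS = {"the", "a", "is", "and", "to", "this", "it", "was"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def word_frequencies(documents):
    """Count non-stop-word tokens across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(t for t in tokenize(doc) if t not in STOPWORDS)
    return counts


# Hypothetical customer reviews.
reviews = [
    "The product is great and the delivery is fast",
    "Great price, great service",
    "Delivery was slow",
]

freq = word_frequencies(reviews)
print(freq.most_common(3))
```

The resulting counts are the kind of input a word cloud visualizes, and the same tokens can feed a classifier or a clustering step.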