## What is data science?

Data science is the study of large volumes of data stored in databases or other media. It involves recording, cleaning, analysing, and storing data, often in a data warehouse such as Apache Hive. The goal of data science is to extract useful results from structured and unstructured data.

## What is the role of a data scientist?

A data scientist understands what the data means. They collect data from various sources and store it in a warehouse. They then clean and filter the data using programming languages and tools, and use statistics and other techniques to produce accurate or predictive results.

## Does a data scientist need to code?

Yes. Data scientists clean data with the help of programming languages such as Python and R.

## Is data science hard to learn?

Yes, data science can be hard to learn. You should know the basic concepts of Python and R programming before starting **data science training in Gurgaon**. Students also need a good knowledge of statistics and basic Python coding.

## What are the most important tools used in the Data Science Course in Gurgaon?

Here are some of the best tools used in the **data science course in Gurgaon**:

- SAS
- Apache Spark
- BigML
- MATLAB
- Excel
- ggplot2
- Tableau
- Jupyter
- Matplotlib
- Pandas
- SciPy
- NumPy
- Beautiful Soup
- Socket
- TensorFlow
- NLTK

## Data Science: Syllabus

### Data Science using Python

- Difference between Data Analytics and Data Science
- Why and how Python is used in Data Analytics and Data Science
- What is Cubes – OLAP Framework in Python
- What is Warehousing and DBMS
- Data Analytics vs. Data warehousing, OLAP, Extract Transform Load (ETL), MIS Reporting
- What are the problems and business objectives in different industries?
- How leading companies are using the power of analytics?
- Critical success factors for AI
- What are the analytics tools & their popularity?
- What is the analytics methodology & its framework?
- What are Analytics projects?

### Python Core Concept

- What is Python
- Introduction to installation of Python
- Overview of Python Editors & IDEs (Rodeo, Canopy, Jupyter, PyCharm, Python)
- How to work on Jupyter Notebook and customize settings.
- Basic Syntax, Data Types & Data objects/structures (strings, Tuples, Lists, Dictionaries)
- Conditional Statements (if, elif, or nested if/else)
- Variable & Labels – Date & Time Values
- Looping with Python (For Loop, While Loop or Nested Loop)
- Control Statements in Python (break, continue, or pass)
- String Manipulation in Python: Basic Operators, Methods, Functions, etc.
- Function and Modules (Importing, Installation, Packages with Version handling, Composition)
- Input and Output Module in Python
- Exception Handling in Python
- OOPs Concept in Python
- Classes, Inheritance, Overloading, Overriding, Data Hiding in Python.
- Regular Expressions in Python: Match, Search, Modifiers, and Patterns
- CGI Concepts in Python: GET and POST Methods, CGI Environment Variables, Cookies, Upload
- Database Handling in Python: Connections, Transactions, Execution, and Error Handling
- Networking: Sockets, Methods, Internet Modules
- Multithreading in Python: Threads, Multithreading
- GUI Programming: Tkinter and Widgets
- Important Packages of Python: NumPy, SciPy, Seaborn, scikit-learn, Pandas, Matplotlib, etc.
- Reading and writing data
- Simple Graph plotting
- Debugging & Code profiling

### Scientific Distribution

- NumPy, SciPy, Seaborn, Pandas, scikit-learn, statsmodels, NLTK, Matplotlib, etc.

### How to Import and Export Data Using Python Modules

- How to Import Data from different sources?
- Connection setup with database
- Viewing Data objects: subsetting, methods
- Export Data to different formats
- Most important Python modules: Pandas, Beautiful Soup

### Data Manipulation in Python and its Techniques

- What is Data Manipulation in Python?
- What are important Modules to Manipulate Data?
- Cleaning and Prepping Data using Python.
- Detecting Missing Values using Python during cleaning.
- What are Map and Data Library?
- Data Manipulation steps: sorting, filtering, merging, duplicates, appending, subsetting, sampling, derived variables, data type conversions, renaming, formatting, etc.
- Data Manipulation tools: Functions, Operators, Packages, Control Structures, Loops, Arrays, Methods, etc.
- Python Built-in Functions
- Python User Defined Functions and Classes.
- How to strip out extraneous information
- Normalizing and Formatting data
- Important Python modules for data.
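
As a rough illustration of the cleaning topics above (missing values, duplicates, stripping extra whitespace, formatting), the sketch below uses only the Python standard library. The sample records and the helper names `is_missing`, `count_missing`, and `clean` are made up for this example; in practice a module such as Pandas would normally do this work.

```python
import math

# Hypothetical sample records; None, NaN and blank strings stand for missing values.
rows = [
    {"name": " Asha ", "age": 31, "city": "Gurgaon"},
    {"name": "Ravi", "age": None, "city": "delhi "},
    {"name": "", "age": 27, "city": "Gurgaon"},
    {"name": " Asha ", "age": 31, "city": "Gurgaon"},  # duplicate of the first row
]


def is_missing(value):
    """Treat None, NaN and blank strings as missing."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and value.strip() == ""


def count_missing(records):
    """Count missing values per column."""
    counts = {}
    for record in records:
        for column, value in record.items():
            counts[column] = counts.get(column, 0) + (1 if is_missing(value) else 0)
    return counts


def clean(records):
    """Strip whitespace, normalise text case, and drop exact duplicates."""
    seen, cleaned = set(), []
    for record in records:
        tidy = {
            key: value.strip().title() if isinstance(value, str) else value
            for key, value in record.items()
        }
        # Sort by column name only, so values of mixed types are never compared.
        fingerprint = tuple(sorted(tidy.items(), key=lambda item: item[0]))
        if fingerprint not in seen:
            seen.add(fingerprint)
            cleaned.append(tidy)
    return cleaned


print(count_missing(rows))
print(clean(rows))
```

Counting missing values before cleaning shows which columns need attention; whether to drop or fill those values depends on the dataset.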

### Perform Data Visualization in Python

- What is EDA (Exploratory Data Analysis) in Python?
- Descriptive statistics, Frequency Tables and summarizing Data.
- Univariate and Multivariate Analysis (Data Distribution & Graphical Analysis)
- Bivariate Analysis in Statistics (Distributions & Relationships, Cross Tabs, Graphical Analysis)
- How to plot graphs: bar, pie, line chart, histogram, boxplot, scatter, density…
- Important Packages for Exploratory Analysis (NumPy Arrays, SciPy, Matplotlib, Seaborn, Pandas, etc.)

### Statistics using Python

- What is Statistics in Python? How to calculate measures of central tendency (Mean, Median, Mode) and R-Squared or Adjusted R-Squared using Python.
- Probability Distributions in Python, Normal Distribution and the Central Limit Theorem using Python.
- Statistical Concepts and Hypothesis Testing
- What are the KS Test and Z Test? How to calculate a P Value using a distribution.
- What are ANOVA, Correlation, and Chi-square?
- Important modules used for statistical methods: NumPy, SciPy, Pandas
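
The measures listed above can be tried out with the standard-library `statistics` module before moving on to NumPy or SciPy. This is a minimal sketch: the sample numbers and the helper names `r_squared` and `adjusted_r_squared` are assumptions made for illustration. It uses the usual definitions, R-squared = 1 - SS_res / SS_tot and adjusted R-squared = 1 - (1 - R²)(n - 1) / (n - p - 1), where n is the number of samples and p the number of predictors.

```python
import statistics

# Hypothetical sample of exam scores.
scores = [2, 4, 4, 5, 7, 9]

mean_score = statistics.mean(scores)
median_score = statistics.median(scores)
mode_score = statistics.mode(scores)


def r_squared(actual, predicted):
    """Share of the variance in `actual` explained by `predicted`."""
    mean_actual = statistics.fmean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n_samples, n_predictors):
    """Penalise R-squared for the number of predictors used."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)


# Hypothetical observed values and model predictions.
actual = [1, 2, 3, 4]
predicted = [1.1, 1.9, 3.2, 3.8]
r2 = r_squared(actual, predicted)

print(mean_score, median_score, mode_score)
print(r2, adjusted_r_squared(r2, n_samples=4, n_predictors=1))
```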

### Predictive Analysis using Python Frameworks

- What is a model in analytics and how is it used?
- What are the best algorithms for prediction?
- What is Data Modeling in Python?
- Common Techniques used for analytics & modeling process
- Most Popular modeling algorithms
- Phases of Predictive Modeling

### Data Exploration

- What is Data Exploration and why is it important?
- EDA Methods and how they are used in Machine Learning.
- Common EDA framework for exploring the data and identifying problems with the data.
- How to identify missing data?
- How to identify outliers?
- Methods to visualize the data trends and patterns.
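
One common way to identify outliers is the interquartile range (IQR) rule, which flags values lying more than 1.5 IQRs below the first quartile or above the third. The sketch below is only one possible approach; the function name `find_outliers` and the sample data are assumptions for illustration.

```python
import statistics


def find_outliers(values, factor=1.5):
    """Return values lying more than `factor` IQRs outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - factor * iqr, q3 + factor * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical daily order counts with one suspicious spike.
daily_orders = [10, 12, 11, 13, 12, 11, 95]
print(find_outliers(daily_orders))
```

Different quartile conventions give slightly different fences, so borderline values may or may not be flagged depending on the tool used.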

### Data Preparation

- Data Preparation for Machine Learning.
- Consolidation and Aggregation, Missing Value Treatment, Dummy Creation, Variable Reduction, Outlier Treatment.
- Variable Reduction, Data Reduction Techniques, Principal Component Analysis, Improve Accuracy.

### Data Exploration for Modeling

- Need for structured exploratory data
- EDA framework for exploring the data and identifying any problems with the data (Data Audit Report)
- Identify missing data
- Identify outliers
- Visualize the data trends and patterns

### Segmentation in Machine Learning

- What is Segmentation?
- Edge Based Segmentation, Image Based Segmentation or Region Based Segmentation.
- Different types of Segmentation (Subjective vs Objective, Heuristic vs. Statistical)
- Heuristic Segmentation Techniques.
- Behavioral Segmentation Techniques (K-Means Cluster Analysis)
- Cluster evaluation and profiling
- Implementation on new data
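
To give a feel for K-Means cluster analysis, here is a small pure-Python sketch of the standard assign-and-update loop (Lloyd's algorithm). The customer data, the starting centroids, and the function names `assign` and `k_means` are hypothetical; real projects would normally rely on a library such as scikit-learn.

```python
import math


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))
        for point in points
    ]


def k_means(points, centroids, max_iterations=100):
    """Run the assign/update loop from caller-supplied starting centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iterations):
        labels = assign(points, centroids)
        updated = []
        for index, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == index]
            if members:
                # New centroid is the mean of the member points on each axis.
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        if updated == centroids:
            break  # converged: centroids no longer move
        centroids = updated
    return centroids, assign(points, centroids)


# Hypothetical customers described by (visits per month, average basket size).
customers = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centres, labels = k_means(customers, centroids=[(0, 0), (10, 10)])
print(centres, labels)
```

Scoring new data then amounts to calling `assign` with the fitted centroids.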

### Linear Regression using Python

- Introduction & Assumptions on Linear Regression
- How to build Linear Regression Model
- Standard metrics (R-Square/Adjusted R-Square, Global hypothesis, Variable significance, etc.)
- Framework to assess the overall effectiveness of the model
- Statistical Model Validation (Re-running vs. Scoring)
- Business Outputs (Error distribution (histogram), Decile Analysis, Model equation, drivers etc.)
- Interpretation of the Results, Business Validation and New Data Implementation
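
For a single predictor, building a linear regression model comes down to the ordinary least squares formulas: slope = Sxy / Sxx and intercept = mean(y) - slope × mean(x). The following sketch applies them directly; the data and the names `fit_line` and `predict` are assumptions for illustration, not part of any particular library.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Score a new observation with the fitted model equation."""
    return slope * x + intercept


# Hypothetical data: years of experience vs. salary band.
experience = [1, 2, 3, 4]
salary_band = [3, 5, 7, 9]
slope, intercept = fit_line(experience, salary_band)
print(slope, intercept, predict(5, slope, intercept))
```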

### Logistic Regression for Machine Learning

- What is Logistic Regression?
- Difference between Linear Regression Vs. Generalized Linear Models Vs. Logistic Regression.
- Build Logistic Regression Model (Binary Logistic Model)
- Understand standard model metrics (Variable significance, Concordance, Gini, KS, Misclassification, Hosmer-Lemeshow Test, ROC Curve, etc.)
- Validation of Logistic Regression Models.
- Standard Business Outputs (Lift charts, Model equation, Decile Analysis, ROC Curve, Probability Cut-offs, Drivers or variable importance, etc)
- Interpretation of the Results, Business Validation and New Data Implementation

### Time Series Forecasting

- Introduction to Time Series Forecasting
- Components in Time Series: Seasonality, Trend, Cyclicity, Systematic, Level and Decomposition
- Classification of Techniques (Pattern-based or Patternless)
- Basic Techniques for Forecasting: Averages, Smoothing, etc.
- Advanced Techniques: AR Models, Holt-Winters, Holt's linear, ARIMA, etc.
- Measure Forecasting Accuracy – MAPE, MAD, MSE, etc
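
The basic techniques and accuracy measures above can be sketched in a few lines of plain Python. The function names and sample numbers below are assumptions for illustration: a simple moving average, simple exponential smoothing with smoothing factor `alpha`, and MAPE (mean absolute percentage error).

```python
def moving_average(series, window):
    """Average of each run of `window` consecutive observations."""
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]


def exponential_smoothing(series, alpha):
    """Simple exponential smoothing; the first value seeds the level."""
    smoothed = [series[0]]
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


def mape(actual, forecast):
    """Mean absolute percentage error (actual values must be non-zero)."""
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical monthly sales figures.
sales = [1, 2, 3, 4, 5]
print(moving_average(sales, window=3))
print(exponential_smoothing([10, 20], alpha=0.5))
print(mape([100, 200], [110, 180]))
```

A larger `alpha` makes the smoothed series react faster to recent values; a smaller one gives a flatter curve.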

### Machine Learning

- Machine Learning Vs Predictive Modeling
- Types of Business problems and Mapping of Techniques: Regression vs. segmentation vs. classification vs. Forecasting.
- Essential Classes of Learning Algorithms: Supervised vs. Unsupervised Learning
- What are different Phases of Predictive Modeling (Data Pre-processing, Model Building, Sampling, Validation)
- Overfitting & Performance Metrics
- Feature engineering & dimensionality reduction
- Cost function Optimization
- Overview of gradient descent algorithms.
- What is Cross-validation (Bootstrapping, K-Fold validation, etc.)?
- Overview of Model performance metrics (R-square, Adjusted R-square, precision, sensitivity, specificity, RMSE, MAPE, AUC, ROC curve, recall, confusion matrix)
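
Several of the classification metrics above follow directly from the four cells of a confusion matrix. The sketch below shows one way to compute them by hand; the labels and the function names `confusion_counts` and `classification_metrics` are hypothetical. Recall and sensitivity are the same quantity.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (tp, fp, tn, fn) for a binary classification problem."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def safe_ratio(numerator, denominator):
    """Avoid division by zero when a class never occurs."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(actual, predicted, positive=1):
    tp, fp, tn, fn = confusion_counts(actual, predicted, positive)
    return {
        "precision": safe_ratio(tp, tp + fp),
        "recall": safe_ratio(tp, tp + fn),  # also called sensitivity
        "specificity": safe_ratio(tn, tn + fp),
        "accuracy": safe_ratio(tp + tn, tp + fp + tn + fn),
    }


# Hypothetical true labels and model predictions.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 0, 0, 1, 0, 1]
print(confusion_counts(actual, predicted))
print(classification_metrics(actual, predicted))
```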

### Decision Trees in Machine Learning

- Introduction on Decision Trees.
- Types of the Decision Tree Algorithms
- Construct of Decision Trees using Simplified Examples
- Generalizing Decision Trees
- Pruning a Decision Tree
- Decision Trees with Validation
- Overfitting and best practices.

### Supervised Learning: Ensemble Trees

- Ensembling in Machine Learning
- Difference between Manual Ensembling Vs. Automated Ensembling
- Methods of Ensembling.
- Bagging, Boosting, Stacking, and Random Forest (Logic, Practical Applications)
- AdaBoost
- Gradient Boosting Machines (GBM)
- XGBoost

### Unsupervised Learning

- Importance of segmentation & Role of ML in Segmentation?
- Distance in Math – Formulas and Concept.
- K-Means Clustering algorithm
- Expectation Maximization algorithm
- Hierarchical Cluster Analysis
- Sklearn Clustering (DBSCAN)
- Principal Component Analysis (PCA)

### ANN (Artificial Neural Network)

- Neural Networks modification and Its Applications
- Single Layer Neural Network (Perceptrons), and Hand Calculations
- Learning In a Multi Layered Neural Net.
- Deep neural Networks for Regression
- Deep neural Networks for Classification
- Interpretation of Outputs and Fine-tuning the Models with Hyperparameter Tuning
- Validating Artificial Neural Network models

### KNN (K-Nearest Neighbors)

- What is KNN & Applications?
- KNN for missing Values.
- KNN for solving regression problems in Python.
- KNN for solving classification problems using Python.
- How to validate KNN model
- Model fine tuning using hyper parameters
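
KNN classification can be sketched in a few lines: measure the distance from the query point to every training point, keep the `k` closest, and take a majority vote. The training data and the function name `knn_predict` below are assumptions for illustration; `k` is the hyperparameter usually tuned.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest neighbours."""
    neighbours = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical training data: two features per customer and a spend segment.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
segments = ["low", "low", "low", "high", "high", "high"]

print(knn_predict(points, segments, query=(2, 2)))
print(knn_predict(points, segments, query=(7, 9)))
```

For regression, the same neighbours would be averaged instead of voted on.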

### Naïve Bayes

- Conditional Probability in Naïve Bayes.
- Bayes' Theorem and its role in probability.
- Naïve Bayes as a classifier.
- Applications of Naïve Bayes in classification
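
Bayes' theorem states that P(A | B) = P(B | A) × P(A) / P(B). As a small worked sketch, the function below computes the probability that a message is spam given that it contains a certain word. The probabilities and the function name `posterior` are made-up values for illustration.

```python
def posterior(prior, likelihood, likelihood_given_not):
    """P(class | evidence) via Bayes' theorem for a two-class problem."""
    evidence = likelihood * prior + likelihood_given_not * (1 - prior)
    return likelihood * prior / evidence


# Assumed figures: 20% of mail is spam; the word appears in 60% of spam
# messages and in 5% of non-spam messages.
p_spam_given_word = posterior(prior=0.2, likelihood=0.6, likelihood_given_not=0.05)
print(p_spam_given_word)
```

A Naïve Bayes classifier repeats this calculation over many words, assuming the words are independent of each other given the class.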

### Support Vector Machines Algorithm

- Applications of Support Vector Machine
- Support Vector Regression – Data Mining Map
- Support vector machine algorithm (Linear & Non-Linear)
- Mathematical Intuition and how to develop.
- Validating SVM Results.

### Text Data Mining

- Taming big Data
- Difference between Structured vs Unstructured vs. Semi-structured Data
- Finding patterns in text: text Analysis, text as a graph
- What is Natural Language processing (NLP)
- Text Analytics – Sentiment Analysis with Python
- Text Analytics – Word cloud analysis with Python
- Text Analytics – Classification (Spam/Not spam)
- Text Analytics – Segmentation using K-Means and Hierarchical Clustering
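
Word cloud analysis starts from word frequencies. The sketch below counts them with the standard library only; the stop-word list, the sample sentence, and the function name `word_frequencies` are assumptions for illustration, and a toolkit such as NLTK would provide fuller tokenisation and stop-word lists.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"the", "is", "a", "and", "of", "to", "in"}


def word_frequencies(text, top_n=3):
    """Lower-case the text, split it into words, and count non-stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(token for token in tokens if token not in STOP_WORDS)
    return counts.most_common(top_n)


sample = "Data science is the study of data. Data cleaning is a part of data science."
print(word_frequencies(sample, top_n=2))
```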

If you are looking for the best **data science institute in Gurgaon**, call us.