Data Science Syllabus
DATA SCIENCE (concentrated on machine learning)
PYTHON BASICS
1. Introduction to Python
Python is one of the most popular and powerful languages for machine learning and is used by major companies such as Facebook, Amazon, Google and Yahoo. It is free and open source. This module covers how to get started with Python and how to use the language to work with data.
- Installation of Python framework and packages: Anaconda & pip
- Introduction to Python Editors & IDEs
- Concept of Packages/Libraries - Important packages
- Installing & loading Packages
- Creating Python variables
- Numeric, string and logical operations
- Data containers: Lists, Dictionaries, Tuples & Sets
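As a small illustration of the four data containers listed above, here is a minimal sketch. All names and values (`scores`, `ages`, `point`) are invented for the example and are not taken from the course material.

```python
# Illustrative values only; none of these names come from the course material.
scores = [72, 85, 85, 90]          # list: ordered, mutable, allows duplicates
ages = {"asha": 31, "ben": 27}     # dictionary: key -> value lookup
point = (3.5, -1.0)                # tuple: ordered, immutable
unique_scores = set(scores)        # set: unordered, duplicates removed

scores.append(64)                  # lists can grow in place
ages["chen"] = 45                  # dictionaries accept new keys

print(len(scores), len(unique_scores), sorted(ages))
```

The set is built before the `append`, so it reflects only the original four scores.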
2. Iterative Operations & Functions in Python
This is where you shall learn the functionalities and powerful capabilities of Python that will make it easy for you to work with data and set the stage for using Python for machine learning & data science.
- Writing for loops in Python
- While loops and conditional blocks
- List/Dictionary comprehensions with loops
- Writing your own functions in Python
- Writing your own classes and functions
- Writing your own modules
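The topics above (comprehensions, functions and classes) might look like the following in practice. This is only a sketch; the helper names `squares_of_evens`, `word_lengths` and `RunningMean` are hypothetical and chosen for illustration.

```python
def squares_of_evens(numbers):
    """Return the squares of the even values, using a list comprehension."""
    return [n * n for n in numbers if n % 2 == 0]


def word_lengths(words):
    """Map each word to its length, using a dictionary comprehension."""
    return {word: len(word) for word in words}


class RunningMean:
    """Accumulate values one at a time and report their mean."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def add(self, value):
        self.total += value
        self.count += 1

    def mean(self):
        # Guard against division by zero before any value has been added.
        return self.total / self.count if self.count else 0.0
```

Saving these definitions in a file such as `helpers.py` (a hypothetical name) would make them importable as a module with `import helpers`.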
3. Data summary & visualization in Python
Data visualization is essential for understanding what the data is saying and for gaining insights at a glance. Visualization is a strong point of Python, thanks to packages such as ggplot and Seaborn, which you will learn to use in this module.
- Simple plotting
- Need for data summary & visualization
- Summarising numeric data in pandas
- Summarising categorical data
- Group wise summary of mixed data
- Basics of visualisation with ggplot & Seaborn
- Inferential visualisation with Seaborn
- Visual summary of different data combinations
4. Data Handling in Python using NumPy & Pandas
Python is a very versatile language, and in this module we expand on its data-handling capabilities. Focusing on the NumPy and pandas packages, we learn how to manipulate data, which will eventually be useful for converting raw data into a form suitable for machine learning algorithms.
- Introduction to NumPy arrays, functions & properties
- Introduction to Pandas & data frames
- Importing and exporting external data in Python
- Feature engineering using Python
PYTHON TEXT MINING
1. Working with Text in Python
- Introduction to Text Mining
- Handling Text in Python
- Regular Expressions
- Demonstration: Regex with Pandas and Named Groups
- Internationalization and Issues with Non-ASCII Characters
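To give a flavour of regular expressions with named groups, here is a minimal sketch using only the standard-library `re` module. The log-style lines and the field names (`date`, `user`, `action`) are invented for illustration. In pandas, `Series.str.extract` accepts the same named-group syntax and uses the group names as column names.

```python
import re

# Hypothetical log-style lines, used only for illustration.
lines = [
    "2021-03-04 user=asha action=login",
    "2021-03-05 user=ben action=logout",
    "malformed line",
]

# Named groups (?P<name>...) let each captured field be read by name.
pattern = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) user=(?P<user>\w+) action=(?P<action>\w+)"
)

records = []
for line in lines:
    match = pattern.search(line)
    if match:  # skip lines that do not fit the pattern
        records.append(match.groupdict())
```

Each entry in `records` is a dictionary keyed by group name, which is a convenient shape for building a data frame later.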
2. Basic Natural Language Processing
- Basic Natural Language Processing
- Basic NLP tasks with NLTK
- Advanced NLP tasks with NLTK
3. Classification of Text
- Text Classification
- Identifying Features from Text
- Naive Bayes Classifiers
- Naive Bayes Variations
- Support Vector Machines
- Learning Text Classifiers in Python
4. Topic Modeling
- Semantic Text Similarity
- Topic Modeling
- Generative Models and LDA
- Information Extraction
5. Web Scraping
- Gathering text data using web scraping with urllib
- Processing raw web data with BeautifulSoup
- Interacting with Google search using urllib with custom user agent
- Collecting Twitter data with the Twitter API
PYTHON MACHINE LEARNING SYLLABUS
1. Machine Learning Basics
In this module we learn how to translate business problems into data problems so that machine learning algorithms can be used to solve them. We then look at the categories of business problems and the corresponding categories of machine learning algorithms. Next we cover the methodologies for solving such problems, which form the basis of the techniques taught later in the course. We wrap up the module with a discussion of the importance and methods of validating our results.
- Converting business problems to data problems
- Understanding supervised and unsupervised learning with examples
- Understanding biases associated with any machine learning algorithm
- Ways of reducing bias and increasing generalisation capabilities
- Drivers of machine learning algorithms
- Cost functions
- Brief introduction to gradient descent
- Importance of model validation
- Methods of model validation
- Cross validation & average error
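The ideas of a cost function and gradient descent listed above can be sketched in a few lines of plain Python. This is a minimal illustration, assuming a straight-line model y = w * x + b with mean squared error as the cost. The function names, learning rate, step count and toy data are assumptions made for the example, not part of the course material.

```python
def mean_squared_error(w, b, xs, ys):
    """Cost function: average squared gap between prediction and target."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by repeatedly stepping against the MSE gradient."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n  # dMSE/dw
        grad_b = 2 * sum(errors) / n                             # dMSE/db
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1, chosen only for illustration.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = gradient_descent(xs, ys)
print(round(w, 3), round(b, 3))
```

On this noise-free data the fitted slope and intercept should approach 2 and 1. With a learning rate that is too large the updates would diverge instead, which is one reason the step size is treated as a tuning parameter.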
2. Generalized Linear Models in Python
We start implementing machine learning algorithms in this module. We also cover some important concepts related to regression and classification which we will use in the later modules as well. This is also where we are introduced to scikit-learn, a widely used Python library for machine learning.
- Linear Regression
- Regularisation of Generalised Linear Models
- Ridge and Lasso Regression
- Logistic Regression
- Methods of threshold determination and performance measures for classification score models
3. Tree Models using Python
In this module you will learn a very popular class of machine learning models: rule-based tree structures, also known as Decision Trees. We'll examine the biased nature of these models and learn how bagging methodologies lead to a technique known as Random Forest for analysing data.
- Introduction to decision trees
- Tuning tree size with cross validation
- Introduction to bagging algorithm
- Random Forests
- Grid search and randomized grid search
- ExtraTrees (Extremely Randomised Trees)
- Partial dependence plots
4. Boosting Algorithms using Python
Want to win a data science contest on Kaggle or at a data hackathon, or be known as a top data scientist? Then learning boosting algorithms is a must, as they provide a very powerful way of analysing data and solving hard-to-crack problems.
- Concept of weak learners
- Introduction to boosting algorithms
- Adaptive Boosting
- Extreme Gradient Boosting (XGBoost)
5. Support Vector Machines (SVM) & kNN in Python
We step into the powerful world of “observation-based algorithms”, which can capture patterns in the data that would otherwise go undetected. We start the discussion with kNN, which is fairly simple, and then move to SVM, which is very powerful at capturing non-linear patterns in the data.
- Introduction to the idea of observation-based learning
- Distances and similarities
- k Nearest Neighbours (kNN) for classification
- Brief mathematical background on SVM
- Regression with kNN & SVM
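To make the idea of observation-based learning concrete, the following sketch classifies a query point by majority vote among its k nearest neighbours. It uses Euclidean distance, which is only one of several possible distance measures. The function names and the toy data are assumptions made for illustration.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_points, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    ranked = sorted(
        zip(train_points, train_labels),
        key=lambda pair: euclidean(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy two-cluster data, invented for illustration.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (1.1, 1.0)))
```

No model is fitted in advance: the training observations themselves are consulted at prediction time, which is what distinguishes this family of methods.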
6. Unsupervised learning in Python
Many machine learning algorithms become difficult to work with when the data contains many variables. We will learn methods which help solve this problem, as well as clustering techniques.
- Need for dimensionality reduction
- Principal Component Analysis (PCA)
- Difference between PCAs and Latent Factors
- Factor Analysis
- Hierarchical, K-means & DBSCAN Clustering
7. Artificial Intelligence & Neural Networks in Python
Artificial neural networks are building blocks of many artificial intelligence systems. Learn techniques loosely inspired by how the human brain works and build models that can solve problems which usually call for human judgement.
- Introduction to Neural Networks
- Single layer neural network
- Multiple layer Neural network
- Neural Networks Implementation in Python
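As a taste of a single layer neural network, here is a minimal sketch of a perceptron trained with the classic perceptron learning rule to reproduce the logical AND function. It is one simple example of the idea and not necessarily the implementation used in the course. The function names, learning rate and epoch count are assumptions made for illustration.

```python
def step(z):
    """Step activation: output 1 when the weighted sum is non-negative."""
    return 1 if z >= 0 else 0


def predict(weights, bias, x):
    """Weighted sum of the inputs plus bias, passed through the activation."""
    return step(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_perceptron(samples, targets, learning_rate=1.0, epochs=20):
    """Nudge weights towards the target whenever a prediction is wrong."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(samples, targets):
            error = target - predict(weights, bias, x)
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias


# Truth table for logical AND, a linearly separable problem.
and_inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_targets = [0, 0, 0, 1]
weights, bias = train_perceptron(and_inputs, and_targets)
predictions = [predict(weights, bias, x) for x in and_inputs]
print(predictions)
```

A single layer like this can only separate classes with a straight line, so it cannot learn a function such as XOR. That limitation is the usual motivation for moving on to networks with multiple layers.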