Python Data Science Toolbox
This is my personal open notebook for fundamental data science in Python in the workplace. Unlike many books, this notebook is quite superficial. It covers the major steps of simple data analysis, with many simple code examples. I learned these tools from many sources; you are welcome to browse and edit this notebook. It is just a reminder of what is available out there, and it is under development.
You can visit this handbook for technical details. For beginners who want to learn the fundamentals of ML, I recommend Andrew Ng's ML courses on Coursera.
There are many great sources for learning data science. Here is some advice for diving into the field quickly:
- Get a basic foundation in statistics and Python. See www.hackerrank.com.
- Get to know the general ideas and areas of knowledge in data science.
- Practice as you go. Google or Bing any term or question you don't understand. Stack Overflow and the documentation for specific packages and functions are your best friends. During this process, do not lose sight of the big picture of data science.
- Become a better data scientist by doing more projects! (Don't try to memorize these tools, just do data science!)
-
- Getting Data
- By Loading Files
- From APIs
- From SQL database
- Organizing Data
- Take a First Look
- Data Cleaning
- Transform DataFrames
- Feature Engineering
- Getting Data
-
- Simple Data Visualization
- Simple Statistical Tools
-
- Visualizing High-Dimension Data
- Interactive Data Visualization
-
- K-means
- Hierarchical clustering
-
- Classification via Logistic Regression
- Ensemble Learning
- XGBoost
- Pipelining
- Hyperparameter Tuning
- Basic Deep Learning
-
- Very Common Git Operations
-
- System Configurations
- File Management
- Exchanging Data
- Task Management
-
Coding Best Practices with Python. Course link.
- Efficient Coding in Python
- Writing Efficient Code with pandas
- Writing Functions in Python
- Object-Oriented Programming in Python
- Robust Workflows in Python