
da-niao-dan/Python-DS-ToolBox


Python-DS-ToolBox

Python Data Science Toolbox

What is this notebook about?

This is my personal open notebook on fundamental data science in Python for the workplace. Unlike many books, this notebook stays at a surface level: it covers the major steps of a simple data analysis along with many short code examples. I learned these tools from many sources, and you are welcome to browse and edit this notebook. It is meant as a reminder of what is available out there, and it is still under development.

You can visit this handbook for technical details. For beginners who want to learn the fundamentals of ML, I recommend Andrew Ng's machine learning courses on Coursera.

Quick advice

There are many great sources for learning data science. Here is some advice for diving into the field quickly:

  1. Get a basic foundation in statistics and Python. See www.hackerrank.com.
  2. Get to know the general ideas and areas of knowledge in data science.
  3. Practice as you go. Google or Bing any term or question you don't understand. Stack Overflow and the documentation for specific packages and functions are your best friends. Throughout this process, do not lose sight of the big picture of data science.
  4. Become a better data scientist by doing more projects! (Don't try to memorize these tools; just do data science!)

Materials in this notebook

  1. Road Map in Business

  2. Environment Configuration

  3. Data Processing

    • Getting Data
      • By Loading Files
      • From APIs
      • From SQL database
    • Organizing Data
      • Take a First Look
      • Data Cleaning
      • Transform DataFrames
    • Feature Engineering
  4. Exploring Data

    • Simple Data Visualization
    • Simple Statistical Tools
  5. Communicating with Data

    • Visualizing High-Dimension Data
    • Interactive Data Visualization
  6. Basic Unsupervised Learning

    • K-means
    • Hierarchical clustering
  7. Basic Supervised Learning

  8. Git

    • Very common Git operations
  9. Linux and Bash shells

    • System Configurations
    • File Management
    • Exchanging Data
    • Task Management
  10. Network Analysis

  11. PySpark

  12. Deployment Tools

  13. Coding Best Practices with Python. Course link.

  14. Kaggle How
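The Data Processing steps listed above (getting data, taking a first look, cleaning, and transforming a DataFrame) can be sketched with pandas roughly as follows. This is a minimal illustration, not code taken from the notebook itself: the in-memory CSV, its column names, and the choice to fill missing values with the column median are all assumptions made for the example.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a real file, API response, or SQL result.
RAW_CSV = """city,year,sales
Boston,2020,100
Boston,2021,
Denver,2020,80
Denver,2020,80
Denver,2021,130
"""

# Getting data: read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(RAW_CSV))

# Take a first look: size, column types, and missing values per column.
print(df.shape)
print(df.dtypes)
print(df.isna().sum())

# Data cleaning: drop exact duplicate rows, then fill missing sales
# with the median of the remaining values (an illustrative choice).
clean = df.drop_duplicates().copy()
clean["sales"] = clean["sales"].fillna(clean["sales"].median())

# Transform: aggregate total sales per city.
totals = clean.groupby("city", as_index=False)["sales"].sum()
print(totals)
```

In practice the `io.StringIO` wrapper would be replaced by a file path, and the imputation strategy should be chosen based on the data rather than defaulting to the median.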
