layout | title | description | group | order |
---|---|---|---|---|
page | Computer Setup | Setup Your Data Science Environment | navigation | 2 |
- First things first. Your terminal program allows you to type commands to control your computer. On a Mac, you can open the Terminal by going to your Applications screen and selecting Terminal (it might be in the folder named "Other"). Or, you can open Spotlight (Cmd + Space) and type "Terminal".
- First, let's install `brew` if you haven't done that yet. Homebrew is a program that allows you to easily install other software on OSX. In your terminal, run:

      # This downloads the Ruby code of the installation script and runs it
      /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

  Verify your installation by making sure `brew --version` doesn't error at your terminal.
- Next, install `wget`. This is a command-line tool for downloading files and web pages.

      # Uses Homebrew to install wget
      brew install wget
- Download the Anaconda installation script:

      # Uses wget to download the installation script, naming it install_anaconda.sh
      wget -O install_anaconda.sh https://repo.continuum.io/archive/Anaconda3-4.2.0-MacOSX-x86_64.sh
- Install Anaconda:

      # Run the installation script
      bash install_anaconda.sh

  Ensure the installation worked by running `conda --version`.
- Run these commands to create a new conda environment. Each conda environment has its own package versions, which lets us switch between package versions easily. For example, this class uses Python 3, but you might have another class that uses Python 2. With a conda environment, you can switch between those at will.

      # Create a conda env called ds100 that uses python 3.5
      conda create --name ds100 python=3.5
      # Switch to the ds100 environment
      source activate ds100
      # Install the packages for ds100
      conda install -n ds100 jupyter pandas numpy matplotlib scikit-learn seaborn scikit-image
      pip install datascience okpy

  From now on, you can switch to the `ds100` env with `source activate ds100`, and switch back to the default env with `source deactivate`.
- Now, use `brew` to install the latest version of `git` by running:

      brew install git

  Ensure that `git` is installed by running `git --version`. The version should be 2.5.0 or higher.
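The "2.5.0 or higher" check above can also be done by a script instead of by eye. The following is a minimal sketch, not part of the official setup. The helper names `digits_or_zero` and `version_at_least` are made up for this example, and it assumes `git --version` prints the version number as its third word.

```shell
# digits_or_zero FIELD: print the leading digits of FIELD, or 0 if there are none.
digits_or_zero() {
    n=${1%%[!0-9]*}
    echo "${n:-0}"
}

# version_at_least HAVE WANT: succeed when HAVE >= WANT, comparing the first
# three dot-separated fields numerically (so 2.10 counts as newer than 2.5).
# The body runs in a subshell, so its variables do not leak into the caller.
version_at_least() (
    IFS=. read -r h1 h2 h3 _rest <<EOF
$1
EOF
    IFS=. read -r w1 w2 w3 _rest <<EOF
$2
EOF
    for pair in "$h1:$w1" "$h2:$w2" "$h3:$w3"; do
        have=$(digits_or_zero "${pair%%:*}")
        want=$(digits_or_zero "${pair##*:}")
        [ "$have" -gt "$want" ] && return 0
        [ "$have" -lt "$want" ] && return 1
    done
    return 0
)

# Only run the check when git is actually on the PATH.
if command -v git >/dev/null 2>&1; then
    installed=$(git --version | awk '{ print $3 }')
    if version_at_least "$installed" 2.5.0; then
        echo "git $installed meets the 2.5.0 minimum"
    else
        echo "git $installed is older than 2.5.0"
    fi
fi
```

Comparing the fields as numbers matters here: a plain string comparison would rank 2.10 below 2.5.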
You may remove the install_anaconda.sh
script now if you'd like since it's
quite large.
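If you want to confirm in one go that every tool from the steps above ended up on your `PATH`, a small helper can do it. This is only a sketch: `missing_tools` is a hypothetical name chosen for this example, and it relies on nothing beyond the standard `command -v` lookup.

```shell
# missing_tools NAME...: print each NAME that is not found on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Check the tools installed in the steps above.
missing=$(missing_tools brew wget conda git)
if [ -z "$missing" ]; then
    echo "All tools found."
else
    echo "Not found on PATH:"
    echo "$missing"
fi
```

If `conda` shows up as missing right after the Anaconda install, opening a new terminal window so that your shell startup file is re-read is a reasonable first thing to try.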
Getting set up on Windows is especially prone to error if you aren't careful
about your configuration. If you've already had Anaconda or git
installed and
can't get the other to work, try uninstalling everything and starting from
scratch.
- Visit https://www.continuum.io/downloads#windows and download the installer for Python 3.5. Download the 64-bit installer if your computer is 64-bit (more likely), the 32-bit installer if not. You can Google how to check whether your computer is 64 or 32 bit.
- Leave all the options as default (install for all users, in the default location). Make sure both of these checkboxes are checked:
- Click Install.
- Verify that the installation is working by starting the Anaconda Prompt (you should be able to start it from the Start Menu) and typing `python`.

  Notice how the `python` prompt shows that it is running from Anaconda. Now you have `conda` installed!

  From now on, when we talk about the "Terminal" or "Command Prompt", we are referring to the Anaconda Prompt that you just installed.
- Run these commands to create a new conda environment. Each conda environment has its own package versions, which lets us switch between package versions easily. For example, this class uses Python 3, but you might have another class that uses Python 2. With a conda environment, you can switch between those at will.

      # Create a conda env called ds100 that uses python 3.5
      conda create --name ds100 python=3.5
      # Switch to the ds100 environment
      activate ds100
      # Install the packages for ds100
      conda install -n ds100 jupyter pandas numpy matplotlib scikit-learn seaborn scikit-image
      pip install datascience okpy

  From now on, you can switch to the `ds100` env with `activate ds100`, and switch back to the default env with `deactivate`.
- You might already have `git` installed. Type `git` at the Anaconda Prompt. If that works, then you can skip these steps. Otherwise, you'll see something that looks like:
- At the Anaconda Prompt, type:

      # Use anaconda to install git
      conda install -c anaconda git -y
- Now, verify that `git` is installed by typing `git --version` on the command line. You should see output that looks like:
These instructions assume you have apt-get
(Ubuntu and Debian).
For other distributions of Linux, substitute the available package manager.
- You likely already know this if you're running Linux, but just in case: your terminal program allows you to type commands to control your computer. On Linux, you can open the Terminal by going to the Applications menu and clicking "Terminal".
- Install `wget`. This is a command-line tool for downloading files and web pages.

      sudo apt-get install wget

- Download the Anaconda installation script:

      wget -O install_anaconda.sh https://repo.continuum.io/archive/Anaconda3-4.2.0-Linux-x86_64.sh

  If you have a 32-bit operating system, use this command instead:

      wget -O install_anaconda.sh https://repo.continuum.io/archive/Anaconda3-4.2.0-Linux-x86.sh
- Install Anaconda:

      bash install_anaconda.sh

  Ensure the installation worked by running `conda --version`.
- Run these commands to create a new conda environment. Each conda environment has its own package versions, which lets us switch between package versions easily. For example, this class uses Python 3, but you might have another class that uses Python 2. With a conda environment, you can switch between those at will.

      # Create a conda env called ds100 that uses python 3.5
      conda create --name ds100 python=3.5
      # Switch to the ds100 environment
      source activate ds100
      # Install the packages for ds100
      conda install -n ds100 jupyter pandas numpy matplotlib scikit-learn seaborn scikit-image
      pip install datascience okpy

  From now on, you can switch to the `ds100` env with `source activate ds100`, and switch back to the default env with `source deactivate`.
- Now, install the latest version of `git`:

      sudo add-apt-repository ppa:git-core/ppa
      sudo apt-get update
      sudo apt-get install git

  Ensure that `git` is installed by running `git --version`. The version should be 2.5.0 or higher.
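For the download step in the list above, the choice between the 64-bit and 32-bit installer can be read off the machine type that `uname -m` reports. The sketch below maps that value to one of the two installer URLs given in these instructions. The function name `anaconda_installer_url` is made up for this example. One caveat: `uname -m` describes the kernel, so a 32-bit system running on a 64-bit kernel may still report `x86_64`.

```shell
# anaconda_installer_url ARCH: print the installer URL from the steps above
# for a machine type as reported by `uname -m`. Fails for anything else.
anaconda_installer_url() {
    base=https://repo.continuum.io/archive
    case $1 in
        x86_64|amd64)
            echo "$base/Anaconda3-4.2.0-Linux-x86_64.sh" ;;
        i386|i486|i586|i686)
            echo "$base/Anaconda3-4.2.0-Linux-x86.sh" ;;
        *)
            echo "no installer listed here for machine type: $1" >&2
            return 1 ;;
    esac
}

# Print the download command rather than running it, so nothing is fetched.
if url=$(anaconda_installer_url "$(uname -m)"); then
    echo "wget -O install_anaconda.sh $url"
fi
```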
You may remove the install_anaconda.sh
script now if you'd like since it's
quite large.
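To double-check that the `ds100` environment from the steps above really exists, you can look for it in the output of `conda env list`. The helper below is a small sketch, and `has_conda_env` is a hypothetical name. It assumes the environment name is the first column of each line, which is how `conda env list` normally prints its table.

```shell
# has_conda_env NAME: read `conda env list` output on stdin and succeed
# when an environment called NAME appears in the first column.
has_conda_env() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# Only query conda when it is actually on the PATH.
if command -v conda >/dev/null 2>&1; then
    if conda env list | has_conda_env ds100; then
        echo "ds100 environment found"
    else
        echo "ds100 environment not found; re-run the conda create step"
    fi
fi
```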
These instructions are the same for OSX, Windows, and Linux.
Now, let's download the course materials so you can start working on assignments.
- Visit https://github.com/ and log in / create an account if you don't already have one.
- Visit https://ds100-repo.herokuapp.com/ and fill out the form to create a private repo to hold all of your work this semester. Bookmark the GitHub URL because you'll be using it soon.
- In your terminal, navigate to the directory you want to put the DS100 assignments in.
- Run the following commands. Replace `<URL_OF_YOUR_PRIVATE_REPO>` with the URL of your repo (e.g., `https://github.com/DS-100/s0001`).

      # Download the repo
      git clone https://github.com/DS-100/sp17-materials
      # Enter the repo folder
      cd sp17-materials
      # Rename the origin remote to ds100
      git remote rename origin ds100
      # Set the origin remote to your repo
      git remote add origin <URL_OF_YOUR_PRIVATE_REPO>
This should download a copy of the course materials (including this homework)
onto your personal computer and set up git
remotes so that you can pull
released assignments from the staff and push your personal work to your private
repo.
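After the remote setup described above, `ds100` should point at the course repo and `origin` at your private repo. One way to confirm this is to read the output of `git remote -v`. The helper below is a sketch with a made-up name, `remote_fetch_url`. It assumes the usual `name<TAB>url (fetch)` layout of that output, and parses it instead of relying on newer git subcommands that a 2.5.0 install may not have.

```shell
# remote_fetch_url NAME: read `git remote -v` output on stdin and print
# the fetch URL recorded for the remote called NAME.
remote_fetch_url() {
    awk -v name="$1" '$1 == name && $3 == "(fetch)" { print $2 }'
}

# Only inspect remotes when git is available and we are inside a repo.
if command -v git >/dev/null 2>&1 && git rev-parse --git-dir >/dev/null 2>&1; then
    echo "ds100  -> $(git remote -v | remote_fetch_url ds100)"
    echo "origin -> $(git remote -v | remote_fetch_url origin)"
fi
```

Run from inside `sp17-materials`, the `ds100` line should show the course repo URL and the `origin` line your private repo URL. An empty value means that remote has not been added yet.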
Now, when you want to pull new/updated assignments, you can run:
# Make a work-in-progress commit since git doesn't allow pulling when you
# have uncommitted modifications
git commit -am "WIP"
# Get updates from the course repo. The options here tell git to override
# any conflicts in the files with what you currently have so that your work
# is never erased.
git pull -s recursive -X ours --no-edit ds100 master
And when you want to push your work to your private repo:
# Send updates to your personal private git repo
git push origin master
To read up on how git remotes work, check out this page from the git
tutorials and
this Stack Overflow post.
If you're still confused, Google your question or ask a TA.
To open Jupyter notebooks, you'll navigate to the sp17-materials
directory and run:
jupyter notebook
This will automatically open the notebook interface in your browser. You can then browse to a notebook and open it.
Finally, let's open a notebook that will check to see whether you've installed everything correctly.
In your `sp17-materials` directory, ensure that you are in the `ds100` conda
environment by running `source activate ds100` on OSX / Linux or
`activate ds100` on Windows. Then, run `git pull ds100 master` and then
`jupyter notebook`.
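On OSX / Linux, you can confirm that the `ds100` environment is active before launching the notebook. The sketch below assumes that conda exports the active environment's name in the `CONDA_DEFAULT_ENV` variable when you activate it. The function name `in_conda_env` is made up for this example.

```shell
# in_conda_env NAME: succeed when the active conda environment is NAME.
# Relies on CONDA_DEFAULT_ENV, which conda sets on activation (assumption:
# your conda version exports it; it is treated as empty when unset).
in_conda_env() {
    [ "${CONDA_DEFAULT_ENV:-}" = "$1" ]
}

if in_conda_env ds100; then
    echo "ds100 is active"
else
    echo "ds100 is not active; run: source activate ds100"
fi
```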
Now, open the test_setup.ipynb
notebook. If you've installed everything
correctly, all the cells should run without error.
Congrats! You've set up your computer for DS100 work.