-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathdatastores.txt
62 lines (52 loc) · 2.65 KB
/
datastores.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
Goals
- make students comfortable with asking questions about how they are accessing their data
- give students a broad survey of what is possible and enough concepts that they can dig into whatever might be applicable to their environment
- feel empowered - you are in charge of your data, In Soviet Russia, your data is is charge of you
- threee stages: get it working, get it working right, get it working quickly
- focus here on making it possible to "get it working" by providing samples for different methods of data access
- learn concepts that enable asking more informed questions when you undertake a data project
- "A beginning is a very delicate time" - Dune quote - the way you think about accessing your data influences the rest of your project
- two phases of a software process to consider: development and operations
- here, we want to be comfortable with development with a mind towards adapting what we develop into something that
other people won't regret we wrote...
What is a datastore?
- place where data is stored, organized, and persisted... what does this mean?
- way to abstract "storage details" from an application
- lots of different concerns/use cases result in lots of different ways to read and write data
- does this matter? yes and no... if you're reading data, there are some things you might care about:
- where it comes from
- what its "shape" is
- performance/storage
- examples:
- file
- SQL databases
- No-SQL databases
- API
What is an API?
- stands for Application Programming Interface
- in short, a way for a program (application) to talk to another program programmatically
- memory based (e.g programs importing logic/capabilities from other programs)
- importing libraries
- talking to the operating system
- network level protocols (difference between protocol and API... we'll pretend there are none for the sake of this
discussion, but I'm happy to discuss more offline)
- standard APIs (e.g. HTTP, ODBC, FTP, etc.)
- custom ways to access functionality exposed via networking protocols
- typically data you might get over the Internet for a certain dataset or website, tailored for accessing that content
- this brings us to data formats (which applies to files as well)
Data Formats
- assumes you have access to your "pile" of data
- formats are how you turn that "pile" into something intelligible
- CSV
- JSON
- XML
- XLS
- binary
- roll your own
- based on format, you will probably want some form of interpreter
Great... where's the code?
* csv_examples.py
* xlsx_examples.py
* db_examples.py
* api_examples.py
* other sources of data - web pages (see Web Scraping talk later)