Update docs
pwwang committed Apr 28, 2021
1 parent 380be6d commit 1ce0a98
Showing 7 changed files with 228 additions and 5 deletions.
8 changes: 3 additions & 5 deletions README.md
@@ -1,6 +1,8 @@
# datar

Port of R data packages ([tidyr][1], [dplyr][2], [tibble][4], etc) in python, using [pipda][3].
Port of R data packages (especially from the tidyverse): [tidyr][1], [dplyr][2], [tibble][4] and so on, in Python, using [pipda][3].

Unlike other similar packages in Python that just mimic the piping sign, `datar` follows the API designs of the original packages as closely as possible, so that nearly no extra effort is needed for those who are familiar with those R packages to transition to Python.

<!-- badges -->

@@ -12,10 +14,6 @@ Port of R data packages ([tidyr][1], [dplyr][2], [tibble][4], etc) in python, us
pip install -U datar
```

## Philosophy
- Try to keep APIs with the original ones from those R packages
- Try not to change python's default behaviors (i.e, 0-based indexing)

## Example usage

```python
3 changes: 3 additions & 0 deletions docs/TODO.md
@@ -0,0 +1,3 @@

- Add tests for tidyr, ported from the original tidyverse/tidyr test cases
- Add more tests for base/core
82 changes: 82 additions & 0 deletions docs/datasets.md
@@ -0,0 +1,82 @@

Datasets have to be imported individually by:
```python
from datar.datasets import iris

# or
from datar import datasets

iris = datasets.iris
```

To list all available datasets:

```python
from datar import datasets
print(datasets.all_datasets())

# {'airquality': {'file': PosixPath('/path/to/datar/datasets/airquality.csv.gz'),
# 'index': False},
# 'anscombe': {'file': PosixPath('/path/to/datar/datasets/anscombe.csv.gz'),
# 'index': False},
# 'band_instruments': {'file': PosixPath('/path/to/datar/datasets/band_instruments.csv.gz'),
# 'index': False},
# 'band_instruments2': {'file': PosixPath('/path/to/datar/datasets/band_instruments2.csv.gz'),
# 'index': False},
# 'band_members': {'file': PosixPath('/path/to/datar/datasets/band_members.csv.gz'),
# 'index': False},
# 'billboard': {'file': PosixPath('/path/to/datar/datasets/billboard.csv.gz'),
# 'index': False},
# 'construction': {'file': PosixPath('/path/to/datar/datasets/construction.csv.gz'),
# 'index': False},
# 'diamonds': {'file': PosixPath('/path/to/datar/datasets/diamonds.csv.gz'),
# 'index': False},
# 'fish_encounters': {'file': PosixPath('/path/to/datar/datasets/fish_encounters.csv.gz'),
# 'index': False},
# 'iris': {'file': PosixPath('/path/to/datar/datasets/iris.csv.gz'),
# 'index': False},
# 'mtcars': {'file': PosixPath('/path/to/datar/datasets/mtcars.indexed.csv.gz'),
# 'index': True},
# 'population': {'file': PosixPath('/path/to/datar/datasets/population.csv.gz'),
# 'index': False},
# 'relig_income': {'file': PosixPath('/path/to/datar/datasets/relig_income.csv.gz'),
# 'index': False},
# 'smiths': {'file': PosixPath('/path/to/datar/datasets/smiths.csv.gz'),
# 'index': False},
# 'starwars': {'file': PosixPath('/path/to/datar/datasets/starwars.csv.gz'),
# 'index': False},
# 'state_abb': {'file': PosixPath('/path/to/datar/datasets/state_abb.csv.gz'),
# 'index': False},
# 'state_division': {'file': PosixPath('/path/to/datar/datasets/state_division.csv.gz'),
# 'index': False},
# 'state_region': {'file': PosixPath('/path/to/datar/datasets/state_region.csv.gz'),
# 'index': False},
# 'storms': {'file': PosixPath('/path/to/datar/datasets/storms.csv.gz'),
# 'index': False},
# 'table1': {'file': PosixPath('/path/to/datar/datasets/table1.csv.gz'),
# 'index': False},
# 'table2': {'file': PosixPath('/path/to/datar/datasets/table2.csv.gz'),
# 'index': False},
# 'table3': {'file': PosixPath('/path/to/datar/datasets/table3.csv.gz'),
# 'index': False},
# 'table4a': {'file': PosixPath('/path/to/datar/datasets/table4a.csv.gz'),
# 'index': False},
# 'table4b': {'file': PosixPath('/path/to/datar/datasets/table4b.csv.gz'),
# 'index': False},
# 'table5': {'file': PosixPath('/path/to/datar/datasets/table5.csv.gz'),
# 'index': False},
# 'us_rent_income': {'file': PosixPath('/path/to/datar/datasets/us_rent_income.csv.gz'),
# 'index': False},
# 'warpbreaks': {'file': PosixPath('/path/to/datar/datasets/warpbreaks.csv.gz'),
# 'index': False},
# 'who': {'file': PosixPath('/path/to/datar/datasets/who.csv.gz'),
# 'index': False},
# 'world_bank_pop': {'file': PosixPath('/path/to/datar/datasets/world_bank_pop.csv.gz'),
# 'index': False}}
```

`file` shows the path to the CSV file of the dataset, and `index` indicates whether the dataset has an index (row names).

!!! Note

    The column names are altered by replacing `.` with `_`. For example, `Sepal.Width` becomes `Sepal_Width`.
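Given that renaming rule, here is a minimal sketch of what to expect when loading `iris` (the expected column list is an assumption based on the standard R `iris` columns):

```python
from datar.datasets import iris

# the dots in the original R column names become underscores
print(iris.columns.tolist())
# expected: ['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width', 'Species']
```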
39 changes: 39 additions & 0 deletions docs/f.md
@@ -0,0 +1,39 @@
## The `Symbolic` object `f`

You can import it with `from datar.core import f`, or get it together with everything else via `from datar.all import *`.

`f` is a universal `Symbolic` object; it does the magic of connecting the expressions in verb arguments so that their execution can be delayed.

There are different uses for `f`:

- Use as a proxy to refer to dataframe columns (i.e. `f.x`, `f['x']`)
- Use as a slice container. For example:
    - `f[:3]` for `range(0, 3)`
    - `f[f.x:f.z]` for columns from `x` to `z`, inclusive. To exclude the `stop` column, use `f[f.x:f.z:0]` (see the sketch after this list).
- Use as the column name marker for `tribble`:
    ```python
    tribble(
        f.x, f.y,
        1,   2,
        3,   4,
    )
    ```
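A minimal sketch of the slice-container use (assuming a tibble with columns `x`, `y` and `z`, and that the slice is passed to `select`):

```python
from datar.all import *

df = tibble(x=1, y=2, z=3)

df >> select(f[f.x:f.z])    # columns x, y and z (stop column included)
df >> select(f[f.x:f.z:0])  # columns x and y (stop column excluded)
```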

Sometimes, when you have verbs nested inside a piping call and you want to distinguish the proxies for the different verbs:

```python
# you can simply alias f with a different name
g = f

df = tibble(x=1, y=2)
df >> left_join(df >> group_by(f.x), by=g.y)
```

Or you can instantiate a new `Symbolic` object:
```python
from pipda.symbolic import Symbolic

g = Symbolic()

# f and g make no difference in execution technically
```
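For example (continuing from the snippet above; the verbs are assumed to be importable from `datar.all` as shown elsewhere in these docs):

```python
from pipda.symbolic import Symbolic
from datar.all import tibble, mutate

g = Symbolic()
df = tibble(x=1, y=2)
df >> mutate(z=g.x + g.y)  # behaves exactly as it would with f
```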
46 changes: 46 additions & 0 deletions docs/import.md
@@ -0,0 +1,46 @@
## Import submodules, verbs and functions from datar

You can import everything (all verbs and functions) from datar by:
```python
from datar.all import *
```

which is not recommended. Instead, you can import individual verbs or functions by:
```python
from datar.all import mutate
```

!!! Attention

    When you use `from datar.all import *`, you need to pay attention to the python builtin names that are shadowed by `datar`. For example, `slice` will be `datar.dplyr.slice` instead of `builtins.slice`. To refer to the builtin one, you need to:

    ```python
    import builtins

    s = builtins.slice(None, 3, None) # equivalent to [:3]
    ```

Or, if you know the origin of the verb (the submodule it comes from), you can import it from there directly:
```python
from datar.dplyr import mutate
```

You can also keep the namespace:
```python
from datar import dplyr
from datar.tibble import tibble

df = tibble(x=1)
# then use the verb with the dplyr namespace:
df >> dplyr.mutate(y=2)
```

## Import datasets from datar

Note that `from datar.all import *` will not import the datasets.

!!! note

    Datasets have to be imported individually, so `from datar.datasets import *` won't work.

You don't have to worry about other datasets being imported and taking up memory when you import one. A dataset is only loaded into memory when you explicitly import it individually.
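A small illustration of that lazy-loading behavior (a sketch only; the exact loading mechanism is an implementation detail of `datar`):

```python
# importing the module alone does not read any dataset from disk
from datar import datasets

# only now is iris read from its csv.gz file and loaded into memory
from datar.datasets import iris
```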

See also [datasets](../datasets) for details about available datasets.
50 changes: 50 additions & 0 deletions docs/piping_vs_regular.md
@@ -0,0 +1,50 @@

A verb can be called in a piping form:
```python
df >> verb(...)
```

Or in a regular way:
```python
verb(df, ...)
```

The piping form is recommended and is specially designed to enable the full features of `datar`.

The regular form of verb calling is limited when an argument calls a function that is registered as requiring the data argument. For example:

```python
df >> head(n=10)
head(df, n=10) # same
```

However,
```python
df >> select(everything()) # works
select(df, everything()) # not working
```
This is because `everything` is registered as requiring its first argument to be a data frame. With the regular form, we are not able (or it would take too much effort) to obtain that data frame, but with the piping form, `pipda` is designed to pass the piped data to the verb and to every argument of it.

Functions registered by `register_func` are supposed to be used as arguments of verbs. However, when called regularly, they have to be called with the right signature. For example, the signature of `everything` has `_data` as its first argument, so a regular call must supply it:
```python
everything(df)
# everything() not working, everything of what?
```

When functions are registered by `register_func(None, ...)`, which does not require the data argument, they can be used in regular form as well:

```python
from datar.core import f
from datar.base import abs
from datar.tibble import tibble
from datar.dplyr import mutate

df = tibble(x=[-1,-2,-3])
df >> mutate(y=abs(f.x))
# x y
# 0 -1 1
# 1 -2 2
# 2 -3 3

mutate(df, y=abs(f.x)) # works the same way
```
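For reference, here is a minimal, hypothetical sketch of how such a data-independent function might be registered (assuming `register_func` and `Context` can be imported from `pipda` directly; the real `datar.base.abs` may be implemented differently):

```python
import numpy
from pipda import register_func, Context

# passing None as the first argument means no data argument is required,
# so the function can be called regularly as well as inside verbs
@register_func(None, context=Context.EVAL)
def my_abs(x):
    # hypothetical helper: x may be a plain value, or the evaluated
    # result of an expression like f.x when used inside a verb
    return numpy.abs(x)
```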
5 changes: 5 additions & 0 deletions mkdocs.yml
@@ -23,6 +23,10 @@ extra_css:
- style.css
nav:
- 'Home': 'index.md'
- 'Import': 'import.md'
- 'The f': 'f.md'
- 'Piping vs regular calling': 'piping_vs_regular.md'
- 'Datasets': 'datasets.md'
- 'API': 'mkapi/api/datar'
- 'Examples':
'across': 'notebooks/across.ipynb'
@@ -76,4 +80,5 @@ 'uncount': 'notebooks/uncount.ipynb'
'uncount': 'notebooks/uncount.ipynb'
'unite': 'notebooks/unite.ipynb'
'with_groups': 'notebooks/with_groups.ipynb'
- 'TODO': 'TODO.md'
- 'Change Log': CHANGELOG.md
