Showing 7 changed files with 228 additions and 5 deletions.
@@ -0,0 +1,3 @@

- Add tests for tidyr from original tidyverse/tidyr cases
- Add more tests for base/core
@@ -0,0 +1,82 @@

Datasets have to be imported individually by:
```python
from datar.datasets import iris

# or
from datar import datasets

iris = datasets.iris
```

To list all available datasets:

```python
from datar import datasets
print(datasets.all_datasets())

# {'airquality': {'file': PosixPath('/path/to/datar/datasets/airquality.csv.gz'),
# 'index': False},
# 'anscombe': {'file': PosixPath('/path/to/datar/datasets/anscombe.csv.gz'),
# 'index': False},
# 'band_instruments': {'file': PosixPath('/path/to/datar/datasets/band_instruments.csv.gz'),
# 'index': False},
# 'band_instruments2': {'file': PosixPath('/path/to/datar/datasets/band_instruments2.csv.gz'),
# 'index': False},
# 'band_members': {'file': PosixPath('/path/to/datar/datasets/band_members.csv.gz'),
# 'index': False},
# 'billboard': {'file': PosixPath('/path/to/datar/datasets/billboard.csv.gz'),
# 'index': False},
# 'construction': {'file': PosixPath('/path/to/datar/datasets/construction.csv.gz'),
# 'index': False},
# 'diamonds': {'file': PosixPath('/path/to/datar/datasets/diamonds.csv.gz'),
# 'index': False},
# 'fish_encounters': {'file': PosixPath('/path/to/datar/datasets/fish_encounters.csv.gz'),
# 'index': False},
# 'iris': {'file': PosixPath('/path/to/datar/datasets/iris.csv.gz'),
# 'index': False},
# 'mtcars': {'file': PosixPath('/path/to/datar/datasets/mtcars.indexed.csv.gz'),
# 'index': True},
# 'population': {'file': PosixPath('/path/to/datar/datasets/population.csv.gz'),
# 'index': False},
# 'relig_income': {'file': PosixPath('/path/to/datar/datasets/relig_income.csv.gz'),
# 'index': False},
# 'smiths': {'file': PosixPath('/path/to/datar/datasets/smiths.csv.gz'),
# 'index': False},
# 'starwars': {'file': PosixPath('/path/to/datar/datasets/starwars.csv.gz'),
# 'index': False},
# 'state_abb': {'file': PosixPath('/path/to/datar/datasets/state_abb.csv.gz'),
# 'index': False},
# 'state_division': {'file': PosixPath('/path/to/datar/datasets/state_division.csv.gz'),
# 'index': False},
# 'state_region': {'file': PosixPath('/path/to/datar/datasets/state_region.csv.gz'),
# 'index': False},
# 'storms': {'file': PosixPath('/path/to/datar/datasets/storms.csv.gz'),
# 'index': False},
# 'table1': {'file': PosixPath('/path/to/datar/datasets/table1.csv.gz'),
# 'index': False},
# 'table2': {'file': PosixPath('/path/to/datar/datasets/table2.csv.gz'),
# 'index': False},
# 'table3': {'file': PosixPath('/path/to/datar/datasets/table3.csv.gz'),
# 'index': False},
# 'table4a': {'file': PosixPath('/path/to/datar/datasets/table4a.csv.gz'),
# 'index': False},
# 'table4b': {'file': PosixPath('/path/to/datar/datasets/table4b.csv.gz'),
# 'index': False},
# 'table5': {'file': PosixPath('/path/to/datar/datasets/table5.csv.gz'),
# 'index': False},
# 'us_rent_income': {'file': PosixPath('/path/to/datar/datasets/us_rent_income.csv.gz'),
# 'index': False},
# 'warpbreaks': {'file': PosixPath('/path/to/datar/datasets/warpbreaks.csv.gz'),
# 'index': False},
# 'who': {'file': PosixPath('/path/to/datar/datasets/who.csv.gz'),
# 'index': False},
# 'world_bank_pop': {'file': PosixPath('/path/to/datar/datasets/world_bank_pop.csv.gz'),
# 'index': False}}
```

`file` is the path to the dataset's CSV file, and `index` indicates whether the dataset has an index (row names).
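
As an illustration of how this metadata could be used, the sketch below filters a dictionary shaped like the output above for datasets that carry an index. The `meta` dictionary is a hand-written two-entry excerpt (an assumption for this example), not a call into `datar`.

```python
from pathlib import PurePosixPath

# Hand-written excerpt shaped like the all_datasets() output shown above
meta = {
    "iris": {
        "file": PurePosixPath("/path/to/datar/datasets/iris.csv.gz"),
        "index": False,
    },
    "mtcars": {
        "file": PurePosixPath("/path/to/datar/datasets/mtcars.indexed.csv.gz"),
        "index": True,
    },
}

# Names of the datasets whose first column holds row names
indexed = sorted(name for name, info in meta.items() if info["index"])
print(indexed)
```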

!!! Note

    The column names are altered by replacing `.` with `_`. For example, `Sepal.Width` becomes `Sepal_Width`.
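
The renaming rule can be sketched in plain Python. The helper name `to_python_name` is hypothetical and only mirrors the rule described in the note; it is not part of `datar`.

```python
def to_python_name(column: str) -> str:
    """Replace every '.' in an R-style column name with '_'."""
    return column.replace(".", "_")


# Illustrative R-style column names
r_columns = ["Sepal.Length", "Sepal.Width", "Species"]
py_columns = [to_python_name(name) for name in r_columns]
print(py_columns)
```

Names without a dot pass through unchanged, and the resulting names are valid Python identifiers, so they can be written as attributes such as `f.Sepal_Width`.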
@@ -0,0 +1,39 @@
## The `Symbolic` object `f`

You can import it with `from datar.core import f` or `from datar.all import *`.

`f` is a universal `Symbolic` object that connects the expressions in verb arguments so that their execution can be delayed.
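
To make the idea of delayed execution concrete, here is a minimal sketch of a proxy that records a column reference and resolves it only once data is supplied. `ToySymbolic` and `ColumnRef` are hypothetical names written for this example; they are not `pipda`'s implementation, and the shadowed name `f` here is the toy object, not `datar`'s.

```python
class ColumnRef:
    """A deferred reference to a column, resolved only when data is given."""

    def __init__(self, name):
        self.name = name

    def evaluate(self, data):
        return data[self.name]


class ToySymbolic:
    """Hypothetical stand-in for a symbolic proxy."""

    def __getattr__(self, name):
        # f.x builds a reference instead of looking anything up
        return ColumnRef(name)

    def __getitem__(self, name):
        # f['x'] is equivalent to f.x
        return ColumnRef(name)


f = ToySymbolic()
ref = f.x  # nothing is evaluated yet

data = {"x": [1, 2, 3], "y": [4, 5, 6]}
print(ref.evaluate(data))
```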

There are different uses for `f`:

- Use as a proxy to refer to dataframe columns (i.e. `f.x`, `f['x']`)
- Use as a slice container. For example:
    - `f[:3]` for `range(0,3)`
    - `f[f.x:f.z]` for columns from `x` to `z`, inclusively. If you want to exclude the `stop` column: `f[f.x:f.z:0]`
- Use as the column name marker for `tribble`:
    ```python
    tribble(
        f.x, f.y,
        1,   2,
        3,   4
    )
    ```
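
The slice-container use can be sketched with plain Python slices. The example below assumes the convention described in the list (inclusive by default, a step of `0` to exclude the stop column); `SliceCatcher` and `columns_between` are hypothetical helpers, not `datar` functions.

```python
class SliceCatcher:
    """Return whatever is passed inside the square brackets."""

    def __getitem__(self, key):
        return key


def columns_between(columns, col_slice):
    """Select a run of columns; the stop column is included unless step is 0."""
    start = columns.index(col_slice.start)
    stop = columns.index(col_slice.stop)
    if col_slice.step == 0:
        return columns[start:stop]
    return columns[start:stop + 1]


s = SliceCatcher()
columns = ["w", "x", "y", "z"]

inclusive = columns_between(columns, s["x":"z"])
exclusive = columns_between(columns, s["x":"z":0])
print(inclusive, exclusive)
```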

Sometimes you mix verbs in a pipeline and want to distinguish the proxies for different verbs:

```python
from datar.all import f, tibble, left_join, group_by

# you can just give f a different name
g = f

df = tibble(x=1, y=2)
df >> left_join(df >> group_by(f.x), by=g.y)
```

Or you can instantiate a new `Symbolic` object:
```python
from pipda.symbolic import Symbolic

g = Symbolic()

# f and g make no difference in execution technically
```
@@ -0,0 +1,46 @@
## Import submodule, verbs and functions from datar

You can import everything (all verbs and functions) from datar by:
```python
from datar.all import *
```

which is not recommended. Instead, you can import individual verbs or functions by:
```python
from datar.all import mutate
```

!!! Attention

    When you use `from datar.all import *`, pay attention to the Python builtin names that are shadowed by `datar`. For example, `slice` will be `datar.dplyr.slice` instead of `builtins.slice`. To refer to the builtin one, you need to:
    ```python
    import builtins

    s = builtins.slice(None, 3, None) # [:3]
    ```
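
The shadowing effect can be reproduced without `datar`. In the sketch below, a locally defined `slice` function is a hypothetical stand-in for a name brought in by a star import; the builtin stays reachable through the `builtins` module.

```python
import builtins


def slice(data, n):
    """Hypothetical stand-in that shadows the builtin, as a star import would."""
    return data[:n]


rows = ["a", "b", "c", "d"]

# The bare name now refers to the function defined above
first_two = slice(rows, 2)

# The builtin is still available through the builtins module
window = builtins.slice(None, 3, None)  # [:3]
print(first_two, rows[window])
```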

Or if you know the origin of the verb, you can also do:
```python
from datar.dplyr import mutate
```

You can also keep the namespace:
```python
from datar import dplyr
from datar.tibble import tibble

df = tibble(x=1)
# then use it with the dplyr namespace:
df >> dplyr.mutate(y=2)
```

## Import datasets from datar

Note that `from datar.all import *` will not import datasets.

!!! note

    Datasets have to be imported individually, so `from datar.datasets import *` won't work.

You don't have to worry about other datasets being imported and taking up memory when you import one. A dataset is only loaded into memory when you explicitly import it individually.

See also [datasets](../datasets) for details about available datasets.
@@ -0,0 +1,50 @@

A verb can be called in a piping form:
```python
df >> verb(...)
```

Or in a regular way:
```python
verb(df, ...)
```

Piping is recommended and is designed specifically to enable the full features of `datar`.

The regular form of verb calling is limited when an argument calls a function that is registered as requiring the data argument. For example, the following two calls are equivalent:

```python
df >> head(n=10)
head(df, n=10) # same
```

However,
```python
df >> select(everything()) # works
select(df, everything()) # not working
```
This is because `everything` is registered as requiring its first argument to be a data frame. With the regular form, we are not able (or it takes too much effort) to obtain the data frame, whereas with the piping form, `pipda` is designed to pass the piped data to the verb and to every argument of it.
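
The difference can be illustrated with a toy pipeline built on Python's reflected right-shift operator (`__rrshift__`). This is a simplified sketch and not how `pipda` is implemented; `ToyVerbCall` and the toy `select` and `everything` below are hypothetical stand-ins that operate on a plain dict.

```python
class ToyVerbCall:
    """A verb call waiting for data to arrive from the left of `>>`."""

    def __init__(self, func, args):
        self.func = func
        self.args = args

    def __rrshift__(self, data):
        # Piped data is handed to each deferred argument, then to the verb
        resolved = [arg(data) if callable(arg) else arg for arg in self.args]
        return self.func(data, *resolved)


def everything():
    """Toy helper: defer until data is known, then return all column names."""
    return lambda data: list(data)


def select(*args):
    """Toy verb: keep only the requested columns."""

    def _select(data, *column_groups):
        wanted = [col for group in column_groups for col in group]
        return {col: data[col] for col in wanted}

    return ToyVerbCall(_select, args)


df = {"x": [1, 2], "y": [3, 4]}
result = df >> select(everything())
only_y = df >> select(["y"])
print(result, only_y)
```

Because the data arrives through `>>`, the toy `everything()` can be written without arguments and still learn which columns exist.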

The functions registered by `register_func` are supposed to be used as arguments of verbs. However, they have to be used with the right signature. For example, the signature of `everything` has `_data` as the first argument, so to call it in the regular way:
```python
everything(df)
# everything() not working, everything of what?
```

When functions are registered by `register_func(None, ...)`, which does not require the data argument, they can be used in the regular form:

```python
from datar.core import f
from datar.base import abs
from datar.tibble import tibble
from datar.dplyr import mutate

df = tibble(x=[-1,-2,-3])
df >> mutate(y=abs(f.x))
#    x  y
# 0 -1  1
# 1 -2  2
# 2 -3  3

mutate(df, y=abs(f.x)) # works the same way
```