Github page documentation
vianneybacoup committed Feb 12, 2024
1 parent b2931e9 commit ae14d02
Showing 12 changed files with 282 additions and 0 deletions.
19 changes: 19 additions & 0 deletions .github/workflows/publish-docs.yml
@@ -0,0 +1,19 @@
name: Publish docs via GitHub Pages
on:
  push:
    branches: [ "docs" ]
    paths:
      - docs
      - mkdocs.yml

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v2
      - name: Install mkdocs
        run: pip install --upgrade pip && pip install mkdocs mkdocs-gen-files
      - run: git config user.name 'github-actions[bot]' && git config user.email 'github-actions[bot]@users.noreply.github.com'
      - name: Publish docs
        run: mkdocs gh-deploy --force
1 change: 1 addition & 0 deletions .gitignore
@@ -5,6 +5,7 @@
# Generated
/target
*.parquet
/site

# Coverage
*.profraw
15 changes: 15 additions & 0 deletions docs/columns/options.md
@@ -0,0 +1,15 @@
Optional Parameters
-------
Options can be applied to any column, on top of the provider's own logic.

### Presence
```yaml
- name: column_name
  provider: Any.provider
  presence: 0.8
```
Sets the fraction of rows in which the column has a value; the other rows get a missing value.
The default is **1** (always present).
The value should be between 0 and 1; anything outside that range is set to the closest bound.
In this example, 80% of the column's values will be generated and 20% will be missing.
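
To make the clamping and the effect of **presence** concrete, here is a minimal Python sketch. It is not Fakelake's implementation: the function names `clamp_presence` and `apply_presence` are hypothetical, and it assumes each row is kept independently with probability **presence**, which the documentation above does not confirm.

```python
import random


def clamp_presence(presence: float) -> float:
    """Clamp the presence value to the closest bound of [0, 1]."""
    return max(0.0, min(1.0, presence))


def apply_presence(values, presence=1.0, rng=None):
    """Replace values with None so that roughly `presence` of them remain.

    Assumption: each row is kept independently with probability `presence`.
    """
    rng = rng or random.Random()
    p = clamp_presence(presence)
    # rng.random() is in [0, 1), so p == 1 keeps everything and p == 0 keeps nothing.
    return [v if rng.random() < p else None for v in values]


column = apply_presence(list(range(10_000)), presence=0.8, rng=random.Random(0))
present = sum(v is not None for v in column)
print(f"{present / len(column):.2%} of the values are present")
```

Under this assumption the share of present values is close to 80% rather than exactly 80%.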
13 changes: 13 additions & 0 deletions docs/columns/providers/increment.md
@@ -0,0 +1,13 @@
Increment provider
-------

### integer
```yaml
- name: adding_one_to_integer
  provider: Increment.integer
  start: 100
```
Increments an integer by one on each row.
Counting starts from the optional **start** parameter. Default is 0.
[Options](../options.md) are also possible.
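
The behaviour can be pictured with a short Python sketch. The name `increment_integer` is hypothetical, and the sketch assumes the first row receives the value of **start** itself.

```python
from itertools import count, islice


def increment_integer(start: int = 0):
    """Yield start, start + 1, start + 2, ... (one value per row)."""
    return count(start)


# First five rows for the example above (start: 100).
print(list(islice(increment_integer(start=100), 5)))  # [100, 101, 102, 103, 104]
```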
6 changes: 6 additions & 0 deletions docs/columns/providers/index.md
@@ -0,0 +1,6 @@
Providers
-----

- [Increment](increment.md)
- [Person](person.md)
- [Random](random.md)
33 changes: 33 additions & 0 deletions docs/columns/providers/person.md
@@ -0,0 +1,33 @@
Person provider
-------

### email
```yaml
- name: email
  provider: Person.email
  domain: soma-smart.com
```
Creates a random email address with:

- a random string of length 10 for the local part
- an optional **domain** parameter. Default is "example.com"

[Options](../options.md) are also possible.
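
As an illustration of the shape of the generated value, here is a small Python sketch. The function name `random_email` is hypothetical, and the character set of the local part (lowercase letters and digits) is an assumption; the text above only states its length.

```python
import random
import string


def random_email(domain: str = "example.com", rng=None) -> str:
    """Build '<10 random characters>@<domain>'."""
    rng = rng or random.Random()
    # Assumption: the local part uses lowercase letters and digits.
    alphabet = string.ascii_lowercase + string.digits
    local_part = "".join(rng.choices(alphabet, k=10))
    return f"{local_part}@{domain}"


print(random_email(domain="soma-smart.com"))
```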
### fname
```yaml
- name: first_name_in_top_1000_fr
  provider: Person.fname
```
Returns a random first name from a list of the top 1000 French first names.
[Options](../options.md) are also possible.
### lname
```yaml
- name: last_name_in_top_1000_fr
  provider: Person.lname
```
Returns a random last name from a list of the top 1000 French last names.
[Options](../options.md) are also possible.
29 changes: 29 additions & 0 deletions docs/columns/providers/random.md
@@ -0,0 +1,29 @@
Random provider
------

### Date
##### date
```yaml
- name: created
  provider: Random.Date.date
  format: "%m-%d-%Y"
  after: 02-15-2000
  before: 07-17-2020
```
Creates a random date with:

- an optional parameter **format**. Default is "%Y-%m-%d"
- an optional parameter **after** as a lower boundary. It must be written in the given **format**. Default is 1980-01-01
- an optional parameter **before** as an upper boundary. It must be written in the given **format**. Default is 2000-01-01

[Options](../options.md) are also possible.
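
The interaction between **format**, **after** and **before** can be sketched in Python as follows. This is only an illustration: `random_date` is a hypothetical name, and treating both boundaries as inclusive is an assumption that the text above does not settle.

```python
import random
from datetime import datetime, timedelta


def random_date(fmt="%Y-%m-%d", after=None, before=None, rng=None) -> str:
    """Pick a random day between `after` and `before`, rendered with `fmt`.

    `after` and `before` are strings written in `fmt`; the defaults are
    1980-01-01 and 2000-01-01.
    """
    rng = rng or random.Random()
    lower = datetime.strptime(after, fmt) if after else datetime(1980, 1, 1)
    upper = datetime.strptime(before, fmt) if before else datetime(2000, 1, 1)
    if lower > upper:
        raise ValueError("'after' must not be later than 'before'")
    # Assumption: both boundaries are inclusive.
    offset_days = rng.randint(0, (upper - lower).days)
    return (lower + timedelta(days=offset_days)).strftime(fmt)


# Same parameters as the YAML example above.
print(random_date(fmt="%m-%d-%Y", after="02-15-2000", before="07-17-2020"))
```

Because the boundaries are parsed with **format**, writing them in another layout (for example `2000-02-15` with `"%m-%d-%Y"`) raises an error in this sketch.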
### String
##### alphanumeric
```yaml
- name: string_code
  provider: Random.String.alphanumeric
```
Creates a random string of length 10 containing only alphanumeric characters.
[Options](../options.md) are also possible.
34 changes: 34 additions & 0 deletions docs/index.md
@@ -0,0 +1,34 @@
Fakelake
---------
FakeLake is a command line tool that generates fake data from a YAML schema.

##### Example
Here is a YAML file that will generate 1 million rows with 4 columns.
```yaml
columns:
  - name: id
    provider: Increment.integer
    start: 42
    presence: 0.8

  - name: first_name
    provider: Person.fname

  - name: company_email
    provider: Person.email
    domain: soma-smart.com

  - name: created
    provider: Random.Date.date
    format: "%Y-%m-%d"
    after: 2000-02-15
    before: 2020-07-17

info:
  output_name: all_options
  output_format: parquet
  rows: 1_000_000
```
[Click here](usage/create_your_yaml_file.md) to create your YAML file.<br/>
[Click here](usage/generate.md) to generate from a YAML file.
29 changes: 29 additions & 0 deletions docs/output/parameters.md
@@ -0,0 +1,29 @@
Output parameters
--------------

### Generated file name
To change the name of the generated file, use **output_name**.
```yaml
info:
  output_name: generate_file_name
```
### Format
To choose the format of the generated file, use **output_format**.
##### Parquet
```yaml
info:
  output_format: parquet
```
### Rows
To choose the number of rows in the generated file, use **rows**.
```yaml
info:
  rows: 1000000
```
It can also be written with underscore separators for readability.
```yaml
info:
  rows: 1_000_000
```
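
Both spellings denote the same number. The Python sketch below shows one way to accept either form; `parse_rows` is a hypothetical helper and says nothing about how Fakelake itself parses the value.

```python
def parse_rows(raw) -> int:
    """Accept 1000000 as well as "1_000_000" and return the row count."""
    rows = int(str(raw).replace("_", ""))
    if rows < 0:
        raise ValueError("rows must not be negative")
    return rows


print(parse_rows("1_000_000") == parse_rows(1000000))  # True
```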
44 changes: 44 additions & 0 deletions docs/usage/create_your_yaml_file.md
@@ -0,0 +1,44 @@
Create your YAML file
--------------------

A YAML file for Fakelake is composed of two parts:

#### Columns
A list of columns, each with a name, a provider, the provider's parameters and [options](../columns/options.md).<br/>
[Click here](../columns/providers/index.md) for the list of available providers.

Example of a file with one column:
```yaml
columns:
  - name: unique_id
    provider: Increment.integer
    start: 100
```
```
#### Info
To set up the generated file, see the [output parameters](../output/parameters.md).<br/>
Example of a Parquet file of 10 million rows:
```yaml
info:
  output_name: generated_file
  output_format: parquet
  rows: 10_000_000
```
```
#### Example
```yaml
columns:
  - name: unique_id
    provider: Increment.integer
    start: 100

info:
  output_name: generated_file
  output_format: parquet
  rows: 10_000_000
```
That's it! This is enough to generate a Parquet file.
Next step: [generate it](generate.md).
33 changes: 33 additions & 0 deletions docs/usage/generate.md
@@ -0,0 +1,33 @@
Generate data
---------------

Now that you have a YAML file ready to be used (see [Create your YAML file](create_your_yaml_file.md)), you can generate data from it using Fakelake.

### Get executable
##### With precompiled binaries

Download the latest release from [the releases page](https://github.com/soma-smart/Fakelake/releases).

```bash
tar -xvf Fakelake_<version>_<target>.tar.gz
./fakelake --help
```

##### From source
```bash
git clone https://github.com/soma-smart/Fakelake
cd Fakelake
cargo build --release
./target/release/fakelake --help
```

### Generate
To generate from one YAML file you can use:
```bash
fakelake generate config_file.yaml
```

You can also pass several files to generate all of them in one command:
```bash
fakelake generate first_file.yaml second_file.yaml
```
26 changes: 26 additions & 0 deletions mkdocs.yml
@@ -0,0 +1,26 @@
site_name: Fakelake Documentation
site_url: https://soma-smart.github.io/Fakelake/
site_author: SOMA
site_description: Documentation about FakeLake, a command line tool that generates fake data from a YAML schema.

theme:
  name: readthedocs
  locale: en
  highlightjs: true
  hljs_languages:
    - yaml

nav:
  - 'Home': 'index.md'
  - 'Usage':
    - 'usage/create_your_yaml_file.md'
    - 'usage/generate.md'
  - 'Columns':
    - 'Providers':
      - 'columns/providers/increment.md'
      - 'columns/providers/person.md'
      - 'columns/providers/random.md'
    - 'columns/options.md'
  - 'Output':
    - 'output/parameters.md'
