Commit ae14d02 (1 parent: b2931e9)
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Showing 12 changed files with 282 additions and 0 deletions.

@@ -0,0 +1,19 @@
name: Publish docs via GitHub Pages
on:
  push:
    branches: [ "docs" ]
    paths:
      - 'docs/**'
      - mkdocs.yml

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v2
      - name: Install mkdocs
        run: pip install --upgrade pip && pip install mkdocs mkdocs-gen-files
      - run: git config user.name 'github-actions[bot]' && git config user.email 'github-actions[bot]@users.noreply.github.com'
      - name: Publish docs
        run: mkdocs gh-deploy --force

@@ -5,6 +5,7 @@
# Generated
/target
*.parquet
/site

# Coverage
*.profraw

@@ -0,0 +1,15 @@
Optional Parameters
-------
On top of the provider's own logic, you can apply the following options to any column.

### Presence
```yaml
- name: column_name
  provider: Any.provider
  presence: 0.8
```
Sets the fraction of rows in which the column has a value; the other rows are missing in the result.
The default value is **1**, meaning the value is always present.
The parameter should be between 0 and 1; any value outside this range is set to the closest bound.
In this example, 80% of the rows will have a generated value and 20% will be missing.
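
Options sit in the same column entry as the provider and its own parameters. The sketch below is illustrative only: the column names are hypothetical, and it assumes **presence** combines with any provider's parameters, since each provider page states that options are possible.

```yaml
columns:
  # Person.email with its own domain parameter, plus the presence option:
  # 50% of the rows get an email address, 50% are missing.
  - name: optional_email
    provider: Person.email
    domain: example.com
    presence: 0.5

  # 1.5 is outside [0, 1], so it is set to the closest bound, 1 (always present).
  - name: always_present_name
    provider: Person.fname
    presence: 1.5
```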

@@ -0,0 +1,13 @@
Increment provider
-------

### integer
```yaml
- name: adding_one_to_integer
  provider: Increment.integer
  start: 100
```
Increments an integer by one for each row.
Counting starts from the optional parameter **start**. Default is 0.

[Options](../options.md) can also be applied.
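
For comparison, here is a sketch of two columns, one relying on the default **start** and one setting it explicitly. The column names are hypothetical; the value sequences follow from the "one per row, default 0" behavior described above.

```yaml
columns:
  # start omitted: values are 0, 1, 2, ...
  - name: row_index
    provider: Increment.integer

  # start set to 100: values are 100, 101, 102, ...
  - name: order_id
    provider: Increment.integer
    start: 100
```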

@@ -0,0 +1,6 @@
Providers
-----

- [Increment](increment.md)
- [Person](person.md)
- [Random](random.md)

@@ -0,0 +1,33 @@
Person provider
-------

### email
```yaml
- name: email
  provider: Person.email
  domain: soma-smart.com
```
Creates a random email address with:
- a random string of length 10 as the local part
- the optional **domain** parameter. Default is "example.com"

[Options](../options.md) can also be applied.

### fname
```yaml
- name: first_name_in_top_1000_fr
  provider: Person.fname
```
Returns a random first name from a list of the top 1000 French first names.

[Options](../options.md) can also be applied.

### lname
```yaml
- name: last_name_in_top_1000_fr
  provider: Person.lname
```
Returns a random last name from a list of the top 1000 French last names.

[Options](../options.md) can also be applied.
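
The three Person providers can be used together in one file. A minimal sketch with hypothetical column names; the email column leaves out **domain**, so it falls back to the default, example.com.

```yaml
columns:
  - name: first_name
    provider: Person.fname

  - name: last_name
    provider: Person.lname

  # No domain parameter: addresses use the default domain, example.com.
  - name: email
    provider: Person.email
```

Since the local part of each email is a random 10-character string, the generated addresses are not derived from the first and last name columns.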

@@ -0,0 +1,29 @@
Random provider
------

### Date
##### date
```yaml
- name: created
  provider: Random.Date.date
  format: "%m-%d-%Y"
  after: 02-15-2000
  before: 07-17-2020
```
Creates a random date with:
- an optional parameter **format**. Default is "%Y-%m-%d"
- an optional parameter **after** as the lower bound. It must follow the **format** parameter. Default is 1980-01-01
- an optional parameter **before** as the upper bound. It must follow the **format** parameter. Default is 2000-01-01

[Options](../options.md) can also be applied.

### String
##### alphanumeric
```yaml
- name: string_code
  provider: Random.String.alphanumeric
```
Creates a random string of length 10 containing only alphanumeric characters.

[Options](../options.md) can also be applied.
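
When **format** is omitted, the default "%Y-%m-%d" applies, so **after** and **before** are written in year-month-day order. A sketch under that assumption, with hypothetical column names, that also applies the presence option to a string column:

```yaml
columns:
  # No format parameter: the default "%Y-%m-%d" applies, so the bounds use it too.
  - name: signup_date
    provider: Random.Date.date
    after: 2010-01-01
    before: 2019-12-31

  # 10-character alphanumeric code, missing in 10% of the rows.
  - name: voucher_code
    provider: Random.String.alphanumeric
    presence: 0.9
```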

@@ -0,0 +1,34 @@
Fakelake
---------
FakeLake is a command line tool that generates fake data from a YAML schema.

##### Example
Here is a YAML file that generates 1 million rows with 4 columns.
```yaml
columns:
  - name: id
    provider: Increment.integer
    start: 42
    presence: 0.8

  - name: first_name
    provider: Person.fname

  - name: company_email
    provider: Person.email
    domain: soma-smart.com

  - name: created
    provider: Random.Date.date
    format: "%Y-%m-%d"
    after: 2000-02-15
    before: 2020-07-17

info:
  output_name: all_options
  output_format: parquet
  rows: 1_000_000
```
[Click here](usage/create_your_yaml_file.md) to create your YAML file.

[Click here](usage/generate.md) to generate data from a YAML file.

@@ -0,0 +1,29 @@
Output parameters
--------------

### Generated file name
To change the name of the generated file, use **output_name**.
```yaml
info:
  output_name: generate_file_name
```

### Format
To choose the format of the generated file, use **output_format**.
##### Parquet
```yaml
info:
  output_format: parquet
```

### Rows
To choose the number of rows in the generated file, use **rows**.
```yaml
info:
  rows: 1000000
```
For readability, the number can also be written with underscores as digit separators.
```yaml
info:
  rows: 1_000_000
```

@@ -0,0 +1,44 @@
Create your YAML file
--------------------

A YAML file for Fakelake is composed of two parts:

#### Columns
A list of columns, each with a name, a provider, the provider's parameters and optional [options](../columns/options.md).<br/>
[Click here](../columns/providers/index.md) for the list of available providers.

Example of a file with one column:
```yaml
columns:
  - name: unique_id
    provider: Increment.integer
    start: 100
```

#### Info
To set up the generated file, see [here](../output/parameters.md).
Example for a Parquet file with 10 million rows:
```yaml
info:
  output_name: generated_file
  output_format: parquet
  rows: 10_000_000
```

#### Example
```yaml
columns:
  - name: unique_id
    provider: Increment.integer
    start: 100

info:
  output_name: generated_file
  output_format: parquet
  rows: 10_000_000
```
That's it! This is enough to generate a Parquet file.
Next step: [generate it](generate.md).

@@ -0,0 +1,33 @@
Generate data
---------------

Now that you have a YAML file ready (see [Create your YAML file](create_your_yaml_file.md)), you can generate data from it with fakelake.

### Get executable
##### With precompiled binaries

Download the latest release from [here](https://github.com/soma-smart/Fakelake/releases), then extract the archive:

```bash
tar -xvf Fakelake_<version>_<target>.tar.gz
./fakelake --help
```

##### From source
```bash
git clone https://github.com/soma-smart/Fakelake
cd Fakelake
cargo build --release
./target/release/fakelake --help
```

### Generate
To generate data from one YAML file, run the following (use the path to the binary, e.g. ./fakelake, if it is not on your PATH):
```bash
fakelake generate config_file.yaml
```

You can also pass several files to generate several outputs in one run:
```bash
fakelake generate first_file.yaml second_file.yaml
```

@@ -0,0 +1,26 @@
site_name: Fakelake Documentation
site_url: https://soma-smart.github.io/Fakelake/
site_author: SOMA
site_description: Documentation about FakeLake, a command line tool that generates fake data from a YAML schema.

theme:
  name: readthedocs
  locale: en
  highlightjs: true
  hljs_languages:
    - yaml

nav:
  - 'Home': 'index.md'
  - 'Usage':
    - 'usage/create_your_yaml_file.md'
    - 'usage/generate.md'
  - 'Columns':
    - 'Providers':
      - 'columns/providers/increment.md'
      - 'columns/providers/person.md'
      - 'columns/providers/random.md'
    - 'columns/options.md'
  - 'Output':
    - 'output/parameters.md'