Find or generate a large dataset to test CLI tool exporting #88

Closed
tgadam opened this issue Aug 2, 2022 · 2 comments

@tgadam
Contributor

tgadam commented Aug 2, 2022

Find or generate a large dataset to test CLI tool exporting. See #74 (comment)
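If no suitable real dataset turns up, a synthetic one can be generated. The following is a minimal sketch using only the Python standard library; the column names, the column count, and the helper names `generate_rows` and `write_csv` are hypothetical and not part of this project.

```python
import csv
import io
import random
import string


def generate_rows(n_rows, n_cols=5, seed=0):
    """Yield a header row followed by n_rows of random data."""
    rng = random.Random(seed)  # seeded so repeated test runs see the same data
    yield [f"col_{i}" for i in range(n_cols)]
    for row_id in range(n_rows):
        row = [row_id]  # first column is a running row id
        for _ in range(n_cols - 1):
            row.append("".join(rng.choices(string.ascii_lowercase, k=8)))
        yield row


def write_csv(stream, n_rows, n_cols=5, seed=0):
    """Write the synthetic rows to an open text stream; return the data row count."""
    writer = csv.writer(stream)
    count = 0
    for row in generate_rows(n_rows, n_cols, seed):
        writer.writerow(row)
        count += 1
    return count - 1  # exclude the header row


# In-memory example; pass an open file object instead to write to disk.
buffer = io.StringIO()
rows_written = write_csv(buffer, n_rows=17_000)
```

Because the rows are produced by a generator, `n_rows` can be raised well beyond available memory when writing to a file rather than to a `StringIO` buffer.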

@eabeliuk
Contributor

We might use the Phycus randomized (and extended) dataset for this.

@AndresPerezTesela
Member

AndresPerezTesela commented Oct 21, 2022

So far we've tested the CLI manifest generator with a 17,000-row data file. The experiment produced the following findings.

If the output file format is:

  • JSON:

    • the manifest generation is pretty fast
    • the manifest file is around 15x larger than the CSV data file, which is expected since a CSV file carries much less information than the FSML manifest.
  • YAML:

    • the manifest generation is much slower than with JSON
    • the manifest file is around 8x larger than the CSV data file

So essentially JSON is faster but produces a larger file, whereas YAML is slower but produces a smaller file.
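A comparison like this can be reproduced with a small timing harness. The sketch below is an illustration only: `csv_to_json_records` is a hypothetical stand-in that dumps one JSON object per row and is not the FSML manifest format, so its size ratio will differ from the figures above. A YAML serializer (for example PyYAML's `yaml.safe_dump`, assuming that library is installed) could be passed to `measure` in the same way.

```python
import csv
import io
import json
import time


def csv_to_json_records(csv_text):
    """Serialize CSV text as a JSON list of per-row objects (NOT the FSML manifest)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return json.dumps(list(reader), indent=2)


def measure(csv_text, serializer):
    """Return (elapsed_seconds, output_size / input_size) for one serializer call."""
    start = time.perf_counter()
    output = serializer(csv_text)
    elapsed = time.perf_counter() - start
    ratio = len(output.encode("utf-8")) / len(csv_text.encode("utf-8"))
    return elapsed, ratio


# Small made-up sample; swap in the real data file contents for a real test.
sample = "id,name\n" + "".join(f"{i},item{i}\n" for i in range(1000))
elapsed, ratio = measure(sample, csv_to_json_records)
print(f"JSON: {elapsed:.4f}s, {ratio:.1f}x the CSV size")
```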

Bottom line: to attain unlimited scalability, we might need to start exploring this other ticket:
