Not generating JSON entries when running #1

Open
BieniekAlexander opened this issue Jul 28, 2022 · 3 comments

@BieniekAlexander

Issue
When I run the pages-to-json.rb script, the JSON files generated in the output directory contain no entries.

What should happen
Data files produced in the output directory should have data in them.

What happened instead?
The detail-*.json files contain only {}, and the summary-*.json files contain a few lines of git-lfs metadata instead of data:

~/learning/cantodict-archive/output$ tail -n +1 *.json
==> detail-characters.json <==
{}
==> detail-compounds.json <==
{}
==> detail-sentences.json <==
{}
==> summary-characters.json <==
version https://git-lfs.github.com/spec/v1
oid sha256:5980dbd57be32e9f069887e02adc90073d71a24059fa62fa51616095b40e7d42
size 2596859

==> summary-compounds.json <==
version https://git-lfs.github.com/spec/v1
oid sha256:3c636f110a6eb40d95856c7d2095694338a4150a53e484a12b5c9e0ed55831bd
size 31379078

==> summary-sentences.json <==
version https://git-lfs.github.com/spec/v1
oid sha256:8ebaa753d1111eab642b09e914828ea01d9c7939c944ec85d661d13b8bea62fc
size 892314

Steps to reproduce

#!/bin/bash
# clone repository
# install ruby from scratch
bundle config set --local path 'vendor'
bundle install
bundle exec ruby pages-to-json.rb

I'd actually been trying to scrape the website myself, so if I can just use your work instead, it would save me a lot of time.

@awong-dev
Owner

Interesting... are you getting the full set of files from data/detail?
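One quick way to answer that is to count the regular files under data/detail. The sketch below is only illustrative: `count_files` is a hypothetical helper, not part of the repository, and the thread does not say how many files a complete checkout should contain.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): print the number of
# regular files under a directory, or 0 if the directory does not exist.
count_files() {
  if [ -d "$1" ]; then
    find "$1" -type f | wc -l | tr -d ' '
  else
    echo 0
  fi
}

# data/detail is the input directory named in this thread.
echo "data/detail: $(count_files data/detail) files"
```

A count of 0, or one far smaller than expected, would suggest the script has nothing to read, which might explain the empty detail-*.json output.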

@awong-dev
Owner

Also, it seems like you don't have git-lfs installed? The output files are stored in Git LFS, which is why the summary-* JSON files contain only a pointer with a SHA-256 hash.

I'd love to figure out why you can't regenerate the files as well, but if you install git-lfs, you should be able to just download a snapshot of the current copy.
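To tell which files are still LFS pointer stubs, it may help to check the first line of each file for the `version https://git-lfs.github.com/spec/v1` header seen in the `tail` output earlier in this thread. This is a minimal sketch: `is_lfs_pointer` is a hypothetical helper, not something the repository ships.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): succeeds when the file's
# first line is the Git LFS pointer header.
is_lfs_pointer() {
  first_line=$(head -n 1 "$1" 2>/dev/null)
  case "$first_line" in
    "version https://git-lfs.github.com/spec/v1"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Report each file passed on the command line, e.g.:
#   sh check-lfs.sh output/*.json
for f in "$@"; do
  if is_lfs_pointer "$f"; then
    echo "$f: LFS pointer stub (content not downloaded)"
  else
    echo "$f: regular file"
  fi
done

# With git-lfs installed, stubs are normally replaced by running:
#   git lfs install
#   git lfs pull
```

The script name `check-lfs.sh` in the usage comment is only a placeholder. The two `git lfs` commands are left as comments because they need network access and enough LFS bandwidth on the repository.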

@BieniekAlexander
Author

@awong-dev
Hi, sorry for my very delayed response. I didn't have git-lfs installed, so I'd imagine that was part of why I wasn't generating data. However, now that I've installed it, I'm seeing the following error about the repository's LFS data quota:

(base) alex@espresso:~/projects/cantodict-archive$ git reset --hard
Downloading output/detail-characters.json (6.8 MB)
Error downloading object: output/detail-characters.json (d4e3bcb): Smudge error: Error downloading output/detail-characters.json (d4e3bcbc6cc89cc083af7096877c8ed1848eb1ac8ca155b8bba924bf5263f100): batch response: This repository is over its data quota. Account responsible for LFS bandwidth should purchase more data packs to restore access.

Errors logged to /home/alex/projects/cantodict-archive/.git/lfs/logs/20221025T115037.855897176.log
Use `git lfs logs last` to view the log.
error: external filter 'git-lfs filter-process' failed
fatal: output/detail-characters.json: smudge filter lfs failed

It looks like this will be blocking me. :(

On the subject of git-lfs, could you add its installation to the README? I wasn't aware of the tool, and I'd imagine not every developer has it installed.
