
instructions for ccgbank data gathered from disc? #3

Open
johnvblazic opened this issue Apr 28, 2017 · 5 comments

@johnvblazic

Hi,

Our university has the disc copies of CCGbank, and I don't have access to the online version of the data. I pulled the data using the call number given in the link, and what I've gathered appears to be in the same format as the sample provided there. I can't tell from the code or the README what the directory structure of "ccgbank_1_1" should be. So far I've tried putting the "data" directory from the CCGbank download inside that directory, and I've also tried putting the AUTO/HTML/LEX/PARG/RAW directories directly in the ccgbank_1_1 directory.

Any guidance you could provide would be extremely helpful.

I'm consistently getting the following error:

12:54:10 | ERROR | c.g.k.p.core.Stage | Job failed.
java.lang.StringIndexOutOfBoundsException: String index out of range: -1
at java.lang.String.substring(String.java:1967) ~[na:1.8.0_121]
at edu.uw.easysrl.corpora.CCGBankDependencies.getDependencyParseCCGBank(CCGBankDependencies.java:386) ~[EasySRL-d69cb6e7d99595372df8dda65b7e975b21f18c37.jar:na]
at edu.uw.easysrl.corpora.CCGBankDependencies.getDependencyParses(CCGBankDependencies.java:364) ~[EasySRL-d69cb6e7d99595372df8dda65b7e975b21f18c37.jar:na]
at edu.uw.easysrl.corpora.CCGBankDependencies.loadCorpus(CCGBankDependencies.java:349) ~[EasySRL-d69cb6e7d99595372df8dda65b7e975b21f18c37.jar:na]
at edu.uw.neuralccg.task.CCGBankReaderTask.parseStream(CCGBankReaderTask.java:19) ~[classes/:na]
at edu.uw.neuralccg.task.CCGBankReaderTask.run(CCGBankReaderTask.java:34) ~[classes/:na]
at com.github.kentonl.pipegraph.core.Stage.run(Stage.java:195) ~[pipegraph-bb781b4c3496e98c337a030d98b81f31490ab0f4.jar:na]
at com.github.kentonl.pipegraph.runner.AsynchronousPipegraphRunner.run(AsynchronousPipegraphRunner.java:43) [pipegraph-bb781b4c3496e98c337a030d98b81f31490ab0f4.jar:na]
at com.github.kentonl.pipegraph.runner.AsynchronousPipegraphRunner.lambda$null$1(AsynchronousPipegraphRunner.java:61) [pipegraph-bb781b4c3496e98c337a030d98b81f31490ab0f4.jar:na]
at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_121]

@johnvblazic
Author

I've checked the pipegraph logs, the pipegraph code, the EasySRL code, and the .conf files for the neuralccg project, but I can't find any reference to the file path it is failing on, other than /data/ccgbank_1_1 in the .conf file.

@kentonl
Member

kentonl commented Apr 29, 2017

The files should be set up such that the famous Pierre Vinken example (first sentence of the dev set) can be found via this path:
neuralccg/data/ccgbank_1_1/data/AUTO/00/wsj_0001.auto

Does this match something you've tried?
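A quick way to confirm this layout before launching a training run is a small standalone check. This is a minimal sketch, not part of neuralccg: the class name `CcgbankLayoutCheck` is hypothetical, and it assumes it is run from (or given the path to) the root of the neuralccg checkout. If the expected file is missing, it lists the first two levels under data/ccgbank_1_1 so an extra or missing directory level stands out.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class CcgbankLayoutCheck {

    // Where the first dev-set sentence (Pierre Vinken) should live,
    // relative to the root of the neuralccg checkout.
    static final Path EXPECTED_FILE =
            Paths.get("data", "ccgbank_1_1", "data", "AUTO", "00", "wsj_0001.auto");

    // True if wsj_0001.auto exists at the expected location under neuralccgRoot.
    static boolean hasExpectedLayout(Path neuralccgRoot) {
        return Files.isRegularFile(neuralccgRoot.resolve(EXPECTED_FILE));
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the current directory; pass the neuralccg root otherwise.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        Path target = root.resolve(EXPECTED_FILE);
        if (hasExpectedLayout(root)) {
            System.out.println("OK: found " + target);
            return;
        }
        System.out.println("Missing: " + target);
        // List the first two levels under data/ccgbank_1_1 so an extra or
        // missing directory level is easy to spot.
        Path corpus = root.resolve(Paths.get("data", "ccgbank_1_1"));
        if (Files.isDirectory(corpus)) {
            try (Stream<Path> paths = Files.walk(corpus, 2)) {
                paths.forEach(p -> System.out.println("  " + p));
            }
        } else {
            System.out.println("No directory at " + corpus);
        }
    }
}
```

For example, running it with the neuralccg root as its only argument prints either the found path or the partial listing, which can then be compared against data/ccgbank_1_1/data/AUTO/00/wsj_0001.auto.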

@johnvblazic
Author

Yeah, that's where I started, and I've been trying permutations since. The demo works fine; I'm currently trying to get the training module running with the following command:

./run.sh experiments/train.conf train 8080

@johnvblazic
Author

Is there any way I can find the file path it is failing on?

@kentonl
Member

kentonl commented Apr 29, 2017

It looks like the failure is happening here: https://github.com/kentonl/EasySRL/blob/maven/src/edu/uw/easysrl/corpora/CCGBankDependencies.java#L386

There is likely a mismatch between the content of the online version and the disc version of CCGBank. You can debug and/or apply temporary fixes by cloning the maven branch of https://github.com/kentonl/EasySRL. After running mvn install with local edits, neuralccg should use the updated code.
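The thread doesn't quote the expression at line 386, so the following is only an illustration under an assumption: that the code calls `substring` with an index taken from `indexOf`, a common idiom that raises exactly this kind of `StringIndexOutOfBoundsException` with -1 when an expected marker is missing from a line. If the file name is in scope at that point, a guarded variant like the hypothetical `prefixBefore` below, patched into a local clone of the maven branch, would replace the bare -1 with a message naming the file and line. The marker "=" and the file name are placeholders, not CCGBank specifics.

```java
public class SubstringFailureDemo {

    // Hypothetical helper (not EasySRL code): returns the text before the first
    // occurrence of `marker`. The unguarded idiom
    //     line.substring(0, line.indexOf(marker))
    // throws StringIndexOutOfBoundsException when the marker is absent, because
    // indexOf returns -1. This version names the source file and line instead.
    static String prefixBefore(String line, String marker, String source, int lineNumber) {
        int idx = line.indexOf(marker);
        if (idx < 0) {
            throw new IllegalArgumentException(
                    source + ":" + lineNumber + ": expected \"" + marker + "\" in: " + line);
        }
        return line.substring(0, idx);
    }

    public static void main(String[] args) {
        String line = "a line without the marker";

        // Unguarded: reproduces the exception type seen in the log.
        // On Java 8 the message is "String index out of range: -1";
        // newer JDKs word it differently.
        try {
            line.substring(0, line.indexOf("="));
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("Unguarded: " + e);
        }

        // Guarded: the message points at the offending input.
        try {
            prefixBefore(line, "=", "example/file.txt", 42);
        } catch (IllegalArgumentException e) {
            System.out.println("Guarded: " + e.getMessage());
        }
    }
}
```

Throwing `IllegalArgumentException` with the location keeps the job failing loudly, which is usually preferable to silently skipping lines while the disc and online formats are being compared.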
