This repository was archived by the owner on Feb 14, 2018. It is now read-only.

Bug fix: Start symbols #1

Open
syllog1sm opened this issue Nov 18, 2013 · 5 comments

Comments

@syllog1sm
Collaborator

Found a bug in my implementation.

On line 72 of taggers.py we have this:

prev, prev2 = self.START

This initialises the tag history features to the dummy START symbol. They actually need to be reinitialised at the start of every sentence.

This bug is unlikely to significantly affect accuracy. What happens currently is that, for the first word of a sentence, the "previous tag" feature will almost always be set to ".", as the previous sentence probably ended with a period. The correct value is the dummy start symbol, but the two features will likely carry the same information.
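A minimal sketch of the behaviour described above, not the actual taggers.py code. The names `toy_predict` and `first_word_history` and the START values are hypothetical, chosen only for illustration; the sketch records the tag history that the first word of each sentence would see, with and without the per-sentence reset.

```python
# Hypothetical dummy start symbols; the real values live in taggers.py.
START = ["-START-", "-START2-"]


def toy_predict(word):
    # Stand-in for the trained model: tags "." as "." and everything else as "NN".
    return "." if word == "." else "NN"


def first_word_history(sentences, reset_each_sentence):
    """Return the (prev, prev2) pair seen by the first word of each sentence."""
    prev, prev2 = START  # initialised once, before the sentence loop
    seen = []
    for words in sentences:
        if reset_each_sentence:
            prev, prev2 = START  # the fix: reinitialise for every sentence
        for i, word in enumerate(words):
            if i == 0:
                seen.append((prev, prev2))
            tag = toy_predict(word)
            prev2 = prev
            prev = tag
    return seen


sentences = [["Dogs", "bark", "."], ["Cats", "meow", "."]]
buggy = first_word_history(sentences, reset_each_sentence=False)
fixed = first_word_history(sentences, reset_each_sentence=True)
print(buggy)
print(fixed)
```

Without the reset, the second sentence starts with the history left over from the end of the first one, so its "previous tag" is "." rather than the start symbol.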

If you fix the bug, you'll need to regenerate the model. This is the weakness of having a binary model committed to the repo: the old model stays in the history. Once we've made three or four updates, the repository will be quite large.

sloria added a commit that referenced this issue Nov 18, 2013
@sloria
Owner

sloria commented Nov 18, 2013

Ok, I've made the change. Thanks for catching it. I haven't re-trained the model because I wasn't sure what corpus you originally used. Could you help me with this? BTW, I've added you as a collaborator so you have direct commit access. If you'd like to be maintainer of this, I will gladly pass over ownership of the repo to you.

If the size of the git file tree becomes too large, we can stop committing the model to the repo and only publish it with the package on PyPI. For the short term, though, I think committing it will be fine so long as we limit the number of retrainings.

@syllog1sm
Collaborator Author

Ah, don't make the change before we update the model! If the run-time
feature extraction doesn't match the feature extraction used during
training, accuracy often goes down substantially.
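A rough sketch of why the mismatch hurts, assuming a linear model that sums per-feature weights. The feature strings, the weights, and the `score` helper are all hypothetical and not taken from the repo.

```python
from collections import defaultdict


def score(features, weights):
    """Sum the per-tag weights of every feature the model has seen in training."""
    scores = defaultdict(float)
    for feat in features:
        for tag, weight in weights.get(feat, {}).items():
            scores[tag] += weight
    return dict(scores)


# Hypothetical weights learned while the extractor still had the bug,
# so sentence-initial words were trained with "previous tag = ."
weights = {
    "i-1 tag .": {"DT": 2.0, "NN": 0.5},
    "i word the": {"DT": 1.0},
}

# Same word, extracted by the old (buggy) and the new (fixed) code.
old_features = ["i-1 tag .", "i word the"]
new_features = ["i-1 tag -START-", "i word the"]

old_scores = score(old_features, weights)
new_scores = score(new_features, weights)
print(old_scores)
print(new_scores)
```

The fixed extractor emits a feature the old model never saw, so that feature contributes nothing and the evidence the model learned for sentence starts is lost until the model is retrained.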


@sloria
Owner

sloria commented Nov 18, 2013

I haven't published the update to PyPI; I've only committed it to the dev branch. Let's make all the necessary changes before committing the updated model file to the repo.

@sloria
Owner

sloria commented Sep 16, 2014

@syllog1sm Just to follow up: could I get your assistance in retraining the model? Also, how do you feel about transferring ownership of this repo to you? This is your hard work, after all. =)

@syllog1sm
Collaborator Author

Hi Steven,

Actually could you email me? [email protected]

It's worth having a chat about this stuff. The thing is, the code for that
project isn't that valuable; what makes it valuable is the data, which
the LDC keeps gated and which is quite expensive. We may be able to distribute
trained models, but if we're doing that, maybe we don't want to use the
demo code I wrote for the blog post.

It's also worth chatting about the follow-up post:
http://honnibal.wordpress.com/2013/12/18/a-simple-fast-algorithm-for-natural-language-dependency-parsing/
I don't know whether you saw it!

