Bug fix: Start symbols #1

syllog1sm · 2013-11-18T02:34:08Z

Found a bug in my implementation.

On line 72 of taggers.py we have this:

prev, prev2 = self.START

This initialises the tag history features to the dummy START symbol only once. They actually need to be reinitialised for every sentence.

This bug is unlikely to significantly affect accuracy. Currently, the "previous tag" feature for the first word of a sentence will almost always be set to ".", because the previous sentence probably ended with a period. The correct value is the dummy start symbol, but both values will likely carry the same information.
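A minimal sketch of the difference, assuming the dummy start symbols are the two strings `-START-` and `-START2-` (an assumption; the actual values in taggers.py may differ). The hypothetical `tag_history` helper walks gold tag sequences and records the `(prev, prev2)` history each token would see, with and without the per-sentence reset. It is illustrative only, not the code in taggers.py.

```python
# Hypothetical sketch; START values are an assumption, not taken from taggers.py.
START = ["-START-", "-START2-"]


def tag_history(tag_sequences, reset_per_sentence=True):
    """Return the (prev, prev2) tag history each token would see.

    reset_per_sentence=False reproduces the bug: the history is set to the
    start symbols once and then carries over from one sentence to the next.
    """
    history = []
    prev, prev2 = START
    for tags in tag_sequences:
        if reset_per_sentence:
            prev, prev2 = START  # the fix: fresh history for every sentence
        for tag in tags:
            history.append((prev, prev2))
            prev2, prev = prev, tag  # shift the window one tag to the right
    return history


sents = [["DT", "NN", "."], ["PRP", "VBD", "."]]
print(tag_history(sents, reset_per_sentence=False)[3])  # ('.', 'NN')
print(tag_history(sents)[3])  # ('-START-', '-START2-')
```

Without the reset, the first token of the second sentence sees `'.'` as its previous tag, which is the behaviour described above; with the reset it sees the start symbols.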

If you fix the bug, you'll need to regenerate the model. This is the weakness of having a binary model committed to the repo: the old model stays in the history, so once we've made three or four updates, the repository will be quite large.

See #1

sloria · 2013-11-18T05:01:56Z

Ok, I've made the change. Thanks for catching it. I haven't re-trained the model because I wasn't sure what corpus you originally used. Could you help me with this? BTW, I've added you as a collaborator so you have direct commit access. If you'd like to be maintainer of this, I will gladly pass over ownership of the repo to you.

If the git repository becomes too large, we can stop committing the model to the repo and only publish it with the package on PyPI. In the short term, though, I think committing it will be fine as long as we limit the number of retrainings.

syllog1sm · 2013-11-18T05:06:11Z

Ah, don't make the change before we update the model! If the run-time
feature extraction doesn't match the feature extraction used during
training, accuracy often goes down substantially.
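One way to keep the two in step is to route both training and tagging through a single per-sentence routine that resets the history and builds the features, so a fix like this one cannot land in one path but not the other. The sketch below assumes that design; `get_features`, `iter_sentence_features`, the `predict` callback, `toy_predict`, and the reduced feature set are all hypothetical stand-ins, not the functions in taggers.py.

```python
# Hypothetical sketch of a shared feature-extraction path; not taggers.py.
START = ["-START-", "-START2-"]  # assumed dummy start symbols


def get_features(word, prev, prev2):
    """Heavily reduced, illustrative feature set as a set of strings."""
    return {
        "bias",
        "i word " + word.lower(),
        "i-1 tag " + prev,
        "i-2 tag " + prev2,
    }


def iter_sentence_features(words, predict):
    """Yield (word, features, tag) for one sentence.

    The tag history is reset here, once per sentence, so training and
    tagging both see exactly the same features for the same input.
    """
    prev, prev2 = START
    for word in words:
        feats = get_features(word, prev, prev2)
        tag = predict(word, feats)
        yield word, feats, tag
        prev2, prev = prev, tag  # history uses the predicted tag


def toy_predict(word, feats):
    """Stand-in for the perceptron: a fixed word-to-tag lookup."""
    return {"the": "DT", "cat": "NN", ".": "."}.get(word.lower(), "NN")


for word, feats, tag in iter_sentence_features(["The", "cat", "."], toy_predict):
    print(word, tag)
```

In this design, a training loop would pass the current model's prediction function and apply its update step to each yielded `feats`, while tagging would pass the same function and keep only the tags.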

sloria · 2013-11-18T05:11:43Z

I haven't published the update to PyPI; I've only committed it to the dev branch. Let's make all necessary changes before committing the updated model file to the repo.

sloria · 2014-09-16T04:28:04Z

@syllog1sm Just to follow up: could I get your assistance in retraining the model? Also, how do you feel about transferring ownership of this repo to you? This is your hard work, after all. =)

syllog1sm · 2014-09-16T10:24:00Z

Hi Steven,

Actually could you email me? [email protected]

It's worth having a chat about this stuff. The thing is, the code for that
project isn't that valuable; what makes it valuable is the data, which
the LDC keeps gated and which is quite expensive. We may be able to distribute
trained models, but if we're doing that, maybe we don't want to use the
demo code I wrote for the blog post.

It's also worth chatting about the follow-up post:
http://honnibal.wordpress.com/2013/12/18/a-simple-fast-algorithm-for-natural-language-dependency-parsing/
I don't know whether you saw it!

sloria added a commit that referenced this issue Nov 18, 2013

Re-initialize prev tag features during training

c775f24

See #1
