Bug fix: Start symbols #1
Comments
Ok, I've made the change. Thanks for catching it. I haven't retrained the model because I wasn't sure what corpus you originally used. Could you help me with this? BTW, I've added you as a collaborator so you have direct commit access. If you'd like to be maintainer of this, I will gladly transfer ownership of the repo to you. If the model file makes the git history too large, we can stop committing it to the repo and only publish it with the package on PyPI. For the short term, though, I think committing it will be fine so long as we limit the number of retrainings.
Ah, don't make the change before we update the model! If the run-time On Mon, Nov 18, 2013 at 4:01 PM, Steven Loria [email protected]:
I haven't published the update to PyPI; I've only committed it to the dev branch. Let's make all necessary changes before committing the updated model file to the repo.
@syllog1sm Just to follow up: could I get your assistance in retraining the model? Also, how do you feel about transferring ownership of this repo to you? This is your hard work, after all. =)
Hi Steven, Actually could you email me? [email protected] It's worth having a chat about this stuff. The thing is, the code for that It's also worth chatting about the follow-up post: On Tue, Sep 16, 2014 at 6:28 AM, Steven Loria [email protected]
Found a bug in my implementation.
On line 72 of taggers.py we have this:
prev, prev2 = self.START
This initialises the tag history features to the dummy START symbol. These actually need to be reinitialised every sentence.
This bug is unlikely to significantly affect accuracy. What currently happens is that, for the first word of every sentence after the first, the "previous tag" feature will almost always be set to ".", as the previous sentence probably ended with a period. The correct value is the dummy start symbol, but the features will likely reflect the same information.
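To make the difference concrete, here is a minimal sketch of the two behaviours. It is not the code from taggers.py: the `START` values, `toy_predict`, and both `tag_*` functions are hypothetical stand-ins chosen only to show where the tag history gets initialised.

```python
# Hypothetical stand-ins for illustration only; not the real taggers.py code.
START = ["-START-", "-START2-"]  # assumed dummy start symbols


def toy_predict(word, prev, prev2):
    """Toy tagger: tags a period as '.', everything else as 'X'."""
    return "." if word == "." else "X"


def tag_buggy(sentences):
    """History is initialised once, so it leaks across sentence boundaries."""
    prev, prev2 = START
    seen = []  # (word, prev, prev2) as seen when each word is tagged
    for sentence in sentences:
        for word in sentence:
            seen.append((word, prev, prev2))
            tag = toy_predict(word, prev, prev2)
            prev2, prev = prev, tag
    return seen


def tag_fixed(sentences):
    """History is reinitialised to the start symbols for every sentence."""
    seen = []
    for sentence in sentences:
        prev, prev2 = START
        for word in sentence:
            seen.append((word, prev, prev2))
            tag = toy_predict(word, prev, prev2)
            prev2, prev = prev, tag
    return seen


sentences = [["Hello", "."], ["World", "."]]
print(tag_buggy(sentences)[2])  # first word of the second sentence
print(tag_fixed(sentences)[2])
```

In the buggy version, the first word of the second sentence sees "." as its previous tag; in the fixed version it sees the start symbols, which is the behaviour described above.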
If you fix the bug, you'll need to regenerate the model. This is the weakness of having a binary model committed to the repo: the old model stays in the history. Once we've made three or four updates, the repository will be quite large.