token number error when 1st sentence starts with PONCT #9

Open
enavarro222 opened this issue Mar 5, 2014 · 1 comment

@enavarro222

Two tokens are numbered 1:

$ echo '" Un exemple de phrase "' | java -Xmx1024M -jar talismane-fr-1.8.2b-allDeps.jar command=analyse 
1   "   "   PONCT   PONCT   _   0   _   _   _

1   Un  un  DET DET g=m|n=s 2   det _   _
2   exemple exemple NC  nc  g=m|n=s 0   _   _   _
3   de  de  P   P   _   2   dep _   _
4   phrase  phrase  NC  nc  g=f|n=s 3   prep    _   _
5   "   "   PONCT   PONCT   _   4   ponct   _   _

But here are several similar examples that are numbered correctly:

$ echo " ' Un exemple de phrase ' " | java -Xmx1024M -jar talismane-fr-1.8.2b-allDeps.jar command=analyse 
1   '   '   PONCT   PONCT   _   0   _   _   _
2   Un  _   NPP _   _   0   _   _   _
3   exemple exemple NC  nc  g=m|n=s 2   mod _   _
4   de  de  P   P   _   3   dep _   _
5   phrase  phrase  NC  nc  g=f|n=s 4   prep    _   _
6   _'  _   ADJ _   _   5   mod _   _

$ echo ' " Un exemple de phrase " ' | java -Xmx1024M -jar talismane-fr-1.8.2b-allDeps.jar command=analyse 
1   "   "   PONCT   PONCT   _   0   _   _   _
2   Un  un  DET DET g=m|n=s 3   det _   _
3   exemple exemple NC  nc  g=m|n=s 0   _   _   _
4   de  de  P   P   _   3   dep _   _
5   phrase  phrase  NC  nc  g=f|n=s 4   prep    _   _
6   "   "   PONCT   PONCT   _   0   _   _   _

$ echo '" un exemple de phrase "' | java -Xmx1024M -jar talismane-fr-1.7.4b-allDeps.jar command=analyse 
1   "   "   PONCT   PONCT   _   0   _   _   _
2   un  un  DET DET g=m|n=s 3   det _   _
3   exemple exemple NC  nc  g=m|n=s 0   _   _   _
4   de  de  P   P   _   3   dep _   _
5   phrase  phrase  NC  nc  g=f|n=s 4   prep    _   _
6   "   "   PONCT   PONCT   _   0   _   _   _


@urieli
Collaborator

urieli commented Jun 5, 2014

This is an analysis error rather than a programming error: the Talismane sentence detector has decided to place a sentence break after the first double quote, thereby splitting the input into two sentences (hence the repeated index 1). It is presumably getting confused by the space between the quotation mark and the start of the sentence (but strangely, only when preceded by another space). I'll have to look into this when I have some time. By the way: if you're only sending one sentence at a time, it is simpler to skip the sentence detector stage entirely and start with the tokeniser.
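
To make the diagnosis concrete, here is a minimal Python sketch, not part of Talismane, that counts the sentences in the output pasted in the report. It assumes that sentences in the output are separated by blank lines, as they appear to be above; `split_sentences` is a hypothetical helper name chosen for illustration.

```python
def split_sentences(conll_text):
    """Group whitespace-separated token lines into sentences.

    Assumption: a blank line marks a sentence boundary, as in the
    output shown in the report.
    """
    sentences, current = [], []
    for line in conll_text.splitlines():
        if line.strip():
            current.append(line.split())
        elif current:
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)
    return sentences


# Output copied from the failing 1.8.2b run in the report.
output = '''\
1   "   "   PONCT   PONCT   _   0   _   _   _

1   Un  un  DET DET g=m|n=s 2   det _   _
2   exemple exemple NC  nc  g=m|n=s 0   _   _   _
3   de  de  P   P   _   2   dep _   _
4   phrase  phrase  NC  nc  g=f|n=s 3   prep    _   _
5   "   "   PONCT   PONCT   _   4   ponct   _   _
'''

sentences = split_sentences(output)
print(len(sentences))                    # 2
print([len(s) for s in sentences])       # [1, 5]
print([s[0][0] for s in sentences])      # ['1', '1']
```

Read this way, the repeated index 1 is simply the first token of a second sentence: the opening quote was split off into a one-token sentence of its own, and numbering restarted for the rest.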
