Hey guys,
First of all, thanks for python-boilerpipe.
I'm trying to use Boilerpipe, but it fails to extract some documents properly:
    from boilerpipe.extract import Extractor

    extractorType = "DefaultExtractor"
    sourceUrl = 'http://www.indiatimes.com/news/india/arvind-kejriwal-to-seek-political-sanyas-127620.html'
    extractor = Extractor(extractor=extractorType, url=sourceUrl)

    Traceback (most recent call last):
      File "", line 1, in
      File "/Library/Python/2.7/site-packages/boilerpipe/extract/__init__.py", line 41, in __init__
        self.data = unicode(self.data, encoding)
    UnicodeDecodeError: 'utf8' codec can't decode byte 0x91 in position 53647: invalid start byte
The document seems to contain some bytes that are not valid UTF-8 (0x91 in this case), so decoding fails. Is there any workaround for this problem?
I solved the UnicodeDecodeError. You can see what I modified in __init__.py here: https://github.com/Caimany/python-boilerpipe/blob/master/src/boilerpipe/extract/__init__.py
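For anyone who cannot patch the installed package, here is a minimal sketch of one possible workaround. It is not necessarily what the linked fork does. The idea is to decode the page bytes yourself, trying UTF-8 first and then falling back to Windows-1252, where byte 0x91 is a curly opening quote. The helper name `decode_with_fallback` and the choice of fallback encodings are my own assumptions, not part of python-boilerpipe.

```python
def decode_with_fallback(raw, encodings=("utf-8", "cp1252")):
    """Decode raw page bytes to text, trying each encoding in order.

    Hypothetical helper, not part of python-boilerpipe. cp1252 is only a
    guess at the page's real encoding, based on the 0x91 byte in the error.
    """
    for name in encodings:
        try:
            return raw.decode(name)
        except UnicodeDecodeError:
            # This encoding cannot represent the bytes; try the next one.
            continue
    # Last resort: keep what is decodable as UTF-8 and substitute U+FFFD
    # for every byte that is not, instead of raising.
    return raw.decode("utf-8", "replace")


# 0x91 / 0x92 are invalid UTF-8 start bytes but valid cp1252 curly quotes.
sample = b"\x91political sanyas\x92"
text = decode_with_fallback(sample)
```

If your installed version of `Extractor` accepts already-decoded text through an `html=` keyword (I believe it does, but please check your copy), you could download the page bytes yourself, run them through this helper, and pass the result in instead of `url=`.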