lxml replaced by lxml-html-clean #130

Open
wants to merge 1 commit into
base: master

D3vil0p3r

@rajatomar788 `lxml` and `lxml-html-clean` are now separate packages. The project needs `lxml-html-clean`.

@rajatomar788
Owner

@D3vil0p3r wouldn't this require replacing lxml with lxml-html-clean in every import statement?

@D3vil0p3r
Author

> @D3vil0p3r wouldn't this need the lxml to be replaced with lxml-html-clean at every import statement?

Hmm... In my tests, without touching the source code, if I use only lxml as a dependency, I get:

Traceback (most recent call last):
  File "/home/athena/Downloads/test.py", line 4, in <module>
    wp = config.create_page()
         ^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/8qjbn3zr43azi40bfqgbn6n0yzm229lj-python3.12-pywebcopy-7.0.2/lib/python3.12/site-packages/pywebcopy/configs.py", line 241, in create_page
    from .core import WebPage
  File "/nix/store/8qjbn3zr43azi40bfqgbn6n0yzm229lj-python3.12-pywebcopy-7.0.2/lib/python3.12/site-packages/pywebcopy/core.py", line 12, in <module>
    from .elements import HTMLResource
  File "/nix/store/8qjbn3zr43azi40bfqgbn6n0yzm229lj-python3.12-pywebcopy-7.0.2/lib/python3.12/site-packages/pywebcopy/elements.py", line 24, in <module>
    from .parsers import iterparse
  File "/nix/store/8qjbn3zr43azi40bfqgbn6n0yzm229lj-python3.12-pywebcopy-7.0.2/lib/python3.12/site-packages/pywebcopy/parsers.py", line 14, in <module>
    from lxml.html.clean import Cleaner
  File "/nix/store/v4vgibrhyyk5r3rfvh7jbjyqkw2gk4qw-python3.12-lxml-5.2.2/lib/python3.12/site-packages/lxml/html/clean.py", line 18, in <module>
    raise ImportError(
ImportError: lxml.html.clean module is now a separate project lxml_html_clean.
Install lxml[html_clean] or lxml_html_clean directly.

If I use just lxml-html-clean, it seems to work correctly. I tested on NixOS with Python 3.12.
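
On the import-statement question: the traceback above is raised by `from lxml.html.clean import Cleaner` in `parsers.py`. One way to tolerate both layouts would be a fallback import. This is only a sketch, not pywebcopy's actual code; `load_cleaner` and its `candidates` parameter are hypothetical names, and it assumes the separate package is importable as `lxml_html_clean` (the name given in the error message) and exposes `Cleaner`.

```python
import importlib


def load_cleaner(candidates=("lxml_html_clean", "lxml.html.clean")):
    """Return the first `Cleaner` found in the candidate modules, else None.

    Hypothetical helper: the module names are assumptions taken from the
    ImportError message and the failing import in the traceback.
    """
    for module_name in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # Module missing, or lxml's stub raised because the
            # separate cleaner package is not installed.
            continue
        cleaner = getattr(module, "Cleaner", None)
        if cleaner is not None:
            return cleaner
    return None


Cleaner = load_cleaner()
if Cleaner is None:
    print("HTML cleaning is unavailable: no Cleaner implementation found")
```

With a helper like this, a missing cleaner package would only disable the cleaning step instead of failing at import time.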

@rajatomar788
Owner

rajatomar788 commented Jul 31, 2024

@D3vil0p3r
The lxml-html-clean package you are talking about is just an HTML cleaning utility and is not required for the core functions of the library. That means you may have to install lxml-html-clean additionally to get rid of the error, but you would still have to install lxml, because that is the core library from which the HTML cleaning utility was split off.

I would say just adding lxml-html-clean as an extra dependency should get rid of the error.
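
For packagers who want to confirm that both distributions are present in an environment, a check along these lines could help. It is a minimal sketch using the standard library's `importlib.metadata` (Python 3.8+); `missing_distributions` is a hypothetical helper name, and the distribution names are the ones mentioned in this thread.

```python
from importlib import metadata


def missing_distributions(names):
    """Return the distribution names from `names` that are not installed."""
    missing = []
    for name in names:
        try:
            # version() raises PackageNotFoundError for unknown distributions.
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


# Both are expected to be needed: lxml as the core library,
# lxml-html-clean for the cleaning utility.
print(missing_distributions(["lxml", "lxml-html-clean"]))
```

An empty list would mean both packages are installed; anything listed still needs to be added to the environment.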

@rajatomar788
Owner

Install `lxml[html_clean]` or `lxml_html_clean` directly to use the HTML cleaning utility of pywebcopy. Yes, the error shouldn't have blocked the other pywebcopy functions. It is in fact a bug introduced by the lxml library split.

@D3vil0p3r
Author

> Install lxml[html_clean] or lxml_html_clean directly. To use the html cleaning utility of pywebcopy. Yes the error shouldn't have restricted the pywebcopy functions. It is in fact a bug created by lxml library splitting.

Hmm, so what can we do in the meantime? Just wait for that bug to be fixed? Has it already been reported?

@D3vil0p3r
Author

@rajatomar788 another question: is the six Python dependency really needed?

@rajatomar788
Owner

@D3vil0p3r pywebcopy 8 is in development, so just wait it out. Version 8 will rearrange things so that the cyclic errors go away.

Also, six was needed to keep support for Python 2.7.
pywebcopy still works on Python 2.7.

@D3vil0p3r
Author

OK. Since I package and maintain pywebcopy for several Linux distros, I will keep pywebcopy at the current version for now. When pywebcopy 8 is out, I will update the related packages according to the updated docs.

@rajatomar788
Owner

@D3vil0p3r yes. And thank you for your contribution and the work you have put into pywebcopy.
