Please add back nu.validator.htmlparser.tools #53

Open
bmix opened this issue Sep 28, 2021 · 1 comment

bmix commented Sep 28, 2021

The original 1.4 distribution contained some example apps that could be run from the command line. The author stated:

Sample Apps

The jar file contains sample main() entry points:

nu.validator.htmlparser.tools.XSLT4HTML5
nu.validator.htmlparser.tools.XSLT4HTML5XOM
nu.validator.htmlparser.tools.HTML2XML
nu.validator.htmlparser.tools.XML2HTML
nu.validator.htmlparser.tools.XML2XML
nu.validator.htmlparser.tools.HTML2HTML
The first two are sample apps that demo the use of XSLT with HTML5. The first one can use SAX or DOM and requires the Xalan serializer. The second one uses XOM. Running without parameters dumps usage help.

java -cp htmlparser-1.4.jar nu.validator.htmlparser.tools.XSLT4HTML5 --template=sort-ul.xsl --input-html=test.html --output-html=out.html --mode=dom

HTML2XML converts HTML5 to XML 1.0 plus Namespaces. With no arguments, it reads from stdin and writes to stdout. With one parameter, it reads the named file and writes to stdout. With two parameters, the first is the input file name and the second is the output file name.

XML2HTML, HTML2HTML and XML2XML work analogously. The *2HTML versions produce bad output if the document tree is not serializable as HTML5. It is up to the user to make sure that it is.
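
For readers unfamiliar with that argument convention, here is a minimal sketch of the stream selection described above, using only the Java standard library. It is not the code from test-src/nu/validator/htmlparser/tools/. The class name ToolStreams and its two helper methods are hypothetical, the conversion step is replaced by a plain byte copy, and rejecting more than two arguments is an assumption, since the quoted text does not say what the real tools do in that case.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical illustration of the 0/1/2-argument convention; not part of htmlparser. */
final class ToolStreams {

    /** No arguments: read stdin. Otherwise: read the file named by the first argument. */
    public static InputStream openInput(String[] args) throws IOException {
        checkCount(args);
        return args.length == 0 ? System.in : Files.newInputStream(Paths.get(args[0]));
    }

    /** Fewer than two arguments: write stdout. Otherwise: write the file named by the second. */
    public static OutputStream openOutput(String[] args) throws IOException {
        checkCount(args);
        return args.length < 2 ? System.out : Files.newOutputStream(Paths.get(args[1]));
    }

    /** Assumption: more than two arguments is treated as a usage error. */
    private static void checkCount(String[] args) {
        if (args.length > 2) {
            throw new IllegalArgumentException("usage: [input-file [output-file]]");
        }
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = openInput(args); OutputStream out = openOutput(args)) {
            // Placeholder for the real work: a converter would parse `in` and
            // serialize the result to `out`. Here the bytes are copied unchanged.
            in.transferTo(out); // requires Java 9 or later
            out.flush();
        }
    }
}
```

With this sketch, `java ToolStreams in.html out.xml` copies in.html to out.xml, `java ToolStreams in.html` prints it to stdout, and `java ToolStreams` acts as a stdin-to-stdout filter.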

The source code is in test-src/nu/validator/htmlparser/tools/, but none of the releases I found on Maven Central includes the compiled classes. I do have an older JAR from years ago, also named htmlparser-1.4.jar on disk, that contains these classes and is therefore usable from the CLI.

May I kindly ask you to bring these back, so that one can convert HTML into XHTML directly from the command line? Thank you!


dhouck commented Mar 24, 2023

As far as I can tell, no existing tool does what HTML2XML does, and the obvious ways of writing one (e.g., Python BeautifulSoup or HTML Tidy) donʼt work correctly, especially around namespaces.

The version here also isnʼt ideal (Iʼm planning to submit another PR about that in a few minutes), but it would be better than everything else I could find, i.e., nothing.
