Extra spaces being added #88

Closed
rightaway opened this issue Apr 3, 2016 · 4 comments

Comments

@rightaway

var htmlToText = require('html-to-text');
var text = htmlToText.fromString("This is a <a href='url'>link</a>.");
console.log(text);

This prints "This is a link [url] ." (note the space before the period). That makes it problematic to use, for example when a URL comes at the end of a sentence.

It also seems to be adding spaces before the link.
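Until the library itself is fixed, a post-processing pass over the output is one possible stopgap. The sketch below is only that: `tidyLinkSpacing` is a hypothetical helper, not part of html-to-text, and it assumes a space before punctuation is never intended (which is wrong for some languages and for things like " :)").

```javascript
// Hypothetical workaround, NOT part of html-to-text.
// Assumption: a space directly before . , ; : ! ? is always unwanted.
function tidyLinkSpacing(text) {
  return text
    // Collapse runs of spaces/tabs between words; leading indentation is kept
    // because the run must be preceded by a non-whitespace character.
    .replace(/(\S)[ \t]{2,}(?=\S)/g, '$1 ')
    // Drop spaces/tabs that sit directly before punctuation.
    .replace(/[ \t]+([.,;:!?])/g, '$1');
}

console.log(tidyLinkSpacing('This is a link [url] .'));
// prints: This is a link [url].
```

This only hides the symptom; the extra spaces are still produced inside the converter.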

@rightaway changed the title from "Extra space being added" to "Extra spaces being added" on Apr 3, 2016

@mlegenhausen
Member

This is a known bug. I would be happy to receive a pull request.

@rightaway
Author

I can do it, but I have a question. The part causing the issue is https://github.com/werk85/node-html-to-text/blob/929dfa69a18f2d0fe9cb4d4424ac039108d54246/lib/html-to-text.js#L119.

case 'a':
    // Inline element needs a leading space if `result` currently
    // doesn't end with whitespace
    elem.needsSpace = whiteSpaceRegex.test(result);
    result += format.anchor(elem, walk, options);
    break;

elem.needsSpace is set to true because of the rule that an inline element needs a leading space if result doesn't end with whitespace. Why is that? I can't think of an example where whitespace should be added rather than just using what is already around the element. Can elem.needsSpace always be false for a tags?
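To make the question concrete, here is a rough model of the alternative: add a space between two pieces of text only where the source had whitespace at that boundary. This is a sketch under assumptions, not the library's code. `joinPreservingSourceSpaces` and its chunk array are hypothetical, and it assumes the text pieces get trimmed before being joined, which would be one reason a rule like needsSpace exists at all.

```javascript
// Hypothetical model, NOT the html-to-text implementation.
// Joins raw text chunks, emitting at most one space between two chunks,
// and only if the source had whitespace at that boundary.
function joinPreservingSourceSpaces(rawChunks) {
  let result = '';
  let pendingSpace = false; // previous chunk ended with whitespace
  for (const raw of rawChunks) {
    const trimmed = raw.trim();
    if (trimmed === '') {
      // Whitespace-only chunk: remember it, emit nothing yet.
      pendingSpace = pendingSpace || raw.length > 0;
      continue;
    }
    const startsWithSpace = /^\s/.test(raw);
    if (result !== '' && (pendingSpace || startsWithSpace)) {
      result += ' ';
    }
    result += trimmed;
    pendingSpace = /\s$/.test(raw);
  }
  return result;
}

// Chunks as they might arrive for "This is a <a href='url'>link</a>."
const joined = joinPreservingSourceSpaces(['This is a ', 'link [url]', '.']);
console.log(joined); // prints: This is a link [url].
```

In this model no space is ever invented: "foo<a>bar</a>" would stay "foobar [url]" rather than become "foo bar [url]". Whether that is acceptable for every inline element is exactly the open question.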

@mlegenhausen
Member

I don't remember exactly why anymore. But feel free to change the logic, as long as you provide some unit tests so I can see that the changes only affect the trailing space.

For more details about the space problem, take a look at #21 and #48. #48 also provides some failing unit tests for links, which I removed because I didn't have a solution for them.

@rightaway
Author

I tried various things over the last couple of days to fix this, but I'm afraid I don't understand the code well enough; my efforts just introduced new errors. Perhaps @danmactough, who did #21, could take a look? I think it's an important issue: with the current formatting we (at least) can't use the library in production, for example to send text emails generated from HTML.

mlegenhausen pushed a commit that referenced this issue Jun 29, 2016
Prevent extra whitespace from being added (fix #88)
MUDev1994 pushed a commit to MUDev1994/node-html-to-text that referenced this issue May 26, 2023
Prevent extra whitespace from being added (fix html-to-text#88)