-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove html encoded entities from post slug when cleaning #62549
base: trunk
Are you sure you want to change the base?
Remove html encoded entities from post slug when cleaning #62549
Conversation
The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the If you're merging code through a pull request on GitHub, copy and paste the following into the bottom of the merge commit message.
To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook. |
Adding a comment to update the props as I've linked my account now (thanks to whoever set up that bot with instructions!) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution, @tharsheblows!
I have left a few initial thoughts I had while reviewing the code. Let me know what you think!
@@ -24,7 +24,7 @@ export function cleanForSlug( string ) { | |||
return ''; | |||
} | |||
return ( | |||
removeAccents( string ) | |||
removeAccents( string.replace( /&/g, '' ).replace( /</g, '' ) ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about any other commonly used HTML entities, like >
,
, '
, "
and others (not an exhaustive list by any means)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tyxla Yes, I agree with you, it should handle all entities. I wavered between keeping it limited to what you would get from the WP REST API response for the title (only those entities need to be handled) and all entities, wasn't quite sure which approach to take!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. In light of the recent conversation to stay compatible with the sanitize_title_with_dashes
filter, let's introduce a utility function compatible with sanitize_title
filtered by sanitize_title_with_dashes
which converts some entities (spaces and dashes) to hyphens, and obliterates the rest of the HTML entities.
it( 'Should remove html entities for ampersands', () => { | ||
expect( | ||
cleanForSlug( 'the long cat & a dog && fish' ) | ||
).toBe( 'the-long-cat-a-dog-fish' ); | ||
expect( cleanForSlug( 'the long cat &amp; dog' ) ).toBe( | ||
'the-long-cat-amp-dog' | ||
); | ||
} ); | ||
|
||
it( 'Should remove html entities for less thans', () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likely not applicable after addressing my other feedback, but this could be a good case for it.each
as that could simplify the test code and remove redundancy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! I'll add in after the discussion below, thank you!
@@ -24,7 +24,7 @@ export function cleanForSlug( string ) { | |||
return ''; | |||
} | |||
return ( | |||
removeAccents( string ) | |||
removeAccents( string.replace( /&/g, '' ).replace( /</g, '' ) ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cleanForSlug
function is intended to be a mirror of sanitize_title()
, which doesn't account for HTML entities. This is also confirmed by the cleanForSlug
documentation, which specifically claims it doesn't touch any HTML entities. With that in mind, perhaps there is a better place to address this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I wondered why it wasn't a mirror – there is a core default filter added to sanitize_title()
which runs the $title
through sanitize_title_with_dashes()
and the expectation when doing PHP development is that sanitize_title( 'the dog & the cat' )
will return the-dog-the-cat
.
It's that core filter which handles the heavy lifting of sanitization and removal of the html entities among other things, this is the reason the string returned by cleanForSlug()
doesn't match the actual slug returned by sanitize_title()
.
People might be using this and changing the returned value could horribly mess their urls up – what about keeping back compat and introducing a new function sanitizeForSlug()
which mirrors sanitize_title()
with the default filter added (ie add in all the sanitization done by sanitize_title_with_dashes()
) and then using sanitizeForSlug()
when displaying the slug.
Does that approach sound ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is a core default filter added to sanitize_title() which runs the $title through sanitize_title_with_dashes() and the expectation when doing PHP development is that sanitize_title( 'the dog & the cat' ) will return the-dog-the-cat.
This makes sense, thanks for pointing it out 👍
People might be using this and changing the returned value could horribly mess their urls up – what about keeping back compat and introducing a new function sanitizeForSlug() which mirrors sanitize_title() with the default filter added (ie add in all the sanitization done by sanitize_title_with_dashes()) and then using sanitizeForSlug() when displaying the slug.
Does that approach sound ok?
A separate utility function sounds good to me 👍 We'll definitely want to keep cleanForSlug
as it is, since it intentionally doesn't handle entities.
Fixes #62543
What?
This removes the html encoded entity strings for & and < from the slug in
cleanForSlug()
. These are only two entities which seem to cause an issue.Why?
Step-by-step reproduction instructions
How?
Replaces
&
and<
with an empty string incleanForSlug()
. Also adds tests.Testing Instructions
And & &&& & &&& <1 039898':!#*() &&& <>.“‘”’" hey
(note the string&
and string<
)and-amp-lt1-039898-hey
and-amp-lt1-039898-hey