Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: uri decode function #3161

Merged
merged 1 commit into from
Aug 21, 2024
Merged

feat: uri decode function #3161

merged 1 commit into from
Aug 21, 2024

Conversation

fmgornick
Copy link
Contributor

hi! i love this tool and have been using it for years, but recently noticed it doesn't have built in uri decoding. i didn't feel like writing a bash script or installing another program for something this trivial, plus i thought it'd be a nice addition to jq, so i thought i'd try adding it!

I didn't add too many tests because i wanted to keep it consistent with the other format strings, but i did personally test a few files in the src dir, and i also tested it on this file containing every unicode point.

closes #798, closes #2261.

@wader
Copy link
Member

wader commented Aug 19, 2024

Hey, i think that makes sense to add. gojq has is already and it seems like jaq used to have it 01mf02/jaq#105 but got removed as jq didn't have it

Copy link
Member

@emanuele6 emanuele6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think it makes sense to add a @urid since there is also @base64d; it can be useful.

src/builtin.c Outdated Show resolved Hide resolved
src/builtin.c Outdated Show resolved Hide resolved
src/builtin.c Outdated Show resolved Hide resolved
tests/jq.test Show resolved Hide resolved
src/builtin.c Outdated Show resolved Hide resolved
@fmgornick fmgornick force-pushed the urid branch 2 times, most recently from a56ac41 to 3d844f8 Compare August 19, 2024 14:43
tests/jq.test Show resolved Hide resolved
tests/uritest Outdated Show resolved Hide resolved
tests/uri.test Outdated Show resolved Hide resolved
src/builtin.c Show resolved Hide resolved
src/builtin.c Outdated Show resolved Hide resolved
@itchyny
Copy link
Contributor

itchyny commented Aug 20, 2024

Should + be converted to (space) or no?

@fmgornick
Copy link
Contributor Author

Should + be converted to (space) or no?

it seems like real percent encoding uses %20 as the space encoding, and only application/x-www-form-urlencoded form data uses +. I have no opinion but since the @uri formatter uses %20 i figured it'd be best to keep it consistent

@emanuele6
Copy link
Member

emanuele6 commented Aug 20, 2024

The += syntax is specify to query strings; if users want it, they can just gsub("\\+";" ") before @urid.

@itchyny
Copy link
Contributor

itchyny commented Aug 20, 2024

I think @uri is commonly used for query string (@uri"q=\(.)"), so @urid can be used to decode them. They often use + for spaces (like ?q=jq+manual&sourceid=chrome&ie=UTF-8 which was copied from Chrome address bar).

@fmgornick
Copy link
Contributor Author

I think @uri is commonly used for query string (@uri"q=\(.)"), so @urid can be used to decode them. They often use + for spaces (like ?q=jq+manual&sourceid=chrome&ie=UTF-8 which was copied from Chrome address bar).

looking at the code for @uri, it just uses percent encoding for any char not in the unreserved buffer ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_.~, since isn't unreserved, it gets encoded to %20.

If people want i can change the behavior of both @uri and @urid to encode to + and vice versa, but there prolly needs to be some more discussion there 🤷‍♀️

@wader
Copy link
Member

wader commented Aug 21, 2024

Played around a bit with how go does it https://go.dev/play/p/Itnr8EfdOjK for the query it seem to decode both + and %20 as whitespace. Should we do that but only encode to %20? otherwise i guess we need separate formats (or arg?) for url parts and query encode/decode?

For reference this is how gojq behaves atm, uses query behavior in both cases

$ gojq -n '"+%20" | @urid'
"  "
$ gojq -n '" " | @uri'
"+"

@emanuele6
Copy link
Member

emanuele6 commented Aug 21, 2024

+= is specific to queries and is completely wrong in all other cases; i don't think @uri/@urid should do that conversion; there could be dedicated @query and @queryd formats

@itchyny
Copy link
Contributor

itchyny commented Aug 21, 2024

Understood. These formats are for path segments, not for query params.

@itchyny itchyny merged commit 0e0cdd5 into jqlang:master Aug 21, 2024
29 checks passed
@fmgornick fmgornick deleted the urid branch August 21, 2024 23:08
@itchyny itchyny added this to the 1.8 release milestone Aug 21, 2024
@wader
Copy link
Member

wader commented Aug 22, 2024

@fmgornick Thanks for working on this! 🥳

@fmgornick
Copy link
Contributor Author

@fmgornick Thanks for working on this! 🥳

of course! thank you all for providing such constructive feedback 😊

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

uri decode function inverse of @uri
4 participants