Skip to content

Latest commit

 

History

History
64 lines (46 loc) · 2.6 KB

parse.md

File metadata and controls

64 lines (46 loc) · 2.6 KB

Parse a URL

You parse a full URL by setting the CURLUPART_URL part in the handle:

CURLU *h = curl_url();
rc = curl_url_set(h, CURLUPART_URL,
                  "https://example.com:449/foo/bar?name=moo", 0);

If successful, rc contains CURLUE_OK and the different URL components are held in the handle. It means that the URL was valid as far as libcurl concerns.

The function call's forth argument is a bitmask. Set none, one or more bits in that to alter the parser's behavior:

CURLU_NON_SUPPORT_SCHEME

Makes curl_url_set() accept a non-supported scheme. If not set, the only acceptable schemes are for the protocols libcurl knows and have built-in support for.

CURLU_URLENCODE

Makes the function URL encode the path part if any bytes in it would benefit from that: like spaces or "control characters".

CURLU_DEFAULT_SCHEME

If the passed in string does not use a scheme, assume that the default one was intended. The default scheme is HTTPS. If this is not set, a URL without a scheme part is not accepted as valid. Overrides the CURLU_GUESS_SCHEME option if both are set.

CURLU_GUESS_SCHEME

Makes libcurl allow the URL to be set without a scheme and it instead "guesses" which scheme that was intended based on the hostname. If the outermost sub-domain name matches DICT, FTP, IMAP, LDAP, POP3 or SMTP then that scheme is used, otherwise it picks HTTP. Conflicts with the CURLU_DEFAULT_SCHEME option which takes precedence if both are set.

CURLU_NO_AUTHORITY

Skips authority checks. The RFC allows individual schemes to omit the host part (normally the only mandatory part of the authority), but libcurl cannot know whether this is permitted for custom schemes. Specifying the flag permits empty authority sections, similar to how the file scheme is handled. Really only usable in combination with CURLU_NON_SUPPORT_SCHEME.

CURLU_PATH_AS_IS

Makes libcurl skip the normalization of the path. That is the procedure where curl otherwise removes sequences of dot-slash and dot-dot etc. The same option used for transfers is called CURLOPT_PATH_AS_IS.

CURLU_ALLOW_SPACE

Makes the URL parser allow space (ASCII 32) where possible. The URL syntax does normally not allow spaces anywhere, but they should be encoded as %20 or +. When spaces are allowed, they are still not allowed in the scheme. When space is used and allowed in a URL, it is stored as-is unless CURLU_URLENCODE is also set, which then makes libcurl URL-encode the space before stored. This affects how the URL is constructed when curl_url_get() is subsequently used to extract the full URL or individual parts.