No host name is extracted in the following situations:
The URL contains four slashes after the protocol, e.g. https:////example.org/. java.net.URL extracts an empty hostname, while Nutch's OkHttp-based protocol plugin seems to fetch the resource as if there were only two slashes.
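Here is a minimal Java sketch of the mismatch. It assumes, based on the fetch behavior described above, that OkHttp treats any run of slashes after the scheme as the authority prefix. collapseLeadingSlashes and hostOf are hypothetical helpers, not part of Nutch or OkHttp. One possible approach is to normalize such URLs before extracting the host with java.net.URL, so that the extracted host matches what is actually fetched.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class SlashHostDemo {

    // Hypothetical helper: collapse a run of more than two slashes directly after
    // "scheme:" to exactly two, approximating parsers that ignore extra slashes.
    static String collapseLeadingSlashes(String url) {
        int colon = url.indexOf(':');
        if (colon < 0) {
            return url; // no scheme, leave untouched
        }
        int i = colon + 1;
        while (i < url.length() && url.charAt(i) == '/') {
            i++;
        }
        int slashes = i - (colon + 1);
        if (slashes <= 2) {
            return url; // nothing to collapse
        }
        return url.substring(0, colon + 1) + "//" + url.substring(i);
    }

    // Host as seen by java.net.URL; checked exception wrapped for convenience.
    @SuppressWarnings("deprecation") // URL(String) is deprecated since JDK 20 but still works
    static String hostOf(String url) {
        try {
            return new URL(url).getHost();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(url, e);
        }
    }

    public static void main(String[] args) {
        String raw = "https:////example.org/";
        // java.net.URL skips authority parsing for four leading slashes: empty host
        System.out.println("raw host:        '" + hostOf(raw) + "'");
        String normalized = collapseLeadingSlashes(raw);
        System.out.println("normalized host: '" + hostOf(normalized) + "'");
    }
}
```

Collapsing the slashes is only one option. Another would be to reject such URLs outright, so that they are neither fetched nor indexed with an empty host.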
Similarly, java.net.URL and OkHttp behave differently if there is an overlong (or possibly invalid) userinfo before the hostname (scheme://userinfo@hostname/).
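The following Java sketch compares the two behaviors. It assumes that OkHttp, like the WHATWG URL Standard, takes the host from the text after the last '@' in the authority. hostAfterLastAt and javaNetUrlHost are hypothetical helpers written for illustration, not Nutch or OkHttp code. The sample URLs cover a well-formed userinfo, an unescaped '@' inside the userinfo, and an overlong userinfo. The main method prints what java.net.URL returns for each URL and does not assume any particular result.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class UserinfoHostDemo {

    // Hypothetical helper: take the authority (text between the first "//" and the
    // next '/', '?' or '#') and return the part after the LAST '@', without port.
    static String hostAfterLastAt(String url) {
        int start = url.indexOf("//");
        if (start < 0) {
            return null; // no authority component
        }
        start += 2;
        int end = start;
        while (end < url.length() && "/?#".indexOf(url.charAt(end)) < 0) {
            end++;
        }
        String authority = url.substring(start, end);
        String hostPort = authority.substring(authority.lastIndexOf('@') + 1);
        if (hostPort.startsWith("[")) {
            // IPv6 literal: keep everything up to and including the closing bracket
            int close = hostPort.indexOf(']');
            return close > 0 ? hostPort.substring(0, close + 1) : hostPort;
        }
        int colon = hostPort.lastIndexOf(':');
        return colon >= 0 ? hostPort.substring(0, colon) : hostPort;
    }

    // Host as extracted by java.net.URL, or null if the URL is rejected.
    @SuppressWarnings("deprecation") // URL(String) is deprecated since JDK 20
    static String javaNetUrlHost(String url) {
        try {
            return new URL(url).getHost();
        } catch (MalformedURLException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String[] urls = {
            "https://user:secret@example.org:8443/path",
            "https://a@b@example.org/",                     // unescaped '@' in userinfo
            "https://" + "u".repeat(5000) + "@example.org/" // overlong userinfo
        };
        for (String u : urls) {
            String shown = u.length() > 60 ? u.substring(0, 60) + "..." : u;
            System.out.println(shown);
            System.out.println("  java.net.URL host: " + javaNetUrlHost(u));
            System.out.println("  last-'@' host:     " + hostAfterLastAt(u));
        }
    }
}
```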
The extraction of registered domains (done by crawler-commons' EffectiveTldFinder) does not return anything if the hostname is itself a public suffix (for example, gov.uk or kharkov.ua).
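A registered domain is a public suffix plus one more label to its left. When the hostname equals a public suffix, there is no label left to add, so the result is empty. The Java sketch below shows this with a tiny hypothetical suffix set standing in for the Public Suffix List. It ignores wildcard and exception rules, assumes lowercase hostnames, and is not the crawler-commons implementation. registeredDomainOrHost is a hypothetical fallback that uses the hostname itself.

```java
import java.util.Set;

public class RegisteredDomainDemo {

    // Hypothetical, tiny stand-in for the Public Suffix List (no wildcard/exception rules).
    static final Set<String> SUFFIXES = Set.of("uk", "gov.uk", "co.uk", "ua", "kharkov.ua", "org");

    // Longest matching public suffix of a lowercase host, or null if none matches.
    static String publicSuffix(String host) {
        String candidate = host;
        while (true) {
            if (SUFFIXES.contains(candidate)) {
                return candidate; // first hit while stripping labels is the longest
            }
            int dot = candidate.indexOf('.');
            if (dot < 0) {
                return null;
            }
            candidate = candidate.substring(dot + 1);
        }
    }

    // Public suffix plus one label; null when the host IS a public suffix (or has none).
    static String registeredDomain(String host) {
        String suffix = publicSuffix(host);
        if (suffix == null || suffix.equals(host)) {
            return null;
        }
        String prefix = host.substring(0, host.length() - suffix.length() - 1);
        return prefix.substring(prefix.lastIndexOf('.') + 1) + "." + suffix;
    }

    // Hypothetical fallback: use the host itself when no registered domain exists.
    static String registeredDomainOrHost(String host) {
        String rd = registeredDomain(host);
        return rd != null ? rd : host;
    }

    public static void main(String[] args) {
        for (String host : new String[] {"www.gov.uk", "gov.uk", "kharkov.ua", "a.b.example.org"}) {
            System.out.println(host + " -> " + registeredDomain(host)
                + " (fallback: " + registeredDomainOrHost(host) + ")");
        }
    }
}
```

Whether falling back to the hostname is appropriate depends on how the registered domain is used downstream (e.g. for grouping hosts). Treating the public-suffix host as its own domain is one reasonable choice, not the only one.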