The uri structure contains a library for dealing with URIs.
A URI (Uniform Resource Identifier) has the following syntax:
[scheme]:path [?search] [#fragid]
Parts in brackets may be omitted.
A URI uses characters such as : to separate its different
parts. When such a special character is a regular part of a name,
and not an indicator of the structure of the URI, it is escaped.
An escape sequence has the form %hh, where each h
is a hexadecimal digit. The hexadecimal number is the
ASCII code of the escaped character, e.g. %20 is a space (ASCII
32) and %61 is "a" (ASCII 97). This module
provides procedures to escape and unescape strings that are meant to
be used in a URI.
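The correspondence between an escape sequence and a character code can be checked with a few lines of Python. This is only an illustration of the %hh arithmetic; the helper names are hypothetical and are not part of the uri structure, and the choice of upper-case hexadecimal digits is an assumption.

```python
def escape_char(c):
    """Return the %hh escape sequence for a single ASCII character."""
    return "%{:02X}".format(ord(c))


def unescape_sequence(seq):
    """Return the character named by a %hh escape sequence."""
    if len(seq) != 3 or seq[0] != "%":
        raise ValueError("expected a sequence of the form %hh")
    # Interpret the two digits after '%' as a base-16 number.
    return chr(int(seq[1:], 16))


print(repr(unescape_sequence("%20")))  # ' '  (ASCII 32)
print(unescape_sequence("%61"))        # a    (ASCII 97)
print(escape_char(" "))                # %20
```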
Parses a URI string into its four fields. The fields are not unescaped: the rules for parsing the path component in particular need the text in its still-escaped form, and they depend on the scheme. The URL parser is responsible for unescaping. If the scheme, search or fragid portions are not specified, they are #f. Otherwise, scheme, search, and fragid are strings. path is a non-empty list of strings, namely the path split at slashes.
The parsing technique works inwards from both ends:
First, the code searches forwards for the first reserved
character (=, ;, /, #, ?, : or space). If it is a colon, then
everything before it is the scheme part; otherwise there is no
scheme part. A scheme part, if present, is removed.
Then the code searches backwards from the end for the last reserved character. If it is a sharp (#), then everything after it is the fragid part, which is removed.
Then the code searches backwards from the end again for the last reserved character. If it is a question mark, then everything after it is the search part, which is removed.
What is left is the path. The code splits it at slashes. The empty string becomes a list containing the empty string.
This scheme is tolerant of the various ways people build broken
URIs out there on the Net, e.g. = is a reserved character, but is used
unescaped in the search part. It was given to me by Dan Connolly of the W3C and slightly modified.
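The inwards-from-both-ends technique can be modelled in a short Python sketch. It is not the library's code: all names are hypothetical, None stands in for #f, and one detail is an assumption made here so that an unescaped = in the search part is tolerated, namely that the backward search for the question mark ignores =.

```python
RESERVED = set("=;/#?: ")


def first_reserved(s):
    """Index of the first reserved character in s, or None."""
    for i, c in enumerate(s):
        if c in RESERVED:
            return i
    return None


def last_reserved(s, end, reserved):
    """Index of the last character of s[:end] that is in reserved, or None."""
    for i in range(end - 1, -1, -1):
        if s[i] in reserved:
            return i
    return None


def parse_uri_sketch(s):
    """Return (scheme, path, search, fragid); missing parts are None."""
    # Step 1: a colon as the first reserved character ends the scheme.
    i = first_reserved(s)
    colon = i if i is not None and s[i] == ":" else None
    path_start = colon + 1 if colon is not None else 0

    # Step 2: a '#' as the last reserved character starts the fragid.
    j = last_reserved(s, len(s), RESERVED)
    sharp = j if j is not None and s[j] == "#" else None
    rest_end = sharp if sharp is not None else len(s)

    # Step 3: search backwards again, in front of the fragid.
    # Assumption: '=' is ignored here so that "?a=b" still counts as search.
    k = last_reserved(s, rest_end, RESERVED - {"="})
    ques = k if k is not None and s[k] == "?" else None
    path_end = ques if ques is not None else rest_end

    scheme = s[:colon] if colon is not None else None
    search = s[ques + 1:rest_end] if ques is not None else None
    fragid = s[sharp + 1:] if sharp is not None else None
    # Step 4: what is left is the path, split at slashes.
    path = s[path_start:path_end].split("/")
    return scheme, path, search, fragid


print(parse_uri_sketch("http://www.example.com/a/b.html?x=1#top"))
# ('http', ['', '', 'www.example.com', 'a', 'b.html'], 'x=1', 'top')
```

Note that nothing is unescaped in this sketch, in line with the description above; the empty string yields the path [''].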
Unescape-uri unescapes a string. If start and/or end are specified, they delimit the part of string that is unescaped. This procedure should only be used after the URI has been parsed, since unescaping may introduce characters that blow up the parse. That is why escape sequences are used in URIs.
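The following Python sketch illustrates unescaping over an optional range and why it should happen only after parsing. The function name is hypothetical, and two behaviours are assumptions rather than documented facts: only the unescaped portion between start and end is returned, and a % that is not followed by two hexadecimal digits is left unchanged.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")


def unescape_uri_sketch(s, start=0, end=None):
    """Unescape the %hh sequences in s[start:end] and return that portion."""
    if end is None:
        end = len(s)
    out = []
    i = start
    while i < end:
        # An escape is '%' plus two hex digits that lie inside the range.
        is_escape = (
            s[i] == "%"
            and i + 3 <= end
            and s[i + 1] in HEX_DIGITS
            and s[i + 2] in HEX_DIGITS
        )
        if is_escape:
            out.append(chr(int(s[i + 1:i + 3], 16)))
            i += 3
        else:
            out.append(s[i])
            i += 1
    return "".join(out)


raw = "a%2Fb/c"
# Split first, then unescape each element: the escaped slash stays data.
late = [unescape_uri_sketch(p) for p in raw.split("/")]
# Unescape first, then split: the escaped slash turns into a separator.
early = unescape_uri_sketch(raw).split("/")
print(late)   # ['a/b', 'c']
print(early)  # ['a', 'b', 'c']
```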
This is a set of characters (in the sense of SRFI 14) which are escaped in URIs. These are the following characters: $ - _ @ . & ! * \ " ' ( ) , + and all other characters that are neither letters nor digits (such as space and control characters).
This procedure escapes the characters of string that are in escaped-chars. Escaped-chars defaults to uri-escaped-chars. Be careful when applying this procedure to chunks of text with syntactically meaningful reserved characters (e.g., paths with URI slashes or colons): they will be escaped and lose their special meaning. E.g. it would be a mistake to apply escape-uri to
//lcs.mit.edu:8001/foo/bar.html
because the slashes and colons would be escaped.
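The pitfall can be reproduced with a small Python sketch. The function and the character set DEMO_ESCAPED are hypothetical and chosen for this example only; DEMO_ESCAPED is not uri-escaped-chars, and the sketch assumes ASCII input and upper-case hexadecimal digits.

```python
def escape_uri_sketch(s, escaped_chars):
    """Replace every character of s that is in escaped_chars by %hh."""
    return "".join(
        "%{:02X}".format(ord(c)) if c in escaped_chars else c
        for c in s
    )


# Hypothetical set for this example only: space, slash and colon.
DEMO_ESCAPED = set(" /:")

# Escaping a whole URI chunk destroys its structure.
whole = escape_uri_sketch("//lcs.mit.edu:8001/foo/bar.html", DEMO_ESCAPED)
print(whole)  # %2F%2Flcs.mit.edu%3A8001%2Ffoo%2Fbar.html

# Escaping each path element separately keeps the separators intact.
parts = ["foo", "my file.html"]
piecewise = "/".join(escape_uri_sketch(p, DEMO_ESCAPED) for p in parts)
print(piecewise)  # foo/my%20file.html
```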
This procedure splits uri at slashes. Only the substring given with start (inclusive) and end (exclusive) as indices is considered. start and end - 1 have to be within the range of uri. Otherwise an index-out-of-range exception will be raised.
Example:
(split-uri "foo/bar/colon" 4 11)
==> ("bar" "col")
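The same index arithmetic can be tried out in Python. The sketch below uses a hypothetical function name and Python's own slicing and splitting; the exact range check is an assumption and may differ from the one the procedure performs.

```python
def split_uri_sketch(uri, start, end):
    """Split uri[start:end] at slashes; start is inclusive, end exclusive."""
    # Assumed range check; Python slicing alone would silently clamp.
    if not (0 <= start <= end <= len(uri)):
        raise IndexError("start or end out of range")
    return uri[start:end].split("/")


print(split_uri_sketch("foo/bar/colon", 4, 11))  # ['bar', 'col']
```

Characters 4 to 10 of "foo/bar/colon" are "bar/col", which is why the last element is cut short.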
This procedure generates a path out of a URI path list by inserting slashes between the elements of plist. If you want to use the resulting string for further operations, you should escape the elements of plist in case they contain slashes, like so:
(uri-path->uri (map escape-uri pathlist))
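Why the elements should be escaped first can be seen in a small Python sketch. Both function names are hypothetical, and escape_slashes only handles the slash, which is a simplification made for this example.

```python
def uri_path_to_uri_sketch(plist):
    """Join the elements of a path list with slashes."""
    return "/".join(plist)


def escape_slashes(element):
    """Escape only the slash (simplification for this example)."""
    return element.replace("/", "%2F")


plist = ["docs", "a/b.txt"]  # two elements; the second contains a slash

raw = uri_path_to_uri_sketch(plist)
safe = uri_path_to_uri_sketch([escape_slashes(p) for p in plist])

print(raw.split("/"))   # ['docs', 'a', 'b.txt']   three elements
print(safe.split("/"))  # ['docs', 'a%2Fb.txt']    two elements
```

Without escaping, splitting the generated string again yields a different path list than the one that went in.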
This procedure simplifies a URI path. It removes "." and "/.." entries from path, and removes parts before a root. The result is a list, or #f if the path tries to back up past root.
According to RFC 2396, relative paths are considered not to start with /. They are appended to a base URL path and then simplified.
So before you start to simplify a URL, try to find out whether it is a
relative path (i.e. it does not start with a /).
Examples:
(simplify-uri-path (split-uri "/foo/bar/baz/.." 0 15))
==> ("" "foo" "bar")
(simplify-uri-path (split-uri "foo/bar/baz/../../.." 0 20))
==> ()
(simplify-uri-path (split-uri "/foo/../.." 0 10))
==> #f
(simplify-uri-path (split-uri "foo/bar//" 0 9))
==> ("")
(simplify-uri-path (split-uri "foo/bar/" 0 8))
==> ("")
(simplify-uri-path (split-uri "/foo/bar//baz/../.." 0 19))
==> #f
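The examples above are consistent with a simple stack-based procedure, sketched here in Python. This is a reconstruction from the listed results and not the library's code: the function name is hypothetical, None stands in for #f, and the treatment of an empty element as "back to root" is inferred from the examples.

```python
def simplify_uri_path_sketch(path):
    """Simplify a path list; return None if it backs up past the root."""
    stack = []
    for seg in path:
        if seg == "..":
            # Backing up from the empty path or from the root fails.
            if not stack or stack[-1] == "":
                return None
            stack.pop()
        elif seg == ".":
            # "." refers to the current element and is dropped.
            continue
        elif seg == "":
            # An empty element marks a root: forget everything before it.
            stack = [""]
        else:
            stack.append(seg)
    return stack


for uri in ["/foo/bar/baz/..", "foo/bar/baz/../../..", "/foo/../..",
            "foo/bar//", "foo/bar/", "/foo/bar//baz/../.."]:
    print(uri, "==>", simplify_uri_path_sketch(uri.split("/")))
```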