A Uniform Resource Locator (URL) is a compact representation of the location and access method for a resource located on the Internet. Each URL consists of a scheme (HTTP, HTTPS, or FTP) and a scheme-specific string. This string can also include a combination of a directory path, search string, or name of the resource. The WinINet functions provide the ability to create, combine, break down, and canonicalize URLs. For more information on URLs, see RFC-1738 on Uniform Resource Locators (URL).
The URL functions operate in a task-oriented manner. The content and format of the URL that is given to the function is not verified. The calling application should track the use of these functions to ensure that the data is in the intended format. For example, the InternetCanonicalizeUrl function would convert the character "%" into the escape sequence "%25" when using no flags. If InternetCanonicalizeUrl is used on the canonicalized URL, the escape sequence "%25" would be converted into the escape sequence "%2525", which would not work properly.
The format of all URLs must follow the accepted syntax and semantics in order to access resources through the Internet. Canonicalization is the process of formatting a URL to follow this accepted syntax and semantics.
Characters that must be encoded include any characters that have no corresponding graphic character in the US-ASCII coded character set (hexadecimal 80-FF, which are not used in the US-ASCII coded character set, and hexadecimal 00-1F and 7F, which are control characters), blank spaces, "%" (which is used to encode other characters), and unsafe characters (<, >, ", #, {, }, |, \, ^, ~, [, ], and ').
Begins retrieving an FTP, HTTP, or HTTPS resource.
Canonicalizing URLs
Canonicalizing a URL is the process that converts a URL, which might contain unsafe characters such as blank spaces, reserved characters, and so on, into an accepted format.
The InternetCanonicalizeUrl function can be used to canonicalize URLs. This function is very task-oriented, so the application should track its use carefully. InternetCanonicalizeUrl does not verify that the URL passed to it is already canonicalized and that the URL that it returns is valid.
The following five flags control how InternetCanonicalizeUrl handles a particular URL. The flags can be used in combination. If no flags are used, the function encodes the URL by default.
Value
Meaning
ICU_BROWSER_MODE
Do not encode or decode characters after "#" or "?", and do not remove trailing white space after "?". If this value is not specified, the entire URL is encoded, and trailing white space is removed.
ICU_DECODE
Convert all %XX sequences to characters, including escape sequences, before the URL is parsed.
ICU_ENCODE_SPACES_ONLY
Encode spaces only.
ICU_NO_ENCODE
Do not convert unsafe characters to escape sequences.
ICU_NO_META
Do not remove meta sequences (such as "." and "..") from the URL.
The ICU_DECODE flag should be used only on canonicalized URLs, because it assumes that all %XX sequences are escape codes and converts them into the characters indicated by the code. If the URL has a "%" symbol in it that is not part of an escape code, ICU_DECODE still treats it as one. This characteristic might cause InternetCanonicalizeUrl to create an invalid URL.
To use InternetCanonicalizeUrl to return a completely decoded URL, the ICU_DECODE and ICU_NO_ENCODE flags must be specified. This setup assumes that the URL being passed to InternetCanonicalizeUrl has been previously canonicalized.
Combining Base and Relative URLs
A relative URL is a compact representation of the location of a resource relative to an absolute base URL. The base URL must be known to the parser and usually includes the scheme, network location, and parts of the URL path. An application can call InternetCombineUrl to combine the relative URL with its base URL. InternetCombineUrl also canonicalizes the resultant URL.
Cracking URLs
The InternetCrackUrl function separates a URL into its component parts and returns the components indicated by the URL_COMPONENTS structure that is passed to the function.
The components that make up the URL_COMPONENTS structure are the scheme number, host name, port number, user name, password, URL path, and additional information (such as search parameters). Each component, except the scheme and port numbers, has a string member that holds the information, and a member that holds the length of the string member. The scheme and port numbers have only a member that stores the corresponding value; they are both returned on all successful calls to InternetCrackUrl.
To get the value of a particular component in the URL_COMPONENTS structure, the member that stores the string length of that component must be set to a nonzero value. The string member can be either the address of a buffer or NULL.
If the pointer member contains the address of a buffer, the string length member must contain the size of that buffer. InternetCrackUrl returns the component information as a string in the buffer and stores the string length in the string length member.
If the pointer member is NULL, the string length member can be set to any nonzero value. InternetCrackUrl stores the address of the first character of the URL string that contains the component information and sets the string length to the number of characters in the remaining part of the URL string that pertains to the component.
All pointer members set to NULL with a nonzero length member point to the appropriate starting point in the URL string. The length stored in the length member must be used to determine the end of the individual component's information.
To finish initializing the URL_COMPONENTS structure properly, the dwStructSize member must be set to the size of the URL_COMPONENTS structure, in bytes.
The following example returns the components of the URL in the edit box, IDC_PreOpen1, and returns the components to the list box, IDC_PreOpenList. To display only the information for an individual component, this function copies the character immediately after the component's information in the string and temporarily replaces it with a NULL.
The components that make up the URL_COMPONENTS structure are the scheme, host name, port number, user name, password, URL path, and additional information (such as search parameters). Each component, except the port number, has a string member that holds the information, and a member that holds the length of the string member.
For each required component, the pointer member should contain the address of the buffer holding the information. The length member should be set to zero if the pointer member contains the address of a zero-terminated string; the length member should be set to the string length if the pointer member contains the address of a string that is not zero-terminated. The pointer member of any components that are not required must be NULL.
For applications that need to operate through a CERN proxy, InternetOpenUrl can be used to access FTP directories and files. The FTP requests are packaged to appear like an HTTP request, which the CERN proxy would accept.
InternetOpenUrl uses the HINTERNET handle created by the InternetOpen function and the URL of the resource. The URL must include the scheme (http:, ftp:, file: [for a local file], or https: [for hypertext protocol secure]) and network location (such as www.microsoft.com). The URL can also include a path (for example, /isapi/gomscom.asp?TARGET=/windows/feature/) and resource name (for example, default.htm). For HTTP or HTTPS requests, additional headers can be included.
WinINet does not support server implementations. In addition, it should not be used from a service. For server implementations or services use Microsoft Windows HTTP Services (WinHTTP).
In asynchronous mode, an application can execute any function that includes a context value as one of its parameters and can continue to execute other commands or functions while the application waits for the function to complete its task.