Best ways of parsing a URL using C?

I have a URL like this:

http://192.168.0.1:8080/servlet/rece

I want to parse the URL to get the values:

IP: 192.168.0.1
Port: 8080
Page: /servlet/rece

How do I do that?


Personally, I steal the HTParse.c module from the W3C (it is used in the lynx Web browser, for instance). Then, you can do things like:

 strncpy(hostname, HTParse(url, "", PARSE_HOST), size);

The important thing about using a well-established and debugged library is that you do not fall into the typical traps of URL parsing (many regexps fail when the host is an IP address, for instance, especially an IPv6 one).
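To illustrate the IPv6 trap mentioned above, here is a minimal sketch of splitting the "host:port" part of a URL by hand. The function name split_host_port and its parameters are hypothetical and are not part of HTParse; the sketch only assumes that IPv6 literals are written in brackets, as in "[2001:db8::1]:8443".

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of HTParse): splits "host[:port]" into its
 * parts. A bracketed IPv6 literal such as "[::1]:8080" is handled by looking
 * for the closing ']' before searching for the port separator.
 * Returns 0 on success, -1 on malformed input or if the host does not fit. */
int split_host_port(const char *authority, char *host, size_t host_size,
                    int *port, int default_port)
{
    const char *host_start = authority;
    const char *host_end;
    const char *port_str = NULL;
    size_t len;

    assert(authority && host && port);

    if (*authority == '[') {
        /* IPv6 literal: the host is everything between the brackets. */
        host_start = authority + 1;
        host_end = strchr(host_start, ']');
        if (host_end == NULL)
            return -1;
        if (host_end[1] == ':')
            port_str = host_end + 2;
        else if (host_end[1] != '\0')
            return -1;
    } else {
        /* Hostname or IPv4 address: the first ':' starts the port. */
        host_end = strchr(host_start, ':');
        if (host_end != NULL)
            port_str = host_end + 1;
        else
            host_end = host_start + strlen(host_start);
    }

    len = (size_t)(host_end - host_start);
    if (len == 0 || len >= host_size)
        return -1;
    memcpy(host, host_start, len);
    host[len] = '\0';

    *port = default_port;
    if (port_str != NULL) {
        char *end;
        long value = strtol(port_str, &end, 10);
        /* Reject empty, non-numeric, or out-of-range ports. */
        if (end == port_str || *end != '\0' || value < 1 || value > 65535)
            return -1;
        *port = (int)value;
    }
    return 0;
}
```

With this sketch, "[2001:db8::1]:8443" yields the host 2001:db8::1 and port 8443, and "192.168.0.1:8080" yields 192.168.0.1 and 8080. A naive split on the first ':' would cut the IPv6 address in the middle, which is the kind of mistake a debugged library avoids.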


I wrote some simple code using sscanf that can parse very basic URLs.

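#include <stdio.h>

int main(void)
{
    const char text[] = "http://192.168.0.2:8888/servlet/rece";
    char ip[100];
    int port = 80;
    char page[100];

    /* sscanf returns the number of fields converted; all three must be
       present. %5d limits the port to five digits. */
    if (sscanf(text, "http://%99[^:]:%5d/%99[^\n]", ip, &port, page) != 3) {
        fprintf(stderr, "could not parse \"%s\"\n", text);
        return 1;
    }
    printf("ip = \"%s\"\n", ip);
    printf("port = \"%d\"\n", port);
    printf("page = \"%s\"\n", page);
    return 0;
}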

./urlparse
ip = "192.168.0.2"
port = "8888"
page = "servlet/rece"
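The format string above only matches when the port and the page are both present. As a rough sketch of how the same sscanf approach could cope with a missing port or page, the input can be scanned in stages, using %n to record how far each stage got. The function name parse_http_url is hypothetical, and the sketch still assumes a plain "http://" URL with no user info, query handling, or IPv6 literal.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: parses "http://host[:port][/page]" with sscanf.
 * host and page must each have room for 100 bytes.
 * Returns 0 on success, -1 if the text does not look like such a URL. */
int parse_http_url(const char *text, char *host, int *port, char *page)
{
    int consumed = 0;

    assert(text && host && port && page);
    *port = 80;          /* default when no port is given */
    page[0] = '\0';      /* default when no page is given */

    /* %n stores the number of characters consumed so far. */
    if (sscanf(text, "http://%99[^:/]%n", host, &consumed) != 1)
        return -1;
    text += consumed;

    if (*text == ':') {
        /* Optional port: at most five digits. */
        if (sscanf(text, ":%5d%n", port, &consumed) != 1)
            return -1;
        text += consumed;
    }

    if (*text == '/')
        sscanf(text, "/%99[^\n]", page);  /* page stays empty for a bare "/" */
    else if (*text != '\0')
        return -1;                        /* unexpected trailing characters */
    return 0;
}
```

For "http://192.168.0.2:8888/servlet/rece" this gives the same three values as the program above. For "http://example.com/index.html" it gives the host example.com, the default port 80, and the page index.html.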

This may be late, but what I have used is the http_parser_parse_url() function, together with the macros it requires, separated out from the Joyent HTTP parser library. That worked well and comes to about 600 lines of code.


Use a regular expression if you want the easy way. Otherwise, use Flex/Bison.

You could also use a URI parsing library.
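As a sketch of the regular-expression route, the following uses the POSIX regcomp()/regexec() functions from <regex.h>. The pattern and the helper names copy_match and regex_parse_url are assumptions made for illustration; the pattern only covers "http://host[:port][/path]" and deliberately does not try to handle user info or IPv6 literals.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Copies one capture group of `text` into `out`, truncating if needed.
 * An unmatched optional group (rm_so == -1) yields an empty string. */
static void copy_match(const char *text, const regmatch_t *m,
                       char *out, size_t out_size)
{
    size_t len = 0;

    assert(out_size > 0);
    if (m->rm_so >= 0) {
        len = (size_t)(m->rm_eo - m->rm_so);
        if (len >= out_size)
            len = out_size - 1;
        memcpy(out, text + m->rm_so, len);
    }
    out[len] = '\0';
}

/* Hypothetical helper: splits "http://host[:port][/path]" with a POSIX
 * extended regular expression. Returns 0 on a match, -1 otherwise. */
int regex_parse_url(const char *url, char *host, size_t host_size,
                    char *port, size_t port_size,
                    char *path, size_t path_size)
{
    /* Groups: 1 = host, 3 = port digits, 4 = path. */
    static const char pattern[] = "^http://([^:/]+)(:([0-9]+))?(/.*)?$";
    regex_t re;
    regmatch_t m[5];
    int rc;

    assert(url && host && port && path);
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    rc = regexec(&re, url, 5, m, 0);
    regfree(&re);
    if (rc != 0)
        return -1;

    copy_match(url, &m[1], host, host_size);
    copy_match(url, &m[3], port, port_size);
    copy_match(url, &m[4], path, path_size);
    return 0;
}
```

For the URL in the question this yields the host 192.168.0.1, the port string 8080, and the path /servlet/rece. A URL such as "http://[::1]:8080/" does not match this pattern at all, which shows how easily a simple regular expression misses valid inputs.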


libcurl (since version 7.62.0) has a curl_url_get() function that can extract the host, port, path, etc. from a parsed URL.

Example code: https://curl.haxx.se/libcurl/c/parseurl.html

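CURLU *h = curl_url();
char *host;
/* parse the full URL, then extract the host name from it */
CURLUcode uc = curl_url_set(h, CURLUPART_URL, "http://192.168.0.1:8080/servlet/rece", 0);
if(!uc)
  uc = curl_url_get(h, CURLUPART_HOST, &host, 0);
if(!uc) {
  printf("Host name: %s\n", host);
  curl_free(host);
}
curl_url_cleanup(h);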