Common Sense Computing
 

Anatomy of a URL

These days URLs are everywhere. You know the "http://www..." you see in magazine articles and TV commercials, on billboards, everywhere. You can paste one into your web browser, but do you know what all the dots, colons, and slashes mean? Hello, this is Jeanna Matthews, and today on Common Sense Computing we will be exploring the anatomy of a URL.

The first part of a URL, usually "http", designates the protocol used by your web browser to fetch information. Besides Hypertext Transfer Protocol (http), other choices include ftp for File Transfer Protocol or simply "file" to designate a file located in the local file system.
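As a rough illustration of picking out the protocol, here is a minimal Python sketch using the standard library's urllib.parse module. The helper name scheme_of and the local file path are assumptions made up for this example.

```python
from urllib.parse import urlsplit


def scheme_of(url: str) -> str:
    """Return the protocol (scheme) part of a URL, e.g. 'http' or 'ftp'."""
    # urlsplit lowercases the scheme, so 'HTTP://...' also yields 'http'.
    return urlsplit(url).scheme


for url in (
    "http://www.commonsensecomputing.org/archives/current.html",
    "ftp://ftp.rfc-editor.org/in-notes/rfc1738.txt",
    "file:///tmp/notes.txt",  # hypothetical local file, for illustration only
):
    print(scheme_of(url), "<-", url)
```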

After the ://, you find the name of a specific computer - the web server on which the data you are looking for is stored. Web server names often begin with www, like www.commonsensecomputing.org, but that is not a requirement. Although it may be harder to remember, http://foseball.commonsensecomputing.org would work just as well. Names like these must be translated into IP addresses before any data can be transferred anyway, so naming machines www is simply a convention.
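To make the machine-name step concrete, the sketch below pulls the server name out of a URL and shows one way the name-to-address translation could be requested in Python. The helper names host_of and resolve are assumptions for this example, and resolve needs a working network connection, so it is defined here but not called.

```python
import socket
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Return the machine name from a URL, without any port number."""
    return urlsplit(url).hostname


def resolve(host: str) -> str:
    """Ask the system resolver for an IPv4 address (requires network access)."""
    return socket.gethostbyname(host)


print(host_of("http://www.commonsensecomputing.org/archives/current.html"))
print(host_of("http://www.foo.com:8080"))
```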

Following the web server's name, many URLs have a pathname, a set of names divided by slashes, as in http://www.commonsensecomputing.org/archives/current.html. This specifies a particular file on the web server. Here, current.html is a file inside the archives directory. If the last element of the pathname has an ending like .html, .htm, or .pl, then it completely names a file. If not, the last element probably names a directory, and the web server will look in that directory for a default file such as index.html. Some URLs contain only the machine name and no pathname. In that case, the web server simply looks for a default file in the root directory for the web content.
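The rule of thumb above can be sketched in a few lines of Python. This is only an approximation of what a web server might do, not how any particular server is implemented: the function name file_for, the "contains a dot" test, and the choice of index.html as the default are all assumptions for illustration.

```python
import posixpath
from urllib.parse import urlsplit

DEFAULT_FILE = "index.html"  # assumed default; real servers make this configurable


def file_for(url: str) -> str:
    """Guess which file a server would serve for a URL (a simplified heuristic)."""
    path = urlsplit(url).path or "/"  # no pathname means the root directory
    last = posixpath.basename(path)
    if "." in last:
        # The last element has an ending like .html, so treat it as a file.
        return path
    # Otherwise treat the path as a directory and look for the default file.
    return posixpath.join(path, DEFAULT_FILE)


print(file_for("http://www.commonsensecomputing.org/archives/current.html"))
print(file_for("http://www.commonsensecomputing.org/archives"))
print(file_for("http://www.commonsensecomputing.org"))
```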

URLs can get even more complicated, and most of the complication involves three pieces of punctuation - the colon, the question mark, and the hash mark. Colons are used to specify a port number, as in http://www.foo.com:8080. If no port is specified, then the default port for http, port 80, is assumed. In fact, if no port is specified, you can add a :80 after the web server name without changing the meaning of the URL. For example, try http://www.google.com:80.
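Here is a small Python sketch of the default-port rule. The function name effective_port and the DEFAULT_PORTS table are assumptions for this example; only the http default of 80 comes from the text above.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80}  # only the http default discussed above


def effective_port(url: str):
    """Return the explicit port if one is given, else the scheme's default."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS.get(parts.scheme)  # None if no default is known


print(effective_port("http://www.foo.com:8080"))
print(effective_port("http://www.google.com"))
print(effective_port("http://www.google.com:80"))
```

The last two calls give the same answer, which is the sense in which adding :80 does not change the meaning of the URL.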

A question mark is used when passing parameters to a program being executed on the web server. For example, next time you use a search engine, notice that once you enter a set of words to search for, the URL changes to include these words. In this case, the words you typed were sent to the web server as parameters in the URL name and the web server dynamically generated the resulting web page based on your input.
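To see the parameters for yourself, the following Python sketch splits out everything after the question mark in a search URL. The helper name query_params is an assumption for this example.

```python
from urllib.parse import parse_qs, urlsplit


def query_params(url: str) -> dict:
    """Return the parameters after the '?' as a dict of name -> list of values."""
    # keep_blank_values=True keeps parameters that were sent with no value.
    return parse_qs(urlsplit(url).query, keep_blank_values=True)


params = query_params("http://www.google.com/search?hl=en&lr=&q=URL+Anatomy")
for name, values in params.items():
    print(name, "=", values)
```

Note that the plus sign in the URL stands for a space, so the search words come back as "URL Anatomy".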

The hash mark is used to indicate sub-sections within a large web page. Authors of web pages can embed specific tags or anchors into the web page. When the name for one of these anchors is given after the hash mark in a URL, the web page will be loaded at the specified section rather than at the beginning.
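As a quick illustration, Python's standard library can separate the anchor name from the rest of the URL. The anchor name "ports" below is hypothetical and was made up for this example.

```python
from urllib.parse import urldefrag

# "#ports" is a made-up anchor name, used only to show the split.
page, anchor = urldefrag(
    "http://www.commonsensecomputing.org/archives/current.html#ports"
)

print("page to fetch:", page)
print("section to jump to:", anchor)
```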

Well, those are the basics of URL anatomy. With those basics, you can carve almost any URL into its basic pieces. For more information and some particularly interesting anatomical specimens, visit us on the web. And yes, that would be http://www.commonsensecomputing.org.

Longest URL: http://www.llanfairpwllgwyngyllgogerychwyrndrobwyll-llantysiliogogogoch.com/

Adding a port number makes it even longer: http://www.llanfairpwllgwyngyllgogerychwyrndrobwyll-llantysiliogogogoch.com:80/

Example of FTP Protocol URL (and also the technical definition of a URL): ftp://ftp.rfc-editor.org/in-notes/rfc1738.txt

Example of parameters (and also searching for URL anatomy): http://www.google.com/search?hl=en&lr=&q=URL+Anatomy

Copyright (c) 2004 - Jeanna Matthews


  Common Sense Computing
PO Box 6356 · Massena, NY 13662
comments@commonsensecomputing.org