Friendly URLs

Created: 2013/03/15 23:09:53+0000

Hello. If you are a frequent visitor to my site (cough, google-bot, cough), you may have noticed that I have replaced my query string URLs with friendlier ones. The new URLs are shorter and more accessible.

The pages of my site are generated dynamically from a database. To identify the content that needs to be extracted from the database, I pass IDs in the URL, and these IDs are used to look up the content. I was passing the IDs in the query string of the URL. The query string is everything to the right of the "?" character (up to any "#" fragment). It consists of name/value pairs formatted as name=value, with multiple pairs separated by "&". These pairs are referred to as the request's arguments. All I did was drop the query string and separate the IDs with forward slashes.
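To make the name=value format concrete, here is a minimal Java sketch of splitting a query string into its pairs. The class name and the example URL are made up for illustration (only the argument names come from my action mapping), and it skips URL decoding to stay short. In practice the servlet container and Struts 2 do this parsing for you.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the site's code.
final class QueryStringSketch {

    /** Returns the text between the first "?" and any "#" fragment, or "" if there is no query. */
    static String queryOf(String url) {
        int hash = url.indexOf('#');
        String noFragment = hash < 0 ? url : url.substring(0, hash);
        int q = noFragment.indexOf('?');
        return q < 0 ? "" : noFragment.substring(q + 1);
    }

    /** Splits "a=1&b=2" into ordered name/value pairs (no URL decoding, for brevity). */
    static Map<String, String> parseQuery(String query) {
        Map<String, String> args = new LinkedHashMap<>();
        if (query == null || query.isEmpty()) {
            return args;
        }
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue; // tolerate "a=1&&b=2"
            }
            int eq = pair.indexOf('=');
            if (eq < 0) {
                args.put(pair, ""); // a name with no value
            } else {
                args.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return args;
    }

    public static void main(String[] unused) {
        // Assumed example URL in the old query string style.
        String url = "/article?articleId=12&articleRevisionId=3";
        System.out.println(parseQuery(queryOf(url)));
    }
}
```

With the friendly form, the same two values are carried by position in the path instead, so there are no names left to parse.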

From:

To:

Now this is shorter and cleaner, but it does obscure what the number means. That doesn't matter very much here, as it's an ID. The ID either exists or it doesn't: you get a 200 response code or a 404 response code.
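As a rough illustration of that "exists or it doesn't" behaviour, here is a minimal Java sketch. The in-memory map is a stand-in for the database, and the class name, method name and IDs are all hypothetical rather than the site's real code.

```java
import java.util.Map;

// Hypothetical sketch: an ID either resolves to content or it does not.
final class ArticleLookupSketch {

    // Stand-in for the database: article ID -> title (made-up data).
    private static final Map<Integer, String> ARTICLES = Map.of(12, "Friendly URLs");

    /** Returns the HTTP status a request for the given article ID would produce. */
    static int statusFor(int articleId) {
        return ARTICLES.containsKey(articleId) ? 200 : 404;
    }

    public static void main(String[] unused) {
        System.out.println(statusFor(12)); // known ID
        System.out.println(statusFor(99)); // unknown ID
    }
}
```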

I used Struts 2 action wildcard mappings to pattern match the requested URL, which is an easy way of doing this transformation. Wildcards allow the URL to be partially defined, with the undefined parts filled in from the requested URL. I added wildcards of the form "{name:regexp}" to the action name.

In Struts 2 this looks like this:

<action name="struts/article/{articleId:\d+}/{articleRevisionId:\d+}" class="com.mattunderscore.site.actions.DisplayArticle">
   <result name="success">/site/article/Display.jsp</result>
</action>
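To show the idea behind a "{name:regexp}" wildcard, here is a small Java sketch that turns a template like the one above into a regular expression with named groups and pulls the values out of a path. This is only an illustration of the technique, not how Struts 2 implements it internally; the class name is made up, and it assumes the regexp inside a placeholder contains no braces.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of "{name:regexp}" matching; not Struts 2 code.
final class WildcardSketch {

    // Finds "{name:regexp}" placeholders (assumes no braces inside the regexp).
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\{([A-Za-z][A-Za-z0-9]*):([^{}]+)\\}");

    private final Pattern pattern;
    private final List<String> names = new ArrayList<>();

    WildcardSketch(String template) {
        StringBuilder regex = new StringBuilder();
        Matcher m = PLACEHOLDER.matcher(template);
        int last = 0;
        while (m.find()) {
            // Literal text between placeholders is quoted so it matches exactly.
            regex.append(Pattern.quote(template.substring(last, m.start())));
            // Each placeholder becomes a named capturing group.
            regex.append("(?<").append(m.group(1)).append('>')
                 .append(m.group(2)).append(')');
            names.add(m.group(1));
            last = m.end();
        }
        regex.append(Pattern.quote(template.substring(last)));
        pattern = Pattern.compile(regex.toString());
    }

    /** Returns the captured name/value pairs, or empty if the whole path does not match. */
    Optional<Map<String, String>> match(String path) {
        Matcher m = pattern.matcher(path);
        if (!m.matches()) {
            return Optional.empty();
        }
        Map<String, String> args = new LinkedHashMap<>();
        for (String name : names) {
            args.put(name, m.group(name));
        }
        return Optional.of(args);
    }

    public static void main(String[] unused) {
        WildcardSketch w = new WildcardSketch(
                "struts/article/{articleId:\\d+}/{articleRevisionId:\\d+}");
        System.out.println(w.match("struts/article/12/3"));  // both IDs captured
        System.out.println(w.match("struts/article/abc/3")); // no match: "abc" is not \d+
    }
}
```

The second call finds no match because the regular expression for articleId only accepts digits, which is exactly what keeps malformed URLs away from the action.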

This works just like taking arguments from a query string, but the name of each pair is determined by the wildcard and the value is whatever the "{name:regexp}" matched in the requested URL. The URL is matched only if the regular expression in each wildcard matches. Of course, I also had to go through my entire website and change all the URLs to the new format. Then there was my 404 handling: I had to write a wildcard to catch the URLs for which I wanted to return 404 codes. To allow me to pattern match over forward slashes and to use regular expressions, I had to set these constants in the struts.xml file:

<constant name="struts.enable.SlashesInActionNames" value="true"/>
<constant name="struts.mapper.alwaysSelectFullNamespace" value="false"/>
<constant name="struts.patternMatcher" value="regex"/>

This meant I had to change the action mapping further. The "struts" part was not in my previous action mappings; it is the URL pattern that the Struts 2 filter is applied to. Before these constants were set, Struts 2 was able to match that part automatically.

There are different levels of friendliness. The least friendly, I think, are shortened URLs. There is a place for them, but they obfuscate everything about the URL. Then there are URLs with query strings. These are simple, ugly and frequently long. The URLs provided by a company I used to work for made heavy use of query strings. Better are URLs like mine, where the arguments form part of the normal URL path and the query string has been dropped. These are shorter and more convenient, but what the values mean may not be clear. The most friendly of all are named URLs, so long as they are kept short. Something like http://www.mattunderscore.com/article/friendly-urls.html. This is not something my site is generally very good at. Here I'm using Paul Tuckey's UrlRewriteFilter to forward requests for the named URL to the less friendly URL. This is not driven from my database; the forwarding is defined in an XML file, which makes it a little inconvenient for me.
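For what it's worth, a short name like the one in that URL could be derived from an article title instead of being written by hand. The Java sketch below is one possible way to do it. The class and method names are hypothetical, and the "/article/" prefix and ".html" suffix are just copied from the example URL above; it is not how the forwarding on my site is currently produced.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical sketch of deriving a short URL name from a title.
final class SlugSketch {

    /** Lower-cases the title, strips accents, and joins the remaining words with "-". */
    static String slugify(String title) {
        // Decompose accented letters, then drop the combining marks.
        String ascii = Normalizer.normalize(title, Normalizer.Form.NFD)
                .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
        return ascii.toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z0-9]+", "-") // runs of anything else become one dash
                .replaceAll("^-+|-+$", "");    // no leading or trailing dash
    }

    /** Builds a path in the same shape as the named URL example. */
    static String namedPath(String title) {
        return "/article/" + slugify(title) + ".html";
    }

    public static void main(String[] unused) {
        System.out.println(namedPath("Friendly URLs")); // prints "/article/friendly-urls.html"
    }
}
```

A name generated this way would still need to be stored next to the article's ID so that a request for it can be forwarded to the right place.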