
Clean URLs and routing with mod_rewrite and PHP

Posted Mon 3rd Sep, 11am by Paul

Clean URLs (like the ones this site uses) are a great way to present users with a readable address. They can also help with search engine rankings and with managing the structure of your site.

So what’s a “clean” URL?

Instead of the address of this page being something like http://www.fuzeddesign.co.uk/article.php?id=45 it’s actually http://www.fuzeddesign.co.uk/articles/clean-urls-and-routing-with-mod_rewrite-and-php. Both addresses can render the same HTML, but the second describes the location and content of the page much more clearly than the first.

Search engine friendly

Search engines use clever little programs called “spiders” to crawl your website and index the links within it. Some of these spiders struggle to read complex URLs with elements like ?id=45, meaning your content may never be found and indexed. Clean URLs obviously don’t have this problem.

When a user types keywords into a search engine (say Google) and hits enter, Google will go off and look up matching words and phrases in its enormous index of the web. If the words it’s looking for can be found in the URL of the page as well as in the page content, you’re much more likely to come out tops in the results.

The mod_rewrite part

mod_rewrite is an Apache module for routing requests depending on the incoming URL. You can use it to do all sorts of fancy routing with regular expressions and the like, but in this case we’re just going to use it to send every request to the index.php file in the root directory of our website. Once we get there, we’ll use the URL we’ve been given to start being all clever.

The easiest and most accessible way to set up mod_rewrite on your website is with a .htaccess file. All you need to do is create a file called .htaccess and place it in your public root directory (e.g. /htdocs or /www), right beside your index.php file. Yes, that’s right, just .htaccess, with nothing before the dot. If you’re using Windows, you might find it complains about a file name that has nothing before the dot. If so, download the ZIP file for this article, which includes the file, and copy/paste as needed.

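The contents of the .htaccess file should look as follows:

# Serve index.php (or index.html) when a directory is requested
DirectoryIndex index.php index.html
#RewriteBase /
<IfModule mod_rewrite.c>
	RewriteEngine On
	# If the request matches an existing file or directory...
	RewriteCond %{REQUEST_FILENAME} -f [OR]
	RewriteCond %{REQUEST_FILENAME} -d
	# ...serve it untouched and stop processing
	RewriteRule ^(.+) - [PT,L]
	# Everything else is handed to index.php
	RewriteRule ^(.*) index.php
</IfModule>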

The DirectoryIndex part makes sure there’s no confusion over which index file to use. The RewriteBase / instruction is commented out with a # because most of the time it’s not needed. However, if you’re getting server log errors like “500 Internal Server Error” or “Rewrite not allowed here” try removing the # and enabling this line.

The <IfModule mod_rewrite.c> block checks whether the Apache server you’re on has the rewrite module enabled, and the rest is the clever stuff that always sends you back to index.php. If you’re not sure whether you’ve got mod_rewrite enabled, run phpinfo() and look for mod_rewrite in the Apache handler section. If it’s not there, you could ask your hosting company about it, but it’s quite common these days to have it enabled.
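
If you’d rather check from code than read through the phpinfo() output, something like the following may do the job. Treat it as a sketch only: mod_rewrite_status() is a made-up helper name (it isn’t in the ZIP file), and it relies on PHP’s apache_get_modules(), which is only available when PHP runs as an Apache module. Under CGI/FastCGI or on the command line it can’t tell you anything, so the helper returns null in that case.

```php
<?php
// Hypothetical helper, not part of the article's ZIP file.
// Returns true or false when Apache can be asked, or null when it can't.
function mod_rewrite_status(): ?bool
{
    // apache_get_modules() only exists when PHP runs as an Apache module.
    if (!function_exists('apache_get_modules')) {
        return null;
    }
    return in_array('mod_rewrite', apache_get_modules(), true);
}

$status = mod_rewrite_status();
if ($status === null) {
    echo "Can't tell from PHP; ask your host or check the server config.\n";
} elseif ($status) {
    echo "mod_rewrite is loaded.\n";
} else {
    echo "mod_rewrite is not loaded.\n";
}
```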

The PHP part

It’s all very well routing every URL request to index.php, but we’ve got to do something with the URL once we get there. As on every website, what you display on a page depends on the URL’s content. Let’s look at an example of an index.php script for handling an incoming URL and displaying the relevant output. (The comments show some example variable values.)

#index.php
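<?php 
// Clean the requested path (tidy() comes from the article's ZIP file)
$url = tidy($_SERVER['REQUEST_URI']); // $url = "/articles/2007"
// Split the path into its components
$section = explode("/", $url); // $section = ([0] => '', [1] => 'articles', [2] => '2007')

// Choose the page to include from the first URL component
switch($section[1]) {

	// No first component: the homepage was requested
	case '':
		$page = 'homepage.php';
		$section[1] = 'homepage';
	break;

	// Otherwise look for a file named after the first component
	default:
		if(file_exists($section[1].'.php'))
			$page = $section[1].'.php';
		else
			$page = './error/404-not-found.php';
	break;

}
?>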
<html>
	<head>
		<title>Web page title</title>
	</head>
	<body>
		<div id="header">
			<h1>Site name</h1>
		</div>
		<div id="menu"></div>
		<?php include $page; ?>
	</body>
</html>

This is quite a cut-down version of the PHP and HTML to make it easier to start with. We’ll look at adding more complex routing in a bit.

OK, let’s go through this one step at a time. The first thing we do is extract the URL from the server global variables with the line $url = tidy($_SERVER['REQUEST_URI']);. This gives us all the information we need to output the required content. The tidy() function (see ZIP file) removes any suspicious characters to help prevent SQL injection attacks and the like. The variable $url will contain something like /articles/2007 (no prizes for guessing the rendered page is going to display all the articles published in 2007).

We then take the URL string and separate it into its individual components using the PHP explode() function in the line $section = explode("/", $url);. This gives us an array $section with our URL components starting at $section[1]. The value of the array at index 0 is an empty string, because the URL begins with a / and there is nothing before it. I’ve left it that way because starting at 1 makes it more readable and minimizes confusion. If you want to strip the leading / off the $url string and start at index 0, that’s fine.
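
The real tidy() lives in the ZIP file and isn’t shown here, so here’s a rough idea of what a function like it might do. This is a guess rather than the actual helper: tidy_sketch() is a made-up name, and the choice to drop the query string and keep only letters, digits, slashes, underscores and hyphens is my own assumption.

```php
<?php
// Hypothetical stand-in for the article's tidy() helper; the real one may differ.
function tidy_sketch(string $uri): string
{
    // Drop the query string, if there is one.
    $path = explode('?', $uri, 2)[0];
    // Keep only letters, digits, slashes, underscores and hyphens.
    $path = preg_replace('#[^A-Za-z0-9/_-]#', '', $path);
    // Collapse runs of slashes into a single slash.
    $path = preg_replace('#/+#', '/', $path);
    // Remove a trailing slash, but leave a lone "/" (the homepage) alone.
    return $path === '/' ? $path : rtrim($path, '/');
}

$url = tidy_sketch('/articles/2007?page=2'); // $url = "/articles/2007"
$section = explode('/', $url);               // $section = ['', 'articles', '2007']
```

Whitelisting the characters you expect tends to be easier to reason about than trying to list every character you don’t want.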

Next we use a switch statement to load a given page into the body of the HTML. In this example I’ve included some HTML to render a site-wide header and menu before including the individual page (have a quick look inside the HTML <body> tags to see what I’m on about). Back to the PHP: in this example there are two cases in the switch statement. The first, case '':, handles a request where $section[1] is empty, i.e. the homepage is being requested (the browser address would look like http://www.example.com/ and the $url variable would be just “/”). The second, default:, handles all other requests. By using “convention over configuration”, we avoid having to write a case statement for each page or link on the site and instead name the file to be included after the first component ($section[1]) of the URL. The line if(file_exists($section[1].'.php')) looks for a file named after the first component with .php appended, for example:

/homepage     ->  homepage.php
/articles     ->  articles.php
/contact-us   ->  contact-us.php

If the file cannot be found, a 404-style error page is included. In this example it’s sourced from a subdirectory called /error with the lines:

else
	$page = './error/404-not-found.php';
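
Here’s one way the same lookup could be wrapped in a function with a couple of extra guards. It’s a sketch, not the code from the ZIP file: resolve_page() is a made-up name, and the rules about which names are allowed (lowercase letters, digits, underscores and hyphens) are my assumption. The check for index is there because a request for /index would otherwise find index.php and include the template inside itself over and over.

```php
<?php
// Hypothetical helper: maps the first URL component to a page file in $dir.
function resolve_page(string $name, string $dir): string
{
    // An empty first component means the homepage.
    if ($name === '') {
        return 'homepage.php';
    }
    // Only accept simple names, and never the template itself.
    $looksSafe = preg_match('/^[a-z0-9_-]+\z/', $name) === 1 && $name !== 'index';
    if ($looksSafe && is_file($dir . '/' . $name . '.php')) {
        return $name . '.php';
    }
    // Anything else falls through to the 404 page.
    return './error/404-not-found.php';
}

$section = explode('/', '/contact-us');
$page = resolve_page($section[1], __DIR__);
echo $page, "\n";
```

When the 404 page is chosen you’d probably also want to send a real 404 status (PHP’s http_response_code(404) does this) before any HTML is output, so browsers and search engines don’t treat the error page as a normal one.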

The included pages

In the above example we’ve worked out which page to include using the switch statement and then included the page further down the file within the <body> tag using the line include $page;. Let’s take a look at an included page file.

#homepage.php
<div id="content">
	<h2>Intro</h2>
	<p>
		Homepage web site intro.
	</p>
</div>

We’ve only got the content of the homepage in this file because the header, menu and so on are all held in our index.php template file. This “Don’t Repeat Yourself” (DRY) approach makes it easier to maintain and manage your website.

Further routing

So far we’ve only rendered pages and their content depending on the first component of the URL ($section[1]). In the above example we used a sample URL “/articles/2007”, so let’s look at how to use the second URL component to display only articles from 2007.

In our routing switch() statement we can add a case to handle articles:

#index.php
case 'articles':
	$page = $section[1].'.php';
	// $section[2] is missing when the URL is just "/articles"
	if(isset($section[2]) && is_numeric($section[2]))
		$year = $section[2];
break;

This case goes before the default: case and will display the articles page as normal if the URL is just “/articles”. We then check whether the second URL component ($section[2]) is a number using the is_numeric() function. If it is, we set a variable called $year. We’ll use this variable next to determine which database query to use.
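
One thing to be aware of is that is_numeric() is fairly generous: it also says yes to strings like “-5”, “20.07” and “1e3”. If you only ever want a four-digit year, a stricter check along these lines might suit better. It’s a sketch, and year_from_sections() is a made-up name rather than anything from the ZIP file.

```php
<?php
// Hypothetical helper: returns the year from the second URL component,
// or null when it is missing or isn't exactly four digits.
function year_from_sections(array $section): ?int
{
    if (isset($section[2]) && preg_match('/^\d{4}\z/', $section[2]) === 1) {
        return (int) $section[2];
    }
    return null;
}

$year = year_from_sections(explode('/', '/articles/2007')); // $year = 2007
$none = year_from_sections(explode('/', '/articles'));      // $none = null
```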

#articles.php
<?php
	if(isset($year))
		$sql = "SELECT * FROM articles WHERE year = '$year' ORDER BY date DESC";
	else
		$sql = "SELECT * FROM articles ORDER BY date DESC";
?>

The SQL code above leaves a lot to be desired, but it gives you a picture of how your program can react to changes in the URL.
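
One way to tidy it up is to keep the value out of the SQL string altogether and pass it as a parameter instead. The sketch below only builds the query text and its parameters, so it runs without a database. build_articles_query() is a made-up name, and the table and column names are simply the ones from the example above.

```php
<?php
// Hypothetical helper: returns the SQL text and the parameters to bind to it.
function build_articles_query(?int $year): array
{
    if ($year !== null) {
        return [
            'SELECT * FROM articles WHERE year = :year ORDER BY date DESC',
            [':year' => $year],
        ];
    }
    return ['SELECT * FROM articles ORDER BY date DESC', []];
}

[$sql, $params] = build_articles_query(2007);

// With a PDO connection in $pdo (assumed, not created here) you would then run:
//   $stmt = $pdo->prepare($sql);
//   $stmt->execute($params);
```

Because the year travels separately from the SQL text, nothing that arrives in the URL can change the shape of the query.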

Download the associated files for this article
Zip Archive, 13k