Howto mod_rewrite with Apache
About mod_rewrite for Apache
This module uses a rule-based rewriting engine (based on a
regular-expression parser) to rewrite requested URLs on the fly. It
supports an unlimited number of rules and an unlimited number of
attached rule conditions for each rule, providing a flexible and
powerful URL manipulation mechanism. The URL manipulations can depend
on various tests: server variables, environment variables, HTTP
headers, time stamps and even external database lookups in various
formats can all be used to achieve granular URL matching.
This module operates on the full URLs (including the path-info part) both in
per-server context (httpd.conf) and per-directory context (.htaccess)
and can even generate query-string parts on result. The rewritten
result can lead to internal sub-processing, external request
redirection or even to an internal proxy throughput.
But all this functionality and flexibility has its drawback: complexity. So
don't expect to understand this entire module in just one day.
Using Apache's mod_rewrite
The Apache module mod_rewrite can be used to perform various forms of
URI acrobatics. Regular expressions are a prerequisite concept: get
comfortable with them before attempting to understand mod_rewrite.
When a URL is requested from a server, it does not necessarily map
directly to the server's filesystem. The request can be twisted and
turned to (hopefully) make more sense to the browsing user.
Why should I care?
A clean URL is part of a good user experience. It works as a breadcrumb
trail, allowing users to see where they are located in the site; it
doesn't break in bookmarks; it can easily be sent over email; and it
allows users to guess where they want to go next. Most importantly, an
easy-to-read URL will be indexed by search engines such as Google.
Having all your pages indexed is a huge advantage in getting visitors
to your website. Many search engine robots cannot read a URL with
symbols such as ?, & or commas, and therefore will not index those
pages of your website.
This is all possible by using human-readable URLs:
"In
principle, users should not need to know about URLs which are a
machine-level addressing scheme. In practice, users often go to
websites or individual pages through mechanisms that involve exposure
to raw URLs." -- Jakob Nielsen, Jakob Nielsen's Alertbox, March 21,
1999: URL as UI
"a URL should contain
human-readable directory and file names that reflect the nature of the
information space." -- Jakob Nielsen, item #4 Top Ten Mistakes in Web
Design
By choosing a well thought-out URL,
you won't have to change it during the next re-organization. URLs that
remain the same tend to pick up more links over time.
Getting started:
First, Apache must be compiled with the mod_rewrite module, or have it
loaded as a shared module, for any of this to take place. Insert these
lines into the vhost definition for the domain that you want to work
with.
RewriteEngine on
RewriteLog /path/to/logs/server.rewrite.txt
RewriteLogLevel 1
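The first line turns the RewriteEngine on. Without it, the rewrite
directives that follow are not processed by the Apache webserver. Next,
we specify where the logfile that records the rewrite activity should
be placed, and how verbose it should be: level 0 logs nothing, and 9
logs practically every action. (These two logging directives exist in
Apache 2.2 and earlier.) This is mostly for debugging, as your
CustomLog should be keeping track of traffic.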
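On Apache 2.4 and later, rewrite logging goes to the error log and is
controlled through LogLevel instead. A minimal sketch of the equivalent
setup, assuming a 2.4 server with mod_rewrite built as a shared module
(the module path shown is an assumption and varies by distribution):

```apache
# Main server config only (not inside the vhost); the path is an assumption
LoadModule rewrite_module modules/mod_rewrite.so

# Inside the vhost definition
RewriteEngine on
# Apache 2.4+: rewrite tracing is written to the ErrorLog.
# trace1 is terse, trace8 is extremely verbose.
LogLevel warn rewrite:trace3
```

Turn the trace level back down once the rules work, since tracing every
request slows the server noticeably.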
A beginning example:
One of the simplest uses of mod_rewrite is to redirect a web request
from one page to another. This is often done when the first page has
expired, was misspelled, or the site has a new naming scheme. It's nice
to forward users to the correct page in case they have the previous one
bookmarked, or if a search engine has cached the old location.
RewriteRule ^/biogarphy\.php3$ /biography/ [R=301,L]
This forwards a browser request from one page to the other, because of
the R=301 flag at the end, which sends a permanent redirect. I've taken
a file that was spelled wrong and fixed it, at the same time removing
an old filetype suffix. (php4 has replaced that suffix with .php.) What
if I were to dump php from my system, and go with *.html, *.jsp, or
even *.willie? By rewriting my URI to look like a directory, it doesn't
matter what filetype I'm using, nor what my DirectoryIndex options are.
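The same redirect can also live in per-directory context. A sketch,
assuming a .htaccess file in the document root and that the server's
AllowOverride setting permits rewrite directives there (both are
assumptions about your setup):

```apache
# .htaccess in the document root (per-directory context)
RewriteEngine on
# No leading slash here: the directory prefix is stripped before the
# pattern is matched. The escaped dot matches a literal "." only.
RewriteRule ^biogarphy\.php3$ /biography/ [R=301,L]
```

The only difference from the vhost version is the missing leading slash
in the pattern; the substitution still starts with a slash so that it
is treated as a URL path.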
Compound Example:
What if you used the above example, but decided not to create a
"biography" directory at your Doc-root? Apache can still be told where
the content resides by including another RewriteRule following the
first. Rules continue to be tried in order until one that matches
carries the "last" [L] modifier at the end. This is much like a switch
programming structure, using break to prevent the remaining options
from being executed.
RewriteRule ^/biography/ /biogarphy.php3 [L]
This might seem a little redundant, since we just did the opposite. The
difference is that this rewrite is internal: the browser still shows
/biography/, while Apache reads the content from /biogarphy.php3
instead of looking for a biography directory.
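Put together, the redirect and the internal rewrite might look like the
sketch below. It assumes both rules sit in the vhost definition, where
the rule set runs once per request. In a .htaccess file the internally
rewritten URL is processed again, so this exact pair would loop there.

```apache
RewriteEngine on

# 1. Old, misspelled URL: tell the browser to go to the clean URL.
#    GET /biogarphy.php3  ->  301 redirect to /biography/
RewriteRule ^/biogarphy\.php3$ /biography/ [R=301,L]

# 2. Clean URL: serve the existing file without telling the browser.
#    GET /biography/  ->  content of /biogarphy.php3
RewriteRule ^/biography/?$ /biogarphy.php3 [L]
```

The [L] on the first rule stops processing as soon as the redirect is
decided, so the redirect target is never handed on to the second rule.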
Confusing? Well, I could do this instead:
RewriteRule ^/([^/]+)/?$ /content/$1.php [L]
I can match whatever follows the leading slash, capture it with the
parentheses, and use the backreference $1 to look for a file of that
name in the content directory.
I've also placed the files inside another directory that the browsing
user doesn't need to see or know about; this makes it easier for the
webmaster to keep track of the role of each file on the site. Since
I've upgraded from php3 to php4, the suffix has changed to .php as
well.
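A catch-all rule like this also matches requests for images and
stylesheets. One possible guard, sketched here for vhost context, is a
condition that skips files that really exist. The capture group uses
[^/]+ so that a trailing slash is not swallowed into $1. The /content/
layout is the one from the example above.

```apache
RewriteEngine on

# Leave requests for files that really exist alone. In vhost context
# the filesystem path is not yet known, so it is built by hand here.
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-f

# /biography  or  /biography/  ->  /content/biography.php
RewriteRule ^/([^/]+)/?$ /content/$1.php [L]
```

With this in place, a request for an existing file such as a logo image
is served directly, while /biography/ is still mapped into the content
directory.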
More Advanced - the Query String:
The query string is handled separately from the URL path: a RewriteRule
pattern is matched against the path only. This means that a simple
regex on the rule doesn't do the trick; a compound statement using
RewriteCond (condition) is required.
RewriteCond %{QUERY_STRING} id=([^&;]*)
RewriteRule ^/$ http://%{SERVER_NAME}/%1/? [R,L]
RewriteRule ^/([^/]*)/?$ /index.php?id=$1 [L]
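The RewriteCond applies only to the RewriteRule that immediately
follows it: that rule runs only when the condition is true. Later rules
are unaffected, and processing continues until a rule flagged "last"
[L] matches. The condition's backreferences are different, using the %
prefix instead of $, and they remain available in the RewriteRule that
follows the condition.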
The above example would redirect "/?id=home" to "/home/", and then
internally map "/home/" back to /index.php with "home" as the value of
the id GET variable. One more thing to notice is that the substitution
on the second line has a trailing ?, which prevents the original query
string from being copied into the new, redirected URI.
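A slightly stricter variant is sketched below. It assumes Apache 2.4 or
later for the QSD flag and a PCRE-based regex engine for the (?:...)
group. Choosing a permanent redirect (R=301) is also an assumption
rather than part of the original example.

```apache
RewriteEngine on

# /?id=home  ->  external redirect to /home/
# (?:^|&) keeps "id" from matching inside a longer name such as "pid".
RewriteCond %{QUERY_STRING} (?:^|&)id=([^&;]+)
# QSD discards the original query string (same job as the trailing ?).
RewriteRule ^/$ /%1/ [R=301,L,QSD]

# /home/  ->  internal rewrite to /index.php?id=home
RewriteRule ^/([^/]+)/?$ /index.php?id=$1 [L]
```

Because the substitution in the redirect rule is a plain path, Apache
fills in the scheme and host name of the current server itself.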
More Reference Links:
http://www.engelschall.com/pw/apache/rewriteguide/
http://httpd.apache.org/docs/mod/mod_rewrite.html
http://httpd.apache.org/docs/misc/rewriteguide.html