Table of Contents:
The
Writing Apache Modules with Perl and C
book can be purchased online from O'Reilly
and
Amazon.com.
|
Your corrections of the technical and grammatical
errors are very welcome. You are encouraged to help me
improve this guide. If you have something to contribute
please send it
directly to me.
|
Nowadays millions of people surf the Internet. There are millions of Terabytes of data lying around. To manipulate the data new smart techniques and technologies were invented. One of the major inventions was the relational database, which allows us to search and modify huge stores of data very quickly. We use SQL (Structured Query Language) to access and manipulate the contents of these databases.
[ TOC ]
When people started to use the web, they found that they needed to write web interfaces to their databases. CGI is the most widely used technology for building such interfaces. The main limitation of a CGI script driving a database is that its database connection is not persistent - on every request the CGI script has to re-connect to the database, and when the request is completed the connection is closed.
Apache::DBI
was written to remove this limitation. When you use it, you have a database
connection which persists for the process' entire life. So when your
mod_perl script needs to use a database,
Apache::DBI
provides a valid connection immediately and your script starts work right
away without having to initiate a database connection first.
This is possible only with CGI running under a mod_perl enabled server, since in this model the child process does not quit when the request has been served.
It's almost as straightforward as is it sounds; there are just a few things to know about and we will cover them in this section.
[ TOC ]
The DBI module can make use of the Apache::DBI
module. When it loads, the DBI module tests if the environment variable
$ENV{MOD_PERL}
is set, and if the Apache::DBI
module has already been loaded. If so, the DBI module will forward every
connect()
request to the Apache::DBI
module. Apache::DBI
uses the ping()
method to look for a database handle from a
previous connect()
request, and tests if this handle is still
valid. If these two conditions are fulfilled it just returns the database
handle.
If there is no appropriate database handle or if the ping()
method fails, Apache::DBI
establishes a new connection and stores the handle for later re-use. When
the script is run again by a child that is still connected, Apache::DBI
just checks the cache of open connections by matching the host, username and password
parameters against it. A matching connection is returned if available or a
new one is initiated and then returned.
There is no need to delete the disconnect()
statements from
your code. They won't do anything because the Apache::DBI
module overloads the disconnect()
method with an empty one.
When should this module be used and when shouldn't it be used?
You will want to use this module if you are opening several database
connections to the server. Apache::DBI
will make them persistent per child, so if you have ten children and each
opens two different connections (with different connect()
arguments) you will have in total twenty opened and persistent connections.
After the initial connect()
you will save the connection time
for every connect()
request from your DBI
module. This can be a huge benefit for a server with a high volume of
database traffic.
You must not use this module if you are opening a special connection for each of your users. Each connection will stay persistent and in a short time the number of connections will be so big that your machine will scream in agony and die.
If you want to use Apache::DBI
but you have both situations on one machine, at the time of writing the
only solution is to run two Apache/mod_perl servers, one which uses Apache::DBI
and one which does not.
[ TOC ]
After installing this module, the configuration is simple - add the
following directive to httpd.conf
PerlModule Apache::DBI |
Note that it is important to load this module before any other
Apache*DBI
module and before the DBI
module itself!
You can skip preloading DBI
, since Apache::DBI
does that. But there is no harm in leaving it in, as long as it is loaded
after
Apache::DBI
.
[ TOC ]
If you want to make sure that a connection will already be opened when your
script is first executed after a server restart, then you should use the connect_on_init()
method in the startup file to preload every connection you are going to
use. For example:
Apache::DBI->connect_on_init ("DBI:mysql:myDB::myserver", "username", "passwd", { PrintError => 1, # warn() on errors RaiseError => 0, # don't die on error AutoCommit => 1, # commit executes immediately } ); |
As noted above, use this method only if you want all of apache to be able
to connect to the database server as one user (or as a very few users),
i.e. if your user(s)
can effectively share the connection. Do
not use this method if you want for example one unique connection per user.
Be warned though, that if you call connect_on_init()
and your database is down, Apache children will be delayed at server
startup, trying to connect. They won't begin serving requests until either
they are connected, or the connection attempt fails. Depending on your DBD
driver, this can take several minutes!
[ TOC ]
If you are not sure if this module is working as advertised, you should enable Debug mode in the startup script by:
$Apache::DBI::DEBUG = 1; |
Starting with ApacheDBI-0.84
, setting $Apache::DBI::DEBUG = 1
will produce only minimal output. For a full trace you should set
$Apache::DBI::DEBUG = 2
.
After setting the DEBUG level you will see entries in the error_log
both when Apache::DBI
initializes a connection and when it returns one from its cache. Use the
following command to view the log in real time (your error_log
might be located at a different path, it is set in the Apache configuration
files):
tail -f /usr/local/apache/logs/error_log |
I use alias
(in tcsh
) so I do not have to remember the path:
alias err "tail -f /usr/local/apache/logs/error_log" |
[ TOC ]
Be very very careful when locking the database (LOCK TABLE ...
) or singular rows if you use Apache::DBI
or similar persistent connections. MySQL threads keep tables locked until
the thread ends (connection is closed) or the tables are unlocked. If your
session die()'s
while tables are locked, they will stay neatly
locked as your connection won't be closed either.
See the section Handling the 'User pressed Stop button' case for more information on prevention.
[ TOC ]
[ TOC ]
The SQL server keeps a connection to the client open for a limited period
of time. Many developers were bitten by so called Morning
bug, when every morning the first users to use the site received a
No Data Returned
message, but after that everything worked fine. The error is caused by Apache::DBI
returning a handle of the invalid connection (the server closed it because
of a timeout), and the script was dying on that error. The infamous ping()
method was introduced to solve this problem, but still people were being
bitten by this problem. Another solution was found - to increase the
timeout parameter when starting the SQL server. Currently I startup MySQL
server with a script safe_mysql
, so I have modified it to use this option:
nohup $ledir/mysqld [snipped other options] -O wait_timeout=172800 |
(172800 seconds is equal to 48 hours. This change solves the problem.)
Note that as from version 0.82
, Apache::DBI
implements ping()
inside the eval
block. This means that if the handle has timed out it should be reconnected
automatically, and avoid the morning bug.
[ TOC ]
When it receives a connection request, before it decides to use an existing
cached connection, Apache::DBI
insists that the new connection be opened in exactly the same way as the
cached connection. If I have one script that sets LongReadLen
and one that does not, Apache::DBI
will make two different connections. So instead of having a maximum of 40
open connections, I can end up with 80.
However, you are free to modify the handle immediately after you get it
from the cache. So always initiate connections using the same parameters
and set LongReadLen
(or whatever) afterwards.
[ TOC ]
You must use DBI::connect()
as in normal DBI usage to get your $dbh
database handler.
Using the Apache::DBI
does not eliminate the need to write proper DBI
code. As the Apache::DBI
man page states, you should program as if you are not using Apache::DBI
at all. Apache::DBI
will override the DBI methods where necessary and return your cached
connection. Any disconnect()
call will be just ignored.
[ TOC ]
Make sure you have it installed.
Make sure you configured mod_perl with EVERYTHING=1.
Use the example script eg/startup.pl (in the mod_perl distribution). Remove the comment from the line.
# use Apache::DebugDBI; |
and adapt the connect string. Do not change anything in your scripts for
use with Apache::DBI
.
[ TOC ]
Does your error_log look like this?
10169 Apache::DBI PerlChildInitHandler 10169 Apache::DBI skipping connection cache during server startup Database handle destroyed without explicit disconnect at /usr/lib/perl5/site_perl/5.005/Apache/DBI.pm line 29. |
If so you are trying to open a database connection in the parent httpd process. If you do, children will each get a copy of this handle, causing clashes when the handle is used by two processes at the same time. Each child must have its own, unique, connection handle.
To avoid this problem, Apache::DBI
checks whether it is called during server startup. If so the module skips
the connection cache and returns immediately without a database handle.
You must use the Apache::DBI->connect_on_init()
method in the startup file.
[ TOC ]
To log a trace of DBI
statement execution, you must set the
DBI_TRACE
environment variable. The PerlSetEnv DBI_TRACE
directive must appear before you load Apache::DBI
and DBI
.
For example if you use Apache::DBI
, modify your httpd.conf
with:
PerlSetEnv DBI_TRACE "3=/tmp/dbitrace.log" PerlModule Apache::DBI |
Replace 3
with the TRACE level you want. The traces from each request will be
appended to /tmp/dbitrace.log
. Note that the logs might interleave if requests are processed
concurrently.
Within your code you can control trace generation with the
trace()
method:
DBI->trace($trace_level) DBI->trace($trace_level, $trace_filename) DBI trace information can be enabled for all handles using this DBI class method. To enable trace information for a specific handle use the similar C<$h-E<gt>trace> method. |
Using the handle trace option with a $dbh
or $sth
is handy for limiting the trace info to the specific bit of code that you
are interested in.
Trace Levels:
[ TOC ]
Since many mod_perl developers use mysql as their preferred SQL engine,
these notes explain the difference between mysql_use_result()
and
mysql_store_result()
. The two influence the speed and size of the processes.
The DBD::mysql
(version 2.0217) documentation includes the following snippet:
mysql_use_result attribute: This forces the driver to use mysql_use_result rather than mysql_store_result. The former is faster and less memory consuming, but tends to block other processes. (That's why mysql_store_result is the default.) |
Think about it in client/server terms. When you ask the server to
spoon-feed you the data as you use it, the server process must buffer the
data, tie up that thread, and possibly keep any database locks open for a
long time. So if you read a row of data and ponder it for a while, the
tables you have locked are still locked, and the server is busy talking to
you every so often. That is mysql_use_result()
.
If you just suck down the whole dataset to the client, then the server is
free to go about its business serving other requests. This results in
parallelism since the server and client are doing work at the same time,
rather than blocking on each other doing frequent I/O. That is
mysql_store_result()
.
As the mysql manual suggests: you should not use mysql_use_result()
if you are doing a lot of processing for each row on the client side. This
can tie up the server and prevent other threads from updating the tables.
[ TOC ]
Sometimes you end up running many databases on the same machine. These might have very varying database needs (such as one db with sessions, very frequently updated but tiny amounts of data, and another with large sets of data that's hardly ever updated) you might be able to gain a lot by running two differently configured databases on one server. One would benefit from lots of caching, the other would probably reduce the efficiency of the cache but would gain from fast disk access. Different usage profiles require vastly different performance needs.
This is basically a similar idea to having two Apache servers, each optimized for its specific requirements.
[ TOC ]
In this section you will find scripts, modules and code snippets to help
you get started using relational Databases with mod_perl scripts. Note that
I work with mysql
( http://www.mysql.com ), so the code
you find here will work out of box with mysql. If you use some other SQL
engine, it might work for you or it might need some changes. YMMV.
[ TOC ]
Having to write many queries in my CGI scripts, persuaded me to write a stand alone module that saves me a lot of time in coding and debugging my code. It also makes my scripts much smaller and easier to read. I will present the module here, with examples following:
Notice the DESTROY
block at the end of the module, which makes various cleanups and allows
this module to be used under mod_perl and
mod_cgi
as well. Note that you will not get the benefit of persistent database
handles with mod_cgi.
[ TOC ]
My-DB.pm
-- The My::DB module
(Note that you will not find this on CPAN. at least not yet :)
[ TOC ]
To use My::DB
in your script, you first have to create a My::DB
object:
use vars qw($db_obj); my $db_obj = new My::DB or croak "Can't initialize My::DB object: $!\n"; |
Now you can use any of My::DB
's methods. Assume that we have a table called tracker where we store the names of the users and what they are doing at each and
every moment (think about an online community program).
I will start with a very simple query--I want to know where the users are
and produce statistics. tracker
is the name of the table.
# fetch the statistics of where users are my $r_ary = $db_obj->sql_get_matched_rows_ary_ref ("tracker", [qw(where_user_are)], ); my %stats = (); my $total = 0; foreach my $r_row (@$r_ary){ $stats{$r_row->[0]}++; $total++; } |
Now let's count how many users we have (in table users
):
my $count = $db_obj->sql_count_matched("users"); |
Check whether a user exists:
my $username = 'stas'; my $exists = $db_obj->sql_count_matched ("users", [username => ["=",$username]] ); |
Check whether a user is online, and get the time since she went online (since
is a column in the tracker
table, it tells us when a user went online):
my @row = (); $db_obj->sql_get_matched_row (\@row, "tracker", ['UNIX_TIMESTAMP(since)'], [username => ["=",$username]] ); if (@row) { my $idle = int( (time() - $row[0]) / 60); return "Current status: Is Online and idle for $idle minutes."; } |
A complex query. I join two tables, and I want a reference to an array
which will store a slice of the matched query (LIMIT $offset,$hits
) sorted by username
. Each row in the array is to include the fields from the users
table, but only those listed in @verbose_cols
. Then we print it out.
my $r_ary = $db_obj->sql_get_matched_rows_ary_ref ( "tracker STRAIGHT_JOIN users", [map {"users.$_"} @verbose_cols], [], ["WHERE tracker.username=users.username", "ORDER BY users.username", "LIMIT $offset,$hits"], ); foreach my $r_row (@$r_ary){ print ... } |
Another complex query. The user checks checkboxes to be queried by, selects
from lists and types in match strings, we process input and build the @where
array. Then we want to get the number of matches and the matched rows as
well.
my @search_keys = qw(choice1 choice2); my @where = (); # Process the checkboxes - we turn them into a regular expression foreach (@search_keys) { next unless defined $q->param($_) and $q->param($_); my $regexp = "[".join("",$q->param($_))."]"; push @where, ($_ => ['REGEXP',$regexp]); } # Add the items selected by the user from our lists # selected => exact match push @where,(country => ['=',$q->param('country')]) if $q->param('country'); # Add the parameters typed by the user foreach (qw(city state)) { push @where,($_ => ['LIKE',$q->param($_)]) if $q->param($_); } # Count all that matched the query my $total_matched_users = $db_obj->sql_count_matched ( "users", \@where, ); # Now process the orderby my $orderby = $q->param('orderby') || 'username'; # Do the query and fetch the data my $r_ary = $db_obj->sql_get_matched_rows_ary_ref ( "users", \@display_columns, \@where, ["ORDER BY $orderby", "LIMIT $offset,$hits"], ); |
sql_get_matched_rows_ary_ref
knows to handle both OR
ed and
AND
ed params. This example shows how to use OR
on parameters:
This snippet is an implementation of a watchdog. Our users want to know
when their colleagues go online. They register the usernames of the people
they want to know about. We have to make two queries: one to get a list of
usernames, the second to find out whether any of these users is online. In
the second query we use the OR
keyword.
# check who we are looking for $r_ary = $db_obj->sql_get_matched_rows_ary_ref ("watchdog", [qw(watched)], [username => ['=',$username)], ], ); # put them into an array my @watched = map {$_->[0]} @{$r_ary}; my %matched = (); # Does the user have some registered usernames? if (@watched) { # Try to fetch all the users who match the usernames exactly. # Put it into an array and compare it with a hash! $r_ary = $db_obj->sql_get_matched_rows_ary_ref ("tracker", [qw(username)], [username => ['=',\@watched], ] ); map {$matched{$_->[0]} = 1} @{$r_ary}; } # Now %matched includes the usernames of the users who are being # watched by $username and currently are online. |
Your corrections of the technical and grammatical
errors are very welcome. You are encouraged to help me
improve this guide. If you have something to contribute
please send it
directly to me.
|
The
Writing Apache Modules with Perl and C
book can be purchased online from O'Reilly
and
Amazon.com.
|
Written by Stas Bekman. Last Modified at 05/26/2000 |
|
Use of the Camel for Perl is a trademark of O'Reilly & Associates, and is used by permission. |