You are currently browsing the monthly archive for March 2008.

The group I work with publishes web sites that front relational databases. Our core system infrastructure includes a custom Tomcat/Struts-based web application proxied through an Apache web server. Development of our software involves software engineers working on the web application, web design gurus focused on the web interface, programmers writing CGI scripts and beta testers poking at our progress.

To facilitate individualized development, we have a model whereby everyone on the project gets a personal Apache virtual host in which to install the web application and its sundry bits. Some people need more than one virtually hosted web site and sometimes a web site needs to be thrown up for a quick isolated test and then thrown away.

To ease the ups and downs of the virtual hosts, I use wildcard A records in the DNS for our domains. This allows me to quickly configure (and un-configure) an Apache virtual host without having to formally register (and un-register) the host name in the nameserver. So I can configure a name-based virtual host for the Apache web server:

<VirtualHost *:80>
    ServerAlias vh1.crashingdaily.com
    DocumentRoot /var/www/vh1.crashingdaily.com/html
</VirtualHost>

and instantly http://vh1.crashingdaily.com/ is a valid URL. Of course a real virtual host configuration would be more involved and the development files need to be installed to /var/www/vh1.crashingdaily.com/html but hopefully you get the idea.

Now, what happens if someone attempts to use host name for which there is no configured virtual host – say, they mistype the public production site as, ww.crashingdaily.com? Well, the wildcard A-record will send the user to our development server at IP address of the DNS wildcard and the Apache web server, not finding a virtual host configured for ‘ww.crashingdaily.com’, will use its default virtual host to answer the request.

For name-based virtual hosting, the first virtual host defined during Apache’s configuration phase catches any request to an unknown server name. This is a problem when an end user trying for our production site mistypes the host – they end up at a password protected development site and, depending on our ever changing vhost configurations, possibly a site unrelated to our project.

To get our production users back on track when they are using a malformed host name, I set up a wildcard Apache virtual host that answers for all host names not explicitly configured.

<VirtualHost *:80>
    ServerAlias *.crashingdaily.com
    Redirect 301 / http://www.crashingdaily.com/
</VirtualHost>

This configuration goes on our development server to which wildcard DNS records resolve. Our production sites are on a separate machine so we can’t just add the ServerAlias to the configuration for the ‘www’ virtual host.

This was a successful solution for our public users, they were redirected to the public site, but caused confusion for our developers when they mistyped a host name for a development site or, more commonly, when the host name is typed correctly but the site was not a successfully configured virtual host in Apache. Like our public users, the developers would get redirected to the public production site but, in this case, that is not the desired action. And because the production site frequently looks like the development site, it is not immediately clear what has happened.

I needed to make the wildcard virtual host return a real message to all users, developer or public. I wanted to keep the host exclusively virtual with no physical files on the server. I came up with the following.

<VirtualHost *:80>
    ServerAlias *.crashingdaily.com
    Redirect 404 /
    ErrorDocument 404 "No such site. Check the URL speling. Our main site is \
                       <a href='http://crashingdaily.com/'>http://crashingdaily.com/</a>"               
</VirtualHost>

With this configuration all requests to the wildcard virtual hosts are remapped by the Redirect rule to trigger a HTTP response of 404 Not Found. The configuration also specifies the custom ErrorDocument to return to the client when a 404 error is encountered.

Now when a developer or public user accesses a non-existent website within our domain they get a clear indication of the problem rather than being transparently redirected to another, possibly unintended, site.

Related:

Apache mod_alias – provides the Redirect rule.
Architectural Concerns on the use of DNS Wildcards


(Our real hosts and IP addresses have been changed for this posting to protect my ass.)

BASH Cures Cancer has yet another interesting post this week. This one, entitled “Command Substitution and Exit Status“, leverages the exit status returned from a BASH sub-shell. That alone is a useful tip but what tickled me was the author’s trick of shortening the delay in a running loop by starting a newer, shorter sleep loop that kills off the longer-running sleep process. Clever. The thought of dueling shell processes – “I’m going to sleep for a minute.”, “Oh, no you’re not!” – makes me chuckle. Sigh. I am easily amused.

When you execute a shell script it inherits all the environment variables present in the parent shell. Sometimes that can cause unintended consequences. For example, I recently ran into a situation where one of my scripts in common use by our group failed for one of the users. I eventually tracked it down to the JAVA_HOME variable that user had set in his bash profile. The user’s JAVA_HOME being inherited by the script was not set to an appropriate value. Ideally the script should have set all the environment variables it needs. However, in this case, I had failed to explicitly set JAVA_HOME in my script, an oversight masked by the fact my own JAVA_HOME was set to a valid value so the script ran normally for me.

To reduce the chance of this type of problem reoccurring with other variables I decided to clear all the inherited environment variables at the start of the script. That way any other environment dependency not defined by the script would immediately reveal itself during the script debugging phase.

I could not find a simple builtin method to run the script without the inherited environment, (aside from the impractical “env -i scriptname“) [ edit 8/28/2010: see Gordon’s env - /bin/bash suggestion below and you probably don’t need to read the rest of this posting ], so I put a one-liner in the script to unset each variable (BASH syntax):

unset $(/usr/bin/env | egrep '^(\w+)=(.*)$' | \
  egrep -vw 'PWD|USER|LANG' | /usr/bin/cut -d= -f1);

env prints the environment’s variable=value pairs. The cut at the end splits the pairs and returns the variable name.

The first egrep filters for the variable=value pairs that start at the beginning of the line. This was necessary because GNU screen sets a multiline TERMCAP variable:

TERMCAP=SC|screen|VT 100/ANSI X3.64 virtual terminal:\
        :DO=\E[%dB:LE=\E[%dD:RI=\E[%dC:UP=\E[%dA:bs:bt=\E[Z:\
        :cd=\E[J:ce=\E[K:cl=\E[H\E[J:cm=\E[%i%d;%dH:ct=\E[3g:\

I wanted just the first line and needed to discard the others so the cut would not split them and return bogus variable names, like ‘:DO‘, to unset.

The second egrep is optional but allows me to skip selected variables so they retain their preset values.

With this in place, if I attempt to use an environment variable in my script (or in a program called by my script) it will fail unless I explicitly set the value. This provides me the opportunity to ensure the script will behave consistently for all users on the system.

The diff utility reports differences between two files. If you need to find differences between one or two stdout outputs then temporary named pipes are a handy aid.

Here’s a simple example for the BASH shell to illustrate the technique. Say you have two files, A and B:

$ cat A
Tara
Dawn
Anya
Willow

$ cat B
WILLOW
ANYA
DAWN
HARMONY
TARA

The task is to find names that differ between the two lists without creating new files or editing the existing. You can do that by sorting and normalizing the letter case, then using diff to check the stdout on the fly.

$ diff -B <( sort A | tr [:lower:] [:upper:] ) <( sort B | tr [:lower:] [:upper:] )

2a3
> HARMONY

The <( ... ) syntax creates a temporary named pipe which makes the stdout of the sort | tr commands look and behave like a file, allowing diff to operate on the expected type of input.

For fun, you can see the temporary file created by the process:

dir <( sort A | tr [:lower:] [:upper:] ) 
lr-x------  1 crash daily 64 Mar  5 23:17 /dev/fd/63 -> pipe:[21483501]

This technique is not limited to diff. This should work for most any other command expecting a file for input.

Interestingly (frustratingly), it does not work for me in my BASH shell on Mac OS X. Does anyone know why? Update: Indeed someone does know. Unixjunkie has the answer. The following produces the expected output
on OS X but attempting diff produces no output.


cat <( sort A | tr [:lower:] [:upper:] ) <( sort B | tr [:lower:] [:upper:] )

ANYA
DAWN
TARA
WILLOW

ANYA
DAWN
HARMONY
TARA
WILLOW

Related:

Introduction to Named Pipes
Heads and Tails – an earlier posting that uses temporary named pipes for receiving stdin
Process Substitution, Advanced Bash-Scripting Guide

Categories

March 2008
M T W T F S S
« Jan   Apr »
 12
3456789
10111213141516
17181920212223
24252627282930
31  

Latest del.icio.us