You are currently browsing the category archive for the ‘Apache HTTP Server’ category.

Today I found myself on someone else’s Apache HTTPD server needing to locate the parent error_log (the one where apache records its startup). The server has a dizzying array of virtual hosts configured using a web of configuration files pulled in with Include directives. Using the configuration files to untangle which is the default virtual host and what it is using for ErrorLog was unfruitful.

My understanding is that the error_log I’m looking for is the stderr of the parent apache process. So, punting, I used lsof to see where the parent process was writing stderr, a.k.a. file descriptor 2, which lsof reports as ‘2w’.

First, look up the parent httpd process. The server runs CentOS using the init system:

[root@xenu ~]# ps -u root | grep httpd
10226 ? 00:07:04 httpd

Using lsof, see what pid 10226 is using for stderr:

[root@xenu ~]# lsof -p 10226 |grep ' 2w'
httpd   10226 root    2w   REG  8,64     9411     49196 /files/www/virtualhost/RedProject/release1.0/logs/error_log

Another approach is to check the /proc filesystem:

[root@xenu ~]# ls -l /proc/10226/fd/2
l-wx------ 1 root root 64 Apr  1 16:47 /proc/10226/fd/2 -> /files/www/virtualhost/RedProject/release1.0/logs/error_log

Yep, that’s the one.

[root@xenu ~]# grep resuming /files/www/virtualhost/RedProject/release1.0/logs/error_log
[Tue Feb 21 12:52:55 2012] [notice] Apache/2.2.3 (CentOS) configured -- resuming normal operations
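
In hindsight, the two steps can be combined into a one-liner. A sketch, assuming pgrep is available and that the oldest matching httpd process is the parent (it normally is, since the parent spawns the workers):

[root@xenu ~]# readlink /proc/$(pgrep -o httpd)/fd/2
/files/www/virtualhost/RedProject/release1.0/logs/error_log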

Jobs in the Jenkins continuous integration software can be configured to link SCM commit details to a repository browser. Unfortunately the Subversion plugin (and perhaps all the SCM options) only allows one repository browser for all SVN repos in a given job (https://issues.jenkins-ci.org/browse/JENKINS-5289). My project uses two svn repos, Core and Rind, so this limitation, of course, poses an inconvenience.

Luckily, I was able to find a solution that works for my project. I noticed that the current Core revision numbers are in the 10K range and the Rind revisions are in the 50K range. Ah ha!, an opportunity for an Apache Rewrite rule.

I chose WebSVN for the repository browser and installed the WebSVN2 plugin (the WebSVN bundled with Jenkins is for the WebSVN 1.x series). For the browser URL configuration I used a fake repname (TBD) in the query string.

http://websvn.crashingdaily.com/revision.php?repname=TBD

and then used the following Apache RewriteRules to redirect to the proper repo in WebSVN based on the revision number.

RewriteEngine on
# revisions 10000-49999 belong to the Core repository
RewriteCond %{QUERY_STRING} repname=TBD&(.*)rev=([1234]\d{4})
  RewriteRule ^(.+)$ $1?repname=Core&%1rev=%2 [R,L,NE]
# revisions 50000-99999 belong to the Rind repository
RewriteCond %{QUERY_STRING} repname=TBD&(.*)rev=([56789]\d{4})
  RewriteRule ^(.+)$ $1?repname=Rind&%1rev=%2 [R,L,NE]

That is, if the rev number is between 10000 and 49999, redirect to the Core repo in WebSVN; if it is between 50000 and 99999, redirect to the Rind repo.

In both cases the Rewrite depends on the repname being TBD so we do not trigger rewrites when normally browsing WebSVN.
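
A quick sanity check from the command line (the revision numbers here are hypothetical; adjust the host to wherever WebSVN lives). The first request should redirect with a Location header pointing at the Core repo, the second at Rind:

$ curl -sI 'http://websvn.crashingdaily.com/revision.php?repname=TBD&rev=12345' | grep -i Location
$ curl -sI 'http://websvn.crashingdaily.com/revision.php?repname=TBD&rev=54321' | grep -i Location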

This scheme depends on conditions and assumptions that happen to fit our project:

– We only care about current and future revisions (>10000). That’s OK; Jenkins will not need to link to revisions that predate this setup.

– Rind will always get more commits than Core, so Core revisions will always lag behind Rind revisions. This has been the case for the past 6 years; I don’t expect it to change.

– Core revisions will not exceed 49999 for the life of this Rewrite. If they do (unlikely; it’s taken 6 years to reach the 10K range), we can update the rules.

Obviously this is not a suitable solution for everyone.

Sidenote:

I initially started with sventon behind an Nginx proxy before deciding that WebSVN was going to be easier for me to maintain in the long run (simply due to my system infrastructure and experience, not due to any complaint against sventon). For the record, this is where I was heading with the Nginx rewrite rules. I did not follow through with testing so I do not know if it works as is, but perhaps it’s a starting point for someone.

# capture the revision number from the request, if present
if ($request_uri ~ "revision=(.+)" ) {
  set $revision $1;
}

# revisions 10000-49999 go to the Core repository
if ($revision ~ "[1234]\d{4}") {
  rewrite ^ http://sventon.crashingdaily.org/repos/Core/info?revision=$revision?;
}
# revisions 50000-99999 go to the Rind repository
if ($revision ~ "[56789]\d{4}") {
  rewrite ^ http://sventon.crashingdaily.org/repos/Rind/info?revision=$revision?;
}

The at command can be used in CGI scripts to remotely trigger non-recurring jobs on a web server. This is especially useful for long running jobs that clients don’t need to wait on and jobs that need to persist across web daemon restarts.

My initial simple attempts to schedule a job with at in a CGI script executed by an Apache webserver were met with the failure response "This account is currently not available". That message is the output of /sbin/nologin, which is the shell set for the apache user in /etc/passwd.

The fix is to set the SHELL environment variable to a proper shell when executing at. Here’s a simple, proof-of-concept CGI script.

#!/usr/bin/perl
use CGI;
my $cgi = new CGI;
print $cgi->header;

system('echo "/usr/bin/env > /tmp/apacheenv" | SHELL=/bin/bash at now');

Calling the CGI script triggers the at scheduling. In this case, the execution of /usr/bin/env is immediate and successful.

$ cat /tmp/apacheenv
SERVER_SIGNATURE=
SHELL=/bin/bash
HTTP_USER_AGENT=Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_7; en-us) AppleWebKit/528.18.1 (KHTML, like Gecko) Version/4.0 Safari/528.17
SERVER_PORT=80
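
If the job is scheduled for later rather than ‘now’, the companion tools can be used to inspect or cancel it; run as root to see jobs queued by the apache user (the job number 3 below is just an example):

$ sudo atq        # list pending jobs for all users
$ sudo atrm 3     # remove job number 3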

Related:

at man page

A mod_rpaf story.

My Apache virtual hosts are dynamically configured from a library of Perl code using PerlSections.
A configuration file for one virtual host looks something like:

<VirtualHost *:80>
 <Perl>
  require 'conf/lib/MasterConf.pm';
 </Perl>
</VirtualHost>

Aside from bootstrapping all the basic virtual host configuration, MasterConf.pm establishes basic authentication (via mod_auth_tkt) for the <Location /> of the virtual host. Authentication is the desired default but, for some specific virtual hosts, I want to disable the authentication requirement. This I can do by appending a new <Location /> after the Perl section.

<VirtualHost *:80>
 <Perl>
  require 'conf/lib/MasterConf.pm';
 </Perl>
 <Location />
  Allow from any
 </Location>
</VirtualHost>

This effectively overrides the ‘deny from’ set up by the Perl library and makes the site public. This has been working fine for me for years. Recently, though, I put an Nginx reverse proxy in front of some of my virtual hosts and installed mod_rpaf into Apache so that the IP address it logged would be the client’s and not the proxy server’s.

There I hit a snag. Apache/mod_rpaf was ignoring the X-Forwarded-For header and only logging the IP address of the proxy. Actually the GET '/' request was logging correctly, but that was insufficient and no consolation.

Client logging was done correctly if I used a static, hand-crafted Apache configuration file (not using the Perl library) so I knew mod_rpaf was installed correctly. I began to suspect the dynamic Perl configuration and that made me nervous; I really didn’t want to give that up completely.

Well, cutting to the chase, it turns out that the ‘Allow from any’ added to disable the basic authentication was the culprit. I’ve been using ‘Allow from any’ for years and it does indeed allow any host access, but looking at it anew today it suddenly struck me as wrong. The normal form is ‘Allow from all’; I must have had ‘Satisfy any’ on the brain when I originally wrote the override, and the mistake propagated through thoughtless copying and pasting. On a whim I changed it to ‘Allow from all’ and, w00t!, mod_rpaf began logging the correct client address.
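
For reference, the override block now differs from the broken version by only that one word:

<VirtualHost *:80>
 <Perl>
  require 'conf/lib/MasterConf.pm';
 </Perl>
 <Location />
  Allow from all
 </Location>
</VirtualHost>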

Are ‘allow from any’ and ‘allow from all’ supposed to mean different things to Apache? A little Googling turns up examples of people using ‘allow from any’ but I don’t see any indication that it has a special meaning.

I’m using the Apache mod_rpaf module to capture client IP addresses in the X-Forwarded-For header passed by an Nginx reverse proxy. This is good for logging and CGI environments but mod_rpaf does not fix up the client IP address sufficiently to be used in Apache’s allow/deny access control directives.

Almlys has a nice workaround.

Quoting the juicy bit from Almlys’s blog posting:

SetEnvIf X-Forwarded-For ^172\.26\.0\.17 let_me_in
Order allow,deny
allow from env=let_me_in

Clever.
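
To put the snippet in context, here is a sketch of how it might sit inside a protected container. The /admin path is hypothetical; the address is the client IP you want to admit, as it appears in the X-Forwarded-For header set by the proxy:

<Location /admin>
  SetEnvIf X-Forwarded-For ^172\.26\.0\.17 let_me_in
  Order allow,deny
  allow from env=let_me_in
</Location>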

I’ve been exploring using Nginx to front our Apache websites. I found a fair amount of documentation online but most of it was for nginx on top of a backend application running on the same host – so, lots of examples of load balancing among various ports on 127.0.0.1. In my case I have name-based Apache virtual hosts running at different IP addresses. Also my goal is not to set up automatic load balancing – our session-based application will not work well with it – but rather I want to be able to rapidly and manually switch a web address to a different machine so I can perform maintenance on or updates to the offline systems.

To summarise, I want the publicly accessible crashingdaily.com website to be a proxy server to our internal, Apache name-based virtual hosts w1 or w2. I want to be able to choose which of w1 or w2 services the client’s request.
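
To make that concrete, here is a minimal sketch of the sort of Nginx configuration I have in mind – not the finished setup, and the internal host names and proxy headers are assumptions. Switching the site to the other machine means swapping which server line is commented out and reloading nginx:

upstream apache_backend {
    server w1.crashingdaily.com:80;      # currently serving the site
    # server w2.crashingdaily.com:80;    # swap the comments to switch machines
}

server {
    listen 80;
    server_name crashingdaily.com www.crashingdaily.com;

    location / {
        proxy_pass http://apache_backend;
        # keep the original Host header so Apache selects the right name-based virtual host
        proxy_set_header Host $host;
        # pass the real client address along for logging on the backend
        proxy_set_header X-Forwarded-For $remote_addr;
    }
}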

I ran into a fun problem today. The HTTP session cookie for one of my websites was not being retained in the Safari web browser. Packet sniffing clearly showed the server was sending a ‘Set-Cookie’ in the HTTP header. Safari accepted the cookie just fine with other virtual hosts having an identical code base and similar configurations for the Apache webserver and Tomcat. Firefox 3 accepted the cookie from all the virtual hosts. It was quite a perplexing issue.

It turns out that Safari, Version 3.1.1 at least, does not retain cookies from hosts with an underscore (_) in the name, and my troubled website host did indeed have one in its name. I changed the underscore to a dash (-) and Safari was happy.

Who woulda thunk?

Related:

Discussion of the issue from the Ruby Forum
Internet Explorer probably would have given me the same behaviour, but I didn’t test it.
Host name specifications are documented in RFC 952 and RFC 1035. The underscore is not among valid characters.

The Tiger and Leopard releases of Mac OS X include an implementation of BSD’s dummynet. Dummynet is “a system facility that permits the control of traffic going through the various network interfaces”.

There are many uses for this feature. I use it as part of my website development to simulate a slow network connection. Many of the users of our websites are in developing countries with slow, dialup-speed, network connections. By using a couple of quick commands I can throttle my connection to the webserver down to similar speeds. As such, I can feel their pain even though I’m on a snazzy gigabit connection, two hops away from the webserver.

The following series of commands will slow my communications to and from the webserver down to 56K modem speeds. It only affects http connections (to any web server, not just mine). My other network connections – ssh, for example – operate with native network performance.

$ sudo ipfw add pipe 1 src-port http
$ sudo ipfw add pipe 1 dst-port http
$ sudo ipfw pipe 1 config bw 56kbit/s
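
When I’m done feeling their pain, the throttling can be removed again. Note that flush removes every ipfw rule on the machine, not just the two added above (the -q flag skips the confirmation prompt):

$ sudo ipfw -q flush
$ sudo ipfw -q pipe flush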

Adam Knight’s Traffic Shaping in Mac OS X is a good starter tutorial.


The group I work with publishes web sites that front relational databases. Our core system infrastructure includes a custom Tomcat/Struts-based web application proxied through an Apache web server. Development of our software involves software engineers working on the web application, web design gurus focused on the web interface, programmers writing CGI scripts and beta testers poking at our progress.

To facilitate individualized development, we have a model whereby everyone on the project gets a personal Apache virtual host in which to install the web application and its sundry bits. Some people need more than one virtually hosted web site and sometimes a web site needs to be thrown up for a quick isolated test and then thrown away.

To ease the ups and downs of the virtual hosts, I use wildcard A records in the DNS for our domains. This allows me to quickly configure (and un-configure) an Apache virtual host without having to formally register (and un-register) the host name in the nameserver. So I can configure a name-based virtual host for the Apache web server:

<VirtualHost *:80>
    ServerAlias vh1.crashingdaily.com
    DocumentRoot /var/www/vh1.crashingdaily.com/html
</VirtualHost>

and instantly http://vh1.crashingdaily.com/ is a valid URL. Of course a real virtual host configuration would be more involved and the development files need to be installed to /var/www/vh1.crashingdaily.com/html but hopefully you get the idea.
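
For completeness, the DNS side is just an ordinary wildcard A record in the zone file; a sketch with a placeholder address:

*.crashingdaily.com.    IN    A    192.0.2.10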

Now, what happens if someone attempts to use a host name for which there is no configured virtual host – say, they mistype the public production site as ww.crashingdaily.com? Well, the wildcard A record will send the user to our development server (the IP address the wildcard points to) and the Apache web server, not finding a virtual host configured for ‘ww.crashingdaily.com’, will use its default virtual host to answer the request.

For name-based virtual hosting, the first virtual host defined during Apache’s configuration phase catches any request to an unknown server name. This is a problem when an end user trying for our production site mistypes the host – they end up at a password-protected development site and, depending on our ever-changing vhost configurations, possibly a site unrelated to our project.

To get our production users back on track when they are using a malformed host name, I set up a wildcard Apache virtual host that answers for all host names not explicitly configured.

<VirtualHost *:80>
    ServerAlias *.crashingdaily.com
    Redirect 301 / http://www.crashingdaily.com/
</VirtualHost>

This configuration goes on our development server to which wildcard DNS records resolve. Our production sites are on a separate machine so we can’t just add the ServerAlias to the configuration for the ‘www’ virtual host.

This was a successful solution for our public users, who were redirected to the public site, but it caused confusion for our developers when they mistyped a host name for a development site or, more commonly, when the host name was typed correctly but the site was not yet a configured virtual host in Apache. Like our public users, the developers would get redirected to the public production site but, in this case, that is not the desired action. And because the production site frequently looks like the development site, it is not immediately clear what has happened.

I needed to make the wildcard virtual host return a real message to all users, developer or public. I wanted to keep the host exclusively virtual with no physical files on the server. I came up with the following.

<VirtualHost *:80>
    ServerAlias *.crashingdaily.com
    Redirect 404 /
    ErrorDocument 404 "No such site. Check the URL spelling. Our main site is \
                       <a href='http://crashingdaily.com/'>http://crashingdaily.com/</a>"
</VirtualHost>

With this configuration all requests to the wildcard virtual host are remapped by the Redirect rule to trigger an HTTP response of 404 Not Found. The configuration also specifies the custom ErrorDocument to return to the client when a 404 error is encountered.
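
A quick way to confirm the behaviour from a shell (the host name here is a hypothetical unconfigured one); it should print 404 rather than the 301 the earlier configuration returned:

$ curl -s -o /dev/null -w '%{http_code}\n' http://nosuchsite.crashingdaily.com/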

Now when a developer or public user accesses a non-existent website within our domain they get a clear indication of the problem rather than being transparently redirected to another, possibly unintended, site.

Related:

Apache mod_alias – provides the Redirect rule.
Architectural Concerns on the use of DNS Wildcards


(Our real hosts and IP addresses have been changed for this posting to protect my ass.)
