Implementing Apache’s force proxy flag for rewrite rules under NGINX

NGINX’s default behavior for rewrite rules (at least up to version 0.7.65) is to redirect the client if the replacement part begins with ‘http://’. To quote the NGINX wiki:

rewrite

syntax: rewrite regex replacement flag

[…]

If the replacement string begins with http:// then the client will be redirected, and any further rewrite directives are terminated.

It is important to take this into consideration when designing new rules, because the behavior of a rule depends on where its replacement points: a replacement on another host turns the rule into a redirect.

Apache does this differently since it offers a flag that can be set per rule which instructs the web server to just “proxy” the request (i.e. do not redirect, just get the response of that request in the background and send it back to the client). Taken from Apache’s site, this flag is:

‘proxy|P’ (force proxy)
This flag forces the substitution part to be internally sent as a proxy request and immediately (rewrite processing stops here) put through the proxy module. You must make sure that the substitution string is a valid URI (typically starting with http://hostname) which can be handled by the Apache proxy module. If not, you will get an error from the proxy module. Use this flag to achieve a more powerful implementation of the ProxyPass directive, to map remote content into the namespace of the local server.

Note: mod_proxy must be enabled in order to use this flag.

How to implement this under NGINX

I have been able to get similar behavior under NGINX using ‘rewrite’ and ‘proxy_pass’ directives.

The following example implements a regular-expression-based rewrite rule that serves content from domain2 in response to client requests on domain1.

server {
  listen 1.2.3.4:80;
  server_name domain1.com;

  location / {
    rewrite ^/([0-9][0-9]/[0-9][0-9]/.+)$ /example/?t=$1 last;
    proxy_pass http://domain2.com;
  }
}

Using that NGINX configuration, the client can request:

http://domain1.com/12/34/test

which will be proxied to:

http://domain2.com/example/?t=12/34/test

and served back to her “apparently” from http://domain1.com/12/34/test (i.e. there won’t be any URL redirection).
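
To sanity-check a rule like this before reloading NGINX, the mapping can be reproduced offline. The following is only a rough sketch in Python, not something NGINX itself provides: it applies the same regular expression to a request path and builds the URL that would be requested from the backend. The names proxied_url and UPSTREAM are illustrative, and NGINX's regex engine (PCRE) may differ from Python's in corner cases.

```python
import re

# Same pattern and replacement as the rewrite rule above.
PATTERN = re.compile(r"^/([0-9][0-9]/[0-9][0-9]/.+)$")
UPSTREAM = "http://domain2.com"  # the proxy_pass target


def proxied_url(path):
    """Return the backend URL for a request path (rewritten if the rule matches)."""
    match = PATTERN.match(path)
    if match is None:
        # No rewrite: proxy_pass forwards the original URI unchanged.
        return UPSTREAM + path
    return UPSTREAM + "/example/?t=" + match.group(1)


print(proxied_url("/12/34/test"))
```

Paths that do not match the pattern are passed through untouched, which mirrors what a proxy_pass without a URI part does.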

http_load man page

http_load(1)                                                                                                      http_load(1)

NAME
       http_load - multiprocessing http test client

SYNOPSIS
       http_load [-checksum] [-throttle] [-proxy host:port] [-verbose] [-timeout secs] [-sip sip_file] [-cipher str]
       ( -parallel N | -rate N [-jitter] ) ( -fetches N | -seconds N ) url_file

DESCRIPTION
       http_load runs multiple http fetches in parallel, to test the throughput of a web server.   However  unlike  most  such
       test clients, it runs in a single process, so it doesn't bog down the client machine.  It can be configured to do https
       fetches as well.

       The -checksum flag tells http_load to do checksums on the files fetched, to make sure they came across ok.  The
       checksums are computed the first time each URL gets fetched, and then recomputed and compared on each subsequent fetch.
       Without the -checksum flag only the byte count is checked.

       The -throttle flag tells http_load to throttle its consumption of data to 33.6Kbps, to simulate access by modem  users.

       The -proxy flag lets you run http_load through a web proxy.

       The -verbose flag tells http_load to put out progress reports every minute on stderr.

       The -timeout flag specifies how long to wait on idle connections before giving up.  The default is 60 seconds.

       The  -sip  flag  lets you specify a file containing numeric IP addresses (not hostnames), one per line.  These get used
       randomly as the *source* address of connections.  They must be real routable addresses on your  machine,  created  with
       ifconfig, in order for this to work.  The advantage of using this option is you can make one client machine look like a
       whole bank of machines, as far as the server knows.

       The -cipher flag is only available if you have SSL support compiled in.  It specifies a cipher set to use.  By default,
       http_load  will  negotiate  the highest security that the server has available, which is often higher (and slower) than
       typical browsers will negotiate.  An example of a cipher set might be "RC4-MD5" - this  will  run  considerably  faster
       than  the  default.   In addition to specifying a raw cipher string, there are three built-in cipher sets accessible by
       keywords:
         * fastsec - fast security - RC4-MD5
         * highsec - high security - DES-CBC3-SHA
         * paranoid - ultra high security - AES256-SHA
       Of course, not all servers are guaranteed to implement these combinations.

       One start specifier, either -parallel or -rate, is required.  -parallel tells http_load  to  keep  that  many  parallel
       fetches  going  simultaneously.   -rate tells http_load to start that many new connections each second.  If you use the
       -rate start specifier, you can also give the -jitter flag, telling http_load to vary the rate randomly by about 10%.

       One end specifier, either -fetches or -seconds, is required.  -fetches tells http_load to quit when that  many  fetches
       have been completed.  -seconds tells http_load to quit after that many seconds have elapsed.

       The url_file is just a list of URLs, one per line.  The URLs that get fetched are chosen randomly from this file.

       All flags may be abbreviated to a single letter.

       Note  that  while the end specifier is obeyed precisely, the start specifier is only approximate.  If you use the -rate
       flag, http_load will make its best effort to start connections at that rate, but may not succeed.  And if you  use  the
       -parallel flag, http_load will attempt to keep that many simultaneous connections going, but may fail to keep up if the
       server is very fast.

       Sample run:
           % http_load -rate 2 -seconds 300 urls
           591 fetches, 8 max parallel, 5.33606e+06 bytes, in 300 seconds
           9028.87 mean bytes/connection
           1.97 fetches/sec, 17786.9 bytes/sec
           msecs/connect: 28.8932 mean, 44.243 max, 24.488 min
           msecs/first-response: 63.5362 mean, 81.624 max, 57.803 min
           HTTP response codes:
             code 200 -- 591

SEE ALSO
       http_ping(1)

AUTHOR
       Copyright (C) 1998,1999,2001 by Jef Poskanzer.  All rights reserved.

                                                       15 November 2001                                           http_load(1)
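
The man page requires exactly one start specifier (-parallel or -rate) and exactly one end specifier (-fetches or -seconds). When http_load is driven from a script, that constraint is easy to get wrong, so here is a small Python sketch that builds the argument list and rejects invalid combinations. The helper name http_load_command is made up for illustration; only flags listed in the man page above are used.

```python
def http_load_command(url_file, parallel=None, rate=None, fetches=None, seconds=None):
    """Build an http_load argument list with one start and one end specifier."""
    if (parallel is None) == (rate is None):
        raise ValueError("give exactly one of parallel or rate")
    if (fetches is None) == (seconds is None):
        raise ValueError("give exactly one of fetches or seconds")

    args = ["http_load"]
    if parallel is not None:
        args += ["-parallel", str(parallel)]
    else:
        args += ["-rate", str(rate)]
    if fetches is not None:
        args += ["-fetches", str(fetches)]
    else:
        args += ["-seconds", str(seconds)]
    args.append(url_file)
    return args


# Mirrors the sample run from the man page.
print(" ".join(http_load_command("urls", rate=2, seconds=300)))
```

The returned list can be handed to subprocess.run as is, assuming http_load is installed and on the PATH.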

How to test CAS’ SAML using soapUI

Overview
Recent versions (I believe 3.2 or later) of the Central Authentication Service (a.k.a. CAS) include Security Assertion Markup Language (a.k.a. SAML) support out of the box. The beauty of it is that it is already “there”, accessible through the URL ‘/cas/samlValidate’ instead of the usual ‘/cas/serviceValidate’.

One thing to note is that it is not so easy to communicate with your CAS instance using the SAML protocol, since the requests need to be HTTP POSTs (which puts browsers out of the picture) carrying a properly formed SAML payload.

This is where soapUI comes in. It is an excellent tool for testing web services with SOAP requests (the open source version of the tool should be sufficient), and it can be used to complete the SAML exchange and see what the CAS server actually returns.

Steps
To do that, you need to connect to your CAS server, log in with valid credentials and get a CAS ticket. This can be done by opening the following URL in a browser:

https://CAS_DOMAIN:PORT/cas/login?service=http://localhost/foo

The browser should now be displaying an error, because it has been redirected back to the URL http://localhost/foo, which probably does not exist. No problem. What matters is that you can retrieve the ticket from the URL. Example:

# URL
http://localhost/foo?ticket=ST-3-j6RIZfeaNTxilsFYr3xe-cas

# TICKET
ST-3-j6RIZfeaNTxilsFYr3xe-cas
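
If you repeat this often, copying the ticket by hand gets tedious. As a small convenience, the ticket can be pulled out of the redirect URL with Python's standard library. The function name extract_ticket is just an illustration.

```python
from urllib.parse import urlparse, parse_qs


def extract_ticket(redirect_url):
    """Return the value of the 'ticket' query parameter, or None if absent."""
    query = parse_qs(urlparse(redirect_url).query)
    values = query.get("ticket")
    return values[0] if values else None


print(extract_ticket("http://localhost/foo?ticket=ST-3-j6RIZfeaNTxilsFYr3xe-cas"))
```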

Now, using soapUI, you need to send CAS a proper SAML request. You may do that using the “submit a request to a specified end point” action. The URL to send the request to should be:

https://CAS_DOMAIN:PORT/cas/samlValidate? ->
     TARGET=http://localhost/foo&ticket=ST-3-j6RIZfeaNTxilsFYr3xe-cas

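The request body should be a SOAP envelope containing a SAML 1.1 request, with the ticket as the value of its ‘AssertionArtifact’ element:

ST-3-j6RIZfeaNTxilsFYr3xe-cas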
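
For reference, here is a hedged Python sketch that assembles both the samlValidate URL and a request body for a given ticket. The envelope layout follows the SAML 1.1 request format as I understand it from the CAS protocol documentation; treat the exact attributes (RequestID, IssueInstant, the namespaces) as assumptions and compare them with the documentation of your CAS version. The function names are made up for illustration.

```python
import uuid
from datetime import datetime, timezone
from urllib.parse import urlencode
from xml.sax.saxutils import escape

# Assumed SAML 1.1 request layout; verify against your CAS documentation.
SAML_REQUEST_TEMPLATE = """<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
  <SOAP-ENV:Header/>
  <SOAP-ENV:Body>
    <samlp:Request xmlns:samlp="urn:oasis:names:tc:SAML:1.0:protocol"
                   MajorVersion="1" MinorVersion="1"
                   RequestID="{request_id}" IssueInstant="{issue_instant}">
      <samlp:AssertionArtifact>{ticket}</samlp:AssertionArtifact>
    </samlp:Request>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>"""


def saml_validate_url(cas_base, service, ticket):
    """Build the samlValidate URL; ':' and '/' are left readable in TARGET."""
    query = urlencode({"TARGET": service, "ticket": ticket}, safe=":/")
    return cas_base + "/cas/samlValidate?" + query


def saml_request_body(ticket):
    """Fill the template with a fresh request id, a UTC timestamp and the ticket."""
    return SAML_REQUEST_TEMPLATE.format(
        request_id="_" + uuid.uuid4().hex,
        issue_instant=datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        ticket=escape(ticket),
    )


ticket = "ST-3-j6RIZfeaNTxilsFYr3xe-cas"
print(saml_validate_url("https://CAS_DOMAIN:PORT", "http://localhost/foo", ticket))
print(saml_request_body(ticket))
```

The printed body can be pasted into soapUI's request editor; CAS_DOMAIN and PORT remain placeholders for your own server.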

CAS’ response should be similar to this (element values only):

Audience: http://localhost/foo
NameIdentifier: juan.huerta
ConfirmationMethod: urn:oasis:names:tc:SAML:1.0:cm:artifact

The returned username can be found in the ‘NameIdentifier’ tag.
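
Reading the username out of the response can also be scripted. The sketch below uses Python's xml.etree and looks for a ‘NameIdentifier’ element in the SAML 1.0 assertion namespace. The SAMPLE document is a heavily trimmed, hypothetical response written for this example (a real CAS response is wrapped in a SOAP envelope and carries many more elements), so take it as an illustration of the lookup only.

```python
import xml.etree.ElementTree as ET

SAML_NS = "urn:oasis:names:tc:SAML:1.0:assertion"

# Hypothetical, trimmed-down response; not a verbatim CAS reply.
SAMPLE = """<Response xmlns="urn:oasis:names:tc:SAML:1.0:protocol">
  <Assertion xmlns="urn:oasis:names:tc:SAML:1.0:assertion">
    <AuthenticationStatement>
      <Subject>
        <NameIdentifier>juan.huerta</NameIdentifier>
      </Subject>
    </AuthenticationStatement>
  </Assertion>
</Response>"""


def username_from_response(xml_text):
    """Return the text of the first NameIdentifier element, or None."""
    root = ET.fromstring(xml_text)
    node = root.find(".//{%s}NameIdentifier" % SAML_NS)
    if node is None or not node.text:
        return None
    return node.text.strip()


print(username_from_response(SAMPLE))
```

Because the search uses a descendant path, the same function should also work when the assertion is nested inside a SOAP envelope.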


Note: special thanks to Juan Huerta, Julien Gribonvald and Marvin Addison for their tips, which inspired me to write this post.

KeepAlived Installation under Debian Etch

Briefly, KeepAlived is a daemon that provides failover for servers/services by binding virtual IP addresses to machines. In the event of a failure, KeepAlived reassigns the virtual IP to another machine. This happens quickly (in less than 2 seconds) and automatically.

It is a very interesting daemon to use in combination with HAProxy, for example, since it makes it possible to have a load balancer with failover. If the load balancer fails, keepalived switches to another one that is up and running, so cleanly and quickly that clients would not notice.

Installation steps under Debian Etch

apt-get update
apt-get install keepalived

The system will ask a couple of questions. I usually accept the default values, then configure the daemon manually by editing /etc/keepalived/keepalived.conf.

To make the virtual IP address bindable, you should add this line to /etc/sysctl.conf:

net.ipv4.ip_nonlocal_bind=1

Apply the setting and check it:

sysctl -p

net.ipv4.ip_nonlocal_bind = 1
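
When provisioning several machines, this check can be automated. The Python sketch below parses the ‘key = value’ lines that sysctl prints and reports whether non-local binding is enabled. The helper names are illustrative; here the function is fed the line shown above rather than live command output.

```python
def parse_sysctl_output(text):
    """Parse 'key = value' lines, as printed by sysctl, into a dict."""
    settings = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # skip blank lines and anything that is not a setting
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def nonlocal_bind_enabled(text):
    """True if the output reports net.ipv4.ip_nonlocal_bind set to 1."""
    return parse_sysctl_output(text).get("net.ipv4.ip_nonlocal_bind") == "1"


print(nonlocal_bind_enabled("net.ipv4.ip_nonlocal_bind = 1"))
```

On a live system, the text could come from running sysctl through subprocess.run with capture_output=True.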

It is convenient to change the order in which keepalived is started at boot. We probably want it to start last, so that all the other services are already running by the time keepalived runs. To do that:

update-rc.d -f keepalived remove
Removing any system startup links for /etc/init.d/keepalived ...
/etc/rc0.d/K20keepalived
/etc/rc1.d/K20keepalived
/etc/rc2.d/S20keepalived
/etc/rc3.d/S20keepalived
/etc/rc4.d/S20keepalived
/etc/rc5.d/S20keepalived
/etc/rc6.d/K20keepalived

update-rc.d keepalived defaults 90
Adding system startup for /etc/init.d/keepalived ...
/etc/rc0.d/K90keepalived -> ../init.d/keepalived
/etc/rc1.d/K90keepalived -> ../init.d/keepalived
/etc/rc6.d/K90keepalived -> ../init.d/keepalived
/etc/rc2.d/S90keepalived -> ../init.d/keepalived
/etc/rc3.d/S90keepalived -> ../init.d/keepalived
/etc/rc4.d/S90keepalived -> ../init.d/keepalived
/etc/rc5.d/S90keepalived -> ../init.d/keepalived


Sticky sessions (aka persistence)

When working with load balancers, it is sometimes required that a client keeps connecting to the same backend server. This concept has several names (sticky sessions, session stickiness, persistent sessions, persistence, etc.).

Wikipedia explains this concept clearly:

One dilemma when operating a load-balanced service, is what to do if the backend servers require some information (“state”) to be stored “persistently” (across multiple requests) on a per-user basis. This can be a problem if a backend server needs access to information generated by a different backend server during a previous request. Performance may suffer if cached information from previous requests is unavailable for re-use.

One solution is to consistently send clients to the same backend server. This is known as “persistence” or “stickiness”. One downside to this technique is lack of automatic failover, in case one or more backend servers should fail or be taken offline for maintenance. Persistent information is lost if it cannot be transmitted to the remaining backend servers.
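
To make the idea concrete, here is a minimal Python sketch of one possible way to get stickiness: hashing the client address to choose a backend. This is not taken from the quoted text, and real load balancers offer other mechanisms as well (cookies, for instance). The backend names are hypothetical.

```python
import hashlib

BACKENDS = ["app1", "app2", "app3"]  # hypothetical backend names


def pick_backend(client_ip, backends=BACKENDS):
    """Map a client address to a backend; the same address always gets the same one."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


# Repeated requests from one client land on the same backend.
print(pick_backend("10.0.0.7") == pick_backend("10.0.0.7"))
```

Note that removing a backend from the list changes the mapping for some clients, which is the failover downside described in the quote: whatever state lived on the old backend is not available on the new one.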


Name based virtual hosting

It is possible (using HTTP/1.1) to have several websites served on the same IP address and port and still differentiate them based on the host name.

This should be done at the web server configuration level.
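
Conceptually, the web server looks at the Host header of each request and picks the matching site. A toy Python sketch of that lookup follows. The SITES table and its paths are invented for illustration, and real servers handle more cases (wildcards, aliases, IPv6 literals) than this does.

```python
# Hypothetical mapping from host name to document root.
SITES = {
    "domain1.com": "/var/www/domain1",
    "domain2.com": "/var/www/domain2",
}
DEFAULT_ROOT = "/var/www/default"


def document_root(host_header):
    """Pick a site by the Host header, ignoring case and any :port suffix."""
    host = (host_header or "").strip().lower()
    host = host.split(":", 1)[0]  # naive: does not handle IPv6 literals
    return SITES.get(host, DEFAULT_ROOT)


print(document_root("Domain2.com:80"))
```

Requests with an unknown or missing Host header fall back to a default site, which is also how most web servers behave.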

Microsoft IIS

IIS uses so-called “Host Headers”, which are straightforward to set up.

Apache

Apache’s setup is more complex and more versatile; see Apache’s Name-based Virtual Host Support documentation.