
Re: [apache-plusplus] Applications for HTTP servers (was "Process model for C++ Apache")



Bret wrote:

> Taking Michael's ideas and playing along while
> simultaneously trying to return to the notion of a
> Web server ... it occurs to me that the main place
> we're gonna be looking for optimization is going to
> be the request handlers, whatever we're using to
> actually process any requests.
> ... it seems that the act of reading requests is
> relatively simple compared to any processing involved,
> particularly for non-static content.
> If that is the case, from the perspective of an HTTP 
> server I find it hard to justify a pool of reader threads
> waiting for requests and throwing them into a queue.

I'm no expert on httpd, but Dean wrote earlier:
> There's another aspect to http requests... they
> generally don't require a lot of computing time
> in userland... most of what they do is tell the
> kernel to send some bytes somewhere.

However, let's assume that you are correct. Let's also
lose our emphasis on thread pools, which is what we've been
discussing almost exclusively up to now. Thread pools are a
natural extension of the more important abstraction of
connection and message pools, which separate connections
from network I/O and messages from the processing of
messages. A connection is nothing more than a record in the
connection queue; a message is nothing more than a record in
the message-queuing system.

Organizing connections and messages in this form allows us
to manipulate these records independently of their content;
we have meta-connection and meta-message information
available.
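To make the record idea concrete, here is a minimal C++ sketch of a record queue, assuming a simple mutex-protected FIFO. The names `Record`, `RecordQueue`, `push`, `pop` and `remove_if` are hypothetical, not part of Apache or NSPR. The point is that the queue only sees records, so code can inspect or drop them by their meta-information without ever interpreting the payload.

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical record: meta-information travels alongside an opaque payload.
struct Record {
    int id;               // record identifier (meta-information)
    std::string source;   // e.g. peer address (meta-information)
    std::string payload;  // content the queue never interprets
};

// A minimal blocking FIFO of records, shared by producer and consumer pools.
class RecordQueue {
public:
    void push(Record r) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            q_.push_back(std::move(r));
        }
        cv_.notify_one();  // wake one waiting consumer
    }

    // Blocks until a record is available, then removes and returns it.
    Record pop() {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return !q_.empty(); });
        Record r = std::move(q_.front());
        q_.pop_front();
        return r;
    }

    // Drops every queued record matching pred and returns how many were
    // dropped. pred sees the whole record but need only look at meta fields.
    std::size_t remove_if(const std::function<bool(const Record&)>& pred) {
        std::lock_guard<std::mutex> lock(mu_);
        std::size_t before = q_.size();
        for (auto it = q_.begin(); it != q_.end();) {
            if (pred(*it)) {
                it = q_.erase(it);
            } else {
                ++it;
            }
        }
        return before - q_.size();
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mu_);
        return q_.size();
    }

private:
    mutable std::mutex mu_;
    std::condition_variable cv_;
    std::deque<Record> q_;
};
```

Any pool (acceptors, readers, processors, writers) can share one of these; only the code that fills or drains it knows what the payload means.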

How is this useful in an http server? I don't know if it's
useful; you'll tell me that. But here are some contrived
and exaggerated examples that illustrate uses of connection
and message queues. Let's also suppose that we have the thread
pools as I proposed (acceptors(), readers(), processors() and
writers()).

1. The acceptor() accepts a connection and populates a
connection-record in the queue. We want to filter connections
by IP address, so we invent a filter() thread that traverses
the queue tossing out undesired connections before the
readers() get the connection-records. This is trivial.
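Example 1 really is trivial once connections are records. A minimal sketch of one filter() pass, assuming a mutex-guarded deque of connection-records; `ConnectionRecord`, `ConnectionQueue` and `filter_pass` are hypothetical names for illustration only.

```cpp
#include <deque>
#include <mutex>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical connection record as populated by an acceptor().
struct ConnectionRecord {
    int fd;               // accepted socket descriptor
    std::string peer_ip;  // remote address in textual form
};

// Shared connection queue guarded by a mutex (illustrative only).
struct ConnectionQueue {
    std::mutex mu;
    std::deque<ConnectionRecord> records;
};

// One pass of a filter() thread: remove records whose peer IP is blocked
// before any reader() dequeues them. Returns the descriptors that were
// removed so the caller can close them.
std::vector<int> filter_pass(ConnectionQueue& queue,
                             const std::unordered_set<std::string>& blocked) {
    std::vector<int> dropped;
    std::lock_guard<std::mutex> lock(queue.mu);
    for (auto it = queue.records.begin(); it != queue.records.end();) {
        if (blocked.count(it->peer_ip) != 0) {
            dropped.push_back(it->fd);
            it = queue.records.erase(it);
        } else {
            ++it;
        }
    }
    return dropped;
}
```

A real filter() thread would call `filter_pass` in a loop, close the returned descriptors, and wait on a condition variable between passes. Note that the readers() need no change at all to gain this feature.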

2. A connection-record may contain multiple requests, so it
probably contains a small header. The reader() reads each
request into a separate message-record in the message queue,
where they are found by the processors(). Voilà: message
multiplexing!
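The reader() side of example 2 might look like the following sketch. It assumes, to keep things short, that pipelined requests have no body, so each request ends at the first blank line (`\r\n\r\n`); real HTTP parsing must also honour `Content-Length` and chunked bodies. `MessageRecord` and `split_requests` are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical message record: one request, tagged with its connection so a
// writer() can route the response back in order.
struct MessageRecord {
    int connection_id;
    int sequence;         // position of the request within the connection
    std::string request;  // raw request head, including the blank line
};

// Split the bytes read so far into separate message records. Simplifying
// assumption: no request bodies. Incomplete trailing bytes stay in `buffer`
// for the next read; `next_sequence` carries numbering across calls.
std::vector<MessageRecord> split_requests(int connection_id,
                                          std::string& buffer,
                                          int& next_sequence) {
    static const std::string kEnd = "\r\n\r\n";
    std::vector<MessageRecord> out;
    std::size_t start = 0;
    std::size_t end;
    while ((end = buffer.find(kEnd, start)) != std::string::npos) {
        out.push_back({connection_id, next_sequence++,
                       buffer.substr(start, end + kEnd.size() - start)});
        start = end + kEnd.size();
    }
    buffer.erase(0, start);  // keep only the unfinished request, if any
    return out;
}
```

Each returned record can be queued independently, which is what lets several processors() work on requests from the same connection at once.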

Bret wrote:
> Multiple handler threads would be excellent here, however...
> but what kind of threads?

Here are some more contrived, ridiculous examples:

Remember that connection/message records and the threads
in thread pools (except for processors()) are generic.
Each acceptor() services only one port, but it can be any port,
including a connection to a database server that does
user authentication or spits out HTML. Readers() can read
from any connection on the browser side or back-end; the same
goes for writers(). Now let's assume we have an httpd which can
service 100 requests per second. Based on measurements, we
determine that we need 6 acceptors(), three for browsers and
three for the back-end. Since browsers can be stingy (slow to
send and receive), we need 15 readers() and 15 writers(). We
are going to do heavy database access to three separate DBs on
other hosts on the LAN, so we need about 20 processors(). The
DB processing is probably the bottleneck in this setup.
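The pool sizes above can be written down as a configuration record. The `PoolConfig` struct below is hypothetical and just restates the figures from the example. The `workers_needed` helper is an addition not stated in the original: it applies Little's law (busy workers = arrival rate × time each request holds a worker), which is one common way to turn measurements into pool sizes.

```cpp
#include <cmath>

// Hypothetical pool configuration restating the figures from the example.
struct PoolConfig {
    int acceptors_browser = 3;
    int acceptors_backend = 3;
    int readers = 15;
    int writers = 15;
    int processors = 20;

    int total_threads() const {
        return acceptors_browser + acceptors_backend + readers + writers +
               processors;
    }
};

// Little's law: average busy workers = requests per second * seconds each
// request occupies a worker. Rounded up, this is the smallest pool that keeps
// up on average; real deployments add headroom for bursts.
int workers_needed(double requests_per_second, double seconds_per_request) {
    return static_cast<int>(
        std::ceil(requests_per_second * seconds_per_request));
}
```

Read backwards, 20 processors() at 100 requests per second can keep up only if each request holds a processor for about 0.2 seconds on average.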

3. Complicated requests arrive from the browser, each request
requiring 3 database requests, one to each DB server. The
acceptors() and readers() do their thing and the processors()
get the requests. Each processor() parses its respective
request, formulates three DB queries, and throws them
simultaneously into the message queue.

3 (continued): Now the usual procedure for processing would be
to make the first query, get the response, make the second
query and get its response, and then the third. Let's say that
each query/response takes 2 seconds (yes, contrived and
ridiculous); then total request processing is 6 seconds.

3 (continued): However, since we have a message queue, all
three queries/responses are done simultaneously, so total
request processing is 2 seconds. This parallelism is made
possible by the abstraction of messages from message processing,
and it explains why we have pools of readers(), writers(), etc.
That pool of readers() is working on one browser request and on
the 3 back-end DB responses that result from it.
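The arithmetic in example 3 (sum of the latencies when sequential, roughly the slowest one when parallel) can be demonstrated with a short sketch. It uses C++11 `std::async` as a stand-in for the message queue and reader/writer pools; `query_db` and `fan_out` are hypothetical, and the sleep merely simulates DB latency.

```cpp
#include <chrono>
#include <future>
#include <string>
#include <thread>
#include <vector>

// Stand-in for one back-end query. In the proposed design a processor()
// would enqueue the query and writer()/reader() threads would talk to the DB.
std::string query_db(const std::string& server, const std::string& query,
                     std::chrono::milliseconds latency) {
    std::this_thread::sleep_for(latency);  // simulate network + DB time
    return server + ":" + query;
}

// Issue every back-end query at once, then wait for all replies. Elapsed time
// is roughly the slowest single query, not the sum of all of them.
std::vector<std::string> fan_out(const std::vector<std::string>& servers,
                                 const std::string& query,
                                 std::chrono::milliseconds latency) {
    std::vector<std::future<std::string>> pending;
    for (const auto& server : servers) {
        pending.push_back(std::async(std::launch::async, query_db, server,
                                     query, latency));
    }
    std::vector<std::string> replies;
    for (auto& f : pending) {
        replies.push_back(f.get());  // blocks until that query finishes
    }
    return replies;
}
```

With three servers at 200 ms each, `fan_out` returns after about 200 ms instead of the 600 ms a one-query-at-a-time loop would take, mirroring the 2-second versus 6-second comparison.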


4. Take the same setup as in example three, but add 3 more
redundant DB servers. Since we can derive meta-connection
information from the connection queue, we can tell if a
DB server is bogging down and load-balance to the redundant
server.
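The post does not fix a load-balancing policy; one simple choice is "fewest outstanding queries", using a per-server count that could be derived from the connection queue. `BackendStats` and `pick_backend` are hypothetical names in this sketch.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical meta-connection information derived from the connection queue.
struct BackendStats {
    std::string name;
    int outstanding;  // queries sent but not yet answered
};

// Return the index of the server with the fewest outstanding queries among a
// group of redundant servers; ties go to the earliest entry. Returns -1 if
// the group is empty.
int pick_backend(const std::vector<BackendStats>& group) {
    int best = -1;
    for (std::size_t i = 0; i < group.size(); ++i) {
        if (best < 0 || group[i].outstanding < group[best].outstanding) {
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

A processor() would call this for each of the three DB groups before queuing its query, so a server that is bogging down simply stops being chosen until its backlog drains.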


I don't know if any of the capabilities described in these
examples (I can give dozens more, but it's late and I'm tired)
belong in an httpd, but the architecture (process model)
allows them if desired. The process model is more
than the threading model; it is also how the data is
organized.


> As a side note, this entire discussion seems to justify
> Dean's point; a thread interface which abstracts most of
> this from the programmer and determines the best possible
> model to run on the current system would be
> wonderful.  If NSPR does that better than anybody else, ...

Agreed, but the thread interface, NSPR or otherwise, should
be able to handle whatever division of thread functionality
is desired, whether start-to-end or what I suggested. Don't
confuse the "thread interface" with the "thread model" with
the "process model".

Regards,

-- 
Mike Anderson
mka@redes.int.com.mx
+52 473 23730 voice/fax
Guanajuato, GTO, Mexico

"If it looks like a bug, waddles like a bug,
and quacks like a bug, it's a quack!"