
Process model ideas for C++ Apache.



Hi apache-plusplus list:

I sent a similar message regarding ideas for a 2.0 process
model to the new-httpd list, where it was totally ignored.
Here's your chance to ignore it also.

---------------------------------------------------------

I read Dean's Process Model Design document, an interesting
discussion of micro issues, and the other documents it refers
to. I'd like to throw out some ideas from about two levels up,
which are very different from Apache's current process model. If
these ideas are unsuitable for an httpd server, I'd like to
hear why, since that probably means I made some bad design
decisions in my own project.

Caveat: My knowledge of Apache is small, currently limited to
what I learned writing a C++ module to interface a multi-threaded
middleware project to Apache. The documentation for the project is at:

http://www.commproc.com

The code is not yet on the site (while the license is being
developed), but I can relate some of the design issues of
multi-threading and network I/O that worked for me. Some may
be useful to 2.0.

My project is a hybrid of C++ classes and procedural C. All
sorts of C++ conventions were violated (such as member functions
which return the address of private data) to step around
certain C++ inefficiencies. I used the book
"Inside the C++ Object Model", by Stanley B. Lippman, 1996,
Addison-Wesley, ISBN 0-201-83454-5, to help me identify C++
performance weak spots.

These are the major themes in the process model of my project:

1.  threads are self-dispatching through lightweight
    synchronization such as mutexes, condition-variables,
    or semaphores (collectively called a poll); no central
    dispatcher, no main scheduling loop, no select().

2.  the number of threads is minimized by specializing and
    abstracting threads by "function" (accept()or, read()er,
    write()r, processor, etc.), so that no pending "function"
    is waiting for another "function" to complete. For example,
    a thread that accept()s -> read()s -> processes -> write()s,
    cannot accept() another connection while it is read()ing.
    Any read()er or write()r should be able to service any
    connection provided by any accept()or. Any processor should
    be able to service any message (request->response) from any
    connection. More connections can be handled with fewer
    threads since "functions" are asynchronous to each other.
    There is a pool of read()ers, write()rs, processors, etc.
    available to handle any message, waiting on its poll to go
    to work. accept()ors are, of course, blocking on accept().
    For network I/O, this scheme implies a bucket(s) of
    connection records in common or shared memory over which
    accept()ors, read()ers, and write()rs can operate.

3.  connections are abstracted from network I/O. A connection
    is nothing more than a record in the connection bucket.
    Network protocols (TCP, UnixDomain, UDP, TLI, etc.) are
    instantiated objects. Bucket records are self-contained
    objects which know their state and handle their own concurrency.

4.  complete abstraction of messages (requests & responses)
    from connections (network I/O) and the processing of the
    messages. This scheme implies a bucket(s) of message records
    in common or shared memory over which read()ers,
    write()rs, processors, and other thread-types can operate.
    Bucket records are self-contained objects which know their
    state and handle their own concurrency.
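
As a rough illustration of themes 1, 3, and 4 (this is not the project's
code), a "poll" can be sketched as a counting signal built from a mutex
and a condition variable, and a bucket as a fixed array of records that
each guard their own state. The names Poll, Record, Bucket, claim(), and
release() are assumptions made up for this sketch.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical "poll": worker threads block in wait() until another
// thread calls signal(). No central dispatcher is involved.
class Poll {
public:
    void signal() {
        {
            std::lock_guard<std::mutex> guard(lock_);
            ++pending_;
        }
        ready_.notify_one();
    }
    void wait() {
        std::unique_lock<std::mutex> guard(lock_);
        ready_.wait(guard, [this] { return pending_ > 0; });
        --pending_;
    }
    // Non-blocking variant: consumes one pending signal if there is one.
    bool try_wait() {
        std::lock_guard<std::mutex> guard(lock_);
        if (pending_ == 0) return false;
        --pending_;
        return true;
    }
private:
    std::mutex lock_;
    std::condition_variable ready_;
    std::size_t pending_ = 0;
};

enum class State { Free, Ready, Busy };

// Hypothetical bucket record: knows its state and carries its own lock.
struct Record {
    std::mutex lock;
    State state = State::Free;
    int connection_id = -1;
};

class Bucket {
public:
    explicit Bucket(std::size_t size) : records_(size) {}

    // Claim the first record in state `from` and move it to state `to`.
    // Returns the record index, or -1 if no record qualifies.
    int claim(State from, State to) {
        for (std::size_t i = 0; i < records_.size(); ++i) {
            std::lock_guard<std::mutex> guard(records_[i].lock);
            if (records_[i].state == from) {
                records_[i].state = to;
                return static_cast<int>(i);
            }
        }
        return -1;
    }

    // Hand a claimed record on to its next state.
    void release(int index, State to) {
        std::lock_guard<std::mutex> guard(records_[index].lock);
        records_[index].state = to;
    }
private:
    std::vector<Record> records_;
};
```

Only one record lock is held at a time, so two threads scanning the
bucket contend only when they touch the same record.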


That's it. Of course, the devil is in the details.

So, the macro processing model is:

main thread:

    1.  parse config file and command-line arguments.
    2.  allocate buckets.
    3.  startup initializations (thread attributes, etc.)
    4.  start ThreadManager (who has a bucket of threads.)
    5.  request ThreadManager to start Processors, Readers,
        Writers, Acceptors, and whatever other threads are needed.
    6.  hang-out waiting for exit event, clean-up and die.

ThreadManager:

    1.  start threads as requested. Instantiate network
        protocols for Acceptors.
    2.  monitor connection- and message-record wait times.
        start new threads as wait times increase (heavier
        loads).
    3.  reap dysfunctional and underused threads.
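
The ThreadManager's grow/reap decision in steps 2 and 3 could look
something like the sketch below. The thresholds, the one-thread-at-a-time
step, and the names ScalePolicy and target_threads() are all assumptions
for illustration; the description above does not specify any of them.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical tuning knobs for one pool (Readers, Processors, ...).
struct ScalePolicy {
    std::chrono::milliseconds grow_above{50};   // records wait too long
    std::chrono::milliseconds shrink_below{5};  // threads are underused
    std::size_t min_threads = 1;
    std::size_t max_threads = 64;
};

// Given the current pool size and the average time records spent
// waiting in the bucket, return the pool size to aim for next.
std::size_t target_threads(std::size_t current,
                           std::chrono::milliseconds average_wait,
                           const ScalePolicy& policy) {
    if (average_wait > policy.grow_above && current < policy.max_threads) {
        return current + 1;  // heavier load: start one more thread
    }
    if (average_wait < policy.shrink_below && current > policy.min_threads) {
        return current - 1;  // light load: reap one thread
    }
    return current;          // inside the comfortable band: no change
}
```

Changing the pool by one thread per observation is just one simple way
to avoid oscillating between too many and too few threads.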

Acceptors:

    1.  grab a connection-record and accept().
    2.  throw the connection-record (connection handle and
        protocol object) into the connection bucket. Assert
        the Readers' poll.
    3.  loop to 1.

Readers:

    1.  grab a message-record from the message bucket; wait
        on poll.
    2.  when asserted, grab any ready connection-record
        from the connection bucket and read() the message
        (request) into the message-record.
    3.  throw the message, with connection-record id, into
        the message bucket; assert the Processors' poll.
    4.  loop to 1.

Processors:

    1.  wait on poll.
    2.  when asserted, grab any ready message-record
        from the message bucket.
    3.  do a table-lookup for request/module type;
        escort message through module functions.
    4.  throw the message into the message bucket;
        assert the Writers' poll.
    5.  loop to 1.

Writers:

    1.  wait on poll.
    2.  when asserted, grab any ready message-record
        from the message bucket.
    3.  grab the connection-record from the connection
        bucket; write() the message.
    4.  clear the message- and connection-records.
    5.  loop to 1.
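
The hand-off between thread types can be sketched with standard C++
threads. This is a simplified stand-in, not the design above in full:
there is no network I/O, the calling thread plays the Reader role, a
deque plus condition variable stands in for "message bucket plus poll",
and "processing" just upper-cases the text. StageQueue and
run_pipeline() are hypothetical names.

```cpp
#include <cctype>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stage queue: put() asserts the next stage's poll,
// take() is a worker waiting on that poll.
class StageQueue {
public:
    void put(std::string message) {
        {
            std::lock_guard<std::mutex> guard(lock_);
            items_.push_back(std::move(message));
        }
        ready_.notify_one();
    }
    // Blocks until a message is ready. Returns false once the queue
    // has been closed and fully drained.
    bool take(std::string& out) {
        std::unique_lock<std::mutex> guard(lock_);
        ready_.wait(guard, [this] { return closed_ || !items_.empty(); });
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop_front();
        return true;
    }
    void close() {
        {
            std::lock_guard<std::mutex> guard(lock_);
            closed_ = true;
        }
        ready_.notify_all();
    }
private:
    std::mutex lock_;
    std::condition_variable ready_;
    std::deque<std::string> items_;
    bool closed_ = false;
};

// Push requests through Processor and Writer pools; returns what the
// Writers "wrote". Any Processor or Writer may handle any message.
std::vector<std::string> run_pipeline(const std::vector<std::string>& requests,
                                      int workers_per_stage) {
    StageQueue to_process, to_write;
    std::vector<std::string> written;
    std::mutex written_lock;
    std::vector<std::thread> processors, writers;

    for (int i = 0; i < workers_per_stage; ++i) {
        processors.emplace_back([&] {
            std::string message;
            while (to_process.take(message)) {
                for (char& c : message) {
                    c = static_cast<char>(
                        std::toupper(static_cast<unsigned char>(c)));
                }
                to_write.put(std::move(message));
            }
        });
        writers.emplace_back([&] {
            std::string message;
            while (to_write.take(message)) {
                std::lock_guard<std::mutex> guard(written_lock);
                written.push_back(std::move(message));
            }
        });
    }

    // Reader role: hand each "read" request to the Processors.
    for (const auto& request : requests) to_process.put(request);

    // Shut down stage by stage so nothing in flight is lost.
    to_process.close();
    for (auto& t : processors) t.join();
    to_write.close();
    for (auto& t : writers) t.join();
    return written;
}
```

With more than one worker per stage the output order is not guaranteed,
which is why the real design carries the connection-record id along with
each message.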


There is, of course, much more happening here than the
description gives: sanity checks, clean-up of abandoned
records, maintaining load state, etc.

Since the buckets and threads contain concurrency
mechanisms, care must be taken not to serialize thread
execution by overzealous locking and so lose the inherent
parallelism possible in the design.
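
One common way to keep locking from serializing the threads is to hold a
record's lock only long enough to take the data and change the state,
then do the slow work unlocked. The sketch below assumes a hypothetical
MessageRecord and uses upper-casing as a placeholder for real processing.

```cpp
#include <cctype>
#include <mutex>
#include <string>

// Hypothetical message record with its own lock.
struct MessageRecord {
    std::mutex lock;
    bool ready = false;
    std::string body;
};

// Take the body out under the lock, then process it without the lock.
std::string process_outside_lock(MessageRecord& record) {
    std::string local;
    {
        std::lock_guard<std::mutex> guard(record.lock);
        local.swap(record.body);  // cheap: swaps buffers, copies nothing
        record.ready = false;
    }
    // The lock is released here, so other threads can use the record
    // while this (potentially slow) work runs.
    for (char& c : local) {
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
    return local;
}
```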

The performance characteristics of this approach will
vary depending on the thread package type: kernel, user,
or hybrid. In theory, the design should scale well on
multiple-processor machines, but I haven't done any
measurements yet to confirm or refute that.

One of the keys to good performance will be matching the
number of threads to the load. Fewer threads mean
less memory and less contention for shared resources.
There are good techniques for minimizing contention.

I don't have any performance comparisons on this
approach, since I don't have anything to compare it
with that is roughly equivalent in functionality (to
my project) and which uses a different approach. Also my
project is much more complex, with component
registrations, routing, load-balancing, message-queuing,
message prioritizing, hot plug/unplug of components,
workflow, protocol-hopping, blah, blah, blah ...
These kinds of features don't belong in an HTTP server
and, poorly done, can really drag down performance.

Comments?


-- 
Mike Anderson
mka@redes.int.com.mx
+52 473 23730 voice/fax
Guanajuato, GTO, Mexico

"If it looks like a bug, waddles like a bug,
and quacks like a bug, its a quack!"