Re: [apache-plusplus] Process model ideas for C++ Apache.
Hi Bret:
Thanks for replying. I'll try to answer your questions and
elaborate a little more on design details.
mka wrote previously:
> >My project is a hybrid of C++ classes and procedural C. All
> >sorts of C++ conventions were violated (such as member functions
> >which return the address of private data) to step around
> >certain C++ inefficiencies. I used the book
> >"Inside the C++ Object Model", by Stanley B. Lippman, 1996,
> >Addison-Wesley, ISBN 0-201-83454-5, to help me identify C++
> >performance weak spots.
Bret wrote:
> Again, I can't speak for others, but this last paragraph sets
> off a few warning bells in my head. Without seeing your code
> it's impossible to say anything for certain, but it certainly
> sounds ominous to state that you violated standard C++ conventions! :)
I'd have to study the code and design docs again to identify my
various sins with regard to C++ conventions, but I believe the
main ones are these:
1. aggregating private data members into structures so that
memory copying operations can be more efficient.
2. attempting to eliminate all implicit memory copying of class
data such as a copy constructor might do.
3. class methods which return the address of private data,
especially aggregated structures for direct manipulation
outside the class methods.
4. use of structures instead of (private-data-member) classes
for data headers that are actually transmitted over the
network.
Each violation of C++ conventions was carefully considered at the
time, based on research and efficiency concerns. In retrospect,
I would probably use fewer of these unconventional techniques,
and would justify them with actual benchmarks instead
of theory.
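To make the four items above concrete, here is a minimal sketch in
modern C++ (C++11 or later, which is an assumption; the original code
predates it). The names MsgHeader, Message, header() and copyFrom()
are hypothetical and chosen only for illustration; they are not taken
from the project.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Item 4: a plain struct for data that travels over the network, so it
// can be copied to or from a buffer with a single memcpy.
// (Hypothetical layout; the real header fields are not given.)
struct MsgHeader {
    std::uint32_t length;  // payload length in bytes
    std::uint16_t type;    // message type code
    std::uint16_t flags;   // option bits
};
static_assert(std::is_trivially_copyable<MsgHeader>::value,
              "header must be safe to memcpy");

class Message {
public:
    Message() : d_{} {}  // zero-initialize the whole aggregate

    // Item 2: forbid implicit copies; copying must be requested explicitly.
    Message(const Message&) = delete;
    Message& operator=(const Message&) = delete;

    // Item 3: return the address of private data for direct manipulation.
    // This deliberately breaks encapsulation.
    MsgHeader* header() { return &d_.hdr; }

    // Item 1: all private state lives in one struct, so an explicit copy
    // is a single block copy.
    void copyFrom(const Message& other) {
        if (this != &other) {
            std::memcpy(&d_, &other.d_, sizeof d_);
        }
    }

private:
    struct Data {
        MsgHeader hdr;
        std::uint32_t connId;  // hypothetical: owning connection record
    };
    Data d_;
};
```

A caller would write through header() and copy with copyFrom(); whether
this beats an ordinary copy constructor is exactly the kind of claim
that should be benchmarked rather than assumed.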
mka wrote previously:
> >These are the major themes in the process model of my project:
> >
> >1. threads are self-dispatching through lightweight
> > synchronization such as mutexes, condition-variables,
> > or semaphores (collectively called a poll); no central
> > dispatcher, no main scheduling loop, no select().
Bret wrote:
> I'm by no means an expert on all things threaded, but from
> the little I've been able to pick up, it seems like this
> would require a relatively large number of mutexes. Did
> you find this was actually the case?
Yes - since connections and messages are maintained simply as
records in a pool (implemented as arrays), access to
individual records (connections or messages) is controlled by
locks (mutexes) on each record. In addition, access to the pool
of records, whether connections or messages, by threads wishing
to manipulate a record, is controlled by a condition-
variable (CV). That way, threads aren't burning CPU when they've
nothing to do.
On individual records, each record contains a state variable
indicating its current status. An attempt to seize a record
checks the status first, calls trylock() (in pthreads terminology),
and checks the status again if the lock is obtained. If the
record is already in use (per its state variable) or trylock()
fails, the thread moves on to another record in the pool and tries
again. In theory, threads which have work to do are never blocked
on a lock held by another thread.
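The seize sequence just described (check state, trylock, check state
again) might look roughly like this. It is only a sketch under stated
assumptions: Record, State, trySeize() and release() are hypothetical
names, and std::mutex::try_lock() stands in for the pthreads trylock
call; the project's actual code is not shown in this thread.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <vector>

// Hypothetical record states; the real project may use different ones.
enum class State { Free, Ready, InUse };

struct Record {
    std::atomic<State> state{State::Free};  // the state variable
    std::mutex lock;                        // the per-record lock
    int payload = 0;                        // placeholder for real data
};

// Scan the pool for a record in state `wanted` and try to seize it
// without ever blocking. Returns the record with its mutex held and its
// state set to InUse, or nullptr if nothing could be seized.
Record* trySeize(std::vector<Record>& pool, State wanted) {
    for (Record& r : pool) {
        if (r.state.load() != wanted) continue;  // first check, no lock
        if (!r.lock.try_lock()) continue;        // held elsewhere: move on
        if (r.state.load() == wanted) {          // second check, under lock
            r.state.store(State::InUse);
            return &r;
        }
        r.lock.unlock();  // state changed in between; keep scanning
    }
    return nullptr;
}

// Must be called by the same thread that seized the record, because a
// std::mutex has to be unlocked by its owner.
void release(Record& r, State next) {
    assert(r.state.load() == State::InUse);
    r.state.store(next);
    r.lock.unlock();
}
```

A worker would call trySeize(pool, State::Ready) and, on nullptr, go
back to waiting on the pool's CV rather than spinning.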
Bret wrote:
> Also, merely to satisfy my own curiosity, could you speak
> more specifically about how you handled scheduling without
> a dispatcher thread?
All threads have limited functionality - threads which do
connection acceptance only accept(); they do not read(),
process() or write(), etc. There are also read()ers,
write()ers, process()ors, etc., which only do that function.
Accept()ors block on accept() - they schedule themselves by
virtue of blocking or returning from accept(). The connection
information is thrown into the connection pool, the condition-
variable (CV) is asserted on the pool, and the accept()or
returns to accept() another connection.
Read()ers wake up when the pool condition-variable (CV) is
asserted; any read()er can handle any connection in the pool.
So read()ers are scheduled by the randomness of the
condition-variable (CV) assertion. Read()ers read the
connection into a message pool, assert the CV for that pool,
and return to wait for another connection which needs reading.
Process()ors and write()rs are similarly scheduled by the CVs
for the message pool they are working. The potential bottleneck
and single-point-of-failure that a central dispatcher represents
does not exist. The failure of individual threads does not
halt processing, except an individual connection or message
record may not be processed if the failing thread locked it
before failing. Those records can be recovered by a timeout
on the thread.
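The CV handoff between thread types could be sketched as follows.
PoolSignal, post() and waitFor() are hypothetical names. The pending
counter, the choice of notify_one(), and the use of a timed wait are
assumptions made for this example, not details taken from the project.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// One PoolSignal per pool. Producers "assert the CV" with post();
// workers sleep in waitFor() and use no CPU while idle.
class PoolSignal {
public:
    // Called by a producer (for example an accept()or) after it has
    // populated and released a record.
    void post() {
        {
            std::lock_guard<std::mutex> g(m_);
            ++pending_;  // remembered even if no worker is waiting yet
        }
        cv_.notify_one();  // assumption: wake a single waiter
    }

    // Called by a worker (for example a read()er). Returns true if a unit
    // of work was claimed, false if the timeout expired first.
    bool waitFor(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> g(m_);
        // The predicate guards against spurious and missed wakeups.
        if (!cv_.wait_for(g, timeout, [this] { return pending_ > 0; })) {
            return false;
        }
        assert(pending_ > 0);
        --pending_;
        return true;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    int pending_ = 0;  // records posted but not yet claimed
};
```

A read()er would loop on waitFor(); a false return is one possible
place to run housekeeping, such as looking for records left locked by
a failed thread.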
> >2. the number of threads is minimized by specializing and
> > abstracting threads by "function" (accept()or, read()er,
> > write()r, processor, etc.), so that no pending "function"
> > is waiting for another "function" to complete. For example,
> > a thread that accept()s -> read()s -> processes -> write()s,
> > cannot accept() another connection while it is read()ing.
> > Any read()er or write()r should be able to service any
> > connection provided by any accept()or. Any processor should
> > be able to service any message (request->response) from any
> > connection. More connections can be handled with fewer
> > threads since "functions" are asynchronous to each other.
> > There is a pool of read()ers, write()rs, processors, etc.
> > available to handle any message, waiting on its CV to go
> > to work. accept()ors are, of course, blocking on accept().
> > For network I/O, this scheme implies a pool(s) of
> > connection records in common or shared memory over which
> > accept()ors, read()ers, and write()rs can operate.
Bret wrote:
> I'm a bit perplexed by your wording; first you appear to refer
> to a thread that performs functions successively, in the case
> you cited accept() followed by read() and so on. Next, you refer
> to multiple pools of specialized threads, each of which performs
> a single function. This raises two questions in my mind...
> 1. Do threads change functionality over time? In other words,
> does a single thread perform multiple functions, or do you have
> pools of multiple threads, each thread in a given pool performing
> the function assigned to that pool?
Threads perform a single function.
Bret wrote:
> 2. ... passes handling for the
> next function off to a thread from within the available pool of
> threads for that function, how is the passing handled? What kind
> of overhead are you looking at?
The handoff is accomplished by asserting the CV for the message
pool. The overhead is the CV assertion.
Bret wrote:
> Oh, I see... the CVs handle the scheduling between the various
> pools of threads. Can't quite identify it, but I keep hearing
> "race condition" in the back of my head somewhere...
A CV can release one or all threads waiting on the CV (I forget
which I do.) In my theory, a "race condition" can't exist since
the thread which holds the record (either through its state
variable or through a lock) is the same one which asserts the
CV on the pool. The CV isn't asserted until the record is released.
The state-variable/trylock() check described above should prevent
a race between multiple threads trying to seize a record.
mka wrote previously:
> >3. connections are abstracted from network I/O. A connection
> > is nothing more than a record in the connection pool.
> > Network protocols (TCP, UnixDomain, UDP, TLI, etc.) are
> > instantiated objects. Bucket records are self-contained
> > objects which know their state and handle their own concurrency.
> >
> >4. complete abstraction of messages (requests & responses)
> > from connections (network I/O) and the processing of the
> > messages. This scheme implies a pool(s) of message records
> > in common or shared memory over which read()ers, and
> > write()rs, processors, and other thread-types can operate.
> > Bucket records are self-contained objects which know their
> > state and handle their own concurrency.
Bret wrote:
> Have you looked at the ACE classes, available from Washington U.,
> with respect to these items?
Yes, I looked at ACE, and studied all of Doug Schmidt's papers. It
was his research, along with the book previously mentioned, that
prompted me to use non-conventional C++. I confess that I had a
hard time understanding ACE's abstractions, but they struck me as
being very heavy-weight at the time. In retrospect, I don't think
I would use ACE now either, but I would probably use a more standard
C++ approach.
mka wrote previously (edited for clarity and slanted towards
HTTP service):
> >So, the macro processing model is:
> >
> >main thread:
> >
> > 1. parse config file and command-line arguments.
> > 2. allocate pools of connection and message records.
> > 3. startup initializations (thread attributes, etc.)
> > 4. start ThreadManager (who has a pool of threads.)
> > 5. request ThreadManager to start Processors, Readers,
> > Writers, Acceptors, and whatever other threads are needed.
> > 6. hang-out waiting for exit event, clean-up and die.
> >
> >ThreadManager:
> >
> > 1. start threads as requested. Instantiate network
> > protocols for Acceptors.
> > 2. monitor connection- and message-record wait times.
> > start new threads as wait times increase (heavier
> > loads).
> > 3. reap dysfunctional and underused threads.
> >
> >Acceptors:
> >
> > 1. grab a connection-record from the connection pool
> > and accept().
> > 2. populate the connection-record (connection handle
> > and protocol object). Assert the connection pool CV.
> > 3. loop to 1.
> >
> >Readers:
> >
> > 1. grab an empty message-record from the message pool;
> > wait on connection pool CV.
> > 2. when asserted, grab any readable connection-record
> > from the connection pool and read() the message
> > (request) into the message-record.
> > 3. populate the message-record with the
> > connection-record id; assert the message pool CV.
> > 4. loop to 1.
> >
> >Processors:
> >
> > 1. wait on message pool CV.
> > 2. when asserted, grab any processable message-record
> > from the message pool.
> > 3. do a table-lookup for request/module type;
> > escort message through module functions.
> > 4. assert the message pool CV.
> > 5. loop to 1.
> >
> >Writers:
> >
> > 1. wait on message pool CV.
> > 2. when asserted, grab any writable message-record
> > from the message pool.
> > 3. get the connection-record from the connection
> > pool; write() the message.
> > 4. clear the message- and connection-records.
> > 5. loop to 1.
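The life cycle of a message-record implied by the Reader, Processor
and Writer steps above can be sketched as a small state machine. This
is a single-threaded illustration only: the locks and CVs are left out
on purpose, and MsgState, Role, wants(), leaves() and step() are
hypothetical names. The request and response strings are placeholders
for real I/O and module processing.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

enum class MsgState { Empty, Processable, Writable };
enum class Role { Reader, Processor, Writer };

struct MessageRecord {
    MsgState state = MsgState::Empty;
    int connId = -1;   // id of the owning connection-record
    std::string body;  // request, then response
};

// The record state each thread type looks for when scanning the pool.
constexpr MsgState wants(Role r) {
    return r == Role::Reader      ? MsgState::Empty
           : r == Role::Processor ? MsgState::Processable
                                  : MsgState::Writable;
}

// The record state each thread type leaves behind when it is done.
constexpr MsgState leaves(Role r) {
    return r == Role::Reader      ? MsgState::Processable
           : r == Role::Processor ? MsgState::Writable
                                  : MsgState::Empty;
}

// One pass of a worker's loop: find a record this role can handle, do
// the role's work, advance the state. Returns false if none was found.
template <std::size_t N>
bool step(Role role, std::array<MessageRecord, N>& pool) {
    assert(leaves(role) != wants(role));  // every role advances the record
    for (MessageRecord& m : pool) {
        if (m.state != wants(role)) continue;
        switch (role) {
        case Role::Reader:  // stands in for read() on a connection
            m.connId = 1;
            m.body = "GET /";
            break;
        case Role::Processor:  // stands in for the module table lookup
            m.body = "200 OK";
            break;
        case Role::Writer:  // stands in for write(), then clear
            m = MessageRecord{};
            break;
        }
        m.state = leaves(role);
        return true;
    }
    return false;
}
```

In the threaded design each step() call would be preceded by a wait on
the relevant pool CV and followed by asserting the CV for the next
stage.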
Bret wrote:
> I'll study this in a bit more detail later, but on a surface
> glance it appears quite intriguing. The only thing that
> strikes me right away is your use of polling over select()...
I'm not using poll(), if that's what you're referring to.
Neither am I polling for connections or messages. The initial
connection comes from accept() returning. All subsequent
access to connections and messages is controlled by CVs
and locks (mutexes). In my theory, no thread should ever
be using CPU unless it has real work to do.
Bret wrote:
> it seems to me, based on the limited info I have in front of
> me, that you're mimicking the functionality of select() by
> creating a multithreaded acceptor (i.e. a pool of acceptors,
> in your lingo).
Yes, that's the plan, although I've found in my testing that
a single accept()or dedicated only to accepting connections is
often enough. I haven't tested this under a heavy HTTP
simulation yet so it may well require a pool of accept()ors.
Bret wrote:
> When discussing the accept() phase only, how does this approach
> offer you any better performance than just using select()?
I'm not sure how Apache does it, but I presume it builds a mask
of selectable fds for select(), calls select(), searches for
and dispatches to an idle process, which then calls accept()
(correct me if I'm wrong.) All the select() processing plus
the search/dispatch seems like extra overhead.
The approach I describe assumes that only one accept() returns
for any connection. I've seen some messages on the Apache mailing
list that indicate that may not be true (sounds like a serious
bug to me.) I've never seen multiple accept()s return for a
single connection in my project, but perhaps I simply
haven't stressed it enough. Do you know anything about this?
Bret wrote:
> I assume you have some method of controlling the possibility
> of deadlock if two readers attempt to access the same
> connection-record simultaneously?
As described above, the combination of mutex locks and
state variables handles this nicely and also prevents
threads from blocking on a connection- or message-record
held by another thread. I call this technique "lazy-locking".
Bret wrote:
> It also appears to me that you simply remove any socket info
> when you remove the message and connection records in your
> writer process. When talking about a Web server, you'd have
> to retain a list of old sockets like Apache does so that, in
> the case of a restart, you won't run into "socket in use"
> problems.
That feature (old socket list) would still be needed. Keep in
mind that my project is not an httpd server but rather is
intended for message-passing distributed-computing service.
I just wanted to throw out some different ideas for a C++
Apache 2.0 processing model.
mka wrote previously:
> >Since the pools and threads contain concurrency
> >mechanisms, care must be taken not to serialize thread
> >execution by overzealous locking and so lose the inherent
> >parallelism possible in the design.
Bret wrote:
> Guess this is what I was getting at earlier... I could see
> this very easily becoming a source of debugging nightmares
> ("Why am I losing so much performance?").
Debugging multi-threaded programs, especially where you have
pools of shared resources, is a nightmare in any case.
Extreme clarity of design is vital to avoid crippling lock
contention; this is where the class model of C++ shines - the
inherent encapsulation greatly reduces the complexity of managing
concurrency. However, a good C programmer can do the same
through other means. IMHO, "clarity" is key.
mka wrote previously:
> >One of the keys to good performance will be matching the
> >number of threads to the load. Fewer threads mean
> >less memory and less contention for shared resources.
> >There are good techniques for minimizing contention.
Bret wrote:
> What types of methods are you using in your product?
In my project, the two major techniques have already been
described above: the use of CVs to eliminate CPU burning,
and "lazy-locking" to minimize record contention and
threads blocking on the same resource.
If you have further interest in some of these process-model
ideas, I have some (out-of-date/incomplete) documentation
at http://www.commproc.com
I hope to get the source code up there very soon, but am
also in the process of remodeling an old colonial house
(I need to move in 4 weeks) and am traveling some in a new job.
Best Regards,
--
Mike Anderson
mka@redes.int.com.mx
+52 473 23730 voice/fax
Guanajuato, GTO, Mexico
"If it looks like a bug, waddles like a bug,
and quacks like a bug, its a quack!"