
Re: [apache-plusplus] Process model ideas for C++ Apache.



Bret wrote:

> >>4.  complete abstraction of messages (requests & responses)
> >>    from connections (network I/O) and the processing of the
> >>    messages. This scheme implies a bucket(s) of message records
> >>    in common or shared memory over which read()ers, and
> >>    write()rs, processors, and other thread-types can operate.
> >>    Bucket records are self-contained objects which know their
> >>    state and handle their own concurrency.
> >
> >There's that state word...
> >
> >If I understand you, you want to make the state explicit
> >and managed by the programmer. By contrast, I really think
> >state should be implicit and managed behind the scenes by
> >the run-time.

I asked Dean in another message to clarify this point; I don't
know whether he's talking about message state (where a message
is in the processing sequence) or about scheduling of the
thread.

> From merely a cursory glance, I'm not sure this is required
> by this model... theoretically, some of the "buckets" could
> have pre-designed functions which control this and which
> _should_ be hidden from the programmer ...

This is exactly the case in my implementation. The message
bucket is really a message-queuing system: the bucket (or pool)
is an array of message records, and methods are provided for
traversal, insertion, etc. Those methods explicitly signal the
CVs (condition variables) that tell the appropriate threads
(however the O/S schedules them) that they have work to do.
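A minimal sketch of such a bucket, assuming C++11's std::mutex
and std::condition_variable in place of raw pthreads calls (on
POSIX systems they typically wrap the same primitives). The names
MessageBucket, MessageRecord, insert() and take() are hypothetical,
and for brevity one mutex guards the whole pool rather than each
record guarding itself:

```cpp
#include <array>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical message record: one slot in a fixed-size pool ("bucket").
struct MessageRecord {
    enum class State { Free, Ready };
    State state = State::Free;
    std::string payload;
};

// Hypothetical bucket: an array of records plus a condition variable that
// tells waiting worker threads there is work to do.
template <std::size_t N>
class MessageBucket {
public:
    // Store a message in the first free slot and wake one waiting worker.
    // Returns false if every slot is in use.
    bool insert(std::string payload) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            MessageRecord* slot = nullptr;
            for (auto& r : records_) {
                if (r.state == MessageRecord::State::Free) {
                    slot = &r;
                    break;
                }
            }
            if (slot == nullptr) return false;
            slot->payload = std::move(payload);
            slot->state = MessageRecord::State::Ready;
            ++ready_;
        }
        work_.notify_one();  // the insertion method signals the CV itself
        return true;
    }

    // Block until some record is Ready, free its slot, and return its payload.
    std::string take() {
        std::unique_lock<std::mutex> lock(mu_);
        work_.wait(lock, [this] { return ready_ > 0; });
        for (auto& r : records_) {
            if (r.state == MessageRecord::State::Ready) {
                r.state = MessageRecord::State::Free;
                --ready_;
                return std::move(r.payload);
            }
        }
        return {};  // not reached while ready_ matches the Ready slots
    }

private:
    std::mutex mu_;
    std::condition_variable work_;
    std::array<MessageRecord, N> records_{};
    std::size_t ready_ = 0;
};
```

The worker threads never poll: they sleep in take() until an
insert() signals the CV.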


> >>3.  class methods which return the address of private data,
> >>    especially aggregated structures for direct manipulation
> >>    outside the class methods.

> I read this and was frightened.  Kinda hard to hide information
> if you're throwing the address where your data lives all over
> the place.
> ... if you want to use your approach and perform some abstraction
> between programmer and thread dispatching, you'll probably want
> to avoid this like the plague...

This shouldn't be frightening; it's exactly what C does with
structures, or C++ with public data members. If I had made the
data public deliberately, no one would give it a thought. The
data is private because I generally want it controlled, but
for efficiency, in a specific operation, it's convenient
to pass the address of the aggregated data. Since the data
is aggregated into structures and the structures into arrays
(in this implementation), and the data is static, the only
real loss is information hiding, in exchange for a boost in
performance. That's a design trade-off I made at the time. It
may prove to be a bad decision. In any case, my implementation's
sins against C++ have no bearing on the merits of the proposed
process model.
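To make the trade-off concrete, here is a hypothetical sketch
(ConnectionTable, ConnStats and raw() are illustrative names, not
taken from my code): normal access goes through methods, while one
accessor hands out the address of a private aggregate for a hot
path. The pointer stays valid only because the array is fixed-size
and never reallocated; the cost is that the class can no longer
check writes made through that pointer.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical aggregated per-connection data, kept in a fixed array.
struct ConnStats {
    std::size_t bytes_in = 0;
    std::size_t bytes_out = 0;
    int state = 0;
};

class ConnectionTable {
public:
    static constexpr std::size_t kSize = 256;

    // Controlled access: the class applies its own rules here.
    void add_bytes_in(std::size_t slot, std::size_t n) {
        table_.at(slot).bytes_in += n;
    }

    // The trade-off: return the address of a private aggregate so a hot
    // path can update several fields directly, without a call per field.
    // Valid for the table's lifetime because std::array never reallocates.
    ConnStats* raw(std::size_t slot) { return &table_.at(slot); }

    const ConnStats& stats(std::size_t slot) const { return table_.at(slot); }

private:
    std::array<ConnStats, kSize> table_{};
};
```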

I don't know whether the proposed process model is appropriate
for an HTTP server. I don't see why it wouldn't work, so I
offered some of the ideas I found useful in MOM
(message-oriented middleware) so the httpd experts could
consider them and give feedback.

> Just out of curiosity, which of your sources suggested
> this kind of approach?

None of the sources suggest sinning against C++. What they
do document is the performance penalty caused by certain types
of operations. Doug Schmidt's (ACE) benchmarking papers on CORBA
middleware describe several performance problems of OO middleware.
Here is my list of performance killers, in order of severity;
many of these have little or nothing to do with httpd software:

1. Data marshaling.
2. Memory copying at the user and kernel level.
3. Memory management.
4. Threading paradigm and concurrency models/mechanisms.
5. Messaging overhead.
6. Network protocol choice.
7. Network protocol tuning.
8. Function-call or method overhead.
9. Data/logic demultiplexing.

So I checked the C++ internal processing model against this
list, and then designed my code to encourage the compiler to
generate efficient code.
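As an illustration of item 2, a hypothetical sketch (Buffer,
byte_count() and Stage are illustrative names) of two C++ habits
that avoid memory copies: inspect data through a const reference,
and hand a buffer to the next stage by moving it rather than
copying it. The copy counter exists only to make copies visible;
the sketch assumes C++17 for the inline static member.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Illustrative buffer type that counts how often it is copied.
struct Buffer {
    static inline int copies = 0;  // C++17 inline static member
    std::vector<char> bytes;

    Buffer() = default;
    explicit Buffer(std::size_t n) : bytes(n) {}
    Buffer(const Buffer& other) : bytes(other.bytes) { ++copies; }
    Buffer& operator=(const Buffer& other) {
        bytes = other.bytes;
        ++copies;
        return *this;
    }
    // noexcept moves also let std::vector move (not copy) on reallocation.
    Buffer(Buffer&&) noexcept = default;
    Buffer& operator=(Buffer&&) noexcept = default;
};

// Read-only inspection: a const reference, so no copy is made.
std::size_t byte_count(const Buffer& b) { return b.bytes.size(); }

// Hand-off to the next stage: take by value and move into the queue, so a
// caller that passes std::move(buf) pays no copy at all.
class Stage {
public:
    void push(Buffer b) { queue_.push_back(std::move(b)); }
    std::size_t size() const { return queue_.size(); }

private:
    std::vector<Buffer> queue_;
};
```

A caller that still needs its buffer after the hand-off pays
exactly one copy; one that passes std::move(buf) pays none.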
 

> "Readers are scheduled by the _randomness_ of the condition
> variable assertion"?????
> This sounds highly suspect to me... how do you avoid
> interference? ... in the design you've used, how do you
> avoid multiple readers being awakened by a CV change and
> both attempt to perform the same action simultaneously?
> You either have to have some kind of scheduler, or you
> have to do a lot of praying.

This is absolutely not the case. Depending on which pthreads
call you make (pthread_cond_signal() or
pthread_cond_broadcast()), you can wake one or all read()ers.
It doesn't matter either way. Remember that each individual
message or connection record is an object that controls its
own concurrency. It's not possible for two threads to grab
the same record (unless deliberately programmed that way).
Nor will they block on a locked record, because of the
"lazy-locking" mechanism described earlier.
If several read()ers were started and only one connection
was available, one read()er would seize the connection record
and the others would traverse the queue (or pool or bucket),
find nothing, and go back to the CV having burned a tiny
amount of CPU.
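The lazy-locking mechanism itself isn't reproduced in this
message. The sketch below assumes it behaves like a non-blocking
try-lock: a read()er walks the pool, try_lock()s each record,
skips any that are busy or have no work, and comes back
empty-handed if nothing is claimable. ConnRecord, try_claim() and
find_work() are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <mutex>

// Hypothetical connection record that guards its own state.
class ConnRecord {
public:
    // Producer side: flag that this connection has data to read.
    void mark_pending() {
        std::lock_guard<std::mutex> guard(mu_);
        pending_ = true;
    }

    // Non-blocking claim: returns false at once if another thread holds
    // the record or if there is nothing to do. On success the caller owns
    // the record until it calls release().
    bool try_claim() {
        if (!mu_.try_lock()) return false;  // busy: skip it, never wait
        if (!pending_) {
            mu_.unlock();
            return false;
        }
        pending_ = false;  // consume the work while holding the lock
        return true;
    }

    void release() { mu_.unlock(); }

private:
    std::mutex mu_;
    bool pending_ = false;
};

// A read()er's traversal of the pool. Returns the record it claimed, or
// nullptr so the caller can go back to waiting on the CV. A thread must
// release any record it holds before traversing again, because calling
// try_lock() on a mutex the thread already owns is undefined behavior.
// try_lock() may also fail spuriously; the record is then picked up on a
// later pass.
template <std::size_t N>
ConnRecord* find_work(std::array<ConnRecord, N>& pool) {
    for (auto& record : pool) {
        if (record.try_claim()) return &record;
    }
    return nullptr;
}
```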

What I meant by "randomness" is that the O/S decides which
thread to start, based on its own criteria. Since any read()er,
processor(), or write()r can handle any connection- or message-
record, no one cares which thread leaves the gate.


Dean wrote:
> >... I'm about to get rid of the entire pool of threads
> >concept in the interest of a simple accept()/create thread
> >dispatch loop. ...

In the document, "Concurrency within DOE Object Implementations",
Bob Hagmann presents a compelling argument that the proper use
of threads is for program structuring and design. Performance
enhancement is a secondary goal. Interesting paper.


> These two distinctions cannot be emphasized enough... Apache
> runs multiple processes rather than a single-process,
> multithreaded model, which is currently being discussed.
> Having the children perform the accept() also avoids the
> "dispatch" overhead you expressed concern about before. I
> missed the lock to avoid starvation in the Apache code ...

> ... how do you avoid this without some synchronizing agent
> somewhere?

Sorry, I don't understand what "this" is that should be
avoided. If you mean starvation conditions, I don't know
what they are; I've never seen anything like that. This may
be a problem with select()/accept() across multiple processes,
as Dean described, that doesn't occur when a single
multithreaded process calls accept() without select(). Perhaps
Dean can clarify?


I probably should have stated my original interest in the
concept of dividing up network I/O and message processing
by function, each function in its own thread(s). I
worked for about 10 months on a project to put the SABRE
travel reservation system on the internet as
www.travelocity.com.  That system used these components:
a Netscape httpd, a middleware interface, a DB interface, and
the SABRE mainframe transaction system. Each request used
a "clone()d" thread in the two middle components to process
the request from start-to-end. So 300 simultaneous requests
required 300 threads in each middle component. The memory
and context-switch burn of this setup was appalling. I was
asked to investigate alternative architectures and
prototyped something very similar to the process model I
described previously. Although we were unable to make an
apples-to-apples comparison of a prototype to an
in-production system, the measurement data we evaluated
suggested that the prototype would have equivalent
performance with 30-50% of the number of threads of the
production system. This was due to the inherent parallelism
possible in I/O intensive processing, multiple CPUs, and the
disparity in processing time of the different functions
(accept(), read(), write(), process()). The message queues
buffered the differences in processing time and resources
between the different functions.
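A rough sketch of that arrangement, assuming std::thread and a
simple blocking queue (StageQueue and run_pipeline() are
hypothetical names, and string concatenation stands in for the
real process() work): a fixed number of processor threads serves
any number of requests, and the queues between stages absorb
differences in stage speed. For brevity read() is simulated by
the calling thread; only process() and write() get threads here.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <optional>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical queue between two stages.
template <typename T>
class StageQueue {
public:
    void push(T item) {
        {
            std::lock_guard<std::mutex> guard(mu_);
            items_.push_back(std::move(item));
        }
        cv_.notify_one();
    }
    // No more items will be pushed; wake every waiting consumer.
    void close() {
        {
            std::lock_guard<std::mutex> guard(mu_);
            closed_ = true;
        }
        cv_.notify_all();
    }
    // Blocks until an item is available; returns nullopt once closed and drained.
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return !items_.empty() || closed_; });
        if (items_.empty()) return std::nullopt;
        T item = std::move(items_.front());
        items_.pop_front();
        return item;
    }

private:
    std::mutex mu_;
    std::condition_variable cv_;
    std::deque<T> items_;
    bool closed_ = false;
};

// Push requests through a process() stage run by `workers` threads and a
// single write() thread, instead of one thread per request.
std::vector<std::string> run_pipeline(const std::vector<std::string>& requests,
                                      std::size_t workers) {
    StageQueue<std::string> to_process, to_write;

    std::vector<std::thread> processors;
    for (std::size_t i = 0; i < workers; ++i) {
        processors.emplace_back([&] {
            while (auto req = to_process.pop())
                to_write.push("200 OK for " + *req);  // stand-in for process()
        });
    }
    std::vector<std::string> written;  // touched only by the writer thread
    std::thread writer([&] {
        while (auto resp = to_write.pop()) written.push_back(std::move(*resp));
    });

    for (const auto& r : requests) to_process.push(r);  // stand-in for read()
    to_process.close();
    for (auto& t : processors) t.join();
    to_write.close();  // every producer for to_write has finished
    writer.join();
    return written;
}
```

With 300 simultaneous requests this still runs workers + 1
threads, which is the point of the design; the queues simply grow
while a slower stage catches up.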

I don't make any claims for the real-world application
of the concept, but I think there may be a useful idea
in there somewhere.

Regards,

-- 
Mike Anderson
mka@redes.int.com.mx
+52 473 23730 voice/fax
Guanajuato, GTO, Mexico

"If it looks like a bug, waddles like a bug,
and quacks like a bug, it's a quack!"