Re: [apache-plusplus] Process model ideas for C++ Apache.
> From: Michael Anderson <mka@redes.int.com.mx>
>
>I sent a similar message regarding ideas for a 2.0 process
>model to the new-httpd list, where it was totally ignored.
>Here's your chance to ignore it also.
I didn't ignore it, it's still in my new-httpd inbox to comment on...
>1. threads are self-dispatching through lightweight
> synchronization such as mutexes, condition-variables,
> or semaphores (collectively call a poll); no central
> dispatcher, no main scheduling loop, no select().
Synchronization sucks. Only the kernel knows when it is necessary and
when it isn't (i.e. the kernel knows how many CPUs are in the box).
So this seems like a bad way to start.
select() (well, rather poll()) will have to be there behind the scenes.
You can't get rid of it and still implement the multiple fiber models
(using my terminology). I don't see what is wrong with it.
>2. the number of threads are minimized by specializing and
> abstracting threads by "function" ( accept()or, read()er,
> write()r, processer, etc.),
Ack! I'm already imagining the state-machine hell that Squid is.
Have you looked at Squid? That's awful to understand, and I've coded
to that model before and it's awful to code to. This is why I keep
emphasizing fibers -- they let folks program with a serial mindset,
which is a mindset folks are used to programming in... and they do it
with little performance loss as far as I can tell...
>3. connections are abstracted from network I/O. A connection
> is nothing more than a record in the connection bucket.
> Network protocols (TCP, UnixDomain, UDP, TLI, etc.) are
> instantiated objects. Bucket records are self-contained
> objects which know their state and handle their own concurrency.
Yeah this makes sense and I thought I alluded to it, maybe it was in
mail though. My intention was to develop a server framework, and
the two protocols I was immediately thinking of were HTTP and
(anonymous) FTP.
>4. complete abstraction of messages (requests & responses)
> from connections (network I/O) and the processing of the
> messages. This scheme implies a bucket(s) of message records
> in common or shared memory over which read()ers, and
> write()rs, processors, and other thread-types can operate.
> Bucket records are self-contained objects which know their
> state and handle their own concurrency.
There's that state word...
If I understand you, you want to make the state explicit and managed
by the programmer. By contrast, I really think state should be
implicit and managed behind the scenes by the run-time.
>Acceptors:
>Readers:
>Processors:
>Writers:
So let me give an example why I think this is so damn difficult to use.
I was going to post an example similar to this to new-httpd regarding
one of our Grails: layered i/o. You've got this Reader thread, and
suppose it manages to read 1 byte. That's it, the client is being stingy.
It doesn't know what that byte is, it just hands it off to the Processor
pool. The processor pool immediately says "wtf? I need more than one
byte!" and goes right back to the Reader pool. They go back and forth
like this until the Processor is finally satisfied with the bytes it has.
Every step along the way you've got mutexes being flipped around.
One byte is extreme, but 128 bytes isn't. Neither is 512 if you think
ahead to the later passing of bytes to a CGI.
Ah, but we'll just hide this all behind a method that reads a line of
input, right? But, hey, that's exactly what the implicit state model
does...
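
A minimal sketch of that "read a line" method, to make the point concrete.
It is not code from Apache or NSPR; LineReader, stingy_source and the recv
callable are hypothetical names for illustration. The buffer holds the
partial-read state, so the caller never sees the one-byte-at-a-time steps
and can be written serially.

```python
class LineReader:
    """Buffers partial reads so callers only ever see whole lines."""

    def __init__(self, recv):
        self._recv = recv          # hypothetical callable: recv(max_bytes) -> bytes
        self._buf = bytearray()    # bytes read so far but not yet returned

    def readline(self):
        # Keep reading until a full line is buffered. The "I only have
        # one byte" states live here, not in the caller.
        while b"\n" not in self._buf:
            chunk = self._recv(4096)
            if not chunk:          # EOF: return whatever is left
                line, self._buf = bytes(self._buf), bytearray()
                return line
            self._buf += chunk
        end = self._buf.index(b"\n") + 1
        line = bytes(self._buf[:end])
        del self._buf[:end]
        return line


def stingy_source(data):
    """Return a recv-like callable that hands out one byte per call."""
    remaining = iter(data)

    def recv(_max_bytes):
        byte = next(remaining, None)
        return b"" if byte is None else bytes([byte])

    return recv
```

The caller just asks for lines; how many reads it took to assemble each one
is invisible to it.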
>I don't have any performance comparisons on this
>approach, since I don't have anything to compare it
>with that is roughly equivalent in functionality (to
>my project) and which uses a different approach.
Of course, I don't have many numbers to back up my stuff either.
I've just got the docs and URLs I've mentioned in my process-model doc
and a few other points picked up since then.
>From: Michael Anderson <mka@redes.int.com.mx>
>
>I'd have to study the code and design docs again to identify my
>various sins in regards to C++ conventions, but I believe the
>main ones are these:
>
...
>3. class methods which return the address of private data,
> especially aggregated structures for direct manipulation
> outside the class methods.
Doing that, of course, completely negates one of the advantages of C++
(implementation hiding). Wouldn't inlined get/set methods get you the
same performance benefit while still gaining you the ability to modify
the implementation later?
>Each violation of C++ conventions was carefully considered at the
>time, based on research and efficiency concerns. In retrospect,
>I would probably do fewer of these non-conventional techniques,
>and would justify them based on actual benchmarking instead
>of theory.
:) a lesson I should learn too.
>Read()ers wake-up when the pool condition-variable (CV) is
>asserted; any read()er can handle any connection in the pool.
>So read()ers are scheduled by the randomness of the
>condition-variable (CV) assertion. Read()ers read the
>connection into a message pool, assert the CV for that pool,
>and return to wait for another connection which needs reading.
It sounds like you are describing the NT completion-port/fiber model.
That model can be easily hidden behind a serial programming interface
like NSPR...
>Process()ors and write()rs are similarly scheduled by the CVs
>for the message pool they are working. The potential bottleneck
>and single-point-of-failure that a central dispatcher represents
>does not exist. The failure of individual threads does not
>halt processing, except an individual connection or message
>record may not be processed if the failing thread locked it
>before failing. Those records can be recovered by a timeout
>on the thread.
No, but a common failure mode that could empty an entire pool of
threads is extremely likely, i.e. you're not buying any extra
reliability this way.
>I'm not using poll(), if that's what you're referring to.
>Neither am I polling for connections or messages. The initial
>connection comes from accept() returning. All subsequent
>access to connections and messages are controlled by CVs
>and locks (mutexes). In my theory, no thread should ever
>be using CPU unless they have real work to do.
The implication I'm reading from this is that with an approach such as
what I've got going with apache-nspr right now you believe that CPU is
wasted by idle threads... and that couldn't be further from the truth.
To be honest, there is some wasted CPU, but it's not idle threads.
For example, a write() may block; when that happens the syscall fails
with EWOULDBLOCK, and NSPR puts the thread to sleep and adds the
descriptor to the poll() call it does periodically. The thread
isn't wasting any CPU in between.
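
A rough sketch of that write path, using Python's standard socket and
selectors modules rather than NSPR. The names write_all and demo are made
up for illustration, and the sketch collapses "thread" and "scheduler" into
one loop: where NSPR would put the thread to sleep and poll on its behalf,
this code simply sleeps in the selector itself.

```python
import selectors
import socket
import threading


def write_all(sock, data):
    """Write all of data on a non-blocking socket; return how often we had to wait."""
    sock.setblocking(False)
    sel = selectors.DefaultSelector()
    sel.register(sock, selectors.EVENT_WRITE)
    view = memoryview(data)
    waits = 0
    try:
        while view:
            try:
                sent = sock.send(view)   # raises instead of blocking
                view = view[sent:]
            except BlockingIOError:      # EWOULDBLOCK / EAGAIN
                waits += 1
                sel.select()             # sleep until the descriptor is writable
    finally:
        sel.close()
    return waits


def demo(nbytes=1 << 20):
    """Push nbytes through a socketpair while another thread drains it."""
    left, right = socket.socketpair()
    received = bytearray()

    def drain():
        while True:
            chunk = right.recv(65536)
            if not chunk:                # writer closed its end
                break
            received.extend(chunk)

    reader = threading.Thread(target=drain)
    reader.start()
    waits = write_all(left, b"x" * nbytes)
    left.close()
    reader.join()
    right.close()
    return len(received), waits
```

Between a failed send() and the selector reporting the descriptor writable,
the writer consumes no CPU. How many waits occur depends on socket buffer
sizes and scheduling.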
Now with NT completion-ports/fibers we can do one better -- there is
no need to poll, because a thread picks up the event exactly when it
finishes, and then starts running the fiber which was put to sleep
waiting for that event. That's hidden behind the NSPR interface.
The fact that Unix has to waste some CPU (and has worse latency) is a
shame. It's one of the things that I've been trying to explain to
the Linux folks for some time. The difference is between "wake one"
and "wake all" semantics on unix system calls -- unix calls only give
"wake all" semantics, every single kernel-level-task is awakened when
a poll() or select() can complete. That wastes CPU in synchronizing who
does the work for each completed event. POSIX offers some hope in this
area with real-time signals and a bunch of other crud that look like
they can be glued together to give wake-one semantics (but then I'm sure
the implementations will need to be tuned for this particular (ab)use
of the interfaces).
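
The difference can be illustrated at user level with a condition variable.
This is only an analogy, not the kernel behaviour itself: count_wakeups is
a hypothetical helper, and threading.Condition's notify() and notify_all()
stand in for wake-one and wake-all. One event is posted to a set of
sleeping threads and the sketch counts how many of them wake up to look
for it.

```python
import threading
import time


def count_wakeups(n_waiters, wake_all):
    """Post one event to n_waiters sleeping threads; return how many woke for it."""
    cond = threading.Condition()
    state = {"waiting": 0, "wakeups": 0, "events": 0}
    served = threading.Event()

    def waiter():
        with cond:
            state["waiting"] += 1
            cond.wait()                  # sleep until notified
            state["wakeups"] += 1
            if state["events"] > 0:      # only one waiter finds real work
                state["events"] -= 1
                served.set()

    threads = [threading.Thread(target=waiter) for _ in range(n_waiters)]
    for t in threads:
        t.start()

    # Wait until every waiter is parked inside cond.wait(), then post the event.
    posted = False
    while not posted:
        with cond:
            if state["waiting"] == n_waiters:
                state["events"] = 1
                if wake_all:
                    cond.notify_all()
                else:
                    cond.notify()
                posted = True
        if not posted:
            time.sleep(0.001)

    served.wait()
    if wake_all:
        for t in threads:
            t.join()
        return state["wakeups"]

    with cond:
        woken = state["wakeups"]         # the other waiters are still asleep
        cond.notify_all()                # release them so the threads can exit
    for t in threads:
        t.join()
    return woken
```

With wake-all, every waiter is scheduled and takes the lock just to find
that the single event is already gone; with wake-one, only the thread that
does the work runs.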
But the truth is, I stopped caring about this level of performance detail.
It is not something that we, as folks writing Apache, should have to
implement. It's something that the operating system vendors will realize
as soon as they benchmark Apache. And most of the commercial folks have
already realized it; we don't have to do a thing. NT has completion
ports; and Solaris has LWPs and a dance between user and kernel level
threads which they control completely and present as "pthreads".
NSPR makes me really happy up front because it can hide ALL of these
implementation details, and allows for all the models that I'm interested
in seeing apache support... In fact I would rather extend NSPR to
provide an "accept() which creates a thread to service each connection"
abstraction, because NSPR is at least one step closer to knowing which
of the dozens of models of accept/threadpools are appropriate to the
underlying multiprocessing model.
>Yes, that's the plan, although I've found in my testing that
>a single accept()or dedicated only to accepting connections is
>often enough. I haven't tested this under a heavy HTTP
>simulation yet so it may well require a pool of accept()ors.
A single accept()or cannot take advantage of the parallelism in an SMP box
with multiple network cards. High end stuff, that I think I'm about to
start ignoring for a while in the apache-nspr development... I'm about
to get rid of the entire pool of threads concept in the interest of a
simple accept()/create thread dispatch loop. Then later, after it's
been put into a benchmark situation and it has been shown that multiple
accept()ors would win, I'll consider changing it...
But right now I'm way more interested in simplicity.
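
For concreteness, a minimal sketch of an accept()/create-thread dispatch
loop. It uses Python's standard socket and threading modules, not NSPR or
the apache-nspr code; handle, serve and demo are hypothetical names, the
upper-casing "protocol" is a placeholder, and the loop is bounded only so
the demo terminates.

```python
import socket
import threading


def handle(conn):
    """Per-connection code written serially: read, then write, then close."""
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())


def serve(listener, max_conns):
    """Accept max_conns connections, starting one new thread for each."""
    workers = []
    for _ in range(max_conns):
        conn, _addr = listener.accept()      # blocks until a client connects
        t = threading.Thread(target=handle, args=(conn,))
        t.start()
        workers.append(t)
    for t in workers:
        t.join()


def demo():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))          # port 0: let the OS pick a free port
    listener.listen(5)
    port = listener.getsockname()[1]
    server = threading.Thread(target=serve, args=(listener, 1))
    server.start()
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(b"hello")
        reply = client.recv(1024)
    server.join()
    listener.close()
    return reply
```

There is no pool and no dispatcher state to manage; the cost is one thread
creation per connection, which is the kind of thing a benchmark would have
to justify changing.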
>I'm not sure how Apache does it, but I presume it builds a mask
>of selectable fds for select(), calls select(), searches for
>and dispatches to an idle process, who then calls accept()
>(correct me if I'm wrong.) All the select() processing plus
>the search/dispatch seems like extra overhead.
Apache 1.x single-socket servers never need to call select() to accept.
Remember Apache 1.x is a multiple *process* model. They all just sit
in accept() until they get a request. They use blocking-i/o. The only
use of select() is to probe pipelined connections for a network packet
optimization. In this case it's a single file descriptor, and could
easily be poll() instead.
Apache 1.x multiple-socket servers need to use select() to accept.
But remember, there's absolutely no dispatch. To do that requires fd
passing between processes, absolutely something to be avoided. There is
a lock that prevents multiple processes from doing this select()/accept()
at the same time (because it has starvation conditions we don't want to
deal with in other ways).
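
A sketch of that serialized select()/accept() step, under loud assumptions:
Apache 1.x does this across processes, while this illustration uses threads
and a threading.Lock as a stand-in for the cross-process lock. The names
accept_lock, accept_one, worker and demo are hypothetical.

```python
import select
import socket
import threading

accept_lock = threading.Lock()   # stand-in for the cross-process accept lock


def accept_one(listeners):
    """Take the lock, wait for any listener to be ready, accept, release."""
    with accept_lock:
        ready, _, _ = select.select(listeners, [], [])
        conn, _addr = ready[0].accept()
    return conn                  # request handling happens outside the lock


def worker(listeners, results):
    conn = accept_one(listeners)
    with conn:
        results.append(conn.getsockname()[1])   # port of the listener that served it


def demo():
    listeners = []
    for _ in range(2):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.bind(("127.0.0.1", 0))                # port 0: OS picks a free port
        s.listen(5)
        listeners.append(s)
    ports = [s.getsockname()[1] for s in listeners]

    results = []
    workers = [threading.Thread(target=worker, args=(listeners, results))
               for _ in range(2)]
    for w in workers:
        w.start()
    clients = [socket.create_connection(("127.0.0.1", p)) for p in ports]
    for w in workers:
        w.join()
    for c in clients:
        c.close()
    for s in listeners:
        s.close()
    return sorted(ports), sorted(results)
```

Whoever holds the lock takes the next ready connection itself; nothing is
handed to another worker, so no descriptor passing is needed.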
The only threaded apache I'm familiar with is apache-nspr... and that,
as I just mentioned, is about to see this part overhauled. The poll() in
this case is there, hidden in the NSPR library. But, so what? The poll()
in this case gives a bunch of completed i/o events and they're scanned
and all threads with something to do get their chance.
>The approach I describe assumes that only one accept() returns
>for any connection. I've seen some messages on the Apache mailing
>list that indicates that may not be true (sounds like a serious
>bug to me.) I've never seen multiple accept()s return for a
>single connection in my project, but perhaps I simply
>haven't stressed it enough. Do you know anything about this?
Only one accept() returns for each connection, yes. But if you have
multiple processes in accept() at the same time, various unixes do
Bad Things. We work around those when we have to.
Dean