Re: [apache-plusplus] Process model ideas for C++ Apache.
On Wed, 27 May 1998, Michael Anderson wrote:
> Sorry I haven't kept up with the conversation. I caught a
> cold and didn't look at mail for 2 days ...
No rush, it's better when we only have to write one of these long posts
once a day :)
> Dean Gaudet wrote:
>
> > >1. threads are self-dispatching through lightweight
> > > synchronization such as mutexes, condition-variables,
> > > or semaphores (collectively call a poll); no central
> > > dispatcher, no main scheduling loop, no select().
> >
> > Synchronization sucks. Only the kernel knows when it is
> > necessary and when it isn't (i.e. the kernel knows how
> > many CPUs are in the box).
>
> I should have been clearer here. Connection- and message-
> records in their respective pools are protected by mutex.
> Threads are informed that a record needs processing by a
> condition-variable. I don't understand the relevance of
> the statement, "Only the kernel knows ..."
What I mean by that is that in general, it is only the kernel that is
completely aware of the parallelism available in the hardware. The
application doesn't know if it's running on a single processor box, a dual
or quad SMP box, a NUMA box with 32 processors, whatever. There are a
bunch of operations (e.g. most syscalls) which require synchronization in
the kernel; pretty much everything else doesn't need synchronization in
the kernel. If an application is written to avoid locking then it puts
the onus on the kernel to parallelize things as best as possible... and
that's a great place to put it, because any reasonable kernel has
to do a good job or it won't do well on kernel benchmarks.
Multiprocessing hardware is fairly common today; pretty much every machine
I work with is a dual-CPU Intel box. Intel bought Corollary, and they've
announced that the future for servers is 4-way and 8-way systems. So it's
not unreasonable to plan ahead for these things and try to reduce
application level locking as much as possible.
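For concreteness, here is a minimal sketch of the mutex-plus-condition-variable handoff described in the quoted text above, written in Python for brevity. The `RecordPool` class, the `run_workers` helper, and the use of `None` as a shutdown marker are all made up for illustration; they come from neither proposal.

```python
import threading
from collections import deque


class RecordPool:
    """Hypothetical pool of records guarded by one mutex and one condition variable."""

    def __init__(self):
        self._records = deque()
        # threading.Condition creates and owns its own lock by default.
        self._cond = threading.Condition()

    def put(self, record):
        # Producer: take the lock, queue the record, wake one waiting worker.
        with self._cond:
            self._records.append(record)
            self._cond.notify()

    def take(self):
        # Worker: sleep until a record is available, then remove it.
        with self._cond:
            while not self._records:  # loop guards against spurious wakeups
                self._cond.wait()
            return self._records.popleft()


def run_workers(pool, n_workers, n_records):
    """Start n_workers threads that consume n_records between them."""
    seen = []
    seen_lock = threading.Lock()

    def worker():
        while True:
            record = pool.take()
            if record is None:  # None is the shutdown marker in this sketch
                return
            with seen_lock:
                seen.append(record)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for i in range(n_records):
        pool.put(i)
    for _ in threads:
        pool.put(None)  # one shutdown marker per worker
    for t in threads:
        t.join()
    return sorted(seen)
```

Every `put` and `take` here passes through the same lock, which is the kind of application-level locking discussed above.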
> > select() (well, rather poll()) will have to be there
> > behind the scenes. You can't get rid of it and still
> > implement the multiple fiber models
>
> Your process-model paper defines "fiber" as a user-level
> thread, as opposed to a kernel-level "thread" - are you
> saying that user-level threads such as Chris Provenzano's
> or Frank Mueller's pthreads packages use select() internally?
> Does it matter if it's hidden behind the pthreads interface?
Yup they use select() (or poll()) internally. It doesn't matter to me
that it's hidden, but I was getting the impression that it bothered you...
but that doesn't appear to be the case now. I read your proposal the wrong way.
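To illustrate why select() ends up behind the scenes, here is a rough sketch of a user-level scheduler in Python, with generators standing in for fibers. This is not how Provenzano's or Mueller's packages are implemented; `echo_fiber` and `run_fibers` are hypothetical names, and the sketch only handles fibers waiting to read.

```python
import select
import socket


def echo_fiber(sock, log):
    """A 'fiber': yields whenever it would otherwise block on a read."""
    while True:
        yield sock  # tell the scheduler which socket we are waiting on
        data = sock.recv(1024)
        if not data:  # peer closed the connection
            return
        log.append(data)


def run_fibers(fibers):
    """Resume each fiber only when select() reports its socket readable."""
    waiting = {}
    for fiber in fibers:
        sock = next(fiber)  # run the fiber up to its first wait
        waiting[sock] = fiber
    while waiting:
        readable, _, _ = select.select(list(waiting), [], [])
        for sock in readable:
            fiber = waiting.pop(sock)
            try:
                waiting[next(fiber)] = fiber  # fiber wants to wait again
            except StopIteration:
                pass  # fiber finished
```

The fibers never call select() themselves, but the scheduler has to, because that is the only way it can know which fiber can make progress without blocking the whole process.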
> Are we talking about the same "state" here? I'm referring to
> the state of a message (where it is in the list of processing
> that needs to be done.) You seem to be referring to CPU
> scheduling?
Yeah we are slightly confused. The state of the message in your system is
actually a state of CPU scheduling as well -- because different threads
are scheduled in different states. In your system it appears that a
single message has the latency of at least 4 context switches. I don't
see the need to burden the request path with unnecessary context switches...
> Aha! we arrive at a fundamental misunderstanding and perhaps
> a fundamental difference in the two types of projects (an
> http server and a MOM (message-oriented middleware)). In
> the MOM, the read()er knows the length of the incoming
> message from a header preceding the message data.
Ah, yes, a fundamental difference :)
> I guess an http server only knows it has the entire message
> when the client closes the connection. Does Apache begin
> processing a message before it's all there? Are there byte-by-
> byte I/O optimizations?
Actually it reads header lines until it comes to a CRLFCRLF pair which
terminates the headers. Then it can begin request processing. Some
requests include a message-body (e.g. POST, PUT), which can be processed
later after the initial mapping from url/etc to request handler happens.
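As a rough illustration of the "read until CRLFCRLF" step, here is a small Python sketch. It is not Apache's actual code (which reads header lines one at a time); `split_request` is a hypothetical helper that works on bytes already read into a buffer.

```python
def split_request(buffer):
    """Return (header_lines, rest) once the blank line is seen, else None."""
    end = buffer.find(b"\r\n\r\n")
    if end == -1:
        return None  # headers incomplete; the caller should read more bytes
    header_lines = buffer[:end].split(b"\r\n")
    rest = buffer[end + 4:]  # start of any message-body (or the next request)
    return header_lines, rest
```

Request processing can start as soon as this returns something other than `None`; whatever is in `rest` can be dealt with later by the handler.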
With HTTP/1.1 there is the possibility of pipelining requests, but the
responses have to come back in order. In theory we could achieve better
latency by doing some parallelization and doing some magic to reorder the
responses. But I don't think that's worth it.
HTTP/ng (there isn't even a draft for it yet) includes multiplexing of a
single connection. In this case we do have to do requests in parallel.
Then something like your proposal makes sense. A thread handles
unbundling the input requests and pushes them into serial HTTP/1.x request
processors. There probably isn't a need for an output bundler because the
output mux can be written in arbitrary orderings and we can just mutex
protect the output filedescriptor. (In this case we're only locking
between threads serving a single client, not multiple clients, which
allows us to still exploit parallelism of multiple clients.)
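A minimal sketch of that per-connection output lock, in Python. Since there is no HTTP/ng draft, the framing here (an "id length" line followed by the payload) is entirely made up, as are the `MuxWriter` and `read_frames` names.

```python
import io
import threading


class MuxWriter:
    """Hypothetical output mux: one lock per client connection."""

    def __init__(self, out):
        self._out = out
        self._lock = threading.Lock()

    def write_frame(self, request_id, payload):
        # Made-up framing: "<id> <length>\n" followed by the payload bytes.
        frame = b"%d %d\n" % (request_id, len(payload)) + payload
        with self._lock:  # keep each frame contiguous on the wire
            self._out.write(frame)


def read_frames(data):
    """Split muxed bytes back into {request_id: payload}."""
    frames = {}
    stream = io.BytesIO(data)
    while True:
        header = stream.readline()
        if not header:
            return frames
        request_id, length = map(int, header.split())
        frames[request_id] = stream.read(length)
```

Each connection would get its own `MuxWriter`, so threads serving different clients never contend for the same lock.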
Dean