Re: Process model for C++ Apache
Hi Bret:
This message is turning into a novel, and it seems that
we are using the same words, but talking about slightly
different subjects. I also believe we have a different
understanding of threads, mutexes and CVs (condition
variables.) I believe that you base much of your distrust
of using multiple threads per message on this different
understanding.
Here is a clear, unambiguous statement of my understanding
of the relationship among threads, mutexes, and CVs
(correct me if I'm wrong):
It is NOT TRUE that flipping a mutex or asserting a CV
causes a kernel-context-switch (KCS) in the general case.
This depends on the threading model.
For Dean's fibers (user-level threads), flipping a mutex
or asserting a CV does NOT cause a kernel-context-switch (KCS),
because these are also light-weight user-level artifacts
(it is possible to implement user-level threads to force
a kernel-context-switch at these points, but that defeats
the purpose of user-level.) So for fibers, the start-to-end
model has NO kernel-context-switch (KCS) advantage over the
multiple thread model. Fibers also cannot take advantage
of multiple CPU scheduling.
For hybrid threads, whether mutexes or CVs cause a KCS
depends on the implementation of the threads package. Your
guess is as good as mine, but I would suspect that mutexes
do not ever cause a KCS, and it's probable that CVs also
do not. If this is the case, for hybrid thread packages,
the start-to-end model has no KCS advantage over the
multiple thread model. If this is not the case, the
start-to-end model has a KCS advantage over the multiple
thread model, but this may be counter-balanced by the
hybrid's ability to schedule across multiple CPUs.
Therefore, I believe that the KCS (dis)advantage
of start-to-end vs. multiple-thread is roughly a wash.
Multiple threads per message should allow equal message
processing performance with a large reduction in thread
count, which has a performance advantage on most hardware.
However, start-to-end is easier to conceptualize and
program.
Now, on to the novel:
Mike Anderson wrote previously:
> >Thread pools are a
> >natural extension of the more-important abstraction of
> >connection- and message-pools and the way they abstract
> >connections from network I/O, and messages from the
> >processing of messages. A connection is nothing more than
> >a record in the connection queue; a message is nothing more
> >than a record in the message-queuing system.
Bret wrote:
> I agree with your abstraction regarding a connection and/or
> message queueing system. That's fine... but how does a
> thread pool model become a "natural extension" of that? If
> by "thread pools" you mean your model of thread pools (i.e.
> pools of threads performing a single function), I would argue
> there is at least one other model which is just as natural...
> a single thread handling a single request over its entire
> lifetime.
I mean that pools of connections and messages imply multiple
threads; what's the use of having a pool or queue unless
multiple entities (in this case, threads) wish to look at
and manipulate the pool records?
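
To make that concrete, here is a minimal sketch of a record
queue shared by a pool of threads. It is written with the
standard C++ library (std::thread, std::mutex,
std::condition_variable) rather than any thread package
discussed in this thread, and the names RecordQueue and
drain_with_pool are hypothetical, invented for illustration.
It says nothing about whether the mutex or CV costs a KCS on
a given platform.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <optional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical queue of records shared by several threads.
template <typename Record>
class RecordQueue {
public:
    // Adds a record and wakes one waiting thread.
    void push(Record r) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(!closed_);  // pushing after close() is a usage error
            records_.push_back(std::move(r));
        }
        ready_.notify_one();
    }

    // Marks the queue as finished and wakes every waiting thread.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Blocks until a record is available. Returns std::nullopt once
    // the queue has been closed and fully drained.
    std::optional<Record> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return closed_ || !records_.empty(); });
        if (records_.empty()) return std::nullopt;
        Record r = std::move(records_.front());
        records_.pop_front();
        return r;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<Record> records_;
    bool closed_ = false;
};

// Starts `workers` threads that each pull ints from the queue and add
// them up; returns the grand total once every thread has finished.
inline long drain_with_pool(RecordQueue<int>& queue, std::size_t workers) {
    std::vector<long> partial(workers, 0);
    std::vector<std::thread> pool;
    for (std::size_t i = 0; i < workers; ++i) {
        pool.emplace_back([&queue, &partial, i] {
            while (auto r = queue.pop()) partial[i] += *r;
        });
    }
    for (auto& t : pool) t.join();
    long total = 0;
    for (long p : partial) total += p;
    return total;
}
```

The point of the sketch is only that the queue is the shared
object and the threads are interchangeable: any of them may
take the next record.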
Let me ask this a different way: of what use are
the connection- and message-abstractions of pools to the
start-to-end thread model? If the same thread takes
connection/message from acceptance through writing back the
response, there is no significant difference (abstraction)
between a connection and a message, and a pool is useless
overhead. Since the connection/message mapping to threads is
one-to-one, there also is no significant abstraction between
them. Connection/message/thread is essentially all the same,
so let's throw away those pesky connection- and message-
abstractions.
Mike Anderson wrote previously:
> >Organizing connections and messages in this form allows us
> >to manipulate these records independent of their content;
> >we have meta-connection and meta-message information
> >available.
Bret wrote:
> Okay... we still agree... but independent of your method of
> processing (i.e. the abstraction of each pool performing a
> "function" as opposed to a single thread handling an entire
> request for its entire lifetime), how is this different from
> any other Web server on the planet?
Sorry, I guess I don't understand the question. I think the
connection- and message-abstractions (pools or queues), which
imply the thread abstraction (pool), are different from the
integrated connection/message/thread model used by some Web
servers. Most Web servers already have some sort of pool of
threads or processes, but with a different abstraction concept,
and there are certainly other ways of assigning thread
functionality that are neither start-to-end nor what I proposed.
Mike Anderson wrote previously:
> >How is this useful in an http server? I don't know if it's
> >useful - you'll tell me that. But here are some contrived
> >and exaggerated examples that illustrate uses of connection-
> >and message-queues. Let's also suppose that we have the thread
> >pools as I proposed (acceptors(), readers(), processors() and
> >writers() ).
> >
> >1. The acceptor() accepts a connection, and populates a
> >connection-record in the queue. We want to filter connections
> >by IP address, so we invent a filter() thread who traverses
> >the queue tossing out undesired connections, before the
> >readers() get the connection-records. This is trivial.
> >
> >2. A connection-record may contain multiple requests, so it
> >probably contains a small header. The reader() reads-in each
> >request into a separate message-record in the message queue,
> >where they are found by the processors(). Voila, message
> >multiplexing!
Bret wrote:
> You've now got (at least) three functions, with three condition
> variables, along with three mutexes. You can do the same thing
> with one mutex, wrapped around your accept() call... why
> _introduce_ complexity if it doesn't buy you anything?
If the mutexes and CVs are light-weight (no KCS), who cares? It
is more complex, so we encapsulate the complexity nicely in
classes.
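
As an illustration of how little code the filter() of example
1 needs once connections are records in a queue, here is a
rough sketch. ConnectionRecord, its fields, and
filter_connections are hypothetical names, and matching on a
textual IP prefix is just an assumption made to keep it
short. Locking is left out; in the model I proposed the
filter would run while holding the queue's mutex.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical connection record: one entry in the connection queue.
struct ConnectionRecord {
    int socket_fd;
    std::string peer_ip;
};

// Drops every record whose peer address starts with blocked_prefix.
// Returns how many records were removed.
inline std::size_t filter_connections(std::deque<ConnectionRecord>& queue,
                                      const std::string& blocked_prefix) {
    assert(!blocked_prefix.empty());  // an empty prefix would drop everything
    const std::size_t before = queue.size();
    queue.erase(
        std::remove_if(queue.begin(), queue.end(),
                       [&](const ConnectionRecord& c) {
                           return c.peer_ip.compare(0, blocked_prefix.size(),
                                                    blocked_prefix) == 0;
                       }),
        queue.end());
    return before - queue.size();
}
```

The filter never touches a socket; it looks only at the
records, which is what the abstraction is for.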
I'm really curious to hear how you are going to handle example
2 with a single start-to-end thread; you have multiple requests,
let's say 3, and a single start-to-end thread. Just to start the
ball rolling, are you going to:
1. read in the first message, process it entirely, get the 2nd,
process, then the third? Wouldn't it be nice to do these in
parallel?
2. throw the 2nd & 3rd messages into a message queue? What kind
of thread will find it there? It has to be different because the
accept() and read() have already happened. How will the other
threads know they have work to do?
3. grab the first request and put the remaining stream in the
connection pool for another thread? That will require a different
thread, since the accept() has already happened. How does the
other thread know he has work to do? The other thread will have
to repeat the process for the 3rd request.
4. read in all three requests, instantiate 2 more start-to-end
threads and hand them their request? Now our simple start-to-end
request processor is in the thread management business; will the
first thread monitor the others for successful completion?
5. How would you do it?
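
For comparison, the reader() step of example 2 might look
roughly like the sketch below. It assumes that each request
has no body and ends with a blank line (CRLF CRLF), which is
a simplification; MessageRecord and split_requests are
hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical message record: one request taken from a connection.
struct MessageRecord {
    int connection_id;
    std::string request;
};

// Splits the bytes read from one connection into one record per
// complete request. A trailing partial request is left out; a real
// reader would keep it until more bytes arrive.
inline std::vector<MessageRecord> split_requests(int connection_id,
                                                 const std::string& buffer) {
    assert(connection_id >= 0);
    static const std::string terminator = "\r\n\r\n";
    std::vector<MessageRecord> messages;
    std::string::size_type start = 0;
    while (true) {
        const auto end = buffer.find(terminator, start);
        if (end == std::string::npos) break;  // no further complete request
        const auto length = end - start + terminator.size();
        messages.push_back({connection_id, buffer.substr(start, length)});
        start = end + terminator.size();
    }
    return messages;
}
```

Each record would then go into the message queue, where any
idle processor() can pick it up.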
> >Bret wrote previously:
> >> Multiple handler threads would be excellent here, however...
> >> but what kind of threads?
> >
Mike Anderson wrote previously:
> >Here are some more contrived, ridiculous examples:
> >
> >Remember that connection/message records and the threads
> >in thread pools (except for processors() ) are generic.
> >Acceptors() service only one port, but it can be any port,
> >including a connection to a database server that does
> >user authentication or spits-out HTML. Readers() can read
> >from any connection on the browser side or back-end; same
> >for writers(). Now let's assume we have an httpd which can
> >service 100 requests per second. Based on measurements, we
> >determine that we need 6 acceptors(), three for browsers,
> >three for the back-end. Since browsers can be stingy, we
> >need 15 readers() and 15 writers(). We are going to do heavy
> >database access to three separate DBs on other hosts on the
> >LAN, so we need about 20 processors(). The DB processing is
> >probably the bottleneck in this setup.
Bret wrote:
> I can do the same thing with some type of Socket object
> that doesn't require the introduction of a new thread pool,
> again with all the overhead that that brings. I understand
> that the generic nature of your thread pool is nice, but if
> I can do the same thing with objects _without_ having to
> introduce a new thread every time I want to write somewhere,
> why not do that?
What is the new thread pool you are talking about? We have a
pool of generic threads which are pre-started (although the
pool expands and shrinks according to load) to minimize
thread start-up overhead. These threads service the front- and
back-end (in this example) according to their function. They
make full use of the connection- and message-abstractions for
fully asynchronous operation. I guess I don't understand the
question. How are you going to manage your "pool" of objects?
Mike Anderson wrote previously:
> >3. Complicated requests arrive from the browser, each request
> >requiring 3 database requests, one from each DB server. The
> >acceptors() and readers() do their thing and the processors()
> >get the requests. Each processor() parses his respective
> >request, formulates three DB queries, and throws them
> >simultaneously into the message queue.
> >
> >3. continued: Now the usual procedure for processing would be
> >to make the first query, get the response, make the second
> >query and response, and then the third. Let's say that each
> >query/response takes 2 seconds (yes, contrived and ridiculous),
> >then total request processing is 6 seconds.
> >
> >3. continued: However, since we have a message queue, all
> >three queries/responses are done simultaneously so total
> >request processing is 2 seconds. This parallelism is made
> >possible by the abstraction of messages from message processing
> >and explains why we have pools of readers(), writers(), etc.
> >That pool of readers() is working one browser request and
> >3 back-end DB responses which result from the browser request.
Bret wrote:
> I agree with you when you say "The DB processing is probably
> the bottleneck in this setup..."
> In the case of multiple databases, maybe. But in the case of
> multiple queries to a _single_ database, this just doesn't work.
> Your numbers assume that the database can handle the concurrency
> you're looking for. Let's assume the database only receives
> connections on a single, well-known port... if it gets three
> requests simultaneously, it will most likely queue the requests
> and process them one at a time. At least that's my suspicion.
> So you're still waiting. Only now you've got blocked threads
> sitting around twiddling their thumbs while you're waiting for
> a database to respond to your query.
> If that is the case, what has the concurrency in your program
> bought you but excess overhead?
Hold on here! I presented an example which is not uncommon in
large sites (I've worked on one even bigger than the example)
and showed how intra-request parallelism (as different from the
extra-request parallelism of example 2) is possible with the
connection-, message-, and threading abstractions that I
suggested.
It is not a valid response to rewrite the example so that no
parallelism is possible and then ask "what has the concurrency
in your program bought you".
And just to clarify a point in the rewritten example: since these
generic threads and the processor() thread are asynchronous,
there are no "blocked threads sitting around twiddling their thumbs".
These threads are processing other requests while the DB is doing
its thing. This is how fewer specialized threads can process more
requests than the number of threads. However, the start-to-end
threads will be "twiddling their thumbs" in either the original
or the rewritten example.
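
The intra-request parallelism of example 3 can be sketched as
follows. This is not the message-queue mechanism itself:
std::async stands in for throwing the three queries into the
queue, and run_query just sleeps to imitate a round trip to a
DB server. Both function names are hypothetical. Under the
example's assumption that the three servers really do work
concurrently, the wall time is roughly that of the slowest
query (2 seconds in the example) rather than the sum (6
seconds).

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <string>
#include <thread>
#include <vector>

// Stand-in for one blocking round trip to a DB server (hypothetical).
inline std::string run_query(const std::string& db,
                             std::chrono::milliseconds latency) {
    std::this_thread::sleep_for(latency);
    return "rows from " + db;
}

// Issues one query per DB at the same time and collects the replies
// in the order the DBs were listed.
inline std::vector<std::string> query_all(const std::vector<std::string>& dbs,
                                          std::chrono::milliseconds latency) {
    assert(!dbs.empty());
    std::vector<std::future<std::string>> pending;
    for (const auto& db : dbs) {
        // std::launch::async forces each query onto its own thread.
        pending.push_back(
            std::async(std::launch::async, run_query, db, latency));
    }
    std::vector<std::string> results;
    for (auto& f : pending) results.push_back(f.get());
    return results;
}
```

If all three queries go to one server that serializes them,
as in the rewritten example, the sketch gains nothing, which
is exactly why the original example used three servers.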
Bret wrote:
> On a side note, I'm not sure how well a design such as this lends
> itself to modularization... you're doing _everything_ within
> the server core. I would say you would be better off coding
> that functionality into an external module, perhaps giving the
> module an interface to the "thread pools", and letting them
> perform the query. The module can be implemented however you'd
> like... but I guess I just keep wondering why I would use a
> thread pool model if I can achieve the exact same effect with
> multiple threads of Socket objects...
That is an implementation issue. The same functionality can be
provided in a very modular or a very integrated form, according
to the requirements of the program - in this case an httpd.
By the way, your "multiple threads of Socket objects" sound
suspiciously like my thread pool, with similar management
overhead.
Mike Anderson wrote previously:
> >4. Take the same setup as in example three, but add 3 more
> >redundant DB servers. Since we can derive meta-connection
> >information from the connection queue, we can tell if a
> >DB server is bogging-down and load-balance to the redundant
> >server.
Bret wrote:
> Agreed... I don't think anyone is challenging your connection
> abstraction or your request abstraction... Apache makes hefty
> use of something similar right now. Look through the code and
> check out request_rec and conn_rec (or is it connection_rec?
> I never can remember)... these structs either do or easily could
> perform the exact same functions you're talking about. But again,
> I would argue most of this should be modularized...
Two points here:
1. Certain generic functionalities can be part of the core
because they apply to all messages, but can also be modular
in the sense that it's trivial to remove or modify them. These
are not model issues but implementation issues.
2. I don't see the usefulness of connection- or message-abstraction
using the start-to-end threading model. Help me out here.
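
Going back to example 4 for a moment, the kind of
meta-connection information I have in mind can be as simple
as a count of outstanding requests per DB server. The sketch
below assumes that count is available from the connection
queue; BackendStats and pick_backend are hypothetical names,
and "fewest pending requests" is only one possible policy.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical per-server summary derived from the connection queue.
struct BackendStats {
    std::string host;
    std::size_t pending;  // requests sent but not yet answered
};

// Returns the server with the fewest outstanding requests; on a tie
// the one listed first wins.
inline const BackendStats& pick_backend(
        const std::vector<BackendStats>& backends) {
    assert(!backends.empty());
    const BackendStats* best = &backends.front();
    for (const auto& b : backends) {
        if (b.pending < best->pending) best = &b;
    }
    return *best;
}
```

With a primary that has 12 requests pending and a redundant
server that has 3, the next query goes to the redundant
server.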
Mike Anderson wrote previously:
> >The process model is more
> >than the threading model - it is also how the data is
> >organized.
Bret wrote:
> This distinction is crucial... I accept the second part
> ("how the data is organized"), but I'm highly suspicious of
> your threading model (actually, in the terms you use below,
> I believe I should say "process model"). And, apparently like
> you, I do not view the two as being impossibly intertwined...
Well, we agree on something.
Bret wrote:
> the "process model"
> should be determined by me, the "thread interface" by whomever
> designed my library, and the "thread model" by whatever is best
> for the platform I'm currently on.
We agree again.
From a different message Bret wrote:
> For example, I've got five kernel threads. I've also got twenty
> fibers... for simplicity, say the mapping is even, four fibers
> to a thread. Now, I've got five kernel-based context-switches,
> each of which may include four user-level context-switches. Is
> that really as bad as twenty kernel-level context switches?
You are assuming that there will be twenty KCS when there will
only be five with either start-to-end or multiple-thread models.
Regards,
--
Mike Anderson
mka@redes.int.com.mx
+52 473 23730 voice/fax
Guanajuato, GTO, Mexico
"If it looks like a bug, waddles like a bug,
and quacks like a bug, its a quack!"