Multicasting and Switches

D Yuniskis · 2010-11-08T21:35:07+00:00

Hi,This is the first of a series of related posts.I figured it best to make sure the groundwork is inplace, first...On any *wired* network (save the wireless complicationsfor later) using a star *physical* topology beyond10BaseT, there is a potential for multicast packetsto NOT arrive at all nodes coincidentally (i.e.,ignoring "speed of light" propagation down the wire).Presumably, switches enqueue incoming multicast packetson *all* outbound ports. Since there is no way of knowingwhat's already queued on a particular port, the storagetime in the switch can vary from port to port.Also, the switch can't know a priori if multicast trafficmight *originate* on all ports simultaneously. (Considerthis in light of the previous assertion).So, first question: how do switches handle incomingmulticast traffic (in terms of a real algorithm, notjust an approximate "hand waving" explanation)?This then clears the way for the second question: howto determine the maximum latency presented by a switch?And, the third question: how is the above mentionedpathological case handled (i.e., you can create moretraffic than you have bandwidth to process!)?Or, do switch vendors avoid this with arbitraryrestrictions on their application? (e.g., akin tosegmented hubs in days gone by)I could see how a switch could conceivably monitorIGMP traffic to "eavesdrop" on the virtual connectionsdesired. But, IGMP is only required when routerscome into play (or is this a misunderstanding on my part?).[I should just set up a few multicast hosts and watchwhere the bytes go... :-/ ]

D

D Yuniskis 15 years ago

Sorry, I should have been more precise in my comment. I intended "multicast" to encompass broadcast :-/ I should have said "non-unicast" to be more clear.

Understood. See below

The control packets are broadcast. But, it is unclear as to how the actual payload is delivered (seems to be left to the application to decide?). E.g., the protocol seems to imply each "Need"-ing host can use whatever (supported) protocol to fetch the payload from the "Have"-ing host(s). I don't see anything akin to reliable multicast inherent in the protocol (though, conceivably, a host that loses/drops payload can subsequently reissue a "NeedFile" request).

OK, different usage model than what I was targeting. E.g., consider N (N being large) diskless workstations powering up simultaneously and all wanting to fetch the (identical) image from a single/few server(s). Clearly, a broadcast/reliable multicast scheme would best utilize the network bandwidth (in this case).

Here, I'm looking at the scenario where many devices are powered up simultaneously (as above) and need to fetch images over a network that is already being used for other traffic (MM and otherwise). Or, when their collective "mode of operation" changes at run-time and they need to (all) load an "overlay", etc.

Unicast transfers (of any sort) mean that the "image server" sees a higher load as it has to push the same image out N times. (A P2P scheme shares that load among the nodes themselves but you still have a longer theoretical time until all nodes have valid images -- unless your P2P algorithm carefully schedules which packets go where to maximize network utilization). Ideally, the "image server" would coincide with the "media server" (or at least *one* such media server) so that host is already busy with some sort of load.

I *think* (unsupported) that reliable multicast/broadcast gives you the shortest time to "everyone having a clean image" -- of course, cases exist where any protocol can be boundless.

I was only mentioning wireless in the sense that it can exploit broadcast easily if there is an underlying protocol to govern access to the "medium" (hence the distinction between loose/tight meshes)

Vote

C

Clifford Heath 15 years ago

TCP - it's documented under the heading "TCP file transfers". Either a raw stream is sent in response to a NeedFile request (if the needer advertised a port number), or a simplified HTTP-style raw request-response, if the needer responds to an advertised port in a WillSend promise.

The two forms are needed in case one system has a (software?) firewall.

Everything is checked through SHA-1 hashes. If a file reaches its expected size but the hash doesn't match, the whole file is dropped (since there's no way to know where the error occurred) and the search starts afresh.

Yes.

Yes, but beware that a single dropped packet at the source will cause every recipient to send a NACK. This is the problem with massive wide-area reliable multicast protocols, they get you out of the data fan-out problem but replace it with a NACK fan-in one instead. That's also why some routers have been taught how to aggregate such NACKs.

If you're dealing with tens or hundreds of machines on a LAN, it's probably not an issue - low error rate and it's possible to cope with the NACKs. Even if many of them are dropped, it only requires one to get through and the requested section will be multicast again.

I expect that due to startup timing differences, you'd need to sit and listen for a relevant multicast to start for a few seconds before requesting it... or to wait after receiving such a request for a few seconds before starting to send.

Include identification inside the stream so latecomers realise what they're missing and can re-fetch the earlier parts.

The other thing we considered doing with massive multicast involving overlapping sets of parties was to allocate 2**N multicast groups, and take N bits of the SHA-1 of the file (or channel ID, if not using content-addressing) to decide which IGMP group to send it to. That way a smart router can refrain from sending packets down links where no-one might be interested.

Clifford Heath.

Vote

P

Paul Keinanen 15 years ago

One way would be to break up the message into numbered blocks with CRCs and use a simple carousel to repeatedly broadcast those blocks. This is used e.g. for firmware updates for TV STBs, in which no return channel is available.

Each receiver accumulates blocks and if you did not get all blocks during the first cycle, wait for the next carousel cycle to pick the missing blocks.

If there is a return channel, each slave could initially request all blocks, after one full cycle check which blocks are missing and only request those missing blocks. After each full cycle, the server would check all the update requests received during the cycle and drop those blocks from the carousel which have not been requested, those speeding up the update cycle, finally shutting down the carousel.

If the expected error rate is low, the missing blocks could be asked even with unicasts.

If the expected error rate is high, such as in some radio links with large blocks, a memory ARQ system could be used, in which blocks failing the CRC are stored and if subsequent reception(s) of the same block also fail the CRC check, the previously received blocks are accumulated, until the accumulated block passes the CRC.

Alternatively, the few missing blocks could be transmitted again with a better ECC coding or just send the actual error correction bits to be combined with the ordinary received data block bits (assuming proper interleaving) at the receiver.

Vote

D

D Yuniskis 15 years ago

Yes but all "unicast" (i.e., connection oriented protocol)... the "need-er" and the "have-er" engage in a dedicated dialog.

Hmmm... not sure I see why (though my brain is frozen from lying on the roof for the past hour :-/ )

Understood. You don't split the file into "pieces" (though, conceivably, one could "pre-split" the REAL *files* into smaller "files" at the expense of a tiny bit more overhead...

Yes.

I'm looking at a hybrid approach. As always, the initial assumptions drive the design...

Let "the" image server multicast (or even broadcast, depending on the domain of the recipients) *THE* image. Let hosts that end up "missing" parts of that image request those parts from their peers (assuming *some* peers have received the parts). So, this can happen concurrent with the rest of the "main" image's delivery (i.e., it doesn't need to be a serial activity).

I have to look at the model and how it would apply in mesh networks (where your peer is often responsible for forwarding traffic from other nodes -- i.e., the "image server") to see what the overall traffic pattern looks like. It might be a win for that peer to broadcast/multicast that "piece" in the event other hosts (i.e., those downstream from *you*) have missed the piece as well.

Things seem to get harder, not easier :>

Yes. A short delay and allow "need-ers" to pick up the stream at arbitrary points (by cutting it into pieces) instead of having to listen from the beginning (request the parts you missed, later)

Exactly. And, if they can request those parts from peers to distribute the traffic better...

I'm not sure I follow -- isn't *everyone* interested?

Vote

B

Boudewijn Dijkstra 15 years ago

Or use RTCP timestamps to synchronize the streams.

Latency? Why would you have noticeable latency? You can start playing the media before the buffer is full, then stretch it a bit to allow the buffer to catch up.

Gemaakt met Opera's revolutionaire e-mailprogramma: http://www.opera.com/mail/ (remove the obvious prefix to reply by mail)

Vote

D

D Yuniskis 15 years ago

The back channel, here, would see very little traffic (when compared to the forward channel). So, aside from the requirement that it places on the "image server", its impact is relatively small.

I was thinking of a protocol that could offload this "missed block" portion of the process to peers who *may* have correctly received the block. This should fare well when transposed to a mesh topology -- where your peer may, in fact, be your actual "upstream link" (so why propagate the request all the way upstream if your peer can -- and, ultimately *will* -- handle it?)

Rather than "requesting all blocks", I envision requesting a larger object (e.g., a file or an entire image). *Assume* it will arrive intact at each consumer (concurrently). Then, handle the missing parts as I described above. So, the image server is, effectively, the "peer of last resort" if no other *true* peer can satisfy the request -- this might be handled with something as simple as a timeout (i.e., the image server deliberately ignores these requests for some period of time to allow "peers" to attempt to satisfy it, instead).

Huh? Perhaps you meant "until the failing block also passes the CRC"?

Now you've lost me. Why change ECC (you've got horsepower on the Rx end so a "less effective" CRC doesn't really buy you much of anything)?

Vote

D

D Yuniskis 15 years ago

But you can only synchronize to the granularity that the "buffering discrepancy" in the switch allows.

Each consumer would need to be designed with a "deep enough" buffer to be able to handle any dropped packets, short-term network overload, etc. I.e., if, statistically, it requires T time to restart/resume an interrupted stream then your buffer has to be able to maintain the integrity of the "A/V signal" for that entire duration (else the failure to do so becomes a "noticeable event" to the user).

Consider that some causes of "missed packets" can be system-wide. I.e., *many* nodes could have lost the same packet -- or, packets adjacent (temporally). In that case, multiple (N) retransmission requests can be destined for the server simultaneously. If those are processed as unicast requests, then multiple packet times (N) may elapse before a particular node's request is satisfied.

With video, the user can *tolerate* (though not *enjoy*!) the occasional "frozen frame" -- as long as it doesn't become frequent. (persistence of vision)

With audio, it's much harder to span any gaps. You can't just replay the past T of the audio stream without it *really* being noticeable.

You also have to be able to re-synchronize the streams *after* the "dropout" (imagine two displays/speakers side by side; the image/sound from each must be in phase for a "pleasing A/V experience" :>). So, you can't just "stretch time" to span the dropout.

Vote

Multicasting and Switches

Join the Discussion

Didn't find your answer?