Synchronizing "pushed" multicasts

Hi,

I'm "fine-toothing" the error handling routines on my network audio clients (and thinking ahead to the similar issues for the video clients which follow).

Normally, I design protocols to *pull* data, as needed. So, the issue is one of "making sure the data source can keep up" -- with the demand rate of the consumer(s).

But, if I want to multicast/broadcast ("push"), then the issue becomes one of making sure the consumers can chew up the data as fast as it is being supplied! And, with multiple consumers feeding on the same data stream(s), there's little room for coordinated efforts (i.e., if one consumer is sluggish, it can't tell all the others to slow down to its rate -- esp as *it* may be the client that is malperforming!).

So, remedies have to be local to each consumer (client).

In theory, this should "never happen". The clients' clocks are finely synchronized (also, "in theory") with each other, as well as with that of the producer/server.

But, sh*t happens. I've got to decide how to handle the case of a buffer overrun (as well as underruns).

I can deal with gracefully shutting down the audio (output) as I see the possibility of an underrun. Overruns will be hard to "de-pop" -- unless I throw a heavy filter on the signal during the transition (probably best to just let the artifacts through).

But, what PRO-ACTIVE strategies can I adopt -- if I am willing to enlist the other clients and server -- to minimize the chances of this happening?

E.g., if clients advertise their capabilities, the server can throttle its output to avoid "just missing" adequate space in ALL clients' buffers. (i.e., instead of aiming for 100.0% buffer utilization and risking 100.1%, aim for 99.8% with that little buffer "just in case").

Likewise, the server can inform the other clients of appropriate points at which they should "collaborate" with their peers as buffer utilization rises and falls.

How do other streaming multicast protocols handle this problem? Or, do they rely on huge buffers (i.e., disks!) in their clients?

--don

Reply to
Don Y

IME, they just send the data out at the correct rate and expect the clients to keep up.

If audio was sampled at 44.1 kHz, then it needs to be played back at (close to) that rate, and that's the rate at which the server will send it. It's the client's responsibility to maintain enough of a buffer to deal with latency jitter.

As for minor rate mismatches: if it's pre-recorded data where each client is independent, the server might adjust the timing based upon feedback from the client or it might not. If it's real-time or multi-cast (or if feedback just isn't used), the server sends according to its own clock and the client has to run some form of DPLL (and perform resampling if the audio hardware clock doesn't support fine-grained adjustment).

The one constant is that the server won't perform fine-grained resampling or re-encoding. For pre-recorded data, they may or may not synchronise the timing to the client, but won't touch the actual audio data. For "live" data (e.g. VoIP), they'll typically have a fixed set of sample rates, and a fixed set of resampling/encoding filters (i.e. a specific rate and/or encoding quality), so everything is locked to the server's audio hardware clock with coarse factors and the client has to adjust.
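
For concreteness, here's a minimal sketch (in C, with invented names and constants) of the kind of buffer-fill-driven rate trim such a client-side DPLL reduces to; a real loop would filter the error and clamp the slew rather than using a bare proportional term:

    /* Hypothetical sketch: trim the playback rate from buffer occupancy.
     * Names and constants are illustrative, not from any real stack.      */
    #include <stddef.h>

    #define NOMINAL_RATE_HZ  44100.0
    #define TARGET_FILL      0.5      /* aim to keep the buffer half full  */
    #define LOOP_GAIN        0.0001   /* small proportional gain           */

    /* Called once per output block with the current buffer occupancy.
     * Returns the rate (Hz) the resampler/DAC trim should run at now.     */
    double trim_playback_rate(size_t samples_buffered, size_t capacity)
    {
        double fill  = (double)samples_buffered / (double)capacity;
        double error = fill - TARGET_FILL;   /* >0: filling up, speed up   */

        /* Proportional correction, roughly 1 ppm per percent of fill error.
         * A real DPLL would low-pass the error and add an integral term.  */
        return NOMINAL_RATE_HZ * (1.0 + LOOP_GAIN * error);
    }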

Reply to
Nobody

So, discrepancies in the clients' local clocks are left to the clients to resolve (?)

The buffer just compensates for variations in the transport service (as well as providing a "last chance" at error recovery).

You still have to deal with the fact that data can come at you at a rate slightly greater than your consumption rate (because clocks will never be completely syntonous) which will inevitably overflow your buffer.

The consequences of *what* data gets discarded in that case can be significant (e.g., do you discard *one* sample? Or, an entire *packet* of samples? *Which* sample do you discard: the newest? oldest?)

Similarly, if your clock is slightly *faster* than the producer's clock, you will eventually deplete your buffer.

And, if you can't compensate for the discrepancy between the clocks (*completely*!), then this problem will repeat. Periodically.

E.g., if your buffer has filled, then a "fast producer" will result in another dropped sample in time proportional to the error in the consumption period vs producer period. I.e., if producing at 50.001 kHz and consuming at 50.000 kHz (friendlier numbers to work with :> ), off by *1* Hz, then this problem recurs at a ~1 Hz rate!

(assuming you *only* discard ONE sample in this overrun and not an entire packet)
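
As a back-of-the-envelope check on that recurrence (purely illustrative C, not anyone's production code), the interval between drop events is just the number of samples sacrificed per event divided by the rate surplus:

    #include <math.h>

    /* Hypothetical helper: seconds between overrun events for a
     * drop-N-samples policy, given producer and consumer sample rates.   */
    double overrun_period_s(double f_producer, double f_consumer,
                            unsigned samples_dropped_per_event)
    {
        double surplus = f_producer - f_consumer;   /* extra samples/sec  */
        if (surplus <= 0.0)
            return INFINITY;            /* consumer keeps up (or drains)  */
        /* e.g. (50001 - 50000) Hz, drop 1 sample    -> every 1 second;
         *      same error, drop a 128-sample packet -> every 128 seconds */
        return (double)samples_dropped_per_event / surplus;
    }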

This gets more complicated when the server's clock isn't "authoritative" across the system. E.g., when the server's clock is, also, trying to track some "master clock".

In this case, you don't know whether the server's or (a) client's clock is "more correct". And, have to account for variations in each as those clocks are dynamically adjusted to track that "master clock".

I'm not worried about resampling or transcoding. That can be upstream from this without affecting the protocol.

Reply to
Don Y

You can't (well, shouldn't) discard or repeat whole samples as this is very audible because of the phase jump. What you need to do (if you can't adjust the DAC sample rate) is fractional interpolation or decimation depending on whether the buffer is filling up or emptying.
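
Something like the following (a hedged sketch in C, using plain linear interpolation and made-up names; a real implementation would use a polyphase filter) is what that fractional interpolation/decimation amounts to:

    /* Minimal linear-interpolation resampler driven by a ratio near 1.0.
     * Illustrative only; a real implementation would use a polyphase
     * filter and handle channel interleaving.                             */
    #include <stddef.h>
    #include <stdint.h>

    /* Produce up to n_out samples from n_in input samples.  'ratio' is
     * input samples consumed per output sample: >1.0 drains the buffer a
     * little faster (use when it is filling), <1.0 a little slower.
     * '*phase' is the fractional read position, carried between calls by
     * the caller (subtract the integer part after discarding input).      */
    size_t resample_linear(const int16_t *in, size_t n_in,
                           int16_t *out, size_t n_out,
                           double ratio, double *phase)
    {
        size_t written = 0;

        while (written < n_out) {
            size_t i = (size_t)*phase;           /* integer read index    */
            if (i + 1 >= n_in)
                break;                           /* need more input       */
            double frac = *phase - (double)i;
            out[written++] =
                (int16_t)((1.0 - frac) * in[i] + frac * in[i + 1]);
            *phase += ratio;                     /* fractional advance    */
        }
        return written;
    }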

Reply to
Rocky

The transmission is slaved to either the server's clock or the client's. The former is easier and more common (the main exception is streaming pre-recorded data over either TCP or something with similar flow control).

If you have fine-grained control over the DAC clock (e.g. 1Hz multiplied by an integer factor or tens of MHz divided by an integer factor), you can just tweak the clock to maintain synchronisation.

Otherwise, you need to emulate a fine-tunable DAC by resampling the data (with interpolation; discarding or duplicating whole samples is too noticeable).

The issue isn't limited to networking. Any decent digital audio program will use similar techniques in order to support multiple sound cards. Most consumer-grade cards don't allow the sample clock to be locked to an external hardware source, so you have to deal with e.g. one card running at 47999 Hz and the other at 48001 Hz. This is a simpler case, as the discrepancies are much lower and more stable, but the same techniques apply.

In which case, replace "server's clock" with "ADC clock" throughout my previous reply. The server won't care how the sample clock relates to its own clocks (CPU clock, RTC, ntpd, etc); it will just send the data 1:1. The sample clock is what the client "sees".

Similarly, the client doesn't particularly care about its own CPU clock, ntpd, etc; what it really cares about is the DAC clock. The time-base for audio software will be the buffer-flip events as each frame of data is consumed.

Reply to
Nobody

No. Audio streaming protocols don't just push raw data, they also embed clocking information so the client can adjust the playback rate.

The buffer has to be big enough to absorb some extra data - particularly in a broadcast/multicast situation where there are no individual client-server connections.

That's why audio streams have embedded clocking.

Embedded clocking will handle playback rate (assuming the clients have enough horsepower). The more important issue is network delays. Even in a LAN, a broadcast/multicast that passes through a router will not necessarily be delivered to all listeners simultaneously ... the router needs to copy the packets to all ports and that takes time.

In most audio streaming protocols the data in each packet partially overlaps the previous packet, so that if a packet is lost or the client gets behind and has to drop a packet, the amount of unique data lost is limited and the client can resynchronize playback on the next packet.
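
As an illustration only (field names and sizes are invented, not taken from any particular protocol), an overlapping, timestamped packet might look like:

    /* Hypothetical wire format, for illustration only: each packet carries
     * a media timestamp and repeats the tail of the previous packet, so a
     * lost packet costs only (PAYLOAD - OVERLAP) samples of unique data.  */
    #include <stdint.h>

    #define PAYLOAD_SAMPLES  256
    #define OVERLAP_SAMPLES  32    /* duplicated from the previous packet  */

    struct audio_packet {
        uint32_t seq;              /* packet sequence number               */
        uint64_t media_time;       /* stream sample index of payload[0]    */
        int16_t  payload[PAYLOAD_SAMPLES];  /* first OVERLAP_SAMPLES repeat
                                               the previous packet's tail  */
    };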

Your protocol should not care about real world clock synchronization. It should only care about *causal* clock synchronization.

George

Reply to
George Neuner

Regardless, it's still an artifact that repeats regularly. The "solution" is to get the two clocks syntonous.

IMO, you need some mechanism of coordinating efforts of everyone sharing that clock -- even if that clock is slaved to some *other* clock.

I.e., perhaps create a *virtual* clock that can free-wheel wrt the *real* "shared clock" (though that just moves the problem one step further away)

Reply to
Don Y

Correct. What I've done is put a timestamp in each "packet" (call them packets, frames, ... whatever) that represents its location, temporally, in the (unending) stream of packets.

Each client uses that information plus its own "bias" to determine when that packet (the samples within it) should hit the D/AC. I.e., two clients may be "playing" the same audio stream but with a deliberate skew of 50 ms.

Yes. And, in my case, the "potential bias/skew" is in addition to the above.

But, if the producer's average/net data rate exceeds that of (a) consumer, sooner or later, the buffer will overfill (or, in the opposite case, UNDERfill)

Yes, but that relies on the individual clocks being completely syntonous.

Yes.

Any time you have extra or missing data, you have an audible artifact. If this happens with any regularity, it can become noticeable. E.g., something "beating".

The problem comes into play when neither the "producer" nor "consumer" can be regarded as "authoritative". I've tried to work around this by creating a "system time" and treating EVERYTHING as a slave to that. I.e., the server says, "play this packet at system time XXXX" -- as *its* "clock" is slaved to that system time and it *knows* that the clients' clocks are, too. I can avoid dealing with network latencies, processing latencies, etc. as long as the client(s) and server(s) have a way of mapping "system time" to their individual local realities.
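
A minimal sketch of that rule, combining the "play at system time XXXX" timestamp with the per-client skew mentioned earlier (the names and the microsecond unit are assumptions, not my actual protocol):

    /* Sketch of the scheduling rule (names invented): the packet carries
     * its presentation time in shared "system time", and each client adds
     * its own configured skew before handing samples to the D/AC.         */
    #include <stdbool.h>
    #include <stdint.h>

    struct packet_hdr {
        uint64_t play_at_us;       /* system time the first sample is due  */
    };

    static int64_t client_bias_us = 50000;   /* e.g. a deliberate 50 ms skew */

    /* True once this packet's first sample should be clocked out.          */
    bool due_for_dac(const struct packet_hdr *hdr, uint64_t system_time_us)
    {
        return (int64_t)(system_time_us - hdr->play_at_us) >= client_bias_us;
    }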

E.g., imagine listening to two audio streams on your PC. From two different servers. Which server's "clock" is "right"? (using clock in the sense of the "timebase for the particular audio stream")

Reply to
Don Y

Not really - unless you mean 'repeats every sample'. In the example you give, you suggest discarding or repeating about 1 sample per second. What I am suggesting is discarding or repeating 1/44100th of a sample per sample (assuming a 44100 Hz sample rate), so effectively the incoming stream is re-sampled to the rate of the client. The effect of this is that playback occurs at the pitch of the server, without periodic hiccups.

Reply to
Rocky
[attributions elided]

Ah, sorry, I thought you were talking about fudging the "extra" sample (as in my example).

The point I'm making is the client and server *think* they are operating at the same *exact* rate. "In theory", the clocks are syntonous. In *practice*, there can be some bias in one or the other. And, that bias can *change* (depending on how their respective PLLs synchronize to the *master* clock).

You (a client) only find out about this when you find yourself with more data than time to "play" it! And, then, it's too late to actually *do* anything about it (except make a note and hope your note helps you compensate in the future).

I can't see handling this completely locally without the cooperation of the server (producer) and other clients (consumers) that are also impacting the producer.

Reply to
Don Y
[snipped]

In the old plesiochronous telecomms networks, this is exactly what was done, and almost nobody's ears could tell when it had happened...

Reply to
RCIngham

Maybe I'm missing something obvious but why would you wait that long?

Wouldn't you set a threshold (at some fraction of the buffer size) and, if the data exceeds that, start consuming it faster; and if it falls below a lower threshold, start slowing your consumption? That gives you the ability to compensate for variation without needing abrupt changes to behaviour.

Robert

Reply to
Robert Adsett

Because you don't (yet) know that data is coming in too fast (until the buffer overfills).

E.g., you aren't guaranteed to *see* the network traffic at precise intervals (defined by: sample rate / packet size). You might have a lull followed by a burst of data. All you care about is "running out of data" and "running out of space".

Your consumption rate is dictated by the sample rate *of* the data. E.g., if you are reproducing data sampled at 1 kHz (easy number), then you push a datum out every 1.000 ms.

In theory, the producer is giving you data at the same average rate. But, if (even in the short term) you start seeing data at what *appears* to be 0.999ms, then data is coming in at 0.999ms/sample and going *out* at 1.000ms/sample meaning you will eventually fill your buffer.

Reply to
Don Y

That's why you have thresholds.

OK

Yes, and if you hit your threshold you speed up your consumption in some fashion. That keeps you from overflowing the buffer and synchronizes you to the actual rate of the producer. Whether the increased rate is temporary or not doesn't matter. If it reaches the threshold you act as if it is going to last long enough to be a problem. If and when it drifts back you can compensate again. You keep adjusting the consumption rate to keep your buffer between the thresholds.

The size of the buffer and the threshold points would be a matter of your packet size, the maximum variability in rate you are willing to accept, the maximum clock drift acceptable, and probably other items I'm not thinking of. Probably a matter for queueing theory, but you could probably approximate it.

You could think of the room between the thresholds as your "real" buffer and the room beyond the thresholds as extra buffer to give you time to respond.

If you don't have any method of communicating all of the rate variation in advance, you must react to it. You could search for higher-order patterns, but I suspect that even if they exist you get most of the benefit by simply acting as if the recent past is indicative of the near future -- the far future not really mattering, and the far past not being a useful predictor.
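
A hedged sketch of that watermark scheme, with entirely made-up numbers -- the point being the hysteresis, not the particular constants:

    /* Sketch of the watermark scheme with invented constants.              */
    #include <stddef.h>

    #define HIGH_WATER  768    /* samples: start draining faster above this */
    #define HIGH_RELAX  640    /* back to nominal once we fall below this   */
    #define LOW_WATER   256    /* start draining slower below this          */
    #define LOW_RELAX   384    /* back to nominal once we rise above this   */
    #define NUDGE       0.001  /* +/- 0.1% trim on the consumption rate     */

    /* Returns the factor to apply to the nominal consumption rate.         */
    double consumption_rate_factor(size_t occupancy)
    {
        static double factor = 1.0;

        if (occupancy > HIGH_WATER)
            factor = 1.0 + NUDGE;          /* buffer filling: speed up      */
        else if (occupancy < LOW_WATER)
            factor = 1.0 - NUDGE;          /* buffer draining: slow down    */
        else if (factor > 1.0 && occupancy < HIGH_RELAX)
            factor = 1.0;                  /* recovered; relax              */
        else if (factor < 1.0 && occupancy > LOW_RELAX)
            factor = 1.0;

        return factor;
    }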

Robert

Reply to
Robert Adsett

E.g., you are expecting samples at 1 Hz (simple numbers). At t=0.00 you get 3 samples; these are/will be consumed at t=0 (?), t=1 and t=2.

At t=2.95 you get 3 more samples; these will be consumed at t=3, 4, 5. And, since the first of these is consumed shortly afterward (i.e., at t=3), there hasn't been an underrun. At t=3+ you have 2 samples in the buffer.

At t=3.90 you get 3 more samples. We now have *5* in the buffer. Is this too many? These will satisfy our needs for t=6, 7, 8.

At t=3.95, you get 2 more samples. For t=9 & 10. Now we have 7 backed up in our buffer. Too many, yet?

At t=10.5 we get 1 more sample (for t=11). Suddenly, that backlog of 7 samples doesn't look as "wasteful".

Will the *next* sample come before t=12? Or, will *10* more samples come in that timeframe?

What should the threshold be?

But the producer may be the device in error! What if *your* clock is dead nuts but the producer is "running fast"? If you unilaterally speed up (essentially resampling the audio stream), then you distort the signal. The "smarter" move would be for you to discard any samples that overflow your buffer and complain to the producer. The producer could conceivably resend the discarded samples and/or adjust its clock (esp if it gets complaints from other clients on the same multicast stream!)

E.g., why should N clients all distort a signal (possibly *differently*) when the problem lies with the producer?

I think just letting the client rerequest a packet -- even if that packet has to be unicast-ed -- lets the client stick to its notion of reality (assuming it *thinks* its clock is stable) and implicitly "leaks" information to the server that there must be a clock deviation for this event to have happened (or, the client simply *lost* the packet)

Reply to
Don Y

If you waited that long, you've waited _too_ long.

You need to react a lot earlier than that.

You absolutely need to make your buffers large enough, and wait long enough after the start of data transmission before you start playback, that (within expectations / guarantees of your network) you'll have time to react _before_ the buffer runs empty or overflows.

No. The sample rate gives you your _average_ consumption rate, but doesn't nail your receiver to exactly that figure.

If the transmission rate fluctuates around that, you need at least one of two things:

1) a mechanism to _change_ the consumption rate, i.e. you need to be able to resample the data to reconstruct missing data, or cleverly speed across excess data;

2) a buffer that's large enough to cover the maximum time any sample is allowed to arrive _both_ early or late. You fill that buffer to about the 50% mark before beginning playback.

The task for extra points is to have both, and to start using mechanism 1) before mechanism 2) fails.

Which is why the source _has_ to send a master clock along with the data, so the source and receiver will eventually agree on how long that millisecond actually is. The receiver doesn't get to make up its own mind on sample timing. The source defines that for all receivers.
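
For what it's worth, the sizing rule above reduces to something like this back-of-the-envelope helper (illustrative C; the jitter bounds are whatever your network lets you assume):

    /* Back-of-the-envelope sizing per the rule above; the jitter bounds
     * and the example numbers below are assumptions, not measurements.     */
    #include <stddef.h>

    size_t buffer_samples(double sample_rate_hz,
                          double max_early_s, double max_late_s)
    {
        return (size_t)(sample_rate_hz * (max_early_s + max_late_s));
    }

    size_t prefill_samples(size_t buffer)
    {
        return buffer / 2;     /* begin playback from roughly 50% occupancy */
    }

    /* e.g. 44,100 Hz with +/-40 ms of tolerated jitter:
     *   buffer_samples(44100.0, 0.040, 0.040) == 3528 samples,
     *   prefill_samples(3528)                 == 1764 samples (~40 ms).    */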

Reply to
Hans-Bernhard Bröker

Not enough information. You haven't specified the maximum variation you are prepared to/have to tolerate from your source. Once you define that (note that it is a design decision), you can determine the size of buffer you need and where you place the thresholds. As I said, I'd look at queueing theory to see if it has methods for estimating buffer size. Similarly, stocking-level determination -- although I think that falls back on queueing theory if it's doing much more than an ad-hoc determination.

Whatever you define it to be, working from the tolerances you need.

So what?

Yes, but the assumption is that you are only going to correct for deviations within your tolerance. Sound quality presumably forms part of that determination. If no distortion is allowed, then you cannot change rate to accommodate different clocks.

You've already stated you don't know whose clock is correct so complaining isn't much help except as a diagnostic. In that case the only reasonable clock to use is the source clock. If you need to determine a trustworthy clock that's probably done better and cleaner separately from the audio stream.

That pretty quickly degrades from multicast to N clients to a set of unicasts.

Robert

Reply to
Robert Adsett

It doesn't matter. If your producer supplies data at ANY rate faster than you consume it, ANY (finite) buffer size will eventually overflow.

E.g., if you have a bucket that will let liquid flow out of it at 1 gallon per minute and you have 1.0000000000000000001 gallon per minute flowing into it, sooner or later, regardless of bucket size, it will overflow.

And, a "well behaved" producer can *be* well behaved up to (and including) the point where it gives you one datum too many.

You can artificially move that overflow point (i.e., by pretending the bucket is smaller) but that doesn't change the fact that you *will* overflow (sooner or later).

See above.

Then *you* aren't the problem. Why would you want to tweak your clock (effectively) when the problem lies elsewhere? That has consequences for other things that might be dependent on that clock.

Replace "audio" with "video". Do you come up with a *different* solution? Do you resample the video stream? Do you just drop frames periodically?

Exactly. *That's* why your complaining *is* helpful. It tells the other components of the clock synchronization system (other nodes, etc.) that what you see looks wrong. E.g., if *every* client tells the producer that "you're running fast", perhaps it changes the time constant of its PLL and tries to improve its sync to "The Real Time".

Alternatively, *it* can make the changes in the audio stream on behalf of the N clients.

Only if the producer/server *lets* it. Remember, it's a distributed system so there is knowledge (and processing power) in many places. Instead of all being in one processor and its "state", it is spread around among several (hence the reason you *want* consumers to "complain")

Reply to
Don Y

Hi Don,

I was going to post something similar to Hans-Bernhard Bröker's message, but he beat me to it 8-) I wanted first to check my references to see if there was any reasonable way to achieve what you want without much modification of your design.

Unfortunately, as you probably suspected, there is no way, even theoretically, to *guarantee* time-synchronized communication in an asynchronous system. It can be done in a synchronous system, but even there it can only be guaranteed from within a communication stream, not from the point of view of an external observer.

You can solve the client buffering problem if you are willing to abandon constant rate transmission and broadcast/multicast. I'm sure you are familiar with all of this, but I'll go through it anyway.

You can't simply push packets to the client at a constant rate - you need an active flow control mechanism. The client's stream buffer at minimum needs a low water mark. When the buffer falls below the low mark, the client requests that the server burst enough packets to fill up the buffer again. You can also use a high water mark to protect against overfilling the buffer, but technically it isn't needed if the server knows or the flow control protocol can specify the burst limit. The mark can be adjusted based on the playback rate, but its position should accommodate the time needed to begin receiving the next burst of data from the server. Regardless of how fast the buffer is being emptied, the flow control protocol should (try to) keep the buffer from becoming empty until the end of the stream.

One complication is that flow control makes broadcast/multicast impossible ... to work correctly the server needs to deal with each client individually. One way to handle that without a lot of extra gyrations on the server is to use a stateless protocol in which stream packets are numbered (e.g., by file block) and the client can say, e.g., "send me M packets starting at #N".
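
As a sketch only (message fields and numbers are invented, not from any existing protocol), that stateless re-request might look like:

    /* Sketch of the stateless re-request: stream packets are numbered and
     * a client below its low-water mark asks for a burst by block number.
     * Field names and constants are invented.                              */
    #include <stddef.h>
    #include <stdint.h>

    struct burst_request {
        uint32_t stream_id;    /* which stream                              */
        uint32_t first_block;  /* "#N": first packet/block wanted           */
        uint16_t count;        /* "M":  how many packets to burst           */
    };

    /* Client side: build a request whenever occupancy dips below the mark.
     * Returns 1 if *req should be unicast to the server, else 0.           */
    int maybe_request_burst(size_t occupancy, size_t low_water,
                            uint32_t next_block, struct burst_request *req)
    {
        if (occupancy >= low_water)
            return 0;                      /* still comfortable             */
        req->stream_id   = 1;              /* illustrative                  */
        req->first_block = next_block;
        req->count       = 8;              /* enough to refill, by assumption */
        return 1;
    }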

Another complication is that a client handling multiple streams with differing playback rates may require separate buffering for each stream.

The client's clock is irrelevant. Playback clocking is always stream relative and the client is slaved to the server's notion of time.

Each server is correct within its respective data stream(s). It is usual in a distributed system that no clock can be considered globally correct. This is the reason for using logical clocks and causal delivery protocols.

It is nearly impossible to synchronize clocks in a distributed system - it can be done painfully using an external time source and paying very careful attention to cable lengths, but it is so difficult that it rarely is attempted with devices that are not in close proximity.

George

Reply to
George Neuner

But you can't! You're trying to tune a rubber band (similar to pushing on a string). You never know if your current buffer state represents the *actual* data rate *at* the producer (because the network fabric is elastic -- as well as the code running in the server)

E.g., if you receive 512 samples at t=0 and then receive the next 512 samples at t=512.000000 (assume t is effectively the SAMPLE clock), does that tell you that you are receiving samples at *exactly* the correct rate? Or, was that second packet actually supposed to arrive at t=511.8 but it was just delayed in a switch somewhere along the way? Or, a bit late working its way down through the server's network stack?

You can increase the size of the buffer and *hope* to catch some sign of this "premature delivery" (i.e., having a packet arrive 511.8 after the previous one) but that still doesn't tell you anything conclusive.

But you never have foreknowledge! You can only tell that your buffer has run dry when it actually *does* run dry! Likewise, you can only tell you have had an overflow when it actually *does* overflow.

E.g., if I have one sample left in the buffer, the arrival of the next packet within ~1 sample time from the point at which that "last sample" is passed to the D/AC will avoid an underflow. I.e., if the *next* sample becomes AVAILABLE before the next D/AC update is required, all is well (for now -- just barely!).

Similarly, as long as enough *space* becomes AVAILABLE in the buffer (depends on size of next -- as yet unreceived -- packet) *before* the next packet arrives, you avoid an overflow.

The underflow is the more dangerous situation because you have no time left once the buffer is empty. OTOH, the overflow condition has you "flush" with data, which buys you time to take remedial action -- that time dependent on the sample rate and the size of the buffer.

3) a mechanism to bring the consumption and production rates into harmony!

I.e., if this *has* happened, there's a good chance that it is going to happen *again*. Quite probably very *periodically*! You've done nothing to solve the problem -- just patched over the symptoms.

That (last statement) assigns the source to the role of Omnipotent. At the very least, it kicks the can down the road (what happens if you have two sources feeding a client?)

I look at this, instead, as a control system. What we're trying to control is the *clock* (the samples are just a consequence).

You have a producer synchronizing to some Grand Master Clock (GMC) inherent in your timing system (because you need a global sense of time among all of your nodes -- to ensure each client is pushing the same data out their respective D/AC's at the "same" instant).

The consumers (clients) also synchronize to this GMC. This allows them to talk to *any* producer without having to alter their sense of "current time".

These represent C+P (Nconsumers + Nproducers) control loops executing in parallel against the same input signal (the GMC).

[note that the GMC is considered authoritative -- it "can do no wrong"]

Now, when a consumer-producer relationship(s) is created, you have created new control systems -- consumers trying to synchronize to their producers ... or, vice versa!

Instead of treating the consumer (or producer) as the "slave" and trying to bias its local sense of "current time", instead, use those observations to inform the existing servo loops in consumer and producer as they track the GMC!

For instance, this can cause a time constant in one of their PLL's to be sped up to decrease the phase capture time. You would normally want this Tc to be long/slow -- since the clocks shouldn't change much over time. But, if you *know* "something is wrong", you might want to open up the capture range so the PLL updates VERY quickly -- which normally would increase the risk of locking onto "noise".

Once your producer(s) and their client(s) are happy with each other, this second order control can be removed.
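
As a hedged illustration of that "open the loop up when something is known to be wrong" idea -- a first-order software PLL whose gain is widened while complaints are outstanding and narrowed back afterward (gains, units and names are assumptions):

    /* Illustration only: a first-order software PLL with a switchable
     * loop gain.  Gains, units and names are assumptions.                  */
    struct soft_pll {
        double offset;         /* current estimate of local clock offset    */
        double alpha;          /* loop gain: larger == faster capture       */
    };

    #define ALPHA_SLOW  0.01   /* normal tracking: trust the local clock    */
    #define ALPHA_FAST  0.20   /* complaints pending: re-capture quickly    */

    void pll_set_urgency(struct soft_pll *pll, int complaints_pending)
    {
        pll->alpha = complaints_pending ? ALPHA_FAST : ALPHA_SLOW;
    }

    /* One update per timestamped packet: 'error' is the difference between
     * when the packet said it should play and our local estimate of "now". */
    void pll_update(struct soft_pll *pll, double error)
    {
        pll->offset += pll->alpha * error;   /* exponentially weighted step */
    }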

IN THE MEANTIME, you can implement mechanisms that allow the client(s) that have overflowed their buffers to recover those dropped samples *while* the PLLs are getting into tighter lock.

[here is a perfect example of how *which* sample you drop can have a marked impact on overall quality! If, for example, you are not going to try to recover from this overflowed buffer (i.e., some sample(s) WILL be lost), then it is smarter to drop the sample closest to the D/AC (the oldest sample)! I.e., you *know* something is going to be lost, so throw away something *old* in the hope that hanging onto the newest sample will result in a continuous stream of uninterrupted data hereafter.

OTOH, if you *are* going to try to recover, then discard the newest sample in the hope that you can re-retrieve it from the producer when you have space for it. I.e., gamble that you won't have to drop *any* samples.]
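
Reduced to a sketch (purely illustrative; the enum and names are invented), the two overflow policies are just a toggle on whether a re-fetch will be attempted:

    /* The two overflow policies as a toggle (purely illustrative).          */
    #include <stdbool.h>

    typedef enum {
        DROP_INCOMING,     /* refuse the newest arrival; hope to re-fetch it */
        EVICT_OLDEST       /* a loss is certain; sacrifice the oldest sample */
    } overflow_action;

    overflow_action on_overflow(bool will_refetch)
    {
        return will_refetch ? DROP_INCOMING : EVICT_OLDEST;
    }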

Reply to
Don Y
