another bizarre architecture

[....]

There are special drivers for serial ports that can achieve that sort of timing.

The trend these days is to offload the work from the PC to some external box. This way you can have the PC only set the parameters and run the user interface. The actual work is done by a much more capable processor such as an 8051.

--
kensmith@rahul.net   forging knowledge
Reply to
Ken Smith

As everything else, this approach has its limits.

  1. Once the number of states in the state machine gets over a hundred, the code becomes very difficult to manage. The dependencies keep growing, so changing anything can be a pain in the butt, and it is almost impossible to verify all the possible transitions between the states. For that reason it is very easy to overlook something.

  2. There are kinds of tasks which call for multithreading: caching, hashing, calculations, vector graphics and such. Those tasks can be organized as state machines, but it is going to be messy.
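The scaling problem in point 1 is easy to see in a table-driven sketch (Python, purely illustrative; states and events are made up): every (state, event) pair needs an explicit, verified entry, so the table grows multiplicatively.

```python
# Minimal table-driven state machine. With N states and M event types
# there are up to N*M transitions to enumerate and verify; any pair
# missing from the table is exactly the kind of case that gets overlooked.
TRANSITIONS = {
    ("idle", "start"): "running",
    ("running", "pause"): "paused",
    ("paused", "start"): "running",
    ("running", "stop"): "idle",
    ("paused", "stop"): "idle",
}

def step(state, event):
    # Unlisted (state, event) pairs silently stay put -- the easy-to-miss holes.
    return TRANSITIONS.get((state, event), state)

state = "idle"
for ev in ("start", "pause", "start", "stop"):
    state = step(state, ev)
print(state)  # -> idle
```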

Vladimir Vassilevsky

DSP and Mixed Signal Design Consultant


Reply to
Vladimir Vassilevsky

There is no inherent reason for high interrupt latencies on PCs running Windows.

I did quite detailed testing on a fast PC running 2k server: an edge on an input pin triggered an interrupt which flipped an output pin.

The delay between input and output edges was nominally 17us -0 +4 which occasionally stretched to +15 during intense disc activity. The system was quite happy taking interrupts at 10kHz.

That of course was interrupt latency to a driver interrupt handler, not to an associated DPC or back through the scheduling system to application level event handlers.

You do rely on interrupt handlers in other drivers completing promptly; some, particularly network card drivers, are poor in this respect.

Reply to
nospam

While 17 uS is a bit too long for a GHz range CPU, it is still a sane figure. I have seen people complain about hundreds of milliseconds response time to the INT line of ATA drives, but this may have been on Windows 95/98, which were... well, nothing one could call "working".

Well, plain user experience is enough to see the latencies there; they are in the seconds range, sometimes tens of seconds. They may learn how to do this in another 10 years' time...

Dimiter

Reply to
Didi

You cannot get a _guaranteed_ 1 ms response from standard Windows or Linux (or in fact from any system with virtual memory, without locking all referenced pages into memory), but you can perhaps get it for 95 % to 99 % of all events.

One way to test response times is to run a half duplex slave protocol in the device to be tested. This will test the latencies from the serial card to kernel mode device driver into the user mode protocol code and then back to the device. Observe the pause between the last character of the request and the first character of the response with an oscilloscope or serial line analyzer. With 1 ms serial line analyser time stamp resolution, the two way latency was somewhere between 0 and 2 ms (or 1-2 character times at 9600 bit/s).
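The measurement loop described above can be sketched as follows (Python, illustrative only: `transact()` is a hypothetical stand-in for a real half-duplex serial exchange, which would actually be timed externally with a scope or line analyzer):

```python
import time

def transact(request: bytes) -> bytes:
    # Stand-in for a real half-duplex exchange: in a real test this would
    # write the request to the serial port and block until the slave replies.
    return request[::-1]

def measure_turnaround(n=1000):
    """Collect request->response latencies; report median and worst case,
    since for this kind of testing the tail matters more than the mean."""
    samples = []
    for _ in range(n):
        t0 = time.perf_counter()
        transact(b"\x01\x03\x00\x00\x00\x02")
        samples.append(time.perf_counter() - t0)
    samples.sort()
    return samples[len(samples) // 2], samples[-1]  # median, worst

median, worst = measure_turnaround()
print(f"median {median*1e6:.1f} us, worst {worst*1e6:.1f} us")
```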

A few years ago I did some tests with NT4 on 166 MHz and the 20 ms periodic wakeup occurred within +/-2 ms more than 99 % of the time, provided that no user interactions happened at the same time. With user interactions, the worst case wakeup observed was about 50 ms.

Of course, any application using SetTimer can be delayed by seconds, if the user grabs a window and shakes it all over the screen :-).

Paul

Reply to
Paul Keinanen

There are protocol-specific intelligent I/O processors doing all the protocol handling, but for instance the RocketPort 8-32 line multiplexer cards simply implement deep Rx and Tx FIFOs for each channel in an ASIC. No interrupts are used; instead the driver scans all Rx FIFOs once every 1-10 ms and each FIFO is emptied at each scan. The Tx side works in a similar way.

The latency with such cards does not depend so much on the number of active channels or the number of bytes in a channel as on the scan rate. So if the scan period is 10 ms, the Rx-processing-Tx two-way latency is 10-20 ms regardless of the number of lines. At 115200 bit/s, about 120 characters can arrive per 10 ms scan. However, if the received message ends just after the previous scan, there can be a pause of more than 100 character times before the response is sent.
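The scan-based scheme can be sketched like this (Python, illustrative only; the driver itself is kernel C, and the names here are made up):

```python
import collections

SCAN_PERIOD_MS = 10  # the latency floor: worst case ~2 scan periods round trip

# One software FIFO per channel standing in for the ASIC's hardware FIFOs.
rx_fifos = {ch: collections.deque() for ch in range(8)}

def scan_once(handle_bytes):
    # One polling pass: drain every channel's Rx FIFO completely.
    # No per-byte interrupts, so the cost per scan is roughly constant
    # no matter how many lines are active.
    for ch, fifo in rx_fifos.items():
        if fifo:
            data = bytes(fifo)
            fifo.clear()
            handle_bytes(ch, data)

# Simulate: bytes arrive on channel 3 between scans, then a scan runs.
rx_fifos[3].extend(b"hello")
seen = []
scan_once(lambda ch, data: seen.append((ch, data)))
print(seen)  # -> [(3, b'hello')]
```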

Paul

Reply to
Paul Keinanen

What you are describing sounds very much like the 16 bit Windows 3.x style single thread system as well as 32 bit windowed applications.

However, at least in the Windows NT family, you can run console multithread applications with ordinary main() and synchronisation primitives similar to those used in RSX11/VMS.

If the timing is important in your Windows application, stay away from windowed applications and use console applications instead.
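The console-application pattern being recommended is a plain main() with worker threads blocking on synchronisation objects instead of a GUI message pump. A minimal analogue in Python (illustrative; the Windows original would use Win32 events or semaphores rather than a queue):

```python
import threading, queue

# The worker blocks on a queue (the analogue of waiting on an event flag),
# so its timing is decoupled from any user-interface message loop.
work = queue.Queue()
results = []

def worker():
    while True:
        item = work.get()
        if item is None:  # sentinel: shut down cleanly
            break
        results.append(item * 2)

t = threading.Thread(target=worker)
t.start()
for i in range(3):
    work.put(i)
work.put(None)
t.join()
print(results)  # -> [0, 2, 4]
```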

Paul

Reply to
Paul Keinanen

As one of the maintainers of that driver, I thought I might comment on the reasoning behind the decision not to use an interrupt-driven scheme. For a few channels with sporadic data flow, using interrupts makes a lot of sense -- the driver isn't using up CPU time unless there is data to be transferred, and data is handled with low latency.

However, the RocketPort driver is intended to support a large number of channels with high throughput rates: it supports up to 256 serial ports at 921.6K baud all in use at 100%. For a large number of ports with heavy usage, polling the boards at a fixed interval results in a much lower overhead than handling interrupts from up to 256 different UARTs.

There's a definite trade-off between low latency and efficient high throughput, and the RocketPort driver leans towards the latter. [Though CPU speed has increased so much that polling at 1 ms isn't really much overhead and provides pretty good latency.]
--
Grant Edwards                   grante             Yow!  PARDON me, am I
                                  at               speaking ENGLISH?
                               visi.com
Reply to
Grant Edwards

But the PCI bus doesn't run at GHz and has lots of overhead for a single cycle I/O access. On the same system the minimum width of a software generated pulse on an I/O line was about 4us.

Reply to
nospam

No doubt, the RocketPort cards are targeted for ISPs running a large number of modem lines using a full duplex protocol like PPP. In such applications, the scan rate or line turn-around time is really not an issue.

However, in any half-duplex protocol, the latencies and line turnaround times can seriously degrade the system throughput. A 10 ms delay at 115k2 corresponds to about 120 character times of dead time, which is catastrophic for the throughput when using short messages.
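That 120-character figure follows from simple arithmetic, sketched here assuming 10 bits per character on the wire (start + 8 data + stop):

```python
baud = 115200
bits_per_char = 10                    # start bit + 8 data bits + stop bit
chars_per_s = baud / bits_per_char    # 11520 characters per second
dead_chars = chars_per_s * 0.010      # characters of dead time per 10 ms delay
print(round(dead_chars))  # -> 115, i.e. roughly 120 character times lost
```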

The 1 ms poll rate has not been an issue even at processors below 1 GHz.

While the poll rate for Linux 2.4 drivers was usually 1/HZ, I stumbled on a driver version that forced a 10 ms poll time even for kernels with HZ > 100. When forcing the poll rate to 1 ms (HZ=1000), the throughput and latency performance was quite acceptable at serial speeds below 115k2.

Paul

Reply to
Paul Keinanen

Yes, applications that are tied to the user interface have the problem right up at the surface. In other applications, the FIFO model is more hidden, but it is still there under the surface. The task dispatching still tends to take tasks in the order of the events.

The code does not run very much as interrupt code. When the interrupt happens, the fact is recorded and then the code returns to being non-interrupt code. It is what happens next that matters at this point.

You still can't get quick response times on things like serial ports. The problem is at the OS level.

--
kensmith@rahul.net   forging knowledge
Reply to
Ken Smith

But what was the *maximum* width of that pulse? I'd expect that, under Windows, the probability tail is still nonzero at some number of seconds.

John

Reply to
John Larkin

It was generated in a driver interrupt handler so the maximum was also about 4us barring PCI bus contention from other bus masters.

Reply to
nospam

Sounds like all the threads waiting for various events are running at the same priority level.

On older Windows NT versions, only the thread priority levels 16, 22-26 and 31 were available in the realtime priority class, so assigning priorities to various threads was quite tricky. Starting from Windows 2000, the levels 16-31 are available.

In non-realtime priority classes, round-robin scheduling and priority boost for interactive threads etc. makes predicting timing more or less pointless.

Paul

Reply to
Paul Keinanen

Sounds like the delay has been down to bridging to/from the PCI bus; PCI itself is much faster than that. Maximum latencies can get large because of bus (req/grant etc.) delays, but minimum latencies (when the bus is free or already owned) are much below 1 uS. However, for a monstrous mess like the x86, your figures are about the best one could expect and sound quite reasonable.

Dimiter

Reply to
Didi

Not all of them have to be at the same priority for the effect to be seen. The event just needs to pass through a thread to end up being timed at that thread's priority, even if the thread waiting for it to finish dealing with the event is at some other priority.

In spades.


--
kensmith@rahul.net   forging knowledge
Reply to
Ken Smith

Before I have 100 states in a state machine, I look to "refactor" it into multiple interacting state machines of fewer than 12 states each.
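That decomposition can be sketched like this (Python, purely illustrative; the two machines and their events are made up): each small machine has its own short table, and they interact only through explicit events, so each one can be verified on its own.

```python
# Two small cooperating state machines instead of one large one:
# a link machine and a message machine, coupled only where one
# explicitly gates the other.
LINK = {("down", "connect"): "up", ("up", "drop"): "down"}
MSG = {("idle", "rx"): "busy", ("busy", "done"): "idle"}

def step(table, state, event):
    # Shared stepper; unlisted (state, event) pairs stay put.
    return table.get((state, event), state)

link, msg = "down", "idle"
link = step(LINK, link, "connect")   # link comes up...
if link == "up":                     # ...which gates the message machine
    msg = step(MSG, msg, "rx")
print(link, msg)  # -> up busy
```

Each table here has a handful of entries to check instead of one combined machine whose transition count is the product of the two.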

--
 JosephKK
 Against stupidity the gods themselves contend in vain.
  --Schiller
Reply to
joseph2k
