Now, back to the original topic at hand, before Stuart and Ales so rudely created a gschem and gEDA turf war over schematics not being part of PCB's charter -- I was unaware that gEDA had the right to dictate design to the PCB project which existed long before gEDA. I'll let DJ, Harry, Stuart and Ales work their turf war out ... leave me out of it please.
I strongly considered adding schematics into PCB some four years back, while Harry was still maintaining it himself out of JHU. When Harry dropped off the face of the earth for a while, I even considered starting a PCB project at sf.net, until I saw one day that Harry had created one. Harry believed strongly that schematics do not belong in PCB, and as chief maintainer of the sf.net project, that was his choice. Before starting FpgaC on sf.net, I was strongly tempted to pick up and continue to support an Xaw version, and add schematics, as an sf.net project called PCB2 ... fork the project, since Harry and I have very different goals and expectations about UIs and the types of designs PCB should support/produce. I asked very clearly on the PCB user forum whether Xaw was dead, trying to get a clear idea if it would remain a crippled Gtk version ... and got no answer. I was actually surprised to find DJ had done a Lesstif variation, and had deviated strongly (forked) from the old/Gtk UI.
In the end I decided I would be more useful digging TMCC out of the grave and bringing it forward a decade to be useful with today's FPGA products.
TMCC/FpgaC suffers badly from the same working set problems I posed for PCB. Very small changes in a project's code can shift FpgaC compile times from a few minutes to hours ... and in one case from 45 minutes to over a day and a half, simply by exceeding the working set size of the L2 cache. Interestingly enough, the same C code does the same thing to GCC at a slightly different boundary point.
Student projects, and other toy projects, frequently contain simple algorithms that are fine inside a typical processor's L2 cache these days ... but when the data set grows just slightly, they fail horribly, performance-wise. In this case, linearly searching a linked list works fine up to about 90-95% of the L2 cache size. When you exceed that threshold, performance drops and run times increase roughly 10X or more because of the nature of LRU or pseudo-LRU cache replacement policies.
Consider, for example, a small cache of 4 "bins" of marbles taken from a bowl of 300 marbles. If we first reference a certain red marble, it's taken from the bowl and placed in a cache bin after searching the 300 marbles for it. We keep using and replacing the red marble, avoiding the search in the bowl. Later we also use a green, blue, and yellow marble, which take the three remaining bins in the cache. Because of the nature of the task, we always use red, green, blue, and yellow in that order, always taking from the cache and replacing in the cache.
When our working set expands to five marbles, we have a cache failure, which goes like this. We access the red, green, blue, and yellow marbles in order from the cache, then we need a white marble. The red marble is least recently used, so it's removed from the cache and replaced with the white marble. We then repeat our cycle, next needing the red marble, which is no longer in the cache, so we must fetch it from the bowl and, due to the LRU algorithm, replace the green marble with the red marble. However, next we need the green marble, which forces the blue out of the cache. Next we need the blue marble, forcing the yellow out of the cache. Next we need the yellow, forcing the white out of the cache ... and so on, with every cache access missing, requiring a lengthy access and search of the bowl.
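The marble walkthrough above can be simulated directly. This is a minimal sketch (the `LRUCache` class and marble names are illustrative, not anything from PCB or FpgaC): with four bins and four marbles, every pass after the first hits the cache; add a fifth marble and every single access misses.

```python
from collections import OrderedDict

class LRUCache:
    """A tiny LRU cache that counts hits and misses (fetches from the 'bowl')."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.bins = OrderedDict()   # key -> marble, least recently used first
        self.hits = 0
        self.misses = 0

    def get(self, marble):
        if marble in self.bins:
            self.hits += 1
            self.bins.move_to_end(marble)      # mark as most recently used
        else:
            self.misses += 1                   # lengthy search of the bowl
            if len(self.bins) >= self.capacity:
                self.bins.popitem(last=False)  # evict the LRU marble
            self.bins[marble] = True

def run(colors, cycles, capacity=4):
    """Access the colors cyclically and report (hits, misses)."""
    cache = LRUCache(capacity)
    for _ in range(cycles):
        for c in colors:
            cache.get(c)
    return cache.hits, cache.misses

# Four marbles fit: only the 4 cold-start accesses miss.
print(run(["red", "green", "blue", "yellow"], 100))           # (396, 4)
# Five marbles in a 4-bin cache: every one of the 500 accesses misses.
print(run(["red", "green", "blue", "yellow", "white"], 100))  # (0, 500)
```

The step from 100% hits to 100% misses happens at a one-marble increase in working set, which is exactly the sharp discontinuity described above.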
LRU algorithms fail horribly with sequential searches of the cached working set, resulting in a very sharp reduction in performance as the working set is exceeded. In FpgaC's case, the primary data structures are linked lists which are frequently searched completely to verify the lack of duplicate entries when a new item is created. When the working set under these linked lists exceeds the processor's L2 cache size, run times jump by more than a factor of 10 on many machines these days ... the ratio of L2 cache performance to memory performance. Thus, depending on the host processor's L2/L3 cache size, there are critical points for FpgaC where the run time to compile an incrementally small increase in program size jumps dramatically. The fix for this is relatively simple, and will occur soon: replace the linear searches with tree or hash searches, to avoid referencing the entire working set and invoking the LRU replacement failure mode.
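A sketch of that fix, under the stated assumption that the hot path is a duplicate check on insert (the function names here are hypothetical, not FpgaC's actual code): the linear version walks every entry, dragging the whole working set through the cache, while a hash-based membership test probes a single bucket.

```python
def add_symbol_linear(symbols, name):
    """Walk the whole list looking for a duplicate -- this references
    the entire working set on every insert, tripping the LRU failure."""
    for existing in symbols:
        if existing == name:
            return False              # duplicate, not added
    symbols.append(name)
    return True

def add_symbol_hashed(symbols, index, name):
    """Consult a hash index instead -- one bucket probe, so the cache
    footprint per insert stays small regardless of list length."""
    if name in index:                 # O(1) set membership test
        return False
    index.add(name)
    symbols.append(name)
    return True

# Both build the same symbol table; only the memory access pattern differs.
linear, hashed, index = [], [], set()
for n in ["a", "b", "a", "c", "b"]:
    add_symbol_linear(linear, n)
    add_symbol_hashed(hashed, index, n)
print(linear)   # ['a', 'b', 'c']
print(hashed)   # ['a', 'b', 'c']
```

A balanced tree gives the same benefit (O(log n) nodes touched instead of all n) while keeping the entries ordered, which is the other option mentioned above.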
Similar problems exist at several levels in the responsiveness of PCB. Any event which forces a search of the design space will require the working set to hold all the objects that must be searched. When that working set grows past various cache sizes, noticeable increases in latency will result, to the point that they are visible in the UI ... and that point will vary depending on the particular machine being used (L1/L2/L3 cache sizes, and the relative latency of references that fault). Developers who only use and test on fast processors, with large caches and fast native memory, will not notice the extremely jerky performance that someone using a P2 333 Celeron (128K cache) with a 66 MHz processor bus and fast page mode memory will encounter. Slightly larger designs running on 4GHz processors with 512K caches will fail equally noticeably with a design some 4-10 times larger.
Certain operations will fail harder: those which invoke a series of X output calls, as they will also incur the memory overhead of part of the kernel, some shared libraries, the X server, and the display driver in the "working set" for those operations. While a 512K cache is four times larger, the available working set is the cache size minus the X working set, meaning that for small cache sizes there might not be much working set left at all, while doubling the cache size may actually increase the usable working set by 10X or more.
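To make that arithmetic concrete (the 120K figure for the kernel + libraries + X path overhead is an assumption for illustration, not a measurement):

```python
X_OVERHEAD = 120 * 1024               # assumed fixed footprint of kernel + libs + X path

for cache_size in (128 * 1024, 256 * 1024):
    usable = cache_size - X_OVERHEAD  # what remains for PCB's own data
    print(f"{cache_size // 1024}K cache -> {usable // 1024}K usable")

# Doubling the cache from 128K to 256K grows the usable working
# set from 8K to 136K: a 17x increase, not merely 2x.
```

Because the overhead is subtracted off the top, the usable working set is hypersensitive to cache size near the overhead figure, which is why the same design can feel fine on one machine and unusable on another.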
Just taking a guess, PCB + kernel + Xaw + X server probably has a working set around, or slightly larger than, 128K for very basic PCB tasks. Thus we will see cache flushing and reloading between X calls, and locally OK cache performance at both ends. As the L2/L3 cache grows to 512K this is probably less of a problem.
What does become a problem is when the PCB/X working set gets continually flushed by every call, such as when making a long series of little calls to the X server, faulting all the way to the X server and faulting all the way back ... calling performance drops like a rock ... a factor of 10 or more. This happens when the task at either, or both, of the PCB and X server ends requires a slightly larger working set, pushing the total working set for LRU into the worst case failure mode.
I suspect that the Gtk failure modes do this, by including Gtk overhead in the working set, such that every PCB to Gtk to X server call faults round trip, and runs at native memory performance. The reason I believe this is that in my testing, a 550MHz PIII machine with SDRAM is only about twice as slow as a 2GHz P4 machine with DDR SDRAM in this failure mode ... rather than the 4-6X normal computation difference when running at CPU speed from L1 cache, or even L2 cache.
With synchronous calls to Gtk and the X server, it's difficult for PCB to keep its event processing in real time.
I have a several-day class I used to teach regularly that discusses, in detail, designing for hysteresis problems that occur with step discontinuities in the processor load vs. throughput function; it is quite useful for recognizing and designing architectural solutions to problems of this class.