On the second page there's a description of the memory architecture. I can't wrap my head around how they configured 8 BRAMs into a 32 kB memory with 12 independent ports. Can anybody explain this to me?
The 32 kB CRAM has four ports, but each port only accesses a quarter of the memory. If each port were able to access the full memory there would be no need for the crossbar switch. So really the CRAM is four separate 8 kB RAMs. Or did I miss something?
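To make the "four separate 8 kB RAMs behind a crossbar" reading concrete, here is a rough behavioral sketch. It assumes, hypothetically, that the upper address bits pick the bank and that each bank has a single port. The names `bank_of` and `crossbar_cycle` are made up for illustration and are not from the paper. The point is that requests to different banks proceed in parallel, while two requests to the same bank cannot both be served in one cycle.

```python
from collections import defaultdict

# Assumed layout: a 32 kB memory split into four 8 kB banks, one port per bank.
BANK_BYTES = 8 * 1024
NUM_BANKS = 4


def bank_of(addr):
    """Hypothetical decode: the upper address bits select the bank."""
    if not 0 <= addr < BANK_BYTES * NUM_BANKS:
        raise ValueError("address out of range")
    return addr // BANK_BYTES


def crossbar_cycle(requests):
    """One cycle of a crossbar in front of four single-port banks.

    requests maps requester id -> byte address. Each bank serves one
    requester per cycle; the lowest id wins (an arbitrary fixed priority)
    and every other requester for that bank stalls.
    Returns (granted, stalled): granted maps bank -> requester id,
    stalled is a sorted list of requester ids.
    """
    by_bank = defaultdict(list)
    for req_id in sorted(requests):
        by_bank[bank_of(requests[req_id])].append(req_id)
    granted = {bank: ids[0] for bank, ids in by_bank.items()}
    stalled = sorted(i for ids in by_bank.values() for i in ids[1:])
    return granted, stalled
```

For example, `crossbar_cycle({0: 0x0000, 1: 0x2000, 2: 0x0010})` grants requesters 0 and 1 (banks 0 and 1) and stalls requester 2, because it collides with requester 0 in bank 0.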
I only skimmed that paper so my impression could be wrong, but it seemed that the buzzword/content ratio was fairly high. In such cases, close examination of questionable content -- when it doesn't lead to outright evasion -- often leads to finding out they're doing something predictable, ordinary, and not too astonishing.
If I recall correctly, the Xilinx BRAMs are already dual-port. The predictable, ordinary, and not too astonishing way that I'd do what they describe would be to design an arbitrating interface to the BRAM blocks in such a manner that any two accesses to any one BRAM would be "perfect" dual-port, but simultaneous accesses beyond two would generate "not ready" signals.
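A minimal sketch of that arbitrating interface, assuming 12 requesting ports, 8 dual-port BRAMs with the 32 kB split evenly (4 kB each), and a simple fixed priority. None of this is taken from the paper; it only shows the "two accesses per BRAM are free, the third gets not-ready" rule.

```python
NUM_BRAMS = 8
BRAM_BYTES = 32 * 1024 // NUM_BRAMS  # 4 kB per BRAM (assumed even split)
PORTS_PER_BRAM = 2                   # each BRAM is true dual-port


def arbitrate(requests):
    """requests maps port number (0..11) -> byte address for every port
    asserting a request this cycle.

    Returns (ready, not_ready) as sets of port numbers. Each BRAM accepts
    up to two accesses per cycle; any further access to the same BRAM is
    told "not ready" and must retry next cycle.
    """
    used = [0] * NUM_BRAMS
    ready, not_ready = set(), set()
    for port in sorted(requests):  # fixed priority: lower port number first
        bram = requests[port] // BRAM_BYTES
        if used[bram] < PORTS_PER_BRAM:
            used[bram] += 1
            ready.add(port)
        else:
            not_ready.add(port)
    return ready, not_ready
```

With ports 0, 1 and 2 all hitting BRAM 0 and port 3 hitting BRAM 1, ports 0, 1 and 3 proceed and port 2 waits. A real design would probably want round-robin rather than fixed priority so that high-numbered ports can't be starved.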
I might even design it so that while there are 12 ports, each port would have preferred access to just one or two BRAMs, and accesses to BRAMs that are "further afield" (conceptually if not physically on the chip) would take longer.
You could possibly speed this up by stacking up any write requests to be done in slack time, but you'd have to do it in a consistent manner -- you wouldn't want a subsequent read to return old data, for instance.
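One way to keep posted writes consistent is to have every read check the pending-write queue first (read-after-write forwarding). Below is a rough model of that idea; the class name, buffer depth and "one write retired per idle cycle" policy are all assumptions for illustration.

```python
from collections import OrderedDict


class BufferedRam:
    """Single-port RAM model with a small posted-write buffer.

    Writes are queued and retired into the array during idle (slack)
    cycles. Reads check the buffer first, so they never see stale data.
    """

    def __init__(self, size, depth=4):
        self.mem = [0] * size
        self.pending = OrderedDict()  # addr -> data, oldest first
        self.depth = depth

    def write(self, addr, data):
        """Queue a write. Returns False if the buffer is full (caller stalls)."""
        if addr in self.pending:
            # Coalesce: the newer value replaces the queued one for this address
            self.pending[addr] = data
            return True
        if len(self.pending) >= self.depth:
            return False
        self.pending[addr] = data
        return True

    def read(self, addr):
        # Forward a pending write to this address, if there is one
        if addr in self.pending:
            return self.pending[addr]
        return self.mem[addr]

    def idle_cycle(self):
        """Slack time: retire the oldest pending write into the RAM array."""
        if self.pending:
            addr, data = self.pending.popitem(last=False)
            self.mem[addr] = data
```

Coalescing writes to the same address keeps one entry per address, which is what makes the forwarding lookup unambiguous. In hardware the lookup is a small CAM-like compare against every buffer entry, so the depth has to stay small.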
So, basically, you'd do the same thing as you'd do if you were making a "dual port" RAM with a controller and a bunch of single-port RAM chips, only more so.
Control systems, embedded software and circuit design
I should mention -- for the most part I'm not an FPGA guy. I'm circuit design and software, with enough FPGA chops to do simple stuff. I have done FPGA projects that have left the customer happy, but I've done more work that involves credibly _threatening_ (one says "offering" with an ingratiating smile for maximum effect) to write some Verilog or VHDL. Given a reasonably competent and territorial FPGA guy, this usually prompts a response of "here, Tim, let me take that off your hands".
Yes, that makes sense. I had assumed that because they mention pipeline halting in the case of simultaneous access to a 2:1 mux, there was no arbitration on the 4 memory ports. But now I see that the sentence about halting doesn't mention the 2:1 mux, so it could apply to these 4 ports too.
I think it's more like the FPGA guy feels you are offensive to the art of FPGAs. Maybe not truly dangerous, but not the sort of thing an FPGA guy wants to think about. I haven't worked with you obviously, I'm just going from your description. :)
Actually I just finished a simple presentation on test benches for an FPGA workshop being given next week. The guy doing it is providing an 8080 core that he is testing by loading a Forth interpreter into the CPU memory. I ran a testbench driven by a text command file, and in the simple test I put together I found a bug in the increment/decrement instructions. I mentioned that in my writeup since it is a good illustration of using testbenches.
Yeah, see, that is the part where it gets dangerous. Not trying to be rude, but when you "sort of" know how to do something, that is when mistakes get made. HDL isn't so horribly hard, but there are subtleties that can cause hard to debug problems. When you only do something once in a while and nothing very big, you never really learn all the details.
I don't consider myself proficient in Verilog because I have only worked with it a bit. In particular, there are a lot of assumptions the tools make when you do arithmetic that need to be understood to get the correct results. I don't know them all and I've never found a good book on the subject, so I stick with VHDL, where most everything is explicit with no assumptions. Then again, VHDL can be cranky if you don't know how to work within its various restrictions.
You must not be so bad at HDL. You've gotten along so far without major issues, right?
My main gripe with non-FPGA people is the frequent biases they seem to develop that FPGAs have to be power hungry, large and complex to work with. I mostly work with small devices that use little power from a single power supply where the only complexity is in the design itself. Not too many people appreciate these types of devices.
As a demonstration I have wanted to design a battery powered WWVB receiver in an FPGA. Will do it some day.
Well, that's why I prefer to hand my system designs to an FPGA expert instead of trying to implement them myself. In general, if a customer calls and wants FPGA work done I say "no" and explain why. If money got tight and there was no other work available, and someone just HAD to have ME do the FPGA work, then I'd (A) warn them one last time that I'm just a hacker, and (B) deeply discount my time.
I think I've done FPGA work for money twice: once was for a friend who understood, and the work was deeply discounted; the other was for a customer who had FPGA people on staff, but those guys didn't have time to do the work when I did it. In that case the first communication I got from the FPGA guy was "you're a software guy, aren't you?"
I want to see it! Most of the work I've done with FPGAs in the mix has had processors doing the complicated slow stuff and FPGAs doing the (relatively) simple fast stuff -- things like video processing, where the FPGA is fondling every pixel but the processor is managing the transfer of lines over a digital data link (and with a different digital link the processor would be further away than that).
For me, at least, I find that to do designs in an HDL I essentially have to think at the same level of detail that I do when writing code in assembly -- trying to get more abstract, at least for me, leads to lots of wasted gates and unexpectedly long propagation times.
You could refer the work to someone you know. If the work is bigger than just the FPGA work, you can sub it out to someone else. I did that on a project that was just too large for me and it worked pretty well. I did have one part I had to take over. I had worked with the guy one time and had not gained a full appreciation of the flake factor. In the end it was just moonlighting work for him and he gave it last priority over even his personal life. It worked out in the end.
Lol. It often shows, but being a software person doesn't automatically make someone a poor candidate for FPGA work. They just have to be a bit flexible and forget a *lot* of what they know about software. I guess it's hard to know what to forget and what to retain. It helps if they have someone more experienced to oversee the work and to review the results.
In the other direction, I helped a friend who is a great hardware designer spin up on FPGA work. I did one module and turned it over to him with some one-on-one time to get him familiar with what I did and he was off and running.
Depending on how fast you need it to run, Lattice has some low-power iCE40 devices, which they've offered since they bought Silicon Blue. Of course the power goes up with clock speed, but it starts at nearly zero (100 uA) and goes up linearly. They aren't very big and don't have multipliers, so they're likely not suited to the above design. The XP2 line has DSP blocks plus multipliers and would be lower power than many product lines, but not as low as the iCE40.
It helps to think in terms of the functional blocks that are optimal for FPGAs. Adders are good, but you can often avoid an extra adder by making sure the carry out can be used for a compare operation. Any time you do an operation on a bus, consider how it would be implemented. Muxes eat up LUTs and can often be combined with an adder by zeroing an input. It all depends on the design. Sometimes the design just eats LUTs and there's nothing to be done other than minor optimizations.
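Two of those tricks can be shown with a bit-level model of an N-bit adder (the 8-bit width and the function names here are arbitrary choices for illustration). Subtraction done as a + ~b + 1 gives a carry out of 1 exactly when a >= b (unsigned), so one carry chain yields both the difference and the compare. And a "sel ? a + b : a" mux can be folded into the adder by zeroing the b input rather than adding a separate mux after it; in many FPGA families that gating can often share the LUT that already feeds the carry chain.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1


def add_with_carry(a, b, cin=0):
    """Model an N-bit carry-chain adder: returns (sum, carry_out)."""
    total = (a & MASK) + (b & MASK) + cin
    return total & MASK, total >> WIDTH


def sub_with_compare(a, b):
    """Compute a - b as a + ~b + 1.

    The carry out is 1 exactly when a >= b (unsigned), so the same adder
    produces both the difference and the comparison result.
    """
    diff, carry = add_with_carry(a, (~b) & MASK, 1)
    return diff, bool(carry)


def add_or_pass(a, b, sel):
    """'sel ? a + b : a' with the mux folded into the adder input:
    b is ANDed with sel (zeroed when sel is false) instead of muxing
    between the sum and a after the adder."""
    return add_with_carry(a, b if sel else 0)[0]
```

For instance, `sub_with_compare(3, 5)` returns `(254, False)`: the difference wraps around and the cleared carry flags that 3 < 5.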
The problems with HDL don't come from these sorts of issues though. They come from misusing the language. Which do you work in, VHDL or Verilog?
I am equally inept in either one. Really, I think that at my level it has less to do with the language and more to do with not having an intuitive grasp of what's going to fit well with a given FPGA structure and what's not. If I'm not careful I start to think sequentially, which translates to long if-then-else chains in either language, and those get synthesized as really wide many-to-one MUXes with long delays in the chip.
Yeah, I know what you mean. The one guy I know who was able to complete a design without going over to the "dark" side (thinking hardware) needed to do a not too complex project in a very large FPGA. So he didn't care about size but needed to get it done quickly. I gave him some help off the group, but not a lot and he was able to get the job done. He was appreciative since we had talked about my coming to town to help formally, but management turned it down. He ended up showing his appreciation by getting them to send me a check for my troubles, which was not necessary at all. It was only a few hours which I chalked up to good will. So I sent him a polo shirt with my company logo, lol.
If you would like any assistance or review of code, I'd be happy to help on the books or, if it isn't too many hours, off the books. I'm having my hip done in two weeks, so I'll have plenty of time while recuperating when I'm bored and looking for something to do. Likely I won't find much truly wrong with your code, but maybe I'll have suggestions on techniques that might make it simpler.
The funny part is I can do this stuff easily, but I would have a very hard time writing it down. I just put together a brief presentation to go with a presentation Dr. Ting is doing for the Sunnyvale Forth Users Group (SVFIG) on a VHDL 8080 design. His example code is an 8080, memory and a UART. He verifies it runs by running Forth on it and getting an OK> prompt. So I wrote a testbench that reads commands from a text file to instruct it to write data on the data bus and verify the addresses and returned data. With just an 11 line test program I found a flaw in the increment/decrement instructions. I guess I got lucky.
I prepared some slides^H^H^H^H^H^H uh, a PowerPoint presentation, but I don't know if it is any good. I should stick to writing code, I think.
Yes, their memory is more of a "virtual" multiport than an actual multiport. I've been studying some multiport architectures, because I may need a block RAM that has 4 ports (2 read, 2 write). It seems this quadruples the BRAM requirements. E.g., I need 4 36 kb dual-port BRAMs to make a single 36 kb quad-port. Plus, I need additional flag/semaphore logic implemented in LUT-RAM.
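The 4-BRAM-plus-flags count matches a known scheme often called a "live value table" (LVT) memory. I can't confirm that's what the poster is building, so treat this as a behavioral sketch under that assumption. Each write port w gets one copy of the data per read port r (2 x 2 = 4 simple dual-port RAMs), so every physical RAM sees at most one write and one read per cycle. A small table, presumably the LUT-RAM flag logic, remembers which write port last wrote each address, and reads use it to pick the bank holding the live value.

```python
class LvtMemory2W2R:
    """Behavioral model of a 2-write/2-read memory built from 4 simple
    dual-port RAMs plus a live value table (LVT).

    bank[w][r] is written only by write port w and read only by read
    port r. lvt[addr] records which write port wrote addr most recently.
    """

    def __init__(self, depth):
        self.bank = [[[0] * depth for _r in range(2)] for _w in range(2)]
        self.lvt = [0] * depth

    def cycle(self, writes, reads):
        """writes: two entries (one per write port), each None or (addr, data).
        reads: two entries (one per read port), each None or an address.
        Returns the read data; reads see the values from before this
        cycle's writes (read-before-write)."""
        out = []
        for r, addr in enumerate(reads):
            out.append(None if addr is None else self.bank[self.lvt[addr]][r][addr])
        for w, req in enumerate(writes):
            if req is None:
                continue
            addr, data = req
            for r in range(2):
                self.bank[w][r][addr] = data  # replicate into each read port's copy
            self.lvt[addr] = w                # write port w now holds the live value
        return out
```

If both write ports hit the same address in the same cycle, this model lets write port 1 win because it is applied last. Real hardware needs an explicit policy for that case. The LVT itself must also support 2 writes and 2 reads per cycle, which is why it usually ends up in LUTs or registers rather than BRAM.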
Can't see the rest of the thread, but if the access rate you need is lower than what the BRAM can deliver, you can put single data/address buffers on the input and output of a single DP RAM and mux in as many ins and outs as you have time for, on a sequential basis. It depends on how tight BRAM availability is in your design, I guess.
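A rough model of that time-multiplexing idea, assuming the RAM is clocked some integer multiple faster than the logic using it. The function name, the `slots_per_port` parameter and the fixed service order are illustrative assumptions. Per slow clock cycle the RAM offers `slots_per_port * phys_ports` accesses, and logical ports are served one after another in port order.

```python
def time_multiplexed_cycle(mem, requests, slots_per_port=2, phys_ports=2):
    """Serve one slow (user) clock cycle of requests on a single dual-port
    RAM that runs slots_per_port times faster than the user logic.

    requests: list indexed by logical port; each entry is None,
    ('r', addr) or ('w', addr, data). Requests are served sequentially
    in port order, so a write from a lower-numbered port is visible to a
    read from a higher-numbered port in the same cycle.
    Returns a list of read results (None for non-read ports).
    """
    capacity = slots_per_port * phys_ports
    active = [i for i, req in enumerate(requests) if req is not None]
    if len(active) > capacity:
        raise ValueError("more requests than RAM bandwidth; need a faster RAM clock")
    results = [None] * len(requests)
    for i in active:
        req = requests[i]
        if req[0] == 'w':
            _, addr, data = req
            mem[addr] = data
        else:
            results[i] = mem[req[1]]
    return results
```

The fixed service order is a design choice: it makes same-cycle behavior deterministic, but it's worth documenting, since a read on a lower-numbered port would see the old data.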
I also can't see the original post. It would be most helpful if you could quote sufficient to add some context to your post.
I don't know of any quad-port BRAM memories. I think you would need to consider an alternative method, such as FIFOs with some arbitration, or, if the data is sequential, use that property to have separate BRAMs for odd and even addresses.
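A minimal sketch of the odd/even split, assuming accesses come in consecutive pairs (n, n+1). Bit 0 of the address selects the bank and the remaining bits index within it. Because two consecutive addresses always differ in bit 0, they always land in different banks, so two single-port banks can deliver two words per cycle. The class and method names are made up for illustration.

```python
class EvenOddMemory:
    """Two single-port banks interleaved on address bit 0."""

    def __init__(self, depth):
        if depth % 2:
            raise ValueError("depth must be even")
        self.banks = [[0] * (depth // 2), [0] * (depth // 2)]

    def write(self, addr, data):
        # Bit 0 picks the bank, the remaining bits pick the word in that bank
        self.banks[addr & 1][addr >> 1] = data

    def read_pair(self, addr):
        """Read words addr and addr + 1 in one cycle (one access per bank)."""
        a0, a1 = addr, addr + 1
        # a0 and a1 differ in bit 0, so they never hit the same bank
        return self.banks[a0 & 1][a0 >> 1], self.banks[a1 & 1][a1 >> 1]
```

This only helps when the access pattern really is sequential. Two arbitrary addresses can still collide in the same bank, which is where the FIFO/arbitration approach comes back in.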