On the second page there's a description of the memory architecture. I can't wrap my head around how they configured 8 BRAMs into a 32 KB memory with 12 independent ports. Can anybody explain this to me?
The 32 kB CRAM has four ports, but each port only accesses a quarter of
the memory. If each port were able to access the full memory there
would be no need for the crossbar switch. So really the CRAM is four
separate 8 kB RAMs. Or did I miss something?
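To make the "four separate 8 kB RAMs" reading concrete, here is a small Python behavioral model. It assumes (my assumption, not something the paper states) that the top address bits pick the bank and that each bank serves one port per cycle, with the lowest-numbered port winning a conflict. `split_address` and `route` are hypothetical names.

```python
BANK_SIZE = 8 * 1024   # 8 kB per bank
NUM_BANKS = 4          # 4 x 8 kB = 32 kB total


def split_address(addr):
    """Map a byte address to (bank, offset).

    Assumes the top address bits select the bank; the real design
    could just as well interleave banks on the low bits.
    """
    if not 0 <= addr < BANK_SIZE * NUM_BANKS:
        raise ValueError(f"address {addr:#x} out of range")
    return addr // BANK_SIZE, addr % BANK_SIZE


def route(requests):
    """One cycle of a crossbar in front of four single-access banks.

    requests: dict of port id -> byte address.
    Returns (granted, stalled): granted maps port -> (bank, offset);
    stalled lists ports that lost a bank conflict this cycle.
    """
    granted, stalled, busy = {}, [], set()
    for port in sorted(requests):          # fixed priority: lowest port id wins
        bank, offset = split_address(requests[port])
        if bank in busy:
            stalled.append(port)
        else:
            busy.add(bank)
            granted[port] = (bank, offset)
    return granted, stalled


if __name__ == "__main__":
    # Ports 0 and 2 both hit bank 0, so port 2 stalls this cycle.
    print(route({0: 0x0000, 1: 0x2000, 2: 0x0010}))
```

The point of the sketch is that the crossbar only gives the *illusion* of every port seeing all 32 kB: two ports that land in the same bank still have to take turns.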
I only skimmed that paper so my impression could be wrong, but it seemed
that the buzzword/content ratio was fairly high. In such cases, close
examination of questionable content -- when it doesn't lead to outright
evasion -- often leads to finding out they're doing something
predictable, ordinary, and not too astonishing.
If I recall correctly, the Xilinx BRAMs are already dual-port. The
predictable, ordinary, and not too astonishing way that I'd do what they
describe would be to design an arbitrating interface to the BRAM blocks
in such a manner that any two accesses to any one BRAM would be "perfect"
dual-port, but simultaneous accesses beyond two would generate "not
I might even design it so that while there are 12 ports, each port would
have preferred access to just one or two BRAMs, and accesses to BRAMs
that are "further afield" (conceptually if not physically on the chip)
would take longer.
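A rough Python model of that arbitration idea: each block RAM has two physical ports, so it can grant two requests per cycle and answers the rest with "not ready". The round-robin rotation is an assumption I added so that no port starves; the class and method names are hypothetical.

```python
from collections import defaultdict

PORTS_PER_BRAM = 2  # each block RAM is dual-port


class DualPortArbiter:
    """Per-cycle arbiter: each BRAM grants at most two requests,
    the remaining requesters get 'not ready' and must retry.
    Priority rotates round-robin (an assumption) so no port starves."""

    def __init__(self, num_ports):
        self.num_ports = num_ports
        self.next_first = 0  # port with top priority this cycle

    def cycle(self, requests):
        """requests: dict of port -> BRAM index.
        Returns (granted, not_ready) as sets of port ids."""
        order = [(self.next_first + i) % self.num_ports
                 for i in range(self.num_ports)]
        used = defaultdict(int)  # BRAM index -> ports already granted
        granted, not_ready = set(), set()
        for port in order:
            if port not in requests:
                continue
            bram = requests[port]
            if used[bram] < PORTS_PER_BRAM:
                used[bram] += 1
                granted.add(port)
            else:
                not_ready.add(port)
        # Rotate priority for the next cycle.
        self.next_first = (self.next_first + 1) % self.num_ports
        return granted, not_ready


if __name__ == "__main__":
    arb = DualPortArbiter(12)
    # Three ports want BRAM 3 at once; only two fit on its two ports.
    print(arb.cycle({0: 3, 1: 3, 2: 3, 5: 0}))
```

A "preferred BRAM" scheme like the one described above would replace the simple rotation with a per-port priority table, but the grant/not-ready structure stays the same.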
You could possibly speed this up by stacking up any write requests to be
done in slack time, but you'd have to do it in a consistent manner -- you
wouldn't want a subsequent read to return old data, for instance.
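One way to keep posted writes consistent is to have reads check the pending-write queue first (write forwarding). The Python below is a minimal sketch of that idea with a hypothetical class name; it ignores the finite depth a real buffer would have, which would force a stall when it fills.

```python
from collections import OrderedDict


class PostedWriteBuffer:
    """Queues writes to be retired during idle ('slack') cycles.
    Reads look in the queue first, so they never return stale data."""

    def __init__(self, size):
        self.mem = [0] * size
        self.pending = OrderedDict()  # addr -> data, oldest first

    def write(self, addr, data):
        # A newer write to the same address supersedes the queued one.
        self.pending.pop(addr, None)
        self.pending[addr] = data

    def read(self, addr):
        # Forward from the queue if a write to this address is still pending.
        if addr in self.pending:
            return self.pending[addr]
        return self.mem[addr]

    def idle_cycle(self):
        """Retire the oldest pending write while the RAM port is free."""
        if self.pending:
            addr, data = self.pending.popitem(last=False)
            self.mem[addr] = data
```

In hardware the `addr in self.pending` lookup becomes a set of address comparators, one per buffer entry, which is the real cost of this trick.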
So, basically, you'd do the same thing as you'd do if you were making a
"dual port" RAM with a controller and a bunch of single-port RAM chips,
only more so.
Control systems, embedded software and circuit design
I should mention -- for the most part I'm not an FPGA guy. I'm circuit
design and software, with enough FPGA chops to do simple stuff. I have
done FPGA projects that have left the customer happy, but I've done more
work that involves credibly _threatening_ (one says "offering" with an
ingratiating smile for maximum effect) to write some Verilog or VHDL.
Given a reasonably competent and territorial FPGA guy, this usually
prompts a response of "here, Tim, let me take that off your hands".
Yes, that makes sense. I just thought that since they mentioned pipeline
halting in the case of simultaneous access to the 2:1 mux, there was no
arbitration on the 4 memory ports. But now I see that the sentence about
halting doesn't mention the 2:1 mux. It could apply to these 4 ports too.
I think it's more like the FPGA guy feels you are offensive to the art
of FPGAs. Maybe not truly dangerous, but not the sort of thing an FPGA
guy wants to think about. I haven't worked with you obviously, I'm just
going from your description. :)
Actually I just finished a simple presentation on test benches for an
FPGA workshop being given next week. The guy doing it is providing an
8080 core that he is testing by loading a Forth interpreter into the CPU
memory. I ran a testbench driven by a text command file and in the
simple test I put together I found a bug in the increment/decrement
instructions. I mentioned that in my writeup since that is a good
illustration of using testbenches.
Yeah, see, that is the part where it gets dangerous. Not trying to be
rude, but when you "sort of" know how to do something, that is when
mistakes get made. HDL isn't so horribly hard, but there are subtleties
that can cause hard-to-debug problems. When you only do something once
in a while and nothing very big, you never really learn all the details.
I don't consider myself proficient in Verilog because I have only worked
with it a bit. In particular there are a lot of assumptions the tools
make when you do arithmetic that need to be understood to get the
correct results. I don't know them all and I've never found a good book
on the subject, so I stick with VHDL where most everything is explicit
with no assumptions. Then again, VHDL can be cranky if you don't know how to
work within the various restrictions.
You must not be so bad at HDL. You've gotten along so far without major
My main gripe with non-FPGA people is the bias they often seem to
develop that FPGAs have to be power-hungry, large and complex to work
with. I mostly work with small devices that use little power from a
single power supply, where the only complexity is in the design itself.
Not too many people appreciate these types of devices.
As a demonstration I have wanted to design a battery-powered WWVB
receiver in an FPGA. Will do it some day.
Well, that's why I prefer to hand my system designs to an FPGA expert
instead of trying to implement them myself. In general, if a customer
calls and wants FPGA work done I say "no" and explain why. If money got
tight and there was no other work available, and someone just HAD to have
ME do the FPGA work, then I'd (A) warn them one last time that I'm just a
hacker, and (B) deeply discount my time.
I think I've done FPGA work for money twice: once was for a friend who
understood, and the work was deeply discounted; the other was for a
customer who had FPGA people on staff, but those guys didn't have time to
do the work when I did it. In that case the first communication I got
from the FPGA guy was "you're a software guy, aren't you?"
I want to see it! Most of the work that I've done when there are FPGAs
in the mix have been with processors doing the complicated slow stuff,
and FPGAs doing the (relatively) simple fast stuff -- things like video
processing, where the FPGA is fondling every pixel but the processor is
managing the transfer of lines over a digital data link (and
with a different digital link the processor would be further away than
For me, at least, I find that to do designs in an HDL I essentially have
to think at the same level of detail that I do when writing code in
assembly -- trying to get more abstract, at least for me, leads to lots
of wasted gates and unexpectedly long propagation times.
You could refer the work to someone you know. If the work is bigger
than just the FPGA work, you can sub it out to someone else. I did that
on a project that was just too large for me and it worked pretty well.
I did have one part I had to take over. I had worked with the guy one
time and had not gained a full appreciation of the flake factor. In the
end it was just moonlighting work for him, and he put it behind
everything else, even his personal life. It worked out in the end.
Lol. It often shows, but being a software person doesn't automatically
make someone a poor candidate for FPGA work. They just have to be a bit
flexible and forget a *lot* of what they know about software. I guess
it's hard to know what to forget and what to retain. It helps if they
have someone more experienced to oversee the work and to review the
In the other direction, I helped a friend who is a great hardware
designer spin up on FPGA work. I did one module and turned it over to
him with some one-on-one time to get him familiar with what I did and he
was off and running.
Depending on how fast you need it to run, there are some low-power iCE40
devices from Lattice, since they bought Silicon Blue. Of course the power goes
up with clock speed, but they start at nearly zero (100 uA) and go up
linearly. They aren't real big and don't have multipliers, so likely
not for the above design. The XP2 line has DSP blocks plus multipliers
and would be lower power than many product lines, but not as low as the
It helps to think in terms of the functional blocks that are optimal for
FPGAs. Adders are good, but you can often avoid an extra adder by
making sure the carry out can be used for a compare operation. Any time
you do an operation on a bus, consider how it would be implemented.
Muxes eat up LUTs and can often be combined with an adder by zeroing an
input. It all depends on the design. Sometimes the design just eats
LUTs and there's nothing to be done other than minor optimizations.
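The two tricks above can be shown numerically. This Python sketch (hypothetical function names, 8-bit width chosen arbitrarily) models a subtractor built as `a + ~b + 1`, the way a carry-chain adder does it, where the carry out doubles as an unsigned `a >= b` compare, and a 2:1 mux folded into the adder by zeroing one input instead of muxing the result.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1


def sub_with_carry(a, b):
    """a - b computed as a + ~b + 1, as a carry-chain adder does it.
    Returns (difference, carry_out); carry_out is 1 exactly when
    a >= b (unsigned), so no separate comparator is needed."""
    total = a + (~b & MASK) + 1
    return total & MASK, total >> WIDTH


def add_or_pass(a, b, use_b):
    """2:1 mux folded into the adder: gate the b input to zero
    instead of selecting between (a + b) and a afterwards.
    In hardware the gating can usually share a LUT with the adder bit."""
    return (a + (b if use_b else 0)) & MASK


if __name__ == "__main__":
    print(sub_with_carry(5, 3))   # difference and carry for 5 - 3
    print(add_or_pass(10, 5, False))
```

Whether the synthesizer actually packs the gating into the adder's LUTs depends on the device and tool, so it's worth checking the technology view rather than assuming.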
The problems with HDL don't come from these sorts of issues though.
They come from misusing the language. Which do you work in, VHDL or
I am equally inept in either one. Really, I think that at my level it
has less to do with the language and more to do with not having an
intuitive grasp of what's going to fit well with a given FPGA structure
vs. what's not, and if I'm not careful I start to think sequentially,
which translates to long if-then-else chains in either language, which
get synthesized as these really wide-to-one MUXes with high delays in the
Yeah, I know what you mean. The one guy I know who was able to complete
a design without going over to the "dark" side (thinking hardware)
needed to do a not too complex project in a very large FPGA. So he
didn't care about size but needed to get it done quickly. I gave him
some help off the group, but not a lot and he was able to get the job
done. He was appreciative since we had talked about my coming to town
to help formally, but management turned it down. He ended up showing
his appreciation by getting them to send me a check for my troubles,
which was not necessary at all. It was only a few hours which I chalked
up to good will. So I sent him a polo shirt with my company logo, lol.
If you would like any assistance or review of code, I'd be happy to help
on the books or if it isn't too many hours, off the books. I'm having
my hip done in two weeks, so I'll have plenty of time recuperating when I'm
bored and looking for something to do. Likely I won't really find much
truly wrong with your code, but maybe I'll have suggestions on
techniques that might make it simpler.
The funny part is I can do this stuff easily, but I would have a very
hard time writing it down. I just put together a brief presentation to
go with a presentation Dr. Ting is doing for the Silicon Valley Forth Interest
Group (SVFIG) on a VHDL 8080 design. His example code is an 8080,
memory and a UART. He verifies it runs by running Forth on it and
getting an OK> prompt. So I wrote a testbench that reads commands from
a text file to instruct it to write data on the data bus and verify the
addresses and returned data. With just an 11-line test program I found
a flaw in the increment/decrement instructions. I guess I got lucky.
I prepared some slides^H^H^H^H^H^H uh, a PowerPoint presentation, but I
don't know that it's so good. I should stick to writing code, I think.
When I started I had hopes of founding a vast design engineering bureau,
but it turns out that my sales & marketing chops simply aren't up to it.
So it's just me.
Should a project cross my desk that requires combined FPGA & software
work I'll give you a call -- should one cross my desk that requires
_just_ FPGA work, I may well just give the prospect your name.
I was rereading my post and I should clarify that the 11-line (not
counting comments) program is the test command file. The VHDL file is
around 250 lines with the header. Here is the command file:
-- This is a comment
FETCH 0000 00 -- NOP
FETCH 0001 AF -- CLR ACCUM
FETCH 0002 3D -- DCR A
FETCH 0003 67 -- MOV H,A
FETCH 0004 6F -- MOV L,A
FETCH 0005 77 -- MOV M,A
WRITE FFFF FF -- verify address and data
-- Write command shows DCR is not correct
FETCH 0006 46 -- MOV B,M
READ FFFF BE -- verify address, provide data
FETCH 0007 00 -- NOP
VHDL text IO is a bit crude, so I had to write routines to read the hex
numbers and parse the fields. Not so hard because I kept it simple.
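The testbench itself is VHDL using TEXTIO; as an illustration of the same parsing logic (not the author's code), here is a Python sketch of a reader for the command file format shown above. The per-command meanings in the comments are inferred from the file's own comments, and the names `Command`, `parse_line` and `parse_file` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Command:
    op: str    # FETCH, WRITE or READ
    addr: int  # 16-bit address the CPU is expected to drive
    data: int  # 8-bit data: supplied to the CPU (FETCH/READ, inferred)
               # or expected from it (WRITE)


VALID_OPS = {"FETCH", "WRITE", "READ"}


def parse_line(line):
    """Return a Command, or None for blank/comment-only lines.
    '--' starts a comment, as in VHDL."""
    text = line.split("--", 1)[0].strip()
    if not text:
        return None
    fields = text.split()
    if len(fields) != 3 or fields[0] not in VALID_OPS:
        raise ValueError(f"bad command: {line!r}")
    op, addr, data = fields
    return Command(op, int(addr, 16), int(data, 16))


def parse_file(text):
    """Parse a whole command file, skipping comments and blank lines."""
    cmds = (parse_line(ln) for ln in text.splitlines())
    return [cmd for cmd in cmds if cmd is not None]
```

The expected `WRITE FFFF FF` line follows from the test program: `XRA A` clears A, `DCR A` should give `(0x00 - 1) & 0xFF == 0xFF`, and `MOV H,A` / `MOV L,A` point HL at FFFF before `MOV M,A` stores A there.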
Yes, their memory is more of a "virtual" multiport than an actual multiport.
I've been studying some multiport architectures, because I may need a block
RAM that has 4 ports (2 read, 2 write). It seems this quadruples the BRAM
requirements: e.g., I need four 36 kb dual-port BRAMs to make a single 36 kb
quad-port. Plus, I need additional flag/semaphore logic implemented in LUT-RAM.
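The 4x figure matches one known approach, often called a live value table (LVT): one RAM per (write port, read port) pair, giving 2 x 2 = 4 dual-port BRAMs, plus a small table in LUT-RAM recording which write port last wrote each address. I can't confirm that this is exactly the design described here; the Python below is a behavioral sketch with hypothetical names.

```python
class LvtMemory:
    """2-write/2-read memory built from write_ports * read_ports
    simulated 1W/1R RAMs plus a live value table (LVT)."""

    def __init__(self, depth, write_ports=2, read_ports=2):
        # banks[w][r]: copy of write port w's data, one replica per read port
        self.banks = [[[0] * depth for _ in range(read_ports)]
                      for _ in range(write_ports)]
        # Which write port last wrote each address (LUT-RAM on an FPGA).
        self.lvt = [0] * depth

    def cycle(self, writes=(), reads=()):
        """writes: (port, addr, data) tuples; reads: (port, addr) tuples.
        Reads return contents from before this cycle's writes."""
        results = {}
        for rport, addr in reads:
            # The LVT picks the bank holding the live value for this address.
            results[rport] = self.banks[self.lvt[addr]][rport][addr]
        seen = set()
        for wport, addr, data in writes:
            if addr in seen:
                raise ValueError("both write ports hit the same address")
            seen.add(addr)
            for replica in self.banks[wport]:
                replica[addr] = data
            self.lvt[addr] = wport
        return results
```

Each replica needs only one write port and one read port, which is why a plain dual-port BRAM suffices per replica; the price is the replication, and the LVT is the extra "flag" logic.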
Can't see the rest of the thread, but if your required access rate is
lower than what the BRAM can sustain, you can have single data/address
buffers on the input and output of a single DP RAM and mux in as many ins
and outs as you have time for, on a sequential basis. Depends how tight the
BRAM availability is in your design, I guess.
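A behavioral sketch of that time-multiplexing idea in Python, assuming the RAM clock runs fast enough that all the fast cycles fit inside one slow (user) cycle. Lower-numbered logical ports are served first, so a write on port 0 is visible to a read on port 2 in the same slow cycle; that ordering and the class name are my assumptions. A real dual-port BRAM may behave differently when both of its ports hit the same address in one fast cycle.

```python
class MultiPumpedRam:
    """Several logical ports sharing one dual-port RAM by time-multiplexing.
    Requests are latched once per slow cycle, served ram_ports at a time
    on the fast clock, and results are held in output registers."""

    def __init__(self, depth, ram_ports=2):
        self.mem = [0] * depth
        self.ram_ports = ram_ports

    def slow_cycle(self, requests):
        """requests: list indexed by logical port; each entry is None,
        ('R', addr) or ('W', addr, data).
        Returns (read_results, fast_cycles_used)."""
        latched = list(requests)        # input registers
        out = [None] * len(latched)     # output registers
        fast = 0
        for start in range(0, len(latched), self.ram_ports):
            # One fast cycle: up to ram_ports logical ports use the RAM.
            for port in range(start, min(start + self.ram_ports, len(latched))):
                req = latched[port]
                if req is None:
                    continue
                if req[0] == 'W':
                    _, addr, data = req
                    self.mem[addr] = data
                else:
                    out[port] = self.mem[req[1]]
            fast += 1
        return out, fast
```

The returned fast-cycle count is the clock ratio you'd need: four logical ports on a dual-port RAM means running it at twice the user clock.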
I also can't see the original post. It would be most helpful if you
could quote sufficient to add some context to your post.
I don't know of any quad-port BRAM memories. I think you would need to
consider an alternative method such as FIFOs and some arbitration or if
the data is sequential to use this property to have BRAMs for odd and