On Wednesday, March 20, 2019 at 6:41:55 AM UTC-4, email@example.com wrote:
I'd say that IMHO right now your parliament is facing an unusually difficult problem on one hand, but at the same time it's not really a "life or death" sort of problem. Having troubles and appearing indecisive in such a situation is normal. It does not mean that the system is broken.
I was watching a video of a guy who bangs together Teslas from salvage cars. This one was about him actually buying a used Tesla from Tesla and the many trials and tribulations he had. He had traveled to a dealership over an hour's drive away and they said they didn't have anything for him. At one point he says he is not going to get too wigged out over all this because it is a "first world problem". That gave me insight into my own issues, realizing that what seems at first to me to be a major issue is an issue that much of the world would LOVE to have.
I'm wondering if Brexit is not one of those issues... I'm just sayin'...
FPGA design is similar. Consider which of your issues are "first world" issues when you design.
I never mentioned a bottom up or a top down approach to design. Nothing about using these small CPUs is about the design "direction". I am pretty sure that you have to define the circuit they will work in before you can start designing the code.
Obviously it is like a combination of LUTs with FFs, able to implement any logic you wish, including math. BTW, in many devices the elements are not at all so simple. Xilinx LUTs can be used as shift registers. There is additional logic within the logic blocks that allows math with carry chains, combining LUTs to form larger LUTs, breaking LUTs into smaller LUTs, and let's not forget about routing, which may not be used much anymore, not sure. So your simple world of four elements is really not so valid.
Why does it need to be inferred? If you want to write an HDL tool to turn HDL into processor code, have at it. But then there are other methods. Someone mentioned his MO is to use other tools for designing his algorithms and letting that tool generate the software for a processor or the HDL for an FPGA. That would seem easy enough to integrate.
Huh? You can't simulate code on a processor???
You seem to have left the roadway here. I'm lost.
I don't follow your logic. What is different about the ARM processor from the stack processor other than that it is larger and slower and requires a royalty on each one? Are you talking about writing the code in C vs. whatever is used for the stack processor?
The point of the many hard cores is the saving of resources. Soft cores would be the most wasteful way to implement logic. If the application is large enough they can implement things in software that aren't as practical in HDL, but that would be a different class of logic from the tiny CPUs I'm talking about.
You lost me with the gear shift. The mention of instruction rate is about the CPU being fast enough to keep up with FPGA logic. The issue with "heterogeneous performance" is the "heterogeneous" part, lumping the many CPUs together to create some sort of number cruncher. That's not what this is about. Like in the GA144, I fully expect most CPUs to be sitting around most of the time idling, waiting for data. This is a good thing, actually. These CPUs could consume significant current if they ran at GHz all the time. I believe in the GA144 at that slower rate each processor can use around 2.5 mA. Not sure if a smaller process would use more or less power when running flat out. It's been too many years since I worked with those sorts of processes.
I don't usually think of designing in those terms. If I want to design something, I design it. I ignore many tools, only using the ones I find useful. In this case I would have no problem writing code for the processor and, if needed, rolling into the FPGA simulation a model of the processor to run the code. In a professional implementation I would expect these models to be written for me in modules that run much faster than HDL so the simulation speed is not impacted.
I certainly don't see how P&R tools would be a problem. They accommodate multipliers, DSP blocks, memory blocks and many, many special bits of assorted components inside the FPGAs which vary from vendor to vendor. Clock generators and distribution are pretty unique to each manufacturer. Lattice has all sorts of modules to offer like I2C and embedded Flash. Then there are entire CPUs embedded in FPGAs. Why would supporting them be so different from what I am talking about?
Equally, you can make anything sound simple if you are vague enough and
wave your hands around.
I did not use the phrase "software running on such heterogeneous cores"
- and I am not trying to make anything difficult. You are making cpu
cores. They run software. Saying they are "like logic elements" or
"they connect directly to hardware" does not make it so - and it does
not mean that what they run is not software.
I agree that VHDL is software. And yes, there are usually processes in a VHDL design.
I am not /worrying/ about these devices running software - I am simply
saying that they /will/ be running software. I can't comprehend why you
want to deny that. It seems that you are frightened of software or
programmers, and want to call it anything /but/ software.
If the software a core is running is simple enough to be described in
VHDL, then it should be a VHDL process - not software in a cpu core. If
it is too complex for that, it is going to have to be programmed
separately in an appropriate language. That is not necessarily harder
or easier than VHDL design - it is just different.
If you try to force the software to be synchronous with timing on the
hardware, /then/ you are going to be in big difficulties. So don't do
that - use hardware for the tightest timing, and software for the bits
that software is good for.
I'd expect that the sensible way to pass data between these, if you need
to do so much, is using FIFO's.
On Wednesday, March 20, 2019 at 6:56:51 AM UTC-4, firstname.lastname@example.org wrote:
What's needed is a combination of existing tools - compilers, assemblers, probably software simulator plug-ins into existing HDL simulators, but the latter is just a luxury for speeding up simulations; in principle, feeding the HDL simulator with an RTL model of the CPU core will work too.
I agree, but I think it will be very useful to have a proper model of the CPUs for faster simulations. If it were one CPU it would be different. But using 100 CPUs would very likely make simulation a real chore without a fast model.
[...] niches. It's extremely rare that a user's design uses all or a majority of the features of a given FPGA device and needs LUTs, embedded memories, PLLs, multipliers, SERDESes, DDR DRAM I/O blocks, etc. in the exact amounts appearing in the device.
This is exactly the reason why FPGA companies resisted even incorporating block RAM initially. I recall conversations with Xilinx representatives about these issues here. It was indicated that the cost of the added silicon was significant and they would be "seldom" used. Now many people would not buy an FPGA without multipliers and/or DSP blocks. This is really just another step in the same direction.
Masks and other NREs are mighty expensive while silicon itself is relatively cheap. Multiple small hard CPU cores are really not very different from the features mentioned above.
I don't know the details of costs for FPGAs. What I do know is that the CPUs I am talking about would use the silicon area of rather few logic blocks. The reference design I use is in a 180 nm process and is an eighth of a square mm. With an 18 nm process the die area would be about 1,260 sq um. That's not very big. 100 of them would occupy 0.126 sq mm. If they have much use, that's a pretty small die area. For comparison, an XC7A200T has a die area of about 132 sq mm and 33,000 slices, for an area of 3,923 sq um per slice. Of course this is loaded with overhead, which is likely more than half the area, but it gives you some perspective about the cost of adding these CPUs... very, very little, around the die area of a single slice. It also gives you an idea of how large the FPGA logic functions have grown.
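A quick sanity check of that arithmetic, as a sketch. It assumes ideal (quadratic) area scaling with feature size, and uses 33,650 slices for the XC7A200T (the published count; the post rounds it to 33,000):

```python
# Back-of-the-envelope check of the die-area figures in the post.
# Assumptions: ideal quadratic scaling from 180 nm to 18 nm, a
# ~132 mm^2 XC7A200T die, and 33,650 slices.
ref_area_um2 = 0.125e6                    # reference core: 1/8 sq mm at 180 nm
shrink = (18 / 180) ** 2                  # ideal area scaling factor: 0.01
core_um2 = ref_area_um2 * shrink          # one core at 18 nm
hundred_cores_mm2 = 100 * core_um2 / 1e6  # total area for 100 cores
slice_um2 = 132e6 / 33650                 # XC7A200T area per slice

print(round(core_um2))        # 1250 sq um per core (the post says ~1,260)
print(hundred_cores_mm2)      # 0.125 sq mm for 100 cores
print(round(slice_um2))       # 3923 sq um per slice, matching the post
```

So one core really does come out at roughly the area of a single slice under these assumptions.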
Professionally, since 1978 I've done everything from low noise
analogue electronics, many hardware-software systems using
all sorts of technologies, networking at all levels of the
protocol stack, "up" to high availability distributed software.
And almost all of that has been on the bleeding edge.
So, yes, I do have more than a passing acquaintance with
the characteristics of many hardware and software technologies,
and where partitions between them can, should, and should not be drawn.
Whatever is being proposed, is it old or new?
If old then the OP needs enlightenment and concrete
examples can easily be noted.
If new, then provide the concepts.
Not trying to make it sound "simple". Just saying it can be useful and is not the same as designing a chip with many CPUs for the purpose of providing lots of MIPS to crunch numbers. Those ideas and methods don't apply here. You don't need to complicate the design by applying all the limitations of multi-processing when this is NOT at all the same. I call them logic elements because that is the intent, for them to implement logic. Yes, it is software, but that in itself creates no problems I am aware of.
As to the connection, I really don't get your point. They either connect directly to the hardware because that's how they are designed, or they don't... because that's how they are designed. I don't know what you are saying.
Enough! The CPUs run software. Now, what is YOUR point?
Ok, now you have crossed into a philosophical domain. If you want to think in these terms I won't dissuade you, but it has no meaning in digital design and I won't discuss it further.
Ok, so what?
LOL! You are thinking in terms that are very obsolete. Read about how the F18A synchronizes with other processors and you will find that this is an excellent way to interface to the hardware as well. Just like logic, when the CPU handshakes with a logic clock, it only has to meet the timing of a clock cycle, just like all the logic in the same design. In a VHDL process the steps are written out in sequence and not assumed to be running in parallel, just like software. When the process reaches a point of synchronization it will halt, just like logic.
Between what exactly??? You are designing a system that is not before you. More importantly, you don't actually know anything about the ideas used in the F18A and GA144 designs.
I'm not trying to be rude, but you should learn more about them before you assume they need to work like every other processor you've ever used. The F18A and GA144 really only have two particularly unique ideas. One is that the processor is very, very small and, as a consequence, fast. The other is the communications technique.
Charles Moore is a unique thinker and he realized that with the advance of processing technology CPUs could be made very small and so become MIPS fodder. By that I mean you no longer need to focus on utilizing all the MIPS in a CPU. Instead, they can be treated as disposable and only a tiny fraction of the available MIPS used to implement some function... usefully.
While the GA144 is a commercial failure for many reasons, it does illustrate some very innovative ideas and is what prompted me to consider what happens when you can scatter CPUs around an FPGA as if they were logic blocks.
No, I don't have a fully developed "business plan". I am just interested in exploring the idea. Moore's (Green Arrays', actually; CM isn't actively working with them at this point, I believe) chip isn't very practical because Moore isn't terribly interested in being practical, exactly. But that isn't to say it doesn't embody some very interesting ideas.
[...] since 2006, you are really not easy to understand. Is it a sort of admission that you indeed never designed with soft cores?
variation of something old and routine and obviously working.
It is a new variation of an old concept.
A cross between the PPCs in the ancient VirtexPro and soft cores virtually everywhere in more modern times. Probably best characterized by what it is not like: it is not like Xilinx Zynq or Altera Cyclone5-HPS.
"New" part comes more from new economics of sub-20nm processes than from ab
stractions that you try to draf into it. NRE is more and more expensive, ga
tes are more and more cheap (Well, the cost of gates started to stagnate in
last couple of years, but that does not matter. What's matter is that at s
omething like TSMC 12nm gate are already quite cheap). So, adding multiple
small CPU cores that could be used as replacement for multiple soft CPU cor
es that people already used to use today, now starts to make sense. May be,
it's not a really good proposition, but at these silicon geometries it can
't be written out as obviously stupid proposition.
It appears that I don't agree with Rick about "how small is small" and, respectively, about how many of them should be placed on the die, but we probably agree about the percentage of the area of the FPGA that intuitively seems worth allocating for such a feature - more than 1% but less than 5%.
Also, he appears to like stack-based ISAs while I lean toward a more conventional 32-bit or 32/64-bit RISC, or maybe even toward a modern CISC akin to Renesas RX, but those are relatively minor details.
Fair enough. I have not suggested it was like using lots of CPUs for number crunching. (That is not what I would think the GA144 is good for.)
I agree that software should not in itself create a problem. Trying to
think of them as "logic" /would/ create problems. Think of them as
software, and program them as software. I expect you'd think of them as
entirely independent units with independent programs, rather than as a
multi-cpu or heterogeneous system.
"Synchronise directly with hardware" might be a better phrase.
My point was that these are not logic, they are not logic elements (even
if they could be physically small and cheap and scattered around a chip
like logic elements). Thinking about them as "sequential logic
elements" is not helpful. Think of them as small processors running
simple and limited /software/. Unless you can find a way to
automatically generate code for them, then they will be programmed using
a /software/ programming language, not a logic or hardware programming
language. If you are happy to accept that now, then great - we can move on.
That is not using software for synchronising with hardware (or other
cpus) - it is using hardware.
When a processor's software has a loop waiting for an input signal to go
low, then it reads a byte input, then it waits for the first signal to
go high again - that is using software for synchronisation. That's okay
for slow interfacing. When it waits for one signal, then uses three
NOP's before setting another signal to get the timing right, that is
using software for accurate timing - a very fragile solution.
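That polling pattern can be sketched in software. A minimal sketch, with the "pins" simulated by a scripted sequence of samples since there is no real hardware behind it (the waveform and names are invented for illustration):

```python
# Software-only sketch of "software for synchronisation": spin until a
# strobe goes low, latch the byte, then spin until the strobe returns
# high. The "pins" are a scripted list of (strobe, data) samples.
samples = iter([(1, 0x00), (1, 0x00), (0, 0x41), (0, 0x41), (1, 0x00)])

def sample_pins():
    """Pretend to sample a 1-bit strobe pin and an 8-bit data bus."""
    return next(samples)

def read_byte_polled():
    strobe, data = sample_pins()
    while strobe:               # busy-wait for the strobe to go low
        strobe, data = sample_pins()
    byte = data                 # latch the byte in software
    while not strobe:           # busy-wait for the strobe to go high again
        strobe, data = sample_pins()
    return byte

b = read_byte_polled()
print(hex(b))                   # 0x41
```

Fine for slow interfaces, as noted above; the timing of the transfer is entirely at the mercy of the software loop.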
When it is reading from a register that is latched by an external enable
signal, it is using hardware for the interfacing and synchronisation.
When the cpu has signals that can pause its execution at the right steps
in handshaking, it is using hardware synchronisation. That is, of
course, absolutely fine - that is using the right tools for the right jobs.
You use VHDL processes for cycle-precise, simple sequences. You use
software on a processor for less precise, complex sequences.
Between whatever you want as you pass data around your chip.
Communication between the nodes is with a synchronising port. A write
to the port blocks until the receiving node does a read - similarly, a
read blocks until the sending node does a write. Hardware
synchronisation, not software, and not entirely unlike an absolutely
minimal blocking FIFO. It is an interesting idea, though somewhat limiting.
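That blocking behaviour can be modelled in software. A sketch only: the class name and structure are my own, and the GA144's ports are of course hardware, but this mimics the rendezvous described above (write blocks until a read, read blocks until a write):

```python
import threading

class RendezvousPort:
    """Zero-capacity channel: a write blocks until a reader takes the
    value, and a read blocks until a writer offers one -- a software
    model of a GA144-style synchronising port."""
    def __init__(self):
        self._cv = threading.Condition()
        self._slot = None
        self._full = False      # a value is on offer
        self._taken = False     # the reader has consumed it

    def write(self, value):
        with self._cv:
            while self._full:              # one pending write at a time
                self._cv.wait()
            self._slot, self._full, self._taken = value, True, False
            self._cv.notify_all()
            while not self._taken:         # block until the reader arrives
                self._cv.wait()
            self._full = False
            self._cv.notify_all()

    def read(self):
        with self._cv:
            while not self._full or self._taken:
                self._cv.wait()            # block until a writer offers data
            value = self._slot
            self._taken = True
            self._cv.notify_all()
            return value

port = RendezvousPort()
results = []
reader = threading.Thread(
    target=lambda: results.extend(port.read() for _ in range(3)))
reader.start()
for v in (1, 2, 3):
    port.write(v)        # each write rendezvouses with exactly one read
reader.join()
print(results)           # [1, 2, 3]
```

The point of the model is that neither side needs a clock or a buffer; synchronisation falls out of the blocking itself.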
As I said before, it is a very interesting and impressive concept, with
a lot of cool ideas - despite being a commercial failure.
I think one of the biggest reasons for its failure is that it is a
technologically interesting solution, but with no matching problems -
there is no killer app for it. When combined with a significant learning curve and development challenge compared to the alternatives...
I want to know if that is going to happen with your ideas here. Sure,
you don't have a full business plan - but do you at least have thoughts
about the kind of usage where these mini cpus would be a technologically
superior choice compared to using state machines in VHDL (possibly
generated with external programs), sequential logic generators (like C
to HDL compilers, matlab tools, etc.), normal soft processors, or normal hard processors?
Give me a /reason/ for all this - rather than just saying you can make a
simple stack-based cpu that's very small, so you could have lots of them
on a chip.
Ok, please tell me what those problems would be. I have no idea what you mean by what you say. You are likely reading a lot into this that I am not intending.
I don't know why and likely I'm not going to care. I think you need to learn more about how the F18A works.
You have it backwards. Please show me what you think the problems are. I don't care if they run software or have a Maxwell's demon tossing bits about as long as it does what I need. You seem to get hung up on terminology...
So??? You are the one who keeps talking about software/hardware whatever. I'm talking about the software being able to synchronize with the clock of the other hardware. When that happens there are tight timing constraints, in the same sense as software sampling an ADC on a periodic basis and having to process the resulting data before the next sample is ready. The only difference is that something like the F18A running at a few GHz can do a lot in a 10 ns clock cycle.
That is your construct because you know nothing of how the F18A works. As I've mentioned before, you would do well to read some of the app notes on this device. It really does have some good ideas to offer.
You are making arbitrary distinctions. The point is that if these CPUs are available, they can be used to implement significant sections of logic in less space on the die than in the FPGA fabric.
FIFOs are used for specific purposes. Not every interface needs them. Your suggestion that they should be used without an understanding of why is pretty pointless.
Oh, what are the limitations? Also be aware that the blocking doesn't need to work as you describe it. Mostly the block would be on the read side: a processor would block until the data it needs is available... or a clock signal transitions to indicate the data that has been calculated can be output... just like the other logic in the LUT/FF logic blocks of an FPGA.
Saying there is no killer app is rather the result than the problem. Yes, it was designed out of the idea of "what happens when I interconnect a bunch of these processors?" without considering a lot of the real-world design needs. The chip has limited RAM, which could have been included in some way even if not on each processor. There is no Flash, which again could have been included. The I/Os are all 1.8 volts. There was no real memory interface provided; rather, a DRAM interface was emulated in firmware and actually doesn't work, so one had to be written for static RAM, which is hard to come by these days. I don't recall the full list.
But this is not about the GA144.
The point wasn't that I don't have a business plan. The point was that I haven't given this as much thought as would have been done if I were working on a business plan. I'm kicking around an idea. I'm not in a position to create FPGAs with or without small CPUs.
Why? Why don't you give ME a reason? Why don't you switch your point of view and figure out how this would be useful? Neither of us has anything to gain or lose.
I don't have any good ideas of what these might be used for. And I
can't see how it ends up as /my/ responsibility to figure out why /your/
idea might be a good idea.
You presented an idea - having several small, simple cpus on a chip.
It's taken a long time, and a lot of side-tracks, to drag out of you
what you are really thinking about. (Perhaps you didn't have a clear
idea in your mind with your first post, and it has solidified underway -
in which case, great, and I'm glad the thread has been successful there.)
I've been trying to help by trying to look at how these might be used,
and how they compare to alternative existing solutions. And I have been
trying to get /you/ to come up with some ideas about when they might be
useful. All I'm getting is a lot of complaints, insults, condescension,
patronisation. You tell me I don't understand what these are for - yet
you refuse to say what they are for (the nearest we have got in any post
in this thread to evidence that there is any use-case, is you telling me
you have ideas but refuse to tell me as I am not an FPGA designer by
profession). You are forever telling me about the wonders of the F18A
and the GA144, and how I can't understand your ideas because I don't
understand that device - while simultaneously telling me that device is
irrelevant to your proposal. You are asking for opinions and thoughts
about how people would program these devices, then tell me I am wrong
and closed-minded when I give you answers.
Hopefully, you have got /some/ ideas and thoughts out of this thread.
You can take a long, hard look at the idea in that light, and see if it
really is something that could be useful - in today's world with today's
tools and technology, or tomorrow's world with new tools and development methods.
But next time you want to start a thread asking for ideas and opinions,
how about responding with phrases like "I hadn't thought of it that
way", "I think FPGA designers IME would like this" - not "You are wrong,
and clearly ignorant".
You are a smart guy, and you are great at answering other people's
questions and helping them out - but boy, are you bad at asking for help!
On Thursday, March 21, 2019 at 4:21:13 AM UTC+2, email@example.com wrote:
[...] I'm talking about the software being able to synchronize with the clock of the other hardware. When that happens there are tight timing constraints in the same sense of the software sampling an ADC on a periodic basis and having to process the resulting data before the next sample is ready. The only difference is something like the F18A running at a few GHz can do a lot in a 10 ns clock cycle.
I certainly don't like the "few GHz" part.
Distributing a single multi-GHz clock over the full area of an FPGA is a non-starter from the power perspective alone, but even ignoring the power, such distribution takes significant area, making the whole proposition unattractive. As I understand it, the whole point is that these thingies take little area, so they are not harmful even for those buyers of the device that don't utilize them at all or utilize them very little.
Alternatively, multi-GHz clocks can be generated by local specialized PLLs, but I am afraid that PLLs would be several times bigger than the cores themselves and need good non-noisy power supplies and grounds that are probably hard to get in the middle of the chip, etc... I really know too little about PLLs, but I think that I know enough to conclude that it's not a much better idea than chip-wide clock distribution at multi-GHz.
My idea of small hard cores is completely different in that regard. IMHO, they should run either with the same clock as the surrounding FPGA fabric or with a clock delivered by a simple clock doubler. Even clock quadrupling does not appear to be a good idea to my engineering intuition.
I have no difficulty understanding what he is saying.
Several people have difficulty understanding what you are saying.
You are proposing vague ideas, so the onus is on you
to make your ideas clear.
No, we really don't have to learn more about one specific
processor - especially if it is just to help you.
If, OTOH, you succinctly summarise its key points and
how that achieves benefits, then we might be interested.
You need to explain your points better.
There's the old adage that "you only realise how little
you know about a subject when you try to teach it to others".
Give us the elevator pitch, so we can estimate whether
it would be a beneficial use of our remaining life.
Why? Because you are trying to propagate your ideas.
The onus is on you to convince us, not the other way around.
No, it is not.
The starting points are fine, but so what?
There's little point building something if it
isn't useful in practice.
For examples of that, see Intel's 432 and 860
processors, and there are other examples.
Your approach is 'I have this low-level thing (a tiny CPU), what can I use
it for?'. That's bottom up. A top down view would be 'my problem is X,
what's the best way to solve it?'. The advantage of the latter view is you
can explore some of the architectural space before targeting a solution
that's appropriate to the problem (with metrics to measure it), aiming to
find the global maximum. In a bottom-up approach you need to sell to users
that your idea will help their problem, but until you build a system they
don't know that it will even be a local maximum.
You can still reason about blocks as combinations of basic functions. A
block that is LUT+FF can still be analysed in separate parts.
A processor is a 'black box' as far as the tools go. That means any
software is opaque to analysis of correctness. The tools therefore can't
know that the circuit they produced matches the input HDL.
Simulation does not give you equivalence checking of the form of LVS (layout
versus schematic) or compiler correctness testing, it only tests a
particular set of (usually hand-defined) test cases. There's much less
coverage than equivalence checking tools.
That's roughly what OpenCL and friends can do. But those are top-down
architecturally (starting with a chip block diagram), rather than starting
with tiny building blocks as you're suggesting.
Verification is greater than simulation, as described above.
If you have an existing codebase (supplied by the vendor of your external
chip, for example), it'll likely be in C. It won't be in
special-stack-assembler, and your architecture seems to be designed to not
be amenable to compilers.
'Wastefulness' is one parameter. But you can also consider that every
unused hard-core is also wasteful in terms of silicon area. Can you show
that the hard-cores would be used enough of the time to outweigh the space
they waste on other people's designs?
OK, so once we drop any idea of MIPS, we're talking about something simpler
than a Cortex M0. You should be able to make a design that clocks at a few
hundred MHz on an FPGA process. You could choose to run it synchronously
with your FPGA logic, or on an internal clock and synchronise inputs and
outputs. You probably wouldn't tile these, but you could deploy them as a
'hardware thread' in places you need a complicated state machine.
If this is a module that the tools have no visibility over, ie just a blob
with inputs and outputs, then they can implement that. In that instance
there is a manageability problem - beyond a handful of processes, writing
heterogeneous distributed software is hard. Unless each processor is doing
a very small, well-defined, task, I think the chances of bugs are high.
If instead you want interaction with the toolchain in terms of generating/checking the software running on such cores, that's also...
I hadn't seen Picoblaze before, but that seems a strong fit with what you're
suggesting. So a question: why isn't it more successful? And why isn't
Xilinx putting hard Picoblazes into their FPGAs, which they could do
tomorrow if they felt the need?
The OP's attitude and responses have puzzled me. However, they
make more sense if that is indeed his design strategy - and I
suspect it is, based on comments he has made in other parts
of this thread.
That attitude surprises me, since all my /designs/ have been
based on "what do I need to achieve" plus "what can individual
technologies achieve" plus "which combination of technologies
is best at achieving my objectives". I.e top down with a
knowledge of the bottom pieces.
Of course I /implement/ my designs in a more bottom up way.
(I agree with the rest of your statements)
On Thursday, March 21, 2019 at 5:22:09 AM UTC-4, firstname.lastname@example.org wrote:
[...] I'm talking about the software being able to synchronize with the clock of the other hardware. When that happens there are tight timing constraints in the same sense of the software sampling an ADC on a periodic basis and having to process the resulting data before the next sample is ready. The only difference is something like the F18A running at a few GHz can do a lot in a 10 ns clock cycle.
Distributing a single multi-GHz clock over the full area of an FPGA is a non-starter from the power perspective alone, but even ignoring the power, such distribution takes significant area, making the whole proposition unattractive. As I understand it, the whole point is that these thingies take little area, so they are not harmful even for those buyers of the device that don't utilize them at all or utilize them very little.
There is no multi-GHz clock distribution. These CPUs can be self-timed. The F18A is. Think of asynchronous logic. It's not literally asynchronous, but similar, with internal delays setting the speed so all the internal logic works correctly. The only clock would be whatever clock the rest of the logic is using.
Think of these CPUs running from the clock generated by a ring oscillator in each CPU. There would be a minimum CPU speed over PVT (Process, Voltage, Temperature). That's all you need to make this work.
Alternatively, multi-GHz clocks can be generated by local specialized PLLs, but I am afraid that PLLs would be several times bigger than the cores themselves and need good non-noisy power supplies and grounds that are probably hard to get in the middle of the chip, etc... I really know too little about PLLs, but I think that I know enough to conclude that it's not a much better idea than chip-wide clock distribution at multi-GHz.
That's the advantage of synchronizing at the interface rather than trying to run in lock step. CPUs free-run at some fast speed. They sit waiting for data on a clock transition, not clocking, using very little power. On receiving the same clock edge the rest of the chip is using, the CPU starts running: data previously generated is output (like a FF), data on the inputs is read and processed, and the result is held while the CPU pends on the next clock edge, again going into a sleep state.
You can read how the F18A does it at an atomic level in the clock management. The wake-up is *very* fast.
My idea of small hard cores is completely different in that regard. IMHO, they should run either with the same clock as the surrounding FPGA fabric or with a clock delivered by a simple clock doubler. Even clock quadrupling does not appear to be a good idea to my engineering intuition.
This would make the CPU ridiculously slow and not a good trade-off against the fabric.
CPUs can be size-efficient when they do a lot of sequential calculations. This essentially takes advantage of the enormous multiplexer in the memory to allow it to replace a larger amount of logic. But if the needs are faster than a slow processor can handle, the processor needs to run at a much higher clock speed. This allows an even higher space efficiency since now the logic in the CPU is executing more instructions in a single clock.
So let a small CPU run at very high rates and synchronize at the system clock rate by handshaking, just like a LUT/FF logic block, without worrying about the fact that it is running a lot of instructions. It just needs to run enough to get the job done. The timing is like the logic in a data path between FFs. It has to run fast enough to reach the next FF before the next clock edge. It won't matter if it is faster. So the CPU only needs a minimum spec on the internal clock speed.
I think if you go back and read, I said it all before. But because there is a lot of new thinking involved, it was very hard to get you to understand what was being said rather than continue to look at it the way you have been looking at it for the last few decades.
There is no onus. This is not a business proposal. If you want to discuss it, do so. If not, don't.
If you can't tell me what your concerns are, I can't address them. If no one can tell me what problems are being talked about by "Trying to think of them as 'logic' /would/ create problems," I can't possibly address those concerns.
I don't see a question. Are you trying to teach me how to post in newsgroups? lol
Ask a question if you have one. Explain something I've said that is wrong. But if you don't have anything better to say, I can't help you.
Which points? I'm starting to think you are not here for the hunting.
If you don't have any idea what I'm talking about at this point, an elevator pitch won't help.
No, I'm trying to discuss an idea. If you don't wish to discuss the idea, then that's fine.
I'm not designing anything so I can't be designing bottom-up. I'm not selling anything, so I don't have users.
I'm discussing an idea. I'm kicking a can. I'm running a flag up the flagpole.
If you aren't interested in discussing this, then that's ok. But there's no point at all in having a meta-discussion.
"Correctness" in what sense? I've never worked with tools that could analy
ze my HDL to tell me if it was logically correct. I really have no idea wh
at you are talking about here. I also don't see the point of your pointing
out the LUT can be separate from the FF in a LUT/FF combination. You can
model the CPU as a large LUT with FFs. It can do the same job. The FF can
be removed. The logic can be removed. Whatever analysis that can be done
on the LUT/FF can be applied to the CPU.
If you want to verify the "correctness" of parts of a design my inspection,
I would expect that to be done on the HDL anyway, not on the generated log
ic... unless you thought the tools were suspect.
So those techniques can't be applied to software?
You can write any compiler you want. I don't know what libraries you would be using to replace FPGA logic with software. Are we talking about printf?
How do you port C libraries to logic in an FPGA now? Do it the same way.
That assumes some number of CPUs on the FPGA. We don't have those numbers. We also don't have any real data on how large a logic block is in an FPGA, at least I don't.
I think you are making silly points when we are discussing a concept. Of course we won't have the sort of data you are talking about.
I don't think a few hundred MIPS is fast enough to actually be useful. GIPS is required.
A state machine is one application. But I don't see them being limited in any way in replacing logic, other than logic that is too small for this to be practical.
Xilinx makes a big deal of their shift registers from a LUT. I've seen designs where many stages of shift register were needed. This CPU could replace a large number of those, running at some hundreds of MHz data clock rate.
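That kind of long shift register reduces, in software, to a circular buffer. A minimal sketch (the depth and names are invented for illustration): each "clock" shifts in one sample and emits the sample from DEPTH clocks ago.

```python
# A 64-stage shift register modelled as a circular buffer -- the sort
# of job a tiny CPU could take over from LUT-based (SRL) shift
# registers. One call = one clock: shift in a sample, emit the oldest.
DEPTH = 64
buf = [0] * DEPTH
head = 0

def shift(sample):
    global head
    out = buf[head]             # oldest entry falls off the far end
    buf[head] = sample          # new sample takes its slot
    head = (head + 1) % DEPTH
    return out

outs = [shift(i) for i in range(100)]
print(outs[:3], outs[64:67])    # [0, 0, 0] [0, 1, 2]
```

The CPU only ever moves one word per clock regardless of depth, which is why a long delay line is a cheap job for it.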
Why no visibility?
You need to explain to me what is hard about *this*. Giving it a label and then saying anything with that label is hard doesn't mean much. I don't think the label fits.
I don't follow. In the design it's logic. You keep trying to think of it the way you think of all software. It's logic. Inputs and outputs. You only need to dig into the code after you find there is something wrong with the mapping of inputs to outputs, like any other logic module. Presumably the code would have been simulated with appropriate inputs and outputs.
More successful than what? The Volkswagen Beetle?
I can't explain much of what Xilinx does except that they respond to their largest customers, who pay thousands of dollars for a single FPGA chip. They say what goes into Xilinx FPGAs and the rest of us are tag-alongs. Literally.