Generating a desired synthesizable binary pulse train on FPGA using VHDL

Hello everyone. I am new to VHDL programming and FPGAs. I have a Virtex-4 FPGA and I wish to generate a binary pulse train of 16 pulses from it using VHDL. My desired pulse train is "1011100111101110" (the minimum pulse width should be 30 ns).

I have a 100 MHz clock and I am able to divide it down to 10 MHz (the clock frequency required for my application). I am also aware that the "wait for" statement cannot be synthesized, as it can only be used for test bench and simulation purposes.

So I am struggling with this problem. I am wondering if I can use the "after X ns" clause in my VHDL code or if there is any other way to do it.

I will be very thankful if any feedback or advice is provided. Your response will truly be appreciated. Kindly provide your valuable suggestions.

Thanking you,
Regards,
Chaitanya Mauskar

chaitanya163

In order to use an HDL (Hardware Description Language) you need to understand hardware well enough that you can then use the HDL to describe it. Think about how you would do this in hardware if you were drawing a schematic. Then you can figure out how to describe that circuit in the HDL.

So how would you design a circuit using gates and FFs to do this task?

--

Rick
rickman

Hi,

I can give you a quick-and-dirty skeleton in Verilog, just to get you started.

module myPulse(input wire clk, input wire rst, output wire sig);

always @(posedge clk) begin

end

endmodule

mnentwig

Hi,

(browser got trigger-happy, ignore my unfinished previous post if there is any)

Here's a quick-and-dirty skeleton in Verilog. There are many ways to approach this; for example, use a state machine if it needs to be more complex.

This one will load "1" to the output as long as rst is asserted. When rst goes low, the output will play back the sequence and continue with 0.

module myPulse(input wire clk, input wire rst, output wire pulseOut);

  reg [15:0] myReg = 0;
  assign pulseOut = myReg[15];

  always @(posedge clk) begin
    if (rst) begin
      myReg
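Since the original question asked for VHDL, here is a minimal VHDL sketch of the behaviour described above (load the pattern while rst is asserted, then shift it out and continue with 0). This is not the code from the post: the entity name pulse_shift and the signal names are hypothetical, and a synchronous, active-high rst is assumed. Each bit lasts one clk period, so clk needs a period of at least 30 ns; the 10 MHz clock from the question would give 100 ns per bit.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity name; a sketch, not the original poster's code.
entity pulse_shift is
  port (
    clk       : in  std_logic;  -- bit clock, period >= 30 ns assumed
    rst       : in  std_logic;  -- synchronous, active high (assumption)
    pulse_out : out std_logic
  );
end entity pulse_shift;

architecture rtl of pulse_shift is
  -- Pattern from the question; the leftmost bit is sent first.
  constant PATTERN : std_logic_vector(15 downto 0) := "1011100111101110";
  signal shreg : std_logic_vector(15 downto 0) := (others => '0');
begin
  -- The output is always the most significant bit of the shift register.
  pulse_out <= shreg(15);

  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        -- While rst is high the register is reloaded, so the output
        -- shows the first pattern bit ('1').
        shreg <= PATTERN;
      else
        -- Shift left by one and fill with '0', so the output stays
        -- low once all 16 bits have been played.
        shreg <= shreg(14 downto 0) & '0';
      end if;
    end if;
  end process;
end architecture rtl;
```

Note that the first bit is held for as long as rst stays high, so its width depends on when rst is released.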

mnentwig

(snip)

(snip)

I used to say that you should think about how you would build it using TTL gates, but maybe not everyone knows about TTL by now.

You want to think about AND gates and flip-flops. For problems like yours, the most important part could be a shift register, which is a series of flip-flops.

-- glen

glen herrmannsfeldt

You will not be able to get 30 ns pulses with a 10 MHz clock. At least one part of your logic will have to run at a higher clock rate. If you run the logic at 100 MHz, then you will have 10 ns resolution on the bit timing. You could have a 4-bit counter running at 100 MHz/3 = 33 MHz or 30 ns period, and the desired bit pattern entered into a 16:1 multiplexer. The counter selects the inputs in the proper sequence to the multiplexer. Of course, the synthesis tools will do a massive optimization of your description and probably reduce it to about 5 LUTs or so.
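A rough VHDL sketch of the counter-plus-multiplexer idea, for illustration only. It assumes a 100 MHz clk and uses a divide-by-3 clock enable instead of a separate 33 MHz clock, which is a design choice of this sketch rather than something stated above. The entity name pulse_mux, the signal names, and the synchronous active-high rst are all hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity name; plays the pattern once after rst is released.
entity pulse_mux is
  port (
    clk       : in  std_logic;  -- assumed to be 100 MHz (10 ns period)
    rst       : in  std_logic;  -- synchronous, active high; restarts the burst
    pulse_out : out std_logic
  );
end entity pulse_mux;

architecture rtl of pulse_mux is
  -- Pattern from the question; index 0 is the first bit sent.
  constant PATTERN : std_logic_vector(0 to 15) := "1011100111101110";
  signal div3    : unsigned(1 downto 0) := (others => '0');  -- counts 0, 1, 2
  signal index   : unsigned(3 downto 0) := (others => '0');  -- mux select
  signal running : std_logic := '0';
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        div3      <= (others => '0');
        index     <= (others => '0');
        running   <= '1';
        pulse_out <= '0';
      else
        -- A new bit slot starts every third clock, i.e. every 30 ns.
        if div3 = 0 then
          if running = '1' then
            -- The indexed constant is what synthesis turns into a 16:1 mux.
            pulse_out <= PATTERN(to_integer(index));
            if index = 15 then
              running <= '0';  -- last bit has been selected
            else
              index <= index + 1;
            end if;
          else
            pulse_out <= '0';  -- idle level after the burst
          end if;
        end if;
        -- Free-running divide-by-3 counter.
        if div3 = 2 then
          div3 <= (others => '0');
        else
          div3 <= div3 + 1;
        end if;
      end if;
    end if;
  end process;
end architecture rtl;
```

Because the output is registered and only updated when div3 is 0, every bit is held for exactly three clk periods.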

Jon

Jon Elson


I think 30 ns is the minimum pulse size; it may be larger. If so, 10 MHz will meet his specifications.

Aylons Hazzud

rickman

Indeed. :)

Aleksandar Kuktin

Hi,

would this still apply if my design uses proportionally more LUTs than registers?

For example, here is a synthesis report for a minimal "medium" ZPU processor on Spartan 6 LX9 (that is most enthusiastically blinking its LED as I write this):

Slice Logic Utilization:
  Number of Slice Registers:     284 out of 11,440    2%
    Number used as Flip Flops:   284
  ..
  Number of Slice LUTs:          934 out of  5,720   16%
    Number used as logic:        915 out of  5,720   15%
    Number used as Memory:         9 out of  1,440    1%

This is not to argue the point, I just want to understand the possible trade-offs. For example, I wonder if it would make sense to replace small counters with one-hot shift registers in such a situation?

mnentwig

First, my comment was about going the other direction, from a long shift register to an encoded counter and memory. You are asking if it makes sense to go from a state encoded counter to a one-hot register. I don't see how that can save resources of any type. The one-hot register will need at minimum one LUT per FF.

A counter is a very efficient use of the FPGA resources; however, a bare counter is not a useful FSM. To be useful there need to be inputs, which add logic to the counter. In the simplest case this input is just a hold input, which comes free other than the logic to generate the hold signal. In a more general case the counter will need to jump around rather than just progressing through the states linearly. In this case the FSM is not just a counter anymore and the LUT count increases.

So to answer your question, "it depends". lol But in general I would not expect a one-hot implementation to use any fewer LUTs at the expense of more FFs, but it is possible.

I've been watching the ZPU over the years and I would like to know what your LUT count includes. Does that include I/O such as a UART? Any idea how much is just for the CPU? Early on the ZPU people claimed a *very* low LUT count of around 500 or less, IIRC. I believe the Spartan 6 has 6-input LUTs, so your LUT count is hard to compare to LUT counts using 4-input LUTs. Still, 900 is a fair amount more than 500. I assume you have optimized for performance at the expense of size?
--

Rick
rickman

Hi,

isn't a one-hot counter just a simple ring shift register? I can build it from FFs without any further logic.

A simple experiment:

reg [1023:0] test = 1024'd1; always @(posedge clk) begin test
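For comparison, a one-hot ring counter of this kind might look as follows in VHDL. This is only a sketch under assumptions: the entity name ring_counter and the generic N are hypothetical, there is deliberately no reset (the single '1' comes from the register initial value, which FPGA tools generally honour), and the default width is 16 rather than 1024 to keep it readable.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity name; a one-hot ring counter without reset.
entity ring_counter is
  generic (
    N : positive := 16  -- number of states, intended for N >= 2
  );
  port (
    clk   : in  std_logic;
    state : out std_logic_vector(N - 1 downto 0)
  );
end entity ring_counter;

architecture rtl of ring_counter is
  -- Exactly one bit is set at power-up; this relies on the FPGA
  -- loading register initial values from the bitstream.
  signal ring : std_logic_vector(N - 1 downto 0) := (0 => '1', others => '0');
begin
  state <= ring;

  process (clk)
  begin
    if rising_edge(clk) then
      -- Rotate left by one: the top bit wraps around to bit 0, so
      -- each flip-flop is fed directly by its neighbour.
      ring <= ring(N - 2 downto 0) & ring(N - 1);
    end if;
  end process;
end architecture rtl;
```

Without a reset or enable there is no logic between the flip-flops; whether the tool still routes each stage through a LUT is something to check in the synthesis report.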

mnentwig

That's only if it is a simple counter with no other transitions or controls other than an enable. Usually they need some sort of sync reset which may or may not be supported by the FF primitive without a LUT.

Yes, this is the one that I thought was impressive in terms of the tiny size, but as you note, at a price of extreme lack of speed. I believe the slowness comes from the architecture rather than the clock being a lot slower. That is, the clock is still a reasonable speed, but it needs a lot more of them to get the work done because of having fewer data paths.

LUT sharing? Is that where the logic is broken into pieces which can be shared between different paths when there is some overlap? I've never bothered with that as I think the savings are typically pretty small.

I'm not familiar with the MICO32... do you mean the one from Lattice, maybe named MICRO32? I don't recall for sure. Just about any standard RISC CPU will be a lot bigger than the ZPU. OpenCores has one they call OpenRISC which has been around a while. I think it is fairly large though. ZPU was designed specifically to be as small as possible for code that needs very little speed. Then they decided to develop a few faster variants which are totally binary compatible. I think they achieved their objective and I have heard of it being used in some business apps.

The other day I did see another soft core that is supported by a C compiler, at least a beta version. I don't recall the name, but I expect I could come up with it if you are interested. Everything else I have seen are stack processors intended to run a Forth like language. That can make for a very simple machine... like the ZPU. :)

--

Rick
rickman

Hi,


thanks. Maybe I'll just leave it to the synthesis tool...

Yes, the achievable clock speed is even marginally higher for the small one (~110 MHz vs 100 MHz, possibly faster if I'd tweak the settings). It doesn't have registers, so every operand goes to the stack, if I remember correctly. The "medium" variant has a hardware cache for the last two levels.

There is an option to duplicate registers to reduce routing delay. But what I meant is to put several independent logic functions into the same LUT, i.e. four-input plus two-input to make it smaller. I haven't really read the manual too carefully here. The one optimization option that I found important is pipeline register balancing.

This is the MICO32 I meant:

formatting link

I just got feedback in another forum that the openRisc processor was too limited in terms of clock speed. There is also an ARM clone (amber), but it seems quite big, 90 % of an LX9 (compared to 20 and 10 % for the ZPUs)

I'll have a look at the YARD processor, thanks. Never heard about it before.

For example, Ettus uses ZPUs in their SDR products, so I think I'm on the right track with the ZPU. It doesn't have to be perfect, still beats the alternative of running a separate MBED or raspberry board with a SPI link to the FPGA.

Cheers

Markus

mnentwig

Bugs in documentation have long been an "interesting" problem in the software business area.

Many years ago, I read a neat story, probably on Usenet. Our hero was on a US destroyer in the South Pacific. He was in charge of the 5-inch guns. They worked, but weren't quite as accurate as they should have been. They even had a factory rep flown out. He didn't fix anything. When their time was up and they were headed back home, one of his guys said, roughly, "Everything is clean and polished, how about I take a look at the gun controller?" The guy was good at that sort of stuff, so the answer was "go for it". This was analog computer days. Picture gears all over the place, like a kid taking apart a clock. As things were put back together, the guy turned one gear over. That fixed it. The picture in the book showed it the wrong way.

--
These are my opinions.  I hate spam.
Hal Murray

Argh/blush. Wrong newsgroup (as if you couldn't guess). Fatfinger on my part. Sorry for the clutter.

--
These are my opinions.  I hate spam.
Hal Murray

Ok, you are talking about something that comes with the 6 input LUTs. For many years the standard size for LUTs was 4 inputs. Xilinx used some extra logic in the CLB to allow multiple 4LUTs to be joined via another mux to create the equivalent of a 5 input LUT. So you could say they had 5LUTs for some time now which had the option of being split into a pair of 4LUTs. semantics...

The issue is routing. The pair of 4LUTs requires 8 inputs while the single 5LUT only requires 5 inputs, obviously. Extrapolating this to the 6LUT in the device you are using, they have provided 6 separate inputs to the LUT. They have actually done this not to give you a larger LUT (they can always be combined easily) but to reduce the required routing. So now if you want to split the 6LUT into a pair of 5LUTs (possible given the size of the LUT itself), there aren't enough inputs. So instead it seems they give you a 4LUT and a 2LUT. Better than nothing. :)

I believe some of the Lattice devices do something like this but with larger LUTs as long as you can share the inputs to the two LUTs. Or I may be thinking of how the add/carry thing works in their devices and I may be thinking of an older Altera chip, lol.

There is also a software function in most packages which can figure out that a given logic component is used by more than one function. It can then change the netlist to allow one LUT to drive both logic functions. I believe they will even regroup the logic to facilitate this. The downside is that it makes it harder for the placer to do its job and get a placement that makes fast routing possible.

Geeze, all this time I was reading that as "Micro". lol I know this core is "free" as in beer, but I don't know how free it is to modify and distribute.

I remember some years back a guy cloned the ARM7... until he got a call from someone at ARM. Seems there was a patent on a particular feature in the interrupt controller (if I remember correctly) that is very hard to work around. My understanding is that they explained the patent to him and then offered him a job... the code disappeared from the OpenCores web site.

BTW, never use clock speed alone as a measure of performance. I can't say if the openrisc processor is fast or not. I find it funny that you would consider using the ZPU if you are looking for speed. I believe the ZPU is the slowest processor I have ever seen.

I'm rather surprised they are using a ZPU, but I expect it is for controlling the overall functionality, a bit like a front panel controller that would have been an 8051 some years ago.

--

Rick
rickman

I'd write the shift register similar to the following, assuming you have your frequency divider working correctly. Some tools implement this with a 16:1 mux, which is fine.

entity shifter is
  port(clk, en, reset : in std_ulogic; q : out std_ulogic);
end entity shifter;

architecture rtl of shifter is
  constant s : std_ulogic_vector := b"1011100111101110";
  signal i : unsigned(3 downto 0);
begin
  q
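One way the rest of such an architecture could look is sketched below; this is a guess at the general idea, not the poster's actual code. It keeps the port names from the entity above and assumes en is a one-clock-wide enable from the frequency divider and reset is synchronous and active high. The pattern is written as a binary string literal, since a hex literal x"..." with the same 16 digits would be 64 bits wide. Because the 4-bit index wraps, this version repeats the pattern continuously.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity shifter is
  port (
    clk, en, reset : in  std_ulogic;  -- en: one pulse per bit period (assumption)
    q              : out std_ulogic
  );
end entity shifter;

architecture rtl of shifter is
  -- Binary literal, 16 bits; index 0 is the first bit sent.
  constant s : std_ulogic_vector(0 to 15) := b"1011100111101110";
  signal i : unsigned(3 downto 0) := (others => '0');
begin
  -- Indexing the constant with the counter is what a tool may
  -- implement as a 16:1 mux.
  q <= s(to_integer(i));

  process (clk)
  begin
    if rising_edge(clk) then
      if reset = '1' then
        i <= (others => '0');
      elsif en = '1' then
        i <= i + 1;  -- 4-bit unsigned wraps from 15 back to 0
      end if;
    end if;
  end process;
end architecture rtl;
```

Here q is a combinational function of i, so it may glitch briefly when the index changes; registering q inside the clocked process would avoid that at the cost of one extra clock of latency.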

Daniel Kho
