FPGA implementation of a lexer and parser - feasible?

Greetings,

I am new to hardware design and hoping to get a reality check on building a lexical analyzer and parser using FPGAs. I can see the following options.

  1. A hardwired implementation of some or all of lexer&parser to maximize the performance.

  1. Build a CPU with instructions optimized for interpreting parse tables and drive it with the output of a parser generator similar to YACC.

  2. Have a RISC-like CPU implemented on the FPGA executing a parser.

I see no advantage to option 3 over doing the same thing on a general purpose CPU. I don't know if the second option has any potential for increased performance. As far as the first option, I am wondering if the FPGAs currently available have the capacity to implement the hardwired parser for a language of low to moderate complexity and how hard it might be model it using a language like VHDL.

Please post any thoughts/comments/pointers about the various options.

Thanks

Reply to
Ekalavya Nishada
Loading thread data ...

Well, it's an ambitious project, anyway....

So are you trying to have a hardcoded interpreter? From what I think you're saying, you want to build this parser and then (I'm guessing) you want to either produce some bytecode for a VM (or in this case a real machine (RM) implemented in the FPGA) or build some kind of AST and walk it in your FPGA (?). Otherwise I can't see how it makes sense to just have a parser in an FPGA.

Fist off, what's the all-fire hurry? Parsing a simple language is pretty quick as is in software. It would be _lot_ easier to implement your CPU in the FPGA and then use software to parse and compile the frontend language into machine code that runs on your CPU.

Seconly, assuming that I'm guessing wrong and you just want a parser for a programming language implemented in an FPGA: I think this could be pretty difficult, but perhaps not impossible especially if you have some large amount of memory available either inside of or external to your FPGA. You'd have a bytestream coming in which represents characters of your tokens, a tokenizer (a state machine) and then another big state machine that implements your parser. But again, after you've parsed this language, what do you intend to do with it?

Phil

Reply to
Phil Tomson

(snip of options)

It would be nice to know the reason for the question. As far as I know, parsing of existing languages isn't limiting compilation times. Most languages were designed when machines were much slower than they are today, and if it was a problem then, they might have designed the language to help speed up parsing. I know many cases where that wasn't even true.

As a large part of parsing is finite state machines, and they are pretty easy to build in FPGA's, though with external RAM if they get really huge, that part seems pretty easy. Now, why would you want to do that? The state tables could be generated externally, on a more ordinary machine. Storing a symbol table would be a little more challenging, but I don't think even that should be too hard. Hash algorithms should be easy to implement, for example, or trees if that turns out better.

One possibility that I could think of would be in pattern matching. If you consider a program like grep a parser, which signals the point at which it recognizes a correct match, then one could use it for high speed pattern matching.

-- glen

Reply to
Glen Herrmannsfeldt

Jan Gray, begetter of the excellent but more or less defunct fpgacpu.org site, has discussed this. Search in Google.

Reply to
Tim

I apologize for not providing the full context. What I am thinking of is a parser for XML data. I am aware of the possibility of using regular expressions but the power of regular expressions is not sufficient for the processing I am thinking of. What I envision is a lot of XML data being quickly parsed and some action taken.

I agree with you here. It does not make sense to optimize parsing when the time spent there is a small portion of the total time as in compilers or interpreters. But, if the tokenizing and parsing time dominate, then it makes a lot of sense to make it as fast as possible. I expect most of the data processing time to be spent in parsing and comparitively little in processing the parser output. (No hard data on that; that is my hypothesis still)

I might also add that I am drawn to FPGAs due to the possibility of doing things in parallel. Processing of UTF encoded Unicode and tokenizng should be an order of magnitude faster than doing them on general purpose computers. Then there is always the possibility of doing pattern matches on the parse tree in parallel.

As should be evident by now, it is not a programming language parser. My question was really about the feasibility of doing a hardwired parser. I just wanted to know if I am completely crazy or just a little :-)

Thanks!

Reply to
Ekalavya Nishada

Apologies again as I have not been clear in my questions. As you mention later in your response, I am exploring this as an approach to parsing of XML data. More to the point, my question was to ask the FPGA experts here a) is it feasible to build a hardwired parser for a small language? b) what sort of performance improvements one might see even while interpreting parser tables as compared to doing the same in a regular CPU?

Please see my response to Phil's questions also.

Thanks!

Reply to
Ekalavya Nishada

Cool. But why? :) Don't programmers spend 99% of their time debugging instead of compiling?

Reply to
Will

Where is all your XML data coming from? How are you going to get it to the FPGA?

I'm not a parsing wizard. I'm pretty sure you could do much of the work in an FPGA. But what do you do when you find a name that needs a new entry in a symbol table as compared to an atom that can be turned into a simple code number?

Assume the data is coming off the net. That's slow. You can parse it with the CPU. Assume it's coming off the disk. Why not parse it once and cache the answer?

You can't do things in parallel unless you have parallel paths through the bottleneck. What's the bottleneck? CPU? Memory? With an FPGA, it would probably be getting data to/from the FPGA.

--
The suespammers.org mail server is located in California.  So are all my
other mailboxes.  Please do not send unsolicited bulk e-mail or unsolicited
 Click to see the full signature
Reply to
Hal Murray

Yes this is doable and you get really good results.

formatting link
formatting link
formatting link

Regular CPUs just can't keep up.

Steve

Reply to
Steve Casselman

know,

today,

help

That makes a lot of sense. Somehow "lexer and parser" implies "programming language" to many people, as it is a common use for them.

Many XML parsers are pretty slow, though I don't know that they have to be.

-- glen

Reply to
Glen Herrmannsfeldt

Thanks Tim. Say not defunct -- think of it as on extended hiatus. (Hint: try googling jan gray msdn).

Anyway, Ekalavya, if you visit fpgacpu.org and type 'parsing' into the Search box you'll find a few entries that may be of interest.

Jan Gray, Gray Research LLC FPGA CPU "News":

formatting link

Reply to
Jan Gray

ElectronDepot website is not affiliated with any of the manufacturers or service providers discussed here. All logos and trade names are the property of their respective owners.