Regexes and C - Page 4

Re: Regexes and C
On 19/03/2020 13:18, Martin Gregorie wrote:
I haven't done regex in C recently, but for me the trick has always been
getting the regular expression right. So I use an online tool to build
and test expressions and see how they work.

I can't remember which one I used last, but something like this.

<https://regex101.com/>

Much quicker than testing regex in your own code.


Re: Regexes and C
On Thu, 19 Mar 2020 18:51:34 +0000, Pancho wrote:

Yes, I agree.  

I've used that one for PCRE regexes, but it's often just as easy to
test them using grep with the -P option set. I use PCRE a lot more than
other flavours because I have SpamAssassin installed and maintain a
private rule set.

For Java regex testing I've also used this:  

http://www.regexplanet.com/advanced/java/index.html

However, is there a similar test harness for regcomp(), regexec() and
friends? And a readable online document for that regex flavour?
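
For what it's worth, a test harness for regcomp() and regexec() can be very small. The following is only a minimal sketch, assuming a POSIX system with <regex.h>; the helper name ere_match is made up for illustration. It compiles the pattern as a POSIX extended regular expression (REG_EXTENDED) and reports match, no match, or a compile error.

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper (not part of any library).
 * Returns 1 if text matches pattern, 0 if it does not,
 * and -1 if the pattern fails to compile (the reason is
 * printed to stderr via regerror()). */
int ere_match(const char *pattern, const char *text)
{
    regex_t re;
    char errbuf[256];
    int rc;

    /* REG_EXTENDED: POSIX ERE syntax; REG_NOSUB: only match/no match */
    rc = regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB);
    if (rc != 0) {
        regerror(rc, &re, errbuf, sizeof errbuf);
        fprintf(stderr, "regcomp: %s\n", errbuf);
        return -1;
    }

    rc = regexec(&re, text, 0, NULL, 0);   /* 0 means a match */
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

Calling ere_match(argv[1], argv[2]) from a small main() and printing the result gives a command-line tester, so a pattern can be tried against sample strings without touching the real program.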


--  
Martin    | martin at
Gregorie  | gregorie dot org


Re: Regexes and C
On a sunny day (Thu, 19 Mar 2020 13:18:58 -0000 (UTC)) it happened Martin


My reference for libc related functions is libc.info, reg.. is explained there.
I have libc.info as a text file; you can also download it from my site as one big text file:
 wget http://panteltje.com/pub/libc.info
Use an editor to search for any function; it sometimes has examples too.
I could not have written my programs without it!

That said, I have never used reg.., and for detecting illegal or allowed chars it seems a bit overkill?

I usually set up a loop for that, for all chars in 'string', maybe like this:

#include <ctype.h>   /* isalnum() */
#include <stdlib.h>
#include <stdio.h>


int main(int argc, char **argv)
{

/* [^.a-zA-Z0-9@_-] */


if(argc != 2)
        {
        fprintf(stderr, "Usage: ./test62 some_text\n");
        fprintf(stderr, "Dummy, you should enter some text!\n");

        exit(1);
        }

char *ptr;    
int ok, c;
ptr = argv[1];
ok = 0;  
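while(1)
        {
        /* go through unsigned char so isalnum() never sees a negative value */
        c = (unsigned char)*ptr;

        if(c == 0) break;                /* end of string */
        /* the '^' in the class above means negation, so '^' itself is not an allowed char */
        else if(c == '.')   ok = 1;
        else if(isalnum(c)) ok = 1;
        else if(c == '@')   ok = 1;
        else if(c == '_')   ok = 1;
        else if(c == '-')   ok = 1;
        else ok = 0;

        if(! ok) break;                  /* stop at the first illegal char */
        ptr++;
        }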

if(! ok)
        {  
        fprintf(stderr, "Dummy, you should not use char %c in this field!\n", c);
        }
else
        {
        fprintf(stderr, "Very good!\n");
        }

exit(0);
} /* end function main */


:-)


  

Re: Regexes and C
On Fri, 20 Mar 2020 08:23:58 +0000, Jan Panteltje wrote:

Thanks for that: I haven't seen it before, but it looks very useful.  
Saved.


--  
Martin    | martin at
Gregorie  | gregorie dot org


Re: Regexes and C

For all of you trying to do this in regexes, I present some test cases:

用户@例子.广告               (Chinese, Unicode)
अजय@डाटा.भारत               (Hindi, Unicode)
квіточка@пошта.укр          (Ukrainian, Unicode)
θσερ@εχαμπλε.ψομ            (Greek, Unicode)

коля@пример.рф              (Russian, Unicode)

courtesy of:
https://en.wikipedia.org/wiki/International_email

I hope your code handles them appropriately :)

Theo

Re: Regexes and C
On Fri, 20 Mar 2020 11:44:30 +0000, Theo wrote:


This appears to describe just the display-name. From RFC 5322 it seems  
that the address can only use ASCII characters, and at this point I'm  
only dealing with the address.


--  
Martin    | martin at
Gregorie  | gregorie dot org


Re: Regexes and C
Martin Gregorie wrote:


Please note

Obsoleted by: 5322                                     PROPOSED STANDARD
Updated by: 5335, 5336                                      Errata Exist

Network Working Group                                 P. Resnick, Editor
Request for Comments: 2822                         QUALCOMM Incorporated
Obsoletes: 822                                                April 2001
Category: Standards Track

https://tools.ietf.org/html/rfc5336
https://tools.ietf.org/html/rfc5335

4.1.  UTF-8 Syntax and Normalization

   UTF-8 characters can be defined in terms of octets using the
   following ABNF [RFC5234], taken from [RFC3629]:

   UTF8-xtra-char  =   UTF8-2 / UTF8-3 / UTF8-4

   UTF8-2          =   %xC2-DF UTF8-tail

   UTF8-3          =   %xE0 %xA0-BF UTF8-tail /
                       %xE1-EC 2(UTF8-tail) /
                       %xED %x80-9F UTF8-tail /
                       %xEE-EF 2(UTF8-tail)

   UTF8-4          =   %xF0 %x90-BF 2( UTF8-tail ) /
                       %xF1-F3 3( UTF8-tail ) /
                       %xF4 %x80-8F 2( UTF8-tail )

   UTF8-tail       =   %x80-BF

   These are normatively defined in [RFC3629], but kept in this document
   for reasons of convenience.

   See [RFC5198] for a discussion of normalization; the use of
   normalization form NFC is RECOMMENDED.
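
The ABNF quoted above translates almost line for line into C. What follows is only a sketch under stated assumptions: the function names are invented for illustration, plain ASCII octets (%x01-7F) are accepted alongside UTF8-xtra-char, and no NFC normalization is attempted.

```c
#include <stddef.h>
#include <string.h>

/* UTF8-tail = %x80-BF */
static int is_tail(unsigned char b)
{
    return b >= 0x80 && b <= 0xBF;
}

/* Hypothetical helper: length (2, 3 or 4) of the UTF8-xtra-char
 * starting at s, or 0 if the octets do not fit the ABNF.
 * n is the number of octets available at s. */
size_t utf8_xtra_len(const unsigned char *s, size_t n)
{
    unsigned char lo = 0x80, hi = 0xBF;   /* allowed range for 2nd octet */

    if (n < 2) return 0;
    if (s[0] >= 0xC2 && s[0] <= 0xDF)     /* UTF8-2 */
        return is_tail(s[1]) ? 2 : 0;

    if (n < 3) return 0;
    if (s[0] >= 0xE0 && s[0] <= 0xEF) {   /* UTF8-3 */
        if (s[0] == 0xE0) lo = 0xA0;      /* rejects overlong forms */
        if (s[0] == 0xED) hi = 0x9F;      /* rejects surrogates */
        return (s[1] >= lo && s[1] <= hi && is_tail(s[2])) ? 3 : 0;
    }

    if (n < 4) return 0;
    if (s[0] >= 0xF0 && s[0] <= 0xF4) {   /* UTF8-4 */
        if (s[0] == 0xF0) lo = 0x90;      /* rejects overlong forms */
        if (s[0] == 0xF4) hi = 0x8F;      /* nothing above U+10FFFF */
        return (s[1] >= lo && s[1] <= hi &&
                is_tail(s[2]) && is_tail(s[3])) ? 4 : 0;
    }
    return 0;                             /* stray tail or bad lead octet */
}

/* Hypothetical helper: 1 if the whole string is ASCII plus
 * well-formed UTF8-xtra-char sequences, else 0. */
int is_valid_utf8(const char *str)
{
    const unsigned char *s = (const unsigned char *)str;
    size_t n = strlen(str);

    while (n > 0) {
        size_t len = (*s < 0x80) ? 1 : utf8_xtra_len(s, n);
        if (len == 0) return 0;
        s += len;
        n -= len;
    }
    return 1;
}
```

This only checks that the octets are well formed; whether a given non-ASCII character is sensible in an address is a separate question.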


I was wrong about 1990 - you are at the state of 2001 :)

Sorry for making fun of it, but it is indeed funny. I knew I had to refresh
my memory, because I haven't had anything to do with mail since I changed
company four years ago.

regards


Re: Regexes and C
Theo wrote:


"your code"+"in C"

:D

Re: Regexes and C
On Thu, 19 Mar 2020 13:18:58 -0000 (UTC),


You're already in a state of regex sin. There are far too many
exceptions to the rules with respect to an email address. The "+" is a
sendmail construct, and has been replicated in postfix and possibly
(likely?) present in other MTAs.  

The domain portion is a much smaller match space, but the username
portion permits all characters ***if they're properly escaped***.

This is a thorny problem, and has been with us ever since someone put
a webform asking for an email address on the web, and thought sanity
checking the address was a good idea. In theory, a great idea, but in
practice it will drive you to drink.

--  
Consulting Minister for Consultants, DNRC
I can please only one person per day. Today is not your day. Tomorrow
Re: Regexes and C
In comp.sys.raspberry-pi,

100% agree.


There's a little operator called "gmail" that supports it. I use it for
testing things that need a unique email address, eg signups to a web
site, by sticking a timestamp in: username+ snipped-for-privacy@gmail.com.
The format is then easy for me to select on the backend and delete
things later.
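
Selecting on the tag can be sketched in a few lines of C. This is only an illustration under assumptions: the helper name plus_tag and the example.com addresses in the note below are made up, and quoted local parts (where '+' or '@' could appear inside quotes) are ignored.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the text between the first '+'
 * and the last '@' of addr into tag (at most tagsize - 1 chars).
 * Returns 1 if a tag was found and fits, 0 otherwise. */
int plus_tag(const char *addr, char *tag, size_t tagsize)
{
    const char *at = strrchr(addr, '@');   /* start of the domain */
    const char *plus = strchr(addr, '+');  /* start of the tag */
    size_t len;

    /* no '@', no '+', or the '+' sits in the domain part */
    if (at == NULL || plus == NULL || plus > at || tagsize == 0)
        return 0;

    len = (size_t)(at - plus - 1);
    if (len >= tagsize)
        return 0;                          /* tag would not fit */

    memcpy(tag, plus + 1, len);
    tag[len] = '\0';
    return 1;
}
```

With a timestamp stuck in as described, plus_tag("username+20200321@example.com", buf, sizeof buf) leaves "20200321" in buf, which is then easy to select on.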

I think the original use of the + was in the Andrew system at CMU back in
the late 1980s or early 1990s. It was certainly quickly implemented in
sendmail's famously obfuscated cf language, but I don't know that CMU was
using sendmail to do that.

http://www.faqs.org/faqs/mail/addressing/

(But note: Last-modified: (2 Jun 98 14:32:39) )

Elijah
------
gmail wasn't around in 1998

Re: Regexes and C
On Sat, 21 Mar 2020 13:06:56 +0000, I R A Darth Aggie wrote:

OK, added it to my regex - can't do any harm with the way I'm using the  
regex.


--  
Martin    | martin at
Gregorie  | gregorie dot org


Re: Regexes and C

It may have originated in sendmail, but it's firmly enshrined in the
standards - which originated way before MS, Google and even AOL started
to bastardise the standard and create their own 'standard'. Start with
RFC822

https://tools.ietf.org/html/rfc822

published in 1982 and work forwards to its replacements/updates.

(which isn't easy reading, but you need to note that it specifies
characters that can't be used rather than ones that can, so +, {}, ~
and whatever else you want are valid characters in an email address -
see section 3.3 and look for 'atom')
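
The "exclude, don't include" approach of RFC 822 can be sketched in C as below. This is a rough illustration only, assuming the section 3.3 definitions (an atom is one or more ASCII characters other than the specials, SPACE and control characters); the function names are invented here, and quoted strings, comments and domain literals are not handled.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: 1 if c may appear in an RFC 822 atom. */
int is_rfc822_atom_char(int c)
{
    /* specials from RFC 822 section 3.3 */
    static const char specials[] = "()<>@,;:\\\".[]";

    if (c <= 32 || c >= 127)      /* CTLs, SPACE, DEL and non-ASCII */
        return 0;
    return strchr(specials, c) == NULL;
}

/* Hypothetical helper: 1 if s is a non-empty run of atom chars. */
int is_rfc822_atom(const char *s)
{
    if (*s == '\0')
        return 0;                 /* an atom needs at least one char */
    for (; *s != '\0'; s++)
        if (!is_rfc822_atom_char((unsigned char)*s))
            return 0;
    return 1;
}
```

Note that '.' is itself a special, so a local part such as first.last is really two atoms joined by a dot, and each piece would be checked separately.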


Or drive the poor user (i.e. us) to throw their drink, including the bottle,
down the throats of the people who didn't even know standards existed, let
alone use them.

-Gordon
