Regexes and C - Page 2

Re: Regexes and C
On 20/03/2020 16:01, Richard Kettlewell wrote:
Oh dear. I wonder who added that?


--  
Socialism is the philosophy of failure, the creed of ignorance and the  
gospel of envy.

Re: Regexes and C

Exim. It's following the spec:
https://tools.ietf.org/html/rfc5321#section-4.4

--  
https://www.greenend.org.uk/rjk/

Re: Regexes and C
On 20/03/2020 10:02, Martin Gregorie wrote:
No, to the envelope from address.



--  
Climate Change: Socialism wearing a lab coat.

Re: Regexes and C
On Fri, 20 Mar 2020 10:47:38 +0000, The Natural Philosopher wrote:

Regardless, the point I was trying to make is that the From: header is  
not used by the process of transferring mail from sender to recipient and  
is displayed by the receiving MUA for information only.
  
So it can be, and is, used nefariously by spammers and other lowlife.


--  
Martin    | martin at
Gregorie  | gregorie dot org


Re: Regexes and C
On 20/03/2020 12:43, Martin Gregorie wrote:
No, the From: address is simply a header until you try to respond to an
email, when it becomes, after Reply-To:, the default recipient (along with
any Cc: lists).

So it is not just displayed for information.

For clarity, the SMTP protocol is:
HELO <servername> (can be anything)
MAIL FROM:<envelope sender address>
RCPT TO:<recipient address>
DATA
which is where the headers are sent, followed by the message parts.

That is, the SMTP exchange has nothing to do with the headers, BUT once
delivered, the headers are what the MUA uses to respond.
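
To make the split between envelope and headers concrete, here is a minimal sketch in C. The struct, the function names and every address are made up for illustration, and it only writes the client's side of the exchange; a real client would wait for the server's reply after each command.

```c
#include <stdio.h>

/* Hypothetical container: the envelope and the headers are separate data. */
struct mail_message {
    const char *envelope_from;   /* given in MAIL FROM:, used for delivery and bounces */
    const char *envelope_to;     /* given in RCPT TO:, where the server delivers */
    const char *header_from;     /* From: header, sent inside DATA */
    const char *header_reply_to; /* Reply-To: header, or NULL if absent */
};

/* Writes the client side of the exchange; server replies are not modelled. */
static void write_client_lines(FILE *out, const char *helo_name,
                               const struct mail_message *m)
{
    fprintf(out, "HELO %s\r\n", helo_name);
    fprintf(out, "MAIL FROM:<%s>\r\n", m->envelope_from);
    fprintf(out, "RCPT TO:<%s>\r\n", m->envelope_to);
    fprintf(out, "DATA\r\n");
    /* Everything from here to the lone "." is message content. */
    fprintf(out, "From: %s\r\n", m->header_from);
    if (m->header_reply_to != NULL)
        fprintf(out, "Reply-To: %s\r\n", m->header_reply_to);
    fprintf(out, "\r\nMessage body.\r\n.\r\n");
}

/* What an MUA would typically answer to: Reply-To: if present, else From:. */
static const char *reply_target(const struct mail_message *m)
{
    return m->header_reply_to != NULL ? m->header_reply_to : m->header_from;
}
```

Note that nothing in `reply_target` looks at the envelope fields, and nothing in the SMTP commands looks at the headers, so the two can disagree freely.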




--  
"when things get difficult you just have to lie"



Re: Regexes and C
On Fri, 20 Mar 2020 13:03:53 +0000, The Natural Philosopher wrote:

The fact that you can reply to it only occurs because your MUA displayed  
it in such a way that you can reply to it. It can be plain wrong: how  
many times have you seen a sender displayed by your MUA as:


You haven't? Then one of the following applies:
(a) your MUA isn't configured to show the address after the display-name
(b) your MUA can't/won't show the from address, so ditch it and get a
    better one
(c) you haven't been watching out for spam

  
The envelope sender is NOT the From: header, and nothing says that there
has to be a From: header or that its content has to be the same as the
envelope-from content.

Same goes for the Return-address: and Reply-To: addresses though, because  
they are automatically set from the sender's account name and server  
domain, they are more likely to be correct.    



--  
Martin    | martin at
Gregorie  | gregorie dot org


Re: Regexes and C
On 20/03/2020 19:49, Martin Gregorie wrote:
I never said it was.

The MAIL FROM command in SMTP has nothing to do with the From: header.

They are not. Reply-To: is set by the sender's MUA itself.

Return-address: is a new feature that is set to the envelope from.


--  
"Corbyn talks about equality, justice, opportunity, health care, peace,  
community, compassion, investment, security, housing...."
Re: Regexes and C

You said you were worried about cross-site scripting, which is a web
issue.


It sounds like you're worried about code injection attacks on these
boundaries.

Sanitization is the wrong strategy. A perfectly valid email address may
nevertheless contain characters that trip up your downstream
applications in some way, if they have not been hardened to process
untrusted data.

The mitigations for this when the shell is involved are:

* Minimize the portion of your application written in shell[1].
* Use shell quoting syntax properly within the shell scripts that you do
  have.

And when invoking commands (in any context):
* Prefer array-format commands e.g. args=[...] in Python, or execvp
  (etc) in C
* Avoid string-format commands (e.g. args="..." in Python or system &
  popen in C).

[1] Ideally to 0. The same applies to C. Both are terrible language
    choices for security.
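
A minimal sketch of the array-format approach in C, assuming a POSIX system. `run_argv` is a hypothetical helper name, not part of any library. Each element of `argv` reaches the child process as one argument, and no shell ever parses it, so shell metacharacters in an address are just data.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Runs argv[0] (looked up via PATH) with the given NULL-terminated
 * argument vector. No shell is involved.
 * Returns the child's exit status, or -1 if it could not be run
 * or did not exit normally.
 */
static int run_argv(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127); /* only reached if execvp failed */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, handing the string `x@example.org; exit 7` to `test` as one element of `argv` simply compares it as a string; the same text pasted into a `system()` command line would be split at the semicolon by the shell.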

--  
https://www.greenend.org.uk/rjk/

Re: Regexes and C

It's extremely hard, but some people have tried:

https://emailregex.com/
https://stackoverflow.com/questions/201323/how-to-validate-an-email-address-using-a-regular-expression

Definitely not a thing to make up yourself, you'll almost certainly get it
wrong.

Theo

Re: Regexes and C
On Thu, 19 Mar 2020 14:15:09 +0000, Theo wrote:

Yep, so it seems - and anyway the examples on your second link are very
unlikely to be accepted by regcomp(), so I'll have a play with Java's
pattern matching classes. IIRC they sidestep the UTF8 problem anyway.

Thanks for that.


--  
Martin    | martin at
Gregorie  | gregorie dot org


Re: Regexes and C
Am 19.03.2020 um 14:29 schrieb A. Dumas:


We are using a small CRM that checks whether an MX record exists in the
DNS for the domain part.

So, first check if domain is valid for e-mail, then try to deliver and  
check response ...

Re: Regexes and C
On Thu, 19 Mar 2020 15:47:41 +0100, DeepCore wrote:


Good idea, but not needed here because I only need to check the From  
address on incoming mail.


--  
Martin    | martin at
Gregorie  | gregorie dot org


Re: Regexes and C

The existence of an MX record is often a good idea but by no means a
requirement. An A or CNAME record is perfectly OK.

 - Andi

Re: Regexes and C
On 19/03/2020 14:47, DeepCore wrote:
Doesn't work with Gmail and other big sites. They accept the mail, then
bounce it back later.
DAMHIKT


--  
  "A leader is best When people barely know he exists. Of a good leader,
who talks little, When his work is done, his aim fulfilled, They will say,
Re: Regexes and C

Email a random number to the address. Make the punter come back
and type that number in. Then, and only then, do you know the email
is valid (and belongs, in some sense, to the punter wanting your wares...)

(Yes, someone could be MITMing your email connection, but then you have
bigger problems!)
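
A rough sketch of the number-generating half in C, assuming a system that provides /dev/urandom. The function names and the six-digit format are my own choices for illustration; sending the mail and storing the code against the address are left out.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Returns a code in the range 0..999999 read from /dev/urandom,
 * or -1 on failure. The modulo introduces a very slight bias,
 * which is usually acceptable for a short-lived confirmation code.
 */
static long make_confirmation_code(void)
{
    uint32_t raw;
    FILE *f = fopen("/dev/urandom", "rb");
    if (f == NULL)
        return -1;
    size_t got = fread(&raw, sizeof raw, 1, f);
    fclose(f);
    if (got != 1)
        return -1;
    return (long)(raw % 1000000u);
}

/* True only if a code was actually issued and the typed value equals it. */
static int code_matches(long sent, long typed)
{
    return sent >= 0 && sent == typed;
}
```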

--  
Ian

"Tamahome!!!" - "Miaka!!!"

Re: Regexes and C
On 2020-03-19, Martin Gregorie wrote:

No; email addresses cannot be syntactically validated by regexp alone.

R

Re: Regexes and C
On Thu, 19 Mar 2020 14:24:42 +0000, Roger Bell_West wrote:

OK, I'm starting to see that, so it looks like my current strategy of  
inverting a bracket expression containing all the characters that can  
legitimately be in an e-mail address is about as far as I can go.

Doing this in either C or Java should be OK, since I'm only looking to  
stop From: headers being used as attack vectors on a bash script. AFAICR  
Bash only accepts ASCII, so any message whose From: address contains  
anything that isn't ASCII alphanumeric, '@', hyphen, underscore or period  
can be binned.  
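
The character test described above does not strictly need a regex. A minimal sketch in C using strspn(), with the allowed set taken from the paragraph above; the function name is hypothetical. It only checks characters, not address structure, and it rejects any address containing other characters such as '+'.

```c
#include <string.h>

/* ASCII alphanumerics plus '@', hyphen, underscore and period. */
static const char ALLOWED[] =
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
    "@-_.";

/*
 * Returns 1 if addr is non-empty and every character is in ALLOWED,
 * otherwise 0. strspn() gives the length of the leading run of allowed
 * characters, so the string passes only if that run reaches the
 * terminating NUL.
 */
static int only_allowed_chars(const char *addr)
{
    return addr[0] != '\0' && addr[strspn(addr, ALLOWED)] == '\0';
}
```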


--  
Martin    | martin at
Gregorie  | gregorie dot org


Re: Regexes and C
On 2020-03-19, Martin Gregorie wrote:

You will be dropping valid mail if you do this.

We can start with + in the address, but really, we can play this game
all day.

Re: Regexes and C

It's not clear to me that the full syntax of email addresses
can be represented as a regular language.  Undoubtedly
a useful subset _can_, but in their full generality, you
may need a push-down automaton.


What do you mean "doesn't provide any way to anchor a regex to either
end of a string"?  That's what the `^` and `$` metacharacters in the
regex are for, and they're fully supported by the library.


Could you clarify what you mean?  '$' will match the empty string at
the end of a line, '^' matches the empty string at the beginning
of a line.  By default, the library ignores newlines entirely; they're
only significant if you use the `REG_NEWLINE` flag to `regcomp()`.
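
A small sketch of that behaviour, assuming the POSIX <regex.h> interface. `matches` is a hypothetical wrapper and the pattern `^abc$` is only a stand-in.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Returns 1 if text matches pattern, 0 if it does not, and -1 if the
 * pattern fails to compile. extra_flags is OR-ed into the regcomp()
 * flags, e.g. REG_NEWLINE.
 */
static int matches(const char *pattern, const char *text, int extra_flags)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB | extra_flags) != 0)
        return -1;
    int rc = regexec(&re, text, 0, NULL, 0); /* 0 means "matched" */
    regfree(&re);
    return rc == 0;
}
```

With this wrapper, `matches("^abc$", "xabc", 0)` is 0 because the anchors pin the match to the whole string. `matches("^abc$", "x\nabc", 0)` is also 0, while `matches("^abc$", "x\nabc", REG_NEWLINE)` is 1, since only then may `^` match just after an embedded newline.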


Perhaps if you could post your code, one might be able to see
an issue?

As far as other libraries, if you can link against C++ code, the
RE2 library is very nice.


You'd want something that covers the POSIX interfaces.

    - Dan C.


Re: Regexes and C
On Thu, 19 Mar 2020 15:19:36 +0000, Dan Cross wrote:

Just that:

My original regex was

"[a-zA-Z0-9][.a-zA-Z0-9_-]*@[a-zA-Z0-9][a-zA-Z0-9.]*[a-zA-Z0-9]*"

and matched a string  containing "a snipped-for-privacy@d.e", so I changed it to  

"^[a-zA-Z0-9][.a-zA-Z0-9_-]*@[a-zA-Z0-9][a-zA-Z0-9.]*[a-zA-Z0-9]*$"

and it *still* matched that string. So I reread regex(7) and this time  
noticed:

'^' (matching the null string at the beginning of a line),
'$' (matching the null string at the end of a line)

Which, by its discussion of lines, seems to imply that regcomp/regexec
treats strings passed in as shell parameters as somehow different from
strings that have been filled by reading lines from a file.
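
For comparison, a minimal test harness for the anchored pattern, assuming the POSIX <regex.h> interface. The function name is hypothetical, and since the string quoted above was obfuscated by the archive, the inputs mentioned below are guesses at its shape. regexec() only ever sees a NUL-terminated string, so where the string came from should make no difference. It returns 0 on a match and REG_NOMATCH otherwise, which is easy to read the wrong way round.

```c
#include <regex.h>
#include <stdio.h>

#define ADDRESS_PATTERN \
    "^[a-zA-Z0-9][.a-zA-Z0-9_-]*@[a-zA-Z0-9][a-zA-Z0-9.]*[a-zA-Z0-9]*$"

/*
 * Returns 1 if candidate matches ADDRESS_PATTERN from start to end,
 * 0 if it does not, and -1 (after printing the reason) if the pattern
 * does not compile.
 */
static int address_matches(const char *candidate)
{
    regex_t re;
    int rc = regcomp(&re, ADDRESS_PATTERN, REG_EXTENDED | REG_NOSUB);
    if (rc != 0) {
        char msg[128];
        regerror(rc, &re, msg, sizeof msg);
        fprintf(stderr, "regcomp: %s\n", msg);
        return -1;
    }
    rc = regexec(&re, candidate, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

Called with a command-line argument, e.g. `address_matches(argv[1])`, this should accept `b@d.e` and reject `a b@d.e`, provided the argument was quoted so that the shell passed it as a single word.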

Exactly so. But they don't match the ends of a string that was passed in  
as a command-line parameter.

I tried getting into C++ years ago when it first became common (think
Borland C++) and hated it, found Bjarne Stroustrup's C++ far below the
standard set by K&R and finally gave it up when I found all too much C++
code was in fact just ANSI C with // comment delimiters.

Java beats the crap out of it, IMO anyway.
  
Quite possibly, though I'm constantly surprised by how useful and  
relevant it still is. This is about the first time it hasn't come up with  
the goods, though that says at least as much about how stable the C  
standard library's APIs are.

Would you care to recommend a POSIX book that's as good as the SVR4 one
was in its time?
  

--  
Martin    | martin at
Gregorie  | gregorie dot org

