Regexes and C

- Richard Kettlewell

Fri, Mar 20, 2020 8:21 AM

You said you were worried about cross-site scripting, which is a web issue.

It sounds like you're worried about code injection attacks on these boundaries.

Sanitization is the wrong strategy. A perfectly valid email address may nevertheless contain characters that trip up your downstream applications in some way, if they have not been hardened to process untrusted data.

The mitigations for this when the shell is involved are:

Minimize the portion of your application written in shell[1].

Use shell quoting syntax properly within the shell scripts that you do have.

And when invoking commands (in any context):

Prefer array-format commands e.g. args=[...] in Python, or execvp (etc) in C

Avoid string-format commands (e.g. args="..." in Python or system & popen in C).

[1] Ideally to 0. The same applies to C. Both are terrible language choices for security.

--
https://www.greenend.org.uk/rjk/

- Jan Panteltje

Fri, Mar 20, 2020 8:23 AM

On a sunny day (Thu, 19 Mar 2020 13:18:58 -0000 (UTC)) it happened Martin Gregorie wrote in :

My reference for libc-related functions is libc.info; reg.. is explained there. I have libc.info as a text file; you can also download it from my site as one big text file: wget

formatting link

Use your editor's search to find any function; it sometimes has examples too. I could not have written my programs without it!

That said, I have never used reg.., and it seems a bit overkill for detecting illegal or allowed chars?

I usually set up a loop for that, for all chars in 'string', maybe like this:

    #include <ctype.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        /* Reject any char matching [^.a-zA-Z0-9@_-] */
        if (argc != 2) {
            fprintf(stderr, "Usage: ./test62 some_text\n");
            fprintf(stderr, "Dummy, you should enter some text!\n");
            exit(1);
        }

        char *ptr = argv[1];
        int ok = 1, c = 0;

        /* Cast to unsigned char: isalnum() is undefined for negative values. */
        while ((c = (unsigned char)*ptr) != 0) {
            /* The leading ^ in the class means negation; it is not an allowed char. */
            ok = isalnum(c) || c == '.' || c == '@' || c == '_' || c == '-';
            if (!ok)
                break;
            ptr++;
        }

        if (!ok) {
            fprintf(stderr, "Dummy, you should not use char %c in this field!\n", c);
            exit(1);
        }
        fprintf(stderr, "Very good!\n");
        exit(0);
    } /* end function main */

:-)

- Richard Kettlewell

Fri, Mar 20, 2020 8:27 AM

Yes, they are designed to be parsed and parsers for them exist (for instance in most email software). The specifications have always contained grammars for them. The language specified in RFC822 isn't a regular language, but that just means you need something a little more sophisticated than a regular expression to parse it.

--
https://www.greenend.org.uk/rjk/

- Martin Gregorie

Fri, Mar 20, 2020 10:02 AM

RCE.html

Exactly so. It's not common, but it can also be used to inject a poison pill into the recipient's system.

It's well known that the From: header is not used at all to transfer mail from sender to receiver - returned bounces are sent to the Reply-To address. The only defined use of From: is to be displayed by the receiving mail reader (MUA). Any other use is entirely up to the recipient and their system.

A common use for the From: header is in mail archives, which typically index emails by sender, recipient, subject and date, but the wise archivist knows that the From: header can be, and frequently is, a pack of lies.

Take a careful look at the next piece of spam you receive that's apparently from a friend. Many MUAs default to showing just the From text rather than both the text and the internet mail address. If yours is one of those, reconfigure it to show both. This gives you the ability to recognise spam without opening it.

Then use your MUA to look at all the headers and you'll see that spammers are often both lazy and stupid: they often change the sender text to spoof the victim but From: and Reply-To: both contain their real address - unless, that is, the message was sent from a compromised system, in which case a common pattern is: From text is your friend's name, From address is the spammer's address and Reply-To is the address of the compromised system.

--
Martin    | martin at 
Gregorie  | gregorie dot org

- Martin Gregorie

Fri, Mar 20, 2020 10:06 AM

Do you ever LOOK at anything you read? If so you would have realised that was posted late at night when I was tired.

In future try THINKING before getting critical.

--
Martin    | martin at 
Gregorie  | gregorie dot org

- Richard Kettlewell

Fri, Mar 20, 2020 10:19 AM

No, they aren't. Bounces are sent to the transport-level sender address (often called the "return path").

--
https://www.greenend.org.uk/rjk/

- Martin Gregorie

Fri, Mar 20, 2020 10:42 AM

... and shouldn't have been one.

I'm not surprised. MySQL was known as being a limited system which lacked any form of query optimisation. It and MS Access were both known to be very limited, especially when the data volume gets large.

The original big three were Informix, Ingres and Oracle, with IBM joining in later, initially having led the field with System/R, developed by Ted Codd and Chris Date. Incidentally, both have written extremely good books about the care and feeding of RDBMS systems.

Oracle has always been expensive and seems to need a lot of routine attention, or so I found when I briefly looked after a site.

I know very little about Informix, never having used it.

Ingres was always pretty good. Quick, easy to manage and with a decent query optimiser. There was a special University license which was cloned and became PostgreSQL, which is excellent, free and is currently maintained and developed. It has a good query optimiser and can be ignored for weeks or months on end - it just quietly gets on with automated housekeeping, etc.

Ingres also sold a developers license for version 10 to Microsoft - this is where Microsoft SQL Server came from.

Try PostgreSQL next time. You'll be pleasantly surprised.

Indeed. It was, after all, only MySQL.

I've done much the same in Java rather than using the Derby RDBMS, but that was only because I wanted a small and fairly simple in-memory database behind the covers of a club rostering system I wrote for my gliding club. The translation from RDBMS terms to Java looks like this:

Row -> Class with getters, setters and some table-level methods in it

Table -> ArrayList

Index -> TreeMap

and, before you ask, yes I did normalise the data first and then draw an ERD before cutting any code. It also implements a number of rules about minimum gaps between duties, not rostering members of a glider syndicate on the same day, etc, etc. Performance is good, with no delays noticeable during normal duty allocation/deallocation/moves or in switching between rosters.

--
Martin    | martin at 
Gregorie  | gregorie dot org

- The Natural Philosopher

Fri, Mar 20, 2020 10:47 AM

No, to the envelope from address.

--
Climate Change: Socialism wearing a lab coat.

- The Natural Philosopher

Fri, Mar 20, 2020 10:48 AM

'envelope from'

--
Climate Change: Socialism wearing a lab coat.

- Martin Gregorie

Fri, Mar 20, 2020 10:50 AM

Thanks for that: I haven't seen it before, but it looks very useful. Saved.

--
Martin    | martin at 
Gregorie  | gregorie dot org

- Theo

Fri, Mar 20, 2020 11:44 AM

For all of you trying to do this in regexes, I present some test cases:

??@??.?? (Chinese, Unicode)
???@????.???? (Hindi, Unicode)
????????@?????.??? (Ukrainian, Unicode)
????@???????.??? (Greek, Unicode)
????@??????.?? (Russian, Unicode)

courtesy of:

formatting link

I hope your code handles them appropriately :)

Theo

- Richard Kettlewell

Fri, Mar 20, 2020 12:26 PM

Yes, that too. Delivery agents often add it as a Return-Path: header as their last act.

--
https://www.greenend.org.uk/rjk/

- Martin Gregorie

Fri, Mar 20, 2020 12:43 PM

Regardless, the point I was trying to make is that the From: header is not used by the process of transferring mail from sender to recipient and is displayed by the receiving MUA for information only. So it can be, and is, used nefariously by spammers and other lowlife.

--
Martin    | martin at 
Gregorie  | gregorie dot org

- The Natural Philosopher

Fri, Mar 20, 2020 12:57 PM

No, they do not. Ever.

--
"when things get difficult you just have to lie"

- The Natural Philosopher

Fri, Mar 20, 2020 1:03 PM

No, the From: address is simply a header until you try to respond to an email, when it is, after 'Reply-To:', the default recipient (along with any cc: lists).

So it is not just displayed for information.

For clarity, the SMTP protocol is:

HELO (can be anything)
MAIL FROM
RCPT TO
DATA, which is where the headers are sent, followed by the message parts.

That is, the SMTP exchange has nothing to do with the headers, BUT once delivered, the headers are what the MUA uses to respond.

--
"when things get difficult you just have to lie"

- Martin Gregorie

Fri, Mar 20, 2020 1:20 PM

This appears to describe just the display-name. From RFC 5322 it seems that the address can only use ASCII characters, and at this point I'm only dealing with the address.

--
Martin    | martin at 
Gregorie  | gregorie dot org

- Deloptes

Fri, Mar 20, 2020 1:29 PM

"your code"+"in C"

:D

- druck

Fri, Mar 20, 2020 1:39 PM

to 3

Tl;dr - I don't know how to do it properly, therefore it is crap.

---druck

- Deloptes

Fri, Mar 20, 2020 1:50 PM

Please note

Obsoleted by: 5322              PROPOSED STANDARD
Updated by: 5335, 5336          Errata Exist

Network Working Group           P. Resnick, Editor
Request for Comments: 2822      QUALCOMM Incorporated
Obsoletes: 822                  April 2001
Category: Standards Track

formatting link

4.1. UTF-8 Syntax and Normalization

UTF-8 characters can be defined in terms of octets using the following ABNF [RFC5234], taken from [RFC3629]:

UTF8-xtra-char = UTF8-2 / UTF8-3 / UTF8-4

UTF8-2 = %xC2-DF UTF8-tail

UTF8-3 = %xE0 %xA0-BF UTF8-tail /
         %xE1-EC 2(UTF8-tail) /
         %xED %x80-9F UTF8-tail /
         %xEE-EF 2(UTF8-tail)

UTF8-4 = %xF0 %x90-BF 2(UTF8-tail) /
         %xF1-F3 3(UTF8-tail) /
         %xF4 %x80-8F 2(UTF8-tail)

UTF8-tail = %x80-BF

These are normatively defined in [RFC3629], but kept in this document for reasons of convenience.

See [RFC5198] for a discussion of normalization; the use of normalization form NFC is RECOMMENDED.

I was wrong about 1990 - you are at state of 2001 :)

Sorry for making fun of it, but it is indeed funny. I knew I had to refresh my memory, because I haven't dealt with mail since I changed company 4 years ago.

regards

- The Natural Philosopher

Fri, Mar 20, 2020 2:13 PM

No. The time to learn how to do it properly exceeds the time to do it the way I know so vastly that my life will be over before I *need* to learn it.

--
"I guess a rattlesnake ain't risponsible fer bein' a rattlesnake, but ah  
puts mah heel on um jess the same if'n I catches him around mah chillun".