Regexes and C

You said you were worried about cross-site scripting, which is a web issue.
It sounds like you're worried about code injection attacks on these boundaries.
Sanitization is the wrong strategy. A perfectly valid email address may nevertheless contain characters that trip up your downstream applications in some way, if they have not been hardened to process untrusted data.
The mitigations for this when the shell is involved are:
* Minimize the portion of your application written in shell[1].
* Use shell quoting syntax properly within the shell scripts that you do have.
And when invoking commands (in any context):
* Prefer array-format commands, e.g. args=[...] in Python, or execvp (etc.) in C.
* Avoid string-format commands, e.g. args="..." in Python, or system and popen in C.
[1] Ideally to 0. The same applies to C. Both are terrible language choices for security.
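
A minimal sketch of the array-format approach in C may help here. The helper name run_argv is hypothetical, error handling is deliberately thin, and a POSIX system is assumed. Because execvp receives each argument as a separate string and no shell parses them, characters such as ; or $( ) in an argument reach the program literally.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] with the given NULL-terminated
 * argument array. No shell is involved, so shell metacharacters in
 * the arguments are not interpreted.
 * Returns the child's exit status, or -1 on failure. */
int run_argv(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;                 /* fork failed */

    if (pid == 0) {
        execvp(argv[0], argv);     /* only returns on error */
        _exit(127);                /* conventional "command not found" status */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller would build something like { "printf", "%s\n", untrusted, NULL } and pass it to run_argv. This only removes the shell from the picture: an argument beginning with - can still be read as an option by the invoked program, so the program itself has to treat its arguments as untrusted.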
--
https://www.greenend.org.uk/rjk/
Reply to
Richard Kettlewell
On a sunny day (Thu, 19 Mar 2020 13:18:58 -0000 (UTC)) it happened Martin Gregorie wrote in :
My reference for libc-related functions is libc.info; reg.. is explained there. I have libc.info as a text file; you can also download it from my site as one big text file: wget
formatting link
Use editor search for any function; it sometimes has examples too. I could not have written my programs without it!
That said, I have never used reg.., and it seems a bit overkill for detecting illegal or allowed chars?
I usually set up a loop for that, for all chars in 'string', maybe like this:
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    /* Allowed characters are the complement of [^.a-zA-Z0-9@_-] */
    if (argc != 2) {
        fprintf(stderr, "Usage: ./test62 some_text\n");
        fprintf(stderr, "Dummy, you should enter some text!\n");
        exit(1);
    }

    const char *ptr = argv[1];
    int ok = 1;
    int c = 0;

    /* Walk the string and stop at the first character outside the set. */
    for (; *ptr != '\0'; ptr++) {
        c = (unsigned char)*ptr;    /* isalnum() needs an unsigned char value */
        /* '^' negates the class above; it is not itself an allowed char. */
        ok = isalnum(c) || c == '.' || c == '@' || c == '_' || c == '-';
        if (!ok)
            break;
    }

    if (!ok) {
        fprintf(stderr, "Dummy, you should not use char %c in this field!\n", c);
        exit(1);
    }
    fprintf(stderr, "Very good!\n");
    exit(0);
} /* end function main */
:-)
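
For comparison, the same check can be sketched with the POSIX regex functions that libc documents (regcomp, regexec, regfree). The function name has_illegal_char is hypothetical, and the C locale is assumed, since ranges such as a-z are locale-dependent elsewhere. This only tests the character set; it does not validate an address.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if s contains a character outside
 * the set . a-z A-Z 0-9 @ _ -, 0 if it does not, and -1 if the
 * pattern fails to compile. */
int has_illegal_char(const char *s)
{
    regex_t re;

    /* REG_NOSUB: we only need match/no-match, not match offsets. */
    if (regcomp(&re, "[^.a-zA-Z0-9@_-]", REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    int rc = regexec(&re, s, 0, NULL, 0);   /* 0 means a match was found */
    regfree(&re);
    return rc == 0;
}
```

Compiling the pattern on every call is wasteful; a real program would compile it once and reuse the regex_t.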
Reply to
Jan Panteltje
Yes, they are designed to be parsed and parsers for them exist (for instance in most email software). The specifications have always contained grammars for them. The language specified in RFC822 isn't a regular language, but that just means you need something a little more sophisticated than a regular expression to parse it.
--
https://www.greenend.org.uk/rjk/
Reply to
Richard Kettlewell
Exactly so. It's not common, but it can also be used to inject a poison pill into the recipient's system.
It's well known that the From: header is not used at all to transfer mail from sender to receiver - returned bounces are sent to the Reply-To address. The only defined use of From: is to be displayed by the receiving mail reader (MUA). Any other use is entirely up to the recipient and their system.
A common use for the From: header is in mail archives, which typically index emails by sender, recipient, subject and date, but the wise archivist knows that the From: header can be, and frequently is, a pack of lies.
Take a careful look at the next piece of spam you receive that's apparently from a friend. Many MUAs default to showing just the From text rather than both the text and the internet mail address. If yours is one of those, reconfigure it to show both. This gives you the ability to recognise spam without opening it.
Then use your MUA to look at all the headers and you'll see that spammers are often both lazy and stupid: they often change the sender text to spoof the victim, but From: and Reply-To: both contain their real address - unless, that is, the message was sent from a compromised system, in which case a common pattern is: From text is your friend's name, From address is the spammer's address and Reply-To is the address of the compromised system.
--
Martin    | martin at 
Gregorie  | gregorie dot org
Reply to
Martin Gregorie
Do you ever LOOK at anything you read? If so you would have realised that was posted late at night when I was tired.
In future try THINKING before getting critical.
--
Martin    | martin at 
Gregorie  | gregorie dot org
Reply to
Martin Gregorie
No, they aren't. Bounces are sent to the transport-level sender address (often called the "return path").
--
https://www.greenend.org.uk/rjk/
Reply to
Richard Kettlewell
... and shouldn't have been one.
I'm not surprised. MySQL was known as being a limited system which lacked any form of query optimisation. It and MS Access were both known to be very limited, especially when the data volume gets large.
The original big three were Informix, Ingres and Oracle, with IBM joining in later, initially having led the field with System/R, developed by Ted Codd and Chris Date. Incidentally, both have written extremely good books about the care and feeding of RDBMS systems.
Oracle has always been expensive and seems to need a lot of routine attention, or so I found when I briefly looked after a site.
I know very little about Informix, never having used it.
Ingres was always pretty good. Quick, easy to manage and with a decent query optimiser. There was a special University license which was cloned and became PostgreSQL, which is excellent, free and is currently maintained and developed. It has a good query optimiser and can be ignored for weeks or months on end - it just quietly gets on with automated housekeeping, etc.
Ingres also sold a developers license for version 10 to Microsoft - this is where Microsoft SQL Server came from.
Try PostgreSQL next time. You'll be pleasantly surprised.
Indeed. It was, after all, only MySQL.
I've done much the same in Java rather than using the Derby RDBMS, but that was only because I wanted a small and fairly simple in-memory database behind the covers of a club rostering system I wrote for my gliding club. The translation from RDBMS terms to Java looks like this:
Row -> Class with getters, setters and some table-level methods in it
Table -> ArrayList
Index -> TreeMap
and, before you ask, yes I did normalise the data first and then draw an ERD before cutting any code. It also implements a number of rules about minimum gaps between duties, not rostering members of a glider syndicate on the same day, etc, etc. Performance is good, with no delays noticeable during normal duty allocation/deallocation/moves or in switching between rosters.
--
Martin    | martin at 
Gregorie  | gregorie dot org
Reply to
Martin Gregorie
No, to the envelope from address.
--
Climate Change: Socialism wearing a lab coat.
Reply to
The Natural Philosopher
'envelope from'
--
Climate Change: Socialism wearing a lab coat.
Reply to
The Natural Philosopher
Thanks for that: I haven't seen it before, but it looks very useful. Saved.
--
Martin    | martin at 
Gregorie  | gregorie dot org
Reply to
Martin Gregorie
For all of you trying to do this in regexes, I present some test cases:
??@??.?? (Chinese, Unicode) ???@????.???? (Hindi, Unicode) ????????@?????.??? (Ukrainian, Unicode) ????@???????.??? (Greek, Unicode)
????@??????.?? (Russian, Unicode)
courtesy of:
formatting link

I hope your code handles them appropriately :)
Theo
Reply to
Theo
Yes, that too. Delivery agents often add it as a Return-Path: header as their last act.
--
https://www.greenend.org.uk/rjk/
Reply to
Richard Kettlewell
Regardless, the point I was trying to make is that the From: header is not used by the process of transferring mail from sender to recipient and is displayed by the receiving MUA for information only. So it can be, and is, used nefariously by spammers and other lowlife.
--
Martin    | martin at 
Gregorie  | gregorie dot org
Reply to
Martin Gregorie
No, they do not. Ever.
--
"when things get difficult you just have to lie"
Reply to
The Natural Philosopher
No, the From: address is simply a header until you try and respond to an email, when it is, after 'reply to:', the default recipient (along with any cc: lists).
So it is not just displayed for information.
For clarity, the SMTP protocol is:
HELO (can be anything)
MAIL FROM
RCPT TO
DATA, which is where the headers are sent, followed by the message parts.
That is, the SMTP exchange has nothing to do with the headers BUT once delivered, the headers are what the MUA uses to respond.
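
The split between envelope and headers can be sketched in C. The function below only formats the client's side of an exchange into a buffer; the name smtp_client_lines and all addresses are hypothetical, and a real client would send each command separately, wait for the server's reply, and dot-stuff the body. The point is that the MAIL FROM address and the From: header are independent strings. Since this thread is about injection, the sketch also refuses fields containing CR or LF, which would otherwise let a caller smuggle in extra commands or headers.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if s contains a carriage return or line feed. */
static int has_crlf(const char *s)
{
    return strpbrk(s, "\r\n") != NULL;
}

/* Hypothetical helper: write the client side of a minimal SMTP
 * exchange into buf. Returns the number of characters written,
 * or -1 if a field contains CR/LF or buf is too small.
 * The body is assumed to be a single line with no leading dot. */
int smtp_client_lines(char *buf, size_t size,
                      const char *helo_name,
                      const char *envelope_from,   /* bounces go here */
                      const char *envelope_to,
                      const char *header_from,     /* what the MUA displays */
                      const char *body)
{
    if (has_crlf(helo_name) || has_crlf(envelope_from) ||
        has_crlf(envelope_to) || has_crlf(header_from) || has_crlf(body))
        return -1;

    int n = snprintf(buf, size,
                     "HELO %s\r\n"
                     "MAIL FROM:<%s>\r\n"
                     "RCPT TO:<%s>\r\n"
                     "DATA\r\n"
                     "From: %s\r\n"          /* headers start after DATA */
                     "To: %s\r\n"
                     "\r\n"
                     "%s\r\n"
                     ".\r\n"
                     "QUIT\r\n",
                     helo_name, envelope_from, envelope_to,
                     header_from, envelope_to, body);

    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

Nothing in the exchange ties header_from to envelope_from, which is why a displayed sender can differ from the address that receives bounces.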
--
"when things get difficult you just have to lie"
Reply to
The Natural Philosopher
This appears to describe just the display-name. From RFC 5322 it seems that the address can only use ASCII characters, and at this point I'm only dealing with the address.
--
Martin    | martin at 
Gregorie  | gregorie dot org
Reply to
Martin Gregorie
"your code"+"in C"
:D
Reply to
Deloptes
to 3
Tl;dr - I don't know how to do it properly, therefore it is crap.
---druck
Reply to
druck
Please note
Obsoleted by: 5322 PROPOSED STANDARD Updated by: 5335, 5336 Errata Exist
Network Working Group P. Resnick, Editor Request for Comments: 2822 QUALCOMM Incorporated Obsoletes: 822 April 2001 Category: Standards Track
formatting link
formatting link

4.1. UTF-8 Syntax and Normalization
UTF-8 characters can be defined in terms of octets using the following ABNF [RFC5234], taken from [RFC3629]:
UTF8-xtra-char = UTF8-2 / UTF8-3 / UTF8-4
UTF8-2 = %xC2-DF UTF8-tail
UTF8-3 = %xE0 %xA0-BF UTF8-tail /
         %xE1-EC 2(UTF8-tail) /
         %xED %x80-9F UTF8-tail /
         %xEE-EF 2(UTF8-tail)
UTF8-4 = %xF0 %x90-BF 2( UTF8-tail ) /
         %xF1-F3 3( UTF8-tail ) /
         %xF4 %x80-8F 2( UTF8-tail )
UTF8-tail = %x80-BF
These are normatively defined in [RFC3629], but kept in this document for reasons of convenience.
See [RFC5198] for a discussion of normalization; the use of normalization form NFC is RECOMMENDED.
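
As a rough illustration, the ABNF quoted above translates fairly directly into C. The function name utf8_xtra_char_len is hypothetical; it checks a single multi-octet sequence only and does nothing about normalization.

```c
#include <stddef.h>

/* UTF8-tail = %x80-BF */
static int is_tail(unsigned char c)
{
    return c >= 0x80 && c <= 0xBF;
}

/* Hypothetical helper: returns the length (2, 3 or 4) of the
 * UTF8-xtra-char that starts at s, or 0 if the first n octets
 * do not begin with one. */
size_t utf8_xtra_char_len(const unsigned char *s, size_t n)
{
    if (n < 2)
        return 0;

    unsigned char a = s[0], b = s[1];

    /* UTF8-2 = %xC2-DF UTF8-tail */
    if (a >= 0xC2 && a <= 0xDF)
        return is_tail(b) ? 2 : 0;

    /* UTF8-3: the allowed range of the second octet depends on the first
     * (E0 excludes overlong forms, ED excludes surrogates). */
    if (a >= 0xE0 && a <= 0xEF) {
        unsigned char lo = (a == 0xE0) ? 0xA0 : 0x80;
        unsigned char hi = (a == 0xED) ? 0x9F : 0xBF;
        if (n < 3 || b < lo || b > hi || !is_tail(s[2]))
            return 0;
        return 3;
    }

    /* UTF8-4: F0 excludes overlong forms, F4 caps the range at U+10FFFF. */
    if (a >= 0xF0 && a <= 0xF4) {
        unsigned char lo = (a == 0xF0) ? 0x90 : 0x80;
        unsigned char hi = (a == 0xF4) ? 0x8F : 0xBF;
        if (n < 4 || b < lo || b > hi || !is_tail(s[2]) || !is_tail(s[3]))
            return 0;
        return 4;
    }

    return 0;   /* ASCII, a stray tail octet, or an invalid lead octet */
}
```

A caller scanning a whole string would treat octets below 0x80 as ASCII and call this for everything else, advancing by the returned length.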
I was wrong about 1990 - you are at state of 2001 :)
Sorry for making fun of it, but it is indeed funny. I knew I had to refresh my memory, since I haven't dealt with mail in the 4 years since I changed company.
regards
Reply to
Deloptes
No. The time to learn how to do it properly exceeds the time to do it the way I know so vastly that my life will be over before I *need* to learn it.
--
"I guess a rattlesnake ain't risponsible fer bein' a rattlesnake, but ah  
puts mah heel on um jess the same if'n I catches him around mah chillun".
Reply to
The Natural Philosopher
