You said you were worried about cross-site scripting, which is a web
It sounds like you're worried about code injection attacks on these
Sanitization is the wrong strategy. A perfectly valid email address may
nevertheless contain characters that trip up your downstream
applications in some way, if they have not been hardened to process
The mitigations for this when the shell is involved are:
* Minimize the portion of your application written in shell. Ideally to
0. The same applies to C. Both are terrible language choices for
security.
* Use shell quoting syntax properly within the shell scripts that you do
And when invoking commands (in any context):
* Prefer array-format commands, e.g. args=[...] in Python, or execvp
(etc) in C.
* Avoid string-format commands (e.g. args="..." in Python, or system and
popen in C).
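For code that does stay in C, the array-format advice might look roughly like the sketch below. The helper name run_argv is made up for illustration; fork, execvp and waitpid are the standard POSIX calls. Each argv element reaches the child as one literal argument, so no shell ever gets a chance to interpret it.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] with the given argument vector and
 * no shell in between. Returns the child's exit status, or -1 if the
 * child could not be started or did not exit normally. */
int run_argv(char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);             /* only reached if execvp failed */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With this shape, an address such as "x@example.org; rm -rf ~" placed in argv is handed to the command as a single string, whereas the same text pasted into a system() command line would be parsed by the shell.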
On a sunny day (Thu, 19 Mar 2020 13:18:58 -0000 (UTC)) it happened Martin
Gregorie wrote in :
My reference for libc related functions is libc.info, reg.. is explained there.
I have libc.info as text file, you can also download it from my site as one big text file:
use editor search for any function, it has sometimes examples too.
I could not have written my programs without it!
That said, I have never used reg.. and it seems, for detecting illegal or allowed chars a bit overkill?
I usually set up a loop for that, for all chars in 'string', maybe like this:
#include <ctype.h>
#include <stdio.h>

int main(int argc, char **argv)
{
/* regex for illegal chars: [^.a-zA-Z0-9@_-]; this loop also accepts '^' */
int ok, c;
char *ptr;

if(argc != 2 || argv[1][0] == 0)
    {
    fprintf(stderr, "Usage: ./test62 some_text\n");
    fprintf(stderr, "Dummy, you should enter some text!\n");
    return 1;
    }
ptr = argv[1];
ok = 0;
while(1) /* for all chars in the string */
    {
    c = (unsigned char)*ptr;
    if(c == 0) break;
    else if(c == '^') ok = 1;
    else if(c == '.') ok = 1;
    else if(isalnum(c)) ok = 1;
    else if(c == '@') ok = 1;
    else if(c == '_') ok = 1;
    else if(c == '-') ok = 1;
    else ok = 0;
    if(! ok) break;
    ptr++;
    }
if(! ok)
    {
    fprintf(stderr, "Dummy, you should not use char %c in this field!\n", c);
    return 1;
    }
fprintf(stderr, "Very good!\n");
return 0;
} /* end function main */
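For comparison, here is a rough sketch of the same check done with the POSIX regex functions (regcomp, regexec, regfree from <regex.h>). The function name first_illegal_offset is invented for this example. Note that inside the brackets a leading '^' means negation, so unlike the loop above this version treats a literal '^' as illegal.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns the offset of the first character
 * outside [.a-zA-Z0-9@_-], -1 if every character is allowed, or -2 if
 * the pattern fails to compile. */
int first_illegal_offset(const char *text)
{
    regex_t re;
    regmatch_t m;
    int result = -1;

    if (regcomp(&re, "[^.a-zA-Z0-9@_-]", REG_EXTENDED) != 0)
        return -2;
    if (regexec(&re, text, 1, &m, 0) == 0)
        result = (int)m.rm_so;  /* start of the first illegal character */
    regfree(&re);
    return result;
}
```

Whether this is clearer than the hand-written loop is a matter of taste; it mainly saves retyping the character list.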
Yes, they are designed to be parsed and parsers for them exist (for
instance in most email software). The specifications have always
contained grammars for them. The language specified in RFC822 isn't a
regular language, but that just means you need something a little more
sophisticated than a regular expression to parse it.
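One concrete reason is that RFC822 comments are parenthesised and may nest, which needs a depth counter rather than a plain regular expression. A minimal sketch of just that piece, assuming a NUL-terminated string and an invented function name skip_comment, might be:

```c
#include <stddef.h>

/* Hypothetical helper: if s starts with '(', return a pointer just past
 * the matching ')', honouring nesting and backslash quoted-pairs.
 * Returns NULL if s does not start a comment or it is never closed. */
const char *skip_comment(const char *s)
{
    int depth = 0;

    if (*s != '(')
        return NULL;
    for (; *s != '\0'; s++) {
        if (*s == '\\') {
            if (s[1] == '\0')
                return NULL;    /* dangling backslash */
            s++;                /* skip the quoted character */
        } else if (*s == '(') {
            depth++;
        } else if (*s == ')') {
            if (--depth == 0)
                return s + 1;   /* outermost comment closed */
        }
    }
    return NULL;                /* ran out of input while still nested */
}
```

This is only one corner of the grammar; a real address parser also has to deal with quoted strings, folding whitespace, groups and routes.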
Exactly so. It's not common, but it can also be used to inject a poison
pill into the recipient's system.
It's well known that the From: header is not used at all to transfer mail
from sender to receiver - returned bounces are sent to the envelope
sender (Return-Path) address. The only defined use of From: is to be displayed by the
receiving mail reader (MUA). Any other use is entirely up to the
recipient and their system.
A common use for the From: header is in mail archives, which typically
index emails by sender, recipient, subject and date, but the wise
archivist knows that the From: header can be, and frequently is, a pack
Take a careful look at the next piece of spam you receive that's
apparently from a friend. Many MUAs default to showing just the from text
rather than both text and internet mail address. If yours is one of
those, reconfigure it to show both. This gives you the ability to recognise
spam without opening it.
Then use your MUA to look at all the headers and you'll see that spammers
are often both lazy and stupid: they often change the sender text to
spoof the victim, but From: and Reply-To: both contain their real
address - unless, that is, the message was sent from a compromised
system, in which case a common pattern is: From text is your friend's
name, From address is the spammer's address and Reply-To is the address
of the compromised system.
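To make the text-versus-address distinction concrete, here is a deliberately naive sketch that pulls the address out of a From: value of the form 'Display Text <address>'. The function name from_address is made up, and it ignores quoting, comments and every other subtlety of the real header grammar.

```c
#include <string.h>

/* Hypothetical helper: copy the address found between the last '<' and
 * the following '>' of a From: header value into out. Returns 1 on
 * success, 0 if no such pair exists or out is too small. */
int from_address(const char *value, char *out, size_t outsz)
{
    const char *lt = strrchr(value, '<');
    const char *gt = lt ? strchr(lt, '>') : NULL;
    size_t len;

    if (lt == NULL || gt == NULL)
        return 0;
    len = (size_t)(gt - lt - 1);
    if (len + 1 > outsz)
        return 0;               /* no room for address plus NUL */
    memcpy(out, lt + 1, len);
    out[len] = '\0';
    return 1;
}
```

Fed 'Your Friend <spammer@bad.example>', it yields spammer@bad.example, which is the part many MUAs hide by default.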
... and shouldn't have been one.
I'm not surprised. MySQL was known as being a limited system which lacked
any form of query optimisation. It and MS Access were both known to be
very limited, especially when the data volume gets large.
The original big three were Informix, Ingres and Oracle, with IBM joining
in later, initially having led the field with System/R, developed by Ted
Codd and Chris Date. Incidentally, both have written extremely good books
about the care and feeding of RDBMS systems.
Oracle has always been expensive and seems to need a lot of routine
attention, or so I found when I briefly looked after a site.
I know very little about Informix, never having used it.
Ingres was always pretty good. Quick, easy to manage and with a decent
query optimiser. There was a special University license which was cloned
and became PostgreSQL, which is excellent, free and is currently
maintained and developed. It has a good query optimiser and can be
ignored for weeks or months on end - it just quietly gets on with
automated housekeeping, etc.
Sybase, one of whose founders had worked on Ingres, licensed its SQL
Server code to Microsoft - this is where Microsoft SQL Server came from.
Try PostgreSQL next time. You'll be pleasantly surprised.
Indeed. It was, after all, only MySQL.
I've done much the same in Java rather than using the Derby RDBMS, but
that was only because I wanted a small and fairly simple in-memory
database behind the covers of a club rostering system I wrote for my
gliding club. The translation from RDBMS terms to Java looks like this:
Row -> Class with getters, setters and some table-level methods in it
Table -> ArrayList
Index -> TreeMap
and, before you ask, yes I did normalise the data first and then draw an
ERD before cutting any code. It also implements a number of rules about
minimum gaps between duties, not rostering members of a glider syndicate
on the same day, etc, etc. Performance is good, with no delays noticeable
during normal duty allocation/deallocation/moves or in switching between
For all of you trying to do this in regexes, I present some test cases:
??@??.?? (Chinese, Unicode)
???@????.???? (Hindi, Unicode)
????????@?????.??? (Ukrainian, Unicode)
????@???????.??? (Greek, Unicode)
????@??????.?? (Russian, Unicode)
I hope your code handles them appropriately :)
Regardless, the point I was trying to make is that the From: header is
not used by the process of transferring mail from sender to recipient and
is displayed by the receiving MUA for information only.
So it can be, and is, used nefariously by spammers and other lowlife.
No, the From: address is simply a header until you try to respond to an
email, when it is, after Reply-To:, the default recipient (along with
any cc: lists).
So it is not just displayed for information.
For clarity, the SMTP protocol is:
HELO (can be anything)
MAIL FROM: (the envelope sender)
RCPT TO: (the envelope recipient)
DATA
which is where the headers are sent, followed by the message parts.
That is, the SMTP exchange has nothing to do with the headers BUT once
delivered, the headers are what the MUA uses to respond
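The split between envelope and headers can be sketched as below. This only formats the client side of a simplified dialogue into a buffer; the function name smtp_client_lines and its parameters are invented for illustration, and server replies, dot-stuffing and EHLO extensions are all left out.

```c
#include <stdio.h>

/* Hypothetical helper: write the client side of a minimal SMTP dialogue
 * into buf. Returns what snprintf returns, i.e. the length the full
 * text needs (excluding the NUL), even if buf was too small. */
int smtp_client_lines(char *buf, size_t bufsz,
                      const char *helo, const char *envelope_from,
                      const char *envelope_to, const char *header_from,
                      const char *body)
{
    return snprintf(buf, bufsz,
        "HELO %s\r\n"
        "MAIL FROM:<%s>\r\n"    /* envelope sender */
        "RCPT TO:<%s>\r\n"      /* envelope recipient */
        "DATA\r\n"
        "From: %s\r\n"          /* header: part of the message data */
        "\r\n"
        "%s\r\n"
        ".\r\n"
        "QUIT\r\n",
        helo, envelope_from, envelope_to, header_from, body);
}
```

Nothing forces envelope_from and header_from to agree, which is the point being made above: the transfer uses the first, the MUA shows and replies to the second.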
"when things get difficult you just have to lie"
                                    Obsoleted by: 5322 PROPOSED STANDARD
                                    Updated by: 5335, 5336 Errata Exist
Network Working Group               P. Resnick, Editor
Request for Comments: 2822          QUALCOMM Incorporated
Obsoletes: 822                      April 2001
Category: Standards Track
4.1. UTF-8 Syntax and Normalization
UTF-8 characters can be defined in terms of octets using the
following ABNF [RFC5234], taken from [RFC3629]:
UTF8-xtra-char  =   UTF8-2 / UTF8-3 / UTF8-4
UTF8-2          =   %xC2-DF UTF8-tail
UTF8-3          =   %xE0 %xA0-BF UTF8-tail /
                    %xE1-EC 2(UTF8-tail) /
                    %xED %x80-9F UTF8-tail /
                    %xEE-EF 2(UTF8-tail)
UTF8-4          =   %xF0 %x90-BF 2( UTF8-tail ) /
                    %xF1-F3 3( UTF8-tail ) /
                    %xF4 %x80-8F 2( UTF8-tail )
UTF8-tail       =   %x80-BF
These are normatively defined in [RFC3629], but kept in this document
for reasons of convenience.
See [RFC5198] for a discussion of normalization; the use of
normalization form NFC is RECOMMENDED.
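As a rough illustration of what that grammar accepts, the sketch below checks whether a byte buffer starts with one UTF8-xtra-char. The function names are invented here; the byte ranges follow the UTF-8 definition in RFC 3629, including the %xEE-EF case for three-byte sequences.

```c
#include <stddef.h>

/* UTF8-tail = %x80-BF */
static int is_tail(unsigned char b)
{
    return b >= 0x80 && b <= 0xBF;
}

/* Hypothetical helper: length (2, 3 or 4) of the UTF8-xtra-char at the
 * start of s, or 0 if the first n bytes do not begin with one. */
size_t utf8_xtra_len(const unsigned char *s, size_t n)
{
    if (n >= 2 && s[0] >= 0xC2 && s[0] <= 0xDF)         /* UTF8-2 */
        return is_tail(s[1]) ? 2 : 0;

    if (n >= 3 && s[0] >= 0xE0 && s[0] <= 0xEF) {       /* UTF8-3 */
        unsigned char lo = 0x80, hi = 0xBF;
        if (s[0] == 0xE0) lo = 0xA0;    /* reject overlong forms */
        if (s[0] == 0xED) hi = 0x9F;    /* reject UTF-16 surrogates */
        return (s[1] >= lo && s[1] <= hi && is_tail(s[2])) ? 3 : 0;
    }

    if (n >= 4 && s[0] >= 0xF0 && s[0] <= 0xF4) {       /* UTF8-4 */
        unsigned char lo = 0x80, hi = 0xBF;
        if (s[0] == 0xF0) lo = 0x90;    /* reject overlong forms */
        if (s[0] == 0xF4) hi = 0x8F;    /* stay at or below U+10FFFF */
        return (s[1] >= lo && s[1] <= hi &&
                is_tail(s[2]) && is_tail(s[3])) ? 4 : 0;
    }
    return 0;
}
```

Plain ASCII bytes return 0 here because they are not "xtra" characters; a caller walking a whole address would handle those separately. Normalization (NFC) is a different step and is not attempted.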
I was wrong about 1990 - you are at the state of 2001 :)
Sorry for making fun of it, but it is indeed funny. I knew I had to refresh
my memory, because I haven't had anything to do with mail for 4 years, when I