Getting started with AVR and C

UTF-8 is the way forward, isn't it?

--

John Devereux
Reply to
John Devereux

[...]

I doubt it is. FWIW, UTF-8 needs two octets for Cyrillic (so no saving over UTF-16) and three for most other non-Latin scripts, while UTF-16 needs only two. Personally, I'd try to use the latter whenever possible (which means: anywhere, unless OS interaction issues are deeply involved in the matter).

--
FSF associate member #7257
Reply to
Ivan Shmakov

As with most compression systems, it depends on the usage pattern of the characters. If the text base is mostly the 7-bit ASCII character set, with some of the other lower-valued characters and only a few higher-valued characters, UTF-8 makes sense. If most of the characters have larger values (as with a non-Latin-based character set), then UTF-16 may make much more sense.

Reply to
Richard Damon

UTF-8 has a couple of other advantages. It's equivalent to ASCII as long as all the characters are in the 7-bit ASCII range, so Unicode-unaware programs can often handle it unchanged.

Reply to
Keith Thompson

UTF-8 and UTF-16 *ARE* compression methods. Uncompressed Unicode would be UTF-32 or UCS-4, using 32 bits per character. For most uses, if you don't need code points above U+FFFF, you might consider UCS-2 the uncompressed format; then UTF-16 isn't really compression, but a method to mark the very rare characters above U+FFFF. UTF-8 is really just a compression format that tries to remove some of the extra space, and it does so to the extent that characters U+0000-U+007F are more common than U+0800 and above: the former save you a byte relative to UCS-2, the latter cost you one.
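
To make those ranges concrete, here is a rough sketch in C (a hypothetical helper, not part of the original post) of how many bytes one code point takes in UTF-8:

#include <stdint.h>

/* Bytes needed to encode one Unicode code point in UTF-8.
   Surrogates and values above U+10FFFF are not encodable and return 0. */
static unsigned utf8_len(uint32_t cp)
{
    if (cp < 0x80)                    return 1; /* ASCII: one byte, as in an 8-bit charset */
    if (cp < 0x800)                   return 2; /* Latin supplements, Greek, Cyrillic, ... */
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0; /* surrogates are not valid code points */
    if (cp < 0x10000)                 return 3; /* rest of the BMP, including most CJK */
    if (cp < 0x110000)                return 4; /* above the BMP (a surrogate pair in UTF-16) */
    return 0;
}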

UTF-8 does have the other advantage that you mention: looking like ASCII for those characters allows many Unicode-unaware programs to mostly function with UTF-8 data.

Reply to
Richard Damon
[...]
[...]

I don't recall saying they aren't.

But they're (relatively) simplistic compression methods that don't adapt to the content being compressed, which is why applying another compression tool (I *did* say "another") can be useful.

--
Keith Thompson (The_Other_Keith) kst-u@mib.org   
    Will write code for food. 
"We must do something.  This is something.  Therefore, we must do this." 
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
Reply to
Keith Thompson

Size isn't the only issue; the fact that UTF-16 may (and usually does) contain null bytes ('\0') rules it out for many applications.

Similarly, anything which expects specific bytes (e.g. '\x0a', '\x0d', etc.) to have their "usual" meanings regardless of context will work fine with UTF-8 but not with UTF-16 or UTF-32.
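
For illustration (a small hosted-C sketch, not from the original post): the embedded zero bytes mean strlen() and friends stop early on UTF-16 data, while UTF-8 passes through untouched:

#include <stdio.h>
#include <string.h>

int main(void)
{
    /* "AB" encoded as UTF-16LE: each ASCII character carries a 0x00 byte. */
    const char utf16le[] = { 'A', 0x00, 'B', 0x00, 0x00, 0x00 };
    /* The same text as UTF-8 is plain ASCII. */
    const char utf8[] = "AB";

    /* strlen() stops at the first zero byte, so it sees only one byte of
       the UTF-16 data, but handles the UTF-8 data as expected. */
    printf("strlen(utf16le) = %zu\n", strlen(utf16le)); /* prints 1 */
    printf("strlen(utf8)    = %zu\n", strlen(utf8));    /* prints 2 */
    return 0;
}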

Reply to
Nobody

For any given non-Latin-based language, there are only a few possible bit combinations in the first byte(s) of each UTF-8 sequence, so it should compress quite well.

For use inside a program, UTF-32 would be the natural choice with 1 array element/character.

Compressing a UTF-32 file using some form of Huffman coding should not take more space than a compressed UTF-8/UTF-16 file, since the symbol table actually used (and stored) would reflect the actual usage of sequences in the whole file. Doing the compression on the fly in a communication link would be less effective, since only a part of the data would be available at a time in order to keep the latencies acceptable.

Reply to
upsidedown

But they are fundamentally different from other compressions. Multi-byte/symbol encodings are generally designed so that it is possible to process the data in that encoding. It isn't that much harder to process the data than if it were kept fully expanded. Some operations, like computing the length of a string, require doing a pass over the data instead of just taking the difference of two addresses, but nothing becomes particularly hard.
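
As a rough illustration (a hypothetical helper, not from the original post), counting the code points in a UTF-8 string is a single pass that just skips the 10xxxxxx continuation bytes:

#include <stddef.h>

/* Count code points in a NUL-terminated UTF-8 string without expanding it.
   Continuation bytes have the bit pattern 10xxxxxx, so we count every byte
   that is *not* a continuation byte. Assumes well-formed UTF-8 input. */
static size_t utf8_codepoints(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}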

On the other hand, it is very unusual for any program to actually process "zipped" data as such; it is almost always decompressed to be worked on and then re-compressed, and any changes tend to require reprocessing the entire rest of the file (or at least the current compression block).

Reply to
Richard Damon

I'd say that's a difference of degree, not anything fundamental.

Computing the length of a string requires doing a pass over it, whether it's UTF-8 encoded or gzipped. And it's certainly possible to process UTF-8 data by internally converting it to UTF-32.
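
For illustration, a rough sketch (not from the original post) of decoding one UTF-8 sequence into a UTF-32 code point; a real decoder would also reject overlong forms and surrogates:

#include <stdint.h>
#include <stddef.h>

/* Decode one UTF-8 sequence starting at s into a UTF-32 code point.
   Returns the number of bytes consumed, or 0 on a malformed sequence. */
static size_t utf8_decode(const unsigned char *s, uint32_t *out)
{
    if (s[0] < 0x80) {                        /* 0xxxxxxx: 1 byte */
        *out = s[0];
        return 1;
    }
    if ((s[0] & 0xE0) == 0xC0) {              /* 110xxxxx 10xxxxxx: 2 bytes */
        if ((s[1] & 0xC0) != 0x80) return 0;
        *out = ((uint32_t)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0) {              /* 1110xxxx + 2 continuation bytes */
        if ((s[1] & 0xC0) != 0x80 || (s[2] & 0xC0) != 0x80) return 0;
        *out = ((uint32_t)(s[0] & 0x0F) << 12) |
               ((uint32_t)(s[1] & 0x3F) << 6)  | (s[2] & 0x3F);
        return 3;
    }
    if ((s[0] & 0xF8) == 0xF0) {              /* 11110xxx + 3 continuation bytes */
        if ((s[1] & 0xC0) != 0x80 || (s[2] & 0xC0) != 0x80 ||
            (s[3] & 0xC0) != 0x80) return 0;
        *out = ((uint32_t)(s[0] & 0x07) << 18) |
               ((uint32_t)(s[1] & 0x3F) << 12) |
               ((uint32_t)(s[2] & 0x3F) << 6)  | (s[3] & 0x3F);
        return 4;
    }
    return 0;                                 /* stray continuation byte etc. */
}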

And copying a file doesn't require uncompressing it, regardless of the format.

--
Keith Thompson (The_Other_Keith) kst-u@mib.org   
    Will write code for food. 
"We must do something.  This is something.  Therefore, we must do this." 
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
Reply to
Keith Thompson

Hmmm, those who work in compression tend to prefer the term "encodings" for such fixed 1-1 mappings of input to output tokens. UTF-8, and the others you consider to be "compressed", simply have output tokens of different lengths.

Phil

--
I'm not saying that google groups censors my posts, but there's a strong link 
between me saying "google groups sucks" in articles, and them disappearing. 

Oh - I guess I might be saying that google groups censors my posts.
Reply to
Phil Carmody

Yes, precisely. I had to update an embedded system with a simple home-made GUI so that it could do Chinese. I was pleasantly surprised at how painless it was using UTF-8. Strings are still null-terminated char arrays, and most everything just worked as before. You can't predict the number of characters just from the string size, but I was already using proportional fonts, so this was not an issue. I could even abuse the C standard - sorry c.l.c - and embed UTF-8 in the C source code, and that worked too. (I moved these out into resource files in the end, though.)

--

John Devereux
Reply to
John Devereux

It's not much of an abuse. Multibyte character sequences are permitted in string literals and may even be converted to wide character strings as if by the mbstowcs function when appropriate. In C99, the only trouble is that what encoding is assumed, and what characters are permitted, is implementation-defined. Whilst that's also true in the latest standard, C11 does add the u8 prefix to produce UTF-8-encoded strings.

It also adds the U (and u) prefix to make Unicode character arrays from a multibyte character string, but the encoding is still implementation-defined and dependent on the locale (as it should be, I think).
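
A minimal sketch of those prefixes (assuming a C11 compiler; the sample text is mine, not from the post):

#include <stdio.h>
#include <uchar.h>

int main(void)
{
    /* C11: a u8 literal is an ordinary char array, guaranteed to hold UTF-8. */
    const char *s8 = u8"Gr\u00fc\u00dfe";       /* "Grüße" as UTF-8 bytes */

    /* u and U literals give char16_t / char32_t arrays instead. */
    const char16_t *s16 = u"Gr\u00fc\u00dfe";
    const char32_t *s32 = U"Gr\u00fc\u00dfe";

    printf("%s\n", s8);
    (void)s16; (void)s32;
    return 0;
}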

--
Ben.
Reply to
Ben Bacarisse
