XC8 novice question

The problem with Unicode is that it makes the problem space bigger. It's relatively easy for a developer to decide on appropriate syntax for file names, etc. with ASCII, Latin1, etc. But, start allowing for all these other code points and suddenly the developer needs to be a *linguist* in order to understand what should/might be an appropriate set of constraints for his particular needs.

Also, we *tend* to associate meaning with each of the (e.g.) ASCII code points. So, 16r31 is the *digit* '1'. There's consensus on that interpretation.

However, Unicode just tabulates *glyphs* and deprives many of them of any particular semantic value. E.g., if I pick a glyph out of U+[2800,28FF], there's no way a developer would know what my *intent* was in selecting the glyph at that codepoint. It could "mean" many different things (the developer has to IMPOSE a particular meaning -- like deciding that 'A' is not an alphabetic but, rather, a "hexadecimal character" IN THIS CONTEXT).
Reply to
Don Y

For _file_ names it is not a problem: stop scanning at the next white space (or null in C). Everything in between is the file name, no matter what characters are used.

For _path_ specifications, there must be some rules for how to separate the node, path, file name, file extension and file version from each other. The separators or other syntactic elements are usually chosen from the original 7 bit ASCII character set. What is between these separators is irrelevant.
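As a rough sketch, assuming Unix-style paths where '/' separates directories and '.' introduces the extension (the function name is made up for illustration):

#include <stdio.h>
#include <string.h>

/* Split "dir/name.ext" at the ASCII separators '/' and '.';
   whatever bytes sit between the separators are treated as opaque payload. */
static void split_path(const char *path)
{
    const char *base = strrchr(path, '/');   /* last directory separator */
    base = base ? base + 1 : path;

    const char *ext = strrchr(base, '.');    /* last '.' in the base name */

    printf("directory: %.*s\n", (int)(base - path), path);
    if (ext && ext != base) {
        printf("name     : %.*s\n", (int)(ext - base), base);
        printf("extension: %s\n", ext + 1);
    } else {
        printf("name     : %s\n", base);
    }
}

int main(void)
{
    split_path("music/sample.wav");
    return 0;
}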

For _numeric_entry_ fields, accepting anything beyond the characters 0-9 requires a fallback mapping to the Arabic numerals.

As strange as it sounds, the numerals used in Arabic countries differ from those used in Europe.

As long as it is just payload data, why should the programmer worry about it ?

In Unicode, there are code points for hexadecimal 0 to F. Very good idea to separate the dual usage for A to F. Has anyone actually used those code points ?

Reply to
upsidedown

Of course they think "x=-1" means "x = -1"! It has been almost forty years since "x =- 1" was standard C. Most people also think that television is in colour, you can communicate to Australia by telephone, and flares are no longer in fashion. Life moves on.

In this particular case, the number of people who ever learned to write "x =- 1", and are still working as programmers (or even still alive) is tiny. And the number of those who failed to learn to use "x -= 1" at least 35 years ago, must be tinier still. Sure, you can /remember/ it - and remember having to change old code to suit new compilers. An old shopkeeper may remember when he made deliveries with a horse and cart - but he does not insist that new employees know about grooming horses.

Backwards compatibility, and compatibility with existing code, is important. That is why we still have many of the poor choices in the design of C as a language - for compatibility. But with each passing year or decade, compatibility with the oldest code gets less and less relevant - except to historians or the occasional very specialist cases.

Of all the lines of C code that are in use today, what fraction were written in pre-K&R C when "x =- 1" was valid? One in a million? One in a hundred million? If we exclude code lines that have been converted to later C standards, then I doubt if it is nearly that many.

That is different in that the parsing rules for C are quite clear here, and are the same as they always have been - /* starts a comment. But unless you have carefully created a pathological case and use a particularly unhelpful compiler (and editor - in this century, most programmers use editors with syntax highlighting), you are going to spot the error very quickly.

C provides enormous opportunity for accidentally writing incorrect code - in many cases, the result is perfectly acceptable C code and will not trigger any warnings. If you were to take the top hundred categories of typos and small mistakes in C code that resulted in compilable but incorrect code, "x=y/*p" would not feature. It /might/ make it onto a list of the top thousand mistakes. It really is that irrelevant.

And it is preventable by using spaces. There is a reason that the space bar is the biggest key on the keyboard.

Take your head out of your history books. In C, "x-=1" means "x -= 1", while "x=-1" means "x = -1". That is it. It is a simple fact. It matters little what C used to be, decades before most programmers were born.

I am not as old as you, but I have been programming for about 35 years. I have had my share of hand-assembling code, burning eeproms, using punched tape, and even setting opcodes with DIP switches with a toggle switch for the clock.

But I understand the difference between what I do /now/, and what other programmers do /now/, and what I did long ago.

I am a hardware man too. And I quite appreciate that interpretation as well.

I can't remember - perhaps 20 or so.

Z80, 6502, 68k, x86, MIPS, COP8, 8051, PIC16, HP43000, ARM, PPC, AVR, AVR32, MSP430, NIOS, XMOS, TMS430, 56Fxxx

That's 18 - there are several more whose names I can't remember, and some that I have programmed on without being familiar with the assembly language.

Long ago, anyone wanting to make a C compiler for a new processor would either buy the front end and write their own code generator, or would pay a compiler company to write the code generator to go with their existing front end. Only hobby developers would write their own C front end - for professionals, it was not worth the money unless they were a full-time compiler development company.

So you got your front-end already made, with whatever features and warnings it supported. Clearly, the range of features would vary here. And sometimes you wanted to add your own extra features for better support of the target.

Now, anyone wanting to make a C compiler for a new processor starts with either gcc or clang, and writes the code generator - again, the front-end is there already.

The tools I used 20 years ago were not as good as the ones I use now. And the tools I used 20 years ago were not as good as the best ones available 20 years ago - the budget did not stretch.

But now, the budget /does/ stretch to high quality tools - for most microcontrollers, /everybody's/ budget stretches far enough because high quality compiler tools are free or very cheap. There are a few microcontrollers where that is not the case (the 8051, the unkillable dinosaur, being an example), but tool quality and price is a factor many people consider when choosing a microcontroller.

And how relevant are 20 year old tools to the work I do /today/, writing code /today/ ? Not very relevant at all, except for occasional maintenance of old projects.

The whole point is that 20 years ago I had to have a style that made sense 20 years ago with the tools of that era. Now I have a style that is suited to the tools of /this/ era. Not a lot has changed, because the guiding principles have been the same, but many details have changed. Function-like macros have morphed into static inline functions, home-made size-specific types have changed to uint16_t and friends, etc. Some of my modern style features, such as heavy use of static assertions, could also have been used 20 years ago - I have learned with time and experience.
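Roughly the sort of change I mean, with invented names purely for illustration:

#include <stdint.h>

/* Old style: a home-made size-specific type and a function-like macro. */
typedef unsigned short u16;        /* and hope it really is 16 bits */
#define SQUARE(x)  ((x) * (x))

/* Modern style: fixed-width standard types, a static inline function,
   and a static assertion that is checked at compile time. */
static inline uint16_t square_u16(uint16_t x)
{
    return (uint16_t)(x * x);
}

_Static_assert(sizeof(u16) == sizeof(uint16_t), "home-made u16 is not 16 bits here");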

But I refuse to write modern code of poorer quality and with fewer features simply because those features were not available decades ago - or even because those features are not available on all modern compilers.

Compilers can, do, and should complain about particularly bad style. It's important that such complaints are optional - and for compatibility, they are usually disabled by default. There is no clear division between what is simply a stylistic choice ("x=3" vs. "x = 3", for example), and what is a really /bad/ idea, such as putting comments in the middle of a statement. Thus any complaints about style need to be configurable.

But there is no doubt that such warnings can be helpful in preventing bugs. Warning on "if (x = 3)" is a fine example. Another is gcc 6's new "-Wmisleading-indentation" warning that will warn on:

if (x == 1)
    a = 2;
    b = 3;

Code like that is wrong - it is bad code, even if it is perfectly legitimate C code, and even if it happens to work. It is a good thing for compilers to complain about it.
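With gcc 6 or later, something like

gcc -Wall -Wmisleading-indentation -c example.c

should flag it (the file name is only for illustration), and the fix is simply to make the indentation match the meaning - or to add the braces that were presumably intended:

if (x == 1) {
    a = 2;
}
b = 3;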

No, I would not write any of that.

I would not write "const zero = 0" for several reasons. First, it is illegal C - it needs a type. Second, such constants are usually best declared static. Third, it is pointless making a constant whose name is its value.

But I /might/ write:

static const int start = 0;
static const int end = 9;

for (int index = start; index < end; index++) {
    ...
}

Even that is quite unlikely - "start" and "end" would usually have far better names, while "index" is almost certainly better written as "i". (But that is a matter of style :-) ).

The only thing the START and END form makes abundantly clear is that you really, really want everyone looking at the code to see at a glance that START and END won't change - and that is far more important than anything else about the code, such as what it does.

If "zero" does not map to zero, don't call it "zero". Call it "zeroK", or "lowestTemperature", or whatever.

Certainly we seem to be talking at cross-purposes here. It is a matter of viewpoint who is "missing the point" - probably both of us.

Yes, I am arguing that if something in your style is no longer the best choice for modern programming, then you certainly should consider changing it. Clearly you will not do so without good reason, which is absolutely fine.

I am also arguing against recommending that new people adopt a style whose benefits are based on ancient tools and your own personal habits. Modern programmers should adopt a forward-looking style that lets them take advantage of modern tools - there is no benefit in adopting your habits or my habits, simply because /we/ are used to them. There are benefits in using, or at least being familiar with, common idioms and styles. But that should not be an overriding concern. Keep good habits, if they are still good - but drop bad habits.

There is a balance between choosing something that is mature, field proven and familiar, and choosing something that is newer and has benefits such as efficiency, clarity, flexibility, safety, etc.

I think that the large majority of work done in C would be better written in a different language, were it not for two factors - existing code written in C, and existing experience of the programmer in C. For most programming tasks, C /is/ archaic - it is limited, inflexible, and error prone. For some tasks, its limitations and its stability as a language are an advantage. But for many tasks, if one could disregard existing C experience, it is a poor choice of language.

Thus a lot of software on bigger systems is written in higher level languages, such as Python, Ruby, etc. A lot of software in embedded systems is written in C++ to keep maximal run-time efficiency while getting more powerful development features. New languages such as Go are developed to get a better balance of the advantages of different languages and features.

For a good deal of embedded development, the way forward is to avoid archaic and brain-dead microcontrollers such as the 8051 or the PIC. Stick to solid but modern processors such as ARM or MIPS cores. And move to C++ - /if/ you are good enough to learn and understand how to use that language well in embedded systems.

I would wait a few years, not weeks, but not decades, before adopting new languages for embedded programming. Maybe Go will be a better choice in a few years.

We've been through all this with "C vs. assembly" - and there are plenty of people that still use assembly programming for embedded systems because "it was good enough for my grandfather, it's good enough for me", or because they simply refuse to move forward with the times. Like assembly, C will never go away - but it /will/ move further and further into niche areas, and be used "for compatibility with existing code and systems".

In the meantime, we can try and write our C code in the best way that modern tools allow.

We have only been talking about coding styles, which are a small part of development styles. And development styles are only a small part of products as a whole.

Learning to use spaces appropriately and not using Yoda-speak for your conditionals will not mean end-users will automatically like your product!

It would depend on the rest of the context, which is missing here, but I'd guess it would be something like:

int noOfWhatsits = bar();
if (noOfWhatsits == foo) {
    ...
}

Local variables are free, and let you divide your code into clear and manageable parts, and their names let you document your code. I use them a lot.

(I have also used older and weaker compilers that generated poorer code if you had lots of local variables - I am glad my current style does not have to handle such tools.)

Different languages have different symbols - yes, I know that.

I am lucky enough to have full use of my hands and my eyes, as well as my mouth. The same applies to other people I discuss code with. I would not try to distinguish "x = y" and "x == y" verbally - I would /write/ it.

Good. Then we agree - make the best use of the best tools available.

I agree. That means not distracting it with things that are easily found automatically by compilers and other tools, so that your mind can concentrate on the difficult stuff.

Agreed - there is plenty of scope for personal variation and style here.

The right balance here will vary depending on the circumstances - there is no single correct answer (but there are many wrong answers).

There is more to writing good code than that (and I know you know that). Whether you call bad code that happens to work "correct" or "incorrect" is up to you.

But my point here was that you seem to imply I write code with little regard for it being correct or incorrect, and then rely on the compiler to find my errors.

I suspect that in the great majority of cases where I don't like your style, then it is nothing more than that. I might think it is not clear or easy to understand, or not as maintainable as it could be, or simply looks ugly and hard to read, or that it is not as efficient as other styles. I can't say for sure, since about the only things I know for sure about your style is that you like to write "if (3 == x)" rather than "if (x == 3)", and that you like function pointers.

It takes a lot more than that for me to label code as "incorrect" or "bad" (assuming the final result does the job required).

If a change makes code clearer, then it is a good thing. Visually splitting the words in a multi-word identifier makes code clearer - whether that is done using camelCase or underscores is a minor issue. Small letters are easier to read (that's why they exist!), and avoid unnecessary emphasis - that makes them a good choice in most cases. And there is rarely any benefit in indicating that an identifier is a constant or a macro (assuming it is defined and used sensibly) - so there is no point in making such a dramatic distinction.

And those are updates after delivery. There are many perfectly good reasons for updating software after delivery. All I said was that I have provided updates in a variety of ways, and for a variety of reasons

- but never for the sort of mistakes that you seem to think you are immune to because you learned to program with limited tools, while you think /I/ make them all the time because I take advantage of modern tool features.

That is fine for some projects. I have had cards that have been cemented into the ocean floor - upgrades are not physically possible. And on other projects, customers want to be able to have new features or changes at a later date.

I think everyone agrees that shipping something that does not work correctly, and updating for bug fixes, is always a bad idea - just /how/ bad it is will vary.

Without decent warnings, developers (especially new ones) are likely to spend a good deal more time chasing small bugs than they would if the compiler or linter helped them out. But why do you think this particular issue is so important? New C programmers are often told how important it is to distinguish between = and ==, so it is something they look out for, and avoid in most cases. And the Yoda rule only helps in /some/ cases where you have comparisons - you still need to get your = and == right everywhere else.

I have abused preprocessors a bit (any use of ## is abuse!), but I haven't had to look at the output directly to debug that abuse. Maybe I haven't been creative enough in my abuses here.
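For what it's worth, the kind of thing I mean - a made-up register-accessor macro - and the expansion can be checked from the preprocessor output rather than by guesswork:

#include <stdint.h>

/* Token pasting: REG(TIMER, COUNT) expands to the identifier TIMER_COUNT_reg. */
#define REG(periph, name)  periph##_##name##_reg

extern volatile uint32_t TIMER_COUNT_reg;

static inline uint32_t read_timer_count(void)
{
    return REG(TIMER, COUNT);    /* expands to TIMER_COUNT_reg */
}

Running "gcc -E file.c" dumps the preprocessed source, which is usually enough to see what such a macro really produced.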

Reply to
David Brown

That sounds fine - but what is "white space" in unicode? In ASCII, it's space, tab, newline and carriage return characters. In unicode, there are far more. Invisible spaces, non-breaking spaces, spaces of different widths, etc. Did you remember to check for the Ogham space mark, for those Celtic file names?

Use UTF-8 and stop on a null character. Just let people put spaces of any sort in their filenames, and you only have to worry about / (or \ and : ) as special characters.
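The handy property of UTF-8 here is that every byte of a multi-byte character is 0x80 or above, so a plain byte scan for '/' (or the terminating null) can never land in the middle of a character. A minimal sketch, with a made-up function name (and assuming the source file itself is saved as UTF-8):

#include <stdio.h>

/* Print each component of a UTF-8 path, splitting only on '/' and NUL.
   Every other byte - including all multi-byte UTF-8 sequences - is payload. */
static void list_components(const char *path)
{
    const char *start = path;
    for (const char *p = path; ; p++) {
        if (*p == '/' || *p == '\0') {
            if (p > start)
                printf("%.*s\n", (int)(p - start), start);
            if (*p == '\0')
                break;
            start = p + 1;
        }
    }
}

int main(void)
{
    list_components("/home/user/δοκιμή/файл.txt");
    return 0;
}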

That's because our "Arabic numerals" came from India, not Arabia - though they were brought over to Europe by an Arabic mathematician. I believe that in Arabic, the term for them translates as "Indian numerals".

There are lots of cases where the same glyph exists in multiple unicode code points, for different purposes. I have no idea how often they are used.

Reply to
David Brown

I'll go off on a tangent - it can be an important issue. I once worked with a guy that was visually impaired and used a screen reader for much of his work. The underscore form would read as (spoken)word (spoken)underscore (spoken)word... where the camelCase would cause it to give up and spell it all out. We referred to the underscore form as "easy reader code". This was over a decade ago so screen readers may be smarter now.

Reply to
Dennis

Unless it is a screen reader specially designed for code, then I'd imagine it would have trouble with camelCase words. I think Don knows more about this sort of program.

But you are absolutely right that there can be particular circumstances that determine our choices here, and have overriding importance.

Reply to
David Brown

The file system code is easy cuz it doesn't have to impart meaning to any particular "characters". But, as there are typically human beings involved who *do* impart meaning to certain glyphs (as well as the OS itself), the developer can't ignore that expected meaning.

I want to use the eight character name "////\\\\". Or, "::::::::".

>> Also, we *tend* to associate meaning with each of the (e.g.) ASCII code points. So, 16r31 is the *digit* '1'. There's consensus on that interpretation.

Why? shouldn't the glyphs for the Thai digits (or Laotian) also be recognized as numeric? Likewise for the roman numerals, the *fullwidth* (arabic) digits, etc.?

A particular braille glyph means different things based on how the application/user interprets it.

For example, the letter 'A' is denoted by a single dot in the upper left corner of the cell. 'B' adds the dot immediately below while 'C' adds the dot to the immediate right, instead.

Yet, in music braille, the "note" 'A' is denoted by a cell that is the union of the letters 'B' and 'C' *less* the letter 'A'

A     B     C
*.    *.    **
..    *.    ..
..    ..    ..

Notes:
A     B     C
.*    .*    **
*.    **    .*
..    ..    ..

I.e., the same glyph means different things. Imagine labeling a file of sound samples with their corresponding "notes"...

Because the programmer has to deal with the glyph's *meaning* to the user. Otherwise, why not just list file names and content as 4 digit unicode codepoints and eliminate the hassle of rendering fonts, imparting meaning, etc.?

The 10 arabic digits that we are accustomed to exist as U+0030 - U+0039 as well as U+FF10 - U+FF19. Then, there are the 10 arabic-indic digits, 10 extended arabic-indic digits, 10 mongolian digits, 10 laotian digits, etc. [We'll ignore dingbat digits, circled digits, super/subscripted digits, etc.]

Then, stacking glyphs (e.g., the equivalent of diacriticals)...

There's just WAY too much effort involved making sense of Unicode in a way that your *users* will appreciate and consider intuitive. When faced with the I18N/L10N issues, I found it MUCH easier to *punt* (if they don't speak English, that's THEIR problem; or, a task for someone with more patience/resources than me!)
Reply to
Don Y

With a purely GUI user interface, not a big problem. However, direct command line entry or scripts will require all kinds of escape mechanisms.

Depending on sorting locale. Since sorting locales are usually different from language to language, you could create your own locale for exactly what you want.

Yes of course, use fallback character mapping from Thai digit 1 to Arabic digit 1. The Roman numerals are more complex, since they do not use the positional system.

Use fallback mapping for numeric entry, use original code points for rendering.
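That fallback mapping is mostly a table lookup, since each of these digit sets occupies a contiguous run of ten code points. A rough sketch - the list of runs is far from complete and the function name is my own:

#include <stddef.h>
#include <stdint.h>

/* Map a Unicode code point to its digit value 0-9, or -1 if it is not in
   one of the digit runs handled here.  Each run is contiguous, so the value
   is just the offset from the run's first code point. */
static int digit_value(uint32_t cp)
{
    static const uint32_t run_start[] = {
        0x0030,    /* ASCII / European digits      */
        0x0660,    /* Arabic-Indic digits          */
        0x06F0,    /* Extended Arabic-Indic digits */
        0x0E50,    /* Thai digits                  */
        0x0ED0,    /* Lao digits                   */
        0xFF10,    /* Fullwidth digits             */
    };
    for (size_t i = 0; i < sizeof run_start / sizeof run_start[0]; i++) {
        if (cp >= run_start[i] && cp <= run_start[i] + 9)
            return (int)(cp - run_start[i]);
    }
    return -1;    /* not a digit we map; keep the original code point for rendering */
}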

Use fallback mapping.

Why would a person have to speak English or even know latin letters to use a computer such as a cell phone ?

Reply to
upsidedown

I'm not designing a cell phone. And, both parties (user and device) *speak* to each other. Should I also add learning foreign pronunciation algorithms to my list of design issues? :>
Reply to
Don Y
