Do you use serialization formats for communication?

pozz 9 years ago

I often have the need to exchange some data between two or more MCUs. I usually use I2C or UART as physical layers.

Normally I design a simple protocol between the MCUs: one framing mechanism (Start Of Frame, End Of Frame), one integrity check mechanism (CRC), and so on.

The payload is statically defined between the two MCUs:

- first byte is the version

- second byte is the voltage monitoring level

- third and fourth bytes are some flags

- ... and so on

As you can understand, both MCUs *must* know and agree on that protocol format. However, during the lifetime of the product I need to add functionality or fix bugs, and those activities can lead to a revision of the protocol format (maybe I need two bytes for the voltage level). Sometimes the two MCUs run different firmware versions with different implementations of the protocol format. To avoid protocol incompatibility, they all know about the protocol formats used before, so they can adapt the parsing function to the format actually in use. As you can understand, this can become troublesome.

So I'm thinking of using a "self-descriptive" serialization format, such as Protobuf, MessagePack, BSON and so on.

Do you use a serialization format? Which one?

Of course, it should be simple to implement (in transmission/encoding and reception/decoding) on a small embedded MCU in C, without dynamic memory support.

Dimiter_Popoff 9 years ago

I think you can just use PPP. Simple enough, easy to implement, proven to be robust, tested over a very long time. And of course it can be handy if at some point you want to put IP through it. It's been 10+ years since I last implemented it so my memory of it is somewhat blurry but I remember it did not take me too long.

Not that I understand why the need for encapsulation etc. but since you want it this is the proven way to go.

Dimiter

Dimiter Popoff, TGI

Simon Clubley 9 years ago

PPP doesn't help you when the message format itself changes and you need to maintain backwards compatibility which is what pozz seems to be worried about.

I wonder if a solution might be a Kermit style exchange of attributes in an attributes packet and maybe with the attributes describing the individual fields in addition to the implemented protocol capabilities.

In other words, maybe do an exchange of capabilities supported by the peer instead of just a version number.

One of the key things about the Kermit attributes packet is that it is variable length and if the attributes packet isn't big enough to describe a specific capability then the code knows that this specific Kermit protocol extension isn't supported by the peer.
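A minimal sketch of that idea in C, assuming a hypothetical attributes packet that is simply a variable-length array of capability bits (this is not Kermit's actual encoding, and the capability names are invented for illustration). A capability that lies beyond the end of the received packet is treated as unsupported:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability numbers; new ones are only ever appended. */
enum {
    CAP_VOLTAGE_8BIT = 0,
    CAP_FLAGS_16BIT  = 1,
    CAP_VOLTAGE_MV   = 9   /* lives in the second attribute byte */
};

/* attrs/len: the attribute bytes received from the peer.
 * A capability beyond the end of the packet counts as unsupported. */
static bool peer_supports(const uint8_t *attrs, size_t len, unsigned cap)
{
    size_t byte = cap / 8u;
    if (byte >= len) {
        return false;   /* peer's packet is too short to describe it */
    }
    return ((attrs[byte] >> (cap % 8u)) & 1u) != 0u;
}
```

An older peer that sends only one attribute byte is automatically reported as not supporting `CAP_VOLTAGE_MV`, without any version number being compared.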

Simon.

Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP Microsoft: Bringing you 1980s technology to a 21st century world

Don Y 9 years ago

Don't rely on positional information in the protocol to convey information. Instead, *tag* each "value" and then just keep track of how to decode each particular tag. If you encounter a tag that you don't understand, then you have to have designed the protocol such that the 'default' for that tagged value is acceptable.

If you change the *encoding* of a particular 'value', then you have to create a new, unique tag for that value in the new encoding. Note that you can also include the value in its original encoding -- under the original 'tag' -- in with the *new* encoding (and its new tag).

Look at protocols like DHCP, BOOTP, etc. to see how others have been doing this (in a future/past-safe manner) for decades...

David Brown 9 years ago

It depends on how flexible you want to be. Self-descriptive or tagged formats, like JSON, BSON, etc., are very future-proof - but they are also much more effort in development time and run time.

You can come a /long/ way with just a little more than the system you have. Keep the same framing mechanism, but make sure you have a field for "length of payload". In the payload, you have "type of telegram" and "version of telegram format". Then when you need to change the formats, you add new data to the old structure.

So format version 1 might be:

    typedef struct {
        uint8_t programVersion;
        uint8_t voltageMonitor;
        uint16_t flags;
    } format1payload;
    static_assert(sizeof(format1payload) == 4);

Format version 2, with voltage now in millivolts, will be:

    typedef struct {
        uint8_t programVersion;
        uint8_t voltageMonitor;
        uint16_t flags;
        // Start of version 2
        uint16_t voltageMonitorMillivolts;
    } format2payload;
    static_assert(sizeof(format2payload) == 6);

A transmitter always sends with the latest version it knows, and will fill in both the voltageMonitor and voltageMonitorMillivolts fields. A receiver interprets as much as it can based on the latest version it knows and the version it receives - any excess data beyond its understanding can safely be ignored.
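The receiver side of this scheme might look roughly like the sketch below. It assumes the framing layer hands over the payload bytes and their length, that both ends share the same endianness, and it uses memcpy() rather than a pointer cast; `decode_payload` and `has_mv` are hypothetical names, not part of the original post:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t  programVersion;
    uint8_t  voltageMonitor;
    uint16_t flags;
    /* Start of version 2 */
    uint16_t voltageMonitorMillivolts;
} format2payload;

static_assert(sizeof(format2payload) == 6, "unexpected padding");

/* Decodes as much of the payload as this receiver understands.
 * Returns false if the payload is shorter than format version 1 (4 bytes).
 * *has_mv reports whether the version 2 field was present. */
static bool decode_payload(const uint8_t *data, size_t len,
                           format2payload *out, bool *has_mv)
{
    if (len < 4u) {
        return false;
    }
    memset(out, 0, sizeof *out);
    *has_mv = (len >= sizeof *out);
    /* Bytes beyond what we know (from newer senders) are simply ignored. */
    memcpy(out, data, *has_mv ? sizeof *out : 4u);
    return true;
}
```

A version 1 sender yields `has_mv == false`, so the caller falls back to `voltageMonitor`; a version 3 sender with extra trailing fields still decodes cleanly.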

Your encoder and decoders are now nothing more than casts between char* pointers and struct pointers.

kalvin.news 9 years ago

A simple and quite robust message format which is also simple to parse:

START-OF-FRAME FRAME-LENGTH FRAME-LENGTH-INV FRAME-PAYLOAD FRAME-CHECKSUM END-OF-FRAME

The FRAME-LENGTH tells how many bytes to read in order to get the frame payload and the checksum: FRAME-LENGTH = number-of-bytes(FRAME-PAYLOAD) + number-of-bytes(FRAME-CHECKSUM), for example. The FRAME-LENGTH-INV is the one's complement of FRAME-LENGTH and it is used for fast detection of invalid frames: if the FRAME-LENGTH is not equal to ones-complement-of(FRAME-LENGTH-INV), the frame must be discarded because it is already corrupted.
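As a small sketch of the early length check, assuming FRAME-LENGTH and FRAME-LENGTH-INV are one byte each (the field widths are not stated above, and the function name is hypothetical):

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true when the length byte and its one's complement agree.
 * The cast matters: ~ promotes to int, so the result must be narrowed
 * back to 8 bits before comparing. */
static bool frame_length_ok(uint8_t frame_length, uint8_t frame_length_inv)
{
    return frame_length == (uint8_t)~frame_length_inv;
}
```

The receiver can run this as soon as the two header bytes arrive, before waiting for a payload whose length may be garbage.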

The FRAME-PAYLOAD is a sequence of PAYLOAD-ITEMs:

PAYLOAD-ITEM: ITEM-ID ITEM-LENGTH ITEM-PAYLOAD.

The ITEM-ID tells what data item this is, the ITEM-LENGTH tells the length of the item payload, and the ITEM-PAYLOAD contains the bytes of the item.

Pretty simple message format and easy to parse. If the parser doesn't recognize the item ID, it knows how many bytes to skip to reach the next item.
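A parser for such a payload could be sketched as follows, assuming one-byte ITEM-ID and ITEM-LENGTH fields; the item IDs and the `parsed_items` struct are made up for illustration:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical item IDs. */
enum { ITEM_VERSION = 1, ITEM_VOLTAGE = 2 };

typedef struct {
    uint8_t  version;
    uint8_t  voltage;
    unsigned skipped;   /* number of unrecognized items */
} parsed_items;

/* Walks the payload item by item. Returns 0 on success, -1 if an item
 * header or body runs past the end of the payload. */
static int parse_items(const uint8_t *p, size_t len, parsed_items *out)
{
    size_t i = 0;
    while (i < len) {
        if (len - i < 2u) {
            return -1;              /* truncated item header */
        }
        uint8_t id = p[i];
        uint8_t item_len = p[i + 1u];
        i += 2u;
        if (item_len > len - i) {
            return -1;              /* item body exceeds the payload */
        }
        if (id == ITEM_VERSION && item_len == 1u) {
            out->version = p[i];
        } else if (id == ITEM_VOLTAGE && item_len == 1u) {
            out->voltage = p[i];
        } else {
            out->skipped++;         /* unknown item: just step over it */
        }
        i += item_len;
    }
    return 0;
}
```

An older receiver that meets an item added by a newer sender only increments `skipped` and carries on with the items it does understand.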

Br, Kalvin

pozz 9 years ago

So you cast your struct pointers to char pointers and send the data as is? I used this very simple technique in the past, but I don't use it anymore, because the two MCUs could be different: they could have different endianness, use different compilers that place padding in different places, and so on.

Tauno Voipio 9 years ago

The canonical solution to this is to follow the Internet practices: Pack data tightly and use network byte order for multibyte binary data. This often needs byte transfers to pack and unpack the communication frames. The network byte order is in reverse for e.g. PC hardware.
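The byte-wise packing and unpacking can be sketched like this; the helper names are hypothetical. Because the code only shifts and masks, it behaves the same on little-endian and big-endian hosts:

```c
#include <stdint.h>

/* Store a 16-bit value most significant byte first (network byte order). */
static void put_u16_be(uint8_t *buf, uint16_t v)
{
    buf[0] = (uint8_t)(v >> 8);
    buf[1] = (uint8_t)(v & 0xFFu);
}

static uint16_t get_u16_be(const uint8_t *buf)
{
    return (uint16_t)(((uint16_t)buf[0] << 8) | buf[1]);
}

/* Same idea for 32-bit values. */
static void put_u32_be(uint8_t *buf, uint32_t v)
{
    buf[0] = (uint8_t)(v >> 24);
    buf[1] = (uint8_t)(v >> 16);
    buf[2] = (uint8_t)(v >> 8);
    buf[3] = (uint8_t)v;
}

static uint32_t get_u32_be(const uint8_t *buf)
{
    return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
}
```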

Another way is to transfer all data as text. There is a good reason why XML is used in this kind of situation, though it is an extremely loose format with plenty of overhead compared to the data.

-TV

David Brown 9 years ago

It is not a problem if the MCUs are different. It would matter if they had different encodings for signed integers or padding bits in their types, but let's assume you are not communicating with a mainframe from the 60's.

Padding is not a problem if you design your structs carefully. Make sure everything is naturally aligned - 16-bit data is 16-bit aligned, 32-bit data is 32-bit aligned, 64-bit data is 64-bit aligned. Use your tools to check this - "-Wpadded" for gcc, and static_asserts to check that the sizes of your structs match what you expect.

That just leaves endianness. Most microcontrollers are little-endian, as are PC's, so that is the endianness I normally use. The only exception would be if I were transferring data between two big-endian devices, I would probably use big-endian ordering.

So if I have a networked system with different endians on different microcontrollers, then I need to do endian swaps on the structs at one end. Some compilers support this, letting you annotate your structs with the endianness (gcc 6 has this, though I haven't tried the feature yet). Otherwise it must be done manually when receiving or transmitting the struct. But still, it is a fraction of the effort (in development time and run time) of decoding more general protocol formats.

pozz 9 years ago

I knew all your arguments. As I wrote, I used in the past exactly this trick. However I don't like it. In certain cases, you have to change the order of the fields in a struct (an order that appears logical), only because you have to avoid padding bytes.

Moreover, if you need to encode some complex structs, understanding if the compiler will introduce padding in-between is not trivial.

    send(&struct1, sizeof(struct1));
    send(&struct2, sizeof(struct2));

sizeof(struct1) could include some extra padding bytes at the end of the struct. The receiver has to know about them.

One time I had to communicate with a Visual Basic application. In that case, managing padding bytes was a mess.

pozz 9 years ago

On 20/10/2016 12:30, snipped-for-privacy@gmail.com wrote: > [...]

Yes, of course this *is* a "self-made" serialization format. There are many others that are similar, but offer further advantages.

For example, consider MessagePack[1] (I don't work for MessagePack).

You can encode many types of data. Your serialization format can be encoded as a MessagePack map: a sequence of key-value pairs (a dictionary in Python).

As in your format, MessagePack encodes the key (your ITEM-ID), the value type (similar to your ITEM-LENGTH) and the value (your ITEM-PAYLOAD). The value type automatically specifies the length and the object type (integer, boolean, array...).

For example, the dictionary { 3: 5 } (only one item, 3 as ID and 5 as value) is encoded as three bytes { 0x81, 0x03, 0x05 }. If the value is much higher, for example { 3: 1000000 }, the encoded stream is 7 bytes long { 0x81, 0x03, 0xCE, 0x00, 0x0F, 0x42, 0x40 }. The decoder automatically understands what the key is (type, length and value) and what the value is (type, length and value).
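To show how little code this needs on a small MCU, here is a minimal encoder sketch for a one-entry map with unsigned integer key and value. It covers only the MessagePack fixmap, positive fixint, uint 8, uint 16 and uint 32 encodings; the function names are hypothetical and not taken from any MessagePack library:

```c
#include <stddef.h>
#include <stdint.h>

/* Writes v in the shortest MessagePack unsigned encoding.
 * Returns the number of bytes written (at most 5). */
static size_t mp_put_uint(uint8_t *buf, uint32_t v)
{
    if (v <= 0x7Fu) {                 /* positive fixint */
        buf[0] = (uint8_t)v;
        return 1;
    }
    if (v <= 0xFFu) {                 /* uint 8 */
        buf[0] = 0xCC;
        buf[1] = (uint8_t)v;
        return 2;
    }
    if (v <= 0xFFFFu) {               /* uint 16, big-endian */
        buf[0] = 0xCD;
        buf[1] = (uint8_t)(v >> 8);
        buf[2] = (uint8_t)v;
        return 3;
    }
    buf[0] = 0xCE;                    /* uint 32, big-endian */
    buf[1] = (uint8_t)(v >> 24);
    buf[2] = (uint8_t)(v >> 16);
    buf[3] = (uint8_t)(v >> 8);
    buf[4] = (uint8_t)v;
    return 5;
}

/* Encodes the map { key: value }; buf must hold at least 11 bytes. */
static size_t mp_encode_pair(uint8_t *buf, uint32_t key, uint32_t value)
{
    size_t i = 0;
    buf[i++] = 0x81;                  /* fixmap holding one key-value pair */
    i += mp_put_uint(&buf[i], key);
    i += mp_put_uint(&buf[i], value);
    return i;
}
```

Calling `mp_encode_pair(buf, 3, 5)` and `mp_encode_pair(buf, 3, 1000000)` produces the 3-byte and 7-byte streams listed above.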

If you want, you can use a string for the keys (all keys or some of them). Or you can encode an array instead of a map/dictionary.

[1]

formatting link

David Brown 9 years ago

It really is not hard at all. /No/ compiler, for any sane processor, adds padding or extra alignment requirements beyond the natural size of the fundamental types. You only have to be concerned with padding if you try to mix and match in other ways. And if you have a "uint8_t" field which should logically be followed by another field that happens to be "uint16_t", just add an explicit "uint8_t" padding field. Don't let the compiler add its own padding - use compiler warnings where possible to ensure it, and use static assertions to confirm that everything is correct.
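A sketch of what that looks like in practice, using C11 static assertions; the struct and its field names are invented for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t  telegramType;
    uint8_t  padding1;      /* explicit, so the compiler has nothing to insert */
    uint16_t flags;         /* now naturally 16-bit aligned */
    uint32_t counter;       /* naturally 32-bit aligned */
} telegram;

/* Fail the build if the layout is not what the protocol document says. */
static_assert(sizeof(telegram) == 8, "telegram has unexpected padding");
static_assert(offsetof(telegram, flags) == 2, "flags is not at offset 2");
static_assert(offsetof(telegram, counter) == 4, "counter is not at offset 4");
```

Checking the offsets as well as the total size catches the case where two errors cancel out and the size alone still happens to match.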

George Neuner 9 years ago

There are plenty of chips (and compilers for them) that are not sane by your definition. The many "word-oriented" chips come to mind ...

George

David Brown 9 years ago

You are thinking of things like the TMS320F dsps (16-bit char) or the SHARC (32-bit char) ? First off, these are not MCU's, and are unlikely (not impossible, but unlikely) to be the kind of chip involved in this sort of communication. You pick your solution based on what is practical for real-life cases - not on what is necessary for the most awkward situations that you perhaps might meet.

Secondly, these chips and their tools also do not add any padding or alignment requirements beyond the natural size of their fundamental types - they are perfectly "sane" in this sense. The difference is that they do not have types uint8_t or int8_t (and perhaps not the 16-bit types if they have 32-bit chars). If your structs have 8-bit fields, then these won't compile directly. But it is not a big problem - after all, since you have explicitly added any padding needed to keep alignment for any bigger fields, you can always group your 8-bit fields in pairs (or make groups of 4 bytes if you have 32-bit chars). The most you might have to do is add a few extra explicit padding bytes at the end of the struct.

So with 16-bit char and 32-bit char architectures dealt with, are there any other problem or "non-sane" devices that come to mind? I know there are a few 24-bit architectures (eTPU, and some audio DSP's) - such devices are so specialised that you would make a solution specifically for those chips if you need them.

Over the years, I have worked with quite a range of microcontrollers - but there are vast numbers out there that I have never heard of, never mind used. So if you have examples of awkward (or "insane"!) devices, I would like to hear of them - even if I don't use them, it is interesting to think about the challenges they would pose.

pozz 9 years ago

Oh yes, it isn't hard, but it can be error-prone and it isn't versatile. I stopped using this method when I needed to replace one MCU with a PC running an application written in a different, high-level language (Visual Basic, Python, ...). At first there were two small MCUs, so communicating "raw structs" was sufficient. With high-level languages it isn't so straightforward. You have to consider padding explicitly. When the structs are long, nested, or contain arrays, it's not so simple.

Even staying in the MCU world, you need to write something like this:

    void send_frame(void) {
        struct {
            uint8_t id;
            uint8_t padding1;
            uint16_t salary;
            char name[9];
            uint8_t padding2;
        } frame_data[2];
        static_assert(sizeof(frame_data[0]) == 14);

        frame_data[0].id = get_id(1);
        frame_data[0].salary = get_salary(1);
        strcpy(frame_data[0].name, get_name(1));
        frame_data[1].id = get_id(2);
        frame_data[1].salary = get_salary(2);
        strcpy(frame_data[1].name, get_name(2));

        uart_send((uint8_t *)frame_data, sizeof(frame_data));
    }

If you use a serialization format, the code is not so different, but you gain some points in versatility:

    void send_frame(void) {
        uint8_t frame_data[32];
        size_t i = 0;

        i += serialize_u8(&frame_data[i], get_id(1));
        i += serialize_u16(&frame_data[i], get_salary(1));
        i += serialize_str(&frame_data[i], get_name(1));
        i += serialize_u8(&frame_data[i], get_id(2));
        i += serialize_u16(&frame_data[i], get_salary(2));
        i += serialize_str(&frame_data[i], get_name(2));

        uart_send(frame_data, i);
    }
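The serialize_* helpers are not defined in this post. One possible shape for them is sketched below; the wire layout (big-endian integers, strings sent as a length byte followed by the characters) is purely an assumption, and like the calls above they do no bounds checking on the destination buffer:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Each helper writes at dst and returns the number of bytes written. */
static size_t serialize_u8(uint8_t *dst, uint8_t v)
{
    dst[0] = v;
    return 1;
}

/* Assumed wire order: most significant byte first. */
static size_t serialize_u16(uint8_t *dst, uint16_t v)
{
    dst[0] = (uint8_t)(v >> 8);
    dst[1] = (uint8_t)(v & 0xFFu);
    return 2;
}

/* Assumed layout: one length byte, then the characters, with no
 * terminating NUL on the wire. Longer strings are cut at 255 bytes. */
static size_t serialize_str(uint8_t *dst, const char *s)
{
    size_t n = strlen(s);
    if (n > 255u) {
        n = 255u;
    }
    dst[0] = (uint8_t)n;
    memcpy(&dst[1], s, n);
    return n + 1u;
}
```

With names of at most 8 characters, each record takes at most 12 bytes, so two records fit in the 32-byte frame_data buffer used above.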

George Neuner 9 years ago

You're wrong if you think DSPs don't get used as MCUs.

DSPs (relatively) are expensive, so a system that really needs a DSP in the first place will tend to use it as the main processor rather than as a peripheral to something else. There are OS kernels and communication stacks available for many DSP families that encourage such extended use.

Yes, but ...

Many floating point DSPs don't have IEEE-754 compatible types. Many trade range for precision in their basic "single-precision" type, and some also have extended precision types with odd lengths. Binary transfers to/from other systems require [sometimes non-trivial] data conversion.

There also are many DSPs that support odd length integer types that require care when/if transferring between systems.

There are simple workarounds for most data except non-IEEE floating point types.

George.

pozz 9 years ago

On 20/10/2016 23:39, David Brown wrote: > [...]

There is another issue that can arise when you use the "casting" approach.

Over the wire everything is just bytes, but you know that a certain block of bytes is a C struct. While they are bytes, you can use memcpy() and similar functions on them, but nothing guarantees that the bytes end up at an address that is suitably aligned for your struct.

In that case the cast can fail, depending on the processor.

I ran into this kind of problem when I ported some code from one MCU where unaligned access was possible (at the cost of additional clock cycles) to another MCU that did not allow unaligned access. The code that worked on the first MCU didn't work on the second. I had used the cast approach, and that was the reason for the failure.

David Brown 9 years ago

DSP's get used for some MCU uses, but they are a relatively minor player. In the solid majority of cases of intercommunication between two devices on a board or two boards in a system, they will be devices with 8-bit chars. And as noted below, it is quite possible to use the same technique for 16-bit and 32-bit char architectures.

(Also note that in high-end DSPs, there is a trend of including a "normal" core such as an M3/M4 along side the DSP core, so that you can let the DSP concentrate on the stuff it is good at, and let the MCU do the stuff the DSP core is bad at. I haven't used such devices myself, merely heard this from distributors.)

Clearly if you are going to use odd, non-standard floating point formats then you can only use binary protocols to communicate between devices that also support these weird formats. If you want to communicate with something else, you have to convert to standard formats and/or ASCII formats.

There are such devices, yes. The question is, are they common enough to be relevant?

The challenge is not to find a communication format that will work for /everything/, programmed in every conceivable language, and running on every conceivable device past, present and future. The challenge is to find a method of communicating that is easy to develop, efficient at run-time, has efficient bandwidth usage, is flexible and expandable, and works on the solid majority of realistic systems. It's okay to say you need something different if you are working with that 12-bit DSP from the dark ages. Optimise for the common case, with an understanding of any limitations that might have - don't worry about devices that are not relevant.

David Brown 9 years ago

Certainly you need to be aware of alignment issues. If you receive your message as a char* pointer into a block of data with unknown alignment, then you cannot cast it to a struct pointer without taking alignment into account. Either you arrange things so that your incoming data goes directly into a properly aligned area (that is often quite easy to achieve), or you will need to memcpy() from your buffer into your struct area.
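The memcpy() route can be sketched as follows; the header struct and function name are hypothetical. memcpy() has no alignment requirement on its source, so the receive pointer may sit at any offset inside the byte buffer:

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t  telegramType;
    uint8_t  padding1;
    uint16_t flags;
} telegram_hdr;

/* rx may point anywhere in a receive buffer, aligned or not.
 * The copy lands in a local struct that the compiler aligns correctly. */
static telegram_hdr read_header(const uint8_t *rx)
{
    telegram_hdr h;
    memcpy(&h, rx, sizeof h);
    return h;
}
```

For small fixed-size copies like this, compilers commonly turn the memcpy() into a few plain loads and stores, so the cost compared with a cast is usually small.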

The compiler should give you a warning on such casts if you do them blindly - it is worth listening to such warnings. Rather than a simple cast, it is often useful to use a union:

    union {
        uint64_t dummyForAlignment;
        struct {
            uint8_t telegramType;
            ...
        };
        uint8_t rawBuffer[64];
    };

kalvin.news 9 years ago

On Thursday, 20 October 2016 at 10:40:44 UTC+3, David Brown wrote:

Typically a uint8_t is just unsigned char, but the char may be more than one octet, i.e. more than 8 bits. So the static_assert(sizeof(format1payload) == 4) will be valid, but depending on the target architecture the structure may be more than 4 octets. When you pass the payload structure to the transmit function, it will send 4 or more octets depending on how many octets the structure contains. I wouldn't call this method robust and portable at all.

A better way would be to create a transmit buffer and add the structure fields to the buffer one at a time. There should be different functions for the different data types (char, uint8, uint16, int, long int etc.) which take care of the proper size matching. When all items of the structure have been added to the buffer, the transmitter sends the buffer. I know, this is not for lazy people, but it is a portable and more robust way of doing things. When you port the application to a new platform, you just need to tweak the functions that take care of the actual data size matching (char, uint8, uint16, int, long int etc.). I know, this method requires more initial work, but it is the way to do it in a portable manner.

Br, Kalvin
