Auto-update protocol

Hi,

I'm looking for ideas for a lightweight (yet robust) auto-update protocol for a family of network clients I'm designing.

Some criteria:

- Unattended, persistent operation. A device shouldn't require hand-holding from a user to get itself into a working configuration. It is acceptable (though not *desirable*) to withhold its normal functionality from the user while updating *or* recovering from a botched upgrade. But, it must be able to eventually "right itself".

- I don't want to have to *tell* the devices that they need to be updated. The user shouldn't even be aware that this is happening. I.e., the devices should be able to examine/query the "current images" to see if they coincide with *their* images.

- I should be able to tailor an image to a *particular* device (i.e., an *instance* of a device; not just a "model number").

- Images need to be signed so they can't be forged.

- The protocol can't be spoofed by "unfriendlies".

- The protocol should be as light as possible -- but no lighter! ;-)

- Minimize unnecessary network traffic as well as load on the server (the goal is for the user *not* to notice this activity -- though I am not trying to "keep it secret")

(seems like I have forgotten something -- but I can't recall what! :< Too early in the day...)

So, all I should have to do is put "current images" on the user's server and wait for each device to discover the image and update itself accordingly.

For example, the devices could check the server at IPL (and periodically at run time -- though updates at run time can be more of a challenge as they will probably interfere with normal operation :< ) and "fingerprint" the current image located there to see if it looks like it differs from its own image.

One way of doing this is to store the image for each device instance in a R/O file bearing the MAC of the device in question. But, this would require the device to perform a "trial download" of the image for the sole purpose of computing the fingerprint (why not just do a bytewise compare if you are going to this extreme?!).

This will hammer the network pretty hard. Granted, the individual segments for each device will see modest traffic -- but the server's segment will quickly max out if multiple devices do this simultaneously (e.g., powering up together). This would necessitate a second layer of the protocol to randomize/defer such competition. :<

Another approach is to *store* the fingerprint on the server in a uniquely accessible manner (e.g., use a file name like MAC.fingerprint). But, that represents a duplication of data (pet peeve of mine) which makes it at risk for getting out of sync.

For example, updating the image and forgetting to update the fingerprint; or, a run-time race -- the device examines the fingerprint, sees that it differs, starts to download image but the image hasn't been updated yet. Or, the image is updated but the fingerprint is stale when the device examines it. As a result, the image is NOT seen as "new".

[you could get around this if you implemented network file locking -- but that is more complexity and leaves open the possibility of stale locks, etc.]

This could be worked around by forcing the server to recompute the fingerprint each time an image is added. But, that still leaves a race window *and* requires the server to be aware of the introduction of the new image file(s).

The simplest compromise would seem to be having the device track the timestamp of the image file and check *that*. If changed, then *assume* the image actually has changed (of course, touch(1)-ing the image file would then force the device to consider the image file as changed -- this could be an advantage?). The device could then either fingerprint the image itself *or* naively assume it to be a new image and begin the (secure) update procedure.

Security then is another facet to be addressed. E.g., given that the devices *don't* have enough resources to store an entire image before flashing, the protocol would have to be interruptible. E.g., encrypted packets so they can't be spoofed. But, accepting the possibility that the entire image might not become available "when needed" for the reflash. I.e., fall back to a secure boot loader that can do the update without the rest of the application being available.

Note that this loader should *not* restart the upload but, instead, *continue* where it apparently left off, previously -- even in light of any intervening power cycles. This saves time, reduces flash wear and is just "smarter", in general. :>

But, it seems like this *still* requires server side locking so the file isn't changed *while* it is being doled out. E.g., something like TFTP would be inappropriate as it doesn't reliably lock the file from one packet to the next.

Shirley, this sort of thing has been done before (?). Pointers?

Thanks!

--don

Reply to
D Yuniskis

On 08/03/2010 20:34, D Yuniskis wrote:

[...]

You could embed the fingerprint inside the image, at a reserved space in its beginning.
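For example (a minimal sketch -- the magic value, field sizes and hash choice are placeholders, not anything specified above):

/* Hypothetical layout: the build writes the fingerprint into a reserved
 * header at the very start of the image, so a device only needs the first
 * few bytes of the file to decide whether it differs from what it runs. */
#include <stdint.h>

#define FINGERPRINT_LEN 20                 /* e.g. a SHA-1 sized digest (assumption) */

struct image_header {
    uint32_t magic;                        /* sanity check: "is this really an image?" */
    uint32_t image_length;                 /* bytes of code following this header */
    uint8_t  fingerprint[FINGERPRINT_LEN]; /* hash of everything after the header */
};

The fingerprint would cover everything *after* the header, so writing the header itself doesn't invalidate it.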

--
Saludos.
Ignacio G.T.
Reply to
Ignacio G. T.

On Mon, 08 Mar 2010 20:34:53 +0100, D Yuniskis wrote:

That's a bug.

How can the server know the fingerprint before it has been fully updated?

No biggy. The device will check again sometime, right?

Have you considered your own simple protocol? E.g.

  • client identifies to server and asks for update since $TIMESTAMP
  • server says yes or no
  • if yes, start streaming
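(Purely as an illustration of how small that exchange could be -- the message layouts below are invented for the sketch, not part of any existing protocol:)

#include <stdint.h>

/* Hypothetical "update since $TIMESTAMP" query and its reply. */
struct update_query {
    uint8_t  device_id[6];      /* e.g. the client's MAC address */
    uint32_t image_timestamp;   /* timestamp of the image currently in flash */
};

struct update_reply {
    uint8_t  update_available;  /* 0 = no; 1 = yes, image stream follows */
    uint32_t image_length;      /* size of the image about to be streamed */
};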

Only if the image is already accessible while the fingerprint isn't there yet. And why would you do that?

Why would you do that? Each unique image should have a unique name.

--
Made with Opera's revolutionary e-mail program:  
http://www.opera.com/mail/
(remove the obvious prefix to reply by mail)
Reply to
Boudewijn Dijkstra

D Yuniskis wibbled on Monday 08 March 2010 19:34

I would use TFTP for the actual image transfer - simple, standard, and the client initiates the connection, so as long as it is happy it is talking to the right IP it doesn't have to worry too much about connection-based attacks. Code signing should sort out the rest of the security.

Unless you go multicast/broadcast (with the extra complexity that will force onto the clients) I don't think you could do much better. Would you really need to reduce net traffic below one image transmission per device?

Now the interesting bit is advertising the availability of images to the clients. TFTP of course cannot do directory listings, though the client could periodically retrieve a manifest file.

Do the devices have a notion of "type ID" anywhere in their firmware? In which case, you could serve the files up thusly named:

TypeID-Version or MacAddress

eg

23FE-010A
23FE-010B
FFEECCBB1177
2401-0200

etc

Any device of type 23FE will sort the list and discover that version 010B is the highest version and will update to that.

Device with MAC address FFEECCBB1177 will find a specific image and will see if it is the one it is currently running and if necessary will update to that.

Images should contain a header also containing TypeID and Version so the device can make sanity checks that it has runnable code.
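(As a sketch -- assuming a header of roughly this shape, and the "highest version wins" policy described above -- the client-side sanity check is tiny; names and field widths are made up:)

#include <stdint.h>

/* Illustrative header prepended to each image by the build. */
struct fw_header {
    uint16_t type_id;   /* e.g. 0x23FE */
    uint16_t version;   /* e.g. 0x010B */
};

/* Nonzero if the offered image is runnable on this device and newer
 * than what is currently in flash. */
static int image_acceptable(const struct fw_header *hdr,
                            uint16_t my_type, uint16_t my_version)
{
    return hdr->type_id == my_type && hdr->version > my_version;
}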

Simple, controllable.

I'd just go for Type and Version numbers, it's all you need if I understand your problem correctly. Timestamps are not really necessary.

Put a management system in place so that people do not manage the TFTP directory directly, but instead give new firmware to a script that gracefully puts it in place, atomically updating the manifest file (hint: rename() under Linux is atomic)

I think the hardest part is managing encryption and/or code signing in a way that doesn't overload the client if they are lightweight. The basic approach however of offering a manifest of files and allowing the client to make a simple informed choice is the easy bit IMO.

--
Tim Watts

Managers, politicians and environmentalists: Nature's carbon buffer.
Reply to
Tim Watts

I *think* this is probably the best approach. It pushes all the "special processing" into the activity that "creates an image" instead of having to burden the system that *hosts* the image or the individual responsible for putting the image in place.

I still have to consider how to ensure the file is locked during the update (so it can't be removed or replaced) but that will tend to be driven by the choice of transport protocol.

Reply to
D Yuniskis

[snips]

That's "human nature" :> I.e., part of the problem is anticipating what sorts of things are going to "go wrong" and designing the protocol so they have minimal impact.

I was assuming the fingerprint is computed elsewhere (i.e., part of building the image). So, you have xxxxx.image and xxxxx.fingerprint (like xxxxx.md5). (I'm trying not to make the server do much of anything so I can use some generic host for that role).

So, if someone is responsible for copying these two files onto the server (via ftp, etc.) and happens to copy the fingerprint *before* the image, then it is possible that the "device" could examine that fingerprint (coincidentally) and make its download/update decision before the actual image has been copied (uploaded to the server)

Yes. But then the frequency at which the device re-checks the server becomes more significant. Ideally, I would "signal()" the devices when I *knew* the new image (and fingerprint) was available. But, that's another action that someone (or something) must perform. More resilient if the devices see to their own needs without needing external prodding.

Yes, but that means I have to have something "special" running on the server. I.e., a way of asking for an update "since $TIMESTAMP", a way for the server to respond, etc.

I am trying to do those things without developing any special server-side tools (which would have to be ported to whichever server hosted these images, etc.)

If, instead, I did something like:

- image file name is MAC address of device (so each device knows which image it should be interested in) with ".image" extension

- fingerprint is MAC address of device with ".fingerprint" (e.g., ".md5")

- device fetches fingerprint via TFTP (since it is a tiny file, you can get the entire file in one packet!)

- check against device's *computed* fingerprint

- device decides (on its own!) that it needs the new image

- device fetches image using (some existing) protocol

This fits within the services offered by most machines. It requires the "image builder" (person + software) to create these two files (since the image is built by compiling sources, it seems appropriate to burden that person with the task of creating the fingerprint) and somehow get them onto the server in the right place.

(this ignores security issues, spoofing, etc.)

If the server is "recomput[ing] the fingerprint" from the image, then the image is *there*. You would need to make sure it wasn't yet visible to the "client".

E.g., create a "MAC" subdirectory called MAC.temp. Put image in there. Compute fingerprint and put it in there, too. Then, mv(1) the subdirectory to "MAC" (where it would be visible to the client).

(this still leaves problems to resolve).

I'm just trying to figure out how to get something for nothing! :>

Well, it depends on how you then inform the client of the "right" file to access! I.e., you need something that maps client ID's to image filenames.

And then you have to make sure that is updated and the whole race issue with keeping that synchronous with the image/fingerprint files.

(unless you link(1) a client-specific name to a particular set of files, etc.)

There are lots of ways you *can* do this. I'm just looking for one that is clean and free from as many hazards as possible. :<

So far, I think putting the fingerprint *in* the image is the best approach. Then, maybe using something like sftp to fetch it.

So, client *always* starts to fetch the image. After it has the first portion of it (seems wisest to put the fingerprint at the head of the image file so you can get to it quickly), it extracts fingerprint and decides if it wants to continue the sftp session -- followed by a flashing -- or abort it.

That way, if the file is deleted/renamed on the server, an instance of it should still be "link(1)-ed" to the sftp session (so it doesn't disappear in the middle of the xfer)

Reply to
D Yuniskis

The issue (I think) with TFTP is that it doesn't protect against the image file being updated "between packets". So, you run the risk of pulling down part of one image and part of another, flashing this combination -- only to discover the resulting image is corrupt (fails checksum).

With a connection oriented protocol, the server side can open() the desired file and keep a link to it throughout the transaction (even if the original file is unlink(1)-ed, etc.)

The client can try to *recover* from this mixup. But, it means starting the entire process over again. In a pathologic scenario, it could *never* recover completely (highly unlikely, but possible).

My point was to avoid *needless* traffic. E.g., don't download the image ALL THE TIME if you don't need to do so. (imagine a RAM-based environment which *would* need that sort of support)

Or, it can just request a *specific* file name (e.g., related to its MAC address -- since both sides of the link would need to know that)

Well, audio clients differ from video clients differ from...

But, I don't want to have to "personalize" each instance of an "audio client" from each other instance. They configure themselves at runtime.

However, I may want to try a different image on a particular client (e.g., during development or when testing new features). So, I would want to be able to say:

"device having MAX xx:xx:xx:xx:xx:xx please use *this* image"

As such, if you can support this, then you might as well specify the image for *each* device in this same way.

For example, I manage my X Terminal clients with a file hierarchy of:

ModelA/
    Version1/
        ConfigurationX
        ConfigurationY
    Version2/
        ConfigurationW
    Version3/
        ConfigurationQ
ModelB/
    Version2/
        ConfigurationP
    Version47/
        ...
Devices/
    MACxxxxxxxxxxxx -> ../ModelA/Version1/ConfigurationY
    MACyyyyyyyyyyyy -> ../ModelA/Version3/ConfigurationQ
    MACzzzzzzzzzzzz -> ../ModelB/Version2/ConfigurationP

but this is very "manual" -- not the sort of thing I want to impose on others.

Already considered. Fingerprinting the image (secure hash). And, if the image crashes, you remember that and don't try it again! :>

I don't want the "image server" to be aware of this stuff. I want to burden the devices with it all. After all, *they* are the things "benefiting" from this... :>

Timestamp is a way around explicit version numbers. Their advantage is that they are viewable "outside" the image itself.

Again, this forces changes on the server. I can build something into the makefile (e.g., "make release") that automates some of this. But, that will be happening on a different host so that adds more things that can go wrong.

And, it doesn't address "production deployments" -- build an image and distribute it to others for *them* to deploy locally.

That's another reason why I want to avoid unnecessary updates. (network traffic, slower boot times, more windows of vulnerability, etc.). The "easy" fix is something that *pushes* updates to each device. But, that also requires the most "support" (on the servers as well as "by the developer") :<

I want a push deployment in a pull implementation! :>

Reply to
D Yuniskis

D Yuniskis wibbled on Tuesday 09 March 2010 20:48

I suppose it depends on who controls the TFTP server platform and how much confidence you have in them using management software and not fiddling directly?

That's true - although the original file can still be modified in place whilst the server is reading it - unless you apply the "immutable" bit to the file, assuming linux. You could read the file into RAM when a client requests it, and check the integrity against the signature/checksum before doling it out to the client on the net. That would be pretty bombproof AFAICS.

I think there is some merit to the simplicity of the TFTP protocol even if you implement a stronger server as you may find ready made code for the client side.

If that doesn't suit, then perhaps it is time to write your own protocol. In which case, I would do something based on simple frames with a length, frame type, sequence number and checksum. That gives you the possibility to use different frame types as commands, with one type for shifting blocks of image data (hence the sequence numbers). Replies including ACKs use the same sequence number as the original request which allows marrying up things again.
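(Something along these lines, purely as a sketch of that framing -- the field widths and type values are made up:)

#include <stdint.h>

enum frame_type {
    FRAME_DATA_REQ = 1,   /* "send me block <sequence>" */
    FRAME_DATA     = 2,   /* a block of image data */
    FRAME_ACK      = 3,
    FRAME_NAK      = 4
};

struct frame_header {
    uint16_t length;      /* payload bytes following this header */
    uint8_t  type;        /* one of frame_type; doubles as the command */
    uint8_t  reserved;
    uint32_t sequence;    /* replies/ACKs reuse the requester's number */
    uint32_t checksum;    /* over header (checksum field zeroed) + payload */
};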

Do you already have such a protocol in place for other functions of the device?

OK, polling a tiny manifest file would seem to fit the bill? The image is only pulled if the client determines there is actually an upgrade it needs.

True. You could avoid the need to symlink the image files N-times by using the broadcast MAC address as a catch-all. Client checks for a file of its own MAC first, then for the catch-all.

So all your files are of the form

TypeID-MAC

eg

023a-ffffffffffff

So your "typeid" would be "audio client" or a numerical representation thereof? One image to all audio clients - don't see that affecting anything.

My scheme does give both options with a certain conciseness - not to say that that is the only scheme. Your users/customers are going to have to be aware of something or do you remote manage all this as part of the contract? Any scheme that the user sees can always be dressed up with a pretty web page or gui.

OTOH if you need to write your own image server, then why not? My last place had a motto - do the fiddly stuff on the big computer and make the embedded stuff as dumb as possible as the embedded stuff is far harder to program and debug.

But fragile and to some extent meaningless except in as much as they are a monotonically increasing sequence. Also much larger. I would strongly urge using formal version numbers that can be tied back to a branch of a version controlled source tree. I worked somewhere once that did not use version numbers in a consistent way and worse had no source control at all. To say it was a mess is an understatement. I left when the full horror became apparent - more fool me for not looking harder at the interview...

OK - fair point. Perhaps you would be better implementing your own server then.

Yeah - I think pulling a manifest file OR (based on earlier comments here) probing for the correct image and just pulling the header block from that image to determine version number would both be lightweight enough. If you have 100 devices and they probe once every 2 minutes, that's still no more than one very short transfer per second and no modern network is going to notice that. If it were me, I would back off to checking once per hour or slower and have a magic packet (or other command) that could be used to force a check on a specific client for that edge case when you want to force it *now*. IME the edge case use will be rare but when you need it, you *really* need it...

Cheers

Tim

--
Tim Watts

Managers, politicians and environmentalists: Nature's carbon buffer.
Reply to
Tim Watts

I prefer to make things as fool-proof as possible.

*If* there are two different ways of doing something, err on the side of the one that is least likely to get botched.

Is that true? I thought doing so caused the *open()* file to be unlinked -- though the file handle still valid -- and the name bound to the new "file contents" (i.e., elsewhere on the media). So, the original file would be anonymous yet still accessible until the close().

I'll have to dig through the sources to see...

Would require writing a server side tool (service). :<

I'm not worried about the client side as I can control that. I just don't want to have to dick with anything server side if I can find a way to avoid it. I.e., find some way of using existing services.

Blech blech blech! :>

Conceivably, each type of device (actually, each *instance* could, as well) has its own server -- e.g., audio server, video server, etc. Think of them as appliances -- as if half of the appliance was "someplace" and the other half (the part that the user perceives) elsewhere. I.e., two halves of the same abstract box (not "things you hang off a PC").

In my case, these will be handled by one piece of kit. That box won't be designed with "traditional services" in mind (e.g., FTP, NNTP, SMTP, etc.). Why burden it with all that cruft when it is intended to be "a piece of A/V kit"?

[But, I suspect anyone else wanting to build these will try to glob them onto an existing "general purpose" machine (e.g., some sort of generic server).]

Since there is no *need* to put these "maintenance" services on the same box that does The Real Work (audio/video server) *and* since it is relatively easy to *find* a box that supports these "traditional services" (i.e., you probably already have one accessible locally), why not let *it* serve this function?!

So, you could power that box down *unless* you needed to update firmware (or some other totally independent usage).

Yes, but think of Ignacio's suggestion -- if I grab "the" image file, instead, and examine *its* contents...

I just *really* dislike having duplicates of data. It always causes a problem, sooner or later. If I can find a way to just plop *one* thing in place and let the client deal with sorting out The Right Thing To Do...

Yes, that's how my X Terminals work. They have a variety of default file names that they try (in a fixed sequence). It adds to the IPL time (insignificantly) and a bit more network activity. Of course, you rarely power up several X Terminals at the exact same time. OTOH, the audio clients will often come up in pairs or sets of four (or more) -- at least in a small/home environment. Perhaps more if deployed commercially.

I don't want to have to have any involvement once I release the design and codebase. :> More important/fun things to do than supporting users! ;-) So, I want something that doesn't invite lots of "stupid questions": "How come my device didn't update its firmware?" (wait two minutes and it will) "Why do I have all these messages in my syslog from the audio clients: File not found?" (because you updated the "MANIFEST" and forgot to put the image file on the server), etc.

Come up with a solution that "works" with the minimal amount of user involvement (read: "minimize things user can do wrong!") and then leave it for others to maintain (read: "*break*" :> )

It's more work than I'd care to do (I want to move on to the *real* projects queued behind these). And, it would mean folks would have to port that to whatever platforms *they* wanted to support.

One thing I have learned about FOSS is you have lots of folks who just want "Free" and are clueless about how to *do* anything ("Why hasn't someone ported this service to my XYZ2000?" "How come this service doesn't work with the free NIC that came in my box of Corn Flakes?")

Much easier if I can just say "put this file on your XXXX service" (or, point them to a URL that can serve it up remotely -- from "a friend's site", etc.)

Yes. Though the advantage they have is that one could easily touch(1) a file to "force" it to be reconsidered by the client (even if it is the client's "current image")

You can embed version numbers in the image. Note that you also *don't* want to force the client to use the "highest" version number. Hence my scheme of putting *one* image for each device -- "THIS is the image you will use".

End users won't be concerned with that. They'll get an image (from somewhere) and just put it "wherever" they are supposed to put it. If they got the wrong image, there's nothing I can do about that (how can the client know that it should NOT "downgrade" to an earlier version?)

There might be a cleverer (hmm... like "banana" -- easy to spell but hard to know when to *stop*! :> ) way to cut down on polling, etc. by having a client who *does* upgrade announce this fact to others. That way, the first client to discover an upgrade tickles the other clients to go looking for it (even if *they* don't have newer images waiting for them, currently).

So, the update polls could normally be randomized (to minimize the chance of all of them being synchronous). And, if the first guy tells all the others, then the polling interval for any *one* client can be made longer and still give the same effect as if polling more frequently (grrrr... too early in the morning :< am I making this clear?)

This (above) scheme would do that -- automatically. I.e., the user need not "do" anything. Instead, wait for *someone* to see the update and have that *someone* (client) tickle the other clients. So, I don't have to put any software on any machine to inform them (instead, the software residing in the *clients* -- which is under my control -- does this for me!)

I could do that indirectly by just cycling power to *one* client. At IPL, he probes for an update. Finds one. Informs the others on the network!

Wow, that sounds cool! ;-)

Thanks!

Reply to
D Yuniskis

D Yuniskis wibbled on Wednesday 10 March 2010 17:03

Only if you unlink() or rename() the file does the data remain unchanged to any open file descriptors.

If you were to open() with O_WRONLY or O_RDWR then you stand to modify the file in place with unpredictable results unless coordination mechanisms are in place.

The usual semantic that avoids this issue and does what you say is:

open(newtmpfile...)
write(newtmpfile)
close(newtmpfile)
rename(newtmpfile, oldfile)

Any open() on oldfile prior to the rename() will continue to access the original data. Any open() after the rename() will see the complete new data.

There is a caveat that the newtmpfile must be on the same mounted filesystem as the oldfile.
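(The same pattern in C, as a sketch with the error handling abbreviated; install_image is just an illustrative name:)

#include <stdio.h>

/* Write the new image under a temporary name on the same filesystem,
 * then rename() it over the old name.  Readers that already have the
 * old file open keep seeing the old data; new open()s see the new file. */
static int install_image(const char *tmpname, const char *finalname,
                         const void *data, size_t len)
{
    FILE *f = fopen(tmpname, "wb");
    if (!f)
        return -1;
    if (fwrite(data, 1, len, f) != len) {
        fclose(f);
        return -1;
    }
    if (fclose(f) != 0)
        return -1;
    return rename(tmpname, finalname);   /* atomic on POSIX filesystems */
}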

I don't think you can entirely have your cake and eat it ;->

If the client doesn't want broken images downloaded, but cannot cache a whole image prior to checking and flashing, you *must* control the delivery process.

If you are at the mercy of customer server hardware, I suspect the only sane way is to provide a daemon of your own for them to install. Note however, this could be Apache with your custom config and service tree of Perl/PHP/whatever. Whether you write a server from scratch or wrap the protocol in HTTP and bundle an Apache blob, at least you give the user something to install which you do control.

I don't think there is any other way around it.

Personally I'd probably write a perl server based off FTP or TFTP using the many available CPAN modules to do a forking server (or a multiplexed one) and also to implement the protocol. There'd be very little actual code to write so it shouldn't take long. But of course that's just me.

I thought the same initially - but it was a simple scheme which did work very well on small devices and once a suitable server side lib had been written, it was very reliable and straightforward - and extensible. In retrospect, using something "fatter" would have been a mistake.

Fine - I understand that now.

It would be but it would aid clarity when looking at the directory contents. I'm a sysadmin - I like an element of self description when looking at a directory of random files :)

If your devices could afford to cache the image and then decide what to do with it, then I would agree that the delivery mechanism wouldn't be so important. However, if I understand what you have written, they can't?

In which case, I think you are expecting to fulfil too many of:

Simple
Efficient
Reliable

I would contend you can have 2 of those. I would lose simple and write a delivery server, it won't be hard nor time consuming. Sub it out to a perl or C hacker if necessary.

Otherwise if you don't control the delivery mechanism I am sure (by instinct) that you are going to sacrifice either efficiency or reliability.

Having written a fair few mini servers in perl, it is a small cost to make other stuff work.

As you say, you basically want a naming scheme and a server that caches the file to RAM when asked for a download and verifies the file before transmission. That's pretty easy stuff. Also as I say, you could do a PHP script to achieve that on top of Apache if you really want to minimise your work and it will deploy on Windows, *nix and MacOS.

Why would you want to do that, assuming the client is able to verify the file before it commits to flashing - or at least has a backout strategy if it goes pear shaped part way through.

Even if you do, I would still keep the file name descriptive for everyone's sanity.

Does the client have a list of all the other clients? Otherwise such an announcement is unlikely to escape the VLAN segment.

Yes I see the reasoning - subject to the VLAN thing.

Yes it would.

--
Tim Watts

Managers, politicians and environmentalists: Nature's carbon buffer.
Reply to
Tim Watts

Yes.

Of course! :> The trick to engineering is enumerating all the *right*/good things, and the *wrong*/bad things... then deciding on the best "mix".

Yes. I was hoping I could rely on particular (existing) services to do that "consequentially" for me. E.g., using connection oriented protocols in the hope that they would "lock" files they were actively transferring (vs. connectionless protocols which can opt to forget they were even involved in a transfer "moments ago")

I suspect I am just going to leave others to deal with *their* problems as best *they* see fit! :> I can leave myself a note as to how *I* would (in the future) update my devices and "not worry" about whether this works for other people. (I am amazed at how often people fail to read/understand "simple instructions". Particularly when it comes to updating firmware -- e.g., why are there so many "bricked" devices out there? Couldn't people *read* how NOT to brick their device??? :< )

Understood. You'll note my X Terminal image symlinks as an example :>

Correct. The TEXT is considerably larger than the RAM available in the devices (which is often the case!). It is going to be a significant challenge to arrange the image so that I can do the update while the device is still "operating" (something I've opted not to mention in this thread as its just extra complexity that doesn't, IMO, impact the update protocol itself)

Reliability is of paramount importance. Can never have corrupted images. Must (also) always be able to *complete* an update (subject to the service being available, of course).

Efficiency -- in terms of how it impacts other things on the network and the server(s) -- is next most important. Since updates are (should be!) infrequent, don't want them to bring the rest of the "system" to its knees when they occur.

Simple is not important. However, I am trying to push the complexity into the *device* instead of the server. :<

You also want to make that server multithreaded *and* smart -- so it doesn't try to cache 5 copies of the same image for five different clients (and not service them "sequentially"). :-/

Sometimes things just "don't seem to work". And, if the client has the final say (it does!) on whether or not to flash itself, if you can't somehow *force* it to do so, then you have to resort to some other trickery (e.g., edit the image to convince the device that it is a "new" image -- even though it isn't).

I've just found that giving devices too much of a say in how they behave (i.e., allowing them to ignore you) often results in situations where you find yourself trying to outsmart the device.

Part of ongoing network discovery. E.g., the devices all need to synchronize their clocks, converse with each other when something goes wrong (i.e., to determine if the reason "I" am not getting data from the audio server is because of something specific to *me* or common to "all of us", etc.), etc.

I.e., it's a distributed system and the devices are intended to be semi-autonomous. But, that doesn't mean they can't elect to take advantage of information that they might have that could be beneficial! Surely they won't get anything of interest from some *printer* sitting on the network :> But, if they find similar peers, they *could* (if designed intelligently to avail themselves of this capability).

Of course, if they are the lone wolf on the wire, they're SOL. :>

(so, maybe they opt to poll for updates at a different frequency than when they *know* there are cooperating peers available?)

Yes. Any protocol has to take into account the fact that there might be switches, routers, etc. "in the way" and how to bridge those -- or, how *not* to!

Progress! :>

Reply to
D Yuniskis

D Yuniskis wibbled on Wednesday 10 March 2010 19:48

If it didn't, how could you write any system with shared access to a set of files, eg an RDBMS?

Can I ask for some parameters? How big are these images likely to be in kB with any header overhead?

Just because we've considered the obvious, doesn't mean that there might not be something already existing - such as one particular instance of an FTP server that does this.

Do you have any expectations of your customer's systems - are you constrained to one OS, do you supply the update server - or do they, will they insist on using any random OS they happen to like? Would it be less of a headache to sell them a little embedded update server with an FTP interface that they dump images onto (which of course are auto managed by the box in a correct fashion)? They might like that - little black box to plug n go and you get to own it. Would one of your existing devices be powerful enough to just re-badge for the job, with perhaps additional flash bolted on? Just thinking out loud here...

Always a difficult one. Are your users technical? Most sysadmins would be expected to flash all manner of devices using anything from TFTP to XModem/RS232 and not break the box (otherwise their boss chops their nuts off). If you sold the devices to me, I would not consider running a little TFTP server a problem, provided clear instructions were given on how to load the new images.

But if your users are AV dudes, that might be too much to expect - their expertise lies in different places.

I don't suppose you have enough flash for 2 images with a pivot in the bootloader? I expect that is obvious to you, so I apologise for wibbling eggs to granny ;->

OK

I think randomised polling will deal with that adequately.

Again - step back. How many devices and how big is the image. If the probability leans towards 5 concurrent connections for a few minutes for a single 64kB image, although you might like to be clever, it's not really worth it if the server has 2GB RAM!

The only reliable way unless you know UDP broadcasts are guaranteed to get between devices would be to coordinate through the server - at least as far as each device is given the IPs of one device on each other VLAN or broadcast domain.

If the expectation is that many devices will likely be within the same broadcast domain, and it is not important that they can talk to *all* their friends, broadcasting should be good enough. I don't have enough information.

You're unlikely to have issues with Layer 2 switches IME even over trunked connections to the same VLAN on other switches, but once routing is involved, it's mostly game over for broadcasts.

Call me devil's advocate(!) I'm a newbie with tiny systems but I've run some fairly decent linux installations.

--
Tim Watts

Managers, politicians and environmentalists: Nature's carbon buffer.
Reply to
Tim Watts

Yes, I was assuming that FTPd et al. would take pains *not* to do this (i.e., to effectively lock the file *in* the filesystem instead of having to mmap it completely *or* risk it being changed midstream). As I said, I'll have to start perusing sources.

For the audio and video clients, small -- probably 200K. But, I'm trying to learn something from this exercise that I can later use on other applications. E.g., if I come up with a scheme that scales nicely, then all-the-better. If it only works for small images, then I have to design bigger applications with a "piece-meal" update policy inherent in the design (i.e., plan on only updating a portion at a time)

Understood. I need to make time to see how the ones I have access to work. That might require "empirical testing" for those whose sources I can't inspect (so, I will need to find a way to throttle an FTP client so I can start a transfer and then have time to "play" on the FTP host while the transfer is still active)

In the short term (e.g., all the home automation stuff) I really don't *care* about the "customer" :> It will all be open source so, hopefully, *someone* will be willing to pick up the mantle and carry it forward (*I* Shirley won't! :> )

This could be a win for the "bigger" projects down the road. Which would suggest it might be smarter for me just to bite the bullet and come up with a *good* protocol, now. Though that would force "someone" to port it to OS folks opted to use for this "short term" project. :-/

Yes, this is worth looking into. I could even build a small "system" (disk, etc.) to act as an "on-site" agent for this express purpose.

In a quality shop, that's not a problem. But, you can't always be sure that you'll be dealing with one.

In my case, worst possible outcome is a dead device (whether that is permanent or temporary) as it can "cost" the user. Which means it will cost *me*! :< And, even if the device runs, if it isn't running the firmware that the user *thinks* it SHOULD be running ("I updated the firmware and it's not working the way the release notes claim it should!" -- "No, you *think* you updated the firmware but it actually hasn't happened, yet"). Service calls cost money/time. :<

I think the biggest headache will be the open source stuff as folks there tend to be "cheap" -- wanting something for nothing -- in terms of the time and effort they are willing to invest. I can (will?) simply choose to ignore the whining -- "You must be THIS TALL to ride this ride" -- or come up with something that minimizes the chance of folks having problems.

(while it's my nature to *try* to be patient and helpful, as I get older, I find myself having less time available for "hand-holding" :< Or, maybe I'm just getting cranky and irritable!! ;-) )

Not sure where your experience (Linux, etc.) lies. But, look at the chatter frequently accompanying things like MythTV. How many of those questions are unnecessary? Why bother writing release notes, How-To's, etc. if people are *still* going to whine: "Mine doesn't work..." -- "OK, go *buy* a DVR!!"

No. What I plan on doing is "partitioning" the application so that the minimum amount of it *required* to fulfill its function can remain active. E.g., sort of like exec()-ing a pared down version of itself. Then, while that continues to provide the service intended (e.g., play music), update the rest of the application. Finally, once that portion has been reflashed, update the "active player" portion (while executing out of RAM or the *updated* "pared down" version of this code). It will require some juggling to ensure I can reuse data structures from one version to the next "on the fly" but I can do that if I design with that in mind.

Yes. Though the recent idea of having one client tell the others about an available update adds a complication (i.e., the recipients will have to deliberately hold off on trying to update)

Sure. For an audio/video/home automation client, no big deal. But, imagine updating 5 "Linux servers" (e.g., new file system images) concurrently. :> I.e., I have to think about how the protocol/process will scale.

I am thinking more in terms of using "dedicated" TCP connections between devices. The server can inform each device -- as it comes on-line (initial connection to server) -- of its peers. Since we're not talking about thousands of peers, it's not a huge demand to place on a device (i.e., remember a list of a couple dozen peers). The device could also update its list "as needed" from the audio/video server (but, must take some pains to keep *a* list locally lest the server crash or lose connectivity)

Yes. Here it's a problem -- but that's just because of my current network topology (speaking in terms of the A/V/HA devices). I can fix that. Or, just use the directed TCP connections.

I think you have to actively question *all* of your design decisions to prove they are robust -- or, at least, be able to identify how they *will*/can break!

It's like doing a backup/restore cycle "needlessly" -- just to be sure it *does* work! :>

Reply to
D Yuniskis

OK, my philosophy has been to push the complexity and "peculiars" (as well as "particulars" :> ) of the update protocol into the clients instead of the "update server". The thought being that I can control the clients (in the build process) much easier than having to worry about some "hosted software"/service.

But, in addition to the code *running* in the clients (at "update time"), I can also control the *format* of the "update file"!

E.g., Ignacio suggested putting the fingerprint in the image file. That need not be done as part of the compile (e.g., char fingerprint[] = ....) but as a secondary process -- yet governed *by* the build (makefile).

So, I could have "make release" do something like

( date ; cat file.bin ) > imagefile

or

fingerprint file.bin > update.fingerprint
cat file.bin update.fingerprint > update.image

So, why not *impose* a structure on the "update.image" file that lends itself to use with an existing *simple* service?

E.g., if I do something like

split -b 500 update.image foo.
rm -f update.release
for f in foo.* ; do
  cat update.fingerprint $f >> update.release
done
mv update.release /tftpboot

(here assuming update.fingerprint is exactly 12 bytes -- or some other unique/sequential value that differs with each "release")

Then, the client can use TFTP to pull down the image. By examining the first 12 bytes of each packet, he can decide if that packet is part of the *original* image... or, if the original file has been overwritten on the update server *while* the transfer was taking place.

Right?

I.e., the user only has to have the TFTP service available. No other files to maintain.
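(The client-side check then reduces to something like this -- a sketch assuming the 12-byte prefix scheme above; the function name is made up:)

#include <stdint.h>
#include <string.h>

#define PREFIX_LEN 12   /* per-release fingerprint prepended to every chunk */

/* Called on each TFTP data block.  The first block establishes the
 * expected prefix; any later block with a different prefix means the
 * file was replaced on the server mid-transfer, so abort and retry. */
static int block_belongs_to_release(const uint8_t *block, size_t blklen,
                                    uint8_t expected[PREFIX_LEN], int first)
{
    if (blklen < PREFIX_LEN)
        return 0;
    if (first) {
        memcpy(expected, block, PREFIX_LEN);
        return 1;
    }
    return memcmp(block, expected, PREFIX_LEN) == 0;
}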

This gets around the file locking issue. Still leaves the security/spoofing issue. I need to dig up Schneier... maybe I can sign each packet (?) (thought that poses a problem for the open source version :> :> )

Reply to
D Yuniskis

Since I will have to implement an update function in an embedded device I'm working on, I have, in part, similar problems to yours. So I thought about a similar solution a while ago.

Reading this thread made me think about an enhancement to this. The idea is to prepare the image as follows:

- compare the binary to the old one and remember the places which differ

- split the binary into parts with the size of one flash page

- put some meta information at the beginning of the image, at least the versions of the old and the new binary

- for each part of the binary which has changed, put in:
  - the address of the part
  - the fingerprint of the whole image
  - the binary part itself
  - a check sum for all of this

- at the end of the image a check sum of the whole binary
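(As a sketch, one "changed part" record might look something like this -- the sizes are placeholders, and the flash page size obviously depends on the part:)

#include <stdint.h>

#define FLASH_PAGE_SIZE 512   /* assumption for the sketch */

/* Illustrative layout of one record in the incremental update image. */
struct part_record {
    uint32_t flash_address;          /* where this part belongs in flash */
    uint8_t  image_fingerprint[20];  /* fingerprint of the whole new image */
    uint8_t  data[FLASH_PAGE_SIZE];  /* the new contents of one flash page */
    uint32_t checksum;               /* over everything above */
};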

The device would then check the meta information and if it doesn't have the mentioned old version it doesn't do the update, but instead needs an image with all binary parts (old version number could be 0 there). After that it downloads one binary part with all additional information, checks the check sum and the fingerprint and if that's ok, it writes the binary into the flash. If not, it asks the server to resend the last part. At last it will check the whole image, just in case.

This way it needs only to download the changed parts, which would decrease traffic, and since it checks the data before flashing it is unlikely that it ends up with a corrupted image.

In my case the update will be done with a special application (not using TCP or UDP), started by a technician (who knows what to do and what not). In your case you should check if a TFTP server can do the resending and if the parts are transferred in order (AFAIR TCP doesn't do that always, I don't know that for UDP).

It might be even possible to first update some part at the end of the flash, then switching the basic application to that part and then update the remaining parts (the switching might need some additional information in the image).

I think HMAC is the right term to look for. It should be an encrypted cryptographic hash. Just replace the check sums by that. But those functions require a lot of calculating power. And you will have all the trouble with key management and key exchange. Pretty hard stuff, don't want that on my embedded devices and fortunately don't need it.

Before deciding on something you should IMHO think about the following:

- what part of the setup can be considered to be secure (device, networks, other computers)

- can you store a key in your device in a way, that it can't be read

- you might also have to think about someone messing around with the image on the server in order to disable the update (the device might then end up asking the server the resend the manipulated part for ever)

- the server should also check a new image before enabling it for an update

- is it a problem for you if the binary itself is revealed?

The list is very likely not complete.

You might want to place a question on sci.crypt, there are the experts for that stuff. And don't forget to mention the limited amount of calculating power of your device ;-)

By, Mike

Reply to
Mike Kaufmann

Great!

OK, so just a diff(1) of new and old...

So, you're counting on the differences to be few? I.e., else the overhead would quickly exceed the cost of the image itself (though since you are doing all this prep work "off line", you could decide *then* whether to produce an incremental update or just force a total update (i.e., if the changes are too numerous to be economically represented with differences))

Well, I plan on doing *something* like that -- just not in terms of "pages".

I don't have enough RAM (or flash) to keep two copies "live" of everything. But, I can decompose the application into independent modules with well defined -- and maintained! -- interfaces.

So, I go through and examine all of the "pieces" (modules) of the update. Any that need to be replaced/updated, can be updated *while* the "old counterpart" continues operating. Once the new stuff is verified, I can flash it and, on a successful burn, swap *out* the "old counterpart" (making that part of the flash available for use) and move on to the next part.

The problem with this approach is that I can end up with a hybrid device -- no longer Version N yet not quite version N+1. :< In theory, it would continue to operate since all of the modules would still be consistent at their interface level.

This is the only way I can see to do the updates for really *big* images (e.g. GB's).

Understood.

TFTP is a simplex protocol: send me block X; ok, here is block X; send me block Y; OK here is block Y... Very easy to implement and minimal resource requirements (since this must be able to run *while* code is being "replaced"). The problem (?) with TFTP is that it is UDP based so no guarantee of delivery.

Yes. See above.

There are lots of "secure hashes".

The bigger problem (which I am hoping my "bite-sized" approach avoids) is if someone interferes with the update (by messing with the image or the update protocol) it effectively results in a DoS attack -- unless I am careful about just what order things get updated in.

Since deciding this a priori -- for once and for all -- would be difficult, I am thinking of having the first part of the update process (i.e., the first "module" in the image) fetch a "script" which, on verifying its integrity, the client then executes. That can then dictate exactly how the rest of the update is to proceed.

So, I can freely change the order that I update modules, add any "cruft" that might be needed to work around problems that crop up in future releases, etc. That script is just the "new update module" (code) replacing the *old* "update module" -- so it is consistent with the rest of the design (its sole distinction is that it is "first")

Yes.

This isn't a problem, in general. It *will* be a problem for the open source version -- i.e., the "default key" will be visible to anyone with a text editor and a copy of the sources! :> And, while it will obviously be part of the instructions that users "replace the default key with a custom key of their own", how much you wanna bet some large number of them end up with "default_key"? (how many wireless routers have SSID = "linksys", etc.)

Exactly. The DoS scenario I mentioned. If it causes the device(s) to end up being "half revision N and half revision N+1", that's not a problem. But, if it causes the devices to be "inoperable, awaiting update", then its a crappy design!

I have been trying to avoid having the server do *anything*. I.e., push all of these actions into the "compiler + makefile" so the image *is* legitimate before it gets to the server (then, all you have to worry about is someone tampering with the image -- which you will detect when you try to download it)

The binary can always be encrypted. That is one of the options I am exploring to work around the security issue (i.e., if you can't generate packets that I can decode correctly, then "you" are obviously not the trusted entity!)

Schneier is an excellent reference for this sort of stuff.

I think my approach of treating the application as a bunch of "replaceable at run-time" modules will get me around the resource requirements of the crypto code. If the update takes a long time due to the hash computations, etc. who cares as long as the "old version" is still running?

Fun problem! ;-)

Thanks!

--don

Reply to
D Yuniskis

Hi,

I've seen at least some of the ongoing discussion, but it might help to have a little more information (see below).

How large are the client images? How many devices? How much bandwidth?

What is the overall structure? Does the new image reset the device or is it a hot patch that needs to continue using existing data structures?

Is it feasible to have the client load a new image concurrently with the old and switch at some well defined point(s)?

There's only one real solution to spoofing and/or "man in the middle" attacks and that is to encrypt the image binaries. You want it to be as hard as possible for someone to decompile your application and figure out how to spoof it.

Building on the idea of embedding the signature in a header, you could also embed a decryption key for the image. The header itself should be encrypted with a public key algorithm so that the client can verify that _you_ created the image. (Obviously the whole image could be public key encrypted, but it isn't necessary ... symmetric key algorithms are more performant. Successfully completing the process of decrypting the header, extracting the image decryption key, decrypting and verifying the image signature proves that the whole thing is legitimate.)
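(Very roughly -- the field sizes, algorithms and layout below are pure placeholders, just to make the shape of the idea concrete:)

#include <stdint.h>

/* Header at the front of the image file.  The sealed block is produced
 * with the vendor's private key; the device holds the matching public
 * key, so a successful unseal proves the vendor created the image. */
struct update_header {
    uint8_t sealed[256];            /* public-key protected block */
};

/* Plaintext recovered from 'sealed': */
struct header_plain {
    uint8_t  image_key[16];         /* symmetric key for the image body */
    uint8_t  image_signature[32];   /* hash of the decrypted image */
    uint32_t image_length;
};

/* Update flow, in outline:
 *   1. fetch the header; unseal it with the stored vendor public key
 *   2. stream the image, decrypting blocks with image_key as they arrive
 *   3. hash the decrypted stream; accept only if it matches image_signature
 */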

And don't store the decrypted image on the device ... design the update process so that the local image can be decrypted into RAM and reuse it to do the startup.

This feeds back to the size of the images. TFTP is about as simple as you can get using a generic farm server, but TFTP does not have any way to request a partial transfer (e.g., a "head" command). Of course, for version checking, you can hack your client to start the transfer and then abort after fetching the predetermined image header.

How often do you need to check? How often do you anticipate a planned update (as opposed to an emergency bug fix)?

Use a "well known" public name and hide the file updates behind hard links.

Changing a hard link is an atomic process so there will be no race conditions with the update - a process trying to open the file through the link will either get some version of the file or fail if the open call sneaks between unlinking the old file and linking the new.

So updating the server then is a 3 step (scriptable) process. You upload the new version alongside the old, change the well known hard link to point to the new version, then delete the old version.
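(In POSIX calls, those three steps might look roughly like this -- the names are hypothetical, and the new version is assumed to have been uploaded already:)

#include <unistd.h>

/* 'wellknown' is the name the clients always open; 'newfile' was uploaded
 * alongside the old version beforehand. */
static int switch_image(const char *newfile, const char *wellknown,
                        const char *oldfile)
{
    /* An open() that sneaks in between these two calls simply fails and
     * the client retries -- the window mentioned above. */
    unlink(wellknown);                   /* drop the old link */
    if (link(newfile, wellknown) != 0)   /* re-point the well-known name */
        return -1;
    return unlink(oldfile);              /* finally retire the old version */
}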

Of course, if the old version happens to be in use it will persist until all handles to it are closed.

Both Unix/Linux and Windows (NTFS) support hard links, so you have a wide choice of server platforms.

Don't call me "Shirley".

George

Reply to
George Neuner

They will vary. The audio client, e.g., is about 200K. There are a couple dozen of them (here... I doubt anyone else would use that many :> ). The video client is a bit larger but only 4 or 5 of those. The other "Home Automation" clients are much smaller -- maybe 50-100KB (depending on how I implement the file system) -- but there are several "flavors".

But, this is all just a "make one to throw away" exercise. The projects that I am "practicing for" have much larger images (1 - 4G) -- though generally have more resources to call on at run time, so...

I'm currently playing on a 100Mb wired network. But, I am looking for a solution that will scale up to Gb and down to "wireless" rates.

Everything tries to run hot if at all possible. E.g., if conditions coincidentally arise that would cause some piece of code to rely on resources that aren't available during the update, then things might not quite work as they would if the update process was not active (but this would be unlikely).

Not the whole image. I am structuring the application so I can swap out pieces of it and update it incrementally. E.g., update the RTOS, then switch it in; update parts of the library, then switch them in; etc.

The sources for the application will be available. So, shared secret/public key is the only way to do the encryption. The problem there is it forces all devices to have the same keys (or, I have to build special images for each device).

The more realistic problem is to guard against something messing with the image or process and effectively leaving the device(s) in "half updated" states. So, it is even more important that updates be supported "in pieces" (otherwise the window of opportunity for screwing things up is just too big)

Yes, but it would have to be done for each "piece".

Huh? I *think* you are assuming there is enough RAM to support the decrypted image *and* my run-time data requirements? If so, that's not the case -- far more code than data.

I'm not too worried about people reverse engineering devices. Rather, I am more concerned with someone having access to the medium and screwing with (spoofing or otherwise) the update "undetected" (consider the wireless case as well as how easy it is to have a hostile client sitting "somewhere" on a network...)

I was figuring I could just have the client request whatever blocks it wants in whatever order, etc. E.g., TFTP doesn't *force* me to request blocks in sequential order...

For this "throwaway" application, I doubt I will update the code more than once or twice (beyond testing). I don't plan on bugs :> and think I have put the features I want/need into the initial release.

What I am more concerned with are the apps that follow. Those are so big that updates can be much more frequent.

The more pressing issue for those apps is turnaround time. I.e., the update would want to start as soon as it was available (on the server). So, checking "once a day" would be inappropriate. One minute might be nice as the expected delay to update start would only be half that (recall that those devices would need much longer to perform their entire update).

The idea (mentioned elsewhere in this thread) of having devices collaborate to:

- reduce the polling rate "per device" to keep the overall polling rate (seen by the update server) constant

- share discovery of valid updates with other devices

Requires someone to do something "special" on the server. When that someone (IT) screws up, The Boss doesn't see it as *his* problem. The IT guy *claims* he did everything right and there must be something wrong with the update software or the devices. So, *vendor* gets a frantic call from an angry customer complaining that the system has been "down" for "over an hour now"...

It has been suggested that a fix for this might be to deploy a vendor supplied "update server" with the installation just so this gets done "right" (regardless of whatever OS's the customer has on site)

Roger!

Reply to
D Yuniskis

D Yuniskis wibbled on Wednesday 10 March 2010 22:56

Yes - I think you have a good idea there. Why not go one further and split the firmware into little frames with a header that contains frame checksum and version/timestamp and a sequence number and total number of frames. Only a few extra bytes.

Then whatever happens the client will be assured that it is receiving frames with integrity, related to the same set of frames and that it has all of them. It doesn't matter too much if the frames align to TFTP transmission segments, as long as they are smaller.
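(E.g., only a few extra bytes per frame -- a sketch of what such a frame header might carry; the field widths are made up:)

#include <stdint.h>

/* Illustrative per-frame header embedded in the firmware file itself. */
struct fw_frame_header {
    uint32_t version;        /* or build timestamp: must match in every frame */
    uint16_t sequence;       /* this frame's position, 0 .. total_frames-1 */
    uint16_t total_frames;   /* so the client knows when it has all of them */
    uint32_t checksum;       /* over this frame's payload */
};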

I wouldn't get too hung up on security - these aren't updating across the public internet are they? If the user wants to break them by deliberate action, then that's their fault???

--
Tim Watts

Managers, politicians and environmentalists: Nature's carbon buffer.
Reply to
Tim Watts

The protocol should (?) guarantee that I get *all* the frames, right? I.e., get a packet with < 512 bytes and I know I'm at EOF.

Using frames that aren't aligned thusly leaves me open to a frame straddling a packet boundary which, presumably, could invite the file to change between packets (?)

For the A/V/HA clients *here*, it isn't a problem. Nothing can get "onto the wire" without me knowing -- every non-wired interface is routed through a bastion host. The house isn't big enough that someone could be "hiding in a closet" with a clandestine network drop :>

Anyone else who happens to build them...

But, for the products that follow, I think it important. Without going into them, consider the other types of devices I have designed over the years: medical instruments, pharmaceutical systems, gam[bl]ing devices, process control systems, etc. I.e., all things where safety/regulation are an issue *or* where service interruptions can result in huge "losses"/costs per hour of downtime. At the very least, putting the security in place is a sign of due diligence.

It's just CPU cycles. :>

Reply to
D Yuniskis
