Watermarking

That's true, but it avoids the problem of trying to defeat the compiler.

The problem is: both could be compiled into the very same instruction. You're trying to guess what the compiler will do and defeat it at the source level. That's highly toolchain-dependent ... the next compiler may break everything.

And any way that statically leads to 42 can be optimized away by the compiler. Do you seriously think that if you write

syscall( 40 + 2 ) or syscall( (~0xD6) + 1 )

that the addition(s) or complement will be done at runtime?

Even in debug mode few compilers are that dumb. Any integer operation involving constant operands can be done at compile time and the constant result substituted back into the code. The same is true for some floating point operations (but not all).

Yup. You're definitely tired. There may be things that can be done at the source level, but I can't think of any offhand and the ones you've presented are brittle and tool dependent at best.

George

Reply to
George Neuner

no, I intended to put the 2 constants in "volatile int"s.

volatile int const42_1 = 40, const42_2 = 2;
...
syscall( const42_1 + const42_2 );
...

I know it's quite harsh, but given a sufficiently large program, only a fraction of the constants would be converted, so speed is not affected by the extra cache misses.

sure, some compilers will want to be smart. Yet they have to adhere to what the programmer wants. Even at -O3, GCC has to respect a "volatile" attribute.

Now, I'm probably reinventing an old wheel, virus/trojan writers are wayyy beyond such methods.

regards,

yg

--
http://ygdes.com / http://yasep.org
Reply to
whygee

OK. So, if someone copied one of your devices, you would have no way of knowing which particular device was the source of the copy (?)

No. Copying doesn't require an understanding of the product, its internals, etc. Folks who "specialize" in copying need not have any knowledge of the industry that the device is serving, "why" it does what it does, etc. All they need to do is reproduce what has been given to them. They can "copy" a medical device today and a consumer video game tomorrow. They aren't trying to enhance the item they are copying. Nor are they trying to disguise their actions.

E.g., someone who counterfeits paintings need not have any appreciation for "art".

No idea. In the 80's, you could get a "generic" design copied for ~$50K. That was a *lot* less than it cost to create the design in the first place! But, all you got was a *literal* copy of the design (photoplots for the boards, bills of materials, ROM images, etc.). Even full custom LSI wasn't "off limits" -- getting photomicrographs of the dies, etc.

Of course, technology has moved forward. Lithography is now finer. You can't photograph "flash contents", etc. But, *tools* have also progressed. And, there are now new labor markets that can be tapped for these activities.

Look at how "secure" the various "security technologies" have proven to be. Look at the tools available to students in labs nowadays (how long did the XBox's "security" stand up to "not for profit" attack?).

Relying on schemes to *prevent* copying is a losing battle. Instead, it seems vendors turn to litigation to discourage this copying ex post facto.

So, the trick is to come up with a way that makes it hard for the thief to claim it as an "original creation". And, find ways to put pressure on folks who might "leak" a design to a competitor/counterfeiter ahead of its formal market introduction.

I suspect Apple has every preproduction iWhatever *seriously* accounted for!

Reply to
D Yuniskis

there was a link to an article about this on slashdot, a few months ago. Apple has so much to lose and to gain that they all but antagonize their manufacturers. Last-minute changes, constant redefinitions, aborted runs for parts or assemblies... I don't remember the details, but the descriptions were (industrially speaking) tough.

yg

--
http://ygdes.com / http://yasep.org
Reply to
whygee

That is a *very* different thing than watermarking ... watermarking is simply a scheme to identify the source of an item that may be counterfeited.

Making it variable is possible but creates a problem identifying the watermark. Again, I think you're after something different.

It isn't trivial ... if you start with a well distributed customer id (say a crypto hash of the customer info) which is 160..256 bits long and only patch locations corresponding to '1' bits in the hash, a counterfeiter would need many samples of patched executables to unravel the scheme. Even having the unaltered binary would not help much. And the scheme can be made more complicated by using bits of the customer id in groups and skipping over potential patch points based on them.

I think that is a non-starter ... unless you use inline assembler you'll be working against the compiler. See my post to whygee.

George

Reply to
George Neuner

OK. The next question is, how do you "read" such a watermark to discover the source of a stolen executable? AFAICS, with your method a copy would need to be kept for comparison. With the system I am thinking about, the customer id can be recovered directly from the suspect executable.

YMMV. I see it as a horrible maintenance problem generating different code for each client - it's bad enough dealing with platform differences. Patching a function watermark into the release executable after the fact also bothers me, but less than having lots of, otherwise unnecessary, conditional code in the source.

George

Reply to
George Neuner

Hello Don,

[...]

no, I don't care. Nobody has done so till now, and there is also some other specific knowledge a copier would need to be successful.

"Watermarking" would be much more effort for us than advantage since we sell to many customers and also via dealers.

[...]

Not a valid comparison. Let's talk about the current flash devices I cited. If you tell me what it costs to get a Flash ROM image from one of these, we can continue the effort / benefit discussion.

[...]

Also not a valid comparison, the attack is not applicable to flash based devices without external memory.

As I wrote earlier, I just want to make copying so expensive that it's no longer attractive compared to "develop from scratch".

Oliver

--
Oliver Betz, Muenchen (oliverbetz.de)
Reply to
Oliver Betz
[...]

ack, IMO that's the main problem. Such a method is only viable for very small production volumes and expensive devices.

Oliver

--
Oliver Betz, Muenchen (oliverbetz.de)
Reply to
Oliver Betz

I'm not claiming that you *do* care. Rather, I am trying to illustrate the different "problem" you are trying to address. *I* am trying to identify *which* of N "copies" of a device served as the genesis for a counterfeit product. This allows assets to be brought to bear on the "leak" (who probably doesn't have the deep pockets that the actual counterfeiter has and, as such, has much MORE, relatively speaking, to lose by his actions!)

I'm not interested in discussing counterfeiting techniques. There is a wealth of information in "public" channels on this subject -- as well as "clandestine" channels.

The point I was making was that $50K was a paltry sum in the 80's. Less than a man-year of paid time. Sure, technologies have advanced since then. But, so have other technologies for "countering" those. E.g., there are firms that can produce a "schematic" of a full custom IC "automatically" using various imaging techniques (SEM, optical, X-ray, etc.) and apply "simple" pattern matching on the 3D layers imaged to identify the various "components" implemented therein. Something unheard of in the 80's (it had to be done manually).

And, there are now many more "players" in the market -- each looking for an "in". The folks you typically protected against 30 years ago are no longer the ones to worry about, now.

For a bit of perspective, try a read of Huang's _Hacking the Xbox: An Introduction to Reverse Engineering_ and remember that this recounts a "lone gunman" tackling a product made by a company with *very* deep pockets (how does your firm compare to MS's resources?)

A device can be de-encapsulated and microprobed while executing. This sort of technology is no longer out of reach -- many universities have these capabilities. I.e., "individuals" can gain access to tools like this; imagine what *businesses* have available to them.

Of course! The problem, as always, is finding that balance. One *sure* way to do it is to design a product that no one *wants* to copy! :> I.e., the more successful (in the marketplace) you are, the more likely you are to inspire folks to want to copy your design.

I am leary of encryption as, historically, it has always managed to show weaknesses over time. New techniques for breaking codes, unforeseen vulnerabilities, etc. E.g., cracking passwords on PC's, WEP keys, etc. -- in *alarmingly* short times. (I think there is even a firm that will crack your WEP key for less than $100 and 24-48 hours).

"Businesses" can be protected against, to some extent, with fear of litigation, seizing counterfeit imports at ports, etc. (depends on how big *you* are and how big the market).

The trickier defense is protecting against well-motivated individuals as they now have the computing power, access to tools, *communities*, etc. to throw at anything they feel "worth copying". Then, trying to "silence" them from sharing what they have learned...

Reply to
D Yuniskis

Watermarking is of little use if the watermark can be easily identified and removed/altered. Indeed, it would be trivial to just embed "This is copy #027 of the product" in each particular instance.

From

formatting link
(apologies if the URL folds)

"The Software Watermarking problem can be described as follows. Embed a structure W into a program P such that: W can be reliably located and extracted from P even after P has been subjected to code transformations such as translation, optimization and obfuscation; W is stealthy; W has a high data rate; embedding W into P does not adversely affect the performance of P; and W has a mathematical property that allows us to argue that its presence in P is the result of deliberate actions.

(complain to the author re: the academic-speak therein :> )

See above.

But, the counterfeiter can identify these types of transforms from as few as two copies and, changing just *one*, has now succeeded in changing the watermark! Depending on the Hamming distance between "successive (adjacent?)" watermarks, this can be enough to argue (in a court of law) that the device bearing the altered watermark was not, in fact, derived from the device having the unaltered watermark visible in the "defendant's" instance of the product.

I.e. the problem is similar to that of authentication with digital signatures. You don't want a party to be able to forge someone else's signature; NOR BE ABLE TO DISAVOW THEIR OWN!

Revisiting an example I posed previously, consider:

int foo(...) {
    int A;
    int B;
    int C;
}

vs.

int foo(...) {
    int B;
    int C;
    int A;
}

I.e., the two functions will behave the same regardless of the contents of the function body, correct? (barring code that is self-examining or self-modifying)

One could create M4 macros to wrap around each of these declarations such that you could "externally" massage the sources to effectively reorder the declarations. Right? (left as an exercise for the reader)

A compiler *could* rearrange the stack frame to essentially rewrite the first instance of foo to be *identical* to the second (I believe this is allowed under a strictly conforming C compiler).

But, what if the declarations were:

int foo(...) {
    int array[3];
}

#define A (array[0])
#define B (array[1])
#define C (array[2])

permuted (for a different watermark) to:

#define A (array[2])
#define B (array[0])
#define C (array[1])

I suppose a compiler could notice the invariant nature of the individual references and, IF THE array IS ENTIRELY "LOCAL", rearrange them (though it is hard to see why the compiler would *want* to do so... what is *gained* in such an optimization?)

The resulting binaries would run in roughly identical time. Their results would be identical. Yet, they would be different binaries. "Watermarked" uniquely.

This sort of source level translation would be easy to test (without having to develop a tool that automagically rewrote each function definition "conditionally").

Reply to
D Yuniskis

Yeah -- like PRE-PRODUCTION BETA RELEASES!! ;-)

Reply to
D Yuniskis

Hello Don,

no need to repeat, you already explained that.

[...]

[...]

not really applicable, as I wrote.

[...]

You mean "leery"? So you are biased due to lack of knowledge?

In the end, it's a cost/benefit question.

But this discussion is developing similarly to other ones you started in the recent past. It seems you already decided what to do and you don't consider other methods impartially.

Oliver

--
Oliver Betz, Munich
despammed.com might be broken, use Reply-To:
Reply to
Oliver Betz

My point was to see what *can* be done "informally" by a "casual hacker" (not a well-funded adversary).

On the contrary, I've used encryption successfully in past projects. I've also used obfuscation. The above statement merely addresses the reality of encryption techniques failing as technology improves. In the 1970's, crypt(1) was strong enough to deter deliberate password attacks. Nowadays, you can break an even more secure password in minutes using hardware you buy at your local department store.

Sure.

Um, I *do* have contractual obligations. You know, "bosses". :>

If I am hired to do X, I can't very well say, "No, I am going to do Y instead."

I clearly stated:

"Anyone with FIRST HAND experience deploying watermarking technologies? Comments re: static vs dynamic techniques?"

But, as is so often the case, instead of answering the question *asked*, folks reply with "Why do you want to do *that*? Why don't you do THIS instead?"

As I've said in the past, "assume I am competent". Do you require *your* employers to justify each decision *they* make when it is handed down to you?

It would be just as easy for me to assume *respondents* were NOT competent as they were unable to answer straightforward questions, right?

I think I have been patient in trying to explain some of the other concerns that may have motivated these "decisions" (*before* they were "handed down" to me) so folks can accept that others *may* have a different outlook on how and why things are done. As I also stated in my OP, I've already had first-hand experiences with watermarking and was acknowledging that technologies FOR WATERMARKING will have changed in the years since. So, I'm not "fishing" for information regarding something I've no experience with.

Reply to
D Yuniskis

This gets around an issue with equivalent code sequences, namely that decompiling then recompiling with optimisation will tend to eliminate any watermarks.

If the watermark affects the code's observable behaviour in any way, then it would have to be preserved by any "equivalent" code. Removing the watermark would require understanding the code to the extent that it could be modified such that the modified behaviour was merely "close enough" rather than identical.

Reply to
Nobody

Decompiling large pieces of code is expensive and requires careful analysis. E.g., I rely heavily on the use of pointers in much of my code (I think this is characteristic of folks coming from hardware/ASM backgrounds). Since those pointers can be algorithmically generated, it's hard to imagine an automatic tool sorting out the possible targets for a particular pointer (variable).

I've never tried to tackle a "hand decompiled" project bigger than 100KB. And, "machine assisted" beyond 1MB is just too trying for me. It's hard to imagine folks -- even *teams* -- trying to tackle a multimegabyte project like that. :<

When I looked at the code in my Unisite, I found it easier to just plug in an ICE and turn on the trace buffer. :-/ But, getting from *there* to something that you could *modify* was an entirely different scope.

(N.B. Data I/O products tend to be fun to study when interested in protecting designs as they are REALLY paranoid about their IP!)

Ideally, the modified code has no observable differences -- other than the actual memory image (at *run* time).

In practice, this is often hard to achieve as just moving things around in memory can make slight changes to execution timing, etc. as cache lines are purged differently, default states of jumps change, etc. This can work to your advantage -- *or* against you depending on the magnitude of the changes.

Reply to
D Yuniskis

Hello Don,

[...]

I didn't expect that the watermarking you are looking for has to resist for decades. As far as I understand, the goal is to identify preproduction samples.

[...]

I gave _exactly_ a solution for your request, and I have first-hand experience (although I haven't yet embedded any individual IDs, that's trivial).

Why should they "justify"? I even dare to tell our customers when they want something different from what they need, or when their decision might be based on wrong assumptions.

Oliver

--
Oliver Betz, Munich
despammed.com might be broken, use Reply-To:
Reply to
Oliver Betz

I've just been thinking about another trick: chaff and flares in the padding/alignment portions of the code... Generate stupid, randomized sequences of instructions that call and jump like mad in the "dead" portions of the binary... It's of no use if the attacker traces/probes the actual execution, but static analysis will go crazy AND it provides quite some room for pseudo-random watermarks (which is the original intent anyway).

With GCC, the padding/alignment is managed at the ASM level, so one just has to modify/patch the assembler. That way, you are not working against the compiler.

regards, yg

--
http://ygdes.com / http://yasep.org
Reply to
whygee

I understand, but I still say what you are looking for is not really "watermarking" ... it sounds more like you're looking for some kind of obfuscation scheme that will rearrange code in ways that are hard to compare.

I think you're being too literal ... for most ISAs there are a lot of things that can be done to an executable that will change the binary but won't change the function. If you identify a bunch of possible substitution points then you mix up what you do.

Remember that you said previously that you aren't trying to defend against disassembly. Just diff'ing executables doesn't tell you what the differences mean but only that they exist.

That actually ties in with something I thought of. I was thinking about whygee's idea of abusing data and it occurred to me that indirect function calls are perfect for abuse in that way.

Imagine something like the following:

char index2[] = { 1, 0, };
char index1[] = { 1, 0, };

int  (*F1)(int);
void (*F2)(int);

void* func_addr( int index )
{
    void *ptr[] = { func_1, func_2, };
    return ptr[index];
}

int main( void )
{
    F1 = (int  (*)(int)) func_addr( index2[index1[0]] );
    F2 = (void (*)(int)) func_addr( index2[index1[1]] );
    (*F2)( (*F1)(42) );
    return 0;
}

The index tables can be rearranged at will - the actual function pointers don't change and are hidden in func_addr(). Using 2 stages is obfuscation - more things changed in a diff of two binaries - with the added benefit that the two tables must be synchronized for the program to work.

The indexes are declared as characters so that differences between "marked" binaries appear to be some kind of ID string - how the values are really used is not obvious unless the program is disassembled. The actual indirection pointers, F1, F2, etc. are not initialized until the program runs - so they should appear identical in a diff of 2 customized binaries.

Obviously this scheme is viable only if you can add enough functions so that the indexing tables are fairly long. And if this is the only "marking", you are obviously limited to the number of combinations made possible by the length of one table (because the 2nd table is linked). And, also obviously, in an RT situation, you may not want to call functions indirectly.

This scheme has the secondary benefit that the index tables could be modified after compilation using a patch tool. I still think it is a bad idea to use conditional compilation.

In any event, I thought this might give you some ideas.

George

Reply to
George Neuner

Well, I don't care if it's "code", per se. I just want to be able to uniquely identify which "instance" of a "device" was used in producing a (counterfeit) copy. You can rearrange data, rearrange modules (in the code), change representations of data *or* code, etc. You just want to get "nearly identical" performance/operation from "(ideally) WILDLY DISSIMILAR" images.

But, I am acknowledging the fact that a thief would be able to identify and isolate "trivial" changes to devices. E.g., the "This is copy #027 of the image" approach would be easily recognized (compare two images, look for differences). Once identified, it would be easy to modify the "watermark" to:

- something totally bogus (untraceable) as in "This is copy #ABC of the image"

- something that (intentionally or unintentionally) makes it look as if the image was from some *other* image as in "This is copy #029 of the image"

I.e., in the former case, the thief chuckles and you're left wondering where the source came from. And, in the latter case, the source of the original image is not only "lost", but, some other party is implicated as the thief (intentionally or otherwise).

If this were possible, accusing *any* party of being the source of the theft could easily be defended against (in a court of law) by simply demonstrating how easily (the Defense) could alter the watermark to indicate yet *another* party -- "This is copy #030 of the image"

I liken it to the problem presented by digital signatures, in that a signature is difficult to forge. This both *protects* the original party's signature ("You can't hold me accountable for something that *I* didn't sign") and prevents the original party from *denying* their signature ("That's not *my* signature!" "Oh, really? We've already proven it's not possible to *forge* one...")

Note that you can add explicit *or* surreptitious tests to your code to detect if the image has been tampered with. But, this is extra effort that doesn't contribute to the functionality of the device. It also tends to be easy to spot when this sort of thing is going on in a device -- *especially* if you do something stupid with the information like: "Image corrupted. Halting."

(There are clever ways of doing this sort of thing that make detection harder; but, they still add effort to the design process that has no direct contribution to the device being produced)

Exactly. With two or more "images" (I am trying to be careful to avoid using "copy" as copy, in this case, implies a literal copy!), you can tell that *something* has been done to "make them unique". That could either be two different versions of the software (i.e., the two devices are not intended to perform identically) *or* that they are trying to "be unique" with the same functionality.

If the differences are nontrivial, then a thief is forced to understand what is actually going on "under the hood" in order to be able to disguise the original source of the image *or* masquerade as some *other* original image source. I.e., it makes it harder for the identity of the original source to be obfuscated. It also removes the *accused* source from claiming that "Heck, anybody could have PLANTED my identity in that device!"

For example, you can sort a list from top to bottom in ascending order; *or*, from bottom to top in *descending* order. The results are the same. And, if there is no bias to the original ordering in the (unsorted) list, the operation will take the same amount of time and consume the same amount of resources.

But, this *looks* like it would force the developer to write two different versions of a "sort()" -- which would burden his development effort just to support this "mechanism".

*However*, you could simply create different versions of standard library routines and use Version 1 or Version 2 in a particular "linked image". E.g., memset() that works from the start to the end vs. the end to the start. Furthermore, you could include both versions of the routine and use one in some places and the other in other places (i.e., exploit things that the developer technically doesn't have any control over to encode information in the binary)

Ouch! Yes, that sort of thing would work! (I've not nitpicked your syntax -- I'll assume it's what I think it to be :> ) The problem (that I see currently) is that index1[] and index2[] (likewise ptr[]) are very visible (though I acknowledge that this is undoubtedly just to make the *idea* more obvious)

I was only showing how I can use macros to experiment with various approaches. I.e., use the macros to let me effectively rewrite the sources without actually rewriting them (lazy? error prone?? :> ). So, I could quickly create N different variants of an "identical" program and then examine the stripped executables to verify that the results were actually "different" and the behavior *identical*.

This saves the effort of writing a tool to do the work (only to discover that it might have obvious failings?)

One downside is that (the example) forces a coding style on the developer. I.e., you would have to come up with a way to extract this sort of "information" from his/her "natural" way of doing things.

E.g., in C++, you could probably play games with the definitions of the vtable's without the developer ever needing to know that is happening (or, *will* happen after code "release"). I'll have to think if there are other cases (e.g., in C) where you can do this "transparently". There may be other aspects of the design that are more readily suited to this sort of exploit (though that makes it less "universal")

Reply to
D Yuniskis

Perhaps a timely coincidence? ;-)

formatting link

Reply to
D Yuniskis
