How to use VCS (git) to save output binary files

I'm very new to VCS and git, so I have some difficulty using them.

I'd like to use git for my embedded projects. I consider it a must-have feature to be able to retrieve the binary file of an old release. That way, I can reprogram a real device with EXACTLY the same binary, even years after the release.

In order to do so, I think I need to add the output binary files (maybe even object files) to the repository. But this means the output of the git status command will be cluttered by useless info (at every change in the source code, many object files change as well).

Any suggestion?

Reply to
pozzugno

1) Use .gitignore to suppress the status messages.
2) If the files are large, consider git-annex to track them without actually putting them in the git repo.
Reply to
Paul Rubin

As far as I knew, .gitignore removes files entirely from the tracking process (it doesn't just suppress status messages).

What do you mean by "large"? I don't think they are big files (I usually work with MCUs with internal Flash memory).

However, I want to clarify that I don't need to track binary files for every commit; that wouldn't be useful. I think it's better to have a full snapshot (including binaries) only for production releases. I understand a production release should be tagged in git, so it would be sufficient to attach the additional untracked files (the binaries) to the tags.

Another possibility is to save the *full* directory of the project in another place. But in this case I'll have two copies of the source (one in the git repo) and there will be a risk that they get out of sync.

Reply to
pozzugno

If you use .gitignore to ignore binaries, you can still force-add them during your release build process so that those files only get committed on a release build. I would add this into a special target in your makefile so it's simple to make a release commit. Use:

git add --force some-file

Otherwise arrange a process that copies the released files into a parallel directory with its own git repository, add and commit them there, then tag the source code tree with the commit number of that commit. That way your source tree remains small, and you can always retrieve the exact source code used to build each released version.
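A runnable sketch of this second option; all repo names and file names are invented for illustration, and the build step is stood in for by writing the file directly:

```shell
set -e
# demo scaffolding: a source repo and a parallel binary-archive repo
base=$(mktemp -d)
git init -q "$base/project"
git init -q "$base/release-archive"
cd "$base/project"
echo 'int main(void){return 0;}' > main.c
git add main.c
git -c user.name=demo -c user.email=demo@example.com commit -q -m "v1.0 sources"
printf 'hex image\n' > binary.hex             # stands in for the build output
# copy the released file into the parallel repo and commit it there
cp binary.hex ../release-archive/
cd ../release-archive
git add binary.hex
git -c user.name=demo -c user.email=demo@example.com commit -q -m "v1.0 binary"
bincommit=$(git rev-parse --short HEAD)
# tag the source tree with the binary repo's commit id
cd ../project
git -c user.name=demo -c user.email=demo@example.com \
    tag -a v1.0 -m "binaries: release-archive@$bincommit"
```

The source repo stays small (binary.hex is never committed there), while the tag message records exactly where the matching binary lives.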

Personally I much prefer the second option.

Clifford Heath.

Reply to
Clifford Heath

I often include the final output files (such as .hex, .bin or .elf) in the repositories, as an aid to getting exactly the same programming file later on - also to be able to check that a re-compilation produces the same results, and for convenience when I am developing on one machine but the programming is being done from a different system.

But I don't see any reason to keep object files, or any of the other temporary files that get generated - listing files, dependency files, etc. That would reduce your clutter enormously.

Another thing to consider, since you are new to version control, is if git is the right choice for you and your project. git is a great VCS, but it is also quite complicated and can be hard to learn and use well. And it is not very good with binary files. An alternative would be subversion, which has fewer features and capabilities, and a rather different philosophy, but which might be simpler, clearer and more appropriate for you. We mainly use subversion, and find it a better fit for more general development, both hardware and software, within small groups at the same location. For a couple of software-only projects that are co-operations across a number of different offices, git has advantages.

Reply to
David Brown

Suppose the project tree has two sub-folders named Release and Debug with all the files generated during the build process (listing, objects, binaries, ...) that I don't need to track during normal commit.

I would write the following lines in .gitignore (note the trailing / for the folders):

  Release/
  Debug/

The process to generate a commit and tag for a production release (with binary) would be:

  make all
  git add --force Release/binary.hex
  git commit -m "Commit for the production release 1.0"
  git tag -a v1.0 -m "v1.0: first production release (with binary)"

Now I start making some modifications for the next production release. I think the next commit will still include Release/binary.hex! Starting from the production release commit, the force-added files will continue to be tracked by git. Should I manually remove them after creating the tag?

I don't know if I understood correctly.

What do you put in the "parallel directory"? Sources *and* binaries, or only the binaries? I think only the binaries.

In this case, why have a repository for the "parallel directory"? What is the goal of tracking the released files there with git? I would tend to copy the released files into a new folder for each release, without tracking them. The difference between two successive releases isn't important.

Reply to
pozz

This is my exact goal. The problem I see is during normal work: before committing the working tree after fixing a bug, I check which files I have touched and will be committed to the repo. As you can understand, that list would be full of binaries.

Now I understand your point.

...as I noted :-(

Each VCS has pros and cons. After reading some docs, articles, forums and blogs, I thought git was the right choice for me: a full-featured and modern VCS. But I'm not sure.

Of course, I should try every VCS to make a good choice, but I don't have time for that. So I'm relying on others' suggestions.

I'll check svn again.

This isn't my case.

Reply to
pozz

No. The new commit will not add the ignored files, even if you force-added those files previously.

Yes, just what is released, and any debug symbol libraries, etc. The sources are tagged in the source repository with the commit number of the binary repo.

It keeps the source directory clean and small, to speed up any operations which do not require the full history of released binaries (such as cloning a repo for automated testing). Note that the repository where the binaries are maintained cannot be cloned without all binaries for *all* releases being copied into the .git/objects directory. Your source trees, your test trees, and any experimental "spike" trees do not need that; not even your production build machines do. Only your customer support environment needs it.

Clifford Heath.

Reply to
Clifford Heath

Yes. It is a mistake to try to put /all/ files into the repository. Put in all files that are actually needed to recreate the binary - i.e., the real source files. This includes project settings files or other such details, even if they don't look like source code. I also add some generated files if they are of particular convenience - such as the final executables or binaries, pdf files generated from LaTeX source, etc. This means that other users on different machines can make use of the output files without having to rebuild everything themselves.

But avoid backup files, temporary files, history files, log files, debugging files, dependency files, and other such clutter.

You are absolutely right about the pros and cons of different systems. There is no doubt that git offers more features than svn - but if those features are not of use to you (and you don't expect them to be useful in the near future), then they become a cost in terms of learning and the possibility of mistakes.

You can think of subversion as giving you a linear history of snapshots of your project directory. (It does have branches, merging, etc., but they are more complicated to use in svn.) Each checkin is logically a full new snapshot (but handled more efficiently than a full copy, of course).

git handles multiple branches and paths - it gives you another dimension. Checkins are logically changes or patchsets, rather than snapshots. This makes branching and merging a natural and central part of git.

If you think you will work with multiple parallel branches, and move changes between those branches, then go for git. If that is likely to be a rarity, keep it simple with svn.

The other key difference is that subversion should always have a single central server. For git, you can use many servers or no servers, or a single central server. That means more flexibility - and more scope for getting confused and losing track of what you have where. Arguably, git requires more discipline to use reliably and safely.

Finally, svn is equally happy with Linux and Windows, and with guis and command lines, while git is more at home on Linux and the command line (gui clients and Windows versions are now common, but web tutorials and other sources of information will mostly assume Linux and the command line). It is also more common to find plugins and support for Subversion than for git in Windows programs.

Reply to
David Brown

If you're really serious about that, you do your production builds in a virtual machine, and archive a complete copy of that VM - including all compilers etc. That's what we used to do anyhow. But that was for legal escrow purposes, not for debugging problems in old releases of deployed software.

Clifford Heath.

Reply to
Clifford Heath

I have tried on a test repo and it seems to me that a file added with "git add -f" is tracked for newer commits.

  git init
  echo main.hex >.gitignore
  echo "This is a source file" >main.c
  echo "This is a binary file" >main.hex
  git add *
  git add .gitignore
  git commit -m "First commit without binaries"
  echo "Some changes on main.c" >>main.c
  echo "v1.0 binary file" >main.hex
  git add -f main.hex
  git commit -a -m "First production release with binaries"
  git tag -a v1.0 -m "Version 1.0"
  echo "Other changes" >>main.c
  echo "Another new binary file" >main.hex
  git commit -a -m "First commit after production"

Now, after "git checkout v1.0", the main.hex content is correctly "v1.0 binary file". After "git checkout master", the main.hex content is "Another new binary file".

This means, main.hex *is* tracked for newer commits.

Reply to
pozz

Just put your production release on a separate branch, and switch back to master, such as:

  git commit -a -m "v1.0 release candidate"   # commit final changes, check it works
  git checkout -b release-branch              # create a release branch and switch to it

  git checkout master   # return to the master branch, with no main.hex
  # possibly merge desired changes from the release branch into master,
  # eg with git cherry-pick (for those changes you made to the release
  # at the last minute), so they end up committed back on the master branch

I find committing FPGA bitfiles into my personal git repos is very handy - they compress well, and 10MB of repo space per commit is a good tradeoff against multi-hour synthesis times. In our more important projects this is managed by Jenkins so I don't need to handle it in git.

Theo

Reply to
Theo Markettos

If I'd like to push *everything* (binaries, object files, ...) in the production branch, I could do:

  make all
  git commit -a -m "v1.0 RC1"   # Check if it works
  git checkout -b v1.0          # It works, create a branch for production
  rm .gitignore
  git add *                     # Add every file on the production branch
  git commit -m "v1.0"          # Update every file on the production branch
  git tag -a v1.0 -m "v1.0"     # Tag it as v1.0

git checkout master # Switch back to master branch

What do you think?

Reply to
pozz

If your process is such that "official" binary releases are seldom, archive them separately. The "one floppy set per version" method of version control sucks for keeping track of code during development, but it's not a bad way to make sure that the customer gets what they expect.

--

Tim Wescott 
Wescott Design Services 
http://www.wescottdesign.com
Reply to
Tim Wescott

On 10.09.2015 at 11:31, David Brown wrote:

Exactly.

The general rule is: if human beings didn't create it, and don't routinely modify it by hand, it doesn't go into version control. I.e. VC tracks your _work_, not the final product.

One should only deviate from that rule for files that

  • not everybody in the team is equipped to re-build,
  • just take too darn long to build, or
  • are needed to bootstrap the first build of a fresh check-out

It's a good idea to set up and maintain (some variation of) a "make distclean" procedure. This removes all files that can be re-generated from others, except possibly those already on the "ignore list". If any changes still show after make distclean, that means something has to be checked into VCS: the modified file, or an update to the VC's ignore list, or the distclean script itself.
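That check can be sketched in a few lines of shell; here the "make distclean" run is stood in for by a toy repo with a stray leftover file (all names invented):

```shell
set -e
# demo scaffolding: a repo with one tracked source file
dir=$(mktemp -d); cd "$dir"; git init -q .
echo 'int main(void){return 0;}' > main.c
git add main.c
git -c user.name=demo -c user.email=demo@example.com commit -q -m "sources"
# stand-in for "make && make distclean" leaving a stray file behind:
touch main.lst
# the check: after distclean, the tree should show nothing at all
if [ -n "$(git status --porcelain)" ]; then
    echo "distclean left files behind - check them in, or extend the ignore list or the distclean script:"
    git status --porcelain
fi
```

An empty `git status --porcelain` output after distclean is a cheap, scriptable way to enforce the rule.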

Final output files (hex, map, build report, ...) that are deemed worth keeping for every "release" version should be checked in either in a separate system (see "configuration management"), or at least in a separate place / project within the VCS: the release archive. To this end these files, in their build-time location, should be on the ignore list of the VCS, so they never show up as modified. Because they'll practically _always_ be modified, there's really no point displaying them as such, nor should they ever be checked in at that location.

Then the release procedure can become roughly the following:

  • crank up the version number
  • make ; make check
  • copy the few(!) "sacred" release files into the release archival place
  • add those new files to the VCS (if archive is in VCS)
  • make distclean
  • check in everything that shows up
  • put your VC's version of a "label" or "checkpoint" onto it
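A runnable sketch of the steps above; the make targets are replaced by stand-ins and every file and path name is invented:

```shell
set -e
# --- demo scaffolding (a real project already has these) ---
dir=$(mktemp -d); cd "$dir"; git init -q .
git -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m init
mkdir -p Release releases/v1.1
printf '#define VERSION "1.0"\n' > version.h
# --- the release steps ---
sed -i 's/1\.0/1.1/' version.h                # crank up the version number
printf 'hex for v1.1\n' > Release/binary.hex  # stands in for: make && make check
cp Release/binary.hex releases/v1.1/          # copy the "sacred" files to the archive
git add releases/v1.1                         # add the new files (archive is in VCS here)
rm -f Release/binary.hex                      # stands in for: make distclean
git add -A                                    # check in everything that shows up
git -c user.name=demo -c user.email=demo@example.com commit -q -m "release v1.1"
git -c user.name=demo -c user.email=demo@example.com tag -a v1.1 -m "v1.1"
echo "tagged: $(git describe --tags)"
```

The build-area copy of the hex never gets committed; only the copy in the archive directory does, together with the version bump and the tag.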

Then you build it again, and verify that you got the exact same hex files (and only trivial differences in the other "sacred" ones) as the ones just copied into the Release archival place.

One radically different approach is to keep all generated, volatile files outside the source tree entirely. This is referred to by some as an "out-of-tree build". This means all those objects, listfiles, libraries, mapfiles and hexes never show up in VC status listings to begin with.

Suffice it to say that those are not criteria newbies should be worrying themselves about in a VCS.

What good is "modern" in a system whose primary purpose of existence is to still be useful to you when you need it, say, 20 years from now? It surely will no longer be modern, then!

What good is "full-featured" to a newbie, who doesn't have the background to understand the majority of those features, let alone why one possible implementation of them might be better than another?

No, you shouldn't. You should pick one that fulfills the requirements you actually have and isn't already widely ridiculed as totally bone-headed. And you should plan to _stick_ with that choice for the foreseeable future.

If ever it's time for you to re-evaluate that decision, you'll know: there will be this nagging pain in the lower back that you can no longer ignore. The bonus of having stuck to your original choice until that point is that by then you'll have the experience to know what really to look for in the replacement: remedy for that pain, to start with ;-)

Reply to
Hans-Bernhard Bröker

IMHO this is the best solution. After thinking about the goal of a VCS, it is nonsense to add the binaries to the repository. The release files (hex, listings, ...) should be maintained in a different place.

The only link between the release archive and the source tree is the name of the release: the commit in the VCS that corresponds to the official release should be named/tagged in the same way as the release archive (v1.0, for example).

Of course, considering that the process is manual, there is the risk that the build result of the v1.0 tag in the VCS isn't the same as the files in the v1.0 official release archive. The developer must be disciplined to avoid this problem. Maybe some scripts could help.
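Such a script could be as simple as a byte-for-byte comparison; in this sketch all paths are invented and the "checkout the tag and rebuild" step is stood in for by writing the file directly:

```shell
set -e
# demo scaffolding: an archived hex and a freshly rebuilt one
dir=$(mktemp -d); cd "$dir"
mkdir -p Release archive/v1.0
printf ':00000001FF\n' > archive/v1.0/binary.hex
printf ':00000001FF\n' > Release/binary.hex   # stands in for: git checkout v1.0 && make all
# the check itself: the rebuilt file must match the archived one exactly
if cmp -s Release/binary.hex archive/v1.0/binary.hex; then
    echo "archive matches rebuild"
else
    echo "MISMATCH between tag v1.0 and the release archive" >&2
fi
```

Running this after every release (and periodically afterwards) catches the tag/archive drift before anyone relies on the archive.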

Reply to
pozz

Seems sensible...

Theo

Reply to
Theo Markettos

I tested it again with the commands I normally use (git add --update .) and it does behave the way you said. I'm sorry, it seems I was wrong.

Anyhow, I still wouldn't want large binaries in my source repositories, because every clone needs a copy. If you're careful to push them only to a production branch, and not to check out all branches into a new clone, I guess you could get away with it - but "git clone" will normally fetch all branches.

Clifford Heath.

Reply to
Clifford Heath

One option for that is to use Subversion for the 'golden master' versions including binaries, and git for day-to-day source code development (including random throwaway branches and experiments). git-svn is a good bridge between the two. SVN sucks for daily use, but using it as something you interact with infrequently avoids much of the suckiness.

Theo

Reply to
Theo Markettos

On 10/09/2015 07:18, snipped-for-privacy@gmail.com wrote:

I found a nice article here:

formatting link

In this workflow, there is a well-known Release branch whose goal is to prepare the official release (fixing documentation, small bugs, version numbering, ...). After the release is ready, it is merged into both the master *and* develop branches.

Finally, we have all the official releases on the master branch, each tagged with its version number. In this process, there is a well-defined step that creates the new official release: merging the last Release branch into the master branch. This step could be extended to add the other files (binaries, listings, maps, ...) to the resulting merge commit on the master branch.

Of course, a developer will download all the binaries together with the sources when cloning the central repository. I don't know if it is possible, but a developer interested only in... developing (the sources) could clone the whole repository except the master (official releases) branch. Maybe the "git clone" command can do this.
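git clone can in fact limit what it fetches with its --single-branch option. A self-contained demo (repo layout, branch names and file names all invented; here the binaries sit on a "release" branch and developers clone only master):

```shell
set -e
# demo scaffolding: an origin repo with sources on master
# and a binary-carrying "release" branch
base=$(mktemp -d)
git init -q "$base/origin"
cd "$base/origin"
echo 'main.hex' > .gitignore
echo 'source' > main.c
git add .gitignore main.c
git -c user.name=demo -c user.email=demo@example.com commit -q -m "sources"
git branch -M master
git checkout -q -b release
printf 'binary\n' > main.hex
git add -f main.hex
git -c user.name=demo -c user.email=demo@example.com commit -q -m "release with binary"
git checkout -q master
# a developer clones only master; the release branch is never fetched
cd "$base"
git clone -q --single-branch --branch master "file://$base/origin" dev-clone
cd dev-clone
git branch -r   # shows only master; no release branch, no binaries
```

The release branch (and the binaries it carries) stays on the server until someone explicitly asks for it.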

Just a final note on the VCS to use for a newbie like me. Most of you suggested avoiding git because of its complexity. It is true; I found it difficult to understand the concepts behind git. Maybe other VCSs, like SVN, are simpler to understand.

But I understand git encourages a very good practice that is absent from other VCSs. When you decide to start working on a new feature, you create a new branch, work on it for days, and finally merge it into the main branch. With SVN, branches aren't used for daily work, only for exceptional operations.
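That daily feature-branch cycle can be sketched in a few commands (the branch name and file contents are invented for the demo):

```shell
set -e
# demo scaffolding: a repo with one commit on master
dir=$(mktemp -d); cd "$dir"; git init -q .
echo 'v1' > main.c
git add main.c
git -c user.name=demo -c user.email=demo@example.com commit -q -m "start"
git branch -M master
# the workflow: branch, work, merge back, clean up
git checkout -q -b fix-uart-timeout    # new branch for one feature or bug fix
echo 'v2' > main.c
git -c user.name=demo -c user.email=demo@example.com commit -q -a -m "fix the timeout"
git checkout -q master
git merge -q fix-uart-timeout          # fast-forwards master to the fix
git branch -d fix-uart-timeout         # the short-lived branch is discarded
```

Because branches are so cheap in git, this works even for a one-commit bug fix, which is exactly what makes it practical as a daily habit.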

Reply to
pozz
