About News Download Docs Bugs Contact

Sorcery/Book

Information

Tools

Resources

Sorcery hacker's handbook

Author: Andrew Stitt (~afrayedknot)

Note to future Sorcery leads: this is your handbook, none of this has to be set in stone if you feel strongly enough that the rules I've put in place should be ammended it is yours to change, I just ask that you understand the reasoning behind those rules.

Table of Contents


Section 1

Introduction

If you are reading this, you're either already a sorcery team member and I've told you to read this, you're thinking about being on the team, or you're just curious about what it means to be a sorcery hacker. So I'll try to explain what being a sorcery hacker means, and the responsibility that comes along with it. If you're new to SMGL and want to do development work, I would suggest starting somewhere other than sorcery.

But before I even get to that, if you don't read anything beyond this sentence, the most important part of being on the team, above all else, and if done properly would make the act of me writing this entire handbook less of a necessity, is communication; you MUST communicate with your sorcery team leader, you NEED to be able to communicate with your fellow team members.

If you can accomplish this one thing I will be far more forgiving of any mistakes you make, because at least I'll know what you were doing, or at least what you were trying to do. There's also a good chance that in the act of communicating with me we've gotten more angles on a problem worked out, and thus there's less chance of problems later, not only that, but if I help you solve something or at least say "yes, good idea!", I then take some ownership and understanding in that work and what you were thinking when you did it and therefore problems with it are now, in part, also my problem, not just yours. Which in the end makes it a lot harder for me to chew you out for doing something dumb ;-)

Besides, you don't want me to be angry, you wouldn't like me when I'm angry.

My view of Sorcery

You should read this, even if you think I'm full of shit, you may gain some perspective on why I will act the way I do with sorcery related business.

Sorcery, in my opinion, is the most difficult and complex piece of the Source Mage GNU/Linux distribution (SMGL). Not to belittle the huge amounts of work that have been put in by the other teams on the grimoire, the installer and the other bits and pieces of SMGL. Our work on sorcery is worthless without the grimoire that our gurus have been thanklessly working on day in and day out, and similarly without an installer, we would have no user-base, and I'll let you extrapolate from there.

Sorcery does, however, take the cake in terms of complexity, and the frequency of execution versus lines of code.

What does that mean? It means that your code will get run more often than the code in any spell (on average). Things like the libhash and libcodex provide a significant portion of sorcery's backbone, and need to be solid, lest the whole of sorcery comes crashing down.

That means that when you're code breaks, there is a much higher chance of it effecting most SMGL users. That means that your code needs to work, it means that when you change the behavior of a function you need to be aware of when and how it's going to be called, it means that when you mess up and submit rm -rf ${var} * instead of rm -rf ${var}* people lose their data, people lose their time, people go to some other less broken distribution.

Everything you do, reflects on the entire SMGL community as a whole.

A guru can submit a broken spell and it can sit idle in the grimoire for months before anyone breaks their box with it. We don't get that option. Chances are, everyday, somewhere in the world, someone is running your code, not only that, they are trusting their box, their data and their time with it.

People depend on the validity of your code.

Working on sorcery is not as easy as writing a spell where you have certain well-defined places to put code. You have to know how to program, you have to be able to see the big picture, your code has to be maintainable and readable.

As a sorcery team member you MUST be able to work with the rest of the team, admit your mistakes, learn from them, understand how important sorcery is and you must accept and follow the guidelines from this handbook, or at least tell me why you can't.

If you think something I've said is incorrect, tell me.

Finding things to do

First of all, reading this guidebook is a good place to start. After that understanding sorcery is another good thing to do. Finally reading the Advanced Bash-Scripting Guide (pdf, mirror) would be good too.

Now in terms of real sorcery work, there's a sorcery wishlist. On that wishlist is a list of current projects, future projects, and easy-to-fix bugs.

I'd like to see all those easy-to-fix bugs go away first.
Then I'd like to see projects worked on.

Some of those projects are writing all new code (like a new tool or something). Other projects are fixing up or re-writing some existing bunch of code.

No matter what you're doing, before you start submitting code, and preferably before you write a whole lot, come talk with me so we can work out a design, document it if need be.

Then you can go code :-)

Why do it this way? Well it's easy to slide crappy changes in to the codebase, it's also easy to tangle the codebase up. I don't like either of those things, and I may not be very nice to you when I notice. It also gives me a chance to know what you're doing, and helps ensure that the code we are adding is well thought out and is thus, less likely to break somewhere.

Communication is the most important part of being a sorcery team member.

Again, if you have trouble doing this, sorcery may not be the best team for you to be on, go work on some spells they're much more laid back about working on your own thing, besides, they have more bugs and more lines of code that will need tending to.

Source repository

We keep our code in a source repository, why we do this should be mostly obvious, but if not it helps keep revision history and it allows users to collaborate on a shared project.

Currently we use Git.

For the average non-big-project type of work, you'll need to have the devel branch of sorcery (master branch in git repository) in your cloned repository:

$ git checkout master

Now the usual commands would be git pull, git revert, git diff, git commit and git push. As a sorcery team member you will be given read/write access to the master branch, and you won't need to worry too much about the other details.

Now for example we collaborate and decide on re-writing some big piece of sorcery, or we're adding some new, potentially dangerous tool. In order to work on this tool we'll have to adjust some library APIs, and cause other general havoc. Not only that we don't want these changes to be part of devel for a little while.

The preferable method is having a project branch whereby master is branched off into a separate code stream while we break everything and put it back together again. All the while Git is helping keep things in order, which is why we use it after all. So, that's what we can do now.

We can potentially branch master into devel-fix or something, then pull new changes from master into the branch and eventually push all the changes back. See our Git guide for more information.

The other aspect of this is that at release time I branch master into test then immediately into stable. After that I can branch it into a frozen release branch. So we can keep each release frozen in time somewhere to look at. If there are bugs that come up in that release we can fix them there, in the release branch (or make a new one), then push all those changes back into master if need-be.

All the details of this you don't really need to know, but when I figure it all out myself I'll write some more about it.


Section 2

This section well tell you how to do work and make your sorcery leader happy.

How to do development work

You write code, and then you do stuff.

Coding guidelines

I'm not big on exact code formatting and things like that, so these are only guidelines, not standards. It's probably a good idea to follow the indenting depth that's already in the file.

These two guidelines are not optional, you MUST follow them:

I don't like the use of this construction:

if foo &&
   bar &&
   baz &&
then
  success
else
  failure
fi

if is not the right construct for that sort of thing. Call me old fashioned but I like putting stuff in { } braces, which will run the code in the current shell. This is not to be confused with ( ), which will run the code in a subshell, and thus incur a fork() and not preserve variables. You can use a || between two blocks of code this way if you want, or maybe it's time for some functions, use your judgement.

functions should be named using the function keyword:

function foobar() { ... }

instead of without:

foobar() { ... }

It's not just a matter of taste and consistancy. It also makes grepping for functions easier and more importantly, enables their documentation through bashdoc.

Sometimes you end up with a small if statement like this:

if [ stuff ]; then
  other stuff
fi

It can make things more readable if you do this instead:

[ stuff ] && other stuff

Use your judgement.

Read the Advanced Bash-Scripting Guide, keep a copy of it under your pillow. Pop quizes will be given daily :-)

Improving some bit of code

So in the never ending quest to understand sorcery better you come across some function that either has a bug in it, or can be improved.

Ask yourself if you can fix the surrounding code as well. Perhaps some interface or module was poorly designed. You can then turn this into more of a project, of renovating this particular module.

Finally, you ought to talk to me about what your doing, both for my knowledge and sanity, and for quality control purposes.

Fixing a little bug

If you're picking up one of the "easy-to-fix" bugs on the wishlist, then chances are I've already taken a look at what you'd need to do and/or it won't turn into a project. But the guidelines about fixing some bit of code still apply.

Finally, be sure to let me know what you're doing, assign the bug to yourself, and let me know what changes you plan on making.

Fixing a big bug

This would be a "project grade" sort of thing. This will require either writing a lot of new code or fixing/re-writing existing code, or both. Don't take up a project on your first day.

As always, assign the bug to yourself, let me know what you're doing, discuss your design with me, document it. So on and so forth. I hope you're noticing a theme here.

This should lead directly into the creating a new module section. Hopefully the module you are making/rewriting will address the problems that are occuring, that's kind of the point, isn't it?

Creating a new module

So you've decided to take on writing a new module. Good job. This section is also going to cover re-writing a module, which is almost the same thing, because the reason you're re-writing it is because its current design/interface sucks.

So let's jump right in. A module is going to have some external functions that make up its "API". So for the libdownload library that is download_files, and get_files_and_urls. One sets the current spell, then runs get_files_and_urls, which then generates a stream of output containing a filename and some urls that it might be at. This output can then be redirected into download_files, which itself has parameters to control how it will act. Alternatively one can build their own list of file + urls and send them into download_files.

So this huge API consists of basically one real function, that provides the be-it-all, end-all interface for downloading anything and everything in sorcery (yes). The other is a helper routine for looking into spells. That helper routine is also used for some other information gathering. In anycase where download_files ends up going before we start actually downloading stuff, or at least poping out the other side into another library is basically the internals. This stuff should be basically only called within the module. Otherwise it would be part of the API! Bash doesn't give us a lot of protection from random people using internal functions, so you'll have to mark the functions as such, and hope for the best. If we can all just get along and stick with our API rather than needlessly hacking through every possible entry point in a module we can avoid a lot of code tangling.

For your module you'll want to come up with something that fits what you're working on.
Typically the external stuff should be simple, but flexible. Talk to me if you need help.

Now for the guts, continuing with our libdownload example. The download_files function guarantees its caller that it will do everything possible to get the files its been given. These actions include looking on disk, which is an internal function, with a few other library calls. It then makes use of the liburl library (where libdownload comes out the other side). liburl provides a url_download function, which given some list of urls will do everything it can to optimize around those urls and download one of them. In libdownload we have a concept of leap forward and fall back urls, which are tried before and after the passed in urls.

One important thing to note is that liburl has no idea what a leap forward of fall back url is. To it, a url is a url. On the other hand libdownload has no idea what a url actually is, it's just passing them along. This is a separation of responsibilities. Each module does one well defined thing, and does it really well.

So in your quest to make your new module, come up with what your input API's are going to be and what output API's you'll need to connect to.

Code libraries

What's a code library? Well, it's not quite a module. It's more of a collection of helper routines.

These routines can modify data, for example they could filter out directories from a list of files/symlinks and directories. Or provide some useful lower level function, or perhaps do something more than one module needs. The distinction is subtle, but one way to tell is how much code is underneath them. Please be aware of this distinction.

Stacks

What is a stack? This is kind of a borrowed term from networking. In networking you have multiple protocol layers. Indulge me with this overly simplified example. Say you want to look at a web page, so your browser makes an HTTP request to some server, that HTTP request gets bundled into a TCP packet, which gets bundled into an IP packet which then goes out your (for example) Ethernet card, and is thus sent out as an ARP packet (or whatever they're called at that point). The packet then gets partially unpacked and repacked as it traverses different types of networks. It eventually gets to the server whose network card gives its operating system back the IP packet, which is then unpacked to reveal the TCP packet, which is then put in order with other TCP packets, and eventually forms to HTTP request.

The thing to remember here is that you have all these layers, none of whom care about the one above or below them.

The HTTP request doesn't really care what type of NIC you have, or the server has. TCP doesn't care if the data is HTTP or SMTP. All the layers stack together, one on top of the other.

We do some of the same stuff with modules. Going back to my libdownload example, summon makes a download request to download_files, download_files then pokes around making requests to liburl. liburl then starts having url_handlers look at urls and then run some command (like wget). All these layers for a "stack" if you will.

Scribe and sorcery (with their update commands) make use of this downloading stack if you will.
They're analogous to the HTTP, FTP and SMTP in relation to TCP.

Parts of sorcery are, or should be designed this way. It helps keep code separated. If we can keep everyone but libdownload from using liburl then that's fine by me. The less tangling and fewer connections between boxes – the better.


Section 3

So now that you've written some code are you going to just submit it for peer review? That's the open source way, isn't it? Well, sort of. It never hurts to do at least some testing. Here I'll outline some stuff that might help you test things.

Before you submit

I always run git diff before submitting. It will give you all the changes you've made and lets you know exactly what changes you're making. It's a sanity check. If you're removing functions or changing an API (within sorcery) make sure the callers are all updated (and that I know, because that shouldn't be happening too often).

Testing

Okay, so testing. That humbling un-exciting thing. I was at a LUG meeting recently with a job opening for a test automator. Basically writing code to run a test so some moron doesn't have to do it by hand. Standing next to me was some other guy offering a job doing "development" work. Nevermind my opinion of what the company was trying to do, I got a whole lot of "oh, nevermind" sorts of answers, or "that's really lame and un-sexy" sorts of looks/responses. But this other guy had a crowd of people around him. Why? Because developing is COOL and testing is LAME for un-experienced or stupid people.

Well, don't let those people fool you. I can guarantee you that their code won't work without testing. Basically NO ONE's code will work flawlessly without testing. Sure you can make little changes and wave your hands. But I doubt a single one of those people who crowded around the offering of a sexy development job truely realizes how lousy their code probably is. Yes I'm cynical. Yes I've seen lots of bad code. And yes I've seen people who don't realize their code sucks, and are so ignorant that they can't be bothered to test it. Don't be one of those people. If 75% of the code you wrote was to test the other 25%, rather than the other way around, your 25% would be worth keeping (assuming you fixed the problems you found, and that your testing code was reasonable). The other party's code would be far less worthwhile and far more likely to have bugs.

Unfortunately testing isn't something one can just do. You have to apply yourself and think about all the possible inputs and situations your code can get into.

Let's start from the bottom.

Function testing

You wrote some function. It has a comment explaining what the parameters are, what goes in stdin if anything, and what comes out stdout.

First of all your function should match the comment. But now how to actually test this one function.

You write a simple script that calls it in different ways (as opposed to unit/system testing where you use it in its environment).

So for download_files I wrote a script like this:

#!/bin/bash
. /etc/sorcery/config
SPELL=tree
run_details
get_files_and_urls|download_files

Amazing, huh? I also did similar stuff to test just get_files_and_urls. I also made up some sample inputs for download_files and echo'ed it into the function rather than using another routine.

The . /etc/sorcery/config just loads all the sorcery libraries and stuff.

I of course also tried different combinations of parameters and so forth. After doing all that different parts of the code were exercised. Problems were found and fixed. All without having to do complicated testing with the rest of sorcery.

I also did similar things with each of the internal routines, as they were written.

One final note, its "okay" to "fake" calls to other modules or functions if you're trying to test something that's higher up in the code path. So I may have written a fake url_download that just faked downloading by touching a file in the right place. You get the idea hopefully.

You certainly don't have to be so rigorous, this is a technique I use that may be helpful for big complicated things.

Unit testing

This is almost exactly the same as function testing, but you're testing an entire subsystem (such as the download stack). The distinction is that you're specificically testing the unit as a whole.

System testing

This is where you bundle all that stuff up and work with the system as a whole. So you start casting lots of stuff or summon things and make sure things work as a whole.

It goes without saying that all of this requires you to use your imagination and try to find problems in your code and make sure you've thought about everything.

Changelog

We have a changelog. It's pretty straightforward what to do here.

Communication

If you plan on changing a whole lot of stuff, or re-writing a module, talk to me your fellow team members and all that good stuff. We need to know what you're doing.

When mistakes happen

Mistakes happen. Hopefully it was found in devel, not stable. But it's your responsibility to help fix the problem. I don't really like having to go back and clean up your mistakes. As a team member you should be mature enough to own up to your mistakes, fix them and learn from them. I make mistakes. I had a bug that left FIFO's all over /tmp. So I fixed it and learned to pay more attention to stuff like that.

How to revert code

Use git revert :-)

Consequences

So if you're finding that you can't seem to "get it right" and you're always submitting things that aren't working… I ask you to consider:

  1. working on something simpler, maybe no more big project stuff, only smaller stuff I delegate to you, or
  2. working on a lower impact part of the project. Say Grimoire or Cauldron.