WARNING: unit tests and TDD do NOT eliminate defects

blog entry posted by lalo (Lalo Martins) on 2012-01-15 14:08:00

Here's an excellent article about why you should be doing Test-Driven Development.

No, really, it's excellent; go there and read it, then come back here.

A little harsh, isn't it? But very true. It's excellent.

However, something in it made me a little uncomfortable while reading, and it wasn't too hard to figure out what.

There's a lot of people out there under the misconception that unit tests and TDD are a QA method, and that if they do it right their software will have no defects (or “bugs”). That's a dangerous misconception. It's bad for your software, because it won't work; and it's bad for TDD, because when it blows up in your face, there's a pretty good chance you'll go out there telling other people that TDD doesn't work. It does work; and it probably did work for you. It just didn't do what you were mistakenly expecting it to do.

Now, if you will, go back to the article and search for any instance where Uncle Bob tells you TDD will make your software defect-free. He never claims that. The closest he says is “your software will work better”, which is true; TDD reduces bugs a lot, but most TDD champions (at least the ones who know what they're talking about) consider that a nice side-effect at best. (So if he doesn't make the wrong claim, why am I uncomfortable with the article? Because I can easily see proponents of the “TDD as QA” misconception misusing Uncle Bob's article as proof that they're right.)

TDD is not a QA tool. TDD is a development process, I'll even say a programming process. Its main benefits are, in order of (IMO) importance and relevance:

  1. Clearer and cleaner design. I'm talking about technical, architectural design, not visual. By forcing yourself to write down what you expect the software to do in a formal language (code), you come out with a clearer idea of what you're going to do; and by designing your internal APIs so that they can be easily called by unit tests, you end up with more modular and maintainable structures.
  2. Cleaner code. I've seen people whose unit tests are confusing but whose production code is crystal-clear. That's obviously not ideal, but it's much better than confusing production code. By focusing most of the effort on writing the test (therefore understanding what you're doing) and then writing the simplest code that makes the test pass, you make it harder to write convoluted code. (Harder, not impossible.)
  3. More confidence. Once you've written the test and you're confident that the test expresses the problem, you'll understand exactly what the solution is, and later after the code is written and deployed, you'll trust your old code a lot more.
  4. More reuse. To be honest, this isn't even about writing the test first, but in fact there's a step that often comes before writing the test: looking at the appropriate test file, reading the other tests, and checking if what you want is already there. (Because, you know, you need to find the right file in the tree and the right place in the file to add your test.) If there's something that does almost exactly what you want, and that you had never seen before, you'll write your new test and modify the existing functionality. If there's something that does exactly what you want, you save time and don't increase the code complexity.
  5. Faster. This is almost always difficult to claim, but it really does stand to reason. Think about the other benefits above; they alone make your coding a lot faster already, enough to offset the time you spend reading and writing tests. You'll end up writing less code, because you know exactly what you need and you won't write fluff. You'll end up rewriting your code less as you iterate, because writing the test made the solution clear to you. Writing code is much like the scientific method; you come up with a working hypothesis, check if it works, adapt as necessary. It might feel like we spend most of our time (in the non-TDD world) writing code, but in reality we spend most of our time figuring out stuff, followed by checking or rewriting code. Clearer code reduces time spent on the former, and writing your verification first as code reduces the latter.
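To make the test-first loop described in the list above a bit more concrete, here is a minimal sketch in Python. The slugify function and its expected behaviour are hypothetical, invented only for illustration. The idea is that the two tests get written first, as a statement of what the code should do, and the function is then the simplest thing that makes them pass.

```python
import re
import unittest


def slugify(title):
    """Simplest implementation that satisfies the tests below (hypothetical example)."""
    # Lower-case, then collapse every run of non-alphanumerics into one hyphen.
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


class SlugifyTest(unittest.TestCase):
    # In TDD these tests exist (and fail) before slugify() is written.
    def test_spaces_become_hyphens(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

    def test_punctuation_is_dropped(self):
        self.assertEqual(slugify("TDD: not QA!"), "tdd-not-qa")
```

Saved in a file, this can be run with python -m unittest. Note what the tests do and don't tell you: they pin down the behaviour you thought of, nothing more.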

As a nice side-effect, TDD also reduces defects. It does that by (a) making the design and structure cleaner and clearer; (b) making the code cleaner, therefore easier to work with later; (c) encouraging the programmer to think about the problem being solved and write “the right code”. See a pattern? And yes, (d) preventing regressions on the unit level by keeping the unit tests around to run later. But let's be honest: how many regressions are at the unit level? If your answer wasn't “very few”, there might be something else wrong with your process.

Now here's a few reasons why TDD will not take you to the magical no-bug land:

Conclusion: TDD is great for developers and you should use it everywhere. But it's not a QA strategy.

How to use bzr-svn with SourceForge

blog entry posted by lalo (Lalo Martins) on 2009-02-17 17:58:00

Some projects I work with haven't yet abandoned Subversion. I try to tolerate it as much as I can, but sometimes (if I need local commits, or if there is heavy merging involved) it just won't do. Thankfully, I have bzr-svn to make my life less miserable.

The thing is, though; how to do the initial branching (“checkout” for those still stuck in svn terminology)? Because bzr-svn tries too hard at being atomic, and we all know SourceForge's Subversion server is made of purest fail. If the server decides to disconnect you in the middle of the operation, you lose all the (potentially hours of) work until that point.

After much frustration, I figured out the way to go with that.

First, branch from revision 0: bzr branch -r 0 https://crossfire.svn.sourceforge.net/svnroot/crossfire/server/trunk server-svn-trunk. Now you have a local branch that is already usable (well, usable as a branch, there will obviously be nothing in the working tree). Here lies the greatest trick, because while getting revision 0 doesn't actually pull any revisions, it makes bzr-svn do most of its hard mapping work.

Then get inside the branch, and pull the revisions in batches of (in my experience) no more than 500: bzr pull -r 500 && bzr pull -r 1000 && bzr pull -r 1500 etc. If something fails, you lose at most the batch that was in progress, and you can resume from the last revision that was pulled successfully.
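Typing that chain by hand gets old quickly for a big repository, so here is a small sketch that only builds the command line; it doesn't run bzr itself. The function name and the revision count of 1750 are made up for the example. The only assumption about bzr is the pull -r N form used above.

```python
def batched_pull_commands(latest_revision, batch_size=500):
    """Return the `bzr pull -r N` commands needed to reach latest_revision."""
    # Stop points: 500, 1000, ... and finally the latest revision itself.
    stops = list(range(batch_size, latest_revision, batch_size))
    stops.append(latest_revision)
    return ["bzr pull -r %d" % rev for rev in stops]


if __name__ == "__main__":
    # 1750 is a made-up revision count, used only for illustration.
    print(" && ".join(batched_pull_commands(1750)))
```

Paste the printed line into a shell inside the branch. Because the commands are joined with &&, the chain stops at the first batch that fails.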

You may be asking, if it's that painful, why do I bother? Simple: because it's only painful in the initial branching. After it's all up and running, it will be a lot less messy than dealing with svn, especially if I have non-trivial merging to perform. (Which, in this case, I do.)

Automagically-translating chat thingy

blog entry posted by lalo (Lalo Martins) on 2008-12-19 20:23:00

Usually, I have to communicate with the people in the building's management office via Google Translate. It works, but it's awfully painful to be constantly flipping the language drop-downs back and forth. (It's two drop-downs, one for source and one for target language.)

So I wrote a little javascript gadget that does the hard work for me, and also keeps a “log” of the conversation. You can peruse it at http://lalomartins.info/transchat.html

(Attention though: this is not a chat app, not in the modern sense. It's “chat” in the old-school sense, of actually talking to a person that's in front of you. It's... an interpreter widget, not a chatbox :-) enjoy and spread if you wish...)

XML considered harmful, or,

blog entry posted by lalo (Lalo Martins) on 2008-10-25 15:37:00

I have, on a number of occasions, stated that XML is harmful, and should be taken out and shot. So here I am today, to explain why I think that, and offer alternatives.

Not good for humans

The main problem is, of course, that XML was never intended for humans. It's not designed so that we can efficiently write it, read it, understand it at a glance, or maintain it. But many tools that use XML today tend to forget that, leading to hours of wasted time and lots of frustration. (XML for configuration files, anyone? Zope's ZCML and .Net's configs and all those Java frameworks?)

Then, of course, that's not XML's fault; it was never designed to succeed at that task. The fault lies with developers who misuse it. Well, yes and no. The reason people misuse it is because it's overhyped; XML is the new peanut butter (or garlic butter, according to Pete Abrams) — adding it to anything makes it taste better and sell more. (I don't even like peanut butter.)

Not good for machines

What it was designed for is communication between programs; a unified, extensible format for data transmission. By having libraries to handle it in most languages and environments, you'd make it easy for developers to deal with it, and as a consequence, to make their programs communicate.

However, after roughly ten years of working with it, it is my informed opinion that XML fails at that, too. I'm not saying it got supplanted by better technology which we invented later. It did, to be fair. But what I'm saying is that it was wrong from the beginning. And if it's not good for us and it's not good for our programs, why are we still using it? (Peanut butter, I know.)

So let's try to break out of the hype and prove that it's bad for our programs.

The perceived problem with XML can be summarised in one sentence: XML is costly to parse. But that's too superficial; let's go deeper, look at the specifics, and the flaws in philosophy/design that lead to this perception.

Parsing XML: layers

I usually tell my co-workers that there's two “layers” to parsing XML. While that is true, it's only true in the context of our data; if I were to make that statement more generic, I'd say: there's always at least two “layers” to parsing XML.

The first, the “bottom” layer if you want, is syntactic parsing. This means reading XML itself: tags, entities, attributes, comments, CDATA, PCDATA, white space, the works. The input to syntactic parsing is a string or stream of bytes; the “output” is an API — SAX, DOM, ElementTree, you name it.

On the opposite end of the stack, the “top” layer so to speak, is semantic parsing, or extracting the data you're actually interested in. The “input” here is a generic API; in the typical case of two layers, the API from syntactic parsing. The “output” is a domain-specific API or, more commonly, a collection of structured data (usually objects, nowadays).
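The two layers are easy to see in a few lines of Python, using ElementTree from the standard library. This is only a sketch: the sample document, the Book class and parse_books are all made up for the example.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Book:
    title: str
    year: int


# Hypothetical document, made up for this example.
SAMPLE = "<library><book year='2008'><title>XML Woes</title></book></library>"


def parse_books(xml_text):
    # Layer 1 (syntactic): a string goes in, a generic ElementTree API comes out.
    root = ET.fromstring(xml_text)
    # Layer 2 (semantic): the generic API goes in, domain objects come out.
    return [
        Book(title=node.findtext("title"), year=int(node.get("year")))
        for node in root.iter("book")
    ]
```

Everything in the second layer is code you write and maintain yourself, for every document type, whichever API the first layer hands you.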

An example where you may have more than two layers is when you're using something else built on top of XML; the most common case being feeds. So at the bottom layer something will parse XML, then another chunk of code will parse that as RSS or Atom, and then your semantic layer will actually extract the data. At work, we initially made our data available as RDF; so we had a second, “middle” layer (we actually used a JavaScript RDF library) which would parse the RDF, and then we did our semantic parsing by using the RDF library's API. That made our code a lot simpler, but it also made it a lot slower; so we later switched to ignoring the RDF and simply treating it as XML. (Even later, we switched to a JSON format.)

Syntactic parsing: too much structure

Syntactic parsing is what XML is supposedly “all about”; the point being, you don't see it. In our case, at work, it's done by the browser (which gives us DOM with a touch of XPath). In pretty much any other case, it will still be done by your environment (the browser, in our case; JBoss and .Net are other examples), or by a standard library.

Well, that's great, right?

It is, yeah. But it hides the fact that those libraries (even if it's “hidden” in the environment, it's still at some level done by a library) tend to be huge and ridiculously complex. The XML syntax is designed to cover an enormous universe of cases that your program will concretely never encounter, and yet, you have to pay the complexity cost for them.

Semantic parsing: not enough structure

XML shines on XHTML: a markup language for text, where you have arbitrary streams of text sprinkled with special instructions about it. Some of those “instructions” are really containers, which have more text and instructions. XML does that really well.

It shines a little less on something like SVG, where it represents arbitrary streams of heterogeneous objects. Some of those contain other objects, and XML does help there.

But the truth is that, for representing your program's data? It probably sucks. Its model is very different from the object model of most (all?) popular languages and frameworks today. In the end, we find ourselves designing our data structures as many as three times: once in the language in which we're actually writing it, once in a relational database, and once in XML. The mappings between them are often poor, since the semantics of the three models are so poorly matched.

Sadly, it would be relatively trivial to pick a lowest-common-denominator model that would fit all of today's popular languages. But XML didn't even try.

That's not the whole of my objection, though. Due to the MASSIVE FAIL in the syntactic layer, we get a semantic layer that's only marginally simpler than it would be to parse a DSL (domain-specific language); maybe less simple, if you use a good library for your DSL. There are about half a dozen XML APIs in wide use; smart people are frequently getting annoyed at the ones already there and coming up with a new, better one. And although a modern offering like, say, ElementTree can be light-years ahead of SAX or DOM, it can't help being clumsy and feeling unnatural to the language; at the bottom line, what it's doing is dressing up a rotting corpse.

Conclusion

Here's a better phrasing then, for the problem of XML as I see it:

XML has too much structure where it doesn't help, and not enough where it matters. One of the reasons I love JSON is that it's not designed to mark up text, or to transfer “streams of data”; it's designed to transfer objects (JSON means “JavaScript Object Notation”), which means it maps nicely to my code on both ends, whether that code is JavaScript, Python, C++, or even C. (It maps nicely to Java as well, but who cares.)
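As a rough illustration of “maps nicely”, here is the Python end of such an exchange, using only the standard library. The payload and the Post class are hypothetical, made up for the example.

```python
import json
from dataclasses import dataclass


@dataclass
class Post:
    title: str
    tags: list


# Hypothetical payload, made up for this example.
payload = '{"title": "XML considered harmful", "tags": ["xml", "json"]}'

data = json.loads(payload)  # already a plain dict of str and list: no tree to walk
post = Post(**data)         # keys map one-to-one onto the object's fields
```

There is no semantic layer to speak of here: the parsed result is already made of the language's own dicts, lists and strings.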

Alternatives (existing and ideal)

Right now, for real-life code, most places where you're using (or thinking of using) XML would probably be better served with JSON. A few more complex cases may justify a DSL, but I would hesitate a lot before going down that route.

Ideally, I'd like to propose a new format; an “active” derivative of JSON, inspired by the modern practise of “JSON with callback”. Essentially, I'd like to replace JSON's “flat” object notation ({"attr1": "value", "attr2": "value"}) with something which looks like a Python constructor (MyClass(attr1='value', attr2='value')). The pseudo-classes (or pseudo-functions if you're looking at it from C) would play the role that tag names play in XML elements, which would make it even more straightforward to map this data to actual objects on each end.

This would, of course, lose the benefit that “JSON with callback” can simply be executed in a browser. But then again, “JSON with callback” is not formally correct JSON anyway, so we already sacrificed some portability for that ability. “Real” JSON is usually converted to “JSON with callback” by a simple routine on the server side. A similar transformation could convert the format I'm proposing into JavaScript; the fragment above would become: MyClass({attr1: 'value', attr2: 'value'}).
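Since the proposed notation deliberately looks like a Python call, a first sketch of that transformation can borrow Python's own parser. This is an illustration under assumptions, not a spec: the function name to_javascript is made up, only flat keyword arguments with literal values are handled, and values are emitted with json.dumps, so strings come out double-quoted rather than single-quoted as in the fragment above.

```python
import ast
import json


def to_javascript(source):
    """Turn "MyClass(attr1='value')" into a JavaScript call taking one object literal."""
    call = ast.parse(source, mode="eval").body
    if not (isinstance(call, ast.Call) and isinstance(call.func, ast.Name)):
        raise ValueError("expected Name(key=value, ...)")
    if call.args or any(kw.arg is None for kw in call.keywords):
        raise ValueError("only plain keyword arguments are supported in this sketch")
    # literal_eval accepts only plain literals, so nothing in the input is executed.
    pairs = [
        "%s: %s" % (kw.arg, json.dumps(ast.literal_eval(kw.value)))
        for kw in call.keywords
    ]
    return "%s({%s})" % (call.func.id, ", ".join(pairs))
```

For example, to_javascript("MyClass(attr1='value', attr2='value')") returns MyClass({attr1: "value", attr2: "value"}). A real format would need its own grammar rather than piggybacking on Python's, and nested pseudo-classes are not handled here.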

Five Simple Hints For Your Project Website

blog entry posted by lalo (Lalo Martins) on 2006-06-02 14:08:00

A Free Software or Open Source project has to have a website, right? Well, no, actually it doesn't; it's perfectly acceptable if there isn't one. But most developers don't seem to think so, and most serious projects have a website, maybe even their own domain.

I don't consider myself any kind of web expert, but I've been a web developer for about ten years, and most of all, I'm a critical user; so I'd like to say a few words about the Free/OSS websphere I've seen.

1. Have the relevant info.

This is probably the easiest one. Here's the list of what the website should say:

  • What is the project; brief explanation required, a separate detailed explanation is a bonus (that's in addition to the brief one, not instead of). If your project is so obscure that people have to understand a few concepts before they can even have an explanation, then write the explanation assuming they do, and sprinkle it liberally with links to Wikipedia or whatever you prefer.
  • Project status. Is it vapourware, usable work-in-progress, stable, abandoned? That's a reasonably important piece of information.
  • Latest release: number (or name) and date, download link(s).
  • Contact info: how to report bugs at least. Maybe you or your community provide some measure of support, maybe not; but at least a channel for reporting bugs is an absolute must. For a more normal project, where to ask questions (mailing list, irc), and how to contribute, are also common.

And the nice-to-haves:

  • Documentation; even if your documentation is very good, mirror it online.
  • Brainstorming area: a wiki or board is great as brewing grounds for ideas which may become documentation.
  • Browseable, searchable bug/defect/TODO database.
  • Marketing area: say why your product is great, dazzle visitors with screenshots or screencasts or code samples or whatever, make them want it.
  • Egotrip area: give some praise to important contributors, including yourself if you wish. That gives people an incentive to contribute.

2. Don't make the news page the front page.

Your site is not a blog. Visitors have no interest whatsoever in knowing that a new version is out, or one of the developers is on vacation, or the lead developer's dog had puppies, if they don't know what your project is yet.

The front page is your introduction card. It's there to catch the visitor's attention. It should start by explaining what the project is, and maybe go on about why it's the best thing since sliced cheese; if your project has good eyecandy, include screenshots; otherwise, try to think of something else flashy and shiny to include.

It's ok to have a box with the latest news somewhere, maybe as a portlet; as long as it's not the dominating element in the page.

If your “site” is a single page, then replace “front page” above with “first section at the top of the page”.

I know many people think the news is the most interesting part; it certainly is for you, since you're already a user and supporter. Returning users can easily bookmark the news page rather than the front page. Even better, provide a syndication feed, so that they don't have to.

3. Design portably, design accessibly.

A website for a Free or Open Source software project that only looks right in a non-free browser is an oxymoron. Test it in Gecko (any of them; Firefox, Epiphany, Galeon, SeaMonkey, you name it) and in KHTML (either Konqueror or Safari, preferably both if you can get access to a Mac).

Before the launch, give it a little go on Opera and Explorer too just in case. If it looks horrible in the proprietary browsers, you may or may not fix it, it's your call; if you don't, then consider adding a “Get Firefox” button. However, make sure your site is navigable with the non-free browsers, even if it looks like last week's leftover sushi in a train crash.

Please consider giving your site a go on a text-only browser too. That is for two reasons:

  • Text-to-speech browsers used by blind or sight-impaired people are very similar to text-only browsers like lynx. If you can figure the site out in lynx, then a blind person will be able to learn about your project. If your project is something that is especially relevant to visually impaired people, then you should probably look up accessibility hints on the web, and try your site on emacs w3 (w3 with emacspeak seems to be the preferred browsing solution for visually impaired true geeks).
  • If your project is something I might want to run on a server, I may sometimes want to find something on your site with links or w3m over ssh. I expect that to be a rather common practise.

A few odder corner cases can be deduced from the above. If your project is something that people might want to use on their phones, make sure it looks decent on phone web browsers; and so on.

Corollary: do not under any circumstances use Flash or some other non-free non-sense like that. Preferably, avoid Java too — most Free Software and Open Source users will have a JVM, but a surprisingly high number of them don't have one in their browsers.

4. Think through your information paths.

Serious website design includes tracing information paths: what kinds of information or services visitors may be looking for, how they get there, and how easy it is to discover them.

Of course, doing the full-fledged thing may be too much on what is, for most people, a hobby. But do try to think up a few cases. Pretend you know next to nothing about the project; or even better, ask a friend, your sister, your dad, whatever. See if they click on the right menu items to get to the information they want. If they have to ask you “how do I...”, you're in trouble; especially if it's “cool, but how do I download it?”.

If you're into UI design, think of navigation in your site as a UI (because that is what it is). Feel free to do menus and submenus. If your software has a GUI (especially if it looks good) you may want to make the site look like the software, or resemble it, or reuse elements from its interface that people will remember. For a good example of the latter, see the gaim website (the icons in the main menu are taken from a Gnome icon theme).

5. Simple is fine. Now go write software.

If you whip up a few webpages — and you respect the rules above — then you're fine. Doesn't matter if you don't have a single line of css or javascript, if your site is mostly text, if there's no spiffy or shiny whatsoever. Don't let people convince you otherwise. If it conveys the information you need to convey, and you like it, then use it. Spending too much time on a flashier website will distract you from what you actually intended to do when you started (or joined) the project: write software.

Of course, if someone offers to help, then you can weigh that as you would any other offer of help. Will it end up taking more of your time than you have, or will the contributor be able to handle it mostly alone? Does the contribution add real value? If you're getting a wiki, CMS-like news posting system, syndication, screenshot gallery, or documentation archive, or anything that is actually going to be useful to your users and potential users, then it might be worth the trouble, more so than if it's about a tableless layout or smart ajax menus.
