How to use bzr-svn with SourceForge

blog entry posted by lalo (Lalo Martins) on 2009-02-17 17:58:00

Some projects I work with haven't yet abandoned Subversion. I try to tolerate it as much as I can, but sometimes (if I need local commits, or if there is heavy merging involved) it just won't do. Thankfully, I have bzr-svn to make my life less miserable.

The thing is, though: how do you do the initial branching (“checkout” for those still stuck in svn terminology)? bzr-svn tries too hard at being atomic, and we all know SourceForge's Subversion server is made of purest fail. If the server decides to disconnect you in the middle of the operation, you lose all the (potentially hours of) work done up to that point.

After much frustration, I figured out the way to go with that.

First, branch from revision 0: bzr branch -r 0 https://crossfire.svn.sourceforge.net/svnroot/crossfire/server/trunk server-svn-trunk. Now you have a local branch that is already usable (well, usable as a branch, there will obviously be nothing in the working tree). Here lies the greatest trick, because while getting revision 0 doesn't actually pull any revisions, it makes bzr-svn do most of its hard mapping work.

Then get inside the branch, and pull the revisions in batches of (in my experience) no more than 500: bzr pull -r 500 && bzr pull -r 1000 && bzr pull -r 1500, and so on. If something fails, you only lose the batch that was in progress, so there isn't much to redo.
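If you'd rather not type the chain of pulls by hand, the batching can be scripted. What follows is only a rough sketch: the function names, the retry count, and the idea that you already know the last revision number are my assumptions, and the only bzr invocation used is the `bzr pull -r N` shown above.

```python
import subprocess


def batch_revisions(last_revision, step=500):
    """Return the revision numbers to pull, in order, ending at last_revision."""
    revisions = list(range(step, last_revision, step))
    revisions.append(last_revision)
    return revisions


def pull_in_batches(branch_dir, last_revision, step=500, retries=3):
    """Run `bzr pull -r N` inside branch_dir for each batch.

    A failed batch is retried a few times; earlier batches are already
    stored locally, so only the batch in progress is ever lost.
    """
    for revision in batch_revisions(last_revision, step):
        for attempt in range(1, retries + 1):
            result = subprocess.run(
                ["bzr", "pull", "-r", str(revision)], cwd=branch_dir
            )
            if result.returncode == 0:
                break
            print(f"pull up to r{revision} failed (attempt {attempt})")
        else:
            raise RuntimeError(f"giving up at revision {revision}")
```

Something like `pull_in_batches("server-svn-trunk", 1700)` would then pull up to revisions 500, 1000, 1500 and finally 1700 (the 1700 is a made-up number for illustration).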

You may be asking, if it's that painful, why do I bother? Simple: because it's only painful in the initial branching. After it's all up and running, it will be a lot less messy than dealing with svn, especially if I have non-trivial merging to perform. (Which, in this case, I do.)

Automagically-translating chat thingy

blog entry posted by lalo (Lalo Martins) on 2008-12-19 20:23:00

Usually, I have to communicate with the people in the building's management office via Google Translate. It works, but it's awfully painful to be constantly flipping the language drop-downs back and forth. (It's two drop-downs, one for source and one for target language.)

So I wrote a little JavaScript gadget that does the hard work for me, and also keeps a “log” of the conversation. You can peruse it at http://lalomartins.info/transchat.html

(Attention though: this is not a chat app, not in the modern sense. It's “chat” in the old-school sense, of actually talking to a person that's in front of you. It's... an interpreter widget, not a chatbox :-) enjoy and spread if you wish...)

XML considered harmful, or,

blog entry posted by lalo (Lalo Martins) on 2008-10-25 15:37:00

I have, on a number of occasions, stated that XML is harmful, and should be taken out and shot. So here I am today, to explain why I think that, and offer alternatives.

Not good for humans

The main problem is, of course, that XML was never intended for humans. It's not designed so that we can efficiently write it, read it, understand it at a glance, or maintain it. But many tools that use XML today tend to forget that, leading to hours of wasted time and lots of frustration. (XML for configuration files, anyone? Zope's ZCML and .Net's configs and all those Java frameworks?)

Then, of course, that's not XML's fault; it was never designed to succeed at that task. The fault lies with developers who misuse it. Well, yes and no. The reason people misuse it is that it's overhyped; XML is the new peanut butter (or garlic butter, according to Pete Abrams): adding it to anything makes it taste better and sell more. (I don't even like peanut butter.)

Not good for machines

What it was designed for is communication between programs; a unified, extensible format for data transmission. By having libraries to handle it in most languages and environments, you'd make it easy for developers to deal with it, and as a consequence, to make their programs communicate.

However, after roughly ten years of working with it, it is my informed opinion that XML fails at that, too. I'm not saying it got supplanted by better technology which we invented later. It did, to be fair. But what I'm saying is that it was wrong from the beginning. And if it's not good for us and it's not good for our programs, why are we still using it? (Peanut butter, I know.)

So let's try to break out of the hype and prove that it's bad for our programs.

The perceived problem with XML can be summarised in one sentence: XML is costly to parse. But that's too superficial; let's go deeper, look at the specifics, and the flaws in philosophy/design that lead to this perception.

Parsing XML: layers

I usually tell my co-workers that there are two “layers” to parsing XML. While that is true, it's only true in the context of our data; if I were to make that statement more generic, I'd say: there are always at least two “layers” to parsing XML.

The first, the “bottom” layer if you want, is syntactic parsing. This means reading XML itself: tags, entities, attributes, comments, CDATA, PCDATA, white space, the works. The input to syntactic parsing is a string or stream of bytes; the “output” is an API — SAX, DOM, ElementTree, you name it.

On the opposite end of the stack, the “top” layer so to speak, is semantic parsing, or extracting the data you're actually interested in. The “input” here is a generic API; in the typical case of two layers, the API from syntactic parsing. The “output” is a domain-specific API or, more commonly, a collection of structured data (usually objects, nowadays).
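To make the two layers concrete, here is a small sketch in Python using the standard library's ElementTree. The document, the `Person` class, and the function names are all invented for illustration; the point is only that one function turns bytes into a generic tree API, and a second one turns that generic tree into the objects you actually care about.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Made-up sample document, just for this sketch.
DOCUMENT = """
<people>
  <person id="1"><name>Ada</name><email>ada@example.org</email></person>
  <person id="2"><name>Grace</name><email>grace@example.org</email></person>
</people>
"""


@dataclass
class Person:
    id: int
    name: str
    email: str


def syntactic_parse(text):
    """Bottom layer: characters in, generic tree API (ElementTree) out."""
    return ET.fromstring(text.strip())


def semantic_parse(root):
    """Top layer: generic tree in, domain objects out."""
    return [
        Person(
            id=int(node.get("id")),
            name=node.findtext("name"),
            email=node.findtext("email"),
        )
        for node in root.iter("person")
    ]


people = semantic_parse(syntactic_parse(DOCUMENT))
```

Note that all the knowledge about what a “person” looks like lives in the second function; the first one would do exactly the same work for any other document.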

An example where you may have more than two layers is when you're using something else built on top of XML; the most common case being feeds. So at the bottom layer something will parse XML, then another chunk of code will parse that as RSS or Atom, and then your semantic layer will actually extract the data. At work, we initially made our data available as RDF; so we had a second, “middle” layer (we actually used a JavaScript RDF library) which would parse the RDF, and then we did our semantic parsing by using the RDF library's API. That made our code a lot simpler, but it also made it a lot slower; so we later switched to ignoring the RDF and simply treating it as XML. (Even later, we switched to a JSON format.)

Syntactic parsing: too much structure

Syntactic parsing is what XML is supposedly “all about”; the point being, you don't see it. In our case, at work, it's done by the browser (which gives us DOM with a touch of XPath). In pretty much any other case, it will still be done by your environment (the browser, in our case; JBoss and .Net are other examples), or by a standard library.

Well, that's great, right?

It is, yeah. But it hides the fact that those libraries (even if it's “hidden” in the environment, it's still at some level done by a library) tend to be huge and ridiculously complex. The XML syntax is designed to cover an enormous universe of cases that your program will concretely never encounter, and yet, you have to pay the complexity cost for them.

Semantic parsing: not enough structure

XML shines on XHTML: a markup language for text, where you have arbitrary streams of text sprinkled with special instructions about it. Some of those “instructions” are really containers, which have more text and instructions. XML does that really well.

It shines a little less on something like SVG, where it represents arbitrary streams of heterogeneous objects. Some of those contain other objects, and XML does help there.

But the truth is that, for representing your program's data? It probably sucks. Its model is very different from the object model of most (all?) popular languages and frameworks today. In the end, we find ourselves designing our data structures as many as three times: once in the language in which we're actually writing the program, once in a relational database, and once in XML. The mappings between them are often poor, since the semantics of the three models are so poorly matched.

Sadly, it would be relatively trivial to pick a lowest-common-denominator model that would fit all of today's popular languages. But XML didn't even try.

That's not the whole of my objection, though. Due to the MASSIVE FAIL in the syntactic layer, we get a semantic layer that's only marginally simpler than it would be to parse a DSL (domain-specific language); maybe less simple, if you use a good library for your DSL. There are about half a dozen XML APIs in wide use; smart people are frequently getting annoyed at the ones already there and coming up with a new, better one. And although a modern offering like, say, ElementTree can be light-years ahead of SAX or DOM, it can't help being clumsy and feeling unnatural to the language; at the bottom line, what it's doing is dressing up a rotting corpse.

Conclusion

Here's a better phrasing then, for the problem of XML as I see it:

XML has too much structure where it doesn't help, and not enough where it matters. One of the reasons I love JSON is that it's not designed to mark-up text, or to transfer “streams of data”; it's designed to transfer objects (JSON means “JavaScript Object Notation”), which means it maps nicely to my code on both ends, whether that code is JavaScript, Python, C++, or even C. (It maps nicely to Java as well, but who cares.)
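As a quick illustration of that “maps nicely” claim, here is what the receiving end looks like in Python with the standard `json` module. The payload is invented; what matters is that the parsed result is already made of the language's native dicts, lists, strings and booleans, with no separate semantic layer to write.

```python
import json

# A made-up payload, just for this sketch.
payload = (
    '{"name": "Ada", "tags": ["admin", "dev"],'
    ' "address": {"city": "London"}, "active": true}'
)

data = json.loads(payload)       # JSON object -> dict, array -> list
city = data["address"]["city"]   # nested objects are just nested dicts

# Going the other way is symmetrical.
round_trip = json.loads(json.dumps(data))
```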

Alternatives (existing and ideal)

Right now, for real-life code, most places where you're using (or thinking of using) XML would probably be better served with JSON. A few more complex cases may justify a DSL, but I would hesitate a lot before going down that route.

Ideally, I'd like to propose a new format; an “active” derivative of JSON, inspired by the modern practice of “JSON with callback”. Essentially, I'd like to replace JSON's “flat” object notation ({"attr1": "value", "attr2": "value"}) with something which looks like a Python constructor (MyClass(attr1='value', attr2='value')). The pseudo-classes (or pseudo-functions if you're looking at it from C) would play the role that tag names play in XML elements, which would make it even more straightforward to map this data to actual objects on each end.

This would, of course, lose the benefit that “JSON with callback” can simply be executed in a browser. But then again, “JSON with callback” is not formally correct JSON anyway, so we already sacrificed some portability for that ability. “Real” JSON is usually converted to “JSON with callback” by a simple routine on the server side. A similar transformation could convert the format I'm proposing into JavaScript; the fragment above would become: MyClass({attr1: 'value', attr2: 'value'}).
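A rough sketch of that server-side transformation, in Python. Since the proposed notation looks like a Python constructor call, this sketch borrows Python's own `ast` module as the parser; that is purely my shortcut for illustration, not part of the proposal, and the function names are made up. Values are emitted with `json.dumps`, so strings come out double-quoted, which is equivalent in JavaScript.

```python
import ast
import json


def to_javascript(node):
    """Translate one parsed constructor-style expression into JavaScript."""
    if isinstance(node, ast.Constant):
        # Strings, numbers, booleans and None become JSON literals.
        return json.dumps(node.value)
    if (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and not node.args
        and all(kw.arg is not None for kw in node.keywords)
    ):
        fields = ", ".join(
            f"{kw.arg}: {to_javascript(kw.value)}" for kw in node.keywords
        )
        return f"{node.func.id}({{{fields}}})"
    raise ValueError("unsupported notation")


def convert(text):
    """Parse the constructor notation and return the JavaScript form."""
    return to_javascript(ast.parse(text, mode="eval").body)
```

With this, `convert("MyClass(attr1='value', attr2='value')")` gives `MyClass({attr1: "value", attr2: "value"})`, and nested pseudo-classes are handled by the recursion.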

Five Simple Hints For Your Project Website

blog entry posted by lalo (Lalo Martins) on 2006-06-02 14:08:00

A Free Software or Open Source project has to have a website, right? Well, no, actually it doesn't; it's perfectly acceptable if there isn't one. But most developers don't seem to think so, and most serious projects have a website, maybe even their own domain.

I don't consider myself any kind of web expert, but I've been a web developer for about ten years, and most of all, I'm a critical user; so I'd like to say a few words about the Free/OSS websphere I've seen.

1. Have the relevant info.

This is probably the easiest one. Here's the list of what the website should say:

  • What is the project; brief explanation required, a separate detailed explanation is a bonus (that's in addition to the brief one, not instead of). If your project is so obscure that people have to understand a few concepts before they can even have an explanation, then write the explanation assuming they do, and sprinkle it liberally with links to Wikipedia or whatever you prefer.
  • Project status. Is it vapourware, usable work-in-progress, stable, abandoned? That's a reasonably important piece of information.
  • Latest release: number (or name) and date, download link(s).
  • Contact info: how to report bugs at least. Maybe you or your community provide some measure of support, maybe not; but at least a channel for reporting bugs is an absolute must. For a more normal project, where to ask questions (mailing list, irc), and how to contribute, are also common.

And the nice-to-haves:

  • Documentation; even if your documentation is very good, mirror it online.
  • Brainstorming area: a wiki or board is great as brewing grounds for ideas which may become documentation.
  • Browseable, searchable bug/defect/TODO database.
  • Marketing area: say why your product is great, dazzle visitors with screenshots or screencasts or code samples or whatever, make them want it.
  • Egotrip area: give some praise to important contributors, including yourself if you wish. That gives people an incentive to contribute.

2. Don't make the news page the front page.

Your site is not a blog. Visitors have no interest whatsoever in knowing that a new version is out, or one of the developers is on vacation, or the lead developer's dog had puppies, if they don't know what your project is yet.

The front page is your introduction card. It's there to catch the visitor's attention. It should start by explaining what the project is, and maybe go on about why it's the best thing since sliced cheese; if your project has good eyecandy, include screenshots; otherwise, try to think of something else flashy and shiny to include.

It's ok to have a box with the latest news somewhere, maybe as a portlet; as long as it's not the dominating element in the page.

If your “site” is a single page, then replace “front page” above with “first section at the top of the page”.

I know many people think the news is the most interesting part; it certainly is for you, since you're already a user and supporter. Returning users can easily bookmark the news page rather than the front page. Even better, provide a syndication feed, so that they don't have to.

3. Design portably, design accessibly.

A website for a piece of Free or Open Source software that only looks right in a non-free browser is an oxymoron. Test it in Gecko (any of them; Firefox, Epiphany, Galeon, SeaMonkey, you name it) and in KHTML (either Konqueror or Safari, preferably both if you can get access to a Mac).

Before the launch, give it a little go on Opera and Explorer too just in case. If it looks horrible in the proprietary browsers, you may or may not fix it, it's your call; if you don't, then consider adding a “Get Firefox” button. However, make sure your site is navigable with the non-free browsers, even if it looks like last week's leftover sushi in a train crash.

Please consider giving your site a go on a text-only browser too. That is for two reasons:

  • Text-to-speech browsers used by blind or sight-impaired people are very similar to text-only browsers like lynx. If you can figure the site out in lynx, then a blind person will be able to learn about your project. If your project is something that is potentially specially relevant to visually impaired people, then you should probably look up accessibility hints on the web, and try your site on emacs w3 (w3 with emacspeak seems to be the preferred browsing solution for visually impaired true geeks).
  • If your project is something I might want to run on a server, I may sometimes want to find something on your site with links or w3m over ssh. I expect that to be a rather common practice.

A few odder corner cases can be deduced from the above. If your project is something that people might want to use on their phones, make sure it looks decent on phone web browsers; and so on.

Corollary: do not under any circumstances use Flash or some other non-free nonsense like that. Preferably, avoid Java too; most Free Software and Open Source users will have a JVM, but a surprisingly high number of them don't have one in their browsers.

4. Think through your information paths.

Serious website design includes tracing information paths: what kinds of information or services may visitors be looking for, how do they get there, and how easy is it to discover.

Of course, doing the full-fledged thing may be too much for what is, for most people, a hobby. But do try to think up a few cases. Pretend you know next to nothing about the project; or even better, ask a friend, your sister, your dad, whatever. See if they click on the right menu items to get to the information they want. If they have to ask you “how do I...”, you're in trouble; especially if it's “cool, but how do I download it?”.

If you're into UI design, think of navigation in your site as a UI (because that is what it is). Feel free to do menus and submenus. If your software has a GUI (especially if it looks good) you may want to make the site look like the software, or resemble it, or reuse elements from its interface that people will remember. For a good example of the latter, see the gaim website (the icons in the main menu are taken from a Gnome icon theme).

5. Simple is fine. Now go write software.

If you whip up a few webpages — and you respect the rules above — then you're fine. Doesn't matter if you don't have a single line of css or javascript, if your site is mostly text, if there's no spiffy or shiny whatsoever. Don't let people convince you otherwise. If it conveys the information you need to convey, and you like it, then use it. Spending too much time on a flashier website will distract you from what you actually intended to do when you started (or joined) the project: write software.

Of course, if someone offers to help, then you can weigh that as any other offer of help. Will it end up taking more of your time than you have, or will the contributor be able to handle it mostly alone? Does the contribution add real value? If you're getting a wiki, CMS-like news posting system, syndication, screenshot gallery, or documentation archive, or anything that is actually going to be useful to your users and potential users, then it might be worth the trouble, more so than if it's about a tableless layout or smart ajax menus.

The language of my dreams, part 3 - Runtime

blog entry posted by lalo (Lalo Martins) on 2006-03-14 13:59:10

I thought this series of posts was finished, but I caught myself thinking about it repeatedly :-) so here is what my “preposterous little brain” came up with in these last few days.

Revisions to the last two posts

I studied LLVM a bit, and decided, if I was to actually implement the language (which I'm now calling Dream), LLVM is not a good match. The advantages it offers on top of using the gcc framework are almost totally irrelevant for this project.

In fact, ideally, Dream should be written in itself, and have its own machine code generators; but it should also have a C generator, which pumps C into gcc to get machine code, for two reasons: first, shipping the generated C code for the compiler is a great way to bootstrap from source; second, it would help on platforms for which a machine code generator doesn't exist yet. This approach is not my original invention - I'm copying it wholesale from PyPy. (Although PyPy does have an LLVM backend as well.)

Also, upon later reflection, I decided the C-like syntax for message passing (the last code example in the previous post) is Evil™, and detracts from the stated goal of “regular syntax”; it's out.

The binary format

I mentioned Dream having its own binary format. Why, and what does it look like?

It's important to bear in mind, this is a “pure” object-oriented environment; the binary formats we have now (ELF and EXE) are designed around the needs of C, with long-running processes, lots of functions identified by name only, and variables which are blobs of binary data. This is simply not appropriate for the kind of information we want to store.

A Dream binary file has to be essentially a persisted object. Technically, it will actually be a tree of objects; the “main” object represented by the file, plus the other objects needed to reconstruct it (its attributes and message handlers, and their attributes and message handlers, and so on). Notably, this includes what's probably the most “interesting” part of the file: the machine code for any code objects contained in this tree.

Now, each Dream object consists of five pieces of information:

  • links to bases (what Self calls the “parent” link)
  • an opaque blob of bytes, with its size information (usually 0)
  • a symbol table mapping (interface, name) pairs to other objects - the attributes
  • the message handler table, mapping (interface, message name) pairs to other objects (usually code); this may include additional metadata (such as the interface of the return value, if any)
  • an index into the machine-dependent section of the file (normally unused, except for code objects)

A link to another object may take a few different forms:

  • an index to another object stored in the same file.
  • a “magic number” pointing to a built-in “shortcutting” method implementation - for example, the run message of code objects will make a call into the machine code version, which will have been stored elsewhere in the file (so that it can be loaded into a “code” segment, on hardware that uses segmentation), while arithmetic operations on a number object will resolve to a handful of machine code instructions.
  • a reference to an object in a different file. This is the delicate part - there are issues to be carefully thought about, about search paths, about keeping these files useful when you move them between machines (which possibly have different directory layouts), and the most important - if an object N is referenced as an attribute of both A and B, if N itself is not persisted in its own file, and you persist both A and B to their own files, which one gets N? The answer to this will probably have to do with weak references.
  • a name (indirect reference) - mostly used for aliasing attributes, and more usually, for referring to registered interfaces.
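To get a feel for the shape of this data, here is a sketch of the five-part object record and the four link forms as Python dataclasses. Every class and field name here is invented for illustration (nothing about the on-disk encoding is decided), and the external-file link deliberately carries only a file name, since how such references should work is exactly the open question mentioned above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple, Union


@dataclass(frozen=True)
class LocalLink:
    index: int    # index of another object stored in the same file


@dataclass(frozen=True)
class BuiltinLink:
    magic: int    # "magic number" naming a built-in implementation


@dataclass(frozen=True)
class ExternalLink:
    file: str     # object persisted in a different file (details open)


@dataclass(frozen=True)
class NameLink:
    name: str     # indirect reference, e.g. a registered interface


Link = Union[LocalLink, BuiltinLink, ExternalLink, NameLink]
Key = Tuple[str, str]  # (interface, name) or (interface, message name)


@dataclass
class DreamObject:
    bases: List[Link] = field(default_factory=list)
    blob: bytes = b""                                  # usually empty
    attributes: Dict[Key, Link] = field(default_factory=dict)
    handlers: Dict[Key, Link] = field(default_factory=dict)
    code_index: Optional[int] = None                   # only for code objects


def resolve(link, objects):
    """Follow a link; only same-file links are handled in this sketch."""
    if isinstance(link, LocalLink):
        return objects[link.index]
    raise NotImplementedError(type(link).__name__)


# A two-object "file": a root object, and a number whose "add" message
# is bound to a (hypothetical) built-in shortcut.
objects = [
    DreamObject(),
    DreamObject(
        bases=[LocalLink(0)],
        blob=b"\x2a",
        handlers={("number", "add"): BuiltinLink(1)},
    ),
]
```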

A somewhat important attribute of this format is that the machine code is stored in a separate section from the object data; code objects will have their source in the “byte blob” slot. The machine-dependent section of the file is treated as a cache; you may move the file to another machine of a different architecture, or different settings, and the source will be transparently recompiled. You should also be able to chop off the code section entirely, forcing it to be regenerated - useful for distribution.
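The “machine code as cache” idea can be sketched in a few lines. This is only an illustration under my own assumptions: the `DreamFile` class, the single architecture tag for the whole code section, and the `compile_source` callback are all hypothetical stand-ins for whatever the real loader and code generator would be.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class DreamFile:
    sources: Dict[int, bytes]                 # source kept in the byte blobs
    cache_architecture: Optional[str] = None  # what the code section targets
    machine_code: Dict[int, bytes] = field(default_factory=dict)


def strip_code_section(dream_file):
    """Chop off the machine-dependent section, e.g. before distribution."""
    dream_file.cache_architecture = None
    dream_file.machine_code.clear()


def load_code(dream_file, index, architecture, compile_source):
    """Return machine code for a code object, recompiling when needed."""
    if dream_file.cache_architecture != architecture:
        # File came from a different machine: the whole cache is stale.
        strip_code_section(dream_file)
        dream_file.cache_architecture = architecture
    if index not in dream_file.machine_code:
        source = dream_file.sources[index]
        dream_file.machine_code[index] = compile_source(source, architecture)
    return dream_file.machine_code[index]
```

Because the source always travels with the object, throwing the cache away is never destructive; it only costs a recompile on first use.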

Binary and source

With this layout, a Dream file is not only a viable binary format, but also quite decent as the source form for development. There would be little sense in having the source for the code objects scattered in little text files, only to be collected into a Dream file by a tool. Rather, it's best to design tools to edit Dream files, tweaking the object links in it and editing the source for code objects; there could be a GUI interface, an emacs library, a set of command line utilities, you name it.

One consideration raised by this is that the format would then need to be amenable to revision control. There are two obvious ways to do that; either make the format text-based, or at least line-oriented (possibly XML - yuck), or, by having a C library that manipulates the file format, implement plugins for extensible revision control systems (like bzr) to handle Dream files smartly. (Which in this context would mean a lot of things - not only track object links and code source, but also, ignore the machine-dependent section entirely.)

Dreamshells

This runtime model of “pure objects”, of a bunch of persistent objects to which you send messages, does not map well into the OSes of today. At some point, there must be something with Unixish semantics - an executable which is loaded into a process, and which can be accessed from the shell or desktop environment.

This is the role of the Dreamshell: essentially, an object which has the ability to be saved as a “native” executable.

The Dreamshell interface would look more or less like this:

(string) config_name
    return a filename to use when looking for a configuration file. On Unix, for example, if this message returns "bleh", we'd look for /etc/dreamconf/bleh, and ~/.dreamconf/bleh (if both exist, load both, in this order).
(sequence) config_data
    return a sequence of config_data objects, each having a short name, a long name, help text, and some data on what to do with it if found (exact semantics to be defined; look at good command line libraries, such as Python's optparse, for inspiration).
(string) copyright
    copyright info to display if requested on the command line.
(string) version
    version info to display if requested on the command line.
(string) help
    information to be displayed in --help output, after the list of arguments.
(integer) run
    actually do whatever this Dreamshell is supposed to do (not called if the command line had --help, --version, or an error).

    Gets arguments:

    (string_sequence) raw_command_line
        the unprocessed command line
    (string_sequence) args
        what was left of the command line after processing configuration
    (string_mapping) environment
        the environment
    (sequence) files
        the open file descriptors (wrapped in file objects)
    (configuration) options
        the data found on the config files and command line
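Here is roughly how a launcher might drive that interface, sketched in Python. The `EchoShell` class and the `launch` function are invented for illustration, and real option handling (config_data, parsing the config files) is left out entirely; the sketch only shows the config file lookup order on Unix and the rule that run is skipped for --help and --version.

```python
import os


def config_paths(config_name):
    """Unix lookup order: system-wide file first, then the per-user one."""
    return [
        f"/etc/dreamconf/{config_name}",
        os.path.expanduser(f"~/.dreamconf/{config_name}"),
    ]


class EchoShell:
    """A toy Dreamshell-like object (hypothetical, for illustration)."""

    config_name = "bleh"
    copyright = "(no copyright info)"
    version = "0.1"
    help = "Prints its arguments back."

    def run(self, raw_command_line, args, environment, files, options):
        print(" ".join(args))
        return 0


def launch(shell, argv):
    """Handle --help/--version, otherwise hand everything over to run."""
    if "--help" in argv:
        print(shell.help)
        return 0
    if "--version" in argv:
        print(shell.version)
        return 0
    args = [arg for arg in argv if not arg.startswith("--")]
    return shell.run(
        raw_command_line=list(argv),
        args=args,
        environment=dict(os.environ),
        files=[],     # would be the wrapped open file descriptors
        options={},   # would be filled from config files and command line
    )
```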

An introspection call somewhere can dump a Dreamshell object into a native executable.

The system distribution would ship with a few useful Dreamshells; one to send a message by name to the object represented by a Dream file, one to listen on some network port and do distributed object brokering (probably using VIP), one to display a bunch of View (UI) objects on an X server (acting as a bridge between X and Dream until killed).

A hypothetical Windows distribution would rely on a different kind of object, which would bridge Dream with DCOM, rather than generate executable files.

Daydream

With all this said, it feels to me that it would be just too painful to write C code for code objects that need low-level logic or optimisation. Not only painful for the programmer, but also for the maintainers of the language itself, as the runtime would need to be able to compile C code.

Rather, I would again go the PyPy route, and implement a limited dialect of Dream, which would translate almost directly to machine code. I call it “Daydream”.

Here, a variable is not an object reference with an interface; it is a primitive type (integer, float, boolean, character, or a vector of one of these, or “object”), plus a (byte) size and a (vector) length, plus a reference to an object (usually “self”), plus an offset into the byte blob of that object, or just a stack address (for locals). If you need anything that can't be expressed with these primitives, you shouldn't be using Daydream ;-)

Note

The size property is the byte size of the individual elements, not of the total; the length is 1 if the variable is not a vector. So a single integer could be (integer, 32, 1), while a vector of 5 longs could be (integer, 64, 5). The sizes are explicit, so that the code is sanely portable between hardware platforms.
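The (type, size, length) triple is easy to model. A small sketch in Python, where the class name and the validation are my own additions; the size is kept in whatever unit the descriptor uses, and the total is simply size times length.

```python
from dataclasses import dataclass

# The primitive types listed above; "object" is the odd one out.
PRIMITIVES = {"integer", "float", "boolean", "character", "object"}


@dataclass(frozen=True)
class Variable:
    primitive: str
    size: int        # size of ONE element, not of the whole variable
    length: int = 1  # 1 when the variable is not a vector

    def __post_init__(self):
        if self.primitive not in PRIMITIVES:
            raise ValueError(f"unknown primitive type: {self.primitive}")
        if self.size < 0 or self.length < 1:
            raise ValueError("size must be >= 0 and length >= 1")

    @property
    def total_size(self):
        """Storage needed for the whole variable, in the unit of `size`."""
        return self.size * self.length


single = Variable("integer", 32, 1)   # the single integer from the note
longs = Variable("integer", 64, 5)    # the vector of 5 longs
```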

An “object” variable only has a few operations. You can perform some basic introspection (check types, check for attributes or message bindings), you can read or set attributes, and you can send messages. Sending a message looks more or less like:

send (interface) my_object "message_name" argument argument.

(the usual rules about syntactic noise apply.)

Notably, the Daydream runtime would have a way to open a shared library, either by absolute filename or by using the system search path, and wrap it as an “object”; you then could introspect it to check for the presence of symbols (attributes), get attributes and cast them to a primitive type, and send messages (call functions), casting the result. The object would be read-only, so trying to modify it would fail.

Maybe: smarter low-level data

Instead of an opaque byte blob, the “low-level” blob of an object could go all the way down to C, and be in the form of a symbol table, with name, primitive type, and value. I very much prefer not to do this, because it would encourage more complex logic to be done in Daydream, which is not the intent; but it would tie in better with the open-ended type system (as in, you could use two code objects that were written for different structures). The snag is, I'm not sure this is a good thing :-) I'd rather keep those “optimised” objects very simple, which means the Daydream code has to know exactly what low-level data it's messing with.

Syntax notes

A few ideas about syntax.

The syntax design revolves around three goals, in order:

  • Readable. Code should be more or less self-documenting; looking at a piece of code, you should easily infer intent. (Although no effort would be made to make it impossible to write bad code. All efforts in this direction that I've ever seen only end up creating cumbersome constraints.)
  • Learnable. This language is designed partially to attract new programmers, so learning to program in it should be easy, not only for people who are already programmers, but also (and almost more importantly) for people picking it up as their first language.
  • Writable. Once you have wrapped your brain around a few important paradigms - passing messages to objects, and Dream's concept of a variable - the syntax should be quite similar to how you think about the problem in your head; it should feel natural to write it down, it should feel like you're writing a message expressing your thoughts, and not like using some arcane structured secret code. This not only helps programmers be more productive (write faster), it also helps on the first goal (make the code readable and intent-expressing).

One of the corollaries that sprout naturally from these goals, is that syntax should try to model colloquial communication between humans, and not mathematical language. This is the rationale, for example, behind function calls not having the familiar obj.foo(arg1, arg2) syntax, which is borrowed from algebra, but feels consistently alien and weird to non-programmers who are looking at source code or learning to program. The way you think about this is something more similar to foo obj with arg1 arg2, which is valid Dream syntax if you only append a colon to with.

Another decision that follows from this is the use of the period (.) at the end of statements, rather than the semicolon inherited from C and Pascal. In colloquial communication, a message with a lot of semicolons is badly written; and as illustrated by this paragraph, statements separated by a semicolon are expected to be closely related. We have been using the period to end statements since we learned to write; let's keep it.

Returns

A code object returns the value of its last statement. This is mostly for the sake of very short code fragments; for longer code, an explicit return is recommended.

To return explicitly, begin a statement with an equal sign (therefore “assigning” the following value to the code output).

On the other hand, a message handler may be declared “asynchronous”, in which case it does not return. The compiler should probably issue an error or at least a warning, if an explicit return is used in code associated with such a message.

(Asynchronous messages may be tagged as such using the slot for the return value interface.)

Typing

Variables and arguments are typed (by interface). This is done by prefixing the variable definition with the interface name in parentheses. The parentheses are for the sake of argument lists - since the comma is optional, you need to be able to tell easily what is an interface and what is the name being defined. (But it then allows me to support sequence unpacking, if I decide so.)

You can omit the interface, defaulting to (object).

You can redefine a variable; you might want to do that, for example, after determining that the value is, after all, of a certain type.

An expression of the form (expression :: interface) (parentheses not optional) is a cast. You shouldn't use it often, but sometimes you might want to send a message to an object using a different interface that you know it has. Pronounce it as “expression as interface” (e.g., “first_name as sequence”).

Reserved symbols

This is the complete list of reserved symbols so far: {}()"=:. are meaningful; , and ; are syntactic noise; # starts a comment (to end of line, as in shell or Python).

The curly braces delimit a code block. Parentheses have a few uses - grouping expressions, declaring types, and casting (although casting is only a special case of grouping). Double quotes delimit a string; the string syntax is roughly similar to Python's unicode literals, sans the leading u, except that it can go multi-line.

The equal sign is used for assignment (and for explicit return, which is a special case of assignment); it is special in that it's only reserved when by itself (delimited by whitespace), so ==, +=, etc. are not reserved. (This sounds dirty. Maybe use := instead?)

The colon is probably the most overloaded one. It's heavily used in the syntactic noise system - if a token after the second ends in a colon, it's noise; if the first or second token ends in a colon, it's the message name. Doubled, it's used for casts. And as the first token in a code block, it starts an argument list (see next section).
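
The positional colon rules can be sketched in Python (not Dream). This is only an illustration under assumptions: the function name classify_colons is made up, tokens are taken to be whitespace-separated, and the lone colon that opens an argument list is not handled. The passage also doesn't say which token is the message name when neither of the first two ends in a colon, so the sketch leaves those tokens as "other".

```python
def classify_colons(tokens):
    """Label each token of one statement according to the colon rules.

    Hypothetical helper; positions are counted from 1, as in the text.
    """
    roles = []
    for position, token in enumerate(tokens, start=1):
        if token == "::":
            # doubled colon: a cast
            roles.append((token, "cast"))
        elif token.endswith(":") and position <= 2:
            # first or second token ending in a colon: the message name
            roles.append((token, "message name"))
        elif token.endswith(":"):
            # any later token ending in a colon: syntactic noise
            roles.append((token, "noise"))
        else:
            roles.append((token, "other"))
    return roles
```

For example, classify_colons("new_obj add_base: base".split()) labels add_base: as the message name, while in "foo obj with: arg1 arg2" the third token with: comes out as noise.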

The period, finally, separates statements. Like the Pascal semicolon, it's a separator and not a terminator - meaning, it's optional on the last statement of a block. The period has to be followed by whitespace or end of input, to disambiguate from a decimal point or ellipsis (or other uses of the period character we may introduce later).

Syntactic noise is ignored, with one important catch: if you're using it as a separator, you have to follow it with whitespace. The expression 1, 230 is two numbers (possibly two arguments to a message), while the expression 1,230 is a single number.
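
Both whitespace rules (for the period and for noise) can be sketched in Python. This is a rough illustration, not a Dream tokenizer: the names split_statements and strip_noise are invented here, strings, comments and nested code blocks are ignored, and the way the ellipsis is protected from splitting is an assumption of the sketch.

```python
import re

# A period ends a statement only when followed by whitespace or end of input.
# The lookbehind keeps the last dot of an ellipsis ("...") from being treated
# as a separator; that detail is an assumption, not something the text states.
STATEMENT_END = re.compile(r"(?<!\.)\.(?=\s|$)")


def split_statements(source):
    """Split flat source text into statements at separator periods."""
    parts = STATEMENT_END.split(source)
    return [part.strip() for part in parts if part.strip()]


def strip_noise(statement):
    """Drop comma/semicolon noise, but only where whitespace follows it."""
    tokens = []
    for raw in statement.split():
        token = raw.rstrip(",;")  # noise at the end of a token is ignored
        if token:                 # a lone "," or ";" disappears entirely
            tokens.append(token)
    return tokens
```

With this, "1, 230" yields the two tokens 1 and 230, while "1,230" stays a single token, and the decimal point in "a = 1.5. b = 2." does not split the first statement.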

Arguments

Argument lists are defined by having a colon as the first token in a code block:

(code) add = { : (number) a (number) b : a + b }.

I'm still uncertain about keyword arguments; coming from Python, I certainly know their value, but also coming from Python, I recognise that they usually appear in overly complex code that could be done better using more object-oriented techniques - or, for some uses (like the datetime constructor), multimethods.

You set up a default value, if you want one, inside the parentheses:

(code) add = { : (number 2) a (number 2) b : a + b }.

For variable-length arguments, you use the ellipsis:

(code) add = { : (number) a ... :
  iterate_over ... with: { : (number) n : a += n }.
  = a
}.
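
For readers coming from Python, the three add definitions above correspond roughly to the following. This is only an analogy: the distinct function names are made up so the three variants can coexist, and Python's argument semantics are not claimed to match Dream's in every detail.

```python
def add(a, b):
    # plain two-argument version
    return a + b


def add_with_defaults(a=2, b=2):
    # both arguments default to 2, like (number 2) a (number 2) b
    return a + b


def add_all(a, *rest):
    # *rest plays the role of the ellipsis: zero or more extra arguments
    for n in rest:
        a += n
    return a
```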

Here's a possible implementation of the new method:

: ... :
new_obj = create_empty object.
iterate_over ... with: { : base : new_obj add_base: base }.
# only initialise after adding all bases -
# because one base may care what other
# bases the object has
iterate_over ... with: { : base : base, initialize: new_obj }.
= new_obj
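
The point of the two passes is easier to see in a small Python model. Everything here is a stand-in: Obj, Base and the seen attribute are hypothetical and exist only to show that, by the time any base is initialised, the object already carries its full list of bases.

```python
class Obj:
    """Stand-in for an empty Dream object (hypothetical)."""

    def __init__(self):
        self.bases = []

    def add_base(self, base):
        self.bases.append(base)


class Base:
    """Stand-in for a base; records which bases it saw while initialising."""

    def __init__(self, name):
        self.name = name
        self.seen = None

    def initialize(self, obj):
        self.seen = [base.name for base in obj.bases]


def new(*bases):
    new_obj = Obj()
    # first pass: attach every base
    for base in bases:
        new_obj.add_base(base)
    # second pass: each base can now inspect the complete list of bases
    for base in bases:
        base.initialize(new_obj)
    return new_obj
```

Had initialisation happened inside the first loop, the first base would have seen only itself.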

Scope

There is no built-in syntax for “give me attribute bar of object foo”, because we don't want to encourage code to mess with attributes of objects other than self (sorry, Python). Of course, you can get to these attributes, using introspection (something like: (foo :: object) get_attribute: "bar").

Temporary (local) variables in Dream (not Daydream) don't live on the stack, but in a global soup on the heap, and are garbage collected. This is for the sake of nested scopes; consider:

(integer) number = 1.
(code) silly_code = { number + 1 }.
number += 1.
= silly_code

Once this code has returned, number can't be “alive” in its stack frame anymore. But the code object returned at the end still holds a reference to it, so it must live somewhere. (Anyone who has worked on a Lisp implementation, or even Python's, will be familiar with closure theory, of which I have just scratched the surface.)
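
Python behaves the same way with nested functions, so the example translates almost line for line. The wrapper name make_silly_code is invented for this sketch, and this shows Python's behaviour, which Dream is presumably meant to resemble; the text doesn't spell out the exact result in Dream.

```python
def make_silly_code():
    number = 1

    def silly_code():
        # refers to the enclosing variable itself, not a copy of its value
        return number + 1

    number += 1
    return silly_code


silly = make_silly_code()
# make_silly_code has returned, yet number is still reachable from silly
result = silly()
```

In Python, result is 3: the closure sees the increment that happened after it was defined, because number lives in a heap-allocated cell rather than in the finished stack frame.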

For consideration: what happens to number if I persist silly_code?

So here's how a name is looked up to resolve to a variable:

  1. reserved names. There are only a few of these: null, unknown, true, false, self, context.
  2. local variables from the current code object (including arguments).
  3. local variables from enclosing code objects (nested scopes).
  4. attributes of self.
  5. interface names registered in the interface manager; an interface doesn't have to be registered, but registering it makes it accessible in this fashion, therefore much more usable.
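
The lookup order above can be sketched as a Python function. This is a minimal model under assumptions: resolve and its parameters are hypothetical, scopes are plain dictionaries, and the failure case (raising NameError) is a guess, since the text doesn't say what happens when nothing matches.

```python
RESERVED = {"null", "unknown", "true", "false", "self", "context"}


def resolve(name, local_scopes, self_attributes, interfaces):
    """Return (kind, value) for a name, following the five steps in order.

    local_scopes is a list of dicts, innermost (current code object) first.
    """
    if name in RESERVED:                          # 1. reserved names
        return ("reserved", name)
    for depth, scope in enumerate(local_scopes):
        if name in scope:                         # 2. current, 3. enclosing
            return ("local" if depth == 0 else "enclosing", scope[name])
    if name in self_attributes:                   # 4. attributes of self
        return ("attribute", self_attributes[name])
    if name in interfaces:                        # 5. registered interfaces
        return ("interface", interfaces[name])
    raise NameError(name)                         # assumption: not in the text
```

One consequence of this order is shadowing: a local variable called number hides both an attribute of self and a registered interface with the same name.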