Metalinguistic Abstraction

Computer Languages, Programming, and Free Software

Archive for June 2007

DeckWiki: Proposed Collaborative Presentation Creation

with one comment

This is something I have been turning over in my head for at least six or seven months, as I have discovered just how prevalent Microsoft PowerPoint™ is in business settings. It has a number of shortcomings:

  • Big binary files exchanged by email[0]
  • Relatively weak version control/merging capabilities
  • Out of date borrowed slides[1]
  • Lobotomizing an old set of slides just to get the same style in an integrated Microsoft PowerPoint™ file
  • Having to reprocess slides manually to get them to conform to some new style
  • Weak metadata/commenting capabilities, leading to bad slides
  • Lack of visual support for “off the rails” discussion, leading to unnecessary hand waving
  • Almost no serious collaboration support whatsoever[2]
  • Doesn’t facilitate finding other useful slides available in the organization

This is kind of a solved problem, if you could get everyone to accept using something like Prosper and a bunch of TeX files under some sort of version control, plus a few scripts. But that’s not going to happen. Even I will admit that it would probably be a little painful.

So let’s discuss something viable.

Many people at this point are pretty familiar with the idea of a Wiki…reading one, at least. Instead of some heavy-handed new tool that has to convince everyone its way is the One True Way and takes no prisoners[3], I suggest using a wiki package or writing something employing the wiki model to facilitate authorship of slides for presentations. Most professionals in technology sort of understand the idea of a wiki, even if in practice they never use them or contribute to them, but at least it’s not completely alien and probably not too scary to new users. I think. Let’s start with that premise. Let’s also presume the Wiki has a notion of “Presentations,” or a path through a series of presentations, including the empty presentation (no pages) and automatic singleton presentations (consisting of one page, every slide is a presentation). This is an important recurrence relation, because I will henceforth never organize things in terms of slides except when referring to the current way of doing things: in this model all presentations will be formed by composing presentations.

Let’s consider how we can address the issues raised above:

  1. Big binary files exchanged by email
    If the presentations are held in a wiki, each presentation can be maintained independently and there will be a common place to view and download presentations. Hopefully this will prevent attaching big files and sending them around all the time. By exploiting page versions it is possible to make sure that a given presentation will keep the same content for all time.
  2. Relatively weak version control/merging capabilities
    Since each presentation is tracked and worked on independently, one would only need to be concerned about clashing revisions of a single atom of content. The most obvious definition of an “atom” is the page, but it could be as fine as the word or paragraph level by using diffing/patching, although this gets more complicated. This also doesn’t seem to present a humongous problem on Wikipedia, even on the busiest pages: they still seem to get contributed to and updated.
  3. Out of date borrowed slides
    One can easily obtain a list of updated sub-presentations for any or all presentations and accept or reject changes (see the sketch after this list). The result is a new, unique presentation; old presentations are never lost, so reverting is easy.
  4. Lobotomizing an old set of slides just to get the same style in an integrated Microsoft PowerPoint™ file
    Styles should only be loosely connected to a presentation. One should be able to paint a new style over any presentation unit, so absolute conformity is an option.
  5. Having to reprocess slides manually to get them to conform to some new style
    Simply apply a new style to the presentation.
  6. Weak metadata/commenting capabilities, leading to bad slides
    Right now presentation slides are meant as much to be distributed and read as they are meant to be shown in person. The result is slides with entirely too much text and visual noise, distracting viewers during presentations. Cutting out some detail upsets some stakeholders, because then the slides no longer communicate everything that was said during the presentation in person. The relatively wimpy commenting facilities in most presentation packages don’t seem to please anyone, but an up-to-date, nicely-formatted, cross-referenced wiki page associated with each presentation may be better. This idea is not new at all[4], although its proper implementation may be tricky.
  7. Lack of visual support for “off the rails” discussion, leading to unnecessary hand waving
    If a presentation goes off the rails — and this is not always a bad thing — one is often left without visual aids and must resort to blackboarding and hand waving. It’d be better to have a big list of short presentations that one is at least moderately familiar with, so that if discussion wanders to another engaging topic that deserves more thorough discussion one can pull up more appropriate presentations and documentation. Another way to use this is to include “see also” sub-presentations that one can visit and then jump back from to the main presentation. In this way a presentation flow resembles an NDFA.
  8. Almost no serious collaboration support whatsoever
    All this version tracking buys us a nice, fluid system that allows for synchronized updating of artifacts by a number of authors that far exceeds two (which is, in my experience, the limit with more traditional methods). Wikipedia is an empirical example of this model working.
  9. Doesn’t facilitate finding other useful slides available in the organization
    This is an exciting one. With all this graph connectivity information, searching for and finding related material may be a lot easier. One can also tell roughly how much a presentation is being reused elsewhere. There are many potential uses for this.
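
To make points 1 and 3 a bit more concrete, here is a minimal sketch (in Python, since nothing about the wiki model forces a language) of pinning sub-presentations to wiki revisions and accepting updates by minting a brand-new presentation. Every name in it is hypothetical; there is no such API today.

latest = {"intro": 7, "headcount": 12, "roadmap": 3}   # wiki head revision per sub-presentation

deck = {
    "id": "q2-review",
    "children": [("intro", 7), ("headcount", 9), ("roadmap", 3)],  # (sub-presentation id, pinned revision)
}

def stale(deck):
    """Sub-presentations whose pinned revision lags the wiki head."""
    return [(cid, rev, latest[cid]) for cid, rev in deck["children"] if latest[cid] > rev]

def accept(deck, updates):
    """Return a new deck with the chosen children bumped; the old deck is untouched, so reverting is trivial."""
    bumped = dict(updates)  # {sub-presentation id: new revision}
    children = [(cid, bumped.get(cid, rev)) for cid, rev in deck["children"]]
    return {**deck, "children": children}

print(stale(deck))                                  # [('headcount', 9, 12)]
print(accept(deck, {"headcount": 12})["children"])  # headcount now pinned at revision 12

The point of returning a fresh deck rather than mutating the old one is exactly the “old presentations are never lost” property from point 3.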

Here’s a conceptual sketch of a more formal treatment[5]:

presentation -> {id: Id, presentation: [From], presentation: [To], page: P, presentation: [Ancestry]}
presentation -> Nil
id -> UniqueIdentifier (probably 64 bit integer)
page -> {Version, Data} (A version and some payload)

A presentation from Nil to Nil is the empty presentation. If From is Nil, then this presentation is the head of a presentation; if To is Nil, then it’s the terminating point of a presentation. The Ancestry variable allows for tracking the evolution of presentations over time by showing which presentations the current one was derived from[6].
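
For those who would rather read code than my ad-hoc notation, here is a rough Python rendering of the sketch, with None standing in for Nil. It is only an illustration of the datatype, not a proposed implementation.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Page:
    version: int   # Version
    data: str      # Data: some payload (wiki markup, say)

@dataclass
class Presentation:
    id: int                                                        # UniqueIdentifier
    from_: List["Presentation"] = field(default_factory=list)      # [From]
    to: List["Presentation"] = field(default_factory=list)         # [To]
    page: Optional[Page] = None                                    # P
    ancestry: List["Presentation"] = field(default_factory=list)   # Ancestry

EMPTY = Presentation(id=0)  # from Nil to Nil: the empty presentation

def singleton(pid, page):
    """Every page is automatically a one-page presentation."""
    return Presentation(id=pid, page=page)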

Addendum:
Another interesting idea is to break free of stack-based thought and use continuation-style thought in order to model presentation traversal, but I suspect that will break the minds of many people. That way you may not jump back to the datum you started at, but somewhere else entirely…possibly never to return, or perhaps carrying all your exit continuations along with you for possible use. In any case, the ‘presentation’ datatype can support this since it has an ‘environment’ (the page and version) and a ‘label’ (the From and To nodes). In fact, all standard linear presentations are similar to invoking a continuation that never calls a return. This is another way to think about this. It’s certainly all within reach if presentations are simply recursively connected to presentations, and should you define a ‘page’ datatype as an adequate ‘environment’ and the presentation datatype (which includes the page) as the ‘code.’ In any case, thinking of it as an NDFA (as mentioned above) is probably easier to understand and a less-stretched analogy, but the idea of carrying multiple exits is an intriguing one that deserves at least cursory attention. An NDFA analogy would also suggest a simple and well known form of visualization.

I have changed my type definitions somewhat to give more indication of the many-entrance many-exit nature of presentations by listifying To and From, although the same thing could have been accomplished by a large number of presentation type instances. I think this is more like what an actual implementation might look like. It may make ancestry tracking less hairy, too. My original goal was to define as little as possible to get things done, but then I decided this was silly and I should be spending a few more characters to more adequately carry the idea.
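
To illustrate the difference between the two traversal disciplines, here is a small hypothetical sketch: a straight run through a deck, plus “see also” asides one can descend into. With an explicit stack of return points you always pop back to where you left off (the NDFA-ish picture); handing the whole tuple of known exits to the aside and letting it choose where to go next is the continuation-flavored variant. None of these names exist anywhere yet.

main = ["overview", "numbers", "roadmap"]      # the planned path
see_also = {"numbers": ["headcount-detail"]}   # optional asides per unit

def traverse(path, exits=()):
    for unit in path:
        yield unit
        for aside in see_also.get(unit, []):
            # descend into the aside, carrying every exit we know about
            yield from traverse([aside], exits + (unit,))
        # a stack discipline always resumes here; a continuation-style traversal
        # could instead jump to any element of `exits`, or never come back at all

print(list(traverse(main)))  # ['overview', 'numbers', 'headcount-detail', 'roadmap']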

Footnotes:

[0]: Now slightly less binary with the new Microsoft Office™ 2007 XML format, if anyone can hope to actually understand it in a reasonable amount of time.

[1]: Common scenario: “HR says we’re at X employees…oh, that’s old, we’re actually at X + N, let’s move on…” In isolation this is really not so bad except N is sometimes wrong due to misremembering and with enough such errata it is distracting. There’s no obvious reason why it has to be so hard to stay up to date.

[2]: Especially with more than two people. The change-tracking feature is useful, but have you ever tried giving slides out to five people and merging the changes?

[3]: Sound familiar? It’s Lisp vs. UNIX again.

[4]: Possibly a variant of Knuth’s notion of Literate Programming

[5]: This is a hand-wavy amalgam of syntax/semantics borrowed from Prolog, ML, Erlang, Lisp (for Nil only, really) and/or Haskell. If anyone complains I’ll provide something more rigorous. My variables start with capital letters, my types start with lower case, a colon describes a variable with its type, and lists are denoted with […], so [presentation] is a list filled with “presentation”-typed things. -> is my reduction symbol. Braces denote tuples. Comments are in parentheses.

[6]: One of the interesting problems presented here is that Wikis generally attempt to converge on an authoritative page that everyone sees as opposed to branching and derivative works, which itself can create a mess of cross-generational merging. The problem can be seen as the same one that plagues the distributed vs centralized version control camps. As merging is still a problem that seems to not have been solved to everyone’s satisfaction, implementors would do well to pay attention to the subtle issues engaged by both camps in that community in an attempt to make an informed decision.


Written by fdr

June 29, 2007 at 12:05 am

Posted in projects

SAGE Computer Algebra System

leave a comment »

While skimming the RSS feed for Lambda the Ultimate I spied an entry about a functional definition of TeX’s formula typesetting algorithm. For me the real gem here was not some insight about TeX, but a solitary (at the time of this writing) comment about the CAS SAGE (and its integration with jsMath). SAGE is pretty interesting, but we’ll get to why it’s interesting in a moment after discussing some other well known CAS and scientific computation programs.

I have never understood some people’s obsession with Matlab, Mathematica, and similar tools. I mean, I kind of do: they are allegedly well designed tools, the former providing well-optimized versions of just about every well known numerical recipe you want. Matlab was popular with the EE guys in college for plotting and numerical methods, and I know that the Mathematica guys can do some impressive symbolic manipulation and interactive visualization with relatively minimal effort…

But something feels wrong. It’s not just a matter of the proprietary nature of these tools, although that and the high price can sometimes be most…dissuading. Especially when the pricing information is hidden after a login screen, or one of the first calls to action is to request a quote. One knows that they are not in for a good time under these circumstances, generally speaking. But projects like GNU Octave also set off my radar, even though it has low cost. Why?

The real answer is the “tool among tools[0]” philosophy. This is also possibly the main reason why Lisp lost out to C and UNIX, and why Smalltalk remains relatively obscure despite its alleged brilliance (it is not something I have used myself). Part of this is cultural (GNU Octave bills itself as a scientific computation system mostly compatible with MATLAB; nowhere in the mission is a high level of integration) and part is possibly more sinister motivation (MATLAB has a negative incentive to ease transition to Mathematica, and vice versa). MathWorks and Wolfram both have an incentive to make their own platforms as attractive as possible and to one-up everyone else instead of wasting time making sure that everyone can move their codes around: the latter is just not good business.

Enter SAGE. This package is shamelessly and unabashedly about integration, much in the Python philosophy. In fact it’s written in Python with liberal FFI bindings using the most excellent Pyrex extension language and compiler. It has unidirectional bindings to an impressive number of CAS systems (including many I have never heard of) with some basic parsing/data type normalization. And, most importantly, it avoids the problem of getting yourself locked into a particular CAS with no hope for escape[1].
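
For a feel of what that integration looks like from the user’s side, something along these lines should work (hedged: I have not exercised this myself, and the exact calls may vary between SAGE versions; it has to run under SAGE’s bundled Python):

from sage.all import var, sin, integrate, maxima

x = var('x')
print(integrate(sin(x)**2, x))           # SAGE's own symbolic machinery
print(maxima('integrate(sin(x)^2, x)'))  # the same question, shipped off to Maxima

The last line is the interesting one: the expression is handed to an entirely different CAS behind a uniform front end, which is precisely the escape hatch from lock-in mentioned above.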

For me personally it may be useful for plotting; I still typically write my actual math codes in C (maybe Lisp for prototyping, but that’s another post). I will have to take a look. I hope that in the future they gain some interoperability with statistical tools such as SAS and R (which already has a Python integration package that, alas, only works on Python 2.4), as I think this would increase the leverage of this tool tremendously. One hopes that the proprietary vendors won’t take offense and start trying to break things, à la Microsoft. If things get even better then maybe they’ll pay due notice and at least try to avoid breaking things all the time. At best, they’ll start to offer SAGE (or similar system) interoperability as a feature, and the world may be better off.

Footnotes:
[0]: I first heard this phrasing from Guido van Rossum when he gave a talk at Berkeley.
[1]: As detailed by another blogger, who also gives a different take and overview of SAGE. Actually, his is a lot more detailed in just about every respect, so I’d highly recommend reading this if you are seriously interested.


Written by fdr

June 28, 2007 at 1:51 am

Posted in mathematics

KPMake: A Proposed Dependency Resolution Tool

with 3 comments

One of the insane ideas I have floating around is to write yet another Make replacement. My rationale is as follows:

  • Make is simple, but not very powerful
  • Absurd macro languages and generation facilities grant enough power for “real work,” but are painful and ugly.
  • People widely use ant, which would probably be a step down from Make if not for the incredible dedication to tool support that makes it somewhat convenient for Java.

“But wait!” you say, “haven’t you ever heard of SCons or Rake, you imbecile?”

Why yes, yes I have. The nice thing about these approaches is that they recognize that most non-trivial build systems are going to demand the power of a “real” programming language; certainly not the looseness of shell scripting nor the inflexibility of ant buildfiles. (To quote a colleague: “it’s like Lisp, without any of the advantages of Lisp!”) The downside is that they lose out on clarity. A hand-written Makefile tends to stay very simple, whereas any convoluted thing can happen in a SConstruct file (the rough equivalent of Make’s “Makefile”), leaving the user scratching his head as to why something-or-other happened, or why something did or did not get built, and how. On the plus side, the machinery of SCons also gets you things like a “clean” target for free. Another downside is the loss of agnosticism: Makefiles, thanks to their simple nature, can easily be written to control complex toolchains that do things other than compiling programs[0]. The barrier to entry in writing a builder in SCons is considerably higher than simple rules as dictated by a Makefile such as

foo : bar baz
	cat bar baz > foo

(Yes, I realize you can use automatic variables like ‘$@’ and ‘$^’, so the recipe could just be ‘cat $^ > $@’, but let’s keep it less cryptic for now.)

I have been working on and off on a project that uses a Makefile to orchestrate around a dozen individual tools. Not only is it robust, but after almost a year of inactivity it’s possible to read the Makefile and recall the roles of all the tiny moving parts and how they relate to one another. I then went back and wrote documentation. But now I want to improve on Make by allowing for complex behaviors while retaining succinct, declarative syntax and keeping the system easy to debug. How do I intend to do this?

KPMake, or “Knowledge Powered Make,” is one possible solution. In a nutshell: I want to leverage state-of-the-art symbolic AI techniques to solve the relatively mundane problem of dependency resolution, while including modern explanation generation facilities to increase user understanding of why any particular actions were taken and to help debug those actions. I will probably use the KM knowledge base, partially because I was exposed to it at work and partially because I know it has successfully been used in large projects such as Vulcan Inc’s ambitious Project HALO. I also happen to know it’s available as a single Lisp file and has practically no set-up or portability problems. The base package for KPMake should simply be a distribution of KM, some basic models to handle generic dependency resolution, and macro/procedure support to make writing KPMakefiles easier. Knowledge models for, say, particular languages (such as a gcc or a javac model) will be distributed and maintained separately and ought to have an eye towards making things even more convenient.
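
To convey the flavor of dependency resolution that explains every decision it makes, here is a toy in Python. It is neither KM nor KPMake (KM is a Lisp knowledge base, and KPMake does not exist yet); it only shows the kind of trace I would want the real system to produce, and it assumes the files ‘bar’ and ‘baz’ already exist.

import os

rules = {
    # target: (dependencies, command)
    "foo": (["bar", "baz"], "cat bar baz > foo"),
}

def build(target, log):
    deps, cmd = rules.get(target, ([], None))
    for d in deps:
        build(d, log)
    if cmd is None:
        log.append(f"{target}: leaf input, nothing to do")
        return
    if not os.path.exists(target):
        log.append(f"{target}: missing, so ran `{cmd}`")
    elif any(os.path.getmtime(d) > os.path.getmtime(target) for d in deps):
        newer = [d for d in deps if os.path.getmtime(d) > os.path.getmtime(target)]
        log.append(f"{target}: older than {newer}, so ran `{cmd}`")
    else:
        log.append(f"{target}: up to date, skipped")
        return
    os.system(cmd)

explanation = []
build("foo", explanation)
print("\n".join(explanation))

A knowledge-based implementation would, of course, derive the explanations from the same rules it reasons with rather than from hand-written log lines; that is the whole attraction.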

Addendum:
A neat feature, enabled by this design but not included in the original writing, is using the situation calculus to let programmers load, save, and debug situational traces as well as perform advanced error recovery. KM supports situational reasoning and simulations that can be used to test build scripts (and write fuzz tests!) to make sure they respond in at least a reasonable way to unexpected events, as well as to increase visibility into every step of the build process: right now one generally has to do a lot of manual logging to get a trace of what’s going on or, if one is lazy, set a breakpoint and only be able to view a few slices of the process. You could also have your build engineers in India, send your broken build “core dump” abroad, and mostly avoid the “well, it works on my machine” scenario.

Footnotes:
[0]: I owe my appreciation of this technique to Professor Strain. Imagine a Makefile to build the program, run the tests, spit out any pictures, compile the TeX sources, and finally deliver you a postscript file to send off to the journal. I’m sure if submissions were something that had to be done often there would be a “mail” target that’d also deliver the resultant postscript to the relevant journal(s).

Written by fdr

June 27, 2007 at 9:10 pm

Posted in lisp, projects

Why did I start this thing anyway?

leave a comment »

Great. Another blog, just what the world needed. I have been persuaded to start this syndication partially because I have been stumbling across so many interesting things in the space of computer programming and computer science and yet have no way to store or search them all, and partially for the reasons detailed elsewhere. Although I hope I will be able to avoid a job search altogether and always have a colleague with an interesting workplace that they can refer me to, I figure simply putting my thoughts down for documentation couldn’t hurt. I may have gotten lucky finding an interesting job out of the gate from Berkeley, but one should never count on luck all the time…

I also figured that it would be possible, if improbable, that someone else may be able to use my writings.

And so, let us take this forward…

Written by fdr

June 27, 2007 at 8:38 am

Posted in meta-blogging