Graphical Programming: I Really Hope This Is the Future

Since March 2013, your author has been working on a software project with a fervor not previously experienced by said author. This work has not been a never-ending drive in a certain direction with a clear vision; rather, I’ve been engaging in copious exploratory programming. Over spring break, I got a working pre-alpha, and I thought the basic architecture was fairly solid and wouldn’t change much. Since then, every aspect of the stack has been swapped out for something new at least once. All the data structures are different, the important program logic is solving a completely different problem than it was, and the interface has gone through iteration after iteration of paradigm shift.

One thing, however, has remained.

Purpose

The purpose of this project is to assist the coder in developing more complex code faster and easier by clearly exposing the structure of the code.

To explain the solution I propose, I will first explain the particular problem I see.

ASCII Is Awesome, Just Not That Awesome

In the Good Old Days(tm), people wrote programs in binary, and the great masters of the era were those who achieved the greatest oneness with the computer. By which I mean, the interface between humans and computers followed a very simple rule: make something the computers can understand, and then teach the humans to speak the language. This system worked rather well because humans have an amazing ability to learn even the most abstract skills remarkably well.

In time, though, with the advent of assemblers, and later compilers and interpreters, we moved the frontier between humans and computers. Now, the computer does much more work to try to understand what we mean; likewise, humans do much less work to try to explain what we mean. As abstraction is layered on abstraction, humans can use simpler statements to describe more complex programs than ever before.

However, and this is the meat of my criticism of the state of programming today, our interface for programming is still based on text.

Text is awesome. I much prefer the terminal over GUIs. I write documents in LaTeX using vim and avoid Microsoft Word and LibreOffice Writer like the plague because they’re just such a pain to work with. Nearly without exception, if I’m given the option to write something in plain text rather than “visually” or in “formatted” or “rich” text, then I’ll write in plain text, even if that means throwing in a little code. For a combinatorics problem I’ve been working on recently I needed to draw a series of graphs, so what did I use? TikZ in LaTeX. I find that almost without exception, GUIs (1) do not allow the kind of manipulation I want, (2) hide important information, and/or (3) are simply frustrating to use.

Is this a problem inherent in graphical user interfaces, though? Is text really the best interface possible to interact and communicate with computers?

I submit that it is not.

GUIs have been misused.

A Defense of Graphical User Interfaces

Why do I use text programs rather than graphical programs? As outlined above, there are essentially three reasons.

Programmability

This is the most important issue. The terminal is awesome because of its composability. Basically, the Unix pipe and backticks allow the output of any command to be used as the input to any other command, and this allows the user to take relatively simple commands and mold them together into a one-liner that solves exactly his or her problem. When a new command is added, it integrates perfectly with the other commands, so it immediately gains all the power contained in the other commands. This allows the user to learn one set of commands and everything else just plugs right in.
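Here is a minimal sketch of that kind of composability, written in Python purely for illustration. The names grep, sort_lines, head, and pipe are hypothetical stand-ins for shell commands and the pipe operator, not real tools; the only assumption is that every "command" maps a stream of lines to a stream of lines.

```python
from functools import reduce
from itertools import islice
from typing import Callable, Iterable

# A "command" here is any function from a stream of lines to a stream of lines.
Command = Callable[[Iterable[str]], Iterable[str]]

def grep(needle: str) -> Command:
    """Keep only the lines that contain `needle` (a crude stand-in for grep)."""
    return lambda lines: (line for line in lines if needle in line)

def sort_lines() -> Command:
    """Sort the whole stream (a stand-in for sort)."""
    return lambda lines: iter(sorted(lines))

def head(count: int) -> Command:
    """Pass through only the first `count` lines (a stand-in for head)."""
    return lambda lines: islice(lines, count)

def pipe(*commands: Command) -> Command:
    """Chain commands left to right, like `a | b | c` in the shell."""
    return lambda lines: reduce(lambda stream, cmd: cmd(stream), commands, lines)

fruit = ["banana", "cherry", "apple", "avocado"]
one_liner = pipe(grep("a"), sort_lines(), head(2))
result = list(one_liner(fruit))
print(result)  # ['apple', 'avocado']
```

Because every stage shares the same shape, a newly written stage (say, one that upper-cases each line) can be dropped into pipe without touching the existing ones, which is the property the shell gets from plain text streams.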

I use vim for everything from configuring programs to note taking to todo lists to writing math papers to programming. I even use vim macros as a sort of scripting language if I’m trying to do something that’s a little too complicated for sed. Regular expressions make manipulating many different kinds of data — anything from file lists to code to csv files — surprisingly easy.

I like LaTeX better than Word not because I can use a plain text interface. I like it better because I can program in it (either directly or by writing a program that spits out LaTeX code). And being able to program in it gives the interface more expressive power. It is easier to tell the computer what I want it to do.

Text interfaces in general usually exhibit this kind of programmability. GUIs, as a rule, do not.

This problem, however, is solvable. Inherently, I don’t think it’s any easier to make a text interface programmable than it is to make a graphical interface programmable. However, since the text tools are already available, it is easier in practice to create a programmable text program. The solution for GUIs, then, is to create the tools necessary to program them. That’s a really vague description, but that’s because this is the hardest problem. I will describe my solution in detail in another post.

Show Me Everything

WYSIWYG editors are basically evil. I don’t want you to hide all the complicated formatting stuff in the background. I don’t trust you that much. One of the most frustrating experiences is, in Microsoft Word, to try to put a picture exactly where you want it. There usually is a way to make it work, but it is so needlessly complicated that I get insanely frustrated. When I drag this picture right here, why do you put it over there? The problem is not so much the putting it over there as it is the not telling me why. Please, if I’m doing something that doesn’t work in your framework, then let me know.

In LaTeX, every little bit of formatting is eminently visible. If there’s a problem, then at least I know exactly what the inputs are that give me the incorrect results, so I can know what to try to change to fix it.

However, this is clearly just a misuse of GUIs. Is there any inherent reason why GUIs must show less information than text interfaces? No. In fact, I would argue that they ought to be able to show more. This is a problem with current GUIs, but this is very solvable.

Physical Interface

GUIs generally depend on the mouse for interaction, and that’s generally a bad thing. What is the difference between the mouse and the keyboard? Basically, the mouse gives you pseudo-analog interaction, and the keyboard gives you a lot more buttons. Thus, the mouse should be used in cases where you want analog interaction, and the keyboard should be used for making discrete choices. That’s why manipulating menus with a mouse grates on me so much: it’s absolutely the wrong choice of utensil. It’s like eating soup with a fork.

There are good uses for mice. For example, viewing 3D models should definitely be done with a mouse since the interaction is fundamentally analog (although there should be discrete buttons to, for example, snap to an axis). Flight simulators are a lot more fun with a joystick than just using the arrow keys, and that’s because flying a plane is an analog interaction.

GUIs don’t need to be based on the mouse, though. It’s perfectly possible to create a GUI that is primarily controlled by the keyboard, and in many cases that would be a good thing. Again, this is a solvable problem.

Graphical Programming

So, yeah, text is great, and modern GUIs are, if not completely awful, at least significantly worse than text interfaces. I’m trying to make a GUI for programming that is an actual improvement over a text interface.

How could programming be helped by seeing the code graphically? Basically, in text, there are a lot of connections behind the scenes that the programmer must keep in his or her head. For example, an identifier is not merely a string of characters; it is connected with every other instance of that identifier, and in particular with its definition. It may also be connected in some ways to other identifiers — it may have been included from some module, it may have various functions that apply to it, or it may be the inverse of some other identifier. All these connections must be kept track of by the programmer in his or her mind (assisted by documentation).

However, the computer knows all these connections as well. Why can’t the computer just show me these connections? In a text interface, this is difficult (although automatic indentation and syntax highlighting are helpful).
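To make "the computer knows all these connections" a little more concrete, here is a small sketch that uses Python's standard ast module to link each identifier to the lines where it is defined and the lines where it is used. The function name identifier_links and the sample source are made up for this illustration, and the sketch deliberately ignores scoping (two unrelated variables with the same name are merged), so treat it as a toy rather than a real analysis.

```python
import ast
from collections import defaultdict

def identifier_links(source: str) -> dict:
    """Map each name to the line numbers where it is bound and where it is read."""
    links = defaultdict(lambda: {"defined": [], "used": []})
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            # `def name` / `class name` binds the name on that line.
            links[node.name]["defined"].append(node.lineno)
        elif isinstance(node, ast.arg):
            # Function parameters are bound where they are declared.
            links[node.arg]["defined"].append(node.lineno)
        elif isinstance(node, ast.Name):
            # Store context means assignment; anything else (Load, Del) counts as a use.
            kind = "defined" if isinstance(node.ctx, ast.Store) else "used"
            links[node.id][kind].append(node.lineno)
    # ast.walk makes no ordering promise we want to rely on, so sort the lines.
    return {name: {k: sorted(v) for k, v in d.items()} for name, d in links.items()}

SOURCE = """\
def area(radius):
    pi = 3.14159
    return pi * radius * radius

total = area(2) + area(3)
"""

links = identifier_links(SOURCE)
print(links["area"])    # {'defined': [1], 'used': [5, 5]}
print(links["radius"])  # {'defined': [1], 'used': [3, 3]}
```

A table like this is exactly the information a text editor already has but rarely shows; a graphical view could draw it as edges between a definition and its uses.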

Thus, this isn’t really about graphical programming. The point is not that stuff is arranged nonlinearly. The point is that we show more of the structure of the code. We show several different aspects of the code at once, and we can have different graphical views of the same code. Maybe one view emphasizes the data flow; another view emphasizes the control flow, and another view emphasizes the human-meaningful divisions in the code (between, for example, view logic, controller logic, and model logic).

This applies especially well to Lisps. A Lisp program is just a tree, and every expression in the program is just a subtree of the program. Text, however, being line-oriented, is good for displaying linear processes, as in assembly or C or Python. It is difficult to naturally represent a tree in text. The most common solutions are indentation, which is too vertically verbose for Lisp’s needs, and parentheses.

Using parentheses to display a tree is a decent system. It is unambiguous, simple, and easy to type. However, it is clear to me that it is not the best system. When we draw a tree on a blackboard or on paper, we don’t use the linear parenthesis system. No, we draw them graphically. Why? Because it’s easier for humans to understand something drawn out visually. Why don’t we use that when programming Lisp, then? Because it has historically been hard to draw that kind of thing quickly and easily with a computer. Basically, it’s just easier to use parentheses.
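As a rough sketch of the gap between the two notations, the following Python reads a parenthesized expression into a nested list and then draws it the way one might on a blackboard (approximated here with box-drawing characters). The names parse and draw are hypothetical, the reader handles only atoms and parentheses (no strings, quotes, or comments), and draw assumes the first element of every list is an atom used as the node label.

```python
def parse(text: str):
    """Read one S-expression into nested Python lists of string atoms."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()

    def read(pos: int):
        if tokens[pos] == "(":
            children = []
            pos += 1
            while tokens[pos] != ")":
                child, pos = read(pos)
                children.append(child)
            return children, pos + 1  # skip the closing parenthesis
        return tokens[pos], pos + 1

    tree, _ = read(0)
    return tree

def draw(node, prefix: str = "", is_root: bool = True, is_last: bool = True):
    """Return the lines of a tree drawing; a list's head is its label."""
    label, children = (node[0], node[1:]) if isinstance(node, list) else (node, [])
    if is_root:
        lines, child_prefix = [label], ""
    else:
        lines = [prefix + ("└── " if is_last else "├── ") + label]
        child_prefix = prefix + ("    " if is_last else "│   ")
    for index, child in enumerate(children):
        lines += draw(child, child_prefix, False, index == len(children) - 1)
    return lines

tree = parse("(+ 1 (* 2 3))")
picture = draw(tree)
print("\n".join(picture))
# +
# ├── 1
# └── *
#     ├── 2
#     └── 3
```

Note that this drawing is the indentation approach, so it is just as vertically verbose as indentation always is; the point is only that the same nested list could be handed to any number of other layouts without changing the program it represents.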

However, I believe this can change. Thus, I propose that graphical programming should be the future. I haven’t seen a graphical programming system that I thought was better than (or even close to as good as) a text system, but that’s just because people are going at it the wrong way.

In time, I’ll write about what my solution is to the problem of graphical programming. My solution is still in pre-alpha, although I think I’m tantalizingly close to an alpha version. So, until then, I’d love to hear anyone’s opinions on the subject.

References

After working on my project for a couple of months, I found Bret Victor’s work, and I’ve since incorporated some of his ideas into my view of programming (as you may notice above). I particularly recommend Learnable Programming and The Future of Programming:

http://worrydream.com/#!/LearnableProgramming

http://worrydream.com/#!/TheFutureOfProgramming

For an overview of many of the studied tree visualization techniques, see treevis.net. I’ve implemented several of these visualizations in my project, and I plan to implement many more:

http://vcg.informatik.uni-rostock.de/~hs162/treeposter/poster.html

31 thoughts on “Graphical Programming: I Really Hope This Is the Future”

  1. Parallelist says:

    I’m working on a similar project. Care to exchange notes? You have my email address now. I’d love to hear from you. I agree with pretty much everything you say in this post.

  2. Hi, I work on shoebot when I have the time, and am aiming to move it closer towards Bret Victor’s ideals.

    (Nodebox, on which we are based, comes a lot closer).

    It’s funny there isn’t a community for learnable programming environments anywhere yet.

  3. Good post. I’ve been considering this for quite a while as well. I come from a digital VFX artist background and have moved completely to programming realtime systems. At the moment my stack is ARM microcontrollers that feed and receive data from Node.js. This data is then visualised in WebGL or plotted. Something similar to an oscilloscope but focusing on digital variables and how they change over time.

    This is hugely useful while developing algorithms and refining performance, but the actual code is still text (Sublime editor). I thought about changing syntax highlighting to instead show clusters of Tetris-like cubes in 3D, where the Z dimension would be time, so the first code to be executed would be first in the stack.

    Nested loops would create zigzagging staircases, state machines a Morse code of switches. Arrays would gain volume as they fill with data. Actual coding in this view would be something completely different than how we understand code today. I think it would be more akin to building a mechanical device from Lego-like components. Each component looking suited for how it works, variables as little containers of information, objects with loads of inputs and outputs and self-contained complexity. Wheels and gears to pick up and move things in ordered sequences or patterns.

    It’s fun to think about.

  4. Phil Hudson says:

    Excellent, thought-provoking, even a bit mind-expanding. Several thoughts arising from this.

    First, as I was reading, I kept thinking “Smalltalk. Smalltalk. I’ve got to tell him about Smalltalk.” Then I saw the word Lisp. OK then. :-)

    Second, oh blast I’ve forgotten since I started writing this reply. It will come back to me.

    Third, which I only thought of as I started writing this reply, Userland Frontier used GUI tree structuring for code. It was created by Dave Winer, who always talks of it in terms of outlining. I believe he open-sourced it (GPL) quite a while back; it required the OS X dev tools to build.

    Still haven’t remembered that second point. I’ll post this anyway.

    • Ha, yes, I’m definitely a Lisp guy (Racket, specifically).

      I’m curious about this UserLand Frontier thing. I hadn’t really looked into it before. Really, though, it looks way too dependent on the mouse. I don’t think that graphical necessarily implies mouse-driven, so it’s disappointing to see otherwise-promising programs done in (in my view) by a dependence on the mouse.

      If you ever remember the second thing, let me know!

      • Phil Hudson says:

        I’m another keyboard-obsessive type for my own work and environments — I use ratpoison as my WM of choice on X — but I try to make my software that I create very intuitive, with high affordance, multiple redundant paths to functionality, forgiveness of user error, dialog and feedback, etc. The holy grail for me is the pre-OS-X Apple Human Interface Guidelines. That means mouse-y and GUI all the way, while trying to make sure I still offer low- and high-level scripting support and stdio stuff whenever possible. Anyway, that’s kind of a prelude to the Frontier point I want to make, which is that it was another of these built-in-itself bootstrapped dynamic environments, like emacs or Smalltalk or any Lisp (not quite to the same extent). It was a *long* time ago but AFAICR Frontier was mouse-y by default but trivially easy to adapt to my keyboard-y style of working. I loved it. I don’t love Dave Winer (genius though he is), but that’s another story.

        • Yeah, I see what you mean. For graphical programming, though, I’m writing software for other programmers, many of whom are as keyboard-obsessive as we are. I’m not necessarily trying to write something that’s intuitive to use (since that will be different from user to user); rather, I’m trying to write something that is powerful and efficient to use. Obviously, being intuitive would be a major bonus, but power and efficiency are paramount.

          Incidentally, my project is built in much the same way as emacs in that it is, as you say, a bootstrapped dynamic environment. Already, the equivalent of several hundred lines of code in my software was written by my software itself. You can literally type in Racket code to control my program while it’s running (much as you can with emacslisp). Of course, I can’t really tell how many similarities it has to emacs because I can’t stand using emacs for longer than an hour every six months. But that’s not related to its basic architecture.

  5. You might take a look at the LabVIEW language for some inspiration. It’s been largely pushed at scientists and engineers for the past 25 years, and it does a fantastic job of implementing a graphical based “data flow” programming language. Unfortunately, its parent company pushes it predominantly as a hardware to software and instrumentation solution, but I think it is fantastic for all programming tasks, not just hardware interaction.

    Most CS types dismiss it immediately as a toy, since it lacks the traditional ASCII based programming methods. The trick is to embrace it as a new language, not try to force the dogma of traditional ASCII languages. Is it the right tool for implementing A+B+C as a typed equation? No (but you can). But for advanced data structures, interactions, and user interfacing, it can be quite elegant.
    It is also panned for the closed format and cost, although barely restrictive “Student” versions are widely available for around $50US.

    - looking forward to seeing your progress!

  6. Wow, man, I’m thinking all of these thoughts. Only difference is that although I don’t own a word processor, I do use latex, and I fully appreciate having a programmable system shell, I find it very hard to be an apologist for the crowd who’re resolute that a graphical programming language can never be more than a toy. A lack of vision taken as a vision of lack, that’s hubris and it’s keeping us down.

    Right now what I’m thinking is it might be a good idea to develop a general purpose data browser- something we’d want anyway- and a data interchange format/centralized type base and build languages on top of that. Have some of the basic collection types as trees, grids, lists, and graphs[Internally Linked Object Spaces?]. Let the IDE be a program running in the data browser, operating on whatever the source code format ends up being. But it’s all very experimental at this stage, for sure. I have no idea what we’re going to converge on.

    I think you and I should collaborate, but I think there are enough of us that it would be a better idea if one of us opened a submailinggrouppagewiki or something for the project. I’ll hang around in #varrep on freenode while I’m awake. I hope to see some collaborators there.

    • “A lack of vision taken as a vision of lack, that’s hubris and it’s keeping us down.” I like that a lot. Just because stuff hasn’t happened doesn’t mean it can’t/shouldn’t happen.

      Honestly, what I’m developing is, at its root, just a visualization of a (set of) trees, and much of the interaction does not depend on the fact that it’s code. It could just as easily be used to edit general tree structures (and, with a little work, perhaps general graph structures). That’s the beauty of using a Lisp: the code is already just a data structure, and it’s easier to visually manipulate a data structure than a recipe.

      I do think we should collaborate when we can. Do you have a blog or something where you talk about what you’re working on? What’s the best way we can stay in touch? I may see if I can be in #varrep sometimes.

      • No blog, really. I haven’t started coding, or when I did start coding I tripped and fell down an infinite recursion in my conception of the type system pretty much immediately, so I didn’t end up producing anything. Before I got side-tracked with all the thoughts about type federation I was going to start with making a tree representation and interaction paradigm/GUI framework [we are going to need our own GUI framework, I have absolutely no doubt about that] in SDL3 and just seeing what kind of mutation I could get going. You’re probably further down pretty much the same path I would have travelled.

  7. Really great post. Like many of the other commenters, I’m working on something similar. I wish we could all get together and compare notes! Anyways, if you’d be up for talking about it, you have my email. At the very least, thanks for writing this (and ignore the cynics)!

  8. Oh hey, this is one of my pet projects too. =) I’m going the strongly-typed route and trying to make a good, consistent language for the interaction, though.

    I’m amazed/encouraged to see how many people are working on variations of this theme. I guess it’s true what they say about an idea whose time has come.

    • Yeah, I really hope the time has come. The thing is, people have been working on this for so many years that it’s hard to really tell when someone will finally hit upon a decent solution. I think I can bring something novel to the discussion, so I’ll try to do that over the coming months.

      I’m curious what you mean by the strongly-typed route. Care to elaborate?

      • Structures have types attached. Commands on the structures (I don’t have a proper visual command language, though — not sure if/how you’re planning to do that, but I’d be interested to see) are functions on those structures, and only functions whose input type matches the argument to which they’re being applied can be applied — this is nice for discoverability, because you can present only functions that work with the currently selected objects (or, when building a bigger expression, at the currently-edited point in the expression). Type theory provides many of the benefits of Lisp in terms of homoiconicity, but also provides a mechanism for describing the structure of an object in the system itself (possibly before the value of that structure has been fully calculated, in the case of external or complicated data), which is nice.

        • Mm. Imagine[though you probably already have] having a reference integrated into your code editor no matter what language you were using, and you could search it in all kinds of ways, for example, queries like “everything that takes GifData as an input” or even query it like a graph DB, “shortest chain of functions that link a pixel buffer to a jpg”, or if we had a more semantic understanding of the relations, “how can I make a jpg from this pixel buffer?”.

          • Yes. :þ Those are almost exactly the applications I had in mind, and they’re all exactly equivalent to some kind of proof search.
            (Actually, my current architecture is even implemented as such: output to the screen &c. is done by a ‘renderer’ program that attempts to convert each object into a form appropriate for its associated output medium, e.g. ‘interactive 2D graphic’.)

            • I really like this train of thought. It’s different from the way I’m pursuing this, but I think it provides many of the same benefits. My approach is conceptually simpler than this, but I could definitely imagine it being eventually generalized to this.

              If graphical programming is going to take off, then I think it’s going to be important for many different paradigms to be tried until we find one that works well.

  9. Common Lisp rather than Racket, but still possibly of interest:

    https://github.com/projectured/projectured/wiki/Screenshots

    … also, I’d ask you to consider using a spreadsheet-like reactive/propagational approach to the entities in your tree. (That is, the nodes of the tree should be bindable and modifiable by multiple running threads, and thus the program editor and a graphical slider in a window could both have real-time read/write access to a value, for example).

    • That’s a really interesting project, definitely much closer to what I’m envisioning than something like LabVIEW. I’ll be investigating this project much more over the next few days. Are you the author, or is this something you’ve seen?

      And, yes, I definitely plan to make it so that several different views of the same information can all change it. I’ve implemented several different tree visualizations, for example, and you can edit the same code under all the different visualizations at once (in different parts of the screen). Actually, the implementation of this is quite buggy at the moment, but that’s just because I haven’t updated it since I made some major back-end changes.

      And, yes, Common Lisp is a good language as well. I prefer Racket because it’s a Lisp-1 and is based on Scheme, which is just much cleaner in general. But, I understand CL as well, and eventually you should be able to edit CL in my editor.

      • Heh. Different views on the same information? Guess what #varrep stands for. I think it’s one of the most fundamental benefits of what we’re exploring.

        Projectured is cool. Nesting data inside data of a different format is something I expected to see, for example, sticking sprites in the source code of a game, or diagrams in the documentation layer of a body of source code. I’m kind of surprised the author’s not showing off any binary format representations though. Maybe to avoid spooking the text fetishists.

  10. No, that one’s not mine. I’ve been playing with this sort of technology for a long time, always hoping to find a way to make it not suck. Recently, I’ve been experimenting with multiple inputs for live coding, including this:

    http://quick.as/5p8f7ga

    … which, combined with some tracing code, allows me to perform using the editor and some custom controls, then edit the code produced by the performance and play it back. This turns out to be an interesting way to make animations and music. Some example output in animated GIF form here:

    http://blog.jackrusher.com

    Definitely get in touch when you’ve got some code. I love Racket. :-)

    • Wow, that’s rather impressive. I really like the idea of using more types of physical interaction. Just the other day I hooked up another keyboard to use as a foot pedal (I didn’t realize how hard hand-foot coordination is…). I don’t think that mouse/keyboard should be the end of human interface innovation, so I like the trend of using a phone as an external input device.

      • Thanks. I thought it would be fun to have a remote control with which to tweak parameters during live performances, but now I’m thinking about how to present multiple sets of controls to multiple users so they can collaborate on a single computation. (An example would be building a multi-player musical instrument on the fly while several persons are performing with it).

        The only luck I’ve ever had with a foot pedal was using it as a momentary switch to engage/disengage my Leap Motion, which would otherwise follow my hands when I reached toward the keyboard after positioning something on screen.
