Our ultimate goal is to build a better way to develop software. Last year we developed a prototype of some of our ideas and presented it at Handmade Seattle 2020. Some of the results were exciting, but now we think we shouldn't start by building a single integrated tool. Our new priority is the design of a new file format for source code. It's still a challenging task, but it's well scoped and achievable. Here we explain our roadmap for the development of this new format.
Our first instinct was to build a single all-in-one programming environment. This seems like the obvious approach, because it makes code sharing easier and lends itself to a deeply integrated environment. And it's not just us: many projects in this space have a similar all-in-one architecture. But after working with it, we think the downsides outweigh the benefits.
We could say a lot about the problems with the all-in-one approach, which would be a good topic for future posts. Here we can summarize the problem by pointing out that it breaks the principle "don't change too much at once." To stick with that principle, we think that a good first step would be to upgrade the piece that sits at the center of our ecosystem of tools: the plaintext file format.
It's difficult to reason about the benefits of a better format that doesn't exist yet, but we can easily imagine the drawbacks of an inferior format. If it is clear that a less appropriate format for source code would lead to a worse way of programming, that should help to convince us that formats matter.
Imagine for a moment that bitmaps were the format for source code instead of text files. Think about each tool in this alternate ecosystem. Instead of text editors, programmers would use bitmap editors. The text-based languages we have now would be replaced with bitmap-based languages. Everything we use to exchange code, view code, and so on would read and write bitmaps.
There are a couple of ways that a bitmap could specify a program. A Turing-complete 2D cellular automaton such as Wireworld could work, but it might be fairly difficult to build a compiler for one. A better approach might be to define a language with the ability to recognize a character set from a particular font. Once a compiler can extract a textual structure from a bitmap, our existing compiler-building theories can get us the rest of the way to a working program.
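To make the thought experiment concrete, here is a minimal sketch of the "recognize a character set from a particular font" step. Everything in it is an assumption made up for illustration: the tiny 3x5 font, the `GLYPHS` table, and the `render` and `recognize` helpers are hypothetical, and a real bitmap language would need a far more forgiving recognizer.

```python
# Hypothetical 3x5 pixel font: each glyph is 5 rows of 3 pixels ('#' = ink, '.' = blank).
GLYPHS = {
    "a": ("###", "#.#", "###", "#.#", "#.#"),
    "b": ("##.", "#.#", "##.", "#.#", "##."),
    "c": ("###", "#..", "#..", "#..", "###"),
    " ": ("...", "...", "...", "...", "..."),
}
GLYPH_W, GLYPH_H = 3, 5


def render(text):
    """Draw text into a bitmap (a list of pixel-row strings), one glyph per grid cell."""
    lines = text.split("\n")
    width = max(len(line) for line in lines)
    bitmap = []
    for line in lines:
        padded = line.ljust(width)  # every text line fills the full bitmap width
        for y in range(GLYPH_H):
            bitmap.append("".join(GLYPHS[ch][y] for ch in padded))
    return bitmap


def recognize(bitmap):
    """Recover the text from a bitmap by exact-matching each grid cell against the font."""
    lookup = {glyph: ch for ch, glyph in GLYPHS.items()}
    lines = []
    for top in range(0, len(bitmap), GLYPH_H):
        chars = []
        for left in range(0, len(bitmap[top]), GLYPH_W):
            cell = tuple(bitmap[top + y][left:left + GLYPH_W] for y in range(GLYPH_H))
            if cell not in lookup:
                raise ValueError(f"unrecognized glyph at pixel ({left}, {top})")
            chars.append(lookup[cell])
        lines.append("".join(chars).rstrip())  # drop the padding added by render
    return "\n".join(lines)


source_bitmap = render("ab c\ncab")
recovered_text = recognize(source_bitmap)
```

Even in this toy version, one stray pixel makes a cell unrecognizable and the whole "file" fails to parse, which hints at the extra problems every tool in a bitmap ecosystem would have to solve before getting to the actual program.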
Power users in this world would need a better tool than "paint" for editing. Such an editor would need to offer features that improve the common tasks of bitmap programming. For starters, having the bitmap grow automatically in width and height would be a great quality-of-life feature. It would probably be useful to set a width and height for a snap grid to keep things organized. A high-end bitmap editor should be able to use the textual structure in the code to automate the re-flow and re-layout tasks, at least as long as the bitmap parses correctly.
In this alternate ecosystem there would be editors, compilers, and version control, but those tools would have to solve a different set of problems. It's just a guess, but it seems like it would be harder to make great tools for working with bitmap code. All of this implies that the central format for an ecosystem determines how hard it is to make great tools.
If we were programming with bitmaps, we would probably have some very sophisticated tools for dealing with them. In that situation, switching to text files would simplify away some of the problems, and the most sophisticated text tools could then achieve more powerful features. Our goal is to build a format that compares to text files in the way that text files compare to bitmaps.
We can't expect to automatically understand this problem space on our first attempt. To end up with a well designed format for source code, we need to explore the space first. That means we need to try multiple formats and learn from them. And there are a lot of factors we have to pay attention to when trying a format.
For each format there are multiple tools that need to be evaluated. For example, if we only optimize a format to make high-quality compilers easier to build, we may well end up with a format that makes building powerful editors more difficult. We are striving to make a new format that improves the environment as a whole, not one that just shuffles difficult problems around. So our tests need to include multiple types of tools.
Another difficulty is that there is no shortcut for evaluating how easy a format is to work with. If we had a magic formula that could give an accurate easiness score for every tool with every format, we could just crunch the numbers and get our answer, but there is no such formula. The best option we have is to prototype these tools and do our best to gather insights along the way.
As we are testing formats, we can't assume that one bad result means that the format is bad. Any single attempt at building a tool could turn out poorly, even for a good format. Similarly, a good result doesn't necessarily mean the format is good. To be thorough, we will eventually have to try again. For example, building more than one language with the format will give us more insight than building just one.
There is only so much we can do to be rigorous, because easiness involves subjective preference. We will try to account for that by having team members work independently on different prototypes. That way if one of us evaluates a particular format with a heavy bias, the other team members have a chance of balancing it out. Ultimately we know that these tests will not be perfect, but we think this is a good way to make progress.
With this plan to create so many prototypes we need to make sure we have good iteration speeds. We have a few ideas on how to do that.
First, we need scaffolding to support quick prototyping. Our layers for graphics, user input, and debug systems should be optimized for prototyping. Everything should be organized so that it can be reused between prototypes. And the scaffolding system should let us keep multiple prototypes at various levels of development.
Before we try to design a "serious" candidate format, we plan to start with simple toy formats. Toy formats will help us check the scaffolding code and practice our testing process. They will also give us a chance to see format features in isolation, which will guide our intuition on how to combine them in later designs. Then, over time, we can shift towards more serious candidates until we find an ideal balance of simplicity and utility.
Later on, if we believe that a format is ready for a really thorough test, we think the best thing to do will be to get more people involved. The goal is to make it easier to develop a high quality ecosystem for software production. That means feedback from a community sharing that goal is the true test. There are some open questions about how we would go about this, but it is a certainty that it must happen before we can call a new format complete.
The good news is there are thriving communities that we think will be interested in our goal. Just to name two, the Handmade Network fosters an approach to problem solving in computing very similar to ours, and the Future of Coding is all about exploring new paradigms of programming. There are many instances of communities emerging and contributing to projects by teams as small as ours. All this leads us to believe that the interest is out there for what we are doing. If that sounds like you, stay tuned because we'll need your help!
Once the format ideas have been reviewed and refined in the hands of the broader community, the last step is to finalize the format with one more production pass.
With a finalized format we will develop all the necessary materials to formally publish it. That means a specification, working open source example code, and free starter tools so that there is a usable ecosystem right away. All of this will be published under a permissive open source license.
The practice of endlessly extending and modifying existing software systems is almost ubiquitous nowadays. However, we think this project is a clear case where being done and sticking with the final results is the right approach. A file format for source code should be a stable substrate for the tools that work with it. So our roadmap ends here, with publishing the final format.
That's it, now you are all caught up! We're getting to work on this plan and we'll let you know when we start getting interesting results. On top of the results from new experiments, there is a lot we learned in 2020 that we haven't been able to share yet. A big part of our plan involves getting this information out there. You can be sure to catch blog posts by joining our mailing list, following us on Twitter, joining our Discord, or subscribing to our blog's RSS feed.