Announcing BABLR

Welcome! I'm Conrad, a Frontend Engineer by trade, and for the past five years I've been writing open source Javascript code. I've developed and am today releasing the latest version of a new generalized parser framework, BABLR, and with it a whole new API-based platform for software development.

My goal with this project is to explore what would happen if the IDE was not a text file editor first, but rather a code document editor. If we were willing to shed some of the baggage of history, what might we gain in return?

Already developers are using document-driven coding tools daily: formatters to handle indentation, linters to catch common mistakes, transpilers to desugar new language features, and codemods to perform refactors that touch anywhere from tens of files to hundreds of thousands and more.

Today we are shipping the core technologies of that platform: our parser framework, BABLR, to compete with Tree-sitter; a parse tree format called agAST to compete with ESTree; and a new data language designed for parse trees called CSTML.

Each of these technologies can stand alone, but together they represent a thoroughly integrated platform that is more than the sum of its parts. BABLR parsers provide the means to create CSTML and agAST trees, while libraries of data structures and algorithms make the novel tree formats useful in practice. Whether the platform achieves impressive growth over the long term will depend mostly on how good a compromise I've brokered between language authors, tool authors, and users. Ultimately each of these three groups of people will see value in the platform if it is a place where they can come to meet the other two groups.

To tell the story of the product then is to tell the story of why each of these three core constituencies would see value in it.

BABLR for Devs

We see several ways to get developers to fall in love with our platform. We've made it browser-native and incredibly lightweight, so that our code analysis can run in any docs site or blog post while only adding ~10KB. Our syntax highlighting tool is called Bedazzlr and it's running right here in this page powering code examples like this one:

let result = /[^^*+/[-][]/.exec("input");

Bedazzlr goes beyond the basic token coloring that other tools offer though: it provides something more like a version of the ASTExplorer experience embedded right into the web page, allowing the user to interrogate the structure through semantic selection and through the ability to see the names of nodes in the embedded syntax tree. These names can be much more useful than colors when you're learning a new language. Because we compute them on the frontend, our ability to help users learn new languages by directly exposing them to the structure of syntax trees need not be limited to English-speaking users.

BABLR is more than just a fancy syntax highlighter though: it can also be the basis for a visual, tactile, API-programmable code editor that runs right in your browser. The code editor product, Paneditor, will be the first code editor built from the ground up to be natively usable on touchscreen devices like phones and tablets, as those are the devices we believe the next generation is likely to have in their hands when they become curious about programming. Using Paneditor to write code will not require users to download any software, nor even to have root permissions on the device they are using. Paneditor will offer powerful features such as semantic code search and replace, all powered by the BABLR code that is shipping today.

Our strongest argument to users is that the story will keep getting better and better day by day because we will keep attracting more languages, more tools, and more users.

BABLR for Tooling Devs

Authors of tools care about how data is presented once it has been collected. We took our largest cues about how to think of data from the XML ecosystem.

agAST is our DOM-style API for data processing, and CSTML is our SAX-style API for data processing.

While the CSTML language is heavily inspired by XML, and like XML is designed to be parsed as a tag stream, we have avoided repeating XML's mistakes. CSTML has no CDATA and no entity codes, and can be parsed by a fairly simple parser. We've even one-upped XML by making our tag stream language trivial to parse with a stream parser. XML is not much fun for a stream parser to parse because of the ambiguity between <NodeName> and <Namespace.NodeName>.

Because BABLR is a streaming parser and CSTML is a streamable output format, BABLR is able to emit partial parse results before it has completed a full parse.

For tools that only wish to query code, using the tag stream API to access the document's structure may be optimal, since it can avoid the need to load the entire contents of the file into memory before searching begins. This allows developers to build structural searching tools that can even work correctly in files too large to be loaded into memory.
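To make the streaming idea concrete, here is a minimal sketch of a search that stops as soon as it finds a match. It is not BABLR's actual API: the tag shapes (`open`, `close`, `text`) and the names `fakeTagStream` and `findFirst` are assumptions invented for this example, with a plain JavaScript generator standing in for the tag stream.

```javascript
// Hypothetical tag stream: a generator that yields tags one at a time.
// The tag shape is an assumption for illustration, not the CSTML spec.
function* fakeTagStream(log) {
  const tags = [
    { type: 'open', name: 'Menu' },
    { type: 'open', name: 'Item' },
    { type: 'text', value: 'Spring Rolls' },
    { type: 'close' },
    { type: 'open', name: 'Item' },
    { type: 'text', value: 'Deviled Eggs' },
    { type: 'close' },
    { type: 'close' },
  ];
  for (const tag of tags) {
    log.push(tag); // record how much of the stream was actually produced
    yield tag;
  }
}

// Scan the stream and stop at the first text tag containing `needle`.
// Only the path of open node names is kept in memory, never a whole tree.
function findFirst(tags, needle) {
  const path = [];
  for (const tag of tags) {
    if (tag.type === 'open') path.push(tag.name);
    else if (tag.type === 'close') path.pop();
    else if (tag.type === 'text' && tag.value.includes(needle)) {
      return { path: [...path], value: tag.value };
    }
  }
  return null;
}

const produced = [];
const hit = findFirst(fakeTagStream(produced), 'Spring');
console.log(hit.path.join(' > '), '=>', hit.value);
console.log(`tags consumed: ${produced.length} of 8`);
```

Because the generator is lazy, returning from the loop means the remaining five tags are never produced. A real tag stream fed by a parser could in principle stop reading its input the same way.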

BABLR for Parser authors

A huge chunk of BABLR's value in practice will be determined by how many languages the platform has parsers for. Right now on release day it's a small number. We support several of our own languages, and then more or less just JS and Python, and as of today neither of those languages even has complete support yet.

Rather than focus on writing a lot of parsers myself, I've focused on making it as easy as possible to write parsers. There's no compile step when writing parsers, for example: the code that runs is exactly the code you wrote, and you can debug it every step of the way with a normal old JS debugger. We also have several BABLR-specific tools to help parser authors, such as the BABLR CLI, which can run parsers while showing beautifully formatted live trace output with luxuries like syntax highlighting inside regex patterns.

In addition to tools, we've made our parser APIs as powerful as we can to help when languages have gnarly design. We support arbitrary lookahead and speculative execution to make it easy to parse languages with ambiguity. We support "shifting" to make it easy to parse math-y expressions like 2 + 2. My personal favorite parser feature is "guarded spans," which until explicitly cleared will cause the source to appear to end at the location the guard pattern matches. Guarded spans make parsing quoted strings very easy in BABLR.
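As a rough sketch of why a guard helps with quoted strings, here is a toy source reader. Everything in it (`ToySource`, `guard`, `clearGuard`, `eat`, `parseString`) is hypothetical and written only for this illustration; it is not how BABLR grammars are actually written, and it ignores escape sequences.

```javascript
// Toy source reader. All names are hypothetical, not BABLR's API.
class ToySource {
  constructor(text) {
    this.text = text;
    this.index = 0;
    this.guardPattern = null;
  }

  // Where the source appears to end: the next guard match, else the real end.
  get end() {
    if (!this.guardPattern) return this.text.length;
    const re = new RegExp(this.guardPattern.source, 'g');
    re.lastIndex = this.index;
    const m = re.exec(this.text);
    return m ? m.index : this.text.length;
  }

  guard(pattern) { this.guardPattern = pattern; }
  clearGuard() { this.guardPattern = null; }

  // Consume `pattern` at the current position, never reading past `end`.
  eat(pattern) {
    const re = new RegExp(pattern.source, 'y');
    const visible = this.text.slice(0, this.end);
    re.lastIndex = this.index;
    const m = re.exec(visible);
    if (!m) return null;
    this.index += m[0].length;
    return m[0];
  }
}

function parseString(source) {
  if (source.eat(/"/) === null) throw new Error('expected opening quote');
  source.guard(/"/); // the source now appears to end at the next quote
  const content = source.eat(/[\s\S]*/) ?? ''; // "eat everything" stops at the guard
  source.clearGuard();
  if (source.eat(/"/) === null) throw new Error('expected closing quote');
  return content;
}

const src = new ToySource('"hello world" + 1');
console.log(parseString(src));          // the string's content
console.log(src.text.slice(src.index)); // what remains after the string
```

The point of the design is that the content rule can be as greedy as it likes. The guard, not the content pattern, decides where the string ends.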

The Space Left Behind

This is the part where we go off-script!

This isn't supposed to be here. This whole project, it isn't supposed to be here. Ask anyone and they'll tell you that new tooling projects are supposed to be written in Rust, and if code absolutely must be written in Javascript it should at least be written in Typescript, and if it isn't going to be written in Typescript the code should at least have legions and legions of test cases. Well, BABLR is written in plain Javascript and it doesn't have legions of test cases either.

I'm breaking lots of other rules too: I'm not using monorepos, even though individual repositories introduce more process overhead. My commit messages are total garbage, and there has been no code review process. I'm not using AI of any kind, and I have so much trouble with carpal tunnel that I have to wear wrist braces when I work.

Meanwhile the work itself could not be stepping on any more important people's toes. The names of the projects inheriting the JS tooling ecosystem are supposed to be known already: it'll be Biome or VoidZero, right?

The problem that these companies seem not to have fully anticipated is what Josh Waitzkin in his Chessmaster 2000 tutorials calls "the space left behind." These projects like Biome and VoidZero, they decided that their focus was on building the fastest tools for writing Javascript. Their decisions show that it was of no particular importance to them that the tools be written in Javascript, and they continue to think that way.

That's all well and fine for them and I can't really argue with any strategy that seems to be working, and yet in leaving uncontested the project of making the best tools in Javascript, they have ensured that I need not compete with them directly. Their tools expect that you're sitting in front of a laptop with a full-size keyboard, and that you're the root user. We're shipping tools that can go anywhere there's a browser. No root account? No matter. No keyboard? No matter. Their tools? 50 - 100MB download for every new version. Hope you're not on a stingy data plan! Our tools? 10 - 50KB. We can run in a system WebView, even. Their tools offer a second-class plugin API to JS; our tools make JS extensibility a first-class feature from the ground up.

That's already a lot of differences, and we haven't even gotten to the biggest yet: BABLR supports being extended with support for new languages post-release. Neither Biome nor VoidZero has this ability. We're more like an x86; they're more like an Xbox.

It gets worse

Oh, much worse! CSTML, in addition to being a suitable format for parse trees, is (in a purely technical sense at least) the natural successor to XML and HTML.

What!?

XHTML

Back when I started doing webdev we had a dream, and that dream was called "the semantic web". One of its most promising technologies was XHTML, with its partner in mischief XSLT serving as a bridge to XML.

On its surface the idea was brilliant: instead of being required to write XHTML markup for their webpages, people would be allowed to write XML markup. Then, as part of the rendering process, the XSLT transformer would translate from the semantic XML input document to the HTML-XML (XHTML) output document. The idea was that if you wanted to have a restaurant menu the most basic asset would be extremely compact and simple, something like just:

<Menu>
  <Section>
    name: <Header "Appetizers" />
    #: "\n\n"
    <Item>
      name: "Spring Rolls"
      #: "  - "
      price: "$3.50"
    </>
    #: "\n"
    <Item>
      name: "Deviled Eggs"
      #: "  - "
      price: "$5.25"
    </>
    #: "\n"
    <Item>
      name: "Honey Burrata"
      #: " - "
      price: "$8"
    </>
  </>
</>

Notice that this menu document involves only some simple syntax, and otherwise contains nothing but the language of restaurants. There are no div or span tags here: none of the language of page layout.

This can be a boon in several ways:

Documents are easier to read when there's a higher signal to noise ratio
HTML page bodies are hardest to cache. This shrinks them, cheapening hosting
The business shouldn't need a web developer just to be able to update the menu
It is highly accessible, in all senses of the word

We have one other key advantage over XHTML, which is that without a stylesheet you can still render a CSTML document because a CSTML document has inner text, just like HTML does. The menu document above with no additional transformation would render as this plain text:

Appetizers

Spring Rolls  - $3.50
Deviled Eggs  - $5.25
Honey Burrata - $8
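For the curious, inner text can be thought of as every string in the document joined together in order. The sketch below assumes a made-up in-memory shape for the menu (nodes with a `name` and an ordered `children` array). This shape and the `innerText` helper are illustrations only, not the real agAST structure or any BABLR library function.

```javascript
// Hypothetical in-memory shape for the menu document: each node has a name
// and an ordered list of children, where a child is a string or another node.
const item = (name, gap, price) => ({ name: 'Item', children: [name, gap, price] });

const menu = {
  name: 'Menu',
  children: [
    {
      name: 'Section',
      children: [
        { name: 'Header', children: ['Appetizers'] },
        '\n\n',
        item('Spring Rolls', '  - ', '$3.50'),
        '\n',
        item('Deviled Eggs', '  - ', '$5.25'),
        '\n',
        item('Honey Burrata', ' - ', '$8'),
      ],
    },
  ],
};

// Inner text is every string leaf concatenated in document order.
function innerText(node) {
  return node.children
    .map((child) => (typeof child === 'string' ? child : innerText(child)))
    .join('');
}

console.log(innerText(menu));
```

Nothing in the output is invented by the renderer: the heading, the spacing, and the prices all come straight from strings in the markup.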

As with HTML, a user viewing the plain text of the menu could "Inspect Element" to view the raw markup. Most likely the page author, a restaurant owner, would also be seeing the document in its source text form when editing. The same way a user composing a blog post in a browser might use a keyboard shortcut to make certain text italic without caring that the browser generated an <em> tag, the restaurant owner would be able to use UI to add an item to the menu without needing to care that an <Item> tag is being added to the underlying markup document. They especially would not need to care about coder-y things like a " in the name of a dish needing to become \" to avoid breaking the page's source code.

While we don't have a semantic page editor yet, this web page was in fact written as CSTML!

What comes next

We hope what comes next is that you'll take a chance on trying out our product! It's free-forever, MIT licensed, and available right now on NPM as bablr@0.12.6.

In some ways this is a finish line for BABLR, and in some ways it's very much just the starting line. What we have more or less complete are the core standards: agAST, CSTML, and the way BABLR grammars are written. The work I'm about to throw myself into includes building out the semantic code editor and investing in support for more languages.

We've also created a for-profit company, Silphium Labs, founded by myself and Stirling Hosstetter, with the intent to pursue this mission.

Much hangs in the balance right now though. As of yet our company has taken no VC money, and what options and opportunities we have for the future will depend in part on how the community reacts to this announcement. If you'd like to show your support for our open source work, we welcome your feedback and your human-written code contributions. Or if you just want to send us a few bucks because something we made made your day easier, we take donations on OpenCollective.