Qualifying the Extensibility of Language Frameworks

Aug 21, 2024

Introduction

Programming languages incorporate assistive tools like user-defined abstractions, type-checkers, meta-programming and compiler optimizations to write programs that are as accurate and performant as possible, while remaining concise and legible to human eyes. Compilation of feature-packed languages becomes complex, with compilation performance being important in ensuring productivity and ease of use. As language features are added, language implementations must also be updated.

Systems for extensibly specifying languages, which we’ll collectively call Extensible Language Frameworks, or ELFs, offer language designers the ability to add language features sequentially, without rewriting existing language features. ELFs may be their own standalone language, or more often, are domain-specific languages (languages designed to support a specific purpose, DSLs for short) or libraries borrowing useful behavior of some implementing language, known as a host language. In this writeup, we’ll refer to the language specified in a particular usage of an ELF as the object language, and the ESL itself as the meta language.

Extensibility

In order to understand the differences between ELFs, we develop a system for describing extensibility. We take inspiration from formal methods for specifying programming languages. Broadly, models of programming languages specify language structure and behavior by respectively defining the concrete syntax and semantics of a language.

Syntactic specifications dictate which symbols and combinations thereof are valid programs in a given language. The names of features, binding forms, and lexical structures of a programming language are part of this syntax spec, with every lexical phrase abiding by the spec being a well-structured language form that semantic specifications can then reason about. Syntax may be grouped, allowing us to refer to classes of valid syntax that, for example, represent expressions, types, and so forth. Syntax variables allow syntactic forms to be parameterized on all possible instances of syntax of a particular class; for example the variable e in the syntax specification (add1 e) may be used to denote any possible syntax which is a member of the class of expressions. We will use term to refer to any piece of syntax which abides by a given object language’s syntax specification, and will not consider specific approaches to specifying syntax, reasoning about syntax specifications generally so long as they support the abstractions of syntax classes and syntax variables.

Semantic specifications are imposed on instances of legal program syntax, encoding properties of a language like which pieces of syntax are indeed well-formed programs, and how those well-formed programs behave when they’re executed. Models of language semantics may be abstractly thought of as relations over valid syntactic forms, where the inclusion of syntax in a relation holds some meaning. For example, consider a binary relation has-type which relates syntax representing terms to syntax representing types; a pair of syntaxes representing a term and a type are a member of this binary relation if the term syntax has a type which is represented by the type syntax. By defining this relation, we encode the semantics of “having a type:” a term has a specific type because the pair of term and type is in the has-type relation, and we check if an arbitrary term is well typed by proving that the pair of it and the type we believe it to be are a member of the relation. Relations may also be defined in terms of each other; for example, the has-type relation may rely on a unary well-formed-type relation, which specifies which syntaxes may even represent meaningful types.

In pursuit of adding new features to an object language, an ELF user would like to be able to add new syntactic forms with which to use them; we denote this syntactic extension, by which one modifies the specification for the syntactic forms and classes that could potentially make up a program. Here, the implementer of an ELF has a decision: is extension strictly additive, adding either classes or cases but not modifying existing syntax, or does an extension have the privilege of modifying existing syntax? We’ll call the former weak and the latter strong syntactic extensibility. Strong extensibility need not be strictly more desirable than weak extensibility, and depending on the goals of an ELF, it may not be possible to maintain assumed invariants, or manage the implementation of strong extensibility while providing a friendly interface.

Semantic extension is how an ELF allows a user to provide new meaning to syntactic forms. Consider a simple extension, where syntax has been added that represents the new terms true and false, and their new type Bool in a language which has an existing semantic relation has-type. To complete the new language feature, the has-type relation would need to be extended to include members of the has-type relation which allow us to type-check our new syntax. We call this kind of extension term-level semantic extension, which allows us to encode meaning for new syntactic forms under existing semantics. We also introduce an analogous concept of weak and strong term-level semantic extensibility, the latter allowing for redefinition of relations over existing terms; for example changing the type of numbers from Natural to Integer. Weak syntactic extensions may require strong semantic extensions in order to achieve desired behavior. For example, sound gradual typing is a typing discipline which allows for static imprecision in typing that induces run-time constraint checks. Often presented in literature as a modification of some base type system, the run-time semantics of a gradually typed language differ from those of the base language, changing how existing terms are interpreted. Should we wish to add gradual types to a simply typed system, we would syntactically need only add syntax representing the unknown type (a weak extension), but semantic changes supporting this new syntax must be made to existing typing relations (strongly extending term-level semantics).

An ELF user may also wish to extend semantic definitions in a way that requires the creation of new relations, adding or modifying dependencies between existing relations to integrate extensions into an existing object language. We’ll call this judgement-level semantic extension, which in contrast to term-level extension, extends what meanings we can ascribe to syntactic forms rather than interpreting new syntactic forms within existing relations. In our previous example, term-level extension allowed us to add new type in a system which already contained a has-type relation, but adding the has-type relation to an untyped system would be a judgement-level extension. Similarly, we distinguish between weak and strong judgement-level semantic extensions. We summarize our extensibility classes in figure 1 .

SyntacticSemantic
Term-levelJudgement-level
WeakAddition of syntactic forms and groupingsAddition of syntax to existing semantic relationsAddition of new semantic relations
Strong+ modification of existing syntactic forms and groupings+ modification of existing associations+ modification of existing relations
Extensibility classes

Tangential to these classes, we introduce the concept of quasi-extensibility, where an ELF provides a means by which an object language can be extended, provided such extensions can be expressed using existing object language functionality. Macro systems, for example, display weak syntactic quasi-extension paired with weak semantic quasi-extension: an extension creates a new syntactic form (which must have already been legal in the object language) whose behavior is defined in terms of existing syntax and that syntax’s existing semantics, and must therefore only have behavior already expressible in existing object language semantics.

Related is the concept of syntactic abstractions used to formalize the expressive power of macros, described by Felleisen , in relation to a given object language, as the set of all named instances of valid object language syntax. Felleisen’s syntactic abstractions impose the same restrictions as our idea of quasi-extensibility: a language is quasi-extensible in both syntax and semantics if it can support syntactic abstractions. We contrast this with our concept of syntactic and semantic extensibility, which allows for an ELF user to add syntax and semantics to an object language not expressible using solely syntactic abstractions.

The Expression Problem

The classes of extensibility we’ve defined map well to the expression problem, which as described by Wadler , is the problem of achieving extensibility of both data and addition of functions which operate on that data. In the context of programming languages, the data domain comprises syntactic language forms, with the function domain comprising language semantics, namely type checking, elaboration and interpretation algorithms. Designing ELFs which are both term-level and judgement-level semantic extensible requires one to solve the expression problem by allowing for the extensibility of data (that being, syntactic and term-level semantic extension: adding syntactic forms, with new cases describing their semantics under existing judgements) and adding functions on that data (judgement-level semantic extension: new judgements being expressed as new functions on the data representation of language forms). In supporting either term-level semantic extension or judgement-level semantic extension, one does not need to solve the expression problem: term-level extension is neatly modelled with traditional object-oriented programming (OOP) structures (where syntax maps to classes, and relations to methods called by a visitor, but adding new methods is hard), and judgement-level extension maps neatly to typical functional patterns (where writing new functions which operate on an existing datatype is easy, but adding new cases to union types requires modification of existing functions). In desiring both, we require the addition of both new language forms with semantics encoded as instances of existing interfaces, and the addition of new interfaces (read: judgements) to existing forms.

In Wadler’s presentation, a solution to the expression problem should allow for extension without modification of existing code (in our application, allowing for separate compilation of a language base and language extensions). This criterion provides a good delineation between an extension and a modification; if the original language no longer exists as a distinct entity—a distinct unit of compilation in the host language—the object language has not been extended, it has been modified.