I want to develop an alternative syntax for Lisp. My idea is to write several possible syntaxes and see whether I like them or not.
Step 1
I imagine it like this:
- write some BNF-like syntax description
- generate a parser that will return S-expressions
- apply transformations similar to nanopass to get correct Lisp S-expressions
- For example, transform `(a + b)` to `(+ a b)`
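A minimal sketch of such a transformer in Python, with S-expressions modeled as nested lists (the operator set and the three-element convention are my assumptions, not a fixed design):

```python
def infix_to_prefix(expr):
    """Rewrite (a + b) to (+ a b), recursing into subexpressions."""
    if not isinstance(expr, list):
        return expr  # atom: symbol, number, string
    expr = [infix_to_prefix(e) for e in expr]
    if (len(expr) == 3 and isinstance(expr[1], str)
            and expr[1] in {"+", "-", "*", "/"}):
        return [expr[1], expr[0], expr[2]]
    return expr

assert infix_to_prefix(["a", "+", "b"]) == ["+", "a", "b"]
assert infix_to_prefix([["a", "+", "b"], "*", "c"]) == ["*", ["+", "a", "b"], "c"]
```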
Parser + transformations give me a “reader”, which I can put in my implementation of Lisp and try to run some examples.
To make this process faster, I want to:
- reuse “transformers”, which should be trivial because they are pure functions (see the sketch after this list)
- reuse patterns in BNF similar to predefined patterns in Rosie
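Since each transformer is a pure function from S-expression to S-expression, reuse amounts to plain function composition. A sketch (the pass names in the comment are hypothetical):

```python
from functools import reduce

def compose(*passes):
    """Chain transformers left to right; each pass is a pure function
    from S-expression to S-expression, so composition is trivial."""
    return lambda expr: reduce(lambda acc, p: p(acc), passes, expr)

# hypothetical reader pipeline:
# reader = compose(infix_to_prefix, desugar_let, expand_macros)
```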
Step 2
I wonder if it is possible to generate a pretty printer based on the grammar and transformations. If so, I could take a set of Lisp programs and generate programs in the new syntax to see if it looks good.
Can we invert transformers? Related: Bidirectionalization Transformation Based on Automatic Derivation of View Complement Functions; Combining Syntactic and Semantic Bidirectionalization; relational programming.
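As a toy experiment, the infix transformer above is easy to invert by hand; the open question is whether inverses can be derived automatically, which is what the bidirectionalization papers study. A sketch under the same nested-list assumptions:

```python
def prefix_to_infix(expr):
    """Hand-written inverse of infix_to_prefix, for the same operator subset."""
    if not isinstance(expr, list):
        return expr
    expr = [prefix_to_infix(e) for e in expr]
    if (len(expr) == 3 and isinstance(expr[0], str)
            and expr[0] in {"+", "-", "*", "/"}):
        return [expr[1], expr[0], expr[2]]
    return expr

def pretty(expr):
    """Naive pretty printer for nested-list S-expressions."""
    if not isinstance(expr, list):
        return str(expr)
    return "(" + " ".join(pretty(e) for e in expr) + ")"

assert pretty(prefix_to_infix(["+", "a", "b"])) == "(a + b)"
```

Note the round trip only works because this particular pass is injective; in general an inverse would need extra information, which is exactly the "view complement" idea.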
Step 3
Given the grammar and some additional annotations, it should be trivial to generate a syntax highlighter (inspired by Rosie).
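A minimal sketch of the idea: the rule names and regexes below stand in for annotated grammar rules (they are my assumptions, not Rosie's actual patterns; a real implementation would derive them from the grammar), and matches are wrapped in HTML spans keyed by rule name:

```python
import re

# hypothetical annotations: grammar rule name -> pattern
HIGHLIGHT = {
    "number":  r"\d+",
    "string":  r'"[^"]*"',
    "comment": r";[^\n]*",
    "symbol":  r"[A-Za-z+\-*/][\w+\-*/]*",
}

def highlight(src):
    """Tokenize with the annotated rules and wrap matches in HTML spans
    (no HTML escaping -- sketch only)."""
    pattern = "|".join(f"(?P<{name}>{rx})" for name, rx in HIGHLIGHT.items())
    out, pos = [], 0
    for m in re.finditer(pattern, src):
        out.append(src[pos:m.start()])
        out.append(f'<span class="{m.lastgroup}">{m.group()}</span>')
        pos = m.end()
    out.append(src[pos:])
    return "".join(out)

print(highlight('(display "hi") ; 42'))
```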
Implementation
I need to choose a parser generator. A PEG parser seems like the best candidate for my purpose (Rosie uses one).
Potential issues with PEG:
- it is linear in time (with packrat memoization) but needs more memory than other parsing algorithms. I don’t think this is important for an experiment
- Possible solution: Packrat Parsing with Elastic Sliding Window, Kimio Kuramitsu, 2015
- It doesn’t support left recursion
- BNF is non-deterministic, but PEG is deterministic. So if I naively translate BNF to PEG, there is a chance I will get the wrong parser (see the first sketch after this list). Is there a way to check that the two definitions are equivalent? Maybe swap the order of rules in a “prioritized choice” and see whether the result is a different grammar? If the BNF defines an unambiguous language, swapping choices should not affect the result. If there is only one “catch-all” rule, it should go last in the prioritized choice.
- Maybe use something like a model checker to generate sentences or counterexamples?
- Out of the box it doesn’t have good error messages (because of backtracking)
- Not a PEG problem, but if the parser generates S-expressions instead of an AST (with meta-information about line and column) and they are later transformed, it will be hard to find the source of a runtime error. Maybe it would be possible to generate something like source maps (see the second sketch after this list)
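The first sketch illustrates the prioritized-choice pitfall with toy combinators (not a real PEG library): with the shorter literal first, the longer alternative becomes unreachable.

```python
def lit(s):
    """Parser for a literal string: return chars consumed, or None."""
    return lambda inp: len(s) if inp.startswith(s) else None

def choice(*parsers):
    """PEG '/' (prioritized choice): try alternatives in order and
    commit to the first one that matches."""
    def parse(inp):
        for p in parsers:
            n = p(inp)
            if n is not None:
                return n
        return None
    return parse

short_first = choice(lit("a"), lit("ab"))
long_first = choice(lit("ab"), lit("a"))

# On input "ab" the first grammar commits to "a" and never tries "ab",
# unlike a non-deterministic BNF where both derivations exist:
assert short_first("ab") == 1
assert long_first("ab") == 2
```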
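The second sketch shows one minimal way to thread line/column metadata through a transformation so errors can be traced back to the source (the `Node` convention is an assumption):

```python
from dataclasses import dataclass

@dataclass
class Node:
    """An S-expression atom or list, tagged with its source position."""
    value: object  # str for atoms, list of Node for lists
    line: int
    col: int

def infix_pass(node):
    """Rewrite (a + b) to (+ a b) while preserving positions: the
    rewritten node keeps the position of the expression it replaces,
    which acts as a tiny source map."""
    v = node.value
    if not isinstance(v, list):
        return node
    v = [infix_pass(n) for n in v]
    if len(v) == 3 and v[1].value == "+":
        v = [v[1], v[0], v[2]]
    return Node(v, node.line, node.col)
```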
Other ideas:
- print the decision tree (I guess it is more like a graph than a tree). Is this useful: good decision trees or lazy decision trees?
- print stack traces, like Rosie or arborist (`Arborist::GlobalDebug.enable!`) - visualise parsing
- Implementation of Tamias to Check Production Rules for Parsing Expression Grammar
- NezCC: A Cross-Language PEG Parser Generator, 2017
- Add optional semantic information in the grammar, for example
- types: number, string, etc.
- specific syntax constructs: matching brackets, comments, like in comby
- Macros for Domain-Specific Languages
Tutorials for PEG parsers
Grammars corpus
Another approach to finding an “ideal” grammar is to look at existing grammars and see which parts are good and which are bad - to learn from their experience:
- The Grammar Tool Box: A Case Study Comparing GLR Parsing Algorithms
- Grammar Zoo: A corpus of experimental grammarware
- Grammar Zoo
Zipf’s law, in probability, is the assertion that the frequencies f of certain events are inversely proportional to their rank r. The law was originally proposed by American linguist George Kingsley Zipf (1902–50) for the frequency of usage of different words in the English language; this frequency is given approximately by f(r) ≅ 0.1/r. Thus, the most common word (rank 1) in English, which is “the”, occurs about one-tenth of the time in a typical text; the next most common word (rank 2), which is “of”, occurs about one-twentieth of the time; and so forth. Another way of looking at this is that a rank-r word occurs 1/r times as often as the most frequent word, so the rank 2 word occurs half as often as the rank 1 word, the rank 3 word one-third as often, the rank 4 word one-fourth as often, and so forth. Beyond about rank 1,000, the law completely breaks down.
Is it possible to construct a common corpus of syntactic constructs (assuming they follow Zipf’s law) and validate new syntax against those cases?
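A rough sketch of such a validation, using a toy corpus of S-expression head symbols (the counts are made up for illustration):

```python
from collections import Counter

def zipf_check(heads):
    """Rank constructs by frequency and compare the observed counts
    with the Zipf prediction f(r) = f(1)/r."""
    ranked = Counter(heads).most_common()
    top = ranked[0][1]
    for rank, (head, freq) in enumerate(ranked, start=1):
        print(f"{rank:>2} {head:<8} observed={freq:>4} zipf={top / rank:6.1f}")

zipf_check(["define"] * 40 + ["if"] * 20 + ["let"] * 13 + ["lambda"] * 10)
```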
Other “parsers” for inspiration
- rosie
- REBOL 3 Concepts: Parsing: Parsing Blocks and Dialects
- Red Programming Language: Introducing Parse
- OMeta: an Object-Oriented Language for Pattern Matching
- Higher-Order Functions for Parsing
- parsec: Monadic parser combinators (LL(1))
- frisby (PEG)