Skip to content
Leo edited this page Apr 13, 2021 · 14 revisions

This page documents the syntax of Husk programs.

Program structure

A Husk program consists of one or more lines, each of which defines a function (or a constant value). The first line is the main function, which is called during execution, the rest are auxiliary functions. These functions can be accessed using subscript digits through , with being the main line. They are parsed individually, but you can parse a multi-digit number by prepending a space to a string of subscripts.

You can mark a line by adding one leading space to it. The subscript labels on unmarked lines ignore marked lines, and work on the list of unmarked lines instead. On marked lines, labels work normally and you can refer to any line, marked or unmarked. The purpose of this syntax is to let extra code to be easily added to a complete program without disturbing the labeling.

A subscript number larger than the number of lines is not automatically a syntax error. Suppose that there are N lines in a program (or N unmarked lines, if the label is on an unmarked line), and you have a label with value M. The label will refer to line number M mod N, and apply the modifier function with index M / N to it. The list of modifier functions can be found on the Commands page.

Expressions

Each line is a single expression. Expressions are either atomic (literals and built-ins), or constructed from other expressions by juxtaposition, grouping with (), or lambda abstraction. Parentheses are implicitly closed at end of line.

Juxtaposition can be thought of as an invisible infix operation that associates to the left: ABC really means (AB)C. The invisible operator has a nondeterministic type with the following implementations:

  • Function application: AB can be function A applied to value B. In a conventional programming language, this would be A(B).
  • Composition: AB can be the function that applies first B a value, then A to the result. Thus ABC can be interpreted as A(B(C)).
  • 2-composition: AB can be the function that applies first B to two values, then A to the result. Thus ABCD can be A(B(C,D)).
  • 3-composition: as above, but with three values. Thus ABCDE can be A(B(C,D,E)).
  • 4-composition: as above, but with four values. Thus ABCDEF can be A(B(C,D,E,F)).

In practice, you can usually think of Husk's syntax like this. Each token has some unknown arity (number of arguments), ranging from 0 to 4. The compiler guesses an arity for each token and parses the program accordingly. Missing arguments cause an expression to be interpreted as a function of those arguments. In reality, things are a bit more subtle and the guessing happens during type inference instead of parsing.

Literals

Numbers

A sequence of digits is interpreted as a number. If the sequence contains a period, it is interpreted as a floating point number, and if not, it is an integer. A period with no leading digit sequence is interpreted as 0., and a period with no trailing digit sequence is interpreted as .5. In particular, a lone . is equivalent to 0.5. Parsing is done greedily from left to right, so .1.2. is parsed as 0.1 0.2 0.5, while 1.2. is parsed as 1.2 0.5. Multiple consecutive numbers may be separated by spaces if needed. There are no negative number literals, so to get a negative number, use the negation function _.

Characters

A character literal has the format 'c, where c is any character. A string literal is a sequence of characters surrounded by double quotes ". The trailing quote can be omitted at end of line. The characters \"¦¨¶ and newlines should be escaped with backslashes. The unescaped characters ¦¨¶ can be used for backslash, double quote and newline, respectively. Semantically, strings are lists of characters.

Command line inputs

Input values given on the command line follow Haskell's formatting conventions. Negative numbers are denoted with the minus sign -, and digit sequences without dots are always interpreted as integers. Fractions use /, and the three special values are shown as Inf, -Inf and Any. Characters are in the format 'c' instead of 'c, and the characters \' and newlines must be escaped. Strings are not closed implicitly, and the shorthands ¦¨¶ are not available. Lists have the format [1,2,3], and pairs (1,'c').

Sometimes the type of an input value cannot be deduced unambiguously, like for an empty list, and this can affect type inference and program semantics. For this reason, types can be explicitly specified by appending : and the type. The syntax for types is a simple prefix grammar with the following tokens:

  • N for numbers, arity 0.
  • C for chars, arity 0.
  • L for lists, arity 1.
  • P for pairs, arity 2.

For example, an empty list of number-string pairs is denoted by []:LPNLC.

Output

The result of applying the main function defined in the program to the arguments provided is converted to a string and printed to Standard Output. The conversion is usually performed by the Haskell function show, but there are some exceptions:

  • If the type of the result is String (or equivalently [Char], a list of characters), this is printed directly without calling show.
  • If the type of the result is [String] (a list of strings), the strings are joined with newlines to a single string and then printed, using the Haskell function unlines.
  • If the type of the result is [[String]] (a list of lists of strings), the strings of each sublist are joined with spaces, using the Haskell function unwords, and then this is treated as the previous case.

Lambdas

The characters λμξφψχ are used for lambda abstraction. A single lambda abstraction creates two variables and makes them visible within its scope. If E is an expression, then λE) is an anonymous function of one argument. The argument can be accessed with the superscripts ⁰¹, and the closing parenthesis is optional at end of line. The exact semantics depend on which superscripts are used in E.

  • If ¹ is not present, then λE) evaluates to E with occurrences of replaced by the argument. If is also not present, the argument is ignored and E is returned as such.
  • If ¹ is present, then λE) evaluates to with occurrences of ¹ replaced by the argument x, and occurrences of replaced by a higher order function that takes a function, flips its arguments and evaluates it on x. This means there are two effects: the argument is implicitly appended to E, and each occurrence of ⁰F in E, where F is a function, becomes a shorthand for `F¹.

With nested lambdas, we use a variant of de Bruijn indexing with superscripts through : ⁰¹ refer to the variable of the closest lambda, ²³ the variable of the next one, and so on. In general, odd-numbered superscripts are implicitly inserted after the expressions if present, while even-numbered superscripts are not; if both are present, the even-numbered superscript refers to a higher-order function as specified above. Similarly to the line labels, a space followed by a string of superscript digits is parsed as a multi-digit lambda argument, which follows the same rules. The characters μξ are nested lambdas: μE) is the same as λλE)), and ξE) is the same as λλλE))), except that odd-numbered superscripts are inserted after the inner expression E, and not between the lambda expressions. The characters φψχ correspond to λμξ, but they define fixpoint lambdas: the first argument refers to the anonymous function itself. They can be used to implement inline recursion.

If you use a superscript digit that is larger than the number of enclosing lambdas, it will refer to an argument of the whole line. It causes the appropriate number of lambdas to be implicitly inserted at the beginning of the line, and odd-numbered superscripts are again implicitly appended to the line expression, inside the implicit lambdas.

Built-ins

Most Husk built-ins are one character long, and denote functions. For example, as mentioned above, _ is the negation function. The list of built-ins can be found on its own Wiki page.

Source formats

Husk programs can be written in three formats: Unicode, bytes, and verbose ASCII. In Unicode mode, the source file is simply read as a Unicode string. In bytes mode, the source is read as a byte string, and each byte is interpreted as a character via the Husk code page. Verbose mode is equivalent to Unicode mode, except that every non-ASCII character has an ASCII alias that can be used in its place, surrounded by backslashes. For example, can be replaced by \leq\. A literal backslash is \\. The list of aliases is given on a separate Wiki page.

Clone this wiki locally