STANDARDS.md

Introduction

This document describes a set of standards for all code under the Liqwid project. It also explains our reasoning for these choices, and acts as a living document of our practices for current and future contributors to the project. We intend for this document to evolve as our needs change, as well as act as a single point of truth for standards.

Motivation

The desired outcomes from the prescriptions in this document are as follows.

Increase consistency

Inconsistency is worse than any standard, as it requires us to track a large amount of case-specific information. Software development is already a difficult task due to the inherent complexities of the problems we seek to solve, as well as the inherent complexities foisted upon us by decades of bad historical choices we have no control over. For newcomers to a project and old hands alike, increased inconsistency translates to developmental friction, resulting in wasted time, frustration and ultimately, worse outcomes for the code in question.

To avoid putting ourselves into this boat, both currently and in the future, we must strive to be automatically consistent. Similar things should look similar; different things should look different; as much as possible, we must pick some rules and stick to them; and this has to be clear, explicit and well-motivated. This will ultimately benefit us, in both the short and the long term. The standards described here, as well as this document itself, are written with this foremost in mind.

Limit non-local information

There is a limited amount of space in a developer's skull; we all have bad days, and we forget things or make decisions that, perhaps, may not be ideal at the time. Therefore, limiting cognitive load is good for us, as it reduces the amount of trouble we can inflict due to said skull limitations. One of the worst contributors to cognitive load (after inconsistency) is non-local information: the requirement to have some understanding beyond the scope of the current unit of work. That unit of work can be a data type, a module, or even a whole project; in all cases, the more non-local information we require ourselves to hold in our minds, the less space that leaves for actually doing the task at hand, and the more errors we will introduce as a consequence.

Thus, we must limit the need for non-local information at all possible levels. 'Magic' of any sort must be avoided; as much locality as possible must be present everywhere; needless duplication of effort or result must be avoided. Thus, our work must be broken down into discrete, minimal, logical units, which can be analyzed, worked on, reviewed and tested in as much isolation as possible. This also applies to our external dependencies.

Thus, many of the decisions described here are oriented around limiting the amount of non-local knowledge required at all levels of the codebase. Additionally, we aim to avoid doing things 'just because we can' in a way that would be difficult for other Haskellers to follow, regardless of skill level.

Minimize impact of legacy

Haskell is a language that is older than some of the people currently writing it, and parts of its ecosystem are nearly as old. With age comes legacy, and much of it is based on historical decisions which we now know to be problematic or wrong. We can't avoid our history, but we can minimize its impact on our current work.

Thus, we aim to codify good practices in this document as seen today. We also try to avoid obvious 'sharp edges' by proscribing them away in a principled, consistent and justifiable manner.

Automate away drudgery

As developers, we should use our tools to make ourselves as productive as possible. There is no reason for us to do a task if a machine could do it for us, especially when this task is something boring or repetitive. We love Haskell as a language not least of all for its capability to abstract, to describe, and to make fun what other languages make dull or impossible; likewise, our work must do the same.

Many of the tool-related proscriptions and requirements in this document are driven by a desire to remove boring, repetitive tasks that don't need a human to perform. By removing the need for us to think about such things, we can focus on those things which do need a human; thus, we get more done, quicker.

Conventions

The words MUST, SHOULD, MUST NOT, SHOULD NOT and MAY are defined as per RFC 2119.

Tools

Compiler warning settings

The following warnings MUST be enabled for all builds of any project, or any project component:

  • -Wall
  • -Wcompat
  • -Wincomplete-record-updates
  • -Wincomplete-uni-patterns
  • -Wredundant-constraints
  • -Werror
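
As a sketch of how this might look in practice (assuming hpack's package.yaml, which this document's other rules already presume), the warnings can be enabled project-wide like so:

```yaml
# Hypothetical package.yaml fragment; applies to every component.
ghc-options:
  - -Wall
  - -Wcompat
  - -Wincomplete-record-updates
  - -Wincomplete-uni-patterns
  - -Wredundant-constraints
  - -Werror
```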

Justification

These options are suggested by Alexis King; the justifications for them can be found at the link. These fit well with our motivations, and thus, should be used everywhere. -Werror ensures that warnings cannot be ignored: this means that problems get fixed sooner.

Linting

Every source file MUST be free of warnings as produced by HLint, with default settings.

Justification

HLint automates away the detection of many common sources of boilerplate and inefficiency. It also describes many useful refactors, which in many cases make the code easier to read and understand. As this is fully automatic, it saves effort on our part, and ensures consistency across the codebase without us having to think about it.

Code formatting

Every source file MUST be formatted according to Fourmolu, with the following settings (as per its settings file):

  • indentation: 2
  • comma-style: leading
  • record-brace-space: true
  • indent-wheres: true
  • diff-friendly-import-export: true
  • respectful: true
  • haddock-style: multi-line
  • newlines-between-decls: 1

Each source code line MUST be at most 100 characters wide, and SHOULD be at most 80 characters wide.

Justification

Consistency is the most important goal of readable codebases. Having a single standard, automatically enforced, means that we can be sure that everything will look similar, and not have to spend time or mind-space ensuring that our code complies. Additionally, as Fourmolu follows Ormolu's opinionated style, anyone familiar with that layout will find our code familiar, which eases the learning curve.

Lines wider than 80 characters become difficult to read, especially when viewed on a split screen. Sometimes, we can't avoid longer lines (especially with more descriptive identifiers), but a line length of over 100 characters becomes difficult to read even without a split screen. We don't enforce a maximum of 80 characters for this exact reason; some judgment is allowed.

Code practices

Naming

camelCase MUST be used for all non-type, non-data-constructor names; otherwise, TitleCase MUST be used. Acronyms used as part of a naming identifier (such as 'JSON', 'API', etc) SHOULD be downcased; thus repairJson and fromHttpService are correct. Exceptions are allowed for external libraries (Aeson's parseJSON for example).
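
To illustrate (the names and placeholder bodies below are hypothetical, chosen to match the examples in the rule):

```haskell
-- Types and data constructors use TitleCase.
newtype HttpResponse = HttpResponse String

-- Functions use camelCase, with acronyms downcased ('Json', not 'JSON').
repairJson :: String -> String
repairJson = id -- identity placeholder; a real repair would go here

-- Likewise 'Http', not 'HTTP', within the identifier.
fromHttpService :: HttpResponse -> String
fromHttpService (HttpResponse body) = body
```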

Justification

camelCase for non-type, non-data-constructor names is a long-standing convention in Haskell (in fact, HLint checks for it); TitleCase for type names and data constructors is mandatory. Obeying such conventions reduces cognitive load, as it is common practice across the entire Haskell ecosystem. There is no particular standard regarding acronym casing: examples of always upcasing exist (Aeson), as do examples of downcasing (http-api-data). However, one choice should be made and held to, for as much consistency as possible.

Modules

All publicly facing modules (namely, those which are not listed in other-modules in package.yaml) MUST have explicit export lists.

All modules MUST use one of the following conventions for imports:

  • import Foo (Baz, Bar, quux)
  • import qualified Foo as F

Data types from qualified-imported modules SHOULD be imported unqualified by themselves:

import Data.Vector (Vector)
import qualified Data.Vector as Vector

The main exception is if such an import would cause a name clash:

-- no way to import both of these without clashing the Vector type name
import qualified Data.Vector as Vector
import qualified Data.Vector.Storable as VStorable

The sole exception to the two conventions above is a 'hiding import' to replace part of the functionality of Prelude:

-- replace the String-based readFile with a Text-based one
import Prelude hiding (readFile)
import Data.Text.IO (readFile)

Data constructors SHOULD be imported individually, rather than with the (..) wildcard. For example, given the following data type declaration:

module Quux where

data Foo = Bar Int | Baz

Its corresponding import should be:

import Quux (Foo (Bar, Baz))

For type class methods, the type class and its methods MUST be imported as so:

import Data.Aeson (FromJSON (fromJSON))

Qualified imports SHOULD use the entire module name (that is, the last component of its hierarchical name) as the prefix. For example:

import qualified Data.Vector as Vector

Exceptions are granted when:

  • The import would cause a name clash anyway (such as different vector modules); or
  • We have to import a data type qualified as well.

Justification

Explicit export lists are an immediate, clear and obvious indication of what publicly visible interface a module provides. It gives us stability guarantees (namely, we know we can change things that aren't exported and not break downstream code at compile time), and tells us where to go looking first when inspecting or learning the module. Additionally, it means there is less chance that implementation details 'leak' out of the module due to errors on the part of developers, especially new developers.

One of the biggest challenges for modules which depend on other modules (especially ones that come from the project, rather than an external library) is knowing where a given identifier's definition can be found. Having explicit imports of the form described helps make this search as straightforward as possible. This also limits cognitive load when examining the sources (if we don't import something, we don't need to care about it in general). Lastly, being explicit avoids stealing too many useful names.

In general, type names occur far more often in code than function calls: we have to use a type name every time we write a type signature, but it's unlikely we use only one function that operates on said type. Thus, we want to reduce the amount of extra noise needed to write a type name if possible. Additionally, name clashes from function names are far more likely than name clashes from type names: consider the number of types on which a size function makes sense. Thus, importing type names unqualified, even if the rest of the module is qualified, is good practice, and saves on a lot of prefixing.

LANGUAGE pragmata

The following pragmata MUST be enabled at project level (that is, in package.yaml):

  • DeriveFunctor
  • DerivingStrategies
  • EmptyCase
  • FlexibleContexts
  • FlexibleInstances
  • GeneralizedNewtypeDeriving
  • InstanceSigs
  • ImportQualifiedPost
  • LambdaCase
  • MultiParamTypeClasses
  • NoImplicitPrelude
  • OverloadedLabels
  • OverloadedStrings
  • TupleSections

Any other LANGUAGE pragmata MUST be enabled per-file. All language pragmata MUST be at the top of the source file, written as {-# LANGUAGE PragmaName #-}.

Furthermore, the following pragmata MUST NOT be used, or enabled, anywhere:

  • PartialTypeSignatures

Justification

DerivingStrategies is good practice (and in fact, is mandated by this document); it avoids ambiguities between GeneralizedNewtypeDeriving and DeriveAnyClass, allows considerable boilerplate savings through use of DerivingVia, and makes the intention of the derivation clear on immediate reading, reducing the amount of non-local information about derivation priorities that we have to retain.

DeriveFunctor and GeneralizedNewtypeDeriving are both obvious and useful extensions to the auto-derivation systems available in GHC. Both have only one correct derivation (the former given by parametricity guarantees, the latter by the fact that a newtype only wraps a single value). As there is no chance of unexpected behaviour from these, no possible behaviour variation, and as they're key to supporting both the stock and newtype deriving strategies, having them on by default removes considerable tedium and line noise from our code. A good example is newtype wrappers around monadic stacks:

newtype FooM a = FooM (ReaderT Int (StateT Text IO) a)
  deriving newtype
    ( Functor
    , Applicative
    , Monad
    , MonadReader Int
    , MonadState Text
    , MonadIO
    )

EmptyCase not being on by default is an inconsistency of Haskell 2010, as the report allows us to define an empty data type, but without this extension, we cannot exhaustively pattern match on it. This should be the default behaviour for reasons of symmetry.

FlexibleContexts and FlexibleInstances paper over a major deficiency of Haskell 2010, one which in general isn't well-motivated. There is no real reason to restrict type arguments to variables in either type class instances or type signatures: the reasons for this choice in Haskell 2010 are entirely for the convenience of the implementation. It produces no ambiguities, and in many ways, the fact this isn't the default is more surprising than anything. Additionally, many core libraries rely on one, or both, of these extensions being enabled (mtl is the most obvious example, but there are many others). Thus, even for popularity and compatibility reasons, these should be on by default.

InstanceSigs are harmless by default, and introduce no complications. Their not being default is strange. ImportQualifiedPost is already a convention of this project, and helps with formatting of imports.

LambdaCase reduces a lot of code in the common case of analysis of sum types. Without it, we are forced to either write a dummy case argument:

foo s = case s of
-- rest of code here

Or alternatively, we need multiple heads:

foo Bar = -- rest of code
foo (Baz x y) = -- rest of code
-- etc

LambdaCase is shorter than both of these, and avoids us having to bind variables, only to pattern match them away immediately. It is convenient, clear from context, and really should be part of the language to begin with.
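
As a small sketch (the Colour type and describe function are hypothetical), the LambdaCase version binds nothing:

```haskell
{-# LANGUAGE LambdaCase #-}

data Colour = Red | Green | Blue

-- No dummy argument: \case scrutinizes the implicit argument directly.
describe :: Colour -> String
describe = \case
  Red -> "red"
  Green -> "green"
  Blue -> "blue"
```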

MultiParamTypeClasses are required by a large number of standard Haskell libraries, including mtl and vector. Almost any project of non-trivial size must have this extension enabled somewhere, and if the code makes significant use of mtl-style monad transformers, or defines anything non-trivial for vector, it must use it. Additionally, it arguably lifts a purely implementation-driven restriction of the Haskell 2010 language, much like FlexibleContexts and FlexibleInstances. Lastly, although it can introduce ambiguity into type checking, this only matters when we define our own multi-parameter type classes, which is rarely necessary. Enabling it globally is thus safe and convenient.

Based on the recommendations of this document (driven by the needs of the project and its close connection to Plutus), NoImplicitPrelude is required to allow us to default to the Plutus prelude instead of the one from base.

OverloadedStrings deals with the problem that String is a suboptimal choice of string representation for basically any problem, with the general recommendation being to use Text instead. It is not, however, without its problems:

  • ByteStrings are treated as ASCII strings by their IsString instance;
  • Overly polymorphic behaviour of many functions (especially in the presence of type classes) forces extra type signatures;

These are usually caused not by the extension itself, but by other libraries and their implementations of either IsString or overly polymorphic use of type classes without appropriate laws (Aeson's KeyValue is a particularly egregious offender here). The convenience of this extension in the presence of literals, and the fact that our use cases mostly cover Text, makes it worth using by default.

TupleSections smooths out an oddity in the syntax of Haskell 2010 regarding partial application of tuple constructors. Given a function like foo :: Int -> String -> Bar, we accept it as natural that we can write foo 10 to get a function of type String -> Bar. However, by default, this logic doesn't apply to tuple constructors. As special cases are annoying to keep track of, and in this case, serve no purpose, as well as being clear from their consistent use, this should also be enabled by default; it's not clear why it isn't already.
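
A brief sketch of the analogy (tagAll is a hypothetical name):

```haskell
{-# LANGUAGE TupleSections #-}

-- (,"tagged") partially applies the pair constructor in its second
-- slot, just as (foo 10) partially applies an ordinary function.
tagAll :: [Int] -> [(Int, String)]
tagAll = map (,"tagged")
```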

The exclusion of PartialTypeSignatures is by design, as it creates confusing situations which are hard to understand.

Prelude

The PlutusTx.Prelude MUST be used. A 'hiding import' to remove functionality we want to replace SHOULD be used when necessary. If functionality from the Prelude in base is needed, it SHOULD be imported qualified. Other preludes MUST NOT be used.

Justification

As this is primarily a Plutus project, we are in some ways limited by what Plutus requires (and provides). Especially for on-chain code, the Plutus prelude is the one we need to use, and therefore, its use should be as friction-free as possible. As many modules may contain a mix of off-chain and on-chain code, we also want to keep impedance mismatches as limited as possible.

By the very nature of this project, we can assume a familiarity (or at least, the goal of such) with the Plutus ecosystem. Additionally, every Haskell developer is familiar with the Prelude from base. Thus, any replacements of Plutus prelude functionality with base prelude functionality should be clearly indicated locally.

Haskell is a 30-year-old language, and the Prelude is one of its biggest sources of legacy. A lot of its defaults are questionable at best, and often need replacing. As a consequence of this, a range of 'better Preludes' have been written, with a range of opinions: while there is a common core, a large number of decisions are opinionated in ways more appropriate to the authors of said alternatives and their needs than those of other users of said alternatives. This means that, when a non-base Prelude is in scope, it often requires familiarity with its specific decisions, in addition to whatever cognitive load the current module and its other imports impose. Given that we already use an alternative prelude (in tandem with the one from base), additional alternatives present an unnecessary cognitive load. Lastly, the dependency footprint of many alternative Preludes is highly non-trivial; it isn't clear if we need all of this in our dependency tree.

For all of the above reasons, the best choice is 'default to Plutus, with local replacements from base'.

Versioning

A project MUST use the PVP. Two, and only two, version numbers MUST be used: a major version and a minor version.

Justification

The Package Versioning Policy is the conventional Haskell versioning scheme, adopted by most packages on Hackage. It is clearly described, and even automatically verifiable by use of tools like policeman. Thus, adopting it is both in line with community standards (making it easier to remember), and simplifies cases such as Hackage publication or open-sourcing in general.

Two version numbers (major and minor) is the minimum allowed by the PVP, indicating compilation-breaking and compilation-non-breaking changes respectively. As parsimony is best, and more granularity than this isn't generally necessary, adopting this model is the right decision.

Documentation

Every publicly-exported definition MUST have a Haddock comment, detailing its purpose. If a definition is a function, it SHOULD also have examples of use using Bird tracks. The Haddock for a publicly-exported definition SHOULD also provide an explanation of any caveats, complexities of its use, or common issues a user is likely to encounter.

If the code project is a library, these Haddock comments SHOULD carry an @since annotation, stating what version of the library they were introduced in, or the last version where their functionality or type signature changed.

For type classes, their laws MUST be documented using a Haddock comment.

Justification

Code reading is a difficult task, especially when the 'why' rather than the 'how' of the code needs to be deduced. A good solution to this is documentation, especially when this documentation specifies common issues, provides examples of use, and generally states the rationale behind the definition.

For libraries, it is often important to inform users what changed in a given version, especially where 'major bumps' are concerned. While this would ideally be addressed with accurate changelogging, it can be difficult to give proper context. @since annotations provide a granular means to indicate the last time a definition changed considerably, allowing someone to quickly determine whether a version change affects something they are concerned with.

As stated elsewhere in the document, type classes having laws is critical to our ability to use equational reasoning, as well as a clear indication of what instances are and aren't permissible. These laws need to be clearly stated, as this assists both those seeking to understand the purpose of the type class, and also the expected behaviour of its instances.

Other

Lists SHOULD NOT be field values of types; this extends to Strings. Instead, Vector (or Text, in the case of strings) SHOULD be used, unless a more appropriate structure exists. On-chain code, due to a lack of alternatives, is the one place where lists may be used as field values of types.
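
As a sketch (the User type and greet function are hypothetical), an off-chain record following this rule uses Text rather than String; a collection-valued field would likewise use Vector rather than a list, omitted here only to keep the sketch dependency-light:

```haskell
import Data.Text (Text)
import qualified Data.Text as Text

-- Hypothetical off-chain record: Text, not String, as a field value.
data User = User
  { userName :: Text
  }

greet :: User -> Text
greet user = Text.append (Text.pack "hello, ") (userName user)
```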

Partial functions MUST NOT be defined. Partial functions SHOULD NOT be used except to ensure that another function is total (and the type system cannot be used to prove it).
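
For example, rather than the partial head from base, a total equivalent makes the empty case part of the type (safeHead is a hypothetical name for illustration):

```haskell
-- Total replacement for the partial 'head': the empty case is
-- represented in the return type, not as a runtime error.
safeHead :: [a] -> Maybe a
safeHead [] = Nothing
safeHead (x : _) = Just x
```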

Derivations MUST use an explicit strategy. Thus, the following is wrong:

newtype Foo = Foo (Bar Int)
    deriving (Eq, Show, Generic, FromJSON, ToJSON, Data, Typeable)

Instead, write it like this:

newtype Foo = Foo (Bar Int)
    deriving stock (Generic, Data, Typeable)
    deriving newtype (Eq, Show)
    deriving anyclass (FromJSON, ToJSON)

Deriving via SHOULD be preferred to newtype derivation, especially where the underlying type representation could change significantly.
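
A sketch of such a via-derivation (Score is a hypothetical type): the source of the Semigroup and Monoid behaviour is named explicitly, rather than inherited silently from whatever the newtype happens to wrap.

```haskell
{-# LANGUAGE DerivingVia #-}

import Data.Monoid (Sum)

-- Scores combine by addition; the instances are borrowed from Sum Int,
-- and the 'via' clause records that choice in the source.
newtype Score = Score Int
  deriving stock (Eq, Show)
  deriving (Semigroup, Monoid) via (Sum Int)
```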

type SHOULD NOT be used. The only acceptable case is abbreviation of large type-level computations. In particular, using type to create an abstraction boundary MUST NOT be done.
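
To sketch the contrast (Username, Password and shadow are hypothetical names):

```haskell
-- A synonym is no abstraction boundary: every String operation still
-- works on a Username directly.
type Username = String

-- A newtype is a genuine, zero-runtime-cost boundary with its own
-- operations.
newtype Password = Password String

shadow :: Password -> String
shadow (Password p) = map (const '*') p
```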

Justification

Haskell lists are a large example of the legacy of the language: they (in the form of singly linked lists) have played an important role in the development of functional programming (and for some 'functional' languages, continue to do so). However, from the perspective of data structures, they are suboptimal except for extremely specific use cases. In almost any situation involving data (rather than control flow), an alternative, better structure exists. Although it is both acceptable and efficient to use lists within functions (due to GHC's extensive fusion optimizations), from the point of view of field values, they are a poor choice from an efficiency perspective, both in theory and in practice. For almost all cases where you would want a list field value, a Vector field value is more appropriate, and in almost all others, some other structure (such as a Map) is even better.

Partial functions are runtime bombs waiting to explode. The number of times the 'impossible' happened, especially in production code, is significant in our experience, and most partiality is easily solvable. Allowing the compiler to support our efforts, rather than being blind to them, will help us write more clear, more robust, and more informative code. Partiality is also an example of legacy, and it is legacy of considerable weight. Sometimes, we do need an 'escape hatch' due to the impossibility of explaining what we want to the compiler; this should be the exception, not the rule.

Derivations are one of the most useful features of GHC, and extend the capabilities of Haskell 2010 considerably. However, with great power comes great ambiguity, especially when GeneralizedNewtypeDeriving is in use. While there is an unambiguous choice if no strategy is given, it becomes hard to remember. This is especially dire when GeneralizedNewtypeDeriving combines with DeriveAnyClass on a newtype. Explicit strategies give more precise control over this, and document the resulting behaviour locally. This reduces the number of things we need to remember, and allows more precise control when we need it. Lastly, in combination with DerivingVia, considerable boilerplate can be saved; in this case, explicit strategies are mandatory.

The main caveat concerns newtype deriving specifically: if we use a newtype derivation and then change the underlying type, we get no warning. Since this can drastically affect the behaviour of some type class instances, it is good to have the compiler check our consistency; via-derivation, which names its source type explicitly, gives us exactly that check.

type is generally a terrible idea in Haskell. You don't create an abstraction boundary with it (any operations on the 'underlying type' still work over it), and compiler output becomes very inconsistent (sometimes showing the type definition, sometimes the underlying type). If your goal is to create an abstraction boundary with its own operations, newtype is both cost-free and clearer; if that is not your goal, just use the type you'd otherwise rename, since it's equivalent semantically. The only reasonable use of type is to hide complex type-level computations, which would otherwise be too long. Even this is somewhat questionable, but the questionability comes from the type-level computation being hidden, not type as such.

Design practices

Parse, don't validate

Boolean blindness SHOULD NOT be used in the design of any function or API. Returning more meaningful data SHOULD be the preferred choice. The general principle of 'parse, don't validate' SHOULD guide design and implementation.
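
To sketch the difference (Age, isValidAge and parseAge are hypothetical, and the bound of 130 is an assumption for illustration):

```haskell
{-# LANGUAGE DerivingStrategies #-}

-- Boolean-blind: the caller learns only yes or no, and nothing stops
-- them from forgetting to check at all.
isValidAge :: Int -> Bool
isValidAge n = n >= 0 && n <= 130

-- Parsed: a successful result carries a witness (Age) that the check
-- has already happened, and the type system remembers it.
newtype Age = Age Int
  deriving stock (Eq, Show)

parseAge :: Int -> Maybe Age
parseAge n
  | isValidAge n = Just (Age n)
  | otherwise = Nothing
```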

Justification

The description of boolean blindness gives specific reasons why it is a poor design choice; additionally, it runs counter to the principle of 'parse, don't validate. While sometimes unavoidable, in many cases, it's possible to give back a more meaningful response than 'yes' or 'no, and we should endeavour to do this. Designs that avoid boolean blindness are more flexible, less bug-prone, and allow the type checker to assist us when writing. This, in turn, reduces cognitive load, improves our ability to refactor, and means fewer bugs from things the compiler could have checked if a function wasn't boolean-blind.

No multi-parameter type-classes without functional dependencies

Any multi-parameter type class MUST have a functional dependency restricting its relation to a one-to-many at most. In cases of true many-to-many relationships, type classes MUST NOT be used as a solution to the problem.
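
A sketch of such a class (MonadStore and its Map-backed instance are hypothetical): the dependencies 'm -> k, m -> v' tell the type checker that the monad determines its key and value types, restricting the relation to one-to-many at most.

```haskell
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE FunctionalDependencies #-}

import Control.Monad.State (State, gets, modify, runState)
import Data.Map (Map)
import qualified Data.Map as Map

-- The monad m determines both k and v, so instance resolution stays
-- unambiguous even with multiple parameters.
class Monad m => MonadStore k v m | m -> k, m -> v where
  store :: k -> v -> m ()
  retrieve :: k -> m (Maybe v)

-- One witness of the class, backed by a Map held in State.
instance Ord k => MonadStore k v (State (Map k v)) where
  store k v = modify (Map.insert k v)
  retrieve k = gets (Map.lookup k)
```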

Justification

Multi-parameter type classes allow us to express more complex relationships among types; single-parameter type classes effectively permit us to 'subset' Hask only. However, multi-parameter type classes make type inference extremely flaky, as the global coherence condition can often lead to the compiler being unable to determine what instance is sought even if all the type parameters are concrete, due to anyone being able to add a new instance at any time. This is largely caused by multi-parameter type classes defaulting to effectively representing arbitrary many-to-many relations.

When we do not have arbitrary many-to-many relations, multi-parameter type classes are useful and convenient. We can indicate this using functional dependencies, which inform the type checker that our relationship is not arbitrarily many-to-many, but rather many-to-one or even one-to-one. This is a standard practice in many libraries (mtl being the most ubiquitous example), and allows us the benefits of multi-parameter type classes without making type checking confusing and difficult.

In general, many-to-many relationships pose difficult design choices, for which type classes are not the correct solution. If a functional dependency cannot be provided for a type class, it suggests that the current design relies inherently on a many-to-many relation, and should be either rethought to eliminate it, or be dealt with using a more appropriate means.

Type classes must have laws

Any type class not imported from an external dependency MUST have laws. These laws MUST be documented in a Haddock comment on the type class definition, and all instances MUST follow these laws.
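
As a sketch of the expected style (Normalize and its law are hypothetical), laws go in the type class's Haddock comment, and every instance is obliged to satisfy them:

```haskell
-- | A hypothetical class for values with a canonical form.
--
-- = Laws
--
-- 1. Idempotence: @normalize (normalize x) == normalize x@
class Normalize a where
  normalize :: a -> a

-- This instance satisfies the law: abs is idempotent.
instance Normalize Int where
  normalize = abs
```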

Justification

Type classes are a powerful feature of Haskell, but can also be its most confusing. As they allow arbitrary ad-hoc polymorphism, and are globally visible, it is important that we limit the confusion this can produce. Additionally, type classes without laws inhibit equational reasoning, which is one of Haskell's biggest strengths, especially in the presence of what amounts to arbitrary ad-hoc polymorphism.

Additionally, type classes with laws allow the construction of provably correct abstractions above them. This is also a common feature in Haskell, ranging from profunctor optics to folds. If we define our own type classes, we want to be able to abstract above them with total certainty of correctness. Lawless type classes make this difficult to do: compare the number of abstractions built on Functor or Traversable as opposed to Foldable.

Thus, type classes having laws provides both ease of understanding and additional flexibility.