This document explains some syntactical innovations included in merd.
See here for a more detailed syntax description.

# Function Calls

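Here are the different syntaxes used for function calls:
• gcd(10, 4): math, Algol-like (C, C++, Java, Perl, Python...)
• gcd 10 4: ML (OCaml, Haskell...)
• (gcd 10 4): Lisp
• gcd(10,4,r): Prolog-style, the result being returned through the extra argument r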

Facts:

• people know the math notation well
• ML notation can be counter-intuitive, eg: map gcd(10) list when you mean map (gcd 10) list
• one solution is to use horizontal layout
• another solution is to disallow this syntax (rationale: keeping both notations is bad)
• ML notation enables easy partial application [1]
! partial application can be misleading if you don't know the function
! ML partial application mixes up two different beasts:
  • let f1 x y = x + y, which needs both arguments to compute something
  • let f2 x = (print "f2 called" ; fun y -> x + y), which really produces a function using the first argument.
The time of evaluation is different, but the type signatures of f1 and f2 are the same. This makes eta-expansion unsafe: f2 x is not equivalent to fun y -> f2 x y, since the former prints "f2 called" immediately whereas the latter prints it only once y is supplied.
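The evaluation-time difference can be reproduced outside ML. The Python sketch below assumes closures are a fair stand-in for ML's curried functions, and records the side effect in a list named `calls` instead of printing. It shows that applying f2 to its first argument runs the side effect at once, while the eta-expanded version defers it and re-runs it on every call.

```python
calls = []  # records the side effect instead of printing "f2 called"

def f1(x):
    # let f1 x y = x + y: nothing is computed until y is supplied
    return lambda y: x + y

def f2(x):
    # let f2 x = (print "f2 called" ; fun y -> x + y): work happens on the first argument
    calls.append("f2 called")
    return lambda y: x + y

partial = f2(10)        # like (f2 10): the side effect happens right away

def eta(y):             # like (fun y -> f2 10 y): nothing happens yet
    return f2(10)(y)

print(len(calls))       # 1
eta(1)
eta(2)
print(len(calls))       # 3: each call to eta re-runs the body of f2
```

Both f1(10) and f2(10) return a function from int to int, which is exactly why their types cannot tell the two cases apart.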

In merd, I propose to use gcd(10,) as sugar for x -> gcd(10,x), instead of ML's (gcd 10).
In gcd(10, ), I call the empty parameter a hole.

Example of use: length = foldl(, 0, (+ 1))
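Python's functools.partial only fixes leading positional arguments, so it cannot express a hole in an arbitrary position. Here is a minimal sketch of the hole sugar, assuming a hypothetical `HOLE` marker and a hypothetical `with_holes` helper (neither is part of merd or of Python): every `HOLE` becomes a parameter of the resulting function, filled left to right.

```python
import math

HOLE = object()  # hypothetical marker playing the role of merd's empty parameter

def with_holes(f, *args):
    """Return a function whose arguments fill the HOLE positions of args, left to right."""
    n_holes = sum(1 for a in args if a is HOLE)

    def applied(*fill):
        if len(fill) != n_holes:
            raise TypeError(f"expected {n_holes} argument(s), got {len(fill)}")
        it = iter(fill)
        return f(*(next(it) if a is HOLE else a for a in args))

    return applied

gcd_10 = with_holes(math.gcd, 10, HOLE)   # merd: gcd(10,)
square = with_holes(pow, HOLE, 2)         # merd: pow(,2), no flip needed
print(gcd_10(4), square(7))               # 2 49
```

The second case shows the point of holes: fixing the second parameter is no harder than fixing the first.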

Pros:

• keeps the simple math notation
• makes partial application explicit: with ML's (gcd 10) you can't tell whether it is a partial application, whereas gcd(10,) clearly is one
• partial application is as expressive as in ML
• no problem with partial application on the second parameter (=> no need for Haskell's flip) [2]
• no currying/uncurrying problem
• enables default values and overloading based on the number of parameters (think C++, Java)
• no evaluation time problem caused by partial application:
  • f1(x,y) = x+y has type Int,Int -> Int and is partially applied using f1(10,), whereas
  • f2(x) = (print "f2 called" ; y -> x+y) has type Int -> Int -> Int and is applied using f2(10) [3]
Cons:
• not as clean theoretically
• more sugared (Lisp doesn't need "," as a tuple constructor)

This hole can be used in function declarations too:

```
member?(e,) =
    [] -> False
    e : _ -> True
    _ : l -> member?(e,l)
```
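For comparison, the same declaration can be sketched in Python, assuming that the pattern e : _ compares the head of the list with the e already bound by the declaration. The name `member` is hypothetical (Python identifiers cannot end in ?); fixing e returns a function that still waits for the list, just like member?(e,).

```python
def member(e):
    """Sketch of merd's member?(e,): fixing e returns a function awaiting the list."""
    def go(lst):
        if not lst:            # [] -> False
            return False
        if lst[0] == e:        # e : _ -> True (head equals the already-bound e)
            return True
        return go(lst[1:])     # _ : l -> member?(e,l)
    return go

has_3 = member(3)              # merd: member?(3,)
print(has_3([1, 2, 3]), has_3([4, 5]))   # True False
```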

## is "id(1,2)" allowed when "id" expects one argument?

With "id(x) = x", one could allow "id(1,2)" where x's value is the tuple (1,2).

Disallowing this makes higher-order programming harder:

```
apply(f,x) = f(x)
myfirst(a,b) = apply((x,_ -> x), (a,b))
```
That's why merd will allow id(1,2) and rely on type-checking to catch misuses of functions.
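Python rejects both id(1,2) for a one-parameter function and f(x) with x a pair for a two-parameter function. The sketch below models the rule at run time with a hypothetical `call` helper that resolves argument-count mismatches through tuples; merd would instead decide this statically and rely on type-checking. `ident` stands for merd's id, since id is a Python builtin.

```python
import inspect

def call(f, *args):
    """Hypothetical run-time model of the rule; merd itself would rely on type-checking."""
    n = len(inspect.signature(f).parameters)
    if n == 1 and len(args) > 1:
        return f(args)                 # id(1,2): the single parameter receives (1,2)
    if len(args) == 1 and isinstance(args[0], tuple) and n > 1 and n == len(args[0]):
        return f(*args[0])             # f(x) with x = (a,b) and f expecting two parameters
    return f(*args)

def ident(x):
    return x

def apply(f, x):                       # apply(f,x) = f(x)
    return call(f, x)

def myfirst(a, b):                     # myfirst(a,b) = apply((x,_ -> x), (a,b))
    return apply(lambda x, _: x, (a, b))

print(call(ident, 1, 2))   # (1, 2)
print(myfirst("a", "b"))   # a
```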

# WYSIHIIP

WYSIHIIP = What You See Is How It Is Parsed

I coined this term to classify cases where parsing is misleading. It belongs to the more general principle of least surprise.

## Horizontal Layout

Here are some examples

• Perl/Ruby/Python: -3 ** 2 gives -9

• Haskell: -3 ^ 2 gives -9

• OCaml: - 3.**2. gives 9.

• Ruby: Math.sqrt (1-2).abs
will fail because it is parsed as (Math.sqrt(1-2)).abs [4]

• C: 1 + i>>4
is parsed as (1+i) >> 4

• ML (Haskell...): map gcd(10) list
is parsed as map gcd 10 list, when you meant map (gcd 10) list

• Perl: $c ? $i=2 : $j=3
is awful: Perl parses it as ($c ? ($i=2) : $j) = 3, and you don't get any warning
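Two of these cases are easy to check in Python, which shares C's rule that + binds tighter than >>. The value of i below is an arbitrary choice for illustration.

```python
i = 32

print(-3 ** 2)        # -9: ** binds tighter than unary minus
print((-3) ** 2)      # 9
print(1 + i>>4)       # 2: parsed (1 + i) >> 4, despite the spacing
print(1 + (i >> 4))   # 3: what the spacing suggests
```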

There are 2 (non-exclusive) solutions:

• Restrict or disallow those expressions
  • Lisp uses a radical solution with no sugar at all
  • recent versions of GCC warn about most operator mixing [5]
• Use horizontal layout to disambiguate
  • zero or one space makes a difference [6]
  • tokens not separated by spaces are parenthesized

So 1+2 * 3 is parsed as (1+2)*3.
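A minimal sketch of the horizontal layout rule, assuming integer literals and the binary operators + - * / only: each chunk without spaces is evaluated first, as if parenthesized, then the chunks are combined with the usual precedence. `eval_standard` and `eval_layout` are hypothetical names, not part of merd.

```python
import operator
import re

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}
PREC = {"+": 1, "-": 1, "*": 2, "/": 2}

def eval_standard(items):
    """Evaluate [operand, op, operand, ...] with the usual precedence, left-associative."""
    values, ops = [items[0]], []

    def reduce_top():
        b, a = values.pop(), values.pop()
        values.append(OPS[ops.pop()](a, b))

    for op, rhs in zip(items[1::2], items[2::2]):
        while ops and PREC[ops[-1]] >= PREC[op]:
            reduce_top()
        ops.append(op)
        values.append(rhs)
    while ops:
        reduce_top()
    return values[0]

def eval_layout(src):
    """Horizontal layout: every chunk without spaces is evaluated first, as if parenthesized."""
    items = []
    for chunk in src.split():
        if chunk in OPS:
            items.append(chunk)                           # a spaced operator
        else:
            tokens = re.findall(r"\d+|[-+*/]", chunk)
            items.append(eval_standard([int(t) if t.isdigit() else t for t in tokens]))
    return eval_standard(items)

print(eval_layout("1+2 * 3"))    # 9, i.e. (1+2)*3
print(eval_layout("1 + 2*3"))    # 7
print(eval_layout("1 + 2 * 3"))  # 7: usual precedence when spacing is uniform
```

When spacing is uniform, the rule falls back to ordinary precedence, so existing code written with consistent spacing keeps its meaning.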

## Indentation Based Grouping

It is also called vertical layout.
The classic example is
```
if (C1)
    if (C2)
        S1;
else
    S2;
```
which is terribly misleading because the indentation suggests that
• S2 is executed when !C1, whereas in fact
• S2 is executed when C1 && !C2

The solution is to base the grouping on indentation.

Pros:

• standardizes the code style
Cons:
• tabs can make everything go wild, so tabs must either be forbidden or the tab size must be specified in the source code (via a pragma) (and anyway, don't use tabs in any language)
• fully automatic indentation is not possible (think emacs), but intelligent editors can make things smooth (try the emacs modes for Python or Haskell)
• may need special syntax to override indentation rules, which complicates the language

### proposal

merd completely generalizes the layout scheme found in Haskell (Python's layout is even simpler):
```
aaaaa
  bbb
    c
    c
aaaaa
```
is the same as (aaaaa ; (bbb ; (c) ; (c))) ; (aaaaa)
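The grouping above can be sketched in a few lines of Python, assuming each line is one item and that a line nests under the nearest previous, less indented line. The function names are hypothetical; following the tab advice above, the sketch rejects tabs in indentation, but it does not reject inconsistent dedents the way Haskell or Python would.

```python
def parse_layout(source):
    """Group lines by indentation: a line nests under the nearest previous, less indented line."""
    root = ("", [])
    stack = [(-1, root)]                 # (indentation, node) for the current chain of parents
    for line in source.splitlines():
        if not line.strip():
            continue                     # blank lines carry no layout information
        body = line.lstrip(" ")
        if body.startswith("\t"):
            raise ValueError("tabs are not allowed in indentation")
        indent = len(line) - len(body)
        while stack[-1][0] >= indent:
            stack.pop()                  # close every group at this depth or deeper
        node = (body.strip(), [])
        stack[-1][1][1].append(node)     # attach to the enclosing node's children
        stack.append((indent, node))
    return root[1]

def render(node):
    text, children = node
    return "(" + " ; ".join([text] + [render(c) for c in children]) + ")"

def layout(source):
    return " ; ".join(render(n) for n in parse_layout(source))

print(layout("aaaaa\n  bbb\n    c\n    c\naaaaa"))
# (aaaaa ; (bbb ; (c) ; (c))) ; (aaaaa)
```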

# Choosing the operator and function names

## Choice of functions name

Rules for choosing:
• choose the most commonly used function names (cf Syntax Across Languages)
• keep the whole consistent
• choose the longer name if it enhances readability and the function is not used very often (Huffman compression) (eg: rev vs reverse)
• choose the shorter name when the longer one doesn't enhance readability (eg: foldl vs fold_left) ??
• use "_" as word separator: separate_all_words_with_underscores instead of capitalizeTheSecondaryWords and CamelCase. (some rationale: GNU Coding Standards (Stallman), Ada, Eiffel, glasses emacs mode, various)

## Choice of operators name

See Syntax Across Languages to see what other languages are using.

• ``.'' most common method invocation operator (C++, Java, Python, Beta, Cecil, Delphi, Eiffel, Sather, Modula-3, Ruby, Visual Basic, Icon).
• ``::'' common package resolution operator (C++, Perl). The ``.'' operator can't be used for this (as it is in Java, Python, Ruby, Modula-3), otherwise Module.method(para) would mean method(Module, para), whereas once imported the function is used as method(para). In other words, Module.method(para) would need a special syntax rule, which would prevent modules from being first-class values.
• ``{ ... }'' record selector
• ``:='', ``='' both are assignment/declaration operators, with different priorities.
• ``!!'' type operator
• ``#'' the most standard Unix commenting char (Perl, Ruby, Python, Tcl, Icon, Awk, Shell)
• ``+'' string concatenation (Ruby, Python, Java, C++)
• ``+'' list concatenation (Ruby, Python)
• ``[ a, b, c ]'' list constructor (Haskell)
• ``||'' logical or

# Operator priorities (precedence)

Instead of numbered priorities, it would be better to do it the Cecil way: define a partial-order relation on operators, so that operators with no declared relation cannot be mixed without parentheses.
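A minimal sketch of such a partial order, assuming precedence declarations are pairs closed under transitivity. The declared pairs are a hypothetical example, not merd's or Cecil's actual table; any two operators left unrelated must be parenthesized when mixed.

```python
from itertools import product

# Hypothetical declarations: (a, b) means "a binds tighter than b".
DECLARED = {("*", "+"), ("+", "=="), ("==", "&&"), ("&&", "||")}

def transitive_closure(pairs):
    rel = set(pairs)
    while True:
        new = {(a, d) for (a, b), (c, d) in product(rel, repeat=2) if b == c} - rel
        if not new:
            return rel
        rel |= new

TIGHTER = transitive_closure(DECLARED)

def tighter(op1, op2):
    """Return the operator binding tighter, or None when the two are unrelated."""
    if (op1, op2) in TIGHTER:
        return op1
    if (op2, op1) in TIGHTER:
        return op2
    return None  # unrelated: mixing them requires explicit parentheses

print(tighter("*", "||"))   # * (derived: * > + > == > && > ||)
print(tighter("+", ">>"))   # None: 1 + i>>4 would have to be parenthesized
```

Leaving + and >> unrelated is what would catch the C example from the WYSIHIIP section.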

# Various

## Association Variable Name & Type

### Introduction

Do you know FORTRAN? No? Well, FORTRAN didn't require explicit typing. Instead it had implicit typing based on the variable name: variables whose names start with I, J, K, L, M or N are ints and all others are floats. Of course, having a type tied to each variable name is very limiting. That's why, since FORTRAN, languages have avoided this feature.

But people like the idea. Hungarian notation is based on it:

Long, long ago in the early days of DOS, Microsoft's Chief Architect Dr. Charles Simonyi introduced an identifier naming convention that adds a prefix to the identifier name to indicate the functional type of the identifier.
A big limitation of Hungarian notation is that it is only a convention, not enforced by the C compiler [7]. It also takes away some readability.

Perl is another case of associating variable name and type: it uses the prefixes $, @ and %. This is quite verbose, as most variables are $-prefixed. It doesn't help readability and lowers expressivity.

### Proposal

Give the programmer the ability to associate a variable name with a type. This is different from a global variable: it only states that every time a variable with this name is used, its type must be compatible. eg (inspired by Haskell's Prelude):
```
vartype c = Char

isDigit c =  c >= '0' && c <= '9'
...
primExitWith :: Int -> IO a
primExitWith c = IO (\ f s -> Hugs_ExitWith c)
```
will fail to typecheck because c is used as an Int in primExitWith.

Another example inspired by Scheme:

```
vartype ".*\?" = a -> Bool
vartype ".*\!" = a -> Unit
```
This enforces the convention that functions of the form xxx? are predicates and xxx! are mutators.

A good scope for this association is the module. Exporting the association seems a nice way to ensure consistent behaviour across modules.
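A minimal sketch of the check a typechecker could perform, under heavy simplifying assumptions: types are strings or ("->", argument, result) tuples, lowercase strings are type variables, and the inferred type of each variable is already known. `VARTYPES`, `matches` and `check` are hypothetical names; a real implementation would hook into type inference and produce the special error messages mentioned below.

```python
import re

# Hypothetical vartype table: (name regex, declared type).
# Types are strings or ("->", arg, result) tuples; lowercase strings are type variables.
VARTYPES = [
    (r"c", "Char"),
    (r".*\?", ("->", "a", "Bool")),   # vartype ".*\?" = a -> Bool
    (r".*!", ("->", "a", "Unit")),    # vartype ".*\!" = a -> Unit
]

def matches(declared, actual, env):
    """One-way matching of a declared type against an inferred one."""
    if isinstance(declared, str) and declared.islower():   # type variable
        return env.setdefault(declared, actual) == actual
    if isinstance(declared, tuple) and isinstance(actual, tuple):
        return len(declared) == len(actual) and all(
            matches(d, a, env) for d, a in zip(declared, actual))
    return declared == actual

def check(name, inferred):
    """Return an error message if name's inferred type violates a vartype, else None."""
    for pattern, declared in VARTYPES:
        if re.fullmatch(pattern, name) and not matches(declared, inferred, {}):
            return f"{name} is declared {declared} but used as {inferred}"
    return None

print(check("c", "Char"))                       # None (isDigit c)
print(check("c", "Int"))                        # an error message (primExitWith c)
print(check("member?", ("->", "Int", "Bool")))  # None
```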

Pros:

• stricter typechecking while retaining expressivity
• gives the ability to ensure a common behaviour for some variables
Cons:
• variable type and usage are separated. Good error reporting is needed: the typechecker must issue special error messages when variables are global-typed.
• complicates the language
• may complicate the typechecker (for type error reporting)

## Open-ended lists

```
animals = [
    "cat",
    "dog",
]
```
is not valid Haskell code because of the trailing comma. This is very annoying because the last line must be treated differently from the others.
(OCaml, Python, Ruby, Perl, C... are ok.)

But beware, it also means that

```
f(foo,
  bar,)
f(foo,
  bar,
)
```
are not the same. The first introduces a hole, but not the second one.
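A minimal sketch of how a parser could tell the two apart, assuming the rule is: an empty last slot on the same line as the closing parenthesis is a hole, while a trailing comma followed by a newline merely ends an open-ended list. `split_args` is a hypothetical helper; it splits naively on commas and ignores nested parentheses and strings.

```python
def split_args(call):
    """Hypothetical sketch of the rule for the arguments of a call such as f(...).

    An empty last slot on the same line as ')' is a hole; a trailing comma
    followed by a newline before ')' merely ends an open-ended list.
    Nested parentheses and strings are not handled.
    """
    inner = call[call.index("(") + 1 : call.rindex(")")]
    if not inner.strip():
        return []                                # f() has no arguments at all
    parts = inner.split(",")
    if len(parts) > 1 and not parts[-1].strip() and "\n" in parts[-1]:
        parts = parts[:-1]                       # trailing comma at end of line: ignored
    return ["<hole>" if not p.strip() else p.strip() for p in parts]

print(split_args("f(foo,\n  bar,)"))    # ['foo', 'bar', '<hole>']
print(split_args("f(foo,\n  bar,\n)"))  # ['foo', 'bar']
```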

## One element tuple

### Why is 1-uple needed?

In languages that allow computing on tuples (eg: (1,2) + (3,4) => (1,2,3,4)), 1-element tuples are necessary. Otherwise you have to allow:
```
(1,2) + 3 => (1,2,3)
```
which is bad for catching errors (at compile-time for merd, at run-time for Python).

The ability to compute on tuples is very important to handle things like a compile-time typed printf, or things like macro-processing.
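Python illustrates the run-time side of this: tuple concatenation with + requires the right operand to be a tuple, so appending a single element needs the 1-uple syntax (3,). The helper name `concat_or_error` is hypothetical and only turns the exception into a printable message.

```python
def concat_or_error(a, b):
    """Concatenate with +, reporting the run-time TypeError instead of raising it."""
    try:
        return a + b
    except TypeError as err:
        return f"TypeError: {err}"

print(concat_or_error((1, 2), (3, 4)))   # (1, 2, 3, 4)
print(concat_or_error((1, 2), (3,)))     # (1, 2, 3): the 1-uple (3,) is required
print(concat_or_error((1, 2), 3))        # a TypeError message, detected only at run time
print((3) == 3, (3,) == 3)               # True False: plain parentheses only group
```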

### The 1-uple syntax issue

merd uses the comma to construct tuples. Alas this doesn't handle 0-uple and 1-uple.
• the most commonly accepted syntax for empty tuples is "()".
• since merd makes a distinction between a value and the 1-uple containing that value (for better type checking), you need a way to write this 1-uple.
• Python uses "(a,)". This syntactic construct is already used in merd for partial application.
• Perl uses parentheses for both grouping and tuples (called lists). This causes some problems:
```
"Hello "  x 3   #=> "Hello Hello Hello "
("Hello ") x 3   #=> ("Hello ", "Hello ", "Hello ")
("Hello " . world()) x 3   #=> ("Hello world", "Hello world", "Hello world")
```
In ("Hello " . world()) x 3, the parentheses are necessary because "x" has a higher priority than ".". As a result, you must write ("Hello " . world())[0] x 3 to get a single repeated string instead of a repeated list.
• To escape the ambiguity of using parentheses for both grouping and tuples, merd's parentheses have a tuple meaning only when doubled: (2) is equivalent to 2 whereas ((2)) is the 1-uple containing 2.
• Of course, 1-uple could also be handled using a normal function: "tuple(2)" would be the 1-uple containing 2. No need for special sugar.

Some examples in the various syntaxes:

| raw | Python | merd |
|---|---|---|
| tuple(2) | 2, | ((2)) |
| tuple(1,2) | 1,2 | 1,2 |
| tuple(tuple(1,2)) | (1,2), | ((1,2)) |
| tuple(tuple(2)) | (2,), | ((((2)))) |
| tuple(1) + tuple(2) --> tuple(1,2) | 1, + 2, --> (1,2), | ((1)) + ((2)) --> ((1,2)) |
| tuple(tuple(1,2)) + tuple(tuple(3,4)) --> tuple(tuple(1,2), tuple(3,4)) | ((1,2),) + ((3,4),) --> ((1,2), (3,4)), | ((1,2)) + ((3,4)) --> ((1,2), (3,4)) |

## Recursivity

• Haskell: a variable definition is recursive (unless you define another value in a where clause)
• OCaml: a variable definition is not recursive. Recursive functions are introduced using a special construct let rec.
• Merd: a function definition is recursive, a variable definition is not. Whether a declaration defines a variable or a function is decided by the syntax. Examples of function declarations:
```
f(x) = x
f := x -> x
```

# Notes

[1] This problem is partially solved in OCaml with named parameters.

[2] There is a work-around in Haskell for partially applying the second parameter: "(`f` 2)" is "(\x -> f x 2)".

[3] And eta expansion is preserved:

• f1(10,) is the same as x -> f1(10,x) by definition of partial application
• f2(10) is the same as x -> f2(10)(x)

But note that evaluation time is kind of weird in merd. Partial evaluation is used...

[4] Even worse, return (1-2).abs is parsed as return((1-2).abs), which shows that return is parsed differently even though it has the same functional syntax as Math.sqrt: return has a lower precedence.

Another non-WYSIHIIP Ruby example: p (1..10).to_a is parsed as (p(1..10)).to_a.

An example of why raising the priority of method calls would fail is sin(0.7).to_i.

"ruby -w" catches most of these problems, so use it!

[5] You can't even rely on && having precedence over || without getting
``warning: suggest parentheses around && within ||''

[6] Experimentation is needed to know whether this rule could work with more than one space, eg:
1 + 2  *  3 parsed as (1+2)*3

[7] Associating a type with a variable is not easy, especially in C where coercions are everyday life. I don't think it would be possible to enforce the association without losing a lot of expressivity.

Pixel