Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Intrinsic type operators and genericity #21

Open
wclodius2 opened this issue Oct 26, 2020 · 9 comments
Open

Intrinsic type operators and genericity #21

wclodius2 opened this issue Oct 26, 2020 · 9 comments

Comments

@wclodius2
Copy link
Contributor

In generics/theory/intrinsics/Readme.md we find the odd statements

"The purpose of this directory is to explore issues related to ensuring that intrinsic types are on an equal footing with user defined types.

The main concern has to do with "concepts" (ref C++). If Fortran generics provide a similar mechanism for restricting type-name parameters in a template, it would presumably be based upon the interfaces defined by type-bound procedures. But intrinsic types do not have type-bound procedures."

I say it is odd primarily because of the phrase "it would presumably be based upon the interfaces defined by type-bound procedures". While the early enforcement of type safety in generics, what C++ calls "strong concepts", in Fortran would have to be based upon interfaces, they do not have to be "type-bound" procedure interfaces. I can think of two other ways of specifying operator interfaces for type safety: either implicitly as a type attribute, or as separate generic parameters.

Ada in its generics can qualify the type parameters by what is essentially an attribute specifying the operations a type must support. These include assignment, equality operators, comparison operators, and numerical operators. In Fortran these could be specified by extending the type-attr-spec-list of the derived-type-stmt

derived-type-stmt is TYPE [ [ , type-attr-spec-list ] :: ] type-name [ ( type-param-name-list ) ]

to add, say, a required operators specifier, required-ops-spec

type-attr-spec is ABSTRACT
or access-spec
or BIND (C)
or EXTENDS ( parent-type-name )
or required-ops-spec

where required-ops-spec might be

required-ops-spec is BASIC
or EQUALITY
or COMPARABLE
or GROUP
or FIELD
or NUMERIC
or CONCATENABLE

with the following requirements on the operators supported by the type

required-ops-spec Required operations
BASIC Assignment
EQUALITY Assignment, ==, /=
COMPARABLE Assignment, ==, /=, >, >=, <, <=
GROUP Assignment, +, -
FIELD Assignment, +, -, *, /
NUMERIC Assignment, +, -, *, /, **
CONCATENABLE Assignment, //

The example of the Readme.md then becomes

TYPE, GROUP :: T
END TYPE T

The above approach has four weaknesses:

  1. It requires the instantiation type to define all the operations of the required-ops-spec, when only a subset may be needed;
  2. It assumes that both arguments to the operator are of the same type;
  3. It doesn't allow operator renaming, e.g., if the generic module consistently sorts using the > operator and the user wants to substitute the < operator to sort in the opposite order;
  4. It can't deal with user defined operator names, e.g., .operator..

All four weaknesses can be addressed by allowing generics to take operators as parameters, say by using OPERATOR(defined-operator) as a parameter. Then the example becomes in Fortranesque pseudo-code

generic module example(T, operator(+))
    type, basic :: T
    end type T
    interface operator(+)
        function plus(x, y) result(z)
            import T
            type(T), intent(in) :: x, y
            type(T) :: z
        end function plus
    end interface
    ...
end module example

Then an instantiation might be

use:: rename => example(real(real64), operator(+))

or if you want to substitute the * operator for the +

use:: rename => example(real(real64), operator(*))

As operators are just symbols for functions with one or two arguments, the standard could even allow renaming using functions as the instantiation

use:: rename => example(real(real64), my_fun)

The above is easily generalized to operators with non-uniform arguments

generic module example(T, U, V, operator(+))
    type, basic :: T
    end type T
    type, basic :: U
    end type U
    type, basic :: V
    end type V
    interface operator(+)
        function plus(x, y) result(z)
            import T, U, V
            type(T), intent(in) :: x
            type(U), intent(in) :: y
            type(V) :: z
        end function plus
    end interface
    ...
end module example


use:: rename => example(real(real64), complex(real64), complex(real64), operator(+))
@mleair
Copy link

mleair commented Oct 26, 2020

I am having trouble understanding how an "operator" parameter adds value here. Can you put together a use case that demonstrares why this is better than other coding styles such as user defined operators?

@wclodius2
Copy link
Contributor Author

wclodius2 commented Oct 27, 2020

The OPERATOR(defined-operator) syntax is currently used in three ways:

  1. To assign an operator "name" as an alias for a function of one or two arguments, whose names will not be important;
  2. To tell the processor to make that name public or private; and
  3. In the only list to tell the processor that that name will be one of the ones imported from a module.

I propose to extend that syntax to provide syntactic sugar in two mutually related generic contexts:

  1. On the generic definition side to denote an input parameter that is a "function" of one or two arguments to be used within the generic definition with operator syntax; and
  2. On the instantiation side to tell the processor to search the list of "functions" with that "name" as their interface to find one with arguments and results that correspond to the interface for the corresponding parameter in the generic definition statement.

For the purposes of the above the intrinsic operators are considered functions that are (currently) only available under their operator names.

Examples of generic instantiations where I would want to use a function of one or two parameters with operator syntax:

  • Hash tables need to compare hash values and keys for equality and inequality in entering and searching for data. I would prefer to use key1 == key2 over equals( key1, key2 );
  • Red-Black trees need to compare keys for equality and relative value in entering and searching for data. I would prefer to use key1 == key2 over equals( key1, key2 ), and key1 < key2 over less_than( key1, key2 );
  • Sorting need to compare values for equality and relative value. I would prefer to use value1 == value2 over equals( value1, value2 ), and value1 < value2 over less_than( value1, value2 ); and
  • Matrices use mathematical operations to define matrix addition and the forms of multiplication. I would prefer to to use matrix1(i,j)+matrix2(i,j) over plus(matrix1(i,j),matrix2(i,j)) and matrix1(i,j)*matrix2(j,k) over times(matrix1(i,j),matrix2(),k).

Examples of generic definitions where I would want to search among a list of operator names for the appropriate function would include any operators for the intrinsic types or derived types where the underlying name for the operator's function is private:

  • A hash table whose key is default character;
  • A hash table whose key is a derived type implementing the UCS;
  • Red-Black trees whose key is default character;
  • Sorting an array whose elements are an intrinsic type;
  • Matrices whose elements are a real or complex type; and
  • Matrices whose elements are a quaternion derived type whose underlying addition and multiplication operations are privater.

Note edited to change one use of parameters to arguments.

@wclodius2
Copy link
Contributor Author

For a use case to consider alternative syntaxes, let us follow Tom Clune's example and consider a generic where we want to use a plus operator of uniform type T_generic.

Tom asserts that the only way to do this and maintain "concepts" is to use interfaces defined by type bound procedures. This would have a syntax somewhat like the following.

generic module uses_plus(T_generic)
    type :: T_generic
    contains
        procedure :: plus
        generic :: operator(+) => plus
    end type
    interface
       function plus(a,b) result(c)
          type(T_generic) :: c
          class(T_generic), intent(in) :: a
          type(T_generic), intent(in) :: b
       end function plus
    end interface

which might be instantiated as

use :: plus_module => uses_plus(T_instantiation)

This has the advantage that the only argument you have to pass is a single type so it will work with the most restrictive definition of generics. It has as its main disadvantage that it is difficult to to make this work with intrinsic types as the instantiation type. Tom tries a number of approaches to try to make this work for intrinsics, but I am not happy with any of them. It has as an additional disadvantage that the only function that can be used for the + operation is the one named + in the instantiation type. It also doesn't consider the impact of non-type bound procedures.

As one alternative I suggest type attributes. With them the above example might be written:

generic module uses_plus(T_generic)
    type, group :: T_generic
    end type

which might be instantiated as

use :: plus_module => uses_plus(T_instantiation)

This has the following advantages:

  • the only argument you have to pass is a single type so it will work with the most restrictive definition of generics; and
  • it will work with the intrinsic types.

It has the following disadvantages:

  • The language has to add and define a new set of attributes for the derived type definition statement;
  • The instantiation type also has to define a - operator; and
  • The only function that can be used for the + operation is the one named + in the instantiation type.

As a partial workaround, one might consider allowing procedures as generic parameters. Then Tom's example becomes:

generic module uses_plus(T_generic, plus)
    type :: T_generic
    end type
    interface operator(+)
       function plus(a,b) result(c)
          type(T_generic):: c
          type(T_generic), intent(in) :: a
          type(T_generic), intent(in) :: b
       end function plus
    end interface

which might be instantiated as

use :: plus_module => uses_plus(T_instantiation, plus_instantiation)

This has the following disadvantages:

  • Generics have to be generalized to allow procedures as parameters; and
  • The function plus_instantiation has to be available on the instantiation side.
    • Difficult to make work with the intrinsic types; and
    • Exposes the implementation of derived types.

As a generalization of allowing procedures as generic parameters, I suggested allowing operators as generic parameters. Then the example might be written:

generic module uses_plus(T_generic, plus)
    type :: T_generic
    end type
    interface operator(+)
       function plus(a,b) result(c)
          type(T_generic):: c
          type(T_generic), intent(in) :: a
          type(T_generic), intent(in) :: b
       end function plus
    end interface

which might be instantiated as

use :: plus_module => uses_plus(T_instantiation, operator(+))

or as

generic module uses_plus(T_generic, operator(+))
    type :: T_generic
    end type
    interface operator(+)
       function plus(a,b) result(c)
          type(T_generic):: c
          type(T_generic), intent(in) :: a
          type(T_generic), intent(in) :: b
       end function plus
    end interface

which might be instantiated as

use :: plus_module => uses_plus(T_instantiation, operator(+))

This has the following advantages over Tom's approach:

  • It will easily work with the intrinsic types;
  • It allows renaming for flexibility.

Its main disadvantage is that it requires a more flexible, and hence more complicated, generic definition. If you are going to allow operators as parameters, you are almost certainly going to want to allow procedures as parameters. I would argue that to represent ranks and dimensions you will also want to allow integers and integer vectors as generic parameters, so you will wind up with a definition significantly more complicated than that of, say, Java.

@tclune
Copy link
Member

tclune commented Oct 27, 2020

@wclodius2 For your final example:

generic module uses_plus(T_generic, operator(+))
    type :: T_generic
    end type
    interface operator(+)
       function plus(a,b) result(c)
          type(T_generic):: c
          type(T_generic), intent(in) :: a
          type(T_generic), intent(in) :: b
       end function plus
    end interface

how would this work with a derived type that has defined is + as a type-bound procedure? There would no operator(+) to pass in at the point of instantiation. (At least presuming the actual procedure was itself private, and thus can only be accessed via the object.)

@tclune
Copy link
Member

tclune commented Oct 27, 2020

The issue of type-bound vs non type-bound operators is only one aspect that I was exploring with the directory that started this discussion.

Consider this case. I want to have a template that is a simple container that stores a single object whose type is defined by the type parameter T. And provide setters and getters.

The template could look something like

type :: container<T>
    type(T), allocatable :: object
contains
    procedure :: set
    procedure :: get
end type
abstract interface
      subroutine set(this, value)
            class(container<T>), intent(out) ::this
            type(T), intent(in) :: value
            this%value = value
      end subroutine set

      subroutine get(this, value)
           class(container<T>), intent(in) :: this
           type(T), intent(out) :: value
      end subroutine get
end interface

Something like that ought to work for most intrinsics and simple derived types, but what if the type has a len type prarameter? Consider how we want the template to be instantiated when T is character. The component in the type should be declared as character(:), allocatable :: value, and the value dummy in set should be character(*), intent(in) :: value, and the value dummy in get should be character(:), allocatable :: value

How is the compiler to "know" that we meant for T to be declared differently in the case of character (or indeed any type with a len type parameter? It is not impossible to do, but its at best an awkward aspect of all this.

A similar concern is whether or not a generic type parameter can include rank. I.e, could I declare T in the above example to be a rank 3 integer array? Or only scalars? A rank 3 integer array is a different type than a scalar integer in most other languages. We can solve this one by allowing an additional integer template parameter to specify the length -- hopefully allowing a default of 0 so that we don't burden the most common case.

None of these issues is likely to undermine an otherwise solid solution for generics. But I do want to watch for solutions which allow such aspects to be handled as seamlessly as possible.

@tclune
Copy link
Member

tclune commented Oct 28, 2020

I now realize that I've really not even mentioned the primary reason that Magne asked me to explore this. Namely we don't want the user to have to "teach" the compiler what operations are supported by the compiler. There is some concern that the arithmetic operators on intrinsics are handled at such a low level that it is not trivial for the compilers to determine that a given operation is supported at the more abstract level of a template short of instantiating and compiling. With traits/requirements/constraints we want the compiler to have this higher level understanding of intrinsic operations. Probably will not play a role in how the language specs are written, but rather as a way to convince the compiler developers that this is straightforward.

@wclodius2
Copy link
Contributor Author

wclodius2 commented Oct 28, 2020

@wclodius2 For your final example:

generic module uses_plus(T_generic, operator(+))
    type :: T_generic
    end type
    interface operator(+)
       function plus(a,b) result(c)
          type(T_generic):: c
          type(T_generic), intent(in) :: a
          type(T_generic), intent(in) :: b
       end function plus
    end interface

how would this work with a derived type that has defined is + as a type-bound procedure? There would no operator(+) to pass in at the point of instantiation. (At least presuming the actual procedure was itself private, and thus can only be accessed via the object.)

There are five "forms" of operators that must be dealt with:

  1. Intrinsic

  2. Fortran 90 style non-polymorphic non-type-bound operators such as my example. This is the most similar in semantics to the intrinsics.

  3. Fortran 03 style polymorphic non-type-bound operators.

     generic module uses_plus(T_generic, operator(+))
         type :: T_generic
         end type
         interface operator(+)
            function plus(a,b) result(c)
               type(T_generic):: c
               class(T_generic), intent(in) :: a
               type(T_generic), intent(in) :: b
            end function plus
         end interface
    
  4. Fortran 03 style non-polymorphic type-bound operators.

     generic module uses_plus(T_generic, operator(+))
         type :: T_generic
         contains
             procedure :: plus
             generic :: operator(+) => plus
         end type
         interface
            function plus(a,b) result(c)
               type(T_generic):: c
               type(T_generic), intent(in) :: a
               type(T_generic), intent(in) :: b
            end function plus
         end interface
    
  5. Fortran 03 style polymorphic type-bound operators similar to your example.

     generic module uses_plus(T_generic, operator(+))
         type :: T_generic
         contains
             procedure :: plus
             generic :: operator(+) => plus
         end type
         interface
            function plus(a,b) result(c)
               type(T_generic):: c
               class(T_generic), intent(in) :: a
               type(T_generic), intent(in) :: b
            end function plus
         end interface
    

My understanding of type-bound procedures is that it is not that there is no + operator to pass, but rather that it is always present/public in scopes where its type is present/public, in which case the operator(+) for type bound procedures is a redundancy. Infelicitous, but not a problem.

The distinction between TYPE and CLASS strikes me as harder to deal with. I suspect that the only solution is to have "different" generics for polymorphic vs. non-polymorphic types using something similar in spirit to C++'s pattern matching.

Note edited to allow the viewer to recognize the code fragments as code.

@wclodius2
Copy link
Contributor Author

"How is the compiler to "know" that we meant for T to be declared differently in the case of character (or indeed any type with a len type parameter?"

The only way I can see of doing this is through something similar to C++'s pattern matching.

generic module containers(T, R)
    constrain
        type :: T(len)
            integer, len :: len
        end type T
        integer, parameter :: R
        requires(R==0)
    end constrain
        type :: container
            type(T(:)), allocatable :: object
        contains
            procedure :: set
            procedure :: get
        end type
    contains
        subroutine set(this, value)
            class(container), intent(out) ::this
            type(T(*)), intent(in) :: value
            this%object = value
        end subroutine set

        subroutine get(this, value)
            class(container), intent(in) :: this
            type(T(:)), allocatable, intent(out) :: value
            value = this % object
        end subroutine get

    else constrain
        type :: T
        end type T
        integer, parameter :: R
        requires(R==0)
    end constrain
        type :: container
            type(T), allocatable :: object
        contains
            procedure :: set
            procedure :: get
        end type
    contains
        subroutine set(this, value)
            class(container), intent(out) ::this
            type(T), intent(in) :: value
            this%object = value
        end subroutine set

        subroutine get(this, value)
            class(container), intent(in) :: this
            type(T), intent(out) :: value
            value = this % object
        end subroutine get

    else constrain
        type :: T
        end type T
        integer, parameter :: R
        requires(R>0)
    end constrain
        type :: container
            type(T), allocatable :: object(rank(R))
        contains
            procedure :: set
            procedure :: get
        end type
    contains
        subroutine set(this, value)
            class(container), intent(out) ::this
            type(T), intent(in) :: value(rank(R))
            this%object = value
        end subroutine set

        subroutine get(this, value)
            class(container), intent(in) :: this
            type(T), allocatable, intent(out) :: value(rank(R))
            value = this % object
        end subroutine get
end module containers

Provided the matching can be kept to a single level, the resulting generic language should still be type checkable.

@wclodius2
Copy link
Contributor Author

"There is some concern that the arithmetic operators on intrinsics are handled at such a low level that it is not trivial for the compilers to determine that a given operation is supported at the more abstract level of a template short of instantiating and compiling."

My strong suspicion is that compilers will reduce the template to an intermediate representation, and whether this intermediate representation is before or after "type checking" will have little to do with the properties of the intrinsic types, and more as to what intermediate representations are already used in the compiler and the perceived demands of the user community. The most obvious intermediate representation is the parse tree, and this will have essentially no "type checking", although using the parse tree will reduce the high lexical cost of processing Fortran in instantiating the template. However if preprocessing is used in the template file they may not be able to even reduce it to a parse tree prior to instantiation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants