User:Ruud Koot/Computer science/Strong and weak typing

Definitions[edit]

Some definitions have been given/alluded to in the literature:

how hard it is to intentionally or accidentally run into type unsoundness
- not a binary distinction: hardly any programming language has a type system that is completely type safe (not even ML, Haskell; exceptions would be Coq, STLC)
- whether abstraction barriers may be (easily/accidentally) broken (Liskov 1973, 1974)
whether running into a type unsoundness results in an exception or in undefined behaviour
- ML has type erasure but can still guarantee an exception will be raised
- Java needs to do some dynamic checking to raise exceptions
- C causes undefined behaviour
how hard it is to accidentally write a type correct program that does not have the type you expected
- implicit coercions (Demers 1977) and ad hoc polymorphism are a great source of these bugs
synonymous with static and dynamic typing (possibly Steelman 1978)

Literature review[edit]

1970s[edit]

Strongly typed[edit]

Barbara Liskov and Robert Freiburghouse (1973). "Report of session on structured programming". Proceeding of ACM SIGPLAN - SIGOPS interface meeting on Programming languages - operating systems.
Declarations cause objects to come into existence; i.e. storage is allocated to hold the object, and this storage continues to exist as long as the name of the object is accessible. The operations on the objects are performed via calls on the functions in the level of abstraction supporting the object type, and the language must be strongly typed so that it is not possible to make use of the object in any other way.
Barbara Liskov and Stephen Zilles (1974). "Programming with abstract data types". Proceedings of the ACM SIGPLAN symposium on Very high level languages.
The language is strongly typed; thus there are only three ways in which an abstract object can be used:
1. An abstract object may be operated upon by the operations which define its abstract type.
2. An abstract object may be passed as a parameter to a procedure. In this case, the type of the actual argument passed by the calling procedure must be identical to the type of the corresponding formal parameter in the called procedure.
3. An abstract object may be assigned to a variable, but only if the variable is declared to hold objects of that type.
J McHugh, V Basili (1974). "SIMPL-R and its Application to Large Sparse Matrix Problems". Technical Report.
SIMPL-R is a strongly typed, procedure-oriented language for writing structured computer programs.
"Steelman" (1978)
3A. Strong Typing. The language shall be strongly typed. The type of each variable, array, record, expression, function and parameter shall be determinable during translation.

Weakly typed[edit]

(1974) REAL-TIME Ada SOFTWARE EXPERIMENTS
No access to full text... Listed in IEEE Xplore as 1990.
Alan J. Demers, James E. Donahue, Ray T. Teitelbaum and John H. Williams (1977). "Encapsulated data types and generic procedures". Design and Implementation of Programming Languages. Lecture Notes in Computer Science 54.
The second most confusing aspect of type, after deciding what type should mean, is to decide what "type-checking" should mean. Most discussions of type-checking [Ledgard 1972, Gannon and Horning 1975, Wirth 1975] view type-checking as a purely syntactic operation: we record the type of each identifier, and if there is a conflict between the type required in some context and the type supplied, the program is taken to be erroneous. Thus a language like PL/I with its automatic conversions is said to defeat the purposes of type-checking as an error detection facility.

The difficulty with this simple approach to type-checking is that type identifiers have a semantic, as well as a syntactic interpretation (the inclusion of type definitions makes this position inescapable). Thus without a clear semantic interpretation of type-checking and type equivalence, we have no assurance that whatever approach we take does not introduce errors, rather than catch them.

Our approach to type-checking and type equivalence is based on the following definitions. We will say that a language is statically typed iff the meanings associated with all occurrences of the basic operators (i.e. the operators for which meaning is given by type definitions) is statically determined. Otherwise, we will say the language is dynamically typed. There are two basic ways of providing static typing. One is the approach taken in BCPL [Richards 1969] or BLISS [Wulf et al. 1971], where each basic operator has only a single meaning (x := Y means "move k bits from x to y"). The other approach is that common to Algol-like languages. If all identifiers are typed and the meaning of all types is stati cally determined, then it is clear that we can determine the meaning of all panmorphic operators strictly from their appearance in the program text.

The difficulty with static typing is in the semantics of parametric constructs, e.g., procedures. If a language is statically typed by requiring the typing of identifiers, then the meanings of the basic operators applied to the parameters will be determined from their type. Now, if we are to protect against the production of spurious results, we must ensure that applications of the parameterized construct to arguments produce results only when the meanings of the basic operators on the arguments are equivalent to their meanings as derived from the parameters specifications. We say that a language is strongly typed iff the application of a parametric construct to a set of arguments produces the semantically distinguished result "type error" if there is such a conflict in meaning; otherwise, it is said to be weakly typed. PL/I external procedures are an example of a language construct that is statically, but weakly typed, since there is no check that the argument and parameter types agree. A stronger notion of typing, which we will call protective typing, is also common to Algol-like languages (in particular, to PASCAL). We will say a language is protectively typed iff all programs containing an application of a parametric construct producing "type error" have meaning "type error".

Thus,
p : procedure (integer x,y);
x := y

end;

var x,y : real;

if false then p(x,y) fi

has the semantic meaning "type error" in the case of protective, but not strong, typing. Note that with strong typing the production of "type error" is a property of parametric constructs, with protective typing a property of the programs which use them.

Thus, to guarantee at least strong typing, we must guarantee the type equivalence (in our semantic sense) of parameters and arguments. Clearly, since we are defining type and equivalence in purely semantic terms, the equivalence of two types will in general be undecidable. However, we can be more restrictive and give simple, sufficient conditions for type equivalence that will ensure a strongly typed language.
Edmond Schonberg, Jacob T. Schwartz and Micha Sharir (1979). "Automatic data structure selection in SETL". Proceedings of the 6th ACM SIGACT-SIGPLAN symposium on Principles of programming languages (POPL '79).
Note that we generate representations for vairable occurrences rather than for the variables themselves; this is because SETL is only weakly types, so that each variable may assume more than one data-type in the course of execution.
B. Ryder (1979). " A Survey of Interprocedural Data Flow Analysis Techniques". Technical report.
Recently, the lattice model technique has been extended to ascertain data types in weakly typed languages and to perform interprocedural data flow analysis (Jones/Muchniok, 1975) (Kaplan, 1973) {Tenenhaum, 197“) (Miller, 1979) (Schwartz, 1975) (Sharir/Pnueli, 1973). ...

1980s[edit]

Maarten Fokkinga. On the notion of strong typing
Already notes that "strong typing" is not formally defined!

2010s[edit]

Alex Wright (2010). "Type theory comes of age". Communications of the ACM 53 (2).
Languages without a sound type system are sometimes called unsafe or weakly typed languages. Perhaps the best-known example of a weakly typed system is C. While C does provide types, its type checking system has been intentionally compromised to provide direct access to low-level machine operations using arbitrary pointer arithmetic, casting, and explicit allocation and deallocation. However, these maneuvers are fraught with risk, sometimes resulting in programs riddled with bugs like buffer overflows and dangling pointers that can cause security vulnerabilities.

By contrast, languages like Java, C# , Ruby, Javascript, Python, ML, and Haskell are strongly typed (or “type safe”). Their sound type systems catch any type system violations as early as possible, freeing the programmer to focus debugging efforts solely on valid program operations.