Datatype

Editorial note: the article on programming languages also discusses type systems.

In computer science, a datatype (often simply called type) is a statically assigned constraint on a programming language phrase that denotes the kinds of values it may take on and gives it certain semantic meaning for the purposes of preventing errors, building abstractions, documenting the program, and gaining some measure of runtime safety and efficiency. A type system provides a method for reasoning about program behavior based on type rules, which specify the ways in which typed program phrases can legally interact. The study of type systems is known as type theory. Programming languages which provide type systems are known as typed languages. Although the majority of programming languages are typed, some, known as untyped languages, do not provide types.

Table of contents

1 Basis
2 Compile-time and run-time
3 Categories of types
4 Compatibility, equivalence and substitutability
5 Type checking

5.1 Static type checking
5.2 Dynamic typing
5.3 Strong Typing
5.4 The Controversy between static and dynamic typing

6 See also

Basis

The basic idea of typing is to give mere bits semantic meaning. Types are usually associated either with values in memory or with objects such as variables. Because any value is simply a sequence of bits to a computer, the hardware makes no distinction among memory addresses, instruction code, characters, integers and floating-point numbers. Numerical and string constants and expressions in code can, and often do, imply a type in a particular context. For example, the expression 3.14 implies a floating-point type, while [1, 2, 3] implies a list of integers, typically an array.
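The point that bits carry no meaning of their own can be illustrated with a short Python sketch. The names raw, as_text, as_integer and as_float are chosen only for this illustration; the same four bytes are read as characters, as an integer, and as a floating-point number.

```python
import struct

raw = b"ABCD"  # four bytes; nothing in the bytes themselves says what they mean

as_text = raw.decode("ascii")            # read as characters
as_integer = int.from_bytes(raw, "big")  # read as a big-endian unsigned integer
as_float = struct.unpack(">f", raw)[0]   # read as a big-endian 32-bit float

print(as_text)      # ABCD
print(as_integer)   # 1094861636
print(as_float)     # some floating-point value unrelated to the text
```

Which reading is "correct" depends entirely on the type the program associates with the bytes.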

In some languages, such as C and Java, some types are tied to a concrete representation. For example, in Java, type int is defined as a 4-byte (32-bit) signed integer. On the other hand, some languages define only the semantic behavior of their types.

The type system allows an operation to be chosen according to the types of its operands. For example, in the arithmetic expression a + b, if a and b are typed as integers, the underlying operation can be integer addition. If they are typed as real numbers, floating-point addition is probably performed. With generics, the types of the values determine which code is executed.
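A rough Python sketch of type-directed operation selection follows. The function describe is hypothetical; functools.singledispatch from the standard library is used here as one possible way to let the type of an argument choose the code that runs.

```python
from functools import singledispatch

# The built-in + already picks an operation from the operand types.
int_sum = 2 + 3        # integer addition
float_sum = 2.0 + 3.5  # floating-point addition

@singledispatch
def describe(value):
    # Fallback used when no more specific implementation is registered.
    return "unknown"

@describe.register
def _(value: int):
    return "integer"

@describe.register
def _(value: float):
    return "floating-point"

print(type(int_sum).__name__, type(float_sum).__name__)  # int float
print(describe(7), describe(7.0), describe("seven"))     # integer floating-point unknown
```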

Types make it impossible to write some operations that cannot be valid in a given context. This mechanism catches many common programming mistakes. For example, the expression "Hello, Wikipedia" / 3 is invalid because a string literal cannot be divided by an integer in the usual sense.
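In Python, for instance, the invalid division is rejected when the expression is evaluated. The helper try_divide below is a hypothetical name used only to capture the outcome; a statically checked language would reject the same expression before the program runs.

```python
def try_divide(left, right):
    """Attempt left / right and report whether the type system rejected it."""
    try:
        return left / right
    except TypeError as exc:
        return f"rejected: {type(exc).__name__}"

print(try_divide(6, 3))                   # 2.0
print(try_divide("Hello, Wikipedia", 3))  # rejected: TypeError
```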

Using types in languages also improves the documentation of code. For example, declaring a variable to be of a specific type documents how the variable is used. In fact, many languages allow programmers to define semantic types derived from built-in types, either composed of elements of one or more built-in types or simply as aliases for the names of built-in types.
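As an illustration, Python offers several ways to derive named types from built-in ones. The names Celsius, UserId, Point and boiling_point are invented for this sketch.

```python
from dataclasses import dataclass
from typing import NewType

Celsius = float                  # plain alias: another name for float
UserId = NewType("UserId", int)  # distinct name for static checkers; an int at run time

@dataclass
class Point:
    """Composite type built from two built-in floats."""
    x: float
    y: float

def boiling_point() -> Celsius:
    # The return annotation documents what the number means.
    return 100.0

origin = Point(0.0, 0.0)
uid = UserId(42)
print(origin, uid, boiling_point())
```

The alias and the NewType add no run-time behavior; their value lies in documenting intent and, where a static checker is used, in letting it tell a UserId from an arbitrary int.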

Datatypes themselves may be first-class, second-class or third-class values.

Compile-time and run-time

While some languages use types only at compile-time and discard them at run-time, type information can also be stored in memory for use at run-time. Many object-oriented languages keep certain type information at run-time to make dynamic binding possible. In C++, such information is called run-time type information (RTTI).
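A minimal Python sketch of run-time type information and dynamic binding follows; the classes Shape, Circle and Square are hypothetical. Each object carries its type at run time, and the call to area is bound to the implementation belonging to that type.

```python
import math

class Shape:
    def area(self) -> float:
        raise NotImplementedError

class Circle(Shape):
    def __init__(self, radius: float):
        self.radius = radius

    def area(self) -> float:
        return math.pi * self.radius ** 2

class Square(Shape):
    def __init__(self, side: float):
        self.side = side

    def area(self) -> float:
        return self.side ** 2

shapes = [Square(2.0), Circle(1.0)]
names = [type(s).__name__ for s in shapes]  # type information available at run time
areas = [s.area() for s in shapes]          # method chosen by each object's run-time type
print(names)  # ['Square', 'Circle']
```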

Categories of types

Types can be classified into the following categories:

primitive types - the simplest and oldest kind of type, e.g. integer and floating-point number.
composite types - types built from other, more basic types.
object types
subtypes and derived types

Compatibility, equivalence and substitutability

The question of compatibility and equivalence is a complicated and controversial topic, and it is related to the problem of substitutability: that is, given type A and type B, are they equal types? Are they compatible? Can a value of type B be used in a place where a value of type A is expected?

If a value of type A can be used wherever a value of type B is expected, A is a subtype of B; the converse does not always hold. This notion of substitutability is known as the Liskov substitution principle.
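The asymmetry of the subtype relation can be sketched in Python with two hypothetical classes, Animal and Dog, and a hypothetical function greet that expects an Animal.

```python
class Animal:
    def speak(self) -> str:
        return "..."

class Dog(Animal):
    def speak(self) -> str:
        return "Woof"

    def fetch(self) -> str:
        return "fetching"

def greet(animal: Animal) -> str:
    # Written against Animal; any subtype of Animal may be substituted.
    return f"The animal says {animal.speak()}"

print(greet(Dog()))               # The animal says Woof
print(issubclass(Dog, Animal))    # True
print(issubclass(Animal, Dog))    # False
print(hasattr(Animal(), "fetch")) # False
```

A Dog can stand in for an Animal, but code that relies on fetch cannot accept an arbitrary Animal.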

Type checking

The process of verifying types is called type checking. If it occurs at compile-time, the language is called statically typed; if it occurs at run-time, the language is called dynamically typed. C, Java, ML, and Haskell are statically typed, while Lisp, Perl, Visual Basic, Ruby, and Python are dynamically typed. Type checking is one of the primary tasks of semantic analysis. Under dynamic scoping, type checking must be done at run-time because a variable can be typed differently depending on the execution path.

Static type checking

Static type checking systems usually assign a single type to each syntactic program entity (e.g., each bound variable name or expression). This is in contrast to dynamically typed systems, which do not require that syntactic entities be consistently typed.

Consider the following pseudocode example:

var x;    // (1)
x = 5;    // (2)
x = "hi"; // (3)

In this example, (1) declares the name x; (2) binds the integer value 5 to the name x; and (3) binds the string value "hi" to the name x. A typical static type discipline would require that the name x be assigned a single type, and hence that all values bound to x be of the same type. In such a system, the above code fragment would be illegal, because (2) and (3) bind x to values of inconsistent type (in most type systems, no value can be both an integer and a string). By contrast, a purely dynamically typed system would permit the above program to execute, because the name x would not be required to have a consistent type.
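The single-type discipline can be sketched as a toy checker written in Python. The statement encoding and the function check_single_type are assumptions made for this illustration and do not correspond to any real compiler; the checker inspects the fragment without executing it and reports any name bound to values of different types.

```python
def check_single_type(statements):
    """Return a list of error messages; an empty list means the fragment type-checks."""
    declared = {}  # name -> type of its first binding, or None if not yet bound
    errors = []
    for number, stmt in enumerate(statements, start=1):
        if stmt[0] == "declare":
            declared[stmt[1]] = None
        elif stmt[0] == "assign":
            _, name, literal = stmt
            literal_type = type(literal)
            if name not in declared:
                errors.append(f"({number}) {name} is not declared")
            elif declared[name] is None:
                declared[name] = literal_type  # first binding fixes the type
            elif declared[name] is not literal_type:
                errors.append(
                    f"({number}) {name} is {declared[name].__name__}, "
                    f"not {literal_type.__name__}"
                )
    return errors

# The three statements of the pseudocode example, encoded as tuples.
fragment = [("declare", "x"), ("assign", "x", 5), ("assign", "x", "hi")]
print(check_single_type(fragment))  # ['(3) x is int, not str']
```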

Many statically typed languages have a "back door" that enables programmers to write code that does not statically type check. For example, C and Java have "casts".

Many static type systems, such as C's and Java's, require type declarations: the programmer must explicitly associate each variable in a function with a particular type. Others, such as Haskell's, perform type inference: the compiler draws conclusions about the types of variables based on the operations which the function performs upon them. For instance, in a function f(x,y), if at some point in the function the variables x and y are added together, the compiler can infer that they must be numbers, since addition is only defined over numbers. Therefore, any call to f elsewhere in the program that gives a string or a list (for example) as an argument would be erroneous.
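The reasoning in the f(x,y) example can be sketched in Python as follows. This is a deliberately tiny illustration, far simpler than the inference algorithms real compilers use; the expression encoding and the functions infer_param_types and check_call are hypothetical, and literal values stand in for argument expressions.

```python
def infer_param_types(expr, constraints=None):
    """Collect constraints on variables from a tiny expression tree."""
    if constraints is None:
        constraints = {}
    kind = expr[0]
    if kind == "var":
        constraints.setdefault(expr[1], "any")
    elif kind == "add":
        for operand in expr[1:]:
            if operand[0] == "var":
                constraints[operand[1]] = "number"  # addition needs numbers
            else:
                infer_param_types(operand, constraints)
    return constraints

def check_call(params, constraints, args):
    """Report arguments that violate the inferred constraints."""
    errors = []
    for name, arg in zip(params, args):
        if constraints.get(name) == "number" and not isinstance(arg, (int, float)):
            errors.append(f"{name} must be a number, got {type(arg).__name__}")
    return errors

body = ("add", ("var", "x"), ("var", "y"))  # the body of f adds x and y
constraints = infer_param_types(body)
print(constraints)                                   # {'x': 'number', 'y': 'number'}
print(check_call(["x", "y"], constraints, [1, 2]))   # []
print(check_call(["x", "y"], constraints, ["hi", 2]))  # ['x must be a number, got str']
```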

The presence of static typing in a programming language does not necessarily imply the absence of dynamic typing mechanisms. For example, Java is statically typed, but certain operations require the support of runtime type tests, which are a form of dynamic typing. See programming language for more discussion of the interactions between static and dynamic typing.

Widely known programming languages with static typing include ML, C, and Java.

Dynamic typing

The implementation of a dynamically typed language will catch errors related to the misuse of values ("type errors") at the time the erroneous statement or expression is computed. In other words, dynamic typing catches errors during program execution. A typical implementation of dynamic typing will keep all program values "tagged" with a type, and check the type tag before any value is used in an operation.
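The tagging scheme can be sketched in Python under simplifying assumptions: values are modelled as (tag, payload) pairs, and the table ADD_RULES and the function tagged_add are hypothetical names. Real implementations store tags far more compactly.

```python
# Which tag combinations the + operation is defined over, and how to compute it.
ADD_RULES = {
    ("integer", "integer"): lambda a, b: ("integer", a + b),
    ("string", "string"): lambda a, b: ("string", a + b),
}

def tagged_add(left, right):
    """Check the type tags of both operands before performing the addition."""
    left_tag, left_value = left
    right_tag, right_value = right
    rule = ADD_RULES.get((left_tag, right_tag))
    if rule is None:
        raise TypeError(f"+ is not defined over {left_tag} and {right_tag}")
    return rule(left_value, right_value)

five = ("integer", 5)
greeting = ("string", "hi")

print(tagged_add(five, five))  # ('integer', 10)
try:
    tagged_add(five, greeting)
except TypeError as exc:
    print(exc)                 # + is not defined over integer and string
```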

For example, consider the following pseudocode:

var x = 5;     // (1)
var y = "hi";  // (2)
x + y;         // (3)

In this code fragment, (1) binds the value 5 to x; (2) binds the value "hi" to y; and (3) attempts to add x to y. In a dynamically typed language implementation, the value bound to x might be a pair (integer, 5), and the value bound to y might be a pair (string, "hi"). When the program attempts to execute line (3), the language implementation would check the type tags integer and string, discover that the operation + (addition) is not defined over these two types, and signal an error. However, in a weakly typed language, such as Visual Basic, the code would run, yielding the result "5hi". Weakly typed languages have problems of their own, though. For example, would the result of the following code be 9 or "54"?

var x = 5;
var y = "4";
x + y

Critics argue that weak typing encourages bad habits because it does not require programmers to use explicit type conversion.
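By way of contrast, Python is dynamically typed but refuses to guess: the mixed addition is rejected, and the programmer must state the intended conversion. The variable names outcome, as_number and as_text are chosen for this sketch.

```python
x = 5
y = "4"

try:
    x + y                  # no implicit conversion between int and str
    outcome = "allowed"
except TypeError:
    outcome = "rejected"

as_number = x + int(y)     # explicit conversion to a number
as_text = str(x) + y       # explicit conversion to a string

print(outcome)    # rejected
print(as_number)  # 9
print(as_text)    # 54
```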

Dynamic typing is often associated with so-called "scripting languages" and other rapid application development environments.

Well-known dynamically typed languages, in each of the major language paradigms, include Lisp and its dialects, Perl, Smalltalk, Ruby, Python, and Visual Basic.

Strong Typing

A strongly typed language does not allow an operation to succeed on arguments which are of the wrong type. An example of the absence of strong typing is a C cast gone wrong: when a value is cast in C, not only is the compiler required to allow the code, but the run-time is expected to allow it as well. This allows C code to be compact and fast, but it can make debugging more difficult.
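The effect of such a cast can be imitated from Python with the standard ctypes module, which exposes C data types. The sketch below is roughly what a C pointer cast from float to a 32-bit unsigned integer does; the variable names are illustrative only.

```python
import ctypes

value = ctypes.c_float(1.0)

# Treat the memory holding the float as if it held a 32-bit unsigned integer,
# much as a C pointer cast would. Nothing checks that this makes sense.
as_uint_ptr = ctypes.cast(ctypes.pointer(value), ctypes.POINTER(ctypes.c_uint32))
reinterpreted = as_uint_ptr.contents.value

print(hex(reinterpreted))  # 0x3f800000
```

The result is the bit pattern of the floating-point number 1.0 read as an integer, which is rarely what a programmer intends when the cast is made by mistake.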

Sometimes the term safe language is used more generally for languages that do not allow nonsense to occur. For example, a safe language will also check array bounds.

The Controversy between static and dynamic typing

The choice between static and dynamic typing requires some trade-offs.

Static typing usually results in compiled code that executes more quickly. When the compiler knows the exact data types that are in use, it can produce machine code specialized for those types without run-time checks. Further, compilers for statically typed languages can apply optimizations more easily. See optimization.

Static typing provides documentation for a program, by way of the types.

Static typing finds type errors reliably and at compile time, which should increase the reliability of the program. The value of this is unclear, and depends on how frequently type errors make it through a normal software development process compared to other sorts of errors. Advocates of static typing hold that programs are more reliable when they have been type checked, while advocates of dynamic typing point to distributed code that has proven reliable and to bug databases.

Dynamic typing allows constructs that would be illegal in some static type systems. For example, eval functions that execute arbitrary data as code are possible. Furthermore, dynamic typing accommodates transitional code, such as allowing a string to be used in place of a data structure.
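Python's built-in eval gives a small example of data being executed as code. The expression text and the variable names price and quantity are invented for this sketch; the type of the result is not known until the string is evaluated.

```python
source = "price * quantity"  # ordinary string data, possibly built at run time

# Evaluate the string as an expression, supplying the variables it refers to.
result = eval(source, {}, {"price": 2.5, "quantity": 4})

print(result)                 # 10.0
print(type(result).__name__)  # float
```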

Dynamic typing allows debuggers to be more functional; in particular, the debugger can modify the code arbitrarily and let the program continue to run. Programmers in dynamic languages often "program in the debugger" and thus have a shorter edit-compile-test-debug cycle.

Dynamic typing allows compilers to run more quickly, since there is less checking to perform and less code to revisit when something changes. This also shortens the edit-compile-test-debug cycle.