In computer programming, an identifier naming convention is a standardized approach to naming identifiers (variables, functions, procedures, and any other items that might need a name). The various naming conventions are intended to solve several major problems, and each problem has several competing solutions. At times the choice of naming convention can become an enormously controversial issue, with partisans of each convention holding theirs to be the best and all others to be inferior.

Multiple-Word Identifiers

As most programming languages do not allow spaces in identifiers, some system must be devised when a programmer wishes to use a name containing multiple words. There are several in widespread use; each has a significant following, though sometimes one dominates amongst users of a particular programming language. There are also some programmers who eschew multiple-word names entirely, and so use none of these systems (see the section below on the amount of information in identifiers).

One approach is to replace spaces with another character. The two characters commonly used for this purpose are the hyphen ('-') and the underscore ('_'), so the two-word name two words would be represented as two-words or two_words. The hyphen is arguably the easier of the two to type and the more readable, and it is used by nearly all programmers of Lisp, Scheme, and other languages that permit hyphens in identifiers. Many other languages, however, reserve the hyphen for the subtraction operator and so do not permit it in identifiers; some programmers of these languages use underscores instead. Underscores are somewhat harder to type because of their location on most keyboards, so this solution has not been universally adopted, but it is in fairly widespread use among programmers of C, Perl, and many scripting languages.
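
A minimal sketch of the underscore style in C follows; every identifier in it is purely illustrative:

    #include <stdio.h>

    /* Underscores stand in for the spaces that C does not allow in names;
       all identifiers below are illustrative. */
    static double average_of_two(double first_value, double second_value) {
        return (first_value + second_value) / 2.0;
    }

    int main(void) {
        double average_value = average_of_two(3.0, 5.0);
        printf("average_value = %f\n", average_value);
        return 0;
    }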

An alternative approach, developed mostly as a substitute for the underscore in languages that do not permit hyphens, is to omit the space and mark word boundaries with capitalization, rendering two words as either twoWords or TwoWords. This is called CamelCase, among other names; the initial-lowercase and initial-capital forms are sometimes distinguished as camelCase and PascalCase respectively.
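
A comparable C sketch in the capitalization style (again, every identifier is illustrative; the initial-capital form is used for the type and the initial-lowercase form for the function and variables):

    #include <stdio.h>

    /* Capitalization marks the word boundaries; no underscores or hyphens. */
    typedef struct {
        double widthInMetres;
        double heightInMetres;
    } RoomSize;                                    /* TwoWords form */

    static double floorArea(RoomSize roomSize) {   /* twoWords form */
        return roomSize.widthInMetres * roomSize.heightInMetres;
    }

    int main(void) {
        RoomSize livingRoom = { 4.0, 5.5 };
        printf("floor area: %f\n", floorArea(livingRoom));
        return 0;
    }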

Information in Identifiers

There is significant disagreement over how much information to put in identifiers. The debate was driven initially by technical constraints, as some early programming languages limited the length of identifiers. Thus in the standard C library (C was initially one of those languages), one finds atoi as the name of a function that converts ASCII strings to integers; in Lisp, one would be more likely to encounter the same function named ascii-to-integer or similar. The use of shorter identifiers has outlived those technical restrictions, however, partly as heritage (it remains most common in the languages that once had the restrictions) and partly out of convenience: it is simply easier to type shorter identifiers, especially ones that are used frequently. Those who prefer longer identifiers argue that the extra typing is outweighed by the ease of reading code that is descriptive rather than peppered with impenetrable acronyms and abbreviations.
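
The contrast can be illustrated in C itself; atoi is the real standard-library function, while the longer-named wrapper below is hypothetical and exists only to show the more verbose style:

    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical wrapper whose name spells out what the terse atoi does. */
    static int ascii_string_to_integer(const char *digit_string) {
        return atoi(digit_string);
    }

    int main(void) {
        printf("%d\n", atoi("42"));                     /* terse style   */
        printf("%d\n", ascii_string_to_integer("42"));  /* verbose style */
        return 0;
    }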

Beyond the question of how descriptive identifiers should be, there are also several systems for encoding specific technical attributes of an identifier in its name. Perhaps the best known is Hungarian notation, which encodes the type of a variable in its name. Several minor conventions are also widespread; one example is the convention in C and C++ of giving variables an initial lowercase letter and user-defined datatypes an initial capital letter.
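
A brief C sketch combining the two conventions just mentioned (the prefixes and names are illustrative rather than a prescribed standard; in Hungarian notation 'sz' conventionally marks a zero-terminated string and 'n' an integer):

    #include <stdio.h>

    typedef struct {
        const char *szName;   /* Hungarian prefix: zero-terminated string */
        int nAge;             /* Hungarian prefix: integer */
    } Person;                 /* user-defined type: initial capital */

    int main(void) {
        Person person = { "Ada", 36 };   /* variable: initial lowercase */
        printf("%s is %d\n", person.szName, person.nAge);
        return 0;
    }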

See also