In mathematics, logic and computer science, a

**formal language**is a set of finite-length words (or "strings") over some finite alphabet. Note that we can talk about

*formal language*in many contexts (scientific, legal and so on), meaning a mode of expression more careful and accurate than everyday speech. Use of a particular formal language in the sense intended here is an 'ultimate' version of that usage: formal enough to be used in written form for automatic computation, is a possible criterion.

A typical alphabet would be {a, b}, a typical string over that alphabet would be "ababba", and a typical language over that alphabet containing that string would be the set of all strings which contain the same number of a's as b's.
The empty word is allowed and is usually denoted by *e*, ε or λ. Note that while the alphabet is a finite set and every string has finite length, a language may very well have infinitely many member strings.

Some examples of formal languages:

- the set of all words over {a, b},
- the set { a
^{n}|*n*is a prime number }, - the set of syntactically correct programs in some programming language, or
- the set of inputs upon which a certain Turing machine halts.

- Strings produced by some formal grammar (see Chomsky hierarchy)
- Strings produced by a regular expression
- Strings accepted by some automaton, such as a Turing machine or finite state automaton
- From a set of related YES/NO questions those ones for which the answer is YES, see decision problem

*L*

_{1}and

*L*

_{2}are languages over some common alphabet.

- The
*concatenation**L*_{1}*L*_{2}consists of all strings of the form*vw*where*v*is a string from*L*_{1}and*w*is a string from*L*_{2}. - The
*intersection*of*L*_{1}and*L*_{2}consists of all strings which are contained in*L*_{1}and also in*L*_{2}. - The
*union*of*L*_{1}and*L*_{2}consists of all strings which are contained in*L*_{1}or in*L*_{2}. - The
*complement*of the language*L*_{1}consists of all strings over the alphabet which are not contained in*L*_{1}. - The
*right quotient**L*_{1}/*L*_{2}of*L*_{1}by*L*_{2}consists of all strings*v*for which there exists a string*w*in*L*_{2}such that*vw*is in*L*_{1}. - The
*Kleene star**L*_{1}* consists of all strings which can be written in the form*w*_{1}*w*_{2}...*w*_{n}with strings*w*_{i}in*L*_{1}and*n*≥ 0. Note that this includes the empty string ε because*n*= 0 is allowed. - The
*reverse**L*_{1}^{R}contains the reversed versions of all the strings in*L*_{1}. - The
*shuffle*of*L*_{1}and*L*_{2}consists of all strings which can be written in the form*v*_{1}*w*_{1}*v*_{2}*w*_{2}...*v*_{n}*w*_{n}where*n*≥ 1 and*v*_{1},...,*v*_{n}are strings such that the concatenation*v*_{1}...*v*_{n}is in*L*_{1}and*w*_{1},...,*w*_{n}are strings such that*w*_{1}...*w*_{n}is in*L*_{2}.