Natural language generation

Natural Language Generation (NLG) is the natural language processing task of generating natural language from a machine representation system such as a knowledge base or a logical form.

NLG is sometimes viewed as the inverse of natural language understanding. The difference can be put this way: whereas in natural language understanding the system needs to disambiguate the input sentence to produce the machine representation language, in NLG the system needs to make decisions about how to put a concept into words.

Stages

Generating text can be as simple as keeping a list of canned text that is copied and pasted, possibly linked with some glue text. The results may be satisfactory in simple domains such as horoscope machines or generators of personalised business letters. However, a sophisticated NLG system needs to include stages of planning and merging of information so that the generated text looks natural and does not become repetitive. Typical stages are:

;Content determination: Deciding which salient pieces of information are worth mentioning. Methods used in this stage are related to data mining.

;Discourse planning: Overall organisation of the information to convey.

;Sentence aggregation: Merging of similar sentences to improve readability and naturalness. For example, the sentences "The next train is the Caledonian Express" and "The next train leaves Aberdeen at 10am" can be aggregated to form "The next train, which leaves at 10am, is the Caledonian Express".

;Lexicalisation: Choosing the words that express the concepts.

;Referring expression generation: Linking words in the sentences by introducing pronouns and other means of reference.

;Syntactic and morphological realisation: This stage is the inverse of parsing: given all the information collected above, syntactic and morphological rules are applied to produce the surface string.

;Orthographic realisation: Matters like casing, punctuation, and formatting are resolved.
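
The stages above can be illustrated with a toy pipeline in Python. This is only a minimal sketch, loosely based on the train example, and not a description of any particular NLG system. The `Fact` class, the lexicon tables (`ENTITY_NAMES`, `RELATION_VERBS`, `PRONOUNS`), the ordering in `RELATION_ORDER`, and all function names are assumptions introduced purely for illustration. Real systems use far richer representations at every stage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    entity: str           # identifier of the thing being described
    relation: str         # kind of information, e.g. "name" or "departure"
    value: str            # the information itself
    salient: bool = True  # whether the fact is worth saying


# Hypothetical lexicon and discourse plan, invented for this sketch.
ENTITY_NAMES = {"train1": "the next train"}
PRONOUNS = {"train1": "it"}
RELATION_VERBS = {"name": "is", "departure": "leaves"}
RELATION_ORDER = ["name", "departure"]


def determine_content(facts):
    """Content determination: keep only the facts flagged as salient."""
    return [f for f in facts if f.salient]


def plan_discourse(facts):
    """Discourse planning: order facts by RELATION_ORDER, unknown relations last."""
    def rank(fact):
        if fact.relation in RELATION_ORDER:
            return RELATION_ORDER.index(fact.relation)
        return len(RELATION_ORDER)
    return sorted(facts, key=rank)


def aggregate(facts):
    """Sentence aggregation: pair adjacent facts about the same entity."""
    groups, i = [], 0
    while i < len(facts):
        main = facts[i]
        if i + 1 < len(facts) and facts[i + 1].entity == main.entity:
            groups.append((main, facts[i + 1]))  # second fact becomes a relative clause
            i += 2
        else:
            groups.append((main, None))
            i += 1
    return groups


def lexicalise(fact):
    """Lexicalisation: choose a verb for the relation and attach its value."""
    return f"{RELATION_VERBS[fact.relation]} {fact.value}"


def refer(entity, mentioned):
    """Referring expression generation: full name first, pronoun afterwards."""
    if entity in mentioned:
        return PRONOUNS[entity]
    mentioned.add(entity)
    return ENTITY_NAMES[entity]


def realise(subject, main_vp, relative_vp=None):
    """Syntactic realisation: assemble the clause, embedding a relative clause if given."""
    if relative_vp:
        return f"{subject}, which {relative_vp}, {main_vp}"
    return f"{subject} {main_vp}"


def orthography(text):
    """Orthographic realisation: capitalise the first letter and add a full stop."""
    return text[0].upper() + text[1:] + "."


def generate(facts):
    """Run all stages in order and return the generated text."""
    mentioned = set()
    sentences = []
    for main, extra in aggregate(plan_discourse(determine_content(facts))):
        subject = refer(main.entity, mentioned)
        relative = lexicalise(extra) if extra else None
        sentences.append(orthography(realise(subject, lexicalise(main), relative)))
    return " ".join(sentences)


facts = [
    Fact("train1", "departure", "Aberdeen at 10am"),
    Fact("train1", "name", "the Caledonian Express"),
    Fact("train1", "carriages", "8", salient=False),  # dropped by content determination
]
print(generate(facts))
# prints: The next train, which leaves Aberdeen at 10am, is the Caledonian Express.
```

Each stage is kept as a separate function so that it can be replaced independently, for instance by swapping the fixed `RELATION_ORDER` for a more elaborate discourse planner, without touching the other stages.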