Database

A database is a collection of information with a regular structure that allows automated searches and updates. There is a wide variety of databases, from simple tables stored in a single file to very large databases with many millions of records, stored in rooms full of disk drives.

Databases resembling modern ones were first developed in the 1960s. A pioneer in the field was Charles Bachman.

One way of classifying databases is by the programming model associated with the database. Several models have been in wide use for some time. Historically, the hierarchical model was implemented first, followed by the network model and then the relational model; the relational and flat models later reached the height of their popularity.

Database models

The flat (or table) model consists of a single, two-dimensional array of data elements, where all members of a given column are assumed to be similar values, and all members of a row are assumed to be related to one another. For instance, columns for name and password might be used as a part of a system security database. Each row would have the specific password associated with a specific user. Columns of the table often have a type associated with them, defining them as character data, date or time information, integers, or floating point numbers. This model is the basis of the spreadsheet.
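A minimal Python sketch of the flat model, assuming the schema is held as a dictionary from column name to Python type (the `created` and `login_count` columns, and all values, are illustrative). Every row has one value per column, and finding a user means scanning the single table:

```python
from datetime import date

# Hypothetical schema: column name -> expected Python type.
SCHEMA = {"name": str, "password": str, "created": date, "login_count": int}

# The whole database is one two-dimensional table: a list of rows.
rows = [
    ("alice", "s3cret", date(2020, 1, 15), 42),
    ("bob", "hunter2", date(2021, 6, 1), 7),
]


def validate(row):
    """Check that a row has one value per column, each of that column's type."""
    return len(row) == len(SCHEMA) and all(
        isinstance(value, col_type) for value, col_type in zip(row, SCHEMA.values())
    )


def lookup(name):
    """Return the row for one user as a dict, found by scanning the table."""
    for row in rows:
        if row[0] == name:
            return dict(zip(SCHEMA, row))
    return None
```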

The network model allows multiple tables to be used together through the use of pointers (or references). Some columns contain pointers to different tables instead of data. Thus, the tables are related by references, which can be viewed as a network structure. A particular subset of the network model, the hierarchical model, limits the relationships to a tree structure, instead of the more general directed graph structure implied by the full network model.
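The pointer-based navigation of the network model, and the extra restriction the hierarchical model adds, can be sketched in Python with object references standing in for stored pointers. The `Record` class, its `link` method, and the record names are hypothetical:

```python
class Record:
    """A stored record; links to other records are direct references (pointers)."""

    def __init__(self, kind, name):
        self.kind = kind
        self.name = name
        self.children = []  # pointers to member records
        self.parents = []   # back-pointers to owner records

    def link(self, child):
        """Make this record an owner of `child`."""
        self.children.append(child)
        child.parents.append(self)


def members(owner, kind):
    """Navigate the way a network-model program does: follow stored pointers."""
    return [child.name for child in owner.children if child.kind == kind]


def is_hierarchy(records):
    """Hierarchical-model restriction: every record has at most one owner."""
    return all(len(r.parents) <= 1 for r in records)


sales = Record("department", "Sales")
ann = Record("employee", "Ann")
sales.link(ann)
tree_before = is_hierarchy([sales, ann])  # True: each record has one owner

apollo = Record("project", "Apollo")
apollo.link(ann)  # Ann now has two owners, forming a graph rather than a tree
tree_after = is_hierarchy([sales, apollo, ann])  # False
```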

Relational databases also consist of multiple database tables. Unlike the hierarchical and network models, there are no explicit pointers; in theory, columns of any type may be used to create an ad-hoc relationship between two or more tables. Relational databases allow users (or, more often, programmers) to write queries that were not anticipated by the database designer. As a result, relational databases can be used by multiple applications in ways the original designers did not foresee, which is especially important for databases that might be used for decades. This has made relational databases very popular with businesses.
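A minimal sketch of ad hoc queries using Python's built-in `sqlite3` module (table names, column names, and data are illustrative). No table stores a pointer to another; the first query relates the tables by matching `customer_id` values, and the second joins on `city`, an ordinary data column that was never designated as a link:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Ann', 'Oslo'), (2, 'Ben', 'Lima'), (3, 'Cy', 'Oslo');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 5.0), (12, 2, 40.0);
""")

# Total order value per city: a question the schema was not designed around.
per_city = conn.execute("""
    SELECT c.city, SUM(o.total)
    FROM customers AS c JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.city
    ORDER BY c.city
""").fetchall()

# Pairs of customers living in the same city, related only by a data value.
same_city = conn.execute("""
    SELECT a.name, b.name
    FROM customers AS a JOIN customers AS b
      ON a.city = b.city AND a.id < b.id
""").fetchall()
```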

The relational model is a mathematical theory developed by Ted Codd to describe how relational databases should work. Although this theory is the basis for relational database software, very few database products follow it closely, and almost all have features that contradict it.
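One commonly cited gap between theory and practice: in Codd's model a relation is a set of tuples, so it cannot contain duplicate rows, whereas SQL tables and query results may. The sketch below (illustrative data, Python's `sqlite3` module) contrasts a set-based projection with the equivalent SQL query:

```python
import sqlite3

# In the relational model, a relation is a set of tuples.
employees = {("Ann", "Sales"), ("Ben", "Sales"), ("Cy", "IT")}


def project(relation, index):
    """Relational projection onto one attribute; the result is again a set."""
    return {(t[index],) for t in relation}


model_result = project(employees, 1)  # duplicates collapse automatically

# SQL results are multisets (bags): duplicates survive unless DISTINCT is used.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, dept TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?)", sorted(employees))
sql_result = conn.execute("SELECT dept FROM employees ORDER BY dept").fetchall()
```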

Implementations and indexing

All of these kinds of database can take advantage of indexing to increase their speed. An index is a sorted list of the contents of a particular table column, with pointers to the rows associated with each value. An index allows the set of table rows matching some criterion to be located quickly. Various indexing techniques are in common use, including B-trees, hashes, and linked lists.
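A minimal sketch of a sorted index in Python, assuming rows are identified by their position in a list (the `build_index` and `lookup` helpers are hypothetical). Binary search via the standard `bisect` module locates the matching entries without scanning every row:

```python
import bisect

# Illustrative table: rows are identified by their position in the list.
table = [("carol", 31), ("alice", 27), ("dave", 45), ("bob", 27)]


def build_index(rows, column):
    """Sorted list of (value, row position) pairs for one column."""
    return sorted((row[column], pos) for pos, row in enumerate(rows))


def lookup(index, value):
    """Positions of rows whose indexed column equals `value`, via binary search."""
    i = bisect.bisect_left(index, (value,))  # first entry with key >= value
    positions = []
    while i < len(index) and index[i][0] == value:
        positions.append(index[i][1])
        i += 1
    return positions


age_index = build_index(table, 1)
name_index = build_index(table, 0)
```

Because the index is sorted, the same structure could also answer range queries by bisecting for both ends of the range.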

A relational DBMS has the advantage that indexes can be created or dropped without changing existing applications, because applications don't use the indexes directly. Instead, the database software decides on behalf of the application which indexes to use. The database chooses between many different strategies based on which one it estimates will run the fastest.
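SQLite, which ships with Python as the `sqlite3` module, illustrates this: the application's query text stays the same before and after an index is created, and only the plan reported by `EXPLAIN QUERY PLAN` changes. The exact wording of the plan strings varies between SQLite versions, so this sketch only checks whether the (illustrative) index name appears:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO people (name, age) VALUES (?, ?)",
    [("Ann", 31), ("Ben", 27), ("Cy", 45)],
)

# The application's query never mentions an index.
QUERY = "SELECT name, age FROM people WHERE name = ?"


def plan(sql, params):
    """Detail strings from EXPLAIN QUERY PLAN (the last column of each row)."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


before = plan(QUERY, ("Ben",))
rows_before = conn.execute(QUERY, ("Ben",)).fetchall()

conn.execute("CREATE INDEX idx_people_name ON people (name)")

after = plan(QUERY, ("Ben",))
rows_after = conn.execute(QUERY, ("Ben",)).fetchall()
```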

Mapping objects into databases

In recent years, the object-oriented paradigm has also been applied to databases, giving rise to a new kind of database known as the object database. These databases attempt to overcome some of the difficulties of using objects with the relational model. An object-oriented program allows objects of the same type to have different implementations and behave differently, so long as they share the same interface (polymorphism). This does not fit well with a relational database, where all rows in a table have exactly the same columns and those columns are directly accessible.
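A short Python illustration of the mismatch (the `Circle` and `Rectangle` classes are hypothetical): both shapes answer `area()`, but they hold different attributes, so forcing them into one table with fixed columns leaves gaps in every row:

```python
import math


class Circle:
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2


class Rectangle:
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height


shapes = [Circle(1.0), Rectangle(2.0, 3.0)]

# Same interface: callers can treat every shape alike.
areas = [s.area() for s in shapes]

# Different state: one table needs the union of all attributes as columns,
# and each row leaves the columns it does not use empty (None).
columns = sorted({attr for s in shapes for attr in vars(s)})
rows = [tuple(vars(s).get(col) for col in columns) for s in shapes]
```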

A variety of ways have been tried for storing objects in a database, but there is little consensus on how this should be done. Some ways of implementing object databases appear to undo the benefits of the relational model by introducing pointers and making ad-hoc queries more difficult. As a result, object databases tend to be used for specialized applications, and general-purpose object databases have not been very popular commercially. Instead, objects are often stored in relational databases using complicated mapping software. At the same time, relational database software vendors have added features to allow objects to be stored more conveniently, drifting even further away from the relational model.
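One common mapping strategy, often called single-table inheritance, stores every subclass in one table and adds a discriminator column recording each row's class. The sketch below hand-writes that mapping with `sqlite3` and dataclasses; the table layout and the `save`/`load_all` helpers are assumptions for illustration, not the approach of any particular mapping library:

```python
import sqlite3
from dataclasses import asdict, dataclass, fields


@dataclass
class Circle:
    radius: float


@dataclass
class Rectangle:
    width: float
    height: float


# Mapping metadata (illustrative): discriminator value <-> class.
CLASSES = {"circle": Circle, "rectangle": Rectangle}
KINDS = {cls: kind for kind, cls in CLASSES.items()}
COLUMNS = ("radius", "width", "height")  # union of all mapped attributes

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shapes (id INTEGER PRIMARY KEY, kind TEXT NOT NULL, "
    "radius REAL, width REAL, height REAL)"
)


def save(obj):
    """Flatten an object into one row; attributes its class lacks become NULL."""
    values = asdict(obj)
    conn.execute(
        "INSERT INTO shapes (kind, radius, width, height) VALUES (?, ?, ?, ?)",
        (KINDS[type(obj)], *(values.get(col) for col in COLUMNS)),
    )


def load_all():
    """Rebuild objects, using the discriminator column to choose each class."""
    objects = []
    for kind, *values in conn.execute(
        "SELECT kind, radius, width, height FROM shapes ORDER BY id"
    ):
        row = dict(zip(COLUMNS, values))
        cls = CLASSES[kind]
        objects.append(cls(**{f.name: row[f.name] for f in fields(cls)}))
    return objects


save(Circle(1.5))
save(Rectangle(2.0, 3.0))
loaded = load_all()
```

The cost of this design is visible in the table itself: every row carries NULLs for the columns that belong to other classes.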

Applications of databases

Databases are used in many applications, spanning virtually the entire range of computer software. Databases are the preferred method of storage for large multiuser applications, where coordination between many users is needed. Even individual users find them convenient, though, and many electronic mail programs and personal organizers are based on standard database technology.

Database application

A database application is a type of computer application dedicated to managing a database. Database applications span a huge variety of needs and purposes, from small user-oriented tools such as an address book, to huge enterprise-wide systems for tasks like accounting.

The term "database application" usually refers to software providing a user interface to a database. The software that actually manages the data is usually called a database management system (DBMS) or (if it is embedded) a database engine.
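The split between an application and the engine beneath it can be sketched with SQLite, an embedded engine bundled with Python. The hypothetical `AddressBook` class stands in for the application layer (a real one would put a user interface on top), while storage and querying are delegated to the engine:

```python
import sqlite3


class AddressBook:
    """Application layer: task-level operations over an embedded engine."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)  # the embedded database engine
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS contacts (name TEXT PRIMARY KEY, phone TEXT)"
        )

    def add(self, name, phone):
        """Store one contact; the engine commits, or rolls back on error."""
        with self.conn:
            self.conn.execute("INSERT INTO contacts VALUES (?, ?)", (name, phone))

    def find(self, prefix):
        """Contacts whose name starts with `prefix`, sorted by name."""
        cur = self.conn.execute(
            "SELECT name, phone FROM contacts WHERE name LIKE ? ORDER BY name",
            (prefix + "%",),
        )
        return cur.fetchall()
```

For example, after adding "Ann Lee" and "Andy Roe", `find("An")` returns both contacts in name order.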

Examples of database applications include Microsoft Access, dBASE, FileMaker and (to some degree) HyperCard.

Transactions and concurrency

In addition to their data model, most practical databases attempt to enforce a database transaction model that has desirable data integrity properties. Ideally, the database software should enforce the ACID rules:

Atomicity - either all or no operations are completed. (Transactions that can't be finished must be completely undone.)
Consistency - all transactions must leave the database in a consistent state.
Isolation - transactions can't interfere with each other's work and incomplete work isn't visible to other transactions.
Durability - successful transactions must persist through crashes.
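Atomicity and consistency can be seen together in a minimal `sqlite3` sketch (account names and amounts are illustrative). A `CHECK` constraint encodes one consistency rule (no balance may go negative), and the connection's context manager commits the transaction if the block succeeds and rolls it back if it raises, so a transfer that fails halfway leaves no partial update behind:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(src, dst, amount):
    """Move money as one transaction: both updates happen, or neither does."""
    with conn:  # commits if the block succeeds, rolls back if it raises
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))


def balances():
    return dict(conn.execute("SELECT name, balance FROM accounts"))


transfer("alice", "bob", 30)  # succeeds: alice 70, bob 80

try:
    transfer("alice", "bob", 500)  # the credit runs, then the debit violates CHECK
except sqlite3.IntegrityError:
    pass  # the rollback also undid the credit to bob
```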

In practice, many DBMSs allow some of these rules to be relaxed for better performance.

Concurrency control is a method used to ensure that transactions are executed in a safe manner and follow the ACID rules. The DBMS must ensure that only serializable, recoverable schedules are allowed, and that no actions of committed transactions are lost while undoing aborted transactions.
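Whether a given interleaving (schedule) is serializable can be tested with the textbook precedence-graph method for conflict serializability, sketched below in Python. It draws an edge from Ti to Tj whenever an operation of Ti precedes a conflicting operation of Tj (same data item, at least one write) and reports the schedule serializable if the graph has no cycle. This sketch only checks a finished schedule and ignores recoverability; DBMSs typically enforce safe schedules as they run, using techniques such as locking. The `(transaction, operation, item)` schedule format is an assumption:

```python
from collections import defaultdict


def precedence_graph(schedule):
    """Edges Ti -> Tj for each conflicting pair where Ti's operation comes first.

    `schedule` is a list of (transaction, operation, item), operation "R" or "W".
    """
    edges = defaultdict(set)
    for i, (t_i, op_i, item_i) in enumerate(schedule):
        for t_j, op_j, item_j in schedule[i + 1:]:
            if t_i != t_j and item_i == item_j and "W" in (op_i, op_j):
                edges[t_i].add(t_j)
    return edges


def has_cycle(edges):
    """Depth-first search; reaching a node still on the stack means a cycle."""
    state = {}  # missing = unvisited, "active" = on stack, "done" = finished

    def visit(node):
        state[node] = "active"
        for nxt in edges.get(node, ()):
            if state.get(nxt) == "active":
                return True
            if nxt not in state and visit(nxt):
                return True
        state[node] = "done"
        return False

    return any(node not in state and visit(node) for node in list(edges))


def is_conflict_serializable(schedule):
    return not has_cycle(precedence_graph(schedule))


# T1 and T2 both read A, then both write it: the classic lost update.
lost_update = [("T1", "R", "A"), ("T2", "R", "A"), ("T1", "W", "A"), ("T2", "W", "A")]
# The same operations run one transaction after the other.
serial = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"), ("T2", "W", "A")]
```

The lost-update schedule produces edges in both directions between T1 and T2, so it is rejected, while the serial schedule yields only T1 to T2.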