Instruction pipelining is a method for increasing the throughput of a digital circuit, particularly a CPU, and implements a form of instruction-level parallelism. The idea is to divide the logic into stages and to work on different data in each stage at the same time. An often-used real-world analogy involves doing the laundry: if you have two loads of laundry to do, you can either wash and dry the first load before moving on to the next, or you can wash the first load and, when you put it in to dry, put the next load in to wash. If each step takes 20 minutes, the second approach finishes in 60 minutes instead of 80.
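The laundry arithmetic generalizes to any number of stages and items. The following is a minimal Python sketch, assuming every stage takes exactly the same time and ignoring any overhead. The function names are illustrative only. Without pipelining, each item passes through all stages before the next one starts. With pipelining, the first item needs one step per stage, and every later item finishes one step after the one before it.

```python
def sequential_time(n_items: int, n_stages: int, stage_time: float) -> float:
    # Each item goes through every stage before the next item begins.
    return n_items * n_stages * stage_time


def pipelined_time(n_items: int, n_stages: int, stage_time: float) -> float:
    # The first item needs n_stages steps; each later item completes
    # one step after the previous one, adding (n_items - 1) steps.
    return (n_stages + n_items - 1) * stage_time


# Laundry example: 2 loads, 2 stages (wash, dry), 20 minutes per stage.
print(sequential_time(2, 2, 20))  # 80
print(pipelined_time(2, 2, 20))   # 60
```

For many items the pipelined time approaches one stage time per item, which is where the ideal speedup in throughput comes from.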
A pipelined digital circuit works the same way. Data enters the first stage and takes some time to process. When the data finishes the first stage, the clock ticks: the intermediate results are latched into registers at the head of the next stage, while the next set of data enters the first stage.
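The clock-by-clock behaviour can be simulated by treating each stage's input register as one slot in a list. The following is a simplified Python sketch, not a hardware model. It assumes every stage finishes within one clock cycle, and the `simulate` function is hypothetical. On each tick, every slot's contents move into the next stage while a new item enters the first stage.

```python
from collections import deque
from typing import Optional


def simulate(items: list[str], n_stages: int) -> list[list[Optional[str]]]:
    """Return, for each clock cycle, the item held in each stage (None = empty)."""
    stages: list[Optional[str]] = [None] * n_stages
    pending = deque(items)
    history: list[list[Optional[str]]] = []
    # Keep ticking until every item has entered and drained out of the last stage.
    while pending or any(s is not None for s in stages):
        # Clock tick: each stage's result is latched into the next stage,
        # the last stage's item leaves, and a new item (if any) enters stage 0.
        new_item = pending.popleft() if pending else None
        stages = [new_item] + stages[:-1]
        if any(s is not None for s in stages):
            history.append(list(stages))
    return history


for cycle, occupancy in enumerate(simulate(["load1", "load2"], 2), start=1):
    print(cycle, occupancy)
# 1 ['load1', None]
# 2 ['load2', 'load1']
# 3 [None, 'load2']
```

The two loads of laundry occupy three cycles, which matches the 60 minutes in the analogy at 20 minutes per cycle.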
Ideally, if the logic can be split into stages of equal delay, pipelining increases throughput by a factor equal to the number of stages used. In practice, the delay of the extra logic added to store the intermediate values (latches or registers) leads to diminishing returns, and this extra logic also increases the circuit's size and cost.
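The diminishing returns can be made concrete with a simple timing model. In this Python sketch, the assumptions are that the combinational logic (total delay `logic_delay`) divides evenly into `n_stages` parts and that each pipeline register adds a fixed delay `reg_overhead`. Both parameters and the numbers below are hypothetical. The clock period becomes `logic_delay / n_stages + reg_overhead`, so the register overhead puts a ceiling on the speedup.

```python
def speedup(logic_delay: float, reg_overhead: float, n_stages: int) -> float:
    # Unpipelined: all the logic plus one register delay per result.
    unpipelined_period = logic_delay + reg_overhead
    # Pipelined: one slice of the logic plus a register delay per clock cycle.
    pipelined_period = logic_delay / n_stages + reg_overhead
    return unpipelined_period / pipelined_period


# Hypothetical numbers: 10 ns of logic, 0.5 ns of register overhead.
for k in (1, 5, 10, 20):
    print(k, round(speedup(10.0, 0.5, k), 2))
# 1 1.0
# 5 4.2
# 10 7.0
# 20 10.5
```

Under this model, 20 stages give only about a 10.5x speedup rather than 20x, and no number of stages can exceed (10 + 0.5) / 0.5 = 21x.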
Furthermore, in a CPU or other circuit, earlier data may affect later data. For instance, if a CPU is processing C = A + B followed by E = C + D, the value of C must finish being calculated before it can be used in the second instruction. This type of problem is called a data dependency conflict, or data hazard. To resolve these conflicts, even more logic must be added to stall the dependent instruction or otherwise handle it, for example by forwarding a result directly from a later stage to an earlier one. A significant part of the effort in modern CPU design goes into resolving these dependencies.
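Stalling on a dependency can be modelled with a simple in-order, single-issue scheduler. This Python sketch is illustrative only. The `Instr` type and the `result_gap` parameter are hypothetical. `result_gap` is the number of cycles after a producing instruction issues before a consumer of its result may issue. Real values depend on the pipeline design and on whether forwarding is available. Any cycle an instruction has to wait beyond its natural slot is a stall.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    dest: str              # register written by the instruction
    srcs: tuple[str, ...]  # registers read by the instruction


def schedule(program: list[Instr], result_gap: int) -> list[int]:
    """Return the issue cycle of each instruction, stalling on data dependencies."""
    ready: dict[str, int] = {}  # register -> earliest cycle a consumer may issue
    issue_cycles: list[int] = []
    next_slot = 0
    for instr in program:
        # Wait for the next free issue slot and for every source operand.
        cycle = max([next_slot] + [ready.get(r, 0) for r in instr.srcs])
        issue_cycles.append(cycle)
        ready[instr.dest] = cycle + result_gap
        next_slot = cycle + 1
    return issue_cycles


dependent = [Instr("C", ("A", "B")), Instr("E", ("C", "D"))]
print(schedule(dependent, result_gap=3))  # [0, 3]: two stall cycles
print(schedule(dependent, result_gap=1))  # [0, 1]: no stall (e.g. with forwarding)

# Placing an independent instruction in between hides one of the stalls.
reordered = [Instr("C", ("A", "B")), Instr("F", ("X", "Y")), Instr("E", ("C", "D"))]
print(schedule(reordered, result_gap=3))  # [0, 1, 3]
```

The last case shows why reordering independent work between a producer and its consumer is another way to reduce the cost of data hazards.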
Many modern processors that use pipelining are also superscalar architectures, which can issue more than one instruction per clock cycle.