Computer vision is a rapidly growing field that aims to make computers 'see' as effectively as humans. In this book Dr Shapiro presents a new computer vision framework for interpreting time-varying imagery. This is an important task, since movement reveals valuable information about the environment. The fully automated system operates on long, monocular image sequences containing multiple, independently moving objects, and demonstrates the practical feasibility of recovering scene structure and motion in a bottom-up fashion. Real and synthetic examples are given throughout, with particular emphasis on image coding applications. Novel theory is derived in the context of the affine camera, a generalisation of the familiar scaled orthographic model. Analysis proceeds by tracking 'corner features' through successive frames and grouping the resulting trajectories into rigid objects using new clustering and outlier rejection techniques. The three-dimensional motion parameters are then computed via 'affine epipolar geometry', and 'affine structure' is used to generate alternative views of the object and fill in partial views. The use of all available features (over multiple frames) and the incorporation of statistical noise properties substantially improve existing algorithms, giving greater reliability and reduced noise sensitivity.