Thursday 27 August 2009

Z Axis for Camera Calibration and Getting 3D Coordinates from a 2D Image

I have recently started working on computer vision problems, and one of the first problems I needed to solve was getting 3D coordinates from a 2D image. I had absolutely no clue where to start, and I ended up reading a lot of theory behind image formation and a simple camera model called the pinhole camera model.

In the pinhole camera model, you imagine a box with a small aperture through which light can enter, with the image plane inside the box. This model gives a transformation equation from a 3D coordinate to 2D image plane coordinates, with the focal length as a variable. A real camera uses lenses, so there are more variables internal to the camera, such as the lens distortion coefficients, both tangential and radial. These parameters are called the intrinsic parameters. Once the intrinsic parameters are known, the transformation between the 3D coordinate system and the 2D image coordinate system can be solved. Camera calibration is a method to do just that.
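As a minimal sketch of that transformation, here is a pinhole projection of a single 3D point in Python. The focal length and principal point below are made-up numbers purely for illustration, not values from any real camera.

```python
import numpy as np

# Assumed intrinsics: focal length in pixels and principal point.
f = 800.0
cx, cy = 320.0, 240.0

# Camera (intrinsic) matrix M
M = np.array([[f, 0, cx],
              [0, f, cy],
              [0, 0,  1]])

# A 3D point in the camera's own coordinate frame (X, Y, Z), Z > 0
P = np.array([0.1, -0.05, 2.0])

# Pinhole projection: multiply by M, then divide by the depth Z
p = M @ P
u, v = p[0] / p[2], p[1] / p[2]
print(u, v)   # pixel coordinates of the projected point
```

Note that the depth Z is divided out, which is exactly why a single image cannot tell you how far away the point was.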

A good calibration toolbox for Matlab is available HERE. It also contains links to various other work on camera calibration.

To recover the depth of a scene, a single camera is not sufficient. Two cameras can be used, and calibrating them together is called stereo calibration. There is also a library called OpenCV that contains many different kinds of algorithms for computer vision problems.
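As a rough illustration of why two views give depth, here is a small sketch using OpenCV's triangulation routine. The projection matrices and the 3D point are made up; in practice they would come out of stereo calibration.

```python
import numpy as np
import cv2

# Made-up intrinsics shared by both cameras.
K = np.array([[800., 0., 320.],
              [0., 800., 240.],
              [0., 0., 1.]])

# Projection matrices K [R|T]; the second camera is shifted along X.
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.1], [0.], [0.]])])

# A 3D point (homogeneous) and its projections into the two images.
X = np.array([0.2, 0.1, 2.0, 1.0])
x1 = P1 @ X; x1 = (x1[:2] / x1[2]).reshape(2, 1)
x2 = P2 @ X; x2 = (x2[:2] / x2[2]).reshape(2, 1)

# With two views the depth can be recovered by triangulation.
Xh = cv2.triangulatePoints(P1, P2, x1, x2)
print(Xh[:3] / Xh[3])   # recovers approximately (0.2, 0.1, 2.0)
```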

There are mainly two types of calibration procedure. The first is photogrammetric calibration, where the camera is calibrated by viewing an object of known size and geometry; this is the approach used by almost all common calibration methods. The second is self-calibration, which aims to calibrate a camera purely by moving it, and it has not been fully developed yet.

For photogrammetric calibration, like the toolbox in the link above, it is easy to be confused about how the chessboard (calibration object) coordinates are taken. I thought at first that all three values of the object's 3D coordinates are known, but this is not the case. Only the X and Y coordinate values are known, since the chessboard squares are all of the same known size. The third value, the Z coordinate, is not measured and is simply taken to be a constant. This is logical, because the frame of reference is fixed on the board, so we can always assume Z to be any constant. This shows up in the equation
x = M [R | T] w

where x is the image coordinate vector, M the camera matrix, R and T the rotation and translation for a given view, and w the object (chessboard) coordinate vector.
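To make this concrete, here is a rough sketch of how the chessboard object points are typically set up with OpenCV. The board size, square size, and image filenames are assumptions for illustration only.

```python
import glob
import numpy as np
import cv2

# Assumed board geometry: 9x6 inner corners, 25 mm squares.
cols, rows, square = 9, 6, 25.0

# Object points in the board's own frame: X and Y come from the known
# square size, while Z is simply held constant at 0 for every corner.
objp = np.zeros((rows * cols, 3), np.float32)
objp[:, :2] = np.mgrid[0:cols, 0:rows].T.reshape(-1, 2) * square

obj_points, img_points = [], []
image_size = None

# 'calib_*.png' is a placeholder for your own calibration images.
for fname in glob.glob('calib_*.png'):
    gray = cv2.imread(fname, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, (cols, rows))
    if found:
        obj_points.append(objp)       # the same flat board for every view
        img_points.append(corners)
        image_size = gray.shape[::-1]

if obj_points:
    # Returns the intrinsics plus one rotation/translation pair per view.
    rms, M, dist, rvecs, tvecs = cv2.calibrateCamera(
        obj_points, img_points, image_size, None, None)
    print(M)
```

The same flat set of object points, with Z = 0 everywhere, is reused for every image; only the detected image corners change from view to view.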

It is easy to wrongly assume that the translation and rotation vectors are somehow used to determine the depth of the object from the camera beforehand. The coordinates are taken as fixed on the chessboard: the X axis runs along the width and the Y axis along the height (or vice versa), and the Z axis is perpendicular to the plane of the board. During the calibration process the Z value is always assumed to be constant. This is a bit confusing, because we are moving the chessboard around for each calibration image.

To understand this, imagine the chessboard lying flat on a table, with the coordinates assigned as described above. Now you move the camera to get the same images as the ones taken before. Here, the rotation and translation of the camera are described by the rotation and translation vectors, i.e. how the camera rotates and translates relative to the fixed frame of the chessboard, so the depth of the camera from the board does not need to be known at all. All we are doing is supplying the known X and Y values so that the intrinsic parameters can be determined. The rotation and translation vectors have exactly the same meaning when the calibration is done by moving the chessboard and keeping the camera fixed.
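As a hedged sketch of what those per-view vectors mean, the snippet below fabricates a board pose, projects the corners, and then recovers where the camera sits in the board's own frame. The camera matrix and pose are made-up values, not from a real setup.

```python
import numpy as np
import cv2

# Made-up intrinsics and a made-up pose of the board relative to the camera.
K = np.array([[800., 0., 320.],
              [0., 800., 240.],
              [0., 0., 1.]])
rvec_true = np.array([0.1, -0.2, 0.05])
tvec_true = np.array([0.05, -0.02, 1.5])   # board roughly 1.5 units in front

# Chessboard corners in the board frame: Z = 0 for all of them.
objp = np.zeros((6 * 9, 3), np.float32)
objp[:, :2] = np.mgrid[0:9, 0:6].T.reshape(-1, 2) * 0.025

# Project them into the image, then recover the pose from the projections.
imgp, _ = cv2.projectPoints(objp, rvec_true, tvec_true, K, None)
ok, rvec, tvec = cv2.solvePnP(objp, imgp, K, None)

# rvec/tvec map board coordinates into the camera frame, so the camera's
# position expressed in the board's frame is -R^T * t.
R, _ = cv2.Rodrigues(rvec)
cam_pos = -R.T @ tvec
print(cam_pos.ravel())   # where the camera sits relative to the board
```

The depth never has to be measured by hand; it falls out of the estimated translation vector for each view.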