SK

An Introduction to Epipolar Geometry

Date written Aug 8, 2017
Date updated Last updated: Jul 10, 2020
Filed under Computer vision in math

Contents

In this post we will take a look at how Camera Projections work. A demo at the end will illustrate the important segments of the theory. Prerequisites to understand the material are available in the Readings & References section below.

Epipolar Geometry is the intrinsic projective geometry a between two views. This knowledge becomes an interesting piece in the puzzle of estimating the 3D geometry of a given image projection and the estimated 3D model can then be applied to a myriad of meaningful real-world problems.

To get there, we will first establish a theoretical framework on how projections are formed and realize some interesting properties.

The Pinhole Camera Model

We will start with a very simple camera where the centre of projection is at the origin of the Euclidean Coordinate System. Let Z=fZ = f be the plane of projection, more simply the plane where the points in the 3D world space are projected. It is also called the image plane or the focal plane.

CC is the camera centre and pp is the principal point. By virtue of similar triangles, we can establish a relation between a point in 3D space at (X,Y,Z)(X,Y,Z) to a point on the focal plane at (x,y)(x^\prime,y^\prime), a distance of focal length ff. This has been visualized in the figure above on the right.

fZ=yYy=fYZ\frac{f}{Z} = \frac{y}{Y} \leftrightarrow y = \frac{fY}{Z}

A similar analysis (think of viewing the zxzx plane down the yy axis) gives us

fZ=xXx=fXZ\frac{f}{Z} = \frac{x}{X} \leftrightarrow x = \frac{fX}{Z}

and we equivalently define a mapping (by dropping the zz coordinate)

(X,Y,Z)(x,y)(fX/Z,fY/Z)(X,Y,Z) \mapsto (x,y) \leftrightarrow ({fX}/{Z},{fY}/{Z})

More formally, this can be defined in terms of homogeneous coordinates and a matrix transform as

(XYZ1)(fXfYZ)=[f0000f000010](XYZ1)\begin{pmatrix}X \\ Y \\ Z \\ 1 \end{pmatrix} \mapsto \begin{pmatrix}fX \\ fY \\ Z \end{pmatrix} = \begin{bmatrix}f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{pmatrix}X \\ Y \\ Z \\ 1 \end{pmatrix}x=PXx = PX

This 3×43 \times 4 matrix PP is known as the Camera Projection Matrix. To adjust for the reference frame in the image plane - origin at corner v/s origin at the center, we add a translation component to the left 3×33 \times 3 matrix of PP.

K=[f0px0fpy001]K = \begin{bmatrix}f & 0 & p_x \\ 0 & f & p_y \\ 0 & 0 & 1 \end{bmatrix}

and this is called the Camera Calibration Matrix. For added generality to allow for non-square pixels we multiple the matrix with an extra factor of diag(mx,my,1)diag(m_x,m_y,1) where mm is the pixels per unit distance. There's another possibility of distortion due to the xx and yy axes not being perpendicular and this is denoted by ss. The final matrix then becomes -

K=[αxsx00αyy0001]K = \begin{bmatrix}\alpha_x & s & x_0 \\ 0 & \alpha_y & y_0 \\ 0 & 0 & 1 \end{bmatrix}

In most cases, the camera coordinates will be with respect to a different coordinate system where it is not at the origin in the world coordinate system. And as we had seen in an earlier discussion on projective geometry, Euclidean transformations consist of a rotation and a translation component. Hence, any coordinate can be shifted from the world coordinate space to the camera coordinate space (where the camera is the origin) with the equation -

Xcamera=R(XworldC)X_\text{camera} = R(X_\text{world} - C)

where RR is the rotation matrix and C-C represents a translation with respect to the camera center to make it the origin. Combining the camera calibration matrix and the coordinate frame shifting above, we can draw a mapping between the point in world coordinate system XX and the point xx in the image plane, represented by

x=KR[IC]Xx = KR \begin{bmatrix}I \vert -C\end{bmatrix}XP=KR[IC]P = KR \begin{bmatrix}I \vert -C\end{bmatrix}

Parameters in KK are called the camera intrinsics (they never change) and the remaining parameters RR and CC are called the camera extrinsics. In literature, these terms will come often and now we know how these are constructed.

Epipolar Geometry

Now that we have established all the camera parameters, let us consider two camera centers CC and CC^\prime with both image planes shown in the figure below. These cameras project a point XX in the world coordinate system to xx and xx^\prime in respective image planes. The plane formed by all these points is known as the epipolar plane.

The line joining the two camera centers is known as the baseline and intersects both planes at ee and ee^\prime. These points are known as the epipoles, visualized on the right. We are aiming to establish and constraint between xx and xx^\prime. If one observes the figure on the right above, the baseline (CCCC^\prime) and the ray back-projected from xx (CXCX) form the epipolar plane π\pi. When π\pi intersects with the image plane on the right, it forms the line ll^\prime which is known as the epipolar line corresponding to xx. A search for the point correspondence of xx denoted by xx^\prime is now constrained to this epipolar line which comes as a direct consequence of the way we constructed π\pi.

If we rotate the epipolar plane with baseline as the axis, we get a pencil of planes and all these planes will still intersect at the epipoles ee and ee^\prime. This is a very interesting result!

The Fundamental Matrix

We've established some basics for epipolar geometry and realized how images are projected by a given camera matrix PP. Now we will give a more formal treatment to the problem both algebraically and geometrically and arrive at a very important matrix known as the Fundamental Matrix (literally!).

Construction via the Epipolar Plane

In a previous discussion (see Readings), we had seen Projectivities which are mappings from point in one plane to another. By the above construction of the epipolar plane, we can assert that a similar matrix exists, known as the Planar Homography HπH_\pi.

x=Hπxx^\prime = H_\pi x

Construction via Camera Matrices

In another interesting result, Fundamental Matrix does not need the epipolar plane to be necessarily defined. Let us see how it directly relates to the camera matrices we described above.

If one remembers the alternate definition of epiline corresponding to point xx, it was the image of the ray back-projected to the world coordinate XX in the second camera view. The solution to the equation

x=PXx = PX

is given by a family of solutions in the form of

X(λ)=P+x+λCX(\lambda) = P^+x + \lambda C

To find the epipolar line ll^\prime for point xx, we know from the above discussion that both the epipole ee^\prime and the point correspondence xx^\prime lie on the epiline. The intersection of these (from our projective geometry primer) gives us the line,

l=e×x=[e]×xl^\prime = e^\prime \times x^\prime = [e^\prime]_\times xl=[e]×Hπx=Fxl^\prime = [e^\prime]_\times H_\pi x = Fx

where F=[e]×HπF = [e^\prime]_\times H_\pi is known as the Fundamental Matrix, a matrix of rank 2. In plain words, FF represents a mapping of the projective plane P2\mathbb{P}^2 of the first image to a pencil of epipolar lines P1\mathbb{P}^1 through ee^\prime.

Note that the notation [e]×[e]_\times is a skew-symmetric matrix equivalent for the cross product.

[e]×=[0e3e2e30e1e2e10][e]_\times = \begin{bmatrix} 0 & -e_3 & e_2 \\ e_3 & 0 & -e_1 \\ -e_2 & e_1 & 0 \end{bmatrix}

where P+P^+ is the pseudo-inverse of PP, giving PP+=IPP^+ = I and CC is the right-null space of PP commonly called the camera center. This has its own interesting little derivation and out of scope here.

We have two known points CC and P+xP^+x on the ray which can be projected onto the second image plane via a second camera matrix PP^\prime and the epiline is the cross-product of these two projected points.

l=(PC)×(PP+x)l^\prime = (P^\prime C) \times (P^\prime P^+ x)

We have just arrived at an alternate representation of l=e×xl^\prime = e^\prime \times x^\prime and hence we realize that epipole ee^\prime is actually the image of the other camera center. Again we see that

l=[e]×(PP+x)=Fxl^\prime = [e^\prime]_\times (P^\prime P^+ x) = Fx

where F=[e]×PP+F = [e^\prime]_\times P^\prime P^+ and Hπ=PP+H_\pi = P^\prime P^+. Wow! We've discovered quite a few important relations between Homography, Camera Matrices and the Fundamental Matrix.

Properties of the Fundamental Matrix

After all the details above, it is easy to arrive and prove at the following result

xTFx=0x^{\prime T}Fx = 0

where xxx \leftrightarrow x^\prime are point-to-point correspondences in the two image planes. Solution to the above equation is a fairly popular research problem. For all practical purposes, the above problem generally remains over-determined with an exact solution determined by 3 non-collinear point correspondences. Hence, techniques like the family of RANSAC are used to non-deterministically determine the best fit model via a relevant cost function like the re-projection error.

Demo Illustration

Setup

Each execution requires a stereo image pair (for this purpose I shot two-view images around my house) which represent the two image planes discussed above. To arrive at xxx\leftrightarrow x^\prime point correspondences, I use SURF to detect keypoints in both images, match them via FLANN and apply Lowe's ratio test to filter out matches that are too far away and might be false positives. If all above sounds alien, consider it as a black box as it is out of scope for this discussion.

Then we estimate the Fundamental Matrix via RANSAC b. One we get that we draw the epilines on both images via l=FTxl = F^T x^\prime for the left image plane and l=Fxl^\prime = Fx for the right image plane. These line equations are a direct consequence of the result xTFx=0x^{\prime T}Fx = 0.

Code

Full code is available upon request but for the purpose of this discussion, the one below should be easy to follow.

##
# Load the stereo image pair
#
image_l = cv2.pyrDown(cv2.imread('images/view_left.jpg', 0))
image_r = cv2.pyrDown(cv2.imread('images/view_right.jpg', 0))
##
# Consider this matcher a black box for the purpose of this discussion
#
matcher = SURFKeyPointMatcher()
kp_l, des_l, kp_r, des_r = matcher.detect_and_compute(image_l, image_r)
pts_l, pts_r, good_matches = matcher.find_good_matches(kp_l, des_l, kp_r, des_r)
##
# Estimate the Fundamental Matrix via RANSAC
#
pts_l = np.int32(pts_l)
pts_r = np.int32(pts_r)
F, mask = cv2.findFundamentalMat(pts_l, pts_r, cv2.FM_RANSAC)
##
# Use only the points which RANSAC determined to be model inliers
#
pts_l = pts_l[mask.ravel() == 1]
pts_r = pts_r[mask.ravel() == 1]
##
# Compute epilines. Note that right image points are used for left view
# and left image points are used for right view
#
epilines_l = cv2.computeCorrespondEpilines(pts_r, 2, F).reshape(-1, 3)
epilines_r = cv2.computeCorrespondEpilines(pts_l, 1, F).reshape(-1, 3)
image_epilines_l = draw_epilines(image_l, epilines_l)
image_epilines_r = draw_epilines(image_r, epilines_r)
cv2.imwrite('epi_left.jpg', image_epilines_l)
cv2.imwrite('epi_left.jpg', image_epilines_r)

Results

Take a look at the following resulting images from the above code which show a general sense of direction of both the camera centers. The epipoles (the point of intersection of the epilines) lie outside the visible image planes.

It goes without saying that the epilines constructed are only as good as the xxx \leftrightarrow x^\prime point correspondences. The robustness of the point correspondences is dependent on how well our keypoint detection algorithm works.

References

  1. Hartley, R. & Zisserman, A., 2003. Multiple view geometry in computer vision, Cambridge university press.

Footnotes