I have a renderer using DirectX and OpenGL, and a 3D scene. The viewport and the window have the same dimensions.

How do I implement picking, given mouse coordinates x and y, in a platform-independent way?


If you can, do the picking on the CPU by calculating a ray from the eye through the mouse pointer and intersecting it with your models.
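As a rough illustration of the intersection half of this, here is a minimal sketch that tests a pick ray against a bounding sphere, which is a common cheap first pass before testing the actual triangles. The names Vec3 and raySphere are made up for this example, and the ray direction is assumed to be unit length.

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };

static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Returns true and writes the distance to the nearest hit in front of the
// ray origin into t. Assumes dir is normalized (hypothetical helper).
bool raySphere(Vec3 origin, Vec3 dir, Vec3 center, double radius, double& t) {
    Vec3 oc = sub(origin, center);
    // Solve t^2 + 2*b*t + c = 0 for points origin + t*dir on the sphere.
    double b = dot(oc, dir);
    double c = dot(oc, oc) - radius * radius;
    double disc = b * b - c;
    if (disc < 0.0) return false;        // the ray's line misses the sphere
    double s = std::sqrt(disc);
    double hit = -b - s;                 // nearer root
    if (hit < 0.0) hit = -b + s;         // origin inside the sphere: far root
    if (hit < 0.0) return false;         // sphere is entirely behind the ray
    t = hit;
    return true;
}
```

Looping over all pickable objects and keeping the smallest t gives the object closest to the eye under the cursor.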

If this isn't an option I would go with some type of ID rendering. Assign each object you want to pick a unique color, render the objects with these colors and finally read out the color from the framebuffer under the mouse pointer.
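The ID-to-color mapping for this can be as simple as packing the ID bits into the color channels. A minimal sketch, assuming an 8-bit-per-channel RGB framebuffer and at most 2^24 pickable objects; Rgb8, idToColor and colorToId are names invented for the example:

```cpp
#include <cstdint>

struct Rgb8 { std::uint8_t r, g, b; };

// Pack the low 24 bits of an object ID into an RGB color.
// ID 0 could be reserved for "background / nothing picked".
Rgb8 idToColor(std::uint32_t id) {
    return { static_cast<std::uint8_t>(id & 0xFF),
             static_cast<std::uint8_t>((id >> 8) & 0xFF),
             static_cast<std::uint8_t>((id >> 16) & 0xFF) };
}

// Recover the object ID from the pixel read back under the mouse pointer.
std::uint32_t colorToId(Rgb8 c) {
    return static_cast<std::uint32_t>(c.r)
         | (static_cast<std::uint32_t>(c.g) << 8)
         | (static_cast<std::uint32_t>(c.b) << 16);
}
```

For the round trip to be exact, the ID pass has to be rendered without lighting, blending, anti-aliasing or dithering, otherwise the color that comes back is not the one you wrote.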

EDIT: If the question is how to construct the ray from the mouse coordinates, you need the following: a projection matrix P and the camera transform C (here the world-to-camera, i.e. view, matrix, since it is applied to world-space points in the equation further down). If the mouse pointer is at (x, y) and the size of the viewport is (width, height), one position in clip space along the ray is:

mouse_clip = [
  float(x) * 2 / float(width) - 1,
  1 - float(y) * 2 / float(height),
  0,
  1]

(Notice that I flipped the y-axis, since the origin of the mouse coordinates is often in the upper left corner.)

The following is also true:

mouse_clip = P * C * mouse_worldspace

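Which gives:

mouse_worldspace = inverse(C) * inverse(P) * mouse_clip

(With a perspective projection the result generally has w != 1, so divide mouse_worldspace by its w component before treating it as a 3D point.)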

We now have:

p = inverse(C).position(); //origin of camera in worldspace (translation part of the camera-to-world matrix)
n = normalize(mouse_worldspace - p); //unit vector from p through mouse pos in worldspace
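For the common case where P is a plain symmetric perspective projection, inverse(P) has a closed form, so the same ray can be built without inverting any matrix. The following is only a sketch under that assumption: the Camera struct, its fields and pickRay are hypothetical names, the camera axes are assumed to be orthonormal world-space vectors, and the projection's aspect ratio is assumed to equal width / height.

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };
struct Ray  { Vec3 origin, dir; };

// Hypothetical camera description: world-space position and axes.
struct Camera {
    Vec3 position, right, up, forward;
    double fovY;  // vertical field of view in radians
};

Ray pickRay(const Camera& cam, double mouseX, double mouseY,
            double width, double height) {
    // Mouse pixel -> [-1, 1] with +y up (mouse origin assumed top-left).
    double ndcX = 2.0 * mouseX / width - 1.0;
    double ndcY = 1.0 - 2.0 * mouseY / height;

    // Closed-form inverse of a symmetric perspective projection:
    // offsets on the plane one unit in front of the camera.
    double tanHalf = std::tan(cam.fovY * 0.5);
    double sx = ndcX * tanHalf * (width / height);
    double sy = ndcY * tanHalf;

    // Rotate into world space using the camera axes, then normalize.
    Vec3 d = { cam.forward.x + sx * cam.right.x + sy * cam.up.x,
               cam.forward.y + sx * cam.right.y + sy * cam.up.y,
               cam.forward.z + sx * cam.right.z + sy * cam.up.z };
    double len = std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
    return { cam.position, { d.x / len, d.y / len, d.z / len } };
}
```

Because the forward vector is passed in explicitly, the sketch does not care whether the camera looks down -z (typical OpenGL) or +z (typical Direct3D).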

Here's the viewing frustum:

[Figure: viewing frustum]

First you need to determine where on the nearplane the mouse click happened:

  1. rescale the window coordinates (0..640,0..480) to [-1,1], with (-1,-1) at the bottom-left corner and (1,1) at the top-right.
  2. 'undo' the projection by multiplying the scaled coordinates by what I call the 'unview' matrix: unview = (P * M).inverse() = M.inverse() * P.inverse(), where M is the ModelView matrix and P is the projection matrix.

Then determine where the camera is in worldspace, and draw a ray starting at the camera and passing through the point you found on the nearplane.

The camera is at M.inverse().col(4), i.e. the final column of the inverse ModelView matrix.

Final pseudocode:

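// use floating-point division here; integer division would truncate
normalised_x = 2 * mouse_x / win_width - 1
normalised_y = 1 - 2 * mouse_y / win_height
// note the y pos is inverted, so +y is at the top of the screen

unviewMat = (projectionMat * modelViewMat).inverse()

// z = 0 is the near plane in Direct3D; in OpenGL the near plane is z = -1,
// but any depth gives a point on the same ray through the camera
near_point = unviewMat * Vec(normalised_x, normalised_y, 0, 1)
near_point = near_point / near_point.w   // perspective divide back to a 3D point
camera_pos = ray_origin = modelViewMat.inverse().col(4)
ray_dir = near_point - camera_pos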

Well, pretty simple; the theory behind this is always the same:

1) Unproject your 2D coordinate into 3D space twice (each API has its own function, but you can implement your own if you want): once at Min Z and once at Max Z.

2) With these two points, calculate the vector that goes from the Min Z point to the Max Z point.

3) With that vector and one of the points, build the ray that goes from Min Z to Max Z.

4) Now you have a ray. With it you can do a ray-triangle/ray-plane/ray-something intersection and get your result.
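Steps 2 to 4 could look roughly like the sketch below, which takes the two unprojected points and tests the segment between them against one triangle using the Möller-Trumbore algorithm. Vec3 and segmentTriangle are names invented for this example, and the epsilon is an arbitrary choice.

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };

static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// Tests the segment nearPt -> farPt against triangle (v0, v1, v2).
// On a hit, t is the fraction along the segment (0 = Min Z point, 1 = Max Z point).
bool segmentTriangle(Vec3 nearPt, Vec3 farPt, Vec3 v0, Vec3 v1, Vec3 v2, double& t) {
    const double eps = 1e-12;               // arbitrary tolerance for this sketch
    Vec3 dir = sub(farPt, nearPt);          // step 2: vector from Min Z to Max Z
    Vec3 e1 = sub(v1, v0), e2 = sub(v2, v0);
    Vec3 p = cross(dir, e2);
    double det = dot(e1, p);
    if (std::fabs(det) < eps) return false; // segment parallel to the triangle
    double inv = 1.0 / det;
    Vec3 s = sub(nearPt, v0);
    double u = dot(s, p) * inv;             // first barycentric coordinate
    if (u < 0.0 || u > 1.0) return false;
    Vec3 q = cross(s, e1);
    double v = dot(dir, q) * inv;           // second barycentric coordinate
    if (v < 0.0 || u + v > 1.0) return false;
    double hit = dot(e2, q) * inv;
    if (hit < 0.0 || hit > 1.0) return false; // outside the near/far range
    t = hit;
    return true;
}
```

The hit position is nearPt + t * (farPt - nearPt); over many triangles, the smallest t is the one nearest the viewer.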


I have little DirectX experience, but I'm sure it's similar to OpenGL. What you want is the gluUnProject call.

Assuming you have a valid Z buffer you can query the contents of the Z buffer at a mouse position with:

// obtain the viewport, modelview matrix and projection matrix
// you may keep the viewport and projection matrices throughout the program if you don't change them
GLint viewport[4];
GLdouble modelview[16];
GLdouble projection[16];
glGetIntegerv(GL_VIEWPORT, viewport);
glGetDoublev(GL_MODELVIEW_MATRIX, modelview);
glGetDoublev(GL_PROJECTION_MATRIX, projection);

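// obtain the Z position (not world coordinates but in range 0 - 1)
// note: OpenGL window coordinates start at the bottom-left; if y_cursor comes from a
// top-left-origin mouse event, flip it first: y_cursor = viewport[3] - y_cursor - 1
GLfloat z_cursor;
glReadPixels(x_cursor, y_cursor, 1, 1, GL_DEPTH_COMPONENT, GL_FLOAT, &z_cursor);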

// obtain the world coordinates
GLdouble x, y, z;
gluUnProject(x_cursor, y_cursor, z_cursor, modelview, projection, viewport, &x, &y, &z);

If you don't want to use GLU, you can also implement gluUnProject yourself; its functionality is relatively simple and is described at opengl.org.
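For reference, a hand-rolled version might look something like the sketch below. It is not GLU's actual source, just the same idea: invert projection * modelview, map the window coordinates to [-1, 1], transform, and divide by w. Matrices are assumed to be 4x4 column-major as in OpenGL, depth is assumed to use the OpenGL 0..1 window range, and the function names are my own.

```cpp
#include <cmath>
#include <utility>

// out = a * b for 4x4 column-major matrices (element (row, col) is m[col * 4 + row]).
static void multiply(const double a[16], const double b[16], double out[16]) {
    for (int c = 0; c < 4; ++c)
        for (int r = 0; r < 4; ++r) {
            double s = 0.0;
            for (int k = 0; k < 4; ++k) s += a[k * 4 + r] * b[c * 4 + k];
            out[c * 4 + r] = s;
        }
}

// Gauss-Jordan inverse with partial pivoting. Inversion commutes with
// transposition, so the storage order does not matter here.
static bool invert(const double m[16], double inv[16]) {
    double a[4][8];
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j) {
            a[i][j] = m[i * 4 + j];
            a[i][j + 4] = (i == j) ? 1.0 : 0.0;
        }
    for (int col = 0; col < 4; ++col) {
        int pivot = col;
        for (int r = col + 1; r < 4; ++r)
            if (std::fabs(a[r][col]) > std::fabs(a[pivot][col])) pivot = r;
        if (std::fabs(a[pivot][col]) < 1e-12) return false;  // singular matrix
        for (int j = 0; j < 8; ++j) std::swap(a[col][j], a[pivot][j]);
        double d = a[col][col];
        for (int j = 0; j < 8; ++j) a[col][j] /= d;
        for (int r = 0; r < 4; ++r) {
            if (r == col) continue;
            double f = a[r][col];
            for (int j = 0; j < 8; ++j) a[r][j] -= f * a[col][j];
        }
    }
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j) inv[i * 4 + j] = a[i][j + 4];
    return true;
}

// Maps window coordinates (bottom-left origin, winZ in 0..1) to object/world space.
bool unProject(double winX, double winY, double winZ,
               const double model[16], const double proj[16], const int viewport[4],
               double& x, double& y, double& z) {
    double pm[16], inv[16];
    multiply(proj, model, pm);
    if (!invert(pm, inv)) return false;
    // Window -> normalized device coordinates in [-1, 1] (OpenGL convention).
    double in[4] = { (winX - viewport[0]) / viewport[2] * 2.0 - 1.0,
                     (winY - viewport[1]) / viewport[3] * 2.0 - 1.0,
                     winZ * 2.0 - 1.0, 1.0 };
    double out[4];
    for (int r = 0; r < 4; ++r) {
        out[r] = 0.0;
        for (int k = 0; k < 4; ++k) out[r] += inv[k * 4 + r] * in[k];
    }
    if (out[3] == 0.0) return false;
    x = out[0] / out[3];  // perspective divide
    y = out[1] / out[3];
    z = out[2] / out[3];
    return true;
}
```

For a Direct3D-style projection, where normalized depth already runs from 0 to 1, the winZ * 2.0 - 1.0 remap would be dropped; the rest stays the same.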