Three.js Projector and Ray objects

Solution 1:

I found that I needed to go a bit deeper under the surface to work outside of the scope of the sample code (such as having a canvas that does not fill the screen or having additional effects). I wrote a blog post about it here. This is a shortened version, but should cover pretty much everything I found.

How to do it

The following code (similar to that already provided by @mrdoob) will change the color of a cube when clicked:

    var mouse3D = new THREE.Vector3( ( event.clientX / window.innerWidth ) * 2 - 1,   //x
                                    -( event.clientY / window.innerHeight ) * 2 + 1,  //y
                                    0.5 );                                            //z
    projector.unprojectVector( mouse3D, camera );   
    mouse3D.sub( camera.position );                
    mouse3D.normalize();
    var raycaster = new THREE.Raycaster( camera.position, mouse3D );
    var intersects = raycaster.intersectObjects( objects );
    // Change color if hit block
    if ( intersects.length > 0 ) {
        intersects[ 0 ].object.material.color.setHex( Math.random() * 0xffffff );
    }

With the more recent three.js releases (around r55 and later), you can use pickingRay which simplifies things even further so that this becomes:

    var mouse3D = new THREE.Vector3( ( event.clientX / window.innerWidth ) * 2 - 1,   //x
                                    -( event.clientY / window.innerHeight ) * 2 + 1,  //y
                                    0.5 );                                            //z
    var raycaster = projector.pickingRay( mouse3D.clone(), camera );
    var intersects = raycaster.intersectObjects( objects );
    // Change color if hit block
    if ( intersects.length > 0 ) {
        intersects[ 0 ].object.material.color.setHex( Math.random() * 0xffffff );
    }

Let's stick with the old approach as it gives more insight into what is happening under the hood. You can see this working here, simply click on the cube to change its colour.

What's happening?

    var mouse3D = new THREE.Vector3( ( event.clientX / window.innerWidth ) * 2 - 1,   //x
                                    -( event.clientY / window.innerHeight ) * 2 + 1,  //y
                                    0.5 );                                            //z

event.clientX is the x coordinate of the click position. Dividing by window.innerWidth gives the position of the click in proportion of the full window width. Basically, this is translating from screen coordinates that start at (0,0) at the top left through to (window.innerWidth,window.innerHeight) at the bottom right, to the cartesian coordinates with center (0,0) and ranging from (-1,-1) to (1,1) as shown below:

translation from web page coordinates

Note that z has a value of 0.5. I won't go into too much detail about the z value at this point except to say that this is the depth of the point away from the camera that we are projecting into 3D space along the z axis. More on this later.

    projector.unprojectVector( mouse3D, camera );

If you look at the three.js code you will see that this is really an inversion of the projection matrix from the 3D world to the camera. Bear in mind that in order to get from 3D world coordinates to a projection on the screen, the 3D world needs to be projected onto the 2D surface of the camera (which is what you see on your screen). We are basically doing the inverse.

Note that mouse3D will now contain this unprojected value. This is the position of a point in 3D space along the ray/trajectory that we are interested in. The exact point depends on the z value (we will see this later).

At this point, it may be useful to have a look at the following image:

Camera, unprojected value and ray

The point that we have just calculated (mouse3D) is shown by the green dot. Note that the size of the dots are purely illustrative, they have no bearing on the size of the camera or mouse3D point. We are more interested in the coordinates at the center of the dots.

Now, we don't just want a single point in 3D space, but instead we want a ray/trajectory (shown by the black dots) so that we can determine whether an object is positioned along this ray/trajectory. Note that the points shown along the ray are just arbitrary points, the ray is a direction from the camera, not a set of points.

Fortunately, because we a have a point along the ray and we know that the trajectory must pass from the camera to this point, we can determine the direction of the ray. Therefore, the next step is to subtract the camera position from the mouse3D position, this will give a directional vector rather than just a single point:

    mouse3D.sub( camera.position );                
    mouse3D.normalize();

We now have a direction from the camera to this point in 3D space (mouse3D now contains this direction). This is then turned into a unit vector by normalizing it.

The next step is to create a ray (Raycaster) starting from the camera position and using the direction (mouse3D) to cast the ray:

    var raycaster = new THREE.Raycaster( camera.position, mouse3D );

The rest of the code determines whether the objects in 3D space are intersected by the ray or not. Happily it is all taken care of us behind the scenes using intersectsObjects.

The Demo

OK, so let's look at a demo from my site here that shows these rays being cast in 3D space. When you click anywhere, the camera rotates around the object to show you how the ray is cast. Note that when the camera returns to its original position, you only see a single dot. This is because all the other dots are along the line of the projection and therefore blocked from view by the front dot. This is similar to when you look down the line of an arrow pointing directly away from you - all that you see is the base. Of course, the same applies when looking down the line of an arrow that is travelling directly towards you (you only see the head), which is generally a bad situation to be in.

The z coordinate

Let's take another look at that z coordinate. Refer to this demo as you read through this section and experiment with different values for z.

OK, lets take another look at this function:

    var mouse3D = new THREE.Vector3( ( event.clientX / window.innerWidth ) * 2 - 1,   //x
                                    -( event.clientY / window.innerHeight ) * 2 + 1,  //y
                                    0.5 );                                            //z

We chose 0.5 as the value. I mentioned earlier that the z coordinate dictates the depth of the projection into 3D. So, let's have a look at different values for z to see what effect it has. To do this, I have placed a blue dot where the camera is, and a line of green dots from the camera to the unprojected position. Then, after the intersections have been calculated, I move the camera back and to the side to show the ray. Best seen with a few examples.

First, a z value of 0.5:

z value of 0.5

Note the green line of dots from the camera (blue dot) to the unprojected value (the coordinate in 3D space). This is like the barrel of a gun, pointing in the direction that they ray should be cast. The green line essentially represents the direction that is calculated before being normalised.

OK, let's try a value of 0.9:

z value of 0.9

As you can see, the green line has now extended further into 3D space. 0.99 extends even further.

I do not know if there is any importance as to how big the value of z is. It seems that a bigger value would be more precise (like a longer gun barrel), but since we are calculating the direction, even a short distance should be pretty accurate. The examples that I have seen use 0.5, so that is what I will stick with unless told otherwise.

Projection when the canvas is not full screen

Now that we know a bit more about what is going on, we can figure out what the values should be when the canvas does not fill the window and is positioned on the page. Say, for example, that:

the div containing the three.js canvas is offsetX from the left and offsetY from the top of the screen.
the canvas has a width equal to viewWidth and height equal to viewHeight.

The code would then be:

    var mouse3D = new THREE.Vector3( ( event.clientX - offsetX ) / viewWidth * 2 - 1,
                                    -( event.clientY - offsetY ) / viewHeight * 2 + 1,
                                    0.5 );

Basically, what we are doing is calculating the position of the mouse click relative to the canvas (for x: event.clientX - offsetX). Then we determine proportionally where the click occurred (for x: /viewWidth) similar to when the canvas filled the window.

That's it, hopefully it helps.

Solution 2:

Basically, you need to project from the 3D world space and the 2D screen space.

Renderers use projectVector for translating 3D points to the 2D screen. unprojectVector is basically for doing the inverse, unprojecting 2D points into the 3D world. For both methods you pass the camera you're viewing the scene through.

So, in this code you're creating a normalised vector in 2D space. To be honest, I was never too sure about the z = 0.5 logic.

mouse3D.x = (event.clientX / window.innerWidth) * 2 - 1;
mouse3D.y = -(event.clientY / window.innerHeight) * 2 + 1;
mouse3D.z = 0.5;

Then, this code uses the camera projection matrix to transform it to our 3D world space.

projector.unprojectVector(mouse3D, camera);

With the mouse3D point converted into the 3D space, we can now use it for getting the direction and then use the camera position to throw a ray from.

var ray = new THREE.Ray(camera.position, mouse3D.subSelf(camera.position).normalize());
var intersects = ray.intersectObject(plane);

Solution 3:

As of release r70, Projector.unprojectVector and Projector.pickingRay are deprecated. Instead, we have raycaster.setFromCamera which makes the life easier in finding the objects under the mouse pointer.

var mouse = new THREE.Vector2();
mouse.x = (event.clientX / window.innerWidth) * 2 - 1;
mouse.y = -(event.clientY / window.innerHeight) * 2 + 1; 

var raycaster = new THREE.Raycaster();
raycaster.setFromCamera(mouse, camera);
var intersects = raycaster.intersectObjects(scene.children);

intersects[0].object gives the object under the mouse pointer and intersects[0].point gives the point on the object where the mouse pointer was clicked.

Solution 4:

Projector.unprojectVector() treats the vec3 as a position. During the process the vector gets translated, hence we use .sub(camera.position) on it. Plus we need to normalize it after after this operation.

I will add some graphics to this post but for now I can describe the geometry of the operation.

We can think of the camera as a pyramid in terms of geometry. We in fact define it with 6 panes - left, right, top, bottom, near and far (near being the plane closest to the tip).

If we were standing in some 3d and observing these operations, we would see this pyramid in an arbitrary position with an arbitrary rotation in space. Lets say that this pyramid's origin is at it's tip, and it's negative z axis runs towards the bottom.

Whatever ends up being contained within those 6 planes will end up being rendered on our screen if we apply the correct sequence of matrix transformations. Which i opengl go something like this:

NDC_or_homogenous_coordinates = projectionMatrix * viewMatrix * modelMatrix * position.xyzw;

This takes our mesh from it's object space into world space, into camera space and finally it projects it does the perspective projection matrix which essentially puts everything into a small cube (NDC with ranges from -1 to 1).

Object space can be a neat set of xyz coordinates in which you generate something procedurally or say, a 3d model, that an artist modeled using symmetry and thus neatly sits aligned with the coordinate space, as opposed to an architectural model obtained from say something like REVIT or AutoCAD.

An objectMatrix could happen in between the model matrix and the view matrix, but this is usually taken care of ahead of time. Say, flipping y and z, or bringing a model thats far away from the origin into bounds, converting units etc.

If we think of our flat 2d screen as if it had depth, it could be described the same way as the NDC cube, albeit, slightly distorted. This is why we supply the aspect ratio to the camera. If we imagine a square the size of our screen height, the remainder is the aspect ratio that we need to scale our x coordinates.

Now back to 3d space.

We're standing in a 3d scene and we see the pyramid. If we cut everything around the pyramid, and then take the pyramid along with the part of the scene contained in it and put it's tip at 0,0,0, and point the bottom towards the -z axis we will end up here:

viewMatrix * modelMatrix * position.xyzw

Multiplying this by the projection matrix will be the same as if we took the tip, and started pulling it appart in the x and y axis creating a square out of that one point, and turning the pyramid into a box.

In this process the box gets scaled to -1 and 1 and we get our perspective projection and we end up here:

projectionMatrix * viewMatrix * modelMatrix * position.xyzw;

In this space, we have control over a 2 dimensional mouse event. Since it's on our screen, we know that it's two dimensional, and that it's somewhere within the NDC cube. If it's two dimensional, we can say that we know X and Y but not the Z, hence the need for ray casting.

So when we cast a ray, we are basically sending a line through the cube, perpendicular to one of it's sides.

Now we need to figure out if that ray hits something in the scene, and in order to do that we need to transform the ray from this cube, into some space suitable for computation. We want the ray in world space.

Ray is an infinite line in space. It's different from a vector because it has a direction, and it must pass through a point in space. And indeed this is how the Raycaster takes its arguments.

So if we squeeze the top of the box along with the line, back into the pyramid, the line will originate from the tip and run down and intersect the bottom of the pyramid somewhere between -- mouse.x * farRange and -mouse.y * farRange.

(-1 and 1 at first, but view space is in world scale, just rotated and moved)

Since this is the default location of the camera so to speak (it's object space) if we apply it's own world matrix to the ray, we will transform it along with the camera.

Since the ray passes through 0,0,0, we only have it's direction and THREE.Vector3 has a method for transforming a direction:

THREE.Vector3.transformDirection()

It also normalizes the vector in the process.

The Z coordinate in the method above

This essentially works with any value, and acts the same because of the way the NDC cube works. The near plane and far plane are projected onto -1 and 1.

So when you say, shoot a ray at:

[ mouse.x | mouse.y | someZpositive ]

you send a line, through a point (mouse.x, mouse.y, 1) in the direction of (0,0,someZpositive)

If you relate this to the box/pyramid example, this point is at the bottom, and since the line originates from the camera it goes through that point as well.

BUT, in the NDC space, this point is stretched to infinity, and this line ends up being parallel with the left,top,right,bottom planes.

Unprojecting with the above method turns this into a position/point essentially. The far plane just gets mapped into world space, so our point sits somewhere at z=-1, between -camera aspect and + cameraAspect on X and -1 and 1 on y.

since it's a point, applying the cameras world matrix will not only rotate it but translate it as well. Hence the need to bring this back to the origin by subtracting the cameras position.