zondag 23 januari 2011

Building a 3D scanner part 1

An idea that has been floating around in my head for a while after I graduated is to build a 3D scanner. Mainly because I know how to make one and because it will make for some very interesting blog posts, but also because it will allow me to import objects from the real world into 3D games. Now you might think that something like a 3D scanner needs a whole array of gizmos and doodads, but the principle is surprisingly simple. All you really need is a camera, a line laser and some way to move either the laser or the object to be scanned. Oh, and of course software to calculate the coordinates of the object from the camera images. This image from Wikipedia sums it up pretty nicely.


In the image you can see that there is an angle between the camera and the line laser. This way the further a point on the object is away from the camera, the further to the right the laser line will appear on that point. So the software can analyze each horizontal line of the image and calculate on which pixel position the laser line appears. Note that right will become left if the camera and laser are swapped. Also be aware that this setup with the camera and the line laser is just one way to make a 3D scan of an object, there exist other methods as well such as using two cameras right next to each other and comparing the two images they produce.

So now we know that there is some way to calculate the position on the object where the laser line intersects it by analyzing an image from the camera. I think that the simplest way to calculate the position on the object is to view the line laser as a mathematical (infinite) plane, the pixels on the image of the camera as mathematical rays and calculating the intersection point between the two.

The plane equation in algebraic form is:

         n ∙ p - d = 0

Where:
-          n is the normal of the plane facing away from the center of the coordinate system.
-          p is a point on the plane.
-          d is the smallest distance from the center of the coordinate system to the plane.


The ray equation is:

         pt = u t + p0

Where:
-          pt is the position on the ray at distance t.
-          u is the (normalized) direction the ray is traveling in.
-          t is the distance from the starting point.
-          p0 is the starting point of the ray.

So we solve p = pt by substituting the ray equation in the plane equation and solving for t.

         t = ( dp0 ∙ n ) / ( u ∙ n )

t can then be inserted in the ray equation to obtain the actual intersection point.

So now that we know that how to calculate the intersection point the problem becomes finding out all the values in those equations. The values of the plane equation isn’t such a problem, you can just throw some high school trigonometry at it and calculate the values of d and the components of the plane normal. This is all assuming that you choose the center of the rotating platform in the image of the duck to be the center of the coordinate system. The real challenge lies with the ray equations, or the camera really. The starting position of the rays lie in the center of the lens of the camera and measuring this is rather tricky. Maybe you could get away with a slightly inaccurate position since it only appears once in the total calculation of the intersection point. The direction of the ray however appears twice. Once in the calculation of t and once when inserting t into the ray equation. So we accurately need to know the direction of the ray that each pixel corresponds to. To do that we need to calibrate the camera, also known as camera resectioning, so we know its geometrical properties that transforms a point from 3D Euclidian space to image space. This is what my next blog post will be about. After that I will describe how to build the setup required for the 3D scanner and after that how the software works and processes the images from the camera. Lastly I will test the accuracy and precision of the scanner.


maandag 10 januari 2011

A Sudoku solving algorithm

When I make long trips by using the train I often take a newspaper to try to make the long travel less boring. Unfortunately the contents of the newspaper is not always very interesting, so I end up at the back of the newspaper rather quickly. Luckily that section always contains a lot of puzzles. The most interesting, and time consuming, of them all are usually the Sudoku puzzles. I often solve them without too much effort, but sometimes one of the puzzles is really hard and I am unable to solve it. So I have done what every self respecting programmer would have done: I wrote a algorithm to solve Sudoku puzzles. Now of course there are already a lot of websites that have Sudoku solving algorithms, but where is the fun in using a premade solution?

My algorithm is heavily based on how humans solve Sudoku puzzles. Basically what the algorithms does for each of the 81 tiles is marking whether a number is possible on that space or not. This can be visualized as a 9x9x9 volume of boolean values where the width and length of the volume lie parallel to the Sudoku puzzle and the depth signifies the numbers 1 to 9. This will be called the state volume. In the state volume a value of true means that a number is possible at that tile and false means it is not.


The algorithm works as follows:

  1. Create an empty 9x9 grid, this is called the solved numbers.
  2. Create a 9x9x9 state volume with all elements set to possible.
  3. Put the given numbers of the Sudoku puzzle in a container, these are the new numbers.
  4. Use the new numbers to mark elements in the state volume as not possible. Transfer the new numbers to the correct position in the solved numbers grid.
  5. Search the state volume for new numbers and put them in the container, if they are not already in the solved numbers grid. 
  6. If there are new numbers then go to step 4.
  7. The solved numbers grid contains the solved Sudoku.

Now the most interesting steps are of course step 4 and step 5, because how should we mark the elements as not possible and how should we search for the new numbers? The rules of a Sudoku puzzle say that a number should only appear once in a row, once in a column and once in a square. We also want to mark the elements in the state volume in this way.

This image shows how the fourth slice of the state volume is affected when it is marked by the number 4 in the middle of the puzzle.


The tiles that are now marked yellow cannot contain the number 4. There are also elements that are marked in the state volume that are not shown in this image, namely the elements directly above and below the center of the fourth slice. This is because the number 4 now occupies the center tile on the Sudoku puzzle and this means it is impossible for any other number to exist there.

Searching happens in the same four ways that the volume is marked. Along the rows, along the columns, in a square and in the depth of the volume. In the case of searching along the rows, the algorithm checks for each slice and for each row whether there is only one element in the row on that slice that is still possible. If this is the case then the column and row number become the position on the Sudoku and the depth in the volume becomes the number at that location. Along the columns is almost the same as along the rows of course, just switch the two words. Searching in a square happens for each slice and for each of the 9 3x3 squares. Inside a square at that slice the algorithm checks if only one element is still possible. Searching in the depth happens for each column and for each row.

When I finished coding it and started testing the algorithm I noticed that the algorithm solved the puzzles within the blink of an eye. Also note that the running time of the algorithm, and any other Sudoku solving algorithm for that matter, depends on the number of tiles that are not given. This is of course assuming that the size of the puzzle stays constant. Unfortunately when I tried to solve one of the Sudoku’s on Wikipedia that are marked “Hardest”. The algorithm didn’t work. This was a bit disappointing and I think that due to the fact that the algorithm is based on the methods that humans use when solving Sudoku’s, but I already have a good idea of how to extend the algorithm so it might even solve these puzzles.

Oh, before I forget. The source code below (in C++) simply uses a 9x9 grid as the container for the new numbers. This way I don’t have to make an extra data type to remember the positions on the Sudoku grid. However I do have to search over this entire grid every time I want to mark the elements in the state volume. It is not a big overhead though.