Tuesday, June 3, 2008

Fast Multidimensional Indexing

Let's assume that you have a set of n points in the plane and that you want to find all the points within a particular rectangle. How can you do that efficiently?

Almost every program needs to memorize some data. The problem is how to quickly retrieve that information when it is needed. We all know the principal data structures: stack, queue, list, hash table and tree. This paper is mainly concerned with trees. A tree is a simple but powerful data structure where n objects are organized in such a way that we can access a single object (it does not matter which one) in time O(log n) in the worst case. It is a remarkable result, and the idea behind it is so simple!

The one-dimensional case is simple, but what about the two-dimensional and the multidimensional cases?

Some of the common approaches employ grids, multilevel grids, space partition trees based on splitting planes (for instance, BSP Trees and KD-Trees), hierarchies of bounding objects (for instance, Axis Aligned Bounding Boxes), space-filling curves based structures (for instance, UB-Trees, based on the Z-curve).

Okay, so we have a lots of methods to choose from. Why another method? I was looking for a method that, like the binary trees in the one-dimensional case, was worst-case efficient, i.e. behaved nicely all the times.

My method solves the two-dimensional case (i.e. it finds all the points in a rectangle) in time O(logn). Eh? Like the binary trees in the one-dimensional case? Where's the catch? The catch is that it requires space O(n logn). I believe we can't do any better than this.
In the d-dimensional case, my method requires space O(n log^(d-1) n) and time O(log^(d-1) n). Moreover, whenever the restrictions are imposed only on k >= 2 coordinates, the search takes time O(log^(k-1) n + d - k).

The idea behind my method is extremely simple. You just need to understand the first theorems. The pictures should help.

If you find any errors, unclear passages or you have any questions, well, just let me know!

You can view and download (PDF) the document here:
http://www.scribd.com/full/3216858?access_key=key-1r88z590x620idm48g3n
http://www.scribd.com/doc/3216858/Worst-Case-Efficient-Multidimensional-Indexing
http://www.keepandshare.com/doc/view.php?id=628583&da=y

NOTE: I've just noted that scribd's renderer is not perfect. I suggest you download the PDF and view it in Acrobat Reader, indeed other viewers (such as SumatraPDF) have some problems with some of the pictures.