Correlation Filter Tutorial

The following sections describe elementary and advanced applications of the Correlation Filter, for both data comparison and correlation computation.

# Basic Example

In this example we will use a Correlation Filter to compute the correlation of set of data points. We first use a Sequence Source to create a sequence of values $n = 0,\ldots, 2$ with 11 elements and connect it to a Function Filter to generate a point $(n, n\times n)$ for each value $n$. As seen below, this gives us a sequence of points $(x, y)$ in the output.

Now, we wish to compute the correlation between the values in v[0] with the values in v[1] in Stream 0. To do this, we first create and connect a Correlation Filter to the output of the Function Filter. Once connected, we have to specify a few things for the filter so that it understands what data, exactly, it should compare. Select the correlation filter, and the first panel, known as the Stream panel, should appear.

Select the only Stream available, Stream 0, and enter a '2' in the box that appears. This informs the correlation filter that the first stream will be included twice in the item iterations - that it, for each level of comparison, the whole stream will be used once. This means that each element in the stream will be compared against every other element in the stream once. While this idea may seem unclear, once we format the output and take a look, it will make perfect sense.

Speaking of output, we should define some values to output. Select the 'Values' tab along the top, and you will be met with a new panel. Click on the 'Mirror Values' button. This will add a copy of every input variable to the output, which will allow us to see what the results are computed from. Once these values have been added, let's add a fifth value - the correlation itself. Click on the 'New' button, and then select the entry that the button added to the list.

Select the text field labeled Expression and enter the following in it: sqrt(vi1+vj1)/(vi0+vj0) . This equation will illustrate the correspondence between the square root of the sum of the squares over the sum of the values. Click 'Propagate Changes' and select the 'Output' tab next - we'll see the data we've just manipulated.

In the output panel, p[0] represents the iterator's position in vi and p[1] does the same for vj. In the next four columngs, vi0, vi1, vj0, and vj1 are all mirrored. The values in a given row are the values used for that row's computation. Notice how p[0] and p[1] decribe the element iteration occuring in the vi and vj columns. The last column here contains the actual computation of the equation. In this way, we have compared every value in a stream against itself - this is the basic idea of the correlation filter.

Now that we've gotten our feet wet, we will move to a more in-depth and practical application of the correlation filter: we will use it to group particle data by distance between particles to get a notion of gravitational wakes in a simulation of Saturn's A rings.

## Input File

The input file we will be using for this example is available here. It includes Cartesian positions and radii of individual particles produced by a computer simulation of a small patch of Saturn's A rings. Note that this file is large - it contains the data of roughly 100,000 particles for a single timestep. First, let's open and read the file. We do this be creating a Cartesian and Radius Source.

## Thinning the Data

We just read the file in, but now we have to consider how to proceed. We would like to perform a correlation for this data based on particle distances, but comparing every particle toe very other particle in the data set will yield roughly 25 million comparisons, a somewhat computationally infeasible proposition, since computers at the time of this writing cannot accomplish this. (At this point, correlating the data is like firing a loaded gun at your processor.) Therefore, we should sort out some of the data to be able to perform a more reasonable computation. We select the Thinning tab for our data source, check the Use Thinning box, and tell it to only take 1 data point in ever 100 - reducing the total number of points to 1600, which yields 2.5 million comparisons.

Once you have entered 100, go back to the Setting tab and click 'Read File' - this is absolutely necessary to ensure the file is properly thinned.

## Correlation Mathematics

Now that we've thinned the data to a more manageable size, we connect a Correlation Filter to the data. Right-click on the Source in the SwiftVis Graph area, select Filter, and then select Correlation Filter from the drop-down list. Now, select the Correlation Filter we just created. You should see something like the image below.

There should only be one input stream currently feeding into the filter, and it will contain all of the data about the Cartesian coordinates of the simulation particles. In order to locate the frequency of gravitational wakes, we will compute the distance between pairs of particles. To do this, we must do a few things: indicate the number of times to use our input stream, provide an equation for the target computational values, and verify the output.

### Setting Up The Streams

As with the previous example, we want to compare every input element with every other input element. In order to do this, we will set the stream use for Stream 0 in the Correlation Filter to be 2 - one for an iteration through each object, and another for an iteration through each object for every object in the first iteration. To illustrate, by indicating that the filter should use Stream 0 twice, we are setting up the following computation:

for every object, i, in stream 0, do:
for every object, j, in stream 0, do:
compute the equation that uses i and j


### Providing the Equation

Once we've set up the stream usage, we will instruct the filter to compute the distance between the two points. To do this, first select the Values tab. Then, select 'New' to add a new value expression to the list of values to be computed. Next, select the New Value in the List.

Once selected, we need to fill in the equation. Highlight the '1' in the Expression field. Replace it with sqrt((vi0-vj0)**2 + (vi1-vj1)**2). This instructs the equation parser to perform the calculation $\sqrt{(v_{i_x} - v_{j_x})^2+(v_{i_y} - v_{j_y})^2}$ for every pair of points, which is the standard 2-Dimensional distance formula. (Note: we use a 2D distance formula here because Saturn's A ring is mostly flat - the Z direction is going to be really small, and is not important for our ultimate goal. However, if you were trying to do a distance formula in three directions, with x in v[0], y in v[1], and z in v[2], the equation would be sqrt((vi0-vj0)**2 + (vi1-vj1)**2 + (vi2-vj2)**2).)

## Output

Click 'Propagate Changes' at the bottom of the screen to let your new equation compute. This may take a few moments, depending on the speed of your compute. Once complete, check out the Output panel of the Correlation Filter. It should look something like this:

(You may notice that the first entry has a distance of 0 - this illustrates the fact that every point is compared to every other point, including itself.)

## One Step Further

Now that we have our distance data, we would like to see a graph illustrating how often a certain distance occurs, so that we can get some idea of how far apart particles typically are. In order to do this, we will use a Binned Filter to sort the Correlation Filter output into equally-sized bins based on the value of the particle distance. We create a Binned Filter connected to the Correlation Filter and set the following properties:

Note that the Maximum for the Binned Filter was changed to a smaller value - this is because the section of Saturn's A ring that we are working with is approximately 2e-6 across - the maximum distance generated by default. Binning these numbers, however, will falsely inflate the tail of our graph because, simply put, every particle is at least 2e-6 units away from any other particle. Again, don't forget to click 'Propagate Changes' after updating the settings.

## The Plot

Now that we've binned everything up, let's get a plot of our final product. Create a plot off of the Binned Filter and set it up as follows:

(See the 2D Plotting Tutorial if you are unsure of how to accomplish this.)

And now, the plot!

## A Final Product

After this, we have also attached a 2D image that paints a top-down view of the particle in the ring next to the distance plot. You can clearly see the gravity wakes in the particle plot, and see that the distance between wakes corresponds to the large peak in the distance plot.

### Note on Computation

The graph above was generated using 1 in every 25 particles on a machine using 15 Gigabytes of RAM - you should not try reproducing this specific plot on a standard, end-user desktop machine. While the correlation filter's computations will perform as fast as your processor can handle, the largest problem typically encountered with this filter is finding the empty RAM to store the output results.

## Unrest in the Forest, Trouble with the Trees

While for our provided dataset, using a moderate number of particles, this calculation was reasonably fast, we can significantly improve the speed of it by utilizing the Correlation Filter's other powerful feature: a K-Dimensional Spatial Tree for comparison segmentation. That is, we can divide up the space in a spatial sense.

### Tree Benefits

The primary benefit of using such a tree in this kind of computation is that we can significantly reduce the number of comparisons performed. In the example above, our primary motivation for severely thinning the data was that 1.5 million comparisons was computable, but 25 million really wasn't on a modern desktop computer at the time of this writing. However, the KD Tree in the Correlation Filter allows us to limit these comparisons. For example, we can divide the tree up on its X and Y spatial axes into small bins, and then only perform our distance calculation between particles in the bins. Then, if we have 100 bins and 100,000 particles, and we put roughly 100 particles into each bin, we get approximately 1 million comparisons, versus the raw input's 25 million.

### Setting Up Spatial Trees

In order to use the tree, we need to give the Correlation Filter some heuristic for building it. Select the Pruning tab and click the box next to "Use Spatial Tree for Data". Upon selecting this, a first axis will automatically get created - you can't have a KD Tree without some form of axis, so there must be at least one.

We're going to build up our tree using the X and Y coordinates of the particles as our heuristic for dividing them up. Therefore, we need a second axis. Click "New" to create a second one - it should be labeled Axis 1. Now, select Axis 0 from the list. Set the values to the following image:

The v[0] instructs the filter to split points using the v[0] value of the particle. The 7e-7 indicates that only particles within 7e-7 of each other on the X axis should have their distances calculated. ( We have picked this value because, in the image above, our peak occurs before 7e-7. This way, we will encompass our peak in our pruned computation, and therefore preserve accuracy while reducing computations.) Now, Axis 0 in our spatial tree will divide points based on their X value. We will repeat this for Axis 1, but using the Y value of our points instead.

Now that we have given the Correlation Filter a way to build the tree, click on "Propagate Changes". If everything went smoothly, you should notice a significant speed increase. Now, let's try getting the above results (using 1 in 25 particles) to replicate the plot above made on the high-end computer. Follow the instructions to adjust the file thinning, but change it to 1 in 25. Then let the changes propagate again - you should find that this takes up to 2 minutes, but certainly not the 40 spent computing the image above on a high-end computer.

### Output Considerations

We can now look back at our plot and see that our limitations have made a significant difference. We no longer get that lengthy tail on the end because, simply put, we don't compute distances between particles that far away. Because of this, we were able to deal directly with a much more potent quantity of data, resulting in a smooth, reliable graph, without having to wait hours for a machine to number-crunch even insignificant results.

### And to Quote some Great Men

So the maples formed a union
And demanded equal rights.
"The oaks are just too greedy;
We will make them give us light."
Now there's no more oak oppression,
For they passed a noble law,
And the trees are all kept equal
By hatchet, axe, and saw.
- Rush, The Trees

page revision: 48, last edited: 20 Jul 2009 15:32