Correlation Filter Tutorial

The following sections describe elementary and advanced applications of the Correlation Filter, for both data comparison and correlation computation.

Basic Example

In this example we will use a Correlation Filter to compute the correlation of set of data points. We first use a Sequence Source to create a sequence of values $n = 0,\ldots, 2$ with 11 elements and connect it to a Function Filter to generate a point $(n, n\times n)$ for each value $n$. As seen below, this gives us a sequence of points $(x, y)$ in the output.

CorFilter0.png

Now, we wish to compute the correlation between the values in v[0] with the values in v[1] in Stream 0. To do this, we first create and connect a Correlation Filter to the output of the Function Filter. Once connected, we have to specify a few things for the filter so that it understands what data, exactly, it should compare. Select the correlation filter, and the first panel, known as the Stream panel, should appear.

CorFilter1.png

Select the only Stream available, Stream 0, and enter a '2' in the box that appears. This informs the correlation filter that the first stream will be included twice in the item iterations - that it, for each level of comparison, the whole stream will be used once. This means that each element in the stream will be compared against every other element in the stream once. While this idea may seem unclear, once we format the output and take a look, it will make perfect sense.

CorFilter2.png

Speaking of output, we should define some values to output. Select the 'Values' tab along the top, and you will be met with a new panel. Click on the 'Mirror Values' button. This will add a copy of every input variable to the output, which will allow us to see what the results are computed from. Once these values have been added, let's add a fifth value - the correlation itself. Click on the 'New' button, and then select the entry that the button added to the list.

CorFilter3.png

Select the text field labeled Expression and enter the following in it: sqrt(vi1+vj1)/(vi0+vj0) . This equation will illustrate the correspondence between the square root of the sum of the squares over the sum of the values. Click 'Propagate Changes' and select the 'Output' tab next - we'll see the data we've just manipulated.

CorFilter4.png

In the output panel, p[0] represents the iterator's position in vi and p[1] does the same for vj. In the next four columngs, vi0, vi1, vj0, and vj1 are all mirrored. The values in a given row are the values used for that row's computation. Notice how p[0] and p[1] decribe the element iteration occuring in the vi and vj columns. The last column here contains the actual computation of the equation. In this way, we have compared every value in a stream against itself - this is the basic idea of the correlation filter.

Advanced Example

Now that we've gotten our feet wet, we will move to a more in-depth and practical application of the correlation filter: we will use it to group particle data by distance between particles to get a notion of gravitational wakes in a simulation of Saturn's A rings.

Input File

The input file we will be using for this example is available here. It includes Cartesian positions and radii of individual particles produced by a computer simulation of a small patch of Saturn's A rings. Note that this file is large - it contains the data of roughly 100,000 particles for a single timestep. First, let's open and read the file. We do this be creating a Cartesian and Radius Source.

AdvCorTutorial0.png

Thinning the Data

We just read the file in, but now we have to consider how to proceed. We would like to perform a correlation for this data based on particle distances, but comparing every particle toe very other particle in the data set will yield roughly 25 million comparisons, a somewhat computationally infeasible proposition, since computers at the time of this writing cannot accomplish this. (At this point, correlating the data is like firing a loaded gun at your processor.) Therefore, we should sort out some of the data to be able to perform a more reasonable computation. We select the Thinning tab for our data source, check the Use Thinning box, and tell it to only take 1 data point in ever 100 - reducing the total number of points to 1600, which yields 2.5 million comparisons.

AdvCorTutorial1.png

Once you have entered 100, go back to the Setting tab and click 'Read File' - this is absolutely necessary to ensure the file is properly thinned.

AdvCorTutorial2.png

Correlation Mathematics

Now that we've thinned the data to a more manageable size, we connect a Correlation Filter to the data. Right-click on the Source in the SwiftVis Graph area, select Filter, and then select Correlation Filter from the drop-down list. Now, select the Correlation Filter we just created. You should see something like the image below.

AdvCorTutorial3.png

There should only be one input stream currently feeding into the filter, and it will contain all of the data about the Cartesian coordinates of the simulation particles. In order to locate the frequency of gravitational wakes, we will compute the distance between pairs of particles. To do this, we must do a few things: indicate the number of times to use our input stream, provide an equation for the target computational values, and verify the output.

Setting Up The Streams

AdvCorTutorial3.png

As with the previous example, we want to compare every input element with every other input element. In order to do this, we will set the stream use for Stream 0 in the Correlation Filter to be 2 - one for an iteration through each object, and another for an iteration through each object for every object in the first iteration. To illustrate, by indicating that the filter should use Stream 0 twice, we are setting up the following computation:

for every object, i, in stream 0, do:
    for every object, j, in stream 0, do:
        compute the equation that uses i and j

Providing the Equation

Once we've set up the stream usage, we will instruct the filter to compute the distance between the two points. To do this, first select the Values tab. Then, select 'New' to add a new value expression to the list of values to be computed. Next, select the New Value in the List.

AdvCorTutorial4.png

Once selected, we need to fill in the equation. Highlight the '1' in the Expression field. Replace it with sqrt((vi0-vj0)**2 + (vi1-vj1)**2). This instructs the equation parser to perform the calculation $\sqrt{(v_{i_x} - v_{j_x})^2+(v_{i_y} - v_{j_y})^2}$ for every pair of points, which is the standard 2-Dimensional distance formula. (Note: we use a 2D distance formula here because Saturn's A ring is mostly flat - the Z direction is going to be really small, and is not important for our ultimate goal. However, if you were trying to do a distance formula in three directions, with x in v[0], y in v[1], and z in v[2], the equation would be sqrt((vi0-vj0)**2 + (vi1-vj1)**2 + (vi2-vj2)**2).)

AdvCorTutorial5.png

Output

Click 'Propagate Changes' at the bottom of the screen to let your new equation compute. This may take a few moments, depending on the speed of your compute. Once complete, check out the Output panel of the Correlation Filter. It should look something like this:

AdvCorTutorial6.png

(You may notice that the first entry has a distance of 0 - this illustrates the fact that every point is compared to every other point, including itself.)

One Step Further

Now that we have our distance data, we would like to see a graph illustrating how often a certain distance occurs, so that we can get some idea of how far apart particles typically are. In order to do this, we will use a Binned Filter to sort the Correlation Filter output into equally-sized bins based on the value of the particle distance. We create a Binned Filter connected to the Correlation Filter and set the following properties:

AdvCorTutorial7.png

Note that the Maximum for the Binned Filter was changed to a smaller value - this is because the section of Saturn's A ring that we are working with is approximately 2e-6 across - the maximum distance generated by default. Binning these numbers, however, will falsely inflate the tail of our graph because, simply put, every particle is at least 2e-6 units away from any other particle. Again, don't forget to click 'Propagate Changes' after updating the settings.

The Plot

Now that we've binned everything up, let's get a plot of our final product. Create a plot off of the Binned Filter and set it up as follows:

AdvCorTutorial8.png

(See the 2D Plotting Tutorial if you are unsure of how to accomplish this.)

And now, the plot!

AdvCorTutorial9.png

A Final Product

After this, we have also attached a 2D image that paints a top-down view of the particle in the ring next to the distance plot. You can clearly see the gravity wakes in the particle plot, and see that the distance between wakes corresponds to the large peak in the distance plot.

export.png

Note on Computation

The graph above was generated using 1 in every 25 particles on a machine using 15 Gigabytes of RAM - you should not try reproducing this specific plot on a standard, end-user desktop machine. While the correlation filter's computations will perform as fast as your processor can handle, the largest problem typically encountered with this filter is finding the empty RAM to store the output results.

Unrest in the Forest, Trouble with the Trees

While for our provided dataset, using a moderate number of particles, this calculation was reasonably fast, we can significantly improve the speed of it by utilizing the Correlation Filter's other powerful feature: a K-Dimensional Spatial Tree for comparison segmentation. That is, we can divide up the space in a spatial sense.

Tree Benefits

The primary benefit of using such a tree in this kind of computation is that we can significantly reduce the number of comparisons performed. In the example above, our primary motivation for severely thinning the data was that 1.5 million comparisons was computable, but 25 million really wasn't on a modern desktop computer at the time of this writing. However, the KD Tree in the Correlation Filter allows us to limit these comparisons. For example, we can divide the tree up on its X and Y spatial axes into small bins, and then only perform our distance calculation between particles in the bins. Then, if we have 100 bins and 100,000 particles, and we put roughly 100 particles into each bin, we get approximately 1 million comparisons, versus the raw input's 25 million.

Setting Up Spatial Trees

In order to use the tree, we need to give the Correlation Filter some heuristic for building it. Select the Pruning tab and click the box next to "Use Spatial Tree for Data". Upon selecting this, a first axis will automatically get created - you can't have a KD Tree without some form of axis, so there must be at least one.

TreeCorUtil0.png

We're going to build up our tree using the X and Y coordinates of the particles as our heuristic for dividing them up. Therefore, we need a second axis. Click "New" to create a second one - it should be labeled Axis 1. Now, select Axis 0 from the list. Set the values to the following image:

TreeCorUtil1.png

The v[0] instructs the filter to split points using the v[0] value of the particle. The 7e-7 indicates that only particles within 7e-7 of each other on the X axis should have their distances calculated. ( We have picked this value because, in the image above, our peak occurs before 7e-7. This way, we will encompass our peak in our pruned computation, and therefore preserve accuracy while reducing computations.) Now, Axis 0 in our spatial tree will divide points based on their X value. We will repeat this for Axis 1, but using the Y value of our points instead.

TreeCorUtil2.png

Now that we have given the Correlation Filter a way to build the tree, click on "Propagate Changes". If everything went smoothly, you should notice a significant speed increase. Now, let's try getting the above results (using 1 in 25 particles) to replicate the plot above made on the high-end computer. Follow the instructions to adjust the file thinning, but change it to 1 in 25. Then let the changes propagate again - you should find that this takes up to 2 minutes, but certainly not the 40 spent computing the image above on a high-end computer.

Output Considerations

TreeCorUtil3.png

We can now look back at our plot and see that our limitations have made a significant difference. We no longer get that lengthy tail on the end because, simply put, we don't compute distances between particles that far away. Because of this, we were able to deal directly with a much more potent quantity of data, resulting in a smooth, reliable graph, without having to wait hours for a machine to number-crunch even insignificant results.

And to Quote some Great Men

So the maples formed a union
And demanded equal rights.
"The oaks are just too greedy;
We will make them give us light."
Now there's no more oak oppression,
For they passed a noble law,
And the trees are all kept equal
By hatchet, axe, and saw.
- Rush, The Trees

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License