Today we are again visited by the ghost of analytics future: Prediction Across Datasets. On the 3rd day of Analytics, Kate showed us prediction within a dataset. Most other analytics tell us what happened, or what is occurring right now. Prediction gives us a way to look at trends and correlations in order to understand complex interactions, and possibly to foresee the outcomes of future actions.

Unlike simple prediction within a single dataset, here we want to see whether otherwise disparate datasets reveal interconnected relationships. For example, I might want to see whether the number of people in an area corresponds with its average amount of traffic, or whether police funding is inversely related to criminal activity. Mathematically, this is done with methods such as Pearson's correlation. The difficulty is that, to compare two datasets, the features in each must cover the same areas. That isn't the case when, for example, comparing county-level funding with individual crime locations, or comparing either of those with State-wide policies.
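Pearson's r is the covariance of two attributes divided by the product of their standard deviations, giving a value from -1 (perfectly inverse) to +1 (perfectly aligned). A minimal sketch in Python, assuming the two lists are already aligned so that position i refers to the same area in both (exactly the alignment problem described above); the funding and crime figures are made-up illustration values, not data from the post.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson's correlation coefficient r for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations (covariance numerator) and the two spread terms
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)

# Hypothetical per-county values; index i is the same county in both lists
funding = [1.0, 2.0, 3.0, 4.0, 5.0]   # e.g. millions of dollars
crimes = [50, 42, 35, 20, 12]         # e.g. reported incidents
print(round(pearson(funding, crimes), 3))
```

A strongly negative r here would suggest an inverse relationship, but, as the comments below point out, it says nothing about cause.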

Through the power of spatial aggregation, I can in fact compare two such datasets on equal footing. We can aggregate the individual point locations into a total count for each county and then compare that directly with the county funding amounts. Similarly, we could aggregate up to the State level and compare against other characteristics such as population, funding, or policies.
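Aggregating points to areas amounts to a spatial join (which polygon contains each point?) followed by a count per polygon. A minimal sketch, assuming planar coordinates, non-overlapping polygons, and a plain ray-casting point-in-polygon test; the square "counties" and crime points are hypothetical, and this is not a description of how GeoIQ implements aggregation. Real workflows would normally use a GIS library with a spatial index instead of this brute-force loop.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside polygon, a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def count_points_per_area(points, areas):
    """Aggregate point features to a Count per area; points outside every area are dropped."""
    counts = {name: 0 for name in areas}
    for x, y in points:
        for name, polygon in areas.items():
            if point_in_polygon(x, y, polygon):
                counts[name] += 1
                break  # areas are assumed not to overlap, so stop at the first match
    return counts

# Hypothetical counties drawn as simple squares
counties = {
    "A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "B": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
crimes = [(1, 1), (2, 5), (9, 9), (15, 3), (25, 5)]
print(count_points_per_area(crimes, counties))
```

The resulting per-county counts can then be lined up against per-county funding figures and correlated.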

Within GeoIQ, users define the Independent attribute: the characteristic we believe occurs on its own. For it, we choose which aggregate calculation to compare: Count of features, Sum of a value, Average, Minimum, or Maximum. We then choose the Dependent attribute: the characteristic we want to test for variation with the independent attribute. Again, we choose how its value is calculated up to the aggregate level. Finally, we choose the common boundary used for the comparison; this can be the boundary of either the independent or the dependent dataset, or an entirely new boundary.

Continuing our collaborative chain of analytics over these final holidays, I wanted to show how to analyze the social media data we gathered during the Black Friday sales and see whether brick-and-mortar stores drive social media engagement. The video compares Tweets in Manhattan with Starbucks locations: just one example of how prediction across datasets can provide insight into social media engagement, among many other possibilities.

Predict Across Datasets from FortiusOne on Vimeo.

We've moved beyond simple spatial querying into much more complex analytics. Sean described how you can write your own expression. Now that advanced analysis can be shared with subject matter experts, who can then run their own investigations, more members of communities and organizations can make informed decisions. Those are some smart Maids-a-Milking.

 

2 Responses to On the 8th Day of Analytics – What does the Future Behold?

  1. Alan Seeling says:

    You might consider stepping back a bit when the correlations are not as clear-cut as you had anticipated. Many times our intuition captures complexity more accurately than our investigative algorithms. For instance, you mentioned comparing police funding against criminal activity (arrests? convictions?). If that places points all over the map, perhaps the driving force is not the $ but something else. Perhaps a neighborhood culture determined to fight criminality makes the $ figure more successful in one location than in another: say, where an urban town council saw the success in a neighboring suburb and threw the same, or more, money at the problem with little result. The operating factor is not the amount of money but the determination that it be used in a manner that ends up more effective.


    • Sean Gorman says:

      Good point, and we should never forget that correlation is not causation. Visualization and analysis tools are a great way to look for trends and explore relationships, but they should not replace on-the-ground investigation. They do help point you in the right direction, which can be invaluable when you are swimming in data.
