On the 8th Day of Analytics – What does the Future Behold?
Today we are again visited by the ghost of analytics future – Prediction Across Datasets. On the 3rd day of Analytics, Kate showed us prediction within a dataset. Most other analytics tell us what happened, or what is occurring right now. Prediction gives us a way to look at trends and correlations in order to understand complex interactions, and possibly to foresee what future outcomes will result from our actions.
Unlike simple prediction, here we want to see how otherwise disparate data can reveal interconnected relationships. Often I want to see whether the number of people corresponds with the average amount of traffic, or whether police force funding is inversely related to criminal activity. Mathematically, this is done through methods like Pearson’s correlation. The difficulty is that in order to compare two datasets, the features in each must align over the same areas. That isn’t the case, for example, when comparing funding recorded at a county level with individual crime locations, or when comparing either of those with state-wide policies.
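As a rough illustration of what Pearson’s correlation measures, here is a minimal sketch in plain Python. It is not GeoIQ's implementation; the function name `pearson_r` and the sample numbers are made up for this example, and it assumes both lists hold one value per area, in the same order.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (covariance, up to a constant factor)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (variances, up to the same factor)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical, made-up values: one entry per county, in the same order.
population = [10, 20, 30, 40]
traffic = [15, 25, 35, 45]
r = pearson_r(population, traffic)
print(round(r, 3))
```

A value near 1 means the two attributes rise together, a value near -1 means one falls as the other rises, and a value near 0 means no linear relationship shows up at this level of aggregation.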
Through the power of spatial aggregation we can in fact compare two datasets on an even footing. We can aggregate the individual point locations to a total count for each county and then directly compare that with the funding amounts. Similarly, we could aggregate up to the state level and compare against other characteristics such as population, funding, or policies.
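To make the aggregation step concrete, here is a small sketch that counts point locations per area using a standard ray-casting point-in-polygon test. It is only an illustration under simplifying assumptions: the areas are simple polygons in flat x/y coordinates, the names `point_in_polygon` and `count_points_by_area` are invented for this example, and the two square "counties" and the points are made-up data.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    polygon is a list of (x, y) vertices; points exactly on an edge
    may be classified either way.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def count_points_by_area(points, areas):
    """Return {area_name: number of points that fall inside that area}."""
    counts = {name: 0 for name in areas}
    for x, y in points:
        for name, polygon in areas.items():
            if point_in_polygon(x, y, polygon):
                counts[name] += 1
                break  # assume areas do not overlap
    return counts

# Hypothetical data: two adjacent square "counties" and four incident points.
areas = {
    "County A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "County B": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
points = [(1, 1), (0.5, 1.5), (3, 1), (5, 5)]
counts = count_points_by_area(points, areas)
print(counts)
```

Once every county has a single count, that number can sit next to the county's funding amount, so both datasets describe the same set of areas. Points that fall outside every area, like the last one here, are simply not counted.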
Within GeoIQ we allow users to define their Independent attribute, the characteristic that we believe occurs by itself. For it we choose which aggregate calculation we want to compare: Count of features, Sum of a value, Average, Minimum, or Maximum. We can then choose the Dependent attribute, the one we want to investigate to see whether it varies with the independent attribute. Similarly, we choose how to calculate that value up to its aggregate level. And finally, we choose the common boundary we’ll use for the comparison – this can be the boundary of either the independent or the dependent dataset, or even a brand new boundary.
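The choices described above can be sketched roughly as follows. This is not GeoIQ's API: the names `AGGREGATES`, `aggregate_by_boundary` and `paired_values` are hypothetical, the data is made up, and the sketch assumes each record has already been assigned to a boundary ID.

```python
# The five aggregate calculations, each taking a non-empty list of values.
AGGREGATES = {
    "count": len,
    "sum": sum,
    "average": lambda values: sum(values) / len(values),
    "minimum": min,
    "maximum": max,
}

def aggregate_by_boundary(records, how):
    """Group (boundary_id, value) records and reduce each group with `how`."""
    groups = {}
    for boundary_id, value in records:
        groups.setdefault(boundary_id, []).append(value)
    reduce_fn = AGGREGATES[how]
    return {boundary_id: reduce_fn(values) for boundary_id, values in groups.items()}

def paired_values(independent, dependent):
    """Line up two {boundary_id: value} dicts on the boundaries they share."""
    shared = sorted(set(independent) & set(dependent))
    return [independent[b] for b in shared], [dependent[b] for b in shared]

# Hypothetical records: (neighborhood ID, value). For stores the value is unused
# because we only count features; for tweets it is a made-up engagement score.
stores = [("n1", 1), ("n1", 1), ("n2", 1), ("n3", 1)]
tweets = [("n1", 5), ("n1", 7), ("n2", 4)]

independent = aggregate_by_boundary(stores, "count")
dependent = aggregate_by_boundary(tweets, "average")
xs, ys = paired_values(independent, dependent)
print(xs, ys)
```

The two resulting lists hold one value per shared boundary, in the same order, which is the shape a correlation measure needs. Boundaries present in only one dataset (here `n3`) are dropped in this sketch; treating them as zero instead would be a different, equally defensible choice.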
In carrying forward our collaborative chain of analytics over these final holidays, I wanted to show how to analyze the social media data we gathered during Black Friday sales and see whether brick & mortar stores drive social media engagement. The video demonstrates comparing Tweets in Manhattan with Starbucks locations, just one example of how prediction across datasets can provide insight into social media engagement, amongst many other possibilities.
Predict Across Datasets from FortiusOne on Vimeo.
We’ve moved beyond simple spatial querying into some much more complex analytics. Sean described how you can write your own expression. The power to now share some advanced analysis with subject matter experts so that they can do their own investigation means that more members of the community and organizations can make informed decisions. Those are some smart Maids-a-Milking.
2 Responses to On the 8th Day of Analytics – What does the Future Behold?

You might consider stepping back a bit when the correlations are not as clear cut as you had anticipated. Many times our intuition captures more complexity than our investigative algorithms do. For instance, you mentioned comparing funding of police vs. criminal activity (arrests? convictions?). If that seems to place points all over the map, perhaps the driving force is not the $ but something else. Perhaps the determination of the neighborhood culture to fight criminality makes the $ figure more successful in one location than in another. Say an urban town council saw the success in a neighboring suburb and threw the same, or more, money at the problem, but with little or no result. The operating factor is not the amount of money but the determination that it be used in a manner that ends up being more effective.
Good point, and we should never forget that correlation is not causation. Visualization and analysis tools are a great way to look for trends and explore relationships, but they should not replace on-the-ground investigation. They do help point you in the right direction, which can be invaluable when you are swimming in data.