The 3rd Day of Analytics — Predict within a Dataset
In the original 12 Days… song, today would normally be “3 French Hens,” but instead we are going to predict within datasets. Previously on the 12 Days of Analytics we learned about merge and aggregation. Both of these functions serve as building blocks for other analyses. Today’s functionality is the ability to “Predict within a Dataset.” In statistics-nerd talk, “predict” here means correlation.
In order to predict within a dataset you need attributes to base that prediction on. This is why the ability to merge datasets, and potentially aggregate them, matters. Let’s dive further into the previous example of “Tweets on Black Friday about Target and Walmart.” In order to perform my analysis I aggregated the counts of both Target stores and Walmart stores by state into a single dataset, which also had counts of tweets.
The question I was trying to answer was: does the number of Walmart stores in an area predict the likelihood of there being tweets about Walmart in that area? In order to do this I created a map with my data layer of Walmart Stores/Tweets by State. I then selected “Data Analysis” and “correlation.” Since I’m trying to predict the likelihood of tweets based on stores, this makes stores the independent variable and tweets the dependent variable. I selected “Standard Deviation” for my distribution and then clicked “Finish.” See the result below.
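For readers who want to see what a correlation like this measures outside the map interface, here is a minimal sketch of a Pearson correlation coefficient in plain Python. It is an assumption that this is the same statistic the “correlation” analysis reports; the function name `pearson_r` and the sample numbers are made up for illustration and are not the Black Friday data.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations, and sums of squared deviations for each variable.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-area counts (NOT the real dataset):
# stores is the independent variable, tweets the dependent variable.
stores = [12, 40, 7, 55, 23]
tweets = [30, 95, 22, 120, 41]

r = pearson_r(stores, tweets)
print(f"correlation between stores and tweets: {r:.2f}")
```

A value near 1 means areas with more stores tend to have more tweets, a value near 0 means no linear relationship, and a value near -1 means the relationship runs the other way.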

There is a very strong correlation of 0.87 between tweets and stores (a perfect positive correlation would be 1). One thing that is important when running this type of analysis, though, is making sure you don’t aggregate to too large a geographic boundary. States are pretty large, so next I’m going to look at things at the county level. Going through the same process, I aggregate the data to the county level and rerun the correlation. At the county level the correlation is only 0.67, so it is not as strong as at the state level. So remember: when correlating data, it is important to choose an appropriate geography.
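The effect of geography can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rows, the field names (`state`, `county`, `stores`, `tweets`) and the helper functions are hypothetical, and the statistic is a plain Pearson correlation, which may not match exactly what the map tool computes. It rolls the same rows up to two different levels and correlates each.

```python
import math
from collections import defaultdict


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_at(rows, level):
    """Sum stores and tweets per value of `level`, then correlate the totals."""
    totals = defaultdict(lambda: [0, 0])
    for row in rows:
        totals[row[level]][0] += row["stores"]
        totals[row[level]][1] += row["tweets"]
    stores = [s for s, _ in totals.values()]
    tweets = [t for _, t in totals.values()]
    return pearson_r(stores, tweets)


# Made-up rows for illustration only (NOT the Black Friday data).
county_rows = [
    {"state": "A", "county": "A1", "stores": 1, "tweets": 9},
    {"state": "A", "county": "A2", "stores": 3, "tweets": 1},
    {"state": "B", "county": "B1", "stores": 5, "tweets": 4},
    {"state": "B", "county": "B2", "stores": 6, "tweets": 20},
    {"state": "C", "county": "C1", "stores": 9, "tweets": 12},
    {"state": "C", "county": "C2", "stores": 10, "tweets": 30},
]

county_r = correlation_at(county_rows, "county")
state_r = correlation_at(county_rows, "state")
print(f"county level: {county_r:.2f}, state level: {state_r:.2f}")
```

In this toy data the state-level totals smooth over the county-to-county noise, so the state-level correlation comes out higher than the county-level one. That is the same pattern as in the post, though real data will not always behave this way.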
Look for more analytics fun tomorrow when Matt Dew introduces the 4th day of analytics.