The 3rd Day of Analytics — Predict within a Dataset
In the original 12 Days… song, today would normally be “3 French Hens,” but instead we are going to predict within datasets. Previously on the 12 Days of Analytics we learned about merge and aggregation. Both of these functions serve as building blocks for other analysis. Today’s functionality is the ability to “Predict within a Dataset.” In statistics nerd talk, “predict” here means correlation.
In order to predict within a dataset you need attributes on which to base that prediction. This is why the ability to merge datasets, and potentially aggregate them, matters. Let’s dive further into the previous example of “Tweets on Black Friday about Target and Walmart.” To perform my analysis I aggregated the counts of both Target stores and Walmart stores by state into a single dataset, which also had counts of tweets.
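To make the aggregate-then-merge step concrete, here is a minimal sketch in plain Python. It is not how GeoIQ does it internally; the records, field names, and the helpers `count_by_state` and `merge_by_state` are all made up for illustration.

```python
from collections import Counter

# Hypothetical raw records: one row per store and one row per tweet,
# each tagged with a state. None of this is the real Black Friday data.
stores = [
    {"chain": "Walmart", "state": "TX"},
    {"chain": "Walmart", "state": "TX"},
    {"chain": "Target", "state": "TX"},
    {"chain": "Walmart", "state": "VA"},
    {"chain": "Target", "state": "VA"},
]
tweets = [
    {"about": "Walmart", "state": "TX"},
    {"about": "Walmart", "state": "TX"},
    {"about": "Walmart", "state": "TX"},
    {"about": "Target", "state": "VA"},
]


def count_by_state(rows, key, value):
    """Aggregate: count the rows per state where row[key] == value."""
    return Counter(row["state"] for row in rows if row[key] == value)


def merge_by_state(**columns):
    """Merge: combine several per-state counts into one row per state."""
    states = sorted(set().union(*columns.values()))
    return [
        {"state": s, **{name: counts.get(s, 0) for name, counts in columns.items()}}
        for s in states
    ]


dataset = merge_by_state(
    walmart_stores=count_by_state(stores, "chain", "Walmart"),
    target_stores=count_by_state(stores, "chain", "Target"),
    walmart_tweets=count_by_state(tweets, "about", "Walmart"),
)

for row in dataset:
    print(row)
```

Each row of `dataset` now carries both the store counts and the tweet count for one state, which is the shape a correlation needs: two attributes measured on the same units.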
The question I was trying to answer was: does the number of Walmart stores in an area predict the likelihood of tweets about Walmart in that area? To find out, I created a map with my data layer of Walmart Stores/Tweets by State. I then selected “Data Analysis” and “correlation.” Since I’m trying to predict the likelihood of tweets based on stores, this makes stores the independent variable and tweets the dependent variable. I selected “Standard Deviation” for my distribution and then clicked “Finish.” See the result below.

There is a very strong correlation of 0.87 between tweets and stores (a perfect correlation would be 1). One important thing when running this type of analysis, though, is making sure you don’t aggregate to too coarse a geographic boundary. States are pretty large, so next I’m going to look at things at the county level. Going through the same process, I aggregate the data to the county level and rerun the correlation. At the county level the correlation is only 0.67, so it is not as strong as at the state level. So remember: when correlating data, it is important to choose an appropriate geography, because the same data can look more strongly related at a coarser level than it does at a finer one.
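For the curious, here is a rough sketch of why the geography matters. It assumes the correlation reported is the usual Pearson coefficient (the tool does not say which one it computes), and the counts are invented, not the real store and tweet numbers. The helpers `pearson_r` and `aggregate` are hypothetical names for this example only.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented (region, stores, tweets) rows at a fine level, e.g. counties
# grouped into three larger regions A, B and C.
counties = [
    ("A", 1, 9), ("A", 2, 3), ("A", 3, 6),
    ("B", 6, 20), ("B", 7, 12), ("B", 8, 25),
    ("C", 11, 30), ("C", 12, 41), ("C", 13, 28),
]


def aggregate(rows):
    """Sum stores and tweets per region (the coarser geography)."""
    totals = {}
    for region, n_stores, n_tweets in rows:
        s, t = totals.get(region, (0, 0))
        totals[region] = (s + n_stores, t + n_tweets)
    return totals


# Correlation at the fine level: stores are the independent variable,
# tweets the dependent variable.
fine_r = pearson_r([row[1] for row in counties], [row[2] for row in counties])

# Correlation after rolling the same rows up to the coarser regions.
regions = aggregate(counties)
coarse_r = pearson_r(
    [totals[0] for totals in regions.values()],
    [totals[1] for totals in regions.values()],
)

print(f"fine level r:   {fine_r:.2f}")
print(f"coarse level r: {coarse_r:.2f}")
```

In this toy data the coarse-level value comes out higher than the fine-level one, because summing within a region smooths away the county-to-county noise. Real data will not always behave this neatly, but it is the same pattern as the state versus county results above.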
Look for more analytics fun tomorrow when Matt Dew introduces the 4th day of analytics.