Last week we announced GeoIQ Social, which lets anyone immediately visualize and analyze social media data alongside their internal and external data. But there is more to this than just connecting social data: it is also necessary to make that data available on demand, in real time.
We’ve been focusing on connectivity across data sources for a while, and it has been a driving force behind GeoIQ itself: disparate datasets converging in easy-to-use tools that add value through centralization, powerful visualization, and deep analytics. For a long time, though, centralizing data (in the cloud or anywhere else) has meant uploading it to a new location or a new service that then makes it available in new ways. We’ve now broken that mold.
A few months back we announced GeoIQ Connect. The idea was that users could tap into existing databases from within our platform: run our analytics, explore data in new ways, and visualize it alongside any other dataset in the platform. Not only did this mean users no longer had to export their data just to reimport it into our software, it also flipped a switch within the platform itself. GeoIQ had become dynamic. Data were dynamic. Maps and analyses were dynamic. We created “adapters” for all sorts of databases, including PostgreSQL, MySQL, Oracle, HBase, and MongoDB, as well as newer kinds of data stores and APIs like Google Fusion Tables.
The real power of our dynamic data stores is the ability to map data as they change. As data in a database are updated or changed in any way, maps in GeoIQ can be refreshed and the new data will appear. This is great, but it’s not great enough. We wanted to go further, and now we have: we can stream real-time data directly into maps. No more need to refresh!
To see how the new streaming feature works, check out this video of tweets mentioning rain, snow, or weather streaming in real time:
Streaming data layers allow for instant feedback, rapid decision making, and an all-around cool experience. We’re opening up streaming first with the Twitter Streaming API. Twitter has paved the way with its implementation of a streaming API, and it is also great because of the sheer quantity of geospatial data it streams. Of course there are more streaming APIs than just Twitter’s; one that is very interesting to us is pachube.com. Pachube lets users relay streams of real-time data and exposes an API so others can access them. These kinds of data streams and APIs are becoming more common, and now we’ve got the tools to use them.
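Streaming APIs like Twitter’s typically deliver newline-delimited JSON over a long-lived HTTP connection, so the core of any consumer is a parser that buffers partial lines between chunks. Here is a minimal sketch of that parser; the names are illustrative, not our actual code:

```javascript
// Accumulates raw chunks and invokes onRecord once per complete JSON line.
// Streaming responses can split a record across chunks, so the trailing
// partial line is held back until the next chunk arrives.
function makeLineParser(onRecord) {
  let buffer = '';
  return function (chunk) {
    buffer += chunk;
    const lines = buffer.split('\n');
    buffer = lines.pop(); // keep the incomplete tail for next time
    for (const line of lines) {
      const trimmed = line.trim();
      if (trimmed) onRecord(JSON.parse(trimmed));
    }
  };
}

// Hypothetical wiring against a streaming HTTP response:
//   const parse = makeLineParser(record => handleTweet(record));
//   response.on('data', chunk => parse(chunk.toString()));
```

The same pattern works for any line-oriented stream, which is part of why these APIs are so easy to build on.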
The technical pieces of this new capability are the fun part of what we’ve done. We’ve employed a variety of tools to create new streams of data that pour directly into our maps, and we’ve enhanced our API so that others can give a real-time feel to their maps embedded elsewhere on the Web. As an introduction, I’ll briefly describe the technologies we’re using for streaming data:
Node.js is starting to pop up all over the place, because it makes it very simple to build fast, scalable web applications that support high numbers of concurrent requests. We’ve used Node.js to build a service that taps into an external streaming API. This service holds all the logic for connecting to streaming data sources, processing the data, and routing it through a series of message queues. Inside the GeoIQ platform we then stream the data in and route it to the correct maps via WebSockets and the Node.js library Socket.IO.
We use AMQP as a messaging system to communicate and transmit data and results across the web. AMQP gives the system scalability: we can bind to the message queue from various places and spawn new worker applications at will. The AMQP server acts as the primary hinge point for routing data from the streaming service to the various GeoIQ servers that send data to maps.
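As a rough sketch of how this shape of routing works with an AMQP topic exchange: each message gets a routing key such as `stream.<source>.<datasetId>`, and workers bind queues with wildcard patterns like `stream.twitter.*` to pick up whichever streams they handle. The `amqplib` client, exchange name, and key scheme below are assumptions for illustration, not our actual wiring:

```javascript
// Builds a topic routing key, e.g. routingKey('twitter', '42')
// -> 'stream.twitter.42'. Workers bind with patterns such as 'stream.#'.
function routingKey(source, datasetId) {
  return ['stream', source, datasetId].join('.');
}

// Publishes one record to a topic exchange (requires a running broker).
async function publishRecord(record, source, datasetId) {
  const amqp = require('amqplib'); // npm dependency, loaded lazily
  const conn = await amqp.connect('amqp://localhost');
  const ch = await conn.createChannel();
  await ch.assertExchange('streams', 'topic', { durable: false });
  ch.publish('streams', routingKey(source, datasetId),
             Buffer.from(JSON.stringify(record)));
  await ch.close();
  await conn.close();
}
```

Binding a fresh queue to the same exchange is all a new worker needs to do to join the system, which is where the spawn-at-will scalability comes from.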
One of the most powerful aspects of this capability is the ability to pass data through a series of external services. This means we can process the data in different ways as we receive it, and tailor that processing to user demands, needs, or intentions. For instance, we have a set of three services that tweets can be passed through: geocoding of tweet profile locations, a tweet sentiment engine (Repustate), and Klout scoring. This list will grow in the future to include other services that help users add more information to streams of data.
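The pipeline idea reduces to something very simple: each service is wrapped as a function that takes a record and returns an augmented copy, and records are threaded through whatever chain a user has configured. The enrichers below are stand-ins (the real geocoder, Repustate, and Klout calls go out over HTTP):

```javascript
// Threads a record through an ordered list of enrichment functions.
function runPipeline(record, services) {
  return services.reduce((r, service) => service(r), record);
}

// Stand-in enrichers for illustration only:
const geocodeProfile = t => ({ ...t, lat: 38.9, lon: -77.0 });
const sentiment      = t => ({ ...t, sentiment: t.text.includes('!') ? 'pos' : 'neu' });

// runPipeline({ text: 'snow!' }, [geocodeProfile, sentiment])
// yields the original tweet plus lat/lon and a sentiment tag.
```

Because each stage is just a function, adding a new external service means adding one more element to the list.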
All incoming data is stored in a MongoDB document store, which provides a fast and flexible way to store loosely structured data and gives us some geospatial indexing as well. As part of GeoIQ Connect we built an adapter that can connect to any MongoDB database and pull its data directly into GeoIQ. For the streaming datasets we’ve reused this adapter, taking advantage of MongoDB’s simple query structure to limit data to certain spatial extents, filter it, and extract it in a variety of ways.
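Limiting a streaming dataset to a map’s extent reduces to building a query document against a geo-indexed field. A sketch, where the field name (`loc`) and the `$geoWithin`/`$box` operators are illustrative of MongoDB’s query structure rather than our schema:

```javascript
// Builds a MongoDB query document restricting results to a bounding box,
// optionally merged with additional attribute filters.
function extentQuery(west, south, east, north, extra = {}) {
  return {
    loc: { $geoWithin: { $box: [[west, south], [east, north]] } },
    ...extra, // e.g. { keyword: 'snow' }
  };
}

// db.tweets.find(extentQuery(-77.5, 38.7, -76.8, 39.1, { keyword: 'snow' }))
// would return only matching records inside the box around Washington, DC.
```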
The ability to push data to a map over WebSockets is huge, but previously we had no way to change the data in a map once it had been rendered. That changed when we developed a new API method called addFeatures, which lets us dynamically append data to any layer in a map. As new data points arrive from the server, we add them to the correct layer using addFeatures. It’s very handy, and it lets anyone easily alter the data in their GeoCommons and GeoIQ maps and create their own real-time applications.
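On the client, it helps to batch arriving features briefly and flush them to the layer in one addFeatures call, so a burst of tweets becomes one redraw instead of many. The addFeatures call mirrors the API method described above; the batching helper, socket event name, and `layer` object are our own illustrative additions:

```javascript
// Collects features and flushes them in batches of maxSize via the
// supplied flush callback (e.g. features => layer.addFeatures(features)).
function makeBatcher(flush, maxSize) {
  let pending = [];
  return {
    push(feature) {
      pending.push(feature);
      if (pending.length >= maxSize) this.flush();
    },
    flush() {
      if (pending.length) { flush(pending); pending = []; }
    },
  };
}

// Hypothetical wiring in a page embedding a GeoIQ map:
//   const batch = makeBatcher(features => layer.addFeatures(features), 50);
//   socket.on('feature', f => batch.push(f));
//   setInterval(() => batch.flush(), 1000); // drain slow trickles too
```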
Our Next Steps
It’s easy to imagine all the possibilities that streaming opens up in our applications, and we’re not stopping any time soon. It’s probably safe to expect us to take the idea of dynamic maps and data pretty far. Without giving it all away, we’re thinking along the lines of real-time analytics, dynamic event alerting, and more tools for easy collaboration. What we’ve just built opens the door to a bright and dynamic future at GeoIQ.