This week, Chris Helm and I are at O’Reilly’s new Strata Conf to talk about our work in large scale geospatial data analysis and also to learn more about what is new and exciting in the data industry. Strata has emerged from a growing community and ecosystem of tools and companies, and from a clear need to share knowledge on how to understand the increasing volumes of data, both realtime and historic.

Just this morning, the opening sessions have offered good insight into what the industry is thinking and doing. We’re on the edge of a new paradigm of production and understanding, akin to the scientific discoveries that accompanied the Industrial Revolution. Back then, we gained the ability to rapidly build and produce at astonishing rates, and in turn developed a better understanding of these new technologies and better capabilities for improving them. With the Web, data is less ephemeral and more tangible: something that is obtainable, shareable, and manipulable.

Mark Madsen pointed out the recent evolution of data:

  • 1950s-60s: Data as product
  • 1970s-80s: Data as byproduct
  • 1990s-2000s: Data as asset
  • 2010+: Data as substrate

Data has become more widespread, available, pervasive, and ultimately more useful.

Within this new paradigm of Data as Substrate, Werner Vogels, CTO of Amazon.com, discussed how data moves through our systems: collect | store | organize | analyze | share. This is exactly what we realized several years ago in aiming to provide visual analytics: you need to provide a smooth flow through that entire chain. It’s something we’ve baked in throughout GeoCommons to allow users to easily explore data and share their insights.

Mike Olson, CEO of Cloudera, specifically mentioned the power of geospatial data analysis. Analyzing geotagged messages and their sentiment in combination with local demographics and business information can reveal deep insights into markets and communities. It is becoming increasingly important to deliver combinations of information, and there is a need for better exploration and analysis tools. We have enough platforms; we need new interfaces.

Matt Biddulph, founder of Dopplr and now at Nokia, discussed prototyping with data. His examples involved mining route requests and map tile views for indications of mobile phone users’ preferred routes and areas of intent. But even at Nokia, with the very powerful Navteq, they have difficulty sharing data. They are using HDFS (the Hadoop Distributed File System) as essentially an organizational file sharing system, passing around URI paths to data that can be quickly loaded and analyzed.

So far, it’s clear that there is incredible excitement about handling and analyzing data. The terms “big” and “massive” are used rather excessively with regard to data, but regardless, we’re now able to handle volumes of information that were severely limiting or even impossible to work with just a few years ago. The tools are commodity and off the shelf, so the interesting questions can be quickly asked and explored without worrying as much about the underlying components or technology. We’ve shared how we’re actually providing new interfaces for users to do their own analytics and explorations, powered by many of the popular tools discussed here but also by some new ones we’ve built and will be sharing with the community. This is particularly true in dealing with geospatial data, which provides some truly interesting context for analysis.

 
