Ethics of Crowdsourcing – What Constitutes an Abuse of the Commons
While getting ready to launch Finder! we had an internal debate about whether to put limits on dataset downloading. The options ranged from requiring users to log in before downloading to limiting the number of downloads a user could make in a day. Much of the argument centered on the value of raw data, echoing the O’Reilly manifesto that “data is the Intel inside”. This belief holds that the value of the NAVTEQs and TeleAtlases of the world derives from the proprietary data they have collected.
One side of the company felt that by not limiting access to data we were giving away the family jewels. The other side felt that open access, making the data as accessible as possible, was the best way to create a network effect for it. At the end of the day the open access philosophy prevailed, and judging from the comments on James Fee’s post after GeoWeb, access to data is still important to both GIS and GeoWeb users.
Now that Finder! has been out for a little while, we’ve begun to see a big surge in downloads. I noted last week that we had hit 18,000 downloads, and just a week later we are over 28,000. This has caused us to take a second look at our access policies. Knock on wood, the system has scaled like a champ in handling the traffic, but as we get ready to launch Maker! some concerns have come up about potential abuse and its effect on user experience.
The biggest concern is systematic downloading of data and its potential to degrade other users’ experience on the site. The question is how to make the content available without impinging on the collective user experience. Wikipedia approaches this by making content available as one big tarball and asking users: “Please do not use a web crawler to download large numbers of articles. Aggressive crawling of the server can cause a dramatic slow-down of Wikipedia. Our robots.txt blocks many ill-behaved bots.”
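To make the robots.txt idea concrete, here is a minimal sketch of how a well-behaved bulk client could check a site's rules before fetching anything. It uses Python's standard `urllib.robotparser`. The robots.txt rules, the `bulk-fetcher` user agent, and the `finder.example` host are all made up for illustration and are not our actual configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 5
Disallow: /download/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been retrieved."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)

# A polite client checks each URL first and skips anything disallowed.
dataset_url = "https://finder.example/download/123.kml"  # hypothetical URL
about_url = "https://finder.example/about"               # hypothetical URL
dataset_allowed = may_fetch(parser, "bulk-fetcher", dataset_url)
about_allowed = may_fetch(parser, "bulk-fetcher", about_url)

# Seconds to wait between requests, or None if the file sets no delay.
delay_seconds = parser.crawl_delay("bulk-fetcher")
```

Of course, robots.txt only works on clients that choose to honor it, which is why it is a statement of expectations rather than an enforcement mechanism.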
I’m not sure a giant tarball of data is the best way to go for us, especially since the data is available in a variety of formats. A second option is to provide third-party access to the data via an API, which could handle both download and upload. Andrei had an interesting suggestion in our last post:
“The two-way API will definitely help with the number of uploads. The cool thing to do would be to add (“Add to Finder!”) a URL request:
…finder.com/add?file=file.kml&type=kml&name…”
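As a rough sketch of what a client for such a request might look like, the snippet below builds an "add" URL from the three query parameters visible in Andrei's example (`file`, `type`, `name`). The endpoint `https://finder.example/add` is a placeholder, since the real host is elided in the comment, and the `name` value is invented; nothing here describes an API that actually exists.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholder endpoint: an assumption, not a real Finder! URL.
BASE_URL = "https://finder.example/add"


def build_add_url(file_ref: str, file_type: str, name: str) -> str:
    """Build an 'Add to Finder!' style URL with safely encoded parameters."""
    query = urlencode({"file": file_ref, "type": file_type, "name": name})
    return f"{BASE_URL}?{query}"


# "My dataset" is a made-up name; the space is encoded as "+".
add_url = build_add_url("file.kml", "kml", "My dataset")

# Decoding the query string recovers the original values.
decoded = parse_qs(urlsplit(add_url).query)
```

Letting `urlencode` handle the escaping matters once `file` is a full URL of its own, because characters such as `&` and `?` inside it would otherwise be read as part of the outer query string.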
If people have other ideas on how they could better access the data in bulk without impinging on performance, we’d love to hear them. We’d also welcome thoughts on where the line falls between fair use of content and abuse of the commons. It is a bit of a gray line in my mind. Is systematic downloading (manually hitting every dataset) abusive? Is scraping datasets with bots abusive? The main goal in my mind is to provide the best service possible without creating a “tragedy of the commons”.
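For reference, one of the options from our original debate, capping the number of downloads a user can make in a day, could be sketched roughly as follows. This is an in-memory illustration under simple assumptions (a single process, users identified by a string, a cap chosen arbitrarily); the class name and the limit are hypothetical and this is not something we have built.

```python
from collections import defaultdict
from datetime import date
from typing import Optional


class DailyDownloadQuota:
    """Track downloads per user per calendar day against a fixed cap."""

    def __init__(self, max_per_day: int) -> None:
        self.max_per_day = max_per_day
        # Maps (user, day) to the number of downloads already granted.
        self._counts = defaultdict(int)

    def allow(self, user: str, today: Optional[date] = None) -> bool:
        """Record one download and return True, or return False if over the cap."""
        day = today or date.today()
        key = (user, day)
        if self._counts[key] >= self.max_per_day:
            return False
        self._counts[key] += 1
        return True


# Example: a cap of 2 per day (an arbitrary number for illustration).
quota = DailyDownloadQuota(max_per_day=2)
day_one = date(2008, 1, 1)
results = [quota.allow("alice", day_one) for _ in range(3)]
```

A cap like this limits how much any one account can pull, but it does not by itself answer the question of where fair use ends, since the threshold is still a judgment call.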