Announcing a new publication: “An Information Platform for Business Intelligence in the Aid Sector based on Open Data and Documents; Integrated Access to structured and unstructured data using the document-oriented database CouchDB”, by Michiel Kuijper.
Earlier this year, I was approached by Michiel Kuijper, who was working on his Bachelor's degree at the Amsterdam University of Applied Sciences and was looking for a project that combined Business Intelligence and text mining. Together, we started exploring how to apply this in the development aid sector.
In development aid, more and more information on aid activities is becoming available as structured data published according to the IATI standard. At the same time, a lot of mostly unstructured information is available in documents. We wanted to bring these two together, and looked at the problem from three perspectives:
- From a business case perspective: how can a “business intelligence” approach, based on performance indicators, help in data processing and analysis?
- From a technical perspective: how can we deal with the large variety and volume of the data and documents?
- From an information perspective: how can we combine a variety of structured and unstructured information for various purposes?
Michiel has put all of this together in a proof-of-concept “platform for business intelligence” based on CouchDB, using OpenCalais to annotate documents, with a front end based on Exhibit and Simile.
The current demo version contains only a data set from the Dutch Ministry of Foreign Affairs and the documents they provided at the Open Data for Development Camp in 2011. It uses facets to provide a structured way to find activities, transactions and documents, and a variety of widgets to present the results.
Some of the take-aways:
- The variation in quality of the available IATI data makes it hard to import many sources quickly. Related projects such as Open Spending and OIPA solve this in different ways.
- CouchDB eliminates the need for a lot of up-front data modelling, and enables quick prototyping.
- CouchDB also allows data to be aggregated at several levels on the server, so that client-side applications can consume the results without further processing.
- Current discussions about defining APIs for IATI data can benefit from a Business Intelligence-based “pull approach” in addition to the data-driven “push approach”.
- Using the replication features of CouchDB could also help build a distributed infrastructure for data and applications, including offline access.
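To make the point about server-side aggregation a bit more concrete, here is a minimal sketch in Python. It is not taken from Michiel's code: the database name `aid`, the view name `spend_by_country_year`, and the document fields (`type`, `country`, `year`, `value`) are all assumptions made up for illustration. What the sketch does rely on is standard CouchDB behaviour: a view with a JavaScript map function, the built-in `_sum` reduce, and the `group_level` query parameter. The `grouped_sum` helper only imitates locally what the server would return, so the idea can be tried without a running CouchDB.

```python
from urllib.parse import urlencode

# Hypothetical design document. The field names are assumptions about the
# document shape, not the schema used in the actual platform.
DESIGN_DOC = {
    "_id": "_design/aid",
    "views": {
        "spend_by_country_year": {
            # CouchDB map functions are written in JavaScript.
            "map": (
                "function (doc) {"
                "  if (doc.type === 'transaction') {"
                "    emit([doc.country, doc.year], doc.value);"
                "  }"
                "}"
            ),
            # "_sum" is one of CouchDB's built-in reduce functions.
            "reduce": "_sum",
        }
    },
}


def view_url(base, db, ddoc, view, group_level):
    """Build the URL for querying a view at a given aggregation level."""
    query = urlencode({"group_level": group_level})
    return f"{base}/{db}/_design/{ddoc}/_view/{view}?{query}"


def grouped_sum(emitted, group_level):
    """Local imitation of a _sum reduce queried with group_level.

    `emitted` is a list of (key, value) pairs, where each key is a list
    such as [country, year]. Keys are truncated to `group_level` items
    and the values are summed per truncated key.
    """
    totals = {}
    for key, value in emitted:
        group = tuple(key[:group_level])
        totals[group] = totals.get(group, 0) + value
    return [{"key": list(k), "value": v} for k, v in sorted(totals.items())]


if __name__ == "__main__":
    # Made-up sample rows, as the map function above would emit them.
    sample = [(["KE", 2010], 100), (["KE", 2011], 50), (["UG", 2010], 30)]
    print(view_url("http://localhost:5984", "aid", "aid",
                   "spend_by_country_year", 1))
    print(grouped_sum(sample, 1))  # totals per country
    print(grouped_sum(sample, 2))  # totals per country and year
```

With a view like this, a client only has to change `group_level` to move between totals per country and totals per country and year; the summing itself stays on the server.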
We’re still tidying up the code for publication, and are working on building out the platform with more data, more documents, and other interfaces. In the meantime, you can download Michiel's final report and look at the demo site. His thesis defense presentation is available in Dutch.
Looking forward to discussing this project and others at the upcoming Open Knowledge Festival and beyond!