rasdaman newsletter 01/2016

Elephant Meeting on Big Data Services

November 2015. As we try to get to grips with the ever-increasing “Big Data” deluge, we recognize that adequate Web services are a key prerequisite for ubiquitous, flexible, and fast data access. In a massive concerted effort, several large European initiatives have now teamed up to address the service challenge. On 12 and 13 November 2015, an inaugural EUDAT Workshop on services for Big Data was held successfully at the Supercomputing Center in Barcelona, Spain. Representatives of three major Big Data projects - EUDAT, EarthServer, and EPOS - came together to discuss innovative alternatives for value-adding services.

To consolidate activities around these specific themes, the workshop was divided into several tracks focusing on Big Data semantics, federated Data Mining, and multi-dimensional Array Databases for large time series. Discussions started by capturing best practices and reviewing the current state of development and activities in the respective areas. Questions such as how data processing can be orchestrated optimally, or how scientific workflows can make use of EUDAT services, were discussed intensively in different working groups.

Peter Wittenburg, scientific coordinator of the EUDAT Data Infrastructure, convened a wide variety of expertise from Europe and the US. The experts focused especially on multidimensional arrays because these play a major role in scientific and engineering data. In a summary, Mark van de Sanden, EUDAT Workpackage Leader, and Peter Baumann, workshop facilitator of the EUDAT Array Database track, pointed out possible future roles for EUDAT:

  • IaaS provider: providing a cloud infrastructure to run Array Databases
  • SaaS provider: providing an Array Database as a domain-independent, horizontal service
  • Providing tools for easy data movement between the EUDAT DCI domain and the user domain
  • Providing domain services (e.g., geo, astro, life sciences) based on a common horizontal platform of array services, thereby leveraging cross-community effects

Peter Baumann summarized his experience of running large-scale infrastructures in his presentation: "Of course multidimensional arrays do not stand alone, they are intertwined with other data types, but typically they constitute the 'Big Data' part. Therefore, it makes sense to integrate arrays into common data management platforms." Flexible querying, data independence, scalability, and standards conformance are critical advantages of Array Database technologies. Among the challenges identified was the integration of heterogeneous data types, including arrays, into a single common information space for users. Array-intensive domains such as the Earth, Space, and Life Sciences were considered possible candidates for future EUDAT services.