This year we will integrate the programs of WebDB and SBD and co-locate both workshops in the same room. By offering talks as well as interactive sessions with poster presentations, we facilitate interactions and discussions between researchers of both communities, Web Databases and Semantic Big Data.
Detailed Program (Tentative)
- 8:00-10:30: Keynote and Paper session
- Opening
- Keynote: Neoklis Polyzotis.
From data to ML in production: A report from the trenches
The connection between the Web and machine learning (ML) is growing stronger: Web applications increasingly rely on ML instead of hand-tuned heuristics to drive their functionality, and in turn Web systems generate data that fuel several new ML applications, particularly around deep learning, which is well suited to handle the large, heterogeneous, and complex datasets generated through user interactions with Web applications. Typically, it takes considerable effort to clean, transform, or otherwise prepare Web data so that it can train ML models, and the literature contains several tools and techniques for this part. However, there is a subsequent, and often overlooked, part: deploying the models in production in order to embed them reliably within Web applications. This part brings in a host of new challenges around validating, analyzing, monitoring, transforming, and generally managing the input data. In this talk I will present evidence for these challenges based on our experience with ML pipelines at Google, and argue that we need new techniques and tools to manage ML data in production. I will then describe our ongoing work in this area in the context of TensorFlow Extended, a platform deployed internally at Google to run production ML pipelines.
Bio: Alkis Polyzotis is a research scientist at Google Research, where he currently leads the data-management projects in Google’s TensorFlow Extended (TFX) platform for production-grade machine learning. His interests include data management for machine learning, enterprise data search, and interactive data exploration. Before joining Google, he was a professor at UC Santa Cruz. He received a PhD in Computer Sciences from the University of Wisconsin at Madison and a diploma in engineering from the National Technical University of Athens, Greece.
[slides]
- Cleaning Data with Constraints and Experts. by Ahmad Assadi, Tova Milo and Slava Novgorodov. [slides]
- Processing Class-Constraint K-NN Queries with MISP. by Evica Milchevski, Fabian Neffgen and Sebastian Michel. [slides]
- 10:30-11:00: Coffee break
- 11:00-12:30: Paper session
- Searching for Truth in a Database of Statistics. by Tien-Duc Cao, Ioana Manolescu and Xavier Tannier. [slides]
- Leveraging Wikipedia Table Schemas for Knowledge Graph Augmentation. by Matteo Cannaviccio, Lorenzo Ariemma, Denilson Barbosa and Paolo Merialdo. [slides]
- DataVizard: Recommending Visual Presentations for Structured Data. by Rema Ananthanarayanan, Pranay Lohia and Srikanta Bedathur. [slides]
- 12:30-14:00: Lunch (provided)
- 14:00-15:30: SBD Paper session
- 15:30-16:30: Coffee break with Poster Session
- 16:30-18:30: SBD Paper session