Keynote Talks

Keynote 1

Andreas Rauber
Vienna University of Technology, Austria

Andreas Rauber is Associate Professor at the Department of Software Technology and Interactive Systems (ifs) at the Vienna University of Technology (TU-Wien). He is also president of AARIT, the Austrian Association for Research in IT, a Key Researcher at Secure Business Austria (SBA-Research) and Co-Chair of the RDA Working Group on Dynamic Data Citation. He received his MSc and PhD in Computer Science from the Vienna University of Technology in 1997 and 2000, respectively. In 2001 he joined the National Research Council of Italy (CNR) in Pisa as an ERCIM Research Fellow, followed by an ERCIM Research position at the French National Institute for Research in Computer Science and Control (INRIA), at Rocquencourt, France, in 2002. From 2004 to 2008 he was also head of the iSpaces research group at the eCommerce Competence Center (ec3).
His research interests cover the broad scope of digital libraries and information spaces, including specifically text and music information retrieval and organization, information visualization, as well as data analysis and digital preservation, all of which have recently begun to merge under the umbrella of reproducible science.

Reproducibility: On computational processes, dynamic data, and why we should bother

The Information Retrieval discipline has long been seen as a role model for benchmark-based evaluations that are solid, traceable and comparable. This has a huge impact on progress in IR research, and serves as a blueprint for similar initiatives in other domains, as we, in general, observe an increased focus on reproducibility in the experimental sciences.
However, with the increasing complexity of our computational environments we also encounter new challenges that may influence what we measure. They might even (mis-)lead us to measure and evaluate aspects that we do not intend to focus on, resulting in unwanted bias.
In this talk I will review a few examples of reproducibility challenges in computational environments and discuss their potential effects. Based on discussions in a recent Dagstuhl seminar on reproducibility we will identify different types of reproducibility. Here, we will focus specifically on what we gain from them, rather than seeing them merely as means to an end. We will subsequently address two core challenges impacting reproducibility, namely (1) understanding and automatically capturing process context and provenance information, and (2) approaches allowing us to deal with dynamically evolving data sets, relying on recommendations of the Research Data Alliance (RDA). The goal is to ensure reproducibility not only in strictly defined benchmark settings but also in operational settings. After all, we want to ensure that results obtained in operational conditions are scientifically solid as well, that they can be analyzed, traced, and reproduced even when obtained in a dynamically changing, complex world.


Keynote 2

Isabel Trancoso
Instituto Superior Técnico, Portugal

Isabel Trancoso is a full professor at IST (Univ. Lisbon), and a researcher at INESC-ID. She received the Licenciado, Mestre, Doutor and Agregado degrees in Electrical and Computer Engineering from IST in 1979, 1984, 1987 and 2002, respectively. She was a member of the ISCA Board, the IEEE Speech Technical Committee, and the Permanent Council for the Organization of the International Conferences on Spoken Language Processing. She was elected Editor in Chief of the IEEE Transactions on Speech and Audio Processing (2003-2005), Member-at-Large of the IEEE Signal Processing Society Board of Governors (2006-2008), Vice-President of ISCA (2005-2007) and President of ISCA (2007-2011). She chaired the INTERSPEECH 2005 conference. She is the President of the Electrical and Computer Engineering Department. She also chaired the IEEE James Flanagan Award Committee (2013-2014). She is currently a member of the ISCA Advisory Council, the ISCA Distinguished Lecturer Selection Committee (Chair), the ELRA Board (Vice-President), the IEEE Fellows Committee, and the IEEE Publication Services and Products Board Strategic Planning Committee. She received the 2009 IEEE Signal Processing Society Meritorious Service Award. She was elevated to IEEE Fellow in 2011, and to ISCA Fellow in 2014.

Virtual Therapists

Although the title sounds very generic, this talk focuses on the development of virtual therapists for several eHealth applications carried out at our Spoken Language Systems Lab, and describes joint work with Alberto Abad, Luísa Coheur, Annamaria Pompili, and Vânia Mendonça. The development started with VITHEA, an on-line platform designed for aphasia treatment, integrating word naming exercises based on keyword spotting. The flexibility and robustness of this platform have motivated its extension to other diseases, integrating several automatic tests that are commonly used for screening cognitive performance and tracking alterations of cognition over time. The platform now includes all exercises involving speech of the Mini-Mental State Examination (MMSE) and the AD Assessment Scale - Cognitive subscale (ADAS-Cog) tests. Verbal fluency tests, such as animal naming, were next. The platform has been the object of continuous improvements and further extensions, the most recent one targeting children suffering from ASD. The very large variety of exercises that can now be integrated in VITHEA also suggests its use for cognitive stimulation, building interesting gaming scenarios.


Keynote 3

Djoerd Hiemstra
University of Twente, The Netherlands

Djoerd Hiemstra is associate professor in database and search technology at the University of Twente. He wrote an often-cited Ph.D. thesis on language models for information retrieval and contributed to over 200 research papers in the field of information retrieval. His research interests include formal models of information retrieval, peer-to-peer and federated search, and statistical natural language processing. Djoerd was involved in the organization of several workshops and conferences, including many editions of the Dutch-Belgian Information Retrieval workshop series, and the 2007 edition of SIGIR, of which he was the Information Director between 2007 and 2013. Djoerd contributed to several open source search prototypes and published papers with research labs of search engine companies like Microsoft (where he did an internship in 2000), Yahoo (where he was a visiting researcher in 2008), and Yandex (which he visited in 2011).

A case for search specialization and search delegation

Evaluation conferences like CLEF, TREC and NTCIR are important for the field, and remain important because there is no "one-size-fits-all" for search engines. Different domains need different ranking approaches: for instance, Web search benefits from analyzing the link graph; Twitter search benefits from retweets and likes; restaurant search benefits from geo-location and reviews; advertisement search needs bids and click-through, etc. Researching many domains will teach us more about the need for and the value of the *specialization* of search engines, and about approaches that can quickly learn rankings for new domains using, for instance, learning-to-rank and clever feature selection.
A search engine that provides results from multiple domains should therefore *delegate* its queries to specialized search engines. This brings up unique research questions on how to best select a specialized search engine. The TREC Federated Web Search track, which ran in 2013 and 2014, studied these questions in two tasks. The resource selection task studied how to select the top specialized search engines for a query, given the query but before seeing its results. The vertical selection task studied how to select the top domains from a predefined set of domains such as news, video, Q&A, etc.
I will present the lessons that we learned from running the Federated Web Search track, focusing on successful approaches to resource selection and vertical selection. I will conclude the talk by discussing our steps to put this work fully into practice by running the University of Twente's search engine as a federation of more than 30 smaller search engines, including local databases with news, courses, publications, as well as results from social media like Twitter and YouTube. The engine that runs U. Twente search is called Searsia and is available as open source software at: