Publications

Ran Yu, Ujwal Gadiraju, Peter Holtz, Markus Rokicki, Philipp Kemkes, Stefan Dietze (2018) Predicting User Knowledge Gain in Informational Search Sessions Proceedings of the 41st International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR) [MORE]

[LINK]

Web search is frequently used by people to acquire new knowledge and to satisfy learning-related objectives. In this context, informational search missions with an intention to obtain knowledge pertaining to a topic are prominent. The importance of learning as an outcome of web search has been recognized. Yet, there is a lack of understanding of the impact of web search on a user’s knowledge state. Predicting the knowledge gain of users can be an important step forward if web search engines that are currently optimized for relevance can be molded to serve learning outcomes. In this paper, we introduce a supervised model to predict a user’s knowledge state and knowledge gain from features captured during the search sessions. To measure and predict the knowledge gain of users in informational search sessions, we recruited 468 distinct users using crowdsourcing and orchestrated real-world search sessions spanning 11 different topics and information needs. By using scientifically formulated knowledge tests, we calibrated the knowledge of users before and after their search sessions, quantifying their knowledge gain. Our supervised models utilise and derive a comprehensive set of features from the current state of the art and compare performance of a range of feature sets and feature selection strategies. Through our results, we demonstrate the ability to predict and classify the knowledge state and gain using features obtained during search sessions, exhibiting superior performance to an existing baseline in the knowledge state prediction task.

Seren Yenikent, Brett Buttliere, Besnik Fetahu, & Joachim Kimmerle (2018) Wikipedia article measures in relation to content characteristics of lead sections. Workshop on Learning & Education with Web Data (LILE2018), 10th ACM Conference on Web Science (WebSci), 2018 [MORE]

[LINK]

Ran Yu, Ujwal Gadiraju, Stefan Dietze (2018) Detecting, Understanding and Supporting Everyday Learning in Web Search. Workshop on Learning & Education with Web Data (LILE), 10th ACM Conference on Web Science (WebSci), 2018 [MORE]

[LINK]

Anett Hoppe, Peter Holtz, Yvonne Kammerer, Ran Yu, Stefan Dietze and Ralph Ewerth (2018) Current Challenges for Studying Search as Learning Processes Workshop on Learning & Education with Web Data (LILE), 10th ACM Conference on Web Science (WebSci), 2018 [MORE]

[LINK]

D'Aquin, M., Kowald, D., Fessl, A., Lex, E., & Thalmann, S. (2018) AFEL - Analytics for Everyday Learning In Proceedings of the International Projects Track co-located with the 27th International World Wide Web Conference [MORE]

[LINK]

The goal of AFEL is to develop, pilot and evaluate methods and applications,
which advance informal/collective learning as it surfaces
implicitly in online social environments. The project is following
a multi-disciplinary, industry-driven approach to the analysis and
understanding of learner data in order to personalize, accelerate
and improve informal learning processes. Learning Analytics and
Educational Data Mining traditionally relate to the analysis and exploration
of data coming from learning environments, especially to
understand learners’ behaviours. However, studies have long
demonstrated that learning activities also happen outside of formal
educational platforms. This includes informal and collective
learning usually associated, as a side effect, with other (social) environments
and activities. Relying on real data from a commercially
available platform, the aim of AFEL is to provide and validate the
technological grounding and tools for exploiting learning analytics
on such learning activities. This will be achieved in relation to cognitive
models of learning and collaboration, which are necessary to the
understanding of loosely defined learning processes in online social
environments. Applying the skills available in the consortium to a
concrete set of live, industrial online social environments, AFEL will
tackle the main challenges of informal learning analytics through 1)
developing the tools and techniques necessary to capture information
about learning activities from (not necessarily educational) online
social environments; 2) creating methods for the analysis of such
informal learning data, based on combining feature engineering and
visual analytics with cognitive models of learning and collaboration;
3) demonstrating the potential of the approach in improving
the understanding of informal learning and the way it is supported;
and 4) evaluating all of the former items in real-world, large-scale
applications and platforms.

Kowald, D., Seitlinger, P., Ley, T., & Lex, E. (2018) The Impact of Semantic Context Cues on the User Acceptance of Tag Recommendations: An Online Study In Companion Proceedings of the 27th International World Wide Web Conference (WWW'2018) [MORE]

[LINK]

In this paper, we present the results of an online study with the
aim to shed light on the impact that semantic context cues have
on the user acceptance of tag recommendations. Therefore, we
conducted a work-integrated social bookmarking scenario with 17
university employees in order to compare the user acceptance of a
context-aware tag recommendation algorithm called 3Layers with
the user acceptance of a simple popularity-based baseline. In this
scenario, we validated and verified the hypothesis that semantic
context cues have a higher impact on the user acceptance of tag
recommendations in a collaborative tagging setting than in an individual
tagging setting. With this paper, we contribute to the sparse
line of research presenting online recommendation studies.

Ujwal Gadiraju, Ran Yu, Stefan Dietze, Peter Holtz (2018) Analyzing Knowledge Gain of Users in Informational Search Sessions on the Web In Proceedings of the third ACM SIGIR Conference on Human Information Interaction and Retrieval (CHIIR 2018) [MORE]

[LINK]

Lijun Lyu and Besnik Fetahu (2018) Real-time Event-based News Suggestion for Wikipedia Pages from News Streams. WikiWorkshop 2018, The Web Conference; Lyon, France. [MORE]

[LINK]

Wikipedia is one of the most visited resources on the Web. Furthermore,
it is used extensively as the main source of information for
applications like Web search, question answering, etc. This is
mostly attributed to Wikipedia’s coverage in terms of topics and
real-world entities and the fact that Wikipedia articles are constantly
updated with new and emerging facts.
However, only a small fraction of articles are considered to be of
good quality. The large majority of articles are incomplete and have
other quality issues. A strong quality indicator is the presence of
external references from third-party sources (e.g. news sources) as
suggested by the verifiability principle in Wikipedia. Even for the
existing references in Wikipedia there is an inherent lag in terms of
the publication time of cited resources and the time they are cited
in Wikipedia articles.
We propose a near real-time suggestion of news references for
Wikipedia from a daily news stream. We model daily news into
specific events, spanning from a day up to a year. Thus, we construct
an event chain from which we determine when the information in
an event has converged and, consequently, based on a learning-to-rank
approach, suggest the most authoritative and complete news
article to Wikipedia articles involved in a specific event.
We evaluate our news suggestion approach on a set of 41 events
extracted from Wikipedia's Current Events portal, and on a new corpus
consisting of daily news from the period 2016-2017 with more
than 14 million news articles. We are able to suggest news articles
to Wikipedia pages with an overall accuracy of MAP=0.77 and with
a minimal lag w.r.t. the publication time of the news article.
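
The MAP figure reported above is a mean average precision. As a minimal sketch of how such a score is usually computed (the function names and the toy data are illustrative assumptions, not taken from the paper), each query's ranked suggestions are scored by average precision and the scores are averaged over all queries:

```python
def average_precision(ranked, relevant):
    """Average precision for one ranked list.

    Averages precision@k over the ranks k at which a relevant item occurs,
    normalised by the total number of relevant items.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """Mean of the average precision over (ranked, relevant) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical toy data: two Wikipedia pages with ranked news suggestions.
runs = [
    (["a", "b", "c", "d"], {"a", "c"}),
    (["x", "y"], {"y"}),
]
score = mean_average_precision(runs)
```

Whether the paper normalises by the number of relevant items or by the number of retrieved relevant items is not stated in the abstract; the sketch uses the former, which is the common convention.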

Tempelmeier, N., Demidova, S., Dietze, S. (2018) Inferring Missing Categorical Information in Noisy and Sparse Web Markup [MORE]

[LINK]

Christoph Hube and Besnik Fetahu (2018) Detecting Biased Statements in Wikipedia. WikiWorkshop 2018, The Web Conference; Lyon, France. [MORE]

[LINK]

Quality in Wikipedia is enforced through a set of editing policies
and guidelines recommended for Wikipedia editors. Neutral point
of view (NPOV) is one of the main principles in Wikipedia, which
ensures that for controversial information all possible points of
view are represented proportionally. Furthermore, language used
in Wikipedia should be neutral and not opinionated.
However, due to the large number of Wikipedia articles and
the voluntary nature of the work of Wikipedia
editors, quality assurances and Wikipedia guidelines cannot always
be enforced. Currently, there are more than 40,000 articles that
are flagged with NPOV or similar quality tags. Furthermore, these
represent only the portion of articles for which such quality issues
are explicitly flagged by the Wikipedia editors; the real
number may be higher considering that only a small percentage of
articles are of good quality or featured as categorized by Wikipedia.
In this work, we focus on the case of language bias at the sentence
level in Wikipedia. Language bias is a hard problem, as it
represents a subjective task and usually the linguistic cues are subtle
and can be determined only through their context. We propose a
supervised classification approach, which relies on an automatically
created lexicon of bias words, and other syntactical and semantic
characteristics of biased statements.
We experimentally evaluate our approach on a dataset consisting
of biased and unbiased statements, and show that we are able to
detect biased statements with an accuracy of 74%. Furthermore, we
show that competing approaches that determine bias words are not suitable for
detecting biased statements; we outperform them with a relative
improvement of over 20%.

Peter Holtz, Besnik Fetahu, Joachim Kimmerle (2018) Effects of contributor experience on the quality of health-related Wikipedia articles Journal of Medical Internet Research [MORE]

[LINK]

Background: Consulting the Internet for health-related information is a common and widespread phenomenon, and Wikipedia is arguably one of the most important resources for health-related information. Therefore, it is relevant to identify factors that have an impact on the quality of health-related Wikipedia articles.

Objective: In our study we have hypothesized a positive effect of contributor experience on the quality of health-related Wikipedia articles.

Methods: We mined the edit history of all (as of February 2017) 18,805 articles that were listed in the categories on the portal health & fitness in the English language version of Wikipedia. We identified tags within the articles’ edit histories, which indicated potential issues with regard to the respective article’s quality or neutrality. Of all of the sampled articles, 99 (99/18,805, 0.53%) articles had at some point received at least one such tag. In our analysis we only considered those articles with a minimum of 10 edits (10,265 articles in total; 96 tagged articles, 0.94%). Additionally, to test our hypothesis, we constructed contributor profiles, where a profile consisted of all the articles edited by a contributor and the corresponding number of edits contributed. We did not differentiate between rollbacks and edits with novel content.

Results: Nonparametric Mann-Whitney U-tests indicated a higher number of previously edited articles for editors of the nontagged articles (mean rank tagged 2348.23, mean rank nontagged 5159.29; U=9.25, P<.001). However, we did not find a significant difference for the contributors’ total number of edits (mean rank tagged 4872.85, mean rank nontagged 5135.48; U=0.87, P=.39). Using logistic regression analysis with the respective article’s number of edits and number of editors as covariates, only the number of edited articles yielded a significant effect on the article’s status as tagged versus nontagged (dummy-coded; Nagelkerke R² for the full model=.17; B [SE B]=-0.001 [0.00]; Wald χ² [1]=19.70; P<.001), whereas we again found no significant effect for the mere number of edits (Nagelkerke R² for the full model=.15; B [SE B]=0.000 [0.01]; Wald χ² [1]=0.01; P=.94).

Conclusions: Our findings indicate an effect of contributor experience on the quality of health-related Wikipedia articles. However, only the number of previously edited articles was a predictor of the articles’ quality but not the mere volume of edits. More research is needed to disentangle the different aspects of contributor experience. We have discussed the implications of our findings with respect to ensuring the quality of health-related information in collaborative knowledge-building platforms.

Ilire Hasani-Mavriqi, Dominik Kowald, Denis Helic and Elisabeth Lex (2018) Consensus Dynamics in Online Collaboration Systems Computational Social Networks Journal [MORE]

[LINK]

Background

In this paper, we study the process of opinion dynamics and consensus building in online collaboration systems, in which users interact with each other following their common interests and their social profiles. Specifically, we are interested in how users' similarity and their social status in the community, as well as the interplay of those two factors, influence the process of consensus dynamics.

Methods

For our study, we simulate the diffusion of opinions in collaboration systems using the well-known Naming Game model, which we extend by incorporating an interaction mechanism based on user similarity and user social status. We conduct our experiments on collaborative datasets extracted from the Web.
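
For orientation, the plain Naming Game that the study builds on can be sketched as follows. This is only the basic model under simple assumptions (uniformly random speaker and hearer, integer "words"); the similarity- and status-based interaction mechanism described above is not reproduced, and all names are illustrative.

```python
import random


def run_naming_game(n_agents, max_steps, seed=0):
    """Simulate the basic Naming Game until consensus or max_steps.

    Returns (inventories, steps), where steps is None if no consensus
    was reached within max_steps.
    """
    rng = random.Random(seed)
    inventories = [set() for _ in range(n_agents)]
    next_word = 0  # counter used to invent fresh words
    for step in range(1, max_steps + 1):
        speaker, hearer = rng.sample(range(n_agents), 2)
        # A speaker with an empty inventory invents a new word.
        if not inventories[speaker]:
            inventories[speaker].add(next_word)
            next_word += 1
        word = rng.choice(sorted(inventories[speaker]))
        if word in inventories[hearer]:
            # Success: both agents keep only the agreed word.
            inventories[speaker] = {word}
            inventories[hearer] = {word}
        else:
            # Failure: the hearer learns the word.
            inventories[hearer].add(word)
        # Consensus: every agent holds the same single word.
        first = inventories[0]
        if len(first) == 1 and all(inv == first for inv in inventories):
            return inventories, step
    return inventories, None


inventories, steps = run_naming_game(n_agents=5, max_steps=100_000, seed=1)
```

In an extended variant, the uniform choice of speaker and hearer would presumably be replaced by a choice weighted by similarity or status; how the paper does this is not specified in the abstract.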

Results

Our findings reveal that when users are guided by their similarity to other users, the process of consensus building in online collaboration systems is delayed. A suitable increase of influence of user social status on their actions can in turn facilitate this process.

Conclusions

In summary, our results suggest that achieving an optimal consensus building process in collaboration systems requires an appropriate balance between those two factors.

Ran Yu, Ujwal Gadiraju, Besnik Fetahu, Oliver Lehmberg, Dominique Ritze, Stefan Dietze (2017) KnowMore - Knowledge Base Augmentation with Structured Web Markup Semantic Web Journal [MORE]

[LINK]

Knowledge bases are in widespread use for aiding tasks such as information extraction and information retrieval, for example in Web search. However, knowledge bases are known to be inherently incomplete; in particular, tail entities and properties are under-represented. As a complementary data source, embedded entity markup based on Microdata, RDFa, and Microformats has become prevalent on the Web and constitutes an unprecedented source of data with significant potential to aid the task of knowledge base augmentation (KBA). RDF statements extracted from markup are fundamentally different from traditional knowledge graphs: entity descriptions are flat, facts are highly redundant and of varied quality, and explicit links are missing despite a vast amount of coreferences. Therefore, data fusion is required in order to facilitate the use of markup data for KBA. We present a novel data fusion approach which addresses these issues through a combination of entity matching and fusion techniques geared towards the specific challenges associated with Web markup. To ensure precise and non-redundant results, we follow a supervised learning approach based on a set of features considering aspects such as quality and relevance of entities, facts and their sources. We perform a thorough evaluation on a subset of the Web Data Commons dataset and show significant potential for augmenting existing knowledge bases. A comparison with existing data fusion baselines demonstrates superior performance of our approach when applied to Web markup data.

Kowald, S., Pujari, S., Lex, E. (2017) Temporal Effects on Hashtag Reuse in Twitter: A Cognitive-Inspired Hashtag Recommendation Approach ACM [MORE]

[LINK]

Hashtags have become a powerful tool in social platforms such as Twitter to categorize and search for content, and to spread short messages across members of the social network. In this paper, we study temporal hashtag usage practices in Twitter with the aim of designing a cognitive-inspired hashtag recommendation algorithm we call BLL_{I,S}. Our main idea is to incorporate the effect of time on (i) individual hashtag reuse (i.e., reusing own hashtags), and (ii) social hashtag reuse (i.e., reusing hashtags that have been previously used by a followee) into a predictive model. For this, we turn to the Base-Level Learning (BLL) equation from the cognitive architecture ACT-R, which accounts for the time-dependent decay of item exposure in human memory. We validate BLL_{I,S} using two crawled Twitter datasets in two evaluation scenarios. Firstly, only temporal usage patterns of past hashtag assignments are utilized and secondly, these patterns are combined with a content-based analysis of the current tweet. In both evaluation scenarios, we find not only that temporal effects play an important role for both individual and social hashtag reuse but also that our BLL_{I,S} approach provides significantly better prediction accuracy and ranking results than current state-of-the-art hashtag recommendation methods.
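
The Base-Level Learning equation mentioned in this abstract is commonly written as B_i = ln(Σ_j (t_ref − t_j)^(−d)), where the t_j are the past usage times of item i and d is a decay parameter. The following is a minimal sketch of that equation only, not of the full recommendation approach from the paper; the function names, the time unit, and the default d = 0.5 (the value conventionally used in ACT-R) are assumptions made for illustration.

```python
import math


def bll_activation(use_times, now, d=0.5):
    """Base-level activation: ln of the sum of (now - t) ** -d over past uses.

    Returns -inf when the item has no usage before `now`.
    """
    total = sum((now - t) ** (-d) for t in use_times if t < now)
    return math.log(total) if total > 0 else float("-inf")


def rank_hashtags(histories, now, d=0.5):
    """Rank hashtags by activation, highest (most recent/frequent) first."""
    return sorted(
        histories,
        key=lambda tag: bll_activation(histories[tag], now, d),
        reverse=True,
    )


# Hypothetical usage times in arbitrary units (e.g. hours).
histories = {"#old": [6.0], "#new": [9.0], "#unused": []}
ranking = rank_hashtags(histories, now=10.0)
```

With these toy values, the hashtag used one time unit ago outranks the one used four units ago, which reflects the power-law decay the equation models.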

Yenikent, S., Holtz P., & Kimmerle, J. (2017) The Impact of Topic Characteristics and Threat on Willingness to Engage with Wikipedia Articles: Insights from Laboratory Experiments Frontiers in Psychology [MORE]

[LINK]

A growing body of research aims to identify the factors that motivate people to make contributions in Wikipedia. We conducted two laboratory experiments to investigate the connections between topic characteristics, perception of threat, and willingness to engage with Wikipedia articles. In Study 1 (N = 83), we examined how topic familiarity, topic controversiality, and mortality salience influenced participants’ willingness to engage with Wikipedia articles. We presented the introduction parts of 20 Wikipedia articles and asked participants to rate each article with respect to familiarity and controversiality. In addition, we experimentally manipulated participants’ level of mortality salience in terms of the amount of threat they experienced when reading the article. Participants also indicated their willingness to engage with a particular article. The results revealed that familiar and controversial topics increased the willingness to engage with Wikipedia articles. Although mortality salience increased accessibility of death-related thoughts, it did not result in any changes in people’s willingness to work with the articles. The aim of Study 2 (N = 90) was to replicate the effects of topic characteristics by following a similar procedure. We additionally manipulated uncertainty salience by assigning participants to three experimental conditions: uncertainty salience, certainty salience, and non-salience. As expected, familiar and controversial topics were of high interest in terms of willingness to contribute. However, the manipulation of uncertainty salience did not yield any significant results despite the emergence of negative emotional states. In sum, we demonstrated that topic characteristics were factors that substantially influenced people’s willingness to engage with Wikipedia articles whereas perceived threat was not.

Gadiraju, U., Fetahu, B., Kawase, R., Siehndel, P., Dietze, S. (2017) Using Worker Self-Assessments for Competence-based Pre-selection in Crowdsourcing Microtasks ACM Transactions on Computer-Human Interaction (TOCHI), 2017. [MORE]

[LINK]

Dominik Kowald, Elisabeth Lex (2017) Overcoming the Imbalance Between Tag Recommendation Approaches and Real-World Folksonomy Structures with Cognitive-Inspired Algorithm European Symposium on Computational Social Science (ESCSS'2017) [MORE]

[LINK]

Social tagging systems enable users to collaboratively annotate Web
resources with freely chosen keywords (i.e., tags). In order to assist
users in this annotation process, tag recommendation algorithms
have been proposed, which suggest a set of tags for a given user and
a given resource [5]. Essentially, tag recommendation algorithms
aim to help not only the individual to find appropriate tags [4] but
also the collective to consolidate the shared tag vocabulary and thus,
to reach semantic stability and implicit consensus [10].

Gadiraju, U., Kawase, R. Improving Reliability of Crowdsourced Results by Detecting Crowd Workers with Multiple Identities In Proceedings of the 17th International Conference on Web Engineering, ICWE 2017. [MORE]

[LINK]

S Liu, M d'Aquin (2017) Unsupervised learning for understanding student achievement in a distance learning setting Global Engineering Education Conference (EDUCON), 2017 IEEE [MORE]

[LINK]

Many factors could affect the achievement of students in distance learning settings. Internal factors such as age, gender, previous education level and engagement in online learning activities can play an important role in obtaining successful learning outcomes, as well as external factors such as the regions where they come from and the learning environment that they can access. Identifying the relationships between student characteristics and distance learning outcomes is a central issue in learning analytics. This paper presents a study that applies unsupervised learning for identifying how demographic characteristics of students and their engagement in online learning activities can affect their learning achievement. We utilise the K-Prototypes clustering method to identify groups of students based on demographic characteristics and interactions with online learning environments, and also investigate the learning achievement of each group. Knowing these groups of students who have successful or poor learning outcomes can help faculty design online courses that adapt to different students’ needs. It can also assist students in selecting online courses that are appropriate to them.
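
K-Prototypes handles mixed data by combining a squared Euclidean distance on numeric attributes with a weighted mismatch count on categorical attributes. The sketch below shows only this dissimilarity and the assignment of a record to its nearest prototype; the attribute choices, the weight gamma, and the function names are illustrative assumptions rather than details taken from the paper.

```python
def mixed_dissimilarity(num_a, cat_a, num_b, cat_b, gamma=1.0):
    """Squared Euclidean distance on numeric parts plus gamma times
    the number of mismatching categorical attributes."""
    numeric = sum((a - b) ** 2 for a, b in zip(num_a, num_b))
    categorical = sum(a != b for a, b in zip(cat_a, cat_b))
    return numeric + gamma * categorical


def nearest_prototype(num, cat, prototypes, gamma=1.0):
    """Index of the prototype (numeric_part, categorical_part) closest to the record."""
    return min(
        range(len(prototypes)),
        key=lambda i: mixed_dissimilarity(num, cat, *prototypes[i], gamma=gamma),
    )


# Hypothetical student record: (scaled age, scaled activity) and (gender, region).
student_num, student_cat = [1.0, 2.0], ["f", "uk"]
prototypes = [
    ([0.0, 2.0], ["m", "uk"]),
    ([3.0, 0.0], ["f", "other"]),
]
distance = mixed_dissimilarity(student_num, student_cat, *prototypes[0], gamma=2.0)
cluster = nearest_prototype(student_num, student_cat, prototypes, gamma=2.0)
```

A full clustering run would alternate this assignment step with updating each prototype (means for numeric attributes, modes for categorical ones) until assignments stop changing.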

Gadiraju, U., Yang, J., Bozzon, A. (2017) Clarity is a Worthwhile Quality - On the Role of Task Clarity in Microtask Crowdsourcing In Proceedings of the ACM Conference on Hypertext and Social Media, HT 2017. [MORE]

[LINK]

Gadiraju, U., Checco, A., Gupta, N., Demartini, G. (2017) Modus Operandi of Crowd Workers : The Invisible Role of Microtask Work Environments In Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT) presented at The ACM International Joint Conference on Pervasive and Ubiquitous Computing (UBICOMP 2017). [MORE]

[LINK]

S Liu, M d’Aquin, E Motta (2017) Measuring Accuracy of Triples in Knowledge Graphs Language, Data, and Knowledge. LDK 2017. Lecture Notes in Computer Science, vol 10318. Springer [MORE]

[LINK]

An increasing number of large-scale knowledge graphs have been constructed in recent years. Those graphs are often created from text-based extraction, which could be very noisy. So far, cleaning knowledge graphs has often been carried out by human experts and is thus very inefficient. It is necessary to explore automatic methods for identifying and eliminating erroneous information. In order to achieve this, previous approaches primarily rely on internal information, i.e., the knowledge graph itself. In this paper, we introduce an automatic approach, Triples Accuracy Assessment (TAA), for validating RDF triples (source triples) in a knowledge graph by finding consensus of matched triples (among target triples) from other knowledge graphs. TAA uses knowledge graph interlinks to find identical resources and applies different matching methods between the predicates of source triples and target triples. Then, based on the matched triples, TAA calculates a confidence score to indicate the correctness of a source triple. In addition, we present an evaluation of our approach using the FactBench dataset for fact validation. Our findings show promising results for distinguishing between correct and wrong triples.

Jirschitzka J., Kimmerle J., Halatchliyski I., Hancke J., Meurers D., Cress U. (2017) A productive clash of perspectives? The interplay between articles’ and authors’ perspectives and their impact on Wikipedia edits in a controversial domain PLoS ONE 12(6): e0178985. [MORE]

[LINK]

This study examined predictors of the development of Wikipedia articles that deal with controversial issues. We chose a corpus of articles in the German-language version of Wikipedia about alternative medicine as a representative controversial issue. We extracted edits made until March 2013 and categorized them using a supervised machine learning setup as either being pro conventional medicine, pro alternative medicine, or neutral. Based on these categories, we established relevant variables, such as the perspectives of articles and of authors at certain points in time, the (im)balance of an article’s perspective, the number of non-neutral edits per article, the number of authors per article, authors’ heterogeneity per article, and incongruity between authors’ and articles’ perspectives. The underlying objective was to predict the development of articles’ perspectives with regard to the controversial topic. The empirical part of the study is embedded in theoretical considerations about editorial biases and the effectiveness of norms and rules in Wikipedia, such as the neutral point of view policy. Our findings revealed a selection bias where authors edited mainly articles with perspectives similar to their own viewpoint. Regression analyses showed that an author’s perspective as well as the article’s previous perspectives predicted the perspective of the resulting edits, although both predictors interacted with each other. Further analyses indicated that articles with more non-neutral edits were altogether more balanced. We also found a positive effect of the number of authors and of the authors’ heterogeneity on articles’ balance. However, while the effect of the number of authors was limited to pro-conventional medicine articles, the authors’ heterogeneity effect was restricted to pro-alternative medicine articles. Finally, we found a negative effect of incongruity between authors’ and articles’ perspectives that was pronounced for the pro-alternative medicine articles.

Kowald, D., Kopeinik, S., & Lex, E. (2017) The TagRec Framework as a Toolkit for the Development of Tag-Based Recommender Systems In Proceedings of the Adjunct Publication of the 25th ACM International Conference on User Modeling, Adaptation and Personalization (UMAP’2017) [MORE]

[LINK]

Recommender systems have become important tools to support users in identifying relevant content in an overloaded information space. To ease the development of recommender systems, a number of recommender frameworks have been proposed that serve a wide range of application domains. Our TagRec framework is one of the few examples of an open-source framework tailored towards developing and evaluating tag-based recommender systems. In this paper, we present the current, updated state of TagRec, and we summarize and reflect on four use cases that have been implemented with TagRec: (i) tag recommendations, (ii) resource recommendations, (iii) recommendation evaluation, and (iv) hashtag recommendations. To date, TagRec served the development and/or evaluation process of tag-based recommender systems in two large scale European research projects, which have been described in 17 research papers. Thus, we believe that this work is of interest for both researchers and practitioners of tag-based recommender systems.

Simone Kopeinik, Elisabeth Lex, Paul Seitlinger, Dietrich Albert and Tobias Ley (2017) Supporting collaborative learning with tag recommendations: a real-world study in an inquiry-based classroom project ACM [MORE]

[LINK]

In online social learning environments, tagging has demonstrated its potential to facilitate search, to improve recommendations and to foster reflection and learning. Studies have shown that shared understanding needs to be established in the group as a prerequisite for learning. We hypothesise that this can be fostered through tag recommendation strategies that contribute to semantic stabilization. In this study, we investigate the application of two tag recommenders that are inspired by models of human memory: (i) the base-level learning equation BLL and (ii) Minerva. BLL models the frequency and recency of tag use while Minerva is based on frequency of tag use and semantic context. We test the impact of both tag recommenders on semantic stabilization in an online study with 56 students completing a group-based inquiry learning project in school. We find that displaying tags from other group members contributes significantly to semantic stabilization in the group, as compared to a strategy where tags from the students’ individual vocabularies are used. Testing for the accuracy of the different recommenders revealed that algorithms using frequency counts such as BLL performed better when individual tags were recommended. When group tags were recommended, the Minerva algorithm performed better. We conclude that tag recommenders, exposing learners to each other’s tag choices by simulating search processes on learners’ semantic memory structures, show potential to support semantic stabilization and thus, inquiry-based learning in groups.

Adamou, A., d'Aquin, M., Allocca, C., and Motta, E. (2017) Supporting virtual integration of Linked Data with just-in-time query recompilation Proceedings of the Semantics 2017 Conference [MORE]

[LINK]

Dietze, S., Taibi, D., Yu, R., Barker, P., d’Aquin, M. (2017) Analysing and Improving embedded Markup of Learning Resources on the Web ACM [MORE]

[LINK]

Web-scale reuse and interoperability of learning resources have been major concerns for the technology-enhanced learning community. While work in this area traditionally focused on learning resource metadata, provided through learning resource repositories, the recent emergence of structured entity markup on the Web through standards such as RDFa and Microdata and initiatives such as schema.org, has provided new forms of entity-centric knowledge, which is so far under-investigated and hardly exploited. The Learning Resource Metadata Initiative (LRMI) provides a vocabulary for annotating learning resources through schema.org terms. Although recent studies have shown markup adoption by approximately 30% of all Web pages, understanding of the scope, distribution and quality of learning resources markup is limited. We provide the first public corpus of LRMI extracted from a representative Web crawl together with an analysis of LRMI adoption on the Web, with the goal to inform data consumers as well as future vocabulary refinements through a thorough understanding of the use as well as misuse of LRMI vocabulary terms. While errors and schema misuse are frequent, we also discuss a set of simple heuristics which significantly improve the accuracy of markup, a prerequisite for reusing learning resource metadata sourced from markup.

S. Kopeinik, D. Kowald, I. Hasani-Mavriqi, E. Lex (2017) Improving Collaborative Filtering Using a Cognitive Model of Human Category Learning NOW Publishers [MORE]

[LINK]

Classic resource recommenders like Collaborative Filtering treat users as just another entity, thereby neglecting non-linear user-resource dynamics that shape attention and interpretation. SUSTAIN, as an unsupervised human category learning model, captures these dynamics. It aims to mimic a learner’s categorization behavior. In this paper, we use three social bookmarking datasets gathered from BibSonomy, CiteULike and Delicious to investigate SUSTAIN as a user modeling approach to re-rank and enrich Collaborative Filtering following a hybrid recommender strategy. Evaluations against baseline algorithms in terms of recommender accuracy and computational complexity reveal encouraging results. Our approach substantially improves Collaborative Filtering and, depending on the dataset, successfully competes with a computationally much more expensive Matrix Factorization variant. In a further step, we explore SUSTAIN’s dynamics in our specific learning task and show that both memorization of a user’s history and clustering contribute to the algorithm’s performance. Finally, we observe that the users’ attentional foci determined by SUSTAIN correlate with the users’ level of curiosity, identified by the SPEAR algorithm. Overall, the results of our study show that SUSTAIN can be used to efficiently model attention-interpretation dynamics of users and can help improve Collaborative Filtering for resource recommendations.

d’Aquin, Mathieu and Motta, Enrico (2016) The Epistemology of Intelligent Semantic Web Systems Synthesis Lectures on the Semantic Web: Theory and Technology, 6(1) pp. 1–88 [MORE]

[LINK]

The Semantic Web is a young discipline, even if only in comparison to other areas of computer science. Nonetheless, it already exhibits an interesting history and evolution. This book is a reflection on this evolution, aiming to take a snapshot of where we are at this specific point in time, and also showing what might be the focus of future research.

Allocca, C., Adamou, A., d’Aquin, M. and Motta, E. (2016) SPARQL Query Recommendations by Example Demo at Extended Semantic Web Conference, ESWC 2016 [MORE]

[LINK]

In this demo paper, a SPARQL Query Recommendation Tool (called SQUIRE) based on query reformulation is presented. Based on three steps, Generalization, Specialization and Evaluation, SQUIRE implements the logic of reformulating a SPARQL query that is satisfiable w.r.t. a source RDF dataset, into others that are satisfiable w.r.t. a target RDF dataset. In contrast with existing approaches, SQUIRE aims at recommending queries whose reformulations: i) reflect as much as possible the same intended meaning, structure, type of results and result size as the original query and ii) do not require a mapping between the two datasets. Based on a set of criteria to measure the similarity between the initial query and the recommended ones, SQUIRE demonstrates the feasibility of the underlying query reformulation process, ranks the recommended queries appropriately, and offers valuable support for query recommendations over an unknown and unmapped target RDF dataset, not only assisting the user in learning the data model and content of an RDF dataset, but also supporting its use without requiring the user to have intrinsic knowledge of the data.

Mouromtsev, D. and d’Aquin, M. (eds.) (2016) Open Data for Education: Linked, Shared, and Reusable Data for Teaching and Learning Springer [MORE]

[LINK]

This volume comprises a collection of papers presented at an Open Data in Education Seminar and the LILE workshops during 2014-2015.

In the first part of the book, two chapters give different perspectives on the current use of linked and open data in education, including the use of technology and the topics that are being covered.

The second part of the book focuses on the specific, practical applications that are being put in place to exploit open and linked data in education today.

The goal of this book is to provide a snapshot of current activities, and to share and disseminate the growing collective experience on open and linked data in education. This volume brings together research results, studies, and practical endeavors from initiatives spread across several countries around the world. These initiatives are laying the foundations of open and linked data in the education movement and leading the way through innovative applications.

d’Aquin, M. (2016) On the Use of Linked Open Data in Education: Current and Future Practices Open Data for Education: Linked, Shared, and Reusable Data for Teaching and Learning, eds. Dmitry Mouromtsev and Mathieu d’Aquin [MORE]

[LINK]

Education has often been a keen adopter of new information and communication technologies. This is not surprising given that education is all about informing and communicating. Traditionally, educational institutions produce large volumes of data, much of which is publicly available, either because it is useful to communicate (e.g. the course catalogue) or because of external policies (e.g. reports to funding bodies). Considering the distribution and variety of providers (universities, schools, governments), topics (disciplines and types of educational data) and users (students, teachers, parents), education therefore represents a perfect use case for Linked Open Data. In this chapter, we look at the growing practices in using Linked Open Data in education, and how this trend is opening up opportunities for new services and new scenarios.

Taibi, D., Fulantelli, G., Dietze, S. and Fetahu, B. (2016) Educational Linked Data on the Web – Exploring and Analysing the Scope and Coverage Open Data for Education: Linked, Shared, and Reusable Data for Teaching and Learning, eds. Dmitry Mouromtsev and Mathieu d’Aquin [MORE]

[LINK]

Throughout the last few years, the scale and diversity of datasets published according to Linked Data (LD) principles has increased and also led to the emergence of a wide range of data of educational relevance. However, sufficient insights into the state, coverage and scope of available educational Linked Data still seem to be missing. In this work, we analyse the scope and coverage of educational linked data on the Web, identifying the most significant resource types and topics and apparent gaps. As part of our findings, results indicate a prevalent bias towards data in areas such as the life sciences as well as computing-related topics. In addition, we investigate the strong correlation of resource types and topics, where specific types have a tendency to be associated with particular types of categories, i.e. topics. Given this correlation, we argue that a dataset is best understood when considering its topics, in the context of its specific resource types. Based on this finding, we also present a Web data exploration tool, which allows users to navigate through educational linked datasets by considering specific type and topic combinations.

Taibi, D. and Dietze, S. (2016) Towards embedded markup of Learning Resources on the Web: a quantitative analysis of the use of LRMI properties Linked Learning workshop, LILE 2016 [MORE]

[LINK]

Embedded markup of Web pages has emerged as a significant source of structured data on the Web. In this context, the LRMI initiative has provided a set of vocabulary terms, now part of the schema.org vocabulary, to enable the markup of resources of educational value. In this paper we present a preliminary analysis of the use of LRMI terms on the Web by assessing LRMI-based statements extracted from the Web Data Commons dataset.

Ben Ellefi, M., Bellahsene, Z., Dietze, S., Todorov, K. (2016) Beyond Established Knowledge Graphs- Recommending Web Datasets for Data Linking Web Engineering, pp.262-279, 16th International Conference on Web Engineering (ICWE2016) [MORE]

[LINK]

With the explosive growth of the Web of Data in terms of size and complexity, identifying suitable datasets to be linked has become a challenging problem for data publishers. To understand the nature of the content of specific datasets, we adopt the notion of dataset profiles, where datasets are characterized through a set of topic annotations. In this paper, we adopt a collaborative filtering-like recommendation approach, which exploits both existing dataset profiles and traditional dataset connectivity measures, in order to link arbitrary, non-profiled datasets into a global dataset-topic-graph. Our experiments, applied to all available Linked Datasets in the Linked Open Data (LOD) cloud, show an average recall of up to 81%, which translates to an average reduction of the size of the original candidate dataset search space to up to 86%. An additional contribution of this work is the provision of benchmarks for dataset interlinking recommendation systems.

Ben Ellefi, M., Bellahsene, Z., Dietze, S., Todorov, K. (2016) Intension-based Dataset Recommendation for Data Linking 13th Extended Semantic Web Conference (ESWC2016) [MORE]

[LINK]

With the increasing quantity and diversity of publicly available web datasets, most notably Linked Open Data, recommending datasets that meet specific criteria has become an increasingly important, yet challenging problem. This task is of particular importance when addressing issues such as entity retrieval, semantic search and data linking. Here, we focus on that last issue. We introduce a dataset recommendation approach to identify linking candidates based on the presence of schema overlap between datasets. While an understanding of the nature of the content of specific datasets is a crucial prerequisite, we adopt the notion of dataset profiles, where a dataset is characterized through a set of schema concept labels that best describe it and can be potentially enriched by retrieving their textual descriptions. We identify schema overlap with the help of a semantico-frequential concept similarity measure and a ranking criterion based on the tf*idf cosine similarity. The experiments, conducted over all available linked datasets on the Linked Open Data cloud, show that our method achieves an average precision of up to 53% for a recall of 100%. As an additional contribution, our method returns the mappings between the schema concepts across datasets – a particularly useful input for the data linking step.
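The tf*idf cosine ranking mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the raw-count tf, the `log(N/df)` idf weighting, and the toy profiles are all assumptions made for illustration, and the semantico-frequential similarity measure is not modelled here.

```python
import math
from collections import Counter


def tfidf_vectors(profiles):
    """Build a tf*idf vector per dataset from its list of concept labels.

    Assumption: tf is the raw label count and idf is log(N / df).
    """
    n = len(profiles)
    df = Counter()
    for labels in profiles.values():
        df.update(set(labels))  # count each label once per dataset
    vectors = {}
    for name, labels in profiles.items():
        tf = Counter(labels)
        vectors[name] = {t: c * math.log(n / df[t]) for t, c in tf.items()}
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(source, profiles):
    """Rank all other datasets by cosine similarity to the source profile."""
    vectors = tfidf_vectors(profiles)
    scores = [(name, cosine(vectors[source], vec))
              for name, vec in vectors.items() if name != source]
    return sorted(scores, key=lambda pair: -pair[1])


# Hypothetical profiles: dataset name -> schema concept labels.
profiles = {
    "a": ["person", "paper"],
    "b": ["person", "paper", "venue"],
    "c": ["gene", "protein"],
}
ranked = rank_candidates("a", profiles)
```

With these toy profiles, dataset `b` shares labels with `a` and is ranked ahead of `c`, which shares none.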

Holtz, P. (2016) How Popper’s ‘Three Worlds Theory’ resembles Moscovici’s ‘Social Representations Theory’ but why Moscovici’s ‘Social Psychology of Science’ still differs from Popper’s ‘Critical Approach’ Papers on Social Representations, 25, 13.1-13.24 [MORE]

[LINK]

This paper is, to the best of my knowledge, the first to discuss similarities and differences between Karl Popper’s ‘three worlds theory’ and Serge Moscovici’s ‘theory of social representations’. Karl Popper maintained that to be subject to criticism, and hence to falsification attempts and subsequent improvement, scientific theories must first be formulated, disseminated, perceived, and understood by others. As a result, such a theory becomes a partially autonomous object of world 3, the “world of products of the human mind” in contrast to world 1, the “world of things”, and world 2, the “world of mental states” (Popper, 1978, p. 144). Popper’s three worlds theory resembles Moscovici’s social representations theory insofar as social representations / world 3 objects cannot be reduced to individual states of minds, are embedded in interactions between people and objects, and are always rooted in previous representations / knowledge. Hence, Popper – who was very skeptical of the usefulness of a ‘psychology of science’– did in fact employ elements of a ‘social’ social psychology of science in his later works. Moscovici himself in turn may have failed to notice that to Popper science does not take place within a separate ‘reified universe’ in his ‘Social Psychology of Science’ (1993). Although to Popper science aims at increasing objectivity and reification, it is still a part of the social world and the ‘consensual universe’.

Yu R., Gadiraju U., Zhu X., Fetahu B., Dietze S. (2016) Towards Entity Summarisation on Structured Web Markup ESWC 2016 [MORE]

[LINK]

Embedded markup based on Microdata, RDFa, and Microformats has become prevalent on the Web and constitutes an unprecedented source of data. However, RDF statements extracted from markup are fundamentally different to traditional RDF graphs: entity descriptions are flat, facts are highly redundant and granular, and co-references are very frequent yet explicit links are missing. Therefore, typical entity-centric tasks such as retrieval and summarisation cannot be tackled sufficiently with state-of-the-art methods. We present an entity summarisation approach that overcomes such issues through a combination of entity retrieval and summarisation techniques geared towards the specific challenges associated with embedded markup. We perform a preliminary evaluation on a subset of the Web Data Commons dataset and show improvements over existing entity retrieval baselines. In addition, an investigation into the coverage and complementarity of facts from the constructed entity summaries shows potential for aiding tasks such as knowledge base population.

Yu, R., Fetahu, B., Gadiraju, U., Dietze, S. (2016) A Survey on Challenges in Web Markup Data for Entity Retrieval poster & short paper at 15th International Semantic Web Conference (ISWC2016) [MORE]

[LINK]

Embedded markup based on Microdata, RDFa, and Microformats has become prevalent on the Web and constitutes an unprecedented data source. RDF statements from markup are highly redundant and frequently contain errors, and co-references are very frequent yet explicit links are missing. We present a preliminary analysis on the challenges associated with markup data in the context of entity retrieval. We analyze four main factors: (i) co-references, (ii) redundancy, (iii) inconsistencies, and (iv) accessibility of information in the case of URLs. We conclude with general guidelines on how to handle such challenges when dealing with embedded markup data.

A. Ceroni, U. Gadiraju, J. Matschke, S. Wingert, M. Fisichella. (2016) Where the Event Lies: Predicting Event Occurrence in Textual Documents Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’16) [MORE]

[LINK]

Gadiraju, U., Siehndel, P., Dietze, S. (2016) Estimating Domain Specificity for Effective Crowdsourcing of Link Prediction and Schema Mapping Proceedings of the 8th ACM Conference on Web Science, Pages 323-324 [MORE]

[LINK]

Crowdsourcing has been widely adopted in research and practice over the last decade. In this work, we first investigate the extent to which crowd workers can substitute expert-based judgments in the task of link prediction and schema mapping, which is the creation of explicit links between resources on the Semantic Web at the instance and schema level. This is important since human input is required to evaluate and improve automated approaches for these tasks. We present a novel method to assess the inherent specificity of the link prediction task, and the impact of task specificity on quality of the results. We propose a Wikipedia-based mechanism to estimate specificity and show the influence of concept familiarity in producing high quality link prediction. Our findings indicate that the effectiveness of crowdsourcing the task of link prediction can improve by estimating the specificity.

Ujwal Gadiraju, Gianluca Demartini, Djellel Eddine Difallah, and Michele Catasta. (2016) It’s getting crowded!: how to use crowdsourcing effectively for web science research Proceedings of the 8th ACM Conference on Web Science (WebSci ’16) [MORE]

[LINK]

Since the term crowdsourcing was coined in 2006 [1], we have witnessed a surge in the adoption of the crowdsourcing paradigm. Crowdsourcing solutions are highly sought-after to solve problems that require human intelligence at a large scale. In the last decade there have been numerous applications of crowdsourcing spanning several domains in both research and for practical benefits across disciplines (from sociology to computer science). In the realm of research practice, crowdsourcing has unmistakably broken the barriers of qualitative and quantitative studies by providing a means to scale-up previously constrained laboratory studies and controlled experiments. Today, one can easily build ground truths for evaluation and access potential participants around the clock with diverse demographics at will, all within an unprecedentedly short amount of time. This comes with a number of challenges related to lack of control on research subjects and with respect to data quality.

A core characteristic of Web Science over the last decade has been its interdisciplinary approach to understand the behavior of people on and off the Web, using a wide range of data sources. It is at this confluence that crowdsourcing provides an important opportunity to explore previously unfeasible experimental grounds.

In this tutorial, we will introduce the crowdsourcing paradigm in its entirety. We will discuss altruistic and reward-based crowdsourcing, eclipsing the needs of task requesters, as well as the behavior of crowd workers. The tutorial will focus on paid microtask crowdsourcing, and reflect on the challenges and opportunities that confront us. In an interactive demonstration session, we will run the audience through the entire lifecycle of creating and deploying microtasks on an established crowdsourcing platform, optimizing task settings in order to meet task needs, and aggregating results thereafter. We will present a selection of state-of-the-art methods to ensure high-quality results and inhibit malicious activity. The tutorial will be framed within the context of Web Science. The interdisciplinary nature of Web Science breeds a rich ground for crowdsourcing, and we aim to spread the virtues of this growing field.

Gadiraju, U., & Siehndel, P. (2016) Unlock the Stock: User Topic Modeling for Stock Market Analysis EDBT/ICDT Workshops 2016 [MORE]

[LINK]

The increasing use of Twitter as a medium for sharing news related to various topics facilitates methods for automatic news creation or event detection and prediction. However, these methods are hindered by users posting and propagating incorrect or irrelevant content. Choosing the right users is crucial in order to sample down the tweets to be analyzed, and preserve the quality of the predicted events or generated news. In this paper, we present an effective method for identifying expert users in defined areas related to the stock market. For each user we generate a model based on the content of their posts. The model represents the domains the user talks about, and allows a selection of users for various tasks. We show the effectiveness of the proposed approach by performing a series of experiments using large Twitter datasets related to Stock Market Companies.

Gadiraju, U. (2016) Make hay while the crowd shines: towards effective crowdsourcing on the web Ujwal Gadiraju, with Prateek Jain as coordinator. in SIGWEB Newsletter, 2016, 3. [MORE]

[LINK]

Ujwal Gadiraju is a third year PhD candidate at L3S Research Center, Leibniz Universität Hannover in Germany. He received his Master of Science degree in Computer Science at Delft University of Technology in the Netherlands. His main research interests encompass the realms of Human Computation and Crowdsourcing. He has published peer-reviewed papers in top-tier conferences in the realms of Information Retrieval, Social Computing, and Crowdsourcing. For more information see http://www.l3s.de/~gadiraju/.

Fetahu, B., Markert, K., Nejdl, W. & Anand, A. (2016) Finding News Citations for Wikipedia Proceedings of the 25th CIKM, ACM [MORE]

[LINK]

An important editing policy in Wikipedia is to provide citations for added statements in Wikipedia pages, where statements can be arbitrary pieces of text, ranging from a sentence to a paragraph. In many cases citations are either outdated or missing altogether.

In this work we address the problem of finding and updating news citations for statements in entity pages. We propose a two-stage supervised approach for this problem. In the first step, we construct a classifier to find out whether statements need a news citation or other kinds of citations (web, book, journal, etc.). In the second step, we develop a news citation algorithm for Wikipedia statements, which recommends appropriate citations from a given news collection. Apart from IR techniques that use the statement to query the news collection, we also formalize three properties of an appropriate citation, namely: (i) the citation should entail the Wikipedia statement, (ii) the statement should be central to the citation, and (iii) the citation should be from an authoritative source.

We perform an extensive evaluation of both steps, using 20 million articles from a real-world news collection. Our results are quite promising, and show that we can perform this task with high precision and at scale.

Hasani-Mavriqi, I., Geigl, F., Pujari, S.C., Lex, E., Helic, D. (2016) The influence of social status and network structure on consensus building in collaboration networks Social Network Analysis and Mining 6 (1), 80 (SNAM). 2016 [MORE]

[LINK]

In this paper, we analyze the influence of social status on opinion dynamics and consensus building in collaboration networks. To that end, we simulate the diffusion of opinions in empirical networks and take into account both the network structure and the individual differences of people reflected through their social status. For our simulations, we adapt a well-known Naming Game model and extend it with the Probabilistic Meeting Rule to account for the social status of individuals participating in a meeting. This mechanism is sufficiently flexible and allows us to model various society forms in collaboration networks, as well as the emergence or disappearance of social classes. In particular, we are interested in how these society forms facilitate opinion diffusion. Our experimental findings reveal that (i) opinion dynamics in collaboration networks is indeed affected by the individuals’ social status and (ii) this effect is intricate and non-obvious. Our results suggest that in most of the networks the social status favors consensus building. However, relying on it too strongly can also slow down the opinion diffusion, indicating that there is a specific setting for an optimal benefit of social status on the consensus building. On the other hand, in networks where status does not correlate with degree or in networks with positive degree assortativity, consensus is always reached quickly regardless of the status.

Kopeinik, S., Kowald, D., Lex, E. (2016) Which algorithms Suit Which Learning Environments? A Comparative Study of Recommender Systems in TEL in Proceedings of the European Conference on Technology Enhanced Learning (EC-TEL 2016) [MORE]

[LINK]

In recent years, a number of recommendation algorithms have been proposed to help learners find suitable learning resources on-line. Next to user-centered evaluations, offline-datasets have been used to investigate new recommendation algorithms or variations of collaborative filtering approaches. However, a more extensive study comparing a variety of recommendation strategies on multiple TEL datasets is missing. In this work, we contribute with a data-driven study of recommendation strategies in TEL to shed light on their suitability for TEL datasets. To that end, we evaluate six state-of-the-art recommendation algorithms for tag and resource recommendations on six empirical datasets: a dataset from European Schoolnets TravelWell, a dataset from the MACE portal, which features access to meta-data-enriched learning resources from the field of architecture, two datasets from the social bookmarking systems BibSonomy and CiteULike, a MOOC dataset from the KDD challenge 2015, and Aposdle, a small-scale workplace learning dataset. We highlight strengths and shortcomings of the discussed recommendation algorithms and their applicability to the TEL datasets. Our results demonstrate that the performance of the algorithms strongly depends on the properties and characteristics of the particular dataset. However, we also find a strong correlation between the average number of users per resource and the algorithm performance. A tag recommender evaluation experiment reveals that a hybrid combination of a cognitive-inspired and a popularity-based approach consistently performs best on all TEL datasets we utilized in our study.

Stanisavljevic, D., Hasani-Mavriqi, I., Lex, E., Strohmaier, M., Helic, D. Semantic Stability in Wikipedia Studies in Computational Intelligence (SCI), Springer [MORE]

[LINK]

In this paper we assess the semantic stability of Wikipedia by investigating the dynamics of Wikipedia articles’ revisions over time. In a semantically stable system, articles are infrequently edited, whereas in unstable systems, article content changes more frequently. In other words, in a stable system, the Wikipedia community has reached consensus on the majority of articles. In our work, we measure semantic stability using the Rank Biased Overlap method. To that end, we preprocess Wikipedia dumps to obtain a sequence of plain-text article revisions, where each revision is represented as a TF-IDF vector. To measure the similarity between consecutive article revisions, we calculate Rank Biased Overlap on subsequent term vectors. We evaluate our approach on 10 Wikipedia language editions including the five largest language editions as well as five randomly selected small language editions. Our experimental results reveal that even in policy-driven collaboration networks such as Wikipedia, semantic stability can be achieved. However, there are differences in the velocity of the semantic stability process between small and large Wikipedia editions. Small editions exhibit faster and higher semantic stability than large ones. In particular, in large Wikipedia editions, a higher number of successive revisions is needed in order to reach a certain semantic stability level, whereas, in small Wikipedia editions, the number of needed successive revisions is much lower for the same level of semantic stability.
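As a rough illustration of the comparison step described in this abstract, the sketch below ranks the terms of two revisions by weight and computes a truncated Rank Biased Overlap sum. It is an assumption-laden sketch, not the authors' pipeline: the function names, the persistence parameter `p`, the truncation at the shorter list's length (no extrapolation of the tail), and the toy weights are all hypothetical.

```python
def rank_terms(weights):
    """Order terms by descending weight; ties are broken alphabetically."""
    return [t for t, _ in sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))]


def rbo_truncated(ranking_a, ranking_b, p=0.9):
    """Truncated RBO: (1 - p) * sum over depths d of p**(d-1) * A_d,
    where A_d is the fraction of items shared by the two top-d prefixes.

    Assumption: the sum stops at the shorter list's length, so identical
    lists of length k score 1 - p**k rather than 1.
    """
    depth = min(len(ranking_a), len(ranking_b))
    total = 0.0
    for d in range(1, depth + 1):
        agreement = len(set(ranking_a[:d]) & set(ranking_b[:d])) / d
        total += p ** (d - 1) * agreement
    return (1 - p) * total


# Hypothetical term weights for two consecutive revisions of one article.
rev1 = {"x": 0.9, "y": 0.5, "z": 0.1}
rev2 = {"y": 0.8, "x": 0.6, "z": 0.2}
score = rbo_truncated(rank_terms(rev1), rank_terms(rev2), p=0.5)
```

Because the weight `p ** (d - 1)` decays with depth, disagreement among the top-weighted terms lowers the score more than disagreement further down the ranking.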

Ujwal Gadiraju and Stefan Dietze Improving Learning Through Achievement Priming in Information Finding Microtasks LAK [MORE]

[LINK]