S Liu, M d'Aquin (2017) Unsupervised learning for understanding student achievement in a distance learning setting Global Engineering Education Conference (EDUCON), 2017 IEEE [LINK]

Many factors could affect the achievement of students in distance learning settings. Internal factors such as age, gender, previous education level and engagement in online learning activities can play an important role in obtaining successful learning outcomes, as can external factors such as the regions students come from and the learning environment that they can access. Identifying the relationships between student characteristics and distance learning outcomes is a central issue in learning analytics. This paper presents a study that applies unsupervised learning to identify how demographic characteristics of students and their engagement in online learning activities can affect their learning achievement. We utilise the K-Prototypes clustering method to identify groups of students based on demographic characteristics and interactions with online learning environments, and also investigate the learning achievement of each group. Knowing which groups of students have successful or poor learning outcomes can aid faculty in designing online courses that adapt to different students’ needs. It can also assist students in selecting online courses that are appropriate to them.
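As a rough illustration of why K-Prototypes suits mixed demographic and engagement data, the sketch below shows the dissimilarity measure the method is built on: squared Euclidean distance on numeric attributes plus a weighted count of mismatches on categorical attributes. The feature choices, the weight `gamma` and the function names are assumptions made for this example; they are not taken from the paper.

```python
from typing import Sequence, Tuple

Prototype = Tuple[Sequence[float], Sequence[str]]


def mixed_dissimilarity(
    num_a: Sequence[float],
    cat_a: Sequence[str],
    num_b: Sequence[float],
    cat_b: Sequence[str],
    gamma: float = 1.0,
) -> float:
    """Squared Euclidean distance on numeric attributes plus gamma times
    the number of categorical attributes whose values differ."""
    numeric_part = sum((x - y) ** 2 for x, y in zip(num_a, num_b))
    categorical_part = sum(1 for x, y in zip(cat_a, cat_b) if x != y)
    return numeric_part + gamma * categorical_part


def nearest_prototype(
    num: Sequence[float],
    cat: Sequence[str],
    prototypes: Sequence[Prototype],
    gamma: float = 1.0,
) -> int:
    """Return the index of the prototype closest to the given record."""
    distances = [
        mixed_dissimilarity(num, cat, p_num, p_cat, gamma)
        for p_num, p_cat in prototypes
    ]
    return min(range(len(prototypes)), key=distances.__getitem__)


# Hypothetical student record: (scaled age, scaled click count), (gender, region).
student_num, student_cat = [0.2, 0.9], ["F", "north"]
prototypes = [
    ([0.8, 0.1], ["M", "south"]),  # hypothetical low-engagement group
    ([0.3, 0.8], ["F", "north"]),  # hypothetical high-engagement group
]
assigned = nearest_prototype(student_num, student_cat, prototypes, gamma=0.5)
```

A full clustering run would alternate this assignment step with recomputing each prototype (means for numeric attributes, modes for categorical ones) until assignments stop changing.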


S Liu, M d’Aquin, E Motta (2017) Measuring Accuracy of Triples in Knowledge Graphs Language, Data, and Knowledge. LDK 2017. Lecture Notes in Computer Science, vol 10318. Springer [LINK]

An increasing number of large-scale knowledge graphs have been constructed in recent years. Those graphs are often created from text-based extraction, which could be very noisy. So far, cleaning knowledge graphs is often carried out by human experts and is thus very inefficient. It is necessary to explore automatic methods for identifying and eliminating erroneous information. In order to achieve this, previous approaches primarily rely on internal information, i.e. the knowledge graph itself. In this paper, we introduce an automatic approach, Triples Accuracy Assessment (TAA), for validating RDF triples (source triples) in a knowledge graph by finding consensus of matched triples (among target triples) from other knowledge graphs. TAA uses knowledge graph interlinks to find identical resources and applies different matching methods between the predicates of source triples and target triples. Then, based on the matched triples, TAA calculates a confidence score to indicate the correctness of a source triple. In addition, we present an evaluation of our approach using the FactBench dataset for fact validation. Our findings show promising results for distinguishing between correct and wrong triples.


Jirschitzka J., Kimmerle J., Halatchliyski I., Hancke J., Meurers D., Cress U. (2017) A productive clash of perspectives? The interplay between articles’ and authors’ perspectives and their impact on Wikipedia edits in a controversial domain PLoS ONE 12(6): e0178985. [LINK]

This study examined predictors of the development of Wikipedia articles that deal with controversial issues. We chose a corpus of articles in the German-language version of Wikipedia about alternative medicine as a representative controversial issue. We extracted edits made until March 2013 and categorized them using a supervised machine learning setup as either being pro conventional medicine, pro alternative medicine, or neutral. Based on these categories, we established relevant variables, such as the perspectives of articles and of authors at certain points in time, the (im)balance of an article’s perspective, the number of non-neutral edits per article, the number of authors per article, authors’ heterogeneity per article, and incongruity between authors’ and articles’ perspectives. The underlying objective was to predict the development of articles’ perspectives with regard to the controversial topic. The empirical part of the study is embedded in theoretical considerations about editorial biases and the effectiveness of norms and rules in Wikipedia, such as the neutral point of view policy. Our findings revealed a selection bias where authors edited mainly articles with perspectives similar to their own viewpoint. Regression analyses showed that an author’s perspective as well as the article’s previous perspectives predicted the perspective of the resulting edits, albeit both predictors interact with each other. Further analyses indicated that articles with more non-neutral edits were altogether more balanced. We also found a positive effect of the number of authors and of the authors’ heterogeneity on articles’ balance. However, while the effect of the number of authors was reserved to pro-conventional medicine articles, the authors’ heterogeneity effect was restricted to pro-alternative medicine articles. Finally, we found a negative effect of incongruity between authors’ and articles’ perspectives that was pronounced for the pro-alternative medicine articles.


Kowald, D., Kopeinik, S., & Lex, E. (2017) The TagRec Framework as a Toolkit for the Development of Tag-Based Recommender Systems In Proceedings of the Adjunct Publication of the 25th ACM International Conference on User Modeling, Adaptation and Personalization (UMAP’2017) [LINK]

Recommender systems have become important tools to support users in identifying relevant content in an overloaded information space. To ease the development of recommender systems, a number of recommender frameworks have been proposed that serve a wide range of application domains. Our TagRec framework is one of the few examples of an open-source framework tailored towards developing and evaluating tag-based recommender systems. In this paper, we present the current, updated state of TagRec, and we summarize and reflect on four use cases that have been implemented with TagRec: (i) tag recommendations, (ii) resource recommendations, (iii) recommendation evaluation, and (iv) hashtag recommendations. To date, TagRec has served the development and/or evaluation process of tag-based recommender systems in two large-scale European research projects, which have been described in 17 research papers. Thus, we believe that this work is of interest for both researchers and practitioners of tag-based recommender systems.


Kowald, S., Pujari, S., Lex, E. (2017) Supporting collaborative learning with tag recommendations: a real-world study in an inquiry-based classroom project ACM [LINK]

In online social learning environments, tagging has demonstrated its potential to facilitate search, to improve recommendations and to foster reflection and learning. Studies have shown that shared understanding needs to be established in the group as a prerequisite for learning. We hypothesise that this can be fostered through tag recommendation strategies that contribute to semantic stabilization. In this study, we investigate the application of two tag recommenders that are inspired by models of human memory: (i) the base-level learning equation BLL and (ii) Minerva. BLL models the frequency and recency of tag use while Minerva is based on frequency of tag use and semantic context. We test the impact of both tag recommenders on semantic stabilization in an online study with 56 students completing a group-based inquiry learning project in school. We find that displaying tags from other group members contributes significantly to semantic stabilization in the group, as compared to a strategy where tags from the students’ individual vocabularies are used. Testing for the accuracy of the different recommenders revealed that algorithms using frequency counts such as BLL performed better when individual tags were recommended. When group tags were recommended, the Minerva algorithm performed better. We conclude that tag recommenders, exposing learners to each other’s tag choices by simulating search processes on learners’ semantic memory structures, show potential to support semantic stabilization and thus, inquiry-based learning in groups.


Kowald, S., Pujari, S., Lex, E. (2017) Temporal Effects on Hashtag Reuse in Twitter: A Cognitive-Inspired Hashtag Recommendation Approach ACM [LINK]

Hashtags have become a powerful tool in social platforms such as Twitter to categorize and search for content, and to spread short messages across members of the social network. In this paper, we study temporal hashtag usage practices in Twitter with the aim of designing a cognitive-inspired hashtag recommendation algorithm we call BLL_I,S. Our main idea is to incorporate the effect of time on (i) individual hashtag reuse (i.e., reusing own hashtags), and (ii) social hashtag reuse (i.e., reusing hashtags which have been previously used by a followee) into a predictive model. For this, we turn to the Base-Level Learning (BLL) equation from the cognitive architecture ACT-R, which accounts for the time-dependent decay of item exposure in human memory. We validate BLL_I,S using two crawled Twitter datasets in two evaluation scenarios. Firstly, only temporal usage patterns of past hashtag assignments are utilized and secondly, these patterns are combined with a content-based analysis of the current tweet. In both evaluation scenarios, we find not only that temporal effects play an important role for both individual and social hashtag reuse but also that our BLL_I,S approach provides significantly better prediction accuracy and ranking results than current state-of-the-art hashtag recommendation methods.
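To make the role of the BLL equation concrete, here is a minimal sketch of its general ACT-R form, B = ln(sum over past uses of (t_now - t_use)^(-d)), applied to one user's hashtag history. It illustrates only the time-decay idea for individual reuse. It is not the authors' full model, which also covers social reuse and a content-based component. The function names, the example timestamps and the decay value d = 0.5 (the conventional ACT-R default) are assumptions for this example.

```python
import math
from typing import Dict, List, Sequence


def bll_score(usage_times: Sequence[float], now: float, decay: float = 0.5) -> float:
    """Base-level activation: log of the summed power-law decay over past uses.

    Uses at or after `now` are ignored; an item with no past use scores -inf.
    """
    recencies = [now - t for t in usage_times if now > t]
    if not recencies:
        return float("-inf")
    return math.log(sum(r ** -decay for r in recencies))


def rank_hashtags(
    history: Dict[str, List[float]], now: float, decay: float = 0.5
) -> List[str]:
    """Order hashtags from highest to lowest base-level activation."""
    scores = {tag: bll_score(times, now, decay) for tag, times in history.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical usage history: hashtag -> timestamps (in hours) of past uses.
history = {
    "#old": [1.0],          # used once, long ago
    "#recent": [9.0],       # used once, recently
    "#frequent": [2.0, 5.0, 8.0],
}
ranking = rank_hashtags(history, now=10.0)
```

Both frequency and recency raise a hashtag's score, so a tag used often or used lately is ranked ahead of one used once in the distant past.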


Dietze, S., Taibi, D., Yu, R., Barker, P., d’Aquin, M. (2017) Analysing and Improving embedded Markup of Learning Resources on the Web ACM [LINK]

Web-scale reuse and interoperability of learning resources have been major concerns for the technology-enhanced learning community. While work in this area traditionally focused on learning resource metadata, provided through learning resource repositories, the recent emergence of structured entity markup on the Web through standards such as RDFa and Microdata and initiatives such as, has provided new forms of entity-centric knowledge, which is so far under-investigated and hardly exploited. The Learning Resource Metadata Initiative (LRMI) provides a vocabulary for annotating learning resources through terms. Although recent studies have shown markup adoption by approximately 30% of all Web pages, understanding of the scope, distribution and quality of learning resources markup is limited. We provide the first public corpus of LRMI extracted from a representative Web crawl together with an analysis of LRMI adoption on the Web, with the goal to inform data consumers as well as future vocabulary refinements through a thorough understanding of the use as well as misuse of LRMI vocabulary terms. While errors and schema misuse are frequent, we also discuss a set of simple heuristics which significantly improve the accuracy of markup, a prerequisite for reusing learning resource metadata sourced from markup.


Kopeinik, D. Kowald, I. Hasani-Mavriqi, E. Lex (2017) Improving Collaborative Filtering Using a Cognitive Model of Human Category Learning NOW Publishers [LINK]

Classic resource recommenders like Collaborative Filtering treat users as just another entity, thereby neglecting non-linear user-resource dynamics that shape attention and interpretation. SUSTAIN, as an unsupervised human category learning model, captures these dynamics. It aims to mimic a learner’s categorization behavior. In this paper, we use three social bookmarking datasets gathered from BibSonomy, CiteULike and Delicious to investigate SUSTAIN as a user modeling approach to re-rank and enrich Collaborative Filtering following a hybrid recommender strategy. Evaluations against baseline algorithms in terms of recommender accuracy and computational complexity reveal encouraging results. Our approach substantially improves Collaborative Filtering and, depending on the dataset, successfully competes with a computationally much more expensive Matrix Factorization variant. In a further step, we explore SUSTAIN’s dynamics in our specific learning task and show that both memorization of a user’s history and clustering contribute to the algorithm’s performance. Finally, we observe that the users’ attentional foci determined by SUSTAIN correlate with the users’ level of curiosity, identified by the SPEAR algorithm. Overall, the results of our study show that SUSTAIN can be used to efficiently model attention-interpretation dynamics of users and can help improve Collaborative Filtering for resource recommendations.


d’Aquin, Mathieu and Motta, Enrico (2016) The Epistemology of Intelligent Semantic Web Systems Synthesis Lectures on the Semantic Web: Theory and Technology, 6(1) pp. 1–88 [LINK]

The Semantic Web is a young discipline, even if only in comparison to other areas of computer science. Nonetheless, it already exhibits an interesting history and evolution. This book is a reflection on this evolution, aiming to take a snapshot of where we are at this specific point in time, and also showing what might be the focus of future research.

Allocca, C., Adamou, A., d’Aquin, M. and Motta, E. (2016) SPARQL Query Recommendations by Example Demo at Extended Semantic Web Conference, ESWC 2016 [LINK]

In this demo paper, a SPARQL Query Recommendation Tool (called SQUIRE) based on query reformulation is presented. Based on three steps, Generalization, Specialization and Evaluation, SQUIRE implements the logic of reformulating a SPARQL query that is satisfiable w.r.t a source RDF dataset, into others that are satisfiable w.r.t a target RDF dataset. In contrast with existing approaches, SQUIRE aims at recommending queries whose reformulations: i) reflect as much as possible the same intended meaning, structure, type of results and result size as the original query and ii) do not require to have a mapping between the two datasets. Based on a set of criteria to measure the similarity between the initial query and the recommended ones, SQUIRE demonstrates the feasibility of the underlying query reformulation process, ranks appropriately the recommended queries, and offers a valuable support for query recommendations over an unknown and unmapped target RDF dataset, not only assisting the user in learning the data model and content of an RDF dataset, but also supporting its use without requiring the user to have intrinsic knowledge of the data.


Mouromtsev, D. and d’Aquin, M. (eds.) (2016) Open Data for Education: Linked, Shared, and Reusable Data for Teaching and Learning Springer [LINK]

This volume comprises a collection of papers presented at an Open Data in Education Seminar and the LILE workshops during 2014-2015.

In the first part of the book, two chapters give different perspectives on the current use of linked and open data in education, including the use of technology and the topics that are being covered.

The second part of the book focuses on the specific, practical applications that are being put in place to exploit open and linked data in education today.

The goal of this book is to provide a snapshot of current activities, and to share and disseminate the growing collective experience on open and linked data in education. This volume brings together research results, studies, and practical endeavors from initiatives spread across several countries around the world. These initiatives are laying the foundations of open and linked data in the education movement and leading the way through innovative applications.


d’Aquin, M. (2016) On the Use of Linked Open Data in Education: Current and Future Practices Open Data for Education: Linked, Shared, and Reusable Data for Teaching and Learning, eds. Dmitry Mouromtsev and Mathieu d’Aquin [LINK]

Education has often been a keen adopter of new information and communication technologies. This is not surprising given that education is all about informing and communicating. Traditionally, educational institutions produce large volumes of data, much of which is publicly available, either because it is useful to communicate (e.g. the course catalogue) or because of external policies (e.g. reports to funding bodies). Considering the distribution and variety of providers (universities, schools, governments), topics (disciplines and types of educational data) and users (students, teachers, parents), education therefore represents a perfect use case for Linked Open Data. In this chapter, we look at the growing practices in using Linked Open Data in education, and how this trend is opening up opportunities for new services and new scenarios.


Taibi, D., Fulantelli, G., Dietze, S. and Fetahu, B. (2016) Educational Linked Data on the Web – Exploring and Analysing the Scope and Coverage Open Data for Education: Linked, Shared, and Reusable Data for Teaching and Learning, eds. Dmitry Mouromtsev and Mathieu d’Aquin [LINK]

Throughout the last few years, the scale and diversity of datasets published according to Linked Data (LD) principles has increased and also led to the emergence of a wide range of data of educational relevance. However, sufficient insights into the state, coverage and scope of available educational Linked Data still seem to be missing. In this work, we analyse the scope and coverage of educational linked data on the Web, identifying the most significant resource types and topics and apparent gaps. As part of our findings, results indicate a prevalent bias towards data in areas such as the life sciences as well as computing-related topics. In addition, we investigate the strong correlation of resource types and topics, where specific types have a tendency to be associated with particular types of categories, i.e. topics. Given this correlation, we argue that a dataset is best understood when considering its topics in the context of its specific resource types. Based on this finding, we also present a Web data exploration tool, which allows users to navigate through educational linked datasets by considering specific type and topic combinations.


Taibi, D. and Dietze, S. (2016) Towards embedded markup of Learning Resources on the Web: a quantitative analysis of the use of LRMI properties Linked Learning workshop, LILE 2016 [LINK]

Embedded markup of Web pages has emerged as a significant source of structured data on the Web. In this context, the LRMI initiative has provided a set of vocabulary terms, now part of the vocabulary, to enable the markup of resources of educational value. In this paper we present a preliminary analysis of the use of LRMI terms on the Web by assessing LRMI-based statements extracted from the Web Data Commons dataset.


Ben Ellefi, M., Bellahsene, Z., Dietze, S., Todorov, K. (2016) Beyond Established Knowledge Graphs- Recommending Web Datasets for Data Linking Web Engineering, pp.262-279, 16th International Conference on Web Engineering (ICWE2016) [LINK]

With the explosive growth of the Web of Data in terms of size and complexity, identifying suitable datasets to be linked has become a challenging problem for data publishers. To understand the nature of the content of specific datasets, we adopt the notion of dataset profiles, where datasets are characterized through a set of topic annotations. In this paper, we adopt a collaborative filtering-like recommendation approach, which exploits both existing dataset profiles, as well as traditional dataset connectivity measures, in order to link arbitrary, non-profiled datasets into a global dataset-topic-graph. Our experiments, applied to all available Linked Datasets in the Linked Open Data (LOD) cloud, show an average recall of up to 81%, which translates to an average reduction of the size of the original candidate dataset search space of up to 86%. An additional contribution of this work is the provision of benchmarks for dataset interlinking recommendation systems.


Ben Ellefi, M., Bellahsene, Z., Dietze, S., Todorov, K. (2016) Intension-based Dataset Recommendation for Data Linking 13th Extended Semantic Web Conference (ESWC2016) [LINK]

With the increasing quantity and diversity of publicly available web datasets, most notably Linked Open Data, recommending datasets which meet specific criteria has become an increasingly important, yet challenging problem. This task is of particular importance when addressing issues such as entity retrieval, semantic search and data linking. Here, we focus on that last issue. We introduce a dataset recommendation approach to identify linking candidates based on the presence of schema overlap between datasets. While an understanding of the nature of the content of specific datasets is a crucial prerequisite, we adopt the notion of dataset profiles, where a dataset is characterized through a set of schema concept labels that best describe it and can be potentially enriched by retrieving their textual descriptions. We identify schema overlap with the help of a semantico-frequential concept similarity measure and a ranking criterion based on the tf*idf cosine similarity. The experiments, conducted over all available linked datasets on the Linked Open Data cloud, show that our method achieves an average precision of up to 53% for a recall of 100%. As an additional contribution, our method returns the mappings between the schema concepts across datasets – a particularly useful input for the data linking step.
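The ranking criterion named in the abstract can be sketched as follows: each dataset profile is treated as a bag of schema concept labels, weighted by tf*idf, and candidate datasets are ordered by cosine similarity to the source profile. This is only an illustration of tf*idf cosine ranking under assumed inputs. It omits the semantico-frequential concept similarity measure, and the idf variant `log(N / df)`, the profile contents and all function names are assumptions made for this example.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(profiles: Dict[str, List[str]]) -> Dict[str, Dict[str, float]]:
    """Weight each concept label by term frequency times log(N / document frequency)."""
    n = len(profiles)
    df = Counter(label for labels in profiles.values() for label in set(labels))
    return {
        name: {
            label: count * math.log(n / df[label])
            for label, count in Counter(labels).items()
        }
        for name, labels in profiles.items()
    }


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either has zero norm."""
    dot = sum(weight * v.get(label, 0.0) for label, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(source: str, profiles: Dict[str, List[str]]) -> List[str]:
    """Order all other datasets by decreasing similarity to the source profile."""
    vectors = tfidf_vectors(profiles)
    others = [name for name in profiles if name != source]
    return sorted(others, key=lambda n: cosine(vectors[source], vectors[n]), reverse=True)


# Hypothetical dataset profiles made of schema concept labels.
profiles = {
    "source": ["Person", "Publication"],
    "bibliographic": ["Person", "Publication", "Event"],
    "biomedical": ["Gene", "Protein"],
}
candidates = rank_candidates("source", profiles)
```

In this toy input the bibliographic dataset shares two concept labels with the source and is ranked first, while the biomedical dataset shares none and gets a similarity of zero.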


Holtz, P. (2016) How Popper’s ‘Three Worlds Theory’ resembles Moscovici’s ‘Social Representations Theory’ but why Moscovici’s ‘Social Psychology of Science’ still differs from Popper’s ‘Critical Approach’ Papers on Social Representations, 25, 13.1-13.24 [LINK]

This paper is, to the best of my knowledge, the first to discuss similarities and differences between Karl Popper’s ‘three worlds theory’ and Serge Moscovici’s ‘theory of social representations’. Karl Popper maintained that to be subject to criticism, and hence to falsification attempts and subsequent improvement, scientific theories must first be formulated, disseminated, perceived, and understood by others. As a result, such a theory becomes a partially autonomous object of world 3, the “world of products of the human mind” in contrast to world 1, the “world of things”, and world 2, the “world of mental states” (Popper, 1978, p. 144). Popper’s three worlds theory resembles Moscovici’s social representations theory insofar as social representations / world 3 objects cannot be reduced to individual states of minds, are embedded in interactions between people and objects, and are always rooted in previous representations / knowledge. Hence, Popper – who was very skeptical of the usefulness of a ‘psychology of science’– did in fact employ elements of a ‘social’ social psychology of science in his later works. Moscovici himself in turn may have failed to notice that to Popper science does not take place within a separate ‘reified universe’ in his ‘Social Psychology of Science’ (1993). Although to Popper science aims at increasing objectivity and reification, it is still a part of the social world and the ‘consensual universe’.


Yu R., Gadiraju U., Zhu X., Fetahu B., Dietze S. (2016) Towards Entity Summarisation on Structured Web Markup ESWC 2016 [LINK]

Embedded markup based on Microdata, RDFa, and Microformats has become prevalent on the Web and constitutes an unprecedented source of data. However, RDF statements extracted from markup are fundamentally different to traditional RDF graphs: entity descriptions are flat, facts are highly redundant and granular, and co-references are very frequent yet explicit links are missing. Therefore, typical entity-centric tasks such as retrieval and summarisation cannot be tackled sufficiently with state-of-the-art methods. We present an entity summarisation approach that overcomes such issues through a combination of entity retrieval and summarisation techniques geared towards the specific challenges associated with embedded markup. We perform a preliminary evaluation on a subset of the Web Data Commons dataset and show improvements over existing entity retrieval baselines. In addition, an investigation into the coverage and complementarity of facts from the constructed entity summaries shows potential for aiding tasks such as knowledge base population.


Yu, R., Fetahu, B., Gadiraju, U., Dietze, S. (2016) A Survey on Challenges in Web Markup Data for Entity Retrieval poster & short paper at 15th International Semantic Web Conference (ISWC2016) [LINK]

Embedded markup based on Microdata, RDFa, and Microformats has become prevalent on the Web and constitutes an unprecedented data source. RDF statements from markup are highly redundant and frequently contain errors, and co-references are very frequent yet explicit links are missing. We present a preliminary analysis on the challenges associated with markup data in the context of entity retrieval. We analyze four main factors: (i) co-references, (ii) redundancy, (iii) inconsistencies, and (iv) accessibility of information in the case of URLs. We conclude with general guidelines on how to handle such challenges when dealing with embedded markup data.


A. Ceroni, U. Gadiraju, J. Matschke, S. Wingert, M. Fisichella. (2016) Where the Event Lies: Predicting Event Occurrence in Textual Documents Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’16) [LINK]

Gadiraju, U., Siehndel, P., Dietze, S. (2016) Estimating Domain Specificity for Effective Crowdsourcing of Link Prediction and Schema Mapping Proceedings of the 8th ACM Conference on Web Science, Pages 323-324 [LINK]

Crowdsourcing has been widely adopted in research and practice over the last decade. In this work, we first investigate the extent to which crowd workers can substitute expert-based judgments in the task of link prediction and schema mapping, which is the creation of explicit links between resources on the Semantic Web at the instance and schema level. This is important since human input is required to evaluate and improve automated approaches for these tasks. We present a novel method to assess the inherent specificity of the link prediction task, and the impact of task specificity on quality of the results. We propose a Wikipedia-based mechanism to estimate specificity and show the influence of concept familiarity in producing high quality link prediction. Our findings indicate that the effectiveness of crowdsourcing the task of link prediction can improve by estimating the specificity.


Ujwal Gadiraju, Gianluca Demartini, Djellel Eddine Difallah, and Michele Catasta. (2016) It’s getting crowded!: how to use crowdsourcing effectively for web science research Proceedings of the 8th ACM Conference on Web Science (WebSci ’16) [LINK]

Since the term crowdsourcing was coined in 2006 [1], we have witnessed a surge in the adoption of the crowdsourcing paradigm. Crowdsourcing solutions are highly sought-after to solve problems that require human intelligence at a large scale. In the last decade there have been numerous applications of crowdsourcing spanning several domains in both research and for practical benefits across disciplines (from sociology to computer science). In the realm of research practice, crowdsourcing has unmistakably broken the barriers of qualitative and quantitative studies by providing a means to scale-up previously constrained laboratory studies and controlled experiments. Today, one can easily build ground truths for evaluation and access potential participants around the clock with diverse demographics at will, all within an unprecedentedly short amount of time. This comes with a number of challenges related to lack of control on research subjects and with respect to data quality.

A core characteristic of Web Science over the last decade has been its interdisciplinary approach to understand the behavior of people on and off the Web, using a wide range of data sources. It is at this confluence that crowdsourcing provides an important opportunity to explore previously unfeasible experimental grounds.

In this tutorial, we will introduce the crowdsourcing paradigm in its entirety. We will discuss altruistic and reward-based crowdsourcing, eclipsing the needs of task requesters, as well as the behavior of crowd workers. The tutorial will focus on paid microtask crowdsourcing, and reflect on the challenges and opportunities that confront us. In an interactive demonstration session, we will run the audience through the entire lifecycle of creating and deploying microtasks on an established crowdsourcing platform, optimizing task settings in order to meet task needs, and aggregating results thereafter. We will present a selection of state-of-the-art methods to ensure high-quality results and inhibit malicious activity. The tutorial will be framed within the context of Web Science. The interdisciplinary nature of Web Science breeds a rich ground for crowdsourcing, and we aim to spread the virtues of this growing field.


Gadiraju, U., & Siehndel, P. (2016) Unlock the Stock: User Topic Modeling for Stock Market Analysis EDBT/ICDT Workshops 2016 [LINK]

The increasing use of Twitter as a medium for sharing news related to various topics facilitates methods for automatic news creation or event detection and prediction. However, these methods are hindered by users posting and propagating incorrect or irrelevant content. Choosing the right users is crucial in order to sample down the tweets to be analyzed, and preserve the quality of the predicted events or generated news. In this paper, we present an effective method for identifying expert users in defined areas related to the stock market. For each user we generate a model based on the content of their posts. The model represents the domains the user talks about, and allows a selection of users for various tasks. We show the effectiveness of the proposed approach by performing a series of experiments using large Twitter datasets related to Stock Market Companies.


Gadiraju, U. (2016) Make hay while the crowd shines: towards effective crowdsourcing on the web Ujwal Gadiraju, with Prateek Jain as coordinator. in SIGWEB Newsletter, 2016, 3. [LINK]

Ujwal Gadiraju is a third year PhD candidate at L3S Research Center, Leibniz Universität Hannover in Germany. He received his Master of Science degree in Computer Science at Delft University of Technology in the Netherlands. His main research interests encompass the realms of Human Computation and Crowdsourcing. He has published peer-reviewed papers in top-tier conferences in the realms of Information Retrieval, Social Computing, and Crowdsourcing. For more information see


Fetahu, B., Markert, K., Nejdl, W. & Anand, A. (2016) Finding News Citations for Wikipedia Proceedings of the 25th CIKM, ACM [LINK]

An important editing policy in Wikipedia is to provide citations for added statements in Wikipedia pages, where statements can be arbitrary pieces of text, ranging from a sentence to a paragraph. In many cases citations are either outdated or missing altogether.

In this work we address the problem of finding and updating news citations for statements in entity pages. We propose a two-stage supervised approach for this problem. In the first step, we construct a classifier to find out whether statements need a news citation or other kinds of citations (web, book, journal, etc.). In the second step, we develop a news citation algorithm for Wikipedia statements, which recommends appropriate citations from a given news collection. Apart from IR techniques that use the statement to query the news collection, we also formalize three properties of an appropriate citation, namely: (i) the citation should entail the Wikipedia statement, (ii) the statement should be central to the citation, and (iii) the citation should be from an authoritative source.

We perform an extensive evaluation of both steps, using 20 million articles from a real-world news collection. Our results are quite promising, and show that we can perform this task with high precision and at scale.

Hasani-Mavriqi, I., Geigl, F., Pujari, S.C., Lex, E., Helic, D. (2016) The influence of social status and network structure on consensus building in collaboration networks Social Network Analysis and Mining (SNAM) 6 (1), 80 [LINK]

In this paper, we analyze the influence of social status on opinion dynamics and consensus building in collaboration networks. To that end, we simulate the diffusion of opinions in empirical networks and take into account both the network structure and the individual differences of people reflected through their social status. For our simulations, we adapt a well-known Naming Game model and extend it with the Probabilistic Meeting Rule to account for the social status of individuals participating in a meeting. This mechanism is sufficiently flexible and allows us to model various society forms in collaboration networks, as well as the emergence or disappearance of social classes. In particular, we are interested in how these society forms facilitate opinion diffusion. Our experimental findings reveal that (i) opinion dynamics in collaboration networks are indeed affected by the individuals’ social status and (ii) this effect is intricate and non-obvious. Our results suggest that in most of the networks social status favors consensus building. However, relying on it too strongly can also slow down opinion diffusion, indicating that there is a specific setting in which social status benefits consensus building the most. On the other hand, in networks where status does not correlate with degree, or in networks with positive degree assortativity, consensus is always reached quickly regardless of status.

Stanisavljevic, D., Hasani-Mavriqi, I., Lex, E., Strohmaier, M., Helic, D. Semantic Stability in Wikipedia Studies in Computational Intelligence (SCI), Springer [LINK]

In this paper we assess the semantic stability of Wikipedia by investigating the dynamics of Wikipedia articles’ revisions over time. In a semantically stable system, articles are infrequently edited, whereas in unstable systems, article content changes more frequently. In other words, in a stable system, the Wikipedia community has reached consensus on the majority of articles. In our work, we measure semantic stability using the Rank Biased Overlap method. To that end, we preprocess Wikipedia dumps to obtain a sequence of plain-text article revisions, where each revision is represented as a TF-IDF vector. To measure the similarity between consecutive article revisions, we calculate Rank Biased Overlap on successive term vectors. We evaluate our approach on 10 Wikipedia language editions including the five largest language editions as well as five randomly selected small language editions. Our experimental results reveal that even in policy-driven collaboration networks such as Wikipedia, semantic stability can be achieved. However, there are differences in the velocity of the semantic stability process between small and large Wikipedia editions. Small editions exhibit faster and higher semantic stability than large ones. In particular, in large Wikipedia editions, a higher number of successive revisions is needed in order to reach a certain semantic stability level, whereas, in small Wikipedia editions, the number of needed successive revisions is much lower for the same level of semantic stability.
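
As a rough illustration of the measurement described in this abstract, the sketch below ranks the terms of two revisions by weight and computes a truncated (non-extrapolated) Rank Biased Overlap between the rankings. It is not the authors' implementation: the function names, the persistence parameter p = 0.9, and the toy term weights are all assumptions made for the example, and real TF-IDF weights would come from the preprocessed Wikipedia dumps.

```python
def rank_terms(weights):
    """Return terms sorted by descending weight (ties broken alphabetically)."""
    return [term for term, _ in sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))]


def rank_biased_overlap(ranking_a, ranking_b, p=0.9):
    """Truncated RBO: (1 - p) * sum over depths d of p^(d-1) * |A[:d] & B[:d]| / d.

    Assumes each ranking contains no duplicate items. The value p = 0.9 is an
    illustrative choice, not one taken from the paper.
    """
    depth = min(len(ranking_a), len(ranking_b))
    score = 0.0
    for d in range(1, depth + 1):
        # Agreement at depth d: share of items common to both top-d prefixes.
        overlap = len(set(ranking_a[:d]) & set(ranking_b[:d]))
        score += p ** (d - 1) * overlap / d
    return (1 - p) * score


# Hypothetical term weights for two consecutive revisions of one article.
old_revision = {"wiki": 0.5, "edit": 0.3, "stable": 0.2}
new_revision = {"wiki": 0.6, "stable": 0.3, "edit": 0.1}

similarity = rank_biased_overlap(rank_terms(old_revision), rank_terms(new_revision))
print(f"RBO between revisions: {similarity:.3f}")
```

Because the sum is truncated at the length of the shorter ranking, two identical rankings of length k score 1 - p^k rather than 1, so values are only comparable between rankings of equal depth.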

Kopeinik, S., Kowald, D., Lex, E. (2016) Which Algorithms Suit Which Learning Environments? A Comparative Study of Recommender Systems in TEL in Proceedings of the European Conference on Technology Enhanced Learning (EC-TEL 2016) [LINK]

In recent years, a number of recommendation algorithms have been proposed to help learners find suitable learning resources online. Next to user-centered evaluations, offline datasets have been used to investigate new recommendation algorithms or variations of collaborative filtering approaches. However, a more extensive study comparing a variety of recommendation strategies on multiple TEL datasets is missing. In this work, we contribute a data-driven study of recommendation strategies in TEL to shed light on their suitability for TEL datasets. To that end, we evaluate six state-of-the-art recommendation algorithms for tag and resource recommendations on six empirical datasets: a dataset from European Schoolnet's TravelWell, a dataset from the MACE portal, which features access to meta-data-enriched learning resources from the field of architecture, two datasets from the social bookmarking systems BibSonomy and CiteULike, a MOOC dataset from the KDD challenge 2015, and Aposdle, a small-scale workplace learning dataset. We highlight strengths and shortcomings of the discussed recommendation algorithms and their applicability to the TEL datasets. Our results demonstrate that the performance of the algorithms strongly depends on the properties and characteristics of the particular dataset. However, we also find a strong correlation between the average number of users per resource and the algorithm performance. A tag recommender evaluation experiment reveals that a hybrid combination of a cognitive-inspired and a popularity-based approach consistently performs best on all TEL datasets we utilized in our study.
