Social chatter analyser Sensemaking provides a one-stop interface through which all social-media-related tracking and analysis can be performed, as EIT Digital’s Jyrki Karasvirta explains.
Social media generates a massive amount of data. Facebook alone accounts for more than 2 billion monthly active users and Twitter is hitting above 330 million users, with an average of 500 million tweets produced per day.
This data is mostly unstructured and exhibits different information spread patterns, with substantial impact on social opinion, politics, markets, customer behaviour, and more. Some topics quickly catch on and spread to reach and 'infect' almost all users of a social network, whereas others die out soon after their inception.
It is very challenging to identify the causes of such phenomena, and social media synergies are mostly enigmatic. This has made it difficult for industries, campaign runners and social opinion influencers to map, predict, and make efficient use of social networks to reach their goals, such as maximising the returns of a marketing campaign, sharing critical information in a timely way and maximising its outreach, or containing the spread of fake news and rumours.
The ‘Sensemaking’ EIT Digital Innovation activity set out to bring order to social media and make sense of its synergies by understanding the underlying information spread and adoption patterns as well as their causalities. The outcome is a Sensemaking service which uses, adapts, and improves artificial intelligence technologies and deep learning algorithms to map and exploit the massive unstructured data created within social networks.
The service targets both industrial companies that use social media for marketing and brand awareness and governmental, political, social, and other organisations that use social media for campaigning, opinion sharing, and so on. It can also be used in emergency management, allowing the early detection of rumours and fake news that have the potential to go viral, so that they can be met with timely and proper containment strategies.
Sensemaking development was led by Sarunas Girdzijauskas, an associate professor at the Swedish KTH Royal Institute of Technology. He said: “Sensemaking analysis responds both to business and academic customer demand by providing a web service that rapidly answers requests regarding expected trending scenarios for given topics that are expressed as a set of input keywords. Its competitive advantage is the automatic and transparent topic extraction, allowing the linking of unstructured natural text to existing knowledge bases and to the underlying social graphs of web chatter. This enables the detection of causal social network communities and events. It also allows integration of various and different knowledge bases and the comparison of data-driven and knowledge-based analysis in the same interface.”
Sensemaking not only provides a tool for analysing and linking relevant data and for tracking how topics spread in social networks, but also goes further: it detects causalities and extracts the key reasons behind a given topic’s spread pattern. Moreover, it allows the spread patterns of newly started topics to be predicted, enabling early engagement in designing the right social media strategy for the user’s goal. These goals can be as opposite as maximising awareness and spread of a topic or adopting containment strategies to limit its spread.
Girdzijauskas continues: “The results are achieved by linking and combining key parameters from both the data shared over a social network as well as representations of the underlying social connections and the information spread pathways observed in the social network, as well as key metadata, such as the time when the information is shared and geographical locations.”
Combining education, research and business
Sensemaking started as a project aimed at using the expertise of three academic EIT Digital partners: Swedish KTH Royal Institute of Technology, SICS – the Swedish Institute of Computer Science, and the Italian University of Trento. The partners used their experience in artificial intelligence, deep learning, graph processing, text analysis and topic modelling to generate models that can explain the causalities behind topic diffusion and spread in social networks, known as ‘virality’, and that can be used to predict how viral a new topic is likely to become.
The development was kicked off by creating a framework for training a model on different signal parameters from the social network. As source data, the team used some 600,000 tweets from some 60,000 users related to Swedish political parties, shared between January and December 2016.
First, the unstructured text tweets were grouped into semantic topics. Next, these topics were mapped to the underlying social network of the connected users who participate in spreading or talking about any given topic.
Linking this information allows the detection of spread patterns and the extraction of causalities based on who talks about what at which time, and in which context, and based on the influence of users in the underlying social network.
This influence factor is taken to be reflected in the number of connections users have, their activity, and their capability to infect their connections with a given topic. The number of critical information sharing paths that pass through them is also important, as some people act as hubs that connect relatively disconnected and dispersed groups of users across the world.
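Two of these signals have well-known graph counterparts: the number of connections is a user's degree, and the number of sharing paths passing through a user is captured by betweenness centrality. The sketch below computes both on a small, made-up graph. It is an illustration under stated assumptions, not the Sensemaking implementation: the graph, the user names, and the choice of betweenness centrality (computed here with Brandes' algorithm) are assumptions, and the activity and infection-capability signals are left out.

```python
from collections import deque

# Hypothetical undirected social graph: two tight groups joined only via "hub".
graph = {
    "a": ["b", "c", "hub"],
    "b": ["a", "c"],
    "c": ["a", "b"],
    "hub": ["a", "d"],
    "d": ["hub", "e", "f"],
    "e": ["d", "f"],
    "f": ["d", "e"],
}

def degree(graph):
    """Number of direct connections per user."""
    return {node: len(neighbours) for node, neighbours in graph.items()}

def betweenness(graph):
    """Unnormalised betweenness (Brandes' algorithm, unweighted, undirected)."""
    score = {node: 0.0 for node in graph}
    for source in graph:
        order = []                                # nodes in BFS visiting order
        preds = {node: [] for node in graph}      # predecessors on shortest paths
        paths = {node: 0 for node in graph}       # number of shortest paths
        dist = {node: -1 for node in graph}
        paths[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        # Accumulate path dependencies from the farthest nodes back to the source.
        delta = {node: 0.0 for node in graph}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += paths[v] / paths[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Every undirected pair was visited from both ends, so halve the totals.
    return {node: value / 2 for node, value in score.items()}

degrees = degree(graph)
central = betweenness(graph)
```

In this toy graph "hub" has only two connections, fewer than "a" or "d", yet it lies on every shortest path between the two groups, so it receives the highest betweenness score. That is the kind of user a pure connection count would overlook.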
In order to generate an analysis service that answers real market needs and that creates true business value, the research institutes entered into close collaboration with two Swedish SMEs, Gavagai and UnitedMinds, both working with the analysis and mining of information from different social media and other web platforms. They provided use cases of potential business value derived from real needs of the market.
Virality predicts success
A viral topic is one that can spread across the social boundaries of tightly connected users in a social network within a very short time. There are already a number of research results that address the question of the virality of a new tweet. They focus mainly on predicting the virality of a new topic or tweet in social networks based on time and the number of early adopters.
Leila Bahri, a postdoctoral researcher at KTH and member of the Sensemaking development team, explains:
“The core question is that when two sources launch a social media posting on a similar topic, what would make people from different social and geographical boundaries adopt, share, or react to one of these sources and not to the other. Based on the answer, we can perform more accurate and more timely prediction of a topic’s virality.”
The novelty of Sensemaking is that it incorporates and links richer layers of information, considering key parameters extracted from the underlying social network and the interaction patterns observed between users. The tool detects causalities of topic and tweet spread patterns by exploring the underlying social network, the positioning of well-connected users, and the influence factors of different users based on the history of their topics of interest and on their activation capabilities.
The outcome is a new, enriched model for causality detection. It allows enhanced predictions and indicates which strategies to adopt, depending on the desired outcome, either to maximise the spread of a new topic or to respond better to its predicted virality.
Spreading like a disease
Diffusion events, also known as cascades, are common phenomena in the natural and cyber world. Such phenomena are usually triggered by a few sources, normally called seeds or opinion leaders, which start to spread a contagion through a network of interacting individuals. Common examples are the diffusion of a virus in a population of individuals, word-of-mouth effects in marketing, the spread of a meme, or the diffusion of fake news in online social networks.
We may observe when a particular individual has been infected by a contagion, but the exact cause or source of the contagion typically remains unknown. Uncovering the diffusion network is thus of great interest for social scientists, epidemiologists, and biologists.
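To make the observation problem concrete, the sketch below simulates a cascade with the independent cascade model, a standard textbook diffusion model that is used here as an assumption and is not named in the article. Each newly infected user gets one chance to infect each follower with a fixed probability. The follower graph, the user names and the probability value are all hypothetical.

```python
import random

# Hypothetical follower graph: an edge u -> v means v can be infected by u.
follows = {
    "seed": ["u1", "u2"],
    "u1": ["u3"],
    "u2": ["u3"],
    "u3": [],
}

def independent_cascade(graph, seeds, probability, rng):
    """Return {user: step at which the user was infected}."""
    infected_at = {seed: 0 for seed in seeds}
    frontier = list(seeds)
    step = 0
    while frontier:
        step += 1
        next_frontier = []
        for user in frontier:
            for follower in graph.get(user, []):
                # Each newly infected user gets one attempt per follower.
                if follower not in infected_at and rng.random() < probability:
                    infected_at[follower] = step
                    next_frontier.append(follower)
        frontier = next_frontier
    return infected_at

rng = random.Random(0)  # fixed seed so the run is repeatable
observed = independent_cascade(follows, ["seed"], probability=0.5, rng=rng)
```

The function returns only infection times, which is all an observer of a real cascade would see. In this graph "u3" can be reached through either "u1" or "u2", and the output does not record which of them passed the contagion on. Recovering that hidden structure is the diffusion network problem.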
The Sensemaking team have discovered that a delay-aware approach performs well because it uses the delays between infection events, directly embedding users according to the delay patterns they exhibit. It does not stop at how frequently users appear together, but also examines how closely in time a specific user interacts with the rest of the users.
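The difference between frequency and delay can be shown with a small sketch. It assumes each cascade is recorded as a list of (user, time) events with each user appearing once, and it collects, for every ordered pair of users, how often one followed the other and with what average delay. The function name, the data layout and the example values are hypothetical, and the embedding step the team describes is not reproduced here.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical cascades: each is a list of (user, infection time in hours).
cascades = [
    [("ann", 0), ("bob", 1), ("cat", 10)],
    [("ann", 0), ("bob", 2), ("cat", 20)],
]

def pairwise_delays(cascades):
    """Return {(earlier_user, later_user): (co-occurrence count, mean delay)}."""
    counts = defaultdict(int)
    total_delay = defaultdict(float)
    for cascade in cascades:
        ordered = sorted(cascade, key=lambda event: event[1])
        # combinations() keeps list order, so the first event is always earlier.
        for (early_user, early_time), (late_user, late_time) in combinations(ordered, 2):
            pair = (early_user, late_user)
            counts[pair] += 1
            total_delay[pair] += late_time - early_time
    return {pair: (counts[pair], total_delay[pair] / counts[pair]) for pair in counts}

stats = pairwise_delays(cascades)
```

Here "bob" and "cat" both follow "ann" in two cascades, so a frequency count cannot tell them apart. The mean delays differ by a factor of ten, which suggests "bob" reacts to "ann" far more directly. A delay-aware method can use that signal where a frequency-only method cannot.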
Detecting communities with decentralised machine learning
Community detection is a key focus area of Sensemaking analysis. It is also an important problem for network analysis in a number of fields, including physics, computer science, social science, and biology.
Community detection is the task of grouping nodes into clusters with better internal connectivity than external connectivity. Being able to accurately detect meaningful communities in the social graph is of great importance to Sensemaking, as it is used to improve virality prediction results. The key is to differentiate between topics that spread within single communities and those that spread across social and topic communities in the network.
Amira Soliman, a senior PhD student at KTH and member of the Sensemaking development team, said: “Sensemaking uses machine learning and an appropriate decentralised community detection algorithm to identify key communities in a graph. By mapping detected social communities to topical ones, it is possible to identify causalities based on a topic’s virality and metrics related to the engaged users. In other words, by extracting graph-based metrics and carrying out graph-based analytical tasks we can detect the relationship between topics’ spread patterns and the underlying social network.”
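The phrase 'better internal connectivity than external connectivity' can be made measurable. The sketch below counts the edges inside a candidate group and the edges leaving it, and computes conductance, a common quality score in which lower values indicate a better-separated community. Conductance is chosen here as an illustration, and the article does not say which measure Sensemaking uses. The graph and node names are made up.

```python
# Hypothetical undirected graph: two triangles joined by the single edge c-d.
graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c", "e", "f"],
    "e": ["d", "f"],
    "f": ["d", "e"],
}

def edge_counts(graph, community):
    """Return (edges inside the community, edges leaving the community)."""
    members = set(community)
    internal = 0
    external = 0
    for node in members:
        for neighbour in graph[node]:
            if neighbour in members:
                internal += 1   # seen once from each endpoint
            else:
                external += 1
    return internal // 2, external

def conductance(graph, community):
    """Cut edges divided by the smaller of the two sides' total degree."""
    members = set(community)
    internal, external = edge_counts(graph, members)
    volume = 2 * internal + external
    rest_volume = sum(len(graph[node]) for node in graph if node not in members)
    smaller = min(volume, rest_volume)
    return external / smaller if smaller else 0.0

good = conductance(graph, {"a", "b", "c"})   # one triangle
poor = conductance(graph, {"c", "d"})        # the two bridge endpoints
```

The triangle has three internal edges and a single edge leaving it, so it scores far lower (better) than the pair "c" and "d", which shares one internal edge and has four edges leaving.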
A random walk in a social community
Most existing community detection algorithms fall into the category of global approaches. These global approaches adapt their detection model focusing on the global structure of the whole network, instead of addressing the approximation at the community level. Thus, global techniques tune their parameters to a ‘one size fits all’ model. As such, they are relatively successful with extracting communities in homogeneous cases but suffer in heterogeneous communities.
Globally-based detection algorithms usually run with high computational cost. Therefore, a technique based on what is known as a random walk has been widely adopted to extract disjoint communities, as it is among the approaches with the lowest computational overhead.
This method requires some known members as the prior for the semi-supervised clustering in order to perform probability diffusion from them. The intuition behind random walks and diffusion is that once a random walker enters a region, it tends to stay there for a prolonged period, and movements between regions are relatively rare because they must pass through one of the few outgoing edges. Thus, this phenomenon can be used for community detection by capturing the boundaries of well-connected regions.
As people usually connect and interact in local communities, and the spread patterns differ based on whether a topic remains contained within a local community or succeeds at spreading beyond it to reach other connected communities, accurate community detection on top of the social graph is a must-have.
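The intuition can be sketched with a random walk with restart, also known as personalised PageRank: probability mass starts on the known members, spreads along edges, and is partly returned to the seeds at every step. This is a generic illustration of seed-based probability diffusion, not the algorithm developed for Sensemaking. The graph, the restart value and the iteration count are assumptions, and every node is assumed to have at least one neighbour.

```python
# Hypothetical undirected graph: two triangles joined by the single edge c-d.
graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c", "e", "f"],
    "e": ["d", "f"],
    "f": ["d", "e"],
}

def random_walk_with_restart(graph, seeds, restart=0.15, iterations=50):
    """Approximate the walker's long-run visiting probabilities from the seeds."""
    seeds = set(seeds)
    restart_mass = {node: (1.0 / len(seeds) if node in seeds else 0.0) for node in graph}
    prob = dict(restart_mass)
    for _ in range(iterations):
        # With probability `restart` the walker jumps back to a seed ...
        nxt = {node: restart * restart_mass[node] for node in graph}
        # ... otherwise it moves to a uniformly chosen neighbour.
        for node, neighbours in graph.items():
            share = (1.0 - restart) * prob[node] / len(neighbours)
            for neighbour in neighbours:
                nxt[neighbour] += share
        prob = nxt
    return prob

scores = random_walk_with_restart(graph, seeds=["a"])
ranked = sorted(scores, key=scores.get, reverse=True)
```

Starting from the single known member "a", the three highest-scoring nodes are the members of a's own triangle, and the scores drop noticeably across the bridge edge. That drop is what a detection method can use to place a community boundary.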
As communities are expected to overlap, an enhanced overlapping community detection algorithm has also been developed and is used in Sensemaking. Communities can be mapped to the topics they are observed to highly interact with and which they spread. The users can also be grouped together based on topical communities. Causalities can be found by overlapping the social and topical layers.
The development team studied community visualisation by collecting a dataset of some 16,000 hashtags and the 24,000 users using these hashtags. The visualisation of users and hashtags depicts the semantic relationships between these entities, in terms of who talks about which topic. The results showed that if two users or hashtags are close in the representation space, they appear together in the same context.
At the first stage, the Sensemaking service will focus on Sweden and English-speaking countries via Gavagai’s ten business customers. In the second stage, the intention is to achieve a broader reach by extending the service to other languages.
Girdzijauskas concludes: “Thanks to our business liaison, Sensemaking emerged not only as an academic tool that provides causality and prediction analysis for social networks, but also one that answers business cases that emerge from real needs of potential end-customers. For them, it is important to have monitoring and measurement tools to extract the positioning of given topics in social media, connecting different analytical metrics, and providing predictions based on different scenarios.”
This article will appear in SciTech Europa Quarterly issue 26, which will be published in March 2018.