Detecting malicious behaviour in participatory sensing settings

Security is crucial in modern computer systems that host private and sensitive information. Our systems are vulnerable to a number of malicious threats such as ransomware, malware and viruses. Recently, a global ransomware attack affected hundreds of organisations, most notably the UK’s NHS. This malicious software “locked” the content stored on organisations’ hard drives and demanded money (to be paid in bitcoins) to “unlock” it and make it available to its owners again. Crowdsourcing (the practice of obtaining information by allocating tasks to a large number of people, e.g. Wikipedia) is not immune to malicious behaviour. On the contrary, the very openness of such systems makes them ideal targets for malicious users who want to alter, corrupt or falsify information (data poisoning).

In this post, we present an environmental monitoring example in which ordinary people take air quality readings (using mobile equipment) to monitor air pollution in their city or neighbourhood (see our previous post for more details on this example). Arguably, some people participating in such environmental campaigns can be malicious. Instead of taking readings to provide information about their environment, they might deviate and pursue their own hidden agenda. For instance, a factory owner might alter readings that show their factory polluting the environment. The impact of such falsification is substantial: it distorts the overall picture of the environment, which in turn can lead authorities to take the wrong actions on urban planning.

We argue that Artificial Intelligence (AI) techniques can be of great help in this domain. Given that measurements are spatio-temporally correlated, a non-linear regression model can be overlaid on the environment (see previous post). The tricky part, however, is to differentiate between truthful and malicious readings. A plausible solution is to extend the non-linear regression model by assuming that each measurement has its own independent noise variance (heteroskedasticity). For instance, a Gaussian Process (GP) model can be used initially and then extended to a Heteroskedastic GP (HGP). The individual noise then indicates how far each measurement deviates from the truthful ones, a deviation that can be attributed either to sensor noise (which is always present in reality) or to malicious readings. An extended version of the HGP, namely the Trust-HGP (THGP), adds a trust parameter to the model that captures the probability of each measurement being malicious, taking values in the interval (0, 1). The details of the THGP model, as well as how it is used in this domain, will be presented at the end of October at the fifth AAAI Conference on Human Computation and Crowdsourcing (HCOMP 2017). Stay tuned!
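
To make the idea more concrete, here is a minimal sketch of GP regression with a per-measurement noise variance, where a low trust value simply inflates the effective noise of a suspected reading. This is only an illustration of the general mechanism, not the THGP model from the paper; the function names, kernel hyperparameters and the trust-to-noise mapping are our own assumptions for the example.

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel over spatial coordinates."""
    d2 = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2 * X1 @ X2.T
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def hgp_posterior(X, y, X_star, noise_var, trust=None):
    """GP posterior with an individual noise variance per reading (heteroskedastic).
    `trust` in (0, 1) down-weights suspected malicious readings by inflating their
    effective noise -- an illustrative stand-in, not the THGP trust parameter."""
    noise = np.asarray(noise_var, dtype=float)
    if trust is not None:
        noise = noise / np.asarray(trust)      # low trust -> large effective noise
    K = rbf_kernel(X, X) + np.diag(noise)      # per-reading noise on the diagonal
    K_s = rbf_kernel(X, X_star)
    K_ss = rbf_kernel(X_star, X_star)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = K_s.T @ alpha                       # posterior mean at the test locations
    v = np.linalg.solve(L, K_s)
    var = np.diag(K_ss - v.T @ v)              # posterior variance at the test locations
    return mean, var

# Toy example: five truthful readings and one falsified reading near a "factory".
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [2.5]])
y = np.array([1.0, 1.2, 1.1, 0.9, 1.0, 5.0])            # the last reading looks suspicious
noise_var = np.full(6, 0.05)                             # baseline sensor noise
trust = np.array([0.9, 0.9, 0.9, 0.9, 0.9, 0.05])        # low trust for the suspect reading
mean, var = hgp_posterior(X, y, np.linspace(0, 4, 9)[:, None], noise_var, trust)
```

With the low trust value, the falsified reading is heavily down-weighted, so the estimated pollution surface stays close to the truthful readings rather than being dragged towards the poisoned one.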

Crowdsourcing categories

In this post I attempt to describe different types of crowdsourcing. This post will be continuously updated with examples, descriptions and potentially new categories.

Participatory sensing is about people carrying special equipment with them and taking measurements to monitor, for example, an environmental phenomenon.
Crowdsensing is about sharing data collected by sensing devices.
Crowdsourcing is an umbrella term that encapsulates a number of crowd-related activities. Wikipedia has the following definition: “Crowdsourcing is the process of obtaining needed services, ideas, or content by soliciting contributions from a large group of people, and especially from an online community, rather than from traditional employees or suppliers.”
Online crowdsourcing is about outsourcing online tasks to people. For example, people completing tasks for micropayments on Amazon Mechanical Turk.
Citizen science is about assisting scientists with tasks that are complex (for machines) and time-consuming (for scientists). For example, identifying fossils in hundreds of pictures of rocky environments, or classifying the galaxies of the universe. More interesting projects can be found at zooniverse.org.
Spatial crowdsourcing is about tasks that require participants to travel to specific locations. For instance, taking a photo of a plant that grows in a particular location requires participants to physically go there to complete the task.
Mobile crowdsourcing describes crowdsourcing activities that are processed on mobile devices such as smartphones and tablets.
Human computation is a type of collective intelligence and crowdsourcing where humans assist machines in performing tasks that are difficult for machines.
Opportunistic sensing is about collecting data without users’ active contribution. For example, taking a measurement automatically when the device is near a certain location.


If you want to add to the descriptions or disagree with something above, feel free to comment below.