Detecting malicious behaviour in participatory sensing settings

Security is crucial in modern computer systems hosting private and sensitive information. Our systems are vulnerable to a number of malicious threats such as ransomware, malware and viruses. Recently, a global cyberattack (ransomware) affected hundreds of organisations, most notably the UK’s NHS. This malicious software “locked” the content stored on organisations’ hard drives and demanded money (to be paid in bitcoins) to “unlock” it and make it available to its owners again. Crowdsourcing (the practice of obtaining information by allocating tasks to a large number of people, e.g. Wikipedia) is not immune to malicious behaviour. On the contrary, the very openness of such systems makes them ideal targets for malicious users who want to alter, corrupt or falsify information (data poisoning). In this post, we present an environmental monitoring example, where ordinary people take air quality readings (using mobile equipment) to monitor air pollution in their city or neighbourhood (see our previous post for more details on this example). Arguably, some people participating in such environmental campaigns can be malicious. Specifically, instead of taking readings to provide information about their environment, they might deviate and follow their own secret agenda. For instance, a factory owner might alter readings showing that their factory pollutes the environment. The impact of such falsification is huge, as it distorts the overall picture of the environment, which in turn leads authorities to take the wrong actions regarding urban planning.

We argue that Artificial Intelligence (AI) techniques can be of great help in this domain. Given that measurements have a spatio-temporal correlation, a non-linear regression model can be overlaid on the environment (see previous post). The tricky part, however, is to differentiate between truthful and malicious readings. A plausible solution is to extend the non-linear regression model by assuming that each measurement has its own independent noise (variance), a property known as heteroskedasticity. For instance, a Gaussian Process (GP) model can be used initially and then extended to a Heteroskedastic GP (HGP). The individual noise then indicates how much each measurement deviates from the truthful ones, a deviation that can be attributed either to sensor noise (which is always present in reality) or to malicious readings. An extended version of HGP, namely Trust-HGP (THGP), adds a trust parameter to the model that captures the likelihood of each measurement being malicious, taking values in the interval (0,1). The details of the THGP model, as well as how it is utilised in this domain, will be presented at the end of October at the Fifth AAAI Conference on Human Computation and Crowdsourcing (HCOMP 2017). Stay tuned!
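To make the intuition concrete, here is a minimal numpy sketch of a GP posterior with per-reading noise. It is not the THGP model from the paper: the way trust is mapped to noise here (inflating a reading's variance by 1/trust) is purely an illustrative assumption, but it shows how a low-trust reading barely moves the prediction.

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0, signal_var=1.0):
    """Squared-exponential covariance between two sets of 1-D inputs."""
    d = X1[:, None] - X2[None, :]
    return signal_var * np.exp(-0.5 * (d / lengthscale) ** 2)

def hgp_posterior(X, y, X_star, noise_var, trust):
    """GP posterior mean/variance with an individual noise term per reading.

    trust in (0, 1): readings with low trust get their noise variance
    inflated (here by 1/trust -- an illustrative choice, not the mapping
    used in the THGP paper), so they barely influence the posterior.
    """
    K = rbf_kernel(X, X)
    K_s = rbf_kernel(X, X_star)
    K_ss = rbf_kernel(X_star, X_star)
    effective_noise = noise_var / np.clip(trust, 1e-6, 1.0)  # per-point variances
    L = np.linalg.cholesky(K + np.diag(effective_noise))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(K_ss) - np.sum(v ** 2, axis=0)
    return mean, var

# Five readings; the last one is suspiciously low and gets a low trust value.
X = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([5.1, 5.3, 5.0, 5.2, 1.0])
trust = np.array([0.9, 0.9, 0.9, 0.9, 0.05])
mean, var = hgp_posterior(X, y, np.linspace(0, 4, 9), noise_var=0.1, trust=trust)
print(mean.round(2))
```

In this toy example, the suspicious fifth reading is effectively ignored: its inflated noise variance dominates the kernel term, so the posterior mean stays close to the values suggested by the trusted readings.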


How AI and humans can optimise air pollution monitoring

Air pollution is responsible for 7 million deaths per year, according to the World Health Organization (WHO). Thus, it is crucial to dedicate resources to learning about and monitoring air quality in cities, both to assist authorities in urban planning and to raise people's awareness of the impact of air pollution on their everyday lives. In our research, we provide a framework and algorithms, utilising the power of Machine Learning, to effectively monitor an environment over time.
In particular, our proposal relies on the willingness of people to participate in environmental air quality campaigns. People can use mobile air quality devices to take readings in their city or neighbourhood. The major issue, however, is when and where these readings should be taken in order to monitor the city efficiently. People cannot provide an unlimited number of measurements, and thus readings should be taken in a way that maximises the information gained about the environment. In other words, we need to solve an optimisation problem, constrained by the number of readings people can provide over a period of time, that facilitates efficient exploration of the environment.
In order to solve the problem, we need a model of the environment as well as a way to measure the information contained in each reading (since we are interested in gaining the most information from a limited number of readings). To do this, we overlay a spatio-temporal stochastic process, a Gaussian Process, over the area of interest. Gaussian Processes can be used to interpolate over the environment, i.e., to predict the air quality at unobserved locations as well as to predict the state of the environment into the future. Importantly, Gaussian Processes also provide a measure of uncertainty/information about each location in space and time (via their predictive variance).
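As a rough illustration of this step, the sketch below uses scikit-learn's GaussianProcessRegressor as a stand-in for the spatio-temporal model described above. The coordinates and pollutant values are made up, and a real campaign would also include time as an input dimension.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Hypothetical readings: (x, y) coordinates in km and a pollutant level.
X_obs = np.array([[0.0, 0.0], [1.0, 0.5], [2.0, 1.5], [0.5, 2.0]])
y_obs = np.array([35.0, 42.0, 55.0, 30.0])

# The RBF kernel captures spatial correlation; WhiteKernel models sensor noise.
kernel = 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=1.0)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X_obs, y_obs)

# Interpolate at unobserved locations and obtain the predictive uncertainty.
X_new = np.array([[1.5, 1.0], [3.0, 3.0]])
mean, std = gp.predict(X_new, return_std=True)
for loc, m, s in zip(X_new, mean, std):
    print(f"location {loc}: predicted {m:.1f} ± {s:.1f}")
```

The predictive standard deviation returned alongside each prediction is exactly the uncertainty measure the rest of the approach builds on.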
The problem then becomes one of choosing a set of measurements such that a utility function, built from the predictive variance provided by the Gaussian Process, is maximised. Going a step further, to solve this problem we use techniques and algorithms from the broad areas of Artificial Intelligence and Multi-agent Systems.
In particular, an intelligent agent can decide when and where measurements should be taken to maximise the information gained about air quality, while at the same time minimising the number of readings needed. The agent can employ greedy search techniques combined with meta-heuristics such as stochastic local search, unsupervised learning (clustering) and random simulations.
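A minimal sketch of the greedy part of this idea is shown below: repeatedly pick the candidate site where the current GP is most uncertain, pretend a reading was taken there, and repeat until the budget runs out. The grid, budget and kernel settings are illustrative assumptions, not the configuration used in our experiments.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Candidate measurement sites on a small grid (illustrative, in km).
grid = np.array([[x, y] for x in np.linspace(0, 4, 5) for y in np.linspace(0, 4, 5)])

# A few readings already taken.
X_obs = np.array([[0.0, 0.0], [4.0, 4.0]])
y_obs = np.array([30.0, 50.0])

kernel = 1.0 * RBF(length_scale=1.5)
budget = 3  # how many extra readings the participant can afford

for _ in range(budget):
    # Fix the kernel (optimizer=None) so hypothetical points do not distort it.
    gp = GaussianProcessRegressor(kernel=kernel, optimizer=None).fit(X_obs, y_obs)
    _, std = gp.predict(grid, return_std=True)
    best = grid[np.argmax(std)]  # the most uncertain candidate site
    print("next reading at", best, "std =", round(std.max(), 2))
    # Pretend the reading was taken; its value only affects the mean, so
    # plugging in the predicted mean is enough for variance-based planning.
    X_obs = np.vstack([X_obs, best])
    y_obs = np.append(y_obs, gp.predict(best.reshape(1, -1))[0])
```

In practice this greedy step is only one ingredient; it is combined with the meta-heuristics mentioned above to escape poor local choices.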
The main idea is to simulate the environment over time, asking "what if" questions: What if I take a measurement now and another one at night? What if I take a measurement downtown rather than near my home? These kinds of questions are answered by running simulations on a cluster computing facility.
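Because a GP's predictive variance depends only on where and when readings are taken, not on their values, such what-if questions can be scored without knowing the outcomes. The sketch below compares two hypothetical measurement plans in this way; the space-time grid and the plans themselves are invented purely for illustration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def remaining_uncertainty(plan, eval_points):
    """Total predictive std left over the city if the plan's readings were taken.

    The GP predictive variance depends only on the input locations/times, so
    dummy zero values are enough to score a plan.
    """
    gp = GaussianProcessRegressor(kernel=1.0 * RBF(length_scale=[1.0, 1.0, 4.0]),
                                  optimizer=None).fit(plan, np.zeros(len(plan)))
    _, std = gp.predict(eval_points, return_std=True)
    return std.sum()

# Evaluate the environment on a coarse space-time grid: (x_km, y_km, hour_of_day).
eval_points = np.array([[x, y, t] for x in range(3) for y in range(3) for t in (9, 21)])

plan_morning = np.array([[0, 0, 9], [1, 1, 9], [2, 2, 9]])    # all readings at 9am
plan_spread = np.array([[0, 0, 9], [1, 1, 15], [2, 2, 21]])   # spread across the day

for name, plan in [("morning only", plan_morning), ("spread out", plan_spread)]:
    print(name, "-> remaining uncertainty:", round(remaining_uncertainty(plan, eval_points), 1))
```

Running many such comparisons over candidate schedules is exactly the kind of workload that is farmed out to the cluster.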
Finally, our findings indicate a significant improvement over other approaches.