Detecting malicious behaviour in participatory sensing settings

Security is crucial in modern computer systems hosting private and sensitive information. Our systems are vulnerable to a number of malicious threats such as ransomware, malware and viruses. Recently, a global ransomware cyberattack affected hundreds of organisations, most notably the UK’s NHS. This malicious software “locked” the content stored on organisations’ hard drives and demanded money (to be paid in bitcoins) to “unlock” it and make it available to its owners again.

Crowdsourcing (the practice of obtaining information by allocating tasks to a large number of people, e.g. Wikipedia) is not immune to malicious behaviour. On the contrary, the very openness of such systems makes them ideal for malicious users to alter, corrupt or falsify information (data poisoning). In this post, we present an environmental monitoring example, where ordinary people take air quality readings (using mobile equipment) to monitor the air pollution of their city or neighbourhood (see our previous post for more details on this example). Arguably, some people participating in such environmental campaigns can be malicious. Specifically, instead of taking readings to provide information about their environment, they might deviate to follow their own secret agenda. For instance, a factory owner might alter readings that show their factory pollutes the environment. The impact of such falsification is huge, as it distorts the overall picture of the environment, which in turn leads authorities to take the wrong actions regarding urban planning.

We argue that Artificial Intelligence (AI) techniques can be of great help in this domain. Given that measurements are spatio-temporally correlated, a non-linear regression model can be overlaid over the environment (see previous post). The tricky part, however, is to differentiate between truthful and malicious readings. A plausible solution is to extend the non-linear regression model by assuming that each measurement has its own noise level (variance), independent of the others (heteroskedasticity). For instance, one can start from a Gaussian Process (GP) model and extend it to a Heteroskedastic GP (HGP). As a result, the individual noise indicates how far each measurement deviates from the truthful measurements, which can be attributed either to sensor noise (which is always present in reality) or to malicious readings. An extended version of HGP, namely Trust-HGP (THGP), adds a trust parameter in the interval (0,1) for each measurement, capturing the possibility that the measurement is malicious. The details of the THGP model, as well as how it is utilised in this domain, will be presented at the end of October at the fifth AAAI Conference on Human Computation and Crowdsourcing (HCOMP 2017). Stay tuned!
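
To make the heteroskedastic idea concrete, here is a minimal sketch of GP regression in which every reading carries its own noise variance. It is not the THGP model (those details are not given in this post): the kernel choice, the made-up readings and the hand-picked noise values are all assumptions for illustration, and in a real model the per-reading noise would be learned rather than set by hand. It only uses the Python standard library so that it runs as-is.

```python
import math

def rbf(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel between two 1-D locations (assumed kernel)."""
    return signal_var * math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_spd(A, b):
    """Solve A x = b via Cholesky: forward then backward substitution."""
    L = cholesky(A)
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_mean(xs, ys, noise_vars, x_star):
    """Zero-mean GP posterior mean at x_star, one noise variance per reading."""
    n = len(xs)
    K = [[rbf(xs[i], xs[j]) + (noise_vars[i] if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve_spd(K, ys)
    return sum(rbf(x_star, xs[i]) * alpha[i] for i in range(n))

# Hypothetical readings along a street; the one at x = 2.0 looks falsified.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 1.1, 5.0, 0.9, 1.0]

# Same small noise everywhere: the model has to believe the suspicious reading.
homo = gp_mean(xs, ys, [0.01] * 5, 2.0)

# Large noise on the suspicious reading only: its influence almost vanishes.
hetero = gp_mean(xs, ys, [0.01, 0.01, 100.0, 0.01, 0.01], 2.0)

print(f"equal noise: {homo:.2f}, individual noise: {hetero:.2f}")
```

With equal noise the estimate at x = 2.0 follows the suspicious reading, whereas with a large individual noise it is driven by the neighbouring readings instead. That is the property a trust-based model can exploit: a reading whose inferred noise is large contributes little to the overall picture.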


Intelligent Express – Making your everyday coach more intelligent

If you live in the UK, you’ve probably travelled at least once with N.E. Well… If you haven’t, N.E. is a British multinational public transport company. I used to take their coaches to travel across the country. I still do. They run frequent journeys to hundreds of destinations across the UK at least. So, what is this post about? Well, as I said, I use their services mainly because of their prices. It is usually much cheaper than taking the train. However, there is one frustrating thing. Delays!! Waiting for the coaches seems never-ending. Other times you expect to reach your destination within a couple of hours and it takes four or more. It happened to me. I know, traffic. We can’t do anything about it. But, yes we can. We can at least know the schedule. We can know that the coach will actually take four hours to reach its destination and thus be prepared for the long journey.

The timetable is actually given along with the expected time to the destination…and, in my experience, it is usually wrong! So, here, I propose a simple solution that could be beneficial for both the customers and the company.

Machine learning is a fast-evolving branch lying between computer science and statistics and it could come in handy. We can train intelligent algorithms to find patterns in the schedule of coaches. Specifically, we can learn their departure and arrival times and provide better estimates of each journey’s duration. So, we can know in advance that the trip is going to take longer than expected or that it is going to depart late!

To the practical bit now. I believe that Gaussian Processes are ideal for this task. A periodic kernel could be used since we already know that the duration depends on the day and the time of the day. Departure and arrival times can be noted down by the drivers and added to the system. Thus, a history of journey times and durations can be created. Next, for any journey requested, an estimate of the duration and departure time can be provided, together with the uncertainty about that prediction (for example, as a confidence interval).
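
As a rough sketch of how this could look, the snippet below fits a GP with a periodic kernel to a small journey log and predicts the duration for a given departure hour, along with an interval around it. Everything specific here is an assumption: the log is made up, the 24-hour period only covers the time-of-day effect (a day-of-week effect would need a second periodic term), and the kernel parameters are picked by hand instead of being learned from data. It sticks to the Python standard library so it runs without extra packages.

```python
import math

PERIOD_HOURS = 24.0  # assumption: durations repeat with a daily cycle

def periodic_kernel(t1, t2, length_scale=0.5, signal_var=1.0):
    """Periodic kernel: departure times a full period apart are fully correlated."""
    s = math.sin(math.pi * abs(t1 - t2) / PERIOD_HOURS)
    return signal_var * math.exp(-2.0 * s * s / length_scale ** 2)

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_spd(A, b):
    """Solve A x = b via Cholesky: forward then backward substitution."""
    L = cholesky(A)
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def predict_duration(hours, durations, query_hour, noise_var=0.05):
    """Return (mean, variance) of the journey duration for a departure hour."""
    n = len(hours)
    mean_duration = sum(durations) / n
    centred = [d - mean_duration for d in durations]  # GP models the deviation
    K = [[periodic_kernel(hours[i], hours[j]) + (noise_var if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    k_star = [periodic_kernel(query_hour, h) for h in hours]
    alpha = solve_spd(K, centred)
    weights = solve_spd(K, k_star)
    mean = mean_duration + sum(k * a for k, a in zip(k_star, alpha))
    var = periodic_kernel(query_hour, query_hour) - sum(
        k * w for k, w in zip(k_star, weights))
    return mean, max(var, 0.0)  # clip tiny negative values from rounding

# Hypothetical log noted down by drivers: departure hour and duration in hours.
HOURS = [6.0, 8.0, 10.0, 13.0, 16.0, 18.0, 21.0]
DURATIONS = [2.1, 3.9, 2.6, 2.2, 3.0, 4.1, 2.3]

rush_mean, rush_var = predict_duration(HOURS, DURATIONS, 8.0)
midday_mean, midday_var = predict_duration(HOURS, DURATIONS, 13.0)

# Approximate 95% interval for the underlying duration at the 08:00 departure.
half_width = 1.96 * math.sqrt(rush_var)
print(f"08:00 departure: {rush_mean:.1f}h +/- {half_width:.1f}h")
print(f"13:00 departure: {midday_mean:.1f}h")
```

Because the kernel is periodic, a query for hour 32 (08:00 the next day) gives the same prediction as hour 8, which is exactly the kind of pattern we expect from rush-hour traffic. The variance is what lets the company tell customers not only how long the trip should take, but also how sure it is about that.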