Crowdsourcing categories

In this post I attempt to describe different types of crowdsourcing. This post will be continuously updated with examples, descriptions and potentially new categories.

Participatory sensing is about people carrying special equipment with them and taking measurements to monitor, for example, an environmental phenomenon.
Crowdsensing is about sharing data collected by sensing devices.
Crowdsourcing is an umbrella term that encapsulates a number of crowd-related activities. Wikipedia has the following definition: Crowdsourcing is the process of obtaining needed services, ideas, or content by soliciting contributions from a large group of people, and especially from an online community, rather than from traditional employees or suppliers.
Online crowdsourcing is about outsourcing online tasks to people. For example, people doing tasks for micropayments on Amazon Mechanical Turk.
Citizen science is about assisting scientists with tasks that are complex for machines and time-consuming for the scientists. For example, identifying fossils in hundreds of pictures of rocky environments or classifying the galaxies of the universe. More interesting projects can be found at zooniverse.org.
Spatial crowdsourcing is about tasks that require participants to go to specific locations. For instance, taking a photo of a plant that grows in a specific location requires participants to physically go there to complete the task.
Mobile crowdsourcing describes crowdsourcing activities that are processed on mobile devices such as smartphones and tablets.
Human computation is a type of collective intelligence and crowdsourcing where humans assist machines in performing tasks that are difficult for machines.
Opportunistic sensing is about taking measurements without users' active contribution. For example, a measurement is taken automatically when the device is near a certain location.

If you want to add to the descriptions or disagree with something above, feel free to comment below.

Research Internship – Data science/Machine Learning

This post aims to describe my experiences from my three-month research internship at Toshiba Research Labs, Bristol, UK, and the project I worked on there (September – December 2015).

I remember the day I first went there for my interview. The building was between a wonderful small square park and a river, and it was just a five-minute walk from the city centre. But this was not the only thing I really liked. Working there, I realised the importance of the culture in a firm. I appreciated the importance of collaboration, brainstorming and creativity. It was an academia-like environment; friendly, down-to-earth people with lots of ideas and knowledge on a variety of subjects. Everyone was approachable and you could discuss anything with them. I could communicate with colleagues effectively without having to worry about business formalities.

The project I worked on was intriguing. That was the main reason I applied for this research internship in the first place. It combined my academic interest in Machine Learning and my personal interest in human wellbeing. In short, the project was about Mood Recognition at Work using Wearable Devices. In other words, understanding, learning and attempting to predict someone’s mood (happiness/sadness/anger/stress/boredom/tiredness/excitement/calmness) using just a wearable device (a smart wristband, a chest sensor or anything that is able to capture vital signs). Sounds impossible, right? How can you predict something as complicated as human emotions? We, as humans, often struggle to pin down our own mood. For example, how would you say you feel right now? Happy? Sad? OK? This is indicative of the complexity of the problem we were facing. However, we wanted to do unscripted experiments, meaning we did not want to induce any emotions in the participants of our study. We rather wanted them to wear a smart device and log their mood at 2-hour intervals while at work, as accurately as they could. Surprisingly, at least for me, there was variation in their responses: some higher, some lower, but all of them varied. That was encouraging.

We had to study the literature and do some research to answer the following question: how could we extract meaningful features from vital signs and accelerometer signals that would have predictive power in terms of emotions? After some digging around, we found the relevant literature. It was not a new concept. There were studies, both in the medical literature and in computer science, associating heart rate with stress and skin temperature with fatigue. We wanted to take this further and check whether a combination of all these signals could have more predictive power. Intuitively, think about the times you felt stressed. Your heart might pump faster, but sometimes your foot or hand might be shaking as well. The shaking could be captured by the accelerometer, and together the signals could be used as an additional indicator of a stressful situation.

We ended up with hundreds of features, and tested a number of basic machine learning techniques, such as Decision Trees and SVMs.
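
To make that pipeline concrete, here is a minimal sketch in Python with scikit-learn. The window length, feature set and placeholder data are illustrative assumptions, not the project's actual features or code:

```python
# A minimal sketch, assuming scikit-learn; hr_windows, labels and the feature
# set are illustrative placeholders, not the project's actual pipeline.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def window_features(window):
    """Simple statistical features over one window of heart-rate samples."""
    return [window.mean(), window.std(), window.min(), window.max(),
            np.percentile(window, 25), np.percentile(window, 75)]

# One 2-hour window of sensor data per mood self-report (placeholder data).
hr_windows = [np.random.normal(70, 5, 7200) for _ in range(40)]
labels = np.random.choice(["calm", "stressed"], size=40)

X = np.array([window_features(w) for w in hr_windows])
clf = SVC(kernel="rbf", C=1.0)
print(cross_val_score(clf, X, labels, cv=5).mean())
```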

Our results were good, comparable to those in the literature, so we decided to publish our findings in the PerCom 2016 conference proceedings (WristSense Workshop, http://ieeexplore.ieee.org/document/7457166/).

Furthermore, a number of ideas for patents were discussed and exciting new avenues for potential work were sketched out.

Overall, I would recommend an internship during a PhD programme as it is a very rewarding experience.

I would like to take this opportunity to thank all of the employees, managers and directors there for the unique experience and their confidence in me.

Intelligent Express – Making your everyday coach more intelligent

If you live in the UK, you’ve travelled at least once with N. E. Well… if you haven’t, N. E. is a British multinational public transport company. I used to take their coaches to travel across the country. I still do. They connect places across the country with frequent journeys to hundreds of destinations, at least within the UK. So, what is this post about? Well, as I said, I use their services mainly because of their prices. It is usually much cheaper than taking the train. However, there is one frustrating thing. Delays!! Waiting for the coaches seems never-ending. Other times you expect to reach your destination within a couple of hours and it takes four or more. It happened to me. I know, traffic. We can’t do anything about it. But, yes we can. We can at least know the schedule. We can know that the coach will actually take four hours to reach its destination and thus be prepared for the long journey.

The timetable is actually given along with the expected time to the destination… and, in my experience, it is usually wrong! So, here, I propose a simple solution that could be beneficial both for customers and for the company.

Machine learning is a fast-evolving field lying between computer science and statistics, and it could come in handy here. We can train intelligent algorithms to find patterns in the schedules of coaches. Specifically, we can learn their departure and arrival times and provide better estimates of each journey’s duration. So, we can know in advance that a trip is going to take longer than expected or that it is going to depart late!

Now to the practical bit. I believe that Gaussian Processes are ideal for this task. A periodic kernel could be used, since we already know that duration depends on the day and the time of day. Departure and arrival times can be noted down by the drivers and added to the system, building up a history of journey times and durations. Then, for any requested journey, an accurate estimate of the duration and departure time can be provided, together with the uncertainty (e.g., a confidence interval) around that prediction.
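
To give a flavour of what I mean, here is a minimal sketch using scikit-learn's GP implementation. The data, kernel settings and units are illustrative assumptions, not a worked-out model:

```python
# A minimal sketch, assuming scikit-learn: a GP with a periodic kernel over
# time, predicting journey duration with uncertainty. Data, kernel settings
# and units are placeholders.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ExpSineSquared, WhiteKernel

# X: departure time in hours since Monday 00:00; y: journey duration (minutes)
# as logged by the drivers. Synthetic history with a daily pattern:
X = np.random.uniform(0, 168, 120).reshape(-1, 1)
y = 150 + 40 * np.sin(2 * np.pi * X.ravel() / 24) + np.random.normal(0, 10, 120)

# periodicity=24 captures a daily cycle; a second ExpSineSquared term with
# periodicity=168 could capture the weekly pattern as well.
kernel = ExpSineSquared(length_scale=10.0, periodicity=24.0) + WhiteKernel(1.0)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

mean, std = gp.predict(np.array([[17.5]]), return_std=True)  # Monday, 17:30
print(f"expected duration {mean[0]:.0f} min, +/- {1.96 * std[0]:.0f} min (95% CI)")
```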

Inference vs Prediction: What do we mean, where are they used and how?

A lot of people seem to confuse the two terms in the machine learning and statistics domain. This post will try to clarify what we mean by each, where each one is useful and how they are applied. I personally understood the difference during a class called Intelligent Data and Probabilistic Inference (taught by Duncan Gillies) in my Master’s degree. Here, I will present a couple of examples in order to convey the difference intuitively.

Inference:

You observe the grass in your backyard. It is wet. You observe the sky. It is cloudy. You infer that it has rained. You then turn on the TV and watch the weather channel. It has been cloudy, but there has been no rain for a couple of days. You remember that the sprinkler was on a timer a few hours ago. You infer that this is the cause of the grass being wet.

(The creepy example) Imagine you are staring at an object in the evening, a bit far away in a corner. Getting closer… you observe that the object is staring back at you. You infer that it is an animal. You are brave enough and you keep getting closer. You can now see the eyes, the fur, the legs and other characteristics of the animal. You infer that it is a cat. A simple procedure for your brain, right? It feels trivial, and probably stupid to even discuss: of course you can recognize a cat. But in fact this is a form of inference. Say the cat has some features: eyes, fur, shape, etc. As you get closer to it, you assign different values to these variables. For example, initially the eyes variable was set to 0, as you couldn’t see them. As you move closer you become more certain of what you observe. Your brain takes these observations and converts them into a probability that the object is a cat. Say we have a catness variable that represents the probability of the object being a cat. Initially, this variable could be near zero. Catness increases as you move closer to the object. Inference takes place and updates your belief about the catness of the object. A similar example can be found here: http://www.doc.ic.ac.uk/~dfg/ProbabilisticInference/IDAPISlides01.pdf
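
For the programmatically inclined, here is a toy version of that belief update using Bayes' rule; the prior and the likelihoods are made up purely for illustration:

```python
# A toy Bayesian-update version of the "catness" story above. The prior and
# the likelihoods are invented numbers, for illustration only.
def update(prior, likelihood_if_cat, likelihood_if_not):
    """One step of Bayes' rule: P(cat | evidence)."""
    numerator = likelihood_if_cat * prior
    return numerator / (numerator + likelihood_if_not * (1 - prior))

catness = 0.05                       # far away: almost no belief it is a cat
catness = update(catness, 0.9, 0.2)  # eyes staring back
catness = update(catness, 0.8, 0.1)  # fur becomes visible
catness = update(catness, 0.9, 0.1)  # four legs, cat-like shape
print(round(catness, 3))             # belief rises as evidence accumulates
```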

Prediction:

You observe the sky. It is cloudy. You predict that it is going to rain. You hear on the news that the chances of rain, despite the clouds, are low. You revise your prediction: most probably it is not going to rain.

Given the fact that you own a cat, you predict that when you come home, you will find it running around.

Final Example:

Understanding the behaviour of humans, in terms of their daily routines or their daily mobility patterns, requires the inference of latent variables that control the dynamics of their behaviour. Knowing where people will be in the future is prediction. However, prediction cannot be made if we have not first inferred the relationships and dynamics of, say, human mobility.

Verdict:

Inference and prediction answer different questions. A prediction could be a simple guess or an informed guess based on evidence. Inference is about understanding the facts that are available to you; it is about utilising the information available in order to make sense of what is going on in the world. In one sense, prediction is about what is going to happen, while inference is about what happened. In the book “An Introduction to Statistical Learning” you can find a more detailed explanation. But the point is that, given some random variables (X1, X2, …, Xn), or features, or, for simplicity, facts, if you are interested in estimating something (Y), then this is prediction. If you want to understand how Y changes as the random variables change, then it is inference.

In a short sentence:  Inference is about understanding while prediction is about “guessing”.
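
To put the distinction in code: with the same fitted model, prediction asks for Y at a new X, while inference examines how Y responds to each variable. A minimal sketch with synthetic data:

```python
# The ISL-style distinction in miniature: one fitted model, two questions.
# The data here is synthetic, for illustration only.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                 # X1, X2, X3
y = 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)

model = LinearRegression().fit(X, y)

# Prediction: estimate Y for a new observation.
print(model.predict([[1.0, 0.0, 0.0]]))

# Inference: examine how Y changes as each variable changes.
print(model.coef_)  # roughly [2.0, -0.5, 0.0]: X3 has no real effect
```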

Submodularity in Sensor Placement Problems

Many problems in Artificial Intelligence, and in computer science in general, are hard to solve. What this means in practice is that it would take a computer hundreds, thousands or even millions of years of computation to solve them. Thus, many scientists create algorithms that approximately solve difficult problems, but in a sensible amount of time, i.e., seconds, minutes or hours.

One such problem is the sensor placement problem. The key question here is to find a number of locations at which to place sensors in order to achieve the best coverage of the area of interest. To solve this problem exactly, a computer has to evaluate all the possible ways of placing the available sensors in the different locations. To give some numbers: with 5 sensors and 100 possible locations, one has to try 75,287,520 combinations in order to find the best arrangement. Imagine what happens when the problem is about placing hundreds of sensors in a city with hundreds or thousands of candidate locations.
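
For the curious, that count is just "100 choose 5", which is easy to verify:

```python
# Sanity-checking the number above: ways to choose 5 locations out of 100.
import math
print(math.comb(100, 5))  # 75287520
```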

In such problems, submodularity comes in handy. It is an extremely important property used in many sensor placement problems. It describes the behaviour of certain set functions. In particular, the main idea is that adding an element to a small set yields a higher return/utility/value than adding the same element to a larger set (a diminishing-returns property). This can be better understood with an example. Imagine having 10,000 sensors scattered in a big room, taking measurements of the temperature every 2 hours. Now imagine adding another sensor to that room. Have we really gained much by doing so? We have a large set and we add something to it. Now imagine the same room with only 1 sensor. Adding 1 more can give us a better understanding of some corner, or a better estimate of the true average temperature of the room. So, this sensor was much more valuable than in the previous case. This is what I mean by saying that adding something to a smaller set has a higher utility.

It turns out that this property is very useful in maths and in computer science, and in AI in particular, as it allows us to build algorithms that have theoretical guarantees. It has been proved that a greedy algorithm achieves at least 63% (more precisely, 1 - 1/e) of the optimal value. This was initially proved by Nemhauser in a mathematical context and later applied by Krause et al. in the field of computer science, especially for the sensor placement problem. The image below illustrates this property in diagrams, to give a better feeling of what it is about.

Submodularity (taken from a Meliou et al. PowerPoint presentation)
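
To make the greedy idea concrete, here is a generic sketch: at each step, add the element with the largest marginal gain in utility. The coverage function below is just one illustrative monotone submodular utility, not a real sensor model:

```python
# A generic greedy sketch of the Nemhauser-style result: for a monotone
# submodular utility, picking the element with the largest marginal gain at
# each step achieves at least 1 - 1/e (~63%) of the optimum.
def greedy(candidates, k, utility):
    chosen = set()
    for _ in range(k):
        best = max(candidates - chosen,
                   key=lambda c: utility(chosen | {c}) - utility(chosen))
        chosen.add(best)
    return chosen

# Toy example: each location "covers" a set of areas; utility = areas covered.
coverage = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c"}, 4: {"d", "e"}}
utility = lambda S: len(set().union(*(coverage[s] for s in S))) if S else 0
print(greedy(set(coverage), 2, utility))  # a best pair, e.g. {1, 4} or {2, 4}
```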

Gaussian Process Summer School

Last September I had the opportunity to attend the Gaussian Process Summer School in Sheffield, UK. It is a twice-a-year event that lasts 3-4 days. First of all, I have to say that it was an awesome experience, even though I did not have much time to explore the city. Besides, it was raining heavily most of my time there. Well, we did have an excursion to a local brewery.

Anyway, the event was structured as full-day lectures, every day, given by experts in the field. And by experts I mean people like Rasmussen, who wrote the famous book on Gaussian Processes (GPs) that is cited in nearly every paper that includes those two words nowadays, and of course Neil Lawrence, who has a whole lab in Sheffield working on Gaussian Processes and organizes the School.

What I enjoyed the most, though, were the lab sessions scheduled between lectures. It was the perfect time to get our hands dirty. It was a chance to use GPy, a Python library developed in Sheffield that includes almost everything about GPs. I have to admit that GPy seems a much more powerful tool than GPML, which I currently use (a GP library for Matlab). The exercises were perfectly suited to playing around with the features of GPy, as well as discovering the potential of GPy and Gaussian Processes in general. In fact, the exercises were given as IPython notebooks. The IPython notebook is an interactive computational environment in which you can combine code execution, rich text and mathematics. Specifically, we were given snippets of code with some crucial parts missing, which we were supposed to fill in.
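
To give a taste of those lab sessions, a basic GP regression in GPy takes only a few lines; the data and hyperparameters below are placeholders:

```python
# The flavour of the GPy lab exercises: fit a GP to noisy 1-D data.
# Data and hyperparameters are placeholders.
import numpy as np
import GPy

X = np.random.uniform(0, 10, (30, 1))
Y = np.sin(X) + np.random.normal(0, 0.1, (30, 1))

kernel = GPy.kern.RBF(input_dim=1, variance=1.0, lengthscale=1.0)
model = GPy.models.GPRegression(X, Y, kernel)
model.optimize()                        # maximize the marginal likelihood
mean, var = model.predict(np.array([[5.0]]))
print(mean, var)                        # prediction with its uncertainty
```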

Another memory from the GP school was that of Joaquin Quiñonero Candela, who gave lectures at the summer school as well as at the University of Sheffield. Joaquin was previously a researcher at Microsoft and is now Director of Research in Applied Machine Learning at Facebook, where they apparently make heavy use of advanced machine learning techniques and push the field to its limits. Importantly, Joaquin has co-authored papers with Rasmussen on Gaussian Processes, and he seemed to me a brilliant guy.

That is pretty much my experience from this school. In another post, I will introduce GPs and explain, as intuitively as I can, their usefulness and applicability.

Artificial Intelligence to save the environment or destroy the world?

In a recent post I briefly described my experiences from the AAMAS conference in Turkey. What I haven’t talked about is the topic and details of my paper that was accepted there. This post aims to introduce my research and provide a summary of my recently published paper.

In 2014, the World Health Organization estimated that 7 million people had died from diseases associated with air pollution. These lives could potentially have been saved if measures had been taken in time. But can we really take measures when we do not know where and when pollution is high, and only know vaguely that air pollution is caused by traffic and industrial pollutants released into the atmosphere? What I mean is that a more collective effort is required to really understand air pollution in terms of its spatial as well as its temporal distribution. In fact, there are indeed static sensors scattered in cities all over the world. But are they enough? These are expensive sensors, placed in areas away from pollution sources in order to estimate the average pollution in an area. Is that what we want? Sort of. How about the kid who walks to school for 10 minutes every day next to a congested road? How about the cyclists who chose to cycle to be environmentally friendly and, importantly, healthier? But are they really healthier, cycling behind buses and cars? Well, I am sure that the air quality index displayed by the static sensor is nowhere near the reality for those who spend their time on busy roads.

Here is the alternative! Give people the power to measure their own pollution exposure! Well, this is already happening, and this is what participatory sensing is about. Citizens carrying sensors around are taking all sorts of measurements. Let’s think about that. Carrying a sensor around… we all do. Our smartphone, which at least 7 out of 10 people in the UK own (according to studies in 2013), is a sensor. In fact, it is multiple sensors embedded in a single handheld device. At the moment, phones are not able to measure air pollution, but we are getting there. I mean, phones can monitor your heart rate, and with each generation of phones a new sensor is added. Even if monitoring air pollution from your phone might be a few years away, there are mobile sensors that can easily be paired with phones via USB or Bluetooth.

However, people live their lives and follow their own daily agendas. They are not going to run around the city all day and night taking measurements in order to cover their city spatio-temporally. Even if they wanted to, their mobile phone’s battery would betray them. How long can it last while powering battery-draining sensors?

Enough of the introduction. My paper is focused on making these environmental campaigns, which expect citizens to take measurements, succeed. How? First of all, we assume that people incur a cost for taking a measurement. This could represent the inconvenience to the user of taking a measurement. It could also represent a small payment, if the environmental campaign has the resources for it. Or it could even represent the battery life of the user’s phone, which was just reduced by the activation of multiple sensors such as GPS (and Bluetooth, if the phone is paired with an air quality sensor).

Another factor that we consider is the mobility patterns of people. It is known (at least in the research community) that people are typically predictable in their daily routines. I do not know about you, but this is definitely true for me. Except sometimes. Sometimes I deviate. Or so I think. Anyway, there is a lot of literature on this topic and I will not get into details.

So, the big question is: where, when and who should take measurements in order to best monitor the environment over a period of time, given that each user incurs a different cost for taking a single measurement? Well, this is the question that my algorithm addresses. It is about mapping each participant to a location at some point in time, in such a way that the suggested measurement is as significant as possible to the effort of monitoring the environment. The good thing is that no one has to deviate from their route. Given that I always wake up and go to “work”, the algorithm could tell me to take a measurement at some point on my way. This is the point of using human mobility patterns in the first place: to exploit some available intel.
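
To be concrete, here is a toy sketch of the flavour of the problem, and emphatically not the algorithm in the paper: among the (user, location, time) options implied by people's predicted routes, repeatedly pick the measurement with the highest value per unit cost until a budget is exhausted. The value numbers are placeholders; in reality they would come from a spatio-temporal model of the environment:

```python
# A toy of the problem's flavour, NOT the published algorithm. `value` stands
# in for the information gained by a measurement (e.g. from a spatio-temporal
# model); `cost` is the per-user cost of taking it.
def assign(options, value, cost, budget):
    """options: (user, location, time) tuples along predicted routes."""
    chosen, spent = [], 0.0
    for opt in sorted(options, key=lambda o: value[o] / cost[o], reverse=True):
        if spent + cost[opt] <= budget:
            chosen.append(opt)
            spent += cost[opt]
    return chosen

options = [("alice", "high st", 9), ("alice", "park", 18), ("bob", "high st", 9)]
value = {options[0]: 3.0, options[1]: 1.0, options[2]: 2.5}   # placeholders
cost = {options[0]: 1.0, options[1]: 0.5, options[2]: 1.5}    # placeholders
print(assign(options, value, cost, budget=2.0))
```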

Well, what do you think? I think this is better than having people walking around like zombies taking measurements for your experiments, compensating them with 20 dollars each, in a project that will cease to exist the day the funding is over and that no one will actually use in practice after you have successfully published your paper. Don’t you think? Or, your phone could even deal with everything, provided that privacy concerns are met. For example, you could set it to take measurements where and when the algorithm decides, without you explicitly knowing. These are the kinds of ideas I am circling in my mind. It might not be that good yet, but this is the idea of my work: to make participatory sensing campaigns a thing. We are still a bit far from a real-world trial, but we are getting there. We need to get our facts right (in terms of assumptions about the problem) and make it as good as possible given the uncertain environment and uncertain human behaviour that AI will encounter.

So, will AI destroy the world? I don’t know. What I do know is that the same research done to save the world could easily be modified to destroy it. Imagine that 10 F-16s are deployed to bomb different terrorist bases. Or a number of unmanned aerial vehicles are sent out with pre-determined targets. Now, someone could ask: where and when should these planes or drones release their bombs, on the way to their targets or on their way back, in order to maximize the damage caused to the enemy, given that each bomb has a specific cost? Well, unfortunately, the solution is already given by my AI algorithm. Thankfully, however, the answer cannot be computed yet, as one important component is missing.

What does it mean when we say we collect information by taking a measurement? We imply that there is some sort of model of the environment that will give us a number or a value, something beyond the raw air pollution index. Fortunately, for environmental monitoring there is a lot of work on how to do this. We chose to use Gaussian Processes because of their power and flexibility, but most importantly because they give you the uncertainty over the locations of interest, both in space and time. More about them in another post! So, to destroy the world, you would need such a model.

AAMAS (Autonomous Agents and Multi-Agent Systems) 2015 Conference

I recently had the opportunity to attend one of the most well-known and prestigious conferences in the area of Artificial Intelligence, and Agents more specifically. The conference this year took place in the Congress Centre in Constantinople.

For me, it was the first conference I ever attended, and I have to admit it was a wonderful experience overall. It was also the first time that I gave a talk in front of so many people, experts in the field! I was a bit shaky and nervous, but everything went as planned.

My talk was allocated to the Applications session on Wednesday evening, the 6th of May 2015. I was at the conference centre early on to watch other talks. In particular, I attended the Bio-inspired Approaches session. In my opinion, the very first talk was the best one in this session, as the speaker took the trouble to explain the important bits and pieces in layman’s terms. Second on my list would be Firefly-Inspired Synchronization in Swarms, which I believe presented an important concept. Specifically, it was about the way fireflies synchronize their flashing without ever explicitly communicating with each other. As the speaker noted, it is kind of the same with women’s menstrual cycles.

Another talk that got my attention was HAC-ER, presented in my session (Applications). It was a big project, a joint paper among three universities (Southampton, Oxford, Nottingham), about enabling authorities and first responders to take better action after a major disaster such as an earthquake.

Besides attending friends’ talks, I had the opportunity to discuss for a while with someone working in an area related to mine during the poster session. Hopefully, a collaboration can come out of this.

All in all, I hope that I will get lots of opportunities like this in the future.

Participatory Sensing Applications

NoiseTube

NoiseTube is a project that tackles the noise pollution problem in many large cities in Europe. In particular, the deployment is focused on Brussels, Paris and London. It proposes a participative approach to monitoring noise pollution by involving the general public. Part of this project is the NoiseTube app, a smartphone application which turns smartphones into noise sensors, enabling citizens to measure their sound exposure in their everyday environment. Each participant is able to share their geolocalized measurement data in an attempt to create a collective map of noise pollution, which is made available to NoiseTube community members. The main motivation for participation is social interest: people contribute in order to understand their noise exposure, to build a collective map, to help local governments tackle noise pollution by understanding noise statistics, and to assist researchers by providing real data to analyse.

On the other hand, this project enables system designers to assess the potential of the participatory sensing approach in the context of environmental monitoring. In particular, a smartphone application, built on a widely adopted technology, can potentially reach thousands of people who could cover large cities, and thus provide a complete noise pollution map, accurate in terms of the noise exposure of individuals, to interested parties so that they can take action.

The authors argue that although noise pollution is a major problem in cities around the world, current noise pollution monitoring approaches fail to assess the actual exposure experienced by citizens. In particular, static sensors are located away from streets and emission sources in order to reflect the average pollution over an area. Consequently, they might underestimate people’s true exposure to noise pollution. Thus, participatory sensing provides a low-cost way for citizens to measure their personal exposure and contribute to the community by taking measurements at the sources of the noise. This approach seems to work well, achieving the same accuracy as standard noise mapping techniques but at a significantly lower cost, as neither expertise nor expensive sound level meter equipment is required.

GasMobile

GasMobile is a low-power and low-cost mobile sensing system for participatory air quality monitoring. Instead of relying on the expensive static measurement stations operated by official authorities for highly reliable and accurate measurements, GasMobile relies on the participatory sensing paradigm. In particular, GasMobile is a system built from the combination of a small, low-cost ozone sensor and an off-the-shelf smartphone. Besides taking ozone measurements to calculate air quality, the system can also exploit nearby static measurement stations to improve its calibration and, consequently, its accuracy. The system was used in a two-month campaign in an urban area. Specifically, it was attached to a single bicycle and took measurements during several rides around the city. The sampling interval was pre-set to five seconds, collecting a total of 2815 spatially distributed data points. The collected data were aggregated based on the area excerpt selected by a user interested in the results. To produce the map, this area was divided into rectangular regions of 35×35 pixels and the average ozone concentration of the observations in each region was taken. Each region was then classified into one of three categories, green, yellow or red, depending on its average concentration value.
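
A rough sketch of this kind of grid aggregation in Python (the 35×35 cell size is from the description above; the colour thresholds are my own illustrative assumptions):

```python
# A sketch of GasMobile-style aggregation: bin geolocated readings into a
# grid and colour each cell by its average. Thresholds are assumptions.
import numpy as np

def grid_average(xs, ys, values, cell=35):
    """Average `values` over cells of `cell` x `cell` (pixel) regions."""
    sums, counts = {}, {}
    for x, y, v in zip(xs, ys, values):
        key = (int(x // cell), int(y // cell))
        sums[key] = sums.get(key, 0.0) + v
        counts[key] = counts.get(key, 0) + 1
    return {k: sums[k] / counts[k] for k in sums}

def classify(ppb):  # illustrative thresholds, not the paper's
    return "green" if ppb < 40 else "yellow" if ppb < 60 else "red"

cells = grid_average(np.random.uniform(0, 350, 500),   # placeholder x (px)
                     np.random.uniform(0, 350, 500),   # placeholder y (px)
                     np.random.uniform(20, 80, 500))   # placeholder ozone (ppb)
print({k: classify(v) for k, v in list(cells.items())[:3]})
```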

The system is currently at the prototyping stage but has great potential, as it shows that air pollution monitoring can be achieved in a cost-effective manner. The results also show that participatory sensing can produce results of high accuracy: the mean error over the 2815 measurements was 2.74 ppb, which is only slightly higher than in a static setting.

Citisense

Another important participatory sensing application is Citisense, whose purpose is to monitor air pollution over large regions such as San Diego, California, US. Citisense consists of three components: a wearable pollution sensor, a mobile phone application and a web interface. Users carry the pollution sensor and the mobile phone with them throughout the day in order to learn their air pollution exposure. The web interface provides a more detailed reflection on air pollution exposure, as well as air pollution maps built from users’ historic air pollution data. The sensor is connected to the mobile phone via Bluetooth and is able to take measurements for five days on a single charge. The mobile phone app is responsible for collecting readings from the sensor and presenting them to the user. Each reading is time-stamped and geo-tagged using the mobile phone’s GPS and network-based localization services. Citisense was deployed in the field for one month, involving 16 participants. The results show that users’ exposure differs from the average measurements displayed by the static sensors scattered in cities. In particular, the participatory sensing approach is able to identify pollution hot spots in micro-environments that have developed due to busy roads, buildings and natural topology. Citisense also made an impact on people’s awareness. Specifically, participants understood the properties of air pollution better; in particular, they realized that air pollution spikes near busy streets or buses. However, as the authors admit, power management remains an important challenge.

ExposureSense

ExposureSense is a participatory sensing project that attempts to monitor air pollution in cities. It exploits the increasing number of sensors that smartphones tend to have, turning them into powerful mobile sensing devices. ExposureSense takes a different approach from other participatory sensing applications for air pollution: it attempts to correlate humans’ daily activities with air quality measurements in order to estimate users’ daily pollution exposure. To do so, the smartphone’s accelerometer is used to infer the activities of users, and an external mobile sensor is used for air quality monitoring. In particular, machine learning techniques are applied to accelerometer data to infer users’ daily activities. To gather data from mobile devices, smartphones are connected to air quality sensors via a USB cable. Data are also collected from external sensor networks, combined with the data collected from users, and interpolation is performed. The data are spatio-temporally correlated in order to estimate people’s daily pollutant exposure. Exposure intensity is scaled based on activity type, burned calories and movement speed.

HazeWatch

HazeWatch is another low-cost participatory sensing system for urban air pollution monitoring, in Sydney. HazeWatch uses several low-cost sensor units attached to vehicles to measure air pollution concentrations, and users’ mobile phones to tag and upload the data in real time. This project identifies the disadvantages of current approaches, i.e., using static sensors to monitor air pollution in cities. Typically, there are only a few static sensors scattered in a city, and air pollution is inferred with the use of mathematical models which require complex inputs, such as land topography, meteorological variables and chemical compositions. This can lead to potentially inaccurate inferences, as well as underestimation of the public’s true exposure to air pollution. HazeWatch aims to crowdsource fine-grained spatial measurements of air pollution in Sydney and to engage users in managing their pollution exposure via personalized tools. For example, HazeWatch, among other things, suggests low-pollution routes to users.

P-sense

P-sense is a work in progress that utilizes the concept of participatory sensing to monitor air pollution. The ultimate goal of this project is to allow government officials, international organizations, communities and individuals to access the pollution data to address their particular problems and needs. P-sense enables air pollution measurements at a finer granularity than what is currently achieved by static sensors in cities. It also enables users to assess their exposure to pollution according to the places they visit during their daily activities. P-sense is easily extensible to allow the integration of existing data acquisition systems that could enrich the air quality dataset. P-sense consists of four main components: the sensing devices, the first-level integrator, the data transport network, and the servers. The environmental data are collected by a number of sensors, such as gas, temperature, humidity, carbon monoxide, carbon dioxide and air quality sensors, integrated with mobile phones via Bluetooth. All environmental data acquired from those sensors are transmitted to a first-level integrator device, i.e., a mobile phone. The phone is capable of real-time analysis of the data, providing visual feedback to users. The first-level integrators transmit environmental data over the Internet (the data transport network) to a dedicated server, where they are stored and processed. Users are able to connect to the server and get visual feedback on the data. However, there are several important research challenges to address before this system is deployed in the real world. These are related to data validity, incentives, visualization, privacy and security. Moreover, as in other applications, the Bluetooth connection drains the mobile phone’s battery.

CommonSense

CommonSense is a participatory sensing project that aims to design a mobile air quality monitoring system. The authors conducted interviews with citizens, scientists and regulators in order to derive the principles and the framework for data collection and citizen participation. Unlike other applications, they break analysis into discrete mini-applications designed to scaffold and facilitate novice contributions. This approach allows community members impacted by poor air quality to engage in the process of locating pollution sources and exploring local variations in air quality.

Based on the fieldwork, a set of personas was developed to characterize the relevant stakeholders. Specifically, `Activists' are responsible for orchestrating actions and publicizing environmental issues; `Browsers' are interested in environmental quality but not directly involved in sensing; `Data collectors' are novice community members who are likely to be affected by air pollution. The main principles extracted from the interviews are: goal-oriented, i.e., what is the personal exposure of individuals and what are the hot spots in the city; local and relevant, i.e., participants are mostly interested in their neighbourhood and the areas they frequently visit; eliciting latent explanations and expectations, as well as prompting realizations, which are about taking into consideration people's local knowledge and expertise, such as beliefs about the sources of air pollution in their area; and language barriers, i.e., users could benefit from being introduced to the scientific language where possible.

This analysis led to the development of a framework divided into six phases: Collect, Annotate, Question/Observe, Predict/Infer, Validate, Synthesize. Collect is the phase where the actual sensing takes place. Annotate is the step after collecting data, where data collectors provide additional insights that contextualize and supplement it. Question/Observe is the step where data collectors begin to ask basic questions, such as what their personal exposure is, or whether air quality is bad at their home, based on their own and other collectors' data. Infer/Predict builds on these questions, and predictions are made for the unobserved locations. Validate is the stage where data collectors' data are compared against data from organizers and activists, and checked for sufficient coverage of the area of interest. Finally, Synthesize is the highest level, where data are integrated and documentation, reports and other deliverables are produced.

Besides relying on citizens to take measurements, CommonSense attempts to monitor air pollution by other means. In particular, in one study they ran trials with air quality monitoring devices attached to the rooftops of street-sweeping vehicles in the city of San Francisco. These devices are paired with mobile phones that send data to the CommonSense servers. This way, systematic coverage of a large city can be achieved, while also testing, refining and calibrating the system for future deployments.

OpenSense

OpenSense is a project that aims to monitor air pollution in large cities like Zurich, Switzerland. More than 25 million measurements were collected over more than a year from sensors attached to the tops of public transport vehicles. Based on these data, land-use regression models were built to create pollution maps with a spatial resolution of 100m × 100m. One of the challenges this approach aims to tackle is the lack of fine-grained spatio-temporal air quality data: static sensors are expensive to acquire and maintain, and thus only a few are placed in each city. The proposed system consists of 10 nodes installed on top of public transport vehicles that cover a large urban area on a regular schedule. The collected data are processed, and predictions about the unobserved locations are made using the regression models. Although this is a good approach for providing fine-grained spatio-temporal information about air pollution, nothing is said about the battery consumption of the sensors, which send measurements in real time over GSM and use GPS satellites to get their location. Also, measurements are only taken on roads with bus routes, and since the sensors are placed on top of the vehicles, they endure vibrations, heat, humidity and long operating times, which might lead to inaccurate measurements.

The Next Big One

The Next Big One is a participatory sensing application for the early detection of earthquakes. These events are difficult to model and characterize a priori. Thus, the application utilizes the accelerometer sensors available on smartphones to detect rare events such as earthquakes. The focus of this project is to harness the power of the crowd, i.e., the wide availability of accelerometer sensors, for early earthquake detection. In shake-table experiments, it was found that it is possible to distinguish seismic motion from accelerations due to normal daily use. However, for this application to be robust, thousands of phones must be utilized. It is estimated that a million phones would produce 30 terabytes of accelerometer data per day.
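
(As a rough sanity check of that figure: 30 terabytes across a million phones is about 30 MB per phone per day, i.e., roughly 350 bytes per second, which is plausible for a three-axis accelerometer sampled at a few tens of Hz.)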

TrafficSense

TrafficSense is a participatory sensing application for monitoring road and traffic conditions. In particular, this application relies on people carrying their smartphones with them while travelling, utilizing sensors such as the accelerometer, microphone, GSM radio and/or GPS to detect potholes, bumps, braking and honking. The effectiveness of the sensing functions was tested on the roads of Bangalore, and it was shown that it is possible to monitor the roads using a variety of sensors built into the smartphones that users carry with them. In particular, the accelerometer was used for braking detection and to distinguish pedestrians from users stuck in traffic; it was also used to detect spikes that would suggest bumps in the road. Audio was recorded using the phones’ microphones in order to detect noisy and chaotic traffic conditions. Finally, GPS and GSM cell triangulation were used to localize users’ positions.
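
As a hedged illustration of the bump-detection idea (not TrafficSense's actual detector), one could flag samples where vertical acceleration deviates sharply from a rolling local mean:

```python
# One simple way to flag candidate bumps from an accelerometer trace: mark
# samples that deviate from the local mean by several local std deviations.
# Window size and threshold are illustrative assumptions.
import numpy as np

def detect_spikes(z_accel, window=50, threshold=3.0):
    """Indices where |z - rolling mean| exceeds `threshold` rolling stds."""
    spikes = []
    for i in range(window, len(z_accel)):
        local = z_accel[i - window:i]
        mu, sigma = local.mean(), local.std()
        if sigma > 0 and abs(z_accel[i] - mu) > threshold * sigma:
            spikes.append(i)
    return spikes

z = np.random.normal(9.8, 0.2, 1000)   # placeholder vertical acceleration
z[400] += 5.0                          # injected "bump"
print(detect_spikes(z))                # should include index 400
```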