3 Most Asked Python Interview Questions for Data Scientists

Many blogs and websites offer questions and advice about data science interviews. However, most data science interviews also test candidates' software engineering skills. In this post we explore the 3 most common questions data scientists get asked about Python.

See the rest of the post here.


How AI and humans can optimise air pollution monitoring

Air pollution is responsible for 7 million deaths per year according to the World Health Organization (WHO). It is therefore crucial to dedicate resources to learning about and monitoring air quality in cities, both to assist authorities in urban planning and to raise people's awareness of the impact of air pollution on their everyday lives. In our research, we provide a framework and algorithms that utilise the power of Machine Learning to effectively monitor an environment over time.
In particular, our proposal relies on the willingness of people to participate in environmental air quality campaigns. People can use mobile air quality devices to take readings in their city or their neighbourhood. The major issue, however, is when and where these readings should be taken to monitor the city efficiently. People cannot provide an unlimited number of measurements, so readings should be taken in a way that maximises the information gained about the environment. In other words, we need to solve an optimisation problem, constrained by the number of readings people can provide over a period of time, to explore the environment efficiently.
To solve the problem, we need both a model of the environment and a way to measure the information carried by each reading (since we are interested in gaining the most information from a limited number of readings). To do that, we overlay a spatio-temporal stochastic process, a Gaussian Process, over the area of interest. Gaussian Processes can be used to interpolate over the environment, i.e., to predict the air quality value at unobserved locations, as well as to predict the state of the environment into the future. Importantly, Gaussian Processes also provide a measure of uncertainty/information about each location in space and time (through the predictive variance).
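As a rough illustration of how predictive variance behaves, here is a minimal sketch of Gaussian Process regression in plain Python. It is not the model used in the research: it assumes a one-dimensional input instead of space and time, a zero-mean prior, a squared-exponential kernel with fixed hyperparameters, and made-up readings. The names `rbf`, `solve` and `gp_predict` are hypothetical helpers introduced only for this example.

```python
import math

def rbf(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel between two scalar inputs (assumed kernel)."""
    return signal_var * math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def gp_predict(x_obs, y_obs, x_new, noise_var=1e-2):
    """Posterior mean and variance of a zero-mean GP at a single point x_new."""
    K = [[rbf(a, b) + (noise_var if i == j else 0.0)
          for j, b in enumerate(x_obs)]
         for i, a in enumerate(x_obs)]
    k_star = [rbf(a, x_new) for a in x_obs]
    alpha = solve(K, y_obs)   # K^{-1} y
    v = solve(K, k_star)      # K^{-1} k_star
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = rbf(x_new, x_new) - sum(k * w for k, w in zip(k_star, v))
    return mean, var

# Made-up readings at three locations along a street (hypothetical data).
x_obs = [0.0, 1.0, 4.0]
y_obs = [0.3, 0.5, -0.2]

mean_near, var_near = gp_predict(x_obs, y_obs, 0.5)  # between two readings
mean_far, var_far = gp_predict(x_obs, y_obs, 8.0)    # far from every reading
print(f"near readings: variance {var_near:.3f}")
print(f"far from readings: variance {var_far:.3f}")
```

The variance is small close to existing readings and approaches the prior variance far away from them, which is what makes it usable as a measure of where a new reading would be most informative.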
The problem then becomes one of choosing a set of measurements that maximises a utility function built on the predictive variance provided by Gaussian Processes. To solve this problem, we go a step further and use techniques and algorithms from the broad areas of Artificial Intelligence and Multi-agent systems.
In particular, an intelligent agent can decide when and where measurements should be taken to maximise the information gained about the air quality, while at the same time minimising the number of readings needed. The agent can employ greedy search techniques combined with meta-heuristics such as stochastic local search, unsupervised learning (clustering) and random simulations.
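The greedy part of this idea can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the algorithm from the research: the utility is taken to be the reduction in total predictive variance over a one-dimensional grid of candidate locations, the kernel and noise level are fixed, and the meta-heuristics mentioned above are left out. `greedy_select` is a hypothetical name. The sketch relies on the fact that the predictive variance of a Gaussian Process depends only on where readings are taken, not on their values, so readings can be planned before they are made.

```python
import math

def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel between two scalar inputs (assumed kernel)."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def greedy_select(candidates, budget, noise_var=0.01):
    """Greedily pick up to `budget` locations that most reduce total GP variance.

    Returns the chosen locations and the total variance left over the grid.
    """
    n = len(candidates)
    # Prior covariance between all candidate locations.
    cov = [[rbf(a, b) for b in candidates] for a in candidates]
    chosen = []
    for _ in range(min(budget, n)):
        # Total variance removed over the grid by a noisy reading at index j.
        best = max(
            (j for j in range(n) if j not in chosen),
            key=lambda j: sum(cov[i][j] ** 2 for i in range(n))
            / (cov[j][j] + noise_var),
        )
        chosen.append(best)
        # Condition the covariance on a reading at `best` (rank-one update).
        denom = cov[best][best] + noise_var
        col = [cov[i][best] for i in range(n)]
        cov = [[cov[i][k] - col[i] * col[k] / denom for k in range(n)]
               for i in range(n)]
    total_var = sum(cov[i][i] for i in range(n))
    return [candidates[j] for j in chosen], total_var

# Eleven hypothetical candidate locations and a budget of three readings.
candidates = [float(i) for i in range(11)]
picks, remaining = greedy_select(candidates, budget=3)
print("take readings at:", picks)
print("total variance left:", round(remaining, 3))
```

Each pick is the location whose reading removes the most uncertainty given the picks already made, so the budget is spent where it is most informative rather than on locations that are already well covered.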
The main idea is to simulate the environment over time, asking "what if" questions. What if I take a measurement now, and one at night? What if I take a measurement downtown, or near home? These kinds of questions are answered by running simulations on a cluster computing facility.
Finally, our findings indicate a significant improvement over other approaches.

Inference VS Prediction: What do we mean, where they are used and how?

A lot of people seem to confuse the two terms in the machine learning and statistics domain. This post will try to clarify what we mean by the two, where each one is useful and how they are applied. I personally understood the difference when I took a class called Intelligent Data and Probabilistic Inference (by Duncan Gillies) during my Master’s degree. Here, I will present a couple of examples to build an intuitive understanding of the difference.


Inference:

You observe the grass in your backyard. It is wet. You observe the sky. It is cloudy. You infer it has rained. You then turn on the TV and watch the weather channel. It has been cloudy, but there has been no rain for a couple of days. You remember that the sprinkler was set on a timer and ran a few hours ago. You infer that this is the cause of the grass being wet.

(The creepy example) Imagine you are staring at an object in the evening that is some distance away in a corner. Getting closer… you observe that the object is staring back at you. You infer that it is an animal. You are brave enough to get closer still. You can now see the eyes, the fur, the legs and other characteristics of the animal. You infer that it is a cat.

A simple procedure for your brain, right? It feels trivial to you and probably stupid to even discuss it. You can of course recognize a cat. But in fact this is a form of inference. Say the cat has some features like eyes, fur, shape etc. As you get closer to it, you assign different values to these variables. For example, initially the eyes variable was set to 0, as you couldn’t see them. As you move closer, you become more certain of what you observe. Your brain takes these observations and converts them into the probability that the object is a cat. Say we have a catness variable that represents the probability of the object being a cat. Initially, this variable could be near zero. Catness increases as you move closer to the object. Inference takes place and updates your belief about the catness of the object.

A similar example can be found here: http://www.doc.ic.ac.uk/~dfg/ProbabilisticInference/IDAPISlides01.pdf
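One way to make the catness update concrete is Bayes' rule. The sketch below is only an illustration: the prior and the likelihoods are made-up numbers, the features are assumed to be conditionally independent given whether the object is a cat, and `update_belief` is a hypothetical helper.

```python
def update_belief(prior, p_obs_if_cat, p_obs_if_not_cat):
    """Bayes' rule: P(cat | observation) from P(cat) and the two likelihoods."""
    numerator = p_obs_if_cat * prior
    evidence = numerator + p_obs_if_not_cat * (1.0 - prior)
    return numerator / evidence

# Initial belief: from far away the object is probably not a cat (made-up prior).
catness = 0.05

# (feature, P(feature seen | cat), P(feature seen | not a cat)), all made up.
observations = [
    ("eyes", 0.90, 0.30),
    ("fur", 0.80, 0.20),
    ("four legs", 0.95, 0.40),
]

history = [catness]
for feature, p_if_cat, p_if_not_cat in observations:
    catness = update_belief(catness, p_if_cat, p_if_not_cat)
    history.append(catness)
    print(f"after seeing {feature}: catness = {catness:.2f}")
```

Every observation that is more likely under "cat" than under "not a cat" pushes catness up, which mirrors the belief update described in the example.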


Prediction:

You observe the sky. It is cloudy. You predict that it is going to rain. You hear in the news that the chances of rain are low despite the clouds. You revise your prediction: most probably it is not going to rain.

Given the fact that you own a cat, you predict that when you come home, you will find it running around.

Final Example:

Understanding the behaviour of humans in terms of their daily routine, or their daily mobility patterns, requires the inference of latent variables that control the dynamics of their behaviour. The knowledge of where people will be in the future is prediction. However, prediction cannot be made if we have not first inferred the relationships and dynamics that govern, let’s say, people’s mobility.


Inference and prediction answer different questions. Prediction could be a simple guess or an informed guess based on evidence. Inference is about understanding the facts that are available to you. It is about utilising the information available to you in order to make sense of what is going on in the world. In one sense, prediction is about what is going to happen while inference is about what happened. In the book “An Introduction to Statistical Learning” you can find a more detailed explanation. But the point is that given some random variables (X1, X2, …, Xn), or features or, for simplicity, facts, if you are interested in estimating something (Y), then this is prediction. If you want to understand how Y changes as the random variables change, then it is inference.
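A minimal sketch of this distinction, assuming a single feature X, a straight-line model and made-up data: the same fitted line is used once to estimate Y for a new X (prediction) and once to read off how Y changes with X (inference). `fit_line` is a hypothetical helper that implements ordinary least squares for one variable.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up observations of a feature X and an outcome Y.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)

# Prediction: estimate Y for a value of X we have not observed.
y_hat = intercept + slope * 6.0
print(f"predicted Y at X = 6: {y_hat:.2f}")

# Inference: understand how Y changes as X changes.
print(f"Y changes by about {slope:.2f} for each unit increase in X")
```

For prediction only the quality of `y_hat` matters; for inference the quantity of interest is the slope itself and what it says about the relationship between X and Y.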

In a short sentence: Inference is about understanding while prediction is about “guessing”.