Top 3 Books a New Data Scientist Should Read


Starting out in data science can be challenging. Plenty of courses promise to teach you everything from how to code to what Principal Component Analysis (PCA) is or how to train a deep neural network. Some of them might deliver. But in my opinion, a few books can greatly benefit any data scientist. Listing every book I find useful would only leave the reader unsure where to start, and there are already plenty of resources listing books in this field. In this post, I focus on just the top 3 books that, in my opinion, are sufficient not only to learn the basics but to go far beyond them. These books are classics, but they are still very relevant. If you are already a practicing data scientist who wants to learn about the latest developments in deep learning, this post might not help you as much, but it may still give you some resources for refreshing the fundamentals.

1. Machine Learning by Tom Mitchell

This is one of the first books I read to learn about machine learning in general. In particular, I enjoyed the pages that describe how a decision tree works; you know, the building block of the perhaps more popular random forest approach. It teaches you to calculate entropy (a measure of how mixed the class labels are) and, from it, the information gain of each attribute/feature in your data, recursively as the tree grows. I encourage you to read that part, as it is as straightforward as teaching can get: check out page 52 onwards. The language is simple, and plenty of details and examples take you by the hand and show you how the algorithm actually learns patterns from the data.
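As a rough illustration of the entropy calculation described above, here is a minimal Python sketch of an ID3-style information gain computation. The toy records, the feature names and the function names `entropy` and `information_gain` are made up for this post and are not taken from the book.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(rows, feature, target):
    """Entropy reduction obtained by splitting `rows` on `feature`."""
    base = entropy([row[target] for row in rows])
    remainder = 0.0
    for value in {row[feature] for row in rows}:
        subset = [row[target] for row in rows if row[feature] == value]
        # Weight each subset's entropy by the fraction of rows it holds.
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


# Hypothetical toy data: should we play outside?
rows = [
    {"windy": "no", "humid": "yes", "play": "yes"},
    {"windy": "no", "humid": "no", "play": "yes"},
    {"windy": "yes", "humid": "yes", "play": "no"},
    {"windy": "yes", "humid": "no", "play": "no"},
]

gains = {f: information_gain(rows, f, "play") for f in ("windy", "humid")}
best_feature = max(gains, key=gains.get)
```

A tree learner would split on the feature with the highest gain (here `windy`, which separates the labels perfectly) and then repeat the same calculation on each resulting subset.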

2. The Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani and Jerome Friedman

This book approaches machine learning from a statistical perspective, which I believe is essential for understanding why machine learning algorithms actually work, what can go wrong, and why good data matters. The book spends a lot of pages on linear models, but it is worth it because the concepts are introduced gradually. Don’t think that is all there is, though. It covers a wide range of methods, including neural networks, even if it doesn’t get into Deep Learning, which is a “recent” trend anyway! The examples and their accompanying graphs help the reader grasp the concepts and understand each method. As a note, the book includes some maths and the corresponding notation, but don’t be put off by that. Just embrace how elegantly a formula can capture the essence of a section. Besides, the accompanying descriptions should be sufficient to understand each approach. Reading it is definitely worthwhile. It can also serve as a reference for data science practitioners: a reminder of model assumptions and the differences between models, and a general data science refresher.

3. Pattern Recognition and Machine Learning by Christopher Bishop

This book needs no introduction. It is one of the best machine learning books ever written, and it was a recommended textbook for the machine learning courses at Imperial College London, UK, a few years back. It starts with concepts that are crucial in machine learning, such as the curse of dimensionality, probabilities and distributions, decision theory and information theory, and moves on to the more advanced mathematical concepts required for machine learning. To follow the more advanced material, some knowledge of mathematics and statistics would be useful. Nevertheless, it is a must-have book in your collection.

Classification Models Pros and Cons


Classification Model Pros and Cons (Generalized)

  • Logistic Regression
    • Pros
      • low variance
      • provides probabilities for outcomes
      • works well with diagonal (feature) decision boundaries
      • NOTE: logistic regression can also be used with kernel methods
    • Cons
      • high bias
  • Decision Trees
    • Regular (not bagged or boosted)
      • Pros
        • easy to interpret visually when the trees only contain several levels
        • Can easily handle qualitative (categorical) features
        • Works well with decision boundaries parallel to the feature axes
      • Cons
        • prone to overfitting
        • possible issues with diagonal decision boundaries
    • Bagged Trees : train multiple trees using bootstrapped data to reduce variance and prevent overfitting
      • Pros
        • reduces variance in comparison to regular decision trees
        • Can provide variable importance measures
          • classification: Gini index
          • regression: RSS
        • Can easily handle qualitative (categorical) features
        • Out of bag (OOB) estimates can be used for model validation
      • Cons
        • Not as easy to visually interpret
        • Variance reduction is limited when the trees are highly correlated (e.g. when a few strong features dominate every tree)
    • Boosted Trees : Similar to bagging, but learns sequentially and builds off previous trees
      • Pros
        • Somewhat more interpretable than bagged trees/random forest as the user can define the size of each tree resulting in a collection of stumps (1 level) which can be viewed as an additive model
        • Can easily handle qualitative (categorical) features
      • Cons
        • Unlike bagging and random forests, can overfit if number of trees is too large
  • Random Forest
    • Pros
      • Decorrelates trees (relative to bagged trees)
        • important when dealing with multiple features which may be correlated
      • reduced variance (relative to regular trees)
    • Cons
      • Not as easy to visually interpret
  • SVM
    • Pros
      • Performs similarly to logistic regression when the decision boundary is linear
      • Performs well with non-linear boundary depending on the kernel used
      • Handle high dimensional data well
    • Cons
      • Susceptible to overfitting/training issues depending on kernel
  • Neural Network (This section needs further information based on different types of NN’s)
  • Naive Bayes
    • Pros
      • Computationally fast
      • Simple to implement
      • Works well with high dimensions
    • Cons
      • Relies on independence assumption and will perform badly if this assumption is not met
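
The remarks about bagged trees and random forests above come down to a standard result: if each of B trees has variance σ² and any two trees have pairwise correlation ρ, the variance of their average is ρσ² + (1 − ρ)σ²/B. The small Python sketch below simply evaluates that formula. The function name `averaged_variance` and the numbers are illustrative assumptions, not measurements from real models.

```python
def averaged_variance(sigma2, rho, n_trees):
    """Variance of the mean of n_trees identically distributed predictors,
    each with variance sigma2 and pairwise correlation rho."""
    return rho * sigma2 + (1.0 - rho) * sigma2 / n_trees


# Illustrative numbers only: unit variance per tree.
single_tree = averaged_variance(1.0, rho=0.0, n_trees=1)
# Bagging when a few strong features make the trees look alike.
highly_correlated = averaged_variance(1.0, rho=0.9, n_trees=100)
# Random-forest-like setting where the trees are decorrelated.
decorrelated = averaged_variance(1.0, rho=0.1, n_trees=100)
```

Adding more trees only shrinks the second term, so with strongly correlated trees the variance stays close to ρσ². This is why decorrelating the trees, as a random forest does, matters.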

Microsoft Research PhD Summer School 2016

I was lucky enough (thanks to Dr. Matteo Venanzi) to be invited to Microsoft’s summer school, which takes place in Cambridge, UK, every year (4-8 July in 2016). This is a one-week school that aims to familiarise PhD students with the work at Microsoft. Moreover, students get the opportunity to present their work to other students, communication experts and Microsoft researchers. The benefit of this is twofold. First, students get feedback on their work and on their presentation skills and style. Second, it is an opportunity to meet new people and to gauge the impact of your work by the interest your poster attracts. At the same time, it is a chance to evaluate whether your work is relevant to anything that Microsoft is currently working on. The school focuses on early-stage PhD students, but late-stage students, like me, attended as well.

In terms of the everyday schedule, as I already mentioned, we had some coaching on how to communicate our research, but we also had some more technical lectures. For example, I particularly liked Prof. Bishop’s presentation (“The Road to General Artificial Intelligence”). He focused on the advances AI has made in recent years and argued that its potential is greater still.

Overall, I believe this yearly summer school is a great initiative from Microsoft Research and I am glad I was a part of it.



AAAI (Association for the Advancement of Artificial Intelligence) 2016 Conference

Phoenix, Arizona, USA, 2016

The Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16) was held on February 12–17 at the Phoenix Convention Center, Phoenix, Arizona, USA. I was lucky enough to attend.

First of all, it was my first time in the USA and I have to admit that I was impressed. The buildings, the streets, the coffee, the food, the life!

Concerning the conference itself, it attracted a lot of people, including celebrity academics working in the area of Artificial Intelligence. For the first couple of days I attended interesting tutorials, ranging from traditional heuristic search to the hot topic of Deep Learning! The deep learning tutorial was aimed at students and people who were not familiar with the concepts. Even though the speakers started from scratch, they managed to cover complex concepts by the end. They presented material from their new book (check the book on Amazon), which is available for pre-order at the time of writing this post.

On the days of the actual conference I had the opportunity to attend a number of talks about machine learning, security and multi-agent systems. I also attended the educational panel where Stuart Russell and Peter Norvig announced the 4th edition of their famous book, Artificial Intelligence: A Modern Approach. The book will include material on Deep Learning, Monte Carlo Tree Search and meta-heuristics.

On the evening of the first day of the conference I attended an invited talk by Andreas Krause, who currently leads the Learning & Adaptive Systems Group at ETH Zurich. He covered a lot of application themes, as the presentation’s title, From Proteins to Robots: Learning to Optimize with Confidence, suggests. The focus, though, was on Bayesian Optimization and on how submodularity can be handy in such approaches for providing guarantees and bounds on the solutions found.

The next day, I attended another important talk, by Demis Hassabis, the CEO of DeepMind, which was acquired by Google for hundreds of millions back in 2014. There, it was announced that DeepMind’s AlphaGo will face the world champion in the game of Go in mid-March (starting on March 9th 2016) and that the match will be live streamed. Besides advertising and promoting DeepMind’s brainchild, Demis went into some detail about the algorithms they used, giving us some intuition about how they approached the problem and why and how their algorithm works. Concretely, he said that, contrary to popular belief, not everything at DeepMind has to do with Deep Learning; instead, they rely on reinforcement learning, an area inspired by behavioural psychology and by our attempt to understand how the brain works and, ultimately, how we learn. Put simply and briefly, reinforcement learning is about rewarding behaviour that is beneficial and penalising behaviour that is detrimental in some context.
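
To make the idea of rewarding and penalising behaviour a little more concrete, here is a deliberately tiny Python sketch. It is not the algorithm behind AlphaGo or anything presented in the talk: it has no states, the rewards are fixed, and the action names, the function `learn_action_values` and all the numbers are hypothetical. It only shows value estimates drifting toward the rewards received.

```python
def learn_action_values(rewards, n_rounds=50, step_size=0.1):
    """Try every action in turn and nudge its value estimate toward
    the reward it produced."""
    values = {action: 0.0 for action in rewards}
    for _ in range(n_rounds):
        for action, reward in rewards.items():
            # Positive rewards pull the estimate up, negative ones push it down.
            values[action] += step_size * (reward - values[action])
    return values


# Hypothetical environment: one beneficial and one detrimental action.
rewards = {"helpful_move": 1.0, "harmful_move": -1.0}
values = learn_action_values(rewards)

# A greedy agent then prefers the action with the highest learned value.
preferred = max(values, key=values.get)
```

After enough rounds the estimate for the rewarded action is close to +1 and the one for the penalised action is close to -1, so the greedy choice is the beneficial action.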

My talk was just after lunch on the 3rd day of the conference. I was in the Computational Sustainability session and I only had a spotlight talk, which means I had 2 minutes to advertise my work and invite people to the evening poster session, where I could present and talk more about it. The session was interesting overall. The key speaker, in my opinion, was Pascal Van Hentenryck. He and his students had 2 or 3 presentations in that session, on evacuation plans for cities that are vulnerable to flooding and natural disasters. Pascal is very well known in the area and I have personally taken his online optimization course on Coursera.

In the evening of that day I presented my poster in the allocated session. A lot of people stopped by and asked about my work, and I am glad they did.

All in all, AAAI was a very good experience and I feel lucky to have had the opportunity to be there! If you want to know more details about it or talk about specific papers, please get in touch with me.