My Transition Into Data Science

For the last couple of years, I’ve been fascinated by the progress of machine learning, and the potential applications it enables. I started a journey of learning, building, researching and designing machine learning applications. In this post, I will share my approach and plans for the future.

Getting started

I started learning a few years ago with the classic Coursera Machine Learning course by Andrew Ng. It got me hooked, stirred up my interest in machine learning, and gave me an intuition for the way it works, but it left me largely unable to do anything practical with it, if only because it was based on Octave. I needed a practical challenge, so after my daughter was born, I planned to build an app that would generate nursery rhymes in Polish. I went through the Deep Learning specialisation on Coursera, again by Andrew Ng, which was much more hands-on and practical. I experimented with LSTMs to generate the rhymes, ran into many practical limitations of machine learning, and ultimately shipped the app, although in the background it used an N-gram-based language model, a much more traditional technique.

Another breakthrough in my personal journey was discovering Fast.ai. The courses, the library, the community, and specifically the advice from the Fast.ai crew (Jeremy Howard, Rachel Thomas, and Sylvain Gugger) gave me a whole new level of practical skills, understanding and motivation to apply machine learning. It still amazes me that we can learn from some of the best teachers in the world, and I’m very grateful for that.

Practice, practice, practice

Outside of my family and machine learning, my great passion is kung fu, so I’m used to regular practice and continuous improvement. In typical sifu/sensei fashion, Jeremy Howard keeps advising his students to practice machine learning: experiment with code, develop applications, and join Kaggle competitions. For me, Kaggle in particular was a huge boost to my skill level and confidence. I got to develop solutions to practical business problems, including localisation of steel defects, fraud detection, natural question answering and many more. The Kaggle platform provides a great way to get started, via starter notebooks and solutions shared by the community, and a way to keep learning and improving by analysing the final solutions shared by the winning teams. Getting ranked on Kaggle is also a very good credential, in particular for people like me who moved into data science from a different field.

Applications

I’m lucky that my job, which is about identifying and executing ideas to transform business processes with technology, lets me apply my machine learning skills. I’ve gradually transitioned from a pure manager role to half-manager, half-hands-on practitioner, which I feel results in a higher ROI on my salary and is also much more interesting. Applying machine learning in a big company is subject to multiple constraints, though, which is a very sensible approach. I’m compensating for this by working on more ‘crazy’ projects outside of work, which include some research and some potentially commercial applications that I’m hoping to reveal in the future.

Blogging

Rachel Thomas wrote a great blog post about blogging. I feel like I’m already late in getting started, but it’s still better to do it now. I’ll keep sharing my learnings, insights, and projects on this blog. I’m especially interested in applying NLP to the Polish language, and in related applications, so if you’re interested in these topics, do follow me on Twitter!