How to develop machine learning skills in all of your company's employees
How to build machine learning skills for every employee in your company
Everyone loves Artificial Intelligence (AI) and data science (DS), and that is unlikely to change for the next decade or so. Still, most people have only a general idea of what data science is and what machine learning algorithms or AI can do.
This is quite normal and a common phenomenon for all fields of expertise. Think about it: do you really know what DevOps, Support or NOC (Network Operation Center) actually do? Sure, as tech professionals we can probably explain it better than people who aren't part of the industry, but in most cases it's pretty hard to really understand what other people are doing if you've never done it yourself.
In most cases, this is fine: while gaining knowledge in other fields of expertise is always good, you can get by quite well without knowing everything, and the additional knowledge may not even help you. Data science, however, is different from the examples I just mentioned, because data is everywhere. It's easier than ever to store and manipulate data, so data science and data-driven decision making are always relevant, and every department in every organization can benefit from them. DevOps can use Machine Learning (ML) algorithms to test their pipelines and detect anomalies, Support can use clustering algorithms to group similar client requests together and reduce their workload, and the NOC can use anomaly detection to spot malfunctioning networks. Since everyone can benefit from DS, we decided to find a way to empower all willing employees with data science skills and spread some data science love.
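To make the anomaly-detection use case concrete, here is a minimal sketch, not the method used at Imperva. It assumes a simple z-score rule, and the function name `find_anomalies`, the threshold of 2.0, and the latency readings are all hypothetical values chosen for illustration.

```python
from statistics import mean, stdev


def find_anomalies(values, threshold=2.0):
    """Return (index, value) pairs whose z-score exceeds the threshold.

    Assumption: a plain z-score rule stands in for a real anomaly detector.
    """
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    if sigma == 0:
        # All readings are identical, so nothing stands out.
        return []
    return [
        (i, v) for i, v in enumerate(values)
        if abs(v - mu) / sigma > threshold
    ]


# Hypothetical network latency readings in milliseconds.
latencies_ms = [20, 22, 19, 21, 23, 20, 250, 22, 21, 20]
anomalies = find_anomalies(latencies_ms)
print(anomalies)
```

A rule this simple is easy to explain to newcomers, but it is sensitive to the outliers themselves, which is one reason a real project would likely move on to a more robust model.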
Data Science Workshop Objectives
We decided to create a workshop that would allow anyone with basic knowledge of Python to quickly understand what DS is and understand "when, why, where and how" it can be used.
We set the following goals:
- Provide "data science ambassadors" outside of our team with tools and a basic understanding of data science so we can work and collaborate with them.
- Keep training short and highly effective so "ambassadors" can do it while they are "on the job".
- Promote awareness of data-driven decision making and communicate the benefits of using it.
We planned to achieve these goals with the following agenda:
- Explain ML and basic algorithms
- Demonstrate how to detect machine learning issues
- Practice hands-on ML with Python and Sklearn
Our first thought was that there must be a DS course available online. We found many, but none of them suited us, for the following reasons:
- Very few courses cover as much material as we wanted to cover in a short amount of time.
- Most courses are not designed to be taught in a classroom, but rather are intended for self-learners.
- Mathematics and statistics were not a prerequisite for the workshop, so we needed theoretical explanations of ML models that anyone could understand.
We came to the conclusion that we would have to create the workshop ourselves.
Below, I'll explain why we chose this workshop agenda and why it was so effective at teaching data science to newcomers in such a short amount of time.
Explain ML and basic algorithms
As Imperva's largest DS team, we engage with people from many different departments, such as Development, Product, and Support, during our projects. One of the things we noticed was that people had a hard time understanding how our ML solution fit into a project and what it could actually do. At first we thought that this was because the ML solution was new, but to our surprise, in later projects these problems did not recur, and it became quite easy to explain how our ML solutions could fit the project. We came to understand that people weren't having problems with our ML solution specifically, but with ML in general, and once they got the basics down, everything became easier.
Demonstrate how to detect Machine Learning issues
At work, no matter our title or job description, we are surrounded by manual tasks. For the most part, these tasks cannot be automated and definitely require human interaction. However, some tasks only seem to require a human, and a decent data scientist could probably build an ML model that does the job.
Due to priorities, our DS team can only take on a limited number of projects, and these projects typically revolve around the company's core products. When we find some time to work on peripheral projects, we don't want to waste it going through the thousands of manual tasks going on in the business and discovering which ones can be automated using ML.
Our solution was to train employees from other departments to be "data science ambassadors." These ambassadors have enough knowledge to spot problems that can be solved using ML. Then, depending on the complexity of the problem, they build a model themselves, build a model with our mentorship, or simply pass the problem on to us to add to our backlog.
Practice hands-on ML with Python and Sklearn
Instead of just giving a high-level explanation, we wanted people to actually practice ML, because we believe that practice is the best way to learn. Through this type of training, the participants could not only understand ML and spot ML-related problems, but also start thinking about solving these problems on their own. It also allowed them to try out DS and find out whether it's something they'd like to do more often.
How we did it
We divided the workshop into four days, using three hours each day to make the workshop accessible to people while they were doing their jobs. We also made sure that each day had a different theme:
- Pre-workshop: Installations of basic tools such as Python, Jupyter Notebooks, and basic Python math, ML, and visualization packages
- Day 1 - Overview of ML and core Python packages (Numpy, Pandas, Seaborn)
- Day 2 - Supervised Learning: Linear and Logistic Regression, Decision Trees, and Random Forest
- Day 3 - Unsupervised Learning: DBSCAN and K-Means
- Day 4 - EDA (Exploratory Data Analysis), feature engineering, evaluation metrics, and final project
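As an illustration of the kind of hands-on material a supervised learning day like Day 2 could cover, here is a minimal sketch with scikit-learn. It is not taken from the workshop notebooks: the Iris dataset, the 25% test split, and the random seeds are assumptions chosen to keep the example self-contained.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Assumption: the built-in Iris dataset stands in for workshop data.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the rows so the model is scored on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Train a Random Forest, one of the Day 2 algorithms.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Evaluate with a simple metric, as covered on Day 4.
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f"Test accuracy: {accuracy:.2f}")
```

Splitting before training is the key habit here: scoring a model on the rows it was trained on would overstate how well it generalizes.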
For the most effective delivery, we used the following trinity:
- Slides
- Code examples
- Exercises
We used an online presentation tool for our slides because we wanted to share the slides and correct or edit them as we went. We used slides to explain ideas, concepts, and algorithms without showing any code.
We used Jupyter Notebooks to show live code samples. Each participant could clone the notebooks from our Git repository and run the code themselves. Not only did this help them understand the different commands, but it was also something they could keep after the workshop was over. Also, because notebooks are common practice for data scientists, simply using a notebook was training in itself and meant they had an environment already set up if they wanted to dig deeper into DS.
Fortunately, the Internet is full of example data sets that we could use to demonstrate different concepts, methods, and algorithms.
For the exercises we also used Jupyter Notebooks. This allowed us to give the participants some basic starter commands for each exercise, such as loading the data, so they could focus on the exercise itself. Also, because the code examples were in the same notebook as the exercise, it was very easy to copy and paste the relevant commands required for each exercise.
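An exercise cell in this style might look like the sketch below. It is a hypothetical example, not one of the actual workshop notebooks: the Iris dataset, the function name `cluster_flowers`, and the K-Means parameters are assumptions. The starter part loads the data, and the exercise part is what a participant would fill in.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

# --- Starter code (provided to participants) ---
# Assumption: the built-in Iris dataset, loaded as a pandas DataFrame.
iris = load_iris(as_frame=True)
df = iris.data
print(df.head())


# --- Exercise: group the flowers into clusters with K-Means ---
def cluster_flowers(features: pd.DataFrame, n_clusters: int = 3) -> pd.Series:
    """Assign each row to one of n_clusters groups."""
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    labels = model.fit_predict(features)
    return pd.Series(labels, index=features.index, name="cluster")


clusters = cluster_flowers(df)
print(clusters.value_counts())
```

Keeping the data loading in the starter cell means participants spend their time on the modeling step rather than on file paths and parsing.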
Behind the scenes: Final Notes
Having a workshop with good content is good, but it takes more than good content to make it great!
First of all, we sent out a survey every day so that we could get feedback on that day's session and improve the next one. In the survey, we used a scale from 1 to 5 to obtain feedback on the following topics:
- Logistics: schedule, snacks and breaks
- Content quality: how good the exercises and slides were
- Relevance of the content: how relevant the content was to the participants' daily work
- General comments
On top of that, we made a few extra efforts to make sure the workshop ran smoothly:
- Communications: We added everyone to a Slack channel and a mailing list.
- Snacks: Because the workshop was quite intense, we wanted to make sure everyone stayed focused during each session, so every day we ordered large trays of snacks such as sandwiches, fruits, vegetables, and sweets.
- Swag: Finally, to give the participants a sense of belonging, we gave out custom-made T-shirts for the workshop.
Source: Imperva, Security Boulevard, Direct News 99