Herding Code 237: Tess Ferrandez on Three Real World Machine Learning Projects

Download / Listen: Herding Code 237: Tess Ferrandez on Three Real World Machine Learning Projects

At DevSum Stockholm, Jon talks with Tess Ferrandez about some machine learning applications she’s worked on recently, from sports to shoplifting to cancer detection. Tess talks about the specific ethical considerations that come up when classifying and predicting behavior, and how they addressed them in these real-world examples.

Topics:

(00:20) Tess has been working on some applied machine learning projects with large customers lately, all focused on computer vision. One project detects soccer goals using computer vision (saving money compared with hardware-based solutions), another detects cancer in microscopy slides, and the third detects shoplifting patterns to minimize
(02:55) Tess has been doing this work in Python rather than .NET. Jon asks if it’s possible to use ML.NET, but Tess says Python is necessary, both because the language is better suited to the work and because the community libraries are all in Python.
(04:35) Jon asks Tess about her experience moving from .NET to Python, and Tess says it’s a struggle since Python isn’t statically typed. You can write tests for the parts that handle data, but not for the machine learning parts.
(05:40) Jon asks how much of Tess’ work is done using Jupyter Notebooks. For data exploration, Jupyter works great, but for the actual execution you’ll want to use scripts so the code is testable.
(07:00) Jon asks more about how you can detect shoplifting behavior, since it’s an activity that happens over time. Tess says it’s also difficult because the prediction may be biased against a demographic, e.g. 20-to-40-year-old men.
(07:54) Tess says ethics in machine learning could come close to causing the third AI winter, and goes on to describe the previous two. We now have the machines and the data, but the data is often so unfair that it could lead to severe ripple effects. This can introduce racial bias into behavior prediction, or skew medical analysis depending on where the samples came from.
(11:30) Jon and Tess discuss the dangers of creating bad feedback loops. Tess describes an example where Amazon built a system to review CVs that was biased against women. Because women have historically held fewer software engineering positions, the system would have reinforced that pattern by keeping women out of software engineering positions in the future.
(13:35) There’s also a danger in classifying people based on pictures, since we may assume the computer is unbiased even though bias may have been introduced through the sample data. Classifying criminality based on pictures would imply that either people are born criminals or criminality changes their appearance, neither of which is an acceptable assumption.
(16:09) Going back to the shoplifting case, we need to make sure we’re detecting the action of shoplifting rather than classifying the individual’s appearance, for instance by detecting poses or whether the individual was alone. Pre-trained models for things like objects and activities help. There are also subtle sources of bias: for instance, if all the source videos are from Christmas, the model may be biased against Santa Claus, so you also need to use pre-trained models for background subtraction.
(18:13) Jon asks how important it is to be able to understand how the decisions were made. Tess says it depends on the impact of the decision, and explains how in the case of cancer detection they found that color differentiation could be used as a predictor, so the final application didn’t require machine learning. In the case of soccer goal detection, there was such a large amount of data (time, video, and sound) that it was possible to get very good results.
(21:26) Jon asks how developers can learn more. Tess says that software engineers don’t need to start with math; you can use pre-trained models and go from there. She recommends the book Deep Learning with Python by François Chollet, which she finds very approachable. Tess also recommends the Machine Learning at Microsoft YouTube channel.