Welcome! This project was developed as part of the Full Stack Deep Learning course (2022 edition). The aim of the project was to develop an ML-powered web application that allows users to anonymise specific classes of objects in an image (e.g. faces, people, text). This problem is critical in many domains, with applications including preserving privacy, protecting confidential information, removing branding references, etc. The work we've done is open source and available on GitHub.
The aim of this document is to present our solution, the steps we took to build it, as well as the lessons learnt.
First of all, this project has been built by team_003.
At a high level, our solution is based on a two-step approach: first, use deep learning models (object detection and segmentation) to locate the target object(s) in the input image; then, allow the user to customise the way the anonymisation should be done.
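To make the two steps concrete, here is a minimal sketch of the idea (function names and parameters are illustrative, not taken from our codebase): any detector that returns bounding boxes can feed a small anonymisation function that either blurs or fills the detected regions.

```python
import cv2
import numpy as np

def anonymise(image: np.ndarray, boxes: list[tuple[int, int, int, int]],
              mode: str = "blur", blur_strength: int = 31) -> np.ndarray:
    """Step 2: hide each detected region, either by blurring it or by
    painting it with a solid colour."""
    out = image.copy()
    for x1, y1, x2, y2 in boxes:
        region = out[y1:y2, x1:x2]
        if mode == "blur":
            # kernel size must be odd for GaussianBlur
            k = blur_strength if blur_strength % 2 == 1 else blur_strength + 1
            out[y1:y2, x1:x2] = cv2.GaussianBlur(region, (k, k), 0)
        else:
            out[y1:y2, x1:x2] = (0, 0, 0)  # solid colour fill
    return out

# Step 1 would be any detector that returns bounding boxes, e.g.:
# boxes = face_detector.detect(image)
# anonymised = anonymise(image, boxes, mode="blur", blur_strength=51)
```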
Our implementation is composed of three main blocks:
- The first MVP: allowed the user to choose between blur and color anonymisation, anonymising either the region within the bounding box or the pixels (depending on the model), while also varying the blur strength. Anonymisation can be done by class or by instance, and a compound feature allows the user to anonymise the image based on predictions coming from different models, or from different classes and instances.
- Switching to Streamlit: this was driven by the data flywheel. Gradio has a flag mechanism, but we wanted to find a way to add user annotations.
- The admin interface: it also allows us to access the feedback that users send us.
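As an illustration of what the data flywheel requires on the application side, here is a hypothetical sketch of how user feedback and annotations could be persisted so the admin interface (or a later training job) can read them back. The record structure and paths are assumptions, not our actual schema:

```python
import json
import time
from pathlib import Path

FEEDBACK_DIR = Path("data/feedback")  # hypothetical path; in practice a location shared between containers

def save_feedback(image_name: str, model_name: str, user_boxes: list[dict], comment: str = "") -> None:
    """Append one feedback/annotation record as a JSON line."""
    FEEDBACK_DIR.mkdir(parents=True, exist_ok=True)
    record = {
        "timestamp": time.time(),
        "image": image_name,
        "model": model_name,
        "user_boxes": user_boxes,  # e.g. [{"x1": 10, "y1": 20, "x2": 50, "y2": 80, "label": "face"}]
        "comment": comment,
    }
    with open(FEEDBACK_DIR / "feedback.jsonl", "a") as f:
        f.write(json.dumps(record) + "\n")
```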
Streamlit vs Gradio (which one to choose as a developer)

First of all, both Streamlit and Gradio are great tools, and we didn't have any experience with either of them when we started. So the feedback below is based on the difficulties we encountered as beginners:
| | Reasons to Choose | Things to be aware of |
|---|---|---|
| Gradio | | |
| Streamlit | | |
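For readers who have not used either framework, the same minimal image-in/image-out demo looks roughly like this in both (a generic sketch, not our actual app code):

```python
# gradio_app.py -- Gradio builds the UI from the function signature
import gradio as gr

def process(image):
    return image  # the anonymisation pipeline would be called here

demo = gr.Interface(fn=process, inputs=gr.Image(), outputs=gr.Image())
# demo.launch()

# streamlit_app.py -- Streamlit reruns this script top-to-bottom on every interaction
# import streamlit as st
# uploaded = st.file_uploader("Upload an image", type=["png", "jpg", "jpeg"])
# if uploaded is not None:
#     st.image(process(uploaded))
```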
Main lesson learnt: When using Streamlit or Gradio to build a complex UI, you need to deploy as soon as possible and test how your design impacts latency in a production setting (testing on localhost is a very misleading benchmark!); you may then have to switch to a more "traditional" JavaScript framework.
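One reason latency behaves differently in production is that Streamlit reruns the whole script on every interaction, so anything expensive (such as loading a detection model) needs to be cached. A generic sketch, not our app code; `build_model` is a hypothetical factory standing in for a models module:

```python
import streamlit as st

@st.cache_resource  # recent Streamlit; older versions used st.experimental_singleton
def load_detector(model_name: str):
    # expensive: download weights, build the model, move it to the right device
    from my_models import build_model  # hypothetical factory function
    return build_model(model_name)

detector = load_detector("face_segmentation")  # loaded once, reused across reruns and sessions
```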
The backend consists of three main components:
To provide multi-model support in our application, the detector module manages:

- the automatic instantiation of the different models: this is done through a config file that lists the models that should be exposed in the UI as well as their default parameters. This allows us to deactivate/add models with no change or new development in the backend or the frontend, as long as these models are instances of classes that are available in the models module (see the sketch below)
- the predictions data required by the Streamlit or Gradio app
- the target regions in the image that need to be anonymised based on the user input (as a reminder, you can anonymise all instances of a class or by instance)
- visualising output images (note: we are planning to have a separate visualisation module)
- new user annotations, as this impacts most functions above

Main lesson learnt: When dealing with several deep learning models in your app that may be big in size, decoupling the inference server from your web app (even for an MVP) should be considered from the beginning. In particular, the memory requirements (even if the models are loaded only once) can have a huge impact on the latency and functioning of the app.
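As a concrete illustration of the config-driven instantiation described above, a minimal version could look like this (the config format, module and class names are assumptions, not our exact implementation):

```python
import importlib
import yaml  # assuming a YAML config file

# config.yaml (illustration):
# models:
#   - class: FaceDetector
#     params: {min_confidence: 0.8}
#   - class: TextDetector
#     params: {languages: ["en"]}

def load_models(config_path: str = "config.yaml") -> dict:
    """Instantiate every model listed in the config from the models module."""
    with open(config_path) as f:
        config = yaml.safe_load(f)
    models_module = importlib.import_module("models")  # hypothetical module name
    instances = {}
    for entry in config["models"]:
        cls = getattr(models_module, entry["class"])
        instances[entry["class"]] = cls(**entry.get("params", {}))
    return instances
```

Because the frontend only iterates over whatever `load_models` returns, removing or adding a model is a config change rather than a code change.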
Our approach to model training has been very pragmatic since day one: Leverage pre-trained models as much as possible especially during the first stages of MVP building
We identified several alternatives, but then decided to use Detectron2 for multi-class and person detection and segmentation, EasyOCR for text detection, and facenet for face detection.
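For reference, the standard entry points of these libraries look roughly like this (a sketch assuming the facenet-pytorch implementation of facenet; our app wraps them behind a common interface, described below):

```python
# Detectron2: a pre-trained instance-segmentation model from the model zoo
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
predictor = DefaultPredictor(cfg)   # predictor(image) returns boxes, masks, classes, scores

# EasyOCR: text detection/recognition
import easyocr
reader = easyocr.Reader(["en"])     # reader.readtext(image) returns (bbox, text, confidence) tuples

# facenet-pytorch: face detection with MTCNN
from facenet_pytorch import MTCNN
mtcnn = MTCNN(keep_all=True)        # mtcnn.detect(image) returns face boxes and probabilities
```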
We also trained our own face segmentation model by leveraging the Detectron2 framework. The training was done on Lambda Labs and the artifacts stored in Weights & Biases.
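For readers unfamiliar with this workflow, storing a trained model as a Weights & Biases artifact looks roughly like this (a generic sketch; the project and artifact names are made up, not our actual ones):

```python
import wandb

run = wandb.init(project="anonymiser", job_type="training")  # hypothetical project name
# ... train the Detectron2 model, producing output/model_final.pth ...
artifact = wandb.Artifact("face-segmentation-model", type="model")
artifact.add_file("output/model_final.pth")
run.log_artifact(artifact)
run.finish()
```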
One of the key elements when it comes to managing multiple detection models was to define a common output interface. This allowed us to have almost no model-specific logic in the backend, and to add new models with minimal development effort.
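A common output interface can be as simple as a small dataclass that every model wrapper returns, so the backend never has to know which library produced the predictions (an illustrative sketch; the field names in our repo may differ):

```python
from dataclasses import dataclass
from typing import Optional
import numpy as np

@dataclass
class Predictions:
    """Common output format returned by every detector wrapper."""
    boxes: np.ndarray                   # (N, 4) as x1, y1, x2, y2
    scores: np.ndarray                  # (N,) confidence per detection
    class_names: list[str]              # length N, e.g. "face", "person", "text"
    masks: Optional[np.ndarray] = None  # (N, H, W) booleans for segmentation models
```

With this in place, adding a new model amounts to writing one wrapper that converts the library's raw output into this structure.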
Main lesson learnt: Pre-trained models can be a great way to get started; however, be careful and make sure to understand them, especially when it comes to the output they generate, their input/default parameters, the dataset that was used, etc.
We containerised our services and provided instructions/scripts that can be found in our GitHub repo.
Since we had access to a Lambda Labs instance during the course, and given that several models benefited from GPU inference, we decided to deploy the stack to Lambda Labs for the demo day (using in-app inference).
The app (both the Streamlit and the Gradio versions) can however be used in both GPU and CPU mode (just by switching the config file), so having access to a GPU during inference is not a requirement; inference times in CPU mode are still in the single-digit/low-teen seconds, which remains manageable.
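The GPU/CPU switch itself boils down to resolving the inference device from the config and falling back to CPU when no GPU is available (an illustrative sketch, not the exact code in the repo):

```python
import torch

def resolve_device(use_gpu: bool) -> str:
    """Pick the inference device requested in the config, falling back to CPU."""
    return "cuda" if use_gpu and torch.cuda.is_available() else "cpu"

# e.g. model.to(resolve_device(config["use_gpu"]))
```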
To store data (coming from feedback and user annotations), we leverage Docker volumes that can be shared between containers.
We included the download of model artifacts in the Docker build stage in order to avoid long downloads during app start-up. The Docker entrypoint is itself a script that allows us to switch between the different app versions and modes.
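The model download at build time can be as simple as a small script invoked from a RUN step in the Dockerfile (the script name and URLs below are placeholders, not the ones in our repo):

```python
# download_models.py -- run during `docker build` so the weights are baked into the image
import urllib.request
from pathlib import Path

MODEL_URLS = {
    "face_segmentation.pth": "https://example.com/artifacts/face_segmentation.pth",  # placeholder URL
}

def download_all(target_dir: str = "weights") -> None:
    Path(target_dir).mkdir(parents=True, exist_ok=True)
    for filename, url in MODEL_URLS.items():
        destination = Path(target_dir) / filename
        if not destination.exists():
            urllib.request.urlretrieve(url, destination)

if __name__ == "__main__":
    download_all()
```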
Finally, to provide HTTPS access, we use an ngrok agent (instructions are also available on GitHub).
Main lesson learnt: Even if you are still at an early stage and not using managed/orchestration services yet, there are simple optimisation tasks that can help massively with deployment, including downloading model artifacts at build time, using an entrypoint script to switch between app versions and modes, sharing data between containers through volumes, and exposing the app over HTTPS with a tunnelling agent.
This has been a very rewarding experience. There are obviously so many things we wish we had had more time to do, and so many mistakes we should have avoided, but we learnt a lot during this 4-week period, and we hope that this write-up gives you some useful insights if you are at the beginning of your ML-product building journey 👻