Create your own Vehicle Recognition system with Azure Custom Vision
Vehicle Recognition System Goals
In this tutorial, I will show you how I created my own vehicle recognition system. The goal was to determine three main attributes from a static image: the vehicle's type, its color, and its make. To do this, I leveraged a couple of different Azure tools.
To start off, I looked at the capabilities of Computer Vision, a pre-built API within Azure's Cognitive Services. Given an image of one or more vehicles, I could retrieve the vehicle type, the dominant color of the picture, and sometimes the make of the vehicle in JSON format. However, the vehicle's make did not always appear. I needed more flexibility and customization in my project to recognize specific content in the images, for example, whether the vehicle's make was a Lexus or not. For this reason, I decided to look into Azure's Custom Vision service.
What is Azure Custom Vision?
Custom Vision is an image recognition service under Azure Cognitive Services. It allows you to train your own custom models with your own images. There are two types of models you can train: an object detection model and a classification model. Both allow you to apply one or more labels to an image. With the object detection model, however, you also get the coordinates of each label. In other words, you can see where the object is in an image through a bounding box.
In this project we will train a total of three custom models — one object detection model to determine the vehicle type as well as two classification models to determine the vehicle’s color and vehicle’s make.
Example Use Case:
What You Need:
- Images of vehicles (cars, trucks, RVs, buses, etc.) to train the models
- Azure Subscription
- Azure Storage Explorer (optional)
Part 1: Create Custom Vision resources in the Azure Portal
We will need to create a training and prediction resource to use the Custom Vision service. Start by going to the Azure portal.
Click on Create a resource -> Type in Custom Vision in the search bar -> Select Custom Vision from the marketplace -> Create -> Fill out the required information.
Part 2: Create a new project on the Custom Vision page
- Classification vs. Object Detection project types
Custom Vision allows you to train either a classification or object detection model. Classification allows you to apply one or more labels to an image. Object Detection allows you to do the same, with one difference — you also get the coordinates of where the specified labels are found.
- Create three separate projects — Vehicle Type, Vehicle Color, Vehicle Make
Go to the Custom Vision portal page and sign in with your Azure Portal account. To create a new project, we click the New Project button.
Here we can identify the project type as object detection or classification. Let’s start off with the Vehicle Type Project — an object detection model to determine if the vehicle is a car or a truck.
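If you prefer to script this step, projects can also be created through the training SDK used later in the Code section. A minimal sketch, assuming the `trainer` client authenticated in Snippet 2; the project name and the `pick_domain` helper are illustrative, not part of the SDK:

```python
def pick_domain(domains, domain_type, name="General"):
    # Return the first Custom Vision domain of the requested type,
    # e.g. "ObjectDetection" or "Classification".
    for d in domains:
        if d.type == domain_type and d.name == name:
            return d
    raise ValueError("No %s domain named %s" % (domain_type, name))

def create_vehicle_type_project(trainer, project_name="Vehicle Type"):
    # `trainer` is the CustomVisionTrainingClient from Snippet 2, passed in
    # so this sketch can be exercised without live credentials.
    detection_domain = pick_domain(trainer.get_domains(), "ObjectDetection")
    return trainer.create_project(project_name, domain_id=detection_domain.id)
```

The classification projects for color and make would be created the same way, just with `"Classification"` as the domain type.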
Choose training images
1. Number of Images
- To start training the model, you need at least 15 images per label. However, I would recommend using considerably more per label to get more accurate results.
2. Think about lighting, angles, background.
- TIP: Consider the following factors while looking for vehicle images: vehicles at different angles, different lighting, varied image backgrounds, etc. These will significantly impact the results of the model that you train.
Upload and tag images with the right label using bounding boxes.
To add your images, click Add images and upload them from your local files. All of the uploaded images will appear in the Untagged section. Go ahead and draw bounding boxes and tag each object with the appropriate label. Examples of labels include cars, trucks, RVs, buses, etc.
Here I have an image with one car. I have drawn a bounding box around the object and labelled it as a car. It is important to ensure every single car or truck present in an image is labelled for better accuracy.
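Under the hood, Custom Vision stores each bounding box as normalized left/top/width/height values between 0 and 1, which is also the form `Region` objects expect if you ever upload pre-tagged images through the SDK. A small helper (the function name is mine) to convert a pixel-space box:

```python
def to_normalized_box(x, y, w, h, img_width, img_height):
    # Convert a pixel-space box (top-left corner x, y plus width and height)
    # into the normalized left/top/width/height values (0-1) that
    # Custom Vision bounding boxes use.
    return (x / img_width, y / img_height, w / img_width, h / img_height)
```

These four values map onto `Region(tag_id=..., left=..., top=..., width=..., height=...)` when uploading tagged images with `trainer.create_images_from_files`. Snippet 6 later performs the reverse conversion, from normalized values back to pixels, to crop the detected vehicles.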
Repeat Part 2 for the other two models.
- Vehicle Color (classification)
For example: Red vs. Blue
- Vehicle Make (classification)
For example: Lexus or not
Part 3: Train each of the object detection and classification models
Quick vs. Advanced
Click the Train button at the top of the Custom Vision page to train your model. If you want to see quick results from your training, the Quick method is the way to go. For improved performance, however, Advanced training is useful: it lets you specify a compute-time training budget.
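Training can also be kicked off from code and polled until it finishes. A hedged sketch using the same training SDK as the later snippets; the function name and polling interval are mine:

```python
import time

def train_model(trainer, project_id, budget_hours=None, poll_seconds=10):
    # Quick training by default; pass budget_hours to request Advanced
    # training with a reserved compute-time budget.
    if budget_hours:
        iteration = trainer.train_project(project_id, training_type="Advanced",
                                          reserved_budget_in_hours=budget_hours)
    else:
        iteration = trainer.train_project(project_id)
    # Poll until the iteration reports it has finished training.
    while iteration.status != "Completed":
        time.sleep(poll_seconds)
        iteration = trainer.get_iteration(project_id, iteration.id)
    return iteration
```

Quick training usually returns within minutes for small image sets; Advanced training runs until the budget is exhausted or the model stops improving.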
Part 4: Evaluate the Models
From the Azure docs, some key definitions are:
Metrics
- Precision — This is the fraction of identified classifications that were correct.
- Recall — This is the fraction of actual classifications that were correctly identified.
- Mean Average Precision — This is the average area under the precision/recall curve.
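As a concrete illustration of how precision and recall are computed (the counts below are made up for the example):

```python
def precision(true_pos, false_pos):
    # Of everything the model flagged, what fraction was right?
    return true_pos / (true_pos + false_pos)

def recall(true_pos, false_neg):
    # Of everything actually present, what fraction did the model find?
    return true_pos / (true_pos + false_neg)

# Say the model tags 10 objects as "car", 8 of them correctly, while the
# images actually contain 12 cars: precision is 8/10 and recall is 8/12.
```

High precision with low recall means the model is cautious: what it flags is usually right, but it misses vehicles. The reverse means it finds most vehicles but also produces false alarms.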
Other functionalities
- Probability Threshold — You can use this slider to select the level of confidence a prediction must have to be considered correct.
- Overlap Threshold — You can use this slider to set how much a predicted bounding box must overlap with the tagged box for the prediction to be considered correct during training.
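The same probability cutoff can be applied in your own code when consuming predictions, as Snippet 6 later does with hard-coded values. A small helper (the name is mine) makes the threshold explicit:

```python
def confident_predictions(predictions, threshold=0.5):
    # Keep only predictions whose probability meets the cutoff, mirroring
    # what the probability-threshold slider does in the portal.
    return [p for p in predictions if p.probability >= threshold]
```

Lowering the threshold surfaces more candidate detections at the cost of more false positives; raising it does the opposite.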
Part 5: Test your model
If you are not getting the results you want from your Quick or Advanced training, you might have to add more images and retrain your model.
Part 6: Publish Model
Retrieve endpoint
Click Publish -> Prediction URL -> Grab the prediction URL and prediction key
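Once published, the prediction URL can also be called directly over REST without the SDK. A hedged sketch using `requests`; the URL, key, and function name are placeholders you fill in from the Prediction URL dialog:

```python
import requests

def predict_from_url(prediction_url, prediction_key, image_url):
    # POST the image's URL to the published prediction endpoint; the
    # Prediction-Key header authenticates the request.
    response = requests.post(
        prediction_url,
        headers={"Prediction-Key": prediction_key,
                 "Content-Type": "application/json"},
        json={"Url": image_url},
    )
    response.raise_for_status()
    # The response body contains a "predictions" list of tags with probabilities.
    return response.json()
```

The dialog also shows a second URL for posting raw image bytes instead of a URL; that variant takes the binary body with `Content-Type: application/octet-stream`.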
Part 7: Upload Test Images to Azure Blob Storage through Portal or Storage Explorer
Test Images
Grab some new images to test the model. Upload them to Blob Storage through the portal or through Storage Explorer.
Through Portal:
Go to your storage account -> Containers -> +Container -> Fill out info -> Upload your images
Azure Storage Explorer:
Find your storage account within your subscription -> Create a container under Blob Containers -> Upload your images
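Uploads can also be scripted. A sketch using the Blob Storage REST API with a container-level SAS URL (write permission assumed) rather than the storage SDK; the helper name is mine:

```python
import requests

def upload_blob(container_sas_url, blob_name, data):
    # container_sas_url looks like
    # https://<account>.blob.core.windows.net/<container>?<sas-token>
    base, sas = container_sas_url.split("?", 1)
    blob_url = "%s/%s?%s" % (base, blob_name, sas)
    # Put Blob: the x-ms-blob-type header is required for block blobs.
    response = requests.put(blob_url, data=data,
                            headers={"x-ms-blob-type": "BlockBlob"})
    response.raise_for_status()
    # Return the blob URL without the SAS token.
    return blob_url.split("?")[0]
```

Appending the same SAS token to the returned blob URL gives the kind of readable image URL that the code below stores in environment variables like `test_three`.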
Code
Snippet 1: All required imports.
```python
from azure.cognitiveservices.vision.customvision.training import CustomVisionTrainingClient
from azure.cognitiveservices.vision.customvision.prediction import CustomVisionPredictionClient
from azure.cognitiveservices.vision.customvision.training.models import ImageFileCreateBatch, ImageFileCreateEntry, Region
from msrest.authentication import ApiKeyCredentials
from msrest.exceptions import HttpOperationError
import time
import math
from PIL import Image
from io import BytesIO
import io
import base64
import requests
import glob
import os
from dotenv import load_dotenv

load_dotenv()
```
Snippet 2: Variables to store important endpoints and keys, and authenticate the training and prediction clients
```python
credentials = ApiKeyCredentials(in_headers={"Training-key": os.getenv('training_key')})
trainer = CustomVisionTrainingClient(os.getenv('ENDPOINT'), credentials)
prediction_credentials = ApiKeyCredentials(in_headers={"Prediction-key": os.getenv('prediction_key')})
predictor = CustomVisionPredictionClient(os.getenv('ENDPOINT'), prediction_credentials)
```
Snippet 3: Retrieve the three custom models
```python
# vehicle detection: car vs truck
project = trainer.get_project(project_id=os.getenv('type_project_id'))
iteration_name = "Iteration2"
# vehicle color: blue vs red
project2 = trainer.get_project(project_id=os.getenv('color_project_id'))
iteration_name2 = "Iteration1"
# vehicle make: lexus vs acura
vehicle_make = trainer.get_project(project_id=os.getenv('make_project_id'))
iteration_name3 = "Iteration1"
```
Snippet 4: Retrieve the test image from Blob Storage using its SAS token and call the trained endpoints to make predictions
```python
# Use the trained endpoints to make predictions on the test image.
img_url = os.getenv('test_three')  # red and blue lexus
results = predictor.detect_image_url(project.id, iteration_name, img_url)
results1 = predictor.classify_image_url(vehicle_make.id, iteration_name3, img_url)
```

`test_three` is a variable that stores the SAS URL for the respective image. Note that the vehicle-make model is a classification model, so it is called through `classify_image_url`, while the vehicle-type model is an object detection model and uses `detect_image_url`.
Snippet 5: View the Image we are testing
```python
response = requests.get(img_url)
img = Image.open(BytesIO(response.content))
img.show()
width = img.size[0]
height = img.size[1]
```
Output of Snippet 5:
Snippet 6: Takes main image, crops it according to the objects, and then determines the vehicle color and make for the cropped images.
```python
vehicles_in_img = []
img_count = 1
# Crop the main image around each detected vehicle.
for prediction in results.predictions:
    if prediction.probability >= 0.10:
        # normalized bbox coordinates -> actual pixel coordinates
        upper_x = math.floor(prediction.bounding_box.left * width)
        upper_y = math.floor(prediction.bounding_box.top * height)
        h = math.ceil(prediction.bounding_box.height * height)
        w = math.ceil(prediction.bounding_box.width * width)
        lower_x = upper_x + w
        lower_y = upper_y + h
        croppedimg = img.crop((upper_x, upper_y, lower_x, lower_y))
        vehicles_in_img.append(croppedimg)
        name = 'C:/Users/dthakar/repos/VehicleRecognition/Images/file_' + str(img_count) + '.jpg'
        croppedimg.save(name)
        img_count += 1
# Classify each cropped image for color and make.
images = glob.glob("C:/Users/dthakar/repos/VehicleRecognition/Images/*.jpg")
for im in images:
    with open(im, "rb") as file:
        results2 = predictor.classify_image(project2.id, iteration_name2, file.read())
        # Display the results.
        print("Image " + im)
        print("Vehicle Type: " + prediction.tag_name)
        for prediction2 in results2.predictions:
            if prediction2.probability >= 0.50:
                print("Vehicle Color: " + prediction2.tag_name + " {0:.2f}%".format(prediction2.probability * 100))
        for prediction1 in results1.predictions:
            if prediction1.probability >= 0.50:
                print("Vehicle Make: " + prediction1.tag_name)
```
Output of Snippet 6:
Results & Next Steps:
As seen through the code snippets above, we were able to determine the type of each vehicle in the image along with its respective color and make. To summarize, the main image was cropped according to the number of vehicle types present. In our case, the test image file had two cars. Therefore, two new files were created. The first file contained a cropped image of the red car. The second image contained a cropped image of the blue car. As a next step, the custom vision models for color and make were called on the cropped images. In our results, we see an accurate response. File 1 contained a Red Lexus Car. File 2 contained a Blue Lexus Car.
Azure Custom Vision allows for lots of customization as you train the models. By training on a large number of images with varied backgrounds and angles, you can significantly improve accuracy. As a next step, one could also extend the classifiers to distinguish between more colors and more makes, turning the binary classifiers into multiclass ones that handle a wider range of vehicles. I would encourage everyone to build their own custom model through this service!