Even infrequent social media users can recognize the increased importance of images and videos: each of them holds a tremendous amount of information. The more images your users generate, the greater the need to implement some level of artificial intelligence. As a data scientist or application developer, you should not miss this source of data.
Image Recognition APIs
While the former would find a professional challenge in developing their own deep learning solution with tools like CNTK, TensorFlow, Keras and the rest, the latter would rather look for a solution that works out of the box. Even data analysts can save themselves a couple of hours or days with these tools. Fortunately, there are a lot of options out there, provided by big names like Amazon Web Services, Google and Microsoft as well as by smaller, specialized developers. You can use the tags, concepts or labels returned by these services for recommendations, organizing content or identifying faulty products. Your use case will determine which provider you opt for: some are better on clean images, some can cope with dense scenes, others provide pre-trained models for specific industries.
Let's challenge the visual perception of Google's, Microsoft's and Clarifai's AI with some images that are further from the most common use cases.
Abandoned V6 On Battery To Buffs Trail
This is an image I took while running on the trails around the Golden Gate Bridge. We are setting some obstacles in front of the AIs:
- a rusty main subject,
- out of its usual environment,
- taken with the distorting ultra-wide lens of an action camera.
The table below shows what concepts we get back from the APIs of the three services. Confidence levels are also passed.
You can see that all three services understood the image pretty well on a high level: there is some waste and rust outdoors. You can also notice that Clarifai's AI is more than slightly "braver" and more detailed than the others: not only are the confidence levels higher, but it provides many abstract concepts as well. Calamity, demolition and damage tell us that Clarifai's service recognized that the main object should not look the way it was photographed. Azure Computer Vision has also seen scenes from Mad Max: it calls something abandoned (the environment? the scene?) but did not try to guess the causes.
Dog Waiting For Sleepy Caretaker
Bejgli is eager to run even pretty early in the morning but not that fond of waiting for me to finish my coffee.
It is not the best quality image, but the scene is clear. No surprise, all three services performed well: there is a dog, indoors, on the floor.
Just like with "flame" on the former image, it seems it is easier to get a type I error with Clarifai: e.g. it says there is a girl in the image (0.85). In statistics we would generally disregard a hypothesis at this confidence level, but when tagging images we tend to accept much lower ones. But bear in mind, the more guesses and the lower the confidence you accept, the more faulty tags you get. It is your domain that determines the importance of such false identifications or missed concepts.
By the way, according to Google, we see a labrador retriever (0.60). Azure thinks we see a beagle (0.46) or a labrador retriever (0.19), maybe a golden retriever (0.15). Azure's first guess is okay: there is a half-breed beagle in the picture.
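As a rough illustration of the threshold trade-off described above, here is a minimal sketch that filters tags by a minimum confidence. The function name `filter_tags` and the two thresholds (0.40 and 0.10) are assumptions made for this example; the scores are Azure's breed guesses quoted above.

```python
from typing import Dict, List, Tuple


def filter_tags(tags: Dict[str, float], threshold: float) -> List[Tuple[str, float]]:
    """Keep tags whose confidence is at least `threshold`, highest first."""
    kept = [(name, score) for name, score in tags.items() if score >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Azure's breed guesses for the dog image, as quoted in the text.
azure_breeds = {
    "beagle": 0.46,
    "labrador retriever": 0.19,
    "golden retriever": 0.15,
}

# A stricter threshold keeps only the correct guess ...
strict = filter_tags(azure_breeds, threshold=0.40)
# ... while a lenient one also lets the two wrong breeds through.
lenient = filter_tags(azure_breeds, threshold=0.10)

print(strict)
print(lenient)
```

Lowering the threshold recovers more concepts but also admits more false ones, which is exactly the domain decision mentioned above.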
The Battle Of The Living Room
So far we are happy with the results. Combining the themes of the former two images leads us to The Battle Of The Living Room: what happens when a meteorite falls right into Bejgli's container of toys and training tools. Okay, it was me who washed the things and laid them out to dry. Anyway, the image is dense, hard to digest even for human eyes.
Google's API gave up completely: it cannot recognize any element of the scene. It says car, auto part and vehicle, with confidence scores of 0.56, 0.55 and 0.53.
Azure Computer Vision gives back 5 tags, but only "indoor" and "floor" have high confidence. "Cluttered" is the perfect word here, but if we rely on the confidence score (0.30), we will not use it. Azure also sees a cat and a dog in the picture, but it is not sure about either.
We cannot say the same about Clarifai: it provides 20 concepts, all with a confidence score above 83%. Besides funny ideas like "music" (maybe the toys on the drying lines remind the AI of a musical score?), there are ones describing the situation well: dog, toy, festival or my favorite one, battle.
Google Cloud Vision API plays safe: it gives a few labels and does it with a generally low confidence score. Sure thing, if you are not working, you cannot make mistakes. At least the confidence levels are clear indicators.
Microsoft Azure Computer Vision is not bad at labeling. Unfortunately, the confidence levels imply that it is far from certain that an object recognized in one image will be recognized in another.
Clarifai's model is quite the opposite of Google's: high confidence scores and numerous concepts, but sometimes faulty ones. Even with these errors, this is the service I can recommend of the three. It is great to see that the data scientists at Clarifai concentrated on more abstract concepts as well. Not to mention how easy it is to train your own model at Clarifai.
Please bear in mind that these are just examples and not a representative study. If your organization sticks with Google, or the inputs of your application are usually clear, another service may suit you just as well. There are situations where type I errors can be considered better than type II errors: it is usually better to suspect an illness than to miss it. In other cases, this approach is not acceptable due to financial considerations.
Also, other aspects of these APIs, like text recognition, can lead to a different ranking of the services.
A Small Image Processing Application In Python
Let's build a small application analyzing images with one of the APIs while storing both the input and the output of the process in the cloud.
Tools We Are Using
In this sample application, we will
- store images from the local machine in Azure Blob Storage,
- analyze the stored images with the Clarifai API,
- store the results in a MongoDB document database.
The packages we should install:
If you store your credentials in config.ini, your next lines of code should be as follows.
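A minimal sketch of reading such a file with the standard library's `configparser` could look like this. The section and key names (`[azure]`, `[clarifai]`, `[mongodb]` and their keys) are assumptions for illustration, not the layout of the original config.ini. Interpolation is switched off because connection strings often contain `%` characters.

```python
import configparser


def parse_credentials(ini_text):
    """Parse INI text into a flat dict of credentials.

    The section and key names below are hypothetical; adapt them to your file.
    """
    # interpolation=None keeps '%' in keys and connection strings untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return {
        "azure_account_name": parser["azure"]["account_name"],
        "azure_account_key": parser["azure"]["account_key"],
        "clarifai_api_key": parser["clarifai"]["api_key"],
        "mongodb_connection_string": parser["mongodb"]["connection_string"],
    }


def load_credentials(path="config.ini"):
    """Read the credentials from a config file on disk."""
    with open(path, encoding="utf-8") as handle:
        return parse_credentials(handle.read())
```

Keeping the credentials in a separate file also makes it easy to exclude them from version control.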
Saving To Blob Storage
Creating Service And Container
Once we have created a storage account at Azure, we can create a service. With this service, we create a container, a space to store our binary data, and set public access on it. The latter is needed to let Clarifai read the blobs.
Please note that containers handle folders only virtually: you can prefix your files with a path and navigate them in Storage Explorer, but the path belongs to the blob name; the blobs are all stored in the same flat container.
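The two steps can be sketched as follows, assuming the legacy `azure-storage-blob` SDK (the versions before 12 that expose `BlockBlobService`). The helper names `connect_blob_service` and `create_public_container` are made up for this example; the SDK import is done lazily so the rest of the sketch can be read and tried without the package installed.

```python
def connect_blob_service(account_name, account_key):
    """Create the service object for a storage account (legacy SDK assumed)."""
    # Lazy import: only needed when talking to a real storage account.
    from azure.storage.blob import BlockBlobService

    return BlockBlobService(account_name=account_name, account_key=account_key)


def create_public_container(service, container_name, public_access):
    """Create a container and make its blobs publicly readable.

    `public_access` is expected to be `PublicAccess.Container` from
    `azure.storage.blob`; it is passed in by the caller.
    """
    service.create_container(container_name)
    service.set_container_acl(container_name, public_access=public_access)
    return container_name


# Usage with a real account (account name, key and container are placeholders):
#   from azure.storage.blob import PublicAccess
#   service = connect_blob_service("myaccount", "mykey")
#   create_public_container(service, "images", PublicAccess.Container)
```

Public access exposes every blob in the container to anyone who knows the URL, so it is worth keeping test images separate from anything sensitive.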
Uploading Local File
In Python it is enough to pass the path and name of the file to the create_blob_from_path method of the block_blob_service object.
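A minimal sketch of the upload, again assuming the legacy SDK's `create_blob_from_path(container_name, blob_name, file_path)` call. The helper `upload_local_file` and its `virtual_folder` parameter are hypothetical; the latter only illustrates that a "folder" is nothing more than a prefix in the blob name.

```python
import os


def upload_local_file(service, container_name, local_path, virtual_folder=""):
    """Upload a local file and return the blob name it was stored under."""
    file_name = os.path.basename(local_path)
    folder = virtual_folder.strip("/")
    # The "folder" is just a prefix of the blob name; the container stays flat.
    blob_name = f"{folder}/{file_name}" if folder else file_name
    service.create_blob_from_path(container_name, blob_name, local_path)
    return blob_name


# Usage with a real service object (names are placeholders):
#   upload_local_file(block_blob_service, "images", "photos/dog.jpg", "trail-runs")
```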
Sharing URL Of Blob
We need another class from the blob package, BaseBlobService, to generate the URL that will be shared with Clarifai.
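As a sketch, assuming the legacy SDK where `BaseBlobService` lives in `azure.storage.blob.baseblobservice` and offers `make_blob_url(container_name, blob_name)`, the URL could be generated like this. The helper names are illustrative only.

```python
def connect_url_service(account_name, account_key):
    """Create a BaseBlobService for URL generation (legacy SDK assumed)."""
    # Lazy import: only needed when talking to a real storage account.
    from azure.storage.blob.baseblobservice import BaseBlobService

    return BaseBlobService(account_name=account_name, account_key=account_key)


def public_blob_url(url_service, container_name, blob_name):
    """Return the URL under which a blob in a public container can be read."""
    return url_service.make_blob_url(container_name, blob_name)


# Usage with a real account (all names are placeholders):
#   url_service = connect_url_service("myaccount", "mykey")
#   url = public_blob_url(url_service, "images", "dog.jpg")
```

Because the container is public, this plain URL is enough; no access token has to be appended.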
Predicting Concepts From URL By Clarifai
The function used above is very simple thanks to the predict_by_url() method: we only need to pass the URL we generated in the previous step. app.public_models.general_model refers to the general model; there are other public models, and you can train your own as well. If you do not plan to store your blobs, you should use predict_by_filename().
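A minimal sketch of the prediction step, assuming the legacy `clarifai` 2.x client (`ClarifaiApp`). The helper names are made up, and `extract_concepts` assumes the response layout `outputs[].data.concepts[]` with `name` and `value` fields.

```python
def predict_concepts(api_key, image_url):
    """Ask Clarifai's general model for the concepts in a public image URL."""
    # Lazy import: only needed when calling the real API.
    from clarifai.rest import ClarifaiApp

    app = ClarifaiApp(api_key=api_key)
    return app.public_models.general_model.predict_by_url(url=image_url)


def extract_concepts(response):
    """Flatten a prediction response into (name, confidence) pairs.

    Assumes each output carries its concepts under data -> concepts.
    """
    concepts = []
    for output in response.get("outputs", []):
        for concept in output.get("data", {}).get("concepts", []):
            concepts.append((concept["name"], concept["value"]))
    return concepts
```

Keeping the raw response and extracting concepts separately means the full document can still be stored as is.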
Saving Results To MongoDB
MongoDB Atlas requires TLS and can be used with a connection string. We store the results in the blobResults collection, referenced as results.
In process_blob() we used the save_results() function. This is our own function, which inserts the response from the Clarifai API by calling the insert_one() method of our collection.
Just assign the name of the file to process to the local_file_name variable to run the script.
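The connection and the insert can be sketched like this with pymongo. The `database_name` parameter and the helper `get_results_collection` are assumptions for this example; `blobResults` and `save_results()` are the names used in the text, though the original function body may differ.

```python
def get_results_collection(connection_string, database_name):
    """Connect to MongoDB and return the blobResults collection."""
    # Lazy import: only needed when connecting to a real database.
    from pymongo import MongoClient

    client = MongoClient(connection_string)
    return client[database_name]["blobResults"]


def save_results(results, response):
    """Insert one Clarifai response document and return its generated id."""
    return results.insert_one(response).inserted_id


# Usage (connection string and database name are placeholders):
#   results = get_results_collection("mongodb+srv://...", "imageAnalysis")
#   save_results(results, response)
```

Note that `insert_one()` adds an `_id` field to the dictionary it is given, so the response object is modified in place.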
To look up specific concepts and list the blob URLs tagged with them, you can use the following function. The query is a dictionary in a format the find() method can interpret. We need the concepts from outputs as the query and the URL from the nested input document as the field to retrieve.
MongoDB does not retrieve the full dataset meeting your criteria at once. Instead, find() provides a cursor; here we read its values in a for loop.
You can see the full sample scripts below. The function loading a blob to the container is mostly based on Microsoft's sample code. You can find further Azure Storage samples here and pymongo documentation under this link.
Analyzing and saving our images.
Looking up concepts in our database and retrieving the URLs of the files.
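A minimal sketch of such a lookup, assuming the stored documents keep Clarifai's layout (`outputs[].data.concepts[].name` for the concepts and `outputs[].input.data.image.url` for the image URL). The function name `find_urls_by_concept` is illustrative.

```python
def find_urls_by_concept(results, concept_name):
    """Return the URLs of all stored images tagged with `concept_name`."""
    # Dot notation reaches into the nested documents and arrays.
    query = {"outputs.data.concepts.name": concept_name}
    # Retrieve only the image URL; drop the _id field.
    fields = {"_id": 0, "outputs.input.data.image.url": 1}

    urls = []
    # find() returns a cursor; documents are fetched as we iterate.
    for document in results.find(query, fields):
        for output in document.get("outputs", []):
            urls.append(output["input"]["data"]["image"]["url"])
    return urls


# Usage with a real collection:
#   for url in find_urls_by_concept(results, "dog"):
#       print(url)
```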