Working With General Image Recognition APIs

Even infrequent social media users can recognize the growing importance of images and videos: each of them holds a tremendous amount of information. The more images your users generate, the greater the need to apply some level of artificial intelligence. As a data scientist or application developer, you should not miss this source of data.

Image Recognition APIs

While the data scientist would find a professional challenge in developing their own deep learning solution with tools like CNTK, TensorFlow, or Keras, the application developer would rather reach for a solution that works out of the box. Even data analysts can save themselves hours or days with these tools. Fortunately, there are plenty of options out there, provided both by big names like Amazon Web Services, Google, and Microsoft, and by smaller, specialized developers. You can use the tags/concepts/labels from these services for recommendations, organizing content, or identifying faulty products. Your use case determines which provider you opt for: some are better on clean images, some can cope with dense scenes, others provide pre-trained models for specific industries.

Let's challenge the visual perception of Google's, Microsoft's, and Clarifai's AIs with some images that are far from the most common use cases.

Abandoned V6 On Battery To Buffs Trail

This is an image I took while running on the trails around the Golden Gate Bridge. It sets a few obstacles in front of the AIs:

  • a rusty main subject,
  • out of its usual environment,
  • taken with the distorting ultra-wide lens of an action camera.

The tables below show which concepts we get back from the APIs of the three services, along with the confidence levels.

Google Cloud Vision API

label            confidence
geological phe.  0.8503
tree             0.8151
waste            0.6468
plant            0.6358
soil             0.6267
rock             0.6067
geology          0.5815

Microsoft Azure Computer Vision API

tag        confidence
outdoor    0.9962
nature     0.9962
tree       0.7013
abandoned  0.4635
landscape  0.4170
rust       0.3383

Clarifai General Model

concept      confidence
nature       0.980
wood         0.977
tree         0.969
broken       0.968
no person    0.960
abandoned    0.939
outdoors     0.936
soil         0.932
trash        0.921
calamity     0.920
environment  0.919
landscape    0.911
tree log     0.907
firewood     0.905
flame        0.904
waste        0.904
demolition   0.904
damage       0.897
travel       0.896
pollution    0.891

You can see all three services understood the image pretty well at a high level: there is some waste and rust, outdoors. You can also notice that Clarifai's AI is noticeably "braver" and more detailed than the others: not only are the confidence levels higher, it provides many abstract concepts as well. Calamity, demolition, and damage tell us Clarifai's service recognized that the main object should not look the way it was photographed. Azure Computer Vision has also seen scenes from Mad Max: it tagged the image abandoned (environment? scene?) but did not try to guess the causes.

Dog Waiting For Sleepy Caretaker

Bejgli is eager to run even pretty early in the morning but not that fond of waiting for me to finish my coffee.

It is not the best quality image but the scene is clear. No surprise, all three services performed well: there is a dog, indoors, on the floor.
Just like with "flame" in the previous image, it seems easier to get a type I error with Clarifai: e.g. it says there is a girl in the image (0.85). In statistics we would generally disregard this hypothesis, but when tagging images we tend to accept much lower confidence levels. Bear in mind, though, that the more guesses and the lower confidence levels you accept, the more faulty tags you get. Your domain determines how costly such false identifications or missed concepts are.
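
Since the acceptance threshold is such an important knob, it is worth making it explicit in your pipeline. Here is a minimal sketch of such a filter; the name/confidence pairs mirror the concept lists the services return, but the example data and the 0.9 cutoff are illustrative assumptions only.


# Filter machine-generated tags by a confidence threshold.
# The example data and the default cutoff are illustrative only.
concepts = [
    {"name": "dog", "value": 0.98},
    {"name": "floor", "value": 0.93},
    {"name": "girl", "value": 0.85},  # likely a type I error
]

def accepted_tags(concepts, threshold=0.9):
    # Keep only the tags whose confidence reaches the threshold.
    return [c["name"] for c in concepts if c["value"] >= threshold]

print(accepted_tags(concepts))        # ['dog', 'floor']
print(accepted_tags(concepts, 0.8))   # a looser cutoff lets 'girl' through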

By the way, according to Google, we see a labrador retriever (0.60). Azure thinks we see a beagle (0.46) or a labrador retriever (0.19), maybe a golden retriever (0.15). The first guess is okay, there is a half-breed beagle on the picture.

The Battle Of The Living Room

So far we are happy with the results. Combining the themes of the previous two images leads us to The Battle Of The Living Room: what happens when a meteorite falls right into Bejgli's container of toys and training tools. Okay, it was me who washed the things and laid them out to dry. Anyway, the image is dense, hard even for human eyes to digest.

Google's API gave up completely; it could not recognize any element of the scene. It says car, auto part, and vehicle, with corresponding confidence scores of 0.56, 0.55, and 0.53.

Azure Computer Vision gives back 5 tags, but only "indoor" and "floor" have high confidence. "Cluttered" is the perfect word here, but if we rely on its confidence score (0.30), we will not use it. Azure also sees a cat and a dog in the picture, but it is not sure about those results.

We cannot describe Clarifai the same way: it provides 20 concepts, all above an 83% confidence score. Besides funny ideas like "music" (maybe the toys on the drying lines remind the AI of a score?), there are concepts describing the situation well: dog, toy, festival, or my favorite one, battle.

Conclusion

Google Cloud Vision API plays it safe: it gives a few labels, generally with low confidence scores. Sure, if you do nothing, you cannot make mistakes. At least the confidence levels are clear indicators.

Microsoft Azure Computer Vision is not bad at labeling. Unfortunately, the confidence levels imply that an object recognized in one image is far from guaranteed to be recognized in another.

Clarifai's model is quite the opposite of Google's: high confidence scores and numerous concepts, though sometimes faulty ones. Even with these errors, it is the service I would recommend of the three. It is great to see that the data scientists at Clarifai concentrated on more abstract concepts as well. Not to mention how easy it is to train your own model with Clarifai.

Please bear in mind that these are just examples, not a representative study. If your organization sticks with Google, or the inputs of your application are usually clean, Google's cautious approach may still serve you well. There are situations where type I errors can be considered better than type II errors: it is usually better to suspect an illness than to miss it. In other cases, this approach is not acceptable due to financial considerations.

Also, other aspects of these APIs, like text recognition, could lead to a different ranking of the services.

A Small Image Processing Application In Python

Let's build a small application analyzing images with one of the APIs while storing both the input and the output of the process in the cloud.

Tools We Are Using

In this sample application, we will

  1. store images from the local machine in Azure Blob Storage,
  2. analyze the stored images with the Clarifai API,
  3. store the results in a MongoDB document database.

The packages we should install are azure-storage-blob, clarifai, and pymongo; after that, our imports are the following:


import os, uuid, sys, pprint as pp
import azure.storage.blob
from azure.storage.blob import BlockBlobService, PublicAccess, baseblobservice
from clarifai.rest import ClarifaiApp
from pymongo import MongoClient
import configparser

If you store your credentials in config.ini, your next lines of code should be the following.


config = configparser.ConfigParser()
config.read('config.ini')

azure_account_name = config['Azure Blob Storage']['account name']
azure_account_key = config['Azure Blob Storage']['account key']

clarifai_api_key = config['Clarifai']['api key']

mongo_server = config['MongoDB']['server']
mongo_port = int(config['MongoDB']['port'])
mongo_database = config['MongoDB']['database']
mongo_user = config['MongoDB']['admin user']
mongo_password = config['MongoDB']['admin password']
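
For reference, the config.ini file the script expects could look like the sketch below; the section and key names match the code above, and all the values are placeholders.


; config.ini -- all values are placeholders
[Azure Blob Storage]
account name = mystorageaccount
account key = <storage account key>

[Clarifai]
api key = <clarifai api key>

[MongoDB]
server = cluster0.example.mongodb.net
port = 27017
database = imagetags
admin user = admin
admin password = <password>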

Saving To Blob Storage

Creating Service And Container

Once we have created a storage account at Azure, we can create a service object. With this service, we create a container, a space to store our binary data, and set public access on it. This last step is needed to let Clarifai read the blobs.

Please note that containers handle folders only virtually: you can prefix your file names with a path and navigate them in the Storage Explorer, but the path belongs to the blob name; all blobs are stored in the same flat container.


global block_blob_service
block_blob_service = BlockBlobService(account_name = azure_account_name, account_key = azure_account_key) 

global container_name
container_name ='workshopblobs'
        
block_blob_service.create_container(container_name) 
block_blob_service.set_container_acl(container_name, public_access=PublicAccess.Container)

Uploading Local File

In Python, it is enough to pass the path and name of the file to the create_blob_from_path method of the BlockBlobService class:


global full_path_to_file
full_path_to_file = os.path.join(local_path, local_file_name)

block_blob_service.create_blob_from_path(container_name, local_file_name, full_path_to_file)
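
As noted above, folders in a container are only virtual: to get a folder-like hierarchy, prefix the blob name with the path. A minimal sketch, reusing the variables defined above; the images/ prefix is just an illustrative choice.


# The prefix creates a virtual "images" folder;
# the blob itself still lives in the same flat container.
blob_name = "images/" + local_file_name
block_blob_service.create_blob_from_path(container_name, blob_name, full_path_to_file)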

Sharing URL Of Blob

We need another class from the blob package, BaseBlobService, to generate the URL which will be shared with Clarifai.


base_blob_service = baseblobservice.BaseBlobService(account_name = azure_account_name, account_key = azure_account_key) 

global blob_url
blob_url = base_blob_service.make_blob_url(container_name, local_file_name, protocol=None, sas_token=None, snapshot=None)

process_blob(blob_url)

Predicting Concepts From URL By Clarifai

The function used above is very simple thanks to the predict_by_url() method: we only need to pass the URL we generated in the previous step. app.public_models.general_model refers to the general model; there are other public models, and you can train your own one as well. If you do not plan to store your blobs, you can use predict_by_filename() instead.


def process_blob(input_blob_url):
    try:
        app = ClarifaiApp(api_key = clarifai_api_key)
        model = app.public_models.general_model
        response = model.predict_by_url(url=input_blob_url)
        
        save_results(response)

    except Exception as e:
        print(e)
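
If you skip Blob Storage entirely, the local-file variant could look like the sketch below, assuming the same clarifai 2.x client used above.


def process_local_file(path):
    try:
        # Same general model, but the image is uploaded to Clarifai
        # straight from disk instead of being fetched from a URL.
        app = ClarifaiApp(api_key = clarifai_api_key)
        model = app.public_models.general_model
        response = model.predict_by_filename(filename=path)
        save_results(response)

    except Exception as e:
        print(e)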

Saving Results To MongoDB

MongoDB Atlas requires TLS and can be used with a connection string. We store the results in the blobResults collection, referenced as results.


mongodb_connection_string = "mongodb+srv://" + mongo_user + ":" + mongo_password + "@" + mongo_server + "/" + mongo_database + "?retryWrites=true"
client = MongoClient(mongodb_connection_string)
db = client[mongo_database]
results = db["blobResults"]

In process_blob() we used the save_results() function. This is our function that inserts the response from the Clarifai API by calling the insert_one() method on our collection.


def save_results(result):
    try:
        results.insert_one(result)

    except Exception as e:
        print(e)

To run the script, just pass the name of the file to process to the local_file_name variable.


local_file_name = "battle.jpg"

Querying Results

To look up specific concepts and list the blob URLs tagged with them, you can use the following function. The query is a dictionary in a format the find() method can interpret. We need the concepts from outputs as the query, and the URL from the nested input document as the field to retrieve.
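
To make these paths easier to follow, here is a heavily simplified sketch of a stored document; the values are illustrative, and the real responses contain many more fields.


# Simplified shape of a document in blobResults (illustrative values)
sample_document = {
    "outputs": [
        {
            "input": {"data": {"image": {"url": "https://.../battle.jpg"}}},
            "data": {"concepts": [{"name": "battle", "value": 0.93},
                                  {"name": "dog", "value": 0.91}]}
        }
    ]
}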


concept = "battle"

def find_concept(concept):
    query = {
                "outputs.data.concepts":
                    {
                        "$elemMatch":
                            {
                                "name": concept
                            }
                    }
            }
    
    fields = {
                "outputs.input.data.image.url": 1
            }

    try:
        cursor = list(db.blobResults.find(query, fields))
        for doc in cursor:
            pp.pprint(doc)

    except Exception as e:
        print(e)     

MongoDB does not retrieve the full dataset meeting your criteria at once. Instead, find() provides a cursor; here we read its values in a for loop.
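
Because the cursor is lazy, you can also cap or page through large result sets without materializing everything; a small sketch, reusing the query and fields dictionaries from above (the limit of 10 is arbitrary):


# Iterate the cursor lazily and cap the result set at 10 documents.
for doc in db.blobResults.find(query, fields).limit(10):
    pp.pprint(doc)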

You can see the full sample scripts below. The function loading the blob into the container is mostly based on Microsoft's sample code. You can find further Azure Storage samples here and the pymongo documentation under this link.

Full code

Blob App

Analyzing and saving our images.


import os, uuid, sys, pprint as pp
import azure.storage.blob
from azure.storage.blob import BlockBlobService, PublicAccess, baseblobservice
from clarifai.rest import ClarifaiApp
from pymongo import MongoClient

import configparser
config = configparser.ConfigParser()
config.read('config.ini')
azure_account_name = config['Azure Blob Storage']['account name']
azure_account_key = config['Azure Blob Storage']['account key']
clarifai_api_key = config['Clarifai']['api key']
mongo_server = config['MongoDB']['server']
mongo_port = int(config['MongoDB']['port'])
mongo_database = config['MongoDB']['database']
mongo_user = config['MongoDB']['admin user']
mongo_password = config['MongoDB']['admin password']

mongodb_connection_string = "mongodb+srv://" + mongo_user + ":" + mongo_password + "@" + mongo_server + "/" + mongo_database + "?retryWrites=true"
client = MongoClient(mongodb_connection_string)
db = client[mongo_database]
results = db["blobResults"]

# image to process
local_file_name = "monster.jpg"

def store_and_analyze_blob():
    try:
        global block_blob_service
        block_blob_service = BlockBlobService(account_name = azure_account_name, account_key = azure_account_key) 

        global container_name
        container_name ='workshopblobs'
        
        block_blob_service.create_container(container_name) 
        block_blob_service.set_container_acl(container_name, public_access=PublicAccess.Container)

        # our image in images subfolder
        local_path=os.path.join(os.getcwd(), "images")
        
        global full_path_to_file
        full_path_to_file = os.path.join(local_path, local_file_name)

        block_blob_service.create_blob_from_path(container_name, local_file_name, full_path_to_file)
        
        base_blob_service = baseblobservice.BaseBlobService(account_name = azure_account_name, account_key = azure_account_key) 

        global blob_url
        blob_url = base_blob_service.make_blob_url(container_name, local_file_name, protocol=None, sas_token=None, snapshot=None)

        process_blob(blob_url)
   
    except Exception as e:
        print(e)

def cleanup_storage():
    try:
        sys.stdout.write("When you hit ENTER, the container will be deleted.")
        sys.stdout.flush()
        input()
        # Remove the container together with every blob stored in it.
        block_blob_service.delete_container(container_name)

    except Exception as e:
        print(e)

def process_blob(input_blob_url):
    try:
        app = ClarifaiApp(api_key = clarifai_api_key)
        model = app.public_models.general_model
        response = model.predict_by_url(url=input_blob_url)
        save_results(response)

    except Exception as e:
        print(e)

def save_results(result):
    try:
        results.insert_one(result)

    except Exception as e:
        print(e)

if __name__ == '__main__':
    store_and_analyze_blob()

Query results

Look up concepts in our database and retrieve the URLs of the files.


from pymongo import MongoClient
import pprint as pp

# read configuration file
import configparser
config = configparser.ConfigParser()
config.read('config.ini')

mongo_server = config['MongoDB']['server']
mongo_port = int(config['MongoDB']['port'])
mongo_database = config['MongoDB']['database']
mongo_user = config['MongoDB']['admin user']
mongo_password = config['MongoDB']['admin password']

mongo_url = "mongodb+srv://" + mongo_user + ":" + mongo_password + "@" + mongo_server + "/" + mongo_database + "?retryWrites=true"
client = MongoClient(mongo_url)
db = client[mongo_database]

concept_to_find = "bike"

def find_concept(concept):
    query = {
                "outputs.data.concepts":
                    {
                        # we are looking for elements of concepts array
                        "$elemMatch":
                            {
                                "name": concept
                            }
                    }
            }
    
    fields = {
                "outputs.input.data.image.url": 1
            }

    try:
        cursor = list(db.blobResults.find(query, fields))
        for doc in cursor:
            pp.pprint(doc)

    except Exception as e:
        print(e)       

# Main method.
if __name__ == '__main__':
    find_concept(concept_to_find)