Augment Image Metadata with Cognitive Services

Hello Folks,

Last month we partnered with a local startup to create a solution that addresses an ever-growing problem we all share: how do we categorize and index all our digital pictures so that we can search and retrieve them based on picture content, and not just on the date taken or the location?

The Kwilt team is based out of Invest Ottawa. They are a startup with ambitious goals for their applications, and they currently have several apps that manage how people interact with their photos.

They've created an application that helps users navigate the sea of photos scattered across their social, cloud, and messaging accounts. This is done in a way that is seamless to the end user, since all the search work happens through the Kwilt app, available both as a mobile app and on the web.


During the project, we leveraged Azure Cognitive Services to augment the capabilities of the app. We introduced capabilities that help users with the challenge of “tagging” all their photos so that searches become more accurate, that cope with mistyped tags that would otherwise return wrong photos or none at all, and that handle accents and special characters in searches (e.g., Montreal ≠ Montréal), and more…

Technologies used in this project:

  • Azure Storage
  • Azure Functions
  • Microsoft Cognitive Services
  • Computer Vision API
  • Azure Cosmos DB (formerly DocumentDB)
  • Azure Search

Here is how we get this done!


And by the way, the code we used is available here.

Ingest and Analyze

Ingesting all the data from the Kwilt database into Azure Cognitive Services allows the service to automatically tag photos, eliminating the need for manual user input. We started by building a robust and efficient workflow to push data from the Kwilt backend database to Azure so that it can be analyzed by Cognitive Services.

You can test the capabilities of the service yourselves.

First, the data is received from the Kwilt backend into an Azure Storage Queue. The process that feeds the queue is a proprietary PHP script that extracts new entries from the Kwilt database, converts them to JSON, and sends them as messages to the configured Storage Account and Key using the Azure Storage PHP SDK (a rough sketch of an equivalent process is shown after the sample message below).

Here is a sample message that is stored in the queue.

{
    "sorting_time": "2015-06-07 22:50:36",
    "type": "image",
    "id": 68682364,
    "name": "010309_0800_6154_nals",
    "created_time": "2015-06-08 05:50:36",
    "width": 919,
    "height": 602,
    "mime_type": "image/jpeg",
    "size": 576761,
    "time_taken": "2015-06-07 22:50:36",
    "modified_time": "2015-06-08 05:50:38",
    "source_url": "https://farm1.staticflickr.com/333/18585231402_798c4247fe_o.jpg",
    "recent_time": "2015-06-07 22:50:36",
    "thumbnail_url": "https://farm1.staticflickr.com/333/18585231402_eac0b3fe77_z.jpg"
}
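
The actual script that feeds the queue is proprietary and written in PHP, so the following is only a minimal Node.js sketch of the same idea, assuming the azure-storage npm package and a hypothetical queue named kwilt-images:

var azure = require('azure-storage');

// Connect to the configured Storage Account using its name and key.
var queueService = azure.createQueueService(
  process.env.STORAGE_ACCOUNT_NAME,
  process.env.STORAGE_ACCOUNT_KEY);

// A new entry extracted from the Kwilt database, already converted to a JSON-friendly object.
var entry = {
  id: 68682364,
  type: 'image',
  name: '010309_0800_6154_nals',
  thumbnail_url: 'https://farm1.staticflickr.com/333/18585231402_eac0b3fe77_z.jpg'
};

// Make sure the queue exists, then enqueue the message that will trigger the Azure Function.
queueService.createQueueIfNotExists('kwilt-images', function (err) {
  if (err) { throw err; }
  queueService.createMessage('kwilt-images', JSON.stringify(entry), function (err) {
    if (err) { throw err; }
    console.log('Queued image ' + entry.id + ' for analysis.');
  });
});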

Once the message is in the queue, it triggers the Azure Function (below is a screen capture of the Azure Function configuration).

[image: Azure Function configuration]
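
For reference, here is a sketch of what the function.json bindings behind that configuration might look like. The queue name, connection setting names, database, and collection are assumptions; the binding names match the message parameter and the outputDocument binding used in the function code below.

{
  "bindings": [
    {
      "name": "message",
      "type": "queueTrigger",
      "direction": "in",
      "queueName": "kwilt-images",
      "connection": "AzureWebJobsStorage"
    },
    {
      "name": "outputDocument",
      "type": "documentDB",
      "direction": "out",
      "databaseName": "kwilt",
      "collectionName": "images",
      "createIfNotExists": true,
      "connection": "CosmosDBConnection"
    }
  ],
  "disabled": false
}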

Once the data is in the queue for analysis, we leverage Azure Functions to send the info to Cognitive Services to analyze the image. Of course, since we want to follow proper DevOps practices, we have configured Azure Functions for continuous integration from a GitHub repository set up for the various functions.

var https = require('https');

module.exports = function (context, message) {
  logInfo(context, 'Analyzing Image Id: ' + message.id);
  logVerbose(context, 'Queue Message:\n' + JSON.stringify(message));

  // Validate Configuration
  if (!process.env.OcpApimSubscriptionKey) {
    throwError(context, 'Missing Configuration, OcpApimSubscriptionKey not configured in Application Settings.');
  }

  // Validate Message
  if (!message.thumbnail_url) {
    throwError(context, 'Invalid Message, thumbnail_url missing.');
  }

  // Define Vision API options
  var options = {
    host: 'westus.api.cognitive.microsoft.com',
    port: 443,
    path: '/vision/v1.0/analyze?visualFeatures=Categories,Tags,Description,Faces,ImageType,Color,Adult&details=&language=en',
    method: 'POST',
    headers: {
      'content-type': 'application/json',
      'Ocp-Apim-Subscription-Key': process.env.OcpApimSubscriptionKey
    }
  };

  logVerbose(context, 'Thumbnail Url: ' + message.thumbnail_url);

  var req = https.request(options, function (res) {
    res.setEncoding('utf8');

    res.on('data', function (data) {
      logVerbose(context, 'Vision API Response\n' + data);

      var visionData = JSON.parse(data);

      // Was the image successfully processed
      switch (res.statusCode) {
        case 200: // Success
          updateFileMetaData(message, visionData);
          break;
        case 400: // Error processing image
          var errorMessage = visionData.message ? visionData.message : "Unknown error processing image";
          logInfo(context, errorMessage);
          updateFileMetaData(message, null, visionData);
          break;
        case 403: // Out of call volume quota
          context.done(new Error('Out of call volume quota'));
          return;
        case 429: // Rate limit is exceeded
          context.done(new Error('Rate limit is exceeded'));
          return;
      }

      // Set the object to be stored in Document DB
      context.bindings.outputDocument = JSON.stringify(message);

      context.done();
    });
  });

  req.on('error', function (e) {
    logVerbose(context, 'Vision API Error\n' + JSON.stringify(e));
    throwError(context, e.message);
  });

  // write data to request body
  var data = {
    url: message.thumbnail_url
  };

  req.write(JSON.stringify(data));
  req.end();
};


function updateFileMetaData(message, visionData, error) {
  // Document DB requires ID to be a string
  // Convert message id to string
  message.id = message.id + '';

  // Keep a record of the raw/unedited Vision data
  message['azure_vision_data'] = {
    timestamp: new Date().toISOString().replace(/T/, ' ').replace(/\..+/, ''),
    data: visionData,
    error: error
  };

  if (visionData) {
    // Flatten/append vision data to the file object
    message['isAdultContent'] = visionData.adult.isAdultContent;
    message['isRacyContent'] = visionData.adult.isRacyContent;
    message['auto_tags'] = extractConfidenceList(visionData.tags, 'name', 0.1);
    message['auto_categories'] = visionData.categories ? extractConfidenceList(visionData.categories, 'name', 0.1) : [];
    message['auto_captions'] = extractConfidenceList(visionData.description.captions, 'text', 0.1);
    message['auto_description_tags'] = visionData.description.tags;
    message['auto_dominantColorForeground'] = visionData.color.dominantColorForeground;
    message['auto_dominantColorBackground'] = visionData.color.dominantColorBackground;
    message['auto_accentColor'] = visionData.color.accentColor;
    message['auto_isBWImg'] = visionData.color.isBWImg;
    message['auto_clipArtType'] = visionData.imageType.clipArtType;
    message['auto_lineDrawingType'] = visionData.imageType.lineDrawingType;
  }

  // Convert existing tags field from comma-separated string to array
  if (message.tags && typeof message.tags === 'string') {
    message.tags = message.tags.split(',');
  } else {
    message.tags = [];
  }

  // Azure Search requires location to be a single field
  if (message.latitude && typeof message.latitude === 'number') {
    message['location'] = {
      type: 'Point',
      coordinates: [message.longitude, message.latitude]
    }
  }
}

function throwError(context, message) {
  logVerbose(context, 'Error: ' + message);
  throw new Error(message);
}

function logInfo(context, message) {
  context.log('+[Info] ' + message);

}

function logVerbose(context, message) {
  if (process.env.VerboseLogging) {
    context.log('![Verbose] ' + message);
  }
}

// Extracts a list of values by field from an array of objects
// where the confidence value is greater than or equal to the
// optional minConfidenceValue.
function extractConfidenceList(objArray, field, minConfidenceValue) {
  if (Object.prototype.toString.call(objArray) !== '[object Array]') {
    throw new Error("objArray (type: " + Object.prototype.toString.call(objArray) + ") in extractConfidenceList is not an array.");
  }

  if (!field || typeof field !== 'string') {
    throw new Error("field in extractConfidenceList is missing or not an string.");
  }

  // If no min confidence value is specified or the value is undefined, set it to 0
  if (!minConfidenceValue) { minConfidenceValue = 0; }

  var list = new Array();

  objArray.forEach(function (obj) {
    // Do we need to do a confidence check?
    if (minConfidenceValue > 0 && typeof obj['confidence'] === 'number') {
      // Is confidence >= min required?
      if (obj['confidence'] >= minConfidenceValue) {
        list.push(obj[field]);
      }
    }
    else {
      // No check needed push field into array
      list.push(obj[field]);
    }
  });

  return list;
}
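
For illustration, here is how extractConfidenceList behaves on a small sample of tags (the values below are made up): only entries whose confidence meets the minimum are kept.

var sampleTags = [
  { name: 'outdoor', confidence: 0.99 },
  { name: 'city', confidence: 0.62 },
  { name: 'blurry', confidence: 0.05 }
];

// Keeps only tag names with confidence >= 0.1
console.log(extractConfidenceList(sampleTags, 'name', 0.1));
// -> [ 'outdoor', 'city' ]   ('blurry' is dropped because 0.05 < 0.1)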

Analysis of the images (visual features and details) is configured in the function where the Vision API HTTP call options are defined.

// Define Vision API options
  var options = {
    host: 'westus.api.cognitive.microsoft.com',
    port: 443,
    path: '/vision/v1.0/analyze?visualFeatures=Categories,Tags,Description,Faces,ImageType,Color,Adult&details=&language=en',
    method: 'POST',
    headers: {
      'content-type': 'application/json',
      'Ocp-Apim-Subscription-Key': process.env.OcpApimSubscriptionKey
    }
  };

In this case the following features were configured in English (only English and Chinese are available for the API):

  • Categories – categorizes image content according to a taxonomy defined in the documentation.

  • Tags – tags the image with a detailed list of words related to the image content.

  • Description – describes the image content with a complete English sentence.

  • Faces – detects if faces are present. If they are, it generates coordinates, gender, and age.

  • ImageType – detects if the image is clip art or a line drawing.

  • Color – determines the accent color, dominant color, and whether the image is black & white.

  • Adult – detects if the image is pornographic in nature (depicts nudity or a sex act). Sexually suggestive content is also detected.
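
If only a subset of these features is needed, the request can be trimmed accordingly. The variation below is purely hypothetical (not what the project used) and only requests Tags and Description, plus the Celebrities domain model:

  var options = {
    host: 'westus.api.cognitive.microsoft.com',
    port: 443,
    path: '/vision/v1.0/analyze?visualFeatures=Tags,Description&details=Celebrities&language=en',
    method: 'POST',
    headers: {
      'content-type': 'application/json',
      'Ocp-Apim-Subscription-Key': process.env.OcpApimSubscriptionKey
    }
  };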

Once the analysis is complete, the resulting JSON message is stored in Cosmos DB.

Here is the result of processing the following image:

[image: photo processed by the Vision API]

The following is the result of the analysis:

{
	"sorting_time": "2016-06-19 15:59:58",
	"type": "image",
	"id": "68772289",
	"name": "Photo Jun 19, 15 59 58.jpg",
	"created_time": "2016-07-28 15:54:59",
	"modified_time": "2016-07-28 15:55:16",
	"size": 2289041,
	"mime_type": "image/jpeg",
	"latitude": 46.8124,
	"longitude": -71.2038,
	"geoname_id": 6325494,
	"city": "Québec",
	"state": "Quebec",
	"state_code": "QC",
	"country": "Canada",
	"country_code": "CA",
	"time_taken": "2016-06-19 15:59:58",
	"height": 3264,
	"width": 2448,
	"thumbnail_url": "",
	"recent_time": "2016-06-19 15:59:58",
	"tags": [
		"town",
		"property",
		"road",
		"neighbourhood",
		"residential area",
		"street"
	],
	"account_id": 111193,
	"storage_provider_id": 105156,
	"altitude": 53,
	"camera_make": "Apple",
	"camera_model": "iPhone 6",
	"azure_vision_data": {
		"timestamp": "2017-03-29 15:59:55",
		"data": {
			"categories": [
				{
					"name": "outdoor_street",
					"score": 0.96484375
				}
			],
			"adult": {
				"isAdultContent": false,
				"isRacyContent": false,
				"adultScore": 0.007973744533956051,
				"racyScore": 0.010262854397296906
			},
			"tags": [
				{
					"name": "outdoor",
					"confidence": 0.9993922710418701
				},
				{
					"name": "sky",
					"confidence": 0.9988007545471191
				},
				{
					"name": "building",
					"confidence": 0.9975806474685669
				},
				{
					"name": "street",
					"confidence": 0.9493720531463623
				},
				{
					"name": "walking",
					"confidence": 0.9154794812202454
				},
				{
					"name": "sidewalk",
					"confidence": 0.8519290685653687
				},
				{
					"name": "people",
					"confidence": 0.7953380942344666
				},
				{
					"name": "way",
					"confidence": 0.7908639311790466
				},
				{
					"name": "scene",
					"confidence": 0.7276134490966797
				},
				{
					"name": "city",
					"confidence": 0.624116063117981
				}
			],
			"description": {
				"tags": [
					"outdoor",
					"building",
					"street",
					"walking",
					"sidewalk",
					"people",
					"road",
					"city",
					"narrow",
					"bicycle",
					"man",
					"group",
					"woman",
					"standing",
					"old",
					"pedestrians",
					"holding",
					"platform",
					"parked",
					"carriage",
					"riding",
					"train",
					"clock"
				],
				"captions": [
					{
						"text": "a group of people walking down a narrow street",
						"confidence": 0.8872056096672615
					}
				]
			},
			"requestId": "38fa30e6-2a50-4a7f-b780-e6472c6d1a52",
			"metadata": {
				"width": 600,
				"height": 800,
				"format": "Jpeg"
			},
			"faces": [],
			"color": {
				"dominantColorForeground": "Grey",
				"dominantColorBackground": "Grey",
				"dominantColors": [
					"Grey",
					"White"
				],
				"accentColor": "2C759F",
				"isBWImg": false
			},
			"imageType": {
				"clipArtType": 0,
				"lineDrawingType": 0
			}
		}
	},
	"isAdultContent": false,
	"isRacyContent": false,
	"auto_tags": [
		"outdoor",
		"sky",
		"building",
		"street",
		"walking",
		"sidewalk",
		"people",
		"way",
		"scene",
		"city"
	],
	"auto_categories": [
		"outdoor_street"
	],
	"auto_captions": [
		"a group of people walking down a narrow street"
	],
	"auto_description_tags": [
		"outdoor",
		"building",
		"street",
		"walking",
		"sidewalk",
		"people",
		"road",
		"city",
		"narrow",
		"bicycle",
		"man",
		"group",
		"woman",
		"standing",
		"old",
		"pedestrians",
		"holding",
		"platform",
		"parked",
		"carriage",
		"riding",
		"train",
		"clock"
	],
	"auto_dominantColorForeground": "Grey",
	"auto_dominantColorBackground": "Grey",
	"auto_accentColor": "2C759F",
	"auto_isBWImg": false,
	"auto_clipArtType": 0,
	"auto_lineDrawingType": 0,
	"location": {
		"type": "Point",
		"coordinates": [
			-71.2038,
			46.8124
		]
	}
}

Once this analysis is stored in Cosmos DB, it can be indexed and searched using Azure Search. As you can see in the following screen captures, the Kwilt team was able to digest the analysis and, with Azure Search, build an extremely user-friendly search experience for their users.
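
One way to wire Cosmos DB to Azure Search is with an Azure Search indexer that crawls the collection on a schedule. The calls below are only a sketch of that setup against the Azure Search REST API (api-version 2016-09-01); the service, data source, index, and collection names are placeholders.

POST https://<search-service>.search.windows.net/datasources?api-version=2016-09-01
api-key: <admin-key>
Content-Type: application/json

{
  "name": "photos-datasource",
  "type": "documentdb",
  "credentials": {
    "connectionString": "AccountEndpoint=https://<cosmos-account>.documents.azure.com;AccountKey=<key>;Database=<database>"
  },
  "container": { "name": "<collection>" }
}

POST https://<search-service>.search.windows.net/indexers?api-version=2016-09-01
api-key: <admin-key>
Content-Type: application/json

{
  "name": "photos-indexer",
  "dataSourceName": "photos-datasource",
  "targetIndexName": "photos-index",
  "schedule": { "interval": "PT15M" }
}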

In the project beta client, we were able to search by keywords (Food, Plates, Fireworks…) and by location (Gatineau) without any manual tagging of the pictures.
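
As a rough illustration of what such a query can look like, here is a minimal Node.js sketch that searches the index over the auto-generated fields; the service name, index name, and query key are hypothetical:

var https = require('https');

var options = {
  host: 'kwilt-search.search.windows.net',   // hypothetical Azure Search service name
  port: 443,
  path: '/indexes/photos-index/docs?api-version=2016-09-01' +
        '&search=fireworks' +                       // full-text search over auto_tags, auto_captions, etc.
        '&$select=name,auto_tags,auto_captions',    // only return a few fields
  method: 'GET',
  headers: { 'api-key': process.env.AzureSearchQueryKey }
};

https.request(options, function (res) {
  res.setEncoding('utf8');
  var body = '';
  res.on('data', function (chunk) { body += chunk; });
  res.on('end', function () {
    console.log(JSON.parse(body).value); // matching photo documents
  });
}).end();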

[image: keyword and location search results in the beta client]

All I can say is that I cannot wait to process my own photo streams through this service. I've already installed the app on my phone…

Cheers!!


Pierre Roman
@pierreroman

