Overview of Azure resources for AI applications



Find out about the main Azure services to build language, vision and search AI apps

Jan 17, 2022·


Navigating cloud providers to work out how to build your AI app can be daunting. This post goes over the main Azure AI resources for different AI applications.


1. Computer Vision

Computer Vision is the main Azure resource for image applications. In its most basic form, it supports image classification and multiclass object detection.


Its first interesting feature is image analysis, which extracts several kinds of information from an image. It can detect brands, celebrities and locations, and estimate the gender and age of the people in the image. It can also generate a text description, detect the color scheme, produce a thumbnail using smart cropping and much more.
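To make the image analysis call more concrete, here is a minimal sketch that builds (but does not send) a request to the Image Analysis REST endpoint. The `/vision/v3.2/analyze` path, the `visualFeatures` parameter and the `Ocp-Apim-Subscription-Key` header are based on the v3.2 REST API as I understand it. The endpoint, key and image URL are placeholders, and `build_analyze_request` is a hypothetical helper name, so check the current documentation before relying on any of it.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_analyze_request(endpoint, key, image_url,
                          features=("Description", "Tags", "Color")):
    """Build (but do not send) an Image Analysis request for a public image URL."""
    query = urlencode({"visualFeatures": ",".join(features)})
    url = f"{endpoint.rstrip('/')}/vision/v3.2/analyze?{query}"
    body = json.dumps({"url": image_url}).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": key,  # resource key from the Azure portal
        "Content-Type": "application/json",
    }
    return Request(url, data=body, headers=headers, method="POST")


# Placeholder values: replace with your own resource endpoint and key.
request = build_analyze_request(
    "https://example-resource.cognitiveservices.azure.com",
    "YOUR_KEY",
    "https://example.com/street.jpg",
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it. The JSON response should then contain one section per requested visual feature.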


Additionally, it can read text inside the image with OCR techniques. This can be used for further text analysis with NLP tools, content moderation etc.

Another very useful feature is spatial analysis: Computer Vision can analyze how people move in a space in real time, for occupancy counting, social distancing and face mask detection.

It also integrates with Face API, explained below.

Computer Vision has many integrations and can be deployed in containers.

2. Custom Vision


Custom Vision is Azure's simplest image classification and object detection tool. You can import your images, label them (Custom Vision recommends at least 50 images per label) and quickly train a state-of-the-art model on your data.

You can explore the model's precision and recall results, and publish the model to an endpoint where you can quickly access it and obtain predictions.
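As a reminder of what those two metrics mean, here is a small worked example. It is plain Python rather than anything from the Custom Vision SDK, and the counts are made up for illustration.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) from raw prediction counts."""
    predicted = true_positives + false_positives  # everything the model flagged
    actual = true_positives + false_negatives     # everything it should have flagged
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Illustrative counts: 40 correct detections, 10 false alarms, 20 misses.
precision, recall = precision_recall(true_positives=40,
                                     false_positives=10,
                                     false_negatives=20)
print(f"precision={precision:.2f} recall={recall:.2f}")
# prints "precision=0.80 recall=0.67"
```

High precision means few false alarms, and high recall means few misses. Which one matters more depends on your application.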

3. Face API


This is Azure's advanced facial recognition tool. It supports several face capabilities:

  • Face detection: detect faces in image
  • Face identification: search and identify faces
  • Face verification: check that two faces belong to the same person
  • Face similarity: given a face/person, find similar faces/persons
  • Face grouping: organize unidentified faces into groups based on similarity

When analyzing an image, Face API can extract the following features:

  • Face location: a bounding box showing the location of the face
  • Face landmarks: collections of detailed points on a face, including eye position
  • Face attributes: age, gender, hair color, mask detection, accessories, emotion, facial hair, glasses, head pose, makeup, smile
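To illustrate the face location feature, here is a sketch that converts a detection result into corner coordinates for drawing a box. It assumes the detection response is a list of faces, each with a `faceRectangle` holding `top`, `left`, `width` and `height`, which is how I understand the Face REST API. The sample data and the `to_corners` helper are hypothetical.

```python
# Hypothetical sample shaped like a face detection response.
sample_faces = [
    {
        "faceId": "hypothetical-id-1",
        "faceRectangle": {"top": 50, "left": 100, "width": 80, "height": 90},
    },
]


def to_corners(face):
    """Convert a faceRectangle into (left, top, right, bottom) pixel corners."""
    rect = face["faceRectangle"]
    return (
        rect["left"],
        rect["top"],
        rect["left"] + rect["width"],
        rect["top"] + rect["height"],
    )


boxes = [to_corners(face) for face in sample_faces]
print(boxes)  # prints [(100, 50, 180, 140)]
```

Most image libraries expect this corner format when drawing rectangles.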

4. Video Analyzer

Video Analyzer is Azure's main tool for video analytics. It can extract actionable insights from videos.

The integrated AI models extract accurate and meaningful data. They use spatial analysis to understand people's movements in a physical space in real time. With the resulting metadata, you can build timeline-based visualizations and heatmaps, and detect anomalies.


Video Analyzer is mostly used for workplace safety, digital asset management and process optimization.

This tool can extract the following image features from videos:

  • Deep search: allows search across a video library. For example, indexing spoken words and faces can enable the search experience of finding moments in a video where a person spoke certain words or when two people were seen together
  • Face detection
  • Celebrity identification
  • Visual text recognition
  • Visual content moderation
  • Scene segmentation
  • Rolling credits
  • etc.

Regarding audio insights, Video Analyzer can generate the following insights:

  • Audio transcription
  • Automatic language detection
  • Multi-language speech identification and transcription
  • Closed captioning
  • Noise reduction
  • Speaker statistics
  • Emotion detection
  • etc.


Azure has many resources for NLP applications. The basic ones include:

  • Language detection
  • Key-phrase extraction
  • Sentiment analysis
  • Named entity recognition
  • Entity linking
  • Text translation
  • Question answering
  • Content moderation
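Several of these basic features accept a batch of documents in the same JSON shape. The sketch below builds such a payload. It assumes the Text Analytics v3.1 REST conventions as I understand them, for example a `POST` to `/text/analytics/v3.1/sentiment` on your resource endpoint. `build_documents` is a hypothetical helper name.

```python
import json


def build_documents(texts, language="en"):
    """Wrap raw strings in the {"documents": [...]} batch shape."""
    return {
        "documents": [
            {"id": str(index), "language": language, "text": text}
            for index, text in enumerate(texts, start=1)
        ]
    }


payload = build_documents(["The hotel was great.", "The flight was delayed again."])
print(json.dumps(payload, indent=2))
```

Each result in the response is matched back to its input through the `id` field, so the ids must be unique within a batch.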

1. LUIS

LUIS (Language Understanding) is Azure's flagship NLP app for language understanding. It enables language interactions between users and conversational AI tools, such as chatbots. LUIS can interpret user goals and extract key information from conversational phrases.

With LUIS you can build an enterprise-grade conversational bot, a commerce chatbot or control IoT devices using a voice assistant.

In your LUIS app, you can create intents (for example, BookFlight) and the utterances that should trigger each one. You can add entities to extract more detail from an utterance, e.g. days of the week, destinations or airport names.
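To show how intents and entities come back to your code, here is a sketch that reads a prediction result. The response shape (`prediction.topIntent`, `prediction.intents`, `prediction.entities`) follows the V3 prediction API as I understand it. The sample values, the `Destination` and `Day` entity names, the `route` helper and the 0.5 threshold are all made up for illustration.

```python
# Hypothetical sample shaped like a LUIS V3 prediction response.
sample_response = {
    "query": "Book a flight to Paris on Friday",
    "prediction": {
        "topIntent": "BookFlight",
        "intents": {"BookFlight": {"score": 0.97}, "None": {"score": 0.02}},
        "entities": {"Destination": ["Paris"], "Day": ["Friday"]},
    },
}


def route(response, threshold=0.5):
    """Return (intent, entities), falling back to "None" on low confidence."""
    prediction = response["prediction"]
    intent = prediction["topIntent"]
    score = prediction["intents"][intent]["score"]
    if score < threshold:
        return "None", {}
    return intent, prediction["entities"]


intent, entities = route(sample_response)
print(intent, entities)
# prints "BookFlight {'Destination': ['Paris'], 'Day': ['Friday']}"
```

A bot would typically dispatch on the returned intent and use the entities to fill in the details of the action.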

After you have defined intents and sample utterances, train the app. You can add the integration of speech and spell check. You can test the app with different utterances to check how the app performs. You can enable active learning and improve the app's performance with LUIS's prediction insights.

When the model is good enough, you can publish it and integrate it with Azure's Bot framework or QnA Maker.

2. Bot framework


This is the service to create bots, for instance using LUIS in the backend.

You can use the open-source SDK and tools to easily connect your bot to popular channels and devices.


3. QnA Maker


QnA Maker lets you publish a simple question and answer bot based on existing FAQ URLs, structured documents and product manuals.

Without any prior experience, you can use QnA Maker's automatic extraction tool to extract question-answer pairs from semi-structured content, including FAQ pages, support websites, Excel files and SharePoint documents.

QnA Maker lets you design complex multi-turn conversations easily through the QnA Maker portal or the REST APIs. It enables active learning and supports more than 50 languages.


1. Speech to text

With Azure Speech to text, you can get accurate audio-to-text transcriptions with state-of-the-art speech recognition. You can add specific words to your base vocabulary or build your own speech-to-text models. Speech to text can run in the cloud or at the edge in containers.

2. Text to speech

Text to speech allows you to synthesize speech from text with more than 270 neural voices across 119 languages and variants. You can adjust intonation, voice type and many other features.
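Adjustments like these are usually expressed in SSML, the W3C markup that the speech service accepts. The sketch below builds an SSML document that selects a voice and changes rate and pitch with a `prosody` element. The voice name `en-US-JennyNeural` is one neural voice I believe is available, and `build_ssml` is a hypothetical helper, so treat the details as assumptions to verify.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, voice="en-US-JennyNeural", rate="medium", pitch="medium"):
    """Build an SSML document with one voice and one prosody adjustment."""
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        'xml:lang="en-US">'
        f"<voice name={quoteattr(voice)}>"
        f"<prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)}>"
        f"{escape(text)}"  # escape &, < and > so the XML stays well-formed
        "</prosody></voice></speak>"
    )


ssml = build_ssml("Hello & welcome!", rate="-10%", pitch="+2st")
root = ET.fromstring(ssml)  # raises ParseError if the markup is malformed
print(ssml)
```

The resulting string is what you would send to the service in place of plain text.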


This is just a general overview of the Azure AI resources, covering vision, language and speech.

Thank you for reading and follow me on Twitter 🚀
