Image Recognition in Apps: Why and How to Use

Image recognition technology is used more and more often in our everyday lives. Companies use it to solve a wide range of problems, from security to customer satisfaction research. The global image recognition market has been estimated at $28.3 billion and is expected to reach $126.8 billion by 2032, a CAGR of 16.5%. Demand is driven by businesses and industries that want to shift from manual operations to automated processes, as well as by constant progress in deep learning and computer vision.

The Usage of Image Recognition

Below are examples of how image recognition is used in different spheres.

Healthcare

Illness changes everybody’s looks, and not for the better. But some diseases alter appearance more than others. By comparing a patient’s facial distortions with those stored in a database, doctors can make a more precise diagnosis and even estimate the intensity of the pain the patient suffers from. Apps like AiCure record whether the patient has taken their medications. And Orcam MyEye is a real rescue for people with poor eyesight: the app tells the user about everything it ‘sees’.

Computer vision detecting the consumption of medication. Image from www.ncbi.nlm.nih.gov

Tourism

Have you seen an amazing photo on Instagram and wanted to dine in the same cafe? What about uploading the image into an app that brings you to that place? Based on Instagram photos, the Jetpac City Guides service even creates top lists for tourists, complete with descriptions and reviews.

Transport

Image recognition is the function without which a driverless car won’t cover even the smallest distance without an accident. But if your fleet is operated by human drivers, you can still make use of this machine learning technology, for example, to prevent unauthorized use of vehicles or to detect drivers’ inattentiveness.

E-commerce

This is where image recognition is most widely applied. Online shops eBuy and Bohoo help users find ‘that very’ item even when the client can’t remember its name. And apps such as Lookwish and NICE even let you virtually ‘try on’ clothes.

Do you believe it’s only for large, wealthy players? Indeed, implementing machine learning (which is the core of image recognition) from scratch can be troublesome. Luckily, there are public libraries that let you use ready-made models when developing your products. Firebase ML Kit is one of them, and below we will show how it can be used in app development.

What is Firebase ML Kit and How Does It Work

ML Kit is a mobile development kit from Google that brings machine learning to Android and iOS applications in a versatile, easily integrated package. There is no steep learning curve, whether you are a specialist or a beginner in machine learning: a feature can be implemented in a few lines of code, and you don’t need to remember complex neural network formulas or model optimization methods to get started. At the same time, experienced machine learning developers can integrate their own TensorFlow Lite models into their mobile applications through the convenient APIs provided by ML Kit.
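
For instance, a custom TensorFlow Lite classifier bundled with the app can be plugged into the image labeling API roughly like this. This is only a sketch: the image-labeling-custom dependency version, the model file name, the threshold, and the result count are our assumptions, not something taken from this article.

// Assumes: implementation 'com.google.mlkit:image-labeling-custom:17.0.2' (version is an assumption)
// and a model bundled at app/src/main/assets/model.tflite
val localModel = LocalModel.Builder()
   .setAssetFilePath("model.tflite")
   .build()

val customOptions = CustomImageLabelerOptions.Builder(localModel)
   .setConfidenceThreshold(0.5f)   // assumed threshold
   .setMaxResultCount(5)           // assumed result count
   .build()

val customLabeler = ImageLabeling.getClient(customOptions)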

ML Kit can work both online (in this case you get free access to a much larger database, but with a limited number of requests: only the first thousand are free of charge) and offline. Functions such as text, barcode, and image recognition are available both online and offline. Landmark recognition (well-known buildings, rivers, streets, etc.) is available only online, while face recognition works only offline, on the device.

How to Integrate Image Recognition into the App

As an example, let’s start with a barcode scanning:

First, you have to add the ML Kit barcode scanning dependency to your Gradle file.

implementation 'com.google.android.gms:play-services-mlkit-barcode-scanning:18.3.0'

The next step is to initialize the options for our barcode scanning client. You can define which barcode formats you want to handle; the full list of supported formats can be found in the ML Kit documentation.

private val options = BarcodeScannerOptions.Builder()
   .setBarcodeFormats(Barcode.FORMAT_QR_CODE)
   .build()
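
If you need to handle more than one format, the same builder accepts several of them; the particular formats below are just an arbitrary example:

// Example options that accept both QR codes and EAN-13 barcodes (format choice is arbitrary)
private val multiFormatOptions = BarcodeScannerOptions.Builder()
   .setBarcodeFormats(Barcode.FORMAT_QR_CODE, Barcode.FORMAT_EAN_13)
   .build()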

Also, if you want to detect all barcodes, even those that can’t be decoded, you can enable their detection with the `BarcodeScannerOptions.Builder` method `enableAllPotentialBarcodes()`, as sketched below.
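
Such options could look like this (using FORMAT_ALL_FORMATS here is our assumption):

// Options that report every detected barcode, even ones that can't be decoded
private val allBarcodesOptions = BarcodeScannerOptions.Builder()
   .setBarcodeFormats(Barcode.FORMAT_ALL_FORMATS)
   .enableAllPotentialBarcodes()
   .build()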

Now you can use these options in a function that will detect our barcodes on an InputImage object.

private suspend fun detectBarCodes(image: InputImage): List<Barcode> {
   return BarcodeScanning.getClient(options)
       .process(image)
       .await()
}
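
Note that `await()` on a Play Services Task is an extension function from the kotlinx-coroutines-play-services artifact, so the snippets in this article assume a dependency along these lines (the version number is our assumption):

implementation 'org.jetbrains.kotlinx:kotlinx-coroutines-play-services:1.7.3'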

And let's use this function to get all barcodes and update our UI.

private fun proceedPhoto(photo: Uri) {
   lifecycleScope.launch {
       val barcodes = detectBarCodes(InputImage.fromFilePath(requireContext(), photo))
       for (barcode in barcodes) {
           updateUi(barcode)
       }
   }
}

The last step is to process the barcode and update the UI. Each barcode has its type; the full list can be found in the ML Kit documentation.

And here is the handling of the most common types of barcodes.

private fun updateUi(barcode: Barcode) {
   when (barcode.valueType) {
       Barcode.TYPE_WIFI -> {
           val ssid = barcode.wifi?.ssid
           val password = barcode.wifi?.password
           // Do something with this data
       }

       Barcode.TYPE_EMAIL -> {
           val email = barcode.email?.address
           // Do something with this data
       }

       Barcode.TYPE_PHONE -> {
           val phone = barcode.phone?.number
           // Do something with this data
       }

       Barcode.TYPE_URL -> {
           val url = barcode.url?.url
           // Do something with this data
       }

       Barcode.TYPE_GEO -> {
           val point = barcode.geoPoint
           // Do something with this data
       }

       else -> {
           val raw = barcode.rawValue
           // Do something with this data
       }
   }
}
 

The next one is Image Labeling. With ML Kit's image labeling APIs, you can detect and extract information about entities in an image across a broad group of categories. The default image labeling model can identify general objects, places, activities, animal species, products, and more.

The flow is pretty much the same with only a few differences.

Here is the dependency for the Image Labeling library.

implementation 'com.google.android.gms:play-services-mlkit-image-labeling:16.0.8'
 

This time we will create a client with the default options and process our image in the same way as in barcode scanning.

private suspend fun detectLabels(image: InputImage): List<ImageLabel> {
   return ImageLabeling.getClient(ImageLabelerOptions.DEFAULT_OPTIONS)
       .process(image)
       .await()
}

Or you can initialize your own options and set the required confidence level.

val options = ImageLabelerOptions.Builder()
   .setConfidenceThreshold(0.5F)
   .build()
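
These custom options would then replace ImageLabelerOptions.DEFAULT_OPTIONS when getting the client; for example (the function name here is our own):

private suspend fun detectLabelsWithOptions(image: InputImage): List<ImageLabel> {
   // Same call as above, but with the custom confidence threshold
   return ImageLabeling.getClient(options)
       .process(image)
       .await()
}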
 
 

Then we can process the photo and get the labels.

private fun proceedPhoto(photo: Uri) {
   lifecycleScope.launch {
       val labels = detectLabels(InputImage.fromFilePath(requireContext(), photo))
       labels.forEach {
           updateUi(it)
       }
   }
}
 
 
 

From the label, you can get information about labeled objects. You can get each label's text description, index among all labels supported by the model, and the confidence score of the match.

private fun updateUi(label: ImageLabel) {
   val labelText: String = label.text
   val labelIndex = label.index
   val labelConfidence: Float = label.confidence
   // Do something with this data
}
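
As a small usage example, here is a hypothetical helper that formats a label for display (the format string is our own choice):

// Hypothetical helper: turns a label into a string like "Cat (87%)"
private fun labelToDisplayText(label: ImageLabel): String =
   "${label.text} (${(label.confidence * 100).toInt()}%)"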
 
 

The last one we will look at is Text Translation. You need to declare the Internet permission in your manifest file so that the translation model can be downloaded.

<uses-permission android:name="android.permission.INTERNET" />
 
 

To translate text from an image you should first extract it. So, we will use two libraries: Text Recognition and Text Translation.

implementation 'com.google.android.gms:play-services-mlkit-text-recognition:19.0.0'
implementation 'com.google.mlkit:translate:17.0.2'
 
 

The Text Recognition library detects only Latin-script text by default. To recognize other scripts, such as Chinese, Devanagari, Japanese, or Korean, you should add separate dependencies:

implementation 'com.google.mlkit:text-recognition-korean:16.0.0'
implementation 'com.google.mlkit:text-recognition-chinese:16.0.0'
implementation 'com.google.mlkit:text-recognition-japanese:16.0.0'
implementation 'com.google.mlkit:text-recognition-devanagari:16.0.0'
 
 

First, let's create a function that extracts text from our image. You should use the recognizer options class that matches the script:

  • TextRecognizerOptions
  • KoreanTextRecognizerOptions
  • JapaneseTextRecognizerOptions
  • ChineseTextRecognizerOptions
  • DevanagariTextRecognizerOptions

private suspend fun detectText(image: InputImage): Text {
   return TextRecognition
       .getClient(TextRecognizerOptions.DEFAULT_OPTIONS)
       .process(image)
       .await()
}
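
Switching to another script only changes the options passed to getClient. For example, with the text-recognition-japanese dependency above, a Japanese variant could look like this (the function name is our own):

// Hypothetical variant for Japanese text, using the text-recognition-japanese dependency
private suspend fun detectJapaneseText(image: InputImage): Text {
   return TextRecognition
       .getClient(JapaneseTextRecognizerOptions.Builder().build())
       .process(image)
       .await()
}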
 
 
 

Then declare translator options and specify source language and target language.

val options = TranslatorOptions.Builder()
   .setSourceLanguage(TranslateLanguage.ENGLISH)
   .setTargetLanguage(TranslateLanguage.UKRAINIAN)
   .build()
 
 
 
 

Also, you should declare download conditions for the translation model, for example `requireWifi()` or `requireCharging()`.

val conditions = DownloadConditions.Builder()
   .requireWifi()
   .build()
 
 

The next step is to create the translation client, download the translation model, detect the text, and finally translate it.

private suspend fun translateText(image: InputImage): String {
   val translator = Translation.getClient(options)
   translator.downloadModelIfNeeded(conditions).await()
   val text = detectText(image)
   return translator.translate(text.text).await()
}
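
One detail the sketch above leaves out: the translator keeps its model loaded in memory, so once it is no longer needed it can be released, for example:

// Release the loaded translation model when the translator is no longer needed
translator.close()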
 
 

Now you can use this function to translate text on your image.

private fun proceedPhoto(photo: Uri) {
   lifecycleScope.launch {
       val translatedText = translateText(InputImage.fromFilePath(requireContext(), photo))
   }
}
 
 
 

As you can see, there is nothing complicated here, and most libraries follow the same algorithm:

  1. Add dependency
  2. Declare options
  3. Get the client by using options
  4. Process your image
  5. Process received data

That's all!

Let’s Sum Up

Machine learning, and image recognition in particular, is not necessarily something out of the ordinary. In your app, for example, it can be used at the stage when users add their photos during registration. All you need is to check whether the picture is appropriate, at the very least that it is not a photo of the user’s pet. Clients want to start using your product as soon as possible and don’t want to wait for a moderator to approve the photo.
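
With the labeling function shown earlier, such a check could be sketched like this. The label name and the threshold are assumptions; the exact label set of the default model may differ.

// Hypothetical sanity check for a registration photo, reusing detectLabels() from above.
// The "Person" label name and the 0.7 threshold are assumptions.
private suspend fun looksLikeProfilePhoto(image: InputImage): Boolean {
   val labels = detectLabels(image)
   return labels.any { it.text.equals("Person", ignoreCase = true) && it.confidence > 0.7f }
}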

Image recognition implementation is not always costly. You can use existing solutions, fully or partially, Firebase ML Kit being one of them. We hope the algorithm described here will help you do it without extra effort. And if you want to be fully confident in the result, contact our team: we will gladly help you with the development of web services and mobile apps.