How to build an end-to-end scalable visual search system with AI Computer Vision and AWS

With the rise of e-commerce and online retail, visual search is a fast-growing trend because both are largely driven by visual content. In this article, I will share a visual search project that was built in 2018 for my Japanese partner.

What are the problems?

My partner is one of the biggest retailers of toys, clothing, and baby products in Japan, with many stores all over the world. During holidays, long checkout queues form in many of those stores. To solve this problem, they decided to open a new kind of store modeled on Amazon Go: no queues, no checkout, just walk out of the store. This not only improves the customers' buying experience but also helps grow revenue.

Why do we need Visual Search?

My partner works with many providers, each distributing different types of products. The product catalog is updated frequently: weekly, monthly, or yearly. The visual search system therefore has to keep up with the catalog as it grows and changes.

What is the core technology here?

By the time I started this project in 2018, Triplet Loss had proved effective in Face Recognition. With that starting point, we decided to use Triplet Loss as our core visual search technology, because our problem is quite similar to Face Recognition.

To get the highest accuracy and keep the system scalable, we classify products into categories and subcategories, because Triplet Loss performs best when all the items belong to the same domain. In Face Recognition, for example, all the images are faces. In our problem, we train all the toys from a similar domain with one AI model: the Lego toys are trained together, the figures are trained together, and so on.

One of the biggest advantages of this approach is that when new products are added, the system does not need to retrain the whole model. The model learns an embedding space rather than a fixed set of classes, so a new product only needs reference images embedded with the existing model. This reduces cost, since training an AI model is normally expensive and time-consuming.
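To make the idea concrete, here is a minimal NumPy sketch of a triplet loss and of how a new product could be added without retraining. It is an illustration under assumptions, not the project's code: the squared Euclidean distance, the margin value of 0.2, the function names, and the dictionary-based gallery are all hypothetical choices.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss on embedding vectors.

    Uses squared Euclidean distance; the margin of 0.2 is an assumed value.
    The loss is zero once the negative is farther from the anchor than the
    positive by at least the margin.
    """
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)
    return np.maximum(d_pos - d_neg + margin, 0.0)

def enroll_product(gallery, product_id, embeddings):
    """Add a new product by storing embeddings from the existing model.

    `gallery` is a hypothetical dict mapping product id to an array of
    reference embeddings. No model weights change in this step.
    """
    gallery[product_id] = np.asarray(embeddings, dtype=float)
    return gallery
```

With an anchor at the origin and a positive at (0.1, 0), a negative at (1, 0) is already far enough away and contributes no loss, while a negative at (0.2, 0) is too close and produces a positive loss that pushes it away during training.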

The architecture overview

The architecture has two main components: Serving and Training. The Training component is built to serve Admins, Operators, Providers, and Developers, each with different purposes. For any machine learning system in production, one of the most challenging parts is getting it to run automatically with minimal human intervention. The Serving component hosts all the APIs needed by our web app and mobile app.

Triplet Loss training strategy

Our strategy for getting the best model is as follows. First, we cluster the categories so that categories with similar types of samples are grouped together. During training, for each anchor image we pick the hardest positive and the hardest negative in the image batch. For matching, we keep multiple anchors per category and compute the centroid of those anchors.

To reduce computation cost, we first match a query against the anchor centroids using two thresholds: a lower threshold, TL, and a higher threshold, TH. If the distance is lower than TL, we immediately decide that the product exists; if it is higher than TH, we immediately decide that it does not exist. Only distances between the two thresholds need further comparison.

In the end, the model achieved very good accuracy: over 99 percent on trained data and 97 percent on untrained data.
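The mining and matching steps described above can be sketched in NumPy as follows. This is a hedged illustration rather than the production implementation: the function names, the Euclidean distance, the "uncertain" fallback label, and the assumption that every label in a batch appears at least twice alongside at least one other label are all mine.

```python
import numpy as np

def pairwise_distances(emb):
    """Euclidean distance between every pair of rows in emb (shape n x d)."""
    diff = emb[:, None, :] - emb[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))

def batch_hard_triplets(emb, labels):
    """For each anchor, return the index of its hardest positive and negative.

    Assumes every label occurs at least twice in the batch and that the
    batch contains at least two different labels.
    """
    labels = np.asarray(labels)
    dist = pairwise_distances(emb)
    same = labels[:, None] == labels[None, :]
    not_self = ~np.eye(len(labels), dtype=bool)
    # Hardest positive: the farthest sample with the same label.
    pos_dist = np.where(same & not_self, dist, -np.inf)
    # Hardest negative: the closest sample with a different label.
    neg_dist = np.where(~same, dist, np.inf)
    return pos_dist.argmax(axis=1), neg_dist.argmin(axis=1)

def centroid_decision(query, centroids, t_low, t_high):
    """Match a query embedding against per-category anchor centroids.

    `centroids` is a hypothetical dict mapping category name to centroid.
    Returns ("exists", name) below t_low, ("not_found", None) above t_high,
    and ("uncertain", name) in between, where a finer check is still needed.
    """
    names = list(centroids)
    dists = np.array([np.linalg.norm(query - centroids[n]) for n in names])
    best = int(dists.argmin())
    if dists[best] < t_low:
        return "exists", names[best]
    if dists[best] > t_high:
        return "not_found", None
    return "uncertain", names[best]
```

The centroid check costs one distance computation per category instead of one per stored anchor, which is where the saving comes from; the threshold values themselves would have to be tuned on validation data.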

Demo

A video is worth a million words, so please see the demo below of how we increase customer engagement with AR and AI.

Conclusion

This article focused on sharing the overall flow, the architecture, and an example use case of AI applied in the real world. If you are curious about the technology, the development, or the business side, please follow my next articles.
