
FinlyWealth
Find Me the Better Product! | Student Capstone Project
Ever tried finding the best deal online, only to struggle with the right keywords or miss a better option because you didn’t know exactly what to search for? FinlyWealth tackles this problem with a new natural-language, multimodal search engine that understands how shoppers actually describe what they’re looking for, with no perfect keywords required.
FinlyWealth was co-founded in 2022 by UBC graduates Abid Salahi and Kevin Shahnazari, an alumnus of the UBC Master of Data Science (Vancouver) program. FinlyWealth is expanding into e-commerce, making it easier for people to find the right products by letting them search the way they naturally speak or even with a picture.
To help with development, FinlyWealth worked with a group of UBC Master of Data Science (MDS) Vancouver students to build a fast, scalable multimodal search engine that lets users search using text or images to find the most relevant products.
Keyword-based search matches exact words rather than meanings, so it struggles with paraphrased queries. User queries are also highly diverse and hard to predict. There is therefore a need for better search across e-commerce products such as fashion-related items, books, and electronics.
FinlyWealth tasked the students with developing a search engine that supports natural language queries, responds in less than five seconds, and exposes reusable API endpoints, all built on a reproducible pipeline. While not part of the initial scope, a web interface, a larger data set, and an evaluation were nice-to-haves.
For their project, the students grouped user queries into three types: basic queries, attribute queries, and natural language queries. Basic queries are short and general; attribute queries specify attributes such as brand, colour, and size; and natural language queries are free-form text.
The data set the students worked with consisted of one million low-resolution .jpeg product images and a text file containing each product's name and attributes. TF-IDF (Term Frequency-Inverse Document Frequency) weighting was applied to the product names and metadata, ranking products according to term weight, term frequency, and text length.
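To make the TF-IDF ranking step concrete, here is a minimal sketch using only the Python standard library. The article does not describe the students' implementation, so everything here is an assumption for illustration: the function names, the whitespace tokenizer, the particular smoothed IDF formula, and the three made-up product texts.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; a real pipeline would clean text further."""
    return text.lower().split()


def tfidf_rank(query, documents):
    """Return document indices ordered by the summed TF-IDF weight of the query terms."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: the number of documents each term appears in.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for index, tokens in enumerate(tokenized):
        counts = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if counts[term] == 0:
                continue
            tf = counts[term] / len(tokens)  # length-normalized term frequency
            idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1  # one smoothed IDF variant
            score += tf * idf
        scores.append((score, index))

    # Highest score first; ties keep the original document order.
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [index for score, index in scores if score > 0]


# Hypothetical product texts (name plus attributes), not from the real catalog.
products = [
    "oak office desk with drawers",
    "desk lamp for office use",
    "blue running shoes size 10",
]
ranking = tfidf_rank("desk for office use", products)
```

In this toy catalog the lamp (index 1) ranks above the actual desk (index 0) because it shares more words with the query, which is the kind of keyword-driven mismatch a purely text-based ranker can produce.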
At first, the pure text search pipeline produced some false positives. For example, “desk for office use” did not yield any desks in the search results but rather products that had the keywords within the product description (i.e., office desk use). To improve the search results, the students implemented a system that transformed user queries and product information into semantic representations (embeddings) using advanced models. These representations were then compared to retrieve the top 20 most relevant matches.
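The comparison step can be sketched as a nearest-neighbour lookup over vectors. The article does not name the models or the similarity measure, so this example assumes cosine similarity and uses made-up three-dimensional vectors in place of real model outputs; the function and product names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector, product_vectors, k=20):
    """Return the IDs of the k products whose vectors are most similar to the query."""
    scored = [
        (cosine_similarity(query_vector, vector), product_id)
        for product_id, vector in product_vectors.items()
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [product_id for _, product_id in scored[:k]]


# Made-up vectors; real embeddings would come from a trained model
# and have hundreds of dimensions.
product_vectors = {
    "office_desk": [0.9, 0.1, 0.0],
    "desk_lamp": [0.3, 0.9, 0.1],
    "running_shoes": [0.0, 0.1, 0.9],
}
query_vector = [0.8, 0.2, 0.0]  # stand-in for the embedding of "desk for office use"
matches = top_k(query_vector, product_vectors, k=2)
```

With these toy vectors the desk comes first because its vector points in nearly the same direction as the query, regardless of how many literal words the two texts share.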
While this showed improvement, further optimization was needed, so the students used a large language model (LLM) to re-rank the results based on relevance. They passed a series of prompts to the LLM and gave it some conditions to focus on (i.e., direct keyword matches, brand name mentions, and price comparisons). The LLM then re-ordered the results to be more relevant to the actual search query.
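A re-ranking step of this kind might look roughly like the sketch below. The students' actual prompts, model, and output format are not given in the article, so the prompt wording, the JSON reply format, and the `llm` callable (any function that takes a prompt string and returns the model's text) are all assumptions; a stub stands in for a real model here.

```python
import json


def build_rerank_prompt(query, candidates):
    """Compose a prompt asking the model to order candidate IDs by relevance."""
    lines = [
        "Re-rank the products below by relevance to the search query.",
        "Consider direct keyword matches, brand name mentions, and price comparisons.",
        "Reply with a JSON list of product IDs, most relevant first.",
        f"Query: {query}",
        "Products:",
    ]
    for product in candidates:
        lines.append(f"- {product['id']}: {product['name']} (${product['price']:.2f})")
    return "\n".join(lines)


def rerank(query, candidates, llm):
    """Re-order candidates using the model's reply; fall back to retrieval order."""
    reply = llm(build_rerank_prompt(query, candidates))
    try:
        ranked_ids = json.loads(reply)
    except json.JSONDecodeError:
        return list(candidates)
    if not isinstance(ranked_ids, list):
        return list(candidates)

    by_id = {product["id"]: product for product in candidates}
    ordered = []
    for product_id in ranked_ids:
        # Skip unknown or repeated IDs the model may have produced.
        if isinstance(product_id, str) and product_id in by_id:
            ordered.append(by_id.pop(product_id))
    # Anything the model left out keeps its original relative order at the end.
    ordered.extend(by_id.values())
    return ordered


def fake_llm(prompt):
    """Hypothetical stand-in for a real model call."""
    return '["p2", "p1"]'


candidates = [
    {"id": "p1", "name": "Desk lamp for office use", "price": 24.99},
    {"id": "p2", "name": "Oak office desk", "price": 199.00},
    {"id": "p3", "name": "Office chair", "price": 89.50},
]
reranked = rerank("desk for office use", candidates, fake_llm)
```

Falling back to the retrieval order when the reply cannot be parsed is a design choice made for this sketch, so that a malformed model response never drops products from the results.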
To maintain performance while searching through so many products, the students added Facebook AI Similarity Search (FAISS) to their pipeline. FAISS clusters the one million products into groups so that a search only scans the clusters most similar to the query rather than the entire catalog. The trade-off is that, while FAISS speeds up search, it only examines a subset of products, so a relevant product in an unsearched cluster can be missed.
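The cluster-then-search idea can be illustrated with a toy inverted-file index in plain Python. This is not the FAISS API and not the students' configuration, which the article does not describe; the centroids are fixed by hand rather than learned, and the vectors, names, and `n_probe` parameter are made up for illustration.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_index(vectors, centroids):
    """Assign each vector ID to the bucket of its nearest centroid."""
    buckets = {i: [] for i in range(len(centroids))}
    for vector_id, vector in vectors.items():
        nearest = min(range(len(centroids)), key=lambda i: distance(vector, centroids[i]))
        buckets[nearest].append(vector_id)
    return buckets


def search(query, vectors, centroids, buckets, k=2, n_probe=1):
    """Scan only the n_probe buckets whose centroids are closest to the query."""
    probe_order = sorted(range(len(centroids)), key=lambda i: distance(query, centroids[i]))
    candidates = [vid for i in probe_order[:n_probe] for vid in buckets[i]]
    candidates.sort(key=lambda vid: distance(query, vectors[vid]))
    return candidates[:k]


# Toy 2-D data; centroids are hand-picked instead of learned by clustering.
centroids = [[0.0, 0.0], [10.0, 10.0]]
vectors = {"a": [1.0, 1.0], "b": [4.0, 4.0], "c": [6.0, 6.0], "d": [9.0, 9.0]}
buckets = build_index(vectors, centroids)

query = [4.9, 4.9]
fast = search(query, vectors, centroids, buckets, k=2, n_probe=1)   # one cluster only
exact = search(query, vectors, centroids, buckets, k=2, n_probe=2)  # every cluster
```

Here the single-cluster search returns "b" and "a", while scanning both clusters returns "b" and "c": the faster search misses "c" because it sits just across the cluster boundary. FAISS's inverted-file index types work on a similar principle, with the number of clusters scanned controlled by an `nprobe` setting.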
To evaluate the search engine, the students created their own benchmark data set of 300 synthetic queries, as the partner did not provide actual customer data.
The students tracked three metrics: Recall@20 (is the target product in the top 20 results?), Precision@20 (how many of the top 20 results are relevant?), and search time (how long does it take to retrieve search results?).
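The two ranking metrics, as defined above, can be computed with a few lines of Python. This sketch assumes each benchmark query has a single target product plus a set of relevant products; the data layout, function names, and product IDs are hypothetical, and k is set to 4 only to keep the example small.

```python
def recall_at_k(results, target, k=20):
    """1.0 if the target product appears in the top k results, else 0.0."""
    return 1.0 if target in results[:k] else 0.0


def precision_at_k(results, relevant, k=20):
    """Fraction of the top k slots filled by relevant products."""
    hits = sum(1 for product_id in results[:k] if product_id in relevant)
    return hits / k


def mean(values):
    """Average of a sequence, or 0.0 if it is empty."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0


# Two made-up benchmark queries with their ranked results.
benchmark = [
    {"results": ["p3", "p1", "p7", "p9"], "target": "p1", "relevant": {"p1", "p3"}},
    {"results": ["p5", "p2", "p8", "p4"], "target": "p6", "relevant": {"p2"}},
]

k = 4
mean_recall = mean(recall_at_k(q["results"], q["target"], k) for q in benchmark)
mean_precision = mean(precision_at_k(q["results"], q["relevant"], k) for q in benchmark)
```

Averaging the per-query scores across the benchmark gives a single figure per metric; for this toy data that is a mean recall of 0.5 and a mean precision of 0.375.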
The final system achieved a Recall@20 of 0.56, a Precision@20 of 0.64, and an average search time of 4.24 seconds.
In the end, the students met the project goal of creating a search engine that supports natural language queries, responds in less than five seconds, and exposes reusable API endpoints. The partner also received indexing and inference/evaluation pipelines, a web interface, and an evaluation plan.