Even as a seasoned and serial technology entrepreneur, Jacky Chan was surprised by the reaction to a new technology advancement. It was February 2023, a couple of months after the release of ChatGPT had exploded interest in AI’s capabilities, and Chan’s phone simply wouldn’t stop ringing.
“People kept asking if I was working on ChatGPT,” Chan gleefully recalls. “I remember saying to my team, this is unusual. I've been involved in tech start-ups for years and I'd never experienced anything like it.
“I called Pak and said ‘this is what we should be doing.’”
Pak-Sun Ting, a former Head of Fixed Income Sales at Bank of China, who co-founded Votee with Chan in 2013, refers to it as “the pivot”: the moment when Votee, which for a decade had specialised in helping brands understand more about their customers through polling and micro research, became an AI company – and one with highly ambitious intentions.
Poor translation?
Votee’s USP is one of accessibility. Local, underrepresented languages – there are over 6,000 in Asia alone – are neglected by AI, Chan and Ting claim. OpenAI is banned in large parts of Asia, and in any case LLMs typically translate poorly to what they describe as low-resource languages.
In Hong Kong, where Votee is based, enterprises can only use OpenAI through a cloud provider. This leaves organisations that require an on-premise deployment, for data sensitivity and regulatory reasons, struggling to access generative AI, as GPU chips from the likes of AI leader Nvidia are also banned from export to Chinese markets including Hong Kong.
Even for the organisations that can go through cloud providers, the accuracy rate when OpenAI’s output is translated back into Cantonese, Hong Kong’s primary language, is below 50%.
“LLMs are usually in English or other European languages, or in Asia there are LLMs built in Japanese or Chinese but usually it's simplified Chinese,” says Chan. “Cantonese is a dialect, different from traditional Chinese. We use traditional Chinese in written form, but Cantonese verbally. In China they use simplified Chinese in written form and also verbally. It's totally different, and we found most of the existing LLMs can't handle Cantonese well.”
Training the model
The opportunity to open generative AI up to underserved parts of the world, starting on Votee’s home turf of Hong Kong, was clear, and the company’s decade of helping local companies do research and social listening met the remaining technical challenge: data.
“There was not enough online data [to train a Cantonese LLM] because it's a verbal language,” Chan adds. “AI and LLMs should be language-agnostic, but when there is not any data you can’t train them well. We were in a unique position because having done voting, micro research and social listening for over ten years, we had plenty of data to help train a Cantonese LLM.”
This included voice feedback from many years of brand surveys.
Within just a few months, Votee had built the first enterprise-grade Cantonese LLM platform. Having previously built applications using MongoDB, Chan turned to MongoDB Atlas and its flexible document model to hold Votee’s vast quantities of unstructured data and underpin the new platform on multiple fronts – including with Atlas Vector Search to support the company in building retrieval-augmented generation (RAG) applications for its enterprise customers.
“When we are building low resource language LLMs, we have our own synthetic data pipelines and then we do all the data connections, cleaning and training everything on MongoDB. We chose MongoDB because it's multi-cloud, multi-region and enterprise grade,” he explains simply.
Security was also an important consideration.
“When we talk about application security, usually databases will only talk about encryption at rest or encryption in transit. MongoDB will also have encryption during execution. They really do encryption and security very well.” Votee co-founder Jacky Chan
Proving the concept
Use cases for Hong Kong organisations have included chatbots, contact centres, knowledge banks, appointment booking and brand social listening. The latter is an important tool for organisations to see what people are saying about their brand on social media and elsewhere, but the lack of a Cantonese LLM previously held them back.
“On social media, Cantonese people kind of type how they would say things,” says Ting. “It's not written Chinese, it's written versus spoken Cantonese. If you use a traditional LLM, which searches for data that's been trained on text only, things that people say online will not make any sense. Social listening precision is so much higher with the Cantonese LLM.”
Having proven the concept for the Cantonese language, Votee is now setting its sights on unleashing the power of generative AI in other underrepresented parts of the world. The company is looking to target Southeast Asia, and beyond that will expand into Africa, while it is also set to build a talent base in Toronto next year to target the North American market.
“Now we've built Cantonese LLM, [we aim to get] this technology used by people across Hong Kong,” Ting adds. “There's this pent up interest and demand for LLMs [because of the restricted access]. The education and the knowledge is out there… so for us adoption is going to be a little easier. I think in this next year there is going to be a lot more POCs and deployment.
“We want to be the OpenAI for neglected languages, building AI stacks for low-resource languages and building enterprise applications on top of that,” Ting adds. “Creating a base model is the first step. The second step is to recycle some of the applications we already built in Hong Kong, and provide that into Southeast Asia, Africa, and so on and so forth.”
Delivered in partnership with MongoDB