Few organisations process as much data as Idealo, Europe’s biggest price comparison service. Idealo consolidates the latest offers from 50,000 merchants, including billions of offers from Amazon and eBay alone, to inform over 70 million monthly visitors of the cheapest places to buy products.

The process of getting up-to-date offers from merchants is, on the surface, quite simple – mass updates sent every hour to Idealo via large CSV or XML feeds. However, the sheer volume, the regularity, and the need for both speed and accuracy make for a massively intensive data exercise.

The first step for the developer team in charge of the offer store is to check which incoming updates are relevant for Idealo. All the data then flows to the next components, which are responsible for applying any discounts to the base prices of the offers. Finally, everything must be stored in the offer store and provided to downstream consumers at Idealo so they can see exactly what has changed. With 60 fields on each offer and regular updates, data moves rapidly.
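That first relevance check can be illustrated with a small sketch. This is not Idealo's actual code – the field names and the in-memory `seen` store are hypothetical – but it shows the general idea Lippmann describes later of keeping fingerprints so that only genuinely changed offers flow downstream:

```python
import hashlib
import json

def fingerprint(offer: dict) -> str:
    # Stable content hash of the offer; sort_keys makes it
    # independent of field order in the incoming feed.
    payload = json.dumps(offer, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def filter_relevant(incoming: list, seen: dict) -> list:
    """Keep only offers whose content changed since the stored fingerprint.

    `seen` maps offer_id -> last fingerprint (a stand-in for the
    fingerprint databases the team maintains).
    """
    relevant = []
    for offer in incoming:
        fp = fingerprint(offer)
        key = offer["offer_id"]
        if seen.get(key) != fp:
            seen[key] = fp
            relevant.append(offer)
    return relevant
</imports>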

Reaching the limits

The growing popularity of Idealo, with traffic increasing six-fold in four years, meant Idealo was running the biggest on-premises MongoDB database in Europe. Processing 160,000 requests per second on an on-premises system required a 21-shard cluster with 72 CPU cores per node for the offer store alone. With traffic growth expected to continue at an approximate rate of 20% per year, Idealo was fast approaching the limits of its on-premises infrastructure.

“Importing and filtering the data from the shops is complex,” says Jens Lippmann, a software developer who is part of the team responsible for Idealo’s offer store. “There are also huge databases involved in this process to keep fingerprints and timestamps because we get different feeds with different generation times and delays, and we have to keep all of this stuff in sync. 

“That's quite challenging. With billions of offers to keep up to date and traffic growing quickly, it became clear we would not be able to grow any further without expanding our data centre. All the existing cages were filled with hardware and scaling the self-hosted MongoDB environments to meet demand was taking several months. To reduce the cost of acquiring and running hardware, and the maintenance and administrative burden on staff, we had to turn to the cloud.”

Rapid read and write

While Idealo was already a MongoDB customer, it studied alternative options for a managed cloud service. However, MongoDB’s multi-cloud database Atlas proved the most suitable for Idealo’s central use case: getting data out, updating it and writing it back as fast as possible.

“We have other teams using advanced features like AI and machine learning, but the most important thing for us is the raw power to read and write data extremely fast,” says Lippmann. “We don’t have a huge number of queries, maybe 20,000 to 40,000 per second, but the work behind every single query is enormous. With MongoDB Atlas we can read thousands of documents in a single batch, perform a bulk update operation and write all the updates back in one operation.”
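The batch-read, bulk-update pattern Lippmann describes can be sketched in a few lines. The example below is illustrative only: it builds the `{filter, update}` operation pairs that MongoDB's bulk-write API expects (with PyMongo these would become `UpdateOne` objects passed to `collection.bulk_write`), and the flat-percentage discount logic is a hypothetical stand-in for Idealo's pricing components:

```python
def build_bulk_updates(offers: list, discount_pct: float) -> list:
    """Turn one batch of offers into a single list of update operations,
    mirroring the shape of a MongoDB bulk write: many documents read in
    one batch, transformed, and written back in one operation."""
    ops = []
    for offer in offers:
        # Hypothetical discount step standing in for the pricing components.
        new_price = round(offer["base_price"] * (1 - discount_pct / 100), 2)
        ops.append({
            "filter": {"_id": offer["offer_id"]},
            "update": {"$set": {"price": new_price}},
        })
    return ops
```

An unordered bulk write of such a list costs one round trip to the database regardless of batch size, which is what makes the read-thousands, write-once pattern so much cheaper than per-document updates.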

Idealo set MongoDB a target of completing the cloud migration before Black Friday last year, which was successfully achieved. The company is now in an optimisation phase, ensuring it uses the new infrastructure as efficiently as possible and strengthening other areas such as security.

Greater flexibility to innovate

Today, Idealo is 100% on the cloud, having decommissioned its data centre following the success of the migration, which has significantly reduced the resources required to operate Idealo’s service. The number of shards needed has dropped from 25 to 12 and the size of each node has been reduced by two-thirds. Idealo can now support up to 200,000 queries per second and 60,000 updates per second, and has sustained over 150,000 queries per second for 14 hours straight.

Aside from the core benefits of running the same number of offers with improved performance and lower cost, from a developer perspective Lippmann’s team are enjoying greater flexibility to adjust and optimise systems and to innovate new features at speed.


“It is completely different from the situation before in the data centre, where we had to apply for new databases and then wait for weeks or months until they were provided to us. We can innovate in minutes rather than weeks or months,” Lippmann says. “Even if we want to just change different parameters on our clusters, it’s easy to use and we have never had a situation where something didn’t perform as expected. So from a technical perspective it is a very good setup.

“The training was excellent and the support engineers allocated to our team are great. We are in close contact with a success manager. They know our use cases and usage patterns for the various databases so can help us best. We are really happy with the relationship and support. It is fast and professional and we can rely on it. The biggest compliment I can say is it simply works. It’s a great feeling for us as developers when we can rely on support when we need it.”

Delivered in partnership with MongoDB
