PayPal moved 400 petabytes of data into BigQuery, Google’s enterprise data warehouse, in a migration it’s calling the biggest in history.

The payments processing company had 400 petabytes of data spread out over multiple siloed systems. Execs said they knew that to fully unlock generative AI workflows that data needed to be consolidated: "We had to get our disparate data platforms in order, first."

In a blog post published last week, PayPal says its massive amounts of data were stored in a fragmented mix of Teradata, Hadoop, Redshift, and Snowflake – systems that it had accumulated over the years.

By dumping its 400 petabytes of data into Google’s serverless data warehouse, PayPal said it has, obviously, reduced the number of data infrastructure vendors it relies on. But the migration has also made its data queries as much as 10 times faster and provides data that is 16 times fresher for training AI models.


The migration hasn’t been quick. PayPal said it had finished the “first of a multi-step Google Cloud platform journey” in May 2021 by migrating half of its Teradata into BigQuery.

Even back in 2021, the payments company could see AI potential in the data warehouse: “BigQuery allows us to centralize our data platform without losing capabilities such as SQL access, Spark integration, and advanced ML training. Also, BigQuery has some advanced features such as ML and real-time analytics that can be leveraged without moving data out to another system.”

The considerations before the storm 

The payments processing company said it wanted to refresh its data foundation after acquisitions and growth had created a tangled web of data pipelines: “After 25 years of success in expanding services and capabilities, we’d created complexity in our data analytics infrastructure. Some 400 petabytes of data was spread across a dozen siloed systems due to limitations of scale and acquisitions.”

PayPal’s head of data, AI and machine learning, Mani Iyer, and director of data engineering, Vaishali Walia, said PayPal considered building a solution in-house.

The execs said that after looking into scaling PayPal’s on-prem solutions, “the cost and time-to-complete would have been prohibitive.” So they turned to the cloud providers.


AI was the driving force behind getting this mega migration over the line: “We knew fragmented data would severely limit our ability to create the intelligent experiences customers have come to expect.”

This could be why they went with Google’s warehouse product. With Gemini arguably pulling ahead as a frontier model, Google stands out among the big three hyperscalers for building competitive models in-house.

This AI edge is getting noticed. The PayPal execs said the most important deciding factor for going all in with BigQuery was its “native integrations with AI” for data analytics. 

Underestimating the data mess

PayPal made the transition a company-wide priority, and it required military-grade organising. FinOps was constantly tracking performance and spend, while detailed inventories of data and lineage meant scope, effort, cost and dependencies could be established early. Once the plan was clear, automation took over: “We automated every possible task and developed live dashboards to continuously monitor the progress of migrations.”

Moving all its data into BigQuery meant PayPal reduced its data infrastructure vendors from four to one – which it says also eliminated duplicate data. With the help of Google consultants, PayPal said it was also able to streamline by decommissioning 25% of its workloads in the process.

Engineers working on new features now have access to clean, governed data for building, which they say is “a crucial step in AI development.” 

The execs had warnings for companies that haven’t yet undergone the data renovation process. Don’t underestimate “how under-utilized your data may be, and how unorganized” or the power of “ensuring data is accessible to everyone within your organization.” 

PayPal says one of the biggest opportunities it sees in this migration is breaking down silos within the enterprise.
