In the world of enterprise SaaS, flexibility is a double-edged sword. Work management platform Smartsheet provides users with a blank canvas toolkit to build anything from simple checklists to massive capital infrastructure portfolios; customers include Arc’teryx, Bayer, HP, Fox Sports, Puma, and Uber.
But that same flexibility creates a data engineering challenge: how do you map the invisible web of relationships between millions of users and potentially billions of disparate assets in a way that can remove meaningful friction for users – particularly when your software’s features are built on microservices?
And how do you integrate AI in a way that lets it provide unique insights and intelligent recommendations for users, securely, transparently and traceably?
For Divya Moorjaney, a Principal ML Engineer at Smartsheet, the answer was unlikely to be found in the rows and columns of a relational database.
Instead, she told The Stack, Smartsheet adopted a graph-based data model, built on the Amazon Neptune managed service, that has changed how the company handles entity resolution and real-time recommendations at scale.
To the uninitiated, building a recommendation engine to suggest collaborators might seem like a solved problem. In a typical, small SQL environment, for example, you might simply query for users who have accessed the same file.
But at Smartsheet’s scale (it serves more than 100,000 enterprise customers and tens of millions of users), those connections are rarely linear.
"The Smartsheet Knowledge Graph is a flexible, unified data model that connects people, content, and work," Moorjaney explains.
It’s already powering smart sharing suggestions for customers, surfacing the contacts a user frequently collaborates with. The Smartsheet roadmap includes plans for more personalized workspace insights, including new AI agents and AI capabilities that understand the full context of users’ work and can offer contextual guidance and recommendations to optimize their projects.
In a standard relational database (RDBMS), or even a NoSQL one, discovering the signals behind collaboration recommendations at the depth Smartsheet is targeting might require complex operations that become exponentially more expensive as the depth of the connection increases.
If you want to find a collaborator who is "three degrees away" (for example, someone who worked on a different project within the same portfolio managed by a common teammate), the query could be slow enough to grind a user interface to a halt.
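To make the cost of depth concrete, here is a minimal sketch in Python using an in-memory SQLite database. The `access` table, its columns and the sample users are hypothetical and are not Smartsheet's schema. It shows that each extra "degree" of co-access adds two more self-joins to the query, and the number of intermediate rows can multiply at every step.

```python
import sqlite3


def co_access_sql(hops: int) -> str:
    """Build SQL for users reachable by a walk of exactly `hops` co-access steps.

    One step is user -> shared asset -> other user, which costs two copies of
    the access table, so `hops` steps join 2 * hops copies in total.
    """
    if hops < 1:
        raise ValueError("hops must be at least 1")
    parts = ["access AS a0"]
    for i in range(1, 2 * hops):
        prev, cur = f"a{i - 1}", f"a{i}"
        if i % 2 == 1:
            # Odd alias: a different user on the same asset.
            cond = (f"{cur}.asset_id = {prev}.asset_id "
                    f"AND {cur}.user_id <> {prev}.user_id")
        else:
            # Even alias: the same user on a different asset.
            cond = (f"{cur}.user_id = {prev}.user_id "
                    f"AND {cur}.asset_id <> {prev}.asset_id")
        parts.append(f"JOIN access AS {cur} ON {cond}")
    last = f"a{2 * hops - 1}"
    return (
        f"SELECT DISTINCT {last}.user_id FROM {' '.join(parts)} "
        f"WHERE a0.user_id = ? AND {last}.user_id <> a0.user_id "
        f"ORDER BY {last}.user_id"
    )


def demo_connection() -> sqlite3.Connection:
    """Create a tiny hypothetical access log: ana - ben - cy - dee in a chain."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE access (user_id TEXT, asset_id TEXT)")
    conn.executemany(
        "INSERT INTO access VALUES (?, ?)",
        [("ana", "s1"), ("ben", "s1"), ("ben", "s2"),
         ("cy", "s2"), ("cy", "s3"), ("dee", "s3")],
    )
    return conn


def reachable(conn: sqlite3.Connection, user: str, hops: int) -> list:
    """Run the generated query and return the matching user ids."""
    return [row[0] for row in conn.execute(co_access_sql(hops), (user,))]
```

Calling `reachable(demo_connection(), "ana", 3)` runs a query that joins six copies of the same table to reach a single third-degree collaborator. On a toy table that is instant; on a large access log, every added join is another chance for the intermediate result to balloon.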
Is the juice worth the squeeze?
Building the Smartsheet Knowledge Graph on a hyperscaler’s managed service took a lot of engineering work: a meaningful investment of a talented team’s time and effort. Pressed by The Stack on whether this felt like "using a sledgehammer to crack a nut" to deliver what are ultimately recommendations, Moorjaney was candid about the trade-offs.
"We asked ourselves that a couple of times... is the juice worth the squeeze?" Moorjaney says. The answer, she concludes, is "absolutely."
"The motivation was to understand the depth of connections.”
“Representing [multi-degree connections] through relational DBs or NoSQL DBs would become extremely difficult to do. We would have to probably train machine learning models and keep them up to date, and that’s a lot of work on our engineering teams...”
By opting for a graph-native architecture, Smartsheet shifted the complexity from the application logic to the data layer. Smartsheet uses Neptune to store entities as "nodes" (users, sheets, workspaces) and their interactions as "edges" (viewed, shared, commented). This allows Smartsheet to perform "k-hop" queries (traversing multiple edges in milliseconds) to surface the most relevant person to include in a project the moment a user opens it up.
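As a rough illustration of the nodes, edges and k-hop idea described above, here is a minimal in-memory sketch in Python. It is not Smartsheet's implementation and does not talk to Neptune; the node ids, edge list and function names are hypothetical. It uses a breadth-first search that stops after `k` edges.

```python
from collections import deque

# Hypothetical edges: (source node, interaction type, target node).
EDGES = [
    ("user:ana", "SHARED", "sheet:budget"),
    ("user:ben", "COMMENTED", "sheet:budget"),
    ("user:ben", "VIEWED", "sheet:roadmap"),
    ("user:cy", "SHARED", "sheet:roadmap"),
    ("user:cy", "SHARED", "workspace:launch"),
    ("user:dee", "VIEWED", "workspace:launch"),
]


def build_adjacency(edges):
    """Index edges by node, treating every interaction as traversable both ways."""
    adjacency = {}
    for source, _edge_type, target in edges:
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set()).add(source)
    return adjacency


def k_hop_users(adjacency, start, k):
    """Return {user: hop distance} for every other user within k edges of start."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == k:
            continue  # Depth limit reached; do not expand further.
        for neighbour in adjacency.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return {
        node: hops
        for node, hops in distance.items()
        if node.startswith("user:") and node != start
    }
```

With this sample data, `k_hop_users(build_adjacency(EDGES), "user:ana", 4)` finds ben two edges away and cy four edges away, while dee stays out of range until `k` reaches 6. In a graph database the same question would typically be written as a variable-length path pattern in the query language rather than as hand-written traversal code.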
Smartsheet already uses AWS services to power other critical capabilities, which made Neptune the natural graph database foundation on grounds of ecosystem fit. It uses Snowflake to store customer usage data; Amazon ECS to prepare data for graph ingestion and to expose graph insights and recommendations, using Cypher queries, via internal APIs to Smartsheet services; and AWS Step Functions to orchestrate the ECS tasks that transform and load data into Neptune.
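The "transform" stage of a pipeline like this can be sketched in a few lines of Python. This is an assumption-laden illustration, not Smartsheet's code: the row fields (`user_id`, `action`, `asset_type`, `asset_id`) and the output record shape are hypothetical, and the output is not Neptune's bulk-load file format. It collapses flat usage events into unique nodes and weighted edges, which is the kind of shaping a containerized task might do before loading a graph.

```python
from collections import Counter


def to_graph_records(usage_rows):
    """Turn flat usage events into unique node records and counted edge records."""
    nodes = set()
    edge_counts = Counter()
    for row in usage_rows:
        user = ("User", row["user_id"])
        asset = (row["asset_type"], row["asset_id"])
        nodes.update([user, asset])
        # Repeated interactions become a single edge with a count.
        edge_counts[(user, row["action"].upper(), asset)] += 1

    def node_id(node):
        # Prefix the id with the label so ids stay unique across node types.
        return f"{node[0]}:{node[1]}"

    node_records = [
        {"id": node_id(node), "label": node[0]} for node in sorted(nodes)
    ]
    edge_records = [
        {"from": node_id(src), "type": edge_type, "to": node_id(dst), "count": n}
        for (src, edge_type, dst), n in sorted(edge_counts.items())
    ]
    return node_records, edge_records
```

Aggregating repeated events into a `count` property is one possible design choice: it keeps the graph small and gives a recommendation query a simple signal of how often two entities interact.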
“Empowering the team…”
For the engineering team, the shift was about future-proofing and capacity-building. It was also, Moorjaney hints, a fun process for them.
“The most enjoyable aspect [of a project like this] is learning with my team, empowering the team to learn something new, something different, taking on that challenge… Graphs and data are the bedrock and foundation of AI and we’re positioning ourselves well for that [further innovation with AI].
“It's also challenging, managing that scale, managing the ingestion processes, truly understanding the data, keeping a close eye on security and governance.”
Smartsheet plans to keep innovating, building on what Moorjaney’s team has already delivered.
Smartsheet isn’t just layering a chatbot on top of a spreadsheet: it is rebuilding the underlying map of how work happens within an enterprise.
As Moorjaney puts it, the maintenance of a graph is "worth every moment" because it allows the platform to evolve as business needs change. For the engineers building the next generation of collaborative tools, the lesson is clear: if your data is inherently connected, your database should be, too.
Learn more about Amazon Neptune.
Delivered in partnership with AWS.