Here’s how you can make predictions on graph data accurately, quickly, and securely with Amazon Neptune

At TechSparks 2021, Rahul Shringarpure, Solutions Architect, AWS India (AISPL), talked about the challenges of graph and relational database models, and how Amazon Neptune helps users work with highly connected data sets and make predictions easily, quickly, and securely.

"A knowledge graph lets you store information in a graph model to easily navigate highly connected data sets, such as suggesting friends on social media or processing financial transactions in real time," said Rahul Shringarpure, Solutions Architect, AWS India (AISPL), during a masterclass, 'Going Graph with Amazon Neptune', at TechSparks 2021, YourStory’s flagship event.

Knowledge graphs are often used to store interlinked descriptions of entities (objects, events, situations, or abstract concepts) with free-form semantics. They can also be used to detect fraud patterns and to recommend restaurants based on customer interests, among other use cases.

Graph database vs relational database

A knowledge graph is made up of three components: nodes, edges, and labels. Any object, place, or person can be a node, while the edge defines the relationship between these nodes, and labels are attributes that group similar nodes together.
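
To make the three components concrete, here is a minimal sketch of a labelled property graph in plain Python. The `PropertyGraph` class, its method names, and the sample nodes are hypothetical illustrations, not Neptune's API; Neptune stores and queries graphs server-side through Gremlin or SPARQL.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    label: str  # groups similar nodes, e.g. "Person" or "Place"
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: str    # node_id of the start node
    relation: str  # the relationship this edge defines, e.g. "KNOWS"
    target: str    # node_id of the end node


class PropertyGraph:
    """Hypothetical in-memory graph: nodes keyed by id, edges kept in a list."""

    def __init__(self):
        self.nodes = {}  # node_id -> Node
        self.edges = []  # list of Edge

    def add_node(self, node_id, label, **properties):
        self.nodes[node_id] = Node(node_id, label, properties)

    def add_edge(self, source, relation, target):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges.append(Edge(source, relation, target))

    def nodes_with_label(self, label):
        # Labels let us select a whole group of similar nodes at once.
        return sorted(n.node_id for n in self.nodes.values() if n.label == label)

    def neighbours(self, node_id, relation=None):
        # Follow outgoing edges, optionally restricted to one relationship type.
        return sorted(
            e.target
            for e in self.edges
            if e.source == node_id and (relation is None or e.relation == relation)
        )


g = PropertyGraph()
g.add_node("alice", "Person", name="Alice")
g.add_node("bob", "Person", name="Bob")
g.add_node("mumbai", "Place", name="Mumbai")
g.add_edge("alice", "KNOWS", "bob")
g.add_edge("alice", "LIVES_IN", "mumbai")

print(g.nodes_with_label("Person"))    # ['alice', 'bob']
print(g.neighbours("alice"))           # ['bob', 'mumbai']
print(g.neighbours("alice", "KNOWS"))  # ['bob']
```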

The main difference between graph databases and relational databases lies in how they store the relationships between entities. A graph database stores relationships at the individual record level, while a relational database relies on predefined structures, known as table definitions.

According to Rahul, if you frequently need to scan for data that meets certain criteria, such as finding employees with a particular skill in a large data set, a relational database is the better fit. Graph databases, on the other hand, are better suited to highly connected data, where each record and its relationships have to be examined individually.
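
The difference shows up clearly in a friends-of-friends lookup, the kind of query behind friend suggestions. The sketch below, using only Python's standard library with made-up sample data, answers the same two-hop question twice: once relationally, where every extra hop needs another self-join, and once graph-style, where each record keeps direct references to its neighbours. It illustrates the idea only and says nothing about how Neptune's engine is built.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: directed "person -> friend" relationships.
friendships = [("alice", "bob"), ("bob", "carol"), ("carol", "dave"), ("alice", "erin")]

# Relational model: relationships live in a table and are reassembled with joins.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE friends (person TEXT, friend TEXT)")
conn.executemany("INSERT INTO friends VALUES (?, ?)", friendships)
rows = conn.execute(
    """
    SELECT DISTINCT f2.friend
    FROM friends AS f1
    JOIN friends AS f2 ON f2.person = f1.friend  -- one self-join per extra hop
    WHERE f1.person = ?
    """,
    ("alice",),
).fetchall()
sql_result = sorted(r[0] for r in rows)

# Graph-style model: each record points straight at its neighbours.
adjacency = defaultdict(set)
for person, friend in friendships:
    adjacency[person].add(friend)


def friends_of_friends(start):
    # Two hops: follow the start node's edges, then each neighbour's edges.
    return sorted({fof for f in adjacency[start] for fof in adjacency[f]})


graph_result = friends_of_friends("alice")
print(sql_result, graph_result)  # ['carol'] ['carol']
```

Adding a third hop means another `JOIN` in the SQL version but only one more loop level in the traversal, which is why deep, relationship-heavy queries tend to favour the graph model.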

Most common graph model frameworks

There are two main types of graph models:

  1. Labelled Property Graph (LPG) - An attributed, multi-relational graph model. It is commonly queried with Gremlin, the graph traversal language of Apache TinkerPop, which is supported by a number of open-source and vendor implementations.
  2. Resource Description Framework (RDF) - A model standardised by the W3C (World Wide Web Consortium) as part of a set of standards collectively known as the Semantic Web. Users express graph queries against RDF models in SPARQL, and the graph itself is encoded as triples (subject-predicate-object). Customers prefer this model for the flexibility it offers in modelling complex domains.
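
To show what "triples" means in practice, here is a minimal sketch in plain Python: facts are stored as (subject, predicate, object) tuples, and a single triple pattern with `?`-prefixed variables is matched against them, roughly what one line of a SPARQL `WHERE` clause does. The `ex:` names are hypothetical shorthand for full IRIs, and the `match` helper is an illustration, not a SPARQL engine.

```python
# Each fact is a (subject, predicate, object) triple.
triples = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", "ex:knows", "ex:carol"),
    ("ex:alice", "ex:livesIn", "ex:mumbai"),
]


def match(pattern, data):
    """Return variable bindings for one triple pattern.

    Terms starting with '?' are variables; every other term must match exactly.
    """
    results = []
    for triple in data:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.get(term, value) != value:
                    break  # the same variable would bind to two different values
                binding[term] = value
            elif term != value:
                break  # a fixed term does not match this triple
        else:
            results.append(binding)
    return results


# Roughly what  SELECT ?who WHERE { ex:alice ex:knows ?who }  asks for.
print(match(("ex:alice", "ex:knows", "?who"), triples))  # [{'?who': 'ex:bob'}]
```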

While both models have their advantages, Rahul said that running them comes with challenges. "It's difficult to scale if you set it up yourself, complex to maintain in a high availability configuration, too expensive, and there is limited support for open standards."

Amazon Neptune: A fully managed graph database

This is where Amazon Neptune comes in. It is fast, reliable, easy to use, and open, and helps users work with highly connected data sets. It can query billions of relationships with millisecond latency and supports both the Apache TinkerPop property graph and W3C RDF models, so you can build queries with Gremlin and SPARQL. Neptune also stores six copies of your data across three Availability Zones and provides backup and restore.
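
As an illustration of the SPARQL side, Neptune exposes an HTTP endpoint at `/sparql` on the cluster's port (8182 by default) that accepts a standard SPARQL 1.1 query sent as a form-encoded `POST`. The sketch below only builds such a request with Python's standard library and does not send it. The cluster hostname is a hypothetical placeholder, the instance must be reachable (typically from inside its VPC), and if IAM database authentication is enabled the request would also need SigV4 signing, which this sketch omits.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical cluster endpoint; substitute your own. 8182 is Neptune's default port.
NEPTUNE_ENDPOINT = "my-cluster.cluster-example.us-east-1.neptune.amazonaws.com"


def build_sparql_request(endpoint, query, port=8182):
    """Build (but do not send) a SPARQL query POST for a Neptune /sparql endpoint."""
    body = urlencode({"query": query}).encode("utf-8")
    return Request(
        f"https://{endpoint}:{port}/sparql",
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


req = build_sparql_request(
    NEPTUNE_ENDPOINT,
    "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10",
)
print(req.get_method(), req.full_url)
# Sending it would be urllib.request.urlopen(req), run from somewhere that can reach the cluster.
```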

"The core of Amazon Neptune is a purpose-built, high-performance graph database engine optimised for storing billions of relationships and querying the graph with millisecond latency," Rahul said.

Amazon Neptune supports popular graph models and query languages: property graphs through Apache TinkerPop Gremlin, and W3C RDF through SPARQL. "Since it's secure and fully managed, one doesn't need to worry about hardware provisioning, software patching, configurations, or backups," he added.

Companies such as Dream11 and Games24x7 use Amazon Neptune to scale their social networks and to detect and analyse fraud.

Quick and accurate predictions with Amazon Neptune ML

Amazon Neptune ML is a new capability of Neptune that uses Graph Neural Networks (GNNs), powered by the Deep Graph Library (DGL) and Amazon SageMaker. With it, you can make predictions on graph data within hours instead of weeks, improve accuracy, scale to large data sets, detect fraud, recommend personalised products, and save money.
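
The key idea behind a GNN is message passing: each node updates its representation using its neighbours' features, so information flows along the graph's edges. The plain-Python sketch below shows one round of mean-neighbour aggregation on a hypothetical three-account graph. It is not DGL's API or Neptune ML's model; a trained GNN layer would also apply learned weight matrices and a non-linearity, and would usually stack several rounds.

```python
def message_passing_step(features, adjacency, self_weight=0.5):
    """One round of mean-neighbour aggregation, the core step of many GNN layers.

    features:  node -> list of floats
    adjacency: node -> list of neighbour nodes
    Each node's new vector mixes its own features with the mean of its neighbours'.
    """
    updated = {}
    for node, own in features.items():
        neighbours = adjacency.get(node, [])
        if neighbours:
            mean = [
                sum(features[n][i] for n in neighbours) / len(neighbours)
                for i in range(len(own))
            ]
        else:
            mean = own  # isolated node: nothing to aggregate
        updated[node] = [
            self_weight * o + (1 - self_weight) * m for o, m in zip(own, mean)
        ]
    return updated


# Hypothetical fraud-style signal: feature[0] = 1.0 if an account is already flagged.
features = {"acct1": [1.0], "acct2": [0.0], "acct3": [0.0]}
adjacency = {"acct1": ["acct2"], "acct2": ["acct1", "acct3"], "acct3": ["acct2"]}

print(message_passing_step(features, adjacency))
# {'acct1': [0.5], 'acct2': [0.25], 'acct3': [0.0]}
```

After one round, `acct2` picks up part of the flag from its neighbour `acct1`; a second round would let the signal reach `acct3`, which is how GNNs use connections to score related entities.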

Neptune also recently introduced a query results cache, an in-memory solution that improves Gremlin query performance for both long- and short-running queries, as well as for queries that paginate their results.

"It significantly reduces the time and effort that customers spend to build and manage an external caching solution, as it automatically fetches all matching results of the query and stores them in memory, greatly improving query performance," said Rahul.
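
The behaviour Rahul describes (fetch every matching result once, keep it in memory, then serve later pages from memory) can be sketched as a small Python class. This is a toy model of the idea only, not Neptune's implementation or its query hints; the `QueryResultCache` name and the backing query function are hypothetical.

```python
class QueryResultCache:
    """Toy in-memory cache: run a query once, keep every result, serve pages from memory."""

    def __init__(self, run_query):
        self._run_query = run_query  # callable: query string -> iterable of results
        self._cache = {}             # query string -> full result list
        self.executions = 0          # how many times the backing query actually ran

    def page(self, query, offset, limit):
        if query not in self._cache:
            # Fetch all matching results once and keep them in memory.
            self._cache[query] = list(self._run_query(query))
            self.executions += 1
        return self._cache[query][offset:offset + limit]

    def invalidate(self, query):
        # Drop a cached result set, e.g. after the underlying data changes.
        self._cache.pop(query, None)


# Hypothetical backing query that returns 25 vertex ids.
cache = QueryResultCache(lambda q: [f"v{i}" for i in range(25)])
first = cache.page("g.V().hasLabel('Person')", 0, 10)
second = cache.page("g.V().hasLabel('Person')", 10, 10)
print(first[0], second[0], cache.executions)  # v0 v10 1
```

The second page costs a list slice rather than another query execution, which is where the pagination speed-up comes from; the trade-off is memory use and the need to invalidate results when the data changes.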

Edited by Saheli Sen Gupta