Gleaning Meaningful eCommerce Insights from Millions of Transactional Data Points


According to etailinsights, there are around eight million online retailers worldwide. Together they generate billions of dollars in sales and billions of transactions, and Amazon alone ships billions of products annually. Insight into those transactions is increasingly crucial to staying competitive, and the larger the retailer, the more it stands to gain from analyzing transactional data with a graph database. But how can eCommerce managers best leverage this technology?

A single transaction reveals only a little. From it, an online retailer can learn the time of the transaction, the payment method, the product and its category, the price, and more. In a traditional database, this information sits largely isolated from other transactions. A graph database, however, lets a vendor piece it all together. Patterns emerge across millions of users, and sales, marketing, and operations decisions can then rest on stronger intelligence.

Large eCommerce businesses today can hold millions, billions, or even trillions of data points. As mentioned, data viewed in silos is usually meaningless. But once data scientists understand how transactions relate to one another, an eCommerce vendor may find it is sitting on a goldmine of data. A graph database is the technology best suited to connecting these dots.

Why a Graph Database?

The database most people are familiar with is the relational database, which has served users well for decades. But with today’s immensely large data sets, its performance degrades as more data is ingested.

Most important, and ironically, a relational database largely fails at the relational processing that complex big data demands. Relational databases cannot efficiently traverse billions of data points, and compared with graph databases they are very limited in relating data to each other, especially when a query needs multiple hops across tables. As a result, graph databases have grown in popularity in recent years. They are purpose-built for traversing data sets to establish relationships between them, and a graph database can scale to meet most of the data processing requirements of the largest organizations in the world.
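To make the point about multiple hops concrete, here is a minimal Python sketch that assumes a hypothetical relational table `edge(src, dst)` holding one row per relationship. In SQL, each additional hop usually means one more self-join on that table, so both the query and the work the engine must do grow with every hop. The helper names are illustrative only, and the generated query runs against Python’s built-in `sqlite3` module simply to show that it works.

```python
import sqlite3


def k_hop_sql(k: int) -> str:
    """Build SQL returning vertices reached from :start by walks of exactly k edges.

    Assumes a hypothetical table edge(src, dst); every extra hop adds one self-join.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    lines = [f"SELECT DISTINCT e{k}.dst FROM edge e1"]
    for i in range(2, k + 1):
        # Hop i starts where hop i-1 ended.
        lines.append(f"JOIN edge e{i} ON e{i}.src = e{i - 1}.dst")
    lines.append("WHERE e1.src = :start")
    return "\n".join(lines)


def make_edge_db(pairs) -> sqlite3.Connection:
    """Create an in-memory edge table filled with (src, dst) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edge (src INTEGER, dst INTEGER)")
    conn.executemany("INSERT INTO edge VALUES (?, ?)", pairs)
    return conn


def k_hop(conn: sqlite3.Connection, start: int, k: int) -> set:
    """Run the generated query and return the reached vertex ids."""
    return {row[0] for row in conn.execute(k_hop_sql(k), {"start": start})}


if __name__ == "__main__":
    # A tiny chain 1 -> 2 -> 3 -> 4 for illustration.
    conn = make_edge_db([(1, 2), (2, 3), (3, 4)])
    print(k_hop_sql(3))
    print(k_hop(conn, 1, 3))
```

For three hops, `k_hop_sql(3)` already needs two `JOIN` clauses, and the count keeps rising with depth; a graph database is designed to follow relationships directly from one vertex to the next instead.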

eCommerce transactions are an ideal use case for a graph database. Users can uncover relationships between individual purchase transactions as well as across many data sets. Data scientists can also take advantage of graph databases that offer graphical user interfaces to simplify data analysis and insight.

General Data Capability Needs

At the core of a graph database is its ability to let a data scientist or CIO write and query data promptly. A testing environment is therefore a good way for an eCommerce company to assess candidate graph databases before choosing one.

Setting up a testing environment is simple for a typical IT manager and requires minimal technical resources. It might consist of a server whose hardware meets the minimum requirements for running the graph databases under evaluation, such as four gigabytes of memory and 10 GB of disk space.
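Before installing anything, a quick pre-flight check can confirm that the test server meets those minimums. The sketch below assumes a Linux host (it reads physical memory through the POSIX-only `os.sysconf` names) and treats the 4 GB and 10 GB figures from the text as GiB; the constants and function names are hypothetical and should be adjusted to each candidate’s documented requirements.

```python
import os
import shutil

GIB = 1024 ** 3

# Example minimums taken from the text; replace with each database's documented needs.
MIN_MEMORY_GIB = 4
MIN_DISK_GIB = 10


def meets_minimum(memory_bytes: int, free_disk_bytes: int,
                  min_memory_gib: float = MIN_MEMORY_GIB,
                  min_disk_gib: float = MIN_DISK_GIB) -> bool:
    """Return True if both physical memory and free disk space meet the minimums."""
    return (memory_bytes >= min_memory_gib * GIB
            and free_disk_bytes >= min_disk_gib * GIB)


def host_resources(path: str = ".") -> tuple:
    """Return (physical memory in bytes, free disk bytes at path).

    os.sysconf is POSIX-only; these names work on Linux but may be missing elsewhere.
    """
    memory = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    free_disk = shutil.disk_usage(path).free
    return memory, free_disk


if __name__ == "__main__":
    mem, disk = host_resources()
    print(f"memory: {mem / GIB:.1f} GiB, free disk: {disk / GIB:.1f} GiB")
    print("meets minimum:", meets_minimum(mem, disk))
```

Keeping the comparison in a pure function makes it easy to reuse the same check with stricter thresholds for a larger production-sized test.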

Next, you need at least one data set similar in size to what you expect to use in a real-world environment. For example, you might load an eCommerce data set of a couple of million vertices and 20 to 30 million edges into each candidate graph database. You can then test performance with queries that walk multiple hops from a vertex and find all neighboring data points.
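While loading the candidates, it can help to have an independent reference answer for the multi-hop neighbor queries. The following sketch, assuming a randomly generated graph as a stand-in for real data, builds an in-memory adjacency list and collects every vertex within k hops using breadth-first search, so the neighbor counts each database returns can be cross-checked. It is not a benchmark of any database, and the function names are hypothetical.

```python
import random
import time
from collections import defaultdict


def random_graph(num_vertices: int, num_edges: int, seed: int = 0) -> dict:
    """Build a random directed graph as an adjacency list {vertex: [neighbors]}."""
    rng = random.Random(seed)
    adj = defaultdict(list)
    for _ in range(num_edges):
        adj[rng.randrange(num_vertices)].append(rng.randrange(num_vertices))
    return dict(adj)


def k_hop_neighbors(adj: dict, start: int, k: int) -> set:
    """Return every vertex reachable from start in 1..k hops (start itself excluded)."""
    seen = {start}
    frontier = [start]
    for _ in range(k):
        next_frontier = []
        for v in frontier:
            for w in adj.get(v, ()):
                if w not in seen:  # visit each vertex once, at its shortest hop distance
                    seen.add(w)
                    next_frontier.append(w)
        frontier = next_frontier
        if not frontier:
            break
    seen.discard(start)
    return seen


if __name__ == "__main__":
    # Scaled well below millions of vertices so it runs quickly; raise the sizes as needed.
    graph = random_graph(num_vertices=50_000, num_edges=500_000)
    for hops in (1, 2, 3):
        t0 = time.perf_counter()
        found = k_hop_neighbors(graph, start=0, k=hops)
        print(f"{hops} hop(s): {len(found)} vertices in {time.perf_counter() - t0:.3f}s")
```

Fixing the random seed means the same graph can be exported to each candidate database, so the reference counts stay comparable across runs.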

An Example Data Set to Use

Public data sets are available for testing and analysis. For a retail example, we can look at one from Kaggle titled eCommerce behavior data from multi-category stores. It contains 285 million user events from an eCommerce website.

The behavior data spans seven months of activity at a large multi-category online store. Each row represents an event linking one user to one product, so taken together the events form a many-to-many relationship between products and users.
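As a sketch of how such rows become graph edges, the Python below reads event rows and keeps one user-to-product edge per event type. It assumes the data set’s CSV header uses the columns `event_time`, `event_type`, `product_id`, `category_id`, `category_code`, `brand`, `price`, `user_id`, and `user_session`, with `event_type` values such as `view` and `purchase`; verify these against the actual files. The sample rows are made up for illustration, and the function name is hypothetical.

```python
import csv
import io
from collections import defaultdict


def events_to_edges(rows, event_types=("purchase",)):
    """Group user->product edges by event type, ignoring all other event types.

    Returns {event_type: {(user_id, product_id), ...}}; repeated events collapse
    into a single edge, which is usually what a relationship graph needs.
    """
    edges = defaultdict(set)
    for row in csv.DictReader(rows):
        if row["event_type"] in event_types:
            edges[row["event_type"]].add((row["user_id"], row["product_id"]))
    return dict(edges)


# Made-up rows that follow the assumed header.
SAMPLE = """event_time,event_type,product_id,category_id,category_code,brand,price,user_id,user_session
2019-10-01 00:00:00 UTC,view,1001,2001,electronics.smartphone,acme,199.99,501,s1
2019-10-01 00:01:00 UTC,purchase,1001,2001,electronics.smartphone,acme,199.99,501,s1
2019-10-01 00:02:00 UTC,purchase,1002,2002,,acme,9.99,502,s2
2019-10-01 00:03:00 UTC,purchase,1001,2001,electronics.smartphone,acme,199.99,501,s3
"""

if __name__ == "__main__":
    edges = events_to_edges(io.StringIO(SAMPLE), event_types=("view", "purchase"))
    print(sorted(edges["purchase"]))  # [('501', '1001'), ('502', '1002')]
    print(sorted(edges["view"]))      # [('501', '1001')]
```

For the real files, pass an open file handle instead, for example `open(path, newline="")`; `csv.DictReader` streams rows, so the whole file never has to fit in memory, although the resulting edge sets still do.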

Graphing of Retail Transactions

A graph database has VERTICES, also known as nodes. Depending on the graph database, a VERTEX can also be assigned TAGs. In this case, we can define TAGs such as “user,” “product,” and “merchant,” and create many VERTICES with each TAG. PROPERTIES are then defined on TAGs. For example, the “user” TAG can have the PROPERTIES “name” and “email,” and the “product” TAG can have “category” and “name.”

With VERTICES, TAGs, and PROPERTIES defined, a data scientist can connect the vertices with one or more EDGES, which link data points in one direction or both. We can also create EDGE TYPES. Here we might have two: “Favorite” and “Purchase.”
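The model described above can be sketched in Python as a minimal in-memory property graph. It assumes string vertex IDs and lowercase edge type names such as `purchase` and `favorite`; the class and method names are hypothetical and only illustrate how TAGs, PROPERTIES, and EDGE TYPES fit together. A real graph database defines these through its own schema language and storage engine.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Vertex:
    vid: str
    tag: str                                        # e.g. "user", "product", "merchant"
    properties: dict = field(default_factory=dict)  # e.g. {"name": ..., "email": ...}


class PropertyGraph:
    """Tagged vertices plus typed, directed edges, indexed in both directions."""

    def __init__(self):
        self.vertices = {}
        self.out_edges = defaultdict(list)  # (vid, edge_type) -> [dst, ...]
        self.in_edges = defaultdict(list)   # (vid, edge_type) -> [src, ...]

    def add_vertex(self, vid, tag, **properties):
        self.vertices[vid] = Vertex(vid, tag, properties)

    def add_edge(self, src, dst, edge_type):
        for vid in (src, dst):
            if vid not in self.vertices:
                raise KeyError(f"unknown vertex {vid!r}")
        # Storing both directions lets one edge be traversed either way.
        self.out_edges[(src, edge_type)].append(dst)
        self.in_edges[(dst, edge_type)].append(src)

    def neighbors(self, vid, edge_type, reverse=False):
        """Follow edges of one type forward, or backward when reverse=True."""
        index = self.in_edges if reverse else self.out_edges
        return list(index.get((vid, edge_type), []))


if __name__ == "__main__":
    g = PropertyGraph()
    g.add_vertex("user_1", "user", name="Ana", email="ana@example.com")
    g.add_vertex("product_9", "product", category="kitchen", name="Kettle")
    g.add_edge("user_1", "product_9", "purchase")
    print(g.neighbors("user_1", "purchase"))                   # ['product_9']
    print(g.neighbors("product_9", "purchase", reverse=True))  # ['user_1']
```

Keeping an index per direction is one simple way to support the reverse traversal that queries such as “who else bought this product?” rely on.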

With such a basic graph model, we might look for all users who share one or more purchased products with a certain user, for example one identified by their email address. This can provide useful profile insights: a single query in a graph database uncovers how all those users are connected through the products that one user purchased. The same approach works across millions of users, products, categories, and more.
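In plain Python, the same question (which users bought at least one of the products a given user bought?) can be sketched over a list of hypothetical purchase edges: step from the user to their products, then back along the purchase edges to the other buyers. The function name and data are illustrative; a graph database performs this traversal natively instead of rebuilding the indexes on every call.

```python
from collections import defaultdict


def users_sharing_purchases(purchases, user):
    """Map each other user to the products they bought in common with `user`.

    purchases: iterable of (user_id, product_id) purchase edges.
    The user themselves is left out of the result.
    """
    bought = defaultdict(set)     # user -> products (the forward direction)
    bought_by = defaultdict(set)  # product -> buyers (the reverse direction)
    for u, p in purchases:
        bought[u].add(p)
        bought_by[p].add(u)

    shared = defaultdict(set)
    for product in bought.get(user, ()):   # user -> their products
        for other in bought_by[product]:   # product -> everyone who bought it
            if other != user:
                shared[other].add(product)
    return dict(shared)


if __name__ == "__main__":
    purchases = [("alice", "kettle"), ("alice", "mug"), ("bob", "mug"),
                 ("carol", "kettle"), ("carol", "mug"), ("dave", "lamp")]
    print(users_sharing_purchases(purchases, "alice"))
```

Here “bob” shares the mug with “alice,” “carol” shares both items, and “dave” does not appear because he bought neither.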

Such a query can complete in milliseconds in a graph database, whereas in a relational database holding transaction records at huge scale, the response time will be much longer. Furthermore, our example is only a one-hop query. In a query with multiple hops, the difference could be seconds versus hours of waiting. With such technology, you can provide better real-time recommendations to shoppers, such as “you might also want to buy this to go with your product.”
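A multi-hop traversal of this kind can underlie a “you might also want to buy this” suggestion. As a hedged sketch, assuming purchase edges stored as `(user_id, product_id)` pairs, the function below walks from a product to its buyers and on to their other purchases, then ranks products by how many of those buyers also bought them. This simple co-purchase count is one common technique, not necessarily what any particular retailer or database uses; the names and data are hypothetical.

```python
from collections import Counter, defaultdict


def also_bought(purchases, product, top_n=3):
    """Rank products bought by customers who also bought `product`.

    Hops: product -> its buyers -> those buyers' other products.
    Returns [(product_id, number_of_shared_buyers), ...], highest count first.
    """
    buyers_of = defaultdict(set)
    products_of = defaultdict(set)
    for user, item in purchases:
        buyers_of[item].add(user)
        products_of[user].add(item)

    counts = Counter()
    for user in buyers_of.get(product, ()):
        for item in products_of[user]:
            if item != product:
                counts[item] += 1
    # Sort by count descending, then by id, so ties come out in a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


if __name__ == "__main__":
    purchases = [("alice", "kettle"), ("alice", "mug"), ("bob", "kettle"),
                 ("bob", "mug"), ("bob", "tea"), ("carol", "kettle"),
                 ("carol", "tea"), ("dave", "lamp"), ("dave", "mug")]
    print(also_bought(purchases, "kettle"))  # [('mug', 2), ('tea', 2)]
```

Production recommenders typically add weighting (for example, discounting very popular items) and filtering on top of a raw count like this, but the traversal shape stays the same.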

For instance, we may have the following simplified data schema in a relational database: a user table (id, name), a product table (id, name), and a purchase_record table (user_id, product_id) linking the two.

The SQL for a one-hop query that returns all users who purchased the same products as a certain user (id 31415926) would be as follows:

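```sql
-- Users who bought any product that user 31415926 bought.
-- Products are matched by id, since product names are not guaranteed to be unique.
-- (In PostgreSQL, "user" is a reserved word and must be quoted.)
SELECT a.id, a.name, c.name AS product_name
FROM user a
JOIN purchase_record r ON a.id = r.user_id
JOIN product c ON c.id = r.product_id
WHERE c.id IN (
    SELECT c2.id
    FROM user a2
    JOIN purchase_record r2 ON a2.id = r2.user_id
    JOIN product c2 ON c2.id = r2.product_id
    WHERE a2.id = 31415926
);
```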
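To try a version of the query above that matches on product id rather than product name, the following sketch loads the three-table schema into an in-memory SQLite database with Python’s built-in `sqlite3` module. The column types, sample names, and product ids are hypothetical; only user id 31415926 comes from the text, and the table name `user` is quoted so the script also reads correctly for databases that reserve that word.

```python
import sqlite3

# Hypothetical schema inferred from the columns the query references.
SCHEMA = """
CREATE TABLE "user" (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE purchase_record (user_id INTEGER REFERENCES "user"(id),
                              product_id INTEGER REFERENCES product(id));
"""

QUERY = """
SELECT a.id, a.name, c.name AS product_name
FROM "user" a
JOIN purchase_record r ON a.id = r.user_id
JOIN product c ON c.id = r.product_id
WHERE c.id IN (
    SELECT r2.product_id FROM purchase_record r2 WHERE r2.user_id = ?
)
ORDER BY a.id, c.name
"""


def demo_connection() -> sqlite3.Connection:
    """Build an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany('INSERT INTO "user" VALUES (?, ?)',
                     [(31415926, "Ana"), (2, "Ben"), (3, "Cy")])
    conn.executemany("INSERT INTO product VALUES (?, ?)",
                     [(10, "kettle"), (11, "mug"), (12, "lamp")])
    conn.executemany("INSERT INTO purchase_record VALUES (?, ?)",
                     [(31415926, 10), (31415926, 11), (2, 11), (3, 12)])
    return conn


def users_with_shared_purchases(conn: sqlite3.Connection, user_id: int) -> list:
    # A bound parameter replaces the hard-coded id.
    return conn.execute(QUERY, (user_id,)).fetchall()


if __name__ == "__main__":
    print(users_with_shared_purchases(demo_connection(), 31415926))
```

As in the original query, the starting user appears in the results alongside the other buyers; add `AND a.id <> ?` if only other users are wanted.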

The equivalent graph database query, written in nGQL (the query language of Nebula Graph), would look like this:

```
GO FROM "user_31415926" OVER purchase YIELD purchase._dst AS product |
GO FROM $-.product OVER purchase REVERSELY
```

It is important to note that the graph query is not only much simpler to compose but also much faster to run. In multi-hop concurrent queries, as shown in the following table, the response time can be up to 1,000 times faster, which makes graph databases well suited to online services.

Such findings are just the tip of the iceberg. Graph databases are already used for these purposes across eCommerce. One familiar example is the “you might also like” recommendation shown at checkout when you buy a certain product. Capturing this added sales opportunity requires speed, because the suggestion must appear while the buyer is checking out.

Getting Started with Graph Databases

Interrelating data sets gets more complicated as the number of data sets and data points grows. Graph databases are adopted mainly for this reason.

But as data sets scale, infrastructure must also scale to maintain performance. This holds across data scenarios, from historical retail transaction information to real-time recommendations. Still, a seasoned programmer can get started at any time. Open-source graph databases are widely available for testing the waters, and existing relational database users can conveniently add a graph database to their mix for the deep data traversal that yields better business insights.


Author Bio:

Sherman Ye is the founder and CEO of vesoft, Inc. Ye previously worked at Facebook and Ant Financial, leading graph database efforts. He also open-sourced the distributed graph database Nebula Graph in 2019.
