Technical Deep Dive: Shopify AI Store Demo

Introduction

If you're new to our blog, you may have missed our previous posts introducing Seaplane, outlining our core beliefs, and offering a glimpse into our demo store. For those catching up, here's a quick overview before we delve into the technical details of our demo store.

  • Introducing Seaplane: Seaplane is a global platform designed for building AI-infused applications. Through a straightforward Python SDK, it seamlessly integrates cutting-edge (edge) AI models and data stores with top-tier global infrastructure.
  • Our Core Beliefs: Summarizing all our beliefs into a single sentence is quite the challenge (we've got plenty to say – feel free to dive into our blog for the full scoop!). But at the heart of it all, we're all about making life easier for developers and speeding up the adoption of generative AI to fuel globally-scaled AI-infused applications on the edge.
  • Unveiling Our Demo Store: Picture our demo store as a fictional showroom for snowboard enthusiasts; it's our version of a concept car in the e-commerce industry. It showcases all the awesome features we've engineered into Seaplane, consolidated into one demo. Will every e-commerce platform resemble our demo store in the future? Unlikely. Much like a concept car, this demo serves to ignite creativity and showcase the endless possibilities available today.

Demo Store

Welcome to Double Black Snowboard Co., where we've recreated the friendly atmosphere of a traditional brick-and-mortar shop online. When you first arrive, you'll meet our virtual store clerk, ready to help you find what you want.

Our chat interface will walk you through seven questions to better understand your preferences. These questions cover everything from your skill level to your favorite type of riding, ensuring that we only show you the products that perfectly fit you and your riding style.

Once you've answered these questions, you can ask anything about specific boards or products. The chat app uses a 7B-parameter LLM deployed on the edge, enriched with retrieval-augmented generation (RAG) over product information and information about you, the customer, to best answer your questions.

In the background, our system quietly works to provide you with hyper-personalized product descriptions and taglines using small LLMs on the edge. Once you enter the Q&A portion of the app, it uses the buying history of other customers to develop one super recommendation, powered by collaborative filtering run on Llama-70b.

Want to see how it all works? Check out our live demo video below!

Requirements

As we've mentioned earlier, think of this demo as our version of a concept car: it's designed to highlight the breadth of what Seaplane will enable in the future, while focusing on practical benefits any AI developer can leverage for an AI-infused app right now.

Operating a global e-commerce store like Double Black Snowboard Co. comes with challenges, and one of the biggest is latency. Every millisecond counts in e-commerce, and delays can significantly impact customer experience and conversion rates. While large language models aren't known for their speed, we've prioritized minimizing latency in this demo in a number of ways. First, we use smaller, edge-deployed LLMs that deliver quicker responses. Second, we geographically spread where model instances can run, close to local, active users wherever they may be globally, to further improve responsiveness. And third, we mask the longer latency of larger models, which produce more detailed or deeper results, by running them in parallel to the end-to-end chat experience and only when we see higher user intent.

Another important consideration is data privacy and data sovereignty. We understand the importance of complying with international legislation such as GDPR while serving customers globally. In this demo, we'll demonstrate how to run a global store, serve traffic from different regions, and maintain data sovereignty—all with just a few lines of application code.

Technical Deep Dive

Now, let's delve into the technical details of our demo store. In this deep dive, we'll explore the components we've utilized and the rationale behind their selection. Our goal with this demo is to showcase the breadth of features Seaplane offers within a single, simple but realistic application.

Here's a breakdown of key components of the Seaplane SDK and platform leveraged in this demo:

  • Low-latency Key-Value Store (KV-Store)
  • Low-latency (small) LLMs run on Seaplane edge GPUs:
      • Zephyr-7b-chat
      • Mistral-7b-instruct
  • (Large) LLM for collaborative filtering:
      • Llama-70b-chat
  • Global SQL with a regional-by-row table
  • Seaplane Vector Store

Before continuing further, let's clarify some foundational concepts. In Seaplane, applications and their data flows are orchestrated through Directed Acyclic Graphs (DAGs). DAGs wire together individually scalable compute containers known as tasks. Within our demo store, we've structured four primary DAGs, each serving a distinct function:

  • Chat-Intake DAG: Guides users through seven intake questions and updates available products based on their answers.
  • Question-and-Answer (Q&A) DAG: Manages the Q&A component after the intake questions are complete.
  • Generate Descriptions DAG: Generates hyper-personalized product descriptions and taglines based on user preferences.
  • Collaborative Filtering DAG: Recommends products based on historical buying patterns and user interests.

All four DAGs combined form the entire demo application.

Complete overview of all DAGs combined

Using DAGs facilitates the re-usability of components. For example, the collaborative filtering DAG can be reused in any other application. We envision AI developers building up a library of reusable tasks and DAGs over time. Eventually, creating new applications becomes as simple as combining the relevant existing DAG blocks, much like playing with a Lego set.

Looking at the diagram above, one task stands out as seemingly outside the scope of any DAG: the "question counter task." This task is not part of any of the four aforementioned DAGs, but it is a critical component: it manages routing within the application. A question counter state is created in the low-latency KV store for each new user, and every answered question updates this counter. For each run through the pipeline, we query the KV store and attach the current value as metadata to our messages.

Downstream from the counter task, we leverage a crucial feature of the Seaplane platform known as conditional routing in DAGs. Conditional routing allows us to route the user to the relevant sub-DAG based on their current state, stored in the metadata. For example, users with a question number below seven are routed through the intake DAG, while users with a question number of seven or above are routed to the Q&A section of the app.
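As a rough illustration of this pattern, here's a minimal Python sketch. The kv_store dict and Message class are stand-ins of our own, not the Seaplane SDK, which expresses this declaratively through its conditional-routing feature:

```python
# Sketch of the question-counter routing pattern.
from dataclasses import dataclass, field

kv_store: dict[str, int] = {}  # stand-in for the low-latency KV store

@dataclass
class Message:
    user_id: str
    body: str
    meta: dict = field(default_factory=dict)

def question_counter(msg: Message) -> Message:
    # Attach the user's current question number as message metadata.
    msg.meta["question_number"] = kv_store.setdefault(msg.user_id, 0)
    return msg

def route(msg: Message) -> str:
    # Below seven: still in intake; seven or above: Q&A.
    return "intake" if msg.meta["question_number"] < 7 else "qa"
```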

You can see references to the question counter in multiple locations in the DAG, triggering distinct features at specific times.

Conditional routing in Seaplane

Chat-Intake DAG

Chat Intake DAG

The Chat Intake DAG is the backbone for the seven intake questions designed to filter available products for each new user. It employs two parallel low-latency LLM calls, each triggered by one of seven prompts determined by the question counter.

The first track captures user input and extracts the relevant information in JSON format using the Seaplane edge-deployed Mistral-7b-instruct running on NVIDIA L40S GPUs. This extracted data is then used downstream to filter the available products and update a SQL table for this user.

Mistral prompt example to extract name and age from an input message.

Simultaneously, the second track generates the subsequent question using the Seaplane edge-deployed Zephyr-7b-chat running on NVIDIA A40 GPUs. While these questions follow a script, we leverage an LLM to maintain the natural conversational tone of the chatbot, ensuring a personalized interaction with each user. Since both LLM calls run in parallel, we minimize the additional latency incurred during question generation.

Zephyr prompt example to generate one of the seven input questions
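To make the parallelism concrete, here's a minimal sketch of how the two tracks could be dispatched together. The call_llm helper, model names, and prompt templates are hypothetical stand-ins, not the Seaplane SDK or the actual prompts shown above:

```python
# Sketch of the two parallel intake tracks.
from concurrent.futures import ThreadPoolExecutor

EXTRACT_PROMPT = "Extract the user's name and age from: {answer}. Reply with JSON only."
NEXT_QUESTION_PROMPT = "You are a friendly snowboard store clerk. Ask intake question {n} of 7."

def call_llm(model: str, prompt: str) -> str:
    return f"[{model} response]"  # placeholder for the edge inference call

def handle_answer(answer: str, question_number: int) -> tuple[str, str]:
    # Run extraction (Mistral) and question generation (Zephyr) in parallel,
    # so question generation adds no extra latency on top of extraction.
    with ThreadPoolExecutor(max_workers=2) as pool:
        extraction = pool.submit(call_llm, "mistral-7b-instruct",
                                 EXTRACT_PROMPT.format(answer=answer))
        next_q = pool.submit(call_llm, "zephyr-7b-chat",
                             NEXT_QUESTION_PROMPT.format(n=question_number + 1))
        return extraction.result(), next_q.result()
```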

Both prompts are directed through Seaplane's reusable DAG called Substation. This DAG is responsible for all GenAI model inference calls on the Seaplane platform and can be wired into any application by simply importing it from the Seaplane Python package. By doing so, Seaplane automatically creates all the required logic and tasks inside the DAG, giving us, as developers, instant access to LLMs with support for asynchronous calls, without the hassle of setting everything up ourselves.

Seaplane has created a set of DAGs to perform basic functionality. For example, we have DAGs for model calls, collecting results from multiple inputs and collecting results from a debatched run. In the future, we expect to greatly expand the number of reusable DAGs to improve developer velocity even more. Furthermore, we are opening an app store for community-created tasks and DAGs.

The outputs from Substation, consisting of the inference results from both LLMs, are directed to specific post-processing tasks. Here, we validate the results, particularly from Mistral-7b-instruct, to ensure successful extraction and structuring of the required information in JSON format.

For example, the first question asks about the user's age and name. We expect the output of Mistral to look something like this.

Extracted data example

We store this success or failure as a boolean and use it at the end of the pipeline to decide whether we should update the question state. If the model was not successful in extracting a JSON, or the user did not provide all the required information (i.e., keys are missing from the JSON), we do not update the question state; instead of asking the next question, we ask the user to answer the previous question again.
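A minimal sketch of that validation step might look like the following; the required keys here match the first question, and the helper name is ours, not the SDK's:

```python
# Validate the extraction track's output before advancing the question state.
import json

REQUIRED_KEYS = {"name", "age"}  # varies per intake question

def validate_extraction(raw: str) -> tuple[bool, dict]:
    """Return (success, data); failure means we re-ask the previous question."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False, {}
    if not isinstance(data, dict):
        return False, {}
    # Missing keys mean the user skipped a detail, so we do not advance.
    return REQUIRED_KEYS <= data.keys(), data
```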

Both post-processed outputs are fed into the second Seaplane-created reusable DAG: the multiple inputs collector. Similar to the Substation DAG, we can add this DAG and all its internal tasks simply by importing it from the Seaplane main package. The multiple inputs collector ensures we have the results from both LLMs before proceeding to the next step.

After completing both LLM tasks, the update q-state task is invoked to update the user's question state in the low-latency KV store. Meanwhile, a separate branch in our DAG, the update products task, adjusts the available products in SQL based on user input, i.e., the extracted JSON. Running these tasks on separate sub-branches minimizes overall latency and enhances the chat-like experience of our application.

Finally, upon reaching the seventh intake question, conditional routing within the DAG based on the question state directs all information to a set of tasks dedicated to generating hyper-personalized product descriptions using Seaplane’s edge deployed Zephyr-7b-chat. By executing this branch in parallel and outside the direct user input-output path, we further reduce latency for the end user.

Q&A DAG

QA DAG

Upon completing the seven intake questions, we use conditional routing based on the question state stored in the low latency KV-store to transition users into the next section of our application: the Q&A segment. Here, the DAG orchestrates a two-part retrieval augmented generation (RAG) process executed in parallel to minimize end-to-end latency.

In the first part, the system retrieves the user's information from a regional-by-row global SQL table based on their unique ID (more on global SQL in the Regionality and Data Compliance section below). Simultaneously, the second part conducts a k-nearest neighbor search on a vector store to fetch the relevant product ID. This process matches the user's free-form text input (e.g., "I would like to know more about AirMaster Pro Freestyle Snowboard") to a specific product name and associated product data.
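As a rough illustration of the retrieval side, here's a plain-numpy k-nearest-neighbour lookup. It assumes product descriptions and the user's query have already been embedded, and it stands in for the Seaplane Vector Store rather than showing its actual API:

```python
# k-NN product lookup sketch using cosine similarity over embeddings.
import numpy as np

def top_k_products(query_vec: np.ndarray, product_vecs: np.ndarray, k: int = 1) -> np.ndarray:
    """product_vecs has shape (n_products, dim); returns indices of the k closest."""
    sims = product_vecs @ query_vec
    sims = sims / (np.linalg.norm(product_vecs, axis=1) * np.linalg.norm(query_vec))
    return np.argsort(sims)[::-1][:k]
```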

The user's input, profile information, and relevant product details are then sent to another instance of the reusable substation DAG to query Seaplane’s low latency edge LLM Zephyr-7b-chat to generate a response.

Concurrently, alongside the LLM call, the component responsible for product information retrieval sends its output to the collaborative filtering DAG. More on that in the section below.

Collaborative Filtering DAG

Collaborative Filtering DAG

As previously mentioned, the RAG element responsible for retrieving product information also initiates a parallel branch in our DAG to generate personalized recommendations based on the purchasing history of similar customers.

This recommendation process is resource-intensive ($$$), utilizing the larger Llama-70b model deployed by Seaplane, and is therefore only triggered upon detecting buying intent, i.e., when the user requests more information about a product.

By executing this process as a side branch of the DAG, leveraging the conditional routing and multiple outputs functionality of the Seaplane SDK, we ensure that the recommendation computation occurs outside of the direct customer interaction path, thus minimizing latency experienced by the user.

The collaborative filtering DAG comprises three main steps: pre-processing, LLM inference, and post-processing.

In the pre-processing task, all available products are retrieved from Seaplane Global SQL based on the initial user input provided during the intake flow. These products are then matched against our simulated buying history. Since this is a demo environment lacking real purchasing data, we've opted to use an Amazon ratings dataset sourced from Kaggle, slightly adapted for our use case, to maintain the integrity of the collaborative filtering DAG.

A prompt is constructed to guide the LLM inference process using the available products and ratings dataset. We use a low-temperature setting of 0.01 to reduce the risk of hallucinations and extraneous output.

Collaborative Filtering Prompt
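The exact prompt is shown above; as a rough sketch, assembling it from the SQL products and the ratings data might look something like this (the field names and wording are illustrative):

```python
# Hypothetical collaborative-filtering prompt builder.
def build_cf_prompt(user_id: str, products: list[str], ratings: list[dict]) -> str:
    history = "\n".join(
        f"user {r['user']} rated {r['product']}: {r['score']}/5" for r in ratings
    )
    return (
        f"Given the ratings below, recommend exactly one product for user "
        f"{user_id} from this list: {', '.join(products)}.\n"
        f"{history}\n"
        'Reply with JSON only, e.g. {"recommendation": "<product name>"}.'
    )
```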

Additionally, we use a system prompt to instruct the LLM to return only a JSON string. Llama-70b was particularly chatty (perhaps because we deployed Llama-70b-chat) and often included additional information. Rather than wasting hours optimizing the prompt, we opted to use a regular expression to strip any unwanted text and extract just the JSON string.

system prompt
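The regular expression itself can be very simple. A sketch of the cleanup step might look like this (it assumes the recommendation JSON is a single flat object):

```python
# Pull the first {...} block out of a chatty LLM response.
import json
import re

def extract_json(text: str) -> dict | None:
    match = re.search(r"\{.*?\}", text, re.DOTALL)  # flat JSON objects only
    if not match:
        return None
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
```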

In the post-processing stage, the output is filtered, and a query to Seaplane global SQL is executed to update the available product rows to reflect the recommendation.

Generate Descriptions DAG

Generate Descriptions DAG

The Generate Descriptions DAG is directly linked to the Chat-Intake DAG, executing only once all intake questions are completed. We achieve this by leveraging Seaplane's conditional routing feature: the descriptions are generated once the question state in the low-latency KV store reaches seven.

Like other DAGs within the application, the Generate Descriptions DAG operates in parallel with the Chat-Intake DAG, minimizing end-to-end latency for the user.

The Generate Descriptions DAG retrieves all available products for this user (based on their input) and creates two prompts for each product: one to generate the tagline and one to generate the hyper-personalized description.

Each prompt is run through another instance of the reusable Substation DAG utilizing Seaplane's edge-deployed Zephyr-7b-chat to generate the response. All results are stored in Seaplane Global SQL.

Generate descriptions prompt
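As a rough sketch of that per-product fan-out (the profile fields and wording are illustrative; the real prompt is shown above):

```python
# Build the two per-product prompts from the user's intake profile.
def description_prompts(product: str, profile: dict) -> list[str]:
    persona = (
        f"The customer is {profile['age']} years old, rides {profile['style']}, "
        f"and their skill level is {profile['skill']}."
    )
    return [
        f"{persona} Write a punchy one-line tagline for the {product}.",
        f"{persona} Write a short, hyper-personalized description of the {product}.",
    ]
```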

Image Generation DAG

The image generation part of our application is a standalone Seaplane application that takes a prompt, the description of the snowboard, and runs it through Stable Diffusion XL (SDXL). We use the in-painting capabilities of SDXL by providing it with a mask image, as shown below. This lets us instruct SDXL to generate new imagery only in a specific location; in other words, to replace the black pixels in the mask with the generated image. The application returns the image as a bytes object for the front end to store.

Mask layer
Output, prompt: A snowy mountain with a touch of Hawaii and Aloha
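Our deployment isn't built on the snippet below, but the same in-painting idea can be sketched with the open-source diffusers library. Note that diffusers regenerates the white pixels of the mask, so a mask like ours that marks the target area in black would need inverting first:

```python
# SDXL in-painting sketch with Hugging Face diffusers.
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image

pipe = AutoPipelineForInpainting.from_pretrained(
    "diffusers/stable-diffusion-xl-1.0-inpainting-0.1",
    torch_dtype=torch.float16,
).to("cuda")

base = load_image("board_template.png")  # hypothetical template image
mask = load_image("board_mask.png")      # white pixels get regenerated
result = pipe(
    prompt="A snowy mountain with a touch of Hawaii and Aloha",
    image=base,
    mask_image=mask,
).images[0]
```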

Data Serving Endpoint

With an understanding of each chatbot component, let's delve into how we deliver this information to the front end. A crucial aspect of the demo is a dynamic storefront that updates in real time, filtering products based on user input.

The serving process is relatively straightforward compared to the complexity of the multiple DAG applications handling the chat intake. The serving endpoint is a stand-alone Seaplane application that queries Seaplane Global SQL for all available products for a given user ID. It consists of a straightforward linear DAG with a single task for the SQL call. Essentially, it turns the SQL table into a REST-queryable endpoint.
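Conceptually, it behaves like this minimal sketch; FastAPI and sqlite3 here are stand-ins of our own for the Seaplane application and Global SQL:

```python
# Sketch of the product-serving endpoint: one SQL query behind a REST route.
import sqlite3

from fastapi import FastAPI

app = FastAPI()

@app.get("/products/{user_id}")
def products(user_id: str) -> list[dict]:
    con = sqlite3.connect("store.db")
    con.row_factory = sqlite3.Row
    rows = con.execute(
        "SELECT * FROM products WHERE user_id = ? AND available = 1", (user_id,)
    ).fetchall()
    con.close()
    return [dict(r) for r in rows]
```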

Shopify Integration

This demo's web interface was created using Shopify's own headless commerce stack, Hydrogen, as a base. Besides stripping away functionality outside the scope of what we wanted to showcase, and adding a handful of CSS rules to make it all look nice, the bulk of the interaction was introduced by piggybacking on existing calls from the browser to the server. When a request for predictive search results arrives at the server, a new job is submitted via HTTP to Seaplane's AI-infused pipeline. Its results include further questions from the AI clerk to the user, as well as a list of product IDs that are used to fetch product information from Shopify's official GraphQL API and are then returned to the browser for display.

Interaction between Shopify and Seaplane

Regionality and Data Compliance

By default, Seaplane deploys to the region earth. But as highlighted at the outset of this blog, regional compliance and data sovereignty are important considerations for our demo store, driven by two primary factors: adherence to data regulations such as GDPR, and latency optimization for the end user.

While GDPR and similar data compliance laws regulate a broad spectrum of requirements beyond the scope of Seaplane, addressing regional restrictions is an area where our platform can help. Data regulations often impose constraints on the transfer of personal data outside the regulated area or country to ensure adequate data protection levels. In practical terms, for example, GDPR necessitates storing data related to EU citizens within EU borders.

In the current version of the Seaplane SDK, this is achieved by configuring the region setting in the .env file and appending a prefix to the application name, such as eu-.

.env file example
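Since the actual keys are shown in the image above, treat the following as a purely hypothetical sketch of the shape of such a file; the key names are illustrative, not the real ones:

```ini
# Hypothetical .env sketch for an EU deployment.
SEAPLANE_REGION=eu
SEAPLANE_APP_PREFIX=eu-
SEAPLANE_API_KEY=<your-api-key>
```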

We create a distinct .env file for each new location we want to support and execute seaplane deploy to deploy an application instance that serves local traffic. Each newly deployed instance has a unique URL; for example, our EU instance is available at https://carrier.cplane.cloud/v1/endpoints/eu-chatbot/request
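Calling a regional instance then looks something like the sketch below. The URL comes from our deployment above, while the payload shape and auth header are hypothetical:

```python
# Hypothetical request to the EU instance of the chatbot endpoint.
import requests

resp = requests.post(
    "https://carrier.cplane.cloud/v1/endpoints/eu-chatbot/request",
    headers={"Authorization": "Bearer <token>"},  # auth scheme assumed
    json={"user_id": "user-42", "message": "I mostly ride powder"},
    timeout=30,
)
print(resp.json())
```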

We currently support the following regions (with more regions coming very soon):

  • Frankfurt, Germany
  • New York, USA
  • London, UK
  • Narita, Japan
  • Singapore, Singapore
  • San Jose, USA

Each location has access to all locally deployed data stores, such as the KV store, object store, and Global SQL, as well as edge-deployed low-latency LLMs such as Zephyr and Mistral.

This streamlined approach already simplifies the setup compared to independently configuring cloud infrastructure across multiple regions, zones, and clouds and setting up your own data stores and LLMs. However, as detailed in our “we believe” blog post, we are committed to providing the best developer experience; frankly, this is not its final form. Stay tuned for future updates that further improve regionality.

Global SQL

While most data stores, such as the Key-Value and Object Store, typically deploy multiple nodes within a single location specified in the .env file, Seaplane Global SQL offers greater flexibility and customization. In this section, we address some of that flexibility and explain why we chose the structure we did.

Seaplane Global SQL provides three distinct table types for data regionality, each with implications for fault tolerance and latency.

Two of these table types support regional deployment: row-level geofence tables and regional tables. While regional tables allocate the entire table to a specified region, row-level geofence tables, as the name implies, distribute data across different regions per row.

For our demo use case, the row-level geofence table proves more suitable. It affords us granular control over data storage locations without necessitating a new table for each location. Instead, we tag each row with the appropriate region, and Seaplane automatically allocates them to the correct location.

SQL Insert example
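The actual insert is shown above; as a hypothetical sketch, tagging a row for the EU might look like this (the region column name and schema are illustrative):

```python
# Hypothetical row-level geofence insert via a SQL client.
INSERT_USER = """
    INSERT INTO users (id, name, age, region)
    VALUES (%s, %s, %s, %s);
"""
# The 'eu' tag keeps this user's row stored inside EU borders.
params = ("user-42", "Anna", 29, "eu")
# cursor.execute(INSERT_USER, params)  # executed against Seaplane Global SQL
```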

Lessons Learned

While building this demo, we've had some eye-opening moments about our platform and brainstormed a ton of ideas for making it even better. To avoid turning this blog post into a book series longer than Discworld, we will leave those for a future blog.

Furthermore, we're still crunching the numbers on the latency of each component to see exactly how fast our demo runs. Once we have all the juicy details, we'll definitely share them in a future blog post.

Keep an eye out and give us a follow on LinkedIn to stay in the loop when those posts drop!

Try it Yourself

If you’re interested in checking out the code for this application or trying it yourself, go here to join the Seaplane beta. (Select “Shopify Recommender” in the dropdown.) We’re responding as fast as we can and are happy to help integrate with your e-commerce site (Shopify or otherwise).

