
SessionPath is a deep learning model that provides personalized category suggestions for type-ahead APIs. This repo re-implements the original paper (https://arxiv.org/abs/2005.12781) leveraging Ludwig capabilities.

ANUDAVIS/session-path


SessionPath (Ludwig re-mastered edition)

Personalized Category Suggestions for eCommerce Type-Ahead

Overview

This repo contains working code from our blog post "Building personalized category suggestions with Ludwig". Using Ludwig, we implement an encoder-decoder architecture that provides personalized, dynamic category suggestions to augment a type-ahead API.
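As a rough illustration of how such an encoder-decoder can be expressed in Ludwig, the dict below sketches a model definition over the three dataset columns described in the pipeline section (query, skus_in_session, path). It is not the repo's actual configuration: the embedding file name and `embedding_size` are hypothetical, and the key names follow older Ludwig releases (newer ones nest encoder and decoder options differently), so check them against the Ludwig version you install.

```python
import json

# Hypothetical path to the prod2vec output; the real file name depends on your setup.
EMBEDDINGS_FILE = "prod2vec_embeddings.txt"

# Sketch of a Ludwig model definition: two inputs (query text, in-session SKUs)
# and a sequence output decoded token by token into a taxonomy path.
model_definition = {
    "input_features": [
        {"name": "query", "type": "text"},
        {
            "name": "skus_in_session",
            "type": "sequence",
            # Initialize SKU embeddings from the GloVe-format prod2vec file.
            "pretrained_embeddings": EMBEDDINGS_FILE,
            "embedding_size": 48,  # hypothetical; must match the vectors in the file
        },
    ],
    "output_features": [
        {"name": "path", "type": "sequence", "decoder": "generator"},
    ],
}

print(json.dumps(model_definition, indent=2))
```

A dict of this shape is what gets handed to `LudwigModel` for training; the personalization comes from feeding session SKUs alongside the query, so two users typing the same prefix can receive different paths.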

A typical type-ahead experience looks like this:

[Image: Amazon category suggestion example]

What we are trying to build is a smarter system, one that also takes contextual factors into account when suggesting categories (e.g. the products the user has interacted with):

[Image: dynamic category suggestion example]

The blog post and code are inspired by our research paper How to Grow a (Product) Tree, presented at the 3rd Workshop on e-Commerce and NLP at ACL 2020.

Setup

The code was written for Python 3.7; the provided requirements.txt can be used to install the dependencies in a separate virtual environment (e.g. with virtualenv).

Credentials and global parameters can be set in a standard .env file (*.env.local is provided as a template); they are loaded into the pipeline script through dotenv.

Repo Structure

We provide two main ways to test our category-prediction models for type-ahead: a simplified but realistic end-to-end "stateless" pipeline, which builds all input features and a Ludwig-friendly dataset from raw data; and a standalone folder with a minimal Ludwig script, for when you already have embeddings and data rows ready for the model.

Luigi-powered pipeline

Running model_pipeline.py executes a local Luigi pipeline, a DAG of four tasks:

  • prod2vec training: product embeddings are trained from browsing data and stored locally as text in the GloVe format;
  • dataset preparation: extract data from search logs and prepare a CSV with three columns: "query" (the input query), "skus_in_session" (product identifiers for in-session interactions: view, add, etc.) and "path" (the target taxonomy path). "skus_in_session" and "path" are sequences, so they are saved as space-separated tokens;
  • Ludwig training: define the deep learning model and feed it to Ludwig for training and local persistence;
  • Ludwig testing: load the model from storage, test it on held-out data and print out summary statistics.
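The dataset layout from the preparation step can be sketched with the standard csv module. The sample rows below are made up (SKU ids and taxonomy nodes are placeholders, and we assume paths are listed root first); the point is only the column names and the space-joined sequence columns.

```python
import csv
import io


def write_dataset(rows, stream) -> None:
    """Write (query, skus, path) rows as a CSV with space-separated sequence columns."""
    writer = csv.DictWriter(stream, fieldnames=["query", "skus_in_session", "path"])
    writer.writeheader()
    for query, skus, path in rows:
        writer.writerow(
            {
                "query": query,
                "skus_in_session": " ".join(skus),  # in-session product ids
                "path": " ".join(path),  # taxonomy nodes (assumed root first)
            }
        )


# Made-up sample rows: product ids and taxonomy nodes are placeholders.
rows = [
    ("helmet", ["sku_123", "sku_456"], ["sports", "cycling", "helmets"]),
    ("running shoes", ["sku_789"], ["sports", "running", "shoes"]),
]
buffer = io.StringIO()
write_dataset(rows, buffer)
print(buffer.getvalue())
```

Because sequences are split on spaces, a taxonomy node whose name contains a space would need normalizing first (e.g. replacing spaces with underscores); how the original pipeline handles this is not stated here.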

By using Luigi, we wrap this DAG in a convenient flow that saves time when we need to re-run the pipeline from a particular step, and ensures consistency when we perform a clean run.
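Luigi decides whether to run a task by checking whether its outputs already exist. The stdlib-only sketch below mimics that behaviour for the four steps of the DAG, to show why re-running from a particular step is cheap. It does not use Luigi's API; the task class names and the `.done` marker files are hypothetical.

```python
import tempfile
from pathlib import Path


class Task:
    """Stand-in for a Luigi task: it is skipped when its output already exists."""

    requires: tuple = ()  # upstream task classes

    def __init__(self, workdir: Path) -> None:
        self.workdir = workdir

    def output(self) -> Path:
        # One marker file per task, playing the role of a Luigi target.
        return self.workdir / f"{type(self).__name__}.done"

    def complete(self) -> bool:
        return self.output().exists()

    def run(self) -> None:
        # A real task would train embeddings, build the CSV, or train/test the model.
        self.output().write_text("ok")


# Hypothetical names for the four pipeline steps.
class Prod2VecTraining(Task):
    pass


class DatasetPreparation(Task):
    pass


class LudwigTraining(Task):
    requires = (Prod2VecTraining, DatasetPreparation)


class LudwigTesting(Task):
    requires = (LudwigTraining,)


def build(task_cls, workdir: Path, executed: list) -> None:
    """Depth-first: bring upstream tasks up to date, then run this one if needed."""
    for upstream in task_cls.requires:
        build(upstream, workdir, executed)
    task = task_cls(workdir)
    if not task.complete():
        task.run()
        executed.append(task_cls.__name__)


with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp)
    first_run, second_run, partial_run = [], [], []
    build(LudwigTesting, workdir, first_run)  # clean run: everything executes
    build(LudwigTesting, workdir, second_run)  # all outputs exist: nothing to do
    # Re-run from training: invalidate training and everything downstream of it.
    for name in ("LudwigTraining", "LudwigTesting"):
        (workdir / f"{name}.done").unlink()
    build(LudwigTesting, workdir, partial_run)

print(first_run)  # ['Prod2VecTraining', 'DatasetPreparation', 'LudwigTraining', 'LudwigTesting']
print(second_run)  # []
print(partial_run)  # ['LudwigTraining', 'LudwigTesting']
```

Deleting only an upstream output does not re-run a downstream task whose own output still exists (Luigi's default completeness check behaves the same way), which is why the sketch removes both the training and testing markers.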

Please note that the data retrieval functions in data_service.py and prod2vec_train.py are just stubs: in our original repository they contained our Snowflake-based code for loading behavioral and search data from our warehouse. Replace them with your own logic for extracting behavioral and search data, so that downstream tasks can run seamlessly (we left a small Snowflake client in the repo for convenience).

The ludwig_playground folder contains *.local files with sample datasets and sample ancillary files.

The data folder contains catalog.csv.local, a sample CSV file of product information (identifiers, images, taxonomy path). It is useful as a product lookup when your search logs (e.g. products clicked after a search) only report product identifiers and you need to join products with their paths to prepare the final dataset.
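A minimal sketch of that lookup, assuming the catalog has columns named `sku`, `image` and `path` (hypothetical names: check the header of catalog.csv.local) and that clicked products come in as a list of identifiers:

```python
import csv
import io

# Hypothetical catalog excerpt; the real column names may differ.
CATALOG_CSV = """sku,image,path
sku_123,https://example.com/123.jpg,sports cycling helmets
sku_789,https://example.com/789.jpg,sports running shoes
"""


def load_sku_to_path(stream) -> dict:
    """Build a product id -> taxonomy path lookup from the catalog CSV."""
    return {row["sku"]: row["path"] for row in csv.DictReader(stream)}


def paths_for_clicks(clicked_skus, sku_to_path) -> list:
    """Map clicked product ids to paths, skipping products missing from the catalog."""
    return [sku_to_path[sku] for sku in clicked_skus if sku in sku_to_path]


lookup = load_sku_to_path(io.StringIO(CATALOG_CSV))
print(paths_for_clicks(["sku_789", "sku_000"], lookup))  # ['sports running shoes']
```

Silently dropping unknown products is a design choice for the sketch; in practice you may want to count them, since a high miss rate usually means the catalog snapshot and the search logs are out of sync.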

Standalone Ludwig training

If you already have embeddings ready (stored in a tab-separated text file, as in the GloVe format) and a dataset file, you can put them in the ludwig_playground folder and work directly with the Ludwig code, with no other dependency: ludwig_playground.py has some global variables you can set to re-run training, or just to run a trained model on new input rows.

The *.local files in the folder show the accepted format for a dataset and an embedding file to run the Ludwig code.
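Before pointing Ludwig at your own files, it can help to sanity-check them. The sketch below parses a GloVe-style embedding file (one token per line followed by its vector components; splitting on any whitespace covers both tabs and spaces) and measures how many SKUs in the dataset's `skus_in_session` column have a vector. The function names and sample data are hypothetical, not part of ludwig_playground.py.

```python
import io


def load_embeddings(stream) -> dict:
    """Parse 'token v1 v2 ...' lines, checking every vector has the same size."""
    embeddings, dim = {}, None
    for line_no, line in enumerate(stream, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        token, values = fields[0], [float(v) for v in fields[1:]]
        if dim is None:
            dim = len(values)
        elif len(values) != dim:
            raise ValueError(f"line {line_no}: expected {dim} values, got {len(values)}")
        embeddings[token] = values
    return embeddings


def embedding_coverage(sku_sequences, embeddings) -> float:
    """Fraction of SKU tokens (space-separated sequences) that have an embedding."""
    tokens = [sku for seq in sku_sequences for sku in seq.split()]
    return sum(t in embeddings for t in tokens) / len(tokens) if tokens else 0.0


# Made-up sample: two 3-dimensional vectors, tab-separated.
SAMPLE = "sku_123\t0.1\t-0.2\t0.3\nsku_456\t0.0\t0.5\t-0.1\n"
vectors = load_embeddings(io.StringIO(SAMPLE))
print(len(vectors), len(vectors["sku_123"]))  # 2 3
print(embedding_coverage(["sku_123 sku_999", "sku_456"], vectors))
```

Low coverage means many session products will fall back to whatever embedding Ludwig assigns to unknown tokens, which weakens the personalization signal.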

Acknowledgments

This repo is a joint effort of Jacopo, Bingqing and Marie.

We wish to thank our friend Piero Molino, Ludwig's creator, for showing us how to re-write our model (SessionPath) with Ludwig.

How to Cite our Work

If you find this repo (and the ideas in it) useful for your research, please cite our work:

@inproceedings{tagliabue-etal-2020-grow,
    title = "How to Grow a (Product) Tree: Personalized Category Suggestions for e{C}ommerce Type-Ahead",
    author = "Tagliabue, Jacopo  and
      Yu, Bingqing  and
      Beaulieu, Marie",
    booktitle = "Proceedings of The 3rd Workshop on e-Commerce and NLP",
    month = jul,
    year = "2020",
    address = "Seattle, WA, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.ecnlp-1.2",
    doi = "10.18653/v1/2020.ecnlp-1.2",
    pages = "7--18",
    abstract = "In an attempt to balance precision and recall in the search page, leading digital shops have been effectively nudging users into select category facets as early as in the type-ahead suggestions. In this work, we present SessionPath, a novel neural network model that improves facet suggestions on two counts: first, the model is able to leverage session embeddings to provide scalable personalization; second, SessionPath predicts facets by explicitly producing a probability distribution at each node in the taxonomy path. We benchmark SessionPath on two partnering shops against count-based and neural models, and show how business requirements and model behavior can be combined in a principled way.",
}

The arXiv version is available at https://arxiv.org/abs/2005.12781.

License

The code in this repo is freely available and provided "as is" under the MIT License.
