ReadNext 0.0.4: Local Embedding Model

I just released ReadNext version 0.0.4. The primary goal of this new version is to remove the dependency on the Cohere Embedding web service endpoint by using a local embedding model by default. To enable that, ReadNext is now integrated with Hugging Face and currently uses the BAAI/bge-base-en model.

Local vs. Remote

This change removes the dependency on one external service, which makes ReadNext more stable. Processing takes a little longer with the local model, but how much longer depends on the capabilities of your local computer.

In terms of the quality of the propositions, the two systems are comparable. In my experience, about 80% of the propositions are the same, and the remaining 20% that differ yield no major difference in accuracy. That said, from what I have seen so far, I slightly prefer the BAAI/bge-base-en propositions.

You may want to experiment with both to see what works best for you. All you have to do is change the EMBEDDING_SYSTEM environment variable and reload your terminal instance.

New Configurations

Two new configuration options have been added to this version:

  1. EMBEDDING_SYSTEM: This is the embedding system you want to use. One of: BAAI/bge-base-en (local) or cohere.
  2. MODELS_PATH: This is the path on your local file system where you want the model files to be saved (e.g. /Users/me/.readnext/models/).

If you already have ReadNext installed on your computer, please make sure to add those two new environment variables to your environment.
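
As a rough illustration of how these two variables fit together, here is a minimal Python sketch that reads and validates them. It is not ReadNext's actual code: the function name load_embedding_config is hypothetical, and it assumes that the accepted values are exactly BAAI/bge-base-en and cohere and that MODELS_PATH is only needed for the local model.

```python
import os

# Option names come from the release notes. The accepted values and the
# validation rules below are assumptions made for this illustration.
VALID_EMBEDDING_SYSTEMS = {"BAAI/bge-base-en", "cohere"}


def load_embedding_config(env=None):
    """Return (embedding_system, models_path) read from an environment mapping."""
    env = os.environ if env is None else env

    system = env.get("EMBEDDING_SYSTEM", "")
    if system not in VALID_EMBEDDING_SYSTEMS:
        raise ValueError(
            f"EMBEDDING_SYSTEM must be one of {sorted(VALID_EMBEDDING_SYSTEMS)}, "
            f"got {system!r}"
        )

    models_path = env.get("MODELS_PATH", "")
    # Assumption: only the local model needs a directory to store its files.
    if system != "cohere" and not models_path:
        raise ValueError("MODELS_PATH must be set when using a local embedding model")

    return system, os.path.expanduser(models_path)
```

Calling load_embedding_config() with no argument reads the real environment, so it can serve as a quick check that both variables are visible after reloading the terminal.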

New Commands

Two new commands have been added as well, to help you understand the current status of the ReadNext tool:

  1. readnext version: this gives the version of ReadNext that you are currently using
  2. readnext config: this gives the configuration parameters, and their values, currently used to run that instance of ReadNext

Literate Programming

While at it, I decided to migrate ReadNext’s Python codebase to nbdev to continue its development using literate programming.

All the literate files (notebooks in this case), from which the code is tangled and the documentation woven, are accessible in the nbs folder. The tangled codebase is available in the readnext folder. Finally, the woven documentation is available as GitHub Pages here.

Introducing ReadNext: A Personal Papers Recommender

Every day, approximately 500 new papers are published in the cs category on arXiv, with tens of new papers in cs.AI alone. Amidst the recent craze around Generative AI, I found it increasingly challenging to keep up with the rapid influx of papers. Distilling the ones that were most relevant to my work and my employer’s interests became a daunting task.

ReadNext was born out of my need for a command-line tool that gets the most recent papers from arXiv and feeds the ones most relevant to my current interests into Zotero.

The key focus is to recommend papers that align with my evolving interests and research objectives, which may change on a daily basis and need to be continuously accounted for.

Why ReadNext?

  • Command-line Tool: ReadNext can be executed directly or scheduled as a cron job for easy access.
  • ReadNext fetches the latest papers from arXiv, ensuring you stay informed about work related to your current interests.
  • ReadNext integrates with Zotero, allowing you to manage your research library and organize recommended papers.
  • The core focus of ReadNext is to provide personalized paper recommendations based on your research interests, directly in your personal papers management tool.

How to Install

Getting started with ReadNext is simple. Install it using pip:

pip install readnext

Requirements

ReadNext relies on two fundamental external services to enhance its functionality:

  • Zotero: Zotero serves as the primary papers management tool, playing a pivotal role in ReadNext’s workflow. To configure ReadNext on your local computer, you need a Zotero account. If you do not already have one, you will have to create one; please refer to the section below.
  • Cohere: ReadNext leverages Cohere’s services for generating paper embeddings and summaries. These embeddings and summaries are essential components for providing personalized and relevant paper recommendations. It is necessary to create an account with Cohere. We will be expanding support for additional embeddings and summarization services in the future, offering increased flexibility.

By integrating these services, ReadNext helps in discovering papers that align with your research interests and focus.

Read more about how to properly configure ReadNext here.

How Does ReadNext Work?

  1. As a Zotero user, I will create one or multiple “Focus” collections in my Zotero library. Those are the collections where I will add the papers that are the most interesting to my current research. It is expected that the content of those collections will change over time as my research focus and interests evolve.
  2. On a daily basis, I will run readnext in my terminal, or I will create a cron job to run it automatically for me.
    1. ReadNext will fetch the latest papers from arXiv
    2. ReadNext will identify the papers that are relevant to my research focus, as defined in Zotero
    3. ReadNext will propose the relevant papers to me and add them to Zotero in a dedicated collection where proposed papers are saved
  3. I will go in Zotero, start to read the proposed papers, and if any are of a particular interest I will add them to one of the “Focus” collections
  4. ReadNext will learn from my feedback to improve the quality of the proposed papers
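
To make the relevance step more concrete, here is a minimal Python sketch of one common way an embedding-based recommender can rank candidates: average the embeddings of the papers in a “Focus” collection, then sort new papers by cosine similarity to that average. This is an assumption about the general technique, not a description of ReadNext's actual implementation, and the function names and toy vectors are purely illustrative.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def propose(focus_vectors, candidates, nb_proposals=3):
    """Return the titles of the candidates closest to the focus centroid.

    focus_vectors: embeddings of the papers in a "Focus" collection.
    candidates: dict mapping a paper title to its embedding.
    """
    target = centroid(focus_vectors)
    ranked = sorted(
        candidates,
        key=lambda title: cosine(candidates[title], target),
        reverse=True,
    )
    return ranked[:nb_proposals]
```

Under this scheme, moving a proposed paper into a “Focus” collection shifts the centroid, which is one simple way the feedback loop in step 4 could influence later proposals.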

How to Use ReadNext?

Using ReadNext is easy. Here are the main commands you’ll use:

Help

To get contextual help for any command, run:

readnext --help
readnext personalized-papers --help

Get New Paper Proposals

The following command will propose 3 papers from the cs.AI category, based on the Readnext-Focus-LLM collection in my Zotero library, and save them in Zotero in the Readnext-Propositions-LLM collection with all related artifacts:

readnext personalized-papers cs.AI Readnext-Focus-LLM --proposals-collection=Readnext-Propositions-LLM --with-artifacts --nb-proposals=3

Full documentation of how to use the command line tool is available here.
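
If you would rather launch the personalized-papers command from a small Python script (for example, one that a cron job calls), the sketch below shows one possible way to do it. The flags are the ones used in the command above; the helper names build_command and run_readnext are hypothetical and are not part of ReadNext.

```python
import subprocess


def build_command(category, focus_collection, proposals_collection,
                  nb_proposals=3, with_artifacts=True):
    """Build the argument list for `readnext personalized-papers`."""
    cmd = [
        "readnext",
        "personalized-papers",
        category,
        focus_collection,
        f"--proposals-collection={proposals_collection}",
    ]
    if with_artifacts:
        cmd.append("--with-artifacts")
    cmd.append(f"--nb-proposals={nb_proposals}")
    return cmd


def run_readnext(cmd):
    """Run the command and raise CalledProcessError if readnext exits with an error."""
    return subprocess.run(cmd, check=True)


# Example (requires ReadNext to be installed and configured):
# run_readnext(build_command("cs.AI", "Readnext-Focus-LLM", "Readnext-Propositions-LLM"))
```

Passing the arguments as a list rather than a single string avoids shell quoting problems with collection names.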

Future Work and Contributions

Future work includes adding an abstraction layer for multiple embedding services, expanding paper sources, enhancing test coverage, providing interactive configuration, and refining the paper selection process.

Contributions to ReadNext are welcome! Follow the steps outlined in the README file of the project to contribute.