I just released ReadNext version 0.0.4. The primary goal of this new version is to remove the dependency on the Cohere Embedding web service endpoint by using a local embedding model by default. To enable that, ReadNext is now integrated with Hugging Face and currently uses the BAAI/bge-base-en model.
Local vs. Remote
This change removes the dependency on one external service, which makes ReadNext more stable. The processing time is a little longer with the local model, but it also depends on the capabilities of your local computer.
In terms of performance, the two systems are comparable. In my experience, about 80% of the propositions are the same, and the remaining 20% that differ yield no major difference in accuracy. However, from what I have experienced so far, I do like the BAAI/bge-base-en propositions a little better.
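If you want to put a number on how much the two systems agree on your own data, a minimal sketch along these lines could help. The function name `overlap_ratio` and the two lists are hypothetical placeholders, not ReadNext output or part of its API; you would paste in the propositions you get from each embedding system.

```python
def overlap_ratio(first: list[str], second: list[str]) -> float:
    """Return the share of distinct items in `first` that also appear in `second`."""
    if not first:
        return 0.0
    shared = set(first) & set(second)
    return len(shared) / len(set(first))


# Placeholder identifiers for illustration only, not real ReadNext results.
local_propositions = ["paper-a", "paper-b", "paper-c", "paper-d", "paper-e"]
cohere_propositions = ["paper-a", "paper-b", "paper-c", "paper-d", "paper-f"]

print(f"{overlap_ratio(local_propositions, cohere_propositions):.0%}")
```

This only measures how many propositions are shared; judging whether the differing ones are better or worse still takes a human read.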
You may want to experiment with both to see what works best for you. The only thing you have to do is change the EMBEDDING_SYSTEM environment variable and reload your terminal instance.
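To illustrate what switching on that variable amounts to, here is a rough sketch of how a program could choose between the local and remote system from EMBEDDING_SYSTEM. The helper `embedding_mode` is hypothetical and is not taken from the ReadNext codebase. Falling back to the local model when the variable is unset is also an assumption made for this example.

```python
import os

# The two values this post lists for EMBEDDING_SYSTEM.
LOCAL_SYSTEM = "BAAI/bge-base-en"
REMOTE_SYSTEM = "cohere"


def embedding_mode(environ=os.environ) -> str:
    """Return "local" or "remote" depending on EMBEDDING_SYSTEM (hypothetical helper)."""
    # Assumption: an unset variable falls back to the local model.
    system = environ.get("EMBEDDING_SYSTEM", LOCAL_SYSTEM)
    if system == LOCAL_SYSTEM:
        return "local"
    if system == REMOTE_SYSTEM:
        return "remote"
    raise ValueError(f"Unsupported EMBEDDING_SYSTEM: {system!r}")


if __name__ == "__main__":
    print(embedding_mode())
```

Because the value is read from the process environment, a change made in your shell profile only takes effect in a terminal session started after the change, which is why reloading the terminal is needed.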
New Configurations
Two new configuration options have been added to this version:
EMBEDDING_SYSTEM: This is the embedding system you want to use. One of: BAAI/bge-base-en (local) or cohere.
MODELS_PATH: This is the local path where you want the model files to be saved on your local file system (ex: /Users/me/.readnext/models/).
If you already have ReadNext installed on your computer, please make sure to add those two new environment variables to your environment.
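A quick way to confirm that both variables are visible to Python after updating your environment is a small check like the one below. It is only a sketch: `missing_variables` is a hypothetical helper, not a ReadNext function, and it only reports whether the variables are set, not whether their values are valid.

```python
import os

# The two variables introduced in this version.
REQUIRED_VARIABLES = ("EMBEDDING_SYSTEM", "MODELS_PATH")


def missing_variables(environ=os.environ) -> list[str]:
    """List the required variables that are unset or empty (hypothetical helper)."""
    return [name for name in REQUIRED_VARIABLES if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_variables()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Both variables are set.")
```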
New Commands
Two new commands have been added as well, to help you understand the current status of the ReadNext tool. Those two commands are:
readnext version: this gives the version of ReadNext that you are currently using
readnext config: this gives the configuration parameters, and their values, currently used to run that instance of ReadNext
Literate Programming
While at it, I decided to migrate ReadNext’s Python codebase to use nbdev to continue its development using literate programming.
All the literate files (notebooks, in this case) from which the code is tangled and the documentation is woven are accessible in the nbs folder. The tangled codebase is available in the readnext folder. Finally, the woven documentation is available as GitHub Pages here.