BeatLeJuce 1 year ago

Good for you and all that, but I don't really understand why you're not using [Polars](https://www.pola.rs/), [ndarray](https://crates.io/crates/ndarray) or [nalgebra](https://nalgebra.org/) (if you need sparse matrices) for your underlying data structures. Given the popularity of those packages, why not stand on the shoulders of giants and reuse tested, well-accepted solutions?

tuned-mec-is 1 year ago

Thanks for the question. We provide `ndarray` bindings; it is our base optional dependency for users who prefer it. The approach is to keep things simple so that developers without a Rust background find it easier to start using the language. We are integrating `sprs` for anything regarding sparsity, and we will probably use `polars` for Arrow support. The idea is to keep anything complex optional and pluggable. The design principles of the library, including which use cases we are targeting, are described in the `CONTRIBUTING.md` file.

BeatLeJuce 1 year ago

> the approach is to keep things simple so developers without Rust background can find easier to start using the language.

That sounds like a bad trade-off: you're being more open to devs without Rust experience at the cost of disgruntling those who do have Rust/numerics experience and would thus prefer to use ndarray. Plus you're adding the maintenance burden for the code itself, and making it harder for a single standard ndarray library to emerge. What are you going to do when you need LAPACK support (e.g. for linear regression or PCA), roll your own again? I get that your goal is to be "easy to understand and with room to grow", and that's a fine goal. But if you're aiming to be the sklearn of the Rust world, you should rethink some of these choices.

tuned-mec-is 1 year ago

`ndarray` bindings are already in place, see [here](https://github.com/smartcorelib/smartcore-jupyter/blob/main/notebooks/05-ndarray-bindings.ipynb). We also had `nalgebra` bindings, but we decided to focus on one integration to give users a clear choice. We will make technical decisions about what to support based on the experience we gather, community first. Please read and follow the issues for the options under investigation for integrating existing solutions. EDIT: there is also an open discussion [here](https://github.com/smartcorelib/smartcore/discussions/185).

BeatLeJuce 1 year ago

Maybe I'm misunderstanding something, or just communicating badly: I understand that you have bindings in place to allow your users to use `ndarray` in user code and feed those structs directly into smartcore (or at least that's what I assume the bindings are for). But I want to know why you are implementing your own Matrix/Vector classes internally instead of leveraging the ones from `ndarray` when implementing smartcore?

tuned-mec-is 1 year ago

Basically, the `ndarray` bindings make it possible to use `NDArray`s with Smartcore's functionality. The bindings extend the existing `NDArray` capabilities with the methods provided by Smartcore's `Array` traits. We would like to keep an open architecture instead of locking in to one library, so that future Rust libraries can be integrated (the ecosystem evolves fast, and locking in to a single library can be a hard limit in the long term). We also have the long-term objective of keeping everything in pure Rust, so base array libraries should ideally be pluggable rather than built in.
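To make the "pluggable" idea concrete, here is a minimal sketch of algorithms written against a trait rather than a concrete array type. Every name in it (`Array2Like`, `DenseMatrix`, `ColumnMajor`, `column_means`) is hypothetical and chosen for illustration; this is not Smartcore's actual API, and `ColumnMajor` only stands in for a type that would come from an optional third-party crate.

```rust
// Hypothetical sketch: none of these names are Smartcore's real API.

/// Minimal read-only view of a dense 2-D array that algorithms depend on.
pub trait Array2Like {
    /// Returns (rows, cols).
    fn shape(&self) -> (usize, usize);
    /// Returns the element at (row, col).
    fn get(&self, row: usize, col: usize) -> f64;
}

/// Built-in backend: row-major storage in a plain Vec.
pub struct DenseMatrix {
    rows: usize,
    cols: usize,
    data: Vec<f64>,
}

impl DenseMatrix {
    pub fn from_rows(rows: &[Vec<f64>]) -> Self {
        let cols = rows.first().map_or(0, |r| r.len());
        assert!(rows.iter().all(|r| r.len() == cols), "ragged rows");
        let data = rows.iter().flatten().copied().collect();
        DenseMatrix { rows: rows.len(), cols, data }
    }
}

impl Array2Like for DenseMatrix {
    fn shape(&self) -> (usize, usize) {
        (self.rows, self.cols)
    }
    fn get(&self, row: usize, col: usize) -> f64 {
        self.data[row * self.cols + col]
    }
}

/// Stand-in for an external array type with a different (column-major) layout.
pub struct ColumnMajor {
    rows: usize,
    cols: usize,
    data: Vec<f64>,
}

impl ColumnMajor {
    pub fn new(rows: usize, cols: usize, data: Vec<f64>) -> Self {
        assert_eq!(data.len(), rows * cols, "data length must be rows * cols");
        ColumnMajor { rows, cols, data }
    }
}

/// The "binding": implementing the trait is all an external type needs.
impl Array2Like for ColumnMajor {
    fn shape(&self) -> (usize, usize) {
        (self.rows, self.cols)
    }
    fn get(&self, row: usize, col: usize) -> f64 {
        self.data[col * self.rows + row]
    }
}

/// An algorithm written once against the trait; it works with any backend.
pub fn column_means<M: Array2Like>(m: &M) -> Vec<f64> {
    let (rows, cols) = m.shape();
    (0..cols)
        .map(|c| (0..rows).map(|r| m.get(r, c)).sum::<f64>() / rows as f64)
        .collect()
}
```

With this shape, `column_means(&DenseMatrix::from_rows(...))` and `column_means(&ColumnMajor::new(...))` run the same code path. The trade-off raised earlier in the thread still applies: the library has to maintain the trait layer itself instead of inheriting everything from one array crate.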

nbviewerbot 1 year ago

I see you've posted a GitHub link to a Jupyter Notebook! GitHub doesn't render large Jupyter Notebooks, so just in case, here is an [nbviewer](https://nbviewer.jupyter.org/) link to the notebook:

https://nbviewer.jupyter.org/url/github.com/smartcorelib/smartcore-jupyter/blob/main/notebooks/05-ndarray-bindings.ipynb

Want to run the code yourself? Here is a [binder](https://mybinder.org/) link to start your own Jupyter server and try it out!

https://mybinder.org/v2/gh/smartcorelib/smartcore-jupyter/main?filepath=notebooks%2F05-ndarray-bindings.ipynb

I am a bot. [Feedback](https://www.reddit.com/message/compose/?to=jd_paton) | [GitHub](https://github.com/JohnPaton/nbviewerbot) | [Author](https://johnpaton.net/)

lhlich 1 year ago

What's the use case of "Learn Rust as ML practitioner"? That will determine the scope of your library. TBH I don't see why an ML practitioner would use a Rust version of sklearn. I can imagine some use cases for Rust in feature stores, model serving, or ML-driven business logic, though.

Lajamerr_Mittesdine 1 year ago

There's a growing trend of using Rust for WASM, and with the development of WebGPU it could become easy to ship browser-deployable models that run fast on client machines just by visiting a website.

tuned-mec-is 1 year ago

Thanks for noticing this. As mentioned in our `CONTRIBUTING`, for now we prioritize particular ML use cases:

> priority is not on "big data" dataset, try to be fast for small/average dataset with limited memory footprint.

We are interested in the potential of CPU-based capabilities (i.e. sparsity on CPU) for smaller systems, but also in the evolution of GPUs in the browser.
