A safe file format that is planned (by Huggingface) to become the default for all pretrained model distribution needs. Besides being safe, it has several features that (esp.) large models can benefit from. See comparison link below.
Interesting reads:
A deep learning framework focusing on internal simplicity, with PyTorch-like API. The interesting part is -- the company behind this, the tiny corp, aims to end NVIDIA's monopoly on AI hardware, by writing better software (drivers, APIs etc.) for AMD GPUs! (No, they're not making their own chips.)
Interesting reads:
There are many high quality large language models (LLMs) available these days, with access to model weights, training code and datasets.
Here's a leaderboard of LLMs on HuggingFace hug, based on various benchmarks: Open LLM Leaderboard
However, even the ones on the smaller side (with ~13B parameters) would normally require expensive hardware to run on. Here's an example script to run a small model (FLAN-T5-base) from HuggingFace hub: code/20230616_hf.py
If we want to run a mid-sized model locally, we'll need to resort to some involved techniques such as quantization to get the model size small enough to fit into our machine. Fortunately, people have already done just that and have made some of these quantized model publicly available:
- GPT4All: A user-friendly package to locally run quantized versions of popular open LLMs, even on laptops without GPUs. They have desktop applications that we can use with minimal configuration. However I prefer to use their Python API and write my own scripts: code/20230616_g4a.py
- ggml: A C library by G. Gerganov for running models on CPU. GPT4All library was originally based on this code. Also includes scripts to 4-bit-quantize various models, given the original checkpoints.
- llama.cpp: A C++ library by G. Gerganov, originally aimed at running the LLaMa model in a Macbook (using 4-bit quantization). It now grew into a library which supports many more models and all major platforms.
When running these large models yourself, you should be aware of basic prompt engineering techniques. Read more about it here: Prompt engineering
Related reads:
Source: https://stats.stackexchange.com/a/470786/146083
When training/fitting model to some data, it is important to understand the various costs at play:
- Prediction Cost: This is the cost associated with errors that the model makes. This should be representative of how the negative impacts on the real world (e.g., monetary loss) grow with respect to prediction errors (e.g., quadratic growth).
- Estimation Loss: This controls how the model parameters are optimized. A data scientist is free to choose this function based on tractability and performance, but ideally, minimizing estimation loss should minimize prediction cost. A popular choice of estimation loss is mean squared error (MSE or Ordinary Least Squares, OLS), however note that MSE is not robust against outliers.