deploy: 725ecc0

hsiangjenli committed Oct 13, 2024
1 parent 3bc83dc commit bf8a7f1
Showing 17 changed files with 613 additions and 15 deletions.
2 changes: 1 addition & 1 deletion .buildinfo
@@ -1,4 +1,4 @@
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
config: ce0b4f37b7613ee54bc102c2df0fb432
config: f88da4068ace9218541ef2133e54e552
tags: 645f666f9bcd5a90fca523b33c5a78b7
34 changes: 34 additions & 0 deletions _sources/dspy/0_intro.rst
@@ -0,0 +1,34 @@
DSPy: Programming—not prompting—Foundation Models
=================================================

DSPy stands for "Declarative Self-improving Language Programs (in Python)".

Developed by the Stanford NLP group:

- :cite:`khattab2023dspy`
- :cite:`khattab2022demonstrate`

Contributions
-------------
#. Separates the program from its parameters
#. Introduces a new LM-driven optimizer that can tune both prompts and weights


:cite:`medium_bc_dspy`

LLMs are sensitive to how they are prompted :cite:`towardsdatascienceIntroDSPy`

The key concept is "programming with foundation models" :cite:`towardsdatascienceIntroDSPy`


How to use DSPy :cite:`dspydocsUsingDSPy`
-----------------------------------------

#. Define your task
#. Define your pipeline
#. Explore a few examples (Simply run the code and see the results)
#. Define data
#. Define metric
#. Collect data
#. Set up the optimizer
#. Train
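The workflow above can be sketched without the DSPy library itself. The following is a toy stand-in, not the real DSPy API: every name (``make_program``, ``optimize``, the templates) is illustrative, and the "LM" is a deterministic fake. It only illustrates the core idea of separating the program from its parameters and letting a metric-driven optimizer pick the parameters.

```python
# Toy stand-in for the DSPy workflow: define a task, data, and a metric,
# then let an "optimizer" search over prompt templates (the parameters).
# All names here are illustrative, not the real DSPy API.

def make_program(template):
    """A 'program' parameterized by its prompt template."""
    def program(question):
        prompt = template.format(question=question)
        # A real pipeline would call an LM here; we fake a deterministic one.
        return "4" if "2 + 2" in prompt else "?"
    return program

# Define data: inputs paired with gold answers.
trainset = [("What is 2 + 2?", "4")]

# Define the metric.
def exact_match(pred, gold):
    return pred == gold

# Set up the optimizer: score candidate templates, keep the best.
candidates = ["Q: {question}\nA:", "Answer briefly: {question}"]

def optimize(candidates, trainset):
    def score(template):
        prog = make_program(template)
        return sum(exact_match(prog(q), a) for q, a in trainset)
    return max(candidates, key=score)

# "Train": select the best-scoring template, then compile the program.
best = optimize(candidates, trainset)
program = make_program(best)
print(program("What is 2 + 2?"))  # -> 4
```

In real DSPy the same separation holds: the module (e.g. a chain-of-thought pipeline) is the program, while demonstrations and instructions are parameters tuned by an optimizer against your metric.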
1 change: 1 addition & 0 deletions _sources/index.rst
@@ -13,6 +13,7 @@ Note LLM
:maxdepth: 1

llm/*
dspy/*

.. toctree::
:glob:
12 changes: 11 additions & 1 deletion _sources/llm/0_terminology.rst
Expand Up @@ -4,7 +4,17 @@ Terminology for LLM
#. Byte Pair Encoding (BPE)
#. Zero-shot task transfer

#. Reinforcement Learning from Human Feedback (RLHF)

#. Fine-Tuning (FT)
#. Few-Shot (FS)
#. One-Shot (1S)
#. Zero-Shot (0S)

#. Multi-turn conversational ability
#. LLM-as-a-judge

#. Position Bias
#. Verbosity Bias
#. Self-Enhancement Bias
#. Limited Reasoning Ability
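A single merge step of Byte Pair Encoding (the first term above) can be sketched in a few lines. This is a minimal illustration over characters, not a production tokenizer:

```python
from collections import Counter

def most_frequent_pair(tokens):
    """One BPE step: find the most frequent adjacent symbol pair."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return max(pairs, key=pairs.get)

def merge(tokens, pair):
    """Replace every occurrence of `pair` with one merged symbol."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            out.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

tokens = list("low lower lowest")
pair = most_frequent_pair(tokens)  # the most frequent adjacent pair
print(merge(tokens, pair))
```

Repeating this merge step builds up a subword vocabulary: frequent character sequences become single tokens, which is how BPE compresses text for LLM input.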
35 changes: 35 additions & 0 deletions _sources/llm/2_benchmark.rst
@@ -1,6 +1,14 @@
Benchmark for evaluating the performance of the LLM model
=========================================================

Existing benchmarks can be divided into four categories :cite:`zheng2023judging`:

#. Core-knowledge benchmarks
#. Instruction-following benchmarks
#. Conversational benchmarks
#. Traditional evaluation metrics

- ROUGE :cite:`lin2004rouge` and BLEU :cite:`papineni2002bleu`
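A traditional metric such as ROUGE-1 recall is easy to sketch: it is the fraction of reference unigrams that also appear in the candidate. This is a minimal hand-rolled version, not the official ROUGE implementation:

```python
from collections import Counter

def rouge1_recall(candidate, reference):
    """ROUGE-1 recall: share of reference unigrams found in the candidate."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: each reference word counts at most as often
    # as it appears in the candidate.
    overlap = sum(min(cand[w], n) for w, n in ref.items())
    return overlap / sum(ref.values())

print(rouge1_recall("the cat sat on the mat",
                    "the cat is on the mat"))  # -> 0.8333... (5 of 6 unigrams)
```

Such n-gram overlap metrics are cheap and reproducible, but — as the benchmark taxonomy above suggests — they correlate poorly with human preference on open-ended chat responses, which motivates LLM-as-a-judge approaches.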

Open Source Benchmark
---------------------
@@ -10,8 +18,35 @@ Open Source Benchmark
- https://github.com/taide-taiwan/taide-bench-eval


#. :title-ref:`zheng2023judging`

- https://github.com/lm-sys/FastChat/tree/main/fastchat/llm_judge

#. MMLU :cite:`hendrycks2020measuring`

#. HELM :cite:`liang2022holistic`

#. MT-bench :cite:`zheng2023judging`

#. Chatbot Arena :cite:`zheng2023judging`

Paper
-----

#. :cite:year:`zheng2023judging` :title-ref:`zheng2023judging`

- In this paper, the authors argue that aligned models achieve better user preference, but that this improvement cannot be accurately assessed by current benchmarks.

- LLM-as-a-Judge

#. Pairwise comparison

- Position Bias: LLM judges favor the answer in the first position

- Verbosity Bias: LLM judges favor longer, more verbose responses

- Self-Enhancement Bias: LLM judges prefer responses generated by themselves

#. Single answer grading

#. Reference-guided grading
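Pairwise comparison with a position-swap check — judging both orderings and counting a win only when the verdict is consistent, as MT-bench does to control position bias — can be sketched as follows. The ``judge`` here is a toy stand-in (it simply prefers the longer answer, illustrating verbosity bias); a real judge would prompt an LM:

```python
# Pairwise LLM-as-a-judge with position swapping: judge both orderings;
# only a verdict that holds in both orders counts as a win, otherwise
# declare a tie. `judge` is a toy stand-in for a real LM call.

def judge(first, second):
    # Toy judge that prefers the longer answer (verbosity bias in action).
    return "first" if len(first) > len(second) else "second"

def pairwise_verdict(answer_a, answer_b):
    v1 = judge(answer_a, answer_b)  # A shown in the first position
    v2 = judge(answer_b, answer_a)  # B shown in the first position
    if v1 == "first" and v2 == "second":
        return "A"
    if v1 == "second" and v2 == "first":
        return "B"
    return "tie"  # inconsistent across orders -> likely position bias

print(pairwise_verdict("a short answer",
                       "a much longer, more verbose answer"))  # -> B
```

The same swap-and-compare structure applies with a real LM judge: an answer that wins only when shown first is evidence of position bias rather than genuine quality.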
