deploy: 725ecc0

hsiangjenli committed Oct 13, 2024
1 parent 3bc83dc commit bf8a7f1
Showing 17 changed files with 613 additions and 15 deletions.
2 changes: 1 addition & 1 deletion .buildinfo
@@ -1,4 +1,4 @@
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
config: ce0b4f37b7613ee54bc102c2df0fb432
config: f88da4068ace9218541ef2133e54e552
tags: 645f666f9bcd5a90fca523b33c5a78b7
34 changes: 34 additions & 0 deletions _sources/dspy/0_intro.rst
@@ -0,0 +1,34 @@
DSPy: Programming—not prompting—Foundation Models
=================================================

DSPy stands for "Declarative Self-improving Language Programs (in Python)".

Developed by the Stanford NLP group:

- :cite:`khattab2023dspy`
- :cite:`khattab2022demonstrate`

Contributions
-------------
#. Separates the program from its parameters
#. Introduces a new LM-driven optimizer that can tune both prompts and weights


:cite:`medium_bc_dspy`

LLMs are sensitive to how they are prompted :cite:`towardsdatascienceIntroDSPy`

The key concept is "programming with foundation models" :cite:`towardsdatascienceIntroDSPy`


How to use DSPy :cite:`dspydocsUsingDSPy`
-----------------------------------------

#. Define your task
#. Define your pipeline
#. Explore a few examples (Simply run the code and see the results)
#. Define data
#. Define metric
#. Collect data
#. Set up the optimizer
#. Train
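The workflow above can be sketched without the DSPy library itself. The following is a toy stand-in, not the real DSPy API: every name (``make_program``, ``optimize``, the templates) is illustrative, and the "LM" is a deterministic fake. It only illustrates the core idea of separating the program from its parameters and letting a metric-driven optimizer pick the parameters.

```python
# Toy stand-in for the DSPy workflow: define a task, data, and a metric,
# then let an "optimizer" search over prompt templates (the parameters).
# All names here are illustrative, not the real DSPy API.

def make_program(template):
    """A 'program' parameterized by its prompt template."""
    def program(question):
        prompt = template.format(question=question)
        # A real pipeline would call an LM here; we fake a deterministic one.
        return "4" if "2 + 2" in prompt else "?"
    return program

# Define data: inputs paired with gold answers.
trainset = [("What is 2 + 2?", "4")]

# Define the metric.
def exact_match(pred, gold):
    return pred == gold

# Set up the optimizer: score candidate templates, keep the best.
candidates = ["Q: {question}\nA:", "Answer briefly: {question}"]

def optimize(candidates, trainset):
    def score(template):
        prog = make_program(template)
        return sum(exact_match(prog(q), a) for q, a in trainset)
    return max(candidates, key=score)

# "Train": select the best-scoring template, then compile the program.
best = optimize(candidates, trainset)
program = make_program(best)
print(program("What is 2 + 2?"))  # -> 4
```

In real DSPy the same separation holds: the module (e.g. a chain-of-thought pipeline) is the program, while demonstrations and instructions are parameters tuned by an optimizer against your metric.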
1 change: 1 addition & 0 deletions _sources/index.rst
@@ -13,6 +13,7 @@ Note LLM
:maxdepth: 1

llm/*
dspy/*

.. toctree::
:glob:
12 changes: 11 additions & 1 deletion _sources/llm/0_terminology.rst
Expand Up @@ -4,7 +4,17 @@ Terminology for LLM
#. Byte Pair Encoding (BPE)
#. Zero-shot task transfer

#. Reinforcement Learning from Human Feedback (RLHF)

#. Fine-Tuning (FT)
#. Few-Shot (FS)
#. One-Shot (1S)
#. Zero-Shot (0S)

#. Multi-turn conversational ability
#. LLM-as-a-judge

#. Position Bias
#. Verbosity Bias
#. Self-Enhancement Bias
#. Limited Reasoning Ability
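A single merge step of Byte Pair Encoding (the first term above) can be sketched in a few lines. This is a minimal illustration over characters, not a production tokenizer:

```python
from collections import Counter

def most_frequent_pair(tokens):
    """One BPE step: find the most frequent adjacent symbol pair."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return max(pairs, key=pairs.get)

def merge(tokens, pair):
    """Replace every occurrence of `pair` with one merged symbol."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            out.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

tokens = list("low lower lowest")
pair = most_frequent_pair(tokens)  # the most frequent adjacent pair
print(merge(tokens, pair))
```

Repeating this merge step builds up a subword vocabulary: frequent character sequences become single tokens, which is how BPE compresses text for LLM input.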
35 changes: 35 additions & 0 deletions _sources/llm/2_benchmark.rst
@@ -1,6 +1,14 @@
Benchmark for evaluating the performance of the LLM model
=========================================================

Existing benchmarks can be divided into four categories :cite:`zheng2023judging`:

#. Core-knowledge benchmarks
#. Instruction-following benchmarks
#. Conversational benchmarks
#. Traditional evaluation metrics

- ROUGE :cite:`lin2004rouge` and BLEU :cite:`papineni2002bleu`
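A traditional metric such as ROUGE-1 recall is easy to sketch: it is the fraction of reference unigrams that also appear in the candidate. This is a minimal hand-rolled version, not the official ROUGE implementation:

```python
from collections import Counter

def rouge1_recall(candidate, reference):
    """ROUGE-1 recall: share of reference unigrams found in the candidate."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: each reference word counts at most as often
    # as it appears in the candidate.
    overlap = sum(min(cand[w], n) for w, n in ref.items())
    return overlap / sum(ref.values())

print(rouge1_recall("the cat sat on the mat",
                    "the cat is on the mat"))  # -> 0.8333... (5 of 6 unigrams)
```

Such n-gram overlap metrics are cheap and reproducible, but — as the benchmark taxonomy above suggests — they correlate poorly with human preference on open-ended chat responses, which motivates LLM-as-a-judge approaches.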

Open Source Benchmark
---------------------
@@ -10,8 +18,35 @@ Open Source Benchmark
- https://github.com/taide-taiwan/taide-bench-eval


#. :title-ref:`zheng2023judging`

- https://github.com/lm-sys/FastChat/tree/main/fastchat/llm_judge

#. MMLU :cite:`hendrycks2020measuring`

#. HELM :cite:`liang2022holistic`

#. MT-bench :cite:`zheng2023judging`

#. Chatbot Arena :cite:`zheng2023judging`

Paper
-----

#. :cite:year:`zheng2023judging` :title-ref:`zheng2023judging`

- In this paper, the authors argue that aligned models achieve better user preference, but that this improvement cannot be accurately assessed by current benchmarks.

- LLM-as-a-Judge

#. Pairwise comparison

- Position Bias: LLM judges favor the answer in the first position

- Verbosity Bias: LLM judges favor longer, more verbose responses

- Self-Enhancement Bias: LLM judges prefer responses generated by themselves

#. Single answer grading

#. Reference-guided grading
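Pairwise comparison with a position-swap check — judging both orderings and counting a win only when the verdict is consistent, as MT-bench does to control position bias — can be sketched as follows. The ``judge`` here is a toy stand-in (it simply prefers the longer answer, illustrating verbosity bias); a real judge would prompt an LM:

```python
# Pairwise LLM-as-a-judge with position swapping: judge both orderings;
# only a verdict that holds in both orders counts as a win, otherwise
# declare a tie. `judge` is a toy stand-in for a real LM call.

def judge(first, second):
    # Toy judge that prefers the longer answer (verbosity bias in action).
    return "first" if len(first) > len(second) else "second"

def pairwise_verdict(answer_a, answer_b):
    v1 = judge(answer_a, answer_b)  # A shown in the first position
    v2 = judge(answer_b, answer_a)  # B shown in the first position
    if v1 == "first" and v2 == "second":
        return "A"
    if v1 == "second" and v2 == "first":
        return "B"
    return "tie"  # inconsistent across orders -> likely position bias

print(pairwise_verdict("a short answer",
                       "a much longer, more verbose answer"))  # -> B
```

The same swap-and-compare structure applies with a real LM judge: an answer that wins only when shown first is evidence of position bias rather than genuine quality.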
