

Starred repositories
Improved techniques for optimization-based jailbreaking on large language models (ICLR 2025)
A framework for few-shot evaluation of language models.
[ACL 2024] An Easy-to-use Knowledge Editing Framework for LLMs.
Robust recipes to align language models with human and AI preferences
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
A set of case studies for the Wintermute Alpha Challenge
[USENIX Security 2025] PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models
Official repository for the paper "Safety Alignment Should Be Made More Than Just a Few Tokens Deep"
Toolkit for creating, sharing and using natural language prompts.
A novel approach to improving the safety of large language models, enabling them to transition effectively from an unsafe state to a safe one.
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training
Submission Guide + Discussion Board for AI Singapore Global Challenge for Safe and Secure LLMs (Track 2A).
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks [ICLR 2025]
Fluent student-teacher redteaming
A month-long ZKP study group, one topic at a time.
Academic papers on LLM applications in security
A repository illustrating the usage of Transformers, in Chinese.
[ICLR 2025] Official Repository for "Tamper-Resistant Safeguards for Open-Weight LLMs"
Code release for the paper "Style Vectors for Steering Generative Large Language Models", accepted to Findings of EACL 2024.
Representation Engineering: A Top-Down Approach to AI Transparency
Code and results accompanying the paper "Refusal in Language Models Is Mediated by a Single Direction".
Improving Alignment and Robustness with Circuit Breakers
The original implementation of "SELF-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection" by Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, and Hannaneh Hajishirzi.