This is the official repository for the paper *CERD: A Comprehensive Chinese Rhetoric Dataset for Rhetorical Understanding and Generation in Essays*, accepted to EMNLP 2024 Findings.
[Dataset] | [Paper]
To fine-tune RoBERTa, simply run the `scripts/train_roberta.sh` script. Note that the task type can be selected from the following options: Rhetoric Classification, Form Classification, Content Classification, and Component Extraction.
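The four task types correspond to members of the `TaskType` enum in `src.typing`. Only `TaskType.RC` is confirmed by the evaluation example below; the other member names and values in this sketch are illustrative assumptions, so check `src/typing.py` for the actual definitions:

```python
from enum import Enum

class TaskType(Enum):
    RC = 'rhetoric_classification'    # Rhetoric Classification (confirmed by the example below)
    FC = 'form_classification'        # Form Classification (assumed name and value)
    CC = 'content_classification'     # Content Classification (assumed name and value)
    CE = 'component_extraction'       # Component Extraction (assumed name and value)
```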
To fine-tune Qwen1.5, first use the helper functions in `src/preprocess/chat.py` to convert the dataset into an instruction-following format using our pre-defined chat templates in `templates/`. Then, simply run the `scripts/train_qwen.sh` script.
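For illustration, a minimal preprocessing sketch might look like the following. The helper name `build_chat_dataset` and its arguments are hypothetical, not the actual API; refer to `src/preprocess/chat.py` for the real function names and signatures:

```python
# Hypothetical usage sketch: the function name and arguments below are
# assumptions, not the actual API of src/preprocess/chat.py.
from src.preprocess.chat import build_chat_dataset

build_chat_dataset(
    dataset_path='path/to/cerd',           # raw CERD dataset
    template_dir='templates/',             # pre-defined chat templates
    output_path='path/to/instruction_set'  # instruction-following output
)
```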
For inference, call the `evaluate()` method of `GPTEvaluator`, `QwenEvaluator`, or `RoBERTaEvaluator` in `src/evaluator/`. For example, to evaluate RoBERTa on the rhetoric classification task, run the following code:
```python
from src.evaluator import RoBERTaEvaluator  # adjust the import path if needed
from src.typing import TaskType

evaluator = RoBERTaEvaluator(
    model_name_or_path='path/to/roberta',
    task_type=TaskType.RC,
    test_set_path='path/to/test_set',
    batch_size=4,
    save_results=True,
    save_path='path/to/save_results'
)
evaluator.evaluate()
```
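The other evaluators follow the same pattern. For instance, assuming `QwenEvaluator` accepts the same constructor arguments as `BaseEvaluator` (whose signature is mirrored in the snippet further below), evaluating Qwen1.5 on the same task might look like this:

```python
from src.evaluator import QwenEvaluator  # assumed import path
from src.typing import TaskType

# Assumption: QwenEvaluator shares the BaseEvaluator constructor signature.
evaluator = QwenEvaluator(
    model_name_or_path='path/to/qwen1.5',
    task_type=TaskType.RC,
    test_set_path='path/to/test_set',
    batch_size=4
)
evaluator.evaluate()
```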
Apart from the evaluators above, you can implement your own evaluator by inheriting from the `BaseEvaluator` class in `src/evaluator/base.py`. Specifically, you need to implement three abstract methods: `evaluate_classification_task()`, `evaluate_extraction_task()`, and `evaluate_generation_task()`. Afterward, simply call the `evaluate()` method to evaluate your model. Here is a simple snippet for illustration:
```python
from typing import List, Optional

from src.evaluator.base import BaseEvaluator
from src.typing import TaskType

# 1. Implement your own evaluator
class YourCustomEvaluator(BaseEvaluator):
    def __init__(
        self,
        model_name_or_path: str,
        task_type: TaskType,
        test_set_path: str,
        batch_size: int = 1,
        save_results: bool = False,
        save_path: Optional[str] = None,
        **kwargs
    ):
        # Hand the shared arguments to the parent class
        # (assuming BaseEvaluator accepts them in this order)
        super().__init__(model_name_or_path, task_type, test_set_path,
                         batch_size, save_results, save_path, **kwargs)

    def evaluate_classification_task(self, sentences: List[str]) -> List[List[int]]:
        # TODO: implement the abstract method
        pass

    def evaluate_extraction_task(self, sentences: List[str]) -> List[List[str]]:
        # TODO: implement the abstract method
        pass

    def evaluate_generation_task(self, rhetoric_list: List[str], object_list: List[str], previous_sentences_list: List[List[str]]) -> List[str]:
        # TODO: implement the abstract method
        pass

# 2. Run the evaluate() method
evaluator = YourCustomEvaluator(...)
evaluator.evaluate()
```
If you find CERD useful, please cite our paper:

```bibtex
@misc{liu2024cerd,
    title={CERD: A Comprehensive Chinese Rhetoric Dataset for Rhetorical Understanding and Generation in Essays},
    author={Nuowei Liu and Xinhao Chen and Hongyi Wu and Changzhi Sun and Man Lan and Yuanbin Wu and Xiaopeng Bai and Shaoguang Mao and Yan Xia},
    year={2024},
    eprint={2409.19691},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2409.19691},
}
```