This is the code repository for our ICLR 2025 paper. Download the model you are going to fine-tune (e.g., Llama3-8B-Instruct) and set model_path to its location. Download datasets from Ultrafeedback ...
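Before launching a fine-tuning run, it can help to confirm that model_path really points at a downloaded model. The sketch below is not part of this repository: the helper name check_model_path and the example directory are hypothetical, and it assumes the model was saved in the usual Hugging Face layout, where the model directory contains a config.json file.

```python
from pathlib import Path


def check_model_path(model_path: str) -> bool:
    """Return True if model_path is a directory that contains config.json.

    Assumption: the model is stored in the Hugging Face directory layout.
    """
    path = Path(model_path).expanduser()
    return path.is_dir() and (path / "config.json").is_file()


if __name__ == "__main__":
    # Hypothetical location; replace with wherever you downloaded the model.
    model_path = "./models/Llama3-8B-Instruct"
    if check_model_path(model_path):
        print(f"Found model files in {model_path}")
    else:
        print(f"No model found at {model_path}; download it first")
```

A check like this fails fast on a mistyped path instead of partway through loading the model.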