๐—ง๐—ต๐—ฒ ๐—ฃ๐—ผ๐˜„๐—ฒ๐—ฟ ๐—ข๐—ณ ๐—ฃ๐—ฎ๐—ถ๐—ฟ๐˜„๐—ถ๐˜€๐—ฒ ๐—ฃ๐—ฟ๐—ฒ๐—ณ๐—ฒ๐—ฟ๐—ฒ๐—ป๐—ฐ๐—ฒ You want to know how large language models are evaluated. One key method is pairwise preference. This is where you compare two options and choose the better one. You can use this method to align language models with human judgement. This helps to improve the models and make them more accurate.