Papers · 1 month ago

Detecting hallucinations from first-token confidence: phi_first reaches AUROC 0.820, ahead of multi-sample agreement

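Researchers at Temple University report that phi_first, a normalized-entropy score computed at the first answer token of a single greedy decode, detects hallucinations as well as or slightly better than semantic self-consistency. Across three 7-8B instruction-tuned models and two benchmarks, it reached a mean AUROC of 0.820, ahead of semantic agreement (0.793) and surface-form self-consistency (0.791). The result is limited to closed-book short-answer factual QA, and because phi_first and semantic agreement are highly correlated, combining the two yielded little further improvement.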
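To make the idea of a normalized-entropy confidence score concrete, here is a minimal sketch. The summary does not give the paper's exact formula, so this assumes the score is one minus the Shannon entropy of the first-answer-token distribution divided by log K, where K is the number of candidate tokens considered. The function names and the choice of K are illustrative assumptions, not the authors' implementation.

```python
import math


def normalized_entropy(probs):
    """Shannon entropy of a distribution divided by log(K), so it lies in [0, 1].

    `probs` holds the probabilities of K candidate tokens (for example the
    top-K tokens at the first answer position). They are renormalized here
    in case they do not sum to 1.
    """
    k = len(probs)
    total = sum(probs)
    if k < 2 or total <= 0:
        raise ValueError("need at least two candidates with positive total mass")
    # Zero-probability entries contribute nothing to the entropy.
    p = [x / total for x in probs if x > 0]
    entropy = -sum(x * math.log(x) for x in p)
    return entropy / math.log(k)


def phi_first(first_token_probs):
    """ASSUMPTION: confidence = 1 - normalized entropy (higher = more confident).

    This is a hypothetical reading of the summary, not the paper's definition.
    """
    return 1.0 - normalized_entropy(first_token_probs)


if __name__ == "__main__":
    peaked = [0.97, 0.01, 0.01, 0.01]   # model strongly prefers one token
    flat = [0.25, 0.25, 0.25, 0.25]     # model is maximally unsure
    print(phi_first(peaked) > phi_first(flat))
```

Under this reading, a peaked first-token distribution gives a score near 1 and a flat one gives a score near 0, so a low score would flag a likely hallucination without sampling multiple answers.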

#hallucination-detection
#self-consistency
#temple-university
#phi-first

Temple University


