DetectBench: Can Large Language Model Detect and Piece Together Implicit Evidence?

arXiv paper

An example from DetectBench

Statistical information about DetectBench

| Name | #Samples | Avg. #Tokens | Avg. #Evidence | Avg. #Jumps |
|---|---|---|---|---|
| train | 365 | 177 | 4.27 | 7.10 |
| dev | 1,770 | 178 | 4.34 | 7.13 |
| test-normal | 1,193 | 179 | 4.24 | 7.03 |
| test-hard | 300 | 261 | 7.79 | 13.83 |
| test-distract | 300 | 10,779 | 4.16 | 7.27 |
| All | 3,928 | 994 | 4.55 | 7.62 |

Detailed comparison of implicit evidence with other works