Objective: To systematically review studies of observer agreement among doctors classifying proximal humeral fractures according to the Neer system.

Study Design and Setting: A systematic review. We searched for observational studies in which doctors classified proximal humeral fractures according to the Neer system, and for randomized trials of any intervention aimed at improving agreement. We independently assessed potentially eligible studies, and data were extracted using pretested forms. Authors were contacted for missing information. Summary statistics for observer agreement were recorded, and methodological quality was assessed.

Results: We included 11 observational studies (88 observers and 468 cases). Mean κ-values for interobserver agreement ranged from 0.17 to 0.52. Agreement did not improve with selection of experienced observers, use of advanced imaging modalities, or simplification of the classification system. Intraobserver agreement was moderately higher than interobserver agreement. One randomized trial (14 observers and 42 cases) reported a clear effect of training (mean κ-value 0.62 after training vs. 0.34 without).

Conclusion: We found a consistently low level of observer agreement. The widely held belief that experts disagree less than nonexperts was not supported. One randomized trial indicated that training improves agreement among both experts and nonexperts.
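The κ-values reported above correct raw percentage agreement for the agreement expected by chance alone, which is why they can appear low even when two observers pick the same category fairly often. A minimal sketch of Cohen's kappa for two observers (the category labels below are purely illustrative, and the multi-observer means in the studies reviewed would be summarized by averaging pairwise κ or by a multi-rater statistic such as Fleiss' κ):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters (Cohen's kappa)."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed proportion of cases on which the raters agree
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / n**2
    return (p_o - p_e) / (1 - p_e)

# Hypothetical classifications of 4 fractures into Neer categories
a = ["2-part", "2-part", "3-part", "3-part"]
b = ["2-part", "3-part", "3-part", "3-part"]
print(cohens_kappa(a, b))  # 0.5: 75% raw agreement, 50% expected by chance
```

Note how 75% raw agreement shrinks to κ = 0.5 once chance agreement is removed; by the same logic, the mean κ-values of 0.17 to 0.52 found in the review indicate agreement only slightly to moderately better than chance.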