Background: Orthopaedic surgeons disagree considerably when classifying fractures of the proximal humerus. However, the clinical implications of low observer agreement remain unclear. The purpose of this study was to compare agreement on the Neer classification with agreement on treatment recommendations.

Methods: We conducted a multi-centre observer study. Five experienced shoulder surgeons independently assessed a consecutive series of 193 radiographs on two occasions, three months apart. All pairs of radiographs were classified according to Neer. The observers were then asked to recommend one of three treatment modalities for each case: non-operative treatment, locking plate osteosynthesis, or hemiarthroplasty.

Results: In both classification rounds, mean kappa values for inter-observer agreement on treatment recommendations (0.48 and 0.52) were significantly higher than those for agreement on the Neer classification (0.33 and 0.36; p<0.001 in both rounds). The highest mean kappa values were found for inter-observer agreement on non-operative treatment (0.59 and 0.55). In 36% of observations (345 of 965), an observer changed the Neer category between the first and second classification rounds. However, in only 34% of these cases (116 of 345) did the observer also change the treatment recommendation.

Conclusions: We found significantly higher agreement on treatment recommendations than on fracture classification. The low observer agreement on the Neer classification reported in several observer studies may be of less clinical importance than previously assumed. However, inter-observer agreement did not exceed moderate levels.
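The abstract reports "mean kappa values" for five observers without stating how they were averaged. The sketch below assumes one common approach: Cohen's kappa is computed for every pair of observers (10 pairs for five observers) and the results are averaged. The function names and the toy ratings are hypothetical illustrations, not study data. The block also checks the reported percentages, assuming the 965 observations correspond to five observers times 193 cases.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two raters who labelled the same cases."""
    if len(a) != len(b) or not a:
        raise ValueError("raters must label the same non-empty set of cases")
    n = len(a)
    # Observed agreement: share of cases where both raters chose the same label.
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: sum over categories of the product of marginal proportions.
    count_a, count_b = Counter(a), Counter(b)
    p_expected = sum(count_a[c] * count_b[c] for c in set(a) | set(b)) / (n * n)
    if p_expected == 1.0:
        return float("nan")  # undefined: both raters used one identical category
    return (p_observed - p_expected) / (1 - p_expected)


def mean_pairwise_kappa(ratings):
    """Average Cohen's kappa over all rater pairs (an assumed definition of 'mean kappa')."""
    pairs = list(combinations(range(len(ratings)), 2))
    kappas = [cohen_kappa(ratings[i], ratings[j]) for i, j in pairs]
    return sum(kappas) / len(kappas)


# Hypothetical toy data: five raters, six cases, three treatment options.
toy_ratings = [
    ["nonop", "plate", "hemi", "nonop", "plate", "nonop"],
    ["nonop", "plate", "plate", "nonop", "plate", "nonop"],
    ["nonop", "nonop", "hemi", "nonop", "plate", "plate"],
    ["plate", "plate", "hemi", "nonop", "hemi", "nonop"],
    ["nonop", "plate", "hemi", "nonop", "plate", "nonop"],
]
print(mean_pairwise_kappa(toy_ratings))

# Arithmetic check of the reported proportions.
observations = 5 * 193        # assumed: five observers x 193 cases
changed_neer = 345            # observations with a changed Neer category
changed_treatment = 116       # of those, observations with a changed treatment
print(observations)                                        # 965
print(round(100 * changed_neer / observations))            # 36
print(round(100 * changed_treatment / changed_neer))       # 34
```

Averaging pairwise Cohen's kappa is only one option. Fleiss' kappa is a common alternative for more than two raters, and the two can give different values on the same data.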