Introduction: In 2011, the St. Gallen Consensus Conference introduced the use of pathology to define the intrinsic breast cancer subtypes by applying the immunohistochemical (IHC) surrogate markers ER, PR, HER2 and Ki67, with a specified Ki67 cutoff (>14%) for the luminal B-like definition. Reports of impaired reproducibility of Ki67 estimation and of threshold inconsistency led to the initiation of this quality assurance study (2013–2015). The aim of the study was to investigate inter-observer variation in Ki67 estimation in malignant breast tumors using two different quantification methods (assessment method and count method), and to measure the agreement between the methods. Material and methods: Fourteen experienced breast pathologists from 12 pathology departments evaluated 118 slides from a consecutive series of malignant breast tumors. The staining interpretation was performed according to both the Danish and the Swedish guidelines. Reproducibility was quantified by the intra-class correlation coefficient (ICC) and by Light’s Kappa, with observations dichotomized at the >20% threshold. The agreement between observations by the two quantification methods was evaluated by a Bland–Altman plot. Results: For the fourteen raters, the median Ki67 estimate ranged from 20% to 40% by the assessment method and from 22.5% to 36.5% by the count method. Light’s Kappa was 0.664 for observations by the assessment method and 0.649 by the count method. The ICC was 0.82 (95% CI: 0.77–0.86) by the assessment method vs. 0.84 (95% CI: 0.80–0.87) by the count method. Conclusion: Although the study in general showed moderate to good inter-observer agreement according to both the ICC and Light’s Kappa, major discrepancies were still identified, especially in the mid-range of observations. Consequently, Ki67 estimation is, for now, not implemented in the DBCG treatment algorithm.
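As an illustration of the agreement statistics named in the methods, the following is a minimal sketch and not the study's actual analysis code. It assumes Light's Kappa is computed as the mean of Cohen's kappa over all rater pairs after dichotomizing each Ki67 estimate at >20%, and that the Bland–Altman summary uses the conventional bias ± 1.96 SD limits of agreement. The function names and the toy scores are hypothetical.

```python
from itertools import combinations
from statistics import mean, stdev


def cohen_kappa(a, b):
    """Cohen's kappa for two raters' binary calls on the same slides."""
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    pa, pb = sum(a) / n, sum(b) / n
    # Chance agreement from each rater's marginal rate of positive calls.
    p_exp = pa * pb + (1 - pa) * (1 - pb)
    if p_exp == 1:
        return 1.0  # both raters gave the same constant call on every slide
    return (p_obs - p_exp) / (1 - p_exp)


def lights_kappa(scores_by_rater, threshold=20.0):
    """Mean pairwise Cohen's kappa after dichotomizing at > threshold."""
    binary = [[s > threshold for s in rater] for rater in scores_by_rater]
    return mean(cohen_kappa(a, b) for a, b in combinations(binary, 2))


def bland_altman(x, y):
    """Bias and 95% limits of agreement between two paired methods."""
    diffs = [a - b for a, b in zip(x, y)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)
    return bias, bias - half_width, bias + half_width


# Hypothetical Ki67 estimates (%): three raters scoring the same four slides.
raters = [
    [10, 25, 30, 15],
    [12, 22, 35, 18],
    [10, 18, 40, 25],
]
kappa = lights_kappa(raters)

# Hypothetical paired estimates for one rater: assessment vs. count method.
bias, lower, upper = bland_altman([10, 20, 30], [12, 18, 33])
```

In this toy data the raters disagree only on slides with estimates near the 20% threshold, which mirrors the abstract's observation that discrepancies concentrate in the mid-range.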