Surgical pathology is a diagnostic system that turns flesh into information.
The need for information drives the process and determines the type and amount of information extracted
from pathology specimens. Pathology errors are variations in the diagnostic process that lose information
or that add misinformation.
Detecting such variation is usually a matter of discovering discrepancies.
Currently discrepancy discovery takes one of two general forms: creating opportunities for discrepancy,
as in the double reading of slides and the sending of cases for consultations, or observing discrepancies
that arise within the process, such as cytological-histological correlation and the combination of
pathological information with clinical and imaging findings in clinical conferences like tumor boards.
In its current state, surgical pathology error detection records discrepancies across variable domains
(all surgical specimens, specimens newly diagnosing malignancies, specimens of specific organs, those
producing particularly difficult (poorly reproducible) diagnoses, etc.). The methods used to detect
discrepancies are also inconsistent, so it is not surprising that their results are incommensurate and
difficult to generalize.
In particular, research into pathology error has so far not been able to separate quality variables
(differences in the process that facilitate or hinder obtaining accurate results) from stratifying
variables (differences in practice circumstances that facilitate or hinder the detection of
discrepancies). Among the stratifying variables,
some influence the sort of review that turns up discrepancies; others influence the kind of reviewer who
judges discrepancies to be present. Examples of review-influential stratifying variables include whether
the detection effort is carried out before vs. after the completion (reporting or sign-out) of a case;
whether all cases (or all cases of particular type) are reviewed or only selected cases examined; and
whether the review is done among departmental colleagues or 'sent out' to an external expert. Instances
of reviewer-influential characteristics are whether the reviewer has specialist expertise in the
diagnostic issue at hand or is (only) generally competent in the area in question; whether a single
reviewer or a consensus-building panel renders judgment; and whether the reviewers practice in one vs.
several institutions.
Against this background we decided to study amended reports as the best approach to
the study of surgical pathology errors.
Amendments, as indices of errors, are changes, not additions, to diagnostic
reports after they have been released ('signed out', 'finalled'), not before. We apply this definition
strictly within a semantic field that includes a variety of other terms which encompass, or sometimes
encompass, changes that we recognize as amendments: addenda, additions, corrections, revisions, and
supplements. These alternative designations are sometimes used in self-serving ways.
Five characteristics caused us to consider amended reports as the best index of error. First, they
call attention to themselves in the course of routine surgical pathology practice. Second, they cause
immediate harm by sowing distrust between pathologists as purveyors of information, and clinicians as
information consumers. Third, they point out the different steps of the diagnostic process at which the
mischievous variations arise. Fourth, they sometimes (rarely, in our experience) contribute to the
mismanagement of patients; even more infrequently, they may lead to patient harm. Finally, they always
cause re-work, decreasing the efficiency of the diagnostic process.
Previous studies of amended reports as indices of error have been incomplete (not covering the full
range of diagnoses produced in practice), inconsistent (allowing for ambiguity in classifying a
particular error), and institution-specific (not exportable from the site of their development). To
overcome these deficiencies, we further specified our objective as developing and validating a taxonomy
of the errors revealed by amended reports that (i) covered the full range of defects we found,
(ii) used unambiguous categories, and (iii) demonstrated reproducibility. Thus, the refined objective of
our study was taxonomic success: all amendments classified, each amendment classified into
one-and-only-one category, and classifications carried out with measured, high, inter-observer agreement.
To achieve this objective, we first studied the discrepancies found in a derivation
set of amended reports. Gathering the variations that lost information or added misinformation into
groups, we identified four categories of errors. Next, three of us (FAM, RCV, RJZ) attempted to apply
the categories to a training set of a different series of amended reports. This exercise revealed
vagueness, ambiguity, and confusion in some of the attributes that we had used to construct our
categories, and prompted their revision. We then independently tested for consistency the revised
classification in a third validation set, measuring our agreement using a kappa statistic.
Using the validated classification, we also measured two quantitative indices of quality: rates of
amendment and fractions of amendments in each of the four categories. We further measured two
stratifying variables that characterized amendment-causing error: who found them ('defect discoverer')
and how they were found ('mechanism of discovery').
(a) Defect categories: Our study design validated four defect
categories: misinterpretation, misidentification, defective specimens, and defective reports.
Misinterpretations were diagnostic conclusions that added inaccurate information [false positives or
'overcalls'] or subtracted accurate information [false negatives or 'undercalls'] or confused similar
diagnostic categories [misclassification]. We registered whether these defects occurred at the primary
diagnostic level [positive vs. negative, malignant vs. benign] or at the secondary diagnostic level [i.e.
regarding grade, stage, margin, or lymph node status]; misclassifications were categorical errors that
affected neither primary nor secondary diagnostic levels.
Misidentifications could be wrong or lacking patient or specimen identification, confusion of tissues
(e.g. lung vs. liver), misdesignation of laterality (e.g. right vs. left), or slip-ups in anatomic
localization (e.g. calling skin of thigh skin of shoulder).
Defective specimens were those that were lost, of inadequate sample size, inadequately or
inappropriately measured, insufficiently sampled, or not used to provide necessary or appropriate
ancillary studies.
Report defects included the loss or erroneous transmission of non-diagnostic information (e.g. about
practitioners, procedures, dates, or diagnostic codes), dictation or typing errors (typographic errors
in the strict sense), and failures or aberrations of electronic formats or the electronic transmission
of reports.
Note: Although identification and report defects can arise at any stage in the diagnostic process
and specimen defects appear in both the pre-analytic and analytic phases, we found that identification
and specimen defects arise mostly in the pre-analytic phase step of initial patient and specimen
identification and the linked step of sample collection, transport, and processing. Report defects, in
contrast, tended to occur mostly in the post-analytic phase of report production, transmission, and
archiving.
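The four defect categories and their subtypes described above can be sketched as a small controlled vocabulary. This is an illustrative Python sketch only; the category and subtype names paraphrase the text and are not the study's actual amendment dictionary:

```python
# Four mutually exclusive defect categories, with subtypes paraphrased
# from the descriptions above (hypothetical names, not the study's dictionary).
TAXONOMY = {
    "misinterpretation": ["false positive", "false negative", "misclassification"],
    "misidentification": ["patient/specimen ID", "tissue", "laterality", "anatomic site"],
    "defective specimen": ["lost", "inadequate size", "mismeasured", "insufficiently sampled"],
    "defective report": ["non-diagnostic information", "typographic", "electronic format/transmission"],
}

def classify(category: str, subtype: str) -> tuple:
    """Enforce the one-and-only-one-category rule: a valid amendment
    record names exactly one known category and one of its subtypes."""
    if category not in TAXONOMY:
        raise ValueError(f"unknown category: {category}")
    if subtype not in TAXONOMY[category]:
        raise ValueError(f"unknown subtype for {category}: {subtype}")
    return (category, subtype)

print(classify("misidentification", "laterality"))
```

Rejecting any category or subtype outside the fixed vocabulary is what makes every amendment classifiable into one and only one category.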
(b) Stratifying variables: We divided the defect discoverers into pathologists,
clinicians, other staff (e.g. transcriptionists and physician office personnel), and unknown. The
mechanisms of discovery were: unprompted review, review prompted by new information coming directly to
the pathologist's attention without a clinician's prompt, clinician prompted review, and review as part
of the conference (tumor board) exercise.
(c) Classification development: The derivation set consisted of 141 cases that we
reviewed together, the training set of 130 cases which we examined independently, and the validation set
of 430 cases, also independently and prospectively reviewed. In the latter set, classifier agreement was
excellent: kappa = 0.8780 (range 0.8416-0.9144).
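Inter-observer agreement of this kind is conventionally quantified with Cohen's kappa, which corrects raw agreement for the agreement expected by chance. A minimal sketch (the case labels below are hypothetical, not the study's data):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning one category per case."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of cases where the raters concur.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement if the raters assigned categories independently.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical classifications of ten amendments by two observers.
a = ["misinterpretation", "misidentification", "report", "report", "specimen",
     "misidentification", "report", "misinterpretation", "report", "specimen"]
b = ["misinterpretation", "misidentification", "report", "specimen", "specimen",
     "misidentification", "report", "misinterpretation", "report", "specimen"]
print(round(cohens_kappa(a, b), 3))  # → 0.865
```

With three classifiers, as in the study, pairwise kappas can be computed for each pair, which is one way to obtain the range reported above.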
(d) Amended report rates: During the year from which the derivation set was
culled (2001) the amendment rate was 2.8 amendments/1000 surgical pathology reports. In the year of the
training set of amendments (2002), the rate was 2.6/1000. Over the next two years (2003-2004), during
which most of the validation set was collected, rates were slightly higher (3.4/1000 and 4.5/1000), then
strikingly higher during active post-validation monitoring in 2005 (10.5/1000).
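The rates in (d) are simple proportions per 1,000 reports. A one-line sketch, with hypothetical case volumes (the denominators were not reported in the text):

```python
def amendment_rate(n_amendments, n_reports):
    """Amendments per 1,000 surgical pathology reports."""
    return 1000 * n_amendments / n_reports

# Hypothetical volumes chosen only to reproduce the reported 2001 rate.
print(round(amendment_rate(112, 40000), 1))  # → 2.8
```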
(e) Amended report error types: Over the five years of study,
misinterpretations consistently caused about one in four amended reports. Misidentifications varied more
from year to year – from a high of 38% during the validation set period (2003-2004) to a low of 14%
during the post-validation year (2005). Specimen defects contributed the smallest yearly fractions
(2%-11%). Report defects varied the most from year to year, from as low as 29% (2003) to as high as 60%
(2005) following a change in the report preparation procedure (see discussion below).
(f) Defect discoverer: In a sample covering three quarters of 2004 (N=142) in which 53%
of amendments were due to report defects, 21% due to interpretation errors, 20% to misidentification, and
7% to specimen defects, pathologists discovered five-sixths of interpretation errors and 40% of report
defects, but clinicians discovered 72% of misidentifications.
(g) Mechanisms of discovery: In a sample covering the same time span (N=140),
59% of interpretation errors were discovered during conference review. Among report errors, about 20%
were discovered by pathologists unprompted, a similar proportion after clinician prompting, and the same
fraction through conference review. The majority (58%) of misidentifications were discovered following
clinician prompting.
Through the sequence of derivation and training sets we were able to develop a
classification of amended report errors that demonstrates excellent inter-observer agreement in a
prospectively classified validation set. We are now attempting to apply the classification in multiple
institutions to quantify inter-institutional as well as inter-observer agreement. Diagnostic
misinterpretation, the defect on which pathologists – and previous studies – have concentrated the most,
consistently accounted for only a quarter of the defects that prompt amendments. Other process errors,
particularly misidentification, but also report defects, contribute more defects. At Henry Ford
Hospital, we are now applying root cause analysis and Toyota Production System principles of process
improvement in a concerted effort to reduce, to the point of elimination, these systemic sources of error.
In the post-validation year (2005), we saw the report defect rate rise sharply. This rise was associated
with the introduction of a new anatomic pathology computer system that had the untoward effect of
distributing the preparation of final reports among sixteen pathologists, where previously four
transcriptionists had prepared them; the change greatly increased report errors.
The examination of defect discovery revealed that clinicians play an important role in rooting out
patient and specimen misidentifications that interfere with, or destroy, the utility of surgical
pathology information.
In the Surgical Pathology Division at Henry Ford Hospital we have made interpretation and specimen
defects the focus of a regular conference of senior staff. In this venue some themes have emerged as
recurring in instances of interpretative discrepancies: non-reproducible diagnostic distinctions,
distracting histological backgrounds, contrasting histological vs. cytological emphases in
interpretation, presence of critical findings at central vs. peripheral, or marginal, topological foci,
and, of course, the impact of differing knowledge of clinical context.
Finally, the conference (tumor board) process has repeatedly shown itself to be variably effective
but consistently valuable as a mechanism for discovering defects, particularly defects in interpretation.
The validated taxonomy of amended report defects is of practical value. At Henry
Ford Hospital, we now apply the taxonomy – as an "Amended Reports Dictionary" – in real time, throughout
our practice, with one of us (RCV) acting as the controlling editor of the amendment procedure. This
ensures that defects are recorded with documentation specific enough to permit accurate classification of
errors and complete enough to provide 'triggering information' for root cause analysis.
The taxonomy is also of further research value. We are currently testing its utility in other error
domains in anatomic pathology: frozen-section vs. permanent-section (non-)correlation and
cytological-histological (non-)correlation. In the original domain of amended reports, we are also
linking the classification of errors to a measure of their clinical influence: a harm severity scale. In
this way consistent classification will serve as the foundation for reproducible measurement of the
actual impact of pathology error on patient safety.
The authors thank Daniel Schultz, MD, MPH for help with statistical analysis.