—  SYMPOSIUM #13  —

Patient Safety in Anatomic Pathology
Moderator: Peter Furness

Section 2 - Amended Reports: Indices of Pathology Errors

Frederick A. Meier
Henry Ford Hospital
Department of Pathology
Detroit, MI, USA


Introduction:
Surgical pathology is a diagnostic system that turns flesh into information. The need for information drives the process, and the type and amount of information in pathology specimens constrain it.

Definition:
Pathology errors are variations in the diagnostic process that lose information or that add misinformation.

Detection:
Detecting such variation is usually a matter of discovering discrepancies. Currently, discrepancy discovery takes one of two general forms: creating opportunities for discrepancy, as in the double reading of slides and the sending of cases for consultation, or observing discrepancies that arise within the process, as in cytological-histological correlation and in the combination of pathological information with clinical and imaging findings at clinical conferences such as tumor boards. In its current state, surgical pathology error detection records discrepancies across variable domains (all surgical specimens, specimens newly diagnosing malignancies, specimens of specific organs, specimens producing particularly difficult, poorly reproducible diagnoses, etc.). The methods used to detect discrepancies are also inconsistent, so it is not surprising that their results are incommensurate and difficult to generalize.

Stratifying variables:
In particular, research into pathology error has so far been unable to separate quality variables (differences in the process that facilitate or hinder obtaining accurate results) from stratifying variables (differences in practice circumstances that facilitate or hinder the detection of discrepancies). Among the stratifying variables, some influence the sort of review that turns up discrepancies; others influence the kind of reviewer who judges discrepancies to be present. Examples of review-influential stratifying variables include whether the detection effort is carried out before or after the completion (reporting or sign-out) of a case; whether all cases (or all cases of a particular type) are reviewed or only selected cases are examined; and whether the review is done among departmental colleagues or 'sent out' to an external expert. Examples of reviewer-influential characteristics are whether the reviewer has specialist expertise in the diagnostic issue at hand or is (only) generally competent in the area in question; whether a single reviewer or a consensus-building panel renders judgment; and whether the reviewers practice in one institution or in several.

Objective:
Against this background, we chose amended reports as the best approach to the study of surgical pathology errors.

Definition:
Amendments, as indices of errors, are changes (not additions) to diagnostic reports made after the reports have been released ('signed-out', 'finalled'), not before. We apply this definition strictly within a semantic field that includes a variety of other terms, which always or sometimes encompass changes that we recognize as amendments: addenda, additions, corrections, revisions, and supplements. These alternative designations are sometimes used in self-serving ways.

Five characteristics led us to consider amended reports the best index of error. First, they call attention to themselves in the course of routine surgical pathology practice. Second, they cause immediate harm by sowing distrust between pathologists as purveyors of information and clinicians as consumers of information. Third, they point to the different steps of the diagnostic process at which the mischievous variations arise. Fourth, they sometimes (rarely, in our experience) contribute to the mismanagement of patients; even more infrequently, they may lead to patient harm. Finally, they always cause rework, decreasing the efficiency of the diagnostic process.

Previous studies of amended reports as indices of error have been incomplete (not covering the full range of diagnoses produced in practice), inconsistent (allowing ambiguity in classifying a particular error), and institution-specific (not exportable from the site of their development). To overcome these deficiencies, we further specified our objective as developing and validating a taxonomy of the errors revealed by amended reports that (i) covered the full range of defects we found, (ii) used unambiguous categories, and (iii) demonstrated reproducibility. Thus, the refined objective of our study was taxonomic success: all amendments classified, each amendment classified into one and only one category, and classifications carried out with measured, high inter-observer agreement.

Methods:
To achieve this objective, we first studied the discrepancies found in a derivation set of amended reports. By grouping the variations that lost information or added misinformation, we identified four categories of errors. Next, three of us (FAM, RCV, RJZ) attempted to apply the categories to a training set drawn from a different series of amended reports. This exercise revealed vagueness, ambiguity, and confusion in some of the attributes we had used to construct the categories, and it prompted their revision. We then applied the revised classification independently to a third, validation set and tested its consistency, measuring our agreement with a kappa statistic.
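The text says only that agreement was measured "using a kappa statistic"; it does not say which one. Because three classifiers rated each case, one plausible choice is Fleiss' kappa, which generalizes chance-corrected agreement to more than two raters. The sketch below assumes that choice and is not the authors' documented method; the function name and the category labels in the usage example are hypothetical.

```python
from collections import Counter
from typing import Sequence


def fleiss_kappa(ratings: Sequence[Sequence[str]]) -> float:
    """Fleiss' kappa for items that were each classified by the same n raters.

    ratings[i] lists the category each rater assigned to item i.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("need >= 2 raters, and the same number for every item")
    counts = [Counter(r) for r in ratings]  # n_ij: raters putting item i in category j
    categories = {label for c in counts for label in c}

    # Observed agreement: share of agreeing rater pairs, averaged over items.
    p_bar = sum(
        (sum(v * v for v in c.values()) - n_raters) / (n_raters * (n_raters - 1))
        for c in counts
    ) / n_items

    # Chance agreement from the pooled proportion of labels in each category.
    total = n_items * n_raters
    p_e = sum((sum(c[j] for c in counts) / total) ** 2 for j in categories)
    if p_e == 1:
        raise ValueError("kappa is undefined when every label is the same category")
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical example: four amendments, three classifiers each.
example = [
    ["misid", "misid", "misid"],
    ["report", "report", "report"],
    ["misid", "report", "report"],
    ["specimen", "specimen", "specimen"],
]
print(fleiss_kappa(example))  # 35/47, about 0.745
```

With only two raters, Cohen's kappa would be the more usual choice; for two raters the two statistics differ slightly in how chance agreement is estimated.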

Using the validated classification, we also measured two quantitative indices of quality: rates of amendment and the fraction of amendments falling into each of the four categories. We further measured two stratifying variables that characterized amendment-causing errors: who found them ('defect discoverer') and how they were found ('mechanism of discovery').
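Both quality indices are simple ratios: the amendment rate is amendments per 1000 released surgical pathology reports, and each category fraction is that category's count divided by all amendments. A minimal sketch follows; the function names and the counts in the example are hypothetical and are not the study's data.

```python
from collections import Counter


def amendment_rate_per_1000(n_amended: int, n_reports: int) -> float:
    """Amended reports per 1000 released surgical pathology reports."""
    if n_reports <= 0:
        raise ValueError("n_reports must be positive")
    return 1000 * n_amended / n_reports


def category_fractions(categories: list[str]) -> dict[str, float]:
    """Fraction of amendments assigned to each defect category.

    Assumes each amendment carries exactly one category label.
    """
    if not categories:
        raise ValueError("no amendments to summarize")
    counts = Counter(categories)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()}


# Hypothetical: 28 amendments among 10,000 reports gives 2.8/1000.
print(amendment_rate_per_1000(28, 10_000))
print(category_fractions(["report", "report", "misid", "interp"]))
```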

Results:
(a) Defect categories: Our study design validated four defect categories: misinterpretation, misidentification, defective specimens, and defective reports. Misinterpretations were diagnostic conclusions that added inaccurate information [false positives or 'overcalls'], subtracted accurate information [false negatives or 'undercalls'], or confused similar diagnostic categories [misclassification]. We registered whether these defects occurred at the primary diagnostic level [positive vs. negative, malignant vs. benign] or at the secondary diagnostic level [i.e., regarding grade, stage, margin, or lymph node status]; misclassifications were categorical errors that affected neither the primary nor the secondary diagnostic level.

Misidentifications could involve wrong or missing patient or specimen identification, confusion of tissues (e.g., lung vs. liver), misdesignation of laterality (e.g., right vs. left), or slip-ups in anatomic localization (e.g., calling skin of the thigh skin of the shoulder).

Defective specimens were those that were lost, were of inadequate sample size, were inadequately or inappropriately measured or insufficiently sampled, or were not used for necessary or appropriate ancillary studies.

Report defects included the loss or erroneous transmission of non-diagnostic information (e.g., about practitioners, procedures, dates, or diagnostic codes), dictation or typing errors (typographic errors in the strict sense), and failures or aberrations of electronic formats or of the electronic transmission of information.

Note: Although identification and report defects can arise at any stage of the diagnostic process, and specimen defects appear in both the pre-analytic and analytic phases, we found that identification and specimen defects arose mostly in the pre-analytic step of initial patient and specimen identification and in the linked step of sample collection, transport, and processing. Report defects, in contrast, occurred mostly in the post-analytic phase of report production, transmission, and reception.
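The taxonomic requirement that every amendment fall into one and only one category can be enforced by giving each amendment record a single category field, with misinterpretation sub-codes permitted only where the definitions in (a) call for them. The sketch below is a hypothetical encoding, not the "Amended Reports Dictionary" itself: the class and field names are invented, and the rule that over- and undercalls always carry a primary or secondary level while misclassifications carry none is our reading of the text.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Category(Enum):
    MISINTERPRETATION = "misinterpretation"
    MISIDENTIFICATION = "misidentification"
    DEFECTIVE_SPECIMEN = "defective specimen"
    DEFECTIVE_REPORT = "defective report"


class InterpretationError(Enum):
    OVERCALL = "false positive"
    UNDERCALL = "false negative"
    MISCLASSIFICATION = "misclassification"


class DiagnosticLevel(Enum):
    PRIMARY = "primary"      # e.g. positive vs. negative, malignant vs. benign
    SECONDARY = "secondary"  # e.g. grade, stage, margin, lymph node status


@dataclass(frozen=True)
class Amendment:
    case_id: str
    category: Category  # a single field: exactly one category per amendment
    error: Optional[InterpretationError] = None
    level: Optional[DiagnosticLevel] = None

    def __post_init__(self) -> None:
        if self.category is not Category.MISINTERPRETATION:
            # Sub-codes apply only to misinterpretations.
            if self.error is not None or self.level is not None:
                raise ValueError("only misinterpretations take sub-codes")
        elif self.error is None:
            raise ValueError("a misinterpretation needs an error type")
        elif self.error is InterpretationError.MISCLASSIFICATION:
            # Misclassifications affect neither diagnostic level.
            if self.level is not None:
                raise ValueError("misclassifications carry no diagnostic level")
        elif self.level is None:
            raise ValueError("overcalls and undercalls need a diagnostic level")


# Hypothetical usage: a secondary-level undercall (e.g. a missed positive margin).
a = Amendment("case-001", Category.MISINTERPRETATION,
              InterpretationError.UNDERCALL, DiagnosticLevel.SECONDARY)
```

Rejecting inconsistent records at entry time mirrors the aim of unambiguous categories: a record that does not fit the scheme fails loudly instead of being filed under a guess.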

(b) Stratifying variables: We divided the defect discoverers into pathologists, clinicians, other staff (e.g., transcriptionists and physician office personnel), and unknown. The mechanisms of discovery were unprompted review; review prompted by new information reaching the pathologist directly, without a clinician's prompt; clinician-prompted review; and review as part of a conference (tumor board).

(c) Classification development: The derivation set consisted of 141 cases that we reviewed together; the training set, of 130 cases that we examined independently; and the validation set, of 430 cases that were also reviewed independently and prospectively. In the validation set, classifier agreement was excellent: kappa = 0.8780 (range 0.8416-0.9144).
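Kappa compares observed agreement with the agreement expected by chance. The reported range is symmetric about the point estimate (0.8780 ± 0.0364), which is consistent with, though the text does not state it, a 95% normal-approximation confidence interval. Under that assumption, the implied standard error can be backed out as follows ($p_o$ is observed agreement, $p_e$ chance-expected agreement):

```latex
\kappa = \frac{p_o - p_e}{1 - p_e}, \qquad
\hat{\kappa} \pm 1.96\,\widehat{\mathrm{SE}} = 0.8780 \pm 0.0364
  = [\,0.8416,\ 0.9144\,]
\;\Rightarrow\;
\widehat{\mathrm{SE}} \approx \frac{0.0364}{1.96} \approx 0.0186
```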

(d) Amended report rates: During the year from which the derivation set was culled (2001), the amendment rate was 2.8 amendments/1000 surgical pathology reports. In the year of the training set (2002), the rate was 2.6/1000. Over the next two years (2003-2004), during which most of the validation set was collected, rates were slightly higher (3.4/1000 and 4.5/1000), and they were strikingly higher during active post-validation monitoring in 2005 (10.5/1000).

(e) Amended report error types: Over the five years of study, misinterpretations consistently caused about one in four amended reports. Misidentifications varied more from year to year, from a high of 38% during the validation set period (2003-2004) to a low of 14% in the post-validation year (2005). Specimen defects contributed the smallest yearly fractions (2%-11%). Report defects varied the most, from as low as 29% (2003) to as high as 60% (2005), the latter following a change in the report preparation procedure (see Discussion below).

(f) Defect discoverer: In a sample covering three quarters of 2004 (N=142), in which 53% of amendments were due to report defects, 21% to interpretation errors, 20% to misidentifications, and 7% to specimen defects, pathologists discovered five-sixths of interpretation errors and 40% of report defects, whereas clinicians discovered 72% of misidentifications.

(g) Mechanisms of discovery: In a sample covering the same time span (N=140), 59% of interpretation errors were discovered during conference review. Among report errors, about 20% were discovered by pathologists unprompted, a similar proportion after clinician prompting, and the same fraction through conference review. The majority (58%) of misidentifications were discovered following clinicians' calls.
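The figures in (f) and (g) are proportions conditioned on defect category: for example, "clinicians discovered 72% of misidentifications" means clinician-discovered misidentifications divided by all misidentifications. A minimal sketch of that cross-tabulation follows; the function name and the records in the example are hypothetical and do not reproduce the study's data. Passing (category, mechanism) pairs instead of (category, discoverer) pairs yields the breakdown in (g).

```python
from collections import Counter, defaultdict
from typing import Iterable


def shares_within_category(
    records: Iterable[tuple[str, str]],
) -> dict[str, dict[str, float]]:
    """Given (defect category, discoverer) pairs, return, for each category,
    the fraction of that category's amendments attributed to each discoverer."""
    tallies: dict[str, Counter] = defaultdict(Counter)
    for category, discoverer in records:
        tallies[category][discoverer] += 1
    return {
        category: {who: n / sum(counter.values()) for who, n in counter.items()}
        for category, counter in tallies.items()
    }


# Hypothetical records, not study data.
records = [("misidentification", "clinician")] * 3 + [
    ("misidentification", "pathologist"),
    ("report defect", "pathologist"),
]
print(shares_within_category(records))
```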

Discussion:
Through the sequence of derivation and training sets, we were able to develop a classification of amended report errors that demonstrated excellent inter-observer agreement in a prospectively classified validation set. We are now attempting to apply the classification in multiple institutions to quantify inter-institutional as well as inter-observer agreement. Diagnostic misinterpretation, the defect on which pathologists (and previous studies) have concentrated most, consistently accounted for only about a quarter of the defects that prompted amendments. Other process errors, particularly misidentifications but also report defects, contributed more. At Henry Ford Hospital, we are now applying root cause analysis and Toyota Production System principles of process improvement in a concerted effort to reduce these systemic sources of error to the point of elimination.

In the post-validation year (2005), the report defect rate rose sharply. This rise was associated with the introduction of a new anatomic pathology computer system that had the untoward effect of distributing the preparation of final reports among sixteen pathologists, where previously four transcriptionists had prepared them; this change greatly increased report errors.

The examination of defect discovery revealed that clinicians play an important role in rooting out patient and specimen misidentifications that interfere with, or destroy, the utility of surgical pathology testing.

In the Surgical Pathology Division at Henry Ford Hospital, we have made interpretation and specimen defects the focus of a regular conference of senior staff. In this venue, several themes have emerged as recurring in instances of interpretive discrepancy: non-reproducible diagnostic distinctions, distracting histological backgrounds, contrasting histological vs. cytological emphases in interpretation, the presence of critical findings at central vs. peripheral (marginal) topological foci, and, of course, the impact of differing knowledge of the clinical context.

Finally, the conference (tumor board) process has repeatedly shown itself to be a variably effective but consistently valuable mechanism for discovering defects, particularly defects in interpretation.

Conclusion:
The validated taxonomy of amended report defects is of practical value. At Henry Ford Hospital, we now apply the taxonomy, in the form of an "Amended Reports Dictionary", in real time throughout our practice, with one of us (RCV) acting as the controlling editor of the amendment procedure. This ensures that defects are recorded with documentation specific enough to permit accurate classification of errors and complete enough to provide 'triggering information' for root cause analysis.

The taxonomy is also of research value. We are currently testing its utility in other error domains in anatomic pathology: frozen-section vs. permanent-section (non-)correlation and cytological-histological (non-)correlation. In the original domain of amended reports, we are also linking the classification of errors to a measure of their clinical influence, a harm severity scale. In this way, consistent classification will serve as the foundation for reproducible measurement of the actual impact of pathology error on patient safety.

Acknowledgment:
The authors thank Daniel Schultz, MD, MPH for help with statistical analysis.