Purpose:
This study (1) examines a variety of real-world cases where systematic errors were not detected by widely accepted methods for IMRT/VMAT dosimetric accuracy evaluation, and (2) drills down to identify failure modes and their corresponding means for detection, diagnosis, and mitigation. The primary goal of detailing these case studies is to explore different, more sensitive methods and metrics that could be used more effectively for evaluating accuracy of dose algorithms, delivery systems, and QA devices.
Methods:
The authors present seven real-world case studies representing a variety of combinations of the treatment planning system (TPS), linac, delivery modality, and systematic error type. These case studies are typical of what might be used as part of an IMRT or VMAT commissioning test suite, varying in complexity. Each case study is analyzed according to TG-119 instructions for gamma passing rates and action levels for per-beam and/or composite plan dosimetric QA. Then, each case study is analyzed in depth with advanced diagnostic methods (dose profile examination, EPID-based measurements, dose difference pattern analysis, 3D measurement-guided dose reconstruction, and dose grid inspection) and more sensitive metrics (2% local normalization/2 mm DTA and estimated DVH comparisons).
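To make the contrast between the conventional 3%/3 mm criteria and the 2% local/2 mm criteria concrete, the following is a minimal 1D gamma sketch. It is not the authors' analysis software: the function names, the brute-force search without interpolation, the 10% low-dose threshold, and the toy step profile are all assumptions made for illustration.

```python
import math

def gamma_1d(ref_pos, ref_dose, eval_pos, eval_dose,
             dose_tol, dta_mm, local=False, threshold=0.1):
    """Brute-force 1D gamma for each reference point above the dose threshold.

    dose_tol is a fraction (0.03 for 3%). With local=False the dose difference
    is normalized to the maximum reference dose (global normalization); with
    local=True it is normalized to the dose at the reference point itself.
    """
    max_ref = max(ref_dose)
    gammas = []
    for x_ref, d_ref in zip(ref_pos, ref_dose):
        if d_ref < threshold * max_ref:
            continue  # skip low-dose points (assumed 10% threshold)
        norm = d_ref if local else max_ref
        best = math.inf
        for x_ev, d_ev in zip(eval_pos, eval_dose):
            dose_term = (d_ev - d_ref) / (dose_tol * norm)
            dist_term = (x_ev - x_ref) / dta_mm
            best = min(best, math.hypot(dose_term, dist_term))
        gammas.append(best)
    return gammas

def passing_rate(gammas):
    """Percentage of evaluated points with gamma <= 1."""
    return 100.0 * sum(g <= 1.0 for g in gammas) / len(gammas)

# Toy profile (made-up numbers): a 100-unit plateau next to a 20-unit region,
# with a systematic +1.5 unit error confined to the low-dose region.
positions = [float(i) for i in range(20)]              # mm
reference = [100.0 if x < 10 else 20.0 for x in positions]
evaluated = [100.0 if x < 10 else 21.5 for x in positions]

rate_global = passing_rate(
    gamma_1d(positions, reference, positions, evaluated, 0.03, 3.0))
rate_local = passing_rate(
    gamma_1d(positions, reference, positions, evaluated, 0.02, 2.0, local=True))
print(rate_global, rate_local)
```

In this toy profile the error is 1.5% of the maximum dose but 7.5% of the local dose, so every point passes the global 3%/3 mm test while the entire low-dose region fails the local 2%/2 mm test.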
Results:
For these case studies, the conventional 3%/3 mm gamma passing rates exceeded 99% for IMRT per-beam analyses and ranged from 93.9% to 100% for composite plan dose analysis, well above the TG-119 action levels of 90% and 88%, respectively. However, all cases had systematic errors that were detected only by using advanced diagnostic techniques and more sensitive metrics. The systematic errors caused variable but noteworthy impact, including estimated target dose coverage loss of up to 5.5% and local dose deviations up to 31.5%. Types of errors included TPS model settings, algorithm limitations, and modeling and alignment of QA phantoms in the TPS. Most of the errors were correctable after detection and diagnosis, and the uncorrectable errors provided useful information about system limitations, which is another key element of system commissioning.
Conclusions:
Many forms of relevant systematic errors can go undetected when the currently prevalent metrics for IMRT/VMAT commissioning are used. If alternative methods and metrics are used instead of (or in addition to) the conventional metrics, these errors are more likely to be detected, and only once they are detected can they be properly diagnosed and rooted out of the system. Removing systematic errors should be a goal not only of commissioning by the end users but also product validation by the manufacturers. For any systematic errors that cannot be removed, detecting and quantifying them is important as it will help the physicist understand the limits of the system and work with the manufacturer on improvements. In summary, IMRT and VMAT commissioning, along with product validation, would benefit from the retirement of the 3%/3 mm passing rates as a primary metric of performance, and the adoption instead of tighter tolerances, more diligent diagnostics, and more thorough analysis.
Purpose
Despite improvements in optimization and automation algorithms, the quality of radiation treatment plans still varies dramatically. A tool that allows a priori estimation of the best possible sparing (Feasibility DVH, or FDVH) of an organ at risk (OAR) in high‐energy photon planning may help reduce plan quality variability by deriving patient‐specific OAR goals prior to optimization. Such a tool may be useful for (a) meaningfully evaluating patient‐specific plan quality and (b) supplying best theoretically achievable DVH goals, thus pushing the solution toward automatic Pareto optimality. This work introduces such a tool and validates it for clinical Head and Neck (HN) datasets.
Methods
To compute FDVH, first the targets are assigned uniform prescription doses, with no reference to any particular beam arrangement. A benchmark 3D dose built outside the targets is estimated using a series of energy‐specific dose spread calculations reflecting observed properties of radiation distribution in media. For the patient, the calculation is performed on the heterogeneous dataset, taking into account the high‐ (penumbra driven) and low‐ (PDD and scatter‐driven) gradient dose spreading. The former is driven mostly by target dose and surface shape, while the latter adds the dependence on target volume. This benchmark dose is used to produce the “best possible sparing” FDVH for an OAR, and based on it, progressively more easily achievable FDVH curves can be estimated. Validation was performed using test cylindrical geometries as well as 10 clinical HN datasets. For HN, VMAT plans were prepared with objectives of covering the primary and the secondary (bilateral elective neck) PTVs while addressing only one OAR at a time, with the goal of maximum sparing. The OARs were each parotid, the larynx, and the inferior pharyngeal constrictor. The difference in mean OAR doses was computed for the achieved vs. FDVHs, and the shapes of those DVHs were compared by means of the Dice similarity coefficient (DSC).
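The comparison metrics described above can be sketched as follows. The abstract does not spell out how the Dice similarity coefficient is applied to DVH curves; this sketch assumes one plausible reading, in which each cumulative DVH is treated as the region under its curve and the overlap is the region under the pointwise minimum. The function names and sample curves are hypothetical, and both curves are assumed to share one dose axis.

```python
def area_under(dose, volume):
    """Trapezoidal area under a curve sampled at (dose, volume) points."""
    return sum(0.5 * (volume[i] + volume[i + 1]) * (dose[i + 1] - dose[i])
               for i in range(len(dose) - 1))

def dvh_dice(dose, volume_a, volume_b):
    """Dice coefficient of the regions under two cumulative DVH curves.

    Assumption: overlap = area under the pointwise minimum of the curves.
    This is approximate where the curves cross between sample points.
    """
    overlap = area_under(dose, [min(a, b) for a, b in zip(volume_a, volume_b)])
    return 2.0 * overlap / (area_under(dose, volume_a) + area_under(dose, volume_b))

def mean_dose(dose, fractional_volume):
    """Mean dose equals the area under a cumulative DVH whose volume axis
    is fractional (1.0 at zero dose)."""
    return area_under(dose, fractional_volume)

# Made-up curves: fractional volume receiving at least each dose level (Gy).
dose_axis = [0.0, 10.0, 20.0, 30.0, 40.0]
planned = [1.0, 0.8, 0.5, 0.2, 0.0]
feasible = [1.0, 0.7, 0.4, 0.1, 0.0]

dsc = dvh_dice(dose_axis, planned, feasible)
mean_diff = mean_dose(dose_axis, planned) - mean_dose(dose_axis, feasible)
print(round(dsc, 3), round(mean_diff, 2))
```

Because the feasibility curve is meant to bound the achieved curve from below, the overlap in this reading reduces to the area under the FDVH whenever the planned curve stays above it.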
Results
For all individually optimized HN OARs (N = 38), the average DSC between the planned DVHs and the FDVHs was 0.961 ± 0.018 (95% CI 0.955–0.967), with the corresponding average of mean OAR dose differences of 1.8 ± 5.8% (CI −0.1–3.6%). For realistic plans the achieved DVHs run no lower than the FDVHs, except when target coverage is compromised at the target/OAR interface.
Conclusions
For the validation of VMAT plans, the OAR DVHs optimized one‐at‐a‐time were similar in shape to and bounded on the low side by the FDVHs, within the confines of the planner's ability to precisely cover the target(s) with the prescription dose(s). The method is best suited for the OARs close to the target. This approach is fundamentally different from “knowledge‐based planning” because it (a) is independent of the treatment plan and prior experience, and (b) approximates, from nearly first principles, the lowest possible boundary of the OAR DVH, but not necessarily its actual shape in the presence of competing OAR sparing and target dose homogeneity objectives.
A high‐resolution diode array has been comprehensively evaluated. It consists of 1013 point diode detectors arranged on two 7.7 × 7.7 cm² printed circuit boards (PCBs). The PCBs are aligned face to face in such a way that the active volumes of all diodes are in the same plane. All individual correction factors required for accurate dosimetry have been validated for conventional and flattening filter free (FFF) 6 MV beams. These included diode response equalization, linearity, repetition rate dependence, field size dependence, and angular dependence at the central axis and off‐axis in the transverse, sagittal, and multiple arbitrary planes. In the end‐to‐end tests the array and radiochromic film dose distributions for SRS‐type multiple‐target plans were compared. In the equalization test (180° rotation), the average percent dose error between the normal and rotated positions for all diodes was 0.01% ± 0.1% (range −0.3 to 0.4%) and −0.01% ± 0.2% (range −0.9 to 0.9%) for 6 MV and 6 MV FFF beams, respectively. For the axial angular response, the corrected dose stayed within 2% of the ion chamber for all gantry angles, until the beam direction approached the detector plane. In the azimuthal direction, the device agreed with the scintillator within 1% for both energies. For multiple combinations of couch and gantry angles, the average percent errors were −0.00% ± 0.6% (range −2.1% to 1.6%) and −0.1% ± 0.5% (range −1.6% to 2.1%) for the 6 MV and 6 MV FFF beams, respectively. The measured output factors were largely within 2% of the scintillator, except for the 5 mm 6 MV beam showing a 3.2% deviation. The 2%/1 mm gamma analysis of composite SRS measurements produced an average passing rate of 97.2 ± 1.3% (range 95.8–98.5%) against film. Submillimeter (≤0.5 mm) dose profile alignment with film was demonstrated in all cases.
Purpose
To investigate (i) the dosimetric leaf gap (DLG) and the effect of the “trailing distance” between leaves from different multileaf collimator (MLC) layers in Halcyon systems and (ii) the ability of the currently available treatment planning systems (TPSs) to approximate this effect.
Methods
DICOM plans with transmission beams and sweeping gap tests were created in Python for measuring the DLG for each MLC layer independently and for both layers combined. In clinical Halcyon plans both MLC layers are interchangeably used and leaves from different layers are offset, thus forming a trailing pattern. To characterize the impact of such configuration, new tests called “trailing sweeping gaps” were designed and created where the leaves from one layer follow the leaves from the other layer at a fixed “trailing distance” t between the tips. Measurements were carried out on five Halcyons SX2 from different institutions and calculations from both the Eclipse and RayStation TPSs were compared with measurements.
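A sweeping gap DLG measurement is commonly reduced to a straight-line fit of dose against nominal gap width, with the DLG taken as the magnitude of the extrapolated zero-dose gap. The sketch below follows that common approach and is not necessarily the authors' exact procedure. It assumes the readings have already been corrected for leaf transmission; the function names and the numbers are made up for illustration.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def dlg_from_sweeping_gaps(gaps_mm, corrected_readings):
    """Estimate the DLG (mm) from transmission-corrected sweeping gap readings.

    Model assumed here: reading = slope * (gap + DLG), so DLG = intercept / slope.
    A negative result means the fitted line crosses zero at a positive gap.
    """
    slope, intercept = fit_line(gaps_mm, corrected_readings)
    return intercept / slope

# Synthetic readings generated from the assumed model with DLG = 0.3 mm.
gaps = [2.0, 4.0, 6.0, 10.0, 14.0, 16.0, 20.0]
readings = [0.05 * (g + 0.3) for g in gaps]
print(dlg_from_sweeping_gaps(gaps, readings))
```

Repeating such a fit for each trailing distance t would give one DLG value per t, which is one way the dependence on trailing distance could be tabulated.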
Results
The dose accumulated during a sweeping gap delivery progressively increased with the trailing distance t. We call this “the trailing effect.” It is most pronounced for t between 0 and 5 mm, although some changes were obtained up to 20 mm. The dose variation was independent of the gap size. The measured DLG values also increased with t up to 20 mm, again with the steepest variation between 0 and 5 mm. Measured DLG values were negative at t = 0 (the leaves from both layers at the same position) but changed sign for t ≥ 1 mm, in line with the positive DLG sign usually observed with single‐layer rounded‐end MLCs. The Eclipse TPS does not explicitly model the leaf tip and, as a consequence, could not predict the dose reduction due to the trailing effect. This resulted in dose discrepancies up to +10% and −8% for the 5 mm sweeping gap and up to ±5% for the 10 mm one depending on the distance t. RayStation implements a simple model of the leaf tip that was able to approximate the trailing effect and improved the agreement with measured doses. In particular, with a prototype version of RayStation that assigned a higher transmission at the leaf tip the agreement with measured doses was within ±3% even for the 5 mm gap. The five Halcyon systems behaved very similarly but differences in the DLG around 0.2 mm were found across different treatment units and between MLC layers from the same system. The DLG for the proximal layer was consistently higher than for the distal layer, with differences ranging between 0.10 mm and 0.24 mm.
Conclusions
The trailing distance between the leaves from different layers substantially affected the doses delivered by sweeping gaps and the measured DLG values. Stacked MLCs introduce a new level of complexity in TPSs, which ideally need to implement an explicit model of the leaf tip in order to reproduce the trailing effect. Dynamic tests called “trailing sweeping gaps” were designed that are useful for characterizing and commissioning dual‐layer MLC systems.
Even with advanced inverse‐planning techniques, radiation treatment plan optimization remains a very time‐consuming task with great output variability, which prompted the development of more automated approaches. One commercially available technique mimics the actions of experienced human operators to progressively guide the traditional optimization process with automatically created regions of interest and associated dose‐volume objectives. We report on the initial evaluation of this algorithm on 10 challenging cases of locoregionally advanced head and neck cancer. All patients were treated with VMAT to 70 Gy to the gross disease and 56 Gy to the elective bilateral nodes. The results of post‐treatment autoplanning (AP) were compared to the original human‐driven plans (HDP). We used an objective scoring system based on defining a collection of specific dosimetric metrics and corresponding numeric score functions for each. Five AP techniques with different input dose goals were applied to all patients. The best of them averaged a composite score 8% lower than the HDP across the patient population. The difference in median values was statistically significant at the 95% confidence level (Wilcoxon paired signed‐rank test p=0.027). This result reflects the premium the institution places on dose homogeneity, which was consistently higher with the HDPs. The OAR sparing was consistently better with the APs, the differences reaching statistical significance for the mean doses to the parotid glands (p<0.001) and the inferior pharyngeal constrictor (p=0.016), as well as for the maximum doses to the spinal cord (p=0.018) and brainstem (p=0.040). If one is prepared to accept less stringent dose homogeneity criteria from the RTOG 1016 protocol, nine APs would comply with the protocol, while providing lower OAR doses than the HDPs.
Overall, AP is a promising clinical tool, but it could benefit from a better process for shifting the balance between the target dose coverage/homogeneity and OAR sparing.
PACS number(s): 87.55.D
The American Association of Physicists in Medicine (AAPM) is a nonprofit professional society whose primary purposes are to advance the science, education and professional practice of medical physics. The AAPM has more than 8,000 members and is the principal organization of medical physicists in the United States.
The AAPM will periodically define new practice guidelines for medical physics practice to help advance the science of medical physics and to improve the quality of service to patients throughout the United States. Existing medical physics practice guidelines will be reviewed for the purpose of revision or renewal, as appropriate, on their fifth anniversary or sooner.
Each medical physics practice guideline represents a policy statement by the AAPM, has undergone a thorough consensus process in which it has been subjected to extensive review, and requires the approval of the Professional Council. The medical physics practice guidelines recognize that the safe and effective use of diagnostic and therapeutic radiology requires specific training, skills, and techniques, as described in each document. Reproduction or modification of the published practice guidelines and technical standards by those entities not providing these services is not authorized.
The following terms are used in the AAPM practice guidelines:
Must and Must Not: Used to indicate that adherence to the recommendation is considered necessary to conform to this practice guideline.
Should and Should Not: Used to indicate a prudent practice to which exceptions may occasionally be made in appropriate circumstances.
Pencil beam (PB) and collapsed cone convolution (CCC) dose calculation algorithms differ significantly when used in the thorax. However, such differences have seldom been previously directly correlated with outcomes of lung stereotactic ablative body radiation (SABR).
Data for 201 non-small cell lung cancer patients treated with SABR were analyzed retrospectively. All patients were treated with 50 Gy in 5 fractions of 10 Gy each. The radiation prescription mandated that 95% of the planning target volume (PTV) receive the prescribed dose. One hundred sixteen patients were planned with BrainLab treatment planning software (TPS) with the PB algorithm and treated on a Novalis unit. The other 85 were planned on the Pinnacle TPS with the CCC algorithm and treated on a Varian linac. Treatment planning objectives were numerically identical for both groups. The median follow-up times were 24 and 17 months for the PB and CCC groups, respectively. The primary endpoint was local/marginal control of the irradiated lesion. Gray's competing risk method was used to determine the statistical differences in local/marginal control rates between the PB and CCC groups.
Twenty-five patients planned with the PB algorithm and 4 patients planned with the CCC algorithm to the same nominal doses experienced local recurrence. There was a statistically significant difference in recurrence rates between the PB and CCC groups (hazard ratio 3.4; 95% confidence interval: 1.18-9.83; Gray's test P=.019). The differences (Δ) between the 2 algorithms for target coverage were as follows: ΔD99GITV = 7.4 Gy, ΔD99PTV = 10.4 Gy, ΔV90GITV = 13.7%, ΔV90PTV = 37.6%, ΔD95PTV = 9.8 Gy, and ΔDISO = 3.4 Gy. GITV = gross internal tumor volume.
Local control rates in patients who were planned to the same nominal dose with the PB and CCC algorithms were statistically significantly different. Possible alternative explanations are described in the report, although they are not thought likely to explain the difference. We conclude that the difference is due to relative dosimetric underdosing of tumors with the PB algorithm.
To assure accurate treatment delivery on any image-guided radiotherapy system, the relative positions and walkout of the imaging and radiation isocenters must be periodically verified and kept within specified tolerances. In this work, we first validated the multiaxis ion chamber array as a tool for finding the radiation isocenter position of a magnetic resonance–guided linear accelerator. The treatment couch with the array on it was shifted in 0.2-mm increments and the reported beam center position was plotted against that shift and fitted to a straight line, in both X and Y directions. From the goodness-of-fit and intercepts of the regression lines, the accuracy and precision were conservatively estimated at 0.2 and 0.1 mm, respectively. This holds true whether the array is irradiated from the front or from the back, which allows efficient collection of the data from the 4 cardinal gantry angles with just 2 array positions. The average isocenter position agreed to within 0.4 mm along any cardinal axis with the linac vendor’s film-based procedure, and the maximum walkout radii were 0.32 mm and 0.53 mm, respectively. The magnetic resonance imaging isocenter walkout as a function of gantry angle was studied with 2 different phantoms, one employing a single fiducial at the center and another extracting the rigid displacement values from the distortion map fit of 523 fiducials dispersed over a large volume. The results were close between the 2 phantoms and demonstrated variation in the magnetic resonance imaging isocenter location as high as 1.3 mm along a single axis in the transverse plane. Verification of the magnetic resonance imaging isocenter location versus the gantry angle should be a part of quality assurance for magnetic resonance-guided linear accelerators.
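The couch-shift validation described above amounts to a straight-line fit of reported beam center against applied shift. The following is a minimal sketch under stated assumptions: the shift range of ±1 mm, the 0.05 mm offset, and the alternating ±0.02 mm readout scatter are invented for illustration, and the function name is hypothetical. The slope indicates scale fidelity, the intercept a systematic offset, and the residual scatter the precision.

```python
import math

def linear_fit_with_residual(shifts_mm, reported_mm):
    """Least-squares line through (shift, reported position) pairs.

    Returns (slope, intercept, residual standard deviation). The residual
    standard deviation uses n - 2 degrees of freedom.
    """
    n = len(shifts_mm)
    mean_x = sum(shifts_mm) / n
    mean_y = sum(reported_mm) / n
    sxx = sum((x - mean_x) ** 2 for x in shifts_mm)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(shifts_mm, reported_mm))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    squared = sum((y - (slope * x + intercept)) ** 2
                  for x, y in zip(shifts_mm, reported_mm))
    return slope, intercept, math.sqrt(squared / (n - 2))

# Synthetic data: couch shifts in 0.2 mm increments, array readout with an
# assumed 0.05 mm offset and alternating +/-0.02 mm scatter.
shifts = [round(-1.0 + 0.2 * i, 1) for i in range(11)]
reported = [s + 0.05 + (0.02 if i % 2 == 0 else -0.02)
            for i, s in enumerate(shifts)]

slope, intercept, scatter = linear_fit_with_residual(shifts, reported)
print(slope, intercept, scatter)
```

Running the same fit separately for the X and Y directions, and for front and back irradiation, would mirror the comparison the text describes.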
The treatment of central and ultracentral lung tumors with radiotherapy remains an ongoing clinical challenge. The risk of Grade 5 toxicity with ablative radiotherapy doses to these high-risk regions is significant as shown in recent prospective studies. Magnetic resonance (MR) image-guided adaptive radiotherapy (MRgART) is a new technology and may allow the delivery of ablative radiotherapy to these high-risk regions safely. MRgART is able to achieve this by utilizing small treatment margins, real-time gating/tracking and on-table plan adaptation to maintain dose to the tumor but limit dose to critical structures. The process of MRgART is complex and has nuances and challenges for the treatment of lung tumors. We outline the critical steps needed for appropriate delivery of MRgART for lung tumors safely and effectively.
Purpose:
The authors designed data, methods, and metrics that can serve as a standard, independent of any software package, to evaluate dose‐volume histogram (DVH) calculation accuracy and detect limitations. The authors use simple geometrical objects at different orientations combined with dose grids of varying spatial resolution with linear 1D dose gradients; when combined, ground truth DVH curves can be calculated analytically in closed form to serve as the absolute standards.
Methods:
DICOM RT structure sets containing a small sphere, cylinder, and cone were created programmatically with axial plane spacing varying from 0.2 to 3 mm. Cylinders and cones were modeled in two different orientations with respect to the IEC 1217 Y axis. The contours were designed to stringently but methodically test voxelation methods required for DVH. Synthetic RT dose files were generated with 1D linear dose gradient and with grid resolution varying from 0.4 to 3 mm. Two commercial DVH algorithms, Pinnacle (Philips Radiation Oncology Systems) and PlanIQ (Sun Nuclear Corp.), were tested against analytical values using custom, noncommercial analysis software. In Test 1, axial contour spacing was constant at 0.2 mm while dose grid resolution varied. In Tests 2 and 3, the dose grid resolution was matched to varying subsampled axial contours with spacing of 1, 2, and 3 mm. Two kinds of difference analysis were employed: (1) histograms of the accuracy of various DVH parameters (total volume, Dmax, Dmin, and doses to % volume: D99, D95, D5, D1, D0.03 cm3), and (2) volume errors extracted along the DVH curves; both were summarized in tabular and graphical forms.
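As an illustration of how a closed-form ground truth can arise, consider the sphere in a 1D linear dose gradient. The region receiving at least a given dose is a spherical cap, whose volume is π h² (3R − h) / 3 for cap height h. The sketch below is not the authors' software: the function name, the parameter names, and the numbers are assumptions, and the gradient is assumed positive along the chosen axis (for a sphere the result does not depend on the gradient direction).

```python
import math

def sphere_cumulative_dvh(radius_mm, center_mm, dose_at_origin,
                          gradient_per_mm, dose):
    """Volume (mm^3) of a sphere receiving at least `dose`.

    Dose model: D(x) = dose_at_origin + gradient_per_mm * x, gradient > 0.
    The isodose plane D = dose cuts the sphere into spherical caps; the cap
    on the high-dose side has height h, clipped to [0, 2R].
    """
    plane_x = (dose - dose_at_origin) / gradient_per_mm
    height = center_mm + radius_mm - plane_x
    height = min(max(height, 0.0), 2.0 * radius_mm)
    return math.pi * height ** 2 * (3.0 * radius_mm - height) / 3.0

# Hypothetical example: 10 mm radius sphere centered at x = 0 in a
# 1 Gy/mm gradient with 50 Gy at the center (doses span 40 to 60 Gy).
radius = 10.0
full_volume = 4.0 / 3.0 * math.pi * radius ** 3
for level in (40.0, 45.0, 50.0, 55.0, 60.0):
    volume = sphere_cumulative_dvh(radius, 0.0, 50.0, 1.0, level)
    print(level, round(100.0 * volume / full_volume, 2))
```

A DVH calculator under test can then be compared point by point against such analytical values, which is the role the text assigns to the ground truth curves.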
Results:
In Test 1, Pinnacle produced 52 deviations (15%) while PlanIQ produced 5 (1.5%). In Test 2, Pinnacle and PlanIQ differed from the analytical values by >3% in 93 (36%) and 18 (7%) instances, respectively. Excluding Dmin and Dmax as least clinically relevant would result in 32 (15%) vs 5 (2%) scored deviations for Pinnacle vs PlanIQ in Test 1, while Test 2 would yield 53 (25%) vs 17 (8%). In Test 3, statistical analyses of volume errors extracted continuously along the curves show Pinnacle to have more errors and higher variability (relative to PlanIQ), primarily due to Pinnacle’s lack of sufficient 3D grid supersampling. Another major driver for Pinnacle errors is an inconsistency in implementation of the “end‐capping”; the additional volume resulting from expanding superior and inferior contours halfway to the next slice is included in the total volume calculation, but dose voxels in this expanded volume are excluded from the DVH. PlanIQ had fewer deviations, and most were associated with a rotated cylinder modeled by rectangular axial contours; for coarser axial spacing, the limited number of cross‐sectional rectangles hinders the ability to render the true structure volume.
Conclusions:
The method is applicable to any DVH‐calculating software capable of importing DICOM RT structure set and dose objects (the authors’ examples are available for download). It includes a collection of tests that probe the design of the DVH algorithm, measure its accuracy, and identify failure modes. Merits and applicability of each test are discussed.