<!DOCTYPE html>
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta http-equiv="X-UA-Compatible" content="ie=edge"/>
<title>Computer Vision Class Project | CS, Georgia Tech | Fall 2020: CS 4476</title>
<link rel="stylesheet" href="styles.css">
<meta name="description" content="">
<meta name="author" content="">
</head>
<body>
<div class="container">
<div class="page-header">
<!-- Title and Name -->
<h1>Detecting Cardiomegaly and Tissue Growth using Computer Vision on Chest X-rays</h1> <!--maybe change to something like Chest X-ray Imaging?-->
<span style="font-size: 20px; line-height: 1.5em;"><strong>Yash Kothari, Nima Jadali, Aditya Tapshalkar, Brayden Richardson, and Raymond Bryant</strong></span><br>
<span style="font-size: 18px; line-height: 1.5em;">CS 4476 - Intro to Computer Vision <br> Fall 2020 Class Project</span><br>
<span style="font-size: 18px; line-height: 1.5em;">Georgia Institute of Technology</span>
<hr>
<h2>Midterm Update</h2>
<div class="proposal_items">
<h3>Abstract</h3>
<div class="text_indent">
Radiologists and physicians manually observe chest X-Rays to diagnose certain illnesses and irregularities.
Our team is developing a computer vision neural network system to facilitate and validate the diagnosis of chest
nodules, masses, and cardiomegaly. This will be done through pre-processing of a given X-Ray image, running the
X-Ray through a convolutional neural network (implemented with TensorFlow) resulting in confidence scores for
the presence of the considered illnesses, and conducting Hough transform techniques and deformable contouring
to detect and localize possible masses, nodules, and cardiomegaly. The output of our system is a confidence
measure (probability) that the detected figure is actually a cardiomegaly or an unusual growth, and the
location of the growth or cardiomegaly in the image. Our system can then help prioritize images of patients
with higher confidence measures of a given illness (or illnesses) and can also recommend further analysis for
images producing results with high uncertainty.
</div>
<h3>Teaser Figure</h3>
<div class="image_container">
<img src="Images/teaser_image.png" alt="teaser_image"/>
</div>
<br><br>
<h3>Introduction</h3>
<div class="text_indent">
Chest X-rays are very useful in diagnosing injuries and ailments within the thoracic region.
However, medical diagnosis of these ailments can often prove very challenging. Our team is
exploring new ways to efficiently diagnose conditions such as nodules, masses, and cardiomegaly.
Our system will aid in diagnosis using computer vision techniques integrated with a convolutional
neural net. <br><br>
Masses and nodules are similar in that both growths are tumors; the main
difference is size: masses are larger, usually greater than 3 cm in diameter,
whereas nodules are smaller than 3 cm across. These growths can be categorized
as benign or malignant, but malignant growths have a high probability
of metastasizing and can be life-threatening if not treated promptly (“Lung Masses and Growths”). <br><br>
Cardiomegaly is an enlargement of the heart and is primarily caused by short-term
stresses on the body, such as pregnancy, or by underlying medical conditions like heart valve problems
and arrhythmias. Like most conditions, cardiomegaly can be treated if diagnosed early enough, usually by
correcting its underlying cause; treatments include medication and surgery. <br><br>
These three ailments were chosen because diagnosing them is feasible given the
time constraints of the project. All three appear as solid structures in the chest and are
comparatively easy to identify and classify, whereas the other ailments in the dataset involve
some form of fluid in the lungs, making them harder to identify and distinguish from one another. <br><br>
The goal of our project is to create a model, using supervised classification, that is
able to identify and diagnose nodules, masses, and cardiomegaly from an input chest X-ray.
A Hough transformation and a deformable contour will then be used to locate the position of the
ailment within the image. A user of our system will be able to input a chest X-ray of a patient,
and our model will then determine the presence (with a corresponding confidence level) of a
nodule, mass, or cardiomegaly. The model will then output the image with a
boundary encasing the area of interest (the enlarged heart, nodule, or mass)
and the estimated size of the nodule/mass.
</div>
<br><br>
<h3>Approach</h3>
<div class="text_indent">
<div class="image_container">
<img src="Images/overview_flowchart.png" alt="overview_flowchart">
<br><br>
A high-level overview of our approach is shown in the figure above and is explained in detail in the rest of this section.
</div>
<h4>Preprocessing</h4>
The first step in our process was to pre-process the input X-ray images, both to aid the neural network in identifying each of our three main
ailments and to facilitate post-processing. In deciding what preprocessing to apply to the chest X-rays, our group considered both HSV
and RGB quantization using K-means clustering. We wanted to quantize the input images to reduce the number of
distinct colors in each X-ray and make diagnosing cardiomegaly, masses, and nodules easier.
<br><br>
Each condition involves a solid mass in the body, which is much more visible and distinct in an X-ray than the features of conditions such as infiltration
or pneumonia. Upon trying HSV quantization, we found that, due to the nature of X-ray images, the resulting quantizations were
insufficient; an illustration of this can be found in Figure 1 in the Qualitative Results section. We therefore settled on RGB
quantization, but then had to test different numbers of clusters to quantize with. With 2 clusters, the
X-rays were over-simplified into essentially binary images, losing too much information about their contents. Seven clusters were
far too many, as the images remained noisy from the patient’s ribs and the other tissues and organs in the chest cavity. With this range in mind, we
tested cluster sizes of 3 and 4 and finally settled on 3. Three clusters kept enough of the X-ray information that a mass, nodule,
or cardiomegaly could still be identified, while abstracting the X-ray enough to remove noise such as the ribs. Examples of
each cluster value can be seen in Figure 2 in the Qualitative Results section.
<br><br>
After conducting the RGB quantization on all input images, each image was convolved with a linear box filter to produce a smoother
pre-processed output and to facilitate feature recognition during reinforcement learning.
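<br><br>
For illustration, a minimal sketch of this pre-processing step (k-means RGB quantization with k = 3 followed by a box filter) is shown below.
The file names, k-means termination criteria, and box-filter size are placeholders rather than our exact settings.
<pre>
import cv2
import numpy as np

def quantize_and_smooth(path, k=3, box_size=5):
    """k-means color quantization of an X-ray followed by box-filter smoothing."""
    img = cv2.imread(path)                                  # X-ray image (BGR)
    pixels = img.reshape((-1, 3)).astype(np.float32)

    # k-means over pixel values; the termination criteria and attempt count
    # below are illustrative, not our exact settings
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
    _, labels, centers = cv2.kmeans(pixels, k, None, criteria, 5,
                                    cv2.KMEANS_RANDOM_CENTERS)
    quantized = centers[labels.flatten()].astype(np.uint8).reshape(img.shape)

    # linear box filter to smooth the quantized output before training
    return cv2.blur(quantized, (box_size, box_size))

smoothed = quantize_and_smooth("xray_sample.png", k=3)      # k = 3 clusters
cv2.imwrite("xray_sample_preprocessed.png", smoothed)
</pre>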
<h4>Convolutional Neural Network (CNN) Reinforcement Learning</h4>
The original CheXNet architecture, built using Keras/TensorFlow, was analyzed. The 121-layer neural network was deemed too
complex for the scope of this project, so the following simplifications were made:
<ul>
<li>Removal of certain disease/problem categories (Atelectasis, Consolidation, Edema, Effusion, Emphysema, Fibrosis, Hernia, Infiltration,
Pleural Thickening, Pneumonia, and Pneumothorax were all removed); the remaining categories are cardiomegaly, mass, and nodule.</li>
<li>Training has been done on a smaller subset of the original training set to simplify computation. However, accuracy has not been compromised
to a great degree, thanks to the use of pre-processed images in training epoch 1, as discussed further below.</li>
<li>Testing has so far been done only on a far smaller subset of randomly sampled (seeded) images from the testing dataset.</li>
<li>The architecture was simplified by removing a large fraction of the layers (from the original 121 down to 60) to make training, testing, and
running the model computationally simpler and more efficient given our limited resources.</li>
<li>The algorithm was applied to a set of images provided within the NIH chest X-ray dataset for algorithm application, used by various models
(these images are in neither our training nor our testing subsets).</li>
</ul>
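A minimal Keras sketch of this kind of simplification is shown below. It is not our exact network: the cutoff layer name, input size, and
optimizer settings are assumptions, but it illustrates how a DenseNet-121 backbone (the basis of CheXNet) can be truncated and given a
three-label sigmoid head.
<pre>
import tensorflow as tf

NUM_CLASSES = 3  # cardiomegaly, mass, nodule

# Start from the DenseNet-121 backbone that CheXNet is built on and truncate it
# at an intermediate block to approximate a reduced-depth network.
backbone = tf.keras.applications.DenseNet121(weights="imagenet",
                                             include_top=False,
                                             input_shape=(224, 224, 3))

# "pool3_pool" is one of the intermediate transition layers; the exact cutoff
# layer (and hence the resulting layer count) is an assumption for this sketch.
truncated = tf.keras.Model(inputs=backbone.input,
                           outputs=backbone.get_layer("pool3_pool").output)

x = tf.keras.layers.GlobalAveragePooling2D()(truncated.output)
outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="sigmoid")(x)
model = tf.keras.Model(truncated.input, outputs)

# multi-label setup: one independent sigmoid confidence score per illness
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC(name="auroc")])
</pre>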
<h4>Post-processing</h4>
After running an input X-ray through our CNN, we apply two feature-extraction techniques: a Hough transform, which uses a vote-accumulating
matrix, for nodule and mass detection; and deformable contouring, which starts from a general initial contour and deforms it to fit the shape
of a feature in the image, to detect nodules, masses, and cardiomegaly.
<br><br>
For the Hough transform used to locate nodules and masses in the X-rays, we first tried a modified version of our Hough transformation from
problem set 2 (PS2), with binning that grouped horizontally by 5 pixels and vertically by 10 pixels. This binning worked best on masses and
ran faster than the generalized Hough transformation we created afterwards. The modified Hough transformation with bins did not work very well
on nodules, as small details and noise in the X-ray images often led to misidentification (the same happened when identifying masses, but to a
lesser extent). A workaround was to choose the region containing the most centers as the region with the mass/nodule. For example, if the bottom
left contained 5 centers above the threshold while the other regions contained around 2-3 centers, the input X-ray would be labeled as having a
nodule or mass (depending on the identification), most likely in the bottom-left region. This worked decently well for labeling X-rays of nodules
and masses with regions, but identification of their specific location was still not very accurate. The next step was to create a generalized
Hough transformation. It works by building an R-table from a reference image, which is a cropped sample X-ray of either a mass or a nodule taken
from another patient; in the future, we plan on making our own reference image that might be more consistent. The R-table is then used to create
an accumulator array for the Canny edge image of the input image, which takes a very long time given the number of values in the R-table. The
generalized Hough transformation worked with much greater accuracy, and the mass or nodule was often within the top 5 predicted centers. The
downside is that the process takes a long time to run: even with vectorization, optimization, and a smaller, less accurate reference image, the
generalized Hough transformation can take up to 2 minutes, making it infeasible for quickly labeling or identifying the location of masses/nodules.
Going forward, we are looking for ways to optimize it, possibly by decreasing the search space with bounding boxes around the lungs (which we think
could be done via a HOG because of the bones’ strong contrast with the rest of the X-ray) and by making the reference image more concise. A bounding
box limiting the search to the lungs would also make our results more accurate (and could perhaps even help with finding the center where the heart
would be located, which would be useful for the deformable contour on patients with irregularly framed X-rays).
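<br><br>
A condensed, unoptimized sketch of this generalized Hough transformation is shown below; the Canny thresholds, orientation bin count, and file names
are illustrative, and our actual implementation is vectorized.
<pre>
import cv2
import numpy as np
from collections import defaultdict

def build_r_table(reference, angle_bins=64):
    """Build an R-table from a cropped reference image of a mass or nodule."""
    edges = cv2.Canny(reference, 50, 150)
    gx = cv2.Sobel(reference, cv2.CV_64F, 1, 0)
    gy = cv2.Sobel(reference, cv2.CV_64F, 0, 1)
    orientation = np.arctan2(gy, gx)              # gradient direction at each pixel
    cy, cx = reference.shape[0] / 2.0, reference.shape[1] / 2.0   # reference point
    r_table = defaultdict(list)
    for y, x in zip(*np.nonzero(edges)):
        b = int((orientation[y, x] + np.pi) / (2 * np.pi) * angle_bins) % angle_bins
        r_table[b].append((cy - y, cx - x))       # vector from edge point to center
    return r_table

def general_hough(target, r_table, angle_bins=64):
    """Vote for likely centers of the reference shape in the target X-ray."""
    edges = cv2.Canny(target, 50, 150)
    gx = cv2.Sobel(target, cv2.CV_64F, 1, 0)
    gy = cv2.Sobel(target, cv2.CV_64F, 0, 1)
    orientation = np.arctan2(gy, gx)
    acc = np.zeros(target.shape, dtype=np.int32)
    for y, x in zip(*np.nonzero(edges)):
        b = int((orientation[y, x] + np.pi) / (2 * np.pi) * angle_bins) % angle_bins
        for dy, dx in r_table[b]:
            cy, cx = int(round(y + dy)), int(round(x + dx))
            if 0 <= cy < acc.shape[0] and 0 <= cx < acc.shape[1]:
                acc[cy, cx] += 1                  # one vote for this candidate center
    return acc

reference = cv2.imread("mass_reference.png", cv2.IMREAD_GRAYSCALE)   # cropped sample
target = cv2.imread("xray_sample.png", cv2.IMREAD_GRAYSCALE)
accumulator = general_hough(target, build_r_table(reference))
top5 = np.column_stack(np.unravel_index(np.argsort(accumulator, axis=None)[-5:],
                                        accumulator.shape))          # strongest centers
</pre>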
<br><br>
The deformable contouring works well on cardiomegaly since the starting position does not vary much for the heart: it is slightly to the left
side of the center of the patient's chest (skewed right on the X-ray). Some X-rays are unusually oriented, so that the patient's body is not
centered (as mentioned, we might use bounding boxes to help solve this problem). The starting radius of the circle is 200 pixels. Going forward,
we can change the initial shape of the contour from a circle to something more irregular that fits the shape of a heart better; this would lead
to a more accurate contour around the heart. The specific implementation is based on the class lecture and a technical report by D. N. Davis.
There are 400 points in the contour. Alpha is 0.01, beta is 2, and gamma is 300, and we found that after 150 iterations the contour reaches a
point where it no longer changes. Using more iterations than this (200 or greater) leads to X-rays of normal hearts having a different contour
with a smaller area, since their contour continues to shrink after 150 iterations. In each iteration, the 400 points move to minimize the energy
function, which is based on alpha, beta, and gamma.
<br><br>
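The sketch below illustrates the same idea using scikit-image's active_contour in place of our own implementation; note that the alpha/beta/gamma
scales in scikit-image differ from the values quoted above, so the numbers shown (and the exact center offset) are illustrative only.
<pre>
import cv2
import numpy as np
from skimage.filters import gaussian
from skimage.segmentation import active_contour

xray = cv2.imread("xray_cardiomegaly.png", cv2.IMREAD_GRAYSCALE)
h, w = xray.shape

# initial contour: 400 points on a circle of radius 200 px, centered slightly to
# the patient's left of the chest center (skewed right in the image); the exact
# offset below is an assumption
center_row, center_col = h / 2.0, w / 2.0 + 0.05 * w
theta = np.linspace(0, 2 * np.pi, 400)
init = np.stack([center_row + 200 * np.sin(theta),
                 center_col + 200 * np.cos(theta)], axis=1)   # (row, col) pairs

# scikit-image's parameter scales differ from our own energy formulation, so
# these values are illustrative; max_num_iter is named max_iterations in older
# scikit-image releases
snake = active_contour(gaussian(xray, sigma=3, preserve_range=True),
                       init, alpha=0.01, beta=2.0, gamma=0.001,
                       max_num_iter=150)

# crude size estimate: area enclosed by the final contour (shoelace formula)
area = 0.5 * abs(np.dot(snake[:, 1], np.roll(snake[:, 0], 1))
                 - np.dot(snake[:, 0], np.roll(snake[:, 1], 1)))
print("enclosed area (px^2):", area)
</pre>
<br><br>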
</div>
<h3>Experiments and Results</h3>
<div class="text_indent">
<ul>
<ul>
<li>The TensorFlow/Keras model and the PyTorch model were both considered; the GitHub repositories for the code (cited in the references) were examined. The first experiment was a direct test of the training efficiency of both models, run using the training code/steps provided with each model.</li>
<ul>
<li>Repository 1 (PyTorch) was far less computationally expensive and more efficient, while providing an AUROC (<span style="font-weight: bold;">Area Under a Receiver Operating Characteristic Curve</span>) of 0.836</li>
<li>Repository 2 (Keras/TensorFlow) was computationally more expensive and inefficient, while providing an AUROC of 0.841 (original AUROC as described in the CheXNet paper)</li>
<li>Based on the above, the training algorithm we used was inspired by the PyTorch repository; certain adjustments were made, including, but not limited to, changes to the weight-update mechanism, the addition of reinforcement learning as a training mechanism, and batch-size modifications to prevent the available GPUs from running out of memory</li>
</ul>
<li>The second experiment purely tested the performance of both models on an untouched “new” set of images (from the NIH dataset, sub-sampled by the GitHub repositories themselves as previously untested images). The resulting AUROCs were as follows:</li>
<ul>
<li>Repository 1 (PyTorch): 0.8108</li>
<li>Repository 2 (Keras/TensorFlow): 0.841</li>
<li>From the above, we see that the Keras/TensorFlow model was able to retain its original AUROC score, and hence its accuracy, on new images, showcasing its superior applicability.</li>
<li>Following from the above, our network was implemented using the architecture of the Keras/TensorFlow model as a basis. Large modifications were also made due to computational resource limitations. The model’s ability to detect certain illnesses improved, while accuracy for some others decreased (the losses were relatively insignificant, on the order of 0.001, while the improvements in accuracy, of the same order of magnitude, are far more important for improving upon the originally published CheXNet algorithm)</li>
</ul>
<li>The original CheXNet has 121 layers, which was deemed computationally extravagant (again, given the resources and time available to our group and the scope of this project). The architecture was simplified to a 60-layer network, reducing the computation time by a factor of approximately 0.445. This was based on experimentation with the resulting AUROC scores from 5 different simplified network configurations (shown in the table below).</li>
</ul>
</ul>
<div>
<table>
<tbody>
<tr>
<td><span style="font-weight: bold;">NUMBER OF LAYERS</span></td>
<td><span style="font-weight: bold;">AUROC</span></td>
</tr>
<tr>
<td>121</td>
<td>0.841</td>
</tr>
<tr>
<td>93</td>
<td>0.8381</td>
</tr>
<tr>
<td>71</td>
<td>0.8378</td>
</tr>
<tr>
<td>61</td>
<td>0.8371</td>
</tr>
<tr>
<td>50</td>
<td>0.821</td>
</tr>
</tbody>
</table>
</div>
<ul>
<li>Based on the above, it was clear that further simplification (fewer than 61 layers) was undesirable, as it produced results that noticeably degraded the model’s predictions and were deemed too inaccurate. The chosen architecture was therefore a network with 60 layers, with an initial AUROC of 0.8370.</li>
</ul>
<ul>
<li>The next experiment focused on determining the optimal number of training epochs for our selected architecture. Unfortunately, some errors were made during the collection of data for this experiment, so a figure/table has been omitted. However, the determined optimal number of epochs was <span style="font-style: italic;">6</span> (considering values ranging from 4 to 10). Two factors were considered when determining this value: the accuracy of the resultant model and the computational complexity/time of training; this value produced an optimal balance between the two.</li>
<ul>
<li>Modifications to the original training methods were made after initial experimentation using only the training set. </li>
<li>During each epoch, a batch of images was split up and used as follows: </li>
<ul>
<li>Epoch 1: pre-processed images - pre-processing as specified in the above pre-processing section (qualitative results below)</li>
<li>Epochs 2 & 3: concentration of true positives - a larger number of images representing a true presence of the considered diseases was consolidated into a training set for these epochs</li>
<li>Epochs 4, 5 & 6: random sample from a larger training set</li>
<li>The above splits provided optimal results in terms of the predictions made by the resultant trained model. </li>
<li>For comparison, the PyTorch repository used 5 training epochs for the pre-trained model it provides (cited below).</li>
</ul>
</ul>
<li>The final experiment focused on the testing of the overall architecture on a random sample of 100 images from the testing dataset. The results of these predictions can be found in the Google Sheets document linked below:</li>
</ul>
<p><a href="https://drive.google.com/file/d/1-hyDlWrTdAriJ9Mz_GX4atin6OWVoN97/view?usp=sharing" target="_blank" rel="noopener"><span style="font-weight: bold;">Link to Predictions Made by Our Model</span></a></p>
<p>Keep in mind that these results will need to be thresholded to provide a final decision for the presence of a given illness. The confidence scores are relative rather than absolute.</p>
<ul>
<li>Additionally, the per-label AUROC scores of our model are compared to those of the original CheXNet in the table below: </li>
</ul>
<div>
<table>
<tbody>
<tr>
<td><span style="font-weight: bold;">LABEL</span></td>
<td><span style="font-weight: bold;">OUR MODEL AUROC</span></td>
<td><span style="font-weight: bold;">CheXNet AUROC</span></td>
</tr>
<tr>
<td>Cardiomegaly</td>
<td>0.9017</td>
<td>0.9248</td>
</tr>
<tr>
<td>Mass</td>
<td>0.8509</td>
<td>0.8676</td>
</tr>
<tr>
<td>Nodule</td>
<td>0.7658</td>
<td>0.7802</td>
</tr>
</tbody>
</table>
</div>
<p>As shown above, our model’s accuracy is slightly lower than the original CheXNet’s. However, considering that the CheXNet architecture is roughly twice as complex (in terms of the number of layers used), these initial results are quite promising. For our final update, we aim to close the gap in accuracy even further, possibly surpassing the original accuracy of CheXNet for the subset of considered illnesses. We will also compare the effects of using an entirely pre-processed dataset to re-train both our model and the original CheXNet to see the resultant changes in prediction accuracy.</p>
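<p>For reference, per-label AUROC scores like those in the table can be computed with scikit-learn as sketched below; the arrays shown are small placeholders, not our actual predictions or ground truth.</p>
<pre>
import numpy as np
from sklearn.metrics import roc_auc_score

LABELS = ["Cardiomegaly", "Mass", "Nodule"]

# y_true: one row per test image, one 0/1 column per illness (ground truth);
# y_score: the model's sigmoid confidence for each illness, same shape.
# The arrays below are placeholders, not our actual predictions.
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0]])
y_score = np.array([[0.91, 0.12, 0.08],
                    [0.22, 0.74, 0.31],
                    [0.10, 0.28, 0.66],
                    [0.05, 0.09, 0.11]])

for i, label in enumerate(LABELS):
    print(label, "AUROC:", round(roc_auc_score(y_true[:, i], y_score[:, i]), 4))

# averaging the per-label scores gives a single summary AUROC
print("mean AUROC:", round(roc_auc_score(y_true, y_score, average="macro"), 4))
</pre>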
</div>
<br>
<h3>Qualitative results</h3>
<div class="image_container">
Attempts at HSV quantization resulted in the following images, where k is the number of clusters used.
<br><br>
<img src="Images/hsv_test.png"><br>
<strong>Figure 1.</strong> Effects of using HSV quantization on X-ray input images.
<br><br>
Several values of k were used (ranging from 2 to 7 clusters), resulting in the following outputs.
<br><br>
<img src="Images/cluster_test.png"><br>
<strong>Figure 2.</strong> Results of using k=2,3,4,7 clusters for RGB quantization on X-ray input images.
<br><br>
<img src="Images/deform_contour_CM.png"><br>
<strong>Figure 3.</strong> Results of using alpha = 0.01, beta = 2, gamma = 300 and 200 iterations (red is the final contour and blue is every 20th contour)
for deformable contouring on two chest X-rays, one with a cardiomegaly (left) and one with no illness (right).
<br><br>
<img src="Images/Acc_DetectPoint_GHT.png"><br>
<strong>Figure 4.</strong> Results of using the generalized Hough Transform on an image with a cardiomegaly. The reference
image depicts a mass occurrence in another image. The Accumulator is the result considering the input and reference images.
The Detected point(s) image contains the thresholded points from the accumulator.
</div>
<h3>Conclusion</h3>
<div class="text_indent">
Currently, our model, for a given image, outputs a set of probabilities each indicating its confidence in the presence of a given illness from the
set of illnesses we are considering. This gives us an initial idea of what these probabilities will look like, and for the final project update, we
will threshold these probabilities such that a single label (representing the detected illness) is assigned to each image. We will also consider two
additional possibilities: the confidence is not high enough to indicate any of the considered illnesses, or the confidence measures indicate the
presence of multiple illnesses. We will handle the first case by assigning a label to indicate that no illness was detected, and we will follow up with
further post-processing (Hough transform and contouring) to verify the model output. The second case must also be considered because there will
inevitably be some small amount of overlap among the considered illnesses within our dataset, and images falling under this case will be flagged as uncertain.
These images would be recommended for further analysis by a radiologist or an expert in thoracic pathology. As discussed above, for the final update we will
also use a larger number of pre-processed images during the training of our model and refine the techniques we used for quantization and smoothing.
We will also attempt to use the Hough transform and contouring to localize the area of interest in a given image prior to training, in order to
verify its assigned ground-truth label and assist in feature extraction. These experiments will hopefully expose our model to more relevant features
and will allow us to further explore how and to what extent the overall effectiveness of our model changes.
<br><br>
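A minimal sketch of this thresholding logic is shown below; the threshold values and output strings are placeholders until we choose the final
thresholds from validation data.
<pre>
import numpy as np

LABELS = ["Cardiomegaly", "Mass", "Nodule"]

def assign_label(confidences, thresholds=(0.5, 0.5, 0.5)):
    """Turn the model's relative confidence scores into a single decision.

    The per-class thresholds here are placeholders; the final values will be
    chosen from validation data for the final update.
    """
    scores = np.asarray(confidences, dtype=float)
    above = scores >= np.asarray(thresholds)
    if not above.any():
        return "No illness detected"                  # case 1: nothing confident enough
    if above.sum() > 1:
        return "Multiple illnesses: flag for review"  # case 2: overlapping detections
    return LABELS[int(np.argmax(scores * above))]     # exactly one illness above threshold

print(assign_label([0.81, 0.32, 0.12]))   # Cardiomegaly
print(assign_label([0.21, 0.18, 0.05]))   # No illness detected
print(assign_label([0.77, 0.69, 0.10]))   # Multiple illnesses: flag for review
</pre>
<br><br>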
Additionally, we cannot yet conclude whether our model will perform well on chest X-rays from outside this dataset. A focus of future
work, within the scope of the final update, will be validating the results of this model on additional datasets for generalization testing and
ensuring that over-fitting has not taken place. Furthermore, the F1 score will also be computed and analyzed to assess accuracy, as mentioned in the
project proposal (to be done by the final update).
<br><br>
<h3>References</h3>
<div style="text-indent: -36px; padding-left: 36px;">
<p>Chou, Bruce. <em>CheXNet-Keras</em>. GitHub repository, github.com/brucechou1983/CheXNet-Keras.</p>
<p>D.N. Davis. <em>The Application Of Active Contour Models To MR and CT Images.</em> Technical report,
Medical Vision Group, University of Birmingham, Edgbaston, Birmingham, UK, 1995.</p>
<p>“Enlarged Heart.” <em>Mayo Clinic</em>, 16 Jan. 2020, https://www.mayoclinic.org/diseases-conditions/enlarged-heart/symptoms-causes/syc-20355436.</p>
<p>Irvin, Jeremy et al. “CheXpert: A Large Chest Radiograph Dataset with Uncertainty Labels and Expert Comparison.” Proceedings of the AAAI
Conference on Artificial Intelligence 33 (2019): 590–597. Crossref. Web.</p>
<p>“Lung Masses And Growths.” <em>Beaumont Health</em>, https://www.beaumont.org/conditions/lung-masses-and-growths. Accessed 30 Sept. 2020.</p>
<p>“Pulmonary Nodules.” <em>University of Rochester Medical Center</em>,
https://www.urmc.rochester.edu/encyclopedia/content.aspx?contenttypeid=22&contentid=pulmonarynodules. Accessed 30 Sept. 2020. </p>
<p>Summers, Ronald. <em>NIH Chest X-Ray Dataset of 14 Common Thorax Diseases</em>. 5 Nov. 2018.
NIH, https://nihcc.app.box.com/v/ChestXray-NIHCC/file/220660789610.</p>
<p>Zech, J. <em>reproduce-chexnet</em>. GitHub repository, 2018, github.com/jrzech/reproduce-chexnet.</p>
</div>
</div>
<hr>
</div>
</div>
<br><br>
</body></html>