From c1e9cbf237b6a46ab2b5fbadab1b36719d61173d Mon Sep 17 00:00:00 2001
From: Tristan Glatard
Date: Wed, 13 Mar 2024 14:34:17 -0400
Subject: [PATCH] Removed more duplicates

---
 paper/sea-neuro/paper-neuro.tex | 38 ---------------------------------
 1 file changed, 38 deletions(-)

diff --git a/paper/sea-neuro/paper-neuro.tex b/paper/sea-neuro/paper-neuro.tex
index 586fee0..1c1eea4 100644
--- a/paper/sea-neuro/paper-neuro.tex
+++ b/paper/sea-neuro/paper-neuro.tex
@@ -211,9 +211,6 @@
 shared storage. To avoid interrupting ongoing
 processing with data management operations, this is accomplished via a separate thread (known as the
 ``flusher'') that moves data from the caches to long-term storage. Users must inform Sea of files
- shared storage. To avoid interrupting ongoing
- processing with data management operations, this is accomplished via a separate thread (known as the
- ``flusher'') that moves data from the caches to long-term storage. Users must inform Sea of files
 that need to be persisted to storage within a file called
 \texttt{.sea\_flushlist}, and temporary files which can be removed from cache
 within a file called the \texttt{.sea\_evictlist}. Both these files
@@ -322,8 +319,6 @@ \subsection{Speedups observed in controlled environment}
 level of the applications in our experiments. As a result, each application
 process was likely attempting to utilize all cores, resulting in longer wait times
 with the increase in parallelism. With this unintentional additional wait time that
-process was likely attempting to utilize all cores, resulting in longer wait times
-with the increase in parallelism. With this unintentional additional wait time that
 is added to the applications due to compute contention, the benefits of using
 Sea becomes more limited.
 
@@ -392,9 +387,6 @@ \subsection{Speedups observed in production environments}
 \label{fig:seaneuro:beluga-wflush}
 \end{figure*}
 
- We also benchmarked Sea on a production cluster of the Digital Alliance of Canada
- shared with other users,
- using the same datasets and pipelines as for the controlled cluster.
 We also benchmarked Sea on a production cluster of the Digital Alliance of Canada
 shared with other users,
 using the same datasets and pipelines as for the controlled cluster.
@@ -406,10 +398,6 @@ \subsection{Speedups observed in production environments}
 resulting in Sea and Baseline performing quite similarly to each
 other.
 
- Our results with flushing enabled confirmed this theory. When running with
- flushing enabled, which had the additional overhead of having to
- ensure that all the data was copied to Lustre, we noticed that occasionally, we
- obtained very large speedups with both the SPM and AFNI pipelines, suggesting that Lustre
 Our results with flushing enabled confirmed this theory. When running with
 flushing enabled, which had the additional overhead of having to
 ensure that all the data was copied to Lustre, we noticed that occasionally, we
@@ -441,16 +429,12 @@ \subsection{Speedups observed in production environments}
 Although Sea usage did result in performance degradation at times, the magnitude
 of the degradation was less than that of possible speedups
 obtained. Our controlled experiments demonstrated that using Sea with a
- of the degradation was less than that of possible speedups
- obtained. Our controlled experiments demonstrated that using Sea with a
 degraded Lustre file system can result in average speedups of up to 2.5$\times$.
 Such results indicate that the risk of a slowdown with Sea are less important that
 the potential speedups that can arise.
 
 \subsection{Limitations of using Sea}
 
- Sea's benefit is maximal when it is executed on data-intensive pipelines, that is,
- when the data is written at a rate which exceeds the rate at which the
 Sea's benefit is maximal when it is executed on data-intensive pipelines, that is,
 when the data is written at a rate which exceeds the rate at which the
 compute node's page cache can flush results to Lustre. In our experiments,
@@ -463,18 +447,10 @@ \subsection{Speedups observed in production environments}
 our controlled experiments did not showcase the benefits of Sea in a highly
 data-intensive scenario. Since HCP is one of the largest fMRI datasets,
 the only way to augment data intensiveness of the applications is
- our controlled experiments did not showcase the benefits of Sea in a highly
- data-intensive scenario. Since HCP is one of the largest fMRI datasets,
- the only way to augment data intensiveness of the applications is
 through increased parallelism which is limited by the number of available
 cores. As a result, we could only demonstrate the speedups brought by Sea when
 Lustre performance had been degraded.
 
- cores. As a result, we could only demonstrate the speedups brought by Sea when
- Lustre performance had been degraded.
-
- Sea cannot speedup applications that are very compute-intensive, as is the case with the FSL Feat pipeline.
- In such cases, the performance bottleneck is the
 Sea cannot speedup applications that are very compute-intensive, as is the case with the FSL Feat pipeline.
 In such cases, the performance bottleneck is the
 CPU time, which Sea does not address. Moreover, depending on the
@@ -489,12 +465,6 @@ \subsection{Speedups observed in production environments}
 %but it's actually more complicated. When dataset increases
 
 \subsection{Predicting speedups}
- Our results expose the complexities of performance predictions on HPC clusters. We presented results on a dedicated cluster
- where we had
- control over the global cluster usage and were able to demonstrate that
- significant speedups could be obtained when all the Lustre storage devices were
- busy writing other data concurrently. However, even these experiments were
- limited in demonstrating potential speedups with Sea as there were many
 Our results expose the complexities of performance predictions on HPC clusters. We presented results on a dedicated cluster
 where we had
 control over the global cluster usage and were able to demonstrate that
@@ -826,9 +796,6 @@ \subsection{Speedups observed in production environments}
 writers, the fMRI preprocessing pipelines had exclusive access to the
 cluster. With busy writers, the fMRI preprocessing pipelines were executed
 alongside an Apache Spark application that continuously read and wrote approximately
- writers, the fMRI preprocessing pipelines had exclusive access to the
- cluster. With busy writers, the fMRI preprocessing pipelines were executed
- alongside an Apache Spark application that continuously read and wrote approximately
 1000$\times$ \SI{617}{\mebi\byte} blocks using 64 threads, with a 5 seconds sleep
 between reads and writes. In our experiments, we either had 6 nodes each
 executing the Spark application or no busy writers.
@@ -837,9 +804,6 @@ \subsection{Speedups observed in production environments}
 only pipeline configured to prefetch was SPM, as the input data was otherwise
 read through a memmap (i.e. only loading necessary portions of the file to memory)
 with Lustre. Flushing would normally be necessary with
- only pipeline configured to prefetch was SPM, as the input data was otherwise
- read through a memmap (i.e. only loading necessary portions of the file to memory)
- with Lustre. Flushing would normally be necessary with
 preprocessing as the user would always require the output data.
 To investigate the impacts of flushing on these pipelines, we
 performed a separate set of experiments where AFNI and SPM would flush all data
@@ -849,8 +813,6 @@ \subsection{Speedups observed in production environments}
 alone (Baseline). To ensure that system performance was equivalent between Sea
 and our Baseline, we executed Sea and Baseline pipelines together.
 
- All experiments were executed using 1, 8 and 16 processes.
- Each process consisted of a single application call processing a single fMRI image.
 All experiments were executed using 1, 8 and 16 processes.
 Each process consisted of a single application call processing a single fMRI image.
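
The first hunk keeps the paper's description of Sea's "flusher" thread, which persists the files named in .sea_flushlist to long-term storage and drops the temporary files named in .sea_evictlist from the node-local cache. As a rough illustration of that flush/evict pass only (not Sea's actual implementation), the Python sketch below assumes one glob pattern per line in each list file; the cache and Lustre paths and the helper names (read_patterns, flusher_pass) are placeholders, not part of Sea.

import fnmatch
import shutil
from pathlib import Path

# Hypothetical paths for illustration; Sea would use its own configuration.
CACHE_DIR = Path("/dev/shm/sea_cache")        # node-local cache (e.g. tmpfs)
LUSTRE_DIR = Path("/lustre/project/results")  # shared long-term storage


def read_patterns(list_file: Path) -> list[str]:
    """Return the non-empty, non-comment lines of a .sea_* list file."""
    if not list_file.exists():
        return []
    return [line.strip() for line in list_file.read_text().splitlines()
            if line.strip() and not line.lstrip().startswith("#")]


def flusher_pass(cache: Path = CACHE_DIR, lustre: Path = LUSTRE_DIR) -> None:
    """One pass of a flusher-like loop: copy files matching the flushlist to
    long-term storage, then delete files matching the evictlist from cache."""
    flush_patterns = read_patterns(cache / ".sea_flushlist")
    evict_patterns = read_patterns(cache / ".sea_evictlist")
    for cached in cache.iterdir():
        if cached.is_dir() or cached.name.startswith(".sea_"):
            continue  # skip subdirectories and the list files themselves
        if any(fnmatch.fnmatch(cached.name, p) for p in flush_patterns):
            shutil.copy2(cached, lustre / cached.name)  # persist to Lustre
        if any(fnmatch.fnmatch(cached.name, p) for p in evict_patterns):
            cached.unlink(missing_ok=True)              # free cache space


if __name__ == "__main__":
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    LUSTRE_DIR.mkdir(parents=True, exist_ok=True)
    flusher_pass()

Flushing before evicting mirrors the distinction the hunk draws between outputs that must reach shared storage and temporary files that only occupy cache space.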