Merge pull request #66 from Imageomics/dev

Remove image filename dependency, fix dropdown bug, update workflow actions
Imageomics · May 21, 2024 · dbfb63c
2 parents 407baff + 8793463
commit dbfb63c

Showing 14 changed files with 127 additions and 156 deletions.
diff --git a/.github/workflows/deploy-image.yml b/.github/workflows/deploy-image.yml
@@ -17,18 +17,18 @@ jobs:
 
     steps:
       - name: Checkout repository
-        uses: actions/checkout@v3
+        uses: actions/checkout@v4
 
       - name: Log in to the Container registry
-        uses: docker/login-action@f054a8b539a109f9f41c372932f1ae047eff08c9
+        uses: docker/login-action@v3
         with:
           registry: ${{ env.REGISTRY }}
           username: ${{ github.actor }}
           password: ${{ secrets.GITHUB_TOKEN }}
 
       - name: Extract metadata (tags, labels) for Docker
         id: meta
-        uses: docker/metadata-action@v4
+        uses: docker/metadata-action@v5
         with:
           images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
           tags: |
@@ -37,7 +37,7 @@ jobs:
             type=semver,pattern={{major}}
 
       - name: Build and push Docker image
-        uses: docker/build-push-action@ad44023a93711e3deb337508980b4b5e9bcdc5dc
+        uses: docker/build-push-action@v5
         with:
           context: .
           push: true

diff --git a/.github/workflows/run-tests.yml b/.github/workflows/run-tests.yml
@@ -10,9 +10,9 @@ jobs:
 
     steps:
       - name: Checkout repository
-        uses: actions/checkout@v3
+        uses: actions/checkout@v4
 
-      - uses: actions/setup-python@v4
+      - uses: actions/setup-python@v5
         with:
           python-version: '3.11'
 

diff --git a/CITATION.cff b/CITATION.cff
@@ -18,14 +18,14 @@ authors:
   given-names: "Hilmar"
   orcid: "https://orcid.org/0000-0001-9107-0714"
 cff-version: 1.2.0
-date-released: "2024-03-08"
+date-released: "2024-05-21"
 identifiers:
-  - description: "The GitHub release URL of tag 1.2.0."
+  - description: "The GitHub release URL of tag 1.3.0."
     type: url
-    value: "https://github.com/Imageomics/dashboard-prototype/releases/tag/v1.2.0"
+    value: "https://github.com/Imageomics/dashboard-prototype/releases/tag/v1.3.0"
   - description: "The GitHub URL of the commit tagged with 1.2.0."
     type: url
-    value: "https://github.com/Imageomics/dashboard-prototype/tree/d848922a28c37c03523413d5950823d13339aee1"
+    value: "https://github.com/Imageomics/dashboard-prototype/tree/d848922a28c37c03523413d5950823d13339aee1" #update on release
 keywords:
   - "EDA"
   - "data"

diff --git a/README.md b/README.md
@@ -1,24 +1,24 @@
 # Dashboard Prototype
 Prototype data dashboard using the [Cuthill Gold Standard Dataset](https://huggingface.co/datasets/imageomics/Curated_GoldStandard_Hoyal_Cuthill), which was processed from Cuthill, et. al. (original dataset available at [doi:10.5061/dryad.2hp1978](https://doi.org/10.5061/dryad.2hp1978)). Test datasets (the processed version of Cuthill's data with and without filepath URLs) are available in [test_data](./test_data).
 
+This dashboard focuses on images labeled at the species and subspecies level as described in a CSV.
 
 ## How it works
 
-For full dashboard functionality, upload a CSV or XLS file with the following columns: 
-- `Image_filename`*: Filename of each image, must be unique. **Note:** Images should be in PNG or JPEG format, TIFF may fail to render in the sample image display.
+For full dashboard functionality, upload a CSV or XLS file with the following columns:
 - `Species`: Species of each sample.
 - `Subspecies`: Subspecies of each sample.
 - `View`: View of the sample (eg., 'ventral' or 'dorsal' for butterflies).
 - `Sex`: Sex of each sample.
 - `hybrid_stat`: Hybrid status of each sample (eg., 'valid_subspecies', 'subspecies_synonym', or 'unknown').
 - `lat`*: Latitude at which image was taken or specimen was collected: number in [-90,90].
 - `lon`*:  Longitude at which image was taken or specimen was collected: number in [-180,180]. `long` will also be accepted.
-- `file_url`*: URL to access file.
+- `file_url`*: URL to access file. **Note:** Images should be in PNG or JPEG format, TIFF may fail to render in the sample image display.
 
 ***Note:** 
 - Column names are **not** case-sensitive.
 - `lat` and `lon` columns are not required to utilize the dashboard, but there will be no map view if they are not included. Blank (or null) entries are recorded as `unknown`, and thus excluded from map view.
-- `Image_filename` and `file_url` are not required, but there will be no sample images option if either one is not included.
+- `file_url` is not required, but there will be no sample images option if it is not included.
 - `locality` may be provided, otherwise it will take on the value `lat|lon` or `unknown` if these are not provided.
 
 ## Running Dashboard
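
Note on the README change above: a minimal sketch of how an upload with the listed columns could be checked, assuming the same `df.columns.str.capitalize()` normalization that `dashboard.py` applies. The CSV contents, the example URL, and the `missing` list are hypothetical and exist only for illustration.

```python
import io

import pandas as pd

# Hypothetical two-row upload using the README's column names in mixed case.
csv_text = """species,SUBSPECIES,View,sex,hybrid_stat,LAT,lon,file_url
erato,notabilis,dorsal,male,valid_subspecies,-1.5,-78.0,https://example.org/img1.png
melpomene,plesseni,ventral,female,valid_subspecies,,,
"""

df = pd.read_csv(io.StringIO(csv_text))

# Column names are not case-sensitive: normalize them as dashboard.py does.
df.columns = df.columns.str.capitalize()

# Feature list after this PR (no 'Image_filename').
features = ['Species', 'Subspecies', 'View', 'Sex', 'Hybrid_stat', 'Lat', 'Lon', 'File_url']
missing = [feature for feature in features if feature not in df.columns]

print("Missing columns:", missing)
print("Rows without a file URL:", int(df['File_url'].isna().sum()))
```

The second row leaves `lat`, `lon`, and `file_url` blank, which is the case the README describes as being recorded as `unknown`.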

diff --git a/components/divs.py b/components/divs.py
@@ -29,7 +29,7 @@
 
 def get_hist_div(mapping):
     '''
-    Function to generate the histogram options section of the dashboard, including button to select 'Map View'. 
+    Generates the histogram options section of the dashboard, including button to select 'Map View'. 
     Provides choice of variables for distribution and to color by, with options for order to sort x-axis.
 
     Parameters:
@@ -93,7 +93,7 @@ def get_hist_div(mapping):
 
 def get_map_div():
     '''
-    Function to generate the mapping options section of the dashboard. 
+    Generates the mapping options section of the dashboard. 
     Provides choice of variables to color by and button to switch back to histogram ('Show Histogram').
 
     Returns:
@@ -152,8 +152,8 @@ def get_map_div():
 
 def get_img_div(df, all_species, img_url):
     '''
-    Function to generate the Image Sampling options section of the dashboard, including button to display images. 
-    Provides empty list if no URLS are provided in the DataFrame for the entries.
+    Generates the Image Sampling options section of the dashboard, including button to display images. 
+    Provides empty list if no URLs are provided in the DataFrame for the entries.
 
     Parameters:
     -----------
@@ -241,7 +241,7 @@ def get_img_div(df, all_species, img_url):
 
 def get_main_div(hist_div, img_div):
     '''
-    Function to return main div based on upload of data.
+    Returns main div based on upload of data.
 
     Parameters:
     -----------
@@ -299,7 +299,7 @@ def get_main_div(hist_div, img_div):
 
 def get_error_div(error_dict):
     '''
-    Function to return appropriate error message if there's a problem uploading the selected file.
+    Returns appropriate error message if there's a problem uploading the selected file.
 
     Parameters:
     -----------

diff --git a/components/query.py b/components/query.py
@@ -8,7 +8,7 @@
 
 def get_data(df, mapping, features):
     '''
-    Function to read in DataFrame and perform required manipulations: 
+    Reads in DataFrame and performs required manipulations: 
         - fill null values in required columns with 'unknown'
         - add 'lat-lon', `Samples_at_locality`, 'Species_at_locality', and 'Subspecies_at_locality' columns.
         - make list of categorical columns.
@@ -18,7 +18,7 @@ def get_data(df, mapping, features):
     df - DataFrame of the data to visualize.
     mapping - Boolean. True when lat/lon are given in dataset.
     features - List of features (columns) included in the DataFrame. This is a subset of the suggested columns: 
-                'Species', 'Subspecies', 'View', 'Sex', 'Hybrid_stat', 'Lat', 'Lon', 'File_url', 'Image_filename'
+                'Species', 'Subspecies', 'View', 'Sex', 'Hybrid_stat', 'Lat', 'Lon', 'File_url'
             
     Returns:
     --------
@@ -68,7 +68,7 @@ def get_data(df, mapping, features):
 
 def get_species_options(df):
     '''
-    Function to pull in DataFrame and produce a dictionary of species options (Melpomene, Erato, and Any)
+    Pulls in DataFrame and produces a dictionary of species options (eg., melpomene, erato, and Any)
 
     Parameters:
     -----------
@@ -79,13 +79,13 @@ def get_species_options(df):
     all_species - Dictionary of all potential species options and their subspecies.
 
     '''
-    species_list = list(df.Species.unique())
+    species_list = list(df.Species.dropna().unique()) # drop nulls to avoid adding non-species (or subspecies below)
     all_species = {}
     for species in species_list:
-        subspecies_list = df.loc[df.Species == species, 'Subspecies'].unique()
-        subspecies_list = np.insert(subspecies_list, 0 , 'Any-' + species.capitalize())
-        all_species[species.capitalize()] = list(subspecies_list)
-    all_subspecies = np.insert(df.Subspecies.unique(), 0, 'Any')
+        subspecies_list = df.loc[df.Species == species, 'Subspecies'].dropna().unique()
+        subspecies_list = np.insert(subspecies_list, 0 , 'Any-' + species) # need this to match as filled for img selection
+        all_species[species] = list(subspecies_list)
+    all_subspecies = np.insert(df.Subspecies.dropna().unique(), 0, 'Any')
     all_species['Any'] = list(all_subspecies)
 
     return all_species
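
To illustrate the `dropna()` change above, here is a small sketch rather than the repository's code: `species_options` is a hypothetical stand-in that uses list concatenation instead of `np.insert`, and the sample rows are invented. It also suggests why the options need to be built before nulls are filled with `'unknown'`, as the reordering in `dashboard.py` does.

```python
import pandas as pd


def species_options(df):
    """Hypothetical stand-in for get_species_options: drops nulls, keeps case as given."""
    all_species = {}
    for species in df.Species.dropna().unique():
        subspecies = df.loc[df.Species == species, 'Subspecies'].dropna().unique()
        # 'Any-<species>' keeps the species name exactly as it appears in the data.
        all_species[species] = ['Any-' + species] + list(subspecies)
    all_species['Any'] = ['Any'] + list(df.Subspecies.dropna().unique())
    return all_species


# Invented sample with null species and subspecies entries.
raw = pd.DataFrame({
    'Species': ['erato', 'erato', 'melpomene', None],
    'Subspecies': ['notabilis', None, 'plesseni', None],
})

options_before_fill = species_options(raw)
options_after_fill = species_options(raw.fillna('unknown'))

print(options_before_fill)
print('unknown' in options_after_fill)
```

Run on the raw frame, the nulls never reach the dropdown options. Run after filling, `'unknown'` shows up both as a species and as a subspecies, because `dropna()` no longer has anything to drop.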
@@ -94,7 +94,7 @@ def get_species_options(df):
 
 def get_images(df, subspecies, view, sex, hybrid, num_images):
     '''
-    Function to retrieve the user-selected number of images.
+    Retrieves the user-selected number of images.
 
     Parameters:
     -----------
@@ -109,28 +109,20 @@ def get_images(df, subspecies, view, sex, hybrid, num_images):
     --------
     Imgs - List of html image elements with `src` element pointing to paths for the requested number of images matching given parameters.
            Returns html header4 "No Such Images. Please make another selection." if no images matching parameters exist.
-           Returns html header4 indicating number of matching entries without filename or filepath.
+           Returns html header4 indicating number of matching entries without filepath(s).
     '''
     try:
-        filenames, filepaths = get_filenames(df, subspecies, view, sex, hybrid, num_images)
+        filepaths = get_filenames(df, subspecies, view, sex, hybrid, num_images)
     except ValueError as e:
         return html.H4(str(e) + " Please make another selection.", 
                     style = PRINT_STYLE)
-    Imgs = []
-    for i in range(len(filenames)):
-        if filenames[i] in filepaths[i]:
-            image_path = filepaths[i]
-        else:
-            if filepaths[i][-1] == '/':
-                image_path = filepaths[i] + filenames[i]
-            else:
-                image_path = filepaths[i] + '/' + filenames[i]
-        Imgs.append(html.Img(src = image_path, style = IMG_STYLE))
+    Imgs = [html.Img(src = filepath, style = IMG_STYLE) for filepath in filepaths]
+
     return Imgs
 
 def get_filenames(df, subspecies, view, sex, hybrid, num_images):
     '''
-    Funtion to randomly select the given number of filenames for images adhering to specified filters.
+    Randomly selects the given number of filepaths (file URLs) for images adhering to specified filters.
     Raises ValueError indicating no such images if none match the user selections.
     
     Parameters:
@@ -144,15 +136,16 @@ def get_filenames(df, subspecies, view, sex, hybrid, num_images):
 
     Returns:
     --------
-    filenames - List of filenames meeting specified conditions (the lesser of the requested amount or number available). 
     filepaths - List of filepaths (URLs) corresponding to the selected filenames. 
     
     '''
-    if 'Any' in subspecies and type(subspecies) == str:
+    if ('Any' in subspecies and type(subspecies) == str) or ('Any' in subspecies[0] and len(subspecies) == 1):
+        if type(subspecies) == list:
+            subspecies = subspecies[0]
         if subspecies == 'Any':
             df_sub = df.copy()
         else:
-            species = subspecies.split('-')[1].lower()
+            species = subspecies.split('-')[1] # should match case as filled
             df_sub = df.loc[df.Species == species].copy()
     else:
         df_sub = df.loc[df.Subspecies.isin(subspecies)].copy()
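
The selection handling above can be read as three cases: `'Any'`, `'Any-<species>'`, and an explicit list of subspecies. The sketch below restates that branching under stated assumptions. `resolve_selection` is a hypothetical helper, not part of the PR; it uses `isinstance` in place of `type(...) ==` and checks the list length before indexing, so an empty list falls through to an empty result instead of reaching `subspecies[0]`. The sample rows are invented.

```python
import pandas as pd


def resolve_selection(df, subspecies):
    """Hypothetical restatement of the 'Any' handling in get_filenames."""
    # Unwrap a one-element list such as ['Any-erato'].
    if isinstance(subspecies, list) and len(subspecies) == 1 and 'Any' in subspecies[0]:
        subspecies = subspecies[0]
    if isinstance(subspecies, str) and 'Any' in subspecies:
        if subspecies == 'Any':
            return df.copy()
        # Species name is used as filled, with no case change.
        species = subspecies.split('-')[1]
        return df.loc[df.Species == species].copy()
    return df.loc[df.Subspecies.isin(subspecies)].copy()


# Invented sample data.
records = pd.DataFrame({
    'Species': ['erato', 'erato', 'melpomene'],
    'Subspecies': ['notabilis', 'petiverana', 'plesseni'],
})

for selection in ['Any', ['Any'], 'Any-erato', ['Any-erato'], ['plesseni'], []]:
    print(selection, len(resolve_selection(records, selection)))
```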
@@ -161,8 +154,7 @@ def get_filenames(df, subspecies, view, sex, hybrid, num_images):
     df_sub = df_sub.loc[df_sub.Hybrid_stat.isin(hybrid)]
 
     num_entries = len(df_sub)
-    # Filter out any entries that have missing filenames or URLs:
-    df_sub = df_sub.loc[df_sub.Image_filename != 'unknown']
+    # Filter out any entries that have missing URLs:
     df_sub = df_sub.loc[df_sub.File_url != 'unknown']
     max_imgs = len(df_sub)
     missing_vals = num_entries - max_imgs
@@ -172,12 +164,13 @@ def get_filenames(df, subspecies, view, sex, hybrid, num_images):
         else:
             num = min(num_images, max_imgs)
         df_filtered = df_sub.sample(num)
-        filenames = df_filtered.Image_filename.astype('string').values
         filepaths = df_filtered.File_url.astype('string').values
-        #return list of filenames for min(user-selected, available) images randomly selected images from the filtered dataset
-        return list(filenames), list(filepaths)
+        # return list of filepaths for min(user-selected, available) randomly selected images from the filtered dataset
+        return list(filepaths)
     # If there aren't any images to display, check if there are no such entries or just missing information.
     elif missing_vals == 0:
+        # No images & no matching records
         raise ValueError("No Such Images.")
     else:
-        raise ValueError("No Such Images. Unknown filename(s) or path(s).")
+        # There are records matching, but not able to display images for them
+        raise ValueError(f"No Such Images to display; {missing_vals} record(s) with unknown filepath(s) match this selection.")
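
A minimal sketch of the final step of `get_filenames` after this change, assuming `num_images` is a positive integer and the frame has already been filtered by subspecies, view, sex, and hybrid status. `sample_filepaths` is a hypothetical name, the branch elided from the hunk above is not reproduced, and the URLs are invented.

```python
import pandas as pd


def sample_filepaths(df_sub, num_images):
    """Hypothetical helper: sample URLs, or explain why no image can be shown."""
    num_entries = len(df_sub)
    # Keep only records that have a usable URL.
    with_urls = df_sub.loc[df_sub.File_url != 'unknown']
    max_imgs = len(with_urls)
    missing_vals = num_entries - max_imgs
    if max_imgs > 0:
        num = min(num_images, max_imgs)
        return list(with_urls.sample(num).File_url)
    elif missing_vals == 0:
        # No images and no matching records.
        raise ValueError("No Such Images.")
    else:
        # Matching records exist, but none has a URL to display.
        raise ValueError(
            f"No Such Images to display; {missing_vals} record(s) "
            "with unknown filepath(s) match this selection."
        )


# Invented sample: one record with a URL, two without.
records = pd.DataFrame({
    'File_url': ['https://example.org/a.png', 'unknown', 'unknown'],
})

print(sample_filepaths(records, 5))
try:
    sample_filepaths(records.iloc[1:], 5)
except ValueError as err:
    print(err)
```

Reporting the count of matching records without a URL separates "nothing matched the filters" from "records matched but cannot be displayed", which the previous single message did not make explicit.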
diff --git a/dashboard.py b/dashboard.py
@@ -55,7 +55,7 @@
 
 def parse_contents(contents, filename):
     '''
-    Function to read uploaded data.
+    Reads uploaded data, checks that it meets requirements, and processes it. Returns processed data and available options in JSON.
     '''
     if contents is None:
         raise PreventUpdate
@@ -81,7 +81,7 @@ def parse_contents(contents, filename):
     # If no image urls, disable sample image options
     mapping = True
     img_urls = True
-    features = ['Species', 'Subspecies', 'View', 'Sex', 'Hybrid_stat', 'Lat', 'Lon', 'File_url', 'Image_filename']
+    features = ['Species', 'Subspecies', 'View', 'Sex', 'Hybrid_stat', 'Lat', 'Lon', 'File_url']
     included_features = []
     df.columns = df.columns.str.capitalize()
     for feature in features:
@@ -97,10 +97,6 @@ def parse_contents(contents, filename):
                     mapping = False
             elif feature == 'File_url':
                 img_urls = False
-            elif feature == 'Image_filename':
-                # If 'Image_filename' missing, return missing column if 'file_url' is included.
-                if img_urls:
-                    return json.dumps({'error': {'feature': feature}})
             else:
                 return json.dumps({'error': {'feature': feature}})
         else:
@@ -120,10 +116,10 @@ def parse_contents(contents, filename):
 
     # get dataset-determined static data:
         # the dataframe and categorical features - processed for map view if mapping is True
-        # all possible species, subspecies
+        # all possible species, subspecies -- must run first to avoid adding "unknown" to lists
         # will likely include categorical options in later instance (sooner)
+    all_species = get_species_options(df)
     processed_df, cat_list = get_data(df, mapping, included_features)
-    all_species = get_species_options(processed_df)
     # save data to dictionary to save as json 
     data = {
             'processed_df': processed_df.to_json(date_format = 'iso', orient = 'split'),
@@ -154,7 +150,7 @@ def update_output(contents, filename):
 
 def get_visuals(jsonified_data):
     '''
-    Function that usese the processed and saved data to get the main div (histogram, pie chart, and image example options).
+    Fetches the main div (histogram, pie chart, and image example options) based on the processed and saved data.
     Returns error div if error occurs in upload or essential features are missing.
     '''
     # load saved data
@@ -181,7 +177,7 @@ def get_visuals(jsonified_data):
 
 def update_dist_view(n_clicks, children, jsonified_data):
     '''
-    Function to update the upper left distribution options based on selected distribution chart (histogram or map).
+    Updates the upper left distribution options based on selected distribution chart (histogram or map).
     Activates on click to change, defaults to histogram view.
 
     Parameters:
@@ -221,7 +217,7 @@ def update_dist_view(n_clicks, children, jsonified_data):
 
 def update_dist_plot(x_var, color_by, sort_by, btn, jsonified_data):
     '''
-    Function to update distribution figure with either map or histogram based on selections.
+    Updates distribution figure with either map or histogram based on selections.
     Selection is based on current label of the button ('Map View' or 'Show Histogram'), which updates prior to graph.
 
     Parameters:
@@ -285,7 +281,7 @@ def update_pie_plot(var, jsonified_data):
 
 def set_subspecies_options(selected_species, jsonified_data):
     ''' 
-    Function to set subspecies options in dropdown based on user-selected species.
+    Sets subspecies options in dropdown based on user-selected species.
 
     Parameters:
     -----------
@@ -325,7 +321,7 @@ def set_subspecies_value(available_options):
 # Retrieve selected number of images
 def update_display(n_clicks, jsonified_data, subspecies, view, sex, hybrid, num_images):
     '''
-    Function to retrieve the user-selected number of images adhering to their chosen parameters when the 'Display Images' button is pressed.
+    Retrieves the user-selected number of images adhering to their chosen parameters when the 'Display Images' button is pressed.
     
     Parameters:
     -----------