Merge pull request #66 from Imageomics/dev
Remove image filename dependency, fix dropdown bug, update workflow actions
egrace479 authored May 21, 2024
2 parents 407baff + 8793463 commit dbfb63c
Showing 14 changed files with 127 additions and 156 deletions.
8 changes: 4 additions & 4 deletions .github/workflows/deploy-image.yml
Original file line number Diff line number Diff line change
@@ -17,18 +17,18 @@ jobs:

steps:
- name: Checkout repository
uses: actions/checkout@v3
uses: actions/checkout@v4

- name: Log in to the Container registry
uses: docker/login-action@f054a8b539a109f9f41c372932f1ae047eff08c9
uses: docker/login-action@v3
with:
registry: ${{ env.REGISTRY }}
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}

- name: Extract metadata (tags, labels) for Docker
id: meta
uses: docker/metadata-action@v4
uses: docker/metadata-action@v5
with:
images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
tags: |
@@ -37,7 +37,7 @@ jobs:
type=semver,pattern={{major}}
- name: Build and push Docker image
uses: docker/build-push-action@ad44023a93711e3deb337508980b4b5e9bcdc5dc
uses: docker/build-push-action@v5
with:
context: .
push: true
4 changes: 2 additions & 2 deletions .github/workflows/run-tests.yml
@@ -10,9 +10,9 @@ jobs:

steps:
- name: Checkout repository
uses: actions/checkout@v3
uses: actions/checkout@v4

- uses: actions/setup-python@v4
- uses: actions/setup-python@v5
with:
python-version: '3.11'

8 changes: 4 additions & 4 deletions CITATION.cff
@@ -18,14 +18,14 @@ authors:
given-names: "Hilmar"
orcid: "https://orcid.org/0000-0001-9107-0714"
cff-version: 1.2.0
date-released: "2024-03-08"
date-released: "2024-05-21"
identifiers:
- description: "The GitHub release URL of tag 1.2.0."
- description: "The GitHub release URL of tag 1.3.0."
type: url
value: "https://github.com/Imageomics/dashboard-prototype/releases/tag/v1.2.0"
value: "https://github.com/Imageomics/dashboard-prototype/releases/tag/v1.3.0"
- description: "The GitHub URL of the commit tagged with 1.2.0."
type: url
value: "https://github.com/Imageomics/dashboard-prototype/tree/d848922a28c37c03523413d5950823d13339aee1"
value: "https://github.com/Imageomics/dashboard-prototype/tree/d848922a28c37c03523413d5950823d13339aee1" #update on release
keywords:
- "EDA"
- "data"
8 changes: 4 additions & 4 deletions README.md
@@ -1,24 +1,24 @@
# Dashboard Prototype
Prototype data dashboard using the [Cuthill Gold Standard Dataset](https://huggingface.co/datasets/imageomics/Curated_GoldStandard_Hoyal_Cuthill), which was processed from Cuthill, et al. (original dataset available at [doi:10.5061/dryad.2hp1978](https://doi.org/10.5061/dryad.2hp1978)). Test datasets (the processed version of Cuthill's data with and without filepath URLs) are available in [test_data](./test_data).

This dashboard focuses on images labeled at the species and subspecies level as described in a CSV.

## How it works

For full dashboard functionality, upload a CSV or XLS file with the following columns:
- `Image_filename`*: Filename of each image, must be unique. **Note:** Images should be in PNG or JPEG format, TIFF may fail to render in the sample image display.
For full dashboard functionality, upload a CSV or XLS file with the following columns:
- `Species`: Species of each sample.
- `Subspecies`: Subspecies of each sample.
- `View`: View of the sample (e.g., 'ventral' or 'dorsal' for butterflies).
- `Sex`: Sex of each sample.
- `hybrid_stat`: Hybrid status of each sample (e.g., 'valid_subspecies', 'subspecies_synonym', or 'unknown').
- `lat`*: Latitude at which image was taken or specimen was collected: number in [-90,90].
- `lon`*: Longitude at which image was taken or specimen was collected: number in [-180,180]. `long` will also be accepted.
- `file_url`*: URL to access file.
- `file_url`*: URL to access file. **Note:** Images should be in PNG or JPEG format; TIFF may fail to render in the sample image display.

***Note:**
- Column names are **not** case-sensitive.
- `lat` and `lon` columns are not required to utilize the dashboard, but there will be no map view if they are not included. Blank (or null) entries are recorded as `unknown`, and thus excluded from map view.
- `Image_filename` and `file_url` are not required, but there will be no sample images option if either one is not included.
- `file_url` is not required, but there will be no sample images option if it is not included.
- `locality` may be provided, otherwise it will take on the value `lat|lon` or `unknown` if these are not provided.
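
The column rules above can be sketched as a small pre-upload check. This is a minimal illustration using only the standard library, not the dashboard's own code: `check_columns` is a hypothetical helper, and treating `long` as an alias by adding `Lon` to the column set is an assumption about how the alias is resolved.

```python
import csv
import io

# Column names taken from the README list; names are compared after
# str.capitalize(), mirroring `df.columns.str.capitalize()` in dashboard.py.
REQUIRED = ["Species", "Subspecies", "View", "Sex", "Hybrid_stat"]


def check_columns(csv_text):
    """Return (missing_required, mapping, img_urls) for CSV text (hypothetical helper)."""
    header = next(csv.reader(io.StringIO(csv_text)))
    columns = {name.strip().capitalize() for name in header}
    # Assumption: `long` is accepted in place of `lon`, as the README states.
    if "Long" in columns:
        columns.add("Lon")
    missing = [name for name in REQUIRED if name not in columns]
    mapping = "Lat" in columns and "Lon" in columns  # map view needs both
    img_urls = "File_url" in columns  # sample images need file_url
    return missing, mapping, img_urls
```

A file with all required columns plus `lat` and `long`, but no `file_url`, would report no missing columns, map view available, and sample images disabled.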

## Running Dashboard
12 changes: 6 additions & 6 deletions components/divs.py
@@ -29,7 +29,7 @@

def get_hist_div(mapping):
'''
Function to generate the histogram options section of the dashboard, including button to select 'Map View'.
Generates the histogram options section of the dashboard, including button to select 'Map View'.
Provides choice of variables for distribution and to color by, with options for order to sort x-axis.
Parameters:
@@ -93,7 +93,7 @@ def get_hist_div(mapping):

def get_map_div():
'''
Function to generate the mapping options section of the dashboard.
Generates the mapping options section of the dashboard.
Provides choice of variables to color by and button to switch back to histogram ('Show Histogram').
Returns:
@@ -152,8 +152,8 @@ def get_map_div():

def get_img_div(df, all_species, img_url):
'''
Function to generate the Image Sampling options section of the dashboard, including button to display images.
Provides empty list if no URLS are provided in the DataFrame for the entries.
Generates the Image Sampling options section of the dashboard, including button to display images.
Provides empty list if no URLs are provided in the DataFrame for the entries.
Parameters:
-----------
@@ -241,7 +241,7 @@ def get_img_div(df, all_species, img_url):

def get_main_div(hist_div, img_div):
'''
Function to return main div based on upload of data.
Returns main div based on upload of data.
Parameters:
-----------
@@ -299,7 +299,7 @@ def get_main_div(hist_div, img_div):

def get_error_div(error_dict):
'''
Function to return appropriate error message if there's a problem uploading the selected file.
Returns appropriate error message if there's a problem uploading the selected file.
Parameters:
-----------
55 changes: 24 additions & 31 deletions components/query.py
@@ -8,7 +8,7 @@

def get_data(df, mapping, features):
'''
Function to read in DataFrame and perform required manipulations:
Reads in DataFrame and performs required manipulations:
- fill null values in required columns with 'unknown'
- add 'lat-lon', `Samples_at_locality`, 'Species_at_locality', and 'Subspecies_at_locality' columns.
- make list of categorical columns.
@@ -18,7 +18,7 @@ def get_data(df, mapping, features):
df - DataFrame of the data to visualize.
mapping - Boolean. True when lat/lon are given in dataset.
features - List of features (columns) included in the DataFrame. This is a subset of the suggested columns:
'Species', 'Subspecies', 'View', 'Sex', 'Hybrid_stat', 'Lat', 'Lon', 'File_url', 'Image_filename'
'Species', 'Subspecies', 'View', 'Sex', 'Hybrid_stat', 'Lat', 'Lon', 'File_url'
Returns:
--------
@@ -68,7 +68,7 @@ def get_data(df, mapping, features):

def get_species_options(df):
'''
Function to pull in DataFrame and produce a dictionary of species options (Melpomene, Erato, and Any)
Pulls in DataFrame and produces a dictionary of species options (e.g., melpomene, erato, and Any)
Parameters:
-----------
@@ -79,13 +79,13 @@ def get_species_options(df):
all_species - Dictionary of all potential species options and their subspecies.
'''
species_list = list(df.Species.unique())
species_list = list(df.Species.dropna().unique()) # drop nulls to avoid adding non-species (or subspecies below)
all_species = {}
for species in species_list:
subspecies_list = df.loc[df.Species == species, 'Subspecies'].unique()
subspecies_list = np.insert(subspecies_list, 0 , 'Any-' + species.capitalize())
all_species[species.capitalize()] = list(subspecies_list)
all_subspecies = np.insert(df.Subspecies.unique(), 0, 'Any')
subspecies_list = df.loc[df.Species == species, 'Subspecies'].dropna().unique()
subspecies_list = np.insert(subspecies_list, 0 , 'Any-' + species) # need this to match as filled for img selection
all_species[species] = list(subspecies_list)
all_subspecies = np.insert(df.Subspecies.dropna().unique(), 0, 'Any')
all_species['Any'] = list(all_subspecies)

return all_species
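
The effect of dropping nulls and keeping the species name in its original case can be sketched without pandas. The function below is a hypothetical stand-in for `get_species_options`, assuming rows are plain dicts and `None` represents a null cell; it is meant only to illustrate the shape of the returned dictionary.

```python
def species_options(records):
    """Build {species: ['Any-<species>', ...], 'Any': ['Any', ...]}.

    `records` is a list of dicts with 'Species' and 'Subspecies' keys.
    Hypothetical pandas-free illustration, not the repository's function.
    """
    options = {}
    all_subspecies = []
    for row in records:
        species, subspecies = row["Species"], row["Subspecies"]
        # Null subspecies are skipped so they never become dropdown options.
        if subspecies is not None and subspecies not in all_subspecies:
            all_subspecies.append(subspecies)
        if species is None:
            continue  # null species do not get their own entry
        # The species key keeps its original case so it matches the data.
        subs = options.setdefault(species, ["Any-" + species])
        if subspecies is not None and subspecies not in subs:
            subs.append(subspecies)
    options["Any"] = ["Any"] + all_subspecies
    return options
```

Because the keys are no longer capitalized, the `Any-<species>` label can be split and compared directly against the `Species` column when images are selected.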
@@ -94,7 +94,7 @@ def get_species_options(df):

def get_images(df, subspecies, view, sex, hybrid, num_images):
'''
Function to retrieve the user-selected number of images.
Retrieves the user-selected number of images.
Parameters:
-----------
@@ -109,28 +109,20 @@ def get_images(df, subspecies, view, sex, hybrid, num_images):
--------
Imgs - List of html image elements with `src` element pointing to paths for the requested number of images matching given parameters.
Returns html header4 "No Such Images. Please make another selection." if no images matching parameters exist.
Returns html header4 indicating number of matching entries without filename or filepath.
Returns html header4 indicating number of matching entries without filepath(s).
'''
try:
filenames, filepaths = get_filenames(df, subspecies, view, sex, hybrid, num_images)
filepaths = get_filenames(df, subspecies, view, sex, hybrid, num_images)
except ValueError as e:
return html.H4(str(e) + " Please make another selection.",
style = PRINT_STYLE)
Imgs = []
for i in range(len(filenames)):
if filenames[i] in filepaths[i]:
image_path = filepaths[i]
else:
if filepaths[i][-1] == '/':
image_path = filepaths[i] + filenames[i]
else:
image_path = filepaths[i] + '/' + filenames[i]
Imgs.append(html.Img(src = image_path, style = IMG_STYLE))
Imgs = [html.Img(src = filepaths[i], style = IMG_STYLE) for i in range(len(filepaths))]

return Imgs

def get_filenames(df, subspecies, view, sex, hybrid, num_images):
'''
Funtion to randomly select the given number of filenames for images adhering to specified filters.
Randomly selects the given number of filepaths (file urls) for images adhering to specified filters.
Raises ValueError indicating no such images if none match the user selections.
Parameters:
@@ -144,15 +136,16 @@ def get_filenames(df, subspecies, view, sex, hybrid, num_images):
Returns:
--------
filenames - List of filenames meeting specified conditions (the lesser of the requested amount or number available).
filepaths - List of filepaths (URLs) corresponding to the selected filenames.
'''
if 'Any' in subspecies and type(subspecies) == str:
if ('Any' in subspecies and type(subspecies) == str) or ('Any' in subspecies[0] and len(subspecies) == 1):
if type(subspecies) == list:
subspecies = subspecies[0]
if subspecies == 'Any':
df_sub = df.copy()
else:
species = subspecies.split('-')[1].lower()
species = subspecies.split('-')[1] # should match case as filled
df_sub = df.loc[df.Species == species].copy()
else:
df_sub = df.loc[df.Subspecies.isin(subspecies)].copy()
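
The dropdown handling in this hunk accepts either a string or a one-element list containing an `Any` option. A minimal sketch of that branching, separated from the DataFrame filtering, might look like the following. `resolve_selection` and its return tuples are hypothetical names for illustration; checking the list length before indexing and wrapping a plain string in a list are assumptions that differ slightly from the hunk.

```python
def resolve_selection(subspecies):
    """Classify a dropdown value as all records, one species, or named subspecies.

    Returns ('all', None), ('species', name), or ('subspecies', [names]).
    Hypothetical helper; the real function filters a DataFrame instead.
    """
    # A single-item list such as ['Any-erato'] is unwrapped to its string.
    if isinstance(subspecies, list) and len(subspecies) == 1 and "Any" in subspecies[0]:
        subspecies = subspecies[0]
    if isinstance(subspecies, str) and "Any" in subspecies:
        if subspecies == "Any":
            return ("all", None)
        # Keep the species name in the case it was stored in (no .lower()).
        return ("species", subspecies.split("-", 1)[1])
    if isinstance(subspecies, str):
        subspecies = [subspecies]  # assumption: treat a lone name as a list
    return ("subspecies", list(subspecies))
```

Splitting with a maximum of one split keeps any hyphen inside the species name intact, which is a small deviation from `split('-')[1]` in the hunk.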
@@ -161,8 +154,7 @@ def get_filenames(df, subspecies, view, sex, hybrid, num_images):
df_sub = df_sub.loc[df_sub.Hybrid_stat.isin(hybrid)]

num_entries = len(df_sub)
# Filter out any entries that have missing filenames or URLs:
df_sub = df_sub.loc[df_sub.Image_filename != 'unknown']
# Filter out any entries that have missing URLs:
df_sub = df_sub.loc[df_sub.File_url != 'unknown']
max_imgs = len(df_sub)
missing_vals = num_entries - max_imgs
@@ -172,12 +164,13 @@ def get_filenames(df, subspecies, view, sex, hybrid, num_images):
else:
num = min(num_images, max_imgs)
df_filtered = df_sub.sample(num)
filenames = df_filtered.Image_filename.astype('string').values
filepaths = df_filtered.File_url.astype('string').values
#return list of filenames for min(user-selected, available) images randomly selected images from the filtered dataset
return list(filenames), list(filepaths)
# return list of filepaths for min(user-selected, available) randomly selected images from the filtered dataset
return list(filepaths)
# If there aren't any images to display, check if there are no such entries or just missing information.
elif missing_vals == 0:
# No images & no matching records
raise ValueError("No Such Images.")
else:
raise ValueError("No Such Images. Unknown filename(s) or path(s).")
# There are records matching, but not able to display images for them
raise ValueError(f"No Such Images to display; {missing_vals} record(s) with unknown filepath(s) match this selection.")
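
The sampling and error branches at the end of `get_filenames` can be sketched with the standard library alone. In this illustration, `sample_urls` is a hypothetical name, records are plain dicts rather than DataFrame rows, and the optional `rng` parameter is an addition that makes the random choice reproducible in tests.

```python
import random


def sample_urls(records, num_images, rng=random):
    """Pick up to `num_images` URLs from records whose File_url is known.

    Hypothetical stand-in for the tail of get_filenames; 'unknown' is the
    fill value the dashboard uses for missing entries.
    """
    num_entries = len(records)
    urls = [row["File_url"] for row in records if row["File_url"] != "unknown"]
    missing_vals = num_entries - len(urls)
    if urls:
        # Lesser of the requested amount and the number available.
        return rng.sample(urls, min(num_images, len(urls)))
    if missing_vals == 0:
        # No images and no matching records.
        raise ValueError("No Such Images.")
    # Records match, but none of them has a usable URL.
    raise ValueError(
        f"No Such Images to display; {missing_vals} record(s) with unknown filepath(s) match this selection."
    )
```

Passing `random.Random(0)` as `rng` gives a repeatable selection, while the default uses the module-level generator.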
22 changes: 9 additions & 13 deletions dashboard.py
@@ -55,7 +55,7 @@

def parse_contents(contents, filename):
'''
Function to read uploaded data.
Reads uploaded data, checks that it meets requirements, and processes it. Returns processed data and available options in JSON.
'''
if contents is None:
raise PreventUpdate
@@ -81,7 +81,7 @@ def parse_contents(contents, filename):
# If no image urls, disable sample image options
mapping = True
img_urls = True
features = ['Species', 'Subspecies', 'View', 'Sex', 'Hybrid_stat', 'Lat', 'Lon', 'File_url', 'Image_filename']
features = ['Species', 'Subspecies', 'View', 'Sex', 'Hybrid_stat', 'Lat', 'Lon', 'File_url']
included_features = []
df.columns = df.columns.str.capitalize()
for feature in features:
@@ -97,10 +97,6 @@ def parse_contents(contents, filename):
mapping = False
elif feature == 'File_url':
img_urls = False
elif feature == 'Image_filename':
# If 'Image_filename' missing, return missing column if 'file_url' is included.
if img_urls:
return json.dumps({'error': {'feature': feature}})
else:
return json.dumps({'error': {'feature': feature}})
else:
@@ -120,10 +116,10 @@ def parse_contents(contents, filename):

# get dataset-determined static data:
# the dataframe and categorical features - processed for map view if mapping is True
# all possible species, subspecies
# all possible species, subspecies -- must run first to avoid adding "unknown" to lists
# will likely include categorical options in later instance (sooner)
all_species = get_species_options(df)
processed_df, cat_list = get_data(df, mapping, included_features)
all_species = get_species_options(processed_df)
# save data to dictionary to save as json
data = {
'processed_df': processed_df.to_json(date_format = 'iso', orient = 'split'),
@@ -154,7 +150,7 @@ def update_output(contents, filename):

def get_visuals(jsonified_data):
'''
Function that usese the processed and saved data to get the main div (histogram, pie chart, and image example options).
Fetches the main div (histogram, pie chart, and image example options) based on the processed and saved data.
Returns error div if error occurs in upload or essential features are missing.
'''
# load saved data
@@ -181,7 +177,7 @@ def get_visuals(jsonified_data):

def update_dist_view(n_clicks, children, jsonified_data):
'''
Function to update the upper left distribution options based on selected distribution chart (histogram or map).
Updates the upper left distribution options based on selected distribution chart (histogram or map).
Activates on click to change, defaults to histogram view.
Parameters:
@@ -221,7 +217,7 @@ def update_dist_view(n_clicks, children, jsonified_data):

def update_dist_plot(x_var, color_by, sort_by, btn, jsonified_data):
'''
Function to update distribution figure with either map or histogram based on selections.
Updates distribution figure with either map or histogram based on selections.
Selection is based on current label of the button ('Map View' or 'Show Histogram'), which updates prior to graph.
Parameters:
@@ -285,7 +281,7 @@ def update_pie_plot(var, jsonified_data):

def set_subspecies_options(selected_species, jsonified_data):
'''
Function to set subspecies options in dropdown based on user-selected species.
Sets subspecies options in dropdown based on user-selected species.
Parameters:
-----------
@@ -325,7 +321,7 @@ def set_subspecies_value(available_options):
# Retrieve selected number of images
def update_display(n_clicks, jsonified_data, subspecies, view, sex, hybrid, num_images):
'''
Function to retrieve the user-selected number of images adhering to their chosen parameters when the 'Display Images' button is pressed.
Retrieves the user-selected number of images adhering to their chosen parameters when the 'Display Images' button is pressed.
Parameters:
-----------