October 27, 2017, Friday

New thoughts on the Community version
- We don't need to enforce defining outputs with default output names
- We will list all outputs in the right column
- We will allow building workflows with all existing Agave apps
- However, if an output name changes, the workflow will not run
- This will make it easy to handle duplicates
- This is compatible with workflows built with enforced filenames
Still need to think about how to relax the filename enforcement while keeping workflows completely reproducible
- One idea is to use job_id.output_id.filename
  - job_id helps resolve replicates; it can be retrieved from the folder name at run time
  - output_id is captured by the workflow JSON, so the filename can vary
- Or we can try job_id/output_id/filename (or folder name)
  - In the wrapper script, this is doable
  - When job is submitted, we can decode this in the backend
  - The new job would take the folder, not just the filename?
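
The job_id/output_id/filename layout above could be decoded with plain parameter expansion. This is only a minimal sketch: it assumes the path has three '/'-separated components, and the function name decode_output_path and the sample IDs are hypothetical, not part of SciApps or Agave.

```shell
# Hypothetical helper: split "job_id/output_id/name" into its parts.
# Assumes the first two components contain no '/'.
decode_output_path() {
  job_id=${1%%/*}           # everything before the first '/'
  _rest=${1#*/}             # drop the leading "job_id/"
  output_id=${_rest%%/*}    # second component
  name=${_rest#*/}          # remaining file or folder name
}

# Example with made-up IDs
decode_output_path "job-0001/alignment/sample.bam"
echo "$job_id $output_id $name"
```

Only output_id would need to be stored in the workflow JSON; job_id and the real name are recovered at run time.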
- Or we try just output_id/filename
  - The wrapper script can easily do this with 'mkdir output_id', then 'mv $filename output_id/.'
  - The workflow JSON captures just output_id and displays it in the history and diagram
  - The wrapper script accesses the input as $(ls $output_id) instead of $output_id to recover the real input
    - Disadvantage: this is not compatible with other apps
      - If a user passes a folder as input and did not add the output_id layer, the downstream app will fail
      - So it's better to handle it at the workflow code level?
      - We can add an additional boolean parameter 'workflow', or just ignore this?
    - Different output conditions
      - One file: out=$(ls $output_id)
      - Many files, e.g. an index: out=$(ls $output_id/*.fa), or gzip the files, then gunzip, then out=$(ls $output_id/*.fa)
      - Folder: out=$(ls $output_id); out is one long space-separated string whose first item is the folder name, so this can be combined with the 'one file' case
      - Many folders: too complicated
      - Folders and files: too complicated
    - Compatibility
      - To ensure compatibility, we can add _SciApps after output_id, which becomes output_id_SciApps
      - In the wrapper script, we check every input to see whether it ends with _SciApps
        - If yes, we decode it to get the file or folder name
        - If not, we do not decode it
      - This still lets us use the fixed default filename approach for apps that can take a fixed filename
      - However, apps that are not designed to take the output_id folder will fail
        - To deal with this, we need to add an option for an app to either output a fixed name or put its output in an output_id folder
  - We don't even need to change the workflow code if we set default_name=output_id
    - Do we really need to do this, since we are mapping backwards now?
  - We still rely on the user to resolve colliding filenames
    - To avoid this, we need to add job_id, e.g. output_id-job_id/filename
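
The output_id folder idea with the _SciApps marker could look roughly like this in a wrapper script. This is a sketch under assumptions: the function names wrap_output and resolve_input are hypothetical, the wrapped folder is assumed to hold exactly one file or folder, and, unlike the bare $(ls $output_id) above, the decoded value keeps the folder prefix so that it is a usable path.

```shell
# Producer side: move an output (file or folder) into a folder named
# "<output_id>_SciApps". $1 = real output name, $2 = output_id.
wrap_output() {
  mkdir -p "${2}_SciApps" && mv "$1" "${2}_SciApps/"
}

# Consumer side: decode an input only if it ends with _SciApps;
# any other input is passed through unchanged.
resolve_input() {
  case "${1%/}" in
    *_SciApps)
      # Assumes exactly one entry inside the wrapper folder.
      printf '%s/%s\n' "${1%/}" "$(ls "${1%/}")"
      ;;
    *)
      printf '%s\n' "$1"
      ;;
  esac
}

# Example in a scratch directory with made-up names
work=$(mktemp -d) && cd "$work" || exit 1
echo ">seq1" > genome.fa
wrap_output genome.fa assembly
resolve_input assembly_SciApps   # decoded: path of the wrapped file
resolve_input plain.txt          # not wrapped: returned as-is
```

Using "${2}-${job_id}_SciApps" as the folder name would be one way to add the job_id that avoids collisions, at the cost of a less readable name.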
- Or we just create output_id.tar with 'tar -cf output_id.tar $filename'
  - The tar file is easy to create in the wrapper script
  - How do we get $filename from output_id.tar, and how do we deal with collisions?
    - filename=$(tar -tf output_id.tar)
      - This only works for a single file; for a folder it returns the entire structure
    - For folder and file
      - q=$(tar -tf output_id.tar)
      - bb=(${q//\// })
      - filename=${bb[0]}
    - To deal with collisions, we have to add job_id to the tar file name (output_id-job_id.tar)
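
The tar variant can be sketched in the same way. The function names pack_output and top_level_name and the sample IDs are hypothetical. The sketch assumes the output is archived by its bare name (no leading './') and that the name contains no newline. It reads only the first archive member, so it does not rely on word splitting.

```shell
# Pack one output (file or folder) into "<output_id>-<job_id>.tar".
# $1 = real output name, $2 = output_id, $3 = job_id.
pack_output() {
  tar -cf "${2}-${3}.tar" "$1"
}

# Print the top-level name stored in the archive: take the first
# listed member and keep everything before its first '/'.
top_level_name() {
  _first=$(tar -tf "$1" | head -n 1)
  printf '%s\n' "${_first%%/*}"
}

# Example in a scratch directory with made-up names
work=$(mktemp -d) && cd "$work" || exit 1
mkdir index && echo ">seq1" > index/genome.fa
pack_output index bowtie_index job-0001
top_level_name bowtie_index-job-0001.tar
```

The same function covers the single-file case, because a member name without '/' is returned unchanged.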
