October 27, 2017, Friday
Liya Wang edited this page Nov 1, 2017
·
13 revisions
- New thoughts on the Community version
- We don't need to enforce defining outputs with default output names
- We will list all outputs in the right column
- We will allow building workflows with all existing Agave apps
- However, if an output name changes, the workflow will not run
- This will make it easy to handle duplicates
- This is compatible with the workflow built with enforced filenames
- Still need to think about how to relax the filename enforcement that makes a workflow completely reproducible
- One idea is to use job_id.output_id.filename
- job_id helps resolve replicates; it can be retrieved from the folder name at run time
- output_id is captured by the workflow JSON, so the filename can vary
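A minimal sketch of the job_id.output_id.filename idea, assuming neither job_id nor output_id contains a dot (the filename may). The helper names and the sample IDs are hypothetical and only illustrate how the three parts could be joined and split again:

```bash
#!/usr/bin/env bash
# Hypothetical helper: join the three parts with dots.
encode_name() {
  echo "$1.$2.$3"
}

# Hypothetical helper: split an encoded name back into its parts.
# Assumes job_id and output_id contain no dots; the filename may contain dots.
decode_name() {
  local encoded=$1 rest
  job_id=${encoded%%.*}      # everything before the first dot
  rest=${encoded#*.}         # everything after the first dot
  output_id=${rest%%.*}      # next field
  filename=${rest#*.}        # remainder, dots included
}

encoded=$(encode_name job-0001 aligned_bam sample_A.sorted.bam)
decode_name "$encoded"
echo "$job_id | $output_id | $filename"
```

Only output_id would need to be stored in the workflow JSON; the other two parts are recovered at run time.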
- Or we can try job_id/output_id/filename (or folder name)
- In the wrapper script, this is doable
- When a job is submitted, we can decode this in the backend
- The new job will take the folder, not just the filename?
- Or we try just output_id/filename
- The wrapper script can easily do this with 'mkdir output_id', then 'mv $filename output_id/.'
- The workflow JSON captures just output_id and displays it in the history and diagram
- The wrapper script accesses the input as $(ls $output_id) instead of $output_id to recover the real input
- Disadvantage: this is not compatible with other apps
- If a user passes a folder as input without adding the output_id layer, the downstream app will fail
- So is it better to handle this at the workflow code level?
- We can add an additional boolean parameter 'workflow', or just ignore this?
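The output_id/filename idea above can be sketched roughly as follows. The function names, the output_id `genome_fasta`, and the sample file are all hypothetical; the sketch also assumes each output_id folder holds exactly one file or folder:

```bash
#!/usr/bin/env bash
# Hypothetical helper for the upstream wrapper script:
# put the produced file (or folder) under a folder named after its output_id.
wrap_output() {
  local output_id=$1 produced=$2
  mkdir -p "$output_id"
  mv "$produced" "$output_id"/
}

# Hypothetical helper for the downstream wrapper script:
# recover the real name (assumes a single entry in the folder).
resolve_input() {
  ls "$1"
}

workdir=$(mktemp -d)
cd "$workdir" || exit 1

echo ">seq1" > sample_A.fa                 # upstream output with a variable name
wrap_output genome_fasta sample_A.fa       # workflow JSON only needs 'genome_fasta'
real_name=$(resolve_input genome_fasta)
echo "genome_fasta/$real_name"
```

The downstream app never needs to know the upstream filename in advance, which is the point of the extra folder layer.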
- Different output conditions
- One file, out=$(ls $output_id)
- Many files, e.g. index, out=$(ls $output_id/*.fa) or gzip the files then gunzip, then out=$(ls $output_id/*.fa)
- Folder, out=$(ls $output_id); out is a long space-separated string whose first item is the folder name, so this can be combined with the 'one file' case
- Many folders, too complicated
- Folders and files, too complicated
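A small sketch of the three manageable output conditions listed above. The folder names (`single`, `index`, `folder`) stand in for output_id values and, like the file names, are made up for illustration:

```bash
#!/usr/bin/env bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# One file: the folder holds a single file, so ls returns its name.
mkdir single && touch single/reads.fq
out_single=$(ls single)

# Many files (e.g. an index): pick the main file by extension.
# Note that the glob form returns the path including the folder.
mkdir index && touch index/genome.fa index/genome.fa.fai
out_index=$(ls index/*.fa)

# Folder: the output_id folder holds one folder, so ls returns the folder name.
mkdir -p folder/results && touch folder/results/a.txt
out_folder=$(ls folder)

echo "$out_single $out_index $out_folder"
```

The glob case differs from the other two in that it already includes the output_id prefix, so a wrapper script would have to treat it separately.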
- Compatibility
- To ensure compatibility, we can add _SciApps after output_id, which becomes output_id_SciApps
- In the wrapper script, we check every input to see whether it ends with _SciApps
- If yes, we will decode it for the file or folder name
- If not, we will not decode it
- This will allow us to still use the fixed default filename approach for apps that can take fixed filename
- However, apps that are not designed to take the output_id folder will fail
- To deal with this, we need to add an option for an app to either output a fixed name or put its output in an output_id folder
- Disadvantage: this is not compatible with other apps
- We don't even need to change workflow code if we set default_name=output_id
- Do we really need to do this since we are mapping backwards now?
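The _SciApps check described above could look roughly like this in a wrapper script. `decode_input` is a hypothetical helper, and the sketch assumes an encoded folder contains exactly one file or folder:

```bash
#!/usr/bin/env bash
# Hypothetical helper: decode an input only if it ends with _SciApps.
decode_input() {
  local input=${1%/}                      # tolerate a trailing slash
  if [[ $input == *_SciApps ]]; then
    # Encoded output_id folder: the real file or folder is the single entry inside.
    echo "$input/$(ls "$input")"
  else
    # Plain input (e.g. a fixed default filename): use it unchanged.
    echo "$input"
  fi
}

workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir bam_SciApps && touch bam_SciApps/sample_A.bam
touch reads.fq

decoded=$(decode_input bam_SciApps)
plain=$(decode_input reads.fq)
echo "$decoded $plain"
```

Inputs without the suffix pass through untouched, so apps that rely on fixed default filenames keep working.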
- We still rely on the user to resolve collapsing filenames
- To avoid it, we need to add job_id, e.g. output_id-job_id/filename
- Or we just create output_id.tar with 'tar -cf output_id.tar $filename'
- This is easy to do in wrapper script to create the tar file
- How do we get $filename from output_id.tar, and how do we deal with collapsing?
- filename=$(tar -tf output_id.tar)
- This only works for a file; for a folder it returns the entire structure
- For both folder and file:
- q=$(tar -tf output_id.tar)
- bb=(${q//\// })
- filename=${bb[0]}
- To deal with collapsing, we have to add job_id to the tar file (output_id-job_id.tar)
- filename=$(tar -tf output_id-job_id.tar)
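A rough sketch of the tar approach, covering both the file and the folder case. `top_level_name`, the output_id values, and `job-0001` are hypothetical; the sketch assumes each archive holds a single top-level file or folder:

```bash
#!/usr/bin/env bash
# Hypothetical helper: return the top-level file or folder name stored in a tar archive.
top_level_name() {
  local listing first
  listing=$(tar -tf "$1") || return 1
  first=${listing%%$'\n'*}     # first line of the listing
  echo "${first%%/*}"          # drop anything after the first slash
}

workdir=$(mktemp -d)
cd "$workdir" || exit 1
job_id=job-0001                # made-up job ID, appended to avoid name clashes

# File case
echo ">seq1" > sample_A.fa
tar -cf "genome_fasta-${job_id}.tar" sample_A.fa
file_name=$(top_level_name "genome_fasta-${job_id}.tar")

# Folder case
mkdir results && touch results/contigs.fa results/stats.txt
tar -cf "assembly-${job_id}.tar" results
folder_name=$(top_level_name "assembly-${job_id}.tar")

echo "$file_name $folder_name"
```

Taking only the first listing line and cutting at the first slash avoids word-splitting the listing, so names containing spaces survive.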