
October 27, 2017, Friday

Liya Wang edited this page Nov 1, 2017 · 13 revisions
  • New thoughts on the Community version
    • We don't need to enforce defining outputs with default output names
    • We will list all outputs in the right column
    • We will allow building workflows with all existing Agave apps
    • However, if an output name changes, the workflow will not run
    • This will make it easy to handle duplicates
    • This is compatible with workflows built with enforced filenames
  • Still need to think about how to relax the filename enforcement, which is what makes a workflow completely reproducible
    • One idea is to use job_id.output_id.filename
      • job_id helps resolve replicates; it can be retrieved from the folder name at run time
      • output_id is captured by the workflow json, so the filename can be variable
    • Or we can try job_id/output_id/filename (or folder name)
      • In the wrapper script, this is doable
      • When a job is submitted, we can decode this in the backend
      • The new job will take the folder, not just the filename?
    • Or we try just output_id/filename
      • Wrapper script can easily code this with 'mkdir output_id', then 'mv $filename output_id/.'
      • Workflow json captures just output_id and displays it on the history and diagram
      • Wrapper script accesses the input as $(ls $output_id) instead of $output_id to recover the real input
        • Disadvantage: this is not compatible with other apps
          • If a user passes a folder as input without the output_id layer, the downstream app will fail
          • So it's better to handle it at the workflow code level?
          • We can add an additional boolean parameter 'workflow', or just ignore this?
        • Different output conditions
          • One file: out=$(ls $output_id)
          • Many files, e.g. an index: out=$(ls $output_id/*.fa), or gzip the files, then gunzip, then out=$(ls $output_id/*.fa)
          • Folder: out=$(ls $output_id); out is a long space-separated string whose first item is the folder name, so this can be combined with the 'one file' case
          • Many folders: too complicated
          • Folders and files: too complicated
        • Compatibility
          • To ensure compatibility, we can append _SciApps to output_id, which becomes output_id_SciApps
          • In the wrapper script, we check every input to see whether it ends with _SciApps
            • If yes, we decode it to get the file or folder name
            • If not, we do not decode it
          • This still allows the fixed default filename approach for apps that can take a fixed filename
          • However, apps that are not designed to take the output_id folder will fail
            • To deal with this, we need to add an option for an app to either output a fixed name or put its output in an output_id folder
      • We don't even need to change the workflow code if we set default_name=output_id
        • Do we really need to do this, since we are mapping backwards now?
      • We still rely on the user to resolve colliding filenames
        • To avoid this, we need to add job_id, e.g. output_id-job_id/filename
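The output_id folder idea with the _SciApps suffix could be sketched roughly as below. This is only a minimal sketch under assumptions: the function names wrap_output and decode_input are hypothetical (they are not part of SciApps or Agave), and decode_input assumes the folder holds exactly one file or folder.

```shell
# Hypothetical helpers for illustration only; not existing SciApps code.

# Producer side: move an app output into a folder named <output_id>_SciApps.
wrap_output() {
  # $1 = output_id, $2 = file or folder produced by the app
  mkdir -p "${1}_SciApps"
  mv "$2" "${1}_SciApps/"
}

# Consumer side: decode an input only if it ends with _SciApps.
decode_input() {
  # $1 = input path as passed to the wrapper script
  case "${1%/}" in
    # Assumes the folder contains exactly one file or folder.
    *_SciApps) echo "${1%/}/$(ls "${1%/}")" ;;
    # Anything else is passed through unchanged.
    *)         echo "$1" ;;
  esac
}
```

For example, after 'wrap_output assembly genome.fa', calling 'decode_input assembly_SciApps' would print assembly_SciApps/genome.fa, while an input without the suffix is returned as is.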
    • Or we just create output_id.tar with 'tar -cf output_id.tar $filename'
      • Creating the tar file is easy to do in the wrapper script
      • How do we get $filename back from output_id.tar, and how do we deal with collisions?
        • filename=$(tar -tf output_id.tar)
          • This only works for a single file; for a folder it returns the entire structure
        • For both folders and files, take the first path component of the listing
          • q=$(tar -tf output_id.tar)
          • bb=(${q//\// })
          • filename=${bb[0]}
        • To deal with collisions, we have to add job_id to the tar file name (output_id-job_id.tar)
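The tar idea could look roughly like the sketch below. It is a minimal sketch, not the actual wrapper script: pack_output and top_level_name are hypothetical names, and it assumes the archived path is a plain relative name without a leading './'.

```shell
# Hypothetical helpers for illustration only; not existing SciApps code.

# Archive one file or folder as <output_id>-<job_id>.tar.
pack_output() {
  # $1 = output_id, $2 = job_id, $3 = file or folder to archive
  tar -cf "${1}-${2}.tar" "$3"
}

# Print the file or folder name stored in the archive.
top_level_name() {
  # $1 = tar archive
  # The first listed entry is the file itself or 'folder/', so keeping
  # the text before the first '/' covers both cases.
  tar -tf "$1" | head -n 1 | cut -d/ -f1
}
```

Taking only the first entry avoids the array split and handles a file and a folder the same way; adding job_id to the archive name keeps replicates from overwriting each other.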