Introduction to CWL generating tools
This post is an overview and quick start for the tools that were developed in this GSoC 2016 project. All the subprojects are aimed to spread Common Workflow Language more widely in the developer community by making porting Python tools to CWL easier, with less amount of routine job.
gxargparse
gxargparse is a tool the original purpose of which was to form Galaxy XML tools from Python tools that use argparse
. The design of the argparse translator was adapted to create the following converters (both of them are installed along with gxargparse)
argparse2cwl
argparse2cwl allows to generate CWL tools out of Python tools with argparse as their argument parser. It works like a wrapper around the original argparse, capturing its arguments and using them to generate CWL tools when the tool is run with --generate_cwl_tool
option. Basically, it looks like you are calling this tool without any parameters, just a command with some argparse2cwl options.
For instance, you have a tool named search.py.
#!/usr/bin/env python
import argparse
parser = argparse.ArgumentParser(description="Toy program to search inverted index and "
"print out each line the term appears")
parser.add_argument('mainfile', type=argparse.FileType('r'), help='Text file to be indexed')
parser.add_argument('term', type=str, help='Term for search')
args = parser.parse_args()
# Here goes the code that works with argument values
pass
You import argparse and write your tool as normally, but argparse2cwl grabs all the information available from the code and uses it to generate a CWL tool. If you run
python search.py --generate_cwl_tool
or
./search.py --generate_cwl_tool
The program will generate the next CWL file
#!/usr/bin/env cwl-runner
# This tool description was generated automatically by argparse2cwl ver. 0.3.0
# To generate again: $ ./search.py -b ./search.py --generate_cwl_tool
# Help: $ ./search.py --help_arg2cwl
cwlVersion: "cwl:v1.0"
class: CommandLineTool
baseCommand: ['./search.py']
doc: |
Toy program to search inverted index and print out each line the term appears
inputs:
mainfile:
type: File
doc: Text file to be indexed
inputBinding:
position: 1
term:
type: string
doc: Term for search
inputBinding:
position: 2
outputs:
[]
and you just need to look through the document and add the information that couldn’t be extracted from the code (requirements, hints, labels). Pay attention to the output section, it often can be blank because argparse2cwl couldn’t recognize that a parameter belongs to output. Use --generate-outputs
option to generate output parameters from the ones which have output
keyword in their names. Other options more detailed in the README.
If you have a command with nested subcommands, argparse2cwl
will discover them all and generate a bunch of tools for each subcommand. It is especially handy for bioinformatics toolkits which usually have one root command.
click2cwl
click2cwl shares absolutely the same logic as argparse2cwl, the only difference is it is adjusted to work with click argument parser. Whenever click is imported, gxargparse grabs its commands, arguments, options and forms CWL parameters out of them.
pypi2cwl
pypi2cwl is a small wrapper around argparse2cwl that goes further in its desire to facilitate tool wrapping. After running
pypi2cwl <PYPI-PACKAGE>
it
-
Installs
PYPI-PACKAGE
if it hasn’t been installed before -
Downloads the source code to a new directory
-
Checks for command-line scripts in its
setup.py
. -
Runs argparse2cwl against them and generates the associated CWL output files.
pypi2cwl is available over pip
.
cwl2argparse
cwl2argparse does the job straightly opposite to what argparse2cwl does: it parses CWL tools and generates Python code which can be imported as a function and used. It looks like this: supposing there is a file named search.cwl
#!/usr/bin/env cwl-runner
cwlVersion: "cwl:draft-3"
class: CommandLineTool
baseCommand: ['search.py']
description: |
Toy program to search inverted index and print out each line the term appears
inputs:
- id: mainfile
type: File
description: Text file to be indexed
inputBinding:
position: 1
- id: term
type: string
description: Term for search
inputBinding:
position: 2
outputs:
[]
Running cwl2argparse search.cwl
generates the following file:
import argparse
def parser():
description="""
Toy program to search inverted index and print out each line the term appears
"""
parser = argparse.ArgumentParser(description=description)
arg_mainfile = parser.add_argument("arg_mainfile",
type=argparse.FileType(), help="""Text file to be indexed""",)
arg_term = parser.add_argument("arg_term",
type=str, help="""Term for search""",)
return parser
cwl2argparse is also installed by pip
.
That’s all I’ve been working on during GSoC 2016 :) I will support the projects after their release, if confronted with any problems, please open issues, report bugs, I will fix them as soon as possible.