GSoC Project Summary
Google Summer of Code 2016 is approaching its finish line. Time to sum up what has been completed!
The original purpose of the project was to implement an automated tool wrapper/converter for CWL. That is, given a Python command-line tool that uses the argparse module from the standard library, a developer can automatically generate a tool description written in Common Workflow Language. The aim is to make it easier to adapt new tools for CWL by greatly reducing the manual work of rewriting large numbers of arguments and tools. A good example is CNVkit, a toolkit of 30 tools that take between 2 and 21 arguments each. Now all of them are wrapped in CWL instantly; here are the results.
The implementation of the tool wrapper was based on gxargparse, a tool that solves a similar problem: generating Galaxy XML from Python tools. With the base architecture for the new functionality already in place, I completed the initial task before the midterm evaluation. The new argparse2cwl functionality covers the whole argparse API and all parts of CWL where a correspondence can be established, and all of this is covered by a broad test suite. Here is a pull request with all the commits regarding argparse2cwl: link
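To give a feeling for what "establishing a correspondence" between argparse and CWL means, here is a minimal sketch of such a mapping. It is not the actual argparse2cwl code: the function name `parser_to_cwl`, the `CWL_TYPES` table and the choice of `cwlVersion` are my assumptions for illustration, and it only handles a small subset of argparse (positionals, simple options, `store_true` flags and `nargs` of `+` or `*`).

```python
import argparse
import json

# Hypothetical lookup table: argparse `type` callables -> CWL type names.
CWL_TYPES = {int: "int", float: "float", str: "string", None: "string"}


def parser_to_cwl(parser, base_command):
    """Describe an argparse parser as a CWL CommandLineTool (plain dict)."""
    inputs = {}
    position = 0
    # `_actions` and the `_...Action` classes are private argparse details;
    # they are used here only to keep the sketch short.
    for action in parser._actions:
        if isinstance(action, argparse._HelpAction):
            continue  # -h/--help has no CWL counterpart
        is_flag = isinstance(action, argparse._StoreTrueAction)
        cwl_type = "boolean" if is_flag else CWL_TYPES.get(action.type, "string")
        if action.nargs in ("+", "*"):
            cwl_type += "[]"  # list-valued argument
        binding = {}
        if action.option_strings:
            # Optional argument: pass its longest flag as the prefix.
            binding["prefix"] = max(action.option_strings, key=len)
            if not action.required:
                cwl_type += "?"  # CWL shorthand for an optional input
        else:
            # Positional argument: keep the order of declaration.
            position += 1
            binding["position"] = position
        entry = {"type": cwl_type, "inputBinding": binding}
        if action.help:
            entry["doc"] = action.help
        if action.default is not None and not is_flag:
            entry["default"] = action.default
        inputs[action.dest] = entry
    return {
        "cwlVersion": "v1.0",  # assumed version, adjust as needed
        "class": "CommandLineTool",
        "baseCommand": base_command,
        "inputs": inputs,
        "outputs": {},
    }


# A made-up tool, just to exercise the mapping.
parser = argparse.ArgumentParser(prog="example.py")
parser.add_argument("input_file", help="file to process")
parser.add_argument("--threshold", type=float, default=0.5, help="cut-off value")
parser.add_argument("--verbose", action="store_true", help="print progress")

tool = parser_to_cwl(parser, ["python", "example.py"])
print(json.dumps(tool, indent=2))
```

The result is printed as JSON here only to avoid a YAML dependency in the sketch; CWL documents can be written in either format.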
After the midterm, my mentors brainstormed some new ideas for the second half. The first of these was pypi2cwl. It is a thin command-line wrapper around argparse2cwl that wraps one or more PyPI packages without the need to install them and look for their scripts. This can be used for bulk conversion when several packages are given as input. The commits are here, starting from June 26.
The second one is a tool with the opposite purpose to argparse2cwl: cwl2argparse. It parses the given CWL document(s) and generates a Python function that uses the information about the tool and its parameters to return an argparse ArgumentParser object. This makes it possible to develop tools in the reverse direction: write the CWL definition first and then write the language-specific implementation. Here is the history of commits for cwl2argparse: link
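The reverse direction can be sketched in a similar way. The snippet below is only an illustration of the idea, not the real cwl2argparse implementation: `generate_parser_source`, `PY_TYPES` and the example tool are hypothetical, the CWL document is given as an already-loaded dict, and input types are assumed to be simple strings such as `float?` (no arrays or records).

```python
# Hypothetical lookup table: CWL type names -> Python type names (as source text).
PY_TYPES = {"int": "int", "float": "float", "string": "str", "File": "str"}


def generate_parser_source(tool, func_name="get_parser"):
    """Return Python source for a function that builds an ArgumentParser."""
    description = tool.get("doc", "")
    lines = [
        "import argparse",
        "",
        "",
        f"def {func_name}():",
        f"    parser = argparse.ArgumentParser(description={description!r})",
    ]
    # Sort by position so positional arguments keep their CWL order.
    ordered = sorted(
        tool["inputs"].items(),
        key=lambda item: item[1].get("inputBinding", {}).get("position", 0),
    )
    for name, spec in ordered:
        optional = spec["type"].endswith("?")
        base_type = spec["type"].rstrip("?")
        prefix = spec.get("inputBinding", {}).get("prefix")
        doc = spec.get("doc", "")
        py_type = PY_TYPES.get(base_type, "str")
        required = not optional
        if base_type == "boolean":
            flag = prefix or "--" + name
            call = (f"parser.add_argument({flag!r}, dest={name!r}, "
                    f"action='store_true', help={doc!r})")
        elif prefix:
            call = (f"parser.add_argument({prefix!r}, dest={name!r}, "
                    f"type={py_type}, required={required!r}, help={doc!r})")
        else:
            # No prefix: treat the input as a positional argument.
            call = f"parser.add_argument({name!r}, type={py_type}, help={doc!r})"
        lines.append("    " + call)
    lines.append("    return parser")
    return "\n".join(lines) + "\n"


# A made-up CWL tool description, as it would look after loading the document.
tool = {
    "class": "CommandLineTool",
    "doc": "Example tool",
    "inputs": {
        "input_file": {"type": "File", "inputBinding": {"position": 1}},
        "threshold": {"type": "float?", "doc": "cut-off value",
                      "inputBinding": {"prefix": "--threshold"}},
        "verbose": {"type": "boolean?", "inputBinding": {"prefix": "--verbose"}},
    },
}

source = generate_parser_source(tool)
print(source)

# Execute the generated code to check that it really yields a working parser.
namespace = {}
exec(source, namespace)
args = namespace["get_parser"]().parse_args(["reads.bam", "--threshold", "0.5"])
```

Generating source text rather than building the parser object directly mirrors the "write CWL first, implementation second" workflow: the generated function can be pasted into a new tool as its starting point.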
The last task was similar to what I’ve done with argparse2cwl: implementing the same functionality for the click argument parser. As a small survey I conducted at the beginning of GSoC showed, there is no single argument parser that is used in all bioinformatics tools. The most common are the ones from the standard library (argparse and sys.argv), but third-party parsers are also popular and deserve some attention. That’s why it was decided to extend argparse2cwl into a more general cmdline2cwl interface and include click support. The code is in this pull request.
To conclude, all of the tasks were accomplished in the required scope and on time, and the original goals were met and exceeded, so I consider GSoC successfully completed. I hope the community will appreciate my work and contributions :)