Batch use
Written by Jean-Rémy Marchand
Once you have installed CAVIAR and activated the environment (see the installation page), you can call the command line version of CAVIAR with:
caviar -h
This prints the help message of the tool. The most basic use only requires a PDB code:
caviar -code 1dwc
The output is visualized in the terminal as a table. The first table contains information about identified cavities, ranked by cavity score.
PDB_chain | CavID | Ligab. | Score | Size | Hydrophob | InterChain | AltLoc | Miss | Subcavs |
---|---|---|---|---|---|---|---|---|---|
1dwc_H | 1 | 0.6 | 3.7 | 333 | 39% | 0 | 0 | 0 | 4 |
1dwc_H | 2 | 0.2 | 0.9 | 51 | 10% | 0 | 0 | 0 | 2 |
1dwc_H | 3 | 0.8 | 0.6 | 63 | 56% | 0 | 0 | 0 | 1 |
The second table focuses on subcavities:
PDB_chain | CavID | SubCavID | Size | Hydrophob. | Polar | Neg | Pos | Other |
---|---|---|---|---|---|---|---|---|
1dwc_H | 1 | 1 | 27 | 33% | 56% | 11% | 0% | 0% |
1dwc_H | 1 | 2 | 76 | 33% | 58% | 7% | 0% | 3% |
1dwc_H | 1 | 3 | 157 | 51% | 33% | 0% | 1% | 15% |
1dwc_H | 1 | 4 | 73 | 23% | 36% | 4% | 37% | 0% |
1dwc_H | 2 | 1 | 26 | 0% | 58% | 42% | 0% | 0% |
1dwc_H | 2 | 2 | 25 | 20% | 56% | 24% | 0% | 0% |
1dwc_H | 3 | 1 | 63 | 56% | 44% | 0% | 0% | 0% |
Note that the types are assigned according to the closest protein atom. In the case of charged pharmacophores. Values are rounded, so sums can differ slightly from 100%.
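For scripting, the printed tables can be read back into records. The following is a minimal sketch, assuming the pipe-delimited layout shown above (the exact terminal formatting may differ); `parse_caviar_table` is a hypothetical helper and not part of CAVIAR. It also illustrates the rounding remark by summing the percentage columns of the first three subcavity rows.

```python
def parse_caviar_table(text):
    """Parse a pipe-delimited table (header, separator, rows) into dicts."""
    rows = []
    header = None
    for line in text.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if all(set(c) <= {"-"} for c in cells):
            continue  # separator line such as ---|---|---
        if header is None:
            header = cells  # first non-separator line holds the column names
            continue
        rows.append(dict(zip(header, cells)))
    return rows


# Subcavity rows copied from the example output above.
table = """
PDB_chain | CavID | SubCavID | Size | Hydrophob. | Polar | Neg | Pos | Other |
---|---|---|---|---|---|---|---|---|
1dwc_H | 1 | 1 | 27 | 33% | 56% | 11% | 0% | 0% |
1dwc_H | 1 | 2 | 76 | 33% | 58% | 7% | 0% | 3% |
1dwc_H | 1 | 3 | 157 | 51% | 33% | 0% | 1% | 15% |
"""

subcavities = parse_caviar_table(table)
type_columns = ["Hydrophob.", "Polar", "Neg", "Pos", "Other"]
sums = [sum(int(row[col].rstrip("%")) for col in type_columns)
        for row in subcavities]
print(sums)  # [100, 101, 100]
```

The second row sums to 101% because each column is rounded independently.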
In addition, CAVIAR generates by default a certain number of files in the working directory:
.
|-- 1dwc_cavities.pml
|-- 1dwc_subcavities.pml
|-- caviar_out/
    |-- 1dwc_cavs.pdb
    |-- 1dwc_subcavs.pdb
The two *.pml files are PyMOL script files that automatically open and visualize the PDB file with its cavities or subcavities, respectively. The folder caviar_out/ contains the original PDB file with the cavities appended at the end under the residue name GRI, and the subcavities under the residue name SUB. For each cavity grid point, the B-factor field holds the buriedness (from 8 to 14, with 14 being the most buried) and the occupancy field holds the pharmacophore type, i.e., the chemical type of the closest protein atom. By default, each cavity gets its own color, but cavities can instead be colored by buriedness or by pharmacophore type, with a legend (see the command line arguments section).
Cavities are ordered as in the printout: the first cavity is represented in the PDB file as resname GRI, chain A, residue index 1, the second cavity is GRI A 2, and so forth.
Subcavities are numbered within the cavity they come from, so their naming has to distinguish both the parent cavity and the subcavity: the chain identifier encodes the cavity and the residue index encodes the subcavity. Subcavity 1 of cavity 1 is therefore represented as resname SUB, chain A, residue index 1. Subcavity 2 of cavity 1 is SUB A 2. Subcavity 1 of cavity 2 is SUB B 1.
Coming back to our 1dwc example: it contains three cavities, represented in the PDB file with residue name “GRI” (default for any cavity), chain identifier A (default for any cavity) and residue indices 1, 2 and 3 (which distinguish the three cavities).
The first cavity (resname GRI, chain A, resid 1) contains 4 subcavities. These 4 subcavities are named resname SUB (default for any subcavity), chain A (identifies the first cavity), residue identifiers 1 to 4 (identifies the 4 subcavities of said cavity).
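The naming scheme above can be used to pull grid points back out of the output files. The sketch below assumes that grid points are written as standard fixed-column PDB ATOM/HETATM records; `read_grid_points` and `make_line` are hypothetical helpers, and the records and numeric values are synthetic, not real CAVIAR output.

```python
from collections import defaultdict


def read_grid_points(pdb_lines, resname="GRI"):
    """Group grid points by (chain, residue index) for one residue name."""
    groups = defaultdict(list)
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        if line[17:20].strip() != resname:   # columns 18-20: residue name
            continue
        chain = line[21]                     # column 22: chain identifier
        resid = int(line[22:26])             # columns 23-26: residue index
        occupancy = float(line[54:60])       # pharmacophore type, per the text
        bfactor = float(line[60:66])         # buriedness, per the text
        groups[(chain, resid)].append((occupancy, bfactor))
    return dict(groups)


def make_line(serial, resname, chain, resid, occ, bfac):
    """Build a synthetic fixed-column HETATM record for demonstration only."""
    return ("HETATM{:5d}  C   {:>3s} {:1s}{:4d}    {:8.3f}{:8.3f}{:8.3f}"
            "{:6.2f}{:6.2f}").format(serial, resname, chain, resid,
                                     0.0, 0.0, 0.0, occ, bfac)


lines = [
    make_line(1, "GRI", "A", 1, 1.0, 12.0),  # cavity 1
    make_line(2, "GRI", "A", 1, 2.0, 9.0),   # cavity 1
    make_line(3, "GRI", "A", 2, 1.0, 14.0),  # cavity 2
    make_line(4, "SUB", "B", 1, 1.0, 14.0),  # subcavity 1 of cavity 2
]

cavities = read_grid_points(lines, "GRI")
subcavities = read_grid_points(lines, "SUB")
print(sorted(cavities))     # [('A', 1), ('A', 2)]
print(sorted(subcavities))  # [('B', 1)]
```

For cavities the key is (chain A, cavity index); for subcavities the chain letter identifies the parent cavity and the residue index identifies the subcavity.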
caviar handles both command line arguments and parameter/configuration files.
We already saw one command line argument earlier, -code, which specifies a PDB code. The main arguments are:
-code: the PDB code. The file is downloaded from the RCSB PDB if it is not present in -sourcedir. It can also be a custom PDB filename, but the file then has to be in the corresponding -sourcedir.
-sourcedir: the folder containing the PDB file, in case you already downloaded it.
-cif: parses mmCIF files rather than PDB files. The file is downloaded from the RCSB PDB as mmCIF if it is not present in -sourcedir (True/False, default: False).
-dcd: parses an MD trajectory in the DCD file format. Requires the PDB file to be given in -code as a template for the coordinates (with its file path if it is not in the current working directory).
-what: defines which objects to select from the PDB file: all protein chains (“allproteins”), just the longest chain (“longestchain”), or the longest chain plus the chains in contact with it within 5 Å (“longestandcontacting”).
-chain_id: selects one or more protein chains a priori. For example, use -chain_id A to select chain A, or -chain_id AB to investigate chains A and B.
-color_cavs_by: defines the coloring scheme mentioned above. It generates a PyMOL *.pml file that colors cavities by cavity ID (default, “bychain”), by buriedness (“buriedness”), or by the corresponding protein pharmacophore type (“pharmacophore”).
-subcavs_decomp: controls the subcavity decomposition, which can be deactivated (True/False, default: True).
-out: defines the output path (default: ./caviar_out/).
-v: activates verbose output.
-preset_config: chooses between three preset configurations: search for cavities and decompose them into subcavities (default, “default”), only search for cavities (“cavities_only”), or only export subcavities (“subcavities_only”).
caviar can also take a configuration file containing any of the parameters, either as its only argument or in addition to command line arguments.
-custom_config: uses a custom configuration file created by the user. This file needs to follow the format defined by Python's configparser module:
[Simple Values]
key=value
spaces in keys=allowed
spaces in values=allowed as well
spaces around the delimiter = obviously
you can also use : to delimit keys from values
[All Values Are Strings]
values like this: 1000000
or this: 3.14159265359
are they treated as numbers? : no
integers, floats and booleans are held as: strings
can use the API to get converted values directly: true
[Multiline Values]
chorus: I'm a lumberjack, and I'm okay
    I sleep all night and I work all day
[No Values]
key_without_value
empty string value here =
[You can use comments]
# like this
; or this
# By default only in an empty line.
# Inline comments can be harmful because they prevent users
# from using the delimiting characters as parts of values.
# That being said, this can be customized.
    [Sections Can Be Indented]
        can_values_be_as_well = True
        does_that_mean_anything_special = False
        purpose = formatting for readability
        multiline_values = are
            handled just fine as
            long as they are indented
            deeper than the first line
            of a value
        # Did I mention we can indent comments, too?
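To see how such a file is interpreted, the short sketch below parses a shortened copy of the example with Python's standard configparser module. It only illustrates the file format; how CAVIAR itself reads its configuration is not shown here.

```python
import configparser

sample = """
[Simple Values]
key=value
spaces in keys=allowed

[All Values Are Strings]
values like this: 1000000
or this: 3.14159265359

[Multiline Values]
chorus: I'm a lumberjack, and I'm okay
    I sleep all night and I work all day
"""

parser = configparser.ConfigParser()
parser.read_string(sample)

# Values are stored as strings...
raw = parser["All Values Are Strings"]["values like this"]
# ...but typed getters convert them on request.
as_int = parser.getint("All Values Are Strings", "values like this")
as_float = parser.getfloat("All Values Are Strings", "or this")
# Continuation lines must be indented deeper than the key.
chorus = parser["Multiline Values"]["chorus"]

print(repr(raw), as_int, as_float)
print(chorus.splitlines())
```

Note that the indentation of the second chorus line is what marks it as a continuation of the value rather than a new key.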
Many more parameters can be set in the configuration file; they are described in the advanced use section.
Let us finish with an example combining all of the above. We want to check only the subcavities of PDB entry 1dwc (human thrombin, a protease, in complex with an inhibitor) in chain H, write the output to the folder “~/thrombin_caviar_out/”, and use the PDB file we have already downloaded at ~/1dwc.pdb.
caviar -code 1dwc -sourcedir ~/ -chain_id H -preset_config subcavities_only -out ~/thrombin_caviar_out/
This is equivalent to the following:
caviar -custom_config ~/custom_config.cfg
where ~/custom_config.cfg is:
### Example of custom parameter file for CAVIAR ###
[custom] # At least one section header is necessary, the name does not matter
sourcedir: ~/ # Source directory, otherwise downloads file
code: 1dwc # PDB Code (no default)
chain_id: H # Protein chain identifier of interest
preset_config: subcavities_only # We want only subcavities
out: ~/thrombin_caviar_out/ # Path/to/outfolder
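The correspondence between the configuration file and the command line can be sketched with configparser. This is an illustration under two assumptions that are not confirmed by the text: that each key maps to the flag of the same name, and that inline `#` comments are stripped (configparser only does this when `inline_comment_prefixes` is set). `config_to_argv` is a hypothetical helper, not part of CAVIAR.

```python
import configparser

config_text = """
### Example of custom parameter file for CAVIAR ###
[custom]
sourcedir: ~/ # Source directory, otherwise downloads file
code: 1dwc # PDB Code (no default)
chain_id: H # Protein chain identifier of interest
preset_config: subcavities_only # We want only subcavities
out: ~/thrombin_caviar_out/ # Path/to/outfolder
"""

# Assumption: strip inline comments; by default configparser would keep
# "# ..." as part of each value.
parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
parser.read_string(config_text)


def config_to_argv(parser):
    """Turn every key/value pair into a '-key value' command line pair."""
    argv = ["caviar"]
    for section in parser.sections():
        for key, value in parser[section].items():
            argv += ["-" + key, value]
    return argv


argv = config_to_argv(parser)
print(" ".join(argv))
# caviar -sourcedir ~/ -code 1dwc -chain_id H -preset_config subcavities_only -out ~/thrombin_caviar_out/
```

The resulting arguments match the command shown earlier, up to their order.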