cod-tools

Description

cod-tools is an open-source collection of command-line scripts for handling CIF files. The package is developed by the Crystallography Open Database (COD) team. Detailed usage information for each script in the package can be obtained by invoking it with the --help or --usage command-line option. For example:

cif_filter --help
cif_filter --usage

The package includes, among others, the following scripts:

  • cif_cod_check – parse a CIF file and check whether certain data values match COD requirements and IUCr data validation criteria (Version: 2000.06.09, ftp://ftp.iucr.ac.uk/pub/dvntests or ftp://ftp.iucr.org/pub/dvntests).

  • cif_cod_deposit – deposit CIFs into the COD database using the CGI deposition interface.

  • cif_cod_numbers – find COD numbers for the .cif files in given directories or file lists.

  • cif_correct_tags – correct misspelled tags in a CIF file.

  • cif_filter – parse a CIF file and print out essential data values in CIF format, following the COD CIF style.

    This script also has many other capabilities: it can restore space group symbols from symmetry operators (consulting predefined tables), parse and tidy up _chemical_formula_sum, compute the cell volume, exclude unknown or "empty" tags, and add specified bibliographic data.

  • cif_fix_values – correct temperature values that have units specified, converting between degrees Celsius and kelvins where needed. Replaces 'room/ambiante temperature' with the appropriate numeric value and replaces other undefined values (no, not measured, etc.) with the '?' symbol. Prints a report about the changes made to the standard I/O streams.

    Also fixes enumeration values in the CIF file by checking them against CIF dictionaries.

  • cif_mark_disorder – marks disorder in CIF files judging by distance and occupancy.

  • cif_molecule – restores molecules from a CIF file.

  • cif_select – read CIFs and print out selected tags with their values.

  • cif_split – split CIF files into separate files with one data_ section each.

    This script parses the given CIF files to separate the data blocks, so it is capable of splitting incorrectly formatted and nested CIF files.

  • cif_split_primitive – split CIF files into separate files with one data_ section each.

    This is a very naive and primitive version of the splitter, which expects each data_... section to start on a new line. It may fail on CIF files that do not follow this convention. Splitting arbitrary correctly formatted CIF files requires full CIF parsing, that is, tokenisation of the file according to the CIF grammar.
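
The line-based approach and its limitation can be illustrated with a minimal sketch. This is not the code of cif_split_primitive: the function name naive_cif_split and the output file naming are hypothetical and chosen only for illustration. A new output file is started whenever a line begins with data_, so a multi-line text field that happens to contain such a line would be cut in two, which is why a full parser is needed in the general case.

```shell
# Hypothetical illustration of line-based splitting; NOT the code of cif_split_primitive.
# Usage: naive_cif_split INPUT.cif OUTPUT_DIR
naive_cif_split() {
    input=$1
    outdir=$2
    mkdir -p "$outdir"
    awk -v outdir="$outdir" '
        # A line starting with "data_" is assumed to open a new data block.
        /^data_/ {
            if (out != "") close(out)
            # Name the output file after the block name (text following "data_").
            out = outdir "/" substr($1, 6) ".cif"
        }
        # Lines before the first data_ header are skipped.
        out != "" { print > out }
    ' "$input"
}
```

Calling naive_cif_split input.cif parts would write one file per data block into the parts directory, provided that every block header starts a line and block names are unique.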

Installation

The cod-tools package is distributed as source code. On Debian and Ubuntu operating systems it can also be installed as a binary package from the standard software repositories.

From package repositories

cod-tools can be installed from the standard package repositories of Debian (since Debian 10 Buster) and Ubuntu (since Ubuntu 18.10 Cosmic Cuttlefish):

sudo apt-get install cod-tools

From the source

To build the package from source, follow these steps:

  • Get the source:

    • Version 3.10.0 (can be downloaded as a tbz2 archive or a tar.gz archive). To unpack the tbz2 archive, run:

          tar -xf cod-tools-3.10.0.tbz2
          mv cod-tools-3.10.0 cod-tools
      
    • The latest revision can be retrieved from the Subversion repository:

          svn co svn://www.crystallography.net/cod-tools/trunk cod-tools
      
    • Both the current and the previous releases (for example, version 3.10.0) can be retrieved from the Subversion repository too:

          svn co svn://www.crystallography.net/cod-tools/tags/v3.10.0 cod-tools-3.10.0
      
  • Install the dependencies:

      sh cod-tools/dependencies/Ubuntu-20.04/install.sh
    

    Note: the dependency installer is written for Ubuntu 20.04, but it also works on some older and newer Ubuntu releases as well as on Debian.

  • Build:

      make -C cod-tools all
    
  • Check:

      make -C cod-tools check
    
  • Set up:

    • Install:

          make -C cod-tools install
      

      This command places executables and libraries in standard locations under /usr, where most Unix-like operating systems should find them automatically. The Perl library path may need to be set manually (this is indicated by Can't locate ... in @INC errors). It can be set by adjusting the PERL5LIB environment variable:

          export PERL5LIB=/usr/lib/perl5:${PERL5LIB}
      

      The install destination can be changed by passing the PREFIX variable to make install, for example:

          make -C cod-tools install PREFIX=/usr/local
      
    • Prepare the environment for usage without installing. Two methods of setting up the environment for cod-tools are described below; they apply to source revision 4854 or newer. ${PYTHON_VERSION} stands for the Python version in use:

      • Using Bash:

            CODTOOLS_SRC=~/src/cod-tools
            export PATH=${CODTOOLS_SRC}/scripts:${PATH}
            export PERL5LIB=${CODTOOLS_SRC}/src/lib/perl5:${PERL5LIB}
            export PYTHONPATH=${CODTOOLS_SRC}/src/components/pycodcif/build/python${PYTHON_VERSION}:${PYTHONPATH}
        

        These commands can be added to the ~/.bashrc file, which is sourced automatically when a new interactive shell is opened.

      • Using modulefile:

            #%Module1.0#####################################################################
            module-whatis   "loads the cod-tools environment"
            # Tcl does not expand '~', so the home directory is taken from the environment.
            set             CODTOOLS_SRC    $env(HOME)/src/cod-tools
            prepend-path    PATH            ${CODTOOLS_SRC}/scripts
            prepend-path    PERL5LIB        ${CODTOOLS_SRC}/src/lib/perl5
            # Replace ${PYTHON_VERSION} below with the Python version in use.
            prepend-path    PYTHONPATH      ${CODTOOLS_SRC}/src/components/pycodcif/build/python${PYTHON_VERSION}
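
For the Bash method, lines placed in ~/.bashrc are evaluated again in every nested shell, so the same directory can end up in a variable several times, and prepending to an empty variable leaves a trailing colon (an empty list element). The sketch below shows one possible way to avoid both. The helper name prepend_path is hypothetical and not part of cod-tools; the paths are the ones used above.

```shell
# Hypothetical helper: prepend DIR to a colon-separated list unless it is already there.
# Usage: NEW_VALUE=$(prepend_path DIR "$OLD_VALUE")
prepend_path() {
    dir=$1
    list=$2
    case ":$list:" in
        *":$dir:"*) printf '%s\n' "$list" ;;        # already present: leave unchanged
        *) printf '%s\n' "$dir${list:+:$list}" ;;   # no trailing colon if the list is empty
    esac
}

CODTOOLS_SRC=~/src/cod-tools
PATH=$(prepend_path "${CODTOOLS_SRC}/scripts" "${PATH:-}")
PERL5LIB=$(prepend_path "${CODTOOLS_SRC}/src/lib/perl5" "${PERL5LIB:-}")
export PATH PERL5LIB
```

PYTHONPATH can be handled in the same way once the Python version is known.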
        

Examples

  • Fix a syntactically incorrect structure:

    Some simple, common CIF syntax errors can be fixed automatically using cif_filter with the --fix-syntax option. For example, the following structure:

      data_broken
      _publ_section_title "Runaway quote
      loop_
      _atom_site_label
      _atom_site_fract_x
      _atom_site_fract_y
      _atom_site_fract_z
      C 0 0 0
    

    can be fixed (provided it's stored in test.cif):

      cif_filter --fix-syntax test.cif
    

    Obtained structure:

      data_broken
      _publ_section_title              'Runaway quote'
      loop_
      _atom_site_label
      _atom_site_fract_x
      _atom_site_fract_y
      _atom_site_fract_z
      C 0 0 0
    

    A warning message tells what was done:

      cif_filter: test.cif(2) data_broken: WARNING, double-quoted string is missing a closing quote -- fixed.
    
    where:
    • cif_filter is the name of the used script;
    • test.cif is the name of the CIF file;
    • 2 is the number of a line in the file;
    • data_broken is the CIF data block name;
    • WARNING is the level of severity;
    • the rest of the line is the message text.
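
    When such messages are collected from many files, it can help to split them into the fields listed above. The following is a minimal sketch that assumes the layout shown in this example, with fields separated by ": " and ", " and no spaces in the file name; the function name parse_cod_message is hypothetical and not part of cod-tools.

    ```shell
    # Hypothetical helper: split one cod-tools style message into its fields.
    # Assumed layout: PROGRAM: FILE(LINE) BLOCK: LEVEL, MESSAGE  (the "(LINE)" part is optional)
    parse_cod_message() {
        printf '%s\n' "$1" | awk '{
            rest = $0
            i = index(rest, ": ")              # program name ends at the first ": "
            program = substr(rest, 1, i - 1); rest = substr(rest, i + 2)
            i = index(rest, ": ")              # location (file and data block) ends at the next ": "
            location = substr(rest, 1, i - 1); rest = substr(rest, i + 2)
            i = index(rest, ", ")              # severity level ends at the first ", "
            level = substr(rest, 1, i - 1); message = substr(rest, i + 2)
            split(location, part, " ")
            file = part[1]; block = part[2]; lineno = ""
            if (match(file, /\([0-9]+\)$/)) {  # optional "(LINE)" suffix after the file name
                lineno = substr(file, RSTART + 1, RLENGTH - 2)
                file = substr(file, 1, RSTART - 1)
            }
            printf "program=%s\nfile=%s\nline=%s\nblock=%s\nlevel=%s\nmessage=%s\n",
                program, file, lineno, block, level, message
        }'
    }
    ```

    Applied to the warning above, the function reports cif_filter as the program, test.cif as the file, 2 as the line, data_broken as the data block and WARNING as the level.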
  • Fetch a structure from the Web, filter and fix it, restore the crystal contents, and calculate the summary formula of each compound in the crystal:

      curl --silent https://www.crystallography.net/cod/2231955.cif \
          | cif_filter \
          | cif_fix_values \
          | cif_molecule \
          | cif_cell_contents --use-attached-hydrogens
    

    Obtained result:

      C9 H14 N
      C10 H6 O6 S2
      H2 O
    

    As well as a warning message:

      cif_molecule: - data_2231955: WARNING, multiplicity ratios are given instead of multiplicities for 39 atoms -- taking calculated values.
    
  • Fetch a structure from the Web and mark alternative atoms that share the same site:

      curl --silent https://www.crystallography.net/cod/2018107.cif \
          | cif_mark_disorder \
          | cif_select --cif --tag _atom_site_label
    

    Obtained result:

      data_2018107
      loop_
      _atom_site_type_symbol
      _atom_site_label
      _atom_site_fract_x
      _atom_site_fract_y
      _atom_site_fract_z
      _atom_site_u_iso_or_equiv
      _atom_site_adp_type
      _atom_site_calc_flag
      _atom_site_refinement_flags
      _atom_site_occupancy
      _atom_site_symmetry_multiplicity
      _atom_site_disorder_assembly
      _atom_site_disorder_group
      Pb Pb1 0.5000 0.0000 0.2500 0.0213(13) Uani d S 1 4 . .
      Mo Mo2 0.0000 0.0000 0.0000 0.022(4) Uani d S 1 4 . .
      Pb Pb3 0.5000 0.5000 0.0000 0.025(2) Uani d SP 0.881(8) 4 A 1
      Mo Mo3 0.5000 0.5000 0.0000 0.025(2) Uani d SP 0.119(8) 4 A 2
      Mo Mo1 0.0000 0.5000 0.2500 0.018(3) Uani d S 1 4 . .
      O O1 0.2344(13) -0.1372(14) 0.0806(6) 0.0302(17) Uani d . 1 1 . .
      O O2 0.2338(14) 0.3648(14) 0.1697(6) 0.0307(17) Uani d . 1 1 . .
    

    As well as output messages:

      cif_mark_disorder: - data_2018107: NOTE, atoms 'Mo3', 'Pb3' were marked as alternatives.
      cif_mark_disorder: - data_2018107: NOTE, 1 site(s) were marked as disorder assemblies.
    

    Note: atoms Mo3 and Pb3 share the same site, as can be seen from their coordinates. Moreover, the sum of their occupancies is close to 1. In the original CIF file, both _atom_site_disorder_assembly and _atom_site_disorder_group are set to '.' for these atoms.

License

cod-tools has been licensed under the LGPL-3 free software license since v2.4.

Citing

If you use cod-tools in your research, please cite the following papers:

  • Vaitkus, A., Merkys, A. & Gražulis, S. (2021). Validation of the Crystallography Open Database using the Crystallographic Information Framework. Journal of Applied Crystallography, 54(2), 661–672. doi: 10.1107/S1600576720016532

  • Merkys, A., Vaitkus, A., Butkus, J., Okulič-Kazarinas, M., Kairys, V. & Gražulis, S. (2016). COD::CIF::Parser: an error-correcting CIF parser for the Perl language. Journal of Applied Crystallography, 49(1), 292–301. doi: 10.1107/S1600576715022396

  • Gražulis, S., Merkys, A., Vaitkus, A. & Okulič-Kazarinas, M. (2015). Computing stoichiometric molecular composition from crystal structures. Journal of Applied Crystallography, 48(1), 85–91. doi: 10.1107/S1600576714025904