Frequently Asked Questions and best practices#

This section gives an overview of the most commonly asked questions and best practices when it comes to plugin development.

How can I make use of Freva from jupyter notebooks?#

The easiest way to use freva in a jupyter notebook is to log on to the HPC system where freva is installed and load the freva module. The name of the specific freva module should be communicated by your admin team. If the name of the freva module is freva, then simply load it with:

module load freva

After that you can install a new jupyter kernel via:

jupyter-kernel-install python --name freva --display-name FrevaKernel

After you have installed the kernel you can use freva in your jupyter notebook:

[1]:
import freva
# Get an overview of the available freva plugins
freva.get_tools_list()
[1]:
Animator: Animate data on lon/lat grids
DummyPlugin: A dummy plugin
DummyPluginFolders: A dummy plugin with outputdir folder
[2]:
# Get the documentation of a plugin
freva.plugin_doc("animator")
[2]:
Animator (v2022.7.15): Create animations (in gif or mp4 format). This tool creates plots of solr facets and an animation.

Option      Description
input_file NetCDF input file(s); you can choose multiple files separated by a , or use a glob pattern for multiple files. Choose this option only if you don't want Freva to find files by search facets (default: )
variable Variable name (only applicable if you didn't choose an input file) (default: )
project Project name (only applicable if you didn't choose an input file) (default: )
product Product name (only applicable if you didn't choose an input file) (default: )
experiment Experiment name (only applicable if you didn't choose an input file) (default: )
institute Institute name (only applicable if you didn't choose an input file) (default: )
model Model name (only applicable if you didn't choose an input file) (default: )
time_frequency Time frequency name (only applicable if you didn't choose an input file) (default: )
ensemble Ensemble name (only applicable if you didn't choose an input file) (default: )
start Define the first time step to be plotted, leave blank if taken from data (default: )
end Define the last time step to be plotted, leave blank if taken from data (default: )
time_mean Select a time interval if time averaging/min/max/sum should be applied along the time axis. This can be D for daily, M for monthly, 6H for 6 hours etc. Leave blank if the time axis should not be resampled (default) or set to all if you want to collapse the time axis. (default: )
time_method If resampling of the time axis is chosen (default: no) set the method: mean, max or min. Note: this only has an effect if the above time_mean parameter is set. (default: mean)
lonlatbox Set the extent of a rectangular lonlatbox (left_lon, right_lon, lower_lat, upper_lat) (default: )
output_unit Set the output unit of the variable; leave blank for no conversion. This can be useful if the unit of the input files should be converted, for example for precipitation. Note: many conversions are supported via the `pint` conversion library. (default: )
vmin Set the minimum plotting range (leave blank to calculate the 1st decile from the data) (default: )
vmax Set the maximum plotting range (leave blank to calculate the 9th decile) (default: )
cmap Set the colormap; more information on colormaps is available on the matplotlib website. (default: RdYlBu_r)
linecolor Color of the coast lines in the map (default: k)
projection Set the global map projection. Note: this should be the name of the cartopy projection method (e.g. PlateCarree for a cylindrical projection). Please refer to the cartopy website for details. (default: PlateCarree)
proj_centre Set the center longitude of the global map projection. (default: 50)
pic_size Set the size of the picture (in pixels) (default: 1360,900)
plot_title Set the plot title (default: )
cbar_label Overwrite the default colorbar label with this value (default: )
suffix Filetype of the animation (default: mp4)
fps Set the frames per second of the output animation. (default: 5)
extra_scheduler_options Set additional options for the job submission to the workload manager (, separated). Note: batchmode and web only. (default: --qos=test, --array=20)

How can I use freva in my own python environment?#

If you don’t want to create a new jupyter kernel that points to the python environment where the freva system is installed, you can also install freva into your own python environment. You can install freva either via pip:

python3 -m pip install freva

or conda:

conda install -c conda-forge freva

Afterwards you can use freva in your own environment.

The only difference to the above approach of using the freva python environment is that you will have to tell freva to use the correct configuration.

import freva
# To use freva we have to tell it where to find the configuration.
freva.config(config_file="/path/to/the/freva/config/file.conf")
# Talk to your admins to get the location of the config file.

Please talk to your admins on where the central freva configuration is located.

How can I add my own plugins?#

If you are working with python and want to load your own plugin definitions, you can use freva.config to do so. Assuming your plugin definition is saved in ~/freva/myplugin/the_plugin.py, you can load the plugin via:

import freva
freva.config(plugin_path="~/freva/myplugin,the_plugin")

You can load multiple plugins by passing a list:

import freva
freva.config(plugin_path=["~/freva/myplugin,the_plugin_1", "~/freva/myplugin,the_plugin_2"])

Loading plugins can of course be combined with loading a different freva config:

freva.config(config_file="/path/to/the/freva/config/file.conf",
             plugin_path="~/freva/myplugin,the_plugin")

How can I use Freva in my analysis workflows?#

You can use Freva without creating or applying data analysis plugins. One example would be using the databrowser command in your data analysis workflow:

# Use the databrowser search and pipe the output to ncremap
freva databrowser project=observations experiment=cmorph time_frequency=30min | ncremap -m map.nc -O drc_rgr

Below you can find a python example, which you could use in a notebook:

[3]:
import freva
import xarray as xr

# Open the data with xarray
dset = xr.open_mfdataset(freva.databrowser(project="obs*", time_frequency='30min'), combine='by_coords')['pr']
dset
[3]:
<xarray.DataArray 'pr' (time: 48, lat: 412, lon: 687)> Size: 54MB
dask.array<concatenate, shape=(48, 412, 687), dtype=float32, chunksize=(1, 1, 687), chunktype=numpy.ndarray>
Coordinates:
  * time     (time) datetime64[ns] 384B 2016-09-02 ... 2016-09-02T23:30:00
  * lon      (lon) float32 3kB 255.0 255.1 255.2 255.3 ... 304.8 304.9 305.0
  * lat      (lat) float32 2kB 15.06 15.14 15.21 15.28 ... 44.83 44.9 44.97
Attributes:
    standard_name:  precipitation_flux
    long_name:      Precipitation
    units:          kg m-2 s-1

Best practice: Using the Freva module in a plugin#

Like above, you can use the Freva python module within the wrapper API code of your plugin:

[4]:
import json

from evaluation_system.api import plugin, parameters
import freva

class MyPlugin(plugin.PluginAbstract):
    """An analysis plugin that uses the databrowser search."""

    __parameters__ = parameters.ParameterDictionary(
        parameters.SolrField(name="project", facet="project", help="Set the project name"),
        parameters.SolrField(name="variable", facet="variable", help="Set the variable name"),
    )

    def run_tool(self, config_dict):
        """Main plugin method that makes Freva calls."""
        # Search for files
        files = list(freva.databrowser(**config_dict))
        # Save the files to a json file
        with open("/tmp/some_json_file.json", "w") as f:
            json.dump(files, f)
        self.call("external_command /tmp/some_json_file.json")

Best practice: Making calls to external commands with complex command line parameters#

If your plugin makes calls to a command line interface (like a shell script), you should avoid long command line argument calls like this:

[5]:
import json

from evaluation_system.api import plugin, parameters
import freva
class MyPlugin(plugin.PluginAbstract):

    def run_tool(self, config_dict):
        result = self.call('%s/main.py %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s'
                   % (self.class_basedir, config_dict['inputdir'], config_dict['project'],
                      config_dict['product'], config_dict['institute'], config_dict['model'],
                      config_dict['experiment'], config_dict['time_frequency'],
                      config_dict['variable'], config_dict['ensemble'], config_dict['cmor_table'],
                      config_dict['outputdir'], config_dict['cache'], config_dict['cacheclear'],
                      config_dict['seldate'], config_dict['latlon'], config_dict['area_weight'],
                      config_dict['percentile_threshold'], config_dict['threshold'],
                      config_dict['persistence'], config_dict['sel_extr'], config_dict['dryrun'])
        )

Such calls are very hard to read and should be avoided. Instead, use Freva as much as possible within the wrapper code and save the relevant results into a json file that is passed as a single argument to the tool. Using the above example, the code could be simplified as follows:

[6]:
import json
from tempfile import NamedTemporaryFile
from evaluation_system.api import plugin, parameters
import freva

class MyPlugin(plugin.PluginAbstract):

    def run_tool(self, config_dict):
        config_dict['input_files'] = list(
            freva.databrowser(
                product=config_dict.pop('product'), project=config_dict.pop('project'),
                institute=config_dict.pop('institute'), model=config_dict.pop('model'),
                experiment=config_dict.pop('experiment'), variable=config_dict.pop('variable'),
                ensemble=config_dict.pop('ensemble'), cmor_table=config_dict.pop('cmor_table')
            )
        )
        with NamedTemporaryFile(suffix='.json') as tf:
            with open(tf.name, 'w') as f:
                json.dump(config_dict, f)
            self.call(f"{self.class_basedir}/main.py {tf.name}")

Here we use json because most scripting and programming languages provide a json parser, which makes the configuration file convenient to read on the tool side.
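To illustrate the receiving side, here is a minimal sketch of what a tool like the main.py above could do with the json file it gets as its single command line argument. The key names (input_files, variable) are illustrative assumptions, not part of Freva; the round trip below writes a config the same way the wrapper does and reads it back:

```python
import json
import tempfile


def read_config(path):
    """Load the plugin configuration that the wrapper dumped to json."""
    with open(path) as f:
        return json.load(f)


# Simulate the wrapper side: dump a config dict to a json file
# (in the real plugin this is the NamedTemporaryFile written in run_tool).
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    json.dump({"input_files": ["a.nc", "b.nc"], "variable": "pr"}, tmp)

# Simulate the tool side: in main.py this path would come from sys.argv[1].
config = read_config(tmp.name)
print(config["variable"])          # pr
print(len(config["input_files"]))  # 2
```

From here the tool works with an ordinary dictionary instead of parsing twenty positional command line arguments.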