Frequently Asked Questions and Best Practices#

This section gives an overview of the most frequently asked questions and of best practices for plugin development.

How can I make use of `Freva` from jupyter notebooks?#

The easiest way to use freva in a jupyter notebook is to log on to the HPC system where freva is installed and load the freva module. Your admin team should tell you the name of the specific freva module. If the module is called freva, simply load it with:

module load freva

After that you can install a new jupyter kernel via:

jupyter-kernel-install python --name freva --display-name FrevaKernel

After you have installed the kernel you can use freva in your jupyter notebook:

[1]:

import freva
# Get an overview of the available freva plugins
freva.get_tools_list()

[1]:

Animator	Animate data on lon/lat grids
DummyPlugin	A dummy plugin
DummyPluginFolders	A dummy plugin with outputdir folder

[2]:

# Get the documentation of a plugin
freva.plugin_doc("animator")

[2]:

Animator (v2022.7.15): Create animations (in gif or mp4 format) This tool creates plots of solr facets and an animation.

Option	Description
input_file	NetCDF input file(s), you can choose multiple files separated by a , or use a global pattern for multiple files chose this option only if you don't want Freva to find files by seach facets (default: )
variable	Variable name (only applicable if you didn't choose an input file) (default: )
project	Project name (only applicable if you didn't choose an input file) (default: )
product	Product name (only applicable if you didn't choose an input file) (default: )
experiment	Experiment name (only applicable if you didn't choose an input file) (default: )
institute	Institute name (only applicable if you didn't choose an input file) (default: )
model	Model name (only applicable if you didn't choose an input file) (default: )
time_frequency	Time frequency name (only applicable if you didn't choose an input file) (default: )
ensemble	Ensemble name (only applicable if you didn't choose an input file) (default: )
start	Define the first time step to be plotted, leave blank if taken from data (default: )
end	Define the last time step to be plotted, leave blank if taken from data (default: )
time_mean	Select a time interval if time averaging/min/max/sum should be applied along the time axis. This can be D for daily, M for monthly 6H for 6 hours etc. Leave blank if the time axis should not be resampled (default) or set to all if you want to collapse the time axis. (default: )
time_method	If resampling of the time axis is chosen (default: no) set the method: mean, max or min. Note: This has only an effect if the above parameter for time_mean is set. (default: mean)
lonlatbox	Set the extend of a rectangular lonlatbox (left_lon, right_lon, lower_lat, upper_lat) (default: )
output_unit	Set the output unit of the variable - leave blank for no conversion. This can be useful if the unit of the input files should be converted, for example for precipitation. Note: Although many conversions are supported, by using the `pint` conversion library. (default: )
vmin	Set the minimum plotting range (leave blank to calculate from data 1st decile) (default: )
vmax	Set the maximum plotting range (leave blank to calculate the 9th decile) (default: )
cmap	Set the colormap, more information on colormaps is available on the matplotlib website. (default: RdYlBu_r)
linecolor	Color of the coast lines in the map (default: k)
projection	Set the global map projection. Note: this should the name of the cartopy projection method (e.g PlatteCarree for Cylindrical Projection). Pleas refer to cartopy website for details. (default: PlateCarree)
proj_centre	Set center longitude of the global map projection. (default: 50)
pic_size	Set the size of the picture (in pixel) (default: 1360,900)
plot_title	Set plot title (default: )
cbar_label	Overwrite default colorbar label by this value (default: )
suffix	Filetype of the animation (default: mp4)
fps	Set the frames per seceonds of the output animation. (default: 5)
extra_scheduler_options	Set additional options for the job submission to the workload manager (, separated). Note: batchmode and web only. (default: --qos=test, --array=20)

How can I use freva in my own python environment?#

If you don’t want to create a new jupyter kernel that points to the python environment where the freva system is installed, you can also install freva into your own python environment. You can either install freva via pip:

python3 -m pip install freva

or conda:

conda install -c conda-forge freva

Afterwards you can use freva in your own environment.

The only difference from the above approach of using the freva python environment is that you have to tell freva which configuration to use:

import freva
# In order to use freva we have to tell where to get the configuration
freva.config(config_file="/path/to/the/freva/config/file.conf")
# Talk to your admins to get the location of the config file.

Please ask your admins where the central freva configuration file is located.
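If the location of the configuration file differs between machines, the lookup can be wrapped in a small helper. The following is only a sketch: the environment variable name MY_FREVA_CONFIG and the function resolve_config_file are made up for this illustration and are not part of freva.

```python
import os
from pathlib import Path


def resolve_config_file(default: str, env_var: str = "MY_FREVA_CONFIG") -> Path:
    """Return the config file named by ``env_var`` or fall back to ``default``.

    Both the function and the variable name are hypothetical,
    they are not part of the freva API.
    """
    candidate = Path(os.environ.get(env_var, default)).expanduser()
    if not candidate.is_file():
        # Fail early with a clear message instead of a confusing error later on
        raise FileNotFoundError(f"No freva configuration found at {candidate}")
    return candidate


# Intended use (requires freva and a real config file):
# import freva
# freva.config(config_file=str(resolve_config_file("/path/to/the/freva/config/file.conf")))
```

Checking the path before calling freva.config gives a clear error message when the file has moved or the variable points to the wrong place.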

How can I add my own plugins?#

If you are working with python and want to load your own plugin definitions, you can also use freva.config to do so. The plugin path consists of the directory that contains the plugin and the name of the plugin module (without the .py extension), separated by a comma. Assuming your plugin definition is saved in ~/freva/mypluging/the_plugin.py, you can load it via:

import freva
freva.config(plugin_path="~/freva/mypluging,the_plugin")

You can load multiple plugins by passing a list:

import freva
freva.config(plugin_path=["~/freva/mypluging,the_plugin_1", "~/freva/mypluging,the_plugin_2"])

Loading plugins can of course be combined with loading a different freva config:

freva.config(config_file="/path/to/the/freva/config/file.conf",
             plugin_path="~/freva/mypluging,the_plugin")
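The plugin_path strings above pair a directory with a module name. If you start from the path of a plugin file, a small helper can derive that string. This is a sketch based on the examples above; plugin_path_from_file is a hypothetical helper, not a freva function.

```python
from pathlib import PurePosixPath


def plugin_path_from_file(plugin_file: str) -> str:
    """Convert a plugin file path into a 'directory,module' string."""
    path = PurePosixPath(plugin_file)
    # parent is the directory, stem is the file name without '.py'
    return f"{path.parent},{path.stem}"


# plugin_path_from_file("~/freva/mypluging/the_plugin.py")
# returns "~/freva/mypluging,the_plugin"
```

The result can be passed as plugin_path to freva.config, either as a single string or as one element of a list.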

How can I use `Freva` in my analysis workflows?#

You can use Freva without creating or applying data analysis plugins. One example would be using the databrowser command in your data analysis workflow:

# Use the databrowser search and pipe the output to ncremap
freva databrowser project=observations experiment=cmorph time_frequency=30min | ncremap -m map.nc -O drc_rgr

Below you can find a python example, which you could use in a notebook:

[3]:

import freva
import xarray as xr

# Open the data with xarray
dset = xr.open_mfdataset(freva.databrowser(project="obs*", time_frequency='30min'), combine='by_coords')['pr']
dset

[3]:

<xarray.DataArray 'pr' (time: 48, lat: 412, lon: 687)> Size: 54MB
dask.array<concatenate, shape=(48, 412, 687), dtype=float32, chunksize=(1, 1, 687), chunktype=numpy.ndarray>
Coordinates:
  * time     (time) datetime64[ns] 384B 2016-09-02 ... 2016-09-02T23:30:00
  * lon      (lon) float32 3kB 255.0 255.1 255.2 255.3 ... 304.8 304.9 305.0
  * lat      (lat) float32 2kB 15.06 15.14 15.21 15.28 ... 44.83 44.9 44.97
Attributes:
    standard_name:  precipitation_flux
    long_name:      Precipitation
    units:          kg m-2 s-1
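xarray cannot open an empty list of files, so it can help to check the search result before opening it. A minimal sketch; collect_files is a hypothetical helper and the search itself is only indicated in the trailing comment.

```python
from typing import Iterable, List


def collect_files(search_result: Iterable[str]) -> List[str]:
    """Materialise a search result and fail early if it is empty."""
    files = sorted(str(path) for path in search_result)
    if not files:
        raise ValueError("The search returned no files, check the search facets.")
    return files


# Intended use (requires freva and xarray):
# files = collect_files(freva.databrowser(project="obs*", time_frequency="30min"))
# dset = xr.open_mfdataset(files, combine="by_coords")["pr"]
```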

Best practice: Using the `Freva` module in a plugin#

As above, you can use the Freva python module within the wrapper API code of a plugin:

[4]:

import json

from evaluation_system.api import plugin, parameters
import freva

class MyPlugin(plugin.PluginAbstract):
    """An analysis plugin that uses the databrowser search."""

    __parameters__ = parameters.ParameterDictionary(
        parameters.SolrField(name="project", facet="project", help="Set the project name"),
        parameters.SolrField(name="variable", facet="variable", help="Set the variable name"),
    )

    def run_tool(self, config_dict):
        """Main plugin method that makes Freva calls."""
        # Search for files
        files = list(freva.databrowser(**config_dict))
        # Save the files to a json file
        with open("/tmp/some_json_file.json", "w") as f:
            json.dump(files, f)
        self.call("external_command /tmp/some_json_file.json")
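If the configuration also contains entries that are not search facets, or facets that were left blank, it may be safer to pass only selected keys to the databrowser. The sketch below keeps the requested facets that are actually set; search_facets is a hypothetical helper, and it assumes that unset parameters arrive as None, as empty strings or as empty lists.

```python
from typing import Any, Dict, Iterable


def search_facets(config_dict: Dict[str, Any], facets: Iterable[str]) -> Dict[str, Any]:
    """Return only the requested facets that have a non-empty value."""
    return {
        key: config_dict[key]
        for key in facets
        # Assumption: None, "" and [] all mean "not set"
        if config_dict.get(key) not in (None, "", [])
    }


# Intended use inside run_tool (requires freva):
# files = list(freva.databrowser(**search_facets(config_dict, ["project", "variable"])))
```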

Best practice: Making calls to external commands with complex command line parameters#

If you have a plugin that makes calls to a command line interface (like a shell script) you should avoid long command line argument calls like this:

[5]:

import json

from evaluation_system.api import plugin, parameters
import freva
class MyPlugin(plugin.PluginAbstract):

    def run_tool(self, config_dict):
        result = self.call('%s/main.py %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s'
                   % (self.class_basedir, config_dict['inputdir'], config_dict['project'],
                      config_dict['product'], config_dict['institute'], config_dict['model'],
                      config_dict['experiment'], config_dict['time_frequency'],
                      config_dict['variable'], config_dict['ensemble'], config_dict['cmor_table'],
                      config_dict['outputdir'], config_dict['cache'], config_dict['cacheclear'],
                      config_dict['seldate'], config_dict['latlon'], config_dict['area_weight'],
                      config_dict['percentile_threshold'], config_dict['threshold'],
                      config_dict['persistence'], config_dict['sel_extr'], config_dict['dryrun'])
        )

Such calls are very hard to read and should be avoided. Instead you can use Freva as much as possible within the wrapper code and save the relevant results into a json file that is passed as a single argument to the tool. Using the above example, the code could be simplified as follows:

[6]:

import json
from tempfile import NamedTemporaryFile
from evaluation_system.api import plugin, parameters
import freva

class MyPlugin(plugin.PluginAbstract):

    def run_tool(self, config_dict):
        config_dict['input_files'] = list(
            freva.databrowser(
                product=config_dict.pop('product'), project=config_dict.pop('project'),
                institute=config_dict.pop('institute'), model=config_dict.pop('model'),
                experiment=config_dict.pop('experiment'), variable=config_dict.pop('variable'),
                ensemble=config_dict.pop('ensemble'), cmor_table=config_dict.pop('cmor_table')
            )
        )
        with NamedTemporaryFile(suffix='.json') as tf:
            with open(tf.name, 'w') as f:
                json.dump(config_dict, f)
            self.call(f"{self.class_basedir}/main.py {tf.name}")

We use json here because most scripting and programming languages provide a json parser that the tool can conveniently use.
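On the receiving side the tool only has to read a single file. The sketch below shows what a hypothetical main.py written in python might do. The function names load_config and main are illustrative, and the input_files key is the one set by the wrapper above; none of this is prescribed by Freva.

```python
import json
from typing import Any, Dict, List


def load_config(argv: List[str]) -> Dict[str, Any]:
    """Read the json file whose path is the only command line argument."""
    if len(argv) != 1:
        raise SystemExit("usage: main.py CONFIG.json")
    with open(argv[0]) as f:
        return json.load(f)


def main(argv: List[str]) -> int:
    """Hypothetical entry point of the tool."""
    config = load_config(argv)
    # 'input_files' was added to the configuration by the plugin wrapper
    input_files = config.get("input_files", [])
    print(f"Found {len(input_files)} input file(s)")
    return 0
```

In a real script you would finish with sys.exit(main(sys.argv[1:])). All remaining settings are available as ordinary dictionary entries, so adding a new plugin parameter does not change the command line.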