Managing your own datasets: the UserData class

Freva lets you share custom datasets with other users by making them searchable via freva.databrowser. The UserData class lets users add their own data to the central data location, (re-)index it, or delete it from the databrowser.

Note: Any data that has been added by users will be assigned a special project name: project=user-$USER.
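Because user data ends up in its own project, you only need that project name to restrict a search to your uploads. The sketch below builds the string following the `user-$USER` convention. The helper name `user_project_name` and the `getpass` fallback are assumptions for illustration. They are not part of freva.

```python
import getpass
import os
from typing import Optional


def user_project_name(username: Optional[str] = None) -> str:
    """Return the project name under which user-added data is listed.

    Follows the ``user-$USER`` convention. Falling back to
    ``getpass.getuser()`` when $USER is unset is an assumption of this sketch.
    """
    if username is None:
        username = os.environ.get("USER") or getpass.getuser()
    return f"user-{username}"


print(user_project_name("runner"))  # user-runner
# A search restricted to your own data could then look like:
# databrowser(project=user_project_name())
```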

Add new data to the databrowser

Files can only be added to the databrowser if their names follow a strict standard and they reside in a specific location. The add method takes care of both: it renames the user data according to this standard and places it in the correct location. The only prerequisite is that each file is a valid netCDF or grib file.
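The naming standard is not spelled out here. It can, however, be read off the paths that the databrowser lists after the add step further down.

The hypothetical helper `target_path` below reconstructs that layout from its facets. It is inferred from that single listing only. The slot names `realm` (for the `user_data` component) and `cmor_table` (for the second `hr`) are guesses. Treat it as an illustration, not as freva's actual naming code.

```python
from pathlib import PurePosixPath
from typing import Optional


def target_path(
    user_dir: str,
    product: str,
    institute: str,
    model: str,
    experiment: str,
    time_frequency: str,
    ensemble: str,
    version: str,
    variable: str,
    time_range: str,
    realm: str = "user_data",          # guessed name for the "user_data" slot
    cmor_table: Optional[str] = None,  # guessed name for the second "hr" slot
    suffix: str = ".nc",
) -> PurePosixPath:
    """Rebuild the directory and file layout seen in the databrowser listing."""
    cmor_table = cmor_table or time_frequency
    directory = PurePosixPath(user_dir).joinpath(
        product, institute, model, experiment, time_frequency,
        realm, cmor_table, ensemble, version, variable,
    )
    # File name: variable_table_model_experiment_ensemble_start-end.suffix
    filename = "_".join(
        [variable, cmor_table, model, experiment, ensemble, time_range]
    ) + suffix
    return directory / filename


path = target_path(
    user_dir="/tmp/user_data/user-runner", product="eur-11b",
    institute="clex", model="UM-RA2T", experiment="Bias-correct",
    time_frequency="hr", ensemble="r0i0p0", version="v20241114",
    variable="tas", time_range="197001010000-197001010900",
)
print(path)
```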

Suppose you have obtained data from somewhere and want to add it to the databrowser to make it accessible to others. In this example we assume that you have stored your original data in the /tmp/my_awesome_data folder, e.g. /tmp/my_awesome_data/outfile_0.nc ... /tmp/my_awesome_data/outfile_9.nc. The routine tries to gather all necessary metadata from the files. If mandatory keywords are missing, you have to provide them yourself. For the routine to work with this data, you have to provide the institute, model and experiment keywords:

[1]:
from freva import UserData, databrowser

Let’s create an instance of the UserData class.

[2]:
user_data = UserData()

The add method places data in the user's data directory. The location of this directory is set by the system and can be queried via the user_dir property:

[3]:
# Get the location of the user data
print(user_data.user_dir)
/tmp/user_data/user-runner

Let’s inspect the help of the add method:

[4]:
help(user_data.add)
Help on method add in module freva._user_data:

add(product: 'str', *paths: 'os.PathLike', how: 'str' = 'copy', override: 'bool' = False, **defaults: 'str') -> 'None' method of freva._user_data.UserData instance
    Add custom user files to the databrowser.

    To be able to add data to the databrowser the file names must
    follow a strict standard and the files must reside in a
    specific location. This ``add`` method takes care about the correct
    file naming and location. No pre requirements other than the file has
    to be a valid ``netCDF`` or ``grib`` file are assumed. In other words
    this method places the user data with the correct naming structure to
    the correct location.

    Parameters
    ----------
    product: str
        Product search key the newly added data can be found.
    *paths: os.PathLike
        Filename(s) or Directories that are going to be added to the
        databrowser. The files will be added into the central user
        directory and named according the CMOR standard. This ensures
        that the data can be added into the databrowser.
        **Note:** Once the data has been added into the databrowser it can
        be found via the ``user-<username>`` project.
    how: str, default: copy
        Method of how the data is added into the central freva user directory.
        Default is copy, which means your data files will be replicated.
        To avoid a this redundancy you can set the ``how`` keyword to
        ``symlink`` for symbolic links or ``link`` for creating hard links
        to create symbolic links or ``move`` to move the data into the central
        user directory entirely.
    override: bool, default: False
        Replace existing files in the user data structure.
    experiment: str, default: None
        By default the method tries to deduce the *experiment* information from
        the metadata. To overwrite this information the *experiment* keyword
        should be set.
    institute: str, default: None
        By default the method tries to deduce the *institute* information from
        the metadata. To overwrite this information the *institute* keyword
        should be set.
    model: str, default: None
        By default the method tries to deduce the *model* information from
        the metadata. To overwrite this information the *model* keyword
        should be set.
    variable: str, default: None
        By default the method tries to deduce the *variable* information from
        the metadata. To overwrite this information the *variable* keyword
        should be set.
    time_frequency: str, default: None
        By default the method tries to deduce the *time_frequency* information
        from the metadata. To overwrite this information the *time_frequency*
        keyword should be set.
    ensemble: str, default: None
        By default the method tries to deduce the *ensemble* information from
        the metadata. To overwrite this information the *ensemble* keyword
        should be set.

    Raises
    ------
    ValueError: If metadata is insufficient, or product key is empty.


    Example
    -------

    Suppose you've gotten data from somewhere and want to add this data into the
    databrowser to make it accessible to others. In this specific
    example we assume that you have stored your `original` data in the
    ``/tmp/my_awesome_data`` folder.
    E.g ``/tmp/my_awesome_data/outfile_0.nc...tmp/my_awesome_data/outfile_9.nc``
    The routine will try to gather all necessary metadata from the files. You'll
    have to provide additional metadata if mandatory keywords are missing.
    To make the routine work you'll have to provide the ``institute``, ``model``
    and ``experiment`` keywords:

    .. execute_code::

        from freva import UserData, databrowser
        user_data = UserData()
        # You can also provide wild cards to search for data
        user_data.add("eur-11b", "/tmp/my_awesome_data/outfile_?.nc",
                          institute="clex", model="UM-RA2T",
                          experiment="Bias-correct")
        # Check the databrowser if the data has been added
        for file in databrowser(experiment="bias*"):
            print(file)

    By default the data is copied. By using the ``how`` keyword you can
    also link or move the data.

[5]:
# Wildcards can be used to select several input files at once
user_data.add("eur-11b", "/tmp/my_awesome_data/outfile_?.nc",
              institute="clex", model="UM-RA2T",
              experiment="Bias-correct")
Status: crawling ...ok
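The `?` in the path is a shell-style wildcard. Assuming freva expands it with the usual glob semantics, you can check the selection up front with Python's `fnmatch`, which applies the same pattern rules to plain strings. The file names below are made up to mirror the example folder.

```python
from fnmatch import fnmatch

# Hypothetical names: the ten example files plus two that should NOT match
names = [f"outfile_{i}.nc" for i in range(10)] + ["outfile_10.nc", "outfile_a.grb"]

# "?" matches exactly one character, so outfile_10.nc is not selected,
# and outfile_a.grb fails because of its extension
selected = [name for name in names if fnmatch(name, "outfile_?.nc")]
print(len(selected))  # 10
```

Against the real folder, `glob.glob("/tmp/my_awesome_data/outfile_?.nc")` performs the same match on the file system.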

Let’s check whether the data has been added to the databrowser:

[6]:
for file in databrowser(experiment="bias*"):
    print(file)
/tmp/user_data/user-runner/eur-11b/clex/UM-RA2T/Bias-correct/hr/user_data/hr/r0i0p0/v20241114/tas/tas_hr_UM-RA2T_Bias-correct_r0i0p0_197001041800-197001050300.nc
/tmp/user_data/user-runner/eur-11b/clex/UM-RA2T/Bias-correct/hr/user_data/hr/r0i0p0/v20241114/tas/tas_hr_UM-RA2T_Bias-correct_r0i0p0_197001040800-197001041700.nc
/tmp/user_data/user-runner/eur-11b/clex/UM-RA2T/Bias-correct/hr/user_data/hr/r0i0p0/v20241114/tas/tas_hr_UM-RA2T_Bias-correct_r0i0p0_197001032200-197001040700.nc
/tmp/user_data/user-runner/eur-11b/clex/UM-RA2T/Bias-correct/hr/user_data/hr/r0i0p0/v20241114/tas/tas_hr_UM-RA2T_Bias-correct_r0i0p0_197001031200-197001032100.nc
/tmp/user_data/user-runner/eur-11b/clex/UM-RA2T/Bias-correct/hr/user_data/hr/r0i0p0/v20241114/tas/tas_hr_UM-RA2T_Bias-correct_r0i0p0_197001030200-197001031100.nc
/tmp/user_data/user-runner/eur-11b/clex/UM-RA2T/Bias-correct/hr/user_data/hr/r0i0p0/v20241114/tas/tas_hr_UM-RA2T_Bias-correct_r0i0p0_197001021600-197001030100.nc
/tmp/user_data/user-runner/eur-11b/clex/UM-RA2T/Bias-correct/hr/user_data/hr/r0i0p0/v20241114/tas/tas_hr_UM-RA2T_Bias-correct_r0i0p0_197001020600-197001021500.nc
/tmp/user_data/user-runner/eur-11b/clex/UM-RA2T/Bias-correct/hr/user_data/hr/r0i0p0/v20241114/tas/tas_hr_UM-RA2T_Bias-correct_r0i0p0_197001012000-197001020500.nc
/tmp/user_data/user-runner/eur-11b/clex/UM-RA2T/Bias-correct/hr/user_data/hr/r0i0p0/v20241114/tas/tas_hr_UM-RA2T_Bias-correct_r0i0p0_197001011000-197001011900.nc
/tmp/user_data/user-runner/eur-11b/clex/UM-RA2T/Bias-correct/hr/user_data/hr/r0i0p0/v20241114/tas/tas_hr_UM-RA2T_Bias-correct_r0i0p0_197001010000-197001010900.nc
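The listing mixes two kinds of facets. Some were passed as keywords (`clex`, `UM-RA2T`, `Bias-correct`). Others were not passed and therefore presumably come from the files or from defaults (`tas`, `hr`, `r0i0p0`).

Below is a minimal sketch of such a merge. It assumes that keywords override deduced values and that `institute`, `model` and `experiment` are the mandatory facets. `merge_metadata` is a hypothetical helper. Raising `ValueError` mirrors what the `add` docstring documents for insufficient metadata.

```python
from typing import Dict, Mapping, Optional

# Assumed mandatory facets, taken from the keywords this example has to supply
MANDATORY = ("institute", "model", "experiment")


def merge_metadata(deduced: Mapping[str, str],
                   **overrides: Optional[str]) -> Dict[str, str]:
    """Merge facets read from a file with user-supplied keywords."""
    merged = dict(deduced)
    # Explicit keywords take precedence; None means "keep the deduced value"
    merged.update({k: v for k, v in overrides.items() if v is not None})
    missing = [key for key in MANDATORY if not merged.get(key)]
    if missing:
        raise ValueError(f"insufficient metadata, missing: {', '.join(missing)}")
    return merged


# Hypothetical facets a file header might provide on its own
from_file = {"variable": "tas", "time_frequency": "hr", "ensemble": "r0i0p0"}
facets = merge_metadata(from_file, institute="clex", model="UM-RA2T",
                        experiment="Bias-correct")
print(facets["institute"], facets["variable"])  # clex tas
```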

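By default the data is copied. With the how keyword you can instead create symbolic links (how="symlink") or hard links (how="link"), or move the data into the central user directory (how="move").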
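The four `how` modes correspond to standard file operations. The stdlib sketch below shows what each one does to the original file. `place_file` is a hypothetical helper, not freva's implementation. Note that hard links only work within a single file system.

```python
import os
import shutil
import tempfile
from pathlib import Path


def place_file(source: Path, target: Path, how: str = "copy") -> Path:
    """Place ``source`` at ``target`` using one of the four ``how`` modes."""
    target.parent.mkdir(parents=True, exist_ok=True)
    if how == "copy":
        shutil.copy2(source, target)  # independent duplicate, doubles disk usage
    elif how == "symlink":
        os.symlink(source.resolve(), target)  # breaks if the original is removed
    elif how == "link":
        os.link(source, target)  # hard link: a second name for the same data
    elif how == "move":
        shutil.move(str(source), str(target))  # original path no longer exists
    else:
        raise ValueError(f"unknown mode: {how!r}")
    return target


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "outfile_0.nc")
    src.write_bytes(b"placeholder")  # stand-in bytes, not a real netCDF file
    for mode in ("copy", "symlink", "link"):
        place_file(src, Path(tmp, "central", mode, src.name), how=mode)
    moved = place_file(src, Path(tmp, "central", "move", src.name), how="move")
    source_left, moved_ok = src.exists(), moved.exists()
    print(source_left, moved_ok)  # False True
```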

Remove your data from the databrowser

The delete method removes entries from the databrowser. If requested (delete_from_fs=True), it also removes the corresponding files from the central user data location. Let’s inspect the help of the delete method first:

[7]:
help(user_data.delete)
Help on method delete in module freva._user_data:

delete(*paths: 'os.PathLike', delete_from_fs: 'bool' = False) -> 'None' method of freva._user_data.UserData instance
    Delete data from the databrowser.

    The methods deletes user data from the databrowser.

    Parameters
    ----------

    *paths: os.PathLike
        Filename(s) or Directories that are going to be from  the
        databrowser.
    delete_from_fs: bool, default : False
        Do not only delete the files from the databrowser but also from their
        central location where they have been added to.

    Raises
    ------
    ValidationError:
        If crawl_dirs do not belong to current user.

    Example
    -------

    Any data in the central user directory that belongs to the user can
    be deleted from the databrowser and also from the central data location:

    .. execute_code::

        from freva import UserData
        user_data = UserData()
        user_data.delete(user_data.user_dir)

[8]:
user_data.delete(user_data.user_dir)

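Check whether the data has been removed from the databrowser. Since delete_from_fs was left at its default (False), the files themselves remain in the central user directory: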

[9]:
# databrowser returns a generator of file paths; list() consumes it
list(databrowser(experiment="bias*"))
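Conceptually, `delete` drops every databrowser entry below the given paths and refuses paths outside your user directory. The sketch below models the catalogue as a plain set of paths. `delete_entries` is hypothetical. It raises `PermissionError` as a stand-in for the `ValidationError` named in the docstring.

```python
import os
from pathlib import PurePosixPath
from typing import Set


def delete_entries(index: Set[str], user_dir: str, *paths: str,
                   delete_from_fs: bool = False) -> Set[str]:
    """Return ``index`` without the entries that lie under any of ``paths``."""
    root = PurePosixPath(user_dir)
    targets = [PurePosixPath(p) for p in paths]
    # Ownership check: every path must be the user dir or lie below it
    for target in targets:
        if target != root and root not in target.parents:
            raise PermissionError(f"{target} does not belong to {root}")
    removed = {
        f for f in index
        if any(PurePosixPath(f) == t or t in PurePosixPath(f).parents
               for t in targets)
    }
    if delete_from_fs:  # only then are the files themselves removed
        for f in removed:
            if os.path.exists(f):
                os.remove(f)
    return index - removed


catalogue = {
    "/tmp/user_data/user-runner/eur-11b/clex/a.nc",
    "/tmp/user_data/user-runner/eur-11b/clex/b.nc",
    "/data/project/cmip6/c.nc",
}
remaining = delete_entries(catalogue, "/tmp/user_data/user-runner",
                           "/tmp/user_data/user-runner")
print(sorted(remaining))  # ['/data/project/cmip6/c.nc']
```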

(Re)-Index existing data to the databrowser

Using the index method the databrowser can be updated with existing user data. For example, if data has been removed from the databrowser it can be re-added. Again, inspect the help first:

[10]:
help(user_data.index)
Help on method index in module freva._user_data:

index(*crawl_dirs: 'os.PathLike', dtype: 'str' = 'fs', continue_on_errors: 'bool' = False, **kwargs: 'bool') -> 'None' method of freva._user_data.UserData instance
    Index and add user output data to the databrowser.

    This method can be used to update the databrowser for existing user data.

    Parameters
    ----------
    crawl_dirs:
        The data path(s) that needs to be crawled.
    dtype:
        The data type, currently only files on the file system are supported.
    continue_on_errors:
        Continue indexing on error.

    Raises
    ------
    ValidationError:
        If crawl_dirs do not belong to current user.

    Example
    -------

    If data has been removed from the databrowser it can be re added using
    the ``index`` method:

    .. execute_code::

        from freva import UserData
        user_data = UserData()
        user_data.index()

[11]:
user_data.index()
Status: crawling ...ok
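Re-indexing can be pictured as walking the given directories for data files. The stdlib sketch below is not freva's crawler. `crawl` is a hypothetical helper, and the accepted suffixes are an assumption based on the netCDF/grib requirement. `continue_on_errors` is modeled as skipping directories that cannot be read.

```python
import tempfile
from pathlib import Path
from typing import List

# Assumed extensions for netCDF and GRIB files; freva's real rules may differ
DATA_SUFFIXES = {".nc", ".nc4", ".grb", ".grib", ".grib2"}


def crawl(*crawl_dirs: str, continue_on_errors: bool = False) -> List[str]:
    """Collect data files below each directory in ``crawl_dirs``."""
    found: List[str] = []
    for crawl_dir in crawl_dirs:
        root = Path(crawl_dir)
        try:
            if not root.is_dir():
                raise FileNotFoundError(f"not a directory: {root}")
            found.extend(
                str(p) for p in sorted(root.rglob("*"))
                if p.is_file() and p.suffix in DATA_SUFFIXES
            )
        except OSError:
            if not continue_on_errors:
                raise  # default behaviour: stop at the first problem
            # otherwise skip this directory and continue with the next one
    return found


with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp, "eur-11b", "tas")
    data_dir.mkdir(parents=True)
    for name in ("tas_0.nc", "tas_1.nc", "README.txt"):
        (data_dir / name).write_bytes(b"")
    files = crawl(tmp, str(Path(tmp, "missing")), continue_on_errors=True)
    names = [Path(f).name for f in files]
    print(names)  # ['tas_0.nc', 'tas_1.nc']
```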