Managing your own datasets: the UserData class
Freva lets you share custom datasets with other users by making them searchable via freva.databrowser. With the help of the UserData class, users can add their own data to the central data location, and (re-)index or delete data in the databrowser.
Note: Any data that has been added by users will be assigned a special project name: project=user-$USER.
Add new data to the databrowser
To be added to the databrowser, files must follow a strict naming standard and reside in a specific location. The add method takes care of both the file naming and the location. The only requirement is that each file is a valid netCDF or grib file. In other words, this method places the user data in the correct location with the correct naming structure.
Suppose you’ve obtained data from somewhere and want to add it to the databrowser to make it accessible to others. In this example we assume that the original data is stored in the /tmp/my_awesome_data folder, e.g. /tmp/my_awesome_data/outfile_0.nc ... /tmp/my_awesome_data/outfile_9.nc
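Before adding anything, it can help to see which files a wildcard pattern would pick up. The sketch below uses Python's standard glob module on a throwaway directory. It assumes, without confirmation from the Freva documentation, that the add method expands wildcards with ordinary shell-style semantics, where ? matches exactly one character. The helper name matching_files is made up for this illustration.

```python
import glob
import os
import tempfile


def matching_files(pattern):
    """Return the sorted paths that a shell-style wildcard pattern matches."""
    return sorted(glob.glob(pattern))


# Build a throwaway directory that mimics the layout of the example data.
demo_dir = tempfile.mkdtemp(prefix="my_awesome_data_")
for number in range(10):
    open(os.path.join(demo_dir, f"outfile_{number}.nc"), "w").close()
# A file with a two-digit number is not matched by a single "?".
open(os.path.join(demo_dir, "outfile_10.nc"), "w").close()

matches = matching_files(os.path.join(demo_dir, "outfile_?.nc"))
print(len(matches))  # prints 10
```

Under that assumption, outfile_?.nc selects outfile_0.nc to outfile_9.nc and leaves out any file whose number has more than one digit.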
The routine will try to gather all necessary metadata from the files. You’ll have to provide additional metadata if mandatory keywords are missing. In this example, you have to provide the institute, model and experiment keywords. Start by importing UserData and databrowser:
[1]:
from freva import UserData, databrowser
Let’s create an instance of the UserData class.
[2]:
user_data = UserData()
The add method adds data to the user’s data directory. The location of this directory is set by the system and can be queried via the user_dir property:
[3]:
# Get the location of the user data
print(user_data.user_dir)
/tmp/user_data/user-runner
Let’s inspect the help of the add method:
[4]:
help(user_data.add)
Help on method add in module freva._user_data:
add(product: 'str', *paths: 'os.PathLike', how: 'str' = 'copy', override: 'bool' = False, **defaults: 'str') -> 'None' method of freva._user_data.UserData instance
Add custom user files to the databrowser.
To be able to add data to the databrowser the file names must
follow a strict standard and the files must reside in a
specific location. This ``add`` method takes care about the correct
file naming and location. No pre requirements other than the file has
to be a valid ``netCDF`` or ``grib`` file are assumed. In other words
this method places the user data with the correct naming structure to
the correct location.
Parameters
----------
product: str
Product search key the newly added data can be found.
*paths: os.PathLike
Filename(s) or Directories that are going to be added to the
databrowser. The files will be added into the central user
directory and named according the CMOR standard. This ensures
that the data can be added into the databrowser.
**Note:** Once the data has been added into the databrowser it can
be found via the ``user-<username>`` project.
how: str, default: copy
Method of how the data is added into the central freva user directory.
Default is copy, which means your data files will be replicated.
To avoid a this redundancy you can set the ``how`` keyword to
``symlink`` for symbolic links or ``link`` for creating hard links
to create symbolic links or ``move`` to move the data into the central
user directory entirely.
override: bool, default: False
Replace existing files in the user data structure.
experiment: str, default: None
By default the method tries to deduce the *experiment* information from
the metadata. To overwrite this information the *experiment* keyword
should be set.
institute: str, default: None
By default the method tries to deduce the *institute* information from
the metadata. To overwrite this information the *institute* keyword
should be set.
model: str, default: None
By default the method tries to deduce the *model* information from
the metadata. To overwrite this information the *model* keyword
should be set.
variable: str, default: None
By default the method tries to deduce the *variable* information from
the metadata. To overwrite this information the *variable* keyword
should be set.
time_frequency: str, default: None
By default the method tries to deduce the *time_frequency* information
from the metadata. To overwrite this information the *time_frequency*
keyword should be set.
ensemble: str, default: None
By default the method tries to deduce the *ensemble* information from
the metadata. To overwrite this information the *ensemble* keyword
should be set.
Raises
------
ValueError: If metadata is insufficient, or product key is empty.
Example
-------
Suppose you've gotten data from somewhere and want to add this data into the
databrowser to make it accessible to others. In this specific
example we assume that you have stored your `original` data in the
``/tmp/my_awesome_data`` folder.
E.g ``/tmp/my_awesome_data/outfile_0.nc...tmp/my_awesome_data/outfile_9.nc``
The routine will try to gather all necessary metadata from the files. You'll
have to provide additional metadata if mandatory keywords are missing.
To make the routine work you'll have to provide the ``institute``, ``model``
and ``experiment`` keywords:
.. execute_code::
from freva import UserData, databrowser
user_data = UserData()
# You can also provide wild cards to search for data
user_data.add("eur-11b", "/tmp/my_awesome_data/outfile_?.nc",
institute="clex", model="UM-RA2T",
experiment="Bias-correct")
# Check the databrowser if the data has been added
for file in databrowser(experiment="bias*"):
print(file)
By default the data is copied. By using the ``how`` keyword you can
also link or move the data.
[5]:
# You can also provide wild cards to search for data
user_data.add("eur-11b", "/tmp/my_awesome_data/outfile_?.nc",
              institute="clex", model="UM-RA2T",
              experiment="Bias-correct")
# Check the databrowser if the data has been added
Status: crawling ...ok
Let’s check whether the data has been added to the databrowser:
[6]:
for file in databrowser(experiment="bias*"):
    print(file)
/tmp/user_data/user-runner/eur-11b/clex/UM-RA2T/Bias-correct/hr/user_data/hr/r0i0p0/v20241114/tas/tas_hr_UM-RA2T_Bias-correct_r0i0p0_197001041800-197001050300.nc
/tmp/user_data/user-runner/eur-11b/clex/UM-RA2T/Bias-correct/hr/user_data/hr/r0i0p0/v20241114/tas/tas_hr_UM-RA2T_Bias-correct_r0i0p0_197001040800-197001041700.nc
/tmp/user_data/user-runner/eur-11b/clex/UM-RA2T/Bias-correct/hr/user_data/hr/r0i0p0/v20241114/tas/tas_hr_UM-RA2T_Bias-correct_r0i0p0_197001032200-197001040700.nc
/tmp/user_data/user-runner/eur-11b/clex/UM-RA2T/Bias-correct/hr/user_data/hr/r0i0p0/v20241114/tas/tas_hr_UM-RA2T_Bias-correct_r0i0p0_197001031200-197001032100.nc
/tmp/user_data/user-runner/eur-11b/clex/UM-RA2T/Bias-correct/hr/user_data/hr/r0i0p0/v20241114/tas/tas_hr_UM-RA2T_Bias-correct_r0i0p0_197001030200-197001031100.nc
/tmp/user_data/user-runner/eur-11b/clex/UM-RA2T/Bias-correct/hr/user_data/hr/r0i0p0/v20241114/tas/tas_hr_UM-RA2T_Bias-correct_r0i0p0_197001021600-197001030100.nc
/tmp/user_data/user-runner/eur-11b/clex/UM-RA2T/Bias-correct/hr/user_data/hr/r0i0p0/v20241114/tas/tas_hr_UM-RA2T_Bias-correct_r0i0p0_197001020600-197001021500.nc
/tmp/user_data/user-runner/eur-11b/clex/UM-RA2T/Bias-correct/hr/user_data/hr/r0i0p0/v20241114/tas/tas_hr_UM-RA2T_Bias-correct_r0i0p0_197001012000-197001020500.nc
/tmp/user_data/user-runner/eur-11b/clex/UM-RA2T/Bias-correct/hr/user_data/hr/r0i0p0/v20241114/tas/tas_hr_UM-RA2T_Bias-correct_r0i0p0_197001011000-197001011900.nc
/tmp/user_data/user-runner/eur-11b/clex/UM-RA2T/Bias-correct/hr/user_data/hr/r0i0p0/v20241114/tas/tas_hr_UM-RA2T_Bias-correct_r0i0p0_197001010000-197001010900.nc
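The listing above shows how the added files are laid out below the user directory. As a rough illustration, the following sketch splits one of these paths into its directory levels. The level names are an assumption inferred from the example output and the keywords passed to add; in particular, the labels realm and cmor_table for the user_data and second hr levels are guesses rather than something this page confirms. The function split_user_path is a hypothetical helper, not part of Freva.

```python
from pathlib import PurePosixPath

# ASSUMPTION: these level names are inferred from the example output above,
# not taken from the Freva documentation.
ASSUMED_LEVELS = (
    "product", "institute", "model", "experiment", "time_frequency",
    "realm", "cmor_table", "ensemble", "version", "variable",
)


def split_user_path(path, user_dir):
    """Map the directory levels below user_dir to the assumed level names."""
    relative = PurePosixPath(path).relative_to(user_dir)
    *directories, file_name = relative.parts
    if len(directories) != len(ASSUMED_LEVELS):
        raise ValueError(f"unexpected directory depth in {path!r}")
    levels = dict(zip(ASSUMED_LEVELS, directories))
    levels["file_name"] = file_name
    return levels


example = (
    "/tmp/user_data/user-runner/eur-11b/clex/UM-RA2T/Bias-correct/hr/"
    "user_data/hr/r0i0p0/v20241114/tas/"
    "tas_hr_UM-RA2T_Bias-correct_r0i0p0_197001010000-197001010900.nc"
)
levels = split_user_path(example, "/tmp/user_data/user-runner")
print(levels["product"], levels["institute"], levels["model"], levels["variable"])
```

The product, institute, model and experiment levels correspond directly to the values passed to add in the cell above.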
By default the data is copied. By using the how keyword you can also link or move the data.
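According to the help text above, the how keyword accepts copy, symlink, link and move. A minimal sketch of a wrapper that checks this value before calling add could look like the following. The wrapper name add_with_how is hypothetical, and the user_data argument is expected to be a freva.UserData instance; only the add signature shown in the help output is relied upon.

```python
# Values for ``how`` listed in the help text of ``add`` shown above.
ALLOWED_HOW = ("copy", "symlink", "link", "move")


def add_with_how(user_data, product, *paths, how="symlink", **facets):
    """Call ``user_data.add`` after checking the ``how`` value.

    ``user_data`` is expected to be a ``freva.UserData`` instance.
    """
    if how not in ALLOWED_HOW:
        raise ValueError(f"how must be one of {ALLOWED_HOW}, got {how!r}")
    user_data.add(product, *paths, how=how, **facets)


# Usage with a real instance (requires a working Freva setup):
# from freva import UserData
# add_with_how(UserData(), "eur-11b", "/tmp/my_awesome_data/outfile_?.nc",
#              institute="clex", model="UM-RA2T", experiment="Bias-correct")
```

Linking avoids duplicating the files, while move removes them from their original location, so the choice depends on whether the source folder should stay intact.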
Remove your data from the databrowser
The delete method removes entries from the databrowser and, if requested, the corresponding files from the central user data location. Let’s inspect the help of the delete method first:
[7]:
help(user_data.delete)
Help on method delete in module freva._user_data:
delete(*paths: 'os.PathLike', delete_from_fs: 'bool' = False) -> 'None' method of freva._user_data.UserData instance
Delete data from the databrowser.
The methods deletes user data from the databrowser.
Parameters
----------
*paths: os.PathLike
Filename(s) or Directories that are going to be from the
databrowser.
delete_from_fs: bool, default : False
Do not only delete the files from the databrowser but also from their
central location where they have been added to.
Raises
------
ValidationError:
If crawl_dirs do not belong to current user.
Example
-------
Any data in the central user directory that belongs to the user can
be deleted from the databrowser and also from the central data location:
.. execute_code::
from freva import UserData
user_data = UserData()
user_data.delete(user_data.user_dir)
[8]:
user_data.delete(user_data.user_dir)
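The help text states that a ValidationError is raised if the given paths do not belong to the current user. The sketch below shows one way to check this up front with pathlib before calling delete. It is an illustration under the assumption that "belonging to the user" means lying inside user_dir; it is not Freva's own validation logic, and the names belongs_to_user_dir and safe_delete are hypothetical.

```python
from pathlib import Path


def belongs_to_user_dir(path, user_dir):
    """Return True if path is user_dir itself or lies somewhere below it."""
    resolved = Path(path).resolve()
    root = Path(user_dir).resolve()
    return resolved == root or root in resolved.parents


def safe_delete(user_data, *paths, delete_from_fs=False):
    """Call ``user_data.delete`` only for paths inside ``user_data.user_dir``."""
    outside = [p for p in paths if not belongs_to_user_dir(p, user_data.user_dir)]
    if outside:
        raise ValueError(f"paths outside the user directory: {outside}")
    user_data.delete(*paths, delete_from_fs=delete_from_fs)
```

With the default delete_from_fs=False only the databrowser entries are removed, so the files stay in the central location and can be indexed again later.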
Check the databrowser to see whether the data has been removed:
[9]:
databrowser(experiment="bias*", count=True)
[9]:
<generator object SolrFindFiles._search at 0x7f204430e110>
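The output above is a generator object rather than a number, so it does not directly show whether any entries remain. Since the earlier cell iterates over the databrowser results in a for loop, one simple alternative is to consume the results and count them. The helper count_hits below is a hypothetical name and works on any iterable. The example uses an empty generator as a stand-in for a search that finds nothing.

```python
def count_hits(search_results):
    """Count the entries of any iterable, for example a databrowser search."""
    return sum(1 for _ in search_results)


# Stand-in for ``databrowser(experiment="bias*")`` after a successful delete:
# a generator that yields nothing.
empty_search = (path for path in [])
print(count_hits(empty_search))  # prints 0
```

With a working Freva setup, count_hits(databrowser(experiment="bias*")) would be expected to return 0 once the data has been removed.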
(Re)-Index existing data to the databrowser
Using the index method, the databrowser can be updated with existing user data. For example, if data has been removed from the databrowser, it can be re-added. Again, inspect the help first:
[10]:
help(user_data.index)
Help on method index in module freva._user_data:
index(*crawl_dirs: 'os.PathLike', dtype: 'str' = 'fs', continue_on_errors: 'bool' = False, **kwargs: 'bool') -> 'None' method of freva._user_data.UserData instance
Index and add user output data to the databrowser.
This method can be used to update the databrowser for existing user data.
Parameters
----------
crawl_dirs:
The data path(s) that needs to be crawled.
dtype:
The data type, currently only files on the file system are supported.
continue_on_errors:
Continue indexing on error.
Raises
------
ValidationError:
If crawl_dirs do not belong to current user.
Example
-------
If data has been removed from the databrowser it can be re added using
the ``index`` method:
.. execute_code::
from freva import UserData
user_data = UserData()
user_data.index()
[11]:
user_data.index()
Status: crawling ...ok
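The index signature shown above also accepts explicit directories and a continue_on_errors flag. As a hedged sketch, the helper below passes the user directory explicitly when no directories are given and keeps crawling if individual files fail. The name reindex_user_data is hypothetical, and user_data is expected to be a freva.UserData instance.

```python
def reindex_user_data(user_data, *crawl_dirs, continue_on_errors=True):
    """Re-index the given directories, defaulting to the user's own directory."""
    targets = crawl_dirs or (user_data.user_dir,)
    user_data.index(*targets, continue_on_errors=continue_on_errors)
    return targets


# Usage with a real instance (requires a working Freva setup):
# from freva import UserData
# reindex_user_data(UserData())
```

Returning the crawled directories makes it easy to log what was re-indexed.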