Data Management
As the spatial and temporal resolution of climate models increases now and will continue to increase in the future, so does the demand for computing resources and storage space. Established working methods and processes no longer hold up and need to be rethought.
Taking the German Climate Computing Centre (DKRZ) as an example, we analyze its users, their goals and their working methods. DKRZ provides the climate science community (including CLICCS members) with resources such as high-performance computing (HPC), data storage and specialized services, and hosts the World Data Center for Climate (WDCC). In analyzing users, we distinguish two groups: those who need the HPC system to run resource-intensive simulations and then analyze them, and those who reuse, build on and analyze existing data; each group subdivides further into subgroups. We analyzed the workflows of each identified user type, found parts that are identical in abstracted form, and from these derived Canonical Workflow Modules.
These modules are "Data and Software (Re)use", "Compute", "Data and Software Storing", "Data and Software Publication" and "Generating Knowledge"; in their entirety they form the basis for a Canonical Workflow Framework for Research (CWFR).
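To make the module sequence concrete, the following minimal Python sketch chains the five modules in the order a simulation-based study typically passes through them. It is an illustration only: the function names, arguments and the dictionary-based hand-over between stages are our own assumptions, not DKRZ or CWFR specifications.

```python
# Illustrative sketch: module names follow the text above; all function names,
# paths and identifiers are hypothetical placeholders.

def data_and_software_reuse(query):
    """Locate existing data, software and configurations to start from."""
    return {"inputs": f"results of search for: {query}"}

def compute(staged):
    """Run the resource-intensive simulation or analysis on the HPC system."""
    return {**staged, "output": "raw model output"}

def data_and_software_storing(results):
    """Store intermediate and final results together with provenance."""
    return {**results, "stored_at": "/work/project/experiment-01"}

def data_and_software_publication(stored):
    """Publish selected datasets and software, e.g. via a repository such as WDCC."""
    return {**stored, "identifier": "doi:10.xxxx/placeholder"}

def generating_knowledge(published):
    """Interpret the published results, e.g. in a paper or follow-up study."""
    return {**published, "knowledge": "findings derived from the output"}

# A canonical workflow is the composition of the five modules in order.
record = data_and_software_reuse("historical simulations")
for module in (compute, data_and_software_storing,
               data_and_software_publication, generating_knowledge):
    record = module(record)
print(record["identifier"])
```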
In the process, we critically examine the advantages and drawbacks of so-called FAIR Digital Objects (FDOs) and assess to what extent the derived workflows and workflow modules are actually future-proof.
The vision is a global, integrated data space formed by standardized, independent and persistent entities that contain all information about diverse data objects (data, documents, metadata, software, etc.), so that human and, above all, machine agents can find, access, interpret and reuse (FAIR) them efficiently and at low cost. At the same time, these entities become independent of specific technologies and of the heterogeneous organization of data, and carry a built-in mechanism that supports data sovereignty. This makes the handling of data sustainable and secure.
Each step in a research workflow can thus be an FDO. In this case the research is fully reproducible, while individual parts can be exchanged so that, for example, experiments can be varied transparently. FDOs can easily be linked to one another. Data redundancy is minimized, which in turn reduces the susceptibility to errors. FDOs open up the possibility of combining data, software or whole parts of workflows in new, simple and at all times comprehensible ways. This work has been accepted in Data Intelligence (I. Anders, K. Peters-von Gehlen, H. Thiemann, 2022: Canonical Workflows in simulation-based Climate Sciences. Data Intelligence, accepted).
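As a rough illustration of this idea, the sketch below represents workflow steps as small, self-describing records that carry a persistent identifier, machine-interpretable metadata and references to the records they were derived from. It is a sketch under our own assumptions, not a prescribed FDO implementation; all field names, identifiers and paths are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FDO:
    """Hypothetical minimal FDO-like record; real FDOs follow community specifications."""
    pid: str                  # persistent identifier
    object_type: str          # e.g. "data", "software", "document"
    metadata: dict            # machine-interpretable description
    location: str             # where the bit sequence is stored
    derived_from: tuple = ()  # PIDs of the objects this one builds on

# A simulation-based workflow expressed as linked records: each step references
# its inputs by PID, so the chain stays reproducible and individual parts
# (e.g. the forcing data) can be exchanged transparently.
forcing = FDO("hdl:21.x/forcing-01", "data",
              {"variable": "co2 concentration"}, "/pool/forcing")
model   = FDO("hdl:21.x/model-07", "software",
              {"name": "example climate model", "version": "1.0"}, "/sw/model")
output  = FDO("hdl:21.x/output-42", "data",
              {"experiment": "historical"}, "/work/output",
              derived_from=(forcing.pid, model.pid))

# Varying the experiment means swapping one linked record, not copying the chain.
variant = FDO("hdl:21.x/output-43", "data",
              {"experiment": "scenario"}, "/work/output-variant",
              derived_from=("hdl:21.x/forcing-02", model.pid))
print(output.derived_from, variant.derived_from)
```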
Team: N.N., Andrea Lammert, Ivonne Anders (Alumni)
Contact: lammert@dkrz.de
Links: