MoSGrid
The Project
The chemical industry is one of the most research-intensive sectors of the German economy. Its high level of innovation fosters close cooperation between industry and scientific institutions. MoSGrid (Molecular Simulation Grid) aims to generate competitive advantages for this sector of industry and science through the grid. The key focus of MoSGrid is on setting up and providing grid services for performing molecular simulations. MoSGrid makes the D-Grid infrastructure available for high-performance computing in the area of molecular simulations, including the annotation of results with metadata and their provision for data mining and knowledge generation. MoSGrid aims to support the user in all stages of the simulation process. A portal provides access to data repositories that store information about calculated molecular properties as well as ‘recipes’, i.e. standard methods for the provided applications. With the aid of these recipes, application-specific input files and computing requests can be generated automatically and subsequently submitted to the Grid (pre-processing and job submission). Users will also be supported in evaluating the calculation results. This facilitates the preparation and processing of data for further calculations and analyses derived from them. Additional knowledge will be gained by cross-referencing different result data files. Furthermore, the data repository allows external referencing of simulation results.
The D-Grid initiative is already enabling the supported communities to gain simple access to shared computing resources. Based on this technology and these tools, MoSGrid will integrate the special requirements of chemically oriented scientists into the D-Grid infrastructure. The high complexity of this discipline’s software (e.g. quantum mechanics or molecular dynamics) often makes this technology difficult to access for non-specialist scientists. This difficulty is compounded by the fact that user interfaces, such as graphical access functions, are often unavailable or inadequate. Users benefit greatly from a clear method selection and simple import of molecular data, as well as the automatic set-up of program-specific input data. Consequently, MoSGrid will offer a web-based graphical user interface, which will enable the transparent use of the installed applications. To this end, high-quality standard techniques will be suggested on request, e.g. for basic structure optimisation with quantum chemical methods or standard workflows for molecular dynamics studies, which scientists can modify based on their own requirements. From the information received, the input data for the actual simulation can be generated automatically, supported by so-called ‘adapters’. Based on well-known and established methods, jobs are submitted to the grid and monitored. The adapters will be created, maintained and expanded by the consortium and the users. Simulation results will be extracted automatically after the calculations are complete, assisted by a suitable parser adapted to the special output formats of the different programs, and checked for elementary plausibility (post-processing). At the user’s request, these results will be transferred to collaborative data repositories of molecular properties.
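The interplay of recipe, adapter and parser described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Atom` type, the `RECIPE` dictionary, the one-line input header and the `ENERGY=` output lines are hypothetical and do not correspond to the format of any real simulation program or to the actual MoSGrid adapters.

```python
import math
from dataclasses import dataclass


@dataclass
class Atom:
    """One atom of a generic (program-independent) molecule description."""
    element: str
    x: float
    y: float
    z: float


# Hypothetical recipe: a named standard method with its settings.
RECIPE = {"task": "optimize", "method": "B3LYP", "basis": "6-31G*"}


def write_input(atoms, recipe):
    """Adapter (pre-processing): render generic data in a made-up input format."""
    header = f"task={recipe['task']} method={recipe['method']} basis={recipe['basis']}"
    lines = [header]
    for a in atoms:
        lines.append(f"{a.element} {a.x:.4f} {a.y:.4f} {a.z:.4f}")
    return "\n".join(lines) + "\n"


def parse_output(text):
    """Parser (post-processing): extract the last reported energy."""
    energies = [
        float(line.split("=", 1)[1])
        for line in text.splitlines()
        if line.startswith("ENERGY=")
    ]
    if not energies:
        raise ValueError("no energy found in output")
    final = energies[-1]
    # Elementary plausibility check: the value must be a finite number.
    if not math.isfinite(final):
        raise ValueError("implausible energy value")
    return final
```

In this sketch the user only chooses a recipe and supplies a molecule; the program-specific text is produced by `write_input`, and `parse_output` reduces the raw output to the value that would be offered for transfer to a repository.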
Simple access to shared data is, along with the common use of computing capacity, a fundamental basis for the acceptance of grids in business and science. MoSGrid sets up the technological basis for providing the results of extensive molecular simulations, which can subsequently be used, for example, in data mining processes. Parsers aid the generation of these result data sets. In addition, data repositories are being planned, developed and operated that support scientists in finding solutions to complex questions through coordinated access to simulation data and the information derived from it. Consequently, the generation of metadata is an important goal for MoSGrid, so that simulation results can be used in complex searches and logical operations. For this, well-known ontologies will be used, which MoSGrid will be able to extend with its specific requirements.
Validated workflows and simulation instructions will play a key role, so that the data produced for common data repositories meets certain quality standards. The planned data repositories are of practical importance because they make the expertise of those producing data available to a wider circle of data users within and outside of MoSGrid. Topic-specific data can be derived from molecular simulations for the identification of relationships between structural properties. In terms of content, the following topics will be covered:
- Fundamental research, such as investigating experimental reaction phenomena;
- Applied research, such as optimisation of materials; and
- Product-related development, such as classification of potentially bioactive agents.
The breadth of these topics is also reflected in the participation of notable business partners in MoSGrid.
The value of the MoSGrid project for business and science relies on the quality, attractive content and sufficiently broad coverage of the data, which are only financially feasible through the high throughput of computing scenarios in the Grid. The necessary breadth of expert knowledge is available to MoSGrid thanks to the participating partners from both the business and scientific communities.
Our Mission
PC² investigates where and which kinds of workflows chemists usually need. These range from very simple workflows, consisting of a single application on one hardware resource, to very complex workflows with result-dependent sub-jobs that can be MPI-distributed applications running on many nodes in parallel. One result of the analysis was that even simple computations will, in MoSGrid, consist of several workflow steps.
Thus, an example workflow for a simple computation consists of four steps:
- input by the user,
- a similarity check, which tests against the set of results published so far to ensure that the computation has not already been done,
- adaptation of the meta molecule description to the input format needed by the application itself, followed by the execution of the application, and
- extraction of the important parts of the result and their display to the user.
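The four steps above can be sketched as a single function. This is only an illustration under stated assumptions: `canonical_key`, `run_workflow`, the `published` dictionary and the `execute` callback are hypothetical names, and the similarity check is reduced to an exact match on a normalised molecule and method, which is far simpler than a real similarity test. Storing the new result in `published` is likewise a simplification of the sketch.

```python
def canonical_key(molecule, method):
    """Hypothetical similarity key: same molecule and method, same computation."""
    return (molecule.strip().lower(), method.strip().lower())


def run_workflow(molecule, method, published, execute):
    """Run the four-step workflow; return (result, was_computed)."""
    # Step 1: the user input arrives as `molecule` and `method`.
    key = canonical_key(molecule, method)

    # Step 2: similarity check against the results published so far.
    if key in published:
        return published[key], False

    # Step 3: adapt the description to a (made-up) application input
    # format, then execute the application via the supplied callback.
    app_input = f"{key[1]}\n{key[0]}\n"
    raw_output = execute(app_input)

    # Step 4: extract the important part of the result for display,
    # here simply the last line of the raw output.
    result = raw_output.strip().splitlines()[-1]
    published[key] = result  # simplification: publish immediately
    return result, True
```

Passing the application as a callback keeps the sketch independent of how jobs actually reach the grid; a second request for the same molecule and method returns the stored result without calling `execute` again.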
Complex workflows can contain many dependent simulation steps, each structured as shown above and including similarity checks as well as pre- and post-processing.
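One way to picture such dependencies is as a graph in which each step lists the steps it depends on. The following sketch uses Python's standard `graphlib` module to run the steps in a valid order; the step names in `WORKFLOW` and the `run_in_order` helper are hypothetical and say nothing about the workflow language or engine that is eventually selected.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: each step maps to the set of steps it depends on.
WORKFLOW = {
    "similarity_check": set(),
    "pre_processing": {"similarity_check"},
    "optimisation": {"pre_processing"},
    "dynamics": {"optimisation"},
    "post_processing": {"dynamics"},
}


def run_in_order(workflow, actions):
    """Run every step after its dependencies, passing earlier results along."""
    results = {}
    for step in TopologicalSorter(workflow).static_order():
        # Each action receives the results of the steps it depends on.
        inputs = {dep: results[dep] for dep in workflow[step]}
        results[step] = actions[step](inputs)
    return results
```

Because every action receives the results of its predecessors, a step can decide what to do based on earlier output, which is the essence of a result-dependent sub-job.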
From the workflow requirements, PC², in cooperation with the Universität zu Köln, derives the needed workflow constructs, which determine the required expressiveness of the workflow language. PC² analyzes the capabilities of workflow engines and UNICORE6 against MoSGrid’s requirements and selects the best-matching language and workflow engine.
The selected engine will be implemented and maintained in MoSGrid.
Funding
This work is supported by the Bundesministerium für Bildung und Forschung (BMBF) under project grant 01IG09006.



