RIKEN BASE (Bioinformatics And Systems Engineering) division

1. Overview

2. Introduction to RIKEN BASE

3. International Cyberinfrastructure Standards

4. Database Integration

5. Common Platform Uniting Projects

6. RIKEN SciNeS: Life Science Networking System

7. Scalable Platform Incubating Databases

8. Strengthening Bioinformatics

9. Genome Design

10. Mid-term Goals


 
 

1. Overview of RIKEN BASE

● Research Functions

  • Collaborative research with strategic centers and universities
  • Research and development of new bioinformatics methods

● Infrastructure Functions

  • RIKEN SciNeS or Life Science Networking System for collaborative research through developing databases
    • Collaboration support function
      • Collaboration & review function
      • Message function
      • Electronic labbook function
      • Community function
      • Project management function
        • Ontology project
          • Ontology construction function
          • Ontology corresponding support function
        • Database project
          • Database construction function
          • Database publication function
          • Document tagging rule setting function
          • Automated document tagging function
          • Wiki function
        • Database template project
          • Database wizard function
    • Repository function
      • Active database repository function
    • Analysis tool expansion function
      • Automated data analysis function
      • Data visualization function
    • Batch download function
      • Data mart function
    • Data flow control function
      • Lab automation
    • Automatic update function
      • Automatic semantic web transformation function
      • Database automatic integration update function
    • Inference search function
      • Inference search engine GRASE (PosMed, etc.)
      • Simultaneous hierarchical search function
      • Searches other than keywords such as base sequence or compound structural formula
    • Stable operation
      • Cooperation with supercomputer
      • Various web service functions
    • Disaster countermeasures
    • System maintenance and support staff
 
 

2. Introduction to RIKEN BASE

The RIKEN BASE (Bioinformatics And Systems Engineering) Division was established on April 1, 2008, following the reorganization of the former Genomic Sciences Center (RIKEN GSC).

The 6th RIKEN Advisory Council (RAC) in 2006 recommended that “RIKEN strengthen bioinformatics at the RIKEN life sciences institutes by hiring more specialists in this area and by creating new, integrated databases with user-friendly access. Easy integrated access for RIKEN scientists and those outside RIKEN would allow the data generated at RIKEN to be more fully exploited and would increase RIKEN’s visibility and stature. Although some effort is made in individual institutes to create databases with specialist functional annotation, a RIKEN-wide strategy is needed to develop transparently accessible databases based on international standards and integrated around data types such as genes, sequence and structure. At a minimum, data generated anywhere within RIKEN should be easily available to all RIKEN scientists. RAC believes that the best way to accomplish this is through a distributed data integration model in which data is handled locally, but according to RIKEN-wide standards for data storage, management and exchange. To speed the process and provide optimal interoperability, the adoption of existing data exchange standards and protocols would likely be easier than inventing new ones. To facilitate the construction of such integrated data resources, we suggest the appointment of an Institute-wide director of information sciences and databases.”

In addition, the urgent need to integrate databases within Japan was recognized. In its development strategies for the life sciences (based on the Cabinet decisions of March 28, 2006, known as the “Outline for the 3rd Science & Technology Basic Plan”), the Council for Science and Technology Policy proposed establishing a world-class research infrastructure for the life sciences as one of the strategically important areas of science and technology in Japan. This required enhanced cooperation in integrating the databases constructed by RIKEN, as requested by MEXT (Ministry of Education, Culture, Sports, Science and Technology), as well as the creation of a hub organization to reduce the uncoordinated publication of databases by RIKEN’s individual laboratories and to promote the integration of national-level databases. The RIKEN BASE Division was established as RIKEN’s well-conceived and timely response to this need.

 
 
 

3. RIKEN’s database activities need to be adapted efficiently to international cyber standards

The above-mentioned RAC recommendation indicated that RIKEN should adapt its mission to provide a portion of the distributed cyberinfrastructure functions that would enhance the mutual international utilization of data. Most biology lab heads, however, want their databases to be browsed visually, and are not fully aware of the need to make them as transparently accessible as RAC recommended, or of the latest technologies they would need to adopt. In addition, information technology is constantly changing. For example, the world’s standard technologies are currently shifting to the semantic web and cloud computing. New types of datasets are also continuously being generated owing to the rapid advancement of omics technologies. Few laboratories at RIKEN can afford to employ researchers or system engineers skilled in adapting a database to keep up with such changes. Moreover, due to recent budget cuts by the Japanese government, some strategic research centers need to reduce the costs of database publication work. For this reason, RIKEN BASE plays the role of an efficient mediator (Fig. 1). For internationally visible and valuable protocols such as RSS, DAS and OAI-PMH, we continue to adapt our system’s interface to their specifications, taking account of their occasional modifications. For example, we have adapted our system to DAS1 interfaces and registered them with the Ensembl database.

[Figure 1]
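
As a rough illustration of what adapting an interface to protocols such as OAI-PMH and DAS involves on the client side, the following minimal Python sketch builds request URLs in the shapes those specifications define (an OAI-PMH ListRecords request and a DAS 1.x features request for one segment). The server address, data source name and dates are hypothetical placeholders; the actual RIKEN endpoints are not given in this document.

```python
from urllib.parse import urlencode


def oai_pmh_list_records_url(base_url, metadata_prefix="oai_dc",
                             from_date=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL for a repository base URL."""
    if resumption_token is not None:
        # In OAI-PMH 2.0, resumptionToken is an exclusive argument:
        # it is sent with the verb only, without metadataPrefix or from.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date is not None:
            params["from"] = from_date  # selective harvesting by datestamp
    return base_url + "?" + urlencode(params)


def das_features_url(server, source, reference, start, stop):
    """Build a DAS 1.x features request: /das/<source>/features?segment=REF:start,stop"""
    # Keep ':' and ',' unescaped so the segment stays readable.
    query = urlencode({"segment": f"{reference}:{start},{stop}"}, safe=":,")
    return f"{server.rstrip('/')}/das/{source}/features?{query}"


if __name__ == "__main__":
    # Hypothetical endpoints, used for illustration only.
    print(oai_pmh_list_records_url("https://example.org/oai", from_date="2008-04-01"))
    print(das_features_url("https://example.org/", "demo_source", "chr1", 1000, 2000))
```

Keeping URL construction in small functions like these is one way to absorb the occasional specification changes mentioned above in a single place.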

 
 
 

4. Domestic demand for RIKEN to promote disclosed data in both Japanese and English

RIKEN is now one of Japan’s largest research institutions in terms of large-scale data production. In addition to international demand for access to RIKEN’s databases, there is a steady and continuous demand from scientists at Japanese universities and in industry for RIKEN to actively promote the disclosure of its data. MEXT has started a new project to integrate databases in Japan and has funded RIKEN BASE to contribute to the MEXT project at RIKEN. We are currently converting RIKEN’s database contents into standard data formats (initially into semantic web formats in both Japanese and English, so that we can convert them into any format MEXT may request in the future), enabling cross-searching and bulk downloading in both English and Japanese as requested by the MEXT project. RIKEN BASE also needs to adapt its cyberinfrastructure interfaces to the specifications recommended by the MEXT project, to replace overlapping, similar or unnecessary services hosted within RIKEN or Japan, and to coordinate all services efficiently as a whole.
   To promote the project more strongly, RIKEN has established the RIKEN Life Science Database Cooperation Committee (headed by Executive Director Dr. Yoshiharu Doi). Currently, over 90 sites hosted by individual RIKEN laboratories provide data on the internet in an uncoordinated manner. To reorganize these data publication sites and improve the accessibility and availability of RIKEN data for external researchers, the RIKEN Hub Database project (Fig. 2) has been supported by RIKEN’s internal budget (President’s fund) since FY2007 and will be released to external researchers by the end of FY2008 (March 2009). RIKEN BASE is expected to act as coordinator in the above-mentioned domestic projects, disclosing databases coherently. So far, a Database Registry has been created as a first part of the project ( http://omicspace.riken.jp/db/database.html ), making it easy for outside users to find each RIKEN database site.

[Figure 2]
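
The bilingual cross-searching described in this section can be pictured with a small Python sketch in which each registry entry carries a label per language and a keyword in either language matches it. The entry identifiers and labels below are invented for illustration and are not taken from the actual Database Registry.

```python
def cross_search(labels, keyword):
    """Return the IDs of entries whose label in any language contains the keyword."""
    needle = keyword.casefold()
    return sorted(
        entry_id
        for entry_id, by_lang in labels.items()
        if any(needle in text.casefold() for text in by_lang.values())
    )


# Hypothetical registry entries with English ("en") and Japanese ("ja") labels.
LABELS = {
    "db:001": {"en": "Mouse phenotype database", "ja": "マウス表現型データベース"},
    "db:002": {"en": "Plant omics database", "ja": "植物オミックスデータベース"},
}

if __name__ == "__main__":
    print(cross_search(LABELS, "mouse"))   # English keyword
    print(cross_search(LABELS, "植物"))     # Japanese keyword
```

Storing both labels against one identifier means a query in either language resolves to the same entry, which is the property that a bilingual semantic web representation is meant to provide.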

 
 
 

5. RIKEN BASE will unite multiple projects on the same information infrastructure

RIKEN reorganized the former GSC from a “complex center style”, in which one large research center encompasses all functions within it, into a “federated cluster style”, in which RIKEN BASE and other centers with distinct national and global roles cooperate with other strategic centers and external laboratories in order to maximize research performance as a whole. RIKEN BASE is expected to supply an information platform that is effective not only for database integration, but also for a wide range of life-science-related data cooperation activities between different projects. RIKEN BASE provides a common platform (Life Science Networking System) on which RIKEN experts in each field can promote research teams that are coordinated internationally or across organizational boundaries and linked through the mutual exchange of experimental data, data analysis and ontology-based annotation. Such exchange cannot be achieved by the experimental laboratories alone; RIKEN BASE realizes it by constructing a network of communities in the triangular shape shown on the right side of Fig. 3. This differs from previous models, in which large-scale projects (left side of Fig. 3) include specialized internal informatics teams. The new model strategically constructs flexible collaboration networks by incorporating the triangles that comprise the bioinformatics platform, and efficiently and simultaneously realizes many independent or incorporated collaboration networks that are securely protected.

[Figure 3]

 
 
 

6. Realizing an ideal “Life Science Networking System” to strengthen human cooperation

RIKEN BASE has been promoting, as research, the construction of an ideal life science information infrastructure termed “Life SciNeS” or “Life Science Networking System” which is applicable across all life science research (Fig. 4). Currently, we have developed a RIKEN version of Life SciNeS (RIKEN SciNeS) primarily for the purpose of constructing integrated databases. We are planning to extend RIKEN SciNeS to medical cyber-collaboration purposes.
   In the field of medical research, for example, expectations are high for omics data-driven approaches. Such approaches require an information infrastructure in which the diverse and vast quantities of essential omics and clinical data are strategically accumulated for analysis, in order to promote effective collaboration among researchers from a wide range of disciplines, including genomics, bioinformatics and basic research, as well as clinicians and specialist physicians. This information infrastructure is both multilevel and bidirectional, and also provides independence in that each level can respond to the individual, realistic needs of diverse areas of research. One example is a data-sharing, communication and management system in which the medical data and communications exchanged among doctors are automatically accumulated as a database to which a restricted group of doctors is allowed access. RIKEN SciNeS provides a function to collect reliable medical data by supporting a flexible yet strict reviewing system in which researchers and medical doctors can review each submitted data item as strictly and systematically as journal papers are reviewed.

[Figure 4]

 
 
 

7. Providing researchers with a scalable and standardized database collaboration platform

RIKEN BASE allows every RIKEN researcher to set up his or her database projects easily and securely for the various purposes of collaborative work, and to publish the databases in accordance with international standards. RIKEN SciNeS is being constructed as a semantic-web-based scalable system that can host at least thousands of different databases simultaneously in a single system, with access control for each individual user over each data item and interface (Fig. 5). We regard the semantic web (whose data format is RDF, the Resource Description Framework) as a useful container for storing any kind of data in a structured manner, and we use it as the framework technology of RIKEN SciNeS. Because international consensus usually takes a long time to become a concrete recommendation, an advantage of the semantic web is that data held in it can later be converted to any international standard that arises in the future. The RIKEN SciNeS system has already been implemented with RIKEN’s scalable technologies such as SWF (Semantic Web Folders), OmicBrowse and GRASE, and it is significantly improving the cost performance of operating the system. Thus far, RIKEN BASE has committed to developing such an information platform for RIKEN. Our future challenge is to expand the platform so that we become a “Center” serving a diverse range of research communities.

[Figure 5]
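
The idea of RDF as a neutral container that can be re-exported to other formats can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SWF or RIKEN SciNeS implementation: the namespace, the record and all function names are hypothetical, and every value is written as a plain string literal with only basic escaping.

```python
import json

EX = "http://example.org/"  # hypothetical namespace for illustration


def record_to_triples(subject, record):
    """Flatten one database record (a dict) into (subject, predicate, object) triples."""
    return [(subject, EX + key, str(value)) for key, value in record.items()]


def to_ntriples(triples):
    """Serialize triples as N-Triples lines, treating every object as a string literal."""
    lines = []
    for s, p, o in triples:
        escaped = o.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        lines.append(f'<{s}> <{p}> "{escaped}" .')
    return "\n".join(lines)


def to_json_records(triples):
    """Regroup the same triples into one JSON object per subject."""
    grouped = {}
    for s, p, o in triples:
        # Use the last path segment of the predicate IRI as the field name.
        grouped.setdefault(s, {})[p.rsplit("/", 1)[-1]] = o
    return json.dumps(grouped, indent=2, sort_keys=True)


if __name__ == "__main__":
    triples = record_to_triples(
        EX + "gene/1", {"symbol": "ABC1", "organism": "Arabidopsis thaliana"}
    )
    print(to_ntriples(triples))      # one export format
    print(to_json_records(triples))  # another export format from the same triples
```

Because both exports are derived from the same list of triples, supporting a further format later only requires one more serializer, without touching the stored data.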

 
 
 

8. Strengthening bioinformatics to analyze data generated at strategic research centers

The 6th RAC proposed that the employment of bioinformatics experts should be strengthened. Bioinformatics involves the daily analysis of data for researchers who are preparing papers during research, and therefore each RIKEN center is promoting the employment of specialists who mainly analyze data generated at the center. While some centers such as RIKEN OSC have large bioinformatics teams, laboratories in many centers are experiencing a sharp shortage of bioinformatics specialists. For this reason, there are a large number of requests made to RIKEN BASE from laboratories both inside and outside RIKEN for collaborative research related to analyzing data generated by the laboratories. At present one researcher needs to be involved in a number of collaborative research projects.
   In particular, since RIKEN employs the latest technology for omics analysis, there is a strong need to analyze new types of data for which analysis methods do not yet exist, and therefore RIKEN also needs to employ researchers with strong skills in developing new analytic algorithms mathematically and implementing them as programs. RIKEN BASE employs specialists with such skills and engages in various collaborative research efforts at each strategic center. For example, we have developed original analytical technologies, such as ARTADE (ARabidopsis Tiling-Array-based Detection of Exons), in collaboration with the wet lab of RIKEN PSC (Plant Science Center). In addition, unique technologies for the candidate search of disease genes, such as PosMed (Positional Medline), have been highly acclaimed and are being used extensively by both domestic and overseas researchers. In this way, new large-scale bioinformatics technology is being developed at RIKEN BASE, making RIKEN unique in that the bioinformatics needs of each experimental center are promptly handled.
   Dr. Tetsuro Toyoda, Director of the Division, previously served as Team Leader of the Informatics Team which contributed to both plant and mammalian genomics research in the former GSC and was highly evaluated by GSAC (the Advisory Council of GSC) in 2005: “Both mouse and plant programs are connected through bioinformatics resources created by the Phenome Informatics Team (PIT) led by Dr. Toyoda. PIT develops algorithms and provides tools specific to each project in a single integrated environment. GSAC felt this combination of practical and integrated bioinformatics driven by project needs was an excellent example for future bioinformatics development for the entire Institute.” After obtaining a PhD degree in drug design based on protein structural information in the field of pharmacy, Dr. Toyoda passed the examination for First Class Information Processing Engineer, and conducted research by combining the fields of bioinformatics and databases. So far, he has constructed a database environment which unites knowledge information regarding omics data and literature on mice and plants, and has contributed to the advancement of the GSC research project. Also, without diverging from the needs of life science, he has recruited researchers whose field of expertise is in information science, mathematics and/or databases while directing a bioinformatics team, has undertaken strong and unique research development in informatics, and has created a framework of research teams which can respond to a wide range of needs of life science.
   As well as comprising an important part of the project to coordinate the entire RIKEN life science database, RIKEN BASE is also an important part of the work being undertaken to integrate the databases in life science fields in Japan as promoted by MEXT. Dr. Toyoda also serves as a member of the Cabinet’s Integrated Database Task Force Committee, a member of the Information Platform Working Group of the MEXT Life Science Committee, Coordinator of the RIKEN Life Science Database Cooperation Committee and a member of the Steering Committee of the Integrated Database of Research Organization of Information and Systems (ROIS). Dr. Toyoda is therefore deeply involved in the life science database policy decisions in Japan. At an international level, Dr. Toyoda contributes to the standardization of mouse phenotype data (InterPhenome).

 
 
 

9. Challenges in Genome Design

Dr. Akiyoshi Wada served as Director during the first term of RIKEN GSC (October 1998-March 2004), and a number of project directors collected omics data under the concept of "Omic Space," which comprehensively transforms various biological phenomena ranging from genome to phenome into information. Dr. Yoshiyuki Sakaki led a research group for genomics, Dr. Yoshihide Hayashizaki was the leader for transcriptomics, Dr. Shigeyuki Yokoyama for structural proteomics, Dr. Toshihiko Shiroishi for mouse phenotypes, Dr. Kazuo Shinozaki for plant omics, and Dr. Akihiko Konagaya for bioinformatics.
   Dr. Yoshiyuki Sakaki served as Director during the second term of RIKEN GSC (April 2004-March 2008) and turned to systems biology as a means for understanding all elements of omics as a system. Systems biology is essentially reverse engineering, which is a process of constructing a model aimed at explaining the behavior of the internal systems of existing biological entities (regarded as unknown "black boxes") by observing their response to various perturbations, such as stimulation, stress, gene alteration, and so on. Through the systems biology research promoted by GSC, an exhaustive collection of biological elements and networks was gathered as information in databases, and GSC has contributed to the creation of "information resources" that contain an immense amount of knowledge and data.
   The mission of RIKEN GSC was accomplished within these two terms, and although there was a reorganization, if there had been a third term of RIKEN GSC, its new mission would have been the challenge of forward engineering based on the above-mentioned information resources obtained from the reverse engineering; in other words, “forward engineering from information resources to create new bioresources useful for mankind (Fig. 6).” As the problem of natural resources has recently become an important international issue, RIKEN PSC is trying to create genetically modified plants to approach this problem, and RIKEN BASE is expected to provide support from the standpoint of information. Furthermore, Rothamsted Research in Britain is currently engaged in the challenging task of pioneering plant energy by performing gene manipulation of the poplar and willow, and has requested the use of RIKEN BASE’s information technologies (PosMed, etc.). Such forward engineering cannot be implemented without combining the "wet technology" of synthetic genomics and gene-design technologies with "dry technology", which is concerned with the problem of how genomes should be designed such that the new modified plant exhibits the desired features. With the incorporation of the experimental center for animals and plants located in the RIKEN Yokohama Institute, RIKEN BASE can utilize the full advantages of the RIKEN Yokohama Institute in methodological research in genome design, and in particular, the realization of “algorithmic genome design by using the semantic-web-based automation technology on the platform of RIKEN SciNeS.” In this context, the target of “Systems Engineering,” which is included in the name of RIKEN BASE, involves not only the construction of information systems, but also the design of biological systems.

[Figure 6]

 
 
 

10. Mid-term goals of RIKEN BASE

Research Mission A: Research and develop new bioinformatics methods
Conduct research focusing on algorithms or bioinformatics for new data types. Develop new automatic data analysis techniques by utilizing RIKEN SciNeS. Undertake methodological research of algorithms or automation techniques for genome design or systems engineering of living organisms (plants and bacteria). Elucidate the design principles to create useful biological resources from information resources.

Research Mission B: Promote bioinformatics collaborations
Conduct research on the integrative analysis of omics data and create a collaboration network between the bioinformatics researchers (dry) and the groups of researchers conducting experiments at the strategic centers (wet).

Infrastructure Mission C: Develop an information platform for RIKEN-wide researchers to create, release and collaborate on databases efficiently
Strengthen the database platform for the entire RIKEN organization by developing new information technologies and tools to realize an ideal information platform or “Life Science Networking System.” Strengthen the data governance within RIKEN.

Infrastructure Mission D: Promote database cooperation and publication within RIKEN
Promote database cooperation and publication by uniting researchers of a wide variety of expertise, and promote the development of "RIKEN Hub Database" to contribute to the database integration in Japan and construction of cyberinfrastructure based on international standards. Replace overlapping, similar or unnecessary services hosted within RIKEN to coordinate all infrastructural services of bioinformatics and databases as a whole.

Please send comments and questions to omicspace@gmail.com
Copyright © RIKEN (The Institute of Physical and Chemical Research), Japan. All rights reserved.