Newsletter, May 2005

Scientific Workflows - web services for science and networks

[An excellent overview of the Kepler project, one of several open source scientific workflow projects. Workflow tools such as Kepler have wide application beyond science, in areas such as industrial control systems and sensors. CANARIE's UCLPv2 will allow researchers and e-business applications to use workflow tools to control the topology and configuration of the network, including the deployment of lightpaths, as an integral part of an overall scientific or business supply chain workflow. As a result, the network is no longer common "plumbing" infrastructure but an integral part of an application. Some excerpts from the GridToday article follow. -- BSA]

Speeding Scientific Workflows: The Open-Source Kepler Project
By Paul Tooby, SDSC Senior Science Writer

The advent of the World Wide Web has brought a whole new world of data, tools, and online services within the reach of scientists. But this wealth of opportunities also brings a new set of challenges, and scientists can be overwhelmed by the sheer volume and variety of resources. To overcome this problem, researchers are developing a user-friendly workflow tool that can organize and automate scientific tasks, helping scientists take full advantage of today's complex software and Web services.

Named Kepler, after the Ptolemy software on which it is built, the new tool is part of the emerging cyberinfrastructure, or integrated technologies for doing today's science. Downloads and more information are available from the Kepler project.

"With Kepler, scientists from many disciplines can automate complex workflows, without having to become expert programmers," said Bertram Ludäscher, one of the initiators of the Kepler project and an associate professor of Computer Science at UC-Davis and SDSC Fellow.

"Kepler's flexibility and its visual programming interface make it easy for scientists to create both low-level 'plumbing workflows' to move data around and start jobs on remote computers, as well as high-level data analysis pipelines that chain together standard or custom algorithms from different scientific domains."

In its simplest form, Kepler may be thought of as a sort of "scientific robot" that relieves researchers of repetitive tasks so that they can focus on their science. In addition to increasing the efficiency of scientists' own workflows, Kepler will also give researchers increased capabilities to communicate and work together searching for, integrating, and sharing data and workflows in large-scale collaborative environments.

Scientific Workflows

In using software and online resources, scientists typically carry out tasks involving the design and execution of a series of steps, or workflow. A researcher begins by identifying and accessing initial data sets, and proceeds through additional steps using software tools such as Web services, modeling and simulation programs, image processing programs and visualization software.
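As a rough illustration of the idea of a workflow as a chain of steps, the following minimal Python sketch runs a list of named steps in order, passing each step's output to the next. It is not Kepler's interface: the function `run_workflow`, the step names, and the sample readings are all hypothetical, chosen only to make the pattern concrete.

```python
from typing import Any, Callable, List, Tuple

# Hypothetical illustration only: none of these names come from Kepler.
Step = Tuple[str, Callable[[Any], Any]]


def run_workflow(steps: List[Step], initial: Any) -> Any:
    """Run each step in order, feeding each output to the next step."""
    data = initial
    for _name, func in steps:
        data = func(data)
    return data


def load(_: Any) -> List[float]:
    """Stand-in for accessing an initial data set."""
    return [12.5, 14.0, -999.0, 13.5]  # -999.0 marks a missing reading


def drop_missing(values: List[float]) -> List[float]:
    """Stand-in for a data-cleaning step."""
    return [v for v in values if v != -999.0]


def mean(values: List[float]) -> float:
    """Stand-in for an analysis step."""
    return sum(values) / len(values)


steps: List[Step] = [("load", load), ("drop_missing", drop_missing), ("mean", mean)]
result = run_workflow(steps, None)
```

Because the steps are plain data, the same list can be rerun, reordered, or shared with a colleague, which is the property a workflow tool builds on.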

Ecologists in the Science Environment for Ecological Knowledge project, which initiated Kepler, study invasive diseases such as the West Nile virus. West Nile virus is spread through mosquitoes feeding on migrating birds in a complex dual-vector process. The researchers develop predictions for where and how fast this kind of disease will spread. To do this, ecologists access online data sets about where the mosquitoes and birds are observed to live and migrate. Then they use Web-based ecological niche modeling tools to correlate this information with climate data, computing predictions for where the birds and mosquitoes are likely to be found. Automating these steps with Kepler can make it feasible to produce accurate predictions for the spread of an invasive disease far more quickly than previously possible. Automating workflows can yield similar benefits in a wide range of other scientific fields.

In addition to the Ptolemy project described below, which serves as the framework for Kepler, the collaboration currently includes the following projects that span a range of scientific fields:

Ecology: the NSF Science Environment for Ecological Knowledge, or SEEK.

Biology: the Department of Energy's Scientific Discovery through Advanced Computing program, or SciDAC, and the NSF Encyclopedia of Life project, or EOL.

Geosciences: the NSF GEON project, building a cyberinfrastructure for the geosciences.

Environmental science and sensor networks: the NSF Real-time Observatories, Applications, and Data management Network project, or ROADNet.

Computational chemistry: the NSF RESearch sURGe ENabled by CyberinfrastructurE project, or RESURGENCE.

Enhancing Collaboration

Beyond automating the steps of a given project, workflows captured in Kepler are intended to promote communication and collaboration for scientists in diverse domains -- a crucial capability for today's large-scale interdisciplinary collaborations. "Through its systematic approach to scientific workflows, Kepler can fulfill the important function of publishing analyses, models, data transformation programs, and derived data sets," said Kepler co-initiator Jones. "This gives scientists a way to track the provenance of derived data sets produced through workflow transformations, which is essential to being able to identify appropriate data sets for integration and further research."
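One way to picture provenance tracking is a log that records, for every workflow step, a fingerprint of what went in and what came out. The Python sketch below is an assumption-laden illustration, not Kepler's mechanism: `fingerprint`, `run_with_provenance`, the step names, and the temperature readings are hypothetical, and the data is assumed to be JSON-serializable.

```python
import hashlib
import json
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical illustration only: none of these names come from Kepler.
Step = Tuple[str, Callable[[Any], Any]]


def fingerprint(data: Any) -> str:
    """Return a short, stable hash of JSON-serializable data."""
    encoded = json.dumps(data, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()[:12]


def run_with_provenance(
    steps: List[Step], initial: Any
) -> Tuple[Any, List[Dict[str, str]]]:
    """Run the steps in order and log input/output fingerprints for each."""
    data = initial
    log: List[Dict[str, str]] = []
    for name, func in steps:
        result = func(data)
        log.append(
            {"step": name, "input": fingerprint(data), "output": fingerprint(result)}
        )
        data = result
    return data, log


def to_celsius(readings: List[float]) -> List[float]:
    """Convert Fahrenheit readings to Celsius, rounded to one decimal."""
    return [round((f - 32) * 5 / 9, 1) for f in readings]


readings_f = [50.0, 68.0, 86.0]  # made-up sample data
derived, provenance = run_with_provenance(
    [("to_celsius", to_celsius), ("max", max)], readings_f
)
```

Each log entry's output fingerprint matches the next entry's input fingerprint, so a derived value can be traced back step by step to the data set it came from.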