Scientific Workflows - web services for science and networks
[An excellent overview of the Kepler
project, one of several open source scientific workflow projects.
Workflow tools such as Kepler have wide application beyond science,
in areas such as industrial control systems, sensors,
etc. CANARIE's UCLPv2 will allow workflow tools to be used by
researchers and e-business applications to control the topology
and configuration of the network, including deployment of lightpaths,
as an integral part of an overall scientific or business supply chain
workflow. As a result, the network is no longer common "plumbing"
infrastructure but an integral part of an application. Some excerpts
from the GridToday article at www.gridtoday.com -- BSA]
Speeding Scientific Workflows: The Open-Source Kepler Project
By Paul Tooby, SDSC Senior Science Writer
The advent of the World Wide Web
has brought a whole new world of data, tools, and online services
within the reach of scientists. But this wealth of opportunities
also brings a new set of challenges, and scientists can be overwhelmed
by the sheer volume and variety of resources. To overcome this
problem, researchers are developing a user-friendly workflow tool
that can organize and automate scientific tasks, helping scientists
take full advantage of today's complex software and Web services.
Named Kepler, after the Ptolemy
software on which it is built, the new tool is part of the emerging
cyberinfrastructure, or integrated technologies for doing today's
science. Downloads and more information can be found at http://kepler-project.org/.
"With Kepler, scientists from many disciplines can automate
complex workflows, without having to become expert programmers,"
said Bertram Ludäscher, one of the initiators of the Kepler
project and an associate professor of Computer Science at UC-Davis
and SDSC Fellow.
"Kepler's flexibility and its
visual programming interface make it easy for scientists to create
both low-level 'plumbing workflows' to move data around and start
jobs on remote computers, as well as high-level data analysis
pipelines that chain together standard or custom algorithms from
different scientific domains."
In its simplest form, Kepler may
be thought of as a sort of "scientific robot" that relieves
researchers of repetitive tasks so
that they can focus on their science. In addition to increasing
the efficiency of scientists' own workflows, Kepler will also
give researchers increased capabilities to communicate and work
together in searching for, integrating, and sharing data and workflows
in large-scale collaborative environments.
Scientific Workflows
In using software and online resources, scientists typically carry
out tasks involving the design and execution of a series of steps,
or workflow. A researcher begins by identifying and accessing
initial data sets, and proceeds through additional steps using
software tools such as Web services, modeling and simulation programs,
image processing programs and visualization software.
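To make the idea of a workflow as a series of steps concrete, here is a minimal Python sketch. It is not Kepler's interface (Kepler is a visual tool built on Ptolemy); the function and step names below are hypothetical stand-ins chosen only to illustrate how each step's output feeds the next.

```python
from typing import Any, Callable, List, Tuple

# A step is a (name, function) pair. Each function takes the previous
# step's output and returns its own. All names here are hypothetical.
Step = Tuple[str, Callable[[Any], Any]]


def run_workflow(steps: List[Step], initial: Any) -> Any:
    """Run the steps in order, feeding each output into the next step."""
    data = initial
    for _name, func in steps:
        data = func(data)
    return data


def load_observations(_: Any) -> List[float]:
    # Stand-in for identifying and accessing an initial data set.
    return [2.0, 4.0, 6.0]


def scale(values: List[float]) -> List[float]:
    # Stand-in for a modeling or simulation step.
    return [v * 10 for v in values]


def summarize(values: List[float]) -> float:
    # Stand-in for an analysis or visualization step: the mean value.
    return sum(values) / len(values)


workflow = [
    ("load", load_observations),
    ("scale", scale),
    ("summarize", summarize),
]
result = run_workflow(workflow, None)
```

Because the steps are plain data, the same list can be rerun on new inputs or shared with a collaborator, which is roughly the kind of automation the article describes.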
Ecologists in the Science Environment for Ecological Knowledge
project, which initiated Kepler, study invasive diseases such
as the West Nile virus. West Nile virus is spread through mosquitoes
feeding on migrating birds in a complex dual-vector process. The
researchers develop predictions for where and how fast this kind
of disease will spread. To do this, ecologists access online data
sets about where the mosquitoes and birds are observed to live
and migrate. Then they use
Web-based ecological niche modeling tools to correlate this information
with climate data, computing predictions for where the birds and
mosquitoes are likely to be found. Automating these steps with
Kepler can make it feasible to produce accurate predictions for
the spread of an invasive disease far more quickly than previously
possible. Automating workflows can yield similar benefits in a
wide range of other scientific fields.
In addition to the Ptolemy project
described below, which serves as the framework for Kepler, the
collaboration currently includes the following projects that span
a range of scientific fields:
Ecology: the NSF Science Environment for Ecological Knowledge, or SEEK.
Biology: Scientific Discovery through Advanced Computing Program,
or SciDAC, in the Department of Energy, and the NSF Encyclopedia
of Life project, or EOL.
Geosciences: the NSF GEON project, building a cyberinfrastructure
for the geosciences.
Environmental science and sensor networks: the NSF Real-time Observatories,
Applications, and Data management Network project, or ROADNet.
Computational chemistry: the NSF RESearch sURGe ENabled by CyberinfrastructurE
project, or RESURGENCE.
Enhancing Collaboration
Beyond automating the steps of a given project, workflows captured
in Kepler are intended to promote communication and collaboration
for scientists in diverse domains -- a crucial capability for
today's large-scale interdisciplinary collaborations. "Through
its systematic approach to scientific workflows, Kepler can fulfill
the important function of publishing analyses, models, data transformation
programs, and derived data sets," said Kepler co-initiator
Jones. "This gives scientists a way to track the provenance
of derived data sets produced through workflow transformations,
which is essential to being able to identify appropriate data
sets for integration and further research."
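
One way to picture provenance tracking is to have every derived data set remember which step produced it and from which inputs. The Python sketch below is an illustration under that assumption only. The `Dataset`, `apply_step`, and `lineage` names are hypothetical and do not come from Kepler.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Dataset:
    """A data set plus a record of how it was derived (hypothetical)."""
    name: str
    values: List[float]
    produced_by: Optional[str] = None  # step name; None for raw data
    parents: List["Dataset"] = field(default_factory=list)


def apply_step(step_name: str, func: Callable[..., List[float]],
               inputs: List[Dataset], output_name: str) -> Dataset:
    """Run one transformation and record its inputs on the result."""
    values = func(*[d.values for d in inputs])
    return Dataset(output_name, values,
                   produced_by=step_name, parents=list(inputs))


def lineage(ds: Dataset) -> List[str]:
    """Return the names of the raw data sets that ds derives from."""
    if not ds.parents:
        return [ds.name]
    names: List[str] = []
    for parent in ds.parents:
        for name in lineage(parent):
            if name not in names:
                names.append(name)
    return names


# Illustrative values only, not real observations.
birds = Dataset("birds", [1.0, 2.0])
climate = Dataset("climate", [0.5, 0.5])
combined = apply_step(
    "correlate",
    lambda a, b: [x * y for x, y in zip(a, b)],
    [birds, climate],
    "bird_climate",
)
scaled = apply_step("scale", lambda v: [x * 2 for x in v],
                    [combined], "scaled")
sources = lineage(scaled)
```

Walking the `parents` links from any derived data set back to its raw sources gives the kind of record a researcher could use to judge whether the data set is appropriate for integration.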