Description
This service defines the infrastructure for enterprise computational grids. Enterprise grids, as opposed to global grids,
work within the existing IT infrastructure and impose very few restrictions on it in areas such as security, resource
provisioning, network and hardware configuration. In fact, enterprise grids rely on these IT issues already being
effectively addressed, which is the case in the majority of today's businesses.
Computational grids are primarily concerned with solving computationally intensive tasks by splitting such a task into
multiple sub-tasks, executing each sub-task in parallel on a dedicated grid node, and aggregating the results to obtain the
final result in O(T/N) time (assuming an optimal split, where N is the number of grid nodes and T is the time needed to
solve the problem on a single node).
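As a rough illustration of the O(T/N) estimate, the idealized running time on the grid can be written as shown below. The overhead term (time spent on splitting, network transfer and aggregation) is an assumption added here for illustration; the estimate above assumes an optimal split and does not quantify it.

```latex
T_{\text{grid}}(N) \approx \frac{T}{N} + T_{\text{overhead}},
\qquad
T_{\text{grid}}(N) \approx \frac{T}{N} \quad \text{when } T_{\text{overhead}} \ll \frac{T}{N}
```

For example, under these assumptions a task that takes T = 600 seconds on one node would need roughly 600 / 100 = 6 seconds on N = 100 nodes, provided the overhead is negligible.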
It is important to note the difference between on-demand computing and computational grids. On-demand computing generally
enables on-demand provisioning of IT resources such as CPU capacity, data storage and network bandwidth. These resources
can reside within the enterprise (be co-located) or be physically outsourced to external data centers or to on-demand
(a.k.a. utility) computing providers.
Computational grids, on the other hand, concentrate on solving a specific type of business task: one that is
computationally intensive and can be logically split into parallel sub-tasks that are executed asynchronously and whose
results are aggregated back. In most cases a computational grid implementation employs an on-demand strategy to allocate
resources for the split sub-tasks, and this is the source of the confusion between on-demand computing and computational
grids. The relationship between the two is that computational grids use on-demand infrastructure as one of their
components in order to distribute computational load effectively.
One of the unique characteristics of the 'grid' service is its support for near real-time (nRT) grid applications. Most
traditional grid systems were aimed at solving extremely computationally expensive problems that could take months, if not
years, to finish. These types of problems appear predominantly in scientific research and a few specific business areas
such as biotech. The majority of business grid applications, however, deal with computational problems of a very different
scale: ones that usually take 10-15 minutes and need to be reduced to 5-10 seconds. The xTier 'grid' service is the only
grid infrastructure that was designed from the ground up to address both types of computationally intensive problems:
traditional long-running problems and nRT grid applications.
Main features of 'grid' service include:
- Designed to solve enterprise computationally intensive problems.
- Near real-time (nRT) support in addition to traditional long-running jobs.
- Direct support for task splitting, topology management, result aggregation and failover.
One of the basic uses of the 'grid' service is solving an HPC task using the split & aggregate methodology.
The 'grid' service has direct and comprehensive support for splitting a task into task units, executing them
in parallel on remote nodes and aggregating the results back. This process can be illustrated by the following
diagram:
A grid task is initiated on Node1 and is split across two nodes: Node2 and Node3. Node3 in turn
decides to split its task unit and routes the resulting units to Node4 and Node5. Node2, Node4
and Node5 execute their workloads locally and return results without further splitting. Note that Node3 aggregates
the results from Node4 and Node5 before returning them to Node1.
In the end, the grid task that originated on Node1 was executed in parallel on Node2, Node4 and Node5.
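The split & aggregate idea can be sketched with plain Java threads standing in for grid nodes. This is a minimal illustration only: it uses java.util.concurrent rather than the xTier API, it performs a single level of splitting (no nested split as on Node3), and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative only: local threads stand in for grid nodes; this is not the xTier API.
class SplitAggregateSketch {

    // Trial-division primality test; adequate for a small demonstration.
    static boolean isPrime(long n) {
        if (n < 2) return false;
        for (long d = 2; d * d <= n; d++) {
            if (n % d == 0) return false;
        }
        return true;
    }

    // The work of a single task unit: count primes in [min, max).
    static long countPrimes(long min, long max) {
        long count = 0;
        for (long n = min; n < max; n++) {
            if (isPrime(n)) count++;
        }
        return count;
    }

    // Split [min, max) into at most `units` contiguous sub-ranges, run them in
    // parallel and aggregate the partial counts into one result.
    static long countPrimesParallel(long min, long max, int units)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(units);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            long step = (max - min + units - 1) / units; // ceiling division
            for (long lo = min; lo < max; lo += step) {
                final long from = lo;
                final long to = Math.min(lo + step, max);
                parts.add(pool.submit(() -> countPrimes(from, to)));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get(); // aggregation step
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countPrimesParallel(0, 1_500_000, 4));
    }
}
```

In a real grid the sub-ranges would be shipped to remote nodes instead of local threads, but the shape of the computation (split, execute in parallel, aggregate) is the same.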
Configuration
The 'grid' service is configured via the pre-defined xtier_grid.xml configuration file. This file follows
the standard xTier service configuration pattern, which is demonstrated by the following complete example of
a grid configuration (from the examples):
    <xtier-grid>
        <region name="example">
            <config>
                <ip-port default="54321"/>

                <thread-pool-name>
                    grid.thread.pool
                </thread-pool-name>

                <max-exec-traces>100</max-exec-traces>
            </config>

            <task id="1">
                <factory>
                    <ioc ref-uid="prime.unit.factory"/>
                </factory>

                <topology>
                    <ioc ref-uid="basic.topology"/>
                </topology>

                <router>
                    <ioc ref-uid="weight.router"/>
                </router>

                <failover>
                    <ioc ref-uid="fail.slow.resolver"/>
                </failover>
            </task>
        </region>
    </xtier-grid>
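When editing such a file it can be handy to list which IoC objects each task refers to. The following sketch does this with the standard JDK DOM parser only; it is not part of xTier, the class name GridConfigInspector is hypothetical, and the embedded XML is a trimmed copy of the example configuration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Illustrative helper built on the JDK DOM parser only; it is not part of xTier.
class GridConfigInspector {

    // Trimmed copy of the example grid configuration.
    static final String SAMPLE =
          "<xtier-grid><region name=\"example\">"
        + "<config><ip-port default=\"54321\"/>"
        + "<max-exec-traces>100</max-exec-traces></config>"
        + "<task id=\"1\">"
        + "<factory><ioc ref-uid=\"prime.unit.factory\"/></factory>"
        + "<topology><ioc ref-uid=\"basic.topology\"/></topology>"
        + "<router><ioc ref-uid=\"weight.router\"/></router>"
        + "<failover><ioc ref-uid=\"fail.slow.resolver\"/></failover>"
        + "</task></region></xtier-grid>";

    // Returns one line for the default IP port and one per task role found.
    static List<String> summarize(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        List<String> lines = new ArrayList<>();

        Element port = (Element) doc.getElementsByTagName("ip-port").item(0);
        if (port != null) {
            lines.add("ip-port default=" + port.getAttribute("default"));
        }

        NodeList tasks = doc.getElementsByTagName("task");
        for (int i = 0; i < tasks.getLength(); i++) {
            Element task = (Element) tasks.item(i);
            for (String role : new String[] {"factory", "topology", "router", "failover"}) {
                Element holder = (Element) task.getElementsByTagName(role).item(0);
                if (holder == null) continue; // role not configured for this task
                Element ioc = (Element) holder.getElementsByTagName("ioc").item(0);
                if (ioc == null) continue;
                lines.add("task " + task.getAttribute("id") + " " + role
                        + " -> " + ioc.getAttribute("ref-uid"));
            }
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        for (String line : summarize(SAMPLE)) {
            System.out.println(line);
        }
    }
}
```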
The formal specification for this service configuration can be found in the xtier_grid.dtd file in the
${XTIER_ROOT}/config/dtd folder. Generally, the 'grid' service configuration consists of
one <config> XML element that describes common grid parameters and an optional list of grid
tasks, each defined by a <task> XML element. The <config> XML element contains the following
sub-elements:
- ip-port: This element specifies the IP port that is used by each grid node in the cluster. You can
  either specify a default value using the default attribute or provide an IP port for a specific
  cluster node. If you specify an IP port per cluster node, the same port must be specified
  on each grid node for that cluster node.
- thread-pool-name: This property specifies the name of the thread pool that is used internally by the
  'grid' service for task unit processing. The named thread pool must be defined in the 'objpool' service.
  See ObjectPoolService for details on 'objpool' service configuration.
- max-exec-traces: This property specifies the maximum number of grid tasks for which execution traces are
  kept on this node. If the limit is reached, the oldest execution traces are dropped.
- taxonomy: This optional element defines the IoC object that represents the grid taxonomy. The grid taxonomy
  defines a grid node's relative characteristics such as CPU weight, IO weight and memory weight.
  See IocService for more details on IoC usage. If the taxonomy is not specified, the default one is used.
  In the default taxonomy all grid nodes are considered identical as far as their weights are concerned.
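The retention rule described for max-exec-traces (keep at most a fixed number of traces and drop the oldest first) can be sketched with a size-bounded LinkedHashMap. This only illustrates the policy; how xTier stores traces internally is not described here, and the class name BoundedTraceStore is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of a "keep the newest N, drop the oldest" policy.
class BoundedTraceStore<K, V> extends LinkedHashMap<K, V> {
    private final int maxTraces;

    BoundedTraceStore(int maxTraces) {
        super(16, 0.75f, false); // false = iterate in insertion order
        this.maxTraces = maxTraces;
    }

    // Called by LinkedHashMap after each insertion; returning true
    // removes the oldest (first inserted) entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxTraces;
    }

    public static void main(String[] args) {
        BoundedTraceStore<Integer, String> traces = new BoundedTraceStore<>(2);
        traces.put(1, "trace of task 1");
        traces.put(2, "trace of task 2");
        traces.put(3, "trace of task 3"); // evicts the entry for key 1
        System.out.println(traces.keySet()); // prints [2, 3]
    }
}
```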
<task> XML element has the following sub elements and attribute:
- id: This attribute defines the grid task ID. This ID is used throughout the 'grid' service to identify
  the grid task.
- topology: This element defines the IoC object that represents the topology resolver for this grid task.
  See IocService for more details on IoC usage.
  See the Examples section below for details on how the topology is used in the 'grid' service.
- factory: This element defines the IoC object that represents the grid task unit factory for this grid task.
  See IocService for more details on IoC usage.
  See the Examples section below for details on how the grid task unit factory is used in the 'grid' service.
- router: This element defines the IoC object that represents the task unit router for this grid task.
  See IocService for more details on IoC usage.
  See the Examples section below for details on how the grid task unit router is used in the 'grid' service.
- failover: This element defines the IoC object that represents the failover resolver for this grid task.
  See IocService for more details on IoC usage.
  See the Examples section below for details on how the failover resolver is used in the 'grid' service.
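The example configuration refers to a router under the uid weight.router, and the taxonomy assigns relative weights to grid nodes. How xTier actually combines the two is not described here. Purely as an illustration of the general idea, the sketch below divides a numeric range among nodes in proportion to assumed per-node weights; the class name, the method and the weights are all hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch of weight-proportional splitting; not the xTier router.
class WeightedSplitSketch {

    // Returns boundaries b[0..n] such that node i receives the range [b[i], b[i+1]).
    // Weights must be non-negative and must not all be zero.
    static long[] boundaries(long min, long max, double[] weights) {
        double total = 0;
        for (double w : weights) total += w;

        long[] b = new long[weights.length + 1];
        b[0] = min;
        double acc = 0;
        for (int i = 0; i < weights.length; i++) {
            acc += weights[i];
            b[i + 1] = min + Math.round((max - min) * (acc / total));
        }
        b[weights.length] = max; // guard against floating-point rounding drift
        return b;
    }

    public static void main(String[] args) {
        // Three nodes; the third is assumed to be twice as powerful as the others.
        long[] b = boundaries(0, 1_500_000, new double[] {1, 1, 2});
        System.out.println(Arrays.toString(b)); // prints [0, 375000, 750000, 1500000]
    }
}
```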
Examples
Usage of the 'grid' service follows the standard pattern of using an xTier service: you first obtain an instance of
the xTier kernel, which serves as a service registry. Once you have the xTier kernel you can get an instance of any
service, in our case the grid service. Once the service instance is obtained, the service API can be used.
Note that usage of the 'grid' service includes usage of the 'marshal' and 'cluster' services.
See MarshalService and ClusterService for details on their usage.
The following code snippet is taken from the prime number finder grid example:
    // Range to search for prime numbers and the ID of the grid task.
    private static final long MIN = 0;
    private static final long MAX = 1500000;
    private static final int PRIME_TID = 1;

    // Obtain the xTier kernel, which serves as the service registry.
    XtierKernel xtier = XtierKernel.getInstance();

    // Get the grid service from the kernel.
    GridService grid = xtier.grid();

    // Package the task arguments.
    MarshalObject arg = new MarshalObject();

    arg.putInt64("min", MIN);
    arg.putInt64("max", MAX);

    // Look up the grid task by its ID.
    GridTask task = grid.getTask(PRIME_TID);

    // Execute the grid task with the given arguments.
    GridTaskResult result = grid.exec(task, arg);

    if (result.isSuccessful()) {
        MarshalObject retval = (MarshalObject) result.getReturnValue();

        // Process the return value here.
    }
    else {
        // Handle the failure here.
    }
Download xTier for full examples and documentation.