Best Practices: Integrating Data using Groove Server 2007 Data Bridge

Summary: Review the recommended best practices for configuring, monitoring, troubleshooting, and programming against the Microsoft Office Groove Server 2007 Data Bridge (GDB). This article focuses on using the Groove Data Bridge as a data integration point between Microsoft Office Groove 2007 workspaces and line of business applications. (6 printed pages)

Jesse Howard, Microsoft Corporation

August 2007

Applies to: Microsoft Office Groove 2007, Microsoft Office Groove Server 2007, Groove Server 2007 Data Bridge

Contents:

  • Focusing on Integration

  • Designing for the Application Development Lifecycle

  • Using Transaction Modeling

  • Planning for Peak Usage

  • Managing the "Ramp"

  • Planning to Test Performance

  • Planning Ongoing Performance Monitoring and Post-Launch Maintenance

  • Conclusion

  • Additional Resources

This article provides recommended practices for configuring, monitoring, troubleshooting, and programming against the Microsoft Office Groove Server 2007 Data Bridge (GDB).

Groove Data Bridge provides a central, server-based mechanism to manage Groove workspaces programmatically, using Groove Web services (GWS), and to back up Groove workspaces of which the Data Bridge is a member. While it may be possible, even convenient and useful, to use a Groove Data Bridge as both a backup agent and a data integration point, doing so may create contention for resources within the server. Creating Groove workspace archives requires heavy use of Data Bridge resources, as does supporting most event-driven data transaction models. Consequently, if a Data Bridge is used to perform both tasks, external transactions may become constrained during archive cycles, and archive cycles may take longer than expected during times of peak transaction load. It is recommended that any single Groove Data Bridge be used for workspace archiving or for data integration, but not both simultaneously.

This article presents a number of best practices for configuring and deploying Groove Server 2007 Data Bridge as an integration point between Groove workspaces and line of business applications. See the Groove Server 2007 Data Bridge Administrator's Guide for details about the minimum system requirements, prerequisites, and basic configuration tasks for the Groove Data Bridge.

Focusing on Integration

For information about the minimum system requirements, prerequisites, and basic configuration tasks, see Groove Server Data Bridge Functionality. In a production environment, additional configuration, monitoring, and troubleshooting tasks and practices are advisable. The sections that follow provide recommended practices for planning a data integration deployment, including configuring, monitoring, and troubleshooting the Groove Data Bridge.

Designing for the Application Development Lifecycle

The key to successfully deploying the Groove Data Bridge in any data integration implementation is proper planning up front, including understanding the projected usage patterns of Groove workspaces, the information needs of workspace members, the proposed transaction model, and performance testing requirements. Without a proper understanding of the proposed usage model and information needs of workspace members, it is impossible to predict the expected “ramp” from initial rollout of the application to full adoption, peaks in workspace usage, and overall server demand.

Planning each implementation and deployment necessarily involves making assumptions about end-user behavior, and these assumptions, as well as contingency scenarios, must be validated with robust performance testing.

Finally, each deployment should include a detailed monitoring and maintenance plan, including the mechanisms and counters that will inform capacity planning and triage decisions. The Groove environment and synchronization model provide unique opportunities for users and pose unique challenges for system administrators; it is critical to think of any GDB deployment in the context of a complete application development process, and to plan for both expected system operations and contingencies.

Using Transaction Modeling

Integrating external application data into workspaces requires consideration of both the demands that the integrating application will make on the GDB through Groove Web services and the use of the workspaces themselves by other members. Additional load is placed on the servers by workspace lifecycle activities, such as workspace creation, invitation processing, and workspace deletion.

A well-constructed transaction model contains the estimated frequency and duration of each operation on the Data Bridge, and the projected impact on Data Bridge resource consumption (in the absence of other operations). In addition, the model limits the total duration of all transactions on the Data Bridge to less than 50% of the total available duration of a period (12 out of 24 hours, for example); this allows time for the Data Bridge to process user-generated requests in addition to GWS requests.

Table 1. Sample Transaction Model Table

Transaction type       | Avg. response time | Avg. processing time | Expected frequency | Min. wait time
Spaces.Create          | 4000 ms            | 38 sec.              | 3 / hour           | 3 min.
Members.Create         | 500 ms             | 29 sec.              | 30 / hour          | 30 sec.
Spaces.Read            | 500 ms             | 11 sec.              | 1 / hour           | 30 sec.
Subscription.Create    | 500 ms             | 1 sec.               | on initialization  | n/a
Subscription.Update    | 500 ms             | 1 sec.               | 4 / day            | 360 min.
Subscription.Delete    | 500 ms             | 1 sec.               | on error           | n/a
Events.ReadExtended    | 500 ms             | 500 ms               | 360 / hour         | 10 sec.
Events.Delete          | 500 ms             | 500 ms               | 120 / hour         | 10 sec.
Spaces.ReadSpace       | 10000 ms           | 10 sec.              | 120 / hour         | 10 sec.
Tools.Read             | 1000 ms            | 1 sec.               | 120 / hour         | 10 sec.
Tools.ReadTool         | 500 ms             | 500 ms               | 120 / hour         | 10 sec.
Forms2.ReadRecords     | 10000 ms           | 10 sec.              | 120 / hour         | 10 sec.
Forms2.CreateRecords   | 1000 ms            | 12 sec.              | 30 / hour          | 10 sec.
Forms2.UpdateRecords   | 1000 ms            | 15 sec.              | 30 / hour          | 10 sec.
Forms2.DeleteRecords   | 1000 ms            | 10 sec.              | 45 / hour          | 10 sec.
Forms2.ReplaceDesign   | 200000 ms          | 360 sec.             | n/a                | n/a
Members.Delete         | 500 ms             | 2 sec.               | 1 / hour           | 30 sec.
Spaces.Delete          | 1000 ms            | 40 sec.              | 1 / hour           | 3 min.

It is also important to note that some GWS requests have asymmetric impact on GDB performance—a request that takes only 1 or 2 seconds to return may result in GDB processing for several seconds, or even a minute. For this reason, be sure to establish baseline impact for transactions of every expected type and, as applicable, with a variety of sizes.
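To illustrate this kind of budgeting, the following Python sketch (illustrative only; the figures are hypothetical placeholders in the style of Table 1, and the script is not part of the Groove tooling) sums the projected hourly processing load and checks it against the 50% budget described earlier:

# Hypothetical transaction-model budget check (illustrative only; figures are
# placeholders in the style of Table 1, not measured values).
transactions = {
    # operation: (avg. GDB processing time in seconds, expected calls per hour)
    "Spaces.Create":        (38.0, 3),
    "Members.Create":       (29.0, 30),
    "Events.ReadExtended":  (0.5, 360),
    "Forms2.UpdateRecords": (15.0, 30),
}

busy_seconds_per_hour = sum(proc * freq for proc, freq in transactions.values())
budget_seconds_per_hour = 3600 * 0.5  # reserve half of each hour for user-generated deltas

print(f"Projected GWS load: {busy_seconds_per_hour:.0f} s/hour "
      f"({busy_seconds_per_hour / 3600:.0%} of the hour)")
if busy_seconds_per_hour > budget_seconds_per_hour:
    print("Warning: projected load exceeds the 50% processing budget.")

A calculation of this kind, repeated with measured baseline figures rather than placeholders, shows whether the transaction model leaves the Data Bridge enough headroom for user-generated deltas.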

Planning for Peak Usage

During periods of peak usage, either of the workspaces or of the source or target external systems, the GDB may become practically unusable for some kinds of transactions, such as workspace creation or deletion. You should plan for peak usage. You can estimate peak usage as follows:

  • Peak usage = Total # of workspaces x # of members per workspace x # of transactions per member per period x avg. weight of transaction

    Note

    Transaction denotes either a user-generated action or a programmatic action; weight refers to either the size, in KB, of the transaction or its impact in terms of resource consumption. Either works, as long as it is applied consistently.

Using this approach does not yield an exact metric, but it helps you estimate the impact of peak usage during a period of any duration. For example, assuming that initial workspace population is largely complete, so that there is no longer a steady “ramp” of new workspaces, peak usage is estimated as follows:

  • # of workspaces on the Data Bridge: 1000

  • # of members (per workspace): 20

  • # of transactions per member per hour: 4

  • Average “weight” of transaction: 500 milliseconds of CPU processing time

  • Peak usage = ~ 1000 x 20 x 4 x .5 = 40,000 CPU seconds, or approximately 11 processing hours.

In this (rather extreme) example, one hour of peak usage across the total client population yields roughly 11 hours of processing time on the GDB. In and of itself, this does not necessarily overload the GDB, because the GDB queues incoming deltas for sequential processing. GWS requests, however, both affect delta processing and are affected by it, resulting in longer processing times for all transactions and potential GWS timeouts.
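To make the arithmetic concrete, the following sketch repeats the same estimate in code (the figures are the hypothetical values from the example above, not measured data):

# Hypothetical peak-usage estimate (illustrative only).
workspaces = 1000
members_per_workspace = 20
transactions_per_member_per_hour = 4
avg_transaction_weight_sec = 0.5  # average CPU processing time per transaction

peak_cpu_seconds = (workspaces
                    * members_per_workspace
                    * transactions_per_member_per_hour
                    * avg_transaction_weight_sec)

print(f"Peak usage: {peak_cpu_seconds:,.0f} CPU seconds "
      f"(~{peak_cpu_seconds / 3600:.1f} processing hours per hour of peak activity)")
# Prints: Peak usage: 40,000 CPU seconds (~11.1 processing hours per hour of peak activity)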

The GDB caches workspaces, so there is likely to be some optimization when applying deltas in a scenario like this one, resulting in improved processing. Because such caching and optimization are beyond the control (or even the observation) of an administrator, however, it is important not to plan on these processing improvements.

During the requirements-gathering phase of the project, business analysis should yield the anticipated average workspace membership and number of transactions per hour. Testing should yield the average transaction weight, in terms of processing milliseconds, and from these numbers, you can calculate the total number of workspaces that a Data Bridge should support under peak processing.

This approach to estimating peak usage should not be construed as an airtight predictive modeling tool; other internal and environmental factors also affect the GDB. It is not advisable to plan for normal GDB performance under peak usage scenarios if peak usage is estimated to exceed 50% of the available processing on the GDB.

Managing the "Ramp"

In addition to modeling and preparing for peaks in workspace usage after full adoption of the solution occurs, you should analyze and plan for the solution uptake and workspace creation process in advance. Groove Data Bridge, unlike many other server products, does not support bulk operations; each workspace, whether created through invitation acceptance or a GWS request, must be created in series.

Consequently, the process of adding workspaces to the GDB takes time, and few operations stress the Data Bridge as much as workspace creation. The less time allowed between workspace creation transactions, the less processing time is available for processing deltas and other GWS transactions. It is possible to request so many new workspaces in a given period that the Data Bridge ceases to accept new workspace requests. Because of the load that workspace creation places on the Data Bridge, the GDB may fall behind in processing the serialized workspace requests. At such times, the Data Bridge continues to show signs of peak operation and resource consumption for up to several hours after the last invitation is accepted or the last GWS transaction is successfully requested. There is no way to speed up or cancel processing of these requests, and you should allow the GDB to complete processing of the workspaces.

For these reasons, it is best to plan for a relatively slow ramp to full workspace creation on the GDB, to pre-create workspaces on the GDB prior to inviting human members to the workspaces, or to set user expectations to allow for considerable latency in workspace creation.
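As one way to implement a controlled ramp from the integrating application, the following sketch paces workspace-creation requests by enforcing a minimum wait between calls. It is a minimal sketch under stated assumptions: create_workspace stands in for whatever client code actually issues the Spaces.Create request (it is not a Groove API shown here), and the 3-minute spacing simply mirrors the minimum wait time suggested in Table 1.

import time

MIN_WAIT_BETWEEN_CREATES = 180  # seconds; mirrors the 3 min. minimum wait in Table 1

def create_workspaces_paced(names, create_workspace):
    """Create workspaces one at a time, pausing between requests so the
    Data Bridge retains headroom for delta processing and other GWS calls.
    create_workspace is a caller-supplied function that issues the actual
    Spaces.Create request (hypothetical; not a Groove API)."""
    for name in names:
        started = time.monotonic()
        create_workspace(name)            # blocks until the GWS call returns
        elapsed = time.monotonic() - started
        remaining = MIN_WAIT_BETWEEN_CREATES - elapsed
        if remaining > 0:
            time.sleep(remaining)         # enforce the minimum spacing between creates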

Planning to Test Performance

Groove Data Bridge performance depends on a large number of factors, most of which are external to the GDB. You should design applications that rely on the GDB around the assumption that each GDB is a conduit of limited capacity to Groove workspaces. All transactions and data must pass through this conduit, and so compete for GDB resources. To set service level expectations properly, conduct careful testing of GDB performance under a variety of conditions before deploying the server in production. In each scenario, it is important to test the response time and resource consumption of the GDB under normal operating conditions, consisting of both normal use of the workspaces by a simulated human membership and GWS-consuming applications. After a baseline under normal operating conditions is established, modify the usage profile to include abnormal conditions, including (at least):

  • Increased usage of each space by human members

  • Increased churn in the membership of workspaces

  • Increased workspace creation or destruction

  • Increased transactional throughput from an external application via GWS

  • Increased average workspace size

When creating a performance test plan, remember that increases in parameters associated with load or resource consumption are cumulative. For example, an increase in the average delta (or transaction) size per member not only increases the impact of each active workspace on the GDB, but also decreases the resources available for other kinds of tasks. If the expected number of human members in each workspace increases, even if the total user pool stays static, the “weight” of each workspace increases, and the total delta volume affects not only the processing of deltas, but also operations such as workspace creation and GWS data transactions.

Typical analysis of the business that a GDB-enabled application supports yields a range for each of the parameters required to plan for performance testing:

  • Average number of users per workspace

  • Average number of transactions per user per period

  • Average number of workspaces created per period

  • Average number of workspaces deleted per period

  • Average number of GWS data transactions per period

  • Average change in new members per workspace per period

A good performance test includes the projected mean and high values in each range.
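One simple way to build that test matrix is to enumerate the mean/high combinations mechanically, as in the following sketch (the parameter names and ranges are hypothetical; substitute the figures from your own business analysis):

from itertools import product

# Hypothetical (mean, high) projections for each parameter from business analysis.
parameters = {
    "users_per_workspace":        (20, 35),
    "transactions_per_user_hour": (4, 10),
    "workspaces_created_per_day": (25, 100),
    "workspaces_deleted_per_day": (10, 40),
    "gws_transactions_per_hour":  (500, 2000),
    "new_members_per_space_day":  (1, 5),
}

names = list(parameters)
scenarios = [dict(zip(names, combo))
             for combo in product(*(parameters[n] for n in names))]

print(f"{len(scenarios)} candidate test scenarios")  # 2 ** 6 = 64 mean/high combinations

In practice you may run only a subset of these combinations, but enumerating them makes the cumulative effect of each parameter explicit.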

Planning Ongoing Performance Monitoring and Post-Launch Maintenance

Due to the unique nature of Groove’s relay-enabled peer-synchronization model, no amount of modeling or testing in a lab completely addresses the possible scenarios that may arise after you deploy a GDB in a production environment. End-user behavior is beyond the control of any central administrator, and although you can provide prescriptive guidance to each user, enforcement is impossible (and in most cases not desirable). Consequently, it is prudent to monitor GDB behavior and performance quite closely after the launch of any application that incorporates the GDB.
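Even a lightweight log of GWS round-trip times, collected from the integrating application, makes emerging trends visible. The following sketch is one hypothetical way to capture such measurements; timed_call is an illustrative helper, not a Groove API, and call stands in for any GWS request your application already issues.

import csv
import time
from datetime import datetime, timezone

def timed_call(operation_name, call, log_path="gdb_response_times.csv"):
    """Invoke a caller-supplied GWS request and append its round-trip time to a
    CSV file for later trend analysis (hypothetical helper, not a Groove API)."""
    started = time.monotonic()
    result = call()
    elapsed_ms = (time.monotonic() - started) * 1000
    with open(log_path, "a", newline="") as log:
        csv.writer(log).writerow(
            [datetime.now(timezone.utc).isoformat(), operation_name, f"{elapsed_ms:.0f}"])
    return result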

As time goes on, user behavior and GDB performance become more predictable, and you can develop more dependable models of behavior for the specific application. At no point, however, is the aggregate behavior of the user population in the field controllable, and various factors, such as network availability, relay performance, and general usage patterns, can all create novel situations that affect GDB and overall application performance.

A thorough post-launch operations plan should include both the day-to-day administration of the tools used to monitor the GDB and periodic reviews of service level agreements and scheduled maintenance, informed by the results of GDB performance analysis.

Conclusion

The Groove Data Bridge provides a data integration point between Groove workspaces and external data or systems. To ensure that there are adequate resources to handle the integration, you should consider performance and load when you are planning your system. Also, it is important to build in an ongoing monitoring system to ensure that the system continues to perform adequately.

Additional Resources

For more information, see the following resources: