Thursday, May 15, 2008

OpsMgr Data Warehouse: Changing the number of days that data is stored

I stumbled across this post in the newsgroups a while back and have referenced it numerous times. Vitaly Filimonov at Microsoft provided this comprehensive explanation of how to change data retention settings for the various kinds of data in the data warehouse.


"Unfortunately, OpsMgr2007 does not have UI to change data retention
settings, but you can do it by modifying columns in certain tables inside
OperationsManagerDW database.

There are two places in the DW where we store data retention-related
settings. For “config space” (your management packs, the rules they contain,
the overrides you’ve created, etc.) and “instance space” (discovered objects,
their properties and relationships, etc.), we store the settings inside the
MaintenanceSetting table. Here are the columns of interest and their default
values:

Instance space settings:

1. LastInstanceGroomingDateTime - the last time grooming operations were
performed;
2. InstanceGroomingFrequencyMinutes - how often the grooming process starts
(default: 480);
3. (most important) InstanceMaxAgeDays - maximum age (since the day the
instance was deleted) for instance space objects (default: 400);
4. InstanceMaxRowsToGroom - maximum number of objects to delete in one run
(default: 5000).

Config space settings:

1. LastConfigGroomingDateTime - the last time grooming operations were
performed;
2. ConfigGroomingFrequencyMinutes - how often the grooming process starts
(default: 60);
3. ManagementPackMaxAgeDays - maximum age for a management pack (since the
day the MP was uninstalled) (default: 400);
4. NonSealedManagementPackMaxVersionCount - maximum number of non-sealed MP
versions to preserve (independent of age) (default: 3).

Based on these settings for config space, a sealed MP will be removed 400 days
after it was uninstalled from all management groups that are members of the
DW. Non-sealed MPs play by the same rules, but in addition we keep at most 3
old versions of a non-sealed MP.
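
The post names the columns but does not show the statement used to change
them. Below is a minimal Python sketch that builds such a T-SQL UPDATE as a
string. The helper name is made up for illustration, and the statement assumes
MaintenanceSetting holds a single settings row (so there is no WHERE clause);
check that with a SELECT against your own DW before running anything.

```python
# Editable columns and defaults, as quoted in the post above.
MAINTENANCE_DEFAULTS = {
    "InstanceGroomingFrequencyMinutes": 480,
    "InstanceMaxAgeDays": 400,
    "InstanceMaxRowsToGroom": 5000,
    "ConfigGroomingFrequencyMinutes": 60,
    "ManagementPackMaxAgeDays": 400,
    "NonSealedManagementPackMaxVersionCount": 3,
}


def build_maintenance_update(column: str, value: int) -> str:
    """Return a T-SQL UPDATE for one MaintenanceSetting column (hypothetical helper)."""
    # Only allow the columns listed above; the Last...DateTime columns are
    # bookkeeping values and are deliberately left out.
    if column not in MAINTENANCE_DEFAULTS:
        raise ValueError(f"unknown or non-editable column: {column}")
    if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
        raise ValueError("value must be a positive integer")
    # Assumption: the table has exactly one row, so no WHERE clause is needed.
    return f"UPDATE MaintenanceSetting SET {column} = {value}"


print(build_maintenance_update("InstanceMaxAgeDays", 180))
# prints: UPDATE MaintenanceSetting SET InstanceMaxAgeDays = 180
```

Restricting the column name to a known list keeps the generated statement
limited to the settings the post describes.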

Now, to the data. Each data type is stored in a separate structure called a
“dataset”. There is a performance dataset for perf data, a state dataset for
monitor state transitions, an event dataset for events, and so on. MPs may
introduce new datasets as well. All datasets known to exist today are
so-called “standard datasets”. For those, we have a set of tables that hold
the description of the dataset, including its data retention policies.
Non-standard datasets may be introduced (we do not know of one today, though),
and they don’t have to follow the same rules: data retention settings for
non-standard datasets are dataset-specific.

For standard datasets, data retention is set at the “aggregation” level, so
performance “raw” data (the samples themselves) can be stored for a certain
number of days, which may differ from the number of days that daily aggregates
of performance counters are stored. These settings are stored in the
StandardDatasetAggregation table. Here are the columns of interest. Note
that the “primary key” of the table is composite, consisting of the dataset id
(you can look up which dataset has which id in the Dataset table) and
AggregationTypeId (which can be looked up in the AggregationType table). The
defaults vary by dataset / aggregation type:

1. MaxDataAgeDays - maximum number of days to store data;
2. GroomingIntervalMinutes - grooming process frequency;
3. MaxRowsToGroom - maximum number of rows to delete per transaction (see
note below);
4. LastGroomingDateTime - the last time the grooming process ran.
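
Since the key is composite, a change to MaxDataAgeDays has to name both the
dataset and the aggregation type. Here is a rough Python sketch that builds a
parameterized UPDATE for one such row. The function name is hypothetical, the
column name DatasetId is an assumption (the post only says “dataset id”), and
the GUID in the example is a placeholder, not a real dataset.

```python
from typing import Tuple


def build_retention_update(
    dataset_id: str, aggregation_type_id: int, max_data_age_days: int
) -> Tuple[str, tuple]:
    """Return (sql, params) that set MaxDataAgeDays for one dataset/aggregation row."""
    if max_data_age_days <= 0:
        raise ValueError("max_data_age_days must be positive")
    # '?' placeholders follow the qmark style used by ODBC-based drivers.
    # DatasetId is an assumed column name; verify it against your schema.
    sql = (
        "UPDATE StandardDatasetAggregation "
        "SET MaxDataAgeDays = ? "
        "WHERE DatasetId = ? AND AggregationTypeId = ?"
    )
    return sql, (max_data_age_days, dataset_id, aggregation_type_id)


# Placeholder values; look up the real ids in the Dataset and
# AggregationType tables first.
sql, params = build_retention_update(
    "00000000-0000-0000-0000-000000000000", 0, 30
)
print(sql)
print(params)
```

The pair can then be handed to whatever database driver you use, for example
as cursor.execute(sql, params) with a driver that accepts qmark parameters.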

One important note here is that we do not always groom data row by row. If
data inflow is high (which is usually the case in medium-to-large
organizations for performance and event data), we create additional tables to
store data. For example, we store the first 10M performance samples in the
first table. Once we get more data, we leave the first table there, create a
second table, and start inserting into it. At the same time, we calculate the
min and max date for the data in the first table (and store them separately in
the StandardDatasetTableMap table). The grooming process then works like this
(for a given dataset/aggregation type combination): check to see if we have
only one table. If so, delete records row by row using DELETE TOP with the
MaxRowsToGroom parameter. If there is more than one table, find the table
with the oldest “max date” for the data in it. If that “max date” is older
than the retention period, drop the entire table; if not, leave everything
in place. So we are not necessarily “up to date” on grooming all the time. If
you have a table which spans one month, we will keep some records up to one
month longer than really needed, but the performance gain of dropping a whole
table vs. deleting rows is so huge that we think it is far better to store a
bit more data for a bit longer than to pay the penalty.
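
The grooming decision described above can be sketched in a few lines of
Python. This is only a model of the logic as the post words it, not the actual
stored procedure; the class, function, and table names are invented for the
example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass
class DataTable:
    name: str
    max_date: datetime  # newest row in the table (tracked in StandardDatasetTableMap)


def plan_grooming(
    tables: List[DataTable],
    now: datetime,
    max_data_age_days: int,
    max_rows_to_groom: int,
) -> Tuple:
    """Decide what one grooming pass does for a dataset/aggregation combination."""
    if not tables:
        raise ValueError("at least one table is required")
    cutoff = now - timedelta(days=max_data_age_days)
    if len(tables) == 1:
        # Single table: delete expired rows in batches (DELETE TOP).
        return ("delete_rows", tables[0].name, max_rows_to_groom, cutoff)
    # Several tables: look only at the one whose newest row is oldest.
    oldest = min(tables, key=lambda t: t.max_date)
    if oldest.max_date < cutoff:
        return ("drop_table", oldest.name)
    # Even the oldest table still holds rows inside the retention period.
    return ("no_action",)


now = datetime(2008, 5, 15)
tables = [
    DataTable("Perf_1", datetime(2008, 4, 1)),
    DataTable("Perf_2", datetime(2008, 5, 14)),
]
print(plan_grooming(tables, now, max_data_age_days=10, max_rows_to_groom=5000))
# prints: ('drop_table', 'Perf_1')
```

Note how the multi-table branch never deletes individual rows: a table is
either dropped whole or left alone, which is why some rows can outlive the
retention period.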

Hope this helps.


Vitaly Filimonov [MSFT]
-------------------------------------------
This posting is provided “AS IS” with no warranties, and confers no rights.
Use of included script samples are subject to the terms specified at
http://www.microsoft.com/info/cpyright.htm"

OpsMgr: Performance reports not showing data

"When I run a performance report against a single machine I get no data."

This is a very commonly asked question in the newsgroups, and the resolution is quite simple:

When running a performance report against a single machine, use the 'Add Group' button for that machine rather than the 'Add Object' button.

Why?

The reason for this lies in the way that performance counters are assigned to objects in SCOM. For example, a CPU performance counter may not necessarily be "owned" by a specific computer object. Figuring out that assignment can be painful. However, when 'Add Group' is used rather than 'Add Object', you are basically telling SCOM "I want this counter data for this machine. You figure the rest out." It's a bit unorthodox, but it works.

Credit goes to Vitaly Filimonov at Microsoft for explaining this.

Additionally, here is a more thorough explanation of this from Kevin Holman:
http://blogs.technet.com/kevinholman/archive/2008/04/21/why-do-my-reports-show-no-data.aspx

Wednesday, May 14, 2008

OpsMgr: Monitoring the Event Log - Part 1

Event log monitoring is arguably the bread and butter of SCOM. There is plenty of flexibility in other directions, but for the customers I've dealt with, most of the custom monitors they need deal with the event log. In this post I'll be discussing event log properties.

If you've built any event log monitors and have needed to specify any properties other than the defaults of 'ID' and 'Source', you have likely seen that only a handful of the event properties are available to choose from:




These present a good starting point, but what if you want to look for a particular Event Type, or perhaps text in the Event Description?

The answer is to use the 'Use parameter name not specified' option. This option allows you to specify any parameter including those not listed above, provided that you know the parameter name. Here are the parameter names that can be specified:

So for example, if we wanted to create a rule to look for an event in the application event log from the source "App01", with an ID of "123", that has the text "Error 1 encountered" in the body of the description, our expression would look like this:
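
To make the logic of that expression concrete, here is a small Python model of it: three criteria that must all be true for an event to match. This is only a sketch of how the criteria combine, not anything SCOM runs. The parameter names used as dictionary keys ("EventID", "Source", "EventDescription") are placeholders I picked for the example, so substitute the real parameter names when building the rule.

```python
# Each criterion is (parameter name, operator, value). The names are
# illustrative placeholders, not confirmed SCOM parameter names.
CRITERIA = [
    ("EventID", "equals", "123"),
    ("Source", "equals", "App01"),
    ("EventDescription", "contains", "Error 1 encountered"),
]


def event_matches(event: dict, criteria=CRITERIA) -> bool:
    """Return True only if the event satisfies every criterion (logical AND)."""
    for name, operator, expected in criteria:
        actual = str(event.get(name, ""))
        if operator == "equals":
            ok = actual == expected
        elif operator == "contains":
            ok = expected in actual
        else:
            raise ValueError(f"unsupported operator: {operator}")
        if not ok:
            return False
    return True


sample = {
    "EventID": 123,
    "Source": "App01",
    "EventDescription": "Job failed: Error 1 encountered while saving.",
}
print(event_matches(sample))
# prints: True
```

An event with the right ID and source but different description text would fail the third criterion and not match.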


That should clarify things a bit. I'll be posting the next part in this series, which will address customization of alert descriptions, shortly.