BlackBerry Forums Support Community

jibi · 02-11-2008, 11:06 PM

I was out of the office today when the outage happened, but I'm happy to say that very few people in our IT organization were confused as to how to troubleshoot the issue and how to utilize the tools made available to them. It's refreshing to see the implementation of tools and processes, even if at a bare minimum at this point in time, make an impact when something like today's outage comes up.

At 3:22:20 PM ET, we received our first BoxTone email notification letting us know one of our servers was in an unavailable status and had lost its SRP connection. Within a minute, we were alerted for the rest of the BES servers we have in the United States (we do not have the international sites configured for notifications at this time). The outage was reported by RIM to have started at 3:20 PM ET. Not too shabby of a turnaround time.

Ten minutes after we first received notifications from our BoxTone connector, we received our first RIM email notification for the outage. This outage notification took over 5 minutes to deliver from RIM to our infrastructure. Over an hour after our initial BoxTone alert, we received our first notification from AT&T (ATTOM), which took nearly 20 minutes to deliver to our system after it was sent from AT&T.
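For anyone who wants to line the timestamps up, here is a rough sketch of the arithmetic in Python. The variable and function names are hypothetical, the outage start is assumed to be 3:20:00 PM since the post gives no seconds, and "ten minutes after" is treated as exactly ten minutes even though the post only gives it approximately.

```python
from datetime import datetime, timedelta

# Times taken from the post (ET, February 11, 2008).
# Assumption: the reported outage start has no seconds, so :00 is used.
outage_start = datetime(2008, 2, 11, 15, 20, 0)
first_boxtone_alert = datetime(2008, 2, 11, 15, 22, 20)

# Assumption: "ten minutes after" the first BoxTone alert is treated as
# exactly 10 minutes; the post does not give a precise timestamp.
first_rim_email = first_boxtone_alert + timedelta(minutes=10)


def lag(event):
    """Return the delay between the reported outage start and an event."""
    return event - outage_start


boxtone_lag = lag(first_boxtone_alert)
rim_lag = lag(first_rim_email)

print("BoxTone alert lag:", boxtone_lag)
print("RIM email lag (approx.):", rim_lag)
```

Under those assumptions the internal alert trails the reported outage start by a little over two minutes, and the vendor email by roughly twelve.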

Granted, a monitoring solution can do nothing to fix an outage, but it can certainly reduce the amount of time spent troubleshooting end-user issues when outages happen. This determination period is vital when dealing with thousands and thousands of users. Bulletins can be posted, internal notifications can be sent, Help Desk personnel can start notifying rather than troubleshooting ...all within minutes of an outage developing and quite often much more rapidly than official vendor acknowledgement and notification. During these important minutes ticking away, vendors are typically in the process of drafting a response, gaining approvals to send the message to a select few hundred thousand customers, and straining their own mail queues; meanwhile the monitoring system is doing its thing - gathering real-time statistics, aggregating the data, sending alerts to internal technology groups, and helping deduce the outage's scope of impact in your own environment.

Here's what our environment looked like following the reconnection of SRP when messages were still increasing in the pending queues. Quite astonishing.

BES: North America
SABES: South America
PACBES: Asia-Pacific
EUBES: Europe

Sagz · 02-12-2008, 12:52 PM

Great example.

mingjing · 02-13-2008, 04:03 PM

We are using Zenprise to monitor our BlackBerry infrastructure and I have to say that the software is amazing. I was alerted immediately when the outage started. When I called Rogers, they did not even know that the RIM network was down (I guess RIM hadn't notified them yet). I agree with jibi that using monitoring software to identify outages means that we can proactively reach out to our user community before they end up flooding us with calls.

I've included a screenshot of the Zenprise console and one alert message. We were able to see pending messages growing for critical users, as well as immediately identify the root cause to be connectivity problems with the SRP network.

jibi · 02-13-2008, 10:27 PM

Do you happen to have a screenshot of Zenprise in your own environment rather than the Zenprise test lab screenshot that they mass-mailed this morning? Just curious.

mingjing · 02-14-2008, 08:02 AM

I have one alert message screenshot and one Zenprise console issue warning screenshot.

mgaffney · 03-17-2008, 05:45 AM

I posted an article about the BoxTone Dashboard during the BlackBerry outage on my blog. In the article, I use jibi's screen shot and compare it with mingjing's screen shot of the Zenprise User Dashboard. Give it a read if you get a chance:

The BoxTone Dashboard and the BlackBerry Outage

And just so you know, I work for BoxTone.

Highfall · 03-20-2008, 12:49 AM

mingjing, Zenprise 3.3 gets even better... Tons more visibility and alerting on user-level issues. You will want to tune the alert filters somewhat, though. Since 3.3 was installed, it even caught some hiccups with some of the international providers that were out of trend.

Not a Zenprise rep, but a happy user of it.

grepPZ · 03-24-2008, 08:11 AM

highfall, screenshots or didn't happen


BoxTone, Notifications, and Rapid Response