BlackBerry Forums Support Community
              

Old 05-01-2008, 06:36 AM   #1
Jadey
BBF War Game Mod
 
Jadey's Avatar
 
Join Date: Oct 2006
Location: Denver CO
Model: Z10
OS: 10010614
PIN: SEEKRIT innit
Carrier: AT&T
Posts: 4,294
Most bizarre BES problem I have ever had


OK, there is a really odd problem with my BES, not even sure how to describe it.

Sys info:
* Lotus Domino Server (Release 7.0.2 for Windows/32) - BES is NOT the Domino mail server, it actually connects to four mail servers (2 on same LAN in UK, 2 via VPNs in USA/CA)
* BlackBerry Enterprise Server, Version 4.1.3.22
* Windows 2000 Server

Problem seems to be:
* In a nutshell, BES slowly gives up processing anything for users.
- I do not ever receive a BES alert email
- When I check the server, Domino is running
- When I check "show tasks" on Domino, BES is running
- When I check services, all BlackBerry services are running

How does problem present?
- One by one, BES seems to just start "ignoring" the fact it has certain users.
- Watching the server console, it never seems to scan these users' inboxes for mail. Nor does it attempt to contact the device. Users find that they just "don't receive" any mail, and can have problems sending mail and completing lookups.
- Server does not show errors for those users, it just *ignores* the fact they exist
- Affected users are on different mail servers, so this is not (obviously) linked to one of the 4 domino servers running mail
- Only "fix" seems to be restarting Domino & BES
- Once this issue has appeared (it does not happen for some time after a restart), Domino & BES will not shut down cleanly; I normally have to end task on Domino
- Have to put all services to manual before server reboot, and start them when server is up - when they are on auto, Domino & BES do not start reliably
- Domino does not seem able to start BES services any more, even when BES services are set to automatic. I bring up Domino, which says it has loaded BES but nothing processes, then I have to go to each BB service and manually start them.
- Different users are affected on different days
- Problem starts with one user, then another, then another, till everyone is affected, and this makes it hard to know when the server is problematic. For example, today my BB was fine, but 2 colleagues on the same mail server had had no mail since the early hours of the morning.


Any error messages?
* Only in logs, never on BES or Domino console
* These messages will start appearing in Application Logs on Windows for affected users:
---------------------------------
Event Type: Warning
Event Source: BlackBerry Messaging Agent XXXXXXXX
Event Category: None
Event ID: 20148
Date: 01/05/2008
Time: 08:40:00
User: N/A
Computer: XXXXXXXX
Description:
The description for Event ID ( 20148 ) in Source ( BlackBerry Messaging Agent BLACKBERRY ) cannot be found. The local computer may not have the necessary registry information or message DLL files to display messages from a remote computer. You may be able to use the /AUXSOURCE= flag to retrieve this description; see Help and Support for details. The following information is part of the event: Thread: *** No Response *** Thread Id=0x175C, Handle=0x8F0, WaitCount=121, Last Activity: New Message for user Mark Smith/OU/ORG.
---------------------------------

AND

---------------------------------
Event Type: Warning
Event Source: BlackBerry Messaging Agent XXXXXXXX
Event Category: None
Event ID: 20149
Date: 01/05/2008
Time: 08:40:00
User: N/A
Computer: XXXXXXXX
Description:
Thread 175C, utilization=0.0000%, failed health check 121 times
---------------------------------
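A rough sketch for lining up the two warnings quoted above: event 20148 names the hung thread and its last activity, and event 20149 reports the failed health-check count for the same thread ID. The patterns are written against these two sample descriptions only, the function name is made up for illustration, and other BES versions may word the messages differently.

```python
import re

# Patterns are based only on the two sample event descriptions in this post.
NO_RESPONSE = re.compile(
    r"Thread Id=0x(?P<tid>[0-9A-Fa-f]+).*?WaitCount=(?P<wait>\d+),\s*"
    r"Last Activity:\s*(?P<activity>.+?)\.?\s*$"
)
HEALTH = re.compile(
    r"Thread (?P<tid>[0-9A-Fa-f]+), utilization=(?P<util>[\d.]+)%, "
    r"failed health check (?P<fails>\d+) times"
)


def summarize_thread_events(descriptions):
    """Group 20148/20149 description texts by thread ID (hex, upper case)."""
    threads = {}
    for text in descriptions:
        match = NO_RESPONSE.search(text)
        if match:
            entry = threads.setdefault(match.group("tid").upper(), {})
            entry["wait_count"] = int(match.group("wait"))
            entry["last_activity"] = match.group("activity")
            continue
        match = HEALTH.search(text)
        if match:
            entry = threads.setdefault(match.group("tid").upper(), {})
            entry["utilization_pct"] = float(match.group("util"))
            entry["failed_checks"] = int(match.group("fails"))
    return threads
```

Feeding it the description text of every 20148 and 20149 warning exported from the Application Log should give one entry per stuck thread, which makes it easier to see which users were the last activity on each thread.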

Has anyone seen anything like this before?
Any ideas where to start?!
Any requests for further info/Win logs/BES logs please let me know and I will post.

The result of this is that I have a defective and very difficult to manage BES service. This has been going on for a week or so now, and I am running out of ideas!

PLEASE HELP IF YOU CAN!
__________________
Jadey : Infrastructure Architect, Denver CO
Offline  
Old 05-01-2008, 07:32 AM   #2
Sagz
Knows Where the Search Button Is
 
Join Date: Feb 2006
Model: 8100t
Carrier: tmobile
Posts: 45
Default

The alerting service was bunk in 4.1.3 and 4.1.4. It is supposed to be fixed in 4.1.5.

When a thread fails a health check, it's going to fail for the users that are utilizing that particular thread. So in your case ("Thread 175C, utilization=0.0000%, failed health check 121 times"), if there were 10 people on that thread, 10 people will not have service. When you start seeing threads failing, it usually waterfalls over time, meaning more and more threads are going to fail.


How many users do you have? What is your current threading model on the BES and what is it optimized to? Also, what level is your logging set to right now?
Offline  
Old 05-01-2008, 08:11 AM   #3
Jadey
Default

32 users on this BES

Sorry no idea about threading model - how/where would I check?

Log levels are currently:

ALRT 4
ASRV 4
ACNV 4
DISP 4
MAGT 4
MNGR 4
ROUT 4
CTRL 4
POLC 4
SYNC 4
CNTS 4
CBCK 4
CMNG 4
MDSS 4

A lot of these were at 2, but we tried changing the logging levels after this issue started.

Thanks!!
__________________
Jadey : Infrastructure Architect, Denver CO
Offline  
Old 05-01-2008, 08:26 AM   #4
Jadey
Default

Quote:
Originally Posted by Sagz
What is your current threading model on the BES and what is it optimized to?

Is this an Exchange/MAPI setting? Cannot find ANYTHING on Domino BES relating to this...
__________________
Jadey : Infrastructure Architect, Denver CO
Offline  
Old 05-01-2008, 08:59 AM   #5
TreeDude
Talking BlackBerry Encyclopedia
 
TreeDude's Avatar
 
Join Date: Apr 2008
Location: Western NY, USA
Model: iPn4S
OS: iOS 7.0.1
PIN: 76E5A626
Carrier: Verizon
Posts: 243
Default

How long does it take to stop processing?

When we had our old 2.2 BES on Windows 2000 it did something similar to this about every 3 or 4 weeks. We would restart and it would be fine. I did not have to play with the services though.

You could try updating to 4.1.5.
__________________
Technical Engineer III

BES was decommissioned. Currently using iPhones with Lotus Notes Traveler 9.0.
Offline  
Old 05-01-2008, 10:08 AM   #6
Jadey
Default

Is 4.1.5 OK though? I've seen a lot of posts for 4.1.5 and Domino issues, but have not had time to look into how serious they are.

When this started a few weeks ago, it'd take days to start messing up. Now it is doing it in a matter of hours.

I have just noticed through testing:
- Restarting individual BES services does not appear to help (only tested Controller, Router & Dispatcher)
- Doing a "tell bes quit" then "load bes" on the Domino server clears the problem... albeit temporarily!
__________________
Jadey : Infrastructure Architect, Denver CO
Offline  
Old 05-01-2008, 10:43 AM   #7
Oderus
Knows Where the Search Button Is
 
Join Date: Sep 2006
Model: 9700
Carrier: Rogers
Posts: 39
Default

What's the latency of each mail server to the BES? If it's over 40ms, it won't work properly.
Offline  
Old 05-01-2008, 10:59 AM   #8
Jadey
Default

The 2 mailservers on same LAN as BES are fine.

Hard to test the overseas servers, they are on VPNs with firewall rules that only allow notes traffic over. Can't ping.

However, nothing on the VPNs has changed, and typically it is threads to one of the local servers that start to die first. So I don't think latency is an issue - particularly when "tell bes quit" "load bes" fixes it (temporarily) without a Domino restart or server restart or anything else...
__________________
Jadey : Infrastructure Architect, Denver CO

Last edited by Jadey; 05-01-2008 at 11:01 AM..
Offline  
Old 05-01-2008, 11:33 AM   #9
mahoward
CrackBerry Addict
 
mahoward's Avatar
 
Join Date: May 2005
Model: 8900
Carrier: T-Mobile
Posts: 560
Default

Perhaps try to increase your MaxTotalThreads from default of 40 decimal to 80 decimal. This might just be a band aid however, as I believe the issue to be latency over the VPN links.

What is your ping time from your BES to your Domino mail servers over the VPN? If you can't ping how about NotesConnect (nPing)?
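Where ICMP is blocked and only Notes traffic is allowed through, one rough stand-in is to time a plain TCP connect to the Domino server's NRPC port (1352). The sketch below is only an approximation of round-trip latency (one TCP handshake), not a replacement for NotesConnect/nPing; the function names and the sample count are illustrative assumptions.

```python
import socket
import time

NOTES_PORT = 1352  # standard Lotus Notes/Domino NRPC port


def tcp_connect_ms(host, port=NOTES_PORT, timeout=5.0):
    """Time one TCP handshake to host:port, in milliseconds."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # connection is closed again straight away
    return (time.perf_counter() - start) * 1000.0


def median_connect_ms(host, port=NOTES_PORT, samples=5, timeout=5.0):
    """Median of several connect timings, to smooth out one-off spikes."""
    times = sorted(tcp_connect_ms(host, port, timeout) for _ in range(samples))
    return times[len(times) // 2]

# Example (host name is a placeholder, not a real server):
# print(median_connect_ms("domino-us-1.example.com"))
```

Run from the BES box against each of the four mail servers, this gives a like-for-like number for the LAN and VPN servers even when ping is not possible.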

Are your VPN users' mail files increasing greatly? That could explain the issues now vs before. Scanning large mail files can be intensive even over our high-speed, low-latency WAN links within the US; I can't imagine doing that over a VPN link overseas.

I really think the ultimate solution is to set up an additional BES VM in the US connecting via WAN or LAN links to the mailboxes in US / CA.
__________________
BESX 4.1.7 on Exchange 2003: 65 Devices
BESX 5.0.3 on Exchange 2003: 2007 Devices
Offline  
Old 05-01-2008, 11:41 AM   #10
mahoward
Default

Also if you could perform the following command on your MAGT log and post or PM the results that would help:

grep -i "thread.*pool" [BESServerName]_MAGT_01_20080501_0001.txt

For example my server shows this:

[ENV] ThreadpoolOptimizationInterval = 240
[ENV] NumThreadPools = 10
Optimize ThreadPools, total number of users 680
No empty thread-pools were found.
Thread pool for mail server (DominoServer1) has 20 threads to serve 172 handhelds
Thread pool for mail server (DominoServer2) has 39 threads to serve 348 handhelds
Thread pool for mail server (DominoServer3) has 2 threads to serve 6 handhelds
Thread pool for mail server (DominoServer4) has 17 threads to serve 151 handhelds
Thread pool for mail server (DominoServer5) has 2 threads to serve 3 handhelds


Note that the total number of threads adds up to 80, whereas 40 is the BES default.
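On a Windows server without grep, the same filter can be approximated in a few lines of Python. This is a sketch only: it applies the case-insensitive "thread.*pool" match from the command above and then totals the per-server thread counts, assuming the log lines look like the sample output in this post. The function names are made up for illustration.

```python
import re

# Same match as: grep -i "thread.*pool"
FILTER = re.compile(r"thread.*pool", re.IGNORECASE)

# Line format assumed from the sample MAGT output shown above.
POOL_LINE = re.compile(
    r"Thread pool for mail server \((?P<server>[^)]+)\) has "
    r"(?P<threads>\d+) threads to serve (?P<handhelds>\d+) handhelds"
)


def grep_thread_pool(lines):
    """Return the lines that the grep command would have printed."""
    return [line.rstrip("\n") for line in lines if FILTER.search(line)]


def pool_summary(lines):
    """Map mail server name -> (threads, handhelds) from the pool lines."""
    pools = {}
    for line in lines:
        match = POOL_LINE.search(line)
        if match:
            pools[match.group("server")] = (
                int(match.group("threads")),
                int(match.group("handhelds")),
            )
    return pools


def total_threads(pools):
    """Sum the thread counts across all mail servers."""
    return sum(threads for threads, _ in pools.values())

# Example: with open("[BESServerName]_MAGT_01_20080501_0001.txt") as log:
#              print("\n".join(grep_thread_pool(log)))
```

Comparing the total against MaxTotalThreads shows at a glance whether the pools are already using the full allowance.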
__________________
BESX 4.1.7 on Exchange 2003: 65 Devices
BESX 5.0.3 on Exchange 2003: 2007 Devices

Last edited by mahoward; 05-01-2008 at 11:42 AM..
Offline  
Old 05-01-2008, 11:56 AM   #11
Jadey
Default

Eek!
Attached Images
File Type: jpg nping.jpg (31.5 KB, 95 views)
__________________
Jadey : Infrastructure Architect, Denver CO
Offline  
Old 05-01-2008, 12:04 PM   #12
wunderbar
Talking BlackBerry Encyclopedia
 
wunderbar's Avatar
 
Join Date: Jun 2007
Location: Edmonton AB, Canada
Model: 9630
Carrier: Telus
Posts: 300
Default

I think we have found our problem.

No wonder why it's going sideways on you.
Offline  
Old 05-01-2008, 12:13 PM   #13
Jadey
Default

I'm having trouble finding a command line grep utility for windows - anyone know where to get one?
__________________
Jadey : Infrastructure Architect, Denver CO
Offline  
Old 05-01-2008, 12:48 PM   #14
Jadey
Default

Quote:
Originally Posted by mahoward
Perhaps try to increase your MaxTotalThreads from default of 40 decimal to 80 decimal. This might just be a band aid however, as I believe the issue to be latency over the VPN links.

Where is this setting pls?
Ta
__________________
Jadey : Infrastructure Architect, Denver CO
Offline  
Old 05-01-2008, 12:49 PM   #15
Jadey
Default

Scratch that last question, found it
__________________
Jadey : Infrastructure Architect, Denver CO
Offline  
Old 05-01-2008, 01:16 PM   #16
Jadey
Default

Thankfully only a small number of users are on this BES. All the USA users have replicas on local servers, so in the short term (assuming the thread increase doesn't work) I might just point their mail at the local replicas and set a fast replication schedule for their mail to the States servers.
__________________
Jadey : Infrastructure Architect, Denver CO
Offline  
Old 05-01-2008, 03:44 PM   #17
mahoward
Default

That would work. I have heard of an enhancement request to be able to customize which replica to point to instead of pulling it from the person doc, but I doubt if / when that would be implemented.
__________________
BESX 4.1.7 on Exchange 2003: 65 Devices
BESX 5.0.3 on Exchange 2003: 2007 Devices
Offline  
Old 05-02-2008, 01:45 AM   #18
Jadey
Default

Another part of the problem (which I did not state above, as it seemed less urgent)...

Around the time that all this faffing about started, we noticed problems with starting BES.
If I rebooted the server and left everything on manual, then although all the services reported as started, they actually would not do anything (BlackBerry services and Domino). Currently, the only way I can get the server to come up and run is:
a) Set all BB and Domino services to Manual
b) Restart server
c) Set all the BB services to automatic
d) Log-on to Windows and start Domino as a service
e) Notice (again) that Domino seems unable to fully start BES, so then I set all the BES services to Automatic, and have to do a "tell BES quit" - "load BES" on the console
f) Then everything works

If I accidentally reboot the server with the BES services set to Automatic, I have to power off the server (as it hangs when windows starts) and boot into safe mode, set everything to manual and start again.


Anyway sorry if that doesn't make sense. That'll be because I was here till 22.30 last night, and back again at 06.30 this morning. I don't like spending so much time at work!!!

Bloody BES. It's at times like this I miss the old days, when Execs accepted "when you're out of the office, you don't have email". Gah.
__________________
Jadey : Infrastructure Architect, Denver CO
Offline  
Old 05-02-2008, 10:50 AM   #19
Jadey
Default

Well, it seems that upgrading Win2000 server to Win2003 fixed, er, all of it.
I can't explain why!

I will post again if this issue kicks up again.

Thanks to everyone who posted...
__________________
Jadey : Infrastructure Architect, Denver CO
Offline  
Old 05-02-2008, 10:59 AM   #20
mahoward
Default

Bloody windows. When will they port BES to Linux?
__________________
BESX 4.1.7 on Exchange 2003: 65 Devices
BESX 5.0.3 on Exchange 2003: 2007 Devices
Offline  
Copyright © 2004-2016 BlackBerryForums.com.
The names RIM © and BlackBerry © are registered Trademarks of BlackBerry Inc.