|
|
|
11-09-2005, 06:26 PM
|
#1
|
Knows Where the Search Button Is
Join Date: Nov 2005
Location: Central Florida
Model: 8130
Carrier: Verizon
Posts: 36
|
HELP - BES Delays Delivering Messages
I need help!! We are experiencing message delivery delays with our BES. Users are receiving messages on their handhelds 5 minutes to 2 hours after the same messages arrive in their Inbox. When messages do arrive, they come in batches (x number of messages received at the same time regardless of when they were sent).
Configuration:
Standalone BES 3.6 (Service Pack 2)
107 Users (mix of T-Mobile, Cingular, Verizon)
Mix of handhelds: oldest is 6210, newest is 7100c
2 Exchange 2003 Servers (SP 1) - one local, one remote
cdo.dll and mapi32.dll are Ex 2003 SP1 version
We first became aware of this problem last Thursday (11/3). The problem seems to come and go depending on the time of day. When the problem is occurring, all users seem to be affected regardless of their carrier or device. The problem seems to appear around 10:00am-12:00pm (eastern time) each day. The problem seems to die down or go away late at night and continue that way through the morning. I know this all sounds like it should point to something in our environment or configuration, but I can’t figure out what that may be.
Our BES and Exchange servers are both within our corporate firewall. We have not discovered any problems on the LAN or WAN. The Exchange servers do not have any abnormal errors or warnings. When within the BES Management utility and watching the User Stats for a given user, we see a delay with the “Pending to handheld” queue incrementing. Once the queue does increment, the messages are received on the handheld almost instantly after.
We have restarted the service. We have rebooted the computer. Neither has resolved the problem (although it MAY have lessened the duration of the delays - we don't have enough data to support that). We have run the IEMSTest utility to verify the connectivity with the Exchange server. We have verified the cdo.dll and mapi32.dll versions on the BES server. We have verified the permissions on each of the Exchange servers. We have verified that PIN to PIN messages work. We have verified that messages sent via the BlackBerry Web Client are delivered promptly.
Your thoughts and suggestions are greatly appreciated.
Dagon
|
Offline
|
|
11-09-2005, 07:52 PM
|
#2
|
Talking BlackBerry Encyclopedia
Join Date: May 2005
Model: 7100
Carrier: T-Mobile
Posts: 299
|
Does the Application Log on the BlackBerry Server show a lot of Warnings about
Blocked Threads?
|
Offline
|
|
11-09-2005, 10:00 PM
|
#3
|
CrackBerry Addict
Join Date: Apr 2005
Location: Toronto
Model: 8800
Carrier: Rogers
Posts: 571
|
Also, what's the latency between the BES and the remote Exchange server? The latency limitation is <35ms between the BES and Exchange server(s). Anything greater and the users will experience delays in mail redirection and PIM data synchronization.
Are the users with the highest pending count located on the remote Exchange?
Have you used the /3GB switch in the boot.ini file to address the virtual memory address allocation issues? See the following link.. http://support.microsoft.com/kb/815372/
Take a look at the event viewer, the BES logs and exchange servers CPU usage during the peak times and compare with non-peak hours.
Are any of your servers connected to a hub? If so use a switch instead to reduce the collisions.
For 100+ users, that type of delay points to system resources or network issues.
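A rough way to put a number on the BES-to-Exchange latency asked about above is to time TCP connections from the BES host. This is only a sketch: it measures TCP connect time (roughly one network round trip), not MAPI latency itself, and the function name, the sample count, and the choice of port are assumptions that do not come from this thread.

```python
import socket
import statistics
import time


def tcp_connect_latency_ms(host, port, samples=5, timeout=2.0):
    """Return the median time, in milliseconds, to open a TCP connection.

    Assumption: TCP connect time is used as a stand-in for network
    round-trip time; it says nothing about how busy the server is.
    """
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        # Open and immediately close the connection; only the handshake is timed.
        with socket.create_connection((host, port), timeout=timeout):
            pass
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)
```

Run from the BES, something like `tcp_connect_latency_ms("exchange-remote.example.com", 135)` (hypothetical host name; port 135 is assumed here because it is the RPC endpoint mapper) gives a figure to compare against the <35 ms guideline, during both peak and off-peak hours.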
|
Offline
|
|
11-10-2005, 09:32 AM
|
#4
|
Thumbs Must Hurt
Join Date: Jun 2005
Location: Here
Model: 8700
Carrier: Cingular
Posts: 161
|
Have your network guys put a watch on the connections between the BES and EXCH servers. Sounds like you are getting hung threads and they aren't clearing. Moving to BES 4.0.1 fixed this issue for us; on 3.6.x we had the servers rebooting twice a day to clear the threads.
|
Offline
|
|
11-10-2005, 09:45 AM
|
#5
|
Knows Where the Search Button Is
Join Date: Nov 2005
Location: Central Florida
Model: 8130
Carrier: Verizon
Posts: 36
|
Hi BBTechGuy,
Thanks for the response. Actually, we are seeing warnings about blocked threads. The specific warnings have one of the following descriptions:
At least one worker thread seems to be blocked (3) < --- The # in the ( )’s changes
Some worker threads have been blocked for 3 health checks < --- The # changes
I used to think that these warnings were random, but going back over the Event Viewer it does look like we are receiving these around the time we are seeing the delays and are not receiving them when we do not see the delays. What is your suggestion?
Hi dev,
Thanks for your response as well. The latency between the BES and the remote Exchange server is ~120 ms. Yes, I know that is greater than MS recommended MAPI latency limit of <35 ms. But, we have been running in this configuration for over 2 years and have not had this problem before (that I know of). However, we do have more users now. I am not ruling this out…
Looking over the pending count, there does not seem to be a big difference between users on the remote Exchange server and the local Exchange server. I am on the local Exchange server and when the problem occurs, I am definitely seeing it.
The /3GB switch is set on all the Exchange servers. It is not set on the BES (BES has 2 GB of RAM). All the servers are connected to 100 Mb switched Ethernet with a gig backbone. We have a 3 Mb frame connection to our remote site.
Hi jrbes,
Thank you for the response. We have not performed any packet captures between our BES and Exchange Servers. I am not exactly sure what we would be looking for. We have verified that connectivity is consistent between the servers and that MAPI connections can be established. Any thoughts as to what we would look for? BES 4.0 is in our future, but it is unlikely that management would approve just upgrading to it without full planning, testing, and approved downtime.
All good thoughts. Keep them coming!
Dagon
|
Offline
|
|
11-10-2005, 09:53 AM
|
#6
|
BlackBerry Extraordinaire
Join Date: Sep 2005
Location: Congested Islet of "Foreign Talents" (> 45% of workforce) - Singapore.
Model: Z10
OS: 10.0.0
PIN: NUKE(PAP)
Carrier: Singtel
Posts: 1,504
|
.. hey guys, check this out...
KB-01685 "What Is - Factors that contribute to latency"
Source: http://www.blackberry.com/knowledgec...28801&vernum=0
May be helpful...
-= noname =-
|
Offline
|
|
11-10-2005, 11:50 AM
|
#7
|
Thumbs Must Hurt
Join Date: Jun 2005
Location: Here
Model: 8700
Carrier: Cingular
Posts: 161
|
Look for MAPI sessions being established but also MAPI connections not being released. Each health check that a hung thread stays blocked for is 10 minutes (3 health checks = 30 minutes of delay for messages that are dependent upon those threads).
Are you seeing a lot of rescans in your log files?
|
Offline
|
|
11-10-2005, 02:40 PM
|
#8
|
Knows Where the Search Button Is
Join Date: Nov 2005
Location: Central Florida
Model: 8130
Carrier: Verizon
Posts: 36
|
Thanks jrbes,
We are seeing some rescans. I am not sure what is normal and what is "a lot".
What would cause hung threads?
Dagon
|
Offline
|
|
11-10-2005, 02:57 PM
|
#9
|
Thumbs Must Hurt
Join Date: Jun 2005
Location: Here
Model: 8700
Carrier: Cingular
Posts: 161
|
Hung threads are caused by connections (MAPI) to the Exchange server not terminating in a timely fashion - 4.0 fixed this for us as it will reset inactive connections.
|
Offline
|
|
11-11-2005, 09:43 AM
|
#10
|
Knows Where the Search Button Is
Join Date: Nov 2005
Location: Central Florida
Model: 8130
Carrier: Verizon
Posts: 36
|
A week now and the problem is still going on.
So does anyone have any other suggestions as to how to prevent blocked/hung threads? I have gone through a day's debug logs and it looks like, for that given day, all the logged blocked threads were from approximately 10 users. However, I was not one of those users and I was still seeing delays. My understanding of blocked threads was that only the user whose thread was blocked would experience the delay. In reality, it seems that if there are any blocked threads (or maybe X number), then it impacts everyone on the BES.
We are not seeing anything on the Exchange server that gives us any indication there is a problem communicating with the BES server.
One thing that concerns me, is we didn't have this problem before. The only change I can think of would be users added or removed from the BES.
Also, does anyone have any suggestions on a good text editor / parser / delimiter to use to view the debug logs? Ideally I would like to import them into Excel, but the columns do not come in cleanly.
Dagon
Last edited by Wakefield103; 11-11-2005 at 10:57 AM..
|
Offline
|
|
11-11-2005, 10:21 AM
|
#11
|
Thumbs Must Hurt
Join Date: Jun 2005
Location: Here
Model: 8700
Carrier: Cingular
Posts: 161
|
A blocked thread can cause delays on anyone's device, not just the person that is attached to the hung thread (this is what RIM told me). Preventing them is based primarily on your network and the speed between your BES and Mail server (especially if you are an Exchange shop - MAPI is a dog with this until version 4.0).
Our hung thread issues went away after upgrading to 4.0 SP1 and applicable HF's. Once on 4.0 you can use the Resource kit to view the logs a little easier but I haven't found anything that makes reading those darn things extremely easy on the eyes.
Hope this helps.
__________________
Policies get in the way of fun.
|
Offline
|
|
11-13-2005, 10:43 AM
|
#12
|
Thumbs Must Hurt
Join Date: Jun 2005
Model: 7100i
Carrier: Rogers
Posts: 73
|
To prevent or stop hung threads you first have to understand what event they are hanging on. Grep your debug logs for "No Response" (without the quotes) and it will show all the hung thread events. The key things that it will show, that are useful, are:
- The action it is trying to perform
-> RESCAN_SOMETHING
-> NEW_MAILBOX_PACKET
-> NEW_MESSAGE Etc.
Are these the same or different?
- The user for whom the thread is hung
- The WAITCOUNT - 1 = 10 minutes, and after 5 WAITCOUNTs it will print a stack trace.
- The Exchange server the user is on
-> Is this the same, or are there any similarities (site etc.)?
If you are still stuck copy a couple of lines from your grep and someone might be able to give you some additional insight.
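The grep described above can be sketched in Python so that the hung-thread events are tallied by action. This is a minimal sketch under stated assumptions: the exact layout of the BES debug log lines is not shown in this thread, so the code only looks for the "No Response" text and for the action names mentioned here (`RESCAN_...`, `NEW_MAILBOX_PACKET`, `NEW_MESSAGE`, `RELOAD_FOLDERS`). The function names are hypothetical, and real logs may contain other actions.

```python
import re
from collections import Counter

# Action names taken from this thread only; extend the pattern as needed.
ACTION_RE = re.compile(
    r"\b(RESCAN_\w+|NEW_MAILBOX_PACKET|NEW_MESSAGE|RELOAD_FOLDERS)\b"
)


def summarize_hung_threads(lines):
    """Count action names on log lines that contain 'No Response'.

    Lines with 'No Response' but no recognized action are counted
    under 'UNKNOWN' so they are not silently dropped.
    """
    counts = Counter()
    for line in lines:
        if "No Response" not in line:
            continue
        match = ACTION_RE.search(line)
        counts[match.group(1) if match else "UNKNOWN"] += 1
    return counts


def waitcount_to_minutes(waitcount, minutes_per_check=10):
    """Convert a WAITCOUNT to minutes, using 1 WAITCOUNT = 10 minutes."""
    return waitcount * minutes_per_check
```

Passing an open log file to `summarize_hung_threads` and printing `.most_common()` on the result shows at a glance whether the hangs cluster on one action. The same approach could be extended to tally by user or Exchange server once you know where those fields sit in your log lines.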
|
Offline
|
|
11-14-2005, 07:19 AM
|
#13
|
Talking BlackBerry Encyclopedia
Join Date: May 2005
Model: 7100
Carrier: T-Mobile
Posts: 299
|
Quote:
Originally Posted by jrbes
Look for MAPI sessions being established but also MAPI connections not being released. Each hung thread that you see is 10 minutes (3 health checks = 30 minutes of delay for messages that are dependent upon those threads).
Are you seeing a lot of rescans in your log files?
|
The BES will perform roughly the same number of rescans in both a healthy and non-healthy environment. The question you should be asking is whether a lot of Messages are being picked up through Rescan which is where you would see Delay come in.
|
Offline
|
|
11-14-2005, 09:23 AM
|
#14
|
Talking BlackBerry Encyclopedia
Join Date: Jan 2005
Location: LE
Model: Pearl
Carrier: T-Mobile
Posts: 202
|
Have you looked at this KB : Installing security update MS05-019 or Windows Server 2003 Service Pack 1 may cause network connectivity between clients and servers to fail
http://support.microsoft.com/default...-us;898060#kb2
|
Offline
|
|
12-07-2005, 10:50 AM
|
#15
|
New Member
Join Date: Dec 2005
Model: 7290
Posts: 1
|
We have exactly the same problem. Blackberry version 3.6 with about 160 users.
Outside of working hours all is OK, but between 09:00h and 12:30h and again between 14:00h and 18:30h the delays in synchronising email to the BlackBerry handhelds increase, even up to a couple of hours. At the same time we are seeing "hung threads" in the error logs, usually associated with RELOAD_FOLDERS events. In time the reload folders does complete, but it can sometimes take over an hour - the same RELOAD_FOLDERS for the same user completes in a matter of seconds when it runs outside of working hours.
For us this has been going on for over a month, despite all efforts to find a solution. Wakefield103, perhaps you could contact me by email so we can discuss this offline.
|
Offline
|
|
12-07-2005, 07:31 PM
|
#16
|
Talking BlackBerry Encyclopedia
Join Date: May 2005
Model: 7100
Carrier: T-Mobile
Posts: 299
|
Have either of you looked up Microsoft's document for Exchange 2000 Troubleshooting? This will contain PerfMon Counters you'd want to watch and how to interpret the data. I don't know if there's a 2003 version but I'm sure the values would be similar anyways.
|
Offline
|
|
12-08-2005, 08:37 PM
|
#17
|
New Member
Join Date: Nov 2005
Location: Sydney
Model: 8100
Carrier: Telstra
Posts: 2
|
We're seeing an identical problem, although we have a slightly different config:
Standalone BES 4.0 (Service Pack 3)
102 Users
3 Exchange Servers - one local (2003 SP2, ping <1ms), two remote (1x 2003 Sp2, ping 180ms, 1x 2000 sp3, ping 400ms).
Like you, everything is fine out of hours and on the weekends, but during the day, particularly late morning and mid-afternoon, delays can be anywhere from 20 mins to 1 hour. We've had a similar config for about 12 months, only started noticing the delays in the last 4 weeks or so.
We did a test yesterday where we removed all the users on the remote Exchange servers but this made little difference, if any.
Our local Exchange server is *very* busy (~2000 users, 600GB of stores) and RIM have suggested that the MAPI connections are timing out, which is causing the delays. So today I tested that theory by moving my mailbox to a new Exchange server with no other users on it - and now I'm not seeing any delays. So I think our problem is that our Exchange server really is too busy to respond quickly enough for BES to handle. Looks like it's time for someone to write a cheque for a new Exchange server...
|
Offline
|
|
08-15-2006, 01:04 PM
|
#18
|
Thumbs Must Hurt
Join Date: Mar 2006
Model: 8703e
Carrier: Sprint
Posts: 156
|
Can anyone give a straight answer as to how to clear blocked threads..
The BES has gone from 1 to 6 health checks and stops for mail server 3.
__________________
Your BlackBerry Did What!!
Outlook 2010
BES 5
|
Offline
|
|
08-15-2006, 01:42 PM
|
#19
|
Thumbs Must Hurt
Join Date: May 2005
Location: Calgary, AB
Model: 9800
OS: 6.0.0.246
Carrier: Telus
Posts: 85
|
I found issues with Microsoft security patch MS05-051, and RIM apparently has confirmed them: the patch can block the UDP packets that the BES server is waiting for. What I hear you guys describing is that the emails are actually getting delivered to the devices when the BES does a rescan, which it typically does every 20-30 minutes.
Go into your agent log files (servername_MAGT_1) and search for "Queuing new mail through rescan" (without the quotes). If you are seeing hundreds if not thousands of those, chances are the UDP packets from the Exchange server aren't reaching the BES application; they can be blocked by anything, including a firewall, port filtering, or the MS05-051 patch.
With that MS05-051 patch I also experienced smaller groups of users, usually between 22 and 32, getting a hung thread. I removed the patch and didn't have any issues for two weeks... Then I upgraded to 4.1 SP1... But before then I was seeing the hung thread 2 or 3 times a day.
Check your logs, but there is a good chance, if you have no firewalls between your BES and Exchange server, that MS05-051 is the source of your issue.
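If you would rather count the rescan pickups than eyeball them, something along these lines may help. It is only a sketch: the log directory, the `*MAGT*` file-name pattern, and the function name are assumptions based on the `servername_MAGT_1` naming mentioned above, and the code does nothing more than count lines containing the search phrase.

```python
from pathlib import Path

RESCAN_PHRASE = "Queuing new mail through rescan"


def count_rescan_pickups(log_dir, pattern="*MAGT*"):
    """Return {file name: number of lines containing RESCAN_PHRASE}.

    Assumption: agent logs live in log_dir and have 'MAGT' in their
    file names; adjust pattern to match your own installation.
    """
    results = {}
    for path in sorted(Path(log_dir).glob(pattern)):
        if not path.is_file():
            continue
        # errors="replace" keeps odd bytes in the log from aborting the scan.
        with path.open(errors="replace") as fh:
            results[path.name] = sum(1 for line in fh if RESCAN_PHRASE in line)
    return results
```

Comparing the per-file counts for a day with delays against a quiet day should show whether most mail is being picked up through rescan rather than through notifications.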
|
Offline
|
|
08-15-2006, 02:11 PM
|
#20
|
CrackBerry Addict
Join Date: Jun 2006
Model: 7100
Carrier: Rogers
Posts: 615
|
Quote:
Originally Posted by technickel
Can anyone give a straight answer as to how to clear blocked threads..
The BES has gone from 1 to 6 health checks and stops for mail server 3.
|
Reboot
The permanent fix is to decrease ping times. 30 msec or less is required, but you may be fine with under 90 msec.
|
Offline
|
|
|
|