Quote:
Originally Posted by bossj52
I got it installed alright, however, when I select the suggested word after it checks the spelling, it changes the word then deletes all or part of the previous word. Anyone else experience this?
Do you have any non-alphanumeric characters in your email, such as but not limited to the following?
» €
I've found that the problem occurs when using some of those characters. The text sent from the Blackberry to BBCorrectorServer.cgi seems to be in UTF-8 format. BBCorrectorServer.cgi correctly handles this but either the Blackberry or, more specifically, BBCorrector running on the Blackberry does not.
The way the spell check works is the Blackberry sends its email to the server. The server then spell-checks it, taking note of the absolute position from the beginning of the file of any misspelled words by counting the number of bytes that each character takes up. It then sends that data back to the Blackberry, which marks the position of the misspelled words. The Blackberry, however, uses the number of characters to count position. This is where the problem lies.
As an example, this is the text you sent to get spell checked:
Ths is jst an example for the spell checker.
The server would note that you misspelled "Ths" at position 1 and "jst" at position 8 and send that data back to the Blackberry. With this example, you would have no problems because each of the characters in the above example only takes up one byte.
As an example of a sentence that would cause problems, this is the text you sent to get spell checked:
Ths is jst an example for the spell checker.
The server would note that you misspelled "Ths" at position 1 and "jst" at position 14 and send that data back to the Blackberry. The Blackberry, though, thinks that "jst" is at position 10. In UTF-8, a character such as € is 3 bytes long even though it is only one character, so this is where the position indexes get thrown off. There are several other characters that behave similarly.
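To see the mismatch for yourself, here is a minimal Perl sketch. The sample string is my own assumption: it puts curly quotes (3 bytes each in UTF-8) around "is" so that the offsets come out the same as the numbers quoted above. It is not necessarily the text from the original example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample: "is" wrapped in curly quotes, written as raw UTF-8 bytes
# (E2 80 9C and E2 80 9D), the way the text would arrive at the CGI script.
my $bytes = "Ths \xE2\x80\x9Cis\xE2\x80\x9D jst an example for the spell checker.";

# Decoded copy: each curly quote now counts as a single character.
my $chars = $bytes;
utf8::decode($chars);

# index() is 0-based, so add 1 to get the 1-based positions used in the post.
my $byte_pos = index($bytes, "jst") + 1;
my $char_pos = index($chars, "jst") + 1;

print "byte position: $byte_pos\n";   # 14
print "char position: $char_pos\n";   # 10
```

The two curly quotes add 2 bytes each beyond what a character count sees, which accounts for the difference of 4.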
If you add two lines of code to BBCorrectorServer.cgi, you can fix this issue. I've also fixed it in my PHP version, BBCorrectorServer.php. I will post the PHP code in another post below this one, in case anyone is trying to use that.
In the original BBCorrectorServer.cgi file, after line 16, which is "use File::Temp qw/ tempfile tempdir /;"
add this line:
use utf8;
In the original file, after line 31, which is "my $text2Check = "$FORM{check}";"
add this line, which decodes the UTF-8 bytes into characters so that positions are counted per character:
utf8::decode($text2Check);
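Here is a rough sketch of why that one call changes the positions. The script advances its absolute position by length($line) + 1 for every line, so I've copied that logic into a made-up helper, line_starts, and run it on a two-line sample of my own (neither is part of the original script).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: start offset of each line, accumulated the same way the
# script advances $posAbsolute (length of the line plus one for the LF).
sub line_starts {
    my ($text) = @_;
    my @starts;
    my $pos = 0;
    for my $line (split /\n/, $text) {
        push @starts, $pos;
        $pos += length($line) + 1;
    }
    return @starts;
}

# Made-up sample: the first line ends with a euro sign as raw UTF-8 bytes.
my $raw = "Price: 5 \xE2\x82\xAC\nThs is jst";
my @byte_starts = line_starts($raw);

# After utf8::decode, length() counts characters instead of bytes.
my $decoded = $raw;
utf8::decode($decoded);
my @char_starts = line_starts($decoded);

print "line 2 starts at byte offset $byte_starts[1]\n";       # 13
print "line 2 starts at character offset $char_starts[1]\n";  # 11
```

As far as I know, utf8::decode is a core function that is available even without the "use utf8;" line, which mainly tells Perl that the script's own source is UTF-8, so treat that pragma as harmless rather than required.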
And here is the entire file, just in case:
Code:
#!/usr/bin/perl
#
# BBCorrector Server
#
# This script accepts a block of text from the BlackBerry software
# BBCorrector, runs it through Aspell, and returns an XML packet
# indicating spelling errors.
#
# NOTES:
#
# - It is recommended you protect this script with some form of HTTP
# authentication. BBCorrector is setup to handle Basic Authentication.
#
use CGI qw(:standard Vars);
use File::Temp qw/ tempfile tempdir /;
use utf8;
# The Aspell executable
my $cmdAspellExe = "aspell";
# Language is US English
my $lang = "en_US";
# Options for Aspell. Puts Aspell into Ispell compatibility mode
# so that its output is written to stdout
my $cmdAspellOptions = "-a --lang=$lang";
my %FORM = Vars();
# Get the block of text from the HTTP parameter "check"
my $text2Check = "$FORM{check}";
# Decode UTF-8 bytes into characters so that length() counts characters
utf8::decode($text2Check);
# Convert line endings to a common format. In this case whatever
# line ending combination we get (CRLF, LF, etc), we convert to a standard
# format of LF
$text2Check =~ s/\x0D\x0A|\r/\n/g;
# Create a temporary file to store our block of text
my $dirTemp = tempdir( CLEANUP => 1 );
my( $tempHandle, $tempFilename ) = tempfile( DIR => $dirTemp );
# Split block of text into lines and write to temp file
my @lines = split( /\n/, $text2Check );
for my $line ( @lines ) {
    # Force Aspell to check whole line via ^ control character
    print $tempHandle "^$line\n";
}
close $tempHandle;
# XML packet has format such as:
# <spell-results>
# <error>
# <word>maan</word>
# <position>12</position>
# <suggest>Man</suggest>
# <suggest>man</suggest>
# <suggest>moan</suggest>
# </error>
# <error>
# <word>helllo</word>
# <position>33</position>
# <suggest>hello</suggest>
# </error>
# <error>
# <word>chris</word>
# <position>41</position>
# <suggest>Chris</suggest>
# <suggest>Charis</suggest>
# </error>
# </spell-results>
my $xmlPacket = "<spell-results>";
# Do this here so that when we are debugging we can display it in return output
print header;
# Keeps track of current line number
my $lineNum = 0;
# Keeps track of the absolute position in the block of text
my $posAbsolute = 0;
# Execute Aspell
my $cmd = "$cmdAspellExe $cmdAspellOptions < $tempFilename 2>&1";
# TODO: $status most likely only tracks whether the fork failed or not, not
# whether the actual command we are running (ie: aspell) failed
my $status = open ASPELL, "$cmd |";
if ($status > 0) {
    # Parse Aspell output
    for my $cmdReturn (<ASPELL>) {
        chomp($cmdReturn);
        #print "$cmdReturn<br>\n";
        if( $cmdReturn =~ /^\*/ ) {
            # Line begins with *. Do nothing.
        } elsif( $cmdReturn =~ /^(&|#)/ ) {
            # Line begins with & or #.
            # Start error element
            $xmlPacket .= "<error>";
            # Split return line up for easier access
            my @tokens = split(" ", $cmdReturn, 5);
            # Add word element which contains original misspelled word
            $xmlPacket .= "<word>$tokens[1]</word>";
            # Need to work out absolute position in file, not just position in current line
            my $offsetIdx = 3;
            if ($cmdReturn =~ /^\#/) {
                $offsetIdx--;
            }
            my $pos = $posAbsolute + ($tokens[$offsetIdx] - 1);
            $xmlPacket .= "<position>".$pos."</position>";
            # Add suggestions
            my @suggestions = ();
            if ($tokens[4]) {
                @suggestions = split(", ", $tokens[4]);
                for my $suggestion (@suggestions) {
                    $xmlPacket .= "<suggest>$suggestion</suggest>";
                }
            }
            # End error element
            $xmlPacket .= "</error>";
        } elsif( $cmdReturn =~ /^$/ ) {
            # We have a blank line which indicates a line of text has been processed
            my $line = $lines[$lineNum];
            $posAbsolute += (length($line) + 1);
            $lineNum++;
        }
    }
    close ASPELL;
} else {
    $xmlPacket .= "<exception>BBCorrector Server has encountered an error ($!)</exception>";
}
# Delete the temp file
unlink $tempFilename;
# End results XML packet
$xmlPacket .= "</spell-results>";
#print "check = $text2Check<p>\n";
# Return XML packet back to client
print "$xmlPacket\n";
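For anyone following the parsing loop in the file above, here is a small standalone sketch of how one Aspell result line breaks down. In Ispell-compatible mode a miss with suggestions looks like "& word count offset: suggestion, suggestion" and a miss without suggestions looks like "# word offset". The helper name parse_miss and the sample line (including its suggestions) are made up for illustration; the script itself uses split rather than a regex.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: pull the word, offset, and suggestions out of one
# "&" or "#" result line. Returns nothing for any other kind of line.
sub parse_miss {
    my ($line) = @_;
    if (my ($word, $offset, $rest) = $line =~ /^& (\S+) \d+ (\d+): (.*)$/) {
        return { word => $word, offset => $offset, suggestions => [ split /, /, $rest ] };
    }
    if (my ($word, $offset) = $line =~ /^# (\S+) (\d+)$/) {
        return { word => $word, offset => $offset, suggestions => [] };
    }
    return;
}

# Made-up sample line, not real Aspell output.
my $miss = parse_miss("& jst 3 8: just, jest, jet");
print "$miss->{word} at $miss->{offset}: @{ $miss->{suggestions} }\n";
```

One thing this makes visible: in the script, the offset token on "&" lines still carries the trailing colon ("8:"). Perl treats that as 8 in arithmetic, so it works, but it would trigger a warning if warnings were turned on.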