Rich Text Spell Checker
Spent quite some time this weekend improving the spell checker component and I’ve managed to get rich text editing to work reasonably well (especially in Mozilla). Also fixed quite a few bugs and improved the performance significantly.
Been getting a few requests to explain how this thing works, so if you’re interested in that read on; if not, you might want to play with one of the demos.
Rich text editing is still quite buggy and needs more work, but it looks promising.
Implementation
The spell checker is built around a rich edit component (an iframe with designMode set to ‘on’) with a keyup event handler firing after each key press. If a separator was entered (such as space or comma) the previous word is determined using the caret position. That word is then encapsulated in a span with a specific className (webfx-spellchecker-word) to allow it to be accessed easily.
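The word lookup described above can be sketched roughly like this. It is a minimal sketch that works on a plain string rather than on the DOM selection the real component uses, and the separator set and the function names previousWord and wrapWord are assumptions made for illustration, not taken from the actual script.

```javascript
// Hypothetical separator set; the real component may use a different one.
const SEPARATORS = " \t\n,.;:!?";

// If the character just before the caret is a separator, return the word
// that ends at that separator, together with its start and end offsets.
// Returns null when no separator was just typed or no word precedes it.
function previousWord(text, caret) {
  if (caret < 1 || SEPARATORS.indexOf(text.charAt(caret - 1)) === -1) {
    return null;
  }
  const end = caret - 1; // index of the separator itself
  let start = end;
  while (start > 0 && SEPARATORS.indexOf(text.charAt(start - 1)) === -1) {
    start--;
  }
  if (start === end) {
    return null; // two separators in a row, nothing to check
  }
  return { word: text.slice(start, end), start: start, end: end };
}

// Wrap the located word in the span used to find and style it later.
// This works on plain text only and does no HTML escaping.
function wrapWord(text, range) {
  return text.slice(0, range.start) +
    '<span class="webfx-spellchecker-word">' +
    text.slice(range.start, range.end) +
    "</span>" +
    text.slice(range.end);
}
```

Calling previousWord("helo world ", 11) finds "world" between offsets 5 and 10, and wrapWord puts the span around just those characters.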
The word is then checked against the cached dictionary and if a match was found the style of the span is updated to reflect the status of the word (red wavy underline if it’s misspelled). If the word was not found in the cached dictionary it’s added to the validation queue and a timer is started that will call the _askServer method. The reason a timer is used is to allow multiple words to be checked together in a single request if they’re entered in rapid sequence.
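A minimal sketch of the cache, queue and timer interplay follows. The names createChecker and check, the boolean cache values and the injectable schedule argument are assumptions for illustration. Whether the real timer is restarted on every new word or only started once is also an assumption; this sketch starts it once per batch.

```javascript
// askServer: callback that receives an array of words to validate.
// delayMs:   how long to collect words before asking the server.
// schedule:  optional replacement for setTimeout (handy for testing).
function createChecker(askServer, delayMs, schedule) {
  const startTimer = schedule || setTimeout;
  const cache = {};          // word -> true (correct) or false (misspelled)
  let queue = [];            // words waiting for server validation
  let timerRunning = false;

  // Send everything collected so far as one batch.
  function flush() {
    timerRunning = false;
    const batch = queue;
    queue = [];
    if (batch.length > 0) {
      askServer(batch);
    }
  }

  // Returns "ok" or "misspelled" for cached words, "pending" otherwise.
  function check(word) {
    if (Object.prototype.hasOwnProperty.call(cache, word)) {
      return cache[word] ? "ok" : "misspelled";
    }
    if (queue.indexOf(word) === -1) {
      queue.push(word);
    }
    if (!timerRunning) {
      timerRunning = true;
      startTimer(flush, delayMs);
    }
    return "pending";
  }

  return { cache: cache, check: check };
}
```

Words typed within the delay window end up in the same batch, so a fast typist causes one request rather than one per word.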
Once the timer triggers _askServer, a request is sent to the server using XMLHttp and the server side component, in my case a Perl script, gets executed. The server side script iterates over the supplied words and checks the spelling using Aspell. A JavaScript array is then generated containing the status for each word, and if it was misspelled a list of suggestions.
The client parses the reply using eval and then calls the _updateWords method that iterates over the words that were validated and updates the corresponding span.
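As a rough illustration of the reply handling, here is a sketch that turns a reply into a per-word status table. The reply layout (an array of [word, ok, suggestions] entries) is an assumption, as are the names parseReply and classNameFor and the extra webfx-spellchecker-misspelled class. It also swaps eval for JSON.parse, which only works if the server emits strict JSON; the component itself uses eval as described above.

```javascript
// Assumed reply layout: [["hello", true], ["teh", false, ["the", "ten"]]]
function parseReply(replyText) {
  const entries = JSON.parse(replyText);
  const status = {};
  for (const entry of entries) {
    status[entry[0]] = {
      ok: entry[1] === true,
      suggestions: entry[2] || []
    };
  }
  return status;
}

// Pick the className for a word span based on its status.
// The second class name is hypothetical.
function classNameFor(result) {
  return result.ok
    ? "webfx-spellchecker-word"
    : "webfx-spellchecker-word webfx-spellchecker-misspelled";
}
```

An update step would then walk the word spans and assign classNameFor(status[word]) to each one that appears in the table.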
For performance reasons XML is not used, rather XMLHttp is used to send a regular HTTP POST request that returns plain text (actually text/javascript but the content-type is quite irrelevant here). POST is used rather than GET to avoid caching as the browser cache would otherwise quickly fill up with a lot of rather pointless entries.
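The request itself might look something like the sketch below. The parameter name words, the function names and the URI passed in are assumptions; only the use of a plain POST through XMLHttpRequest comes from the description above.

```javascript
// Encode the queued words as a regular form body, e.g. "words=teh+wrold".
// A "+" decodes to a space on the server, so it separates the words.
function buildRequestBody(words) {
  return "words=" + words.map(function (word) {
    return encodeURIComponent(word);
  }).join("+");
}

// Send the words with a POST request and hand the plain text reply
// to onReply. Browser only; uri is whatever the server script lives at.
function postWords(uri, words, onReply) {
  const request = new XMLHttpRequest();
  request.open("POST", uri, true);
  request.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
  request.onreadystatechange = function () {
    if (request.readyState === 4 && request.status === 200) {
      onReply(request.responseText);
    }
  };
  request.send(buildRequestBody(words));
}
```

Because the words travel in the request body rather than in the URL, nothing about the individual lookups ends up in the browser cache.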
The fact that each individual word is encapsulated in a span causes a few problems when words are merged or split but that’s quite easy to handle. Another more difficult problem that this causes is when rich text editing is enabled and the getHTML method is called those spans should not be included, but the style information assigned to them should.
A bit of regex magic has nearly solved that problem: the spans are stripped but the only style information that is maintained in this implementation is bold/italic and underline. So if any other style information is applied to the same span that’s used to highlight misspelled words it’s lost. This obviously only applies if rich text editing is enabled.
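To give an idea of what that regex approach involves, here is a sketch under a few assumptions: the spans carry class="webfx-spellchecker-word", any formatting sits in an inline style attribute on the same tag, and word spans are never nested. The function name stripWordSpans and the exact patterns are illustrative, not the ones used in the component.

```javascript
// Matches a word span, capturing its attributes and its contents.
const WORD_SPAN = /<span\b([^>]*\bclass="webfx-spellchecker-word"[^>]*)>([\s\S]*?)<\/span>/gi;

// Remove the word spans but keep bold, italic and underline by
// converting them to plain tags. Any other style on the span is lost.
function stripWordSpans(html) {
  return html.replace(WORD_SPAN, function (match, attrs, inner) {
    let out = inner;
    if (/font-weight:\s*bold/i.test(attrs)) {
      out = "<b>" + out + "</b>";
    }
    if (/font-style:\s*italic/i.test(attrs)) {
      out = "<i>" + out + "</i>";
    }
    if (/text-decoration:\s*underline/i.test(attrs)) {
      out = "<u>" + out + "</u>";
    }
    return out;
  });
}
```

A colour or font size set on the same span would be dropped by this sketch, which is the limitation described above.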
License
Being such a nice guy I’ve decided to release this component under the MIT License. As one of the least restrictive licenses it allows for virtually any use, including commercial, as long as credit is given where credit is due. Obviously this also means that open source projects can use it even if they’re under a different, more restrictive, license, such as the GPL.
2005-07-18, Update:
Fixed two bugs in the Mozilla implementation: occasionally if text was entered directly in front of a word it was lost, and words were not correctly merged when the whitespace between them was deleted.
Also updated the ignore method to ignore all occurrences of the affected word.
June 13th, 2005 at 12:10
Nice work. Thought you may want to know that once the spell check is done you are no longer able to type in text area. This is on Firefox in Linux.
June 13th, 2005 at 13:16
Yeah, I’ve noticed that behavior occasionally for some versions of firefox… haven’t been able to track it down yet though.
June 13th, 2005 at 19:25
Some things I’ve noticed: it causes problems for things such as the delete button and selection, and I can’t go back to the previous page easily. Sometimes the red underline doesn’t go under the whole word, and I have problems moving my arrow around.
The text also is a bit jumpy sometimes. This is very cool but even simple text version needs a lot of work to be robust.
I’m using FireFox 1.0.4
June 14th, 2005 at 09:13
Seems I managed to break the Mozilla implementation when I fixed the IE one… Expect a new update shortly.
June 14th, 2005 at 10:19
The mozilla focus problem has now been resolved and the script updated.
Please let me know if you find any other problems.
June 14th, 2005 at 11:14
Guess I should have mentioned my version, duh - sorry.
5.0 (X11; U; Linux i686; en-US; rv:1.7.8) Gecko/20050513 Firefox/1.0 (Ubuntu package 1.0.4~5.04ubp1+1.0.2)
June 14th, 2005 at 12:45
Works great! Have to get our Linux guy to install Aspell so i can test it out myself.
I haven’t looked at the script yet but I assume that in good old WebFX tradition that it’s no worries to hook up the script against another WYSIWYG editor?
June 14th, 2005 at 12:50
Espen Antonsen: Shouldn’t be too hard, but you probably want to use the spell checkers getHTML method, as getting the html directly from the RichText component would return the markup used to highlight misspelled words.
June 14th, 2005 at 21:22
Emil, Your spell-check looks great and I’m anxious to see the tarball- however I’ve been unable to open it. Would you consider also putting up a zipped folder of the same? Best regards- J
June 14th, 2005 at 22:03
Of course, try http://me.eae.net/stuff/spellchecker/spellchecker.zip
July 7th, 2005 at 23:47
do you also have a servlet version (the back end processing for dictionary look up and returning the list of words)?
July 7th, 2005 at 23:48
btw, that was really good work (sorry for not mentioning).
July 10th, 2005 at 13:06
Wonder how you make those suggestions..
I see it isn’t ‘%word’ ..
What do take into consideration?
July 11th, 2005 at 18:03
Great work with the Spell checker. I’ve found a problem with the javascript in that sometimes deletes words without you even realising until it’s too late. I can tell you how to recreate the bug though.
In the text area if you type ‘Mr B’ then hit the HOME key then start typing ‘This is something’ as slow or as fast as you like, you’ll notice it deletes the word ‘Mr’. I’m looking at the code myself to see if I can fix it but my javascript and dom isn’t what it used to be, or should be
Hope you had / are having a good holiday (*basic maths escapes me today*)
July 16th, 2005 at 11:10
Thanks for all your feedback and bug reports, I really appreciate it!
Said: I’m using the aspell library.
Paul W: Thanks, I’ll look into that.
Dasika: I’m afraid I don’t. However it shouldn’t be too hard to port it.
July 16th, 2005 at 21:34
Hi Emil, That thing is mad!:)
I have one problem in Firefox though. When I test the demo on your site in Firefox it works fine. When I download the archive on my hard drive, unpack it in a folder and try the richdemo.html or demo.html in Firefox it never underlines the misspelled words; it adds the “webfx-spellchecker-word” span around the word though (same thing works fine in IE however). Has anyone experienced the same problem?…
And I noticed one bug in IE : when you ignore a word the underlining under other instances of the same word through the text aren’t removed, even though when you click on them the popup
Oh yeah… my platform is a Windows XP Pro; Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.9) Gecko/20050711 Firefox/1.0.5
July 16th, 2005 at 21:36
…even though when you click on them the popup with the suggestions isn’t shown.
Sorry ’bout that
July 18th, 2005 at 11:24
No problem. I’ve located where it messes up, it’s in the _moz_parseActiveNode function where text is entered inside an existing word: sel.focusNode.parentNode.className == 'webfx-spellchecker-word'.
I’m trying to fix it myself but I don’t know enough about deleting / replacing nodes. When u enter a space it’s not recognised so it still thinks it’s part of the word. Then when u hit space again it splits it at the first space and deletes everything after it. Hope this helps.
July 18th, 2005 at 21:03
Update:
Fixed two bugs in the Mozilla implementation: occasionally if text was entered directly in front of a word it was lost, and words were not correctly merged when the whitespace between them was deleted.
Also updated the ignore method to ignore all occurrences of the affected word.
Thanks Paul W and Ivo.
Ivo > Firefox does not allow XMLHttp to access the remote server, so you really need to set up your own copy of the server side component and update the uri to point to that, and likewise the uri to the underline image.
July 19th, 2005 at 00:11
Hi,
Great work…You inspired me to create this object that I have been thinking of for a long time now but never found time for. The object, I am calling it “woco” (Word Complete), is as it sounds, word completer. Here it is: http://demo.challenger.se://demo.challenger.se.  By the way, this editor behaves a little strange in IE6, WinXP. I can’t start a new paragraph… Also it wasn’t easy to write the url above. The editor, reentered or duplicated it for some reason. Best regards…/hbi
July 19th, 2005 at 00:12
A new try: www.challenger.se/woco
July 19th, 2005 at 09:55
Hakan > Nice! Been meaning to implement word completion in the spellchecker for some time but I want to get rid of all the weird bugs first. The rich edit functionality is quite cumbersome at times.
July 19th, 2005 at 12:22
Hi Emil,
Nice work on the update! It doesn’t delete the word now BUT, I’m sorry to say I’ve found another bug which I don’t think is really the code’s fault. If you do the same as the first bug :: “In the text area if you Type ‘Mr B’ then hit the HOME key then start typing ‘This is something’” :: type it REALLY fast and you’ll notice the caret position is one step behind where it should be. I believe this is because the next character is already being processed when the caret is being reset.
Maybe a caret cancel boolean is needed?  Or a caret ‘cmd’ ID.
Ahh the benefits of 60+wpm touch typing :/
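One way to read the caret ‘cmd’ ID idea is as a sequence counter: every key event takes a new ID, and a delayed caret reset only goes ahead if no newer key event has arrived in the meantime. This is a sketch of that idea only; the names createCaretGuard, begin and isCurrent are hypothetical and nothing like this is confirmed to exist in the component.

```javascript
// Hands out increasing IDs and remembers the most recent one.
function createCaretGuard() {
  let latest = 0;
  return {
    // Call at the start of each key event; returns that event's ID.
    begin: function () {
      latest += 1;
      return latest;
    },
    // Call before resetting the caret; false means a newer key
    // event has started and the reset should be skipped.
    isCurrent: function (id) {
      return id === latest;
    }
  };
}
```

A keyup handler would call begin(), do its word processing, and then restore the caret only when isCurrent(id) still holds.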
July 19th, 2005 at 23:23
Nice work!
Noticed two bugs using IE6 xp. First the red line does not get removed when I misspell a word, continue typing for at least one more word then navigate back using the arrow key, then correct manually (not using the suggestions).
Second, I cant add a return after the end of a paragraph.
Again, this is great. I’ve been looking out for something like this for quite some time. I got it to work with jazzy so I could use our tomcat/jsp server.
Oh, replacing the hard coded server location in the .js with location.hostname would make it easier to port to different systems.
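A sketch of that suggestion: build the server URI from the page’s own location instead of a hard coded host. The function name serverUri and the example path are assumptions, and the sketch uses host rather than hostname so that a non-default port is kept.

```javascript
// loc is a Location-like object, e.g. window.location in a browser.
// path is where the server side script lives on that same host.
function serverUri(loc, path) {
  return loc.protocol + "//" + loc.host + path;
}
```

In a page this would be called as serverUri(window.location, "/cgi-bin/spell.cgi"), where the path is only an example. Keeping the script on the same host as the page also avoids the cross-domain restriction on XMLHttp mentioned earlier in the thread.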
July 19th, 2005 at 23:25
Sorry about the formatting. I copied the text from an IE browser to a Firefox browser so I could get in the newlines.
September 16th, 2005 at 16:56
great work.
Have you tried it on Safari? I’m having trouble getting it to work. The display never changes and the javascript options (on Safari) are weak.
Thx
September 20th, 2005 at 11:49
Hi Emil,
Nice work, this one is great.
But I have found one problem. When we have multiple text areas in the page then the rescan function only checks the first text area of the form. The rescan(); function is not generalised for the given text area.
September 29th, 2005 at 09:38
Hi!
Do you think that we can change the component you’re using to use the NetSpell component under the .net environment?
This is a really great work
Regards
Philippe
October 5th, 2005 at 13:59
Hi Emil
Very Nice work. I have to check spelling in a text box. I want it to be the same way as you showed the demo here.Can I have an idea to do that by my own.
Thanks!
Selvan
October 28th, 2005 at 20:45
Hey man this software is pretty awesome. I look forward to watching its progress. Those big-name wysiwyg editors (FCK, tinymce, etc.) could learn a thing or two here.
November 1st, 2005 at 18:36
I need to implement client side spell checking in a span in ASP page.I don’t want client to install or require any extra component.Please help me in this regard.
you can reply at zeemalik78@yahoo.com
thanks and regards,
zeeshan
November 13th, 2005 at 04:06
This script is great in demo. I’m having some issues getting it to work on one of my pages. Can you drop me a line so i can ask you some technical questions?
Error:
Line 571
Char 3
Permission Denied
Code 0
URL: http://i2driven.net/i2driven/spellcheck/demo.htm
Thanx Tom
November 20th, 2005 at 16:23
Very nice script, however I dont use CGI/Perl, but still looks very good. But woulda been better if it wasn’t an exe so i could learn how it worked ;P
November 20th, 2005 at 20:42
exe > The source code for the C++ version is available, see http://me.eae.net/stuff/spellchecker/cpp/
January 27th, 2006 at 19:17
I have it all installed and it runs fine except when I try to run the cgi script in the browser. I get “%1 is not a valid Win32 application.” instead of “data = [];”. What do I have wrong? I’m trying to run this on a W2k server with IIS5 with ActivePerl and Aspell also installed.
I appreciate all that you have done and think this is a great app!
February 1st, 2006 at 20:50
Ok… I managed to get it to work, but only if I used the PHP script for the dictionary…
But it works none-the-less.
February 7th, 2006 at 00:39
Ran across a bug. If you have a word that changes formatting midway, it treats the word like two different words.
Really like the plaintext version. Hope to see this rich text version up and running at some point so I can create a plug-in for FCKeditor.
March 15th, 2006 at 04:59
Hi Emil,
I can do nothing but reiterate what everyone else has said about how good this spell checker is - it’s fantastic:)
I have to build a spellchecker to integrate with the FCKeditor, and I am trying to work out the best way of going about developing this. I’m not sure whether to go for the Richtext one - which of course has more support for Richtext:) or for the plaintext one, and move the rich text functionality across into it, as it seems more developed. What do you think would be best/easiest?
April 27th, 2006 at 12:23
Emil,
This is an amazing example, but I was just worried as in my case I need a script which would run on the client side as none of our servers here have MS Office installed.
Can you suggest me an alternative approach to the CGI and check for the client side .dic file.
Thanks in advance
April 28th, 2006 at 02:15
rt: MS Office has nothing to do with it, all that is required on the server side is aspell, or any other spell checker library with an exposed API. The C++ example works very well on windows.
April 28th, 2006 at 10:37
Emil, thanks for the quick response. We have domino installed and I guess it should be helpful to accomplish this task? Please let me know, and as I do not have experience with CGI based applications I need to know where the modifications need to go for the API changes to be reflected.
Thanks in advance
May 9th, 2006 at 18:43
Hi,
I wanted to confirm if I can integrate your spell checked component within my richtext editor in ASP pages?
Thank you.
May 30th, 2006 at 22:54
Emil,
Very useful script! I’m working on evaluating it for our application, however I came across an issue that has me stumped. As best as I can see the overlay trick you use requires that the textarea have position: absolute; applied to it. That’s fine, however I’m seeing that this breaks all my surrounding divs (I’m wrapping the textarea in a few divs for styling and other reasons) such that all the divs collapse over the top of the textarea.
Have you ever come across this, and if so, do you have any suggestions on how to overcome this issue?
Thanks in advance,
Bruce
August 4th, 2006 at 12:53
Hi Emil,
I found your spell checker a really great one. But we are using a framework called ES-framework built on IBM WebSphere, which is more of a tag based component framework. Right now I have not implemented this on it. So I just want to know, can you use this idea in ES-framework such that it doesn’t create any problems?
Thanks in advance,
Bhargava
August 19th, 2006 at 22:12
I tried to use your code locally but this error:
Error: uncaught exception: Permission denied to call method XMLHttpRequest.open
any suggestions?
October 15th, 2006 at 21:02
Great Work. I was looking for this long time ago.
Thank
March 14th, 2007 at 17:38
Has anybody made any progress on integrating Emil’s code with fckeditor? thx
May 3rd, 2007 at 12:25
If you want to integrate JSP Spell Checker with FCK Editor, here’s how you will do it:
NOTE: The reason JSP SpellChecker does not work with FCK Editor by default is because at the time of replacing the contents it gets confused, as it cannot refer the text area anymore. The only way to update the text area which has been rendered by FCKEditor is by using FCKEditor object itself! (remember FCKEditor object creates IFrame etc. around the text area - so it is no longer accessible in the normal fashion!)
Here we go !~
// Get the instance of the FCKEditor object
oEditor = FCKeditorAPI.GetInstance("myTextArea"); // keep this in a globally accessible variable
myTextArea is the text area you converted to FCK Editor.
In spellcheck-functions.jsp, Look for this function: replaceWordInFieldIE.
Here get handle to oEditor (remember its in your parent window). So you may do something like this:
oEditor = opener.blah blah…
var oldHTML = "" + oEditor.GetHTML();
var originalWord = "" + word; // original word to replace
var newWord = "" + newWord; // new word to replace it with
var newHTML = oldHTML.replace(originalWord, newWord); // new string to set
oEditor.SetHTML(newHTML);
return;
Of course you will have to take care that your string replacement function is smarter than this: since it deals with html it can potentially replace even the text inside html tags. But this is how you do it!
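A somewhat smarter replacement could look like the sketch below, which only touches text between tags and only matches whole words. The function names are hypothetical, the tag pattern is a simplification that does not cope with a ">" inside an attribute value, and the \b word boundary only behaves well for plain ASCII words. Unlike the snippet above it replaces every occurrence, not just the first.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replace originalWord with newWord in the text parts of html only.
function replaceWordInHtml(html, originalWord, newWord) {
  const wordPattern = new RegExp("\\b" + escapeRegExp(originalWord) + "\\b", "g");
  // Splitting on a capturing group keeps the tags in the result:
  // even indices are text, odd indices are tags.
  return html.split(/(<[^>]*>)/).map(function (part, index) {
    if (index % 2 === 1) {
      return part; // a tag, leave it alone
    }
    // A function replacement avoids special "$" handling in newWord.
    return part.replace(wordPattern, function () {
      return newWord;
    });
  }).join("");
}
```

With this, replacing "teh" in '<p class="teh">teh cat</p>' changes the visible word but leaves the class attribute as it was.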
April 4th, 2008 at 04:29
Awesum work, I had imagined such things could be done, but never really bothered with it. You could just create a spell check for asp.net using some code like
http://www.mabaloo.com/Software-Development/Spell-Checker-Routine-Using-VB.html
And get things done on server side