Emil’s Chronicle - The journal of Emil A Eklund

Rich Text Spell Checker

Spent quite some time this weekend improving the spell checker component and I’ve managed to get rich text editing to work reasonably well (especially in Mozilla). Also fixed quite a few bugs and improved the performance significantly.

Been getting a few requests to explain how this thing works, so if you’re interested in that, read on; if not, you might want to play with one of the demos.

Rich text editing is still quite buggy and needs more work, but it looks promising.

Implementation
The spell checker is built around a rich edit component (an iframe with designMode set to ‘on’) with a keyup event handler that fires after each key press. If a separator (such as a space or comma) was entered, the previous word is determined using the caret position. That word is then wrapped in a span with a specific className (webfx-spellchecker-word) so that it can be accessed easily.
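As a rough illustration of the "find the previous word" step, here is a minimal sketch that works on a plain string and a caret offset rather than on the live DOM. The function name findPreviousWord and the exact separator set are assumptions made for this example, not taken from the component.

```javascript
// Hypothetical separator set; the real component may use a different one.
const SEPARATORS = /[\s,.;:!?]/;

// Returns the word that ends just before the separator the user typed,
// or null if the last character typed was not a separator.
function findPreviousWord(text, caretOffset) {
  const end = caretOffset - 1; // index of the character just typed
  if (end < 0 || !SEPARATORS.test(text.charAt(end))) {
    return null;
  }
  // Walk backwards until the previous separator (or the start of the text).
  let start = end;
  while (start > 0 && !SEPARATORS.test(text.charAt(start - 1))) {
    start--;
  }
  if (start === end) {
    return null; // two separators in a row, nothing to check
  }
  return { word: text.substring(start, end), start: start, end: end };
}
```

In the browser, the start and end offsets would then be used to wrap that range in the webfx-spellchecker-word span.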

The word is then checked against the cached dictionary, and if a match is found the style of the span is updated to reflect the status of the word (red wavy underline if it’s misspelled). If the word is not found in the cached dictionary, it’s added to the validation queue and a timer is started that will call the _askServer method. The timer allows multiple words to be checked in a single request if they’re entered in rapid sequence.
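The cache, queue and timer interplay can be sketched roughly as follows. This is only an approximation under stated assumptions: createWordChecker, check and remember are hypothetical names, the askServer callback stands in for the _askServer method, and the choice to start the timer only when none is pending is a guess at the behaviour.

```javascript
// askServer(words) is called once per batch; schedule defaults to setTimeout
// and is only a parameter so the timer can be replaced when experimenting.
function createWordChecker(askServer, delayMs, schedule) {
  const cache = new Map(); // word -> true (correct) or false (misspelled)
  let queue = [];          // words waiting to be sent to the server
  let timer = null;
  const startTimer = schedule || setTimeout;

  function flush() {
    timer = null;
    const batch = queue;
    queue = [];
    askServer(batch);
  }

  return {
    // Returns 'ok' or 'misspelled' for cached words, 'pending' otherwise.
    check: function (word) {
      if (cache.has(word)) {
        return cache.get(word) ? 'ok' : 'misspelled';
      }
      if (queue.indexOf(word) === -1) {
        queue.push(word);
      }
      if (timer === null) {
        timer = startTimer(flush, delayMs);
      }
      return 'pending';
    },
    // Stores a verdict received from the server.
    remember: function (word, isCorrect) {
      cache.set(word, isCorrect);
    }
  };
}
```

Words typed within the delay end up in the same batch, so a fast typist causes one request instead of one per word.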

Once the timer triggers _askServer, a request is sent to the server using XMLHttp and the server side component, in my case a perl script, gets executed. The server side script iterates over the supplied words and checks the spelling using Aspell. A JavaScript array is then generated containing the status of each word and, if it was misspelled, a list of suggestions.

The client parses the reply using eval and then calls the _updateWords method, which iterates over the words that were validated and updates the corresponding span.
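To make the update step more concrete, here is a small sketch of what a function like _updateWords might do once the reply has been parsed. The reply shape (an array of [word, isCorrect, suggestions] entries), the updateWords name and the misspelled class name are all assumptions for illustration; the actual format produced by the server script may differ.

```javascript
const WORD_CLASS = 'webfx-spellchecker-word';
// Hypothetical class used here to mark misspelled words.
const MISSPELLED_CLASS = 'webfx-spellchecker-misspelled';

// reply: assumed to look like [['teh', false, ['the', 'ten']], ['cat', true, []]]
// spans: the word spans in the document (anything with textContent and className)
// cache: a Map used to remember verdicts for later lookups
function updateWords(reply, spans, cache) {
  reply.forEach(function (entry) {
    const word = entry[0];
    const isCorrect = entry[1];
    const suggestions = entry[2] || [];
    cache.set(word, { isCorrect: isCorrect, suggestions: suggestions });
    spans.forEach(function (span) {
      if (span.textContent !== word) {
        return;
      }
      span.className = isCorrect
        ? WORD_CLASS
        : WORD_CLASS + ' ' + MISSPELLED_CLASS;
    });
  });
}
```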

For performance reasons XML is not used; rather, XMLHttp is used to send a regular HTTP POST request that returns plain text (actually text/javascript, but the content-type is quite irrelevant here). POST is used rather than GET to avoid caching, as the browser cache would otherwise quickly fill up with a lot of rather pointless entries.
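A plain form-encoded POST of this kind could look roughly like the sketch below. The parameter name word, the function names and the callback are assumptions for the example; only the standard XMLHttpRequest calls are real.

```javascript
// Builds a form-encoded body such as "word=teh&word=wrold".
// The parameter name "word" is hypothetical.
function buildRequestBody(words) {
  return words
    .map(function (w) { return 'word=' + encodeURIComponent(w); })
    .join('&');
}

// Sends the queued words as a POST and hands the plain text reply to onReply.
// Browser only: relies on the XMLHttpRequest object.
function askServer(url, words, onReply) {
  const xhr = new XMLHttpRequest();
  xhr.open('POST', url, true);
  xhr.setRequestHeader('Content-Type', 'application/x-www-form-urlencoded');
  xhr.onreadystatechange = function () {
    if (xhr.readyState === 4 && xhr.status === 200) {
      onReply(xhr.responseText);
    }
  };
  xhr.send(buildRequestBody(words));
}
```

Note that the url has to point at the same host the page was loaded from, as browsers block cross-site XMLHttp requests by default.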

The fact that each individual word is encapsulated in a span causes a few problems when words are merged or split, but that’s quite easy to handle. Another, more difficult problem this causes is that when rich text editing is enabled and the getHTML method is called, those spans should not be included, but the style information assigned to them should.
A bit of regex magic has nearly solved that problem: the spans are stripped, but the only style information that is maintained in this implementation is bold, italic and underline. So if any other style information is applied to the same span that’s used to highlight misspelled words, it’s lost. This obviously only applies if rich text editing is enabled.
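The span stripping can be approximated with a single regular expression, as in the sketch below. It only removes the marker spans and keeps their contents; it does not attempt to carry bold, italic or underline over, and it assumes the marker spans are not nested inside each other. The function name stripWordSpans is made up for this example.

```javascript
// Removes <span class="webfx-spellchecker-word" ...>...</span> wrappers
// from an HTML string while keeping whatever was inside them.
function stripWordSpans(html) {
  const pattern =
    /<span\b[^>]*\bclass="webfx-spellchecker-word"[^>]*>([\s\S]*?)<\/span>/gi;
  return html.replace(pattern, '$1');
}
```

For example, stripWordSpans('a <span class="webfx-spellchecker-word">wrld</span>') gives back 'a wrld'.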

License
Being such a nice guy I’ve decided to release this component under the MIT License. As one of the least restrictive licenses it allows for virtually any use, including commercial, as long as credit is given where credit is due. Obviously this also means that open source projects can use it even if they’re under a different, more restrictive, license, such as the GPL.

2005-07-18, Update:
Fixed two bugs in the Mozilla implementation: occasionally, if text was entered directly in front of a word it was lost, and words were not correctly merged when the whitespace between them was deleted.
Also updated the ignore method to ignore all occurrences of the affected word.

51 Responses to 'Rich Text Spell Checker'

  1. macewan Says:

    Nice work. Thought you may want to know that once the spell check is done you are no longer able to type in text area. This is on Firefox in Linux.

  2. Emil Says:

    Yeah, I’ve noticed that behavior occasionally for some versions of firefox… haven’t been able to track it down yet though.

  3. Alex Bosworth Says:

    Some things I’ve noticed: it causes problems for things such as the delete button and selection, and I can’t go back to the previous page easily. Sometimes the red underline doesn’t go under the whole word, and I have problems moving my arrow around.

    The text also is a bit jumpy sometimes. This is very cool but even simple text version needs a lot of work to be robust.

    I’m using FireFox 1.0.4

  4. Emil Says:

    Seems I managed to break the mozilla implementation when I fixed the IE one… Expect a new update shortly.

  5. Emil Says:

    The mozilla focus problem has now been resolved and the script updated.

    Please let me know if you find any other problems.

  6. macewan Says:

    Guess I should have mentioned my version, duh - sorry.

    5.0 (X11; U; Linux i686; en-US; rv:1.7.8) Gecko/20050513 Firefox/1.0 (Ubuntu package 1.0.4~5.04ubp1+1.0.2)

  7. Espen Antonsen Says:

    Works great! Have to get our Linux guy to install Aspell so i can test it out myself.

    I haven’t looked at the script yet but I assume that in good old WebFX tradition that it’s no worries to hook up the script against another WYSIWYG editor?

  8. Emil Says:

    Espen Antonsen: Shouldn’t be too hard, but you probably want to use the spell checkers getHTML method, as getting the html directly from the RichText component would return the markup used to highlight misspelled words.

  9. john Says:

    Emil, Your spell-check looks great and I’m anxious to see the tarball- however I’ve been unable to open it. Would you consider also putting up a zipped folder of the same? Best regards- J

  10. Emil Says:

    Of course, try http://me.eae.net/stuff/spellchecker/spellchecker.zip

  11. dasika Says:

    do you also have a servlet version (the back end processing for dictionary look up and returning the list of words)?

  12. dasika Says:

    btw, that was really good work (sorry for not mentioning). 

  13. Said Says:

    Wonder how you make those suggestions..
    I see it isn’t  ‘%word’ ..

    What do take into consideration?

  14. Paul W Says:

    Great work with the Spell checker.  I’ve found a problem with the javascript in that it sometimes deletes words without you even realising until it’s too late.  I can tell you how to recreate the bug though.

    In the text area if you Type ‘Mr B’ then hit the HOME key then start typing ‘This is something’ as slow or as fast as you like.  You’ll notice it deletes the word ‘Mr’.  I’m looking at the code myself to see if I can fix it but my javascript and dom isn’t what it used to, or should be :)

    Hope you had / are having a good holiday (*basic maths escapes me today*)

  15. Emil Says:

    Thanks for all your feedback and bug reports, I really appreciate it!

    Said: I’m using the aspell library.
    Paul W: Thanks, I’ll look into that.
    Dasika: I’m afraid I don’t. However it shouldn’t be too hard to port it.

  16. Ivo Says:

    Hi Emil, That thing is mad!:)

    I have one problem in Firefox though. When I test the demo on your site in Firefox it works fine. When I download the archive to my hard drive, unpack it in a folder and try the richdemo.html or demo.html in Firefox it never underlines the misspelled words; it adds the “webfx-spellchecker-word” span around the word though (same thing works fine in IE however). Has anyone experienced the same problem?…
    And I noticed one bug in IE : when you ignore a word the underlining under other instances of the same word through the text aren’t removed, even though when you click on them the popup
    Oh yeah… my platform is a Windows XP Pro; Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.9) Gecko/20050711 Firefox/1.0.5

  17. Ivo Says:

    …even though when you click on them the popup with the suggestions isn’t shown.
    Sorry ’bout that

  18. Paul W Says:

    No problem.  I’ve located where it messes up, it’s in the _moz_parseActiveNode function where text is entered inside an existing word: sel.focusNode.parentNode.className == 'webfx-spellchecker-word'.

    I’m trying to fix it myself but I don’t know enough about deleting / replacing nodes.  When u enter a space it’s not recognised so it still thinks its part of the word. Then when u hit space again it splits it at the first space and deletes everything after it.  Hope this helps.

  19. Emil Says:

    Update:

    Fixed two bugs in the Mozilla implementation: occasionally, if text was entered directly in front of a word it was lost, and words were not correctly merged when the whitespace between them was deleted.
    Also updated the ignore method to ignore all occurrences of the affected word.

    Thanks Paul W and Ivo.

    Ivo > Firefox does not allow xml http to access the remote server so you really need to set up your own copy of the server side component and update the uri to that and to the underline image.

  20. Hakan Bilgin Says:

    Hi,
    Great work…You inspired me to create this object that I have been thinking of for a long time now but never found time for. The object, I am calling it “woco” (Word Complete), is as it sounds, word completer. Here it is: http://demo.challenger.se://demo.challenger.se.  By the way, this editor behaves a little strange in IE6, WinXP. I can’t start a new paragraph… Also it wasn’t easy to write the url above. The editor, reentered or duplicated it for some reason. Best regards…/hbi

  21. Hakan Bilgin Says:

    A new try: www.challenger.se/woco

  22. Emil Says:

    Hakan > Nice! Been meaning to implement word completion in the spellchecker for some time but I want to get rid of all the weird bugs first. The rich edit functionality is quite cumbersome at times.

  23. Paul W Says:

    Hi Emil,

    Nice work on the update!  It doesn’t delete the word now BUT, I’m sorry to say I’ve found another bug which I don’t think is really the code’s fault.  If you do the same as the first bug :: “In the text area if you Type ‘Mr B’ then hit the HOME key then start typing ‘This is something’” :: type it REALLY fast and you’ll notice the caret position is one step behind where it should be.  I believe this is because the next character is already being processed when the caret is being reset.

    Maybe a caret cancel boolean is needed?   Or a caret ‘cmd’ ID.

    Ahh the benefits of 60+wpm touch typing :/

  24. Matthew Says:

    Nice work! 

    Noticed two bugs using IE6 xp.  First, the red line does not get removed when I misspell a word, continue typing for at least one more word, then navigate back using the arrow key, then correct manually (not using the suggestions).  Second, I can’t add a return after the end of a paragraph.

    Again, this is great.  I’ve been looking out for something like this for quite some time.  I got it to work with jazzy so I could use our tomcat/jsp server.

    Oh, replacing the hard coded server location in the .js with location.hostname would make it easier to port to different systems.

  25. Matthew Says:

    Sorry about the formatting.  I copied the text from an IE browser to a Firefox browser so I could get in the newlines.

  26. jim Says:

    great work.

    Have you tried it on Safari? I’m having trouble getting it to work. The display never changes and the javascript options (on Safari) are weak.

    Thx

  27. Vamshi Says:

    Hi Emil,
    Nice work, this one is great.
    But I have found one problem. When we have multiple text areas in the page, the rescan function only checks the first text area of the form. The rescan(); function is not generalised for the given text area.

  28. Philippe Says:

    Hi!
    Do you think that we can change the component you’re using to use the NetSpell component under the .net environment?
    This is a really great work
    Regards
    Philippe

  29. Selvan Says:

    Hi Emil

    Very Nice work. I have to check spelling in a text box. I want it to be the same way as you showed the demo here.Can I have an idea to do that by my own.

    Thanks!

    Selvan

  30. Scott Says:

    Hey man this software is pretty awesome. I look foward to watching it’s progress. Those big-name wysiwyg editors (FCK, tinymce, etc.) could learn a thing or two here.

  31. zeeshan Says:

    I need to implement client side spell checking in a span in ASP page.I don’t want client to install or require any extra component.Please help me in this regard.

    you can reply at zeemalik78@yahoo.com

    thanks and regards,

    zeeshan

  32. TomJackson.Com Says:

    This script is great in demo. I’m having some issues getting it to work on one of my pages. Can you drop me a line so i can ask you some technical questions?

    Error:
    Line 571
    Char 3
    Permission Denied
    Code 0
    URL: http://i2driven.net/i2driven/spellcheck/demo.htm

    Thanx Tom

  33. exe Says:

    Very nice script, however I dont use CGI/Perl, but still looks very good. But woulda been better if it wasn’t an exe so i could learn how it worked ;P

  34. Emil Says:

    exe > The source code for the C++ version is available, see http://me.eae.net/stuff/spellchecker/cpp/

  35. Jason Foshee Says:

    I have it all installed and it runs fine except when I try to run the cgi script in the browser. I get “%1 is not a valid Win32 application.” instead of “data = [];”. What do I have wrong? I’m trying to run this one a W2k server with IIS5 with ActivePerl and Aspell also installed.

    I appreciate all that you have done and think this is a great app!

  36. Jason Foshee Says:

    Ok… I managed to get it to work, but only if I used the PHP script for the dictionary…

    But it works none-the-less.

  37. Matthew Ratzloff Says:

    Ran across a bug. If you have a word that changes formatting midway, it treats the word like two different words.

    Really like the plaintext version. Hope to see this rich text version up and running at some point so I can create a plug-in for FCKeditor.

  38. Rob M Says:

    Hi Emil,

    I can do nothing but reiterate what everyone else has said about how good this spell checker is - it’s fantastic:)

    I have to build a spellchecker to integrate with the FCKeditor, and I am trying to work out the best way of going about developing this. I’m not sure whether to go for the Richtext one - which of course has more support for Richtext:) or for the plaintext one, and move across the rich text functionality into it, as it seems more developed. What do you think would be best/easiest?

  39. rt Says:

    Emil,

    This is an amazing example , but just was worried as in my case I need a script which would run on the client side as none of our server’s here have MS Office installed.

    Can you suggest me an alternative approach to the CGI and check for the client side .dic file.

    Thanks in advance

  40. Emil Says:

    rt: MS Office has nothing to do with it, all that is required on the server side is aspell, or any other spell checker library with an exposed API. The C++ example works very well on windows.

  41. rt Says:

    Emil, thanks for the quick response. We have domino installed and I guess it should be helpful to accomplish this task? Please let me know; as I do not have experience with CGI based applications, I need to know where the modifications need to go for the API changes to be reflected.

    Thanks in advance

  42. Dee Says:

    Hi,

    I wanted to confirm if I can integrate your spell checked component within my richtext editor in ASP pages?

    Thank you.

  43. Bruce Ritchie Says:

    Emil,

    Very useful script! I’m working on evaluating it for our application, however I came across an issue that has me stumped. As best as I can see, the overlay trick you use requires that the textarea have position: absolute; applied to it. That’s fine, however I’m seeing that it breaks all my surrounding divs (I’m wrapping the textarea in a few divs for styling and other reasons) such that all the divs collapse over top of the textarea.

    Have you ever come across this, and if so, do you have any suggestions on how to overcome this issue?

    Thanks in advance,

    Bruce

  44. Bhargava Says:

    Hi Emil,

    I found your spell checker to be a really great one. But we are using a framework called ES-framework, built on IBM WebSphere, which is more of a tag based component framework. Right now I have not implemented this on it. So I just want to know, can this idea be used in ES-framework without creating any problems?

    Thanks in advance,
    Bhargava

  45. Scott Says:

    I tried to use your code locally but this error:
    Error: uncaught exception: Permission denied to call method XMLHttpRequest.open

    any suggestions?

  46. Daniel Says:

    Great Work. I was looking for this long time ago.
    Thank

  47. Mark Says:

    Has anybody made any progress on integrating Emil’s code with fckeditor? thx

  48. Jitender Says:

    If you want to integrate JSP Spell Checker with FCK Editor, here’s how you will do it:

    NOTE: The reason JSP SpellChecker does not work with FCK Editor by default is because at the time of replacing the contents it gets confused, as it cannot refer the text area anymore. The only way to update the text area which has been rendered by FCKEditor is by using FCKEditor object itself! (remember FCKEditor object creates IFrame etc. around the text area - so it is no longer accessible in the normal fashion!)

    Here we go !~

    //Get the instance of FCKEditor object
    oEditor = FCKeditorAPI.GetInstance("myTextArea"); //put this as globally accessible variable

    myTextArea is the text area you converted to FCK Editor.

    In spellcheck-functions.jsp, Look for this function: replaceWordInFieldIE.

    Here get handle to oEditor (remember its in your parent window). So you may do something like this:
    oEditor = opener.blah blah…

    var oldHTML = "" + oEditor.GetHTML();
    var originalWord = "" + word; // original word to replace
    var newWord = "" + newWord; // new word to replace
    var newHTML = oldHTML.replace(originalWord, newWord); //new string to set.
    oEditor.SetHTML(newHTML);
    return;

    Of course you will have to make sure that your string replacement function is smarter than this: since it deals with HTML, it can potentially replace even the text inside HTML tags. But this is how you do it!
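One way to address that caveat is to split the HTML into tags and text and only touch the text parts. The sketch below is a minimal illustration of that idea, not part of the JSP spell checker or of FCKeditor; the function names are made up, and the word boundary check (\b) only behaves well for plain ASCII words.

```javascript
// Escapes characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Replaces whole-word occurrences of `word` in the text of an HTML string,
// leaving tag names and attribute values untouched.
function replaceWordOutsideTags(html, word, replacement) {
  const wordPattern = new RegExp('\\b' + escapeRegExp(word) + '\\b', 'g');
  return html
    .split(/(<[^>]*>)/) // even indexes are text, odd indexes are tags
    .map(function (part, index) {
      if (index % 2 === 1) {
        return part; // a tag, keep as is
      }
      // A function is used so "$" in the replacement is taken literally.
      return part.replace(wordPattern, function () { return replacement; });
    })
    .join('');
}
```

The result could then be passed to oEditor.SetHTML in place of the plain oldHTML.replace call.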

  49. Girish Singh Says:

    Awesum work, I had imagined such things could be done, but never really bothered with it. You could just create a spell check for asp.net using some code like

    http://www.mabaloo.com/Software-Development/Spell-Checker-Routine-Using-VB.html

    And get things done on server side

  50. Lokesh Says:

    Hello there, I am Lokesh and I need some help with this spell check. We use citrix in our company and I need to add a spell check to one of the applications. It works only when it is stored on the desktop; when we use a shared folder it does not work. Can you tell me how we can use it by uploading it to the web? Please contact me on my mail sundar.lokesh@gmail.com

  51. shashikant Says:

    Nice Work!!
    This is working in Mozilla but it’s not working with IE6 and IE7 at all.
    Is there any work around ?
    Please help me…
