DHTML Text Marker: An Experiment
What is a text marker?
Anyone who uses Google's newsgroup search will have come across a text marker, a.k.a. highlighter. For instance, search http://groups.google.com for a recipe to make minestrone. View any of the search results and the word 'minestrone' will appear highlighted (see the screen capture below). This feature makes a website's search engine much more user friendly: the user no longer has to scan the page visually for the part of the text that has the 'minestrone' recipe, but knows immediately which part of the content has it. Google achieves this 'marking' of the text by wrapping a <span style="background-color:#color"> tag around the search keyword. We will look at one particular way to implement this feature.

[Image: Google Groups page] Screen shot of a page from Google Groups. I searched for a recipe to make 'minestrone' (this is a tasty Italian creation). This was one of the search result pages that I opened.
Various Implementation methods
There are a lot of "out of the box" search engine tools available that can be used with a website. A few of these come with some kind of search-result text highlighting (I am aware of a couple: Lotus Domino search and Perlfect Search), but what about the rest? Some people also end up rolling their own site-specific search engines.

A good way to implement this feature would be on the server side (as Google Groups search does). This involves script-based processing of the content, using the search keyword (in our example: minestrone) as the parameter, and then sending the tagged/highlighted content to the browser. The downside of this approach is that a different implementation might be needed for each server-side platform. Like many kinds of server-side programs, this could also turn out to be resource and processor intensive.

I decided on the quick 'n' dirty client-side JavaScript route, which I thought I could then re-use on any server-side platform or even for static HTML pages (though as you read on you will learn I wasn't exactly successful).

Compatibility Issues
There are a few compatibility issues with the JavaScript approach. The original (and so far only) guinea-pig implementation of the code is in a restricted intranet scenario (my workplace). Almost everybody there is compulsorily on IE 6, so I basically wrote it without giving a damn about other browsers (now, is that evil or what?). But the good news is I got it to work with minimum fuss on Mozilla (yeah! I am a decent guy now). I have tested it successfully on the following browsers:
- IE 6 (Windows versions)
- Mozilla 1 RC1
I tested the script on IE 5.0 and IE crapped out completely: when the browser reached a particular regular expression function in the code, my CPU hit 100% utilization and IE came to a grinding halt. I guess the regular expression object in IE 5 isn't anywhere near as sturdy as the one in IE 6 and Mozilla.

I don't have IE 5.5, IE 5.01 or Netscape 6.x installed, nor do I have MacOS running anywhere close by, so I have no idea how the code would behave on those browsers. And if the code doesn't work on IE 5, then for IE 4.x your guess is as good as mine. Theoretically it should be possible to make it work on NN4 or any browser that supports the innerHTML property (i.e. if the regular expression object can handle the stress). I did try to make it work on Opera 6, but it doesn't support innerHTML :-(.

Which is why I have called this article an experiment! But at least the article is futuristic with its upward compatibility ;).

JavaScripting our implementation
Here is a summarized list of the steps needed to implement this:
- Enclose the main content portion of the page within a single named <div> tag.
- Add an onLoad event to the <body> tag.
- Call the highlighting function from the onLoad event, if a particular query string has been passed to the page.
Step 1 : <div>ing your content
To begin with, I designed all my content pages so that the whole content section of a page was wrapped inside a named <div> tag, something like this:

    <div id="contentdiv"><!--begin content section-->
    <p> skajfl sa fjla safkljasl lkfasj ja akjflka .....<br>
    <a href="dudu.com">my link</a> <br> </p>
    <ul><li>....</ul>
    <table> ......</table>
    <p>......</p>
    </div><!--end content section-->
Note: this is the content area of the page, similar to the article content part of a page on evolt.org. It doesn't include stuff like the evolt side bar or the top menu.

Why did I have to use a <div> tag?
Mostly for convenience. To access the whole HTML content area of a page through JavaScript, all I need to do now is:

    elemObj = document.getElementById('contentdiv');
    strInHtml = elemObj.innerHTML;

strInHtml is now a string containing all the HTML contained by the <div>. I use the innerHTML property of the <div> tag to access the raw HTML. The good thing about the innerHTML property is that it is also settable. I can do something like:

    elemObj.innerHTML = '<h2>My new stuff </h2>';

which will overwrite all the HTML within the <div> with my new heading.
Note: There is a big ongoing debate about the pros and cons of using innerHTML. Read all about it! The fact remains that innerHTML is very convenient compared with the more complicated DOM methods, which seem to be the more politically correct approach.

Step 2 : the onLoad event
When a keyword on the page needs to be highlighted, the keyword is passed to the page in a query string. If the page is normally invoked like this:

    http://server/page/index.asp

then with the highlight query string it will be invoked like this:

    http://server/page/index.asp?hilite=minestrone

Add an onLoad event to the <body> tag. Something like:

    <body onLoad="javascript:onLoad();">
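Reading the keyword back out of the URL might look like the following minimal sketch. The function name getHiliteKeyword is hypothetical (it is not taken from the original sample code), and it takes the URL as a string argument so that it can be tried outside a browser; in the page itself you would pass it document.location.href.

```javascript
// Hypothetical helper: pull the value of the "hilite" parameter out of a URL string.
// Returns an empty string when the parameter is absent.
function getHiliteKeyword(href) {
  var queryStart = href.indexOf("?");
  if (queryStart === -1) {
    return "";
  }
  // Drop any #fragment, then split the query string into name=value pairs.
  var query = href.substring(queryStart + 1).split("#")[0];
  var pairs = query.split("&");
  for (var i = 0; i < pairs.length; i++) {
    var eq = pairs[i].indexOf("=");
    var name = eq === -1 ? pairs[i] : pairs[i].substring(0, eq);
    if (name === "hilite") {
      var raw = eq === -1 ? "" : pairs[i].substring(eq + 1);
      // "+" encodes a space in query strings; decodeURIComponent handles %XX escapes.
      return decodeURIComponent(raw.replace(/\+/g, " "));
    }
  }
  return "";
}
```

An empty return value is a convenient signal for onLoad to skip the highlighting step altogether.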
Step 3 : Calling the highlighting function from onLoad
Here I read the innerHTML property of the <div>, apply the highlighting to it, and write the result back into the <div>. The highlighting is achieved using a simple <span> tag with style="background-color:yellow;".

There is a precaution to take when inserting the tags. Consider some content like this:

    <p> ............. <!--content-->............
    <a href="minestrone.com" title="link to minestrone home page">minestrone home page</a>.
    ............. <!--more content-->............</p>

The replace function should not place highlight tags around the word minestrone found within the "title" attribute of the <a> tag; that would break the HTML. It should replace only the text between the <a> and </a> tags. I used the JavaScript RegExp object to filter out illegal matches of this kind.

Just a couple of points before we dive into the actual code.
- Instead of trying to match only the text between an opening <*> and a closing </*> tag, I match everything to the right of any <*> tag. This is because not all tags come in opening and closing pairs, and people invariably forget to close the good old <p> tag.
- The code assumes there are no <script> tags within the content. I didn't build in any checks for these tags.
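To make the precaution concrete, here is a small sketch of the naive approach that the tag-aware matching is meant to avoid. The function name naiveMark and the sample string are made up for illustration; this is not part of the original script.

```javascript
// Naive approach (for illustration only): replace the keyword everywhere
// in the markup, attribute values included.
function naiveMark(html, keyword) {
  var span = '<span style="background-color:yellow;">' + keyword + '</span>';
  return html.split(keyword).join(span);
}

var sample = '<a href="minestrone.com" title="link to minestrone home page">minestrone home page</a>';
var broken = naiveMark(sample, "minestrone");

// The href and title attribute values now contain <span> tags,
// so the <a> tag is no longer valid HTML.
console.log(broken);
```

Only the third occurrence, the link text, should have been wrapped; the first two land inside the tag itself.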
There is some preliminary stuff that onLoad does that I will not be getting into in detail:
- Extracting the keyword to be highlighted from the query string using a JavaScript DOM property (document.location.href).
- Extracting the innerHTML from the <div>, again using DOM properties.
- Writing the highlighted text back into the <div>'s innerHTML.
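The read-and-write-back part of these steps could be sketched roughly as follows. Everything here is an assumption rather than the original onLoad code: highlightContent is a hypothetical name, "doc" stands for the browser's document (or any stub with getElementById), and "markFn" stands in for a marking function with the same (keyword, html) parameter order as markText().

```javascript
// Hypothetical glue code: read the <div>'s HTML, mark it, and write it back.
// Returns true if the content was rewritten, false if there was nothing to do.
function highlightContent(doc, divId, keyword, markFn) {
  if (!keyword) {
    return false; // no keyword was passed to the page
  }
  var contentDivObj = doc.getElementById(divId);
  if (!contentDivObj) {
    return false; // the page has no content <div> with that id
  }
  // innerHTML is both readable and settable, so one line does the round trip.
  contentDivObj.innerHTML = markFn(keyword, contentDivObj.innerHTML);
  return true;
}
```

In a page this would be called from onLoad as something like highlightContent(document, 'contentdiv', keyword, markText). Passing the document in as a parameter is only there so the function can be exercised with a stub object.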
The actual function that applies the <span> highlighting tags to the innerHTML is quite small. Let's examine it part by part:

    function markText(txtKeyword, inputHtml) {
        var re;          /*regex object*/
        var varMatches;  /*matches array*/
        var outHtml;     /*output html*/
        var replaceText;
        /*build the span tag with the keyword in advance*/
        replaceText = '<span style="background-color:yellow;color:red;font-weight:bold;">' + txtKeyword + '</span>';
The function takes two parameters: the keyword to be highlighted (txtKeyword) and the HTML string read from the innerHTML property (inputHtml).

All the necessary string and regular expression object variables are declared. The highlighted keyword string is built up in advance, in the last statement of the snippet above, by prefixing and suffixing the keyword with a <span style=....> tag.

        re = new RegExp("(\<[^>][^<]*\>)([^<]*)", "g"); /*match a tag, then the text up to the next tag*/
        outHtml = '';                                   /*init html string*/
A new instance of the RegExp object is created. It matches every tag, from its opening bracket (<) to its closing bracket (>), together with any non-tag text to the right of the closing bracket. The second parameter to the RegExp constructor ("g") makes the match global, so each call to exec() resumes where the previous match ended.

I had to slip in the extra [^<]
        /*exec sequentially to apply span tags*/
        while ((varMatches = re.exec(inputHtml)) != null) {
            outHtml += varMatches[1]; /*html tag part*/
            outHtml += replaceMe(varMatches[2], txtKeyword, replaceText); /*call the search & replace function*/
        }
        return outHtml;
    }
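To see what the pattern captures on each pass of the loop, the following sketch runs the same regular expression (written as a literal instead of a RegExp constructor call) against a small made-up fragment and collects the two capture groups. The variable names parts and m are illustrative only.

```javascript
// Same pattern as in markText(), written as a regex literal.
var re = /(<[^>][^<]*>)([^<]*)/g;
var inputHtml = '<p class="xclass">hello there<br>bye';

var parts = [];
var m;
// With the "g" flag, each exec() call continues from the end of the previous match.
while ((m = re.exec(inputHtml)) !== null) {
  parts.push({ tag: m[1], text: m[2] });
}

console.log(parts);
// parts[0] is { tag: '<p class="xclass">', text: 'hello there' }
// parts[1] is { tag: '<br>', text: 'bye' }
```

One thing worth noting: the pattern only starts matching at a tag, so any text that appears before the first tag in the input is not captured and would be dropped by a loop that concatenates only the matches.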
The innerHTML string is now evaluated against the regular expression. The exec() method searches the string using the regular expression and returns an array (varMatches) containing the results of the search. Element 1 of the array (varMatches[1]) contains the matched HTML tag and element 2 (varMatches[2]) contains the non-tagged text to the right of the matched tag. For example, if the following is one of the matches:

    <p class="xclass">hello there

varMatches[1] would contain <p class="xclass"> and varMatches[2] would contain the string "hello there".

The string in varMatches[2] is now searched for the keyword, and each occurrence is replaced with the <span>-tagged keyword (using the replaceMe() function).

Subsequently the highlighted output string from the markText() function is written back into the <div> tag by setting the innerHTML property, something like:

    contentDivObj.innerHTML = strOutputFromMarkText;
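The replaceMe() function itself is not listed in this article, so the following is only a guess at what such a helper might look like, not the version in the sample code. It assumes a case-insensitive match (the original may well be case-sensitive), and the escapeRegExp helper is a hypothetical addition so that keywords containing characters like "(" or "$" are treated literally.

```javascript
// Hypothetical helper: escape regex metacharacters so the keyword is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Sketch of a replaceMe(): swap every occurrence of keyword in text for replaceText.
// Assumption: matching is case-insensitive and global.
function replaceMe(text, keyword, replaceText) {
  if (!keyword) {
    return text; // an empty pattern would match between every character
  }
  var re = new RegExp(escapeRegExp(keyword), "gi");
  // A function replacement avoids "$" sequences in replaceText being interpreted.
  return text.replace(re, function () {
    return replaceText;
  });
}
```

Because replaceText is built once from the typed keyword, a case-insensitive match like this replaces "Minestrone" in the page with the keyword exactly as the user typed it. Wrapping the matched text itself would preserve the page's capitalization, at the cost of building the span per match.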
The sample code should be self-explanatory, and it is commented. Most of the layer-writing methods, like reading and setting innerHTML, I learnt from ppk's website.

That's about it.
There is a working example available: DHTML marker sample
Some possible improvements / optimizations:
- Portable code for other minor browsers.
- Right now the code treats multiple keywords as a single phrase. Changing it to handle each word in the phrase individually shouldn't be hard.
- I don't do character code conversions. For example, if someone searched for a word like bonnie&clyde, I don't convert it to bonnie&amp;clyde. So maybe this could be added.
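The last two ideas could start from something like the sketch below. Both function names are hypothetical and neither is part of the sample code. The entity conversion matters because innerHTML returns text with characters such as & already encoded, so the keyword has to be encoded the same way before it can match.

```javascript
// Hypothetical helper: encode the characters that are special in HTML text,
// so the keyword looks the way it does inside an innerHTML string.
function escapeHtml(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Hypothetical helper: split a phrase on whitespace so that each word
// can be highlighted on its own, and encode each word for matching.
function splitKeywords(phrase) {
  var words = phrase.split(/\s+/);
  var result = [];
  for (var i = 0; i < words.length; i++) {
    if (words[i] !== "") { // leading/trailing spaces produce empty entries
      result.push(escapeHtml(words[i]));
    }
  }
  return result;
}
```

Each entry of the returned array could then be passed through the marking function in turn, though marking in several passes needs care so that a later word does not match inside a <span> tag inserted by an earlier pass.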