DHTML Text Marker: An Experiment

Posted on 11 Jul 2002

in Code

by Ashok Hariharan (Junglee)

What is a text marker?

Anyone who uses Google's newsgroup search will have encountered a text marker, a.k.a. a highlighter. For instance, search http://groups.google.com for a recipe to make minestrone and view any of the search results: the word 'minestrone' appears highlighted (see the screen capture below). This feature makes a website's search engine much more user-friendly. The user no longer has to scan the page visually for the part of the text that has the 'minestrone' recipe; the highlighting shows immediately which part of the content contains it.

Google achieves this 'marking' of the text by wrapping the search keyword in a <span style="background-color:#color"> tag.

We will look at one particular way to implement this feature.
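As a rough illustration of what the finished markup looks like, here is a minimal sketch that wraps a keyword found in plain text (no tags involved yet). The function name wrapKeyword and the colour value are assumptions made for this illustration; they are not taken from Google's pages.

```javascript
// Hypothetical helper: wraps every occurrence of a keyword in a highlighting span.
// Assumes `text` is plain text with no HTML tags in it.
function wrapKeyword(text, keyword, color) {
  // Escape regex metacharacters so the keyword is matched literally.
  var escaped = keyword.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  var re = new RegExp(escaped, 'gi'); // g: every occurrence, i: ignore case
  return text.replace(re, function (match) {
    // Re-insert the matched text itself so its original capitalisation is kept.
    return '<span style="background-color:' + color + '">' + match + '</span>';
  });
}

var marked = wrapKeyword('A Minestrone recipe', 'minestrone', 'yellow');
console.log(marked);
```

This only works safely on tag-free text; the rest of the article deals with content that contains HTML tags.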

[Image: Google groups page]

Screen shot of a page from Google Groups. I searched for a recipe to make 'minestrone' (a tasty Italian creation); this was one of the search result pages that I opened.

Various Implementation methods

There are a lot of "out of the box" search engine tools available that can be used with a website. A few of these come with some kind of search result highlighting (I am aware of a couple: Lotus Domino search and Perlfect Search), but what about the rest? Some people also end up rolling their own site-specific search engine solutions.

A good way to implement this feature would be on the server side (as Google Groups search does). This involves a script that processes the content using the search keyword (in our example: minestrone) as the parameter and then outputs the tagged/highlighted content to the browser. The downside of this approach is that each server-side platform might need its own implementation. Like many kinds of server-side programs, this could also turn out to be resource and processor intensive.

I decided on the quick 'n' dirty client-side JavaScript route, which I thought I could then re-use on any server-side platform or even for static HTML pages (though as you read on you will learn I wasn't exactly successful).

Compatibility Issues

There are a few compatibility issues with the JavaScript approach. The original (and, until now, only) guinea-pig implementation of the code is a restricted intranet scenario (my workplace). Almost everybody is compulsorily on IE 6, so I basically wrote it without giving a damn about other browsers (now, is that evil or what?). But the good news is I got it to work with minimum fuss on Mozilla (yeah! I am a decent guy now). I have tested it successfully on the following browsers:

IE 6 (Windows versions)

Mozilla 1 RC1

I tested the script on IE 5.0 and IE crapped out completely: when the browser reached a particular regular expression function in the code, CPU utilization hit 100% and IE came to a grinding halt. I guess the regular expression object in IE 5 isn't anywhere near as sturdy as the one in IE 6 and Mozilla.

I don't have IE 5.5, IE 5.01 or Netscape 6.x installed, nor do I have Mac OS running anywhere close by, so I have no idea how the code would behave on those browsers. And if the code is not working on IE 5, then for IE 4.x your guess is as good as mine.

Theoretically it should be possible to make it work on NN4 or any browser that supports the innerHTML property (that is, if the regular expression object can handle the stress). I did try to make it work on Opera 6, but it doesn't support innerHTML :-(.
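One way to keep the script from failing on browsers without innerHTML is to check for the needed features before doing anything. The sketch below is only an illustration: canHighlight is a hypothetical name, and the document object is passed in as a parameter so the check can be tried outside a browser. It does not detect the IE 5 regular expression problem described above.

```javascript
// Hypothetical guard: returns true only if the features the script relies on are present.
// `doc` is the document object; `divId` is the id of the content <div>.
function canHighlight(doc, divId) {
  // A plain truthiness test is used here, since old IE reports typeof "object" for DOM methods.
  if (!doc || !doc.getElementById) {
    return false; // no DOM lookup available
  }
  var elem = doc.getElementById(divId);
  // innerHTML is a string where it is supported and undefined where it is not.
  return !!elem && typeof elem.innerHTML === 'string';
}
```

In a page this might be used as: if (canHighlight(document, 'contentdiv')) { /* run the highlighting */ }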

Which is why I have called this article an experiment! But at least the article is futuristic with its upward compatibility ;).

JavaScripting our implementation

Here is a summarized list of the steps needed to implement this:

1. Enclose the main content portion of the page within a single named <div> tag.

2. Add an onLoad event to the <body> tag.

3. Call the highlighting function from the onLoad event if a particular query string has been passed to the page.

Step 1 : `<div>`ing your content

To begin with, I designed all my content pages so that the whole content section of a page was wrapped inside a named <div> tag, something like this:

<div id="contentdiv"><!--begin content section-->
  <p> 
	  skajfl sa fjla safkljasl lkfasj ja akjflka .....<br>
	  <a href="dudu.com">my link</a> <br>
  </p>
  <ul><li>....</ul>
  <table> ......</table>
  <p>......</p>
</div><!--end content section-->

Note: this is the content area of the page, similar to the article content part of a page on evolt.org. It doesn't include things like the evolt sidebar or the top menu.

Why did I have to use a `<div>` tag?

Mostly for convenience. To access all the HTML in the content area of a page through JavaScript, all I need to do now is:

var elemObj = document.getElementById('contentdiv');
var strInHtml = elemObj.innerHTML;

strInHtml is now a string containing all the HTML contained by <div id='contentdiv'>. I used the innerHTML property of the <div> element to access the raw HTML. The good thing about innerHTML is that it is also a settable property. I can do something like:

elemObj.innerHTML = '<h2>My new stuff </h2>';

which will overwrite all the HTML within the <div> with my new stuff. Our implementation will be using this powerful little DOM property.

Note: There is a big ongoing debate about the pros and cons of using innerHTML. Read all about it! The fact remains that innerHTML is very convenient compared with the more complicated DOM methods, which seem to be the more politically correct approach.

Step 2 : the `onLoad` event

When a keyword on the page needs to be highlighted, the keyword is passed to the page using a query string. If the page is normally invoked like this:

http://server/page/index.asp

With the highlight query string it will be invoked like this:

http://server/page/index.asp?hilite=minestrone
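Reading the keyword back out of such a URL could look roughly like the sketch below. The helper name getQueryValue is an assumption made for this illustration; in a page the first argument would be document.location.href.

```javascript
// Hypothetical helper: pulls the value of one query string parameter out of a URL.
// Returns '' when the parameter is absent.
function getQueryValue(href, name) {
  var queryStart = href.indexOf('?');
  if (queryStart === -1) {
    return ''; // no query string at all
  }
  // Drop any #fragment, then split "a=1&b=2" into name=value pairs.
  var query = href.substring(queryStart + 1).split('#')[0];
  var pairs = query.split('&');
  for (var i = 0; i < pairs.length; i++) {
    var parts = pairs[i].split('=');
    if (parts[0] === name) {
      // Re-join in case the value itself contains "=".
      var rawValue = parts.slice(1).join('=');
      // "+" stands for a space in query strings; the rest is percent-encoded.
      return decodeURIComponent(rawValue.replace(/\+/g, ' '));
    }
  }
  return '';
}

console.log(getQueryValue('http://server/page/index.asp?hilite=minestrone', 'hilite'));
```

An empty return value can then serve as the signal that no highlighting was requested.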

Add an onLoad event to the <body> tag, something like:

<body onLoad="onLoad();">

Step 3 : Calling the highlighting function from `onLoad`

Here I read the innerHTML property of the <div> as raw HTML into a string variable. Then I search for every instance of the keyword and replace it with the highlighted version. Finally, I write the modified string back into the <div>.

The highlighting is achieved using a simple <span> tag with style="background-color:yellow;".

There is a precaution to take when inserting the tags. Consider some content like this:

<p>
  .............
  <!--content-->............
  <a href="minestrone.com" title="link to minestrone home page">minestrone
  home page</a>.
  .............
  <!--more content-->............
</p>

The replace function should not place highlight tags around the word minestrone found inside the tag itself, such as in the "title" attribute of the <a> tag; that would break the HTML. It should replace only the text between the <a> and </a> tags.
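To see the problem concretely, the sketch below runs a deliberately naive search and replace over the sample link. The function name naiveMark is made up for this illustration and is not part of the article's sample code.

```javascript
// The sample link from the text above.
var html = '<a href="minestrone.com" title="link to minestrone home page">minestrone home page</a>';

// Naive approach: replace the keyword everywhere, including inside the tag itself.
function naiveMark(inputHtml, keyword) {
  var span = '<span style="background-color:yellow;">' + keyword + '</span>';
  return inputHtml.split(keyword).join(span);
}

var broken = naiveMark(html, 'minestrone');
console.log(broken);
// The href and title attribute values now contain <span> tags,
// so the <a> tag is no longer well-formed HTML.
```

All three occurrences get wrapped, but only the last one (the link text) should have been touched.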

I used the JavaScript RegExp object to filter out illegal matches of this kind.

Just a couple of points before we dive into the actual code:

Instead of trying to match only the text between an opening <*> and a closing </*> tag, I match everything to the right of any <*> tag. This is because not all tags come in opening and closing pairs, and people invariably forget to close the good old <p> tag.

The code assumes there are no <script> tags within the content. I didn't build in any checks for these tags.

There is some preliminary work that onLoad does which I will not explain here, as it has been dealt with in earlier articles by other people on this website:

Extracting the keyword to be highlighted from the query string using a JavaScript DOM property (document.location.href).

Extracting the innerHTML from the <div>, again using the DOM properties.

Writing the highlighted text back into the <div>'s innerHTML.
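The read and write-back parts of that preliminary work can be sketched as follows. This is not the code from the sample page: applyHighlight is a hypothetical name, the keyword is assumed to have been extracted from the query string already, and both the document object and the marking function are passed in as parameters so the sketch can be tried outside a browser.

```javascript
// Hypothetical glue code. `doc` is the document object and `markFn` is a function
// with the same signature as markText(keyword, html).
function applyHighlight(doc, divId, keyword, markFn) {
  if (!keyword) {
    return false; // no keyword was passed: leave the page untouched
  }
  var contentDivObj = doc.getElementById(divId);
  if (!contentDivObj) {
    return false; // the page has no content <div> with that id
  }
  var strInHtml = contentDivObj.innerHTML;              // read the raw HTML
  contentDivObj.innerHTML = markFn(keyword, strInHtml); // write the marked-up HTML back
  return true;
}
```

In a page, the onLoad handler might call applyHighlight(document, 'contentdiv', keyword, markText).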

The actual function that applies the <span> highlighting tags to the innerHTML is quite small. Let's examine it part by part:

function markText(txtKeyword, inputHtml)
{
  var re;          /* regex object */
  var varMatches;  /* matches array */
  var outHtml;     /* output html */
  var replaceText; /* the span tag with the keyword, built in advance */
  replaceText = '<span style="background-color:yellow;color:red;font-weight:bold;">' + txtKeyword + '</span>';

The function takes two parameters: the keyword to be highlighted (txtKeyword) and the raw HTML content string extracted using the innerHTML property (inputHtml). All the necessary string and regular expression variables are declared. The highlighted keyword string is built in advance, in the last line of the snippet, by prefixing and suffixing the keyword with a <span style=....> tag.

  re = new RegExp("(<[^>][^<]*>)([^<]*)", "g"); /* group 1: a tag; group 2: the text up to the next "<" */
  outHtml = ''; /* init html string */

A new instance of the RegExp object is created. It matches every tag, from its opening < to its closing >, together with any non-tag text to the right of the closing >. The second parameter to the RegExp constructor ("g") makes the match global: each call to exec() resumes where the previous match ended, so the whole string is processed tag by tag.

I had to slip the extra [^<] into the first part of the expression because the match sometimes used to bomb on encountering a non-visible character. The extra expression seemed to fix that.

	
  while ((varMatches = re.exec(inputHtml)) != null) /* exec repeatedly; each match is one tag plus the text after it */
  {
    outHtml += varMatches[1]; /* html tag part, copied unchanged */
    outHtml += replaceMe(varMatches[2], txtKeyword, replaceText); /* search & replace in the text part only */
  }
  return outHtml;
}

The innerHTML string is now evaluated against the regular expression object. The exec() method searches the string using the regular expression and returns an array (varMatches) containing the results of the search. Element 1 of the array (varMatches[1]) contains the matched HTML tag and element 2 (varMatches[2]) contains the untagged text to the right of the matched tag. Note that any text that precedes the first tag in inputHtml is never matched by this expression, so it does not reach the output.

For example, if the following is one of the matches:

<p class="xclass">hello there

varMatches[1] would contain <p class="xclass"> and varMatches[2] would contain the string "hello there".
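The splitting can be tried on its own with a short script. The pattern is the one used in markText(); the sample string and the variable names are made up for this illustration.

```javascript
// The same pattern as in markText(): group 1 is a tag, group 2 is the text that follows it.
var re = new RegExp('(<[^>][^<]*>)([^<]*)', 'g');
var sample = '<p class="xclass">hello there<br>more text</p>';
var pairs = [];
var m;

// With the "g" flag, each exec() call continues from where the last match ended.
while ((m = re.exec(sample)) !== null) {
  pairs.push({ tag: m[1], text: m[2] });
}

for (var i = 0; i < pairs.length; i++) {
  console.log(pairs[i].tag + ' | ' + pairs[i].text);
}
```

The sample yields three pairs: the <p> tag with "hello there", the <br> tag with "more text", and the closing </p> tag with an empty text part.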

The string in varMatches[2] is now searched for the keyword to be highlighted, and every instance of it is replaced with the <span>-tagged keyword (using the replaceMe() function).
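The body of replaceMe() is not reproduced in this article, so the version below is only a guess at what such a function could look like, not the code from the sample page. It assumes a case-insensitive match; because the replacement string is built in advance, every hit is shown with the capitalisation of the search keyword.

```javascript
// Hypothetical version of replaceMe(); the actual implementation is not shown in the article.
// text: tag-free text, keyword: the search term, replaceText: the prebuilt <span> string.
function replaceMe(text, keyword, replaceText) {
  if (!keyword) {
    return text; // nothing to highlight
  }
  // Escape regex metacharacters so the keyword is matched literally.
  var escaped = keyword.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  var re = new RegExp(escaped, 'gi'); // every occurrence, ignoring case
  // A function is used as the replacement so that "$" in replaceText is not treated specially.
  return text.replace(re, function () {
    return replaceText;
  });
}

console.log(replaceMe('Minestrone soup', 'minestrone', '<span>minestrone</span>'));
```

A case-sensitive variant would simply drop the "i" flag.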

Subsequently, the highlighted output string from the markText() function is written back to the <div> by setting its innerHTML property, something like:

contentDivObj.innerHTML = strOutputFromMarkText;

The sample code is commented and should be self-explanatory. I learnt most of the layer-writing methods, like reading and setting innerHTML, from ppk's website.

That's about it.

There is a working example available : DHTML marker sample

Some possible improvements / optimizations:

Portable code for other, minor browsers.

Right now the code treats multiple keywords as a single phrase. Changing it to handle each word in the phrase individually shouldn't be hard.

I don't do character entity conversions. For example, if someone searched for a word like bonnie&clyde, I don't convert it to bonnie&amp;clyde. So maybe this could be added.
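The last two ideas could be approached along the lines of the sketch below. It is untested against the sample page, and the helper names encodeEntities and buildWordPattern are invented for this illustration. It assumes that innerHTML returns & as &amp;, which is why the keyword is entity-encoded before matching.

```javascript
// Hypothetical helper: converts characters that are special in HTML to entities,
// so that the keyword looks the way it does inside innerHTML.
function encodeEntities(text) {
  return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Hypothetical helper: turns a phrase into one regular expression that matches any of its words.
// Returns null when the phrase contains no words.
function buildWordPattern(phrase) {
  var words = phrase.split(/\s+/);
  var escapedWords = [];
  for (var i = 0; i < words.length; i++) {
    if (words[i] !== '') {
      var encoded = encodeEntities(words[i]);
      // Escape regex metacharacters so each word is matched literally.
      escapedWords.push(encoded.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'));
    }
  }
  if (escapedWords.length === 0) {
    return null;
  }
  return new RegExp('(' + escapedWords.join('|') + ')', 'gi');
}

var wordRe = buildWordPattern('minestrone bonnie&clyde');
var out = 'Bonnie&amp;clyde ate minestrone'.replace(wordRe, '<span style="background-color:yellow;">$1</span>');
console.log(out);
```

Using $1 in the replacement keeps each hit's own capitalisation, unlike the prebuilt replaceText string used in markText().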

Ashok is based in Nairobi, Kenya. When not busy dodging vagrant matatus in Nairobi traffic, he keeps himself up to date by evolt-ing.
