Extending www.borland.pl with Delphi and InterBase
by Pawel Ksiezyk, Borland Developers Conference San Diego 2000
Table of Contents:
- Background information
- PKWeb functionality
- List of why
This paper is a description and overview of an application built with Delphi and InterBase using WebBroker technology. The application (PKWeb) extends the functionality of the Borland website in Poland (http://www.borland.pl).
It is not intended to be an exhaustive description of WebBroker technology. I will concentrate on issues that are important for web development in Delphi. I'll show some specific mechanisms that are the foundations of a web application. You will read some remarks about speed optimizations for the application. I will also give an overview of debugging techniques that can be used when building a web application. I'll list the issues that I found and the solutions I used for them.
I'll start with a short overview of the website that this application works with, and of the application itself. It will give you a better understanding of what needs this application fulfills and what ideas lie behind it.
The Borland website in Poland was founded in March 1997. It is the homepage of BSC Polska, the Borland representative in Poland. It now consists of more than 400 pages with information about products and services (training, technical support and consulting). It also has plenty of material for developers: tips, articles and white papers. Updated more than three times a week, it is now a major source of information for developers in Poland. The website gets more than 3000 page views per day, and plenty of developers visit it on a daily basis. Of course, you are invited to visit it as well. However, I have to give you a word of warning: as you may have guessed, all texts there are in Polish.
The information on the website consists mostly of static HTML pages. They are created with MS FrontPage. After a page is saved in the editor, common webpage content (header, common links etc.) is added by OmniMark scripts.
As stated before, the website has mostly static pages. We wanted to get more out of it by gathering information about visitors and providing more ways to communicate with customers. Development started in September 1999. The first version was ready after two months of not too intensive work. The first results were used in October 1999. As with most applications, the work continues and new features are being added.
The application is built using Delphi's WebBroker technology with InterBase as the backend server. PKWeb is a single ISAPI DLL. At the beginning the BDE was used to communicate with the server. After some tests it turned out that InterBase Express (components for native communication with InterBase) is reliable enough and in most cases much faster than the BDE. Therefore we changed the application to work with InterBase Express. PKWeb was tested and developed using the Sambar webserver and deployed on MS IIS on Windows NT.
The application is actually a set of services that power the www.borland.pl site. These services are:
Extending webserver logging of user activity
PKWeb implements redirections. You can obtain a lot of information about visitors from the webserver log: the most frequently requested pages, paths through the website and many other statistics. What it doesn't show, however, is how often users follow links that lead outside your website. PKWeb implements simple redirections: instead of linking directly to the external page, the link points to pkweb.dll. It not only transfers the user to the target website but also logs information about it.
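A minimal WebBroker action for such a redirection could look like the sketch below. This is an illustration only, not the actual PKWeb source: the action name, the log table and its columns, and the LogQuery component (assumed to be a TIBQuery living on the web module with an open database connection) are all hypothetical.

```pascal
procedure TWebDane.WebDispatcher1RedirAction(Sender: TObject;
  Request: TWebRequest; Response: TWebResponse; var Handled: Boolean);
var
  TargetURL: string;
begin
  // the outside link points to pkweb.dll/redir?url=http://...
  TargetURL := Request.QueryFields.Values['url'];

  // log the click before sending the visitor away
  // (table and column names are hypothetical)
  LogQuery.SQL.Text :=
    'INSERT INTO redir_log (target_url, remote_addr, hit_time) ' +
    'VALUES (:url, :addr, ''NOW'')';
  LogQuery.ParamByName('url').AsString := TargetURL;
  LogQuery.ParamByName('addr').AsString := Request.RemoteAddr;
  LogQuery.ExecSQL;

  // transfer the user to the target website
  Response.SendRedirect(TargetURL);
  Handled := True;
end;
```

Response.SendRedirect issues an HTTP redirect, so from the visitor's point of view the intermediate step is invisible.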
Powering the banner system
We have banners and buttons on our website that promote our services and products. Thanks to PKWeb they don't have to be static and can change with every page view. This can boost their influence on visitors significantly.
Powering questionnaires on the web
PKWeb stores the data submitted by visitors via questionnaires in the database. The data can then be accessed easily in a structured manner.
Powering the polls
Polls are great tools for getting opinions from visitors. PKWeb is responsible for storing information about visitors' votes and displaying the results.
Powering specialized search engine
Searching www.borland.pl is performed using PKWeb. The application is capable of indexing many different websites and uses a ranking system that promotes sites with content valuable to developers.
The application performs several functions. The last three from the list above are described in this chapter in detail, with special emphasis on how they were developed and what problems arose while implementing them.
The first question everyone should ask is why one would have questionnaires on a website. There are several reasons, and if they apply to your situation you may want to use forms as well. You may want to use them to:
- gather information about visitors
- implement feedback form
- let visitor request more information
- let visitor join the mailing list
Even if you decide that you need questionnaires, you shouldn't start coding right away :-). There are several ways to implement them on your website. HTML forms support a "mailto" action that makes the browser mail the form results to a specific address. The problem with this approach is that some browsers don't support it and may behave strangely when the user presses the "Submit" button. The majority of ISPs have scripts that do roughly the same thing in a way that is supported by almost 100% of currently used browsers. The problem with both approaches is that the data is emailed to you, and if you want to store it in a structured database someone has to insert it manually.
There are specialized companies that help run questionnaires, and you can contact one of them (e.g. Borland uses InfoPool). As you can see, there are several options, and creating a web application for questionnaires is not the first choice.
You may ask then why we decided to develop our own custom solution. We run a lot of questionnaires on our website: some for people who want to join one of our mailing lists, some for collecting opinions. New ones are added quite often, and a lot of submitted results come in. We wanted full flexibility on one side and automated database inserts and maintenance on the other. It was very important to be able to add new questionnaires easily. It turned out that the best solution was to write our own engine.
Since the data from the questionnaires is stored in the database, the first task is to design the structure of the tables. I considered two possible structures:
- The first was a very general table capable of holding any structure of data, keeping one value of one field per record. This table would be able to hold all results from all questionnaires. Creating new questionnaires would be very easy, but getting results and statistics would require SQL and would not be really easy.
- The other was a strict structure that would reflect all fields of the questionnaire in one record. It would be easier to use and get statistics from, but a table would have to be created for every questionnaire.
PKWeb takes the best from both worlds. I create a table structure reflecting the content of the form on the fly, with the first request. The application analyzes all submitted fields and, based on this, builds a CREATE TABLE statement to send to the server. The only requirement is that the first submitted results of the questionnaire be complete. I make sure this is true by filling in the first one myself.
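The idea of building a CREATE TABLE statement from the submitted fields could be sketched as below. This is an assumption-laden illustration, not the PKWeb code: the table name source, the fixed VARCHAR(250) column type and the Query parameter (a TIBQuery with an open connection) are all hypothetical simplifications.

```pascal
procedure CreateTableForForm(Request: TWebRequest; Query: TIBQuery);
var
  i: Integer;
  SQL: string;
begin
  // hypothetical: the questionnaire name arrives as a URL parameter
  // and every submitted field becomes a VARCHAR column
  SQL := 'CREATE TABLE ' + Request.QueryFields.Values['form'] + ' (';
  for i := 0 to Request.ContentFields.Count - 1 do
  begin
    if i > 0 then
      SQL := SQL + ', ';
    // ContentFields holds the posted name=value pairs;
    // Names[i] is the field name from the HTML form
    SQL := SQL + Request.ContentFields.Names[i] + ' VARCHAR(250)';
  end;
  SQL := SQL + ')';
  Query.SQL.Text := SQL;
  Query.ExecSQL;  // executed only once, on the first complete submission
end;
```

This is why the first submission has to be complete: any field missing from it would never get a column in the generated table.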
There are two reasons why you may want to have online polls on your website:
It is a very good way to gather information about visitors.
You can ask people what they want from your website (we asked whether visitors were interested in newsgroups on our server, and the answer was a clear YES). You can ask for opinions on many different subjects (we did our own survey on Linux). It is also a good idea to let people rate the articles and tips on your website.
Create an online community
Polls are a way for people to communicate with each other. They can be a first step toward creating an online community.
There are plenty of free services on the Internet that help you build polls. In exchange, they put banners or links on the pages with the poll results. They are fine as long as you accept banners or links on your website. If not, it's easy to build your own poll engine. The poll engine in PKWeb consists of two parts. One is a static webpage with the questions, the possible answers, a "Vote" button and a "Show the results" link.
The other part is dynamic: after clicking the "Vote" button, the visitor's vote is stored in the database and the results are shown as a dynamically created HTML page. An example of a poll and its results, with a question about the adoption of Linux, is shown here.
When you create polls, you will sooner or later run into the problem of checking whether a user has already voted. This check can be done in two ways, each with its own advantages and disadvantages.
You can save information about the user's vote in a cookie on the user's machine. It is quite easy to do using WebBroker; the code example is below. Cookies are saved on the client machine, and when the user votes again you can check for the presence of the specific cookie. If you find one, you either don't save the new vote or replace the old vote with the new one. However, you cannot rely on cookies completely. If the user deletes the cookie and restarts the browser, you will not be able to detect the previous vote. Furthermore, the user can disable cookies on his machine, thus disabling the logging of votes altogether.
If you want stricter control over who has already voted, you can also save the IP address of the machine from which the results were submitted. The problem with IP logging is that it is often too restrictive: many companies access the Internet through a firewall for additional security, so every employee of such a company may be identified by a single IP address. Therefore, if you use IP logging, you may disable voting for whole domains. On the other hand, a lot of people use dialup connections to the Internet, and a temporary IP address is assigned for each such connection. This means that every time such a user logs on to the Internet he will have a different IP address, and IP logging will allow him to vote many times.
procedure TWebDane.WebDispatcher1web_poolAction(Sender: TObject;
  Request: TWebRequest; Response: TWebResponse; var Handled: Boolean);
var
  vote_no : Integer;
  vote_wyn : Integer;
begin
  [...]
  with Response.Cookies.Add do
  begin
    // I save the number of the poll
    Name := 'vote_no';
    Value := IntToStr(vote_no);
    Expires := Now + 30;  // The cookie will expire in 30 days
  end;
  with Response.Cookies.Add do
  begin
    // the result of the vote
    Name := 'vote_result';
    Value := IntToStr(vote_wyn);
    Expires := Now + 30;  // The cookie will expire in 30 days
  end;
  [...]
end;
The other piece of code you have to write creates the results page. Thanks to WebBroker technology you can use an HTML template page and insert only the specific fragment of HTML responsible for rendering the result table. An example result table for the Linux poll is shown here:
If you look closely you will notice how the table is built: it is actually a set of nested tables. The HTML code of this fragment looks like this:
<table border="1" cellpadding="3" cellspacing="1">
  <tr>
    <td align="center" colspan="3" bgcolor="#6A96CC"><font size="2"
      face="Arial"><strong>Czy zamierzasz tworzyc aplikacje dla
      Linux-a?</strong></font></td>
  </tr>
  <tr>
    <td width="260"><font size="2" face="Arial">Chyba tak</font></td>
    <td align="center" width="30"><font size="2" face="Arial">23%</font></td>
    <td width="200">
      <table border="1" cellpadding="0" cellspacing="0">
        <tr>
          <td width="46" bgcolor="#CB3434"><font size="2" face="Arial"></font></td>
        </tr>
      </table>
    </td>
  </tr>
  [... next results]
  <tr>
    <td colspan="3" bgcolor="#6A96CC"><font size="2" face="Arial">Oddanych glosów: 26</font></td>
  </tr>
</table>
Now I will give you some tips that can help in building a web application. You will probably find them useful even when working on the most basic ISAPI DLL. I'll show more of them when we proceed to creating the web search engine.
It is worth raising one issue that every web application developer will face sooner or later. The important question is: "How do I pass parameters from a webpage to the web application?" The answer is not simple. There are several methods, and not every one is suitable for every situation. You can use:
The most obvious and probably most frequently used method is passing parameters within the URL of the webpage. The majority of search engines use this to pass search phrases and additional information. The major advantage of this approach is that if you keep all parameters in the URL, it can be safely copied from one machine to another. I can mail the address of such a page to a friend, saying what a great page it is and suggesting a visit. You can store a link to such a dynamically created page. This can be difficult with the other approaches. The drawback is that such URLs tend to be very long and complex.
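In WebBroker, parameters passed in the URL arrive in Request.QueryFields, while form fields posted with "Submit" arrive in Request.ContentFields. A small sketch (the action name, the 'q' parameter and the BuildResultsPage routine are hypothetical, for illustration only):

```pascal
procedure TWebDane.WebDispatcher1SearchAction(Sender: TObject;
  Request: TWebRequest; Response: TWebResponse; var Handled: Boolean);
var
  Phrase: string;
begin
  // a request like pkweb.dll/search?q=delphi ends up here;
  // the parameter value is read straight from the URL
  Phrase := Request.QueryFields.Values['q'];
  Response.Content := BuildResultsPage(Phrase);  // hypothetical routine
  Handled := True;
end;
```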
With hidden fields you don't have the problem of long URLs. You can have multiple hidden fields on a page. But they have to be part of a form, and you cannot save a link to the dynamically created webpage: hidden field values are passed only when the "Submit" button on the form is pressed.
The use of cookies has been discussed above. They are very good tools for keeping state information like user preferences and general-purpose data. The problem with cookies is that the user can delete them from time to time. Also, the information contained in cookies applies only to a specific machine and cannot be easily transferred between machines.
Another basic issue you will soon encounter is the problem of testing an ISAPI DLL on the webserver. I'll give a broader overview of debugging web applications below. For now I just want to mention the problem of replacing the DLL with a new, freshly compiled version. When you access your application through the webserver, the DLL is attached to the webserver process and marked as in use. The operating system then doesn't allow you to delete or replace it. Many webservers (including MS Personal Web Server, which comes with Internet Explorer) have problems with this and are unable to release DLLs. MS PWS doesn't release them even after stopping; only a machine restart seems to help, which slows down work on a web application a lot. The solution I found was simple: use a different webserver. I use Sambar, which can be downloaded from Tucows. It is a nice, fast and free webserver that is very good for testing purposes. You can release the DLL by right-clicking the Sambar icon in the system tray.
Since you usually create the application in a test environment with a test webserver, it is vital to make your web application as configurable as possible. You have to make sure it can be moved easily to the production environment. This can be done by utilizing configuration files.
There is one question you have to answer when creating and managing application configuration: "When will the configuration files be read?" If you do it at application startup, it can be too seldom, because application startup happens with the first request and then the DLL is not unloaded until the webserver stops. Usually you cannot stop the webserver often; if you have tens of requests per minute, you cannot afford to switch the webserver off just to reinitialize your application. On the other hand, if you reinitialize the application with every request, it may slow the whole system down too much. The solution I use is a special flag in the configuration file that says: the configuration of your application has changed, you have to reinitialize. I check this flag with every request and reinitialize only when necessary.
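A sketch of such a check, using a standard INI file, might look like this. The file path, section and key names, and the LoadSettings routine are hypothetical; the point is only the pattern of a cheap per-request check guarding an expensive reinitialization.

```pascal
uses
  IniFiles;

var
  ConfigLoaded: Boolean = False;

procedure CheckConfiguration;
var
  Ini: TIniFile;
begin
  Ini := TIniFile.Create('c:\pkweb\pkweb.ini');  // hypothetical path
  try
    // the 'Changed' flag is set by the administrator together
    // with any new settings
    if (not ConfigLoaded) or Ini.ReadBool('Config', 'Changed', False) then
    begin
      LoadSettings(Ini);                          // hypothetical: re-read all settings
      Ini.WriteBool('Config', 'Changed', False);  // clear the flag
      ConfigLoaded := True;
    end;
  finally
    Ini.Free;
  end;
end;
```

Called at the start of every action, this costs one small file read per request but reloads the full configuration only when the flag says so.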
There are tons of information out there on the Internet. Even if you have very valuable data on your website, if it is not easily accessible it is worthless. If your website has fewer than 100 pages, you can live without a search engine without problems. If you have more than 100 pages, a search engine is highly welcome. If you have more than 500 pages, a search engine is a must.
So the next question is how, rather than why, to have a search engine. Searching a specific website is offered as a free service by many search engines; take Infoseek as an example (BTW, Borland uses Infoseek). The majority of webservers have built-in support for indexing and searching websites.
You may wonder why we decided to build a custom application to do the searching on our site. This is a very good question indeed. The reason is that we wanted a very specialized search engine:
It should index only specific websites
In the first phase we wanted to provide searching only for our own site. However, we want to extend searching to cover the best sites that offer valuable content for the developer community. We want to be able to manually add specific websites or webpages to the list of indexed sites. The majority of search engines have the problem of indexing too much: more than half of the results of a search point to worthless or spam sites that manage to rank high by carefully choosing the set of keywords that appear on their pages. We want a search engine that indexes pages by invitation.
We need a different ranking system
The majority of search engines set the rank of a page based on the occurrences of the words you are looking for. Our search engine of course works in a similar way, but there are also some differences. For example, we want to rank pages that have source code on them higher. We want to raise the rank of pages that offer good content for developers. We also want to be able to set priorities for pages or sites, so we have greater control over the results.
Based on these requirements, we decided to build a custom solution for searching the websites. PKWeb has three modules that perform the different parts of preparing and executing searches:
- Downloading HTML pages to the hard drive for offline processing
- Indexing them and building searching tables
- Searching for keywords submitted by user
Downloading the pages
First we download the webpages to a specific directory on the hard drive, so that the pages are ready for the analysis and indexing phase. Because of the many problems with downloading whole websites for off-line browsing, and the availability of plenty of good tools that do it easily, we haven't coded this part ourselves. We use Teleport, which can download specific sites on a schedule and is easily configurable to download only the files we need.
When all HTML files from the indexed sites are on the local hard drive, I start the indexing phase. For every HTML file I find, I build a word index. I have a table that stores all the words that appear on a page, with a rank that is computed by examining the content of the page. The rank is based on several factors:
- the relative number of occurrences of the word on the page
- the rank is multiplied by the priority of the page
- the rank is multiplied by the priority of the website the page is in
- if the word appears in the title, the rank is increased
- if the word appears in the name of the page, the rank is increased
- if there is source code on the page, the rank is increased
- if the page is index.htm or default.htm, the rank is increased
- if the word appears far too often on the page, the rank is decreased
As a result, a huge word-index table is computed. It can then be used for searching.
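The factors above could be combined along the lines of the following sketch. The concrete multipliers are my own invented placeholders; the paper does not publish the actual PKWeb values, only which factors raise or lower the rank.

```pascal
function ComputeRank(Occurrences, TotalWords: Integer;
  PagePriority, SitePriority: Double;
  InTitle, InFileName, HasSourceCode, IsIndexPage, TooFrequent: Boolean): Double;
begin
  // relative number of occurrences of the word on the page
  Result := Occurrences / TotalWords;
  // priorities of the page and of the whole website
  Result := Result * PagePriority * SitePriority;
  // the multipliers below are hypothetical example values
  if InTitle then Result := Result * 2.0;       // word in the page title
  if InFileName then Result := Result * 2.0;    // word in the page name
  if HasSourceCode then Result := Result * 1.5; // source code on the page
  if IsIndexPage then Result := Result * 1.5;   // index.htm or default.htm
  if TooFrequent then Result := Result * 0.5;   // keyword stuffing penalty
end;
```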
When a user searches for something using PKWeb, he submits a search phrase. The search phrase consists of one or more words with optional operators. I like the style of search phrase that AltaVista uses, so I based my own on similar ideas. If the user wants to find only pages that contain a specific word, he puts a "+" sign in front of it. If he wants pages that don't contain a specific word, he puts a "-" sign in front of the word. A word without "+" or "-" is just used to order the pages by rank. If the user wants the search to be case insensitive, he simply writes the whole search phrase in lowercase; otherwise the search is case sensitive.
An example: the user types the search phrase "+delphi +webbroker -asp". PKWeb will find pages about Delphi AND WebBroker that don't mention ASP. The search will be case insensitive.
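Splitting such a phrase into its operator groups is straightforward; a sketch (the procedure and its names are my own illustration, not the PKWeb source):

```pascal
procedure ParsePhrase(const Phrase: string;
  AndWords, NotWords, OrWords: TStringList);
var
  i: Integer;
  Words: TStringList;
begin
  Words := TStringList.Create;
  try
    ExtractStrings([' '], [], PChar(Phrase), Words);  // split on spaces
    for i := 0 to Words.Count - 1 do
      case Words[i][1] of
        '+': AndWords.Add(Copy(Words[i], 2, MaxInt));  // must appear
        '-': NotWords.Add(Copy(Words[i], 2, MaxInt));  // must not appear
      else
        OrWords.Add(Words[i]);                         // only affects rank
      end;
  finally
    Words.Free;
  end;
end;
```

For "+delphi +webbroker -asp" this puts delphi and webbroker into the AND group and asp into the NOT group, matching the example above.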
The actual searching is done by issuing multiple SELECT queries. First I build the lists of words with "+", with "-" and without operators; I call them the AND, NOT and OR groups. If there is any word in the AND group, I get the list of pages for each word in the AND group and combine them, making sure that every resulting page has all the words from the AND group. If the resulting list of pages is not empty, I subtract the pages that contain words from the NOT group, one by one. Then I update the ranking of the resulting pages by checking pages that contain words from the OR group. As a result I get a list of pages that comply with the user's request, ordered by rank.
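The group combination could be sketched as below. All the names are hypothetical (the word_index table, its columns, and the page_id=rank encoding inside a TStringList), and ranks are kept as integer strings purely to keep the sketch short; the point is one SELECT per word, then intersection, subtraction and rank updates done on the client.

```pascal
procedure RunSearch(AndWords, NotWords, OrWords: TStrings;
  Query: TIBQuery; Results: TStringList);
var
  i, p: Integer;
  Current: TStringList;

  procedure FetchWord(const Word: string; Pages: TStringList);
  begin
    Pages.Clear;
    Query.Close;
    Query.ParamByName('word').AsString := Word;
    Query.Open;
    while not Query.Eof do
    begin
      // store 'page_id=rank' pairs
      Pages.Values[Query.FieldByName('page_id').AsString] :=
        Query.FieldByName('rank').AsString;
      Query.Next;
    end;
  end;

begin
  // hypothetical table and column names
  Query.SQL.Text := 'SELECT page_id, rank FROM word_index WHERE word = :word';
  Current := TStringList.Create;
  try
    // AND group: start with the first word, then intersect
    for i := 0 to AndWords.Count - 1 do
    begin
      FetchWord(AndWords[i], Current);
      if i = 0 then
        Results.Assign(Current)
      else
        for p := Results.Count - 1 downto 0 do
          if Current.IndexOfName(Results.Names[p]) < 0 then
            Results.Delete(p);
    end;
    // NOT group: subtract pages one by one
    for i := 0 to NotWords.Count - 1 do
    begin
      FetchWord(NotWords[i], Current);
      for p := Results.Count - 1 downto 0 do
        if Current.IndexOfName(Results.Names[p]) >= 0 then
          Results.Delete(p);
    end;
    // OR group: only raise the rank of pages already found
    for i := 0 to OrWords.Count - 1 do
    begin
      FetchWord(OrWords[i], Current);
      for p := 0 to Results.Count - 1 do
        if Current.IndexOfName(Results.Names[p]) >= 0 then
          Results.Values[Results.Names[p]] := IntToStr(
            StrToInt(Results.Values[Results.Names[p]]) +
            StrToInt(Current.Values[Results.Names[p]]));
    end;
  finally
    Current.Free;
  end;
end;
```

The final step, not shown, is sorting Results by rank before building the HTML page.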
It's not that difficult. However, it wasn't a piece of cake either. Plenty of issues and problems occurred during the work on the search engine. I'll describe here some of the problems I had and the way I dealt with them.
Problem with indexing tables getting too big
Using a word index can speed up query performance significantly. However, the word index itself can get really big. Let's take the HTML page with the paper you are reading as an example: it has about 6000 words. Our www.borland.pl website has more than 400 pages. Most of them are not as big as this one, but I estimate that the average webpage is about 3000 words. So information about this website alone can take much more than 1,000,000 records in the word index. If we want to index tens of good websites, this can become a problem.
To lower the size of the index, I decided to index only words longer than two characters (in this document there are about 1000 words that consist of one or two letters). I also don't index the majority of special characters: I index a-z, A-Z, 0-9, "+", "-", "_" and the Polish national letters. Everything else is not indexed. This allowed me to lower the size of the index significantly.
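Both filters fit into one small function; a sketch of such a check (my illustration, not the PKWeb source):

```pascal
function IsIndexable(const Word: string): Boolean;
var
  i: Integer;
begin
  // only words longer than two characters make it into the index
  Result := Length(Word) > 2;
  if not Result then
    Exit;
  for i := 1 to Length(Word) do
    // a-z, A-Z, 0-9, '+', '-' and '_' are allowed; the Polish national
    // letters would be added to this set as their single-byte characters
    if not (Word[i] in ['a'..'z', 'A'..'Z', '0'..'9', '+', '-', '_']) then
    begin
      Result := False;
      Exit;
    end;
end;
```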
Problem with too slow indexing
My plan is to index all pages on www.borland.pl and other sites valuable to developers, so the indexing engine should be fast. When I first implemented indexing, the tests I performed on my notebook showed that I could index 2-4 pages a minute. That would mean that indexing just our own website would take hours. Considering the plans to index a much bigger number of pages, the performance of the indexing engine was much too slow.
- I started to optimize the engine. First I looked at InterBase, the SQL server I was using. InterBase uses forced writes by default: with every operation on the database, InterBase bypasses the system cache and writes directly to the hard drive. This gives additional safety in case of a system or machine failure, at the expense of performance. Since the indexing process consists mostly of writes to the server, forced writes slow the whole process down significantly. The production server was on a UPS, so the threat of machine failure was low. I decided to turn off forced writes.
- I replaced the BDE with InterBase Express, the set of components that allows native access to InterBase. My tests show that they are much faster than the BDE in the majority of situations.
- I started to control transactions manually, so indexing a whole page is one transaction instead of the many short transactions I used previously.
After those changes I was able to index up to 10 pages per minute, which was still far from satisfactory.
- The last optimization I made was to move the ranking calculation and the preparation of the word index for a page to the client side. I prepare an external text file that holds the word index for one page and present it to InterBase as an external table. InterBase can access external tables almost as if they were standard server tables; they just have to be text files with fixed-length fields. On the server side the data can be copied with a simple SELECT statement. This final optimization gave me a huge boost in performance. Now I'm able to index up to 2 pages per second on my slow testing machine. This means that after moving to the more powerful production system, the indexing process for www.borland.pl takes less than 5 minutes.
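The external-table mechanism could look roughly like this in InterBase SQL. The file path, table and column names, field widths and the literal page id are all hypothetical; only the EXTERNAL FILE syntax and the INSERT ... SELECT copy are the actual InterBase features being described.

```sql
/* hypothetical DDL - the word index for one page is written by the
   client as a text file with fixed-length fields */
CREATE TABLE word_ext EXTERNAL FILE 'c:\pkweb\page_idx.txt' (
  word  CHAR(30),
  rank  CHAR(10),
  crlf  CHAR(2)   /* carriage return + line feed ending each line */
);

/* copying the external data into the real index table in one statement */
INSERT INTO word_index (word, page_id, rank)
  SELECT word, 1234, CAST(rank AS INTEGER) FROM word_ext;
```

Because the whole per-page index arrives in a single INSERT ... SELECT instead of thousands of individual INSERTs, the round trips to the server almost disappear.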
Problem with slow searching
As you have seen in the description of the search process, a user search can invoke many queries on the server: for every word that appears in the search phrase, a SELECT statement is sent to the server. A lot of users performing searches simultaneously could consume a lot of computing power and slow the server down. My experience here is: make sure that the most frequently used queries utilize indexes properly. Create indexes and make sure they are actually used. In InterBase you can check this by running the query in WISQL with the SHOW PLAN option on. With proper usage of indexes, queries on the word index table are pretty fast.
If you look at what people search for on your website, you will probably find that they are looking for roughly the same things. With this in mind, I implemented simple caching in PKWeb. I have a cache table that keeps the current user requests, and I save the results of each request as a ClientDataSet file on the local hard drive. When a new query comes in, I first check whether it is already in the cache, and if so I just show the cached results. From time to time I scan through the cache looking for files that weren't accessed lately and remove them.
My tests show that the caching really speeds things up!
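A minimal sketch of such a file cache, built on TClientDataSet's SaveToFile and LoadFromFile; the directory layout and the PhraseToFileName helper (which would turn a search phrase into a safe file name) are hypothetical:

```pascal
function TryCache(const Phrase: string; CDS: TClientDataSet): Boolean;
var
  CacheFile: string;
begin
  // hypothetical layout: one ClientDataSet file per search phrase
  CacheFile := 'c:\pkweb\cache\' + PhraseToFileName(Phrase) + '.cds';
  Result := FileExists(CacheFile);
  if Result then
    CDS.LoadFromFile(CacheFile);  // serve the stored results
end;

procedure SaveToCache(const Phrase: string; CDS: TClientDataSet);
begin
  // called after a fresh search so the next identical query is free
  CDS.SaveToFile('c:\pkweb\cache\' + PhraseToFileName(Phrase) + '.cds');
end;
```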
Problem with browsing the results
When the user specifies a very general search phrase, the number of results is too high to fit on one page, so there should be a way for the user to browse through the result set. The solution I use is to specify "from" and "to" parameters in the search URL. When the queries for the words return the list of pages, I just jump to the record number specified by "from" and show pages up to the "to" record. Thanks to the caching algorithm, I don't have to worry about keeping the results of the search around: I just search once more, and the results are taken from the cache file.
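With the results in a ClientDataSet, the browsing step could be sketched as below (the parameter names come from the paper; the AddResultRow routine that builds the HTML rows is hypothetical):

```pascal
procedure ShowResultPage(Request: TWebRequest; CDS: TClientDataSet);
var
  FromRec, ToRec: Integer;
begin
  // "from" and "to" are passed in the search URL, e.g.
  // pkweb.dll/search?q=delphi&from=21&to=30
  FromRec := StrToIntDef(Request.QueryFields.Values['from'], 1);
  ToRec   := StrToIntDef(Request.QueryFields.Values['to'], FromRec + 9);
  CDS.First;
  CDS.MoveBy(FromRec - 1);             // jump to the first record to show
  while (not CDS.Eof) and (CDS.RecNo <= ToRec) do
  begin
    AddResultRow(CDS);                 // hypothetical HTML-building routine
    CDS.Next;
  end;
end;
```

The "next page" link on the generated page then simply carries the same phrase with shifted "from" and "to" values.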
Problem with reliability of web application
ISAPI DLLs run in the same process space as the webserver. Therefore a problem in the DLL can affect the stability of the whole webserver. To minimize the threat, many companies have a separate server for running web applications. If that server fails, at least the static pages will still be served normally.
When building a web application you have to watch very carefully for any bugs it may have. Remember that it should run for a long time without any problems, serving even a high number of requests. The answer is to test, test and test some more! Especially in the early stage after deploying the application, watch it carefully for any misbehavior.
In my case, after deploying the application the SQL server crashed every two days. After an upgrade from version 5.5 to version 5.6 the problems stopped.
Problem with debugging of web application
Debugging a web application is different from typical debugging. Delphi has very powerful debugging features for testing standalone executables. It can also debug DLLs, provided you have a host application that can be run to test them. The problem with ISAPI DLLs is that they are not loaded at server startup, so typical DLL debugging techniques are of no use here. In Delphi 5, however, a new feature was added: "Attach to process". You can use it to attach to a different process (even one running on another machine) and place breakpoints in the source code of the modules it uses. I found it to be an excellent and very valuable tool for debugging ISAPI DLLs. Just make sure you compile the DLL before attaching to the server, and set the breakpoint on code that will actually be executed!
Of course there are also more traditional debugging techniques. I catch every exception raised inside my application and save it to my own log file. I also log plenty of debug information: intermediate results, values of important variables and so on. In my configuration files I have three levels of debug information, which allows me to filter logging based on the severity of the message.
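A sketch of such a severity-filtered logger; the log path, the meaning of the levels and the global DebugLevel variable are my own assumptions for illustration:

```pascal
var
  DebugLevel: Integer = 1;  // read from the configuration file

procedure LogDebug(Level: Integer; const Msg: string);
var
  F: TextFile;
begin
  // hypothetical levels: 1 = errors only, 2 = warnings, 3 = everything
  if Level > DebugLevel then
    Exit;
  AssignFile(F, 'c:\pkweb\pkweb.log');  // hypothetical log path
  try
    if FileExists('c:\pkweb\pkweb.log') then
      Append(F)
    else
      Rewrite(F);
    WriteLn(F, FormatDateTime('yyyy-mm-dd hh:nn:ss', Now), ' ', Msg);
  finally
    CloseFile(F);
  end;
end;
```

Exception logging then becomes a matter of wrapping each action handler:

```pascal
try
  // ... the actual work of the action ...
except
  on E: Exception do
    LogDebug(1, 'Exception: ' + E.Message);
end;
```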
I decided to create the web application using Delphi and InterBase. You may ask for the reasons behind this decision. Here are the answers; here is why:
I have been developing applications with Delphi since version one, and I am very productive with this tool. The architecture behind it makes it easy to leverage the skills you have from building database or multimedia applications and apply them to Internet development. Delphi creates very reliable and fast applications. It also has very powerful debugging tools.
It’s just easy with Delphi!
InterBase has very tight integration with Delphi, especially since the introduction of the InterBase Express components in Delphi 5. Applications built with Delphi, InterBase Express and InterBase are very fast and require minimal configuration, installation and maintenance. InterBase is reliable (in version 5.6 we have no problems with it). InterBase has a very small footprint, which is very important for us: we have the web server, FTP server, newsgroup server and InterBase on one machine, and the SQL server shouldn't consume too much of the server's power.
If out-of-the-box solutions are not sufficient for you, or your ISP doesn't offer all the necessary functionality, remember that you can build your own custom solutions. With Delphi and WebBroker technology it is easy. You can easily extend the functionality of your website with questionnaires, polls or even a custom search engine.