This week I worked on something that looks very simple:
How should we weight a word in a text for ranking?
At first it seems trivial: give every word the same weight, say 1. But in the real world this method runs into many problems. I'll try to explain each problem, and the methods I found to address it, in a series of posts...
The first problem I faced is that a single text contains different zones. Let's define a zone as a particular slice of text that has a special meaning to us, such as the title, abstract, body, or a header.
OK, if we find a word like "cake" in the title, I'm sure we can rate it a bit higher than if we saw it in the body, and so on. But this raises a new question: how much higher? Should it count half again as much, double, or something else? Should we weight words in an <h1> more than words in an <h2>? These are open questions, and you have to find the best values for your own repository.
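To make the idea concrete, here is a minimal sketch of zone-based weighting in Python. Everything in it is an assumption for illustration: the zone names, the `weight_words` function, and especially the numbers in `ZONE_WEIGHTS`, which are placeholders and not recommended values. Each occurrence of a word simply adds the weight of the zone it appears in.

```python
from collections import Counter

# Hypothetical zone weights. These numbers are placeholders:
# the right values have to be tuned for each repository.
ZONE_WEIGHTS = {"title": 3.0, "h1": 2.0, "h2": 1.5, "body": 1.0}


def weight_words(zones, zone_weights=ZONE_WEIGHTS, default_weight=1.0):
    """Return a dict mapping each word to its summed zone weight.

    `zones` maps a zone name (e.g. "title") to the text of that zone.
    Zones missing from `zone_weights` fall back to `default_weight`.
    """
    scores = Counter()
    for zone, text in zones.items():
        weight = zone_weights.get(zone, default_weight)
        # Naive tokenization: lowercase and split on whitespace.
        for word in text.lower().split():
            scores[word] += weight
    return dict(scores)


doc = {"title": "Chocolate cake", "body": "This cake recipe is a simple cake"}
scores = weight_words(doc)
print(scores["cake"])    # one title hit (3.0) plus two body hits (1.0 each)
print(scores["recipe"])  # one body hit
```

With these placeholder weights, "cake" scores higher than "recipe" both because it occurs more often and because one occurrence is in the title. Changing the numbers in `ZONE_WEIGHTS` is exactly the "how much?" question above.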
In the next post I will explain methods for normalizing word weights in a static repository.
Monday, September 1, 2008