BDb (htlib)
Replaces DB2
Compiled, but not completed yet.
Uses UnicodeString::getBuffer (maybe questionable).
Header only.
Singleton, with the URL rewrite rules.
Compiled.
Done with QuotedStringList.
Next, handle the List argument of checkSyntax, invoked from htsearch.cc: maybe a list<WeightWord>?
No... These are UnicodeStrings, so either a list or a vector of them. Maybe done.
Made result a vector<ResultList> (was a List*).
Let's take the vector, because of operator[].
Now, WordList: done.
So continuing in parser.
Made partial changes for Berkeley Db...
Also in the Makefile, but not in the template.
perform_push:
I don't understand what should happen there.
The role of wildcard is not clear.
The key is not checked!?
I'm just slurping the db...
Not sure where I push to...
In the old code, was p the key or the data?
And what is the data? an index, convertible to an int?
It depends on the db...
I decided to assume that the key is a (now Unicode) string
and that the data in doc_index is (convertible to) an int
(in dbf, it is a WordRecord).
temp is compared to the key.
Not sure what to store as the key, from the unicode string.
What about the value returned from getBuffer?
I have already done that in WordList
and Configuration...
Note that the key is truncated to maximum_word_length
Done, for Parser.
This is a vector rather than a list.
Inherited from Dictionary
(deleted)
map<char const*, DocMatch>
Only const members... May be a problem?
Deleted.
In parser, a vector<ResultList>, since stack::pop doesn't return anything.
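std::stack::pop returns void, so reading the popped value takes a top() followed by a pop(). A vector offers the same through back() and pop_back(), and keeps operator[] available. A minimal sketch of that idea, assuming a placeholder ResultList (the real class is not shown in these notes) and a hypothetical helper named pop_result:

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the real ResultList; only here so the sketch compiles.
struct ResultList {
    std::string label;
};

// Take the top element off a vector used as a stack and return it.
// Precondition: the vector is not empty.
ResultList pop_result(std::vector<ResultList>& stack) {
    assert(!stack.empty());
    ResultList top = std::move(stack.back());
    stack.pop_back();
    return top;
}
```

Pushing stays a plain push_back, and the element below the top can still be inspected with stack[stack.size() - 2] when the parser needs it.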
vector<UnicodeString> with parse/split constructor.
Skips white space and punctuation.
Use through iterators.
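A rough sketch of what such a split constructor could look like. To stay self-contained it uses std::string and std::isalnum instead of UnicodeString and a Unicode-aware test, and the class name WordVector is hypothetical; treat it as an illustration of the shape, not as the actual class.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical stand-in: std::string replaces UnicodeString, and
// std::isalnum replaces a Unicode-aware character test.
class WordVector : public std::vector<std::string> {
public:
    // Split the input into words, skipping white space and punctuation.
    explicit WordVector(const std::string& text) {
        std::string word;
        for (unsigned char c : text) {
            if (std::isalnum(c)) {
                word += static_cast<char>(c);
            } else if (!word.empty()) {
                push_back(word);   // a separator closes the current word
                word.clear();
            }
        }
        if (!word.empty()) push_back(word);  // last word, if any
    }
};
```

Because the class is a vector, callers walk it with begin() and end() (or a range-based for loop) and need no dedicated traversal interface.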
Uses vector in order to have operator[].
Done, for use in htsearch
Done, for htsearch
Built around a map<UnicodeString, WordReference>
For valid_word:
alpha> ./alpha foo
text: foo
alpha1: 1, alpha2: 1
alpha> ./alpha foo2
text: foo2
alpha1: 0, alpha2: 0
alpha> ./alpha foo!
text: foo!
alpha1: 0, alpha2: 0
alpha> ./alpha таня
text: таня
alpha1: 1, alpha2: 1
alpha> ./alpha foo_bar
text: foo_bar
alpha1: 0, alpha2: 0
alpha> ./alpha foo-bar
text: foo-bar
alpha1: 0, alpha2: 0
alpha> ./alpha Épaminondas
text: Épaminondas
alpha1: 1, alpha2: 1
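The program behind alpha1 and alpha2 is not shown here; the output above is consistent with a test that accepts a word only when every character is a letter. The sketch below reproduces that behaviour on the same inputs. It is an assumption, not the tested code: is_letter_sketch is a hand-written, deliberately narrow stand-in for a real Unicode property lookup (with ICU, u_isalpha would be the natural candidate), and both function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Rough stand-in for a Unicode-aware alphabetic test. It only knows
// ASCII letters, Latin-1 letters, and the basic Russian alphabet.
bool is_letter_sketch(char32_t c) {
    if ((c >= U'A' && c <= U'Z') || (c >= U'a' && c <= U'z')) return true;
    if (c >= 0x00C0 && c <= 0x00FF) return c != 0x00D7 && c != 0x00F7;
    if (c >= 0x0410 && c <= 0x044F) return true;
    return c == 0x0401 || c == 0x0451;
}

// A word passes only if it is non-empty and every code point is a letter.
bool all_letters(const std::u32string& word) {
    return !word.empty() &&
           std::all_of(word.begin(), word.end(), is_letter_sketch);
}
```

Digits, underscores, hyphens and punctuation all fail the per-character test, which matches the 0 results for foo2, foo!, foo_bar and foo-bar.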
Done (small doubt about u_fopen_u, which may not support the append mode of fopen, although it is a wrapper, so it should work...).
Otherwise, only use 'r+', and fseek(fl, 0, SEEK_END);
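The fallback can be sketched with plain stdio: open the file for update, seek to the end, then write. This uses fopen rather than u_fopen_u, so whether the same sequence carries over to ICU's file wrapper is an assumption; the function name append_with_seek is hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Append text to a file without relying on the "a" mode:
// open for update, then seek to the end before writing.
// Returns false if the file could not be opened or written.
bool append_with_seek(const char* path, const std::string& text) {
    std::FILE* fl = std::fopen(path, "r+");   // existing file, read/write
    if (!fl) fl = std::fopen(path, "w");      // not there yet: create it
    if (!fl) return false;
    bool ok = std::fseek(fl, 0, SEEK_END) == 0 &&
              std::fputs(text.c_str(), fl) >= 0;
    // Always close, then report any earlier failure as well.
    return std::fclose(fl) == 0 && ok;
}
```

Unlike a true append mode, this does not force later writes to the end of the file, so it only fits when nothing else writes to the file at the same time.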
Record stored in the db.
Used in parser
Not done.
Header only struct.
Done.
Just superficial changes to compile WordList
The issue of valid_punctuation, from htcommon/defaults.cc, is unclear.
Ignored in HtStripPunctuation.
Parser: done.
At least header only.
Probably not needed at all. Not referenced explicitly.
Deleted.
if (mystrncasecmp(word, "exact:", 6) == 0)
{
word += 6;
isExact = 1;
}
becomes:
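// indexOf returns -1 when nothing is found and 0 for a match at the start,
// so compare against -1 explicitly instead of testing the value as a boolean.
while ((pos = str.indexOf(UnicodeString("exact:"))) != -1) {
    str.remove(pos, 6);
    isExact = true;
}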
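Note that the two versions differ in behaviour: the mystrncasecmp test only looks at the start of the word and ignores case, while the indexOf loop is case-sensitive and removes the marker wherever it occurs. If the original semantics are wanted, a prefix-only, case-insensitive check could look like the sketch below. It uses std::string and ASCII-only case folding to stay self-contained, and strip_prefix_nocase is a hypothetical helper name.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// If word starts with prefix (ASCII case-insensitive), remove the prefix
// and return true; otherwise leave word untouched and return false.
bool strip_prefix_nocase(std::string& word, const std::string& prefix) {
    if (word.size() < prefix.size()) return false;
    for (std::size_t i = 0; i < prefix.size(); ++i) {
        unsigned char a = static_cast<unsigned char>(word[i]);
        unsigned char b = static_cast<unsigned char>(prefix[i]);
        if (std::tolower(a) != std::tolower(b)) return false;
    }
    word.erase(0, prefix.size());
    return true;
}
```

A call such as isExact = strip_prefix_nocase(word, "exact:") would then mirror the old if block.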
error << ' ' << boolean_keywords[1] << " '"
<< boolean_keywords[1] << "'";
Added in htcommon/uhelper.h
In htsearch/parser.cc,
not too sure this is a good idea...
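temp.toLower();
// Truncate in UTF-16 code units: a single '\0' byte does not terminate a UChar buffer.
if (temp.length() > maximum_word_length) temp.truncate(maximum_word_length);
key.set_data((void*)temp.getBuffer());
key.set_size(temp.length() * sizeof(UChar));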
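The doubt is justified on two points: the buffer holds 16-bit code units, so a char* view of it is off by a factor of two, and a byte-oriented store needs an explicit size rather than a terminator. The sketch below shows the truncation and the size computation on a std::u16string, used here as a stand-in for UnicodeString; both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Cut a UTF-16 word down to at most max_units code units, without
// leaving half of a surrogate pair at the end.
std::u16string truncate_word(std::u16string word, std::size_t max_units) {
    if (word.size() > max_units) {
        word.resize(max_units);
        if (!word.empty() && word.back() >= 0xD800 && word.back() <= 0xDBFF)
            word.pop_back();  // drop a dangling high surrogate
    }
    return word;
}

// Size in bytes of the buffer, as a byte-oriented store would need it.
std::size_t key_size_in_bytes(const std::u16string& word) {
    return word.size() * sizeof(char16_t);
}
```

The pointer from word.data() and the byte count would then go to the key together, for instance through set_data and set_size on a Berkeley DB Dbt.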
Marc Girod