Extracted and built the source tree in ~marc/tmp/htdig-3.1.6
I made minor but systematic changes such as:
htdig-3.1.6> diff htsearch/Display.cc~ htsearch/Display.cc
19c19
< #include <fstream.h>
---
> #include <fstream>
27c27,28
<
---
> #include <iostream>
> using namespace std;
Installed to /opt/www/htdig (config file: /opt/www/htdig/conf/htdig.conf)
Ran it, with the following result (problem not investigated, and apparently not critical!? possibly related to my changes for gcc 4.6.3):
tmp> sudo /opt/www/htdig/bin/rundig
DB2 problem...: PANIC: Invalid argument
Segmentation fault
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Aborted
Result (example):
tmp> ll /opt/www/htdig/db
total 36036
drwxr-xr-x 2 root root 4096 Mar 4 17:22 .
drwxr-xr-x 6 root root 4096 Dec 10 19:07 ..
-rw-r--r-- 1 root root 7359488 Mar 4 17:22 db.docdb
-rw-r--r-- 1 root root 207872 Mar 4 17:22 db.docs.index
-rw-r--r-- 1 root root 13242638 Mar 4 17:22 db.wordlist
-rw-r--r-- 1 root root 16074752 Mar 4 17:22 db.words.db
Ran configure (note: debug not enabled!) with:
htdig> ./configure --with-image-dir=/var/www/htdig --with-search-dir=/var/www/htdig
in order to preserve the CONFIG file produced (and which I checked in).
checking if we should use the included regex?... yes
Attempted to build.
Got errors in aclocal.m4, because of an upgrade of autoconf from
1.13 to 2.69.
Reran aclocal, configure, make.
This worked. Committed the new aclocal.m4 and configure...
Took a backup, and ran install:
htdig> ll /opt/www/htdig/db
total 36036
drwxr-xr-x 2 root root 4096 Mar 4 17:22 .
drwxr-xr-x 6 root root 4096 Dec 10 19:07 ..
-rw-r--r-- 1 root root 7359488 Mar 4 17:22 db.docdb
-rw-r--r-- 1 root root 207872 Mar 4 17:22 db.docs.index
-rw-r--r-- 1 root root 13242638 Mar 4 17:22 db.wordlist
-rw-r--r-- 1 root root 16074752 Mar 4 17:22 db.words.db
htdig> sudo /opt/www/htdig/bin/rundig
DB2 problem...: Unable to allocate 1936618136 bytes from mpool shared region: Cannot allocate memory
DB2 problem...: Unable to allocate 1936618136 bytes from mpool shared region: Cannot allocate memory
...
DB2 problem...: Unable to allocate 1936618136 bytes from mpool shared region: Cannot allocate memory
DB2 problem...: Unable to allocate 1936618136 bytes from mpool shared region: Cannot allocate memory
[5000 lines, interrupted with Ctrl-C]
htdig> ll /opt/www/htdig/db
total 37316
drwxr-xr-x 2 root root 4096 Mar 6 21:25 .
drwxr-xr-x 6 root root 4096 Dec 10 19:07 ..
-rw-r--r-- 1 root root 7361536 Mar 6 21:24 db.docdb
-rw-r--r-- 1 root root 207872 Mar 4 17:22 db.docs.index
-rw-r--r-- 1 root root 11984 Mar 6 21:25 db.log
-rw-r--r-- 1 root root 14537589 Mar 6 21:25 db.wordlist
-rw-r--r-- 1 root root 16074752 Mar 4 17:22 db.words.db
htdig> ./configure --prefix=$HOME/tst --with-image-dir=/var/www/htdig --with-search-dir=/var/www/htdig
...
htdig> make
...
defaults.cc:195:1: warning: deprecated conversion from string constant to ‘char*’ [-Wwrite-strings]
...
htdig> make install
Maybe fixed this warning for this file (in htlib/Confgure.h),
but there are still many occurrences elsewhere.
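The -Wwrite-strings warning comes from binding a string literal to a plain char*. A minimal sketch of the usual fix, assuming a defaults table shaped roughly like the one in defaults.cc; the struct name, the field names and the values are hypothetical and not taken from the htdig sources:

```cpp
#include <cstring>

// Hypothetical stand-in for one entry of a defaults table.
// Declaring the members as const char* lets them point at string
// literals without the deprecated literal-to-char* conversion.
struct ConfigEntry {
    const char* name;
    const char* value;
};

// Illustrative values only.
static const ConfigEntry example_defaults[] = {
    {"match_method", "and"},
    {"locale", "C"},
    {0, 0}  // end-of-table marker
};

// Looks up a value by name; returns 0 when the name is absent.
const char* find_default(const ConfigEntry* table, const char* name) {
    for (; table->name != 0; ++table) {
        if (std::strcmp(table->name, name) == 0)
            return table->value;
    }
    return 0;
}
```

The same change (char* to const char*) has to be repeated wherever a literal is stored, which is probably why so many occurrences remain.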
htdig> ~/tst/bin/rundig
DB2 problem...: /opt/www/htdig/db/db.docdb: Permission denied
htdig: Unable to open/create document database '/opt/www/htdig/db/db.docdb'
htmerge: Unable to open word list file '/home/marc/tst/db/db.wordlist'.
Did you index anything?
Check your config file and try running htdig again.
DB2 problem...: /home/marc/tst/db/db.docdb: No such file or directory
C-c C-c
This is a bug... Trying:
htdig> ./configure --prefix=$HOME/tst --with-image-dir=/var/www/htdig --with-search-dir=$HOME/tst/htdig
The next warning of the same kind is for htfuzzy.cc:84 and String.h.
./contrib/htparsedoc/catdoc.c with Cyrillic KOI-8 encodings!
...
make[1]: Entering directory '/home/marc/git/htdig/htsearch'
transform=s,x,x,
/usr/bin/install -c htsearch /opt/www/cgi-bin/`echo htsearch | sed ''`
/usr/bin/install: cannot remove `/opt/www/cgi-bin/htsearch': Permission denied
Makefile:24: recipe for target 'install' failed
make[1]: *** [install] Error 1
make[1]: Leaving directory '/home/marc/git/htdig/htsearch'
...
Ran:
htdig> /home/marc/tst/bin/rundig
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Aborted
htdig> ll ~/tst/db
total 24
drwxr-xr-x 2 marc marc 4096 Mar 18 18:32 .
drwxr-xr-x 8 marc marc 4096 Mar 18 18:27 ..
-rw-r--r-- 1 marc marc 2048 Mar 18 18:30 db.docdb
-rw-r--r-- 1 marc marc 2048 Mar 18 18:30 db.docs.index
-rw-r--r-- 1 marc marc 297 Mar 18 18:30 db.wordlist
-rw-r--r-- 1 marc marc 2048 Mar 18 18:30 db.words.db
htdig> sudo mkdir /opt/www/cgi-bin/tst
htdig> sudo chown marc /opt/www/cgi-bin/tst
htdig> cp htsearch/htsearch /opt/www/cgi-bin/tst/
Edited the test page so that it uses this search, but this doesn't work.
The server replies that it doesn't find the script.
And from the command line, the script finds no match for simple words.
htdig> ~/tst/bin/htnotify
...
htdig> gdb ~/tst/bin/htnotify
...
(gdb) r
Starting program: /home/marc/tst/bin/htnotify
Traceback (most recent call last):
File "/usr/lib/debug/usr/lib/arm-linux-gnueabihf/libstdc++.so.6.0.19-gdb.py", line 63, in <module>
from libstdcxx.v6.printers import register_libstdcxx_printers
ImportError: No module named libstdcxx.v6.printers
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Program received signal SIGABRT, Aborted.
0xb6d2f8dc in raise () from /lib/arm-linux-gnueabihf/libc.so.6
(gdb) bt
#0 0xb6d2f8dc in raise () from /lib/arm-linux-gnueabihf/libc.so.6
#1 0xb6d3365c in abort () from /lib/arm-linux-gnueabihf/libc.so.6
#2 0xb6eebc0c in ?? () from /usr/lib/arm-linux-gnueabihf/libstdc++.so.6
#3 0xb6eebc0c in ?? () from /usr/lib/arm-linux-gnueabihf/libstdc++.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
It dies within readPreAndPostamble,
and never reaches line 209
Data = new char[Allocated];
(line 584 in String.cc)
(gdb) p Allocated
$1 = 4
(gdb) bt
#0 0x00013b28 in allocate_space (this=0x5c0ec, len=<optimized out>)
at String.cc:584
#1 String::allocate_space (this=0x5c0ec, len=2) at String.cc:570
#2 0x00013d04 in String::append (this=0x5c0ec, ch=<optimized out>)
at String.cc:166
#3 0x000131f4 in operator<< (ch=<optimized out>, this=0x5c0ec)
at htString.h:208
#4 ParsedString::get (this=0x5c0d8, dict=...) at ParsedString.cc:187
#5 0x00010da4 in Configuration::AddParsed (this=0x56c28,
name=0x49330 "locale", value=<optimized out>) at Configuration.cc:189
#6 0x0001160c in Configuration::Defaults (this=0x56c28,
array=<optimized out>) at Configuration.cc:398
#7 0x0000aafc in main (ac=1, av=0xbefffc84) at htnotify.cc:103
Only not this (first) time... Rather:
(gdb) c
Continuing.
Catchpoint 7 (exception caught), __cxa_begin_catch ()
at ../../../../src/libstdc++-v3/libsupc++/eh_catch.cc:41
41 ../../../../src/libstdc++-v3/libsupc++/eh_catch.cc: No such file or directory.
(gdb) bt
#0 __cxa_begin_catch ()
at ../../../../src/libstdc++-v3/libsupc++/eh_catch.cc:41
#1 0xb6f1f324 in __cxa_throw ()
at ../../../../src/libstdc++-v3/libsupc++/eh_throw.cc:86
#2 0xb6f1f96c in operator new(unsigned int) ()
at ../../../../src/libstdc++-v3/libsupc++/new_op.cc:56
#3 0xb6f1fa24 in operator new[](unsigned int) ()
at ../../../../src/libstdc++-v3/libsupc++/new_opv.cc:32
#4 0x00013b28 in allocate_space (this=0x56b94, len=<optimized out>)
at String.cc:584
#5 String::allocate_space (this=0x56b94, len=268435457) at String.cc:570
#6 0x00013cac in String::reallocate_space (this=0x56b94, len=<optimized out>)
at String.cc:614
#7 0x00013d04 in String::append (this=0x56b94, ch=<optimized out>)
at String.cc:166
#8 0x0000b708 in operator<< (ch=10 '\n', this=0x56b94)
at ../htlib/htString.h:208
#9 readPreAndPostamble () at htnotify.cc:202
#10 0x0000abe4 in main (ac=1, av=<optimized out>) at htnotify.cc:139
...
(gdb) up
#4 0x00013b28 in allocate_space (this=0x56b94, len=<optimized out>)
at String.cc:584
584 Data = new char[Allocated];
(gdb) p Allocated
$6 = 536870912
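Both gdb observations (len=2 giving Allocated=4, and len=268435457 giving Allocated=536870912) are consistent with a capacity that starts small and doubles until it covers the requested length. A minimal sketch of such a policy, assuming a starting capacity of 4; the function name is hypothetical and this is not the code of String.cc:

```cpp
#include <cstddef>

// Hypothetical growth policy: start at 4 and double until the
// capacity is at least the requested length.
std::size_t grown_capacity(std::size_t len) {
    const std::size_t half_max = static_cast<std::size_t>(-1) / 2;
    std::size_t allocated = 4;
    while (allocated < len) {
        if (allocated > half_max)
            return len;      // doubling would overflow: give up rounding
        allocated *= 2;
    }
    return allocated;
}
```

Under this assumption, a string that grows just past 2^28 bytes asks operator new[] for 512 MiB in one piece, which would plausibly end in std::bad_alloc on a machine with little memory. It also suggests that the loop in readPreAndPostamble keeps appending far longer than intended.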
""
.
~/tst/bin/rundig
.
/cgi-bin/tst/htsearch
/var/log/apache2/error.log
successively:
[Sun Apr 02 18:14:44 2017] [error] [client 192.168.1.9] script not found or unable to stat: /usr/lib/cgi-bin/tstsearch, referer: http://berry314.dyndns-pics.com/test/
[Sun Apr 02 18:24:37 2017] [error] [client 192.168.1.9] Symbolic link not allowed or link target not accessible: /usr/lib/cgi-bin/tst, referer: http://berry314.dyndns-pics.com/test/
So, fixed, with a copy of the ~marc/git/htsearch/htsearch
file.
htdig> ./htsearch/htsearch -c ~/tst/conf/htdig.conf
Enter value for words: default
Content-type: text/html
Enter value for format: short
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html><head><title>No match for '(defaulted or defaulting or defaulter or defaults)'</title></head>
...
I also checked that the queried word was in the db:
htdig> grep default ~/tst/db/db.wordlist
default i:1 l:42 w:958
It is late, so I am abandoning for now, but logging what I attempted:
htdig> gdb ./htsearch/htsearch
...
(gdb) b 318
...
(gdb) run -c ~/tst/conf/htdig.conf
...
Enter value for words: default
Breakpoint 1, main (ac=<optimized out>, av=<optimized out>) at htsearch.cc:318
318 ResultList *results = htsearch(word_db, searchWords, parser);
htdig> sudo apt-get update
...
Reading package lists... Done
N: Ignoring file 'collabora.list.jessie' in directory '/etc/apt/sources.list.d/' as it has an invalid filename extension
N: Ignoring file 'raspi.list.wheezy' in directory '/etc/apt/sources.list.d/' as it has an invalid filename extension
W: Ignoring Provides line with DepCompareOp for package pypy-cffi
W: Ignoring Provides line with DepCompareOp for package pypy-cffi-backend-api-max
W: Ignoring Provides line with DepCompareOp for package pypy-cffi-backend-api-min
W: You may want to run apt-get update to correct these problems
htdig> sudo apt-get upgrade
...
Ahum... Started to read a bit late
https://www.raspberrypi.org/forums/viewtopic.php?f=66&t=121880
~> rm ph.tgz
~> du -sh .
1.4G .
~> tar zfc /tmp/marc.tgz .
~> cd /var/www
www> sudo rm -rf tmfish.bak
www> sudo du -sh .
3.8M .
www> sudo tar zfc /tmp/www.tgz .
www> cd ~tanya
tanya> sudo tar zfc /tmp/tanya.tgz .
Uploaded to Google drive.
apt> sudo adduser --disabled-password --disabled-login pi
And then ...5
apt> sudo apt-get dist-upgrade
...
Configuring wicd-daemon
-----------------------
Users who should be able to run wicd clients need to be added to the group
"netdev".
1. marc 2. pi 3. Sergey 4. tanya
(Enter the items you want to select, separated by spaces.)
Users to add to the netdev group: marc pi tanya
...
Installing new version of config file /etc/init.d/procps ...
Configuration file '/etc/sysctl.conf'
==> Modified (by you or by a script) since installation.
==> Package distributor has shipped an updated version.
What would you like to do about it ? Your options are:
Y or I : install the package maintainer's version
N or O : keep your currently-installed version
D : show the differences between the versions
Z : start a shell to examine the situation
The default action is to keep your current version.
*** sysctl.conf (Y/I/N/O/D/Z) [default=N] ? y
...
Configuration file '/etc/login.defs'
==> Modified (by you or by a script) since installation.
==> Package distributor has shipped an updated version.
What would you like to do about it ? Your options are:
Y or I : install the package maintainer's version
N or O : keep your currently-installed version
D : show the differences between the versions
Z : start a shell to examine the situation
The default action is to keep your current version.
*** login.defs (Y/I/N/O/D/Z) [default=N] ? y
...
Configuration file '/etc/dphys-swapfile'
==> File on system created by you or by a script.
==> File also in package provided by package maintainer.
What would you like to do about it ? Your options are:
Y or I : install the package maintainer's version
N or O : keep your currently-installed version
D : show the differences between the versions
Z : start a shell to examine the situation
The default action is to keep your current version.
*** dphys-swapfile (Y/I/N/O/D/Z) [default=N] ? y
...
apt> sudo shutdown -r now
One fsck and a reconfig of default DocumentRoot as /var/www/html
later...
318 ResultList *results = htsearch(word_db, searchWords, parser);
(gdb) x searchWords
Value can't be converted to integer.
(gdb) x/1s searchWords
Value can't be converted to integer.
(gdb) x/1s *searchWords
No symbol "operator*" in current context.
(gdb) whatis searchWords
type = List
(gdb) x/1s searchWords.current
0xa71b8: ""
(gdb) x/1s searchWords.head
0xa6f80: "Xo\n"
(gdb) x/1s searchWords.tail
0xa71b8: ""
(gdb) whatis word_db
type = String
(gdb) x/1s word_db.Data
0xa6248: "/home/marc/tst/db/db.words.db"
htdig> strings /home/marc/tst/db/db.words.db | wc -l
20
htdig> strings /home/marc/tst/db/db.words.db | grep -C 2 default
long
format
default
content
boolean
htdig> gdb ./htsearch/htsearch
(gdb) b 275
(gdb) run -c ~/tst/conf/htdig.conf
Starting program: /home/marc/git/htdig/htsearch/htsearch -c ~/tst/conf/htdig.conf
Enter value for words: default
Breakpoint 1, main (ac=<optimized out>, av=<optimized out>) at htsearch.cc:276
276 strcmp(config["match_method"], "boolean") == 0,
(gdb) x /1s originalWords.Data
0x79220: "default"
...
(gdb) n
288 origPattern += logicalPattern;
(gdb) x /1s logicalPattern.Data
0xa5e78: "defaulted|defaulting|defaulter|defaults"
(gdb) x /1s logicalWords.Data
0xa6478: "(defaulted or defaulting or defaulter or defaults)"
I set debug to 2 in htsearch.cc, and rebuilt.
htdig> ./htsearch/htsearch -c ~/tst/conf/htdig.conf
Enter value for words: default
tempWords: 'default:0 '
Boolean: 'default:0 '
initial: ''
Fuzzy on: default
exact
synonyms
endings defaulted defaulting defaulter defaults
searchWords: '(:0 defaulted:0 |:0 defaulting:0 |:0 defaulter:0 |:0 defaults:0 ):0 '
LogicalWords: (defaulted or defaulting or defaulter or defaults)
Pattern: defaulted|defaulting|defaulter|defaults
Content-type: text/html
Enter value for format: short
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html><head><title>No match for '(defaulted or defaulting or defaulter or defaults)'</title></head>
<body bgcolor="#eef7ff">
...
BTW: I updated IMAGE_DIR in CONFIG to /var/www/html/htdig
htdig> make CXXFLAGS="-g -O0"
This seems to recompile only the top binaries...
(gdb) x/1s searchWords.head.object
0xb9050: "x\256\006"
(gdb) x/1s searchWords.head.next.object
0xb8eb0: "x\256\006"
(gdb) x/1s searchWords.head.next.next.object
0xb9088: "x\256\006"
(gdb) x/1s searchWords.tail.object
0xb9170: "x\256\006"
(gdb) x/1s searchWords.current.object
0xb9170: "x\256\006"
I reset both the debug value and CXXFLAGS...
In file included from regex.c:215:0:
../htlib/gregex.h:530:0: warning: "__restrict_arr" redefined
#define __restrict_arr
^
In file included from /usr/include/features.h:374:0,
from /usr/include/arm-linux-gnueabihf/sys/types.h:25,
from regex.c:46:
/usr/include/arm-linux-gnueabihf/sys/cdefs.h:363:0: note: this is the location of the previous definition
# define __restrict_arr __restrict
^
...
SGMLEntities.cc: In member function ‘void SGMLEntities::init()’:
SGMLEntities.cc:178:56: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
trans->Add(entities[i].entity, (Object *) entities[i].equiv);
^
...
words.cc: In function ‘void mergeWords(const char*, const char*)’:
words.cc:112:10: warning: deprecated conversion from string constant to ‘char*’ [-Wwrite-strings]
sid = "-";
^
...
git> git clone htdig hdmst
git> cd hdmst/
hdmst> git checkout master
hdmst> git cherry-pick dbg
error: could not apply 2f9112d... RootDirectory changed from apache 2.2 to 2.4
hint: after resolving the conflicts, mark the corrected paths
hint: with 'git add <paths>' or 'git rm <paths>'
hint: and commit the result with 'git commit'
hdmst> git status
On branch master
Your branch is up-to-date with 'origin/master'.
You are currently cherry-picking commit 2f9112d.
(fix conflicts and run "git cherry-pick --continue")
(use "git cherry-pick --abort" to cancel the cherry-pick operation)
Unmerged paths:
(use "git add <file>..." to mark resolution)
both modified: CONFIG
no changes added to commit (use "git add" and/or "git commit -a")
hdmst> git reset --merge dbg
hdmst> git status
On branch master
Your branch is ahead of 'origin/master' by 7 commits.
(use "git push" to publish your local commits)
nothing to commit, working directory clean
hdmst> git cherry-pick --abort
error: no cherry-pick or revert in progress
fatal: cherry-pick failed
hdmst> git checkout master
Already on 'master'
Your branch is ahead of 'origin/master' by 7 commits.
(use "git push" to publish your local commits)
hdmst> git status
On branch master
Your branch is ahead of 'origin/master' by 7 commits.
(use "git push" to publish your local commits)
nothing to commit, working directory clean
hdmst> make
make: *** No targets specified and no makefile found. Stop.
hdmst> git checkout dev
Branch dev set up to track remote branch dev from origin.
Switched to a new branch 'dev'
hdmst> git branch
dbg
* dev
master
hdmst> git status
On branch dev
Your branch is up-to-date with 'origin/dev'.
nothing to commit, working directory clean
hdmst> ll CONFIG
-rw-r--r-- 1 marc marc 1963 May 21 10:04 CONFIG
hdmst> grep IMAGE_DIR CONFIG
# IMAGE_DIR
IMAGE_DIR= /var/www/htdig
# This is the URL to prefix the images placed in IMAGE_DIR.
hdmst> git cherry-pick dbg
[dev fda0985] RootDirectory changed from apache 2.2 to 2.4
Date: Sun May 21 09:44:13 2017 +0000
1 file changed, 1 insertion(+), 1 deletion(-)
hdmst> cd ..
git> mv hdmst hddev
git> cd hddev
hddev> ./configure
hddev> make CXXFLAGS="-g -O0"
hddev> ./htsearch/htsearch -c ~/tst/conf/htdig.conf
Enter value for words: default
Content-type: text/html
Enter value for format: short
...
<strong>Documents 1 - 1 of 1 matches.
...
OK, so... I can use this version of htsearch with my test db, and it works...
searchWords is still as opaque:
(gdb) x/1s searchWords.current.object
0xc7230: "x\212\a"
(gdb) x/1sw searchWords.current.object
0xc7230: U"\x78a78\x7b4b0\001\004\xc7268"
(gdb) x/1sw searchWords.head.object
0xc70a8: U"\x78a78\x7b4b0\001\004\xc6fe8"
(gdb) x/1sw searchWords.head.next.object
0xc6ab8: U"\x78a78\x7b4b0\a\b\xc6af0"
(gdb) x/1s ((String)searchWords.head.object).Data
0xc6fb0: "xo\f"
(gdb) x/1sw ((String)searchWords.head.object).Data
0xc6fb0: U"\xc6f78\xc6fd8\xc6ab8\031\xc6f80\b\t\xc6fd8"
Updated the search database from the installed version,
with new errors (due to the new data, I hope):
hddev> sudo /opt/www/htdig/bin/rundig
DB2 problem...: PANIC: Invalid argument
Segmentation fault
BAD TAG IN SERIALIZED DATA: 108
BAD TAG IN SERIALIZED DATA: 111
DB2 problem...: missing or empty key value specified
DB2 problem...: missing or empty key value specified
DB2 problem...: missing or empty key value specified
DB2 problem...: missing or empty key value specified
BAD TAG IN SERIALIZED DATA: 108
BAD TAG IN SERIALIZED DATA: 111
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Aborted
In order to compare the traces of the two versions, I made clean and
rebuilt with -g -O0 in htdig as well.
Otherwise, I keep getting: value has been optimized out
I stop there, but I believe I have 0 elements on line 462 in parser.cc:
462 for (int i = 0; i < elements->Count(); i++)
(gdb) x elements->number
0x0: Cannot access memory at address 0x0
...whereas in hddev I have 1:
0x1: Cannot access memory at address 0x1
(gdb) handle SIGALRM ignore
...
564 name = strtok((char*)algs[i], ":");
(gdb)
565 weight = strtok(0, ":");
(gdb) p name
$24 = {<Object> = {_vptr.Object = 0x7b518 <vtable for String+8>}, Length = 5,
Allocated = 6, Data = 0xc5db8 "exact"}
What I fail to do with tempWords[0] or words[0]...
Printing the value of the Data member (offset 8 in the String
structure).
(gdb) tb 581
Temporary breakpoint 5 at 0x1eaa8: file htsearch.cc, line 581.
(gdb) c
...
283 createLogicalWords(searchWords, logicalWords, logicalPattern);
(gdb) p searchWords
$30 = {<Object> = {_vptr.Object = 0x7b3b0 <vtable for List+8>},
head = 0xc6f80, tail = 0xc71b8, current = 0x0, current_index = -1,
number = 9}
This was in dbg; in the dev branch, the number of searchWords is 11...
(gdb) p originalWords
$32 = {<Object> = {_vptr.Object = 0x7b518 <vtable for String+8>}, Length = 7,
Allocated = 8, Data = 0x99220 "default"}
The difference between the numbers of searchWords is a consequence of a
difference in the number of weightWords in doFuzzy: 4 vs 5. One then
adds one parenthesis before and after, as well as one '|' between them
(4 words + 3 '|' + 2 parentheses = 9, against 5 + 4 + 2 = 11).
fuzzyWords.Get_Next(), invoked on line 636,
or even on the previous line, in fuzzyWords.Start_Get().
Indeed... current is not initialized!
fuzzy->getWords(ww->word, fuzzyWords);
getWords which gets invoked...
Fuzzy::getWords
Exact::getWords
const differences...
Obviously one overloading failed to match the intended signature!
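A minimal sketch of how a const difference makes a derived function hide the base one instead of overriding it. The class names and signatures are hypothetical stand-ins for the Fuzzy/Exact pair; the real htdig declarations may differ, and I am only assuming the mismatch was in a parameter type:

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct FuzzyBase {
    virtual ~FuzzyBase() {}
    // Base version: produces nothing.
    virtual void getWords(char* word, std::vector<std::string>& out) {
        (void)word;
        (void)out;
    }
};

struct ExactHidden : FuzzyBase {
    // const char* differs from char*: this declares a new function
    // that hides the base one instead of overriding it.
    virtual void getWords(const char* word, std::vector<std::string>& out) {
        out.push_back(word);
    }
};

struct ExactFixed : FuzzyBase {
    // Same parameter types as the base: a real override.
    virtual void getWords(char* word, std::vector<std::string>& out) {
        out.push_back(word);
    }
};

// Calls through the base interface, as a generic caller would.
std::size_t words_via_base(FuzzyBase& f) {
    char word[] = "default";
    std::vector<std::string> out;
    f.getWords(word, out);
    return out.size();
}
```

Through the base interface, ExactHidden silently yields no word while ExactFixed yields one. With a C++11 compiler, marking the derived function override would turn the mismatch into a compile error.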
make, i.e. it used -g -O2 (not -g -O0, meaning that debugging will be inconsistent).
~/.config/git/ignore and ~/git/htdig/.git/info/exclude, maybe not 100% correct.
htdig> git tag -a -m 'Hopefully working and const correct' const
Note: the 3.1.6 tag was not annotated...
htdig> /home/marc/tst/bin/rundig
without error or warning... producing a db.wordlist
from which the unicode characters are stripped away.
The status was not the one recorded: looking for default,
one does now find the root page (It works). Updated.
Started to play with replacing String with string, first for
configFile in htdig.cc and htsearch.cc.
One annoying issue is that the String class supports a family of
operator<< members, which are extensively used to append stuff
to strings... Although this works: (foo += '/') += bar;
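A minimal sketch of the two chaining styles available on std::string as a replacement for chained operator<< on String. The helper names are hypothetical:

```cpp
#include <string>

// Joins two path components with '/', using only std::string members.
std::string join_path(const std::string& dir, const std::string& file) {
    std::string result(dir);
    // operator+= returns a reference to the string, so it can be
    // chained once the first step is parenthesized.
    (result += '/') += file;
    return result;
}

// Same result with append(), which chains without parentheses.
std::string join_path_append(const std::string& dir, const std::string& file) {
    std::string result(dir);
    result.append(1, '/').append(file);
    return result;
}
```

The append() form reads closer to the original foo << '/' << bar, at the cost of spelling out the count for single characters.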
Only scratched the surface... Building with make CXXFLAGS="-g -O0"
Added a join2s member in StringList, used only from htsearch.cc.
I'm afraid I didn't properly test the changes so far. Although:
(gdb) run restrict=foo+bar;words=default;format=builtin-short
...
225 urllist.Release(); // release the temporary list of URLs
(gdb) p urlpat
$15 = "foo|bar"
Committed, installed, ran rundig, and tested that I didn't break it
yet.
strlist class, for use as a replacement for StringList.
htsearch.cc, for urlList.
create function doesn't work correctly. It yields:
(gdb) p word
$19 = "âme\242memee"
The reason is that the original input:
strlist::create (this=0xbefffa34, str=0x9a918 "âme",
sep=0x79f68 "| \t\r\n\001") at strlist.cc:25
is appended several times, every time removing the initial character.
(gdb) p word
$2 = "defaultefaultfaultaultultltt"
append(str) which appends the full str (word) instead of one char,
as in the original code: fixed.
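A minimal sketch reproducing the symptom, where the whole remaining suffix is appended at each step instead of a single character. The function names are hypothetical and this is not the code of strlist.cc:

```cpp
#include <string>

// Reproduces the symptom: at each step the whole remaining suffix
// is appended instead of a single character.
std::string accumulate_buggy(const char* str) {
    std::string word;
    for (const char* p = str; *p != '\0'; ++p)
        word.append(p);      // appends from p up to the terminating NUL
    return word;
}

// Intended behaviour: append exactly one character per step.
std::string accumulate_fixed(const char* str) {
    std::string word;
    for (const char* p = str; *p != '\0'; ++p)
        word.append(1, *p);  // count = 1, character = *p
    return word;
}
```

With "default" as input, the buggy variant builds "default" + "efault" + "fault" + ... + "t", which is the string seen in gdb.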
join doesn't work:
there is a copy of my aux function object which doesn't preserve its contents:
fixed.
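One common way to lose the state of a function object is to pass it by value to std::for_each and then read the original. Assuming that is what happened here, a minimal sketch of a join that keeps the accumulated state; the Joiner type is hypothetical:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical accumulator: concatenates items with a separator.
struct Joiner {
    std::string sep;
    std::string result;
    bool first;
    explicit Joiner(const std::string& s) : sep(s), first(true) {}
    void operator()(const std::string& item) {
        if (!first)
            result += sep;
        result += item;
        first = false;
    }
};

std::string join(const std::vector<std::string>& items, const std::string& sep) {
    // for_each takes the functor by value; the accumulated state lives
    // in the copy it returns, not in the object passed in.
    Joiner joined = std::for_each(items.begin(), items.end(), Joiner(sep));
    return joined.result;
}
```

Joining "foo" and "bar" with "|" gives the same kind of pattern as the urlpat value printed earlier.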
char will not support utf-8 characters...
string is typedef'ed to basic_string<char> in stringfwd.h.
There is as well typedef basic_string<wchar_t> wstring.
wchar_t for unicode.
But then, I cannot use wstring either.
They favour (March 2000) utf-16... although utf-8 is probably closer to my needs (?)
wstrlist...
Not so easy: there is no implicit conversion from wchar_t to char...
c_str() returns then wchar_t...
strlist in all the binaries (even if only for char), first to read the configuration.
I.e. after htsearch: htdig, htmerge, htnotify, htfuzzy (er... htdump, htload?).
urllist so far...).
form_vars: need to provide non default constructors,
and to make the default ones explicit... (not done yet for copy ctor)
Count -> size and needs operator[]? No: iterator.
StringList form_vars(config["allow_in_form"], " \t\r\n");
for (i = 0; i < form_vars.Count(); i++)
{
    if (input.exists(form_vars[i]))
        config.Add(form_vars[i], input[form_vars[i]]);
}
into:
strlist form_vars(config["allow_in_form"], " \t\r\n");
for (strlist::const_iterator it = form_vars.begin(); it != form_vars.end(); ++it) {
    if (input.exists(it->c_str()))
        config.Add(it->c_str(), input[it->c_str()]);
}
allow_in_form: search_algorithm search_results_header
Problem: this is only intermediate, as the other classes, List, WeightWord, etc...
still require char*, resulting in superfluous calls to c_str().
StringList from htsearch...
htdig> /home/marc/tst/bin/rundig
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Aborted
Of the main binaries, only htdig and htsearch were relinked today —
the other ones on Aug 23 (no log?). htsearch doesn't crash under gdb...
The 4 db files were recreated at 16:17 utc — the wordlist is as
usual.
Program received signal SIGABRT, Aborted.
0xb6ce5f70 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 0xb6ce5f70 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0xb6ce7324 in __GI_abort () at abort.c:89
#2 0xb6eedb5c in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/arm-linux-gnueabihf/libstdc++.so.6
#3 0xb6eeb9a0 in ?? () from /usr/lib/arm-linux-gnueabihf/libstdc++.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
OK... Maybe the problem is not in htnotify... Update/upgrade/dist-upgrade... Only: failed.
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 1397BC53640DB551
...but on berry (Raspbian) the issue persisted.
I tried to reset /etc/apt/sources.list, but this failed,
then touched /etc/resolv.conf,
and found I was competing against an fsync process!?
Then I tried to reboot, and failed: disk corruption!
I fixed it eventually by adding fsck.repair=yes to /boot/cmdline.txt
(as e2fsck must be run on an unmounted fs,
yet e2fsck itself sits on the root disk /dev/mmcblk0p6,
with all the shared libraries it uses...).
Another option would have been to boot from a usb disk, but this has
to be enabled in advance and is still experimental (Jessie).
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Program received signal SIGABRT, Aborted.
0xb6ce7f70 in __GI_raise (sig=sig@entry=6)
at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) Not stopped at any breakpoint; argument ignored.
I do a make clean, and rebuild...
At least three warnings:
make[1]: Entering directory '/home/marc/git/htdig/htlib'
...
In file included from regex.c:215:0:
../htlib/gregex.h:530:0: warning: "__restrict_arr" redefined
#define __restrict_arr
^
In file included from /usr/include/features.h:374:0,
from /usr/include/arm-linux-gnueabihf/sys/types.h:25,
from regex.c:46:
/usr/include/arm-linux-gnueabihf/sys/cdefs.h:363:0: note: this is the location of the previous definition
# define __restrict_arr __restrict
^
...
make[1]: Entering directory '/home/marc/git/htdig/htdig'
...
SGMLEntities.cc: In member function ‘void SGMLEntities::init()’:
SGMLEntities.cc:178:56: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
trans->Add(entities[i].entity, (Object *) entities[i].equiv);
^
...
make[1]: Entering directory '/home/marc/git/htdig/htmerge'
...
words.cc: In function ‘void mergeWords(const char*, const char*)’:
words.cc:112:10: warning: deprecated conversion from string constant to ‘char*’ [-Wwrite-strings]
sid = "-";
^
The problem is in readPreAndPostamble(), in htnotify.cc:139
string.
QuotedStringList...
I'll keep its name.
lowercase as a member of String?
Especially when the C++ tolower depends on the locale.
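A minimal sketch of a free-standing lowercase function built on the locale-aware std::tolower from <locale>. The function name is hypothetical. With the classic "C" locale only ASCII letters are mapped, so the bytes of a UTF-8 encoded character such as É pass through unchanged:

```cpp
#include <locale>
#include <string>

// Lowercases byte by byte using the ctype facet of the given locale.
// This is not Unicode-aware: multi-byte UTF-8 sequences are left as
// they are when the classic locale is used.
std::string lowercase_bytes(const std::string& in,
                            const std::locale& loc = std::locale::classic()) {
    std::string out(in);
    for (std::string::size_type i = 0; i < out.size(); ++i)
        out[i] = std::tolower(out[i], loc);
    return out;
}
```

Keeping it outside the string class makes the locale dependency an explicit parameter instead of a hidden one.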
ExternalParser.cc, which consumes QuotedStringLists, wants to put their
items into Dictionary...
string and wstring...
tests> ./lc AAÉ
aaÉ
tests> ./wlc A
Segmentation fault
With the default (char based) string,
the different characters in the string have different sizes:
(gdb) p s
$1 = "AAÉ"
(gdb) p s.substr(0,2)
$2 = "AA"
(gdb) p s.substr(0,3)
$3 = "AA", <incomplete sequence \303>
(gdb) p s.substr(0,4)
$4 = "AAÉ"
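The substr experiment shows that byte offsets and characters diverge as soon as a multi-byte UTF-8 sequence appears. A minimal sketch of counting and truncating by code point on a plain std::string, assuming the input is valid UTF-8; the helper names are hypothetical:

```cpp
#include <cstddef>
#include <string>

// True for UTF-8 continuation bytes (bit pattern 10xxxxxx).
static bool is_continuation(unsigned char c) {
    return (c & 0xC0) == 0x80;
}

// Number of code points, assuming the input is valid UTF-8.
std::size_t utf8_length(const std::string& s) {
    std::size_t n = 0;
    for (std::string::size_type i = 0; i < s.size(); ++i)
        if (!is_continuation(static_cast<unsigned char>(s[i])))
            ++n;
    return n;
}

// First `count` code points, never cutting inside a sequence.
std::string utf8_prefix(const std::string& s, std::size_t count) {
    std::string::size_type i = 0;
    std::size_t seen = 0;
    while (i < s.size()) {
        if (!is_continuation(static_cast<unsigned char>(s[i]))) {
            if (seen == count)
                break;       // next code point would exceed the limit
            ++seen;
        }
        ++i;
    }
    return s.substr(0, i);
}
```

This only handles boundaries; it does nothing for case mapping or normalization, which is where a library such as icu comes in.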
I don't understand how to construct my wide strings for the input.
I get bad cast errors.
Tried also:
tests> c++ -std=c++11 -g -O0 -o u16lc u16lc.cc
tests> c++ -o wlc wlc.cc -g -O0
tests> ./wlc fooÉ
fooÃ: fooÃ
tests> c++ -o u16lc u16lc.cc -g -O0 -std=c++11
tests> ./u16lc fooÉ
terminate called after throwing an instance of 'std::bad_cast'
what(): std::bad_cast
Aborted
tests> locale -a
C
C.UTF-8
en_US.utf8
POSIX
This lowercase function is a simple and interesting
starting point. Now that I have experienced a failure to implement it with
the plain std C++ library, I'll try to see what icu has to offer.
strlist to use UnicodeString instead of string
hddev...?
git> git clone -l --no-hardlinks htdig bkp
Deleted htlib/(htString.h,String.cc,StringList.cc,StringList.h),
as well as wstrlist.{h,cc} (added to dbg).
htnotify.cc
...
Many changes to header files for strings and lists,
leaving the source files inconsistent.
using icu::UnicodeString;
Hopefully only because of the unicode data in the pages...
public_html> sudo /opt/www/htdig/bin/rundig
DB2 problem...: PANIC: Invalid argument
Segmentation fault
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Aborted
HtRegex with icu::RegexPattern,
and HtRegexReplace with icu::RegexMatcher.
Not removing them from git yet (some changes there that I'd have to revert...):
htsearch> mkdir away
htsearch> mv ../htlib/HtRegex* away/
htsearch> ll away
total 32
drwxr-xr-x 2 marc marc 4096 Nov 5 15:30 .
drwxr-xr-x 3 marc marc 4096 Nov 5 15:30 ..
-rw-r--r-- 1 marc marc 2341 Jan 31 2002 HtRegex.cc
-rw-r--r-- 1 marc marc 1093 Nov 5 10:57 HtRegex.h
-rw-r--r-- 1 marc marc 3671 Jan 31 2002 HtRegexReplace.cc
-rw-r--r-- 1 marc marc 1294 Jan 31 2002 HtRegexReplace.h
-rw-r--r-- 1 marc marc 1776 Mar 5 2017 HtRegexReplaceList.cc
-rw-r--r-- 1 marc marc 563 Nov 5 11:01 HtRegexReplaceList.h
Dictionary with a multimap,
but this doesn't mean I need to do the same everywhere...
There could be several notifications for the same key.
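A minimal sketch of why a multimap fits when several notifications share a key. The typedef, the helper name and the idea that keys and values are plain strings are assumptions for illustration:

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

typedef std::multimap<std::string, std::string> Notifications;

// Collects every value stored under `key`. Entries with equal keys
// keep their insertion order (guaranteed since C++11).
std::vector<std::string> all_for(const Notifications& n, const std::string& key) {
    std::vector<std::string> found;
    std::pair<Notifications::const_iterator, Notifications::const_iterator>
        range = n.equal_range(key);
    for (Notifications::const_iterator it = range.first; it != range.second; ++it)
        found.push_back(it->second);
    return found;
}
```

Where a key can only appear once, a plain std::map stays simpler, which is why the replacement need not be the same everywhere.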
HtURLRewriter, replacing the HtRegexReplaceList data member
(renamed to repl) with a map<UnicodeString, RegexMatcher>.
RegexMatcher already does that.
as seems to be the case, with its replaceAll member function...
Apart from the fact that this is a singleton...
What I miss are the arguments I would expect: match, and replacement.
HtURLRewriter is used from htsearch.cc and htdig.cc, but without arguments!
htcommon/defaults.cc, where both defaults and config are defined.
Now config is initialized in htsearch.cc, by calling Configuration::Defaults.
search_rewrite_rules (in /opt/www/htdig/conf/htdig.conf), but this is the same thing!
bkp, found in htsearch.cc:253:
config.AddParsed("url_rewrite_rules", "${search_rewrite_rules}");
Since I had in htdig.conf:180:
search_rewrite_rules: http://(berry314)/(.*) http://\\1.dyndns-pics.com/\\2
I see that something happens in Configuration::AddParsed (184)
applying to the dict member of config.
Parser::parse which returns a ResultList (specialized from Dictionary).
URL::rewrite:
HtURLRewriter::instance()->Replace(_url);
So... the urls get replaced one by one;
only the 'match' is passed as argument, when the 'replace' value
is already recorded in the HtURLRewriter singleton.
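A minimal sketch of that shape: the rules are recorded once, then each URL is passed in on its own. It uses std::regex (C++11) as a stand-in, not icu::RegexMatcher and not the actual HtURLRewriter; the class and function names are hypothetical, and I am assuming the doubled backslashes in htdig.conf are config-file escaping for \1 and \2:

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical rewriter: holds (pattern, replacement) pairs.
class UrlRewriter {
public:
    void add_rule(const std::string& pattern, const std::string& replacement) {
        rules_.push_back(std::make_pair(std::regex(pattern), replacement));
    }

    // Applies every rule in order; format_sed makes \1, \2 in the
    // replacement refer to the captured groups.
    std::string replace(const std::string& url) const {
        std::string result(url);
        for (std::size_t i = 0; i < rules_.size(); ++i)
            result = std::regex_replace(result, rules_[i].first,
                                        rules_[i].second,
                                        std::regex_constants::format_sed);
        return result;
    }

private:
    std::vector<std::pair<std::regex, std::string> > rules_;
};

// The rule from htdig.conf, with the config-file escaping removed.
std::string rewrite_example(const std::string& url) {
    UrlRewriter rewriter;
    rewriter.add_rule("http://(berry314)/(.*)",
                      "http://\\1.dyndns-pics.com/\\2");
    return rewriter.replace(url);
}
```

URLs that match no rule come back unchanged, so the caller only ever needs to supply the URL itself.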
icu branch, and to HtURLRewriter.
In fact, icu::RegexMatcher doesn't quite fit the use from URL.
In addition, the urls will not contain unicode characters (?)
icu::RegexMatcher with the ugrep example.