I have downloaded a rather large website full of HTML and a few PDF files and stored it on my Raspberry Pi (my always-on Linux toy). It is not huge (a few GB and tens of thousands of files), but waiting for the Midnight Commander content search is rather annoying.
Since they are mostly HTML and PDF files, I thought that a search engine would be nice. My requirements were:
- Must have a CLI: I don’t have a monitor attached and have no desire to run a remote desktop.
- Efficient and small: the Raspberry Pi has something like 512 MB of memory.
A quick googling revealed a few contestants: Sphinx (its CLI is only for debugging purposes, so nope), Lucene and Tracker. Lucene is Java-based, but with quite a small memory footprint (1 MB heap) and the lucli CLI. I kind of regret not choosing it. Anyway, I chose Tracker, a poorly documented search engine with a few issues (mostly the lack of documentation). It is supposed to be:
Designed and built to run well on lower-memory systems with typically 128MB or 256MB memory. Typical RAM usage is 4-6 MB.
apt-get install --no-install-recommends tracker-utils tracker-miner-fs libglib2.0-bin
Everything is in packages, so simply install them. The most important program is tracker-control, which can start the miners, reset them or show you the status of the indexing. You need libglib2.0-bin for the gsettings utility, which lets you change GSettings configuration keys from the command line.
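Before going further, it may be worth checking that the tools named above actually ended up on the PATH. This is a minimal sketch in plain sh; the function name missing_commands is hypothetical, and it only checks whatever command names you pass to it.

```shell
# Hypothetical helper: print every command name (given as arguments)
# that cannot be found on the PATH; return non-zero if any is missing.
missing_commands() {
    status=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            printf '%s\n' "$cmd"
            status=1
        fi
    done
    return "$status"
}
```

For example, `missing_commands tracker-control tracker-search gsettings || echo "install the packages above first"` prints the names of any missing tools.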
If you try to run tracker-control without X11, you get an error:
honza@pina ~ $ tracker-control -s
Starting miners…
Could not start miners, manager could not be created, Command line `dbus-launch --autolaunch=3b0e4b712f60d6b9547b25ae51c194dd --binary-syntax --close-stderr' exited with non-zero exit status 1: Autolaunch error: X11 initialization failed.
Someone else has already run into this problem; the solution is:
eval `dbus-launch --auto-syntax`
This is not pleasant, because you have to run it manually each time you log in before using Tracker, so you should put it into your login scripts.
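A minimal sketch of what could go into a login script such as ~/.profile, assuming an sh-compatible login shell. It only launches a new session bus when DBUS_SESSION_BUS_ADDRESS is not already set (for example, when you are already inside an X session), and it uses `--sh-syntax` instead of `--auto-syntax` because ~/.profile is read by Bourne-style shells. The function name is hypothetical. Note that a bus started this way is not tied to your login session, so it keeps running after you log out.

```shell
# Sketch for ~/.profile (assumes an sh-compatible login shell).
# Start a session D-Bus for Tracker only if this environment has none yet.
start_session_dbus() {
    if [ -n "${DBUS_SESSION_BUS_ADDRESS:-}" ]; then
        return 0    # a session bus already exists (e.g. inside an X session)
    fi
    if ! command -v dbus-launch >/dev/null 2>&1; then
        echo "start_session_dbus: dbus-launch not found" >&2
        return 1
    fi
    # --sh-syntax prints Bourne-shell assignments and exports of
    # DBUS_SESSION_BUS_ADDRESS and DBUS_SESSION_BUS_PID, which eval applies.
    eval "$(dbus-launch --sh-syntax)"
}
```

Call `start_session_dbus` right after the definition in ~/.profile.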
You can see all configuration options of Tracker using
gsettings list-recursively | grep -i org.freedesktop.Tracker | sort | uniq
I was interested in searching only one directory, so I changed the index-recursive-directories key:
gsettings set org.freedesktop.Tracker.Miner.Files index-recursive-directories "['/home/pi/website-mirror']"
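If you later want to index more than one directory, the value has to be a GVariant string array like the one above. A small, hypothetical helper can build it from plain arguments so you do not have to hand-write the brackets and quotes; the escaping assumes the GVariant text format, where single quotes and backslashes inside a string are escaped with a backslash.

```shell
# Hypothetical helper: turn directory arguments into a GVariant string
# array such as ['/a', '/b'], the format gsettings expects for list keys.
to_gvariant_strv() {
    out="["
    sep=""
    for dir in "$@"; do
        # escape backslashes first, then single quotes
        esc=$(printf '%s' "$dir" | sed -e 's/\\/\\\\/g' -e "s/'/\\\\'/g")
        out="$out$sep'$esc'"
        sep=", "
    done
    printf '%s\n' "$out]"
}
```

Usage, with /home/pi/docs as a made-up second directory: `gsettings set org.freedesktop.Tracker.Miner.Files index-recursive-directories "$(to_gvariant_strv /home/pi/website-mirror /home/pi/docs)"`, then `gsettings get org.freedesktop.Tracker.Miner.Files index-recursive-directories` to check the result.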
Starting the miners
You can start the miners using tracker-control
honza@pina ~ $ tracker-control -s
Starting miners…
  ✓ Applications
  ✓ File System
After that, check the progress using the status option:
honza@pina ~ $ tracker-control -S
Store:
27 Jul 2014, 12:44:29:  ✓     Store          - Idle

Miners:
27 Jul 2014, 12:44:31:  ✓     Applications   - Idle
27 Jul 2014, 12:44:33:   32%  File System    - Processing… 01h 03m 32s remaining
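Instead of re-running the status command by hand, one could poll it until the File System miner reports Idle. This is a sketch that simply greps the human-readable output shown above, so it assumes that exact wording (an English locale and this Tracker version). It may also return early if called before the miner has started crawling. The function names and the 60-second interval are arbitrary choices.

```shell
# Succeed when the status text on stdin reports the File System miner as idle.
# Assumption: lines look like "... File System - Idle" as in the output above.
fs_miner_idle() {
    grep -q 'File System *- Idle'
}

# Hypothetical helper: block until indexing has finished.
wait_for_indexing() {
    until tracker-control -S | fs_miner_idle; do
        sleep 60
    done
    echo "File System miner is idle; indexing finished."
}
```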
Once the indexing is done, you can easily search for a term using tracker-search:
honza@pina ~ $ tracker-search ping
Results:
  file:///home/pi/website-mirror/000458.html
  file:///home/pi/website-mirror/005495.html
  file:///home/pi/website-mirror/019534.html
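tracker-search prints file:// URIs, which most command-line tools cannot open directly. A minimal sketch that strips the scheme so the results can be piped into less, grep or xargs; it assumes the one-URI-per-line output shown above and does not decode percent-escapes (a file named "a b.html" would stay "a%20b.html"). Both function names are hypothetical.

```shell
# Read tracker-search output on stdin and print plain file paths,
# skipping the "Results:" header and anything that is not a file:// URI.
uris_to_paths() {
    sed -n 's|^[[:space:]]*file://||p'
}

# Hypothetical wrapper: search and print local paths directly.
search_paths() {
    tracker-search "$@" | uris_to_paths
}
```

For example, `search_paths ping | less` lists the matching files as ordinary paths.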
There is also an Applications miner that should find applications, but with the default settings it is probably limited to GNOME desktop applications rather than programs in /bin.
You can set the logging level either through gsettings (for each component separately) or with tracker-control (for all of them at once). The default level is errors; the possible values are debug, detailed, minimal and errors.
gsettings set org.freedesktop.Tracker.Miner.Files verbosity 'detailed'
# Or for all components at once:
tracker-control --set-log-verbosity=detailed
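Since only the four levels listed above are accepted, a small wrapper can reject typos before calling tracker-control. This is a hypothetical convenience function, not part of Tracker; it only validates the argument and then runs the same tracker-control command as above.

```shell
# Return success only for the verbosity levels Tracker accepts.
is_tracker_verbosity() {
    case "$1" in
        debug|detailed|minimal|errors) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical wrapper: set the log level for all components at once.
set_tracker_verbosity() {
    if ! is_tracker_verbosity "$1"; then
        echo "unknown level '$1' (use debug, detailed, minimal or errors)" >&2
        return 1
    fi
    tracker-control --set-log-verbosity="$1"
}
```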
The logs are stored in $HOME/.local/share/tracker directory.
Someone else has also spent their 5 minutes with Tracker, and my observations are similar:
- The name is horrible: it was hard to google anything with such a generic name. Even Tracker itself gives a warning when you search for common words (“Search term ‘index’ is a stop word. Stop words are common words which may be ignored during the indexing process.”)
- It does not run without X out of the box. Rather annoying.
- Search works, but I would probably choose Lucene next time.