Tracker – fulltext search from CLI

I have downloaded rather large site full of HTML and few PDF files and stored it on my Raspberry Pi (my constantly running linux toy). It is not too large (few GB and tens of thousands of files), but it is rather annoying to wait for MidnightCommander content search.

Since they are mostly HTML and PDF files, I thought that a search engine would be nice. My requirements were:

  • Must have CLI interface, I don’t have a monitor attached and no desire to run remote desktop.
  • Efficient and small, raspberry pi has something like 512 MB ff memory.

Quick googling reveleased few contestants: Sphinx search (its CLI is only for debugging purposes – Nope), Lucene and Tracker.  Lucene is Java-based, but with quite small memory footprint (1MB memory heap) with a lucli CLI interface. I kind of regret not choosing it. Anyway, I chose Tracker, poorly documented search engine with issues (mostly lack of documentation). It is supposed to be

Designed and built to run well on lower-memory systems with typically 128MB or 256MB memory. Typical RAM usage is 4-6 MB.

Installation

apt-get install --no-install-recommends tracker-utils tracker-miner-fs libglib2.0-bin

Everything is in packages, simply install it. The most important program is tracker-control that can start miners, reset them or give you status of the indexing. You need libglib2.0-bin for gsettings utility that allows user to change the gconfig from CLI.

Configuration

If you try to run tracker-control without X11, you get an error:

honza@pina ~ $ tracker-control -s
Starting miners…
Could not start miners, manager could not be created, Command line `dbus-launch --autolaunch=3b0e4b712f60d6b9547b25ae51c194dd --binary-syntax --close-stderr' exited with non-zero exit status 1: Autolaunch error: X11 initialization failed.

Somone else already encountered the problem, the solution is:

eval `dbus-launch --auto-syntax`

It is not pleasant, because you have to manually start the Tracker each time you log in, so you should put it into your login scripts.

You can see all configuration options of Tracker using

gsettings list-recursively | grep -i org.freedesktop.Tracker | sort | uniq

I was interested in searching only one directory, so I changed the index-recursive-directories

gsettings set org.freedesktop.Tracker.Miner.Files index-recursive-directories "['/home/pi/website-mirror']"

Starting the miners

You can start the miners using tracker-control

honza@pina ~ $ tracker-control -s
Starting miners…
  ✓ Applications
  ✓ File System

And after that check the progress using the status option

honza@pina ~ $ tracker-control -S
Store:
27 Jul 2014, 12:44:29:  ✓     Store                 - Idle

Miners:
27 Jul 2014, 12:44:31:  ✓     Applications          - Idle
27 Jul 2014, 12:44:33:   32%  File System           - Processing… 01h 03m 32s remaining

Once you have that, you can easily search for the term using tracker-search

honza@pina ~ $ tracker-search ping
Results:
  file:///home/pi/website-mirror/000458.html
  file:///home/pi/website-mirror/005495.html
  file:///home/pi/website-mirror/019534.html

I also have an application tracker that should find applications, but in default settings, it is probably limited to the Gnome desktop and not programs in /bin.

Logging

You can set logging either through the gsettings (for each component separately) or using tracker-control for all of them at once. The default level errors. Possible values are [debug|detailed|minimal|errors].

gsettings set org.freedesktop.Tracker.Miner.Files verbosity 'detailed'
// Or for all
tracker-control --set-log-verbosity=detailed

The logs are stored in $HOME/.local/share/tracker directory.

End notes

Someone else has also tried to his 5 minutes with Tracker, my observations are similar>

  • Name is horrible, it was hard to google anything with such generic name. Even Tracker itself giver warning when trying to search for common words (Search term ‘index’ is a stop word. Stop words are common words which may be ignored during the indexing process.)
  • It is not running without X out of the box. Rather annoying.
  • Search works, but I would probably choose Lucene next time.