Read this page if your Joomla site has a large number of pages and/or some of your pages are particularly large.
Smart Search is suitable for the majority of Joomla sites. However, search presents particular challenges for large sites, and both the old and new search methods are likely to run into difficulties there, just in different ways. Bear in mind that Smart Search is a pure PHP implementation of a search engine; particularly large sites may be better served by a standalone search engine such as Solr.
To use Smart Search on a large site you will probably need to adjust some of the configuration settings. What follows is general advice on what to look out for and which settings to try tweaking. It also describes a number of known outstanding issues with running Smart Search on large sites, which will hopefully be addressed in future versions.
Smart Search works by creating and maintaining an independent index of search terms in a number of database tables. The problem for large sites is that the indexing process can be quite heavy in CPU, memory and disk usage. Even after the index has been built, incremental updates can be heavy too. The good news is that querying the index is a relatively quick and lightweight operation.
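To make the contrast between indexing and querying concrete, here is a minimal sketch of a generic inverted index in Python. It is a toy, not Joomla's implementation: the `InvertedIndex` class and `tokenize` helper are hypothetical, and Smart Search's real tables, stemming and weighting are not modelled. Adding an item visits every term in it, while a query only looks up the entries for its own terms.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lower-case and keep runs of letters/digits.
    # No stemming or stopword removal, unlike a real search engine.
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Toy term -> item-id index; NOT Joomla's actual table layout."""

    def __init__(self) -> None:
        self.postings: dict[str, set[int]] = defaultdict(set)

    def add(self, item_id: int, text: str) -> None:
        # Indexing cost grows with the size of each item: every term is visited.
        for term in tokenize(text):
            self.postings[term].add(item_id)

    def remove(self, item_id: int) -> None:
        # Naive purge before re-indexing an item: this toy scans every
        # posting list, so updates are far costlier than lookups here.
        for ids in self.postings.values():
            ids.discard(item_id)

    def search(self, query: str) -> set[int]:
        # Querying only touches the posting lists for the query's terms.
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result
```

For example, after `add(1, "Smart Search builds an index")` and `add(2, "Large sites need a standalone search engine")`, `search("search index")` returns `{1}` and `search("search")` returns `{1, 2}`.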
Because the initial indexing process can take a long time, it is best to run the indexer from the command line to avoid problems with browser sessions timing out. The CLI indexer will not time out, however long it takes to complete, and it can easily be aborted if problems arise. Error messages are also clearly visible with the CLI indexer, whereas they are hidden when indexing is run from the Administrator.
For instructions on using the CLI indexer, see Setting up automatic Smart Search indexing.
The indexer breaks the indexing job into batches of content items. By default the batch size is 30, meaning that up to 30 content items are indexed per batch. Increasing the batch size may make indexing faster, but it will use more memory and possibly more temporary disk space.
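The batching trade-off can be sketched as follows. This is an illustrative Python sketch, not the indexer's code; `batches` and `batch_plan` are hypothetical helpers. It shows that a larger batch size means fewer batches, but more items handled at once.

```python
from collections.abc import Iterator, Sequence


def batches(items: Sequence, batch_size: int = 30) -> Iterator[Sequence]:
    """Yield consecutive slices of at most batch_size items.

    30 matches the default batch size described in this section.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def batch_plan(total_items: int, batch_size: int = 30) -> tuple[int, int]:
    """Return (number of batches, largest number of items in one batch)."""
    sizes = [len(batch) for batch in batches(range(total_items), batch_size)]
    return len(sizes), max(sizes, default=0)
```

With 1000 items, `batch_plan(1000)` gives 34 batches of at most 30 items, while `batch_plan(1000, 100)` gives 10 batches of up to 100 items each.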
If the indexer runs out of memory, try making the following adjustments one at a time until the problem is resolved.
The Smart Search index tables can grow very large very quickly! The jos_finder_links_termsX tables (where X is a single hexadecimal character) contain one row per term or phrase per content item, and a single Joomla article containing 1000 words will typically add approximately 3000 rows to these tables. A second article of similar size will add a similar number of rows, even if both articles contain the same words. A site with tens of thousands of articles, some of which may contain thousands of words, is very likely to end up with millions of rows in these mapping tables. In such circumstances it is not unusual for the index tables to occupy several gigabytes of disk space.
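A back-of-the-envelope estimate shows how quickly the mapping tables grow. This sketch assumes the rule of thumb given here of roughly 3 rows per word of content; `estimate_mapping_rows` and `estimate_site_rows` are hypothetical helpers, and real counts will vary with repeated words, stopwords and stemming. The article counts in the example are made up for illustration.

```python
def estimate_mapping_rows(num_items: int, avg_words: int,
                          rows_per_word: float = 3.0) -> int:
    """Rough combined row count for the jos_finder_links_termsX tables.

    rows_per_word=3 is an assumption based on the rule of thumb of about
    3000 rows for a 1000-word article.
    """
    return round(num_items * avg_words * rows_per_word)


def estimate_site_rows(groups: list[tuple[int, int]],
                       rows_per_word: float = 3.0) -> int:
    """Sum the estimate over groups of (number of articles, average words)."""
    return sum(estimate_mapping_rows(n, words, rows_per_word) for n, words in groups)
```

A hypothetical site with 18,000 articles of about 500 words and 2,000 articles of about 3,000 words, `estimate_site_rows([(18000, 500), (2000, 3000)])`, comes out at around 45 million rows.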
With the present version of Smart Search there isn't much you can do about this. However, it is hoped that the next release will let you adjust the number of words per phrase that get indexed. At present this is hard-wired at 3, meaning that every indexed word is also indexed as part of each pair and each triplet of adjacent words that contains it. This helps the auto-completion feature and generally improves the quality of search results. On sites where disk space is an issue, reducing this to 2 or even 1 would make the mapping tables correspondingly smaller.
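The phrase scheme accounts for the figure of about 3 rows per word. This sketch assumes simple pre-split words and ignores stemming, stopwords and de-duplication; it generates every run of 1 to `max_len` adjacent words. `phrases` is a hypothetical helper, and `max_len` stands in for the hoped-for setting; it is not an existing Joomla option.

```python
def phrases(words: list[str], max_len: int = 3) -> list[str]:
    """Return every run of 1..max_len adjacent words.

    max_len is hypothetical; the current Smart Search phrase length is
    hard-wired at 3.
    """
    out = []
    for length in range(1, max_len + 1):
        # A text of n words has n - length + 1 runs of this length.
        for start in range(len(words) - length + 1):
            out.append(" ".join(words[start:start + length]))
    return out
```

For a 1000-word text this yields 2997 phrases with `max_len=3`, 1999 with 2 and 1000 with 1, which is why a shorter phrase length would shrink the mapping tables roughly in proportion.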