From 5d4f9812470fc05d3af78846d57edfe92b73340c Mon Sep 17 00:00:00 2001
From: Yannick Warnier
Date: Wed, 3 Oct 2007 06:21:00 +0200
Subject: [PATCH] [svn r13377] Added search plugin to default Dokeos package.
 This is an attempt at integrating the search plugin better and making it
 available to all, although it requires a server-side installation. Might be
 removed in the future if it proves inappropriate.

---
 plugin/search/README.txt                   |  138 +++
 plugin/search/client/client.conf.php       |   28 +
 plugin/search/client/filter_user.lib.php   |  105 ++
 plugin/search/client/search.css            |   27 +
 plugin/search/client/searchit.php          |   99 ++
 plugin/search/client/www/indexer_login.php |   47 +
 plugin/search/index.php                    |   23 +
 plugin/search/plugin.php                   |   24 +
 plugin/search/server/cron.d/dokeos-indexer |    1 +
 plugin/search/server/etc/indexer.conf      | 1048 ++++++++++++++++++++
 plugin/search/server/www/common.inc        |  176 ++++
 plugin/search/server/www/config.inc        |   21 +
 plugin/search/server/www/init.inc          |  544 ++++++++++
 plugin/search/server/www/search.php        |  269 +++++
 plugin/search/server/www/search.xml.php    |  383 +++++++
 plugin/search/server/www/template.inc      |  657 ++++++++++++
 16 files changed, 3590 insertions(+)
 create mode 100644 plugin/search/README.txt
 create mode 100644 plugin/search/client/client.conf.php
 create mode 100755 plugin/search/client/filter_user.lib.php
 create mode 100644 plugin/search/client/search.css
 create mode 100644 plugin/search/client/searchit.php
 create mode 100644 plugin/search/client/www/indexer_login.php
 create mode 100755 plugin/search/index.php
 create mode 100644 plugin/search/plugin.php
 create mode 100644 plugin/search/server/cron.d/dokeos-indexer
 create mode 100644 plugin/search/server/etc/indexer.conf
 create mode 100644 plugin/search/server/www/common.inc
 create mode 100644 plugin/search/server/www/config.inc
 create mode 100644 plugin/search/server/www/init.inc
 create mode 100644 plugin/search/server/www/search.php
 create mode 100644 plugin/search/server/www/search.xml.php
 create mode 100644 plugin/search/server/www/template.inc

diff --git a/plugin/search/README.txt b/plugin/search/README.txt
new file mode 100644
index 0000000000..d800fadf6b
--- /dev/null
+++ b/plugin/search/README.txt
@@ -0,0 +1,138 @@
+Dokeos Search Plugin Installation Guide
+=======================================
+
+1. Introduction
+---------------
+This search plugin is composed of everything you need to get up and
+running with a full-text search feature on your Dokeos portal. However,
+this installation is not easy, and if you are not familiar with the
+term "indexing", or with the configuration of a Linux server, we
+highly recommend you seek advice from a qualified system administrator
+to help you do this. Of course, the Dokeos company, for which I am
+directly working, offers this kind of service. Feel free to contact
+info@dokeos.com for a quote.
+
+This search plugin relies on a search server, called MnogoSearch, which
+has to be installed independently on a Linux server (the Windows
+version, sadly, is neither GPL nor free to use).
+The following installation guides you through the steps of installing
+the server on a Debian or Ubuntu computer, but you will probably
+succeed in installing it on other architectures.
+
+Dokeos has made considerable efforts to have MnogoSearch integrated
+into the latest versions of PHP, but it would never have succeeded
+without the help of pierre.php@gmail.com who did all the technical
+work.
+
+2. Installing files
+--------------------
+All the "conf" files and the "search.xml.php" file in this package
+need to be reviewed and configured properly. Most of all, you should look
+for a "DBAddr mysql://db_user:db_password@db_host/db_name/?dbmode=single"
+line in the server files to make sure it is using the correct database
+credentials.
+
+Now you will see that there are two directories in this plugin.
+The "client" directory needs to stay there.
The "client/www" directory
+contains a PHP script that needs to be copied to the root of your
+Dokeos portal (this will later give the indexing server access to
+your Dokeos portal).
+
+The "server" directory has to be moved to the indexing server (which
+might be the same as your Dokeos portal's server if it is not too
+overloaded).
+
+This "server" directory contains three subdirectories.
+The "server/etc" directory contains the configuration of the
+mnogosearch server, which typically on Debian will be located in
+/etc/mnogosearch. Once you have installed the mnogosearch server,
+you can pretty much overwrite the configuration with the files
+contained in "server/etc", as they are already customised for indexing
+Dokeos.
+
+The "server/cron.d" directory contains an optional file that you might
+want to put in /etc/cron.d, so that the indexing will be run every night
+at 5:00 am.
+
+The "server/www" directory contains files that should be made available
+to the public, to access indexing results. Feel free to put these, for
+example, in /var/www/mnogosearch on your indexing server if that's where
+Apache takes its public files.
+
+3. Installing the search server (MnoGoSearch)
+---------------------------------------------
+The mnogosearch server installation comes in two parts:
+A) installing the mnogosearch indexing server itself. This can be done
+with a simple:
+  sudo apt-get install mnogosearch-common mnogosearch-mysql
+B) installing the PHP5-mnogosearch bindings. This can be done by using
+the PECL command-line installer:
+  sudo pecl install mnogosearch-1.0.0
+
+Once the server is installed, you may need to install server-specific
+additional programs to allow your indexer to go into documents (PDF,
+Word, Excel, etc) and index the contents of these documents as well.
+
+You can find a list of programs supposed to be there in the
+server/etc/indexer.conf file.
Search for "pdftotext" and you will find
+the lines nearby all define a program used to translate a document
+into pure text before indexing it. Make sure you are able to launch
+all of these commands on the command line. If you can't, the indexing
+server is not likely to be able to do it either...
+
+4. Creating the DB and Dokeos user
+----------------------------------
+In order to keep the index data, mnogosearch requires a database to
+store this data. It is recommended to create a dedicated user, with
+access to only this one database, for this purpose.
+
+Once this user is configured and the DBAddr line is configured in
+server/etc/indexer.conf, you can create the database structure by
+calling (on the indexing server):
+  indexer -Ecreate indexer.conf
+
+The next step is to create a Dokeos user for the purpose of indexing
+your courses (the user needs access to all courses to be able to index
+them). Create a simple user in the Dokeos administration interface. Then
+get this user's ID (you can get it by hovering over the edit icon in the
+users list: the user id is the number that shows after "user_id" in the URL)
+and use it inside indexer_login.php to replace the 'xxx' value.
+
+Also configure the IP address and the host name of the indexing server
+inside this file.
+
+Once these two steps are complete, you can start the first indexation
+of your portal, by calling, on the command line of your indexing server:
+  indexer -N10 indexer.conf
+N10 lets you limit the number of simultaneous threads that your indexing
+server will be allowed to use. More than 10 might overload your Dokeos
+portal. You might want to reduce this number to 3 for light servers.
+
+5. Installing the plugin
+------------------------
+Installing the plugin is done by dispatching the files contained in
+this plugin as described in "2.
Installing files", and configuring the
+various *.conf.php files as well as server/etc/indexer.conf and
+server/www/search.xml.php
+
+Once the files have been moved and configured, you will still need to
+index some data, then activate the plugin inside the Dokeos
+administration panel. Then, basically, you should be able to use
+the plugin straight away.
+
+6. International use
+--------------------
+To keep this plugin small, we had to remove a considerable number of
+international-parsing helper files. If you need one for your language,
+it may well be included in the default installation file for the
+Debian mnogosearch-common package.
+
+If not, you should check more recent versions of mnogosearch on its
+website: http://www.mnogosearch.org/
+
+7. Seek help
+------------
+Commercial support is available for the configuration and remote use
+of this plugin at info@dokeos.com
+If you have plenty of time to learn it by yourself, or for any other reason,
+you might find some free help on our forums: http://www.dokeos.com/forum
\ No newline at end of file
diff --git a/plugin/search/client/client.conf.php b/plugin/search/client/client.conf.php
new file mode 100644
index 0000000000..a3fd4a321c
--- /dev/null
+++ b/plugin/search/client/client.conf.php
@@ -0,0 +1,28 @@
+
+ */
+/**
+ * Variables
+ */
+//// Addressing variables
+// $search_url is the relative URL from the HTTP root of this portal, to the
+// 'searchit.php' script.
Something like /plugin/search/client/searchit.php
+$search_url = '/plugin/search/client/searchit.php';
+// $server_url is the URL of the server containing the search engine XML
+// interface (the contents of the server/www directory in this plugin package)
+// and, more precisely, the absolute web path to the search.php script
+$server_url = 'http://your.domain.com/subdir/search/search.php';
+
+//// Language variables
+// The name to be displayed on the 'Search' button
+$lang_search_button = 'Search';
+// The text to be suffixed to the number of search results found
+$lang_search_found = 'résultats trouvés.';
+// The text to be suffixed to the number of seconds the search took
+$lang_seconds = 'secondes';
+// the text to be shown if no results were found
+$lang_no_result_found = 'La recherche n\'a pas renvoyé de résultat.';
+?>
diff --git a/plugin/search/client/filter_user.lib.php b/plugin/search/client/filter_user.lib.php
new file mode 100755
index 0000000000..40211446e4
--- /dev/null
+++ b/plugin/search/client/filter_user.lib.php
@@ -0,0 +1,105 @@
+
+ * @uses The Dokeos database library, to access the tables using its facilities
+ * @uses The Dokeos main api library to execute database queries
+ */
+/**
+ * Checks if a user can access a given course
+ *
+ * The function gets the course code from the course directory, then
+ * checks in the course_user table if the user has access to that course.
+ * @param integer User ID (inside Dokeos)
+ * @param string Course directory
+ * @return boolean True if user has access, false otherwise
+ */
+function get_boolean_user_access_to_course_dir($user_id,$course_dir){
+    if(api_is_platform_admin()){return true;}
+    //Course-user relation table (not the course table itself)
+    $course_user = Database::get_main_table(TABLE_MAIN_COURSE_USER);
+    $course = Database::get_main_table(TABLE_MAIN_COURSE);
+    //Get the course code
+    $sql = "SELECT code FROM $course WHERE directory = '$course_dir'";
+    $res = api_sql_query($sql);
+    if(Database::num_rows($res)>0){
+        //Course found. Get the course code.
+        $row = Database::fetch_array($res);
+        $course_code = $row['code'];
+        //Check user permissions
+        $sql = "SELECT * FROM $course_user
+                WHERE course_code = '$course_code'
+                AND user_id = '$user_id'";
+        $res = api_sql_query($sql);
+        if(Database::num_rows($res)>0){
+            //User permission found, go further and check there is a status
+            $row = Database::fetch_array($res);
+            $rel = $row['status'];
+            //if(!empty($rel)){
+                //Status found (we may later check this further to refine permissions)
+                //Sometimes for now it appears that the status can be 0, though.
+                return true;
+            //}
+            //Status not found, problem, return false.
+            //return false;
+        }else{
+            //No course-user relation found, return false
+            return false;
+        }
+    }else{
+        //No course found, return false
+        return false;
+    }
+}
+/**
+ * Check course URL to get a course code and check it against user permissions
+ *
+ * Make this function always return true when no check is to be done
+ * @param string URL to check
+ * @param boolean Value to return when the URL does not point to any course
+ * @return boolean True on user having access to the course or course not found, false otherwise
+
+ */
+function access_check($url,$default=true){
+    $matches = array();
+    //Initialise $match2 so it is defined even when the first pattern matches
+    $match2 = false;
+    $match1 = preg_match('/courses\/([^\/]*)\//',$url,$matches);
+    if(!$match1){
+        $match2 = preg_match('/cidReq=([^&]*)/',$url,$matches);
+    }
+    if($match1 or $match2){
+        $has_access = get_boolean_user_access_to_course_dir($_SESSION['_user']['user_id'],$matches[1]);
+        if(!$has_access){
+            //user has no access to this course, skip it
+            return false;
+        }//else grant access
+        else
+        {
+            return true;
+        }
+    }
+    return $default;
+}
+/**
+ * Translates a course code into a course name inside a string
+ *
+ * This function should only be used if needed by a funny course-name rule
+ * @param string The string to transform
+ * @return string The transformed string
+ */
+function subst_course_code($string){
+    $matches = array();
+    if(preg_match('/(PORTAL_[0-9]{1,4})/',$string,$matches)){
+        $course = Database::get_main_table(TABLE_MAIN_COURSE);
+        //Get the
course code
+        $sql = "SELECT title FROM $course WHERE code = '".$matches[1]."'";
+        $res = api_sql_query($sql);
+        if(Database::num_rows($res)>0){
+            $row = Database::fetch_array($res);
+            $string = preg_replace('/(.*)\?cidReq=('.$matches[1].')(.*)/',' '.$row['title'].' - \1 \3',$string);
+            $string = preg_replace('/'.$matches[1].'/',$row['title'],$string);
+        }
+    }
+    return $string;
+}
+?>
diff --git a/plugin/search/client/search.css b/plugin/search/client/search.css
new file mode 100644
index 0000000000..4b6d15b118
--- /dev/null
+++ b/plugin/search/client/search.css
@@ -0,0 +1,27 @@
+.search_info{
+    background-color: #F0F0F0;
+    font-size: smaller;
+    padding: 5px;
+}
+.result{
+    /*border: 1px dotted black;*/
+    margin: 20px;
+    padding: 10px;
+}
+.result .title{
+    font-size: larger;
+    margin-top: -20px;
+    background-color: white;
+    margin-right: 100px;
+    margin-left: -10px;
+    margin-bottom: 5px;
+}
+.result .description{
+    text-decoration: none;
+    width: 90%;
+}
+.result .highlight{
+    display: inline;
+    background-color: lightblue;
+    margin-right: 2px;
+}
diff --git a/plugin/search/client/searchit.php b/plugin/search/client/searchit.php
new file mode 100644
index 0000000000..c43e816257
--- /dev/null
+++ b/plugin/search/client/searchit.php
@@ -0,0 +1,99 @@
+
+ */
+/**
+ * Variables
+ */
+require_once('../../../main/inc/global.inc.php');
+require ('filter_user.lib.php');
+require ('client.conf.php');
+api_block_anonymous_users();
+$htmlHeadXtra[] = '';
+
+$start_time = time();
+$xml_file = $server_url.'?'.$_SERVER['QUERY_STRING'];
+//if(!$doc = xmldocfile($xml_file)){
+$results = simplexml_load_file($xml_file);
+if($results === false)
+{
+    $res = array();
+}
+else
+{
+    //$doc->load($xml_file);
+    $subTotals = array();
+    $lasttag = '';
+    $myindex = 0;
+    $level = 0;
+    //$root = $doc->root();
+    //$root = $doc->documentElement;
+    $my_query = $results->query;
+    $my_search_info = $results->search_info;
+    $my_search_term = $results->search_term;
+    $my_num_found =
$results->num_found; + $my_search_time = $results->search_time; + $elementCount = 1; +} +/** + * This function is just a display helper. + * @param integer Result ID + * @param string Result title + * @param string Result URL + * @param string Short excerpt of the result document + * @param + */ +function result_output($id,$title,$url='',$excerpt='',$date='',$rating=''){ + if(empty($id) OR empty($title)){return false;} + $title = urldecode($title); + $title = preg_replace('/\?cidReq=.*$/','',$title); + $excerpt = preg_replace('/\s*()?/','
',$excerpt); + $excerpt = preg_replace('/<\/hl>\s*(<\/hl>)?/','
',$excerpt); + $excerpt = stripslashes($excerpt); + $string = "
\n" . + "
$id. $title - $date - $rating
\n" . + "
$excerpt
\n" . + "
\n"; + //$string = "$id. $title - $date
$excerpt

"; + return $string; +} + +include('../../../main/inc/header.inc.php'); +?> + +
+result as $res){ + if(access_check($res->result_du)){ + $to_print .= result_output($i,mb_convert_encoding(urldecode($res->result_dt),$charset,'utf-8'),$res->result_du,html_entity_decode(urldecode($res->result_de)),htmlentities(urldecode($res->result_dm)),$res->result_dr); + $i++; + } +} +//TODO check if a time and number of results is defined +$i--; +if($to_print != ''){ + //$time = $res['search_time'] + (time() - $start_time); + //echo "
".$i.' '.$lang_search_found.' '.$time." $lang_seconds

\n"; + echo "
".$i.' '.$lang_search_found."

\n"; + echo $to_print; +}else{ + echo "
".$lang_no_result_found."

\n"; +} +include('../../../main/inc/footer.inc.php'); +?> diff --git a/plugin/search/client/www/indexer_login.php b/plugin/search/client/www/indexer_login.php new file mode 100644 index 0000000000..8fc08fdf60 --- /dev/null +++ b/plugin/search/client/www/indexer_login.php @@ -0,0 +1,47 @@ +0) + { + while ($row = Database::fetch_array($res)) + { + $sql2 = "INSERT INTO $course_rel_user (course_code,user_id,status)VALUES('".$row['code']."',$id,5)"; + $res2 = @api_sql_query($sql2,__FILE__,__LINE__); + } + } + //now login the user to the platform (put everything needed inside the + // session) and then redirect the search engine to the courses list + $_SESSION['_user']['user_id'] = $id; + define('DOKEOS_HOMEPAGE', true); + require('main/inc/global.inc.php'); + require('user_portal.php'); +} +?> \ No newline at end of file diff --git a/plugin/search/index.php b/plugin/search/index.php new file mode 100755 index 0000000000..d0c6dc4db1 --- /dev/null +++ b/plugin/search/index.php @@ -0,0 +1,23 @@ + + */ +/** + * Variables + */ +include('client/client.conf.php'); +?> +
diff --git a/plugin/search/plugin.php b/plugin/search/plugin.php
new file mode 100644
index 0000000000..5618f7c6e7
--- /dev/null
+++ b/plugin/search/plugin.php
@@ -0,0 +1,24 @@
+Plugins)
+ * Make sure you read the README.txt file to understand how to use this plugin!
+ * @package dokeos.plugin
+ * @author Yannick Warnier
+ */
+/**
+ * Plugin details (must be present)
+ */
+//the plugin title
+$plugin_info['title']='Search';
+//the comments that go with the plugin
+$plugin_info['comment']="Full-text search engine";
+//the locations where this plugin can be shown
+$plugin_info['location']=array('mycourses_main', 'mycourses_menu', 'header', 'footer');
+//the plugin version
+$plugin_info['version']='1.0';
+//the plugin author
+$plugin_info['author']='Yannick Warnier';
+?>
diff --git a/plugin/search/server/cron.d/dokeos-indexer b/plugin/search/server/cron.d/dokeos-indexer
new file mode 100644
index 0000000000..11413a12be
--- /dev/null
+++ b/plugin/search/server/cron.d/dokeos-indexer
@@ -0,0 +1 @@
+0 5 * * * root /usr/sbin/indexer -N 2 /etc/mnogosearch/indexer.conf >/dev/null 2>&1
diff --git a/plugin/search/server/etc/indexer.conf b/plugin/search/server/etc/indexer.conf
new file mode 100644
index 0000000000..8f817c3418
--- /dev/null
+++ b/plugin/search/server/etc/indexer.conf
@@ -0,0 +1,1048 @@
+#!/usr/sbin/indexer -d
+
+###########################################################################
+# This is a sample indexer config file.
+# To start using it please edit and rename to indexer.conf.
+# You can also make this file executable and run it directly.
+# You may want to keep the original indexer.conf-dist for future references.
+# Use '#' to comment out lines.
+# All command names are case insensitive (DBAddr=DBADDR=dbaddr).
+# You may use '\' character to prolong current command to next line
+# when it is required.
+#
+# You may include another configuration file in any place of the indexer.conf
+# using "Include " command.
+# Absolute path if starts with "/":
+#Include /etc/mnogosearch/inc1.conf
+# Relative path else:
+#Include inc1.conf
+###########################################################################
+
+
+
+###########################################################################
+# Section 1.
+# Global parameters.
+
+
+###########################################################################
+# DBAddr
+# Options (type, host, database name, port, user and password)
+# to connect to SQL database.
+# Should be used before any other commands.
+# Has global effect for whole config file.
+# Format:
+#DBAddr <DBType>:[//[DBUser[:DBPass]@]DBHost[:DBPort]]/DBName/[?dbmode=mode]
+#
+# ODBC notes:
+# Use DBName to specify ODBC data source name (DSN)
+# DBHost does not matter, use "localhost".
+#
+# Currently supported DBType values are
+# mysql, pgsql, mssql, oracle, ibase, db2, mimer, sqlite.
+#
+# MySQL users can specify path to Unix socket when connecting to localhost:
+# mysql://foo:bar@localhost/mnogosearch/?socket=/tmp/mysql.sock
+#
+# If you are using PostgreSQL and do not specify hostname,
+# e.g. pgsql://user:password@/dbname/
+# then PostgreSQL will not work via TCP, but will use Unix socket.
+#
+# You may also select database mode of word storage.
+# When "single" is specified, all words are stored in the same table.
+# If "multi" is selected, words will be located in different tables.
+# "multi" mode is usually faster but requires more tables.
+# Default mode is "single".
+
+DBAddr mysql://db_user:db_password@db_host/db_name/?dbmode=single
+
+######################################################################
+# VarDir /var/lib/mnogosearch
+# You may choose alternative working directory for
+# search results cache:
+#
+#VarDir /var/lib/mnogosearch
+
+
+
+######################################################################
+# NewsExtensions yes/no
+# Whether to enable news extensions.
+# Default value is no.
+#NewsExtensions no + + +####################################################################### +#SyslogFacility +# This is used if indexer was compiled with syslog support and if you +# don't like the default value. Argument is the same as used in syslog.conf +# file. For list of possible facilities see syslog.conf(5) +#SyslogFacility local7 + + +####################################################################### +# LocalCharset +# Defines the charset which will be used to store data in the database. +# All other character sets will be converted into the given charset. +# Take a look into mnoGoSearch documentation for detailed explanation +# how to choose a LocalCharset depending on languages used on your site(s). +# This command should be used once and takes global effect for the config file. +# Only most popular charsets used in Internet are written here. +# Take a look into the documentation to check the whole list of +# supported charsets. +# Default LocalCharset is iso-8859-1 (latin1). 
+# +# Western Europe: German, Finnish, French, Swedish +LocalCharset iso-8859-1 +#LocalCharset windows-1252 + +# Central Europe: Czech, Slovenian, Slovak, Hungarian, Polish +#LocalCharset iso-8859-2 +#LocalCharset windows-1250 + +# Baltic: Lithuanian, Estonian, Latvian +#LocalCharset iso-8859-4 +#LocalCharset iso-8859-13 +#LocalCharset windows-1257 + +# Cyrillic: Russian, Serbian, Ukrainian, Belarussian, Macedonian, Bulgarian +#LocalCharset koi8-r +#LocalCharset iso-8859-5 +#LocalCharset x-mac-cyrillic +#LocalCharset windows-1251 + +# Arabic +#LocalCharset iso-8859-6 +#LocalCharset windows-1256 + +# Greek +#LocalCharset iso-8859-7 +#LocalCharset windows-1253 + +# Hebrew +#LocalCharset iso-8859-8 +#LocalCharset windows-1255 + +# Turkish +#LocalCharset iso-8859-9 +#LocalCharset windows-1254 + +# Vietnamese +#LocalCharset viscii +#LocalCharset windows-1258 + +# Chinese +#LocalCharset gb2312 +#LocalCharset BIG5 + +# Korean +#LocalCharset EUC-KR + +# Japanese +#LocalCharset Shift-JIS + +# Full UNICODE +#LocalCharset UTF-8 +#LocalCharset iso-8859-1 +#LocalCharset windows-1252 + +####################################################################### +#ForceIISCharset1251 yes/no +#This option is useful for users which deals with Cyrillic content and broken +#(or misconfigured?) Microsoft IIS web servers, which tends to not report +#charset correctly. This is really dirty hack, but if this option is turned on +#it is assumed that all servers which reports as 'Microsoft' or 'IIS' have +#content in Windows-1251 charset. +#This command should be used only once in configuration file and takes global +#effect. +#Default: no +#ForceIISCharset1251 no + + +########################################################################### +#CrossWords yes/no +# Whether to build CrossWords index +# Default value is no +#CrossWords no +CrossWords yes + + +########################################################################### +# StopwordFile +# Load stop words from the given text file. 
You may specify either absolute +# file name or a name relative to mnoGoSearch /etc directory. You may use +# several StopwordFile commands. +# +#StopwordFile stopwords/en.sl + +Include stopwords.conf + + +########################################################################### +# LangMapFile +# Load language map for charset and language guesser from the given file. +# You may specify either an absolute file name or a name relative +# to mnoGoSearch /etc directory. You may use several LangMapFile commands. +# +#LangMapFile langmap/en.ascii.lm + +Include langmap.conf + + +####################################################################### +# Word lengths. You may change default length range of words +# stored in the database. By default, words with the length in the +# range from 1 to 32 are stored. +# +#MinWordLength 1 +#MaxWordLength 32 + +####################################################################### +# MaxDocSize bytes +# Default value 1048576 (1 Mb) +# Takes global effect for whole config file +MaxDocSize 10485760 + +####################################################################### +# URLSelectCacheSize num +# Default value 128 +# Select targets to index at once. +#URLSelectCacheSize 1024 + +####################################################################### +# WordCacheSize bytes +# Default value 8388608 (8 Mb) +# Defines maximal in-memory words cache size. +# Note: cache is allocated for every DBAddr, so if you have 3 DBAddr +# commands and WordCacheSize is 10Mb, it can take up to 30Mb of memory. +#WordCacheSize 8388608 + +####################################################################### +# HTTPHeader
+# You may add your desired headers in indexer HTTP request. +# You should not use "If-Modified-Since","Accept-Charset" headers, +# these headers are composed by indexer itself. +# "User-Agent: mnoGoSearch/version" is sent too, but you may override it. +# Command has global effect for all configuration file. +# +#HTTPHeader "User-Agent: My_Own_Agent" +#HTTPHeader "Accept-Language: ru, en" +HTTPHeader "Accept-Language: fr, nl, en, de, es" +#HTTPHeader "From: webmaster@mysite.com" + + +# flush server.active to inactive for all server table records +# before loading new +#FlushServerTable + +####################################################################### +# ServerTable +# Load servers with all their parameters from the table specified in argument. +# Check an example of tables server and srvinfo structure in +# create/(your_database)/create.txt +# +#ServerTable mysql://user:pass@host/dbname/tablename + +########################################################################## +# LoadChineseList +# Load Chinese word frequency list. +# By default GB2312 charset and mandarin.freq dictionary is used. +#LoadChineseList + +########################################################################## +# LoadThaiList +# Load Thai word frequency list +# By default tis-620 and thai.freq dictionary is used. +#LoadThaiList + +########################################################################## +# Section 2. +# URL control configuration. + + +########################################################################## +#Allow [Match|NoMatch] [NoCase|Case] [String|Regex] [ ... ] +# Use this to allow URLs that match (doesn't match) the given argument. +# First three optional parameters describe the type of comparison. +# Default values are Match, NoCase, String. +# Use "NoCase" or "Case" values to choose case insensitive or case sensitive +# comparison. +# Use "Regex" to choose regular expression comparison. +# Use "String" to choose string with wildcards comparison. 
+# Wildcards are '*' for any number of characters and '?' for one character. +# Note that '?' and '*' have special meaning in "String" match type. Please use +# "Regex" to describe documents with '?' and '*' signs in URL. +# "String" match is much faster than "Regex". Use "String" where it +# is possible. +# You may use several arguments for one 'Allow' command. +# You may use this command any times. +# Takes global effect for config file. +# Note that mnoGoSearch automatically adds one "Allow regex .*" +# command after reading config file. It means that allowed everything +# that is not disallowed. +# Examples +# Allow everything: +#Allow * +# Allow everything but .php .cgi .pl extensions case insensitively using regex: +#Allow NoMatch Regex \.php$|\.cgi$|\.pl$ +# Allow .HTM extension case sensitively: +#Allow Case *.HTM + + +########################################################################## +#Disallow [Match|NoMatch] [NoCase|Case] [String|Regex] [ ... ] +# Use this to disallow URLs that match (doesn't match) given argument. +# The meaning of first three optional parameters is exactly the same +# with "Allow" command. +# You can use several arguments for one 'Disallow' command. +# Takes global effect for config file. +# +# Examples: +# Disallow URLs that are not in udm.net domains using "string" match: +#Disallow NoMatch *.udm.net/* +# Disallow any except known extensions and directory index using "regex" match: +#Disallow NoMatch Regex \/$|\.htm$|\.html$|\.shtml$|\.phtml$|\.php$|\.txt$ +# Exclude cgi-bin and non-parsed-headers using "string" match: +#Disallow */cgi-bin/* *.cgi */nph-* +# Exclude anything with '?' sign in URL. Note that '?' sign has a +# special meaning in "string" match, so we have to use "regex" match here: +#Disallow Regex \? 
+ +Disallow Match *whoisonline* +Disallow Match *myagenda* +Disallow Match *&rand=* +Disallow Match */chat/* +Disallow Match */auth/* +Disallow Match */online/* +Disallow Match */user/* +Disallow Match */admin/* +Disallow Match */group/* +Disallow Match *delete* +Disallow Match *del* +Disallow Match *remove* +Disallow Match *example_document.html* + + +# Exclude some known extensions using fast "String" match: +Disallow *.b *.sh *.md5 *.rpm +Disallow *.arj *.tar *.zip *.tgz *.gz *.z *.bz2 +Disallow *.lha *.lzh *.rar *.zoo *.ha *.tar.Z +Disallow *.gif *.jpg *.jpeg *.bmp *.tiff *.tif *.xpm *.xbm *.pcx +Disallow *.vdo *.mpeg *.mpe *.mpg *.avi *.movie *.mov *.wmv +Disallow *.mid *.mp3 *.rm *.ram *.wav *.aiff *.ra +Disallow *.vrml *.wrl *.png *.ico *.psd *.dat +Disallow *.exe *.com *.cab *.dll *.bin *.class *.ex_ +#Disallow *.xls *.doc +Disallow *.tex *.texi *.texinfo +# Disallow *.rtf *.pdf *.ps *.eps +Disallow *.cdf +Disallow *.ai *.ppt *.hqx +Disallow *.cpt *.bms *.oda *.tcl +Disallow *.o *.a *.la *.so +Disallow *.pat *.pm *.m4 *.am *.css +Disallow *.map *.aif *.sit *.sea +Disallow *.m3u *.qt + +# Exclude Apache directory list in different sort order using "string" match: +Disallow *D=A *D=D *M=A *M=D *N=A *N=D *S=A *S=D + +# More complicated case. RAR .r00-.r99, ARJ a00-a99 files +# and UNIX shared libraries. We use "Regex" match type here: +Disallow Regex \.r[0-9][0-9]$ \.a[0-9][0-9]$ \.so\.[0-9]$ + + +########################################################################## +#CheckOnly [Match|NoMatch] [NoCase|Case] [String|Regex] [ ... ] +# The meaning of first three optional parameters is exactly the same +# with "Allow" command. +# Indexer will use HEAD instead of GET HTTP method for URLs that +# match/do not match given regular expressions. It means that the file +# will be checked only for being existing and will not be downloaded. +# Useful for zip,exe,arj and other binary files. +# Note that you can disallow those files with commands given below. 
+# You may use several arguments for one "CheckOnly" commands. +# Useful for example for searching through the URL names rather than +# the contents (a la FTP-search). +# Takes global effect for config file. +# +# Check some known non-text extensions using "string" match: +#CheckOnly *.b *.sh *.md5 +#CheckOnly *.arj *.tar *.zip *.tgz *.gz +#CheckOnly *.lha *.lzh *.rar *.zoo *.tar*.Z +#CheckOnly *.gif *.jpg *.jpeg *.bmp *.tiff +#CheckOnly *.vdo *.mpeg *.mpe *.mpg *.avi *.movie +#CheckOnly *.mid *.mp3 *.rm *.ram *.wav *.aiff +#CheckOnly *.vrml *.wrl *.png +#CheckOnly *.exe *.cab *.dll *.bin *.class +#CheckOnly *.tex *.texi *.xls *.doc *.texinfo +#CheckOnly *.rtf *.pdf *.cdf *.ps +#CheckOnly *.ai *.eps *.ppt *.hqx +#CheckOnly *.cpt *.bms *.oda *.tcl +#CheckOnly *.rpm *.m3u *.qt *.mov +#CheckOnly *.map *.aif *.sit *.sea +# +# or check ANY except known text extensions using "regex" match: +#CheckOnly NoMatch Regex \/$|\.html$|\.shtml$|\.phtml$|\.php$|\.txt$ +#CheckOnly NoMatch Regex &rand=[0-9][0-9][0-9][0-9]$|myagenda\.php.*$|whoisonline\.php.*$ + + +########################################################################## +#HrefOnly [Match|NoMatch] [NoCase|Case] [String|Regex] [ ... ] +# The meaning of first three optional parameters is exactly the same +# with "Allow" command. +# +# Use this to scan a HTML page for "href" tags but not to index the contents +# of the page with an URLs that match (doesn't match) given argument. +# Commands have global effect for all configuration file. +# +# When indexing large mail list archives for example, the index and thread +# index pages (like mail.10.html, thread.21.html, etc.) 
should be scanned +# for links but shouldn't be indexed: +# +#HrefOnly */mail*.html */thread*.html +HrefOnly Match *dk_sid=* +HrefOnly Match *indexer_login.php* +HrefOnly Match */your.domain.com/index.php* +HrefOnly Match */document.php* +HrefOnly Match */courses/*/index.php +HrefOnly Match */courses/*/ +HrefOnly Match */document/headerpage.php* +HrefOnly Match */document/slideshow.php* + + +########################################################################## +#CheckMp3 [Match|NoMatch] [NoCase|Case] [String|Regex] [ ...] +# The meaning of the first three optional parameters is exactly the same +# as for the "Allow" command. +# If a URL matches the given rules, indexer will download only a small part +# of the document and try to find MP3 tags in it. On success, indexer +# will parse the MP3 tags; otherwise it will download the whole document and parse +# it as usual. +# Notes: +# This works only with servers that support the HTTP/1.1 protocol. +# The "Range: bytes" header is used to download the MP3 tag. +#CheckMp3 *.bin *.mp3 + + +####################################################################### +#CheckMP3Only [Match|NoMatch] [NoCase|Case] [String|Regex] [ ...] +# The meaning of the first three optional parameters is exactly the same +# as for the "Allow" command. +# If a URL matches the given rules, indexer will, as with the CheckMp3 command, +# download only a small part of the document and try to find MP3 tags. +# On success, indexer will parse the MP3 tags; otherwise it will NOT download the whole +# document. +#CheckMP3Only *.bin *.mp3 + + +# How to combine Allow, Disallow, CheckOnly, HrefOnly commands. +# +# indexer compares URLs against all these command arguments in the +# order of their appearance in the indexer.conf file. +# If indexer finds that a URL matches some rule, it decides what +# to do with this URL: allow it, disallow it, or use HEAD instead +# of the GET method. So you may put the Allow, Disallow, +# CheckOnly and HrefOnly commands in whatever order suits your needs.
+# If none of these commands is given, mnoGoSearch will allow everything +# by default. +# +# There are many possible combinations. Samples of two of them are here: +# +# Sample of the first useful combination. +# Disallow known non-text extensions (zip, wav, etc.), +# then allow everything else. This sample is uncommented above (note that +# there is actually no "Allow *" command; it is added automatically after +# indexer.conf is loaded). +# +# Sample of the second combination. +# Allow some known text extensions (html, txt) and directory index ( / ), +# then disallow everything else: +# +#Allow .html .txt */ +#Disallow * + +# HoldBadHrefs