Class cSearchIndex
CONTENIDO API - Search Index Object.
This object creates an index of an article.
Create object where $db is the global CONTENIDO database object.
$oIndex = new cSearchIndex($db);
Start indexing where $aContent is the complete content of an article specified by its content types.
$oIndex->start($idart, $aContent);
$aContent looks like: Array ( [CMS_HTMLHEAD] => Array ( [1] => Herzlich Willkommen... [2] => ...auf Ihrer Website! ) [CMS_HTML] => Array ( [1] => Die Inhalte auf dieser Website ...
The index for the keyword 'willkommen' would look like '&12=1(CMS_HTMLHEAD-1)', which means that 'willkommen' occurs once in the article with articleId 12, in content type CMS_HTMLHEAD[1].
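To make the index string format concrete, here is a minimal sketch of how such a string could be split into its parts. The function name parseIndexString is hypothetical and not part of the CONTENIDO API. The sketch assumes that every entry has the form &idart=count(TYPE-typeid,...) and that PHP 7 or later is used.

```php
<?php
// Hypothetical helper, not part of CONTENIDO: splits an index string such as
// "&12=1(CMS_HTMLHEAD-1)" into article ID => occurrence count and places.
function parseIndexString(string $indexString): array
{
    $result = [];
    // Assumed entry format: "&<idart>=<count>(<TYPE>-<typeid>,...)".
    preg_match_all('/&(\d+)=(\d+)\(([^)]*)\)/', $indexString, $matches, PREG_SET_ORDER);
    foreach ($matches as $match) {
        $result[(int) $match[1]] = [
            'count'  => (int) $match[2],
            'places' => $match[3] === '' ? [] : explode(',', $match[3]),
        ];
    }
    return $result;
}

$parsed = parseIndexString('&12=1(CMS_HTMLHEAD-1)');
print_r($parsed);
```

For the example string, the result holds one entry for article 12 with a count of 1 and the single place CMS_HTMLHEAD-1.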
TODO: The basic idea of the indexing process is to take the complete content of an article, generate normalized index terms from it, and store a specific index structure in the relation 'con_keywords'.
Taking the complete content is not very flexible. It would be better to differentiate by specific content types or by any content.
The string separated by &, =, () and - is not easy to parse when computing the search result set.
A better idea (and a lot of work) would be to extend the relation 'con_keywords' to store keywords by articleId (or content source identifier) and content type.
The functions removeSpecialChars, setStopwords, setContentTypes and setCmsOptions should be moved into a new helper class.
Keep in mind that the classes Search and SearchResult use an instance of the Index object.
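The basic indexing idea described above (take the content, normalize it, collect index terms) can be sketched roughly as follows. This is a simplified illustration and not the actual cSearchIndex code: the function name collectKeywords, the normalization rules (strip tags, lowercase, split on non-alphanumeric characters) and the stopword list are all assumptions. PHP 7 or later is assumed.

```php
<?php
// Simplified sketch of the indexing idea; not the actual cSearchIndex code.
// $aContent maps content type => [typeid => text], as passed to start().
function collectKeywords(array $aContent, array $stopwords = []): array
{
    $keywords = [];
    foreach ($aContent as $type => $entries) {
        foreach ($entries as $typeId => $text) {
            // Assumed normalization: strip tags, lowercase, split on non-alphanumerics.
            $text = strip_tags($text);
            $text = function_exists('mb_strtolower')
                ? mb_strtolower($text, 'UTF-8')
                : strtolower($text);
            $words = preg_split('/[^\p{L}\p{N}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
            foreach ($words as $word) {
                if (in_array($word, $stopwords, true)) {
                    continue; // stopwords are not indexed
                }
                // Count each occurrence and remember where it was found.
                $keywords[$word]['count'] = ($keywords[$word]['count'] ?? 0) + 1;
                $keywords[$word]['places'][$type . '-' . $typeId] = true;
            }
        }
    }
    return $keywords;
}

$aContent = [
    'CMS_HTMLHEAD' => [1 => 'Herzlich Willkommen...', 2 => '...auf Ihrer Website!'],
];
$keywords = collectKeywords($aContent, ['auf']);
print_r(array_keys($keywords));
```

With this input, 'willkommen' is recorded once at the place CMS_HTMLHEAD-1, which corresponds to the index entry '&12=1(CMS_HTMLHEAD-1)' for an article with articleId 12.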
- cSearchBaseAbstract
- cSearchIndex
Copyright: four for business AG <www.4fb.de>
License: http://www.contenido.org/license/LIZENZ.txt
Author: Willi Man
Located at classes/search/class.search.index.php
public saveKeywords( )
Generate index_string from index structure and save keywords. The index_string looks like "&12=2(CMS_HTMLHEAD-1,CMS_HTML-1)".
public deleteKeywords( )
If keywords no longer occur in the article, update index_string and delete the keyword if necessary.
public setCmsOptions( mixed $cms_options )
Set the cms_options array of cms types which should be treated specially.
public boolean checkCmsType( string $idtype )
Check whether the requested content type should be indexed. Returns false if it should be indexed and true if it should not.
_debug()
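The index_string handling described for saveKeywords() and deleteKeywords() can be illustrated with a small sketch. Both helper functions below are hypothetical; they only show the string format and leave out all database access. The sketch assumes that the count in an entry is the sum of the occurrences over all listed places, and that PHP 7 or later is used.

```php
<?php
// Hypothetical helpers; they only illustrate the index_string format and are
// not the actual saveKeywords()/deleteKeywords() implementations.

// Builds one entry; $places maps "TYPE-typeid" => occurrences at that place.
function buildIndexEntry(int $idart, array $places): string
{
    // Assumption: the count is the sum of the occurrences over all places.
    return '&' . $idart . '=' . array_sum($places)
        . '(' . implode(',', array_keys($places)) . ')';
}

// Removes the entry of one article from a keyword's index_string.
function removeArticleFromIndex(string $indexString, int $idart): string
{
    return preg_replace('/&' . $idart . '=\d+\([^)]*\)/', '', $indexString);
}

$entry = buildIndexEntry(12, ['CMS_HTMLHEAD-1' => 1, 'CMS_HTML-1' => 1]);
echo $entry, "\n"; // &12=2(CMS_HTMLHEAD-1,CMS_HTML-1)

$indexString = $entry . buildIndexEntry(7, ['CMS_HTML-2' => 3]);
$remaining = removeArticleFromIndex($indexString, 12);
echo $remaining, "\n"; // &7=3(CMS_HTML-2)
```

If the string is empty after an article's entry has been removed, no article uses the keyword anymore; in this sketch that would be the case in which the keyword itself could be deleted.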
protected array $_keycode = array()
content of the cms-types of an article
protected array $_keywords = array()
list of keywords of an article
protected array $_stopwords = array()
words, which should not be indexed
protected array $_keywordsOld = array()
keywords of an article stored in the DB
protected array $_keywordsDel = array()
keywords to be deleted
protected string $_place
'auto' or 'self'
protected array $_cmsOptions = array()
array of cms types
protected array $_cmsType = array()
array of all available cms types
protected array $_cmsTypeSuffix = array()
suffix of all available cms types
protected integer $idart
$cfg, $client, $lang, $oDB
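The return convention of checkCmsType() is easy to misread, because false means "index this type". The following sketch shows how a caller might use such a check. The function shouldSkipCmsType is a hypothetical stand-in, and the skip list as well as its derivation from the cms options are assumptions, not the actual CONTENIDO logic. PHP 7 or later is assumed.

```php
<?php
// Hypothetical stand-in for checkCmsType(): returns true when a content type
// should be skipped and false when it should be indexed (the documented,
// inverted convention). The skip list itself is an assumption.
function shouldSkipCmsType(string $idtype, array $skippedTypes): bool
{
    return in_array(strtoupper($idtype), $skippedTypes, true);
}

$skippedTypes = ['CMS_IMG', 'CMS_LINK']; // example values only
$toIndex = [];
foreach (['CMS_HTMLHEAD', 'CMS_IMG', 'cms_html'] as $idtype) {
    if (!shouldSkipCmsType($idtype, $skippedTypes)) {
        $toIndex[] = $idtype; // false means: index this type
    }
}
print_r($toIndex);
```

Here only CMS_HTMLHEAD and cms_html end up in $toIndex, because the check returns true for CMS_IMG.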