Class cSearchIndex
CONTENIDO API - Search Index Object
This object creates an index of an article
Create the object with $oIndex = new cSearchIndex($db); where $db is the global CONTENIDO database object.
Start indexing with $oIndex->start($idart, $aContent); where $aContent is the complete content of an article, grouped by content type. It looks like:
Array (
    [CMS_HTMLHEAD] => Array ( [1] => Herzlich Willkommen... [2] => ...auf Ihrer Website! )
    [CMS_HTML] => Array ( [1] => Die Inhalte auf dieser Website ...
The index for the keyword 'willkommen' would look like '&12=1(CMS_HTMLHEAD-1)', which means that the keyword 'willkommen' occurs once in the article with articleId 12, in content type CMS_HTMLHEAD[1].
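To make the index string format concrete, here is a minimal sketch of a parser for it. The function name parseIndexString is hypothetical and not part of cSearchIndex, and the sketch assumes that entries for several articles are simply concatenated, each starting with '&'.

```php
<?php
/**
 * Hypothetical helper (not part of cSearchIndex): splits an index string
 * such as "&12=2(CMS_HTMLHEAD-1,CMS_HTML-1)" into a map keyed by article id.
 * Assumption: entries for several articles are concatenated, each led by "&".
 */
function parseIndexString(string $indexString): array
{
    $result = [];
    // One match per "&<idart>=<count>(<type>-<typeid>,...)" entry.
    preg_match_all('/&(\d+)=(\d+)\(([^)]*)\)/', $indexString, $matches, PREG_SET_ORDER);
    foreach ($matches as $match) {
        $places = [];
        foreach (explode(',', $match[3]) as $place) {
            $pos = strrpos($place, '-');
            if ($place === '' || $pos === false) {
                continue; // skip malformed or empty place entries
            }
            // "CMS_HTMLHEAD-1" becomes type "CMS_HTMLHEAD" with type id 1.
            $places[] = [
                'type'   => substr($place, 0, $pos),
                'typeId' => (int) substr($place, $pos + 1),
            ];
        }
        $result[(int) $match[1]] = [
            'count'  => (int) $match[2],
            'places' => $places,
        ];
    }
    return $result;
}
```

Calling parseIndexString('&12=1(CMS_HTMLHEAD-1)') yields one entry for article 12 with a count of 1 and a single place, CMS_HTMLHEAD with type id 1.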
TODO: The basic idea of the indexing process is to take the complete content of an article, generate normalized index terms from it, and store a specific index structure in the relation 'con_keywords'.
Taking the complete content is not very flexible. It would be better to differentiate by specific content types or by any content.
The string separated by &, =, () and - is not easy to parse when computing the search result set. It would be a better idea (and a lot of work) to extend the relation 'con_keywords' to store keywords by articleId (or content source identifier) and content type.
The functions removeSpecialChars, setStopwords, setContentTypes and setCmsOptions should be moved into a new helper class.
Keep in mind that the classes Search and SearchResult use an instance of object Index.
- cSearchBaseAbstract
- cSearchIndex
Copyright: four for business AG <www.4fb.de>
License: http://www.contenido.org/license/LIZENZ.txt
Author: Willi Man
Located at classes/search/class.search.index.php
public createKeywords( )
Creates the index structure for each CMS type. It looks like Array ( [die] => CMS_HTML-1 [inhalte] => CMS_HTML-1 [auf] => CMS_HTML-1 CMS_HTMLHEAD-2 [dieser] => CMS_HTML-1 [website] => CMS_HTML-1 CMS_HTML-1 CMS_HTMLHEAD-2 )
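As an illustration of how a structure of this shape can be derived from the article content, here is a minimal sketch. The function buildKeywordStructure is hypothetical and the normalization (lowercasing, splitting on non-alphanumeric characters) is an assumption; the real class uses its own normalization and may process content types in a different order.

```php
<?php
/**
 * Hypothetical sketch (not the cSearchIndex implementation): maps each
 * keyword to a space-separated list of "<TYPE>-<typeid>" places, adding
 * one place per occurrence.
 */
function buildKeywordStructure(array $content, array $stopwords = []): array
{
    $keywords = [];
    foreach ($content as $type => $entries) {
        foreach ($entries as $typeId => $text) {
            $text = strip_tags((string) $text);
            // Assumed normalization: lowercase, multibyte-aware when available.
            $text = function_exists('mb_strtolower') ? mb_strtolower($text) : strtolower($text);
            // Split on every run of characters that are neither letters nor digits.
            $words = preg_split('/[^\p{L}\p{N}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY) ?: [];
            foreach ($words as $word) {
                if (in_array($word, $stopwords, true)) {
                    continue; // stopwords are not indexed
                }
                $place = $type . '-' . $typeId;
                $keywords[$word] = isset($keywords[$word])
                    ? $keywords[$word] . ' ' . $place
                    : $place;
            }
        }
    }
    return $keywords;
}
```

A keyword that occurs in several places, or several times in the same place, accumulates one place token per occurrence, which is what makes the occurrence count recoverable later.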
public saveKeywords( )
Generates the index_string from the index structure and saves the keywords. The index_string looks like "&12=2(CMS_HTMLHEAD-1,CMS_HTML-1)".
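The step from one entry of the index structure to an index_string can be sketched as follows. The function buildIndexString is hypothetical. The sketch assumes that the number is the total count of occurrences and that each place is listed only once inside the parentheses.

```php
<?php
/**
 * Hypothetical sketch (not the cSearchIndex implementation): builds an
 * index string such as "&12=2(CMS_HTMLHEAD-1,CMS_HTML-1)" from an article
 * id and a space-separated list of places.
 */
function buildIndexString(int $idart, string $places): string
{
    // One token per occurrence, e.g. ["CMS_HTMLHEAD-1", "CMS_HTML-1"].
    $occurrences = preg_split('/\s+/', trim($places), -1, PREG_SPLIT_NO_EMPTY) ?: [];
    $count = count($occurrences);
    // Assumption: each place appears only once in the parenthesized list.
    $types = implode(',', array_unique($occurrences));

    return '&' . $idart . '=' . $count . '(' . $types . ')';
}
```

With these assumptions, buildIndexString(12, 'CMS_HTMLHEAD-1 CMS_HTML-1') returns the string quoted in the description, "&12=2(CMS_HTMLHEAD-1,CMS_HTML-1)".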
public deleteKeywords( )
If keywords no longer occur in the article, updates the index_string and deletes the keyword if necessary.
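One way to picture this update is the following minimal sketch. The function removeArticleFromIndexString is hypothetical. It assumes that a stored index_string may hold entries for several articles and that a keyword whose index_string becomes empty can be deleted.

```php
<?php
/**
 * Hypothetical sketch (not the cSearchIndex implementation): removes the
 * entry of one article from a stored index string. An empty result means
 * that no article references the keyword any more.
 */
function removeArticleFromIndexString(string $indexString, int $idart): string
{
    // Matches exactly "&<idart>=<count>(...)"; "&112=..." is not touched
    // when $idart is 12, because "&" must directly precede the id.
    $pattern = '/&' . $idart . '=\d+\([^)]*\)/';

    return preg_replace($pattern, '', $indexString) ?? $indexString;
}

$remaining = removeArticleFromIndexString('&12=1(CMS_HTMLHEAD-1)&15=2(CMS_HTML-1)', 12);
// $remaining now only holds the entry for article 15.
$deleteKeyword = ($remaining === ''); // false here, the keyword is still in use
```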
public setCmsOptions( mixed $cms_options )
Sets the cms_options array of CMS types that should be treated specially.
public boolean checkCmsType( string $idtype )
Checks whether the requested content type should be indexed. Returns false if it should be indexed and true if it should not.
_debug()
protected array $_keycode = array()
The content of the CMS types of an article.

protected array $_keywords = array()
The list of keywords of an article.

protected array $_stopwords = array()
The words that should not be indexed.

protected array $_keywordsOld = array()
The keywords of an article stored in the DB.

protected array $_keywordsDel = array()
The keywords to be deleted.

protected string $_place
'auto' or 'self'. The field 'auto' in table con_keywords is used for automatic indexing. Its value is a string like "&12=2(CMS_HTMLHEAD-1,CMS_HTML-1)", which means that a keyword occurs 2 times in the article with $idart 12 and can be found in CMS_HTMLHEAD[1] and CMS_HTML[1]. The field 'self' can be used in the article properties to index the article manually.

protected array $_cmsOptions = array()
Array of CMS types.

protected array $_cmsType = array()
Array of all available CMS types.

protected array $_cmsTypeSuffix = array()
The suffix of all available CMS types.

protected integer $idart

$cfg, $client, $lang, $oDB