Filename | |
---|---|
en-US/pages/documentation.thtml |
diff --git a/en-US/pages/documentation.thtml b/en-US/pages/documentation.thtml
index 7609026..9064e25 100755
--- a/en-US/pages/documentation.thtml
+++ b/en-US/pages/documentation.thtml
@@ -73,6 +73,17 @@
 there, you can check the Yioop <a href="#requirements">Requirements</a> section followed by the more general <a href="#installation">Installation and Configuration</a> instructions.
+ <a href="http://www.yioop.com/">Yioop.com</a>, the demo site of Yioop allows
+ people to register accounts. This would give you an account with User
+ rather than Admin access, but it would allow you to experiment with
+ some of the features of Yioop beyond search such as
+ Yioop Groups, Wikis, and Crawl Mixes without needing to install the
+ software yourself. The <a href="#search-interface">Search Interface</a>,
+ <a href="#userrolegroups">Managing Users, Roles, and Groups</a>,
+ <a href="#feeds-wikis">Feeds and Wikis</a>, and
+ <a href="#mixes">Mixing Crawl Indexes</a> sections below could serve
+ as a guide to testing the portion of the site general users have access to
+ on Yioop.com.
 </p> <h3 id="intro">Introduction</h3> <p>The Yioop search engine is designed to allow users
@@ -84,14 +95,16 @@
 control over the exact sites which are being indexed with Yioop, you have much better control over the kinds of results that a search will return. Yioop provides a traditional web interface to do queries, an rss api,
- and a function api. In this section we discuss some of the different
+ and a function api. It also supports many common features of a search
+ portal such as user discussion group, blogs, wikis, and a news aggregator.
+ In this section we discuss some of the different
 search engine technologies which exist today, how Yioop fits into this eco-system, and when Yioop might be the right choice for your search engine needs.
 In the remainder of this document after the introduction, we discuss how to get and install Yioop; the files and folders used
- in Yioop; the various crawl, search and administration facilities in the
- Yioop; localization in the Yioop system; building a site using the Yioop
- framework; embedding Yioop in an existing web-site;
+ in Yioop; the various crawl, search, social portal, and administration
+ facilities in the Yioop; localization in the Yioop system; building a site
+ using the Yioop framework; embedding Yioop in an existing web-site;
 customizing Yioop; and the Yioop command-line tools. </p> <p>Since the mid-1990s a wide variety of search engine technologies
@@ -131,7 +144,8 @@
 <a href="http://www.opensearch.org">Open Search RSS results</a> or a JSON variant. These can be used to embed Yioop within your existing site. If you want to create a new search engine site, Yioop offers a web-based,
- model-view-controller framework with a web-interface for localization
+ model-view-adapter (a variation on model-view-controller) framework w
+ ith a web-interface for localization
 that can serve as the basis for your app. </p> <p>
@@ -329,7 +343,7 @@
 content of each downloaded url. Amongst urls with the same hash only the one that is linked to the most will be returned after grouping. Finally, if a user wants to do more sophisticated post-processing such as clustering
- or computing page, Yioop supports a straightforward architecture
+ or computing page rank, Yioop supports a straightforward architecture
 for indexing plugins. </p> <p>
@@ -430,13 +444,35 @@
 <li>Using web archives, crawls can be mirrored amongst several machines to speed-up serving search results. This can be further sped-up by using memcache or filecache.</li>
- <li>Yioop comes with its own extendable model-view-controller
+ <li>Yioop comes with its own extendable model-view-adapter
 framework that you can use directly to create new sites that use Yioop search technology.
 This framework also comes with a GUI which makes it easy to localize strings and static pages.</li>
+ <li>Yioop has been optimized to work well with smart phone web browsers
+ and with tablet devices.</li>
 </ul> </li>
- <li><b>Search and User Interface</b>
+ <li><b>Social and User Interface</b>
+ <ul>
+ <li>Yioop can be configured to allow or not to allow users to
+ register for accounts.
+ </li>
+ <li>If allowed, user accounts can create discussion groups, blogs, and
+ wikis.
+ </li>
+ <li>Users can share their own mixes of crawls that exist in the Yioop
+ system.</li>
+ <li>If user accounts are enabled, Yioop has a search tools page on which
+ people can suggest urls to crawl.</li>
+ <li>Yioop has three different captcha'ing mechanisms that can be
+ used in account registration and for suggest urls: a standard graphics
+ based captch, a text-based captcha, and a hash-cash-like catpha.</li>
+ <li>Password authentication can be configured to either use a
+ standard password hash based system, or make use of Fiat Shamir
+ zero-knowledge authentication.</li>
+ </ul>
+ </li>
+ <li><b>Search</b>
 <ul> <li>Yioop supports subsearches geared towards presenting certain kinds of media such as images, video, and news. The list of video and
@@ -446,9 +482,9 @@
 can be combined like a simplified relational algebra.</li> <li>Yioop can be configured to display word suggestions as a user types a query. It can also suggest spell corrections for mis-typed
- queries. This feature can be localized</li>
- <li>Yioop has been optimized to work well with smart phone web browsers
- and with tablet devices.</li>
+ queries. This feature can be localized.</li>
+ <li>Yioop can also make use of a thesaurus facility such as provided
+ by WordNet to suggest related queries.</li>
 <li>Yioop supports the ability to filter out urls from search results after a crawl has been performed.
 It also has the ability to edit summary information that will be displayed for urls.</li>
@@ -466,9 +502,9 @@
 <li>Yioop is capable of indexing small sites to sites or collections of sites containing low hundreds of millions of documents.</li>
- <li>For indexes starting with v0.96, Yioop uses a hybrid inverted
+ <li>Yioop uses a hybrid inverted
 index/suffix tree approach for word lookup to make multi-word
- queries faster.</li>
+ queries faster on disk bound machines.</li>
 <li>Yioop indexes are positional rather than bag of word indexes, and a index compression scheme called Modified9 is used.</li>
@@ -483,11 +519,15 @@
 also attempting to extract information from unknown filetypes.</li> <li>Yioop has a simple page rule language for controlling what content should be extracted from a page or record.</li>
+ <li>Yioop has two different kinds of text summarizers which can be used
+ to further affect what words are index: a basic web
+ page scraper, and a centroid algorithm summarizer. The latter can be
+ used to generate word clouds of crawled documents.</li>
 <li>Indexing occurs as crawling happens, so when a crawl is stopped, it is ready to be used to handle search queries immediately.</li> <li>Yioop Indexes can be used to create classifiers which then can be used in labeling and ranking future indexes.</li>
- <li>Yioop come with a stemmer for English and Italian, and a
+ <li>Yioop comes with a stemmer for English and Italian, and a
 word segmenter for Chinese. It uses char-gramming for other languages. Yioop has a simple architecture for adding stemmers for other languages. </li>
@@ -518,6 +558,7 @@
 attributes. It also supports X-Robots-Tag HTTP headers.</li> <li>Yioop has its own DNS caching mechanism.</li> <li>Yioop supports crawling TOR networks (.onion urls).</li>
+ <li>Yioop supports crawling through a list of proxy servers.</li>
 <li>Yioop supports crawl quotas for web sites.
I.e., one can control the number of urls/hour downloaded from a site.</li> <li>Yioop can detect website congestion and slow down crawling