SeekQuarry

About SeekQuarry/Yioop

SeekQuarry, LLC is the company responsible for the Yioop PHP Search Engine project. SeekQuarry is owned by, and Yioop was mainly written by, me, Chris Pollett. Development of Yioop began in November 2009, and it was first publicly released in August 2010. SeekQuarry maintains the documentation and official public code repository for Yioop. It is also responsible for the SeekQuarry and Yioop servers. SeekQuarry, LLC receives revenue from consulting services related to Yioop and from contributions by people interested in the continued development of the Yioop Search Engine Software and in the documentation resources the SeekQuarry website provides.

The Yioop and SeekQuarry Names

When looking for names for my search engine, I originally thought about using the name SeekQuarry, whose domain name hadn't been registered. After deciding to use Yioop as the name of my search engine site, I decided to use SeekQuarry as the site that publishes the software used in the Yioop engine. That is, yioop.com is a live site that demonstrates the open source search engine software distributed on the seekquarry.com site.

The name Yioop has the following history: I was looking for names that hadn't already been registered. My wife is Vietnamese, so I thought I might have better luck with Vietnamese words, since all the English ones seemed to have been taken. I started with the word giup, which is how 'help' is spelled in Vietnamese if you remove the accents. It was already taken. Then I tried yoop, which is my lame way of rendering how giup sounds in English. It was already taken. So then I combined the two to get Yioop.

Dictionary Data

The Bloom filter for Chinese word segmentation was developed using the word list http://www.mdbg.net/chindict/chindict.php?page=cedict which has a Creative Commons Share Alike 3.0 Unported License. Tries for word suggestion for all languages other than Vietnamese were built using the Wiktionary Frequency Lists. These are also available under a Creative Commons Share Alike 3.0 Unported License as described on Wikipedia's Download page. For a language with IANA tag locale-tag, the derived data files (if any were created for that language) can be found in the locale/locale-tag/resources folder of the Yioop project. These derived files are licensed under the same license. For Vietnamese, I used the Vietnamese Word List obtained with permission from Ho Ngoc Duc.

Additional Credits

Several people helped with localization: Mary Pollett, Jonathan Ben-David, Ismail.B, Andrea Brunetti, Thanh Bui, Sujata Dongre, Animesh Dutta, Aida Khosroshahi, Youn Kim, Akshat Kukreti, Vijeth Patil, Chao-Hsin Shih, Ahmed Kamel Taha, and Sugi Widjaja. Thanks to Ravi Dhillon, Akshat Kukreti, Tanmayee Potluri, Shawn Tice, and Sandhya Vissapragada for creating patches for Yioop issues.

Several of my master's students have done projects related to Yioop: Amith Chandranna, Priya Gangaraju, Akshat Kukreti, Vijaya Pamidi, Vijeth Patil, Vijaya Sinha, Tarun Pepira, Tanmayee Potluri, Shawn Tice, and Sandhya Vissapragada. Amith's code is related to an online version of the HITS algorithm. It is not currently in the main branch of Yioop, but it is obtainable from Amith Chandranna's student page. Vijaya Pamidi developed a Firefox web traffic extension for Yioop. Her code is also obtainable from Vijaya Pamidi's master's pages. Her project was later extended by Tarun Ramaswamy. Neither of these projects is currently in the main Yioop repository. Vijeth Patil's project involved adding support for Twitter and RSS feeds so that real-time search results could supplement the standard search results. This is not currently in the main repository. Tanmayee Potluri's project added log and database archive iterators for Yioop. It is currently not in the main branch. Vijaya Sinha's project concerned using Open Street Map data in Yioop. This code is also not currently in the main branch.

Priya Gangaraju's code served as the basis for the plugin feature currently in Yioop. Shawn Tice's CS288 project served as the basis of a rewrite of the archive crawl feature of Yioop for the multi-queue server setting. His CS298 project served as the basis for the classifier feature within Yioop. Sandhya Vissapragada's master's project served as the basis for the autosuggest and spell checking functionality in Yioop.

The following other students have created text processors for Yioop: Nakul Natu (pptx), Vijeth Patil (epub), and Tarun Ramaswamy (xlsx). Akshat Kukreti created the Italian language stemmer based on the Snowball version at http://tartarus.org/. His CS298 project served as the basis for the cache history feature and for ETag support while crawling in Yioop.