About SeekQuarry/Yioop

SeekQuarry, LLC is the company responsible for the Yioop PHP Search Engine project. SeekQuarry is owned by, and Yioop was mainly written by, me, Chris Pollett. Development of Yioop began in November 2009, and the software was first publicly released in August 2010. SeekQuarry maintains the documentation and official public code repository for Yioop. It is also responsible for the SeekQuarry and Yioop servers. SeekQuarry, LLC receives revenue via contributions from people interested in the continued development of the Yioop Search Engine Software and in the documentary resources the SeekQuarry website provides.

The Yioop and SeekQuarry Names

When looking for names for my search engine, I originally considered SeekQuarry, whose domain name hadn't been registered. After deciding to use Yioop as the name of my search engine site, I decided to use SeekQuarry as the site that publishes the software used in the Yioop engine. That is, yioop.com is a live site that demonstrates the open-source search engine software distributed on the seekquarry.com site.

The name Yioop has the following history: I was looking for names that hadn't already been registered. My wife is Vietnamese, so I thought I might have better luck with Vietnamese words, since all the English ones seemed to have been taken. I started with the word giup, which is how 'help' is spelled in Vietnamese if you remove the accents. It was already taken. Then I tried yoop, which is my lame way of writing how giup sounds in English. It was already taken. So then I combined the two to get Yioop.

Dictionary Data

The Bloom filter for Chinese word segmentation was developed using the word list at http://www.mdbg.net/chindict/chindict.php?page=cedict, which is available under a Creative Commons Share Alike 3.0 Unported License. Tries for word suggestion for all languages other than Vietnamese were built using the Wiktionary Frequency Lists. These are also available under a Creative Commons Share Alike 3.0 Unported License, as described on Wikipedia's Download page. The derived data files for a language with IANA tag locale-tag (if they were created for that language) can be found in the locale/locale-tag/resources folder of the Yioop project. These derived files are distributed under the same license. For Vietnamese, I used the Vietnamese Word List obtained with permission from Ho Ngoc Duc.

The English part-of-speech tagging algorithm in Yioop was originally coded by Shailesh Padave using, with permission, the article on Brill tagging by Ian Barber. This kind of tagging in turn makes use of the Brown Corpus. The Hindi variant of Brill tagging in Yioop was originally coded by Salil Shenoy. Part-of-speech tagging is used in Yioop only if thesaurus lookup is being used in final result reordering. In this case, WordNet is used to generate the thesaurus.

Additional Credits

Several people helped with localization: Mary Pollett, Jonathan Ben-David, Ismail.B, Andrea Brunetti, Thanh Bui, Sujata Dongre, Animesh Dutta, Yashi Kamboj, Aida Khosroshahi, Youn Kim, Radha Kotipalli, Akshat Kukreti, Vijeth Patil, Chao-Hsin Shih, Ahmed Kamel Taha, and Sugi Widjaja. Several of my former students have contributed code that appears in the Yioop repository. They are: Charles Bocage, Timothy Chow, Mangesh Dahale, Ravi Dhillon, Priya Gangaraju, Yangcha Ho, Akshat Kukreti, Pooja Mishra, Harika Nakula, Sreenidhi Pundi Muralidharan, Nakul Natu, Shailesh Padave, Sarika Padmashali, Vijaya Pamidi, Aris Panagiotopoulos, Snigdha Parvatneni, Akash Patel, Niravkumar Patel, Parth Patel, Vijeth Patil, Mallika Perepa, Tarun Pepira, Eswara Rajesh Pinapala, Tamayee Potluri, Pragya Rana, Gargi Sheguri, Salil Shenoy, Forrest Sun, Shawn Tice, Pushkar Umaranikar, and Sandhya Vissapragada. In addition, several of my students have created projects based on Yioop. Information about these projects can be found on their student pages.