Yioop V9.5 Source Code Documentation

AddressesPlugin extends IndexingPlugin implements CrawlConstants

Used to extract emails, phone numbers, and addresses from a web page.

These are extracted into the EMAILS, PHONE_NUMBERS, and ADDRESSES fields of the page's summary.

Tags
author

Chris Pollett

Interfaces, Classes, Traits and Enums

CrawlConstants
Shared constants and enums used by components that are involved in the crawling process

Table of Contents

$countries  : array<string|int, mixed>
Associative array mapping world country names to country codes. Some entries are duplicated in the country's local script
$db  : object
Reference to a database object that might be used by models on this plugin
$index_archive  : object
The IndexArchiveBundle object that this indexing plugin might make changes to in its postProcessing method
$regions  : array<string|int, mixed>
List of common regions, abbreviations, and local spellings of regions of the US, Canada, Australia, UK, as well as major cities elsewhere
__construct()  : mixed
Builds an IndexingPlugin object. Loads in the appropriate models for the given plugin object
checkCandidate()  : mixed
Checks if the passed sequence of lines has enough features of a postal address to call it an address. If so, returns the address as a single string
checkCountry()  : bool
Used to check if a line contains a word associated with a world country or country code.
checkPhoneOrEmail()  : bool
Used to check if a line contains either an email address or a phone number
checkRegion()  : bool
Used to check if a line contains a word associated with a province, state or major city.
checkStreet()  : bool
Used to check if a given line in an address candidate has features associated with being a street address.
checkZipPostalCodeWords()  : bool
Used to check if a line contains a word associated with a ZIP or Postal code
getAdditionalMetaWords()  : array<string|int, mixed>
Returns an array of additional meta words which have been added by this plugin
getProcessors()  : array<string|int, mixed>
Returns the MIME-type page processors for which this plugin should do additional processing
pageProcessing()  : array<string|int, mixed>
This method is called by a PageProcessor in its handle() method just after it has processed a web page. This method allows an indexing plugin to do additional processing on the page such as adding sub-documents, before the page summary is handed back to the fetcher.
pageSummaryProcessing()  : mixed
Adjusts the document summary of a page after the page processor's process method has been called so that the subdoc's fields associated with the addresses plugin get copied as fields of the whole page summary. Then it deletes the subdoc fields.
parseEmails()  : string
Extracts substrings from the provided $line that are in the format of an email address. Returns the first email address found in the line
parsePhones()  : array<string|int, mixed>
Checks for a phone number related keyword in the line and if found extracts digits which are presumed to be a phone number
parseSubdoc()  : array<string|int, mixed>
Parses EMAILS, PHONE_NUMBERS and ADDRESSES from $text and returns an array with these three fields containing sub-arrays of the given items
postProcessing()  : mixed
This method is called by the queue_server with the name of a completed index. This allows the indexing plugin to perform searches on the index and, using the results, inject new page/index data into the index before it becomes available for end use.

Properties

$countries

Associative array mapping world country names to country codes. Some entries are duplicated in the country's local script

public array<string|int, mixed> $countries = ["ANDORRA" => "AD", "UNITED ARAB EMIRATES" => "AE", "AFGHANISTAN" => "AF", "ANTIGUA AND BARBUDA" => "AG", "ANGUILLA" => "AI", "ALBANIA" => "AL", "ARMENIA" => "AM", "ANGOLA" => "AO", "ANTARCTICA" => "AQ", "ARGENTINA" => "AR", "AMERICAN SAMOA" => "AS", "AUSTRIA" => "AT", "AUSTRALIA" => "AU", "ARUBA" => "AW", "ÅLAND ISLANDS" => "AX", "AZERBAIJAN" => "AZ", "BOSNIA AND HERZEGOVINA" => "BA", "BARBADOS" => "BB", "BANGLADESH" => "BD", "BELGIUM" => "BE", "BURKINA FASO" => "BF", "BULGARIA" => "BG", "BAHRAIN" => "BH", "BURUNDI" => "BI", "BENIN" => "BJ", "SAINT BARTHELEMY" => "BL", "BERMUDA" => "BM", "BRUNEI DARUSSALAM" => "BN", "BOLIVIA" => "BO", "BONAIRE, SINT EUSTATIUS AND SABA" => "BQ", "BRAZIL" => "BR", "BAHAMAS" => "BS", "BHUTAN" => "BT", "BOUVET ISLAND" => "BV", "BOTSWANA" => "BW", "BELARUS" => "BY", "BELIZE" => "BZ", "CANADA" => "CA", "COCOS ISLANDS" => "CC", "DEMOCRATIC REPUBLIC OF THE CONGO" => "CD", "CENTRAL AFRICAN REPUBLIC" => "CF", "CONGO" => "CG", "SWITZERLAND" => "CH", "COTE D'IVOIRE" => "CI", "COOK ISLANDS" => "CK", "CHILE" => "CL", "CAMEROON" => "CM", "CHINA" => "CN", "中国" => "China", "COLOMBIA" => "CO", "COSTA RICA" => "CR", "CUBA" => "CU", "CAPE VERDE" => "CV", "CURACAO" => "CW", "CHRISTMAS ISLAND" => "CX", "CYPRUS" => "CY", "CZECH REPUBLIC" => "CZ", "GERMANY" => "DE", "DJIBOUTI" => "DJ", "DENMARK" => "DK", "DOMINICA" => "DM", "DOMINICAN REPUBLIC" => "DO", "ALGERIA" => "DZ", "ECUADOR" => "EC", "ESTONIA" => "EE", "EGYPT" => "EG", "WESTERN SAHARA" => "EH", "ERITREA" => "ER", "SPAIN" => "ES", "ETHIOPIA" => "ET", "FINLAND" => "FI", "FIJI" => "FJ", "FALKLAND ISLANDS (MALVINAS)" => "FK", "MICRONESIA, FEDERATED STATES OF" => "FM", "FAROE ISLANDS" => "FO", "FRANCE" => "FR", "GABON" => "GA", "UNITED KINGDOM" => "GB", "GRENADA" => "GD", "GEORGIA" => "GE", "FRENCH GUIANA" => "GF", "GUERNSEY" => "GG", "GHANA" => "GH", "GIBRALTAR" => "GI", "GREENLAND" => "GL", "GAMBIA" => "GM", "GUINEA" => "GN", "GUADELOUPE" => "GP", 
"EQUATORIAL GUINEA" => "GQ", "GREECE" => "GR", "SOUTH GEORGIA AND THE SOUTH SANDWICH ISLANDS" => "GS", "GUATEMALA" => "GT", "GUAM" => "GU", "GUINEA-BISSAU" => "GW", "GUYANA" => "GY", "HONG KONG" => "HK", "HEARD ISLAND AND MCDONALD ISLANDS" => "HM", "HONDURAS" => "HN", "CROATIA" => "HR", "HAITI" => "HT", "HUNGARY" => "HU", "INDONESIA" => "ID", "IRELAND" => "IE", "ISRAEL" => "IL", "ISLE OF MAN" => "IM", "INDIA" => "IN", "BRITISH INDIAN OCEAN TERRITORY" => "IO", "IRAQ" => "IQ", "IRAN" => "IR", "ICELAND" => "IS", "ITALY" => "IT", "JERSEY" => "JE", "JAMAICA" => "JM", "JORDAN" => "JO", "JAPAN" => "JP", "日本" => "JA", "KENYA" => "KE", "KYRGYZSTAN" => "KG", "CAMBODIA" => "KH", "KIRIBATI" => "KI", "COMOROS" => "KM", "SAINT KITTS AND NEVIS" => "KN", "NORTH KOREA" => "KP", "SOUTH KOREA" => "KR", "한국" => "KR", "KUWAIT" => "KW", "CAYMAN ISLANDS" => "KY", "KAZAKHSTAN" => "KZ", "LAOS" => "LA", "LEBANON" => "LB", "SAINT LUCIA" => "LC", "LIECHTENSTEIN" => "LI", "SRI LANKA" => "LK", "LIBERIA" => "LR", "LESOTHO" => "LS", "LITHUANIA" => "LT", "LUXEMBOURG" => "LU", "LATVIA" => "LV", "LIBYA" => "LY", "MOROCCO" => "MA", "MONACO" => "MC", "MOLDOVA, REPUBLIC OF" => "MD", "MONTENEGRO" => "ME", "SAINT MARTIN" => "MF", "MADAGASCAR" => "MG", "MARSHALL ISLANDS" => "MH", "MACEDONIA, THE FORMER YUGOSLAV REPUBLIC OF" => "MK", "MALI" => "ML", "MYANMAR" => "MM", "MONGOLIA" => "MN", "MACAO" => "MO", "NORTHERN MARIANA ISLANDS" => "MP", "MARTINIQUE" => "MQ", "MAURITANIA" => "MR", "MONTSERRAT" => "MS", "MALTA" => "MT", "MAURITIUS" => "MU", "MALDIVES" => "MV", "MALAWI" => "MW", "MEXICO" => "MX", "MALAYSIA" => "MY", "MOZAMBIQUE" => "MZ", "NAMIBIA" => "NA", "NEW CALEDONIA" => "NC", "NIGER" => "NE", "NORFOLK ISLAND" => "NF", "NIGERIA" => "NG", "NICARAGUA" => "NI", "NETHERLANDS" => "NL", "NORWAY" => "NO", "NEPAL" => "NP", "NAURU" => "NR", "NIUE" => "NU", "NEW ZEALAND" => "NZ", "OMAN" => "OM", "PANAMA" => "PA", "PERU" => "PE", "FRENCH POLYNESIA" => "PF", "PAPUA NEW GUINEA" => "PG", "PHILIPPINES" => "PH", 
"PAKISTAN" => "PK", "POLAND" => "PL", "SAINT PIERRE AND MIQUELON" => "PM", "PITCAIRN" => "PN", "PUERTO RICO" => "PR", "PALESTINE, STATE OF" => "PS", "PORTUGAL" => "PT", "PALAU" => "PW", "PARAGUAY" => "PY", "QATAR" => "QA", "REUNION" => "RE", "ROMANIA" => "RO", "SERBIA" => "RS", "RUSSIA" => "RU", "Россия" => "RU", "RWANDA" => "RW", "SAUDI ARABIA" => "SA", "SOLOMON ISLANDS" => "SB", "SEYCHELLES" => "SC", "SUDAN" => "SD", "SWEDEN" => "SE", "SINGAPORE" => "SG", "SAINT HELENA, ASCENSION AND TRISTAN DA CUNHA" => "SH", "SLOVENIA" => "SI", "SVALBARD AND JAN MAYEN" => "SJ", "SLOVAKIA" => "SK", "SIERRA LEONE" => "SL", "SAN MARINO" => "SM", "SENEGAL" => "SN", "SOMALIA" => "SO", "SURINAME" => "SR", "SOUTH SUDAN" => "SS", "SAO TOME AND PRINCIPE" => "ST", "EL SALVADOR" => "SV", "SINT MAARTEN" => "SX", "SYRIAN ARAB REPUBLIC" => "SY", "SWAZILAND" => "SZ", "TURKS AND CAICOS ISLANDS" => "TC", "CHAD" => "TD", "FRENCH SOUTHERN TERRITORIES" => "TF", "TOGO" => "TG", "THAILAND" => "TH", "TAJIKISTAN" => "TJ", "TOKELAU" => "TK", "TIMOR-LESTE" => "TL", "TURKMENISTAN" => "TM", "TUNISIA" => "TN", "TONGA" => "TO", "TURKEY" => "TR", "TRINIDAD AND TOBAGO" => "TT", "TUVALU" => "TV", "TAIWAN" => "TW", "臺灣" => "TW", "TANZANIA, UNITED REPUBLIC OF" => "TZ", "UKRAINE" => "UA", "UGANDA" => "UG", "UNITED STATES MINOR OUTLYING ISLANDS" => "UM", "UNITED STATES OF AMERICA" => "USA", "UNITED STATES" => "US", "URUGUAY" => "UY", "UZBEKISTAN" => "UZ", "VATICAN CITY" => "VA", "SAINT VINCENT AND THE GRENADINES" => "VC", "VENEZUELA, BOLIVARIAN REPUBLIC OF" => "VE", "BRITISH VIRGIN ISLANDS" => "VG", "U.S. VIRGIN ISLANDS" => "VI", "VIETNAM" => "VN", "VANUATU" => "VU", "WALLIS AND FUTUNA" => "WF", "SAMOA" => "WS", "YEMEN" => "YE", "MAYOTTE" => "YT", "SOUTH AFRICA" => "ZA", "ZAMBIA" => "ZM", "ZIMBABWE" => "ZW"]

$db

Reference to a database object that might be used by models on this plugin

public object $db

$index_archive

The IndexArchiveBundle object that this indexing plugin might make changes to in its postProcessing method

public object $index_archive

$regions

List of common regions, abbreviations, and local spellings of regions of the US, Canada, Australia, UK, as well as major cities elsewhere

public array<string|int, mixed> $regions = ["ALABAMA", "AL", "ALASKA", "AK", "ARIZONA", "AZ", "ARKANSAS", "AR", "CALIFORNIA", "CA", "COLORADO", "CO", "CONNECTICUT", "CT", "DELAWARE", "DE", "FLORIDA", "FL", "GEORGIA", "GA", "HAWAII", "HI", "IDAHO", "ID", "ILLINOIS", "IL", "INDIANA", "IN", "IOWA", "IA", "KANSAS", "KS", "KENTUCKY", "KY", "LOUISIANA", "LA", "MAINE", "ME", "MARYLAND", "MD", "MASSACHUSETTS", "MA", "MICHIGAN", "MI", "MINNESOTA", "MN", "MISSISSIPPI", "MS", "MISSOURI", "MO", "MONTANA", "MT", "NEBRASKA", "NE", "NEVADA", "NV", "HAMPSHIRE", "NH", "NEW JERSEY", "NJ", "MEXICO", "NM", "NEW YORK", "NY", "NC", "NORTH DAKOTA", "ND", "OHIO", "OH", "OKLAHOMA", "OK", "OREGON", "OR", "PENNSYLVANIA", "PA", "RHODE", "RI", "CAROLINA", "SC", "DAKOTA", "SD", "TENNESSEE", "TN", "TEXAS", "TX", "UTAH", "UT", "VERMONT", "VT", "VIRGINIA", "VA", "WASHINGTON", "WA", "WV", "WISCONSIN", "WI", "WYOMING", "WY", "SAMOA", "AS", "COLUMBIA", "DC", "MICRONESIA", "FM", "GUAM", "GU", "MARSHALL", "MH", "MARIANA", "MP", "PALAU", "PW", "PUERTO", "RICO", "PR", "VIRGIN", "ISLANDS", "VI", "ALBERTA", "AB", "BRITISH", "COLUMBIA", "BC", "MANITOBA", "MB", "NEW BRUNSWICK", "NB", "NEWFOUNDLAND", "NL", "NORTHWEST", "TERRITORIES", "NT", "NOVA SCOTIA", "NS", "NUNAVUT", "NU", "ONTARIO", "ON", "PRINCE EDWARD ISLAND", "PE", "QUEBEC", "QC", "SASKATCHEWAN", "SK", "YUKON", "YT", "CAPITAL", "ACT", "CHRISTMAS", "CX", "COCOS ISLANDS", "CC", "JERVIS", "BAY", "JBT", "SOUTH", "WALES", "NSW", "NORFOLK", "NF", "NT", "QUEENSLAND", "QLD", "SA", "TASMANIA", "TAS", "VICTORIA", "VIC", "WA", "ABERDEENSHIRE", "ABD", "ANGLESEY", "AGY", "ALDERNEY", "ALD", "ANGUS", "ANS", "ANTRIM", "ANT", "ARGYLLSHIRE", "ARL", "ARMAGH", "ARM", "AVON", "AVN", "AYRSHIRE", "AYR", "BANFFSHIRE", "BAN", "BEDFORDSHIRE", "BDF", "BERWICKSHIRE", "BEW", "BUCKINGHAMSHIRE", "BKM", "BORDERS", "BOR", "BRECONSHIRE", "BRE", "BERKSHIRE", "BRK", "BUTE", "BUT", "CAERNARVONSHIRE", "CAE", "CAITHNESS", "CAI", "CAMBRIDGESHIRE", "CAM", "CARLOW", "CAR", "CAVAN", "CAV", 
"CENTRAL", "CEN", "CARDIGANSHIRE", "CGN", "CHESHIRE", "CHS", "CLARE", "CLA", "CLACKMANNANSHIRE", "CLK", "CLEVELAND", "CLV", "CUMBRIA", "CMA", "CARMARTHENSHIRE", "CMN", "CORNWALL", "CON", "CORK", "COR", "CUMBERLAND", "CUL", "CLWYD", "CWD", "DERBYSHIRE", "DBY", "DENBIGHSHIRE", "DEN", "DEVON", "DEV", "DYFED", "DFD", "DUMFRIES-SHIRE", "DFS", "DUMFRIES", "GALLOWAY", "DGY", "DUNBARTONSHIRE", "DNB", "DONEGAL", "DON", "DORSET", "DOR", "DOWN", "DOW", "DUBLIN", "DUB", "DURHAM", "DUR", "ELN", "ERY", "ESSEX", "ESS", "FERMANAGH", "FER", "FIFE", "FIF", "FLINTSHIRE", "FLN", "GALWAY", "GAL", "GLAMORGAN", "GLA", "GLOUCESTERSHIRE", "GLS", "GRAMPIAN", "GMP", "GWENT", "GNT", "GUERNSEY", "GSY", "MANCHESTER", "GTM", "GWYNEDD", "GWN", "HAMPSHIRE", "HAM", "HEREFORDSHIRE", "HEF", "HIGHLAND", "HLD", "HERTFORDSHIRE", "HRT", "HUMBERSIDE", "HUM", "HUNTINGDONSHIRE", "HUN", "HEREFORD", "WORCESTER", "HWR", "INVERNESS-SHIRE", "INV", "WIGHT", "IOW", "JERSEY", "JSY", "KINCARDINESHIRE", "KCD", "KENT", "KEN", "KERRY", "KER", "KILDARE", "KID", "KILKENNY", "KIK", "KIRKCUDBRIGHTSHIRE", "KKD", "KINROSS-SHIRE", "KRS", "LANCASHIRE", "LAN", "LONDONDERRY", "LDY", "LEICESTERSHIRE", "LEI", "LEITRIM", "LET", "LAOIS", "LEX", "LIMERICK", "LIM", "LINCOLNSHIRE", "LIN", "LANARKSHIRE", "LKS", "LONGFORD", "LOG", "LOUTH", "LOU", "LOTHIAN", "LTN", "MAYO", "MAY", "MEATH", "MEA", "MERIONETHSHIRE", "MER", "GLAMORGAN", "MGM", "MONTGOMERYSHIRE", "MGY", "MIDLOTHIAN", "MLN", "MONAGHAN", "MOG", "MONMOUTHSHIRE", "MON", "MORAYSHIRE", "MOR", "MERSEYSIDE", "MSY", "NAIRN", "NAI", "NORTHUMBERLAND", "NBL", "NORFOLK", "NFK", "NORTH RIDING OF YORKSHIRE", "NRY", "NORTHAMPTONSHIRE", "NTH", "NOTTINGHAMSHIRE", "NTT", "NYK", "OFFALY", "OFF", "ORKNEY", "OKI", "OXFORDSHIRE", "OXF", "PEEBLES-SHIRE", "PEE", "PEMBROKESHIRE", "PEM", "PERTH", "PER", "POWYS", "POW", "RADNORSHIRE", "RAD", "RENFREWSHIRE", "RFW", "ROSS", "CROMARTY", "ROC", "ROSCOMMON", "ROS", "ROXBURGHSHIRE", "ROX", "RUTLAND", "RUT", "SHROPSHIRE", "SAL", "SELKIRKSHIRE", "SEL", 
"SUFFOLK", "SFK", "GLAMORGAN", "SGM", "SHETLAND", "SHI", "SLIGO", "SLI", "SOMERSET", "SOM", "SARK", "SRK", "SURREY", "SRY", "SUSSEX", "SSX", "STRATHCLYDE", "STD", "STIRLINGSHIRE", "STI", "STAFFORDSHIRE", "STS", "SUTHERLAND", "SUT", "SUSSEX", "SXE", "SXW", "SYK", "TAYSIDE", "TAY", "TIPPERARY", "TIP", "TYNE", "TWR", "TYRONE", "TYR", "WARWICKSHIRE", "WAR", "WATERFORD", "WAT", "WESTMEATH", "WEM", "WESTMORLAND", "WES", "WEXFORD", "WEX", "WEST GLAMORGAN", "WGM", "WICKLOW", "WIC", "WIGTOWNSHIRE", "WIG", "WILTSHIRE", "WIL", "ISLES", "WIS", "LOTHIAN", "WLN", "WEST MIDLANDS", "WMD", "WORCESTERSHIRE", "WOR", "WRY", "WEST", "WYK", "YORKSHIRE", "YKS", "HELSINKI", "МОСКВА", "上海", "北京", "南京", "成都", "HONG", "KONG", "TOKYO", "SEOUL", "東京", "香港", "서울", "MADRID", "BARCELONA", "ROME", "PARIS", "MARSEILLE", "TOULOUSE", "LYON", "ORLEAN", "BRUSSELS", "DELHI", "UTRECHT", "COPENHAGEN", "BERLIN", "FRANKFURT", "MÜNCHEN", "MUNICH", "VIENNA", "ISTANBUL", "ΑΘΗΝΑ", "ATHENS", "ПЕТЕРБУРГ", "BUENOS", "AIRES", "RIO", "JANEIRO", "MANILA", "深圳", "CHICAGO", "KARACHI", "BANGKOK", "LAGOS", "JOHANNESBERG", "FRANSCICO", "TORONTO", "MIAMI", "PHILADELPHIA", "KUALA", "LAMPUR", "ESSEN", "LONDON", "KINSHASA", "BOSTON", "AMSTERDAM", "臺北", "武漢", "AHMEDABAD", "BANGALORE", "HYDERABAD", "BAGHDAD", "LIMA", "名古屋", "ANGELES", "SANTIAGO", "MILANO", "HOUSTON", "SHÀNGHAISHÌ", "AP", "ANDHRA", "AR", "ARUNACHAL", "AS", "ASSAM", "BR", "BIHAR", "CT", "CHHATTISGARH", "GA", "GOA", "GJ", "GUJARAT", "HR", "HARYANA", "HP", "HIMACHAL", "JK", "JAMMU", "KASHMIR", "JH", "JHARKHAND", "KA", "KARNATAKA", "KL", "KERALA", "MP", "MADHYA", "MH", "MAHARASHTRA", "MN", "MANIPUR", "ML", "MEGHALAYA", "MZ", "MIZORAM", "NL", "NAGALAND", "OR", "ORISSA", "PB", "PUNJAB", "RJ", "RAJASTHAN", "SK", "SIKKIM", "TN", "TAMIL", "NADU", "TR", "TRIPURA", "UT", "UTTARAKHAND", "UP", "UTTAR", "PRADESH", "WB", "BENGAL", "ANDAMAN", "NICOBAR", "CH", "CHANDIGARH", "DN", "DADRA", "NAGAR", "HAVELI", "DD", "DAMAN", "DIU", "DL", "LD", "LAKSHADWEEP", "PY", "PUDUCHERRY", 
"PONDICHERRY", "澳門半島"]

Methods

__construct()

Builds an IndexingPlugin object. Loads in the appropriate models for the given plugin object

public __construct() : mixed
Return values
mixed

checkCandidate()

Checks if the passed sequence of lines has enough features of a postal address to call it an address. If so, returns the address as a single string

public checkCandidate(array<string|int, mixed> $pre_address) : mixed
Parameters
$pre_address : array<string|int, mixed>

an array of potential address lines

Return values
mixed

false if the lines do not form an address; otherwise, the lines joined into a single string separated by spaces
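A minimal sketch of the feature-counting idea behind checkCandidate(), assuming a candidate qualifies once it shows both a street feature (a house number followed by a street word) and a US ZIP code. The function name sketchCheckCandidate() and both regular expressions are hypothetical illustrations, not Yioop's implementation.

```php
<?php
// Hypothetical sketch: sketchCheckCandidate() and its two features are
// illustrative assumptions, not the plugin's real scoring rules.
function sketchCheckCandidate(array $pre_address): string|false
{
    if (count($pre_address) < 2) {
        return false; // assumed: one line alone is too weak a signal
    }
    $has_street = false;
    $has_zip = false;
    foreach ($pre_address as $line) {
        $upper = strtoupper($line);
        // Street feature: a house number, then later a street word.
        if (preg_match('/\b\d+\s.*\b(STREET|ST|AVENUE|AVE|ROAD|RD|WAY)\b/', $upper) === 1) {
            $has_street = true;
        }
        // Postal feature: a US ZIP or ZIP+4 code.
        if (preg_match('/\b\d{5}(?:-\d{4})?\b/', $upper) === 1) {
            $has_zip = true;
        }
    }
    // Require both features before accepting the lines as an address.
    return ($has_street && $has_zip) ? implode(" ", $pre_address) : false;
}
```

Under these assumptions, ["123 Main Street", "San Jose, CA 95192"] is returned as the single string "123 Main Street San Jose, CA 95192", while a lone line is always rejected. The string|false return type needs PHP 8.0 or later.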

checkCountry()

Used to check if a line contains a word associated with a world country or country code.

public checkCountry(string $line) : bool
Parameters
$line : string

line of a candidate address to check

Return values
bool

whether it contains a country term
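A minimal sketch of a checkCountry()-style lookup, assuming a small excerpt of the $countries table and matching full country names only where they are not part of a longer word. The constant SKETCH_COUNTRIES and the function sketchCheckCountry() are hypothetical.

```php
<?php
// Hypothetical excerpt of $countries; the real property is much larger.
const SKETCH_COUNTRIES = [
    "CANADA" => "CA",
    "JAPAN" => "JP",
    "NEW ZEALAND" => "NZ",
    "UNITED STATES" => "US",
];

function sketchCheckCountry(string $line): bool
{
    $upper = strtoupper($line);
    foreach (array_keys(SKETCH_COUNTRIES) as $name) {
        // Match the full name only when it is not part of a longer word,
        // so "JAPANESE" does not count as "JAPAN".
        $pattern = '/(?<![A-Z])' . preg_quote($name, '/') . '(?![A-Z])/';
        if (preg_match($pattern, $upper) === 1) {
            return true;
        }
    }
    return false;
}
```

The sketch leaves the two-letter codes out on purpose: codes from the table such as IN (India), IS (Iceland), and TO (Tonga) are also common English words, so matching them as bare tokens would flag ordinary sentences.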

checkPhoneOrEmail()

Used to check if a line contains either an email address or a phone number

public checkPhoneOrEmail(string $line) : bool
Parameters
$line : string

line of a candidate address to check

Return values
bool

whether it contains an email address or a phone number
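A minimal sketch of a checkPhoneOrEmail()-style test, assuming one simplified email pattern and one pattern for North American style 10-digit phone numbers. The function name sketchCheckPhoneOrEmail() and both patterns are hypothetical, not the plugin's actual rules.

```php
<?php
// Hypothetical sketch; both patterns are simplified assumptions.
function sketchCheckPhoneOrEmail(string $line): bool
{
    // local-part@domain.tld counts as an email address.
    $email = '/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/';
    // A 10-digit number such as (408) 924-1000 or 408.924.1000, with an
    // optional leading "+", counts as a phone number.
    $phone = '/\+?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}/';
    return preg_match($email, $line) === 1 || preg_match($phone, $line) === 1;
}
```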

checkRegion()

Used to check if a line contains a word associated with a province, state or major city.

public checkRegion(string $line) : bool
Parameters
$line : string

line of a candidate address to check

Return values
bool

whether it contains a region term (province, state, or major city)
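A minimal sketch of a checkRegion()-style lookup, assuming a short ASCII-only excerpt of the $regions list and whole-token matching. The constant SKETCH_REGIONS and the function sketchCheckRegion() are hypothetical; non-Latin entries such as 東京 or МОСКВА would need a different tokenizer than the one shown.

```php
<?php
// Hypothetical excerpt of $regions; the real list is much longer.
const SKETCH_REGIONS = ["CALIFORNIA", "CA", "NEW YORK", "NY", "ONTARIO", "ON", "TOKYO"];

function sketchCheckRegion(string $line): bool
{
    // Uppercase and split on non-letters, so "CA," yields the token "CA".
    $tokens = preg_split('/[^A-Z]+/', strtoupper($line), -1, PREG_SPLIT_NO_EMPTY);
    $padded = " " . implode(" ", $tokens) . " ";
    foreach (SKETCH_REGIONS as $region) {
        // Padding with spaces makes one-word and multi-word entries
        // match only on whole-token boundaries.
        if (str_contains($padded, " " . $region . " ")) {
            return true;
        }
    }
    return false;
}
```

The last test case below shows why a single region hit is a weak signal on its own: the abbreviation ON (Ontario) also matches the English word "on", which is one reason checkCandidate() looks for several address features rather than one.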

checkStreet()

Used to check if a given line in an address candidate has features associated with being a street address.

public checkStreet(string $line) : bool
Parameters
$line : string

address line to check

Return values
bool

whether or not it contains a word identified with being a street address such as WAY, AVENUE, STREET, etc.
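A minimal sketch of the word test that checkStreet() describes, assuming a small hypothetical list of street words and abbreviations (WAY, AVENUE, and STREET come from the description above; the rest are assumptions). The constant SKETCH_STREET_WORDS and the function sketchCheckStreet() are hypothetical.

```php
<?php
// Hypothetical street-word list; the real plugin's list may differ.
const SKETCH_STREET_WORDS = [
    "STREET", "ST", "AVENUE", "AVE", "WAY", "ROAD", "RD",
    "BOULEVARD", "BLVD", "LANE", "LN", "DRIVE", "DR",
];

function sketchCheckStreet(string $line): bool
{
    // Uppercase and split on non-alphanumerics, so "St." yields "ST".
    $tokens = preg_split('/[^A-Z0-9]+/', strtoupper($line), -1, PREG_SPLIT_NO_EMPTY);
    return count(array_intersect($tokens, SKETCH_STREET_WORDS)) > 0;
}
```

Matching whole tokens keeps "Stay" from counting as "ST", but abbreviations such as DR (also "Doctor") remain ambiguous when a line is checked in isolation.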

checkZipPostalCodeWords()

Used to check if a line contains a word associated with a ZIP or Postal code

public checkZipPostalCodeWords(string $line) : bool
Parameters
$line : string

line of a candidate address to check

Return values
bool

whether it contains such a code
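A minimal sketch of a ZIP/postal-code check, assuming three simplified national formats (US ZIP and ZIP+4, Canadian, and UK). The function name sketchCheckZipPostalCode() and the patterns are hypothetical; they are not an exhaustive or validated set of postal formats.

```php
<?php
// Hypothetical sketch: a few national formats, not an exhaustive set.
function sketchCheckZipPostalCode(string $line): bool
{
    $patterns = [
        '/\b\d{5}(?:-\d{4})?\b/',                  // US ZIP or ZIP+4: 95192, 95192-0249
        '/\b[A-Z]\d[A-Z]\s?\d[A-Z]\d\b/',          // Canada: K1A 0B1
        '/\b[A-Z]{1,2}\d[A-Z\d]?\s?\d[A-Z]{2}\b/', // UK: SW1A 1AA
    ];
    $upper = strtoupper($line);
    foreach ($patterns as $pattern) {
        if (preg_match($pattern, $upper) === 1) {
            return true;
        }
    }
    return false;
}
```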

getAdditionalMetaWords()

Returns an array of additional meta words which have been added by this plugin

public static getAdditionalMetaWords() : array<string|int, mixed>
Return values
array<string|int, mixed>

meta words, each mapped to the maximum description length allowed for results of that meta word

getProcessors()

Returns the MIME-type page processors for which this plugin should do additional processing

public static getProcessors() : array<string|int, mixed>
Return values
array<string|int, mixed>

an array of page processors

pageProcessing()

This method is called by a PageProcessor in its handle() method just after it has processed a web page. This method allows an indexing plugin to do additional processing on the page such as adding sub-documents, before the page summary is handed back to the fetcher.

public pageProcessing(string $page, string $url) : array<string|int, mixed>
Parameters
$page : string

web-page contents

$url : string

the url where the page contents came from, used to canonicalize relative links

Return values
array<string|int, mixed>

consisting of a sequence of subdoc arrays found on the given page.

pageSummaryProcessing()

Adjusts the document summary of a page after the page processor's process method has been called so that the subdoc's fields associated with the addresses plugin get copied as fields of the whole page summary. Then it deletes the subdoc fields.

public pageSummaryProcessing(array<string|int, mixed> &$summary, string $url) : mixed
Parameters
$summary : array<string|int, mixed>

summary of the current document; it is modified in place, since it is passed by reference

$url : string

the url where the summary contents came from

Return values
mixed
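A minimal sketch of the copy-then-delete step that pageSummaryProcessing() describes. It assumes the subdocs live in a list under a "SUBDOCS" key and that the fields are named "EMAILS", "PHONE_NUMBERS", and "ADDRESSES"; these plain string keys are hypothetical stand-ins for the CrawlConstants constants the real plugin uses, and sketchPageSummaryProcessing() is a hypothetical name.

```php
<?php
// Hypothetical key names; the real plugin uses CrawlConstants
// constants whose exact values are not shown in this documentation.
const SKETCH_ADDRESS_FIELDS = ["EMAILS", "PHONE_NUMBERS", "ADDRESSES"];

// $url is accepted to mirror the documented signature but is unused here.
function sketchPageSummaryProcessing(array &$summary, string $url): void
{
    if (empty($summary["SUBDOCS"])) {
        return; // no subdocs, so nothing to promote
    }
    foreach ($summary["SUBDOCS"] as $i => $subdoc) {
        foreach (SKETCH_ADDRESS_FIELDS as $field) {
            if (array_key_exists($field, $subdoc)) {
                // Copy the field up to the whole-page summary...
                $summary[$field] = $subdoc[$field];
                // ...then delete it from the subdoc.
                unset($summary["SUBDOCS"][$i][$field]);
            }
        }
    }
}
```

If several subdocs carried the same field, the last one would win in this sketch; the real method may merge them differently.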

parseEmails()

Extracts substrings from the provided $line that are in the format of an email address. Returns the first email address found in the line

public parseEmails(string $line) : string
Parameters
$line : string

string to extract email from

Return values
string

first email found on line
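A minimal sketch of returning the first email address in a line, assuming a simplified pattern (far looser than RFC 5322) and an empty string when no address is present. The function name sketchParseEmails() and the empty-string fallback are hypothetical.

```php
<?php
// Hypothetical sketch; the pattern is a simplification of RFC 5322.
function sketchParseEmails(string $line): string
{
    $pattern = '/[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}/';
    if (preg_match($pattern, $line, $match) === 1) {
        return $match[0]; // preg_match stops at the leftmost match
    }
    return ""; // assumed: empty string when the line has no email
}
```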

parsePhones()

Checks for a phone number related keyword in the line and if found extracts digits which are presumed to be a phone number

public parsePhones(string $line) : array<string|int, mixed>
Parameters
$line : string

to check for phone numbers

Return values
array<string|int, mixed>

all phone numbers detected by this method from the $line
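A minimal sketch of the keyword-then-digits approach that parsePhones() describes, assuming a hypothetical keyword list and keeping only runs of at least seven digits. The constant SKETCH_PHONE_KEYWORDS and the function sketchParsePhones() are hypothetical.

```php
<?php
// Hypothetical keyword list; the real plugin's keywords may differ.
const SKETCH_PHONE_KEYWORDS = ["PHONE", "TEL", "FAX", "MOBILE", "CALL"];

function sketchParsePhones(string $line): array
{
    $upper = strtoupper($line);
    $has_keyword = false;
    foreach (SKETCH_PHONE_KEYWORDS as $keyword) {
        if (str_contains($upper, $keyword)) {
            $has_keyword = true;
            break;
        }
    }
    if (!$has_keyword) {
        return []; // no phone keyword, so digits are not trusted
    }
    // Digit runs that may contain spaces, dots, dashes, or parentheses,
    // with an optional leading "+".
    preg_match_all('/\+?\d[\d\s().-]{5,}\d/', $line, $matches);
    $phones = [];
    foreach ($matches[0] as $candidate) {
        // Keep only digits and "+" as the normalized number.
        $digits = preg_replace('/[^\d+]/', '', $candidate);
        if (strlen(ltrim($digits, '+')) >= 7) {
            $phones[] = $digits;
        }
    }
    return $phones;
}
```

Substring keyword matching is crude (TEL also occurs inside HOTEL), so a sketch like this trades precision for simplicity.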

parseSubdoc()

Parses EMAILS, PHONE_NUMBERS and ADDRESSES from $text and returns an array with these three fields containing sub-arrays of the given items

public parseSubdoc(string $text) : array<string|int, mixed>
Parameters
$text : string

to use for extraction

Return values
array<string|int, mixed>

with found emails, phone numbers, and addresses
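A minimal sketch of how a parseSubdoc()-style driver could combine the line checks above, assuming that blank lines separate address candidates, that lines containing an email or phone number are kept out of the address text, and that a block needs both a street word and a ZIP code. The function name sketchParseSubdoc() and every pattern in it are hypothetical simplifications, not the plugin's real helpers.

```php
<?php
// Hypothetical sketch of a parseSubdoc()-style driver; every pattern
// below is a simplified stand-in for the plugin's real helpers.
function sketchParseSubdoc(string $text): array
{
    $result = ["EMAILS" => [], "PHONE_NUMBERS" => [], "ADDRESSES" => []];
    $email = '/[\w.+-]+@[\w-]+(?:\.[\w-]+)+/';
    $phone = '/\(?\d{3}\)?[\s.-]?\d{3}[\s.-]\d{4}/';
    // Assumed: blank lines separate one address candidate from the next.
    foreach (preg_split('/\R\s*\R/', trim($text)) as $block) {
        $address_lines = [];
        $has_street = false;
        $has_zip = false;
        foreach (preg_split('/\R/', $block) as $raw_line) {
            $line = trim($raw_line);
            if ($line === "") {
                continue;
            }
            $is_contact = false;
            if (preg_match_all($email, $line, $m) > 0) {
                array_push($result["EMAILS"], ...$m[0]);
                $is_contact = true;
            }
            if (preg_match_all($phone, $line, $m) > 0) {
                array_push($result["PHONE_NUMBERS"], ...$m[0]);
                $is_contact = true;
            }
            if ($is_contact) {
                continue; // contact lines are kept out of the address text
            }
            $address_lines[] = $line;
            $upper = strtoupper($line);
            if (preg_match('/\b(STREET|ST|AVENUE|AVE|ROAD|RD|WAY)\b/', $upper) === 1) {
                $has_street = true;
            }
            if (preg_match('/\b\d{5}(?:-\d{4})?\b/', $upper) === 1) {
                $has_zip = true;
            }
        }
        // A block needs both a street line and a ZIP line to count.
        if ($has_street && $has_zip) {
            $result["ADDRESSES"][] = implode(" ", $address_lines);
        }
    }
    return $result;
}
```

For example, a contact block containing a department name, "123 Main Street", "San Jose, CA 95192", and a phone line yields one address built from the first three lines plus one phone number, while an email in a separate paragraph is collected on its own.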

postProcessing()

This method is called by the queue_server with the name of a completed index. This allows the indexing plugin to perform searches on the index and, using the results, inject new page/index data into the index before it becomes available for end use.

public postProcessing(string $index_name) : mixed
Parameters
$index_name : string

the name/timestamp of an IndexArchiveBundle to do post processing for

Return values
mixed
