This page controls the search engine spider (indexer).
The spider is an "automatic browser" which, when instructed, will visit your site,
find all the pages linked into your site, and index them.
In general, when you use an option link in the Control Center, a "dialog" with controls will appear.
After setting the control values, you can either save your changes or abandon them using the buttons in the dialog.
Index now
Use this option to tell the spider to re-index your site immediately.
Normally your site will be automatically re-indexed on a schedule we determine.
However, you can also use this option to tell us to re-index your site as soon as possible.
Schedule re-indexing
Use this option to control the automatic re-indexing schedule for your site.
You can leave it on automatic, set it to a specific pre-determined schedule or turn it off entirely.
Set starting points
If you want to include additional sites in your index, or if the spider
cannot locate all of the pages in your site, you can give it additional starting points.
The spider uses these additional starting points to locate pages to index
just like it does with the main address of your account.
The addresses need to be entered one per line (wrapping by the browser can be ignored),
and should include the starting "https://..." or "http://...".
Note that these addresses are not treated as individual pages, but as entire sites.
The spider will follow the links on each page listed in order to locate even more pages to include in the index.
If you want to add just a single page to the index, you can first
prevent the spider from following the links on that page by using page exclusions like:
example.com/justthispage.html index=yes follow=no
and then add it to the list of additional starting points:
example.com/justthispage.html
See the next section for information on exclusions.
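As a small illustration of the "one address per line" format, the Python sketch below splits the text of the starting-points box into addresses and reports any that lack a leading "https://" or "http://". The function name and the decision to only report (rather than reject) such lines are assumptions made for this example; it does not describe how the spider itself reads the list.

```python
def parse_starting_points(text):
    """Return (addresses, missing_scheme) for a one-per-line address list.

    Blank lines and surrounding whitespace are ignored (an assumption).
    """
    addresses = [line.strip() for line in text.splitlines() if line.strip()]
    # Addresses "should include" the scheme, so flag those that do not.
    missing_scheme = [
        address
        for address in addresses
        if not address.lower().startswith(("http://", "https://"))
    ]
    return addresses, missing_scheme
```

Checking the list this way before saving it makes it easy to spot an address that was pasted without its scheme.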
Exclude pages
Use this option to prevent the spider from indexing certain parts of your site and/or from following
the links on specified pages.
If you are looking for more information than this summary provides, see
How to Exclude Pages from Search.
This dialog contains a simple list of file "exclusions", one per line (browser wrapping may be ignored).
Each exclusion consists of a "URL mask" optionally followed by one or more exclusion modifiers.
The URL mask is simply a standard web address, but may contain the common wildcards
"*" and "?" to make it match more than one web address.
The "*" will match any number of any characters and
the "?" will match any single character.
Non-wildcard characters are matched without regard to case (case-insensitive).
URL masks which do not begin with "https://" or "http://"
are treated as if they begin with "*".
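The matching rules above can be sketched in Python as follows. This is a minimal illustration, not the search engine's actual code: the function names are hypothetical, and two points are assumptions, namely that a mask must match the whole address and that both "http://" and "https://" count as explicit beginnings.

```python
import re


def mask_to_regex(mask):
    """Translate a URL mask into a compiled, case-insensitive regex."""
    # Assumption: a mask without a scheme behaves as if prefixed with "*".
    if not mask.lower().startswith(("http://", "https://")):
        mask = "*" + mask
    parts = []
    for ch in mask:
        if ch == "*":
            parts.append(".*")  # any number of any characters
        elif ch == "?":
            parts.append(".")  # exactly one character
        else:
            parts.append(re.escape(ch))  # literal character
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)


def mask_matches(mask, url):
    """Return True if the mask matches the address."""
    # Assumption: the mask has to cover the entire address.
    return mask_to_regex(mask).fullmatch(url) is not None
```

For instance, under these assumptions `mask_matches("example.com/*", "https://EXAMPLE.com/a.html")` is true, because the mask gains an implicit leading "*" and letter case is ignored.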
The URL mask may be followed by exclusion modifiers. There are two:
index=no/yes
follow=no/yes
The "index" modifier specifies whether pages matching the mask will be included in the index.
The "follow" modifier specifies whether pages matching the mask will have their links followed in order
to locate other pages to index.
The default values are:
index=no follow=no
Important: the entire list of exclusions is considered, and only the last matching exclusion is applied.
This allows convenient expression of "exclude everything but..." logic. For example, to prevent everything in your
"https://example.com/cgi-bin/" directory from being indexed except pages generated by the CGI "content.cgi",
list an exclusion for the whole directory first, followed by a later line for "content.cgi" with index=yes.
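As a rough illustration of the last-match rule, the Python sketch below parses a list of exclusions and returns the one that applies to a given address. The two exclusion lines in it are an assumed example of an "exclude everything but..." list, not text taken from this page, and the helper names, the whole-address matching, and the strict lowercase "yes"/"no" values are likewise assumptions.

```python
import re
from typing import NamedTuple


class Exclusion(NamedTuple):
    mask: str
    index: bool
    follow: bool


def parse_exclusion(line):
    """Parse 'mask [index=yes|no] [follow=yes|no]'; both default to 'no'."""
    mask, *modifiers = line.split()
    flags = {"index": False, "follow": False}
    for modifier in modifiers:
        key, _, value = modifier.partition("=")
        if key not in flags or value not in ("yes", "no"):
            raise ValueError(f"unrecognised modifier: {modifier!r}")
        flags[key] = value == "yes"
    return Exclusion(mask, flags["index"], flags["follow"])


def matches(mask, url):
    """Case-insensitive wildcard match; masks without a scheme get a leading '*'."""
    if not mask.lower().startswith(("http://", "https://")):
        mask = "*" + mask
    pattern = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch) for ch in mask
    )
    return re.fullmatch(pattern, url, re.IGNORECASE | re.DOTALL) is not None


def resolve(exclusions, url):
    """Return the last exclusion whose mask matches, or None if none match."""
    chosen = None
    for exclusion in exclusions:  # scan the whole list; later lines win
        if matches(exclusion.mask, url):
            chosen = exclusion
    return chosen


# Hypothetical list: exclude the directory, then re-admit one CGI's pages.
EXAMPLE_LINES = [
    "https://example.com/cgi-bin/*",
    "https://example.com/cgi-bin/content.cgi* index=yes follow=yes",
]
EXAMPLE_EXCLUSIONS = [parse_exclusion(line) for line in EXAMPLE_LINES]
```

Because the list is scanned to the end, the order of the lines matters: swapping the two example lines would leave the directory-wide exclusion as the last match for every page in "cgi-bin".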
By default, FreeFind indexes HTML, text, PDF and common office document types.
If, instead, you just want HTML and text indexed, use this option to disable extended indexing.
The setting takes effect next time your site is indexed.
Indexing speed
This controls how fast your site will be indexed and the load placed on your server.
The indexing speed controls how long the indexer will pause between page reads as it scans your site.
The available indexing speeds are:
slow: 3 seconds
standard: about a second
fast: 1/3 second (subscribers only)
In addition, subscribers with professional accounts can select the "simultaneous access" option to have their site indexed in parallel.
This can make re-indexing go 3 times faster.
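To get a feel for what these settings mean for a given site, the Python sketch below estimates the time spent pausing between page reads. It is only a back-of-the-envelope calculation under stated assumptions: "about a second" is taken as exactly 1 second, the factor of 3 for simultaneous access is treated as a best case, and the time to fetch and process each page is ignored. The names are hypothetical.

```python
# Pause between page reads, in seconds, for each indexing speed.
# Assumption: "about a second" is modelled as exactly 1.0.
PAUSE_SECONDS = {"slow": 3.0, "standard": 1.0, "fast": 1 / 3}


def estimated_pause_seconds(pages, speed, simultaneous=False):
    """Estimate total seconds spent pausing while reading `pages` pages."""
    if pages < 0:
        raise ValueError("pages must not be negative")
    pauses = max(pages - 1, 0)  # one pause between each pair of reads
    total = pauses * PAUSE_SECONDS[speed]
    # Assumption: simultaneous access divides the time by 3 at best.
    return total / 3 if simultaneous else total
```

For a site of about 1,000 pages, this puts the pauses alone at roughly 50 minutes on "slow" and under 20 minutes on "standard".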
Relevance options
When the search engine returns the results of a user's query they are typically ordered by "relevance score"
with the search engine placing first the document it believes to be most relevant.
The search engine automatically determines relevance score and, by default, it is configured to work well with a wide variety of websites.
You can also refine the relevance scoring for your website by using the relevance controls in the FreeFind Control Center.
Use this option to set up your account to index password-protected areas of your site.
Both HTTP Basic Authentication and custom form-based authentication are supported.
When you click on each option the settings page that appears will have extensive documentation,
plus you can refer to
How to Index Password Protected Pages.
Define subsections
Use this to sub-divide your index into separately searchable sections.
If you are looking for more information than this summary provides, see
How to Use Sections.
This dialog contains a simple list of file "section specifications",
one per line (browser wrapping may be ignored).
Each section specification consists of a "URL mask" followed by a list of single-word section names,
each with an optional modifier.
The URL mask is simply a standard web address, but may contain the common wildcards
"*" and "?" to make it match more than one web address.
The "*" will match any number of any characters and
the "?" will match any single character.
Non-wildcard characters are matched without regard to case (case-insensitive).
URL masks which do not begin with "https://" or "http://"
are treated as if they begin with "*".
The URL mask is followed by one or more single-word section names.
Your visitors will never see these names; they are used only by the search engine to identify each section.
Each section name may be followed by an equals sign
("=") and then one of the modifiers:
include
exclude
to control whether web addresses which match the URL mask are included or excluded from that section.
The default is "include".
Note: The section name "web" is reserved.
You cannot use it as your own section name.
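The line format described above can be sketched as a small Python parser. This is an illustration only: the function name is hypothetical, and treating the reserved name "web" case-insensitively and rejecting unknown modifiers with an error are assumptions.

```python
RESERVED_NAMES = {"web"}


def parse_section_spec(line):
    """Parse 'mask name[=include|exclude] ...' into (mask, {name: included})."""
    mask, *entries = line.split()
    if not entries:
        raise ValueError("a section specification needs at least one section name")
    sections = {}
    for entry in entries:
        name, separator, modifier = entry.partition("=")
        if not separator:
            modifier = "include"  # the default when no modifier is given
        if not name or modifier not in ("include", "exclude"):
            raise ValueError(f"malformed section entry: {entry!r}")
        # Assumption: the reserved name is checked without regard to case.
        if name.lower() in RESERVED_NAMES:
            raise ValueError(f"section name {name!r} is reserved")
        sections[name] = modifier == "include"
    return mask, sections
```

Each entry maps a section name to a boolean, so a line can include matching addresses in one section while excluding them from another.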
Important: only the last matching section specification is used when determining which one to apply.
This allows convenient expression of "include everything but..." logic. For example, to include everything in your
"http://example.com/store/" directory in a section except pages in the "/store/test/" subdirectory,
list a specification for the whole directory first, followed by a later line for the subdirectory
that uses the "exclude" modifier for the same section.
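To illustrate the last-match rule for sections, the Python sketch below works out which sections an address belongs to. The section name "store" and the two specifications are assumptions chosen for this example, not text from this page; the helper names and the whole-address matching are also assumptions.

```python
import re


def matches(mask, url):
    """Case-insensitive wildcard match; masks without a scheme get a leading '*'."""
    if not mask.lower().startswith(("http://", "https://")):
        mask = "*" + mask
    pattern = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch) for ch in mask
    )
    return re.fullmatch(pattern, url, re.IGNORECASE | re.DOTALL) is not None


def sections_for(specs, url):
    """Return the set of sections the address is included in.

    `specs` is an ordered list of (mask, {section_name: included}) pairs.
    Only the last matching specification is applied.
    """
    chosen = None
    for mask, sections in specs:  # later lines override earlier ones
        if matches(mask, url):
            chosen = sections
    if chosen is None:
        return set()
    return {name for name, included in chosen.items() if included}


# Hypothetical "include everything but..." list with an assumed section name.
EXAMPLE_SPECS = [
    ("http://example.com/store/*", {"store": True}),
    ("http://example.com/store/test/*", {"store": False}),
]
```

With these specifications, a page directly under "/store/" lands in the "store" section, while a page under "/store/test/" matches both lines and is governed by the second, which excludes it.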
After you have specified all of your sections, your site will be re-indexed
before the new sections become active.
Now that you have an index with various sections you need to use an
appropriate search panel to allow your visitors to use those sections.
To do this just go to the
page and choose the panel with sections.
Add it to your web site in the usual manner. (To review instructions
for this see
Adding your Panel to your Site.)
You will probably want to change the labels of the sections as they appear
in the drop-down list. This is fine; just be sure to change the
option text only, not the option value itself.
To see an example of this, and more information on customizing
any search panel to support sections, see
Customizing Your Search Panel with Sections.
Result link target
Use this option to set the target of the links in the search results page that lead back to your site.
A simple example is:
example.com/* framename
This would cause the links of all example.com
pages listed in the search results to have a link target of framename.
A more detailed example is:
example.com/archive/* aframe
example.com/products/* pframe
This would cause search result links starting with example.com/archive/...
to have a target of aframe, and links starting
with example.com/products/...
to have a target of pframe.
Note: Most sites do not need to use this function, even those with frames.
Note: This does not target your search panel so that the search results page is shown in a particular frame or window.
For that operation, go to the
page and use the "set the frame target" link.