January 5, 2012 - 11:22, by Steven Van de Craen
Categories: SharePoint 2010, Search
Posting this for personal reference:
SharePoint 2010 - Configuring Adobe PDF iFilter 9 for 64-bit platforms
Windows Registry Editor Version 5.00
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\14.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf]
@=hex(7):7b,00,45,00,38,00,39,00,37,00,38,00,44,00,41,00,36,00,2d,00,30,00,34,\
00,37,00,46,00,2d,00,34,00,45,00,33,00,44,00,2d,00,39,00,43,00,37,00,38,00,\
2d,00,43,00,44,00,42,00,45,00,34,00,36,00,30,00,34,00,31,00,36,00,30,00,33,\
00,7d,00,00,00,00,00
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\14.0\Search\Setup\Filters\.pdf]
"Extension"="pdf"
"FileTypeBucket"=dword:00000001
"MimeTypes"="application/pdf"
sps2010pdf.reg
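As a quick sanity check, the hex(7) data above can be decoded to see which filter CLSID it registers. The sketch below is not part of the original .reg file. It assumes the value is a REG_MULTI_SZ stored as UTF-16LE text, which is what the hex(7) prefix denotes, and the function name decode_reg_multi_sz is made up for illustration.

```python
def decode_reg_multi_sz(hex_text):
    """Decode the data of a .reg hex(7) value (REG_MULTI_SZ) into a list of strings."""
    # Drop the line-continuation backslashes and all whitespace from the export.
    cleaned = "".join(hex_text.replace("\\", "").split())
    raw = bytes(int(pair, 16) for pair in cleaned.split(",") if pair)
    # REG_MULTI_SZ is UTF-16LE text: every string ends with a NUL,
    # and the list itself ends with one extra NUL.
    text = raw.decode("utf-16-le")
    return [s for s in text.split("\x00") if s]


# Value data copied from the .pdf extension key above.
PDF_FILTER_VALUE = r"""
7b,00,45,00,38,00,39,00,37,00,38,00,44,00,41,00,36,00,2d,00,30,00,34,\
00,37,00,46,00,2d,00,34,00,45,00,33,00,44,00,2d,00,39,00,43,00,37,00,38,00,\
2d,00,43,00,44,00,42,00,45,00,34,00,36,00,30,00,34,00,31,00,36,00,30,00,33,\
00,7d,00,00,00,00,00
"""

if __name__ == "__main__":
    print(decode_reg_multi_sz(PDF_FILTER_VALUE))
```

The decoded list holds a single entry, {E8978DA6-047F-4E3D-9C78-CDBE46041603}, which is the same CLSID listed for the Adobe x64 PDF IFilter in the SharePoint 2007 post further down this page.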
March 18, 2010 - 15:28, by Steven Van de Craen
Categories: Search, MOSS 2007, Search Server 2008, SharePoint 2007
The Scope Item Count gives an approximate number of items matching the scope. However, at one of my customers it showed only six items for their entire file share!?
There were no Crawl Rules and the Crawl Logs showed tens of thousands of successfully crawled items, so what could be wrong? I played with the scope rules (recreated them, inverted the logic, etc.) but no luck. I opened up Reflector on the Scope Count property and found that it is calculated through a Search Query. Then it hit me: the account I had logged in with to perform Search Administration was a local account without access to the file share, so the query behind the Scope Count was security trimming those results for me.
I’d expect any SharePoint Administrator to get a correct count of items in the Scope so this seems like a minor design flaw to me.
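To make the effect concrete, here is a toy model of a security-trimmed count. It is only an illustration of the reasoning above, not how SharePoint computes the value internally: the index entries, account names and the scope_item_count function are all made up.

```python
def scope_item_count(index, account):
    """Count only the items the querying account may read,
    the way a security-trimmed search query would."""
    return sum(1 for item in index if account in item["readers"])


# Made-up index: three documents on a file share plus one page
# that the local administration account is also allowed to read.
INDEX = [
    {"url": "file://fileserver/share/report.doc", "readers": {"DOMAIN\\alice"}},
    {"url": "file://fileserver/share/budget.xls", "readers": {"DOMAIN\\alice"}},
    {"url": "file://fileserver/share/notes.rtf", "readers": {"DOMAIN\\alice"}},
    {"url": "http://intranet/default.aspx",
     "readers": {"DOMAIN\\alice", "SERVER\\localadmin"}},
]

if __name__ == "__main__":
    print(scope_item_count(INDEX, "DOMAIN\\alice"))       # account with file share access
    print(scope_item_count(INDEX, "SERVER\\localadmin"))  # local account without it
```

The same index yields a different count depending on who asks, which is exactly why the administration page looked wrong when viewed with the local account.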
January 20, 2010 - 16:00, by Steven Van de Craen
Categories: MOSS 2007, Search, Search Server 2008, SharePoint 2007, SharePoint Updates
Today I visited a customer to solve an issue that I had run into a while ago in my post regarding ZIP file indexing with IFilters. The customer was indexing Office 2003 documents on a file share (.doc, .xls, .rtf, …) and had the issue of the Filename property having the strange value of fld and some numbers. This only occurred on their x64 live environment and not on an x86 test environment.
I looked at the IFilter overview using Citeknet IFilter Explorer (great tool !) and also the offfilt.dll (IFilter for the aforementioned file types) to check on version differences between the two systems but there were none.
Both environments were running SP2 and the June 2009 Cumulative Update, but since there weren't many other obvious options I went for installing the November 2009 Cumulative Update, and that did the trick. I guess the issue that had been fixed for quite some time on x86 was only recently addressed for x64 environments. Either way, everyone's happy.
Hope it helps.
September 23, 2009 - 17:43, by Steven Van de Craen
Categories: MOSS 2007, Search, Search Server 2008, SharePoint 2007
Wildcard search is a much discussed topic in SharePoint Land and generally I reuse my custom XML Web Part to solve the issue; build the query, issue the query, read the result XML and transform using XSL.
The downside to this solution is that you lose the functionality (Search Alerts, RSS Feed, Statistics, Paging, …) you get with the out of the box Search Web Parts in Microsoft Office SharePoint Server 2007 or Microsoft Search Server 2008.
So I decided to deep dive into the SearchHiddenResultObject using Reflection and whipped up my own Wildcard Core Results Web Part. The downside of any wildcard search implementation is that you lose some functionality.
When I was ready to blog my findings I discovered Corey Roth had blogged this already along with his WildcardSearch on Codeplex project. A shame but having discovered his truly excellent blog eased the pain ;)
Anyway here’s the download for mine:
STSADM -o addsolution -filename VNTG.WildcardSearch.wsp
STSADM -o deploysolution -name VNTG.WildcardSearch.wsp -allowgacdeployment -immediate -allcontenturls
Visual Studio 2008 + WSPBuilderExtensions project
January 8, 2009 - 12:06, by Steven Van de Craen
Categories: Search, Search Server 2008, SharePoint 2007, MOSS 2007
A file share with Word and Excel documents (.doc, .docx, .xls, .xlsx) having custom document properties is indexed via MOSS 2007 or MSS 2008.
When the crawl has finished the custom properties are listed in 'Crawled properties' but the details view mentions "There are zero documents in the index using this property."
However if you create a Managed Property from this Crawled Property it does contain the correct values and can be used as desired (query, filter, sort, etc).
Hooray !
November 10, 2008 - 22:32, by Steven Van de Craen
Categories: Search, Search Server 2008, SharePoint 2007
Introduction
Here's a post about indexing ZIP archives in the same style as the one I did on PDF indexing. The search engine makes use of IFilters to be able to read the specific structure of a certain file type and retrieve information from it that it puts in an index. When you perform a search query you will see the information from the index. If it weren't for IFilters you could only search on file name and metadata.
[Indexing Server]: the server(s) in the SharePoint Farm that has/have the "Indexing" Role assigned. In a small farm this can be a single server for all roles.
[Web Front End Server]: the server(s) in the SharePoint Farm that has/have the "Web Front End" Role assigned. In a small farm this can be a single server for all roles.
Windows SharePoint Services 3.0
[Indexing Server]
- Install the ZIP IFilter (see below for a list of available IFilters)
- Add the .zip file type to the index list:
- Open the Registry Editor (Start > Run > regedit)
- Go to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Applications\\Gather\Search\Extensions\ExtensionList
- Add a new String Value
- Value name:
- Value data: zip
- Perform an iisreset
- Perform a Full Update on the Search content indexes
- Open a Command Prompt on the Indexing Server
- net stop spsearch
- net start spsearch
- cd "C:\Program Files\Common Files\Microsoft Shared\Web server extensions\12\BIN"
- stsadm.exe -o spsearch -action fullcrawlstop
- stsadm.exe -o spsearch -action fullcrawlstart
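The Full Update commands above can be wrapped in a small script so they run in the right order every time. This is a rough sketch, assuming the default 12 hive path from the steps above and a command prompt with administrative rights. The function names are made up, and by default it only prints the commands instead of running them.

```python
import subprocess

# Default location of stsadm.exe, taken from the steps above.
STSADM_DIR = r"C:\Program Files\Common Files\Microsoft Shared\Web server extensions\12\BIN"


def full_update_commands(stsadm_dir=STSADM_DIR):
    """Return the WSS 3.0 full update commands in the order they should run."""
    stsadm = stsadm_dir + r"\stsadm.exe"
    return [
        ["net", "stop", "spsearch"],
        ["net", "start", "spsearch"],
        # Note the plain ASCII hyphens: a typographic dash in front of "o"
        # is not recognised as a switch by stsadm.
        [stsadm, "-o", "spsearch", "-action", "fullcrawlstop"],
        [stsadm, "-o", "spsearch", "-action", "fullcrawlstart"],
    ]


def run_full_update(dry_run=True):
    """Print the commands, or execute them when dry_run is False."""
    for cmd in full_update_commands():
        if dry_run:
            print(subprocess.list2cmdline(cmd))
        else:
            # check=True stops at the first command that fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_full_update()  # dry run: only prints the commands
```

Call run_full_update(dry_run=False) on the Indexing Server once the printed commands look right.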
[Web Front End Server]
The zip icon registration is available out of the box.
Microsoft Office SharePoint Server 2007
[Indexing Server]
- Install the ZIP IFilter (see below for a list of available IFilters)
- Add the .zip file type to the index list:
- Go to Central Administration, then to the Shared Services Administration Web of the current SSP, go to Search Settings and next to File Type
- Add a new file type zip
- Perform an iisreset
- Perform a Full Update on the Search content indexes
- Open a Command Prompt on the Indexing Server
- net stop osearch
- net start osearch
- Go to Central Administration, then to the Shared Services Administration Web of the current SSP, go to Search Settings and start a full crawl of all locations containing ZIP files
[Web Front End Server]
The zip icon registration is available out of the box.
Available IFilters
IFilterShop ZIP IFilter
- requires a license
- 32 bit and 64 bit (applies to the [Indexing Server])
- Note: I haven't gotten this one to work. After installation and configuration I'm receiving the following for all crawled ZIP items: Crawled (The filtering process could not load the item. This is possibly caused by an unrecognized item format or item corruption. )
Citeknet ZIP IFilter
- requires a license
- 32 bit and 64 bit (applies to the [Indexing Server])
- Currently version 2.1 Beta
- Works very nicely in the test setup. I haven't seen it in production or under stress tests.
What about PDF documents inside ZIP archives ?
The ZIP IFilter will index all files in the archive using a corresponding IFilter, but if yours is an apartment-threaded IFilter (such as Adobe's PDF IFilter) you need to make the following adjustment:
[Indexing Server]
- Open the Registry Editor (Start > Run > regedit)
- Go to HKEY_CLASSES_ROOT\CLSID\{4C904448-74A9-11d0-AF6E-00C04FD8DC02}\InprocServer32
- Change the ThreadingModel key value
- Old value: Apartment
- New value: Both
- Go to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\ContentIndex
- Change the DLLsToRegister key value
- Remove the entry corresponding to pdffilt.dll from the list to prevent the Adobe PDF IFilter from re-registering
- Restart the Search Service and perform a Full Update
An excellent tool to get an overview of installed IFilters is Citeknet IFilter Explorer which will also show you the threading model.
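If you prefer importing a .reg file over editing by hand, the ThreadingModel change can be generated as follows. This is a minimal sketch, assuming the Adobe PDF IFilter CLSID from the steps above; the function name is made up, and it does not cover the DLLsToRegister edit, which still has to be done manually. Export the key as a backup before importing anything.

```python
# CLSID of the Adobe PDF IFilter, as listed in the steps above.
ADOBE_PDF_IFILTER_CLSID = "{4C904448-74A9-11d0-AF6E-00C04FD8DC02}"


def threading_model_reg(clsid=ADOBE_PDF_IFILTER_CLSID, model="Both"):
    """Build the text of a .reg file that sets ThreadingModel for an IFilter CLSID."""
    key = r"HKEY_CLASSES_ROOT\CLSID" + "\\" + clsid + r"\InprocServer32"
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        "[" + key + "]",
        '"ThreadingModel"="' + model + '"',
        "",
    ]
    # .reg files use Windows line endings.
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(threading_model_reg())
```

Save the output to a file with a .reg extension and import it on the Indexing Server, then restart the Search Service as described above.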
Conclusion
Using the above procedure for either WSS 3.0 or MOSS 2007 it is possible to have your ZIP archives indexed by the SharePoint Search. The IFilter will recursively index any ZIP archives contained inside the archive. Any other files (.txt, .doc, .ppt, .pdf) are indexed as well, and if an IFilter for that file type exists it will be used to extract information from the file. This way it can index text inside PDF documents inside the ZIP archive.
Note that the search results will show confusing file names as shown below:
January 14, 2008 - 10:05, by Steven Van de Craen
Categories: .NET, Search, SharePoint 2007
When you query the SharePoint Search Service the number of rows returned defaults to 100 but can be increased as required.
Note that when you specify a value above the maximum RowLimit the query will only return the default value of 100 items !
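// Requires references to Microsoft.Office.Server.dll and Microsoft.Office.Server.Search.dll
using Microsoft.Office.Server;
using Microsoft.Office.Server.Search.Query;

ServerContext ctx = ServerContext.Default;
FullTextSqlQuery query = new FullTextSqlQuery(ctx);
query.QueryText = BuildQuery(); // BuildQuery() is my own helper (not shown) that returns the query string
query.ResultTypes = ResultType.RelevantResults;
query.RowLimit = 1000; // defaults to 100 when not set
ResultTable resultTable = query.Execute()[ResultType.RelevantResults];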
After some trial and error I found the maximum value for the RowLimit to be 917728059.
query.RowLimit = 917728059;
So don't go around setting the RowLimit to int.MaxValue like I did because this only returns 100 items...
UPDATE:
This only applies to the MOSS 2007 RTM version and seems to be fixed since Service Pack 1.
You can now set the RowLimit to anything from 1 to int.MaxValue and it will return the correct number of items.
November 30, 2007 - 11:15, by Steven Van de Craen
Categories: SharePoint 2007, Search Server 2008, Search
I just installed the Release Candidate of Microsoft Search Server 2008 Express edition. Although the documentation mentioned a Basic and Advanced installation I didn't get that option.
Another thing I noticed:
SharePoint is everywhere :)
November 22, 2007 - 17:15, by Steven Van de Craen
Categories: Search, SharePoint 2007
One of our MOSS 2007 servers has a single Web Application (no extended Web Apps) and is configured to use Basic Authentication. I have confirmed that my dedicated crawl account has sufficient permissions in the Policy for Web Application section of Central Administration > Application management.
I try to start a full crawl of the local SharePoint content but it keeps throwing the following error:
Access is denied. Check that the Default Content Access Account has access to this content, or add a crawl rule to crawl this content. (The item was deleted because it was either not found or the crawler was denied access to it.)
So I extended the main Web Application and configured it to use Integrated Windows Authentication. I edited the Local SharePoint Content Source and set it to use the URL of the extended Web Application and guess what, it started indexing my content again.
Bottom line
Always make sure there is a Web Application (extension) configured with Integrated Windows Authentication. You can keep it internal and hidden if you like. Preferably it is the default zone. Public URLs that use a different authentication mechanism should always be configured on an extended Web Application.
November 21, 2007 - 15:16, by Steven Van de Craen
Categories: SharePoint 2007, Search Server 2008, Search
Introduction
By default the SharePoint 2007 Search indexes only the metadata of a PDF document. By installing and configuring a PDF IFilter the Search will also index the contents of the PDF document. This allows users to find documents based on text inside the document. This process is called full text indexing.
[Indexing Server]: the server(s) in the SharePoint Farm that has/have the "Indexing" Role assigned. In a small farm this can be a single server for all roles.
[Web Front End Server]: the server(s) in the SharePoint Farm that has/have the "Web Front End" Role assigned. In a small farm this can be a single server for all roles.
Windows SharePoint Services 3.0
[Indexing Server]
- Install the PDF IFilter (see below for a list of available IFilters)
- Add the .pdf file type to the index list:
- Open the Registry Editor (Start > Run > regedit)
- Go to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Applications\\Gather\Search\Extensions\ExtensionList
- Add a new String Value
- Value name:
- Value data: pdf
- [This step only applies to 64 bit servers]
- Go to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf
- Change the (Default) key value
- Old value: {4C904448-74A9-11D0-AF6E-00C04FD8DC02}
- (Foxit x64 PDF IFilter) New value: {987F8D1A-26E6-4554-B007-6B20E2680632}
- (Adobe x64 PDF IFilter) New value: {E8978DA6-047F-4E3D-9C78-CDBE46041603}
- Perform an iisreset
- Perform a Full Update on the Search content indexes
- Open a Command Prompt on the Indexing Server
- net stop spsearch
- net start spsearch
- cd "C:\Program Files\Common Files\Microsoft Shared\Web server extensions\12\BIN"
- stsadm.exe -o spsearch -action fullcrawlstop
- stsadm.exe -o spsearch -action fullcrawlstart
[Web Front End Server]
- Copy the ICPDF.GIF file to "C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\12\Template\Images"
- Edit the file C:\Program Files\Common Files\Microsoft Shared\Web server extensions\12\Template\Xml\DOCICON.XML
- Add an entry for the .pdf extension
Microsoft Office SharePoint Server 2007
[Indexing Server]
- Install the PDF IFilter (see below for a list of available IFilters)
- Add the .pdf file type to the index list:
- Go to Central Administration, then to the Shared Services Administration Web of the current SSP, go to Search Settings and next to File Type
- Add a new file type pdf
- [This step only applies to 64 bit servers]
- Go to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf
- Change the (Default) key value
- Old value: {4C904448-74A9-11D0-AF6E-00C04FD8DC02}
- (Foxit x64 PDF IFilter) New value: {987F8D1A-26E6-4554-B007-6B20E2680632}
- (Adobe x64 PDF IFilter) New value: {E8978DA6-047F-4E3D-9C78-CDBE46041603}
- Perform an iisreset
- Perform a Full Update on the Search content indexes
- Open a Command Prompt on the Indexing Server
- net stop osearch
- net start osearch
- Go to Central Administration, then to the Shared Services Administration Web of the current SSP, go to Search Settings and start a full crawl of all locations containing PDF files
[Web Front End Server]
- Copy the ICPDF.GIF file to "C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\12\Template\Images"
- Edit the file C:\Program Files\Common Files\Microsoft Shared\Web server extensions\12\Template\Xml\DOCICON.XML
- Add an entry for the .pdf extension
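The DOCICON.XML edit above can also be scripted. The sketch below assumes the usual layout of that file, in which icons are registered as Mapping elements with Key and Value attributes inside a ByExtension element; check your own copy before relying on it. The sample XML is a trimmed-down stand-in rather than the real file, and the function name add_icon_mapping is made up.

```python
import xml.etree.ElementTree as ET


def add_icon_mapping(docicon_xml, extension="pdf", icon="icpdf.gif"):
    """Return DOCICON.XML text with an icon mapping for `extension` added.

    The input is returned unchanged when the extension is already mapped.
    """
    root = ET.fromstring(docicon_xml)
    by_extension = root.find("ByExtension")
    if by_extension is None:
        raise ValueError("no <ByExtension> element found")
    # Avoid duplicate entries when the script is run more than once.
    for mapping in by_extension.findall("Mapping"):
        if mapping.get("Key", "").lower() == extension.lower():
            return docicon_xml
    ET.SubElement(by_extension, "Mapping", {"Key": extension, "Value": icon})
    return ET.tostring(root, encoding="unicode")


# Trimmed-down stand-in for the real DOCICON.XML (assumed layout).
SAMPLE = """<DocIcons>
  <ByExtension>
    <Mapping Key="doc" Value="icdoc.gif"/>
  </ByExtension>
  <Default>
    <Mapping Value="icgen.gif"/>
  </Default>
</DocIcons>"""

if __name__ == "__main__":
    print(add_icon_mapping(SAMPLE))
```

Keep a backup of the original file and repeat the change on every Web Front End Server, since each one has its own copy of DOCICON.XML.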
Available IFilters
Adobe PDF IFilter 6.0 - x64
- free (always good !)
- 32 bit and 64 bit (64 bit released recently, applies to the [Indexing Server])
Foxit PDF IFilter v1.0
- free for desktops, servers require a license
- 32 bit and 64 bit (IA64 currently being tested, applies to the [Indexing Server])
Conclusion
Using the above procedure for either WSS 3.0 or MOSS 2007 it is possible to have the contents of your PDF documents indexed by the SharePoint Search.