Steven Van de Craen's Blog

SharePoint Server 2010 and PDF Indexing

January 5, 2012 - 11:22, by Steven Van de Craen

Categories: SharePoint 2010, Search

Posting this for personal reference:

SharePoint 2010 - Configuring Adobe PDF iFilter 9 for 64-bit platforms

Windows Registry Editor Version 5.00

; (Default) is a REG_MULTI_SZ holding the CLSID of the Adobe x64 PDF IFilter:
; {E8978DA6-047F-4E3D-9C78-CDBE46041603}
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\14.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf]
@=hex(7):7b,00,45,00,38,00,39,00,37,00,38,00,44,00,41,00,36,00,2d,00,30,00,34,\
00,37,00,46,00,2d,00,34,00,45,00,33,00,44,00,2d,00,39,00,43,00,37,00,38,00,\
2d,00,43,00,44,00,42,00,45,00,34,00,36,00,30,00,34,00,31,00,36,00,30,00,33,\
00,7d,00,00,00,00,00

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\14.0\Search\Setup\Filters\.pdf]
"Extension"="pdf"
"FileTypeBucket"=dword:00000001
"MimeTypes"="application/pdf"

sps2010pdf.reg
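
The hex(7) value in the .reg file above is not very readable, so here is a small sketch for checking what it contains. It assumes the usual .reg conventions (comma-separated bytes, a trailing backslash as line continuation, UTF-16LE text with NUL terminators); the function name `decode_reg_multi_sz` is my own and not part of any SharePoint or Windows tooling.

```python
def decode_reg_multi_sz(hex_text: str) -> list[str]:
    """Decode the payload of a .reg hex(7) (REG_MULTI_SZ) value into its strings."""
    # Drop the line-continuation backslashes, newlines and spaces used in .reg files.
    cleaned = hex_text.replace("\\", "").replace("\n", "").replace(" ", "")
    raw = bytes(int(pair, 16) for pair in cleaned.split(",") if pair)
    # REG_MULTI_SZ is UTF-16LE text: NUL-separated strings plus a final extra NUL.
    text = raw.decode("utf-16-le")
    return [part for part in text.split("\x00") if part]


# The bytes after "@=hex(7):" copied from the .reg file above.
PDF_FILTER_VALUE = r"""
7b,00,45,00,38,00,39,00,37,00,38,00,44,00,41,00,36,00,2d,00,30,00,34,\
00,37,00,46,00,2d,00,34,00,45,00,33,00,44,00,2d,00,39,00,43,00,37,00,38,00,\
2d,00,43,00,44,00,42,00,45,00,34,00,36,00,30,00,34,00,31,00,36,00,30,00,33,\
00,7d,00,00,00,00,00
"""

if __name__ == "__main__":
    print(decode_reg_multi_sz(PDF_FILTER_VALUE))
```

The decoded value is the CLSID that the search service will load as the IFilter for .pdf files, which makes it easy to compare against the CLSID of the filter you actually installed.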


SharePoint Search Scopes: Approximate Item Count is incorrect

March 18, 2010 - 15:28, by Steven Van de Craen

Categories: Search, MOSS 2007, Search Server 2008, SharePoint 2007

The Scope Item Count gives an approximate number of items matching the scope. However, at one of my customers it showed only six items for their entire file share!?

There were no Crawl Rules and the Crawl Logs showed tens of thousands of successfully crawled items, so what could be wrong? I played with the scope rules (recreated them, inverted the logic, etc.) but no luck. I opened up Reflector on the Scope Count property and found that it is calculated through a Search Query. Then it hit me: the account I had logged in with to perform Search Administration was a local account that didn’t have access to the file share, so the query that returns the Scope Count security-trimmed those results for me.

I’d expect any SharePoint Administrator to get a correct count of items in the Scope so this seems like a minor design flaw to me.
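
To make the effect concrete, here is a toy model of a count that is computed through a security-trimmed query. This is purely an illustration under my own assumptions: `IndexedItem`, `scope_item_count` and the account names are hypothetical, and none of this is SharePoint's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedItem:
    path: str
    readers: frozenset  # accounts that are allowed to read the item


def scope_item_count(items, account: str) -> int:
    """Count items the way a security-trimmed query would: only readable ones."""
    return sum(1 for item in items if account in item.readers)


# Hypothetical index: a large file share plus a handful of public pages.
index = [IndexedItem(f"\\\\server\\share\\doc{i}.doc", frozenset({"DOMAIN\\user"}))
         for i in range(10_000)]
index += [IndexedItem(f"http://intranet/page{i}.aspx",
                      frozenset({"DOMAIN\\user", "SERVER\\localadmin"}))
          for i in range(6)]

if __name__ == "__main__":
    print(scope_item_count(index, "DOMAIN\\user"))
    print(scope_item_count(index, "SERVER\\localadmin"))
```

The index is the same in both calls; only the account asking the question differs, which is exactly why the count looked wrong from the local admin account.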


SharePoint Search indexes some files with fldXXXX_XXXX file names

January 20, 2010 - 16:00, by Steven Van de Craen

Categories: MOSS 2007, Search, Search Server 2008, SharePoint 2007, SharePoint Updates

Today I visited a customer to solve an issue that I had run into a while ago in my post regarding ZIP file indexing with IFilters. The customer was indexing Office 2003 documents on a file share (.doc, .xls, .rtf, …) and the Filename property had the strange value of "fld" followed by some numbers. This only occurred on their x64 live environment and not on an x86 test environment.

I looked at the IFilter overview using Citeknet IFilter Explorer (great tool!) and also at offfilt.dll (the IFilter for the aforementioned file types) to check for version differences between the two systems, but there were none.

Both environments were running SP2 and the June 2009 Cumulative Update, but since there weren’t many obvious options left I went for installing the November 2009 Cumulative Update, and that did the trick. I guess the issue that had been fixed for quite some time on x86 was only recently handled for x64 environments. Either way, everyone happy.

Hope it helps.


Wildcard Search for MOSS 2007

September 23, 2009 - 17:43, by Steven Van de Craen

Categories: MOSS 2007, Search, Search Server 2008, SharePoint 2007

Wildcard search is a much-discussed topic in SharePoint Land, and generally I reuse my custom XML Web Part to solve the issue: build the query, issue the query, read the result XML and transform it using XSL.

The downside to this solution is the lack of paging and other functionality (Search Alerts, RSS Feed, Statistics, …) that you get with the out-of-the-box Search Web Parts in Microsoft Office SharePoint Server 2007 or Microsoft Search Server 2008.

So I decided to deep dive into the SearchHiddenResultObject using Reflection and whipped up my own Wildcard Core Results Web Part. The downside with any wildcard search implementation is that you lose some functionality.

When I was ready to blog my findings I discovered Corey Roth had blogged this already along with his WildcardSearch on Codeplex project. A shame but having discovered his truly excellent blog eased the pain ;)

Anyway here’s the download for mine:

SharePoint Solution Package (.wsp) for deployment on your servers

STSADM -o addsolution -filename VNTG.WildcardSearch.wsp
STSADM -o deploysolution -name VNTG.WildcardSearch.wsp -allowgacdeployment -immediate -allcontenturls

Visual Studio 2008 + WSPBuilderExtensions project
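
For readers who only want the idea: the "build the query" step mentioned at the start of this post boils down to turning each keyword into a prefix term. Below is a minimal sketch of that step, assuming the Enterprise Search SQL syntax with a CONTAINS predicate; the function name, the selected columns and the AND-combination of terms are my own choices for illustration and not what the downloadable Web Part necessarily does.

```python
def build_wildcard_query(keywords: str, columns=("Title", "Path", "Rank")) -> str:
    """Turn user keywords into a SQL-syntax search query using prefix (term*) matching."""
    clauses = []
    for term in keywords.split():
        # Remove characters that would break out of the quoted prefix term,
        # and drop a trailing * the user may already have typed.
        safe = term.replace('"', "").replace("'", "''").rstrip("*")
        if safe:
            clauses.append(f"CONTAINS('\"{safe}*\"')")
    if not clauses:
        raise ValueError("at least one keyword is required")
    return (f"SELECT {', '.join(columns)} FROM SCOPE() "
            f"WHERE {' AND '.join(clauses)} ORDER BY Rank DESC")


if __name__ == "__main__":
    print(build_wildcard_query("share point"))
```

The resulting string is what you would hand to the query object (or web service) before transforming the results.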


Crawling custom document properties on a file share

January 8, 2009 - 12:06, by Steven Van de Craen

Categories: Search, Search Server 2008, SharePoint 2007, MOSS 2007

A file share with Word and Excel documents (.doc, .docx, .xls, .xlsx) having custom document properties is indexed via MOSS 2007 or MSS 2008.

When the crawl has finished the custom properties are listed in 'Crawled properties' but the details view mentions "There are zero documents in the index using this property."

However if you create a Managed Property from this Crawled Property it does contain the correct values and can be used as desired (query, filter, sort, etc).

Hooray !


SharePoint 2007 and ZIP indexing

November 10, 2008 - 22:32, by Steven Van de Craen

Categories: Search, Search Server 2008, SharePoint 2007

Introduction

Here's a post about indexing ZIP archives in the same style as the one I did on PDF indexing. The search engine makes use of IFilters to be able to read the specific structure of a certain file type and retrieve information from it that it puts in an index. When you perform a search query you will see the information from the index. If it weren't for IFilters you could only search on file name and metadata.

[Indexing Server]: the server(s) in the SharePoint Farm that has/have the "Indexing" Role assigned. In a small farm this can be a single server for all roles.

[Web Front End Server]: the server(s) in the SharePoint Farm that has/have the "Web Front End" Role assigned. In a small farm this can be a single server for all roles.

Windows SharePoint Services 3.0

[Indexing Server]

Install the ZIP IFilter (see below for a list of available IFilters)
Add the .zip file type to the index list:
1. Open the Registry Editor (Start > Run > regedit)
2. Go to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Applications\\Gather\Search\Extensions\ExtensionList
3. Add a new String Value
  1. Value name:
  2. Value data: zip
Perform an iisreset
Perform a Full Update on the Search content indexes
1. Open a Command Prompt on the Indexing Server
2. net stop spsearch
3. net start spsearch
4. cd "C:\Program Files\Common Files\Microsoft Shared\Web server extensions\12\BIN"
5. stsadm.exe -o spsearch -action fullcrawlstop
6. stsadm.exe -o spsearch -action fullcrawlstart

[Web Front End Server]

The zip icon registration is available out of the box.

Microsoft Office SharePoint Server 2007

[Indexing Server]

Install the ZIP IFilter (see below for a list of available IFilters)
Add the .zip file type to the index list:
1. Go to Central Administration, then to the Shared Services Administration Web of the current SSP, go to Search Settings and then to File Types
2. Add a new file type zip
Perform an iisreset
Perform a Full Update on the Search content indexes
1. Open a Command Prompt on the Indexing Server
2. net stop osearch
3. net start osearch
4. Go to Central Administration, then to the Shared Services Administration Web of the current SSP, go to Search Settings and start a full crawl of all locations containing ZIP files

[Web Front End Server]

The zip icon registration is available out of the box.

Available IFilters

IFilterShop ZIP IFilter

requires a license
32 bit and 64 bit (applies to the [Indexing Server])
Note: I haven't gotten this one to work. After installation and configuration I'm receiving the following for all crawled ZIP items: Crawled (The filtering process could not load the item. This is possibly caused by an unrecognized item format or item corruption. )

Citeknet ZIP IFilter

requires a license
32 bit and 64 bit (applies to the [Indexing Server])
Currently version 2.1 Beta
Works very nicely in the test setup. Haven't seen it in production or stress tests.

What about PDF documents inside ZIP archives ?

The ZIP IFilter will index all files in the archive using a corresponding IFilter, but if yours is an apartment-threaded IFilter (such as Adobe's PDF IFilter) you need to make the following adjustment:

[Indexing Server]

Open the Registry Editor (Start > Run > regedit)
Go to HKEY_CLASSES_ROOT\CLSID\{4C904448-74A9-11d0-AF6E-00C04FD8DC02}\InprocServer32
Change the ThreadingModel key value
1. Old value: Apartment
2. New value: Both
Go to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\ContentIndex
Change the DLLsToRegister key value
1. Remove the entry corresponding to pdffilt.dll from the list to prevent the Adobe PDF IFilter from re-registering
Restart the Search Service and perform a Full Update

An excellent tool to get an overview of installed IFilters is Citeknet IFilter Explorer which will also show you the threading model.

Conclusion

Using the above procedure for either WSS 3.0 or MOSS 2007, it is possible to have your ZIP archives indexed by SharePoint Search. The IFilter will recursively index any ZIP archives contained in an archive. Any other files (.txt, .doc, .ppt, .pdf) are indexed as well, and if an IFilter for that file type exists it will be used to extract information from it. This way it can index text inside PDF documents inside the ZIP archive.
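
The recursive behaviour is easier to picture with a small sketch. The code below is only an analogy under my own assumptions: it uses Python's `zipfile` module, and the `FILTERS` table with its single plain-text extractor is a hypothetical stand-in for the IFilters installed on the indexing server. It is not how the ZIP IFilter itself is implemented.

```python
import io
import os
import zipfile


def extract_txt(data: bytes) -> str:
    """Stand-in 'filter' for plain text files."""
    return data.decode("utf-8", errors="replace")


# Hypothetical registry: file extension -> function that extracts searchable text.
FILTERS = {".txt": extract_txt}


def index_archive(data: bytes, prefix: str = "") -> dict:
    """Walk a ZIP archive, recursing into nested archives, and return {name: text}."""
    index = {}
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            name = prefix + info.filename
            content = archive.read(info)
            ext = os.path.splitext(name)[1].lower()
            if ext == ".zip":
                # Nested archive: index its members under the archive's own name.
                index.update(index_archive(content, prefix=name + "/"))
            elif ext in FILTERS:
                index[name] = FILTERS[ext](content)
            else:
                # No filter for this type: the item is known, but has no full text.
                index[name] = ""
    return index
```

Registering an extra entry in `FILTERS` plays the same role as installing an additional IFilter: files of that type inside the archive then get their text extracted as well.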

Note that the search results will show confusing file names (of the form fldXXXX_XXXX) for these items.


Search: FullTextQuery RowLimit

January 14, 2008 - 10:05, by Steven Van de Craen

Categories: .NET, Search, SharePoint 2007

When you query the SharePoint Search Service the number of rows returned defaults to 100 but can be increased as required.

Note that when you specify a value above the maximum RowLimit the query will only return the default value of 100 items !

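using Microsoft.Office.Server;               // ServerContext
using Microsoft.Office.Server.Search.Query;  // FullTextSqlQuery, ResultType, ResultTable

ServerContext ctx = ServerContext.Default;
FullTextSqlQuery query = new FullTextSqlQuery(ctx);
query.QueryText = BuildQuery();   // BuildQuery() is my own helper (not shown) that returns the query text
query.ResultTypes = ResultType.RelevantResults;
query.RowLimit = 1000;            // 100 rows are returned when this is not set
ResultTable resultTable = query.Execute()[ResultType.RelevantResults];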

After some trial and error I found the maximum value for the RowLimit to be 917728059.

query.RowLimit = 917728059;

So don't go around setting the RowLimit to int.MaxValue like I did because this only returns 100 items...

UPDATE:

This only applies to the MOSS 2007 RTM version and seems to be fixed since Service Pack 1.

You can now set the RowLimit to anything from 1 to int.MaxValue and it will return the correct number of items.
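
If you have to support both RTM and patched servers, clamping the value before assigning it avoids the surprise. The sketch below models only the behaviour described in this post (my observed maximum of 917728059 on RTM and the fallback to 100 rows); the function names and the `is_rtm` flag are hypothetical, and the model says nothing about how the search service works internally.

```python
DEFAULT_ROW_LIMIT = 100
RTM_MAX_ROW_LIMIT = 917_728_059   # maximum found by trial and error on MOSS 2007 RTM
INT_MAX = 2**31 - 1               # int.MaxValue in .NET


def rows_returned_on_rtm(requested: int, matches: int) -> int:
    """Model of the RTM behaviour: a limit above the maximum falls back to 100 rows."""
    limit = requested if requested <= RTM_MAX_ROW_LIMIT else DEFAULT_ROW_LIMIT
    return min(limit, matches)


def safe_row_limit(requested: int, is_rtm: bool) -> int:
    """Clamp a requested RowLimit to a value the server will honour."""
    ceiling = RTM_MAX_ROW_LIMIT if is_rtm else INT_MAX
    return max(1, min(requested, ceiling))


if __name__ == "__main__":
    print(rows_returned_on_rtm(INT_MAX, matches=5000))
    print(rows_returned_on_rtm(safe_row_limit(INT_MAX, is_rtm=True), matches=5000))
```

The first call shows the trap (only the default number of rows comes back), the second shows the clamped value returning everything that matched.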


Search Server 2008: installation gimmick

November 30, 2007 - 11:15, by Steven Van de Craen

Categories: SharePoint 2007, Search Server 2008, Search

I just installed the Release Candidate of Microsoft Search Server 2008 Express edition. Although the documentation mentioned a Basic and Advanced installation I didn't get that option.

Another thing I noticed:

SharePoint is everywhere :)


SharePoint Search: Basic Authentication issues

November 22, 2007 - 17:15, by Steven Van de Craen

Categories: Search, SharePoint 2007

One of our MOSS 2007 servers has a single Web Application (no extended Web Apps) and is configured to use Basic Authentication. I have confirmed that my dedicated crawl account has sufficient permissions in the Policy for Web Application section of Central Administration > Application management.

I try to start a full crawl of the local SharePoint content but it keeps throwing the following error:

Access is denied. Check that the Default Content Access Account has access to this content, or add a crawl rule to crawl this content. (The item was deleted because it was either not found or the crawler was denied access to it.)

So I extended the main Web Application and configured it to use Integrated Windows Authentication. I edited the Local SharePoint Content Source and set it to use the URL of the extended Web Application and guess what, it started indexing my content again.

Bottom rule

Always make sure there is a Web Application (extension) configured with Integrated Windows Authentication. You can keep it internal and hidden if you like. Preferably it is the default zone. Public URLs that use a different authentication mechanism should always be configured on an extended Web Application.


SharePoint 2007 and PDF indexing

November 21, 2007 - 15:16, by Steven Van de Craen

Categories: SharePoint 2007, Search Server 2008, Search

Introduction

By default, SharePoint 2007 Search indexes only the metadata of a PDF document. By installing and configuring a PDF IFilter, the Search will also index the contents of the PDF document. This allows users to find documents based on text inside the document. This process is called full text indexing.

[Indexing Server]: the server(s) in the SharePoint Farm that has/have the "Indexing" Role assigned. In a small farm this can be a single server for all roles.

[Web Front End Server]: the server(s) in the SharePoint Farm that has/have the "Web Front End" Role assigned. In a small farm this can be a single server for all roles.

Windows SharePoint Services 3.0

[Indexing Server]

Install the PDF IFilter (see below for a list of available IFilters)
Add the .pdf file type to the index list:
1. Open the Registry Editor (Start > Run > regedit)
2. Go to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Applications\\Gather\Search\Extensions\ExtensionList
3. Add a new String Value
  1. Value name:
  2. Value data: pdf
[This step only applies to 64 bit servers]
1. Go to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf
2. Change the (Default) key value
  1. Old value: {4C904448-74A9-11D0-AF6E-00C04FD8DC02}
  2. (Foxit x64 PDF IFilter) New value: {987F8D1A-26E6-4554-B007-6B20E2680632}
  3. (Adobe x64 PDF IFilter) New value: {E8978DA6-047F-4E3D-9C78-CDBE46041603}
Perform an iisreset
Perform a Full Update on the Search content indexes
1. Open a Command Prompt on the Indexing Server
2. net stop spsearch
3. net start spsearch
4. cd "C:\Program Files\Common Files\Microsoft Shared\Web server extensions\12\BIN"
5. stsadm.exe -o spsearch -action fullcrawlstop
6. stsadm.exe -o spsearch -action fullcrawlstart

[Web Front End Server]

Copy the ICPDF.GIF file to "C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\12\Template\Images"
Edit the file C:\Program Files\Common Files\Microsoft Shared\Web server extensions\12\Template\Xml\DOCICON.XML
1. Add an entry for the .pdf extension
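
The DOCICON.XML edit is a one-line change: a Mapping element for the extension inside the ByExtension section. If you prefer to script it, here is a hedged sketch. `SAMPLE_DOCICON` is a trimmed-down sample I made up for the example (the real file has many more mappings and attributes), and `add_icon_mapping` is my own helper name. Work on a backup copy of the real file.

```python
import xml.etree.ElementTree as ET

# Trimmed-down, made-up sample with the same overall layout as DOCICON.XML.
SAMPLE_DOCICON = """<DocIcons>
  <ByProgID />
  <ByExtension>
    <Mapping Key="doc" Value="icdoc.gif" />
  </ByExtension>
  <Default>
    <Mapping Value="icgen.gif" />
  </Default>
</DocIcons>"""


def add_icon_mapping(xml_text: str, extension: str, icon_file: str) -> str:
    """Return the XML with a <Mapping Key=... Value=.../> for the given extension."""
    root = ET.fromstring(xml_text)
    by_ext = root.find("ByExtension")
    if by_ext is None:
        raise ValueError("no <ByExtension> element found")
    key = extension.lstrip(".").lower()
    for mapping in by_ext.findall("Mapping"):
        if mapping.get("Key", "").lower() == key:
            mapping.set("Value", icon_file)   # already registered: just update the icon
            break
    else:
        ET.SubElement(by_ext, "Mapping", {"Key": key, "Value": icon_file})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(add_icon_mapping(SAMPLE_DOCICON, ".pdf", "icpdf.gif"))
```

Running the helper twice leaves a single pdf mapping, so it is safe to re-apply after a reinstall or update has overwritten the file.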

Microsoft Office SharePoint Server 2007

[Indexing Server]

Install the PDF IFilter (see below for a list of available IFilters)
Add the .pdf file type to the index list:
1. Go to Central Administration, then to the Shared Services Administration Web of the current SSP, go to Search Settings and then to File Types
2. Add a new file type pdf
[This step only applies to 64 bit servers]
1. Go to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf
2. Change the (Default) key value
  1. Old value: {4C904448-74A9-11D0-AF6E-00C04FD8DC02}
  2. (Foxit x64 PDF IFilter) New value: {987F8D1A-26E6-4554-B007-6B20E2680632}
  3. (Adobe x64 PDF IFilter) New value: {E8978DA6-047F-4E3D-9C78-CDBE46041603}
Perform an iisreset
Perform a Full Update on the Search content indexes
1. Open a Command Prompt on the Indexing Server
2. net stop osearch
3. net start osearch
4. Go to Central Administration, then to the Shared Services Administration Web of the current SSP, go to Search Settings and start a full crawl of all locations containing PDF files

[Web Front End Server]

Copy the ICPDF.GIF file to "C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\12\Template\Images"
Edit the file C:\Program Files\Common Files\Microsoft Shared\Web server extensions\12\Template\Xml\DOCICON.XML
1. Add an entry for the .pdf extension

Available IFilters

Adobe PDF IFilter 6.0 - x64

free (always good !)
32 bit and 64 bit (64 bit released recently, applies to the [Indexing Server])

Foxit PDF IFilter v1.0

free for desktops, servers require a license
32 bit and 64 bit (IA64 currently being tested, applies to the [Indexing Server])

Conclusion

Using the above procedure for either WSS 3.0 or MOSS 2007, it is possible to have the contents of your PDF documents indexed by SharePoint Search.

References

No Adobe PDF documents are returned in the search results when you search a Windows SharePoint Services 3.0 Web site

Other

What about PDF documents inside ZIP archives ?
