I recently needed to index some custom content sources within MOSS and have been using the MOSSPH protocol handler, which can be found on CodePlex (http://www.codeplex.com/MOSSPH), to help me achieve this.
The downloadable solution has everything you need to get started, with lots of comments to assist you in implementing custom functionality. There are, however, one or two small bugs relating to the retrieval of security descriptors, which is necessary when implementing security trimming at query time. Fortunately, I was able to locate these bugs relatively quickly thanks to a very handy blog post from Michaël Hompus, which saved me some valuable time.
The MOSSPH project is extremely useful to get up and running as quickly as possible when there is a requirement to crawl custom content sources from within MOSS.
I have quoted directly from Michaël’s site below:
Running on x64
The first problem is that the code does not work on x64 environments. Since we are talking to native code, there are a lot of structs in the code. These structs carry metadata that indicates their layout in memory. This is done using the StructLayoutAttribute class, which contains a Value property with a LayoutKind enumeration and a Pack field.
The problem is that the Pack value was set to 1. This is normal for 32-bit systems, but not for 64-bit, where a pack of 8 is expected.
Luckily, the following is written in the remarks section:
A value of 0 indicates that the packing alignment is set to the default for the current platform.
This solves our problem! Now the same code runs fine on both x86 and x64 systems.
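To see why the packing value changes the memory layout, here is a minimal sketch in Python rather than in the project's own code. It uses the standard `struct` module as an analogy only: the `=` prefix lays fields out with no padding (comparable to a pack of 1), while `@` uses the native alignment of the current platform (comparable to a pack of 0). The two-field record is hypothetical and is not one of the MOSSPH structs.

```python
import struct

# Hypothetical record: a 1-byte flag followed by an 8-byte value.
PACKED_LAYOUT = "=BQ"  # "=": standard sizes, no padding between fields
NATIVE_LAYOUT = "@BQ"  # "@": native sizes and alignment of this platform

packed_size = struct.calcsize(PACKED_LAYOUT)
native_size = struct.calcsize(NATIVE_LAYOUT)

# With no padding the record is always 1 + 8 = 9 bytes.
print(f"packed layout: {packed_size} bytes")
# With native alignment the 8-byte field is moved to an aligned offset,
# so the record is larger (typically 16 bytes on a 64-bit platform).
print(f"native layout: {native_size} bytes")
```

If the managed side describes the packed layout while the native side uses the aligned one, every field after the first is read from the wrong offset, which is the kind of mismatch the pack value controls.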
Using Security Descriptors
One of the most important features of SharePoint search is security trimming. To make this possible, an ACL structure is stored with the crawled item in the index database. The problem is that when the ACL is larger than 1 kB, the crawler goes into an endless loop.
The way it should work is that the search service calls the GetSecurityDescriptor method with a pointer and a size. This size is 1024 by default. When the ACL is larger, an ERROR_INSUFFICIENT_BUFFER error should be returned and the required size should be set. The search service should then allocate enough memory and call the GetSecurityDescriptor method again, which is now able to write the complete ACL to the pointer.
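The two-call handshake described above can be sketched as a small simulation. This is only an illustration in Python, not the MOSSPH code or the real search service: the functions `get_security_descriptor` and `crawl_item`, their signatures, and the use of a `bytearray` in place of a native pointer are all assumptions made for the example.

```python
S_OK = 0
E_INSUFFICIENT_BUFFER = 0x8007007A  # HRESULT form of ERROR_INSUFFICIENT_BUFFER
DEFAULT_BUFFER_SIZE = 1024          # size of the first buffer, per the text


def get_security_descriptor(acl: bytes, buffer: bytearray):
    """Hypothetical handler side: fill the buffer or report the size needed."""
    if len(acl) > len(buffer):
        # Buffer too small: return the error and the required size.
        return E_INSUFFICIENT_BUFFER, len(acl)
    buffer[:len(acl)] = acl
    return S_OK, len(acl)


def crawl_item(acl: bytes) -> bytes:
    """Hypothetical caller side: try the default buffer, then retry once."""
    buffer = bytearray(DEFAULT_BUFFER_SIZE)
    status, size = get_security_descriptor(acl, buffer)
    if status == E_INSUFFICIENT_BUFFER:
        # Allocate exactly what the handler asked for and call again.
        buffer = bytearray(size)
        status, size = get_security_descriptor(acl, buffer)
    if status != S_OK:
        raise RuntimeError(f"unexpected status 0x{status:08X}")
    return bytes(buffer[:size])


small_acl = b"a" * 100
large_acl = b"b" * 5000
print(len(crawl_item(small_acl)), len(crawl_item(large_acl)))
```

The retry only happens if the caller recognises the returned status, which is why the exact error value matters.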
The problem with the current version on CodePlex is that the value of the error code is incorrect. Instead of 0x00000122 it should be 0x8007007A (which also encodes 122, since 0x7A is 122 in decimal). After changing this, the ACL will be stored (as long as it stays under 64 kB).
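The relationship between the two values can be checked with a few lines of arithmetic. The sketch below, again in Python for illustration, mirrors how the Windows `HRESULT_FROM_WIN32` macro builds an HRESULT from a positive Win32 error code; the function name `hresult_from_win32` is just a local helper for this example.

```python
FACILITY_WIN32 = 7
ERROR_INSUFFICIENT_BUFFER = 122  # decimal Win32 error code


def hresult_from_win32(code: int) -> int:
    """Build an HRESULT from a positive Win32 error code.

    Sets the failure bit, the Win32 facility, and the low 16 bits of the code.
    """
    return 0x80000000 | (FACILITY_WIN32 << 16) | (code & 0xFFFF)


correct = hresult_from_win32(ERROR_INSUFFICIENT_BUFFER)
print(f"0x{correct:08X}")

# Writing the decimal digits 122 as hexadecimal gives a different number.
print(0x00000122)
```

The first line prints 0x8007007A and the second prints 290, which shows that 0x00000122 is the decimal code typed as if it were hexadecimal.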