Customizing Content Processing Pipeline in SharePoint 2013 - Netwoven

By Niraj Tenany  |  Published on April 29, 2013

Business requirements often call for customizing the content processing pipeline. This post walks through two ways to do that in SharePoint 2013 search.

The figure below shows a logical overview of how crawling and content processing work in SharePoint 2013 enterprise search.

[Figure: logical overview of crawling and content processing in SharePoint 2013]

If the managed properties of crawled items need to be modified before they are indexed, custom business logic has to be plugged into the content processing pipeline. The only stage at which SharePoint 2013 lets the pipeline call an external SOAP service (a WCF service or a web service) is “Content Enrichment”.

This article discusses two cases:

Case 1:

“Calcutta” is a city in India that has been renamed “Kolkata”. Some people search for “Kolkata” and others for “Calcutta”. To handle this, the “Location” property can be rewritten to “Kolkata” whenever its value is “Calcutta”. The following steps are needed:

  1. Create a WCF application in Visual Studio 2012 and add a reference to “Microsoft.Office.Server.Search.ContentProcessingEnrichment.dll”, which you can find in “C:\Program Files\Microsoft Office Servers\15.0\Search\Applications\External”.
  2. Delete the default interface (e.g. IService1).
  3. Add using directives for the following namespaces to the Service1.svc.cs file:
    • Microsoft.Office.Server.Search.ContentProcessingEnrichment
    • Microsoft.Office.Server.Search.ContentProcessingEnrichment.PropertyTypes
  4. Implement the “IContentProcessingEnrichmentService” interface in the Service1.svc.cs file.
  5. Implement the “ProcessItem” method. This is the method that receives the requested properties for each item.
  6. Add the following inside <system.serviceModel> in the web.config file:

    <bindings>
      <basicHttpBinding>
        <!-- The service will accept a maximum blob of 8 MB. -->
        <binding maxReceivedMessageSize="8388608">
          <readerQuotas maxDepth="32"
                        maxStringContentLength="2147483647"
                        maxArrayLength="2147483647"
                        maxBytesPerRead="2147483647"
                        maxNameTableCharCount="2147483647" />
          <security mode="None" />
        </binding>
      </basicHttpBinding>
    </bindings>

  7. Host the WCF service in IIS: create a virtual directory, map it to the physical path of the WCF application, then right-click the virtual directory and click “Convert to Application”.
  8. Browse to the hosted .svc file and note its URL.
  9. Run the following PowerShell script to point “Content Enrichment” at the hosted custom WCF service:

    $ssa = Get-SPEnterpriseSearchServiceApplication
    $config = New-SPEnterpriseSearchContentEnrichmentConfiguration
    $config.Endpoint = "http://Site_URL/<service name>.svc"
    $config.InputProperties = "Location"
    $config.OutputProperties = "Location"
    $config.SendRawData = $True
    $config.MaxRawDataSize = 8192
    Set-SPEnterpriseSearchContentEnrichmentConfiguration -SearchApplication $ssa -ContentEnrichmentConfiguration $config

  10. Run a full crawl on the content source.
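
The decision that “ProcessItem” has to make in Case 1 is small, and it may help to see it in isolation. The sketch below is written in Python purely so it can be run on its own; it does not use the SharePoint API, and the real service would be C# code implementing IContentProcessingEnrichmentService. The function name `enrich_location`, the lookup table, the case-insensitive match, and the choice to return only the properties that changed are all assumptions made for illustration.

```python
# Illustrative model of the Case 1 logic; NOT the SharePoint API.
# RENAMED_LOCATIONS and enrich_location are hypothetical names.
RENAMED_LOCATIONS = {"calcutta": "Kolkata"}


def enrich_location(input_properties):
    """Return the output properties for one crawled item.

    input_properties is a plain dict standing in for the properties the
    pipeline sends (here only "Location"). Only changed properties are
    returned; an empty dict means "leave the item as it is".
    """
    location = input_properties.get("Location")
    if location is None:
        return {}
    # Assumption: match regardless of case and surrounding whitespace.
    replacement = RENAMED_LOCATIONS.get(location.strip().casefold())
    if replacement is None:
        return {}
    return {"Location": replacement}


if __name__ == "__main__":
    print(enrich_location({"Location": "Calcutta"}))
    print(enrich_location({"Location": "Mumbai"}))
```

Keeping the old-to-new names in a table rather than in an if statement means further renamed cities can be added without touching the logic.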

Case 2:

When there is more than one content source, one can be designated for “Advanced” resources and the other for “Intermediate” resources. The “Author” property of items from the first source can be set to “Advanced” and that of items from the second to “Intermediate”, so that users who search for the keyword “Advanced” see files from the first content source and users who search for “Intermediate” see files from the second.

Here the two content sources are first separated and then processed differently. This requires a “WCF router” that identifies the content source of each item and forwards the request to the matching WCF service.
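
The router idea can be pictured as a lookup from content source to the service that handles it. The following Python sketch models that dispatch with plain functions and dicts; it is not WCF code. The content source names are borrowed from the router filters configured in the steps below, while the assignment of “Advanced” and “Intermediate” to particular sources, the function names, and the error raised for an unknown source are assumptions.

```python
# Illustrative model of "route by content source"; NOT WCF code.
# All function names here are hypothetical.

def advanced_service(properties):
    """Stands in for the first enrichment service."""
    return {"Author": "Advanced"}


def intermediate_service(properties):
    """Stands in for the second enrichment service."""
    return {"Author": "Intermediate"}


# Assumption: which source is "Advanced" is not stated in the article.
ROUTES = {
    "Local SharePoint sites": advanced_service,
    "WCM": intermediate_service,
}


def route(properties):
    """Forward one item's properties to the service for its content source."""
    source = properties.get("ContentSource")
    handler = ROUTES.get(source)
    if handler is None:
        # The article does not say what happens to unmatched items;
        # this sketch simply refuses them.
        raise LookupError(f"no service registered for content source {source!r}")
    return handler(properties)


if __name__ == "__main__":
    print(route({"ContentSource": "WCM", "Author": "someone"}))
```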

  • Create two WCF services as described in Case 1.
  • Create a WCF application for the router, open its web.config, and configure it as described below:
  • Here “basicHttpBinding” is used:

    <basicHttpBinding>
      <binding name="basicHttpBinding_IContentProcessingEnrichmentService"
               maxReceivedMessageSize="8388608">
        <readerQuotas maxDepth="32"
                      maxStringContentLength="2147483647"
                      maxArrayLength="2147483647"
                      maxBytesPerRead="2147483647"
                      maxNameTableCharCount="2147483647" />
        <security mode="None" />
      </binding>
    </basicHttpBinding>

  • Then configure the services section (a base address is not required if the service is hosted in IIS):

    <services>
      <service behaviorConfiguration="RoutingServiceBehavior"
               name="System.ServiceModel.Routing.RoutingService">
        <endpoint name="RoutingServiceEndpoint"
                  address=""
                  binding="basicHttpBinding"
                  bindingConfiguration="basicHttpBinding_IContentProcessingEnrichmentService"
                  contract="System.ServiceModel.Routing.IRequestReplyRouter" />
      </service>
    </services>

  • Create a service behavior that references the name of the filter table, which is defined in the routing step further down. To let the XPath filters inspect the full SOAP envelope rather than only the headers, set the “routeOnHeadersOnly” attribute to false:

    <behavior name="RoutingServiceBehavior">
      <routing filterTableName="ContentSourceFilters" routeOnHeadersOnly="False" />
    </behavior>

  • Now configure the client endpoints for the two WCF services:

    <client>
      <endpoint name="ContentProcessingEnrichmentService"
                address="http://localhost:300/ContentProcessingEnrichmentService/Service1.svc"
                binding="basicHttpBinding"
                bindingConfiguration="basicHttpBinding_IContentProcessingEnrichmentService"
                contract="*" />
      <endpoint name="ContentProcessingEnrichmentServiceDB"
                address="http://localhost:300/ContentProcessingEnrichmentServiceDB/Service1.svc"
                binding="basicHttpBinding"
                bindingConfiguration="basicHttpBinding_IContentProcessingEnrichmentService"
                contract="*" />
    </client>

  • Now configure the routing section, i.e. the mapping between content sources and WCF services. Here the filters and the filter table are defined; the table maps each filter to a normal endpoint (and optionally to backup endpoints). The “XPath” filter type is used, so the XPath expressions look at all Property nodes in the SOAP envelope:

    <routing>
      <namespaceTable>
        <add prefix="cc" namespace="http://schemas.microsoft.com/office/server/search/contentprocessing/2012/01/ContentProcessingEnrichment" />
      </namespaceTable>
      <filters>
        <filter name="Sharepoint" filterType="XPath"
                filterData="//cc:Property[cc:Name[. = 'ContentSource'] and cc:Value[. = 'Local SharePoint sites']]" />
        <filter name="WCMContent" filterType="XPath"
                filterData="//cc:Property[cc:Name[. = 'ContentSource'] and cc:Value[. = 'WCM']]" />
      </filters>
      <filterTables>
        <filterTable name="ContentSourceFilters">
          <add filterName="Sharepoint" endpointName="ContentProcessingEnrichmentService" />
          <add filterName="WCMContent" endpointName="ContentProcessingEnrichmentServiceDB" />
        </filterTable>
      </filterTables>
    </routing>

  • Build the solution and host it in IIS as in Case 1.
  • Run the following PowerShell commands to register the router service as the content enrichment endpoint:

    $ssa = Get-SPEnterpriseSearchServiceApplication
    $config = New-SPEnterpriseSearchContentEnrichmentConfiguration
    $config.Endpoint = "http://localhost:300/Router/Router.svc"
    $config.InputProperties = "Author", "Filename", "ContentSource"
    $config.OutputProperties = "Author"
    $config.SendRawData = $True
    $config.MaxRawDataSize = 8192
    Set-SPEnterpriseSearchContentEnrichmentConfiguration -SearchApplication $ssa -ContentEnrichmentConfiguration $config

  • Run a full crawl on both content sources.
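
To see what the two XPath filters are testing for, the following Python sketch applies the same check to a hand-written XML fragment using only the standard library. The namespace URI, the content source names, and the endpoint names come from the configuration above; the shape of the sample message is a simplified assumption and not a captured SharePoint request, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the router's namespaceTable.
CC = ("http://schemas.microsoft.com/office/server/search/"
      "contentprocessing/2012/01/ContentProcessingEnrichment")
NS = {"cc": CC}

# (ContentSource value, endpoint name) pairs mirroring the filter table.
FILTER_TABLE = [
    ("Local SharePoint sites", "ContentProcessingEnrichmentService"),
    ("WCM", "ContentProcessingEnrichmentServiceDB"),
]


def content_source(message_xml):
    """Return the value of the Property named ContentSource, or None."""
    root = ET.fromstring(message_xml)
    for prop in root.iterfind(".//cc:Property", NS):
        if prop.findtext("cc:Name", namespaces=NS) == "ContentSource":
            return prop.findtext("cc:Value", namespaces=NS)
    return None


def select_endpoint(message_xml):
    """Return the endpoint name whose filter matches, or None."""
    source = content_source(message_xml)
    for value, endpoint in FILTER_TABLE:
        if source == value:
            return endpoint
    return None


# Assumed, simplified message body for illustration only.
SAMPLE = f"""<Item xmlns="{CC}">
  <ItemProperties>
    <Property><Name>Filename</Name><Value>guide.docx</Value></Property>
    <Property><Name>ContentSource</Name><Value>WCM</Value></Property>
  </ItemProperties>
</Item>"""

if __name__ == "__main__":
    print(select_endpoint(SAMPLE))
```

A check like this can also be handy when a filter does not match as expected: the content source name in the filter has to equal the value sent by the crawler exactly.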

Hope you find this article interesting and helpful. Let us know in the comments below if you have any questions.

By Niraj Tenany

Niraj is Chief Executive Officer and a Co-founder of Netwoven, responsible for the strategic vision and direction. Niraj has been working with Fortune 500 companies to implement large-scale enterprise systems for the past 25 years. Prior to founding Netwoven, Niraj led a profitable Enterprise Applications Consulting Practice at Microsoft. His team implemented large scale deployments of enterprise applications like Siebel, Ariba, and SAP with Fortune 500 customers. Niraj’s team also led the design and implementation of OLAP solutions based on the Microsoft platform. Prior to joining Microsoft, Niraj led a profitable Business Intelligence Consulting practice with Oracle Consulting Services. Niraj has also worked with startup organizations in senior management positions. Niraj was the Director of Consulting Services at Zaplet, a Kleiner Perkins funded company. Niraj holds a BS in Computer Science from Birla Institute of Technology, India, an MS in Computer Science from State University of New York (SUNY), and an MBA from Duke University’s Fuqua School of Business in North Carolina.

1 comment

  1. It depends on whether you need to apply both kinds of processing to all content. If you need to apply only one type of processing to any given content type, use a WCF Routing Service. The routing service needs to be registered with your content processing pipeline as the CEWS endpoint. From the routing service, route each request to one or the other of your custom processing services.

    If you need to apply both kinds of processing sequentially to all content, use a WCF Workflow Service. The workflow service can be configured as the CEWS endpoint, and from it you can call your custom services in the order you want from custom activities.

    As for hosting, you do not have to host the two services on different servers. You can host them on the same server.
