Source mapping with different values based on the document URL

I’ve been working with Web sources in the Coveo Cloud V2 platform a lot lately. One interesting predicament I ran into with one of my sources was the need for a “multi-value” source mapping, meaning a mapping that returns a different string depending on a condition. In my case, my source had a mapping for my content type field, and I needed it to return “Event” if the URL of the item being indexed contains “/events”, and “Page” for everything else. I thought Item Types might be exactly what I needed, but I never found out, because I wasn’t able to create them. Instead, I decided to write an indexing pipeline extension to handle this logic and hook it up to my source.
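
The rule itself boils down to “the first matching URL fragment wins, otherwise fall back to a default.” Here is a minimal sketch of that rule in plain Python, kept 2.7-compatible to match the extension runtime. The rule list is hypothetical: only the “/events” → “Event” rule and the “Page” default come from this post.

```python
# Hypothetical rule table: only ("/events", "Event") comes from this post.
# Rules are checked in order, so put more specific fragments first.
CONTENT_TYPE_RULES = [
    ("/events", "Event"),
]
DEFAULT_CONTENT_TYPE = "Page"


def content_type_for(uri):
    """Return the content type for a URI: first matching rule wins, else the default."""
    if not uri:
        # Missing or empty URI: fall back to the default instead of failing.
        return DEFAULT_CONTENT_TYPE
    for fragment, content_type in CONTENT_TYPE_RULES:
        if fragment in uri:
            return content_type
    return DEFAULT_CONTENT_TYPE
```

For example, content_type_for("https://example.com/events/spring-gala") returns "Event", and content_type_for("https://example.com/about") returns "Page". Supporting another content type then only means appending a (fragment, type) pair to the list.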

Creating the extension

If you aren’t familiar with Indexing Pipeline Extensions, they are essentially Python scripts that you write and attach to your sources to apply custom logic to each individual item as it goes through the indexing pipeline. For more information, I suggest going through Coveo’s documentation on indexing pipeline extensions and the related articles listed with it. Extensions can be really powerful: they can reject content based on a condition, set the value of a mapped field, and so on.

In this case, I had to set the value of my content type field based on the URL of a document. I knew I could use the out-of-the-box clickableURI field for this, so I wrote the following:

try:
    # get_meta_data_value returns a list of values; the URL is the first one
    my_uri = document.get_meta_data_value('clickableuri')
    if my_uri and "/events" in my_uri[0]:
        document.add_meta_data({'aopacontenttype': "Event"})
    else:
        document.add_meta_data({'aopacontenttype': "Page"})
except Exception as e:
    # log() is provided by the extension runtime
    log(str(e), 'Error')

It took some research into Python (version 2.7.6), its syntax, and the available methods, but I got it working.
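
Because the document object and the log function only exist inside Coveo’s extension runtime, the script above can’t be run as-is on your own machine. One way to sanity-check the logic before attaching it to a source is to wrap it in a function and feed it a stand-in object. This is a minimal sketch under that assumption: FakeDocument, set_content_type and the example URLs are hypothetical names for illustration, not part of Coveo’s API, and the stand-in only mimics the two calls the script uses.

```python
class FakeDocument(object):
    """Hypothetical stand-in for the document object Coveo passes to extensions.

    It only mimics the two calls used in this post: get_meta_data_value returns
    a list of values, and add_meta_data stores a dict of field values.
    """

    def __init__(self, metadata):
        # Store every value as a list, matching how the script reads clickableuri.
        self.metadata = dict((k, list(v)) for k, v in metadata.items())

    def get_meta_data_value(self, name):
        return self.metadata.get(name, [])

    def add_meta_data(self, values):
        for key, value in values.items():
            self.metadata[key] = [value]


def log(message, severity):
    # Hypothetical stand-in for the runtime's log function.
    print("[%s] %s" % (severity, message))


def set_content_type(document):
    """Same logic as the extension script, wrapped in a function for testing."""
    try:
        my_uri = document.get_meta_data_value('clickableuri')
        if my_uri and "/events" in my_uri[0]:
            document.add_meta_data({'aopacontenttype': "Event"})
        else:
            document.add_meta_data({'aopacontenttype': "Page"})
    except Exception as e:
        log(str(e), 'Error')


event = FakeDocument({'clickableuri': ['https://example.com/events/annual-meeting']})
page = FakeDocument({'clickableuri': ['https://example.com/about-us']})
set_content_type(event)
set_content_type(page)
print(event.get_meta_data_value('aopacontenttype'))  # ['Event']
print(page.get_meta_data_value('aopacontenttype'))   # ['Page']
```

Passing the document in as a parameter, rather than relying on a global, is what makes the logic testable outside the runtime. Inside the real extension you would still run the unwrapped script.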

Adding the extension to the source

Next, I went back to the Sources screen, selected my source, clicked (…) More > Manage extensions, and added my extension to the source at the Post-Conversion stage, applying it to all items.

Completion

Lastly, I had to rebuild the source. Upon checking the newly indexed content in the Content Browser, I was able to see “Event” as the field value for event pages, and “Page” for all other pages. Hope this helps! I will be writing another post or two about my other experiences with indexing pipeline extensions soon.
