Creating a Custom Facet in Coveo Hive – Part Two

In my previous blog post, I covered the basics of setting up a custom facet / component in Coveo Hive. In this post, I will take a deeper dive into how our custom facet can interact with the current Coveo query on the search page and manage Coveo state. Let’s get started.

Updating the Current Coveo Query

At this point, you should have a visible, interactive facet. When a user enters a value into a field of your facet, checks a box, or interacts with the facet in some other way, you’ll want the Coveo search page to reflect that interaction by showing updated search results. In technical terms, we achieve this by adding our custom facet’s selections to the advanced expression of the current query. We will need to do the following:

  1. Add code to process the query change.
    1. Update the Coveo state.
    2. Then, call executeQuery(). This triggers the buildingQuery event, where our handler sets the advanced expression, and the search results component refreshes with the new results.
    3. Finally, log the search event to Coveo Usage Analytics (optional).
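The component itself is JavaScript, but the heart of step 2 is just turning facet state into a query expression. Here is a language-neutral sketch of that translation, written in Python to match the rest of this blog’s code. It assumes a hypothetical date field named @eventdate and Coveo’s yyyy/MM/dd date syntax; the function name and the state dictionary are illustrations, not part of Coveo’s API.

```python
from datetime import date
from typing import Optional


def build_date_range_expression(field: str,
                                start: Optional[date],
                                end: Optional[date]) -> str:
    """Build a Coveo-style advanced expression for a start/end date facet.

    Returns an empty string when neither date is set, so the caller can
    skip adding anything to the query.
    """
    parts = []
    if start is not None:
        parts.append("@{0}>={1}".format(field, start.strftime("%Y/%m/%d")))
    if end is not None:
        parts.append("@{0}<={1}".format(field, end.strftime("%Y/%m/%d")))
    # Space-separated terms are combined with an implicit AND in Coveo query syntax.
    return " ".join(parts)


# Hypothetical facet state after the user picks both dates.
state = {"start": date(2018, 2, 3), "end": date(2018, 2, 10)}
print(build_date_range_expression("eventdate", state["start"], state["end"]))
# @eventdate>=2018/02/03 @eventdate<=2018/02/10
```

In the JavaScript component, the equivalent string would be added to the query builder’s advanced expression inside the buildingQuery handler (for example through args.queryBuilder.advancedExpression.add); confirm that call against Coveo’s JSUI reference for your version. Returning an empty string when no dates are picked keeps the handler from adding an empty clause.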



Creating a Custom Facet in Coveo Hive – Part One

Recently I was tasked with creating a custom facet for a client’s Coveo for Sitecore implementation. I was implementing Coveo Hive, which worked well and made it very simple to add components to the page in a modular fashion. After reviewing the proof-of-concept search page, the client realized they didn’t like the date slider component at all, and asked that we instead have a facet with two fields, start date and end date, each with a calendar picker.

I quickly found out this was not available as an out-of-the-box (OOTB) component. I would have to learn how to create a custom component and allow it to be added to a search page, and I would have to manage state, facet appearance and breadcrumbs for the new facet. Due to the complexity of this task, I’m splitting it into a blog post series, one post per major part. In this first post, I will go through the basic setup of your custom facet; by the end of it, you should be able to see and interact with your facet to some extent. Final code will be provided at the end of the last post in this series. Let’s get started!

It Starts With JavaScript (Or TypeScript)

When I first started working on this, Coveo’s documentation suggested creating a custom component using JavaScript. It looks like they’ve since updated their documentation: you’ll want to start with Coveo’s guide to creating custom components and pick one of the two paths it offers, JavaScript or TypeScript. I went the JavaScript path, so my code will be different from yours if you choose the TypeScript path.


The coveo-facet-empty class

Recently I struggled through the creation of a custom component in Coveo Hive. My goal was to create a custom facet with two fields, start and end date, and get results to filter based on selections in those dropdowns, include breadcrumbs, etc. One issue that I ran into time and time again was my facet not appearing in the facets section. At first I thought the markup simply wasn’t generated, until I inspected the DOM and found it was there, but hidden.

<div id="_..." class="CoveoFacet coveo-facet-empty">

I could immediately tell it was empty from the coveo-facet-empty class (the name gives it away), but I hadn’t added this class manually. It turns out Coveo adds this class to a facet, but only under certain conditions:

  • The facet is not pointing to a field where it can get data, or the field simply has no data for ANY of the search results. This was not my problem.
  • The facet’s isFacet setting is set to false. Okay, so this one actually got me once – for some reason, my facet had the setting equal to false. It needed to be true, so I set it to true in the Cloud Platform -> Fields page, and the facet showed up. Later that week though, the facet disappeared again – so keep reading.
  • There is a syntax error in the JavaScript of your custom component. This happened to me several times. The browser console should explain what the issue is; usually I was missing a curly bracket or something equally minute.

Hope this helps anyone having the same issue.

 

Crawling root naming “gotcha!”

Short post for a small “gotcha” I experienced with indexing / crawling roots. I’m using indexing roots with Coveo for Sitecore (Hive) v4.1. I updated my roots and named them how I wanted them to be named, then rebuilt the index – only to find 0 items indexed, and the cloud panel reporting nothing happening.

The problem was, there wasn’t a crawling root with the default naming of ContentCrawler (or MediaItemCrawler for the media library). If you are indexing things in the content tree, you need at least one crawler with that exact name, or the rebuild will do nothing.

I don’t think you can have multiple crawlers with the same name, unless the crawlers are for different databases – meaning you can have two crawlers, both named ContentCrawler, one in your master index element and one in your web index element.

From my experience, other content crawlers after that (but within the same index element) can be named however you choose.
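To make these naming rules concrete, here is a small Python sketch that checks a hypothetical in-memory description of your index elements (a dict mapping each index element to its crawler names; this structure is an illustration, not Coveo’s configuration schema). It encodes only what I observed: each index element that covers the content tree needs a crawler named exactly ContentCrawler, and names should not repeat within a single index element, while the same name in the master and web index elements is fine.

```python
from collections import Counter
from typing import Dict, List

# The default crawler name the rebuild appears to rely on (from my experience).
REQUIRED_NAME = "ContentCrawler"


def check_crawler_names(indexes: Dict[str, List[str]]) -> List[str]:
    """Return human-readable problems found in each index element's crawler names."""
    problems = []
    for index_name, crawler_names in indexes.items():
        if REQUIRED_NAME not in crawler_names:
            problems.append("{0}: no crawler named {1}".format(index_name, REQUIRED_NAME))
        # Duplicates are only checked within one index element, not across elements.
        for name, count in sorted(Counter(crawler_names).items()):
            if count > 1:
                problems.append("{0}: crawler name {1} used {2} times".format(index_name, name, count))
    return problems


# Hypothetical setup: master is fine, web was renamed and lost its ContentCrawler.
indexes = {
    "master": ["ContentCrawler", "MediaItemCrawler", "ProductsCrawler"],
    "web": ["SiteCrawler", "MediaItemCrawler"],
}
print(check_crawler_names(indexes))
# ['web: no crawler named ContentCrawler']
```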

Extract, convert and adjust dates from a Coveo Cloud V2 Web source

Let’s say the pages in your Web source have a date/time value somewhere in the markup of the page. You’d like to be able to extract that string, convert it to a date type and then set one of your fields / mappings with that value, so you can use it in your search results, facets or in other components. This was my exact situation just two weeks ago, and through a lot of Python research and syntactical hoops, I was able to achieve the desired result.

Background

I had one Web source that had some interestingly formatted event dates, and our company didn’t want to burden the client with updating their date formatting on the site to match the formatting of other dates in other sources, so we had to take what we were given and find a way to convert it. Here is one example for one of the client’s sites:

Saturday 3 February 2018 9:00 am CST

At first glance this didn’t seem too difficult, until I learned how much of a pain date parsing in Python can be.


Phase 1: Extract

First, we had to scrape that date out of the page and into a raw / temporary field. To do this we used the Coveo Web Scraping Configuration, which is essentially a field on your Web source that lets you extract data, exclude certain parts of a page, and more. What you enter into this field must be valid JSON. In this case I also had to brush up on my XPath skills, since I would need to provide a path to the value I wanted to extract. My web scraping configuration looked like this:

[
  {
    "for": {
      "urls": [
        ".*"
      ]
    },
    "metadata": {
      "textpubdate": {
        "type": "XPATH",
        "path": "//div[contains(@id,\"formField_dateTime_event_start\")]/div[contains(@class,\"fieldResponse\")]/text()"
      }
    }
  }
]

What this means: I’m specifying that I would like to extract data from the page as metadata, into my temporary string field textpubdate. The XPath selector above looks for a div whose class contains fieldResponse, inside a div whose id contains formField_dateTime_event_start, and then extracts the inner text by calling text() at the end. After I rebuilt my source, it worked: the mapping was showing the string value.
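Escaping the double quotes inside the XPath is one of those syntactical hoops. One way to sidestep it, sketched here in Python with the standard json module, is to generate the configuration and let json.dumps handle the quoting. Switching the XPath string literals to single quotes is my own assumption, not what I originally used; XPath accepts either quote style, and it keeps the JSON free of escapes.

```python
import json

# Same XPath as the configuration above, but with single-quoted string
# literals so nothing needs escaping once it is embedded in JSON.
XPATH = ("//div[contains(@id,'formField_dateTime_event_start')]"
         "/div[contains(@class,'fieldResponse')]/text()")

config = [
    {
        "for": {"urls": [".*"]},
        "metadata": {
            "textpubdate": {"type": "XPATH", "path": XPATH},
        },
    }
]

# Paste the printed JSON into the Web source's web scraping configuration.
print(json.dumps(config, indent=2))
```

Generating the JSON this way also guarantees it parses, which rules out one source of silent scraping failures.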

Phase 2: Convert

Time to learn Python! The next step is to create an indexing pipeline extension, using Python as the programming language, which will handle the back-end work of converting the dates. To be specific, Coveo Cloud V2 extensions seem to be running Python 2.7.6 (at least from what I found), so solutions that rely on newer Python versions won’t work. Also, solutions I found on the internet did not always apply, as some formatting directives that work on Linux, for example, don’t work on Windows. Some examples:

  • If your extracted date string uses lower-case time parts such as ‘am’ and ‘pm’, in my testing those wouldn’t parse with strptime in the en_US locale. They are supported if you switch the locale to de_DE (German), but switching locales didn’t seem to be supported by the Coveo Python OS from what I could tell.
  • If your date uses a single-digit day, don’t reach for the %e directive: strptime doesn’t support it, and you will get an error in the Log Browser if you try using it. Plain %d works instead, since it accepts days with or without a leading zero when parsing (the final code below relies on this).

Thankfully, I found an official list of Windows-supported directives for use with the strptime function. Those should all work in a Coveo extension too.
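A quick way to confirm which directives do the job is to parse the date portion of the example string directly. This Python sketch assumes the time part has already been removed and that month and weekday names are matched in an English (C or en_US) locale, since strptime uses the current locale for %A and %B.

```python
from datetime import datetime

# Date portion of the example string from the page, time already removed.
raw = "Saturday 3 February 2018"

# %A = full weekday name, %d = day of month (a single digit is fine when
# parsing), %B = full month name, %Y = four-digit year.
parsed = datetime.strptime(raw, "%A %d %B %Y")
print(parsed)  # 2018-02-03 00:00:00
```

Note that the result is midnight with no time zone attached, which matters in Phase 3.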

So, my problem remained: certain time parts could not be converted because no working directive existed. The only option left? Get rid of it.


I wasn’t printing the time in my search results anyway, so I decided to write a regex that would find the time part of the date and remove it (parts of it were adapted from examples found online). That involved learning re.compile, re.search, re.sub and a bunch of other fun Python regex functions and gotchas, such as:

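  • Despite seeing it in the vast majority of articles, I ended up dropping the r (raw string) prefix in front of my regex string, and only then did my re.compile succeed in the extension. That surprised me, since a raw string is normally the safer choice: it passes sequences like \s to the regex engine untouched. Without the r, \s still works only because Python leaves unrecognized escape sequences as-is (newer Python 3 releases warn about this).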
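Here is a variation on the time-stripping pattern from the final code, sketched in Python so you can try it outside the extension. It uses re.IGNORECASE instead of listing am|pm|AM|PM, and literal spaces instead of \s, which avoids the raw-string question entirely. The literal spaces are an assumption that the page puts ordinary single spaces between the time, am/pm and time zone.

```python
import re

# Matches a trailing "H:MM am TZ" or "HH:MM PM TZ" chunk at the end of the string.
# No backslashes, so it behaves the same with or without an r prefix.
TIME_PART = re.compile("(?:[01]?[0-9]|2[0-3]):[0-5][0-9] (?:am|pm) [a-z]{3}$",
                       re.IGNORECASE)


def strip_time_part(value):
    """Remove the trailing time, am/pm and time zone from a date string."""
    return TIME_PART.sub("", value).rstrip()


print(strip_time_part("Saturday 3 February 2018 9:00 am CST"))
# Saturday 3 February 2018
```

Strings without a time part pass through unchanged, so the function is safe to call on every document.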

Phase 3: Adjust (if necessary)

If you rebuild your source now, you should have dates coming into your field (you can check on this in the Content Browser). However, you might notice they are the wrong day: they might be one day before the actual date on the page. This is because of the conversion you are doing. strptime returns midnight with no time zone attached, so once that value is treated as UTC and displayed in a time zone behind UTC (Central Time in my case), it most likely lands on the previous day. Using timedelta from the datetime module, you would need to add a certain number of hours so that the date matches UTC (see full code below).
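As an alternative to hard-coding the hour count, here is a Python 3 sketch using a timezone-aware datetime: attach the page’s offset to the parsed midnight and convert it to UTC. The fixed UTC-6 offset labelled CST is an assumption matching the example date; it ignores daylight saving time (CDT is UTC-5), and datetime.timezone does not exist in Python 2.7, so on a 2.7 runtime the timedelta approach in the final code is the practical option.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the page's times are US Central Standard Time (UTC-6).
# A fixed offset does not account for daylight saving time.
CST = timezone(timedelta(hours=-6), "CST")


def midnight_local_as_utc(naive_date, tz=CST):
    """Treat a naive datetime as local time in tz and return the same instant in UTC."""
    return naive_date.replace(tzinfo=tz).astimezone(timezone.utc)


result = midnight_local_as_utc(datetime(2018, 2, 3))
print(result)  # 2018-02-03 06:00:00+00:00
```

The UTC result is the same 6-hour shift the final code applies with timedelta(hours=6), but the offset is stated explicitly rather than implied.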

Completion

Final code used:

from datetime import datetime, timedelta
import re

try:
    # Get the raw Coveo field value (a list of strings)
    pubdate = document.get_meta_data_value('textpubdate')
    # Pattern for the trailing time, am/pm and time zone, e.g. "9:00 am CST"
    pattern = re.compile('([0-9]|0[0-9]|1[0-9]|2[0-3]):[0-5][0-9]\s(am|pm|AM|PM)\s[a-zA-Z]{3}$')
    if pubdate and pattern.search(pubdate[0]):
        log('Match succeeded', 'Debug')
        # Replace the found pattern with nothing
        newpubdate = pattern.sub('', pubdate[0])
        # Parse the remaining string into a datetime, stripping trailing spaces
        pdate = datetime.strptime(newpubdate.rstrip(), '%A %d %B %Y')
        # Add 6 hours to compensate for the time zone (CST is UTC-6)
        newDate = pdate + timedelta(hours=6)
        # Set the date field
        document.add_meta_data({'aoparesultdate': newDate})
except Exception as e:
    log(str(e), 'Error')

The timedelta class (from the datetime module) caused some hiccups; I had to use it exactly as written above or it didn’t work. One final comment: make good use of the log() function while doing these so you can see any errors or custom messages you enter in the Log Browser. Hope this helps someone, and let me know if you have any questions, comments or suggestions!
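Because document and log only exist inside Coveo’s extension runtime, one way to iterate faster is to run the same logic locally against a stand-in. The FakeDocument class and convert_pubdate function below are hypothetical helpers for local testing: FakeDocument only mimics the two document methods the final code calls, and it assumes get_meta_data_value returns a list of values, as the pubdate[0] indexing implies. The sketch also adds an else branch so unmatched values show up in the log.

```python
import re
from datetime import datetime, timedelta


class FakeDocument(object):
    """Hypothetical stand-in for Coveo's document object (local testing only)."""

    def __init__(self, metadata):
        self.metadata = metadata  # field name -> list of values
        self.added = {}

    def get_meta_data_value(self, name):
        return self.metadata.get(name, [])

    def add_meta_data(self, values):
        self.added.update(values)


messages = []


def log(message, severity='Debug'):
    """Local replacement for the extension's log(); records instead of sending."""
    messages.append((severity, message))


# Same pattern as the final code; '\\s' is the escaped form of \s.
PATTERN = re.compile('([0-9]|0[0-9]|1[0-9]|2[0-3]):[0-5][0-9]\\s(am|pm|AM|PM)\\s[a-zA-Z]{3}$')


def convert_pubdate(document, log):
    """The final extension's steps, with document and log passed in."""
    try:
        pubdate = document.get_meta_data_value('textpubdate')
        if pubdate and PATTERN.search(pubdate[0]):
            newpubdate = PATTERN.sub('', pubdate[0])
            pdate = datetime.strptime(newpubdate.rstrip(), '%A %d %B %Y')
            document.add_meta_data({'aoparesultdate': pdate + timedelta(hours=6)})
        else:
            log('No time part matched', 'Debug')
    except Exception as e:
        log(str(e), 'Error')


doc = FakeDocument({'textpubdate': ['Saturday 3 February 2018 9:00 am CST']})
convert_pubdate(doc, log)
print(doc.added)  # {'aoparesultdate': datetime.datetime(2018, 2, 3, 6, 0)}
```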