Separating Episerver Find Indexing Job from Commerce Catalog Content – Part Two

August 7, 2019
by Ryan Duffing

C2's Senior Full-Stack Developer, Ryan Duffing, shares how to create a new scheduled job that indexes only Commerce catalog content, separately from the Episerver Find Indexing Job.


In my last blog post I covered how to prevent the default Episerver Find Indexing Job from indexing catalog content. With that change in place, the job indexes only CMS content. The reason for doing this is that it’s sometimes beneficial to have two indexing jobs: one for CMS content and another for Commerce catalog content.

There are two challenges to solve in order to run two separate scheduled jobs for indexing content across your Episerver application:

  1. Prevent the Episerver Find Indexing Job from indexing catalog content so that it only works against CMS content. This was covered in my last blog post.
  2. Create a new scheduled job that only indexes catalog content.

This blog post shows how to create a new scheduled job for indexing Commerce catalog content. If you want to see how to prevent the Episerver Find Indexing Job from picking up catalog content for processing, see Part One of this series.

For the sake of simplicity, we’re going to put all the code for indexing catalog content into the scheduled job. You can break this out further into other classes, but that’s another topic for discussion.

In the new scheduled job, the following steps need to be completed:

  1. Retrieve all catalogs in Episerver Commerce.
  2. Loop through each catalog to retrieve all of its descendants.
  3. For the catalog being indexed, grab all of the languages that pertain to it. It is important to make sure you index all languages for the catalog.
  4. For each language in the catalog, retrieve the descendants of the catalog that have been translated into that language.
  5. After the translated descendants have been retrieved for the catalog, index them against Episerver Find.
  6. Finally, delete from the index all catalog content that was indexed before the Commerce Indexing Job started.

At a bare minimum, the scheduled job should look like this:

Code: https://gist.github.com/thec2group-blog/bce5c9d8452b9a57f66bdcd375e1cc6b.js

I’m not going to go into detail on how scheduled jobs are structured or managed in Episerver; if you want to read more about them, see Episerver’s documentation on scheduled jobs. The only customization in the code above is a method we use at C2 called UpdateStatusMessage().

In the past we noticed that updating the status too frequently could cause exceptions, so we throttle how often it gets updated. I’m not sure whether this is still an issue. Regardless, in my opinion updating the message at a slower pace is easier on human eyes: a status message that changes too frequently makes for a poor user experience when monitoring a manually started job.
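The throttling idea can be sketched in a language-neutral way. This is a minimal Python sketch (the actual job is C#), not C2’s UpdateStatusMessage() implementation: `ThrottledStatusReporter`, `sink`, and `min_interval` are hypothetical names, and `sink` stands in for whatever actually publishes the job’s status message.

```python
import time
from typing import Callable, Optional


class ThrottledStatusReporter:
    """Forwards status messages to `sink` at most once per `min_interval` seconds.

    Hypothetical helper for illustration; `sink` plays the role of the scheduled
    job's status-reporting call.
    """

    def __init__(self, sink: Callable[[str], None], min_interval: float = 2.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._sink = sink
        self._min_interval = min_interval
        self._clock = clock  # injectable so the throttle can be tested without waiting
        self._last_sent: Optional[float] = None

    def update(self, message: str, force: bool = False) -> bool:
        """Send `message` if enough time has passed (or `force` is set); return whether it was sent."""
        now = self._clock()
        if (not force and self._last_sent is not None
                and now - self._last_sent < self._min_interval):
            return False  # too soon since the last update; drop this message
        self._sink(message)
        self._last_sent = now
        return True
```

Passing `force=True` for the final message ensures the completion status is never swallowed by the throttle.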

I’m going to go through all the individual pieces required to make this new indexing job work (references to services in each piece are brought in using dependency injection at the job level), and then at the end we’ll put it all together.

Step 1: Retrieve all catalogs in Episerver Commerce

Here’s the code snippet needed to retrieve all catalogs in your application:

Code: https://gist.github.com/thec2group-blog/572c458095fbb4e0bf733658e79aa038.js

Here we tell Episerver to retrieve all children of the Commerce root (that is, the catalogs), auto-detecting the language of each catalog it retrieves and falling back to the master language when necessary.
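To make “falling back to the master language” concrete, here is a simplified Python model of the lookup. It is an illustration under stated assumptions, not how Episerver resolves languages internally: each item is modeled as a hypothetical dict mapping language codes to content.

```python
from typing import Dict, Optional


def load_with_master_fallback(translations: Dict[str, str], preferred: Optional[str],
                              master: str) -> Optional[str]:
    """Return the item's version in `preferred`, falling back to `master`.

    `translations` is a hypothetical model of an item's language branches.
    Returns None if neither the preferred nor the master version exists.
    """
    if preferred is not None and preferred in translations:
        return translations[preferred]
    return translations.get(master)  # master-language fallback
```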

Step 2: Loop through each catalog to retrieve all descendants

At this point we’re not concerned with language-specific items; we just need the catalogs so we can loop through them for indexing.

Below we’re looping through the catalogs and retrieving their descendants:

Code: https://gist.github.com/thec2group-blog/fa00a38c8678532d5a0893749f93b68d.js

We use IContentLoader’s GetDescendents() method here. Because it returns only the items beneath the catalog, it’s important to also add the catalog’s own content reference to the list so that the catalog itself is indexed along with all of its descendants.
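Why the catalog link has to be added by hand can be shown with a small Python sketch over a hypothetical parent-to-children map (a stand-in for the content tree, not Episerver’s API): a walk over descendants never yields the starting node, so the catalog is prepended explicitly.

```python
from collections import deque
from typing import Dict, List


def catalog_and_descendants(catalog: str, children: Dict[str, List[str]]) -> List[str]:
    """Return the catalog link followed by every descendant beneath it.

    `children` is a hypothetical parent -> children map. The breadth-first walk
    collects only nodes *under* the catalog, so the catalog is added at the front.
    """
    descendants: List[str] = []
    queue = deque(children.get(catalog, []))
    while queue:
        node = queue.popleft()
        descendants.append(node)
        queue.extend(children.get(node, []))
    return [catalog] + descendants
```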

Step 3: Grab all languages for the indexed catalog

The next step after retrieving the catalog’s descendants is to grab all languages relevant to the catalog. It is important to make sure every language the catalog supports gets indexed.

We’re going to make a new method to call from within the above loop.

Code: https://gist.github.com/thec2group-blog/26db13d80920cfea08f240730b9bb14f.js

The method extracts all of the languages from the catalog. If, for some odd reason, the catalog doesn’t contain the default language, the method adds it manually before returning the collection of languages for further processing.
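The same logic, reduced to plain Python for illustration. The parameter names are hypothetical stand-ins for the catalog’s language properties, not Episerver’s API.

```python
from typing import List


def get_catalog_languages(existing_languages: List[str], default_language: str) -> List[str]:
    """Return the catalog's languages, guaranteeing the default language is included.

    Copies the input so the caller's list is never mutated.
    """
    languages = list(existing_languages)
    if default_language not in languages:
        languages.append(default_language)  # catalog is missing its default language
    return languages
```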

After creating this new method your loop through the catalogs should look like this:

Code: https://gist.github.com/thec2group-blog/afbb3d690694c0dc0f7ea1b3490f9fe5.js

Steps 4 and 5: Grab all items translated in the catalog’s languages and index them

Steps four and five, grabbing all items translated into the languages enabled for the catalog and then indexing them, are done together. Because some catalogs are large, I prefer to index in batches of 100. That way, if indexing fails partway through the job, at least some of the items have already been indexed. I also like to process the batches in parallel, and sending the Episerver Find client smaller batches rather than the entire set of items puts less load on your server.

We’re going to need three core pieces to accomplish both steps four and five. One piece will be for going through the languages of the catalog and handling the parallel processing of item batches. Another piece will be for grabbing the translated commerce content item batches. The final piece will consist of sending the translated item batches to Episerver Find for indexing.

The code below will reference two custom extension methods: IsEmpty() and ChunkBy(). I’ll include the code for both these methods at the very end of this blog post.

Handling parallel processing of item batches:

Code: https://gist.github.com/thec2group-blog/c746921e1b701c585042caef20fcd9a5.js

In the example above, we loop through each language on the catalog, split the catalog descendants into batches of 100, and process those batches concurrently with a multi-threaded loop. For each batch, the loop retrieves the content translated into the current language using the RetrieveLanguageContentItems() method and, if it finds any translated content, sends it to the IndexWithRetry() method for indexing.
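The shape of that loop can be sketched in Python, with `concurrent.futures.ThreadPoolExecutor` playing the role of the multi-threaded loop. `retrieve` and `index` are hypothetical callables standing in for RetrieveLanguageContentItems() and IndexWithRetry(); this is a language-neutral illustration, not the C# code from the gist.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Sequence


def index_catalog_languages(
    languages: Iterable[str],
    links: Sequence[str],
    retrieve: Callable[[List[str], str], List[str]],
    index: Callable[[List[str]], None],
    batch_size: int = 100,
    max_workers: int = 4,
) -> int:
    """Index each language's translated items in parallel batches.

    Returns the number of batches that contained translated content and were
    sent for indexing. `retrieve` and `index` are hypothetical stand-ins.
    """
    # Split the catalog descendants into consecutive batches of at most batch_size.
    batches = [list(links[i:i + batch_size]) for i in range(0, len(links), batch_size)]
    indexed_batches = 0
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for language in languages:  # languages are handled one after another

            def process(batch: List[str], lang: str = language) -> bool:
                items = retrieve(batch, lang)
                if not items:  # nothing in this batch is translated into lang
                    return False
                index(items)
                return True

            # Batches for the current language run concurrently; sum() waits for
            # all of them (and re-raises any exception) before the next language.
            indexed_batches += sum(pool.map(process, batches))
    return indexed_batches
```

In this sketch `max_workers` caps concurrency, which is one way to keep the load on the indexing service in check.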

Code: https://gist.github.com/thec2group-blog/5734619b1fac1229c30affa93475aa84.js

The main takeaway from the two methods above is IndexWithRetry(). Sometimes Episerver Find will not index all of the items you send to it. When that happens, we want to try indexing the items again. We don’t want to keep retrying forever, or too quickly (I’ve had exceptions occur when hitting the indexing service too frequently), so a while loop limits the number of indexing attempts, and the thread sleeps between attempts.
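A minimal Python sketch of the retry loop described above. It assumes a hypothetical `index` callable that returns the items that failed to index; that contract, the three-attempt limit, and the one-second delay are illustrative choices, not values from the original code.

```python
import time
from typing import Callable, List, Sequence


def index_with_retry(items: Sequence[str], index: Callable[[List[str]], List[str]],
                     max_attempts: int = 3, delay_seconds: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Try to index `items`, re-sending only failures, up to `max_attempts` times.

    Returns whatever is still unindexed after the last attempt (empty on success).
    """
    remaining = list(items)
    attempt = 0
    while remaining and attempt < max_attempts:
        if attempt > 0:
            sleep(delay_seconds)  # back off so the indexing service isn't hammered
        remaining = index(remaining)
        attempt += 1
    return remaining
```

This variant re-sends only the failed items; re-sending the whole batch also works if your indexing call doesn’t report which items failed. Injecting `sleep` keeps the function testable without real delays.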

Step 6: Delete all catalog content from the index that was indexed before the Commerce Indexing Job started

Hopefully after all of this you’re still with me. We’re at the final step for the indexing job which is deleting old items from the index.

An important point that isn’t covered in detail in this post is that we do not want to delete items from the index if an exception occurred anywhere else in the job during normal indexing. We take a cautious approach in order to preserve data in case something goes wrong: in my professional opinion, it’s better to have extra data than to accidentally remove data that shouldn’t have been removed.

One of the things we did in the original Execute() method of the scheduled job is keep track of when the job was started. This is very important when removing items from the index. Any items in the index that were processed with a timestamp prior to when the job started need to be removed as they’re now stale.
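The staleness rule itself fits in a few lines of Python. The map of entry ids to indexing timestamps is a hypothetical stand-in for whatever timestamp the index stores; in the job, the actual removal is done by DeleteRemovedContentFromIndex().

```python
from datetime import datetime
from typing import Dict, List


def find_stale_entries(indexed_at: Dict[str, datetime], job_started: datetime) -> List[str]:
    """Return ids of index entries last written before the current job started.

    Anything re-indexed by this run carries a newer timestamp, so entries older
    than `job_started` were not touched by the run and are considered stale.
    """
    return sorted(entry for entry, ts in indexed_at.items() if ts < job_started)
```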

Code: https://gist.github.com/thec2group-blog/f658c1a3e50593abd17053fb96fde951.js

As a reminder, I did not go into detail on how to track errors before calling the DeleteRemovedContentFromIndex() method, as that wasn’t the purpose of this post. Using all of these pieces, we can finally put together the scheduled job that indexes only Commerce catalog content.

<p> CODE: https://gist.github.com/thec2group-blog/171b6a4de7a16d850909763f0244437f.js</p>

And as promised here are the two extension methods ChunkBy() and IsEmpty().

Code: https://gist.github.com/thec2group-blog/4ecd8e259d0b49c508f15f2378dc755f.js
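For readers who want the idea without the C#, here is a Python sketch of what the two helpers do. These are illustrative equivalents, not ports of the gist: `chunk_by` yields consecutive lists of at most `size` items, and `is_empty` treating `None` as empty is an assumption about the C# helper’s behavior.

```python
from itertools import islice
from typing import Collection, Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def chunk_by(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items; the last list may be shorter."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:  # source exhausted
            return
        yield chunk


def is_empty(items: Optional[Collection[T]]) -> bool:
    """Return True when `items` is None or contains no elements."""
    return items is None or len(items) == 0
```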