Optimizing Data Extraction from XDB in Sitecore: A Comprehensive Guide to Efficient Report Generation

Introduction: 

In this article, we will delve into the process of extracting data from XDB using a comprehensive code snippet to achieve this task effectively. The core functionality of the presented code is to extract data from XDB and download it on the server within the App Folder. It is essential to note that the methodology can be customized to enable data extraction via various means such as through an email link, as a zip file, or other preferred formats

When dealing with vast amounts of data, specifically in the realm of millions of records, it is crucial to anticipate limitations that may arise when accessing this data via API calls directly from a browser. Due to the sheer volume of data being extracted, it is common to encounter issues like timeouts or other related errors during direct API requests. To mitigate such challenges and ensure a seamless data retrieval process, it is advisable to implement a backend service application capable of handling extensive data extraction requirements efficiently

Given the spontaneous nature of the data extraction request, a direct API setup was swiftly put in place to facilitate the extraction and subsequent downloading of the data. While this approach sufficed for the immediate need, a more structured and robust solution, such as a dedicated backend service, would be ideal for handling similar large-scale data extraction tasks more effectively in the future

Code:

using System;
using System.Collections.Generic;
using System.Web;
using Newtonsoft.Json;
using System.Text;
using System.IO;
using SC = Sitecore;
using Sitecore.XConnect.Client;
using Sitecore.XConnect.Client.Configuration;
using Sitecore.XConnect;
using Sitecore.XConnect.Collection.Model;
using System.Linq;
using System.Threading.Tasks;

namespace YourProjectName.Sitecore.Foundation.Analytics.Services
{
    public class DownloadService
    {       
        public async Task DownloadUsageReport(DateTime startDate, DateTime endDate, string filetype = "csv")
        {
            string headers = SC.Globalization.Translate.Text("Session ID") + "," +
                   SC.Globalization.Translate.Text("User") + "," +
                   SC.Globalization.Translate.Text("XXX Persona") + "," +
                   SC.Globalization.Translate.Text("XXX Product") + "," +
                   SC.Globalization.Translate.Text("XXX Group") + "," +
                   SC.Globalization.Translate.Text("Visits - Portal") + "," +
                   SC.Globalization.Translate.Text("Page Views - Portal") + "," +
                   SC.Globalization.Translate.Text("Avg Page Views per Visit - Portal") + "," +
                   SC.Globalization.Translate.Text("Duration - Portal") + "," +
                   SC.Globalization.Translate.Text("Avg Duration - Portal") + "," +
                   SC.Globalization.Translate.Text("Visits - Mobile") + "," +
                   SC.Globalization.Translate.Text("Page Views - Mobile") + "," +
                   SC.Globalization.Translate.Text("Avg Page Views per Visit - Mobile") + "," +
                   SC.Globalization.Translate.Text("Duration - Mobile") + "," +
                   SC.Globalization.Translate.Text("Avg Duration - Mobile") + "," +
                   SC.Globalization.Translate.Text("% - Portal") + "," +
                   SC.Globalization.Translate.Text("% - Mobile") + "," +
                   SC.Globalization.Translate.Text("Events List") + "," +
                   SC.Globalization.Translate.Text("Device Type") + "," +
                   SC.Globalization.Translate.Text("Session Platform") + "," +
                   SC.Globalization.Translate.Text("Browser") + "," +
                   SC.Globalization.Translate.Text("Operating System") + "," +
                   SC.Globalization.Translate.Text("Plan Type") + "," +
                   SC.Globalization.Translate.Text("Page URL") + "," + "\r\n"; 

            StringBuilder csv = new StringBuilder();
            int fileCounter = 0;
            int totalCount = 0;
            string folder = @"C:\home\site\wwwroot\App_Data\Exports\";
            startDate = startDate.ToUniversalTime();
            endDate = endDate.ToUniversalTime();

            SC.Diagnostics.Log.Info("Starting Analytics Extract startDate After parsing : " + startDate.ToString() + " - endDate After parsing : " + endDate.ToString(), this);

            using (XConnectClient client = SitecoreXConnectClientConfiguration.GetClient())
            {
                try
                {
                    string type = string.Empty;
                    string product = string.Empty;
                    string groupName = string.Empty;
                    string currentDeviceType = string.Empty;
                    string currentDeviceVendor = string.Empty;
                    string sessionPlatform = string.Empty;
                    string currentBrowser = "null";
                    string operatingSystem = "null";
                    string planType = "null";
                    string domainname = "https://DOMAIN_NAME";
                    string pageUrl = "null";

                    var enumerator = client.Interactions.Where(i => i.StartDateTime >= startDate && i.EndDateTime <= endDate)
                                           .WithExpandOptions(new InteractionExpandOptions(IpInfo.DefaultFacetKey, UserAgentInfo.DefaultFacetKey
                                                                                         , WebVisit.DefaultFacetKey)
                    {
                        Contact = new RelatedContactExpandOptions(PersonalInformation.DefaultFacetKey)
                    }).GetBatchEnumeratorSync(10);

                    SC.Diagnostics.Log.Info("Analytics Extract enumerator count : " + enumerator.TotalCount.ToString(), this);

                    int count = 1;
                    while (enumerator.MoveNext())
                    {
                        if (count <= 40000)
                        {
                            foreach (var interaction in enumerator.Current)
                            {
                                try
                                {
                                    var contact = interaction?.Contact;
                                    var xcontact = await client.GetContactAsync(contact, new ExpandOptions(PersonalInformation.DefaultFacetKey));
                                   var portalId = xcontact?.Identifiers?.Select(x => x.Identifier.ToLower().Replace("XXXXX\\", ""));

                                    if (xcontact?.Identifiers?.Count == 0)
                                    {
                                        SC.Diagnostics.Log.Info("Analytics Extract xcontact.Identifiers.Count is ZERO.", this);
                                        continue;
                                    }

                                    if (xcontact?.Personal() != null)
                                    {
                                        type = xcontact.Personal()?.JobTitle;
                                        product = xcontact.Personal()?.Gender;
                                        var product2 = xcontact.Personal()?.Suffix;
                                        if (!string.IsNullOrEmpty(product2) && product2 != "N/A")
                                        {
                                            product = product2;
                                        }
                                        groupName = xcontact.Personal()?.Title;
                                        if (totalCount == 40001 && count == 1)
                                            SC.Diagnostics.Log.Info("Analytics Extract End of xcontact?.Personal().", this);

                                        planType = xcontact.Personal()?.Title;
                                    }

                                    // Device Type, Device Vendor//
                                    UserAgentInfo userAgwentInfo = interaction?.UserAgentInfo();
                                    currentDeviceType = userAgwentInfo?.DeviceType;
                                    currentDeviceVendor = userAgwentInfo?.DeviceVendor;
                                    // Device Type, Device Vendor//

                                    // Session Platform  //
                                    if (!string.IsNullOrEmpty(currentDeviceType) && currentDeviceType.Equals("Computer"))
                                        sessionPlatform = "Web";
                                    else if (!string.IsNullOrEmpty(currentDeviceVendor) && currentDeviceVendor.Equals("Apple"))
                                        sessionPlatform = "iOS";
                                    else if(!string.IsNullOrEmpty(currentDeviceType) && !string.IsNullOrEmpty(currentDeviceVendor))
                                        sessionPlatform = "Android";
                                    // Session Platform  //

                                    // Browser //
                                    WebVisit webVisit = interaction?.GetFacet(WebVisit.DefaultFacetKey);
                                    currentBrowser = webVisit?.Browser?.BrowserMajorName;
                                    // Browser //

                                    // Operating System //
                                    operatingSystem = webVisit?.OperatingSystem?.Name;
                                    // Operating System //

                                    //Page Url//
                                    pageUrl = interaction?.Events.OfType()?.FirstOrDefault()?.Url;
                                    //Page Url//

                                    csv.Append(interaction.Id.ToString() + "," + string.Join(" | ", portalId) + "," + type + "," + product + "," + groupName.Replace(",", " ") + ",");

                                    double counterPageViews = 0;
                                    double counterVisits = 0;
                                    double counterDuration = 0;
                                    double counterPageViewsMA = 0;
                                    double counterVisitsMA = 0;
                                    double counterDurationMA = 0;
                                    string pageStats = "";

                                    var pageViewEvents = interaction.Events.OfType();
                                    var numPVs = pageViewEvents.Count();
                                    var ppageViewEvents = pageViewEvents.Where(e => e.SitecoreRenderingDevice.Name != "JSON");
                                    var mpageViewEvents = pageViewEvents.Where(e => e.SitecoreRenderingDevice.Name == "JSON");
                                    var numMobilePVs = mpageViewEvents.Count();
                                    var numPortalPVs = numPVs - numMobilePVs;

                                    //Page URL//
                                    //pageUrl = pageViewEvents.FirstOrDefault()?.Data;
                                    //Page URL//

                                    counterPageViews += numPortalPVs;
                                    counterVisits += numPortalPVs > numMobilePVs ? 1 : 0;
                                    counterDuration += ppageViewEvents.Sum(x => x.Duration.TotalSeconds);
                                    counterPageViewsMA += numMobilePVs;
                                    counterVisitsMA += numMobilePVs > numPortalPVs ? 1 : 0;
                                    counterDurationMA += mpageViewEvents.Sum(x => x.Duration.TotalSeconds);

                                    foreach (var ev in pageViewEvents)
                                    {
                                        if (!ev.Url.Contains("XXXXSSO"))
                                        {
                                            var deviceType = "Portal";
                                            if (ev.Url.Contains("json"))
                                                deviceType = "Mobile";

                                            pageStats += ev.Timestamp + ": " + ev.Url + " : " + deviceType + " || ";
                                        }
                                    }

                                    var avePVs = (counterVisits != 0) ? (counterPageViews / counterVisits) : 0;
                                    var avePVsMA = (counterVisitsMA != 0) ? (counterPageViewsMA / counterVisitsMA) : 0;
                                    var aveDur = (counterVisits != 0) ? (counterDuration / counterVisits) : 0;
                                    var aveDurMA = (counterVisitsMA != 0) ? (counterDurationMA / counterVisitsMA) : 0;
                                    double percentWP = 0;
                                    double percentMA = 0;
                                    if (counterVisits > 0 && counterVisitsMA > 0)
                                    {
                                        percentWP = counterVisits / (counterVisits + counterVisitsMA) * 100;
                                        percentMA = counterVisitsMA / (counterVisits + counterVisitsMA) * 100;
                                    }
                                    else if (counterVisitsMA == 0)
                                    {
                                        percentWP = 100;
                                        percentMA = 0;
                                    }
                                    else if (counterVisits == 0)
                                    {
                                        percentWP = 0;
                                        percentMA = 100;
                                    }

                                    csv.Append(counterVisits + "," + counterPageViews + "," + avePVs + "," + 
                                               counterDuration + "," + aveDur + "," + counterVisitsMA + "," + 
                                               counterPageViewsMA + "," + avePVsMA + "," + counterDurationMA + "," + 
                                               aveDurMA + "," + percentWP + "," + percentMA + "," + 
                                               pageStats.Replace(",", " %5C%2C") + "," + currentDeviceType + "," + 
                                               sessionPlatform + "," + currentBrowser + "," + operatingSystem + "," +
                                               planType.Replace(",", " ") + "," +
                                               domainname+pageUrl + "\r\n");
                                    count++;
                                }
                                catch (Exception ex)
                                {
                                    csv.Clear();
                                    SC.Diagnostics.Log.Error("Analytics Extract Error Occured if For loop : " + ex.ToString(), this);
                                }
                            }
                        }
                        else
                        {
                            totalCount += count;
                            SC.Diagnostics.Log.Info("Analytics Extract Count reached 40000 totalCount till now is :  " + totalCount.ToString(), this);
                            count = WriteIntoFile(filetype, headers, csv, ref fileCounter, folder, startDate, endDate);
                            SC.Diagnostics.Log.Info("Writing Analytics File Completed the cvount variable is now : " + count, this);
                        }
                    }
                    if (csv.Length > 0)
                    {
                        totalCount += count;
                        SC.Diagnostics.Log.Info("Analytics Extract Count is greater than 0 on last Iteration. Thus File Created :  " + totalCount.ToString(), this);
                        count = WriteIntoFile(filetype, headers, csv, ref fileCounter, folder, startDate, endDate);
                        SC.Diagnostics.Log.Info("Writing Analytics File Completed the cvount variable is now : " + count, this);
                    }
                }
                catch (XdbExecutionException ex)
                {
                    SC.Diagnostics.Log.Error("Analytics Extract Error Occured : " + ex.ToString(), this);
                }
            }
            SC.Diagnostics.Log.Info("Starting Analytics Extract END", this);
            totalCount = 0;
            fileCounter = 0;
            return;
        }

        private int WriteIntoFile(string filetype, string headers, StringBuilder csv, ref int fileCounter, string folder, DateTime startdatetime, DateTime enddatetime)
        {
            int count;
            fileCounter++;
            string fileName = string.Format("XXX Analytics Report - {0} - {1} - {2}.{3}", startdatetime.ToString("MM-dd-yyyy"), enddatetime.ToString("MM-dd-yyyy"), fileCounter, filetype);
            fileName = Path.Combine(folder, fileName);
            try
            {
                if (!Directory.Exists(folder))
                    Directory.CreateDirectory(folder);
                SC.Diagnostics.Log.Info("Analytics Extract Writing into FIle started .", this);
                csv.Insert(0, headers);
                var bytes = Encoding.GetEncoding("iso-8859-1").GetBytes(csv.ToString());
                File.WriteAllBytes(fileName, bytes);
                SC.Diagnostics.Log.Info("Writing Analytics File: " + fileName, this);
            }
            catch (Exception ex)
            {
                SC.Diagnostics.Log.Info("Analytics Extract Writing into FIle Exception oddured .", this);
                SC.Diagnostics.Log.Error(ex.ToString(), this);
            }
            csv.Clear();
            count = 1;
            return count;
        }
    }
}

Detailed Description of Code:

The provided code is a detailed implementation of a DownloadService class in a C# project within the Sitecore environment. Here is a breakdown of the key features and functionalities of the code:

1 - Namespace and Dependencies:The code includes relevant namespaces like System, Newtonsoft.Json, and Sitecore.XConnect to utilize essential functionalities for data extraction and processing.

2 - Class Structure:The DownloadService class contains a method named DownloadUsageReport that is responsible for extracting usage reports based on specified start and end dates, with an optional file type parameter (defaulted to CSV).

3 - Data Initialization and Preparation: The method initializes a string headers that contains translated text for various categories. It sets up a StringBuilder for constructing the CSV file and initializes counters and necessary variables for file handling.

4 - Logging and Data Retrieval: The code logs information about the start of the extraction process. It leverages the XConnectClient from Sitecore to interact with xDB for data retrieval. Data is fetched based on specified date ranges using LINQ queries and batch processing for efficient handling of large datasets.

5 - Data Processing: The extracted data is processed iteratively, extracting relevant information from interactions and contacts in the xDB. Various data points like session details, device information, browser statistics, and page URLs are collected and formatted for the report.

6 - File Writing: The data is written to a CSV file with the necessary headers and formatting. The method WriteIntoFile handles the file writing process, including file creation, encoding, and error handling. 

7 - Exception Handling: The code includes exception handling to manage errors that may occur during data extraction or file writing processes.

8 - Logging and Closure: Logging is used throughout the process to track progress and identify any issues. Finally, the method logs the end of the extraction process and resets variables before returning.

Overall, the code efficiently extracts data from xDB, processes it, and generates a CSV report based on the specified criteria. It showcases best practices for handling large datasets and performing file operations within a Sitecore environment.

Summary:

In this comprehensive article, we have explored the intricacies of extracting data from XDB within the Sitecore environment using a well-crafted code snippet. The code presented in the DownloadService class efficiently handles the extraction of vast amounts of data, processing it methodically to generate detailed usage reports based on specified date ranges.

By emphasizing the significance of anticipating and overcoming limitations when dealing with extensive data retrieval tasks, the article underscores the importance of implementing robust backend service applications for seamless operation. While the adhoc API setup served the immediate data extraction request, a more structured solution like a dedicated backend service is recommended for enhanced efficiency in handling large-scale data extraction requirements in the future.

The detailed breakdown of the code highlights key functionalities such as data initialization, logging, data processing, file writing, exception handling, and logging closure. This meticulous approach not only ensures the effective extraction and processing of data from XDB but also sets a standard for handling substantial datasets and conducting file operations within a Sitecore ecosystem.

By following the principles and practices outlined in this article, developers can streamline the extraction process, optimize data processing workflows, and generate insightful reports from XDB data with precision and efficiency.

Comments

Popular posts from this blog

Rebuild XDB indexing Issues

Upgrading from Sitecore 9.3 to 10.3: A Practical Guide

Sitecore Bulk Uploading Data Between Identical Databases