Managing Personally Identifiable Information (PII) in Support Tickets with Eunomia

By Tommaso Tassi, Vincenzo Pecorella and Pietro Valfrè

December 26, 2024

Announcing Eunomia Release

Today we are releasing Eunomia, an Open Source Data Governance framework for LLM-based applications.

Eunomia is designed to harness the power of AI to tackle critical data governance challenges. As these challenges have intensified with the recent explosion of LLMs, implementing a framework that applies data governance at the token level is now more crucial than ever.

Eunomia leverages a combination of multiple ML models and small language models to implement robust data governance. These models include both open-source solutions and internally developed modules tailored to meet specific customer needs.

While each module can operate independently, the full potential of Eunomia is unlocked when all components are combined and modularly customized. This design not only enhances the system's accuracy but also ensures a cost-effective alternative to relying on large, general-purpose LLMs with custom prompts.

Managing Personally Identifiable Information (PII) in Support Tickets

In today’s post, we’ll explore practical, real-world applications of Eunomia, focusing specifically on Personally Identifiable Information (PII). We’ll discuss how Eunomia’s built-in tools for PII management enable new use cases.

What You’ll Discover Today:

  1. Leveraging Ticketing Data in RAG Systems
    How Eunomia enables ticketing data to be safely used as a knowledge source in Retrieval-Augmented Generation (RAG) systems.

  2. Secure Retrieval of User Information in Chatbots with ID-Based Access Control
    How Eunomia ensures chatbots retrieve user-specific information only when explicitly requested, without exposing data from other users.

Leveraging Ticketing Data in RAG Systems

Imagine you run an IT service desk. Two problems keep eating into your team’s time.

Duplicate Efforts on Problem Resolution
Team members often spend hours troubleshooting an issue, trying multiple approaches, only to realize that a colleague solved the same problem a week earlier, after spending hours on it as well.

Time-Consuming Repetitive Tickets
Around 60% of your time is wasted on repetitive tickets because end users struggle to perform basic operations on their own. Maintaining a shared repository with all solutions doesn’t help, because end users often cannot identify the right solution in a long list. Every IT issue tends to be slightly different from the last, which adds to the confusion.

The Promise of LLMs: A Beacon of Hope

The advent of Large Language Models (LLMs) seemed to offer a game-changing solution. You envisioned building a chatbot that leverages Retrieval-Augmented Generation (RAG) to search through previously resolved tickets. This would help both your team members and end users:

  • Team Members could quickly find solutions without duplicating efforts.
  • End Users could resolve their issues independently, reducing ticket volume.

Excited by this potential, you started building the system—until reality hit hard.

The Nightmare of PII Exposure

The tickets in your system contain a wide range of Personally Identifiable Information (PII): names, roles, passwords, and other sensitive data. Suddenly, your brilliant chatbot idea turns into a data governance nightmare. How can you ensure that sensitive information is never retrieved, avoiding catastrophic data leaks within your company?

And no, the answer is not duplicating the dataset, manually preprocessing it to remove all PII from the tickets, and then feeding that new, sanitized dataset to the LLM as its knowledge base. That would be squandering the potential of LLMs just to build a slightly more powerful chatbot reminiscent of 2010.

Enter Eunomia

To address this challenge, we developed Eunomia. Eunomia’s advanced PII management instruments ensure that sensitive data—such as names, emails, and phone numbers—are effectively safeguarded.

Eunomia can operate at different stages of a data pipeline, which lets it integrate with a variety of architectures. For this particular use case, two processing options are available:

  • In-one-go processing: Sensitive data is identified and removed in advance, and a static, sanitized version of each ticket is stored directly in the database. This option fits cases where both the data source and the set of information to be masked remain static.
  • On-demand processing: When the RAG system retrieves a ticket, Eunomia cleans it dynamically, before the generated response is delivered to the user.

This dual approach ensures that PII is never exposed, allowing you to maintain robust data governance while harnessing the full power of LLM-based systems.
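To make the difference between the two options concrete, here is a minimal sketch. It does not use Eunomia's API: the sanitize function is a hypothetical stand-in that masks only email addresses with a regular expression, and the in-memory dictionaries stand in for a real ticket database.

```python
import re

# Hypothetical stand-in for the PII-cleaning step (masks email addresses only).
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def sanitize(text: str) -> str:
    return EMAIL.sub("<EMAIL_ADDRESS>", text)


# Raw tickets as they exist in the source system (illustrative data).
raw_tickets = {
    "IT-2024-0001": "Contact john.doe@example.com about the VPN issue.",
}

# In-one-go processing: sanitize once at ingestion and store the static result.
sanitized_store = {tid: sanitize(body) for tid, body in raw_tickets.items()}


def retrieve_static(ticket_id: str) -> str:
    """Return the pre-sanitized copy; no cleaning happens at query time."""
    return sanitized_store[ticket_id]


def retrieve_on_demand(ticket_id: str) -> str:
    """On-demand processing: read the raw ticket and clean it at retrieval time."""
    return sanitize(raw_tickets[ticket_id])


print(retrieve_static("IT-2024-0001"))
print(retrieve_on_demand("IT-2024-0001"))
```

Both calls return the same masked text here. The trade-off is that the static copy must be rebuilt whenever the tickets or the masking rules change, while the on-demand path always applies the current rules at the cost of extra work per retrieval.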

How Eunomia Works

Let’s explore an example to showcase the true potential of Eunomia.

Imagine a resolved support ticket that contains PII about the user. As mentioned earlier, feeding this ticket directly into a RAG system as part of a knowledge base would potentially expose sensitive information to other users. This is where Eunomia comes into play.

Before the ticket enters the RAG system, Eunomia processes it to remove or mask all sensitive information.

Original Ticket

Ticket ID: #IT-2024-0001
Created Date: 2024-12-20
Status: Resolved
Priority: Medium

User Information:
Name: John Doe
Email: john.doe@example.com
Phone Number: +1 (555) 123-4567

Issue Description:
The user cannot connect to the company's VPN. They receive the error: "VPN Connection Failed. Unable to establish a connection."

Step-by-Step Resolution:
Identified the Problem: Verified the error message, checked VPN credentials, and software version.
Troubleshooting Steps Taken: Ensured the user’s internet connection was stable and verified the VPN client software was up to date.
Solution Applied: Cleared the VPN configuration cache, reinstalled the VPN client, provided the correct VPN server address, ensured credentials were valid, rebooted the system, and tested the connection.
Outcome: VPN successfully connected without errors. Tested connection stability for 10 minutes to confirm issue resolution.

Resolution Notes:
The issue was due to a corrupted VPN configuration file. Reinstalling the VPN client and resetting the configuration resolved the problem. The user confirmed the VPN connection is now stable and working.

Ticket processed by Eunomia

Ticket ID: #IT-2024-0001
Created Date: 2024-12-20
Status: Resolved
Priority: Medium

User Information:
Name: <PERSON>
Email: <EMAIL_ADDRESS>
Phone Number: +1 <PHONE_NUMBER>

Issue Description:
The user cannot connect to the company's VPN. They receive the error: "VPN Connection Failed. Unable to establish a connection."

Step-by-Step Resolution:
Identified the Problem: Verified the error message, checked VPN credentials, and software version.
Troubleshooting Steps Taken: Ensured the user’s internet connection was stable and verified the VPN client software was up to date.
Solution Applied: Cleared the VPN configuration cache, reinstalled the VPN client, provided the correct VPN server address, ensured credentials were valid, rebooted the system, and tested the connection.
Outcome: VPN successfully connected without errors. Tested connection stability for 10 minutes to confirm issue resolution.

Resolution Notes:
The issue was due to a corrupted VPN configuration file. Reinstalling the VPN client and resetting the configuration resolved the problem. The user confirmed the VPN connection is now stable and working.

As shown in the cleaned example above, each detected sensitive detail has been replaced with a placeholder that names its PII type, written as <PII_TYPE> (for example, <PERSON> or <EMAIL_ADDRESS>). The ticket can now be used in the RAG system without exposing the user's name, email address, or phone number.
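The substitution step itself can be illustrated with a few lines of Python. This is only a sketch under strong assumptions: Eunomia relies on ML models for detection, whereas the code below uses two hand-written regular expressions, and the names PATTERNS and mask_pii are made up for this example. Person names are left out because they cannot be detected reliably with a regular expression.

```python
import re

# Hypothetical regex detectors standing in for model-based PII detection.
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE_NUMBER": re.compile(r"\(\d{3}\)\s?\d{3}-\d{4}"),
}


def mask_pii(text: str) -> str:
    """Replace every detected span with a <PII_TYPE> placeholder."""
    for pii_type, pattern in PATTERNS.items():
        text = pattern.sub(f"<{pii_type}>", text)
    return text


ticket = "Email: john.doe@example.com\nPhone Number: +1 (555) 123-4567"
masked = mask_pii(ticket)
print(masked)
# Email: <EMAIL_ADDRESS>
# Phone Number: +1 <PHONE_NUMBER>
```

The non-sensitive fields of the ticket pass through unchanged, which is what keeps the resolution steps useful as a knowledge source.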

Customization Options

It’s important to note that for this example, we showcased only a specific set of PII, and the masking process replaced sensitive data with the <PII_TYPE> string. In practice, Eunomia offers:

  • Flexible PII Detection: A customizable set of PII types to detect.
  • Configurable Masking Options: You can replace PII with tokens like <PII_TYPE>, anonymized placeholders, or other representations based on your specific needs.
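As a rough illustration of these two knobs, the sketch below lets the caller choose which PII types to detect and how each match is rewritten. Everything in it is an assumption made for the example rather than Eunomia's actual configuration interface: the DETECTORS table, the mask function, and the two strategies, including the hashed placeholder, which is just one possible reading of "anonymized placeholders".

```python
import hashlib
import re

# Hypothetical detectors; a real deployment would use model-based detection.
DETECTORS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE_NUMBER": re.compile(r"\(\d{3}\)\s?\d{3}-\d{4}"),
}


def type_token(pii_type: str, value: str) -> str:
    """Masking strategy 1: replace the value with its type, e.g. <EMAIL_ADDRESS>."""
    return f"<{pii_type}>"


def hashed_placeholder(pii_type: str, value: str) -> str:
    """Masking strategy 2: a stable placeholder, so equal values stay linkable."""
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()[:8]
    return f"<{pii_type}:{digest}>"


def mask(text, pii_types, strategy=type_token):
    """Mask only the requested PII types, using the chosen strategy."""
    for pii_type in pii_types:
        pattern = DETECTORS[pii_type]
        text = pattern.sub(lambda m, t=pii_type: strategy(t, m.group(0)), text)
    return text


line = "Email: john.doe@example.com, Phone Number: +1 (555) 123-4567"
only_email = mask(line, ["EMAIL_ADDRESS"])
pseudonymized = mask(line, ["EMAIL_ADDRESS"], strategy=hashed_placeholder)
print(only_email)
print(pseudonymized)
```

With only EMAIL_ADDRESS selected, the phone number is left untouched. The hashed variant maps the same email address to the same placeholder every time, which can help when tickets from one user need to stay linkable without revealing who that user is.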

By applying Eunomia, you ensure robust data protection while enabling your RAG system to deliver accurate, safe, and useful responses.

Secure Retrieval of User Information in Chatbots

Congrats! The chatbot you developed for the IT service desk has become a huge success in your company. Everyone is thrilled and suggesting new features. The most requested one is the ability to ask more specific questions about your own tickets.

While it makes perfect sense to clean PII from other users’ tickets, this doesn’t apply to your own tickets—after all, you wrote them.

At the same time, there’s sensitive information, such as details about the agent who resolved the ticket, that should never be displayed, regardless of the requester.

How can we ensure that PII in tickets you created remains accessible to you, while PII in other tickets stays securely cleaned?

ID-Based Access Control: Tailoring Data Access for Personalized Use Cases

To solve this, we developed the ID-Based Access Control Instrument, which gives precise control over which information is accessible based on the requester's identity.

The ID-Based Access Control Instrument for PII operates through the following components:

  • ID-Based Mechanism: Every user or entity must have a unique ID.
  • Document Ownership: Each document must include the ID of its owner. For example, in our use case, every ticket must store the ID of the user who created it.
  • Global PII Masking: A defined set of PII that is always masked, regardless of who views the document, including its owner.
  • Owner-Specific PII Access: A set of PII that is visible only when the document's owner views it but remains masked for all other users.

The remaining process is handled by Eunomia. Using the previous example, the behavior remains the same unless the requester shares the same ID as the user who opened the ticket. In that case, the owner-specific PII (here the name, email address, and phone number) is left unmasked, so the ticket is displayed as it appears in the source, except for any PII in the always-masked set.
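The four components listed above can be sketched in a few lines. This is an illustration under stated assumptions, not Eunomia's implementation: the Ticket class, the render function, the regex detectors, and the AGT-0007 agent identifier format are all hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical detectors; the agent ID format is invented for this example.
DETECTORS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "AGENT_ID": re.compile(r"\bAGT-\d{4}\b"),
}

ALWAYS_MASKED = {"AGENT_ID"}       # global PII masking: hidden from everyone
OWNER_VISIBLE = {"EMAIL_ADDRESS"}  # owner-specific PII: visible to the owner only


@dataclass
class Ticket:
    owner_id: str  # document ownership: ID of the user who created the ticket
    body: str


def render(ticket: Ticket, requester_id: str) -> str:
    """Return the ticket text with masking chosen by the requester's identity."""
    to_mask = set(ALWAYS_MASKED)
    if requester_id != ticket.owner_id:
        # Non-owners also lose access to the owner-specific PII.
        to_mask |= OWNER_VISIBLE
    text = ticket.body
    for pii_type in sorted(to_mask):
        text = DETECTORS[pii_type].sub(f"<{pii_type}>", text)
    return text


ticket = Ticket(
    owner_id="u-42",
    body="Opened by john.doe@example.com, resolved by AGT-0007.",
)
print(render(ticket, "u-42"))  # owner: email visible, agent masked
print(render(ticket, "u-99"))  # other user: both masked
```

The only input that changes the result is the comparison between the requester's ID and the ticket's owner ID, which is why both IDs have to be unique and stored reliably.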

We hope this post has sparked your interest in Eunomia and how it can transform your data workflows. Feel free to explore its capabilities and see how it can fit into your projects.

If you're a company navigating the challenges of data governance for LLMs, don’t hesitate to reach out to us at info@whataboutyou.ai — we’d be thrilled to help you unlock new possibilities!

Interested in contributing to the project? You can find the documentation and guidelines on how to get started here.

We’d love to have you on board!