On this third episode of Ropes & Gray’s Insights Lab’s four-part Multidimensional Data Reversion podcast series, Shannon Capone Kirk and David Yanofsky discuss the crucial steps in the iterative cycle of data analysis, visualization, and insight. They delve into the complexity of using free text in data analysis, explaining how unstructured text differs from structured data and the challenges it presents. The conversation also covers methods to analyze free text, including the use of rubrics, sentiment analysis, and self-scoring. They highlight the significance of understanding emotions and psychological safety when collecting and analyzing data. Additionally, they address the current limitations and potential of generative AI in data analysis, providing a realistic view of its capabilities and costs.
Transcript:
David Yanofsky: Hello, and welcome to Multidimensional Data Reversion, a show where we are digging into where data analysis intersects with the law. I’m David Yanofsky, director of data insights, analytics and visualization in the R&G Insights Lab.
Shannon Capone Kirk: And I’m Shannon Capone Kirk, managing principal and global head of Ropes & Gray’s advanced e-discovery and AI strategy group. David, the last time we were together, we ended on talking about the visualization of data, about building a garden and tending the garden, but we probably should take a couple steps back and talk about the ways to analyze data—so that step before you ultimately visualize the data.
David Yanofsky: I always like talking about visualizing data first because it is such an important part of the analysis. But in terms of how people think about what you do—you have data, you analyze it. That’s the way I think about it, too, except that visualizing data is both an integral part of analyzing it and a way of communicating it.
Shannon Capone Kirk: It makes a lot of sense in any educational or creative realm. If I’m educating a client, associates or even law students, or I’m creating something—whether I’m redesigning the decor of a room or I’m writing a novel, whatever it is—I want to have visualized what is the ultimate end result that I want to convey, and that’s usually, for me, a visual thing. And so, that’s how I think of it, and it makes sense that that’s what we would have hit first. Does that convey to you as well?
David Yanofsky: Yes. I think it’s interesting that you brought up decor and designing a room because the difference between having a blueprint in terms of you’re renovating a kitchen or you’re designing a room or you’re designing a whole building or whatever it is, there’s nothing really exploratory from the blueprint—everything has already been determined. You’ve picked your finishes. You’ve picked your appliances. You’ve picked the layout. And so, at that point, you are visualizing what you expect to have at the end. As much as we like to think that we can have this perfect one-to-one preconceived notion of what we want—to design it, put it down on paper, and then build it—the analysis process is a bit more iterative, where we have an idea of the direction we want to head in, and we do an analysis. We get some results—they might be insightful results; they might not be. They might hint at some finding, but not get there fully. And so, you have to do an additional analysis to get clarity. Instead of a process of, “Here is a blueprint,” and “Here is your rendering,” and “Now, here is your final result,” that you get in architectural situations, to me, it’s more like trying to clear the fog. You look out your window and you have an idea of what’s out there—whether a city or a town or whatever view you have—but it’s a foggy day, and you try and look closely at certain aspects of it. Maybe you go out into the world, and you go through that fog—and as you get closer to things, you see things. Sometimes, the fog clears—you get to a point where there is no fog, and you have a perfectly clear view, and sometimes you end up in some place a little foggier. And so, going into a data analysis knowing that you’re going on a journey rather than that you’re trying to achieve a result is a really important mindset to have.
Shannon Capone Kirk: In terms of the visualization we talked about last time, whatever our analysis is going to tell us about reality and facts and metrics—whatever that is—we’re going to go through that process. At the end of this, we’re going to visualize it and we’re going to have a house or we’re going to have a completed room, and that’s what we’re aiming to get at, but it very well could be that the specific nature of that house or that room will change and accommodate what that actual analysis is telling us. Is that a fair way to build this analogy?
David Yanofsky: Yes. There’s a very well-respected data journalist by the name of Amanda Cox, and she talks about how you come up with this stuff. How do you create these great data visualizations, these great charts? How do you communicate this stuff so well? She says, “I just make hundreds of charts and find the ones that are working.” It’s not about having the preconceived notion about what is going to work and what isn’t—it is about trying things and trying analysis in different ways to be able to figure out what is working and what isn’t, what is communicating well, what isn’t communicating well, and, of course, having the skills to identify that feature, and then moving forward.
Shannon Capone Kirk: Let’s talk about ways to analyze data. Sometimes, when we want to help a team or a client, and we want to tell them, “We want to do data analysis with big data,” I worry that some folks assume that means it’s all numbers and metrics and codes, and it’s very automated. But as we talked about last time, there are humans that need to be involved in the process to build the rules and to identify the scope. There is an interaction between whatever data analysis, algorithm, etc., we might be using, and a big part of that is qualitative information that often comes by way of gobs of free text. Why don’t we break it down into basic fundamentals? Can you first explain what we mean when we say, “free text”? Can we just talk about that a little bit more and how free text plays into databases, data coding, data scoring, and then, ultimately, into visualization?
David Yanofsky: Free text is written words that have no predetermined structure, and so, it’s words, it’s sentences, it’s paragraphs. It’s typically from multiple sources, though it might be from a single source. A document is full of free text—the alternative being structured text, where you ask someone a question and give them five answers to choose from. Those answers were text: “Does this make you feel good, neutral, or bad?” Every answer that we get from someone who responds to that is going to be good, neutral, or bad, as opposed to, “Please write a paragraph about how you’re feeling today”—the answer that you get in that situation is free text. To understand how someone’s feeling through that paragraph, you need to take different tactics that are more complicated and more varied than trying to assess good, neutral, or bad. In terms of how you deal with free text, what’s on the tip of everyone’s tongue, the front of everyone’s head right now is generative AI. I want to put that aside, because there are easier-to-understand, more straightforward ways to extract insight from free text—ways that can give you confidence that what you’re getting is true, whereas with generative AI you have to accept uncertainty about whether or not the answer you’re getting is true.
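To make that contrast concrete, here is a minimal sketch in Python with made-up survey responses (none of this comes from a real engagement): the structured question tabulates directly, while the free-text question leaves you with paragraphs that still need some further step—rubric scoring, sentiment analysis, or self-scoring—before you can count or compare anything.

```python
from collections import Counter

# Structured question: "Does this make you feel good, neutral, or bad?"
# Every response falls into one of three known categories, so it tabulates directly.
structured_responses = ["good", "neutral", "good", "bad", "good"]
print(Counter(structured_responses))  # -> good: 3, neutral: 1, bad: 1

# Free-text question: "Please write a paragraph about how you're feeling today."
# No predetermined structure, so there is nothing to count yet; some analysis
# step has to turn these paragraphs into comparable scores or categories first.
free_text_responses = [
    "Honestly it's been a stressful week, but my team has been supportive.",
    "Feeling fine. Nothing much to report.",
]
```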
Shannon Capone Kirk: We’re going to set GenAI to the side—we’re talking about humans evaluating free text. If a human is reading different, unstructured, free text answers across, say, a survey of employees, there’s an element of subjectivity to the evaluator. There are more straightforward ways to evaluate and use those free text answers—easier than using GenAI. But first of all, how is it easier—what are those ways? And how do we account for subjectivity?
David Yanofsky: If we’re using third-party humans to score someone’s response, we better have a good rubric. We better have a table or a document that qualifies what we’re trying to score for: how happy or sad does the person seem to be in this? How risky or not does the behavior expressed in this paragraph seem? We need to have defined features—just like if you were writing an essay in college, your professor has given a rubric to the TAs to score your response. I went to art school and took a lot of art history classes. After the exams, some professors would share the rubric and say, “This is how the TAs were supposed to score your essay. Did you mention this, that, or the other thing about this painting?” Similarly, we can make rubrics to assess whatever we need, to give consistency to that scoring. That’s third-party scoring. There is also sentiment analysis: look at the words that are being used against a predetermined categorization of those words—whether each individual word leans more one way or another—and now we can assess the sentiment of that text. So, more consistency in how it’s scored (probably), but less human. Some words can have multiple meanings and context matters, and humans are really good at understanding that context, while the typical sentiment analysis algorithms are not.
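As a rough illustration of that word-categorization approach, here is a minimal sketch of lexicon-based sentiment scoring. The tiny lexicon and the example sentences are invented for illustration; real sentiment tools use far larger lexicons or trained models, and the context problem described above is exactly what a word-by-word score like this cannot handle (it would treat “not great” the same as “great”).

```python
# Minimal lexicon-based sentiment scoring: each word carries a predetermined
# score, and a text's sentiment is the average score of the words it contains.
# This lexicon is made up for illustration; real lexicons hold thousands of words.
LEXICON = {
    "great": 1.0, "supportive": 0.8, "fine": 0.2,
    "concern": -0.5, "stressful": -0.8, "unethical": -1.0,
}

def sentiment_score(text: str) -> float:
    words = [w.strip(".,!?").lower() for w in text.split()]
    scored = [LEXICON[w] for w in words if w in LEXICON]
    return sum(scored) / len(scored) if scored else 0.0

print(sentiment_score("My manager has been great and supportive."))      # positive
print(sentiment_score("It was a stressful situation of real concern."))  # negative
```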
Shannon Capone Kirk: It’s very similar to when we’re doing a large-document-volume investigation. If we’re looking for fraud, internal fraud, or things of that nature, we also use sentiment analysis—not as the primary or sole way to evaluate the words within the documents, but as an aid, a boost, or a way to triage things. So, that speaks a lot to my world, which is investigations and data review. And it’s somewhat of the same thing—both of them are evaluating unstructured data. One is usually emails and communications. Here, we’re talking about unstructured free text answers, and the tools are good for manipulating that unstructured data. And again, I use “manipulate” not in a negative way, but in a way that is an active exchange between the human and the data with some technology boost or aid.
David Yanofsky: I want to talk about my favorite way to assess sentiment. Say you have an employee feedback survey, and you’re trying to understand if there are any risks that are not coming up in your speak-up line or are not being reported to managers or anything of that nature. You’re asking for people to tell you information about your company in free text—ask them how they feel about it. Say, “Tell me a story of a time where you encountered an ethical dilemma at work.” You have someone write a paragraph, and now you can ask them, “Do you see this as a severe ethical dilemma? Did this dilemma cause you great concern? Mild concern?” You can ask questions about the thing that you were just told, and that is my favorite way, because all of the subjectivity from a human reviewer or an algorithmic reviewer goes away—there’s no bias there. The only bias left is the storyteller’s own—the exact bias that already exists in the free text that you’re getting. If you are surveying people, if you’re talking to people, if you’re interviewing people, being able to capture explicitly from the person you’re talking to, whose responses you’re soliciting, how they feel about the thing that you want to know about is like rocket fuel for analyzing the sentiment of free text. To be fair, this is not always available. In your world, Shannon, when you have gobs of emails, documents, or other communications that you need to assess, this doesn’t apply.
Shannon Capone Kirk: That makes so much sense, and it’s important. In the days when we would evaluate, for example, the credibility of a witness or the level of confidence that they were giving us in interview answers, you’d be in a room with them, you’d be reading body language, and after so many years and so many times of doing it, you generate for yourself a skill in evaluating a witness—good, bad, wherever they are on the spectrum of credibility. And when you’re sitting in a room with them, you also have the opportunity to ask follow-up questions. You note in an interview when they pause a little too long, so you pause, and then you say, “Actually, I noticed that you paused. Can you tell me why and how you’re feeling about that last answer?” That’s the rocket fuel in real time in old-school interviews. And to me, what I’m hearing you say is that now we want to try to translate that when we’re doing surveys of employees for whatever analysis, so that we can build some visualization, some data analysis on it and take action on a macro level. Is that what you’re saying?
David Yanofsky: Absolutely. The other thing that allows you to do is flip the typical way in which people expect this analysis to be presented to them. We’ve all seen those PowerPoint slides with a little quote here, where this person says this, and a little quote there, where that person says that. You can instead say, “Here is a plot of all of our stories and how they fit on some axes”: on one axis, how important this story is to the organization, and on the other axis, how severe an emotion it made the storyteller feel. Now we can say, perhaps even in an interactive way, “Let’s look at the stories that are important and emotional. As a group, let’s highlight them on the plot and have all of those stories show up.” Or, “Let’s look at the ones that are unimportant and emotional.” Any combination of those two factors—using the scoring as the starting point and the story as where you end up, to be able to show the full universe and explore the true distribution of all of these stories—is a really powerful thing to be able to do.
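A minimal sketch of that kind of plot might look like the following, where each story carries the storyteller’s own two self-scores collected alongside the free text. The stories and scores here are illustrative placeholders, not real survey data, and an interactive version would let a group click into a quadrant and read the underlying stories.

```python
import matplotlib.pyplot as plt

# Each story pairs the free text with the storyteller's own self-scores.
# All stories and scores below are illustrative placeholders.
stories = [
    {"text": "Vendor gift made me uneasy",           "importance": 4, "emotion": 5},
    {"text": "Expense policy is confusing",          "importance": 2, "emotion": 1},
    {"text": "Pressure to hit quarter-end numbers",  "importance": 5, "emotion": 4},
    {"text": "Team lunch seating disputes",          "importance": 1, "emotion": 3},
]

x = [s["importance"] for s in stories]
y = [s["emotion"] for s in stories]

fig, ax = plt.subplots()
ax.scatter(x, y)
for s, xi, yi in zip(stories, x, y):
    ax.annotate(s["text"], (xi, yi), fontsize=8)  # label each point with its story
ax.set_xlabel("Importance to the organization (self-scored)")
ax.set_ylabel("Emotional severity (self-scored)")
ax.set_title("Every story plotted, not just a few quoted")
plt.show()
```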
Shannon Capone Kirk: I’m so glad that you mentioned emotions and trying to gauge, evaluate, and capture emotions and sentiment. I know this is all about technology, big data, and GenAI (and we’re going to talk about that), but we’re still talking about humans. I keep coming back to this. We are talking about organizations of humans—behavior, emotions, and psychology are a big part of what you and the Lab do, isn’t that right?
David Yanofsky: Absolutely. That is part of the reason why we like this strategy of directly asking people how they’re feeling about what they’re saying, because we don’t want to get it wrong. We’re trying to give people as much psychological safety as they can have talking about issues at their company or the culture of their company. They might love certain aspects of the culture and still have things that they are otherwise uncomfortable speaking about, and so, giving someone the comfort of saying, “This is a thing that I care a lot about, but I don’t think it’s as important to the company,” matters—some people are unwilling to talk about it in a venue where they’re not able to give that context: “This is a thing that’s really specific to me, so I don’t want to talk about it in a town hall. I don’t want to talk about it in a forum that may have someone think that this is a broad-based problem.” And so, being able to understand the behavior of people and how they give their responses is very important to how we go about our work. Shannon, when clients come to you with all of this information that needs to be analyzed, what do they say to you?
Shannon Capone Kirk: Our clients come to us with varying levels of expertise on data analysis and, therefore, come to me with either, “Please just fix this,” or, “However you need to do it, just do it.” Then there are others who are actually pretty sophisticated in my area of practice, which is e-discovery, and they say, “Here’s how we like to do it. Here are the tools we use. Here are the vendors we outsource to. Now, you take this case and use those tools the way we typically like. If you feel that you have ways to improve that, let us know.”
David Yanofsky: The thing that I’ve experienced is that people come to me and say, “Tell me about my compliance program. We need you to investigate some misconduct that we think is happening. Isn’t there just a tool that can do this for us?” And I always go back to them with, “What questions do you actually have?” Maybe you have one overarching question, but what are the component questions of what you’re getting at? What are the answers that you need? What do your stakeholders need to know? That’s a hugely clarifying thing for people trying either to do a one-off data analysis or to create a monitoring system of multiple data analyses—how to structure it—because without that, you might come up with some numbers to describe a behavior, but you’re not making any judgment about whether that number is good or bad or whether that number is too high or too low. You’re just coming up with metrics, and metrics without questions that they help answer are basically meaningless—and if not meaningless, then not helpful to the goals of the business. Does that happen in e-discovery—do people come to you without questions?
Shannon Capone Kirk: Yes. It comes up in a way that I think is adjacent to what you were saying, because, certainly—and more and more so lately—we have legal departments whose budgets are very tight saying, “Just throw tech at it because it’ll be cheaper.” That’s one thing that happens. What happens at the end of the year, around the holidays, is that almost always there is some kind of tactical maneuver by an opponent or some deadline that gets kicked up that really crunches us on time to meet a deadline. And when we’re talking about data review and getting through millions of documents, the inclination of a client who is in a sensitive situation—and, therefore, it’s tense and stressful—is sometimes to say, “Just throw a bunch of bodies at it and get it over with. Get it done.” Let’s break down both of those things and why they matter when we’re talking about data analysis and the use of technology.
Where we are right now, when a client says, “Just use GenAI,” I then have to counsel that client, “Actually, it will cost you more and it will be less precise, less accurate right now if we only use GenAI.” This is for the most part, for most cases. Therefore, that will end up also costing you more, because I guess the other side is not going to be happy about it. How are we going to justify what may be inaccurate answers? Then, you have to show them the math and compare for them. We’ve done a few studies, David, where we’ve said, “This is how much a review of one million documents will cost you if we use predictive coding,” i.e., machine learning, which is tried and true and established in litigation now. “It will cost you X, it will take you X weeks, and we project we need X contract attorneys to do that and X firm associates doing QC.” That’s our benchmark—that’s what we compare against. “If we try to just use GenAI, here are some really important benefits of that that we’ve found,” but it’s not the whole story, because you have to take into account, in addition to precision, the time it takes to revisit and refine prompts. Also, when we run analysis on the precision of those responses, we’re not quite there yet to be able to rely on GenAI alone. And for all of that, it will cost more money, because now we are layering in yet another tool that requires more refinement and more human involvement, but we still have to use predictive coding or the other methodologies. We promise you, clients, that we are working to get to a world where we increase the pie, if you will, of the use of GenAI in our arsenal of tools to get through all of this voluminous data. So, that’s one really important state of affairs that is reality, and I say that as a cautionary tale, because I have heard the various software vendors out there make promises that may or may not be accurate—or may not be what the client is hearing them say. People are hearing there is a way, there is a tool, to dramatically cut legal costs—and that is, in part, true, but we’re not yet at a state of affairs where that’s the golden button to save you millions of dollars every year.
The other important thing, and this comes up for me quite a bit, is when you have a deadline and you have mountains of data you have to evaluate and analyze—a holdover from the olden days when we would put armies of attorneys and contract attorneys on a document review—if you’re not careful, if you don’t calibrate the number of people you have doing document review correctly, you wind up with the reverse: it winds up taking longer. If you have more people doing a first-level document review, you’re creating more inconsistencies, because you have more humans—therefore, you have more things to fix and QC. And no matter what, you have a bottleneck at the QC level, because you can only have so many QC reviewers. In order to have a consistent, quality product for this massive data analysis you’re doing, you can’t just throw a bunch of human bodies at it. That concern about adding bodies and getting the calibration right—that’s where I do think GenAI is going to help us the most over the next year.
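To see why adding first-level reviewers stops helping at some point, here is a back-of-the-envelope sketch with hypothetical throughput numbers—the rates, sample fraction, and headcounts are invented for illustration, not drawn from any real matter—but the shape of the result is the point: once first-level output exceeds what QC can absorb, the QC stage sets the timeline.

```python
# Hypothetical throughput model for a large document review (all numbers illustrative).
DOCS = 1_000_000            # documents to review
FIRST_LEVEL_RATE = 50       # docs per first-level reviewer per hour (hypothetical)
QC_RATE = 40                # docs per QC reviewer per hour (hypothetical)
QC_REVIEWERS = 10           # QC capacity is usually fixed and small
QC_SAMPLE = 0.20            # fraction of first-level decisions re-checked in QC

def weeks_to_finish(first_level_reviewers: int, hours_per_week: int = 40) -> float:
    first_level_hours = DOCS / (FIRST_LEVEL_RATE * first_level_reviewers)
    qc_hours = (DOCS * QC_SAMPLE) / (QC_RATE * QC_REVIEWERS)
    # With both stages running in parallel, the slower stage sets the pace.
    return max(first_level_hours, qc_hours) / hours_per_week

for n in (25, 50, 100, 200):
    print(n, "first-level reviewers ->", round(weeks_to_finish(n), 1), "weeks")
# Output plateaus once first-level output outpaces QC: adding reviewers past that
# point no longer shortens the timeline, and more reviewers also means more
# inconsistency for the QC stage to catch and fix.
```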
David Yanofsky: That’s going to be it for Multidimensional Data Reversion for today. On our next episode, we will be talking about AI, so be sure to subscribe wherever you get your podcasts. I’m David Yanofsky.
Shannon Capone Kirk: And I’m Shannon Capone Kirk. Thank you for listening.