Finding Common Language In Whiskey Reviews Through Machine Learning

Read a whiskey review published anywhere online, including ours, and you will notice an awful lot of unique descriptors used for tasting notes. Smoke, coal dust, fine leather, dark berry fruit and coffee grounds are typical of the sometimes exotic word choices.

Making sense of these descriptors is at the heart of debates among discriminating whiskey connoisseurs. But even for the not-so-discriminating, all of these words can make it confusing to weigh the flavor and value of a bourbon that costs $130 a bottle when a similar $55 substitute would do.

To this end, a research team at Virginia Tech, made up of Department of Food Science and Technology researchers Jacob Lahne and Leah Hamilton and University Libraries data consultants Chreston Miller and Michael Stamper, aims to “create a tool that finds a common language in a data set of 6,500 published whiskey reviews of about 50 to 100 words each.”

Tasting through a group of whiskies like this results in various review descriptions. (image copyright The Whiskey Wash)

The team, according to the university, “is applying Natural Language Processing (NLP), a subfield of linguistics, computer science, information engineering, and artificial intelligence that involves programming computers to process and analyze large amounts of natural language data — whiskey descriptors.

“This data science technique offers researchers opportunities to analyze more data than what was possible through the traditional time-intensive and expensive manual text analysis process. According to the project team, there have been no previous attempts to apply this sort of NLP approach for sensory-evaluation purposes.”

“We don’t know anyone else who has tried to take these reviews, which are in descriptive but messy natural language, and systematically analyze them this way. One of the nice things about whiskey is its enthusiast market,” said Lahne in a prepared statement. “People care about taste deeply. Whiskey lives or dies by sensory perception. These reviews are in metaphorical, messy, natural language. What we’re trying to get to is some shared concept about taste.”

Hamilton said they may even be able to make connections among the descriptors used, the production process, and the geographical origin of the liquor. 

“This tool will analyze free-response comments and identify which words are describing flavor and separate them from what’s not descriptive,” he noted. “It will also identify which words are related and describe the same flavor. This will ultimately be helpful to consumers who may want to buy something that’s close to a high-dollar whiskey but is more affordable.”
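As a rough illustration of the two steps described there (picking out flavor words, then finding which ones tend to travel together), here is a minimal Python sketch. It is not the team's tool: it swaps deep learning for a simple word list and co-occurrence counting, and the `FLAVOR_WORDS` lexicon, the `REVIEWS` samples, and both function names are hypothetical, invented for this example rather than drawn from the 6,500-review data set.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical mini-lexicon of flavor words (an assumption for this sketch);
# the real project aims to learn which words are descriptive from the data.
FLAVOR_WORDS = {"smoke", "leather", "coffee", "vanilla", "caramel", "peat", "berry"}

# Invented example reviews, not taken from the actual data set.
REVIEWS = [
    "Heavy smoke and peat on the nose, with fine leather behind it.",
    "Sweet vanilla and caramel; a great bottle for the price.",
    "Peat and smoke up front, then coffee grounds on the finish.",
    "Caramel, vanilla and a hint of dark berry fruit.",
]


def extract_descriptors(review):
    """Return the set of lexicon words that appear in one review."""
    tokens = re.findall(r"[a-z]+", review.lower())
    return {t for t in tokens if t in FLAVOR_WORDS}


def cooccurrence(reviews):
    """Count how often each pair of descriptors appears in the same review."""
    pairs = Counter()
    for review in reviews:
        # Sort so each pair is always stored in the same (alphabetical) order.
        found = sorted(extract_descriptors(review))
        pairs.update(combinations(found, 2))
    return pairs


if __name__ == "__main__":
    # Print the descriptor pairs that most often share a review.
    for pair, count in cooccurrence(REVIEWS).most_common(3):
        print(pair, count)
```

Pairs that keep turning up in the same reviews are only a crude hint that two words may describe related flavors; a learned model would be expected to handle synonyms, metaphors, and multi-word phrases that a fixed word list misses.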

“There is value in a tool with deep learning, a subset of machine learning,” added Miller. “Deep learning is a machine learning technique that uses the technique of Deep Neural Networks, based on how neurons in the brain function, to automatically learn features of the data which then aids in identification. By training the tool, we are able to comb through more information and make sense of it more quickly and efficiently than a human. If we throw enough data at it, the peculiarities are diluted. This is a booming area of research and one that is very exciting.”

Once the team has defined its common language, the university noted, “they will pass the data to Stamper, information visualization and interaction designer, to create the user stories, flows, and interfaces that audiences will use to interact with, and draw insight and meaning from the data.”

“We will define our target audiences and build an interface to communicate the data. We can use visualizations to see how we can dig deeper into the information,” said Stamper. “The data is so rich that the visualization types that we’ll be able to incorporate can include networks, geospatial, and temporal – it’s just figuring out what will work best for making the information in the data meaningful to those who are interested in seeing and interacting with it.”

Once the year-long project is complete, the team looks forward to future research that could build upon “this novel approach they have begun.”

“At some point, we may get to a place where we describe flavors like we do colors; it would be standardized,” said Hamilton. “This is a great step in that direction.”