How to do Entity Extraction with Google’s Natural Language API in Google Sheets (Apps Script)

Swamped by text data and need to pull out its key elements quickly? You’re not alone. In almost any organic strategy project, from content audits, keyword research, and competitor analysis to more user-centric work like analysing feedback or comments for product ideation, entity analysis can save you a lot of time and surface useful insights. With the help of an entity extraction machine learning API, you can automatically identify and extract entities (such as people, places, organizations, and more) from free-form text in seconds.

Today, I’ll show you how to use the entity analysis module of Google’s Natural Language API in Google Sheets on any text data, from something as short as a title or meta description, through product descriptions, to full articles. The tutorial is suitable for complete beginners, including people with no coding experience and people who have never done entity analysis before, as it requires no prior technical skills.

About the method: How does entity analysis work?

Entity extraction is a core Natural Language Processing (NLP) task that automatically identifies and classifies key elements within text data. These elements, called named entities, fall into predefined categories like people, organizations, locations, and dates.

Entity extraction is typically framed as a supervised machine learning problem, which means the model is trained on labeled examples. For example, to train a model to find names of people in news articles, the training data would include articles where the names are already highlighted. By analyzing these examples, the model learns patterns that let it identify similar entities in new, unseen text. The same approach is then extended to other entity types and scaled up with more training examples.

To build a custom entity extraction model, you can use various approaches, from rule-based methods like regular expressions to deep learning techniques using neural networks. Word embeddings, which capture the contextual meaning of words, can also be employed to improve the model’s accuracy in pinpointing the correct entities.
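
To make the rule-based end of that spectrum concrete, here is a minimal sketch of entity extraction with regular expressions. The two patterns and the function name extractWithRules are illustrative assumptions, not part of the Natural Language API, and a trained model would cover far more cases than hand-written rules like these.

```javascript
// Hypothetical rule-based extractor: hand-written patterns, no training involved.
const RULES = [
  // Dates such as "25 February 2024" (English month names only).
  {
    type: 'DATE',
    pattern: /\b\d{1,2} (?:January|February|March|April|May|June|July|August|September|October|November|December) \d{4}\b/g,
  },
  // Dollar prices such as "$49.99" or "$50".
  { type: 'PRICE', pattern: /\$\d+(?:\.\d{2})?/g },
];

function extractWithRules(text) {
  const found = [];
  for (const rule of RULES) {
    // matchAll requires the global flag and yields one match object per hit.
    for (const match of text.matchAll(rule.pattern)) {
      found.push({ type: rule.type, text: match[0], offset: match.index });
    }
  }
  // Return entities in the order they appear in the text.
  return found.sort((a, b) => a.offset - b.offset);
}
```

Rules like these are precise but brittle: they miss anything the patterns did not anticipate, which is the gap that trained models are meant to close.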

Entity extraction models are pivotal for sifting through vast amounts of text data, enabling structured data extraction, enhancing content discoverability, and unlocking valuable insights from text by identifying key entities.

About the model: Google Cloud’s Natural Language API

The Natural Language API draws on extensive knowledge about language structure, grammar, sentiment, and real-world entities. It’s trained on massive amounts of text data, allowing it to:

  • Identify key syntax elements and structures: It recognises nouns, verbs, phrases, and relationships between words, just like we do when we read, and can create more complex parsing structures like syntax trees.
  • Understand context and entities: It goes beyond individual words, considering the surrounding text and even real-world knowledge to grasp the meaning of text via entity analysis.
  • Detect sentiment: It can sense emotions like joy, anger, or sadness expressed in the text, not only at a document level (the entire text) but also at the entity level (sentiment associated with a specific entity mentioned in the text).
  • Classify text: It is pre-trained to identify whether the text you analyze aligns with any of 1,300+ categories.

How it works

Google Cloud’s Natural Language API offers an Entity Analysis module that analyzes text and extracts the entities mentioned in it. The analyzeEntities method identifies and classifies named entities within the provided text. These entities can be people, organizations, locations, dates, events, works of art, and more. A comprehensive list of entity types is available in the API documentation.
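
As a rough illustration of what such a call could look like from Apps Script, the sketch below builds a request for the analyzeEntities method of the v1 REST API. The helper name buildEntityRequest and the placeholder key are assumptions for illustration, and the template's own script may be structured differently.

```javascript
// REST endpoint of the analyzeEntities method (v1 of the Natural Language API).
const ENTITIES_ENDPOINT = 'https://language.googleapis.com/v1/documents:analyzeEntities';

// Hypothetical helper: returns the URL and the options object that
// UrlFetchApp.fetch(url, options) expects in Apps Script.
function buildEntityRequest(text, apiKey) {
  return {
    url: ENTITIES_ENDPOINT + '?key=' + encodeURIComponent(apiKey),
    options: {
      method: 'post',
      contentType: 'application/json',
      payload: JSON.stringify({
        document: { type: 'PLAIN_TEXT', content: text },
        // JavaScript strings are UTF-16, so mention offsets line up with string indexes.
        encodingType: 'UTF16',
      }),
      // Return error responses as normal responses instead of throwing.
      muteHttpExceptions: true,
    },
  };
}

// Inside Apps Script only (UrlFetchApp does not exist elsewhere):
// const req = buildEntityRequest('Google was founded in California.', 'YOUR_API_KEY');
// const data = JSON.parse(UrlFetchApp.fetch(req.url, req.options).getContentText());
```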

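Since the same API can also detect sentiment, it can return two sentiment values for each entity: a sentiment score, which indicates whether the emotion is negative or positive (from -1.0 to 1.0), and a sentiment magnitude, which indicates the strength of that emotion (a non-negative number). Together, these provide more insight into the emotional context of each entity.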
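
To show how the two sentiment values can be read together, here is a small sketch that turns them into a readable label. The function name describeSentiment and both cut-off values are assumptions chosen for illustration; the API itself returns only the numbers, and sensible thresholds depend on your content.

```javascript
// Hypothetical helper: maps an entity's sentiment values to a label.
// The thresholds (0.25 for score, 1 for magnitude) are arbitrary assumptions.
function describeSentiment(sentiment, threshold = 0.25) {
  const score = sentiment.score || 0;         // -1.0 (negative) to 1.0 (positive)
  const magnitude = sentiment.magnitude || 0; // 0 upwards: strength of emotion
  let label = 'neutral';
  if (score >= threshold) {
    label = 'positive';
  } else if (score <= -threshold) {
    label = 'negative';
  } else if (magnitude >= 1) {
    // A near-zero score with a high magnitude suggests positive and
    // negative mentions cancelling each other out.
    label = 'mixed';
  }
  return { label, score, magnitude };
}
```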

The API response of each call also provides details like:

  • Entity Types: For each entity, the API maps an entity type (per the list linked above)
  • Entity Salience Score (Prominence): A score assigned to each entity indicating its relative importance within the context of the analyzed text. Higher scores signify a more prominent role for the entity.
  • Number of Entity Mentions and Mention variations: The API identifies different mentions of the same entity throughout the text. For instance, “Barack Obama” and “the former president” might both be recognized as mentions of the same entity, depending on the document context.
  • Wikipedia Links (when available) in metadata: The API may provide links to corresponding Wikipedia pages for well-known entities, enriching the analysis.

This comprehensive entity analysis empowers you to gain a richer understanding of the content and the relationships between entities within your text data.
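
The response details listed above can be flattened into one spreadsheet row per entity. The sketch below assumes the JSON shape returned by the v1 REST API (an entities array with name, type, salience, sentiment, mentions, and metadata fields); the function name entitiesToRows and the column order are illustrative assumptions, not necessarily what the template uses.

```javascript
// Hypothetical helper: converts one API response into rows for a sheet.
function entitiesToRows(url, apiResponse) {
  const entities = apiResponse.entities || [];
  return entities.map((entity) => {
    const mentions = entity.mentions || [];
    const sentiment = entity.sentiment || {};
    const metadata = entity.metadata || {};
    return [
      url,
      entity.name,
      entity.type,                 // e.g. PERSON, ORGANIZATION, LOCATION
      entity.salience || 0,        // relative importance within the text
      sentiment.score === undefined ? '' : sentiment.score,
      sentiment.magnitude === undefined ? '' : sentiment.magnitude,
      mentions.length,             // number of mentions
      mentions.map((m) => m.text.content).join(', '), // mention variations
      metadata.wikipedia_url || '', // only present for well-known entities
    ];
  });
}
```

Sentiment columns are left blank when the response has no sentiment object, so the same helper works whether or not entity sentiment was requested.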

Additional resources

Check out the additional resources by Google Cloud to practice working with this API, and the entity analysis module specifically:

Step-by-step guide on using the Natural Language API Entity Extraction Module in Google Sheets via Apps Script

Prerequisites

Get your API key

Having selected your Google Cloud project, navigate to the APIs and Services menu > Credentials.

Then, click on the Create Credentials button from the navigation next to the page title, then select API Key from the drop-down menu.

This is the easiest to use, but least secure method of authentication – you might consider alternatives for more complex projects.

What is the difference between API key, OAuth client ID, and service account authentication?

In short, an API key is a simple identifier for basic access (like a library card), an OAuth client ID allows user-specific access that requires authorization (like a bank card with a PIN), and service account authentication is the most secure access for applications that run without users (like a company credit card).

Once you click on the Create API key button, there will be a pop-up menu that will indicate that the API key is being created, after which it will appear on the screen for you to copy.

You can always navigate back to this section of your project, and reveal the API key at a later stage, using the Show Key button. If you ever need to edit or delete the API key, you can do so from the drop-down menu.

Extract and organise the text content you want to extract entities from

The next step is to decide on and organise the content you want to extract entities from into Google Sheets.

What content can you extract entities from with the Natural Language API?


The Google Cloud Natural Language API can extract entities from both short snippets of text and longer documents. Typical web content is well within what the API accepts, but it might not work well with extremely long inputs, especially in Google Sheets (Python-based scripts might be a suitable alternative for such projects). For example, you can extract entities from titles, product descriptions, web content, or user comments and feedback from social media or user research forms.
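
One possible way to handle very long inputs is to split them at sentence boundaries and send each piece separately. The sketch below is an assumption about how you might do this, with a hypothetical chunkText helper and an arbitrary 5,000-character default; it is not part of the template.

```javascript
// Hypothetical helper: splits text into chunks of at most maxChars characters,
// breaking only between sentences. A single sentence longer than maxChars
// is kept whole rather than cut mid-sentence.
function chunkText(text, maxChars = 5000) {
  // Rough sentence split: text up to ., ! or ?, plus trailing whitespace.
  const sentences = text.match(/[^.!?]+[.!?]*\s*/g) || [];
  const chunks = [];
  let current = '';
  for (const sentence of sentences) {
    if (current && current.length + sentence.length > maxChars) {
      chunks.push(current.trim());
      current = '';
    }
    current += sentence;
  }
  if (current.trim()) chunks.push(current.trim());
  return chunks;
}
```

Keep in mind that salience is relative to the text that was analyzed, so salience scores from different chunks of the same article are not directly comparable.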

For a no-code content scraping approach, I recommend using Screaming Frog’s custom extraction function. The approach works in three simple steps:

  • Finding the content selector: Navigate to the page or website section you want to scrape content from, and identify the selector in the HTML that contains the content.
  • Configuring the crawl: From the Copy menu, copy the setting you will use (e.g. Selector, XPath, etc.) and paste it into the custom extraction module in Screaming Frog before starting your crawl.
  • Content extraction: Run the crawl as usual, and find the data in the specified column, as per your extraction settings. Export your data in your desired format, e.g. Google Sheets, CSV, or other.

With this approach, you can quickly get a dataset of scraped content from web pages, or the HTML, depending on the extraction method you select.

What language should the content be in for the Natural Language API to work?

The API automatically detects the language of the content, unless one is explicitly specified in the request. The languages supported by the Natural Language API are listed in its Language Support documentation. Unsupported languages will return an error in the JSON response.
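
As a sketch of how the language could be set explicitly, and how an error could be read from the response, consider the two helpers below. They assume the v1 REST API, where the document object accepts an optional language field and errors come back as an error object with a message; the helper names buildDocument and readApiError are hypothetical.

```javascript
// Hypothetical helper: builds the document part of the request body.
// Omitting languageCode leaves language detection to the API.
function buildDocument(text, languageCode) {
  const document = { type: 'PLAIN_TEXT', content: text };
  if (languageCode) {
    document.language = languageCode; // e.g. 'en', 'es', 'de'
  }
  return document;
}

// Hypothetical helper: returns the error message from a parsed JSON
// response, or null when the call succeeded.
function readApiError(responseJson) {
  return responseJson && responseJson.error
    ? responseJson.error.message || 'Unknown API error'
    : null;
}
```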

You can also scrape content via alternative methods, using Python or third-party tools.

Once you have your content extracted and organised into a spreadsheet-suitable format, you can move on to the next step.

Make a copy of the Google Sheets Template and paste your content and API key

To prepare the data for analysis, we need to do two things – organize the content for analysis, and paste the API key in the script.

Paste your API key

In Google Sheets, open the Extensions menu, and click on Apps Script.

Open the attached entityextraction.gs script, and select the text that says enterAPIkey. Replace it with the API key from your Google Cloud project. Then click on the disk icon to save, and return to the Google Sheets file.
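
If you would rather not keep the key in the script text itself, one alternative is to store it in the project's script properties and read it at run time. This is a sketch of that idea and not how the template works: the property name NL_API_KEY and the helper getApiKey are assumptions, and you would need to add the property yourself in the Apps Script project settings.

```javascript
// Hypothetical helper: reads the API key from a properties store.
// In Apps Script, pass PropertiesService.getScriptProperties().
function getApiKey(properties) {
  const key = properties.getProperty('NL_API_KEY'); // assumed property name
  // Fail early if the key is missing or still the template placeholder.
  if (!key || key === 'enterAPIkey') {
    throw new Error('API key not set. Add NL_API_KEY to the script properties.');
  }
  return key;
}

// Inside Apps Script only:
// const apiKey = getApiKey(PropertiesService.getScriptProperties());
```

Passing the properties store in as an argument keeps the function easy to check outside Google Sheets with a stand-in object.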

Paste and prepare your content for entity analysis

Paste your content for analysis into the Working Sheet, keeping the URL and content columns. Leave the header row unchanged in both the Working Sheet and the Entity Sentiment Data sheet (in other words, don’t edit the column names).

Run the analysis to identify entities from the provided text

To run the analysis, open the Sentiment Tools menu in the top menu bar, then click on Mark Entities and Sentiment. A pop-up notification will appear at the bottom right of the screen, notifying you that the analysis has started. For each entry, a ‘complete’ status will appear in column C.
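
Behind a menu item like this, a script typically loops over the rows, calls the API for each one, and writes a status back. The sketch below shows that loop in isolation. The function names processRows and markEntities, the status strings other than 'complete', and the column layout are assumptions for illustration; the template's actual script may differ.

```javascript
// Hypothetical core loop. rows is an array of [url, content] pairs and
// analyze is any function that takes text and returns an array of entities.
function processRows(rows, analyze) {
  const statuses = []; // one single-cell row per input row, for column C
  const output = [];   // one row per entity found
  rows.forEach(([url, content]) => {
    if (!content) {
      statuses.push(['skipped']);
      return;
    }
    try {
      analyze(content).forEach((e) => output.push([url, e.name, e.type, e.salience]));
      statuses.push(['complete']);
    } catch (err) {
      // Record the failure and keep going with the remaining rows.
      statuses.push(['error: ' + err.message]);
    }
  });
  return { statuses, output };
}

// Apps Script glue, shown as comments because it only runs inside Google Sheets:
// function onOpen() {
//   SpreadsheetApp.getUi()
//     .createMenu('Sentiment Tools')
//     .addItem('Mark Entities and Sentiment', 'markEntities')
//     .addToUi();
// }
// In markEntities, rows could be read with
//   sheet.getRange(2, 1, sheet.getLastRow() - 1, 2).getValues()
// and the statuses written back with
//   sheet.getRange(2, 3, statuses.length, 1).setValues(statuses)
```

Catching errors per row means one failed request (for example, an unsupported language) does not stop the rest of the sheet from being processed.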

Important: A pop-up screen might appear, asking you to give permissions for the script to run.

You can now switch over to the sheet titled “Entity Sentiment Data” to review which entities the Google Cloud Natural Language API has identified in your content, along with the associated entity data: entity type, salience, sentiment score, sentiment magnitude, number of mentions, mentions (examples), and metadata.

Visualise the entity analysis data (optional)

Although this step is optional, it is highly recommended that you visualize this data. For this purpose, I’ve created a handy Entity Analysis Looker Studio Dashboard Template, which allows you to:

  • View all of your identified entities
  • Quickly understand patterns in your entity analysis project
  • View and filter individual page URLs, content, entity type, and more with advanced filters (including Regex)
  • Filter out entities that are not relevant to your project, or have low salience scores
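
The last kind of filtering in the list above can also be done in code before the data reaches a dashboard. The sketch below assumes each entity is an object with type and salience fields; the function name filterEntities, the 0.05 salience cut-off, and the choice to drop the OTHER type by default are illustrative assumptions.

```javascript
// Hypothetical helper: keeps entities that are salient enough and whose
// type is not on the exclusion list.
function filterEntities(entities, { minSalience = 0.05, excludeTypes = ['OTHER'] } = {}) {
  return entities.filter(
    (e) => e.salience >= minSalience && !excludeTypes.includes(e.type)
  );
}
```

For example, filterEntities(entities, { minSalience: 0.1, excludeTypes: ['OTHER', 'NUMBER'] }) would apply a stricter cut-off and also drop numeric entities.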

Why use Google’s Natural Language API entity analysis in Google Sheets

There are several advantages to using Google’s Natural Language API for entity analysis within Google Sheets:

  • Scalable Analysis: Manually identifying entities in large content datasets can be tedious, error-prone, and in some contexts near-impossible! The Natural Language API automates this process, efficiently identifying entities across all your text data in Google Sheets.
  • Integration with other Data points: The Natural Language API results can be integrated with other data in platforms like Looker Studio, where you can create visualizations based on the extracted entities, combined with other performance-related data points (like traffic data, rankings, or revenue).
  • Sentiment Analysis Potential: While entity analysis focuses on identifying “what”, the Natural Language API can also be used for sentiment analysis, determining the “how” – the positive, negative, or neutral sentiment associated with the entities. This combined analysis can provide a richer understanding of your text data.

Learn how to implement the generated insights into your Organic Search strategy

Getting the data is one thing; learning how to analyse it and use it as part of your Organic Search strategy is another. Here are just some of the projects where entity analysis can be pivotal to a good organic search strategy:

  • Analyzing Customer Reviews: Identify frequently mentioned products, locations, or customer service representatives in your reviews. This can help you understand customer sentiment towards specific aspects of your business.
  • Processing Survey Responses: Extract key entities from survey responses to gain insights into user demographics, preferences, or areas for improvement.
  • Analyzing Competitor Content: Understand the most commonly discussed entities in competitor content to identify content gaps and opportunities for enriching your own content.
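
The competitor content use case can be sketched as a simple comparison of entity names. The example below assumes you already have two lists of extracted entity names, one from your own pages and one from competitor pages; the function name findEntityGaps and the case-insensitive matching are assumptions, and real projects would likely need fuzzier matching.

```javascript
// Hypothetical helper: lists entities that competitors mention but you do
// not, ordered by how often they appear in the competitor data.
function findEntityGaps(ownEntities, competitorEntities) {
  const own = new Set(ownEntities.map((name) => name.toLowerCase()));
  const counts = new Map();
  for (const name of competitorEntities) {
    const key = name.toLowerCase();
    if (own.has(key)) continue; // already covered in your own content
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts.entries()]
    // Most frequent first; ties broken alphabetically for stable output.
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .map(([entity, mentions]) => ({ entity, mentions }));
}
```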

See the follow-up resources to learn how to harness this data to improve your strategy:

See what else you can do with this API

As mentioned at the start, the Natural Language API has several additional capabilities, including text classification, entity sentiment analysis, document sentiment analysis, and syntax analysis. Explore other step-by-step guides on this topic by visiting the resources linked below:

