Extracting Meanings of a List of Words from a Website URL (Python Web Scraping)

Posted on Mon, Jul 25, 2022 Python Software

Check the source code of the program on GitHub. The following details will be based on the code.

Introduction

One day, I was preparing for the SAT, specifically the vocabulary. I wanted to memorize several hundred words (500 to be exact). However, I didn't know the meanings of most of those words, or even their roots. When I tried to find the definitions, I had to look up the words one by one. Looking up hundreds of words this way is very tedious: I have to search for a word, click a link to a website, copy the definition, and enter it into Excel, and then repeat the whole process for the next word. If I could automate this, I could do it for as many words as I wanted. So, I had an idea. Why not put all the words in Excel, feed them into a Python program, and have the program output all the definitions at once? With this in mind, I created a function that takes a list of words, extracts the definitions, and puts all the definitions back into Excel (explained in more depth below). I encourage you to use this program or modify it.

Overview - A Python Multiple Input Dictionary

A basic dictionary tool, written in Python in JupyterLab, that can take multiple words and output the definitions of those words at once.

The dictionary was created using pandas, the BeautifulSoup library, and the website www.dictionary.com. The code scrapes a URL that is built from the word parameter of the function ggl_search(). From the HTML at that URL, the function extracts the first definition listed on the page and returns it as a string.
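As a small illustration of the URL step, the sketch below builds the lookup URL the same way ggl_search() does, using only the standard library. The names BASE_URL and build_search_url are made up for this example, and whether dictionary.com resolves every encoded form (multi-word entries, for instance) is not something checked here.

```python
import urllib.parse

# Same URL pattern that ggl_search() uses; the constant name is hypothetical.
BASE_URL = "https://www.dictionary.com/browse/{}"


def build_search_url(word):
    """Return the lookup URL for a word, percent-encoding unsafe characters."""
    # quote_plus() turns spaces into "+" and escapes other special characters.
    return BASE_URL.format(urllib.parse.quote_plus(str(word), safe="/"))


print(build_search_url("ubiquitous"))
# https://www.dictionary.com/browse/ubiquitous
print(build_search_url("ad hoc"))
# https://www.dictionary.com/browse/ad+hoc
```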

Below is a code snippet of the function.

import urllib.parse

import requests
from bs4 import BeautifulSoup

# A function that searches for the word the user wants a definition for.

def ggl_search(word):

    #   Using the urllib module, a URL is created based on the word
    #   parameter passed into the function.
    search_url = "https://www.dictionary.com/browse/{}".format(urllib.parse.quote_plus(str(word), safe='/'))

    #   Reads the site from the URL.
    google_request = requests.get(search_url)

    #   Parses the HTML.
    soup = BeautifulSoup(google_request.text, "html.parser")

    #   Extracts the first definition of the word.
    results = soup.find('div', attrs = {'value':'1'}).text
    return results

To get the definition of a word, call the function like this: ggl_search("word")
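One thing to watch: soup.find() returns None when no matching element exists, so calling .text on the result raises an AttributeError (for a misspelled word, say, or if the page layout changes). Below is a minimal sketch of a more forgiving extraction step, assuming the same div with value="1" selector used in the function. The name extract_first_definition and the sample HTML are invented for illustration and are not dictionary.com's real markup.

```python
from bs4 import BeautifulSoup


def extract_first_definition(html):
    """Return the text of the first <div value="1">, or None if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("div", attrs={"value": "1"})
    if tag is None:
        # No definition block found: report that instead of raising an error.
        return None
    return tag.get_text(strip=True)


# Made-up HTML so the sketch runs offline.
sample_html = '<html><body><div value="1"> a sample definition </div></body></html>'
print(extract_first_definition(sample_html))
# a sample definition
print(extract_first_definition("<html><body><p>nothing here</p></body></html>"))
# None
```

For a live lookup, the same function could be given the text of requests.get(search_url, timeout=10), where the timeout keeps a slow page from stalling a long word list.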

The dictionary can also output the definitions of several words at once, one definition per word, using lists. The words can come from a CSV file (like the one in this repository) or from a plain Python list.

In both cases, the dictionary outputs the word and definition in a DataFrame. Below is the DataFrame output if the code in the repository is run.
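The repository's own batch code is not reproduced here. As a rough sketch of how a list of words might be turned into such a DataFrame, the example below loops over the words and collects one row per word. The function name definitions_frame, the column names "Word" and "Definition", and the stand-in fake_lookup are all assumptions made for this example; in real use, ggl_search would be passed as the lookup.

```python
import pandas as pd


def definitions_frame(words, lookup):
    """Look up each word with `lookup` and collect the results in a DataFrame."""
    rows = []
    for word in words:
        try:
            definition = lookup(word)
        except Exception:
            # A failed lookup (network error, missing definition) leaves a gap
            # instead of stopping the whole batch.
            definition = None
        rows.append({"Word": word, "Definition": definition})
    return pd.DataFrame(rows, columns=["Word", "Definition"])


# Stand-in lookup so the sketch runs offline; swap in ggl_search for real use.
def fake_lookup(word):
    return "definition of " + word


frame = definitions_frame(["abate", "candor"], fake_lookup)
print(frame)
```

If the words live in a CSV file, they could be loaded with something like pd.read_csv("words.csv")["Word"].tolist() (the file and column names here are hypothetical), and the result can be written back out for Excel with frame.to_csv("definitions.csv", index=False).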