MensaarLecker Development Log (2) -- Web Developing and GitHub Workflow
This blog post is trying to tell you:
- My personal experience when developing a web crawler using Selenium
- Explained with examples from my Repository: MensaarLecker
Fetching Data for Web Development
Previous post: MensaarLecker Development Log (1) – Web Crawling
Continuing from the last post, we have already implemented a script that collects the Mensa menu and stores it in Google Sheets. It is time to build a web interface that connects to this data.
Fetch Data from Google Sheets using Publish
First, we need to publish our spreadsheet so that the data is publicly accessible.
- In the Spreadsheet, click Share → Change access to Anyone with the link.
- Click File → Share → Publish to the web.
- Select Entire Document → Comma-separated values (.csv) and click Publish.
- Copy the public CSV link.
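A first, naive attempt is to fetch the published CSV directly from the browser. The sketch below only illustrates the idea; `CSV_URL` is a placeholder for the link copied above, not the real one:

```js
// A minimal sketch; CSV_URL is a placeholder for the published CSV link.
const CSV_URL = "https://docs.google.com/spreadsheets/d/e/XXXX/pub?output=csv";

fetch(CSV_URL)
  .then((response) => response.text())
  .then((csv) => {
    // Split the CSV into rows and cells for further processing
    const rows = csv.trim().split("\n").map((line) => line.split(","));
    console.log(rows);
  })
  .catch((error) => console.error("Failed to fetch the menu:", error));
```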
However, the script returns no data. Why?
CORS Policy and XSS
Cross-origin resource sharing (CORS) is an extension of the same-origin policy. You need it for authorized resource sharing with external third parties. – Amazon Web Services
This is a security mechanism that, among other things, helps prevent XSS (cross-site scripting). In a nutshell, when we run the script from our own page, the request cannot proceed: it is not logged in to any Google account, and Google's response does not grant our origin permission to read it. You can imagine the request being blocked by an imaginary login page that our program doesn't know how to get past.
Second Attempt – Google Apps Script
Google doesn’t allow arbitrary pages to fetch the data casually, unless the request is executed on Google’s own servers. This means we need to run our fetching function through a Google service. Apps Script provides an online JavaScript editor where we can write and host such a function.
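The function itself is essentially a doGet() handler that reads the sheet and returns its contents. Here is a minimal sketch, assuming a placeholder SHEET_ID and a sheet named "menu" (not necessarily the names used in the repository):

```js
// A minimal Apps Script sketch; SHEET_ID and the sheet name "menu" are placeholders.
function doGet() {
  const sheet = SpreadsheetApp.openById("SHEET_ID").getSheetByName("menu");
  const values = sheet.getDataRange().getValues(); // 2D array: rows x columns

  // Return everything as JSON so the website can consume it directly
  return ContentService.createTextOutput(JSON.stringify(values))
    .setMimeType(ContentService.MimeType.JSON);
}
```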
To get the sheet ID, we can simply open the sheet and copy it from the URL, where it appears as https://docs.google.com/spreadsheets/d/<SHEET_ID>/edit.
Deploying the function and fetching the data
Afterwards, we deploy this function as a web app. The deployment generates a unique URL that serves the function’s output, so we can fetch the data (exported here in JSON format) and use it in our code.
You can also open the URL in a browser to double-check that it returns the values correctly → link
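On the website side, the fetching code now points at the deployed Apps Script URL instead of the CSV link. A minimal sketch, with APPS_SCRIPT_URL as a placeholder for the generated deployment URL:

```js
// A sketch of consuming the deployed endpoint; APPS_SCRIPT_URL is a placeholder.
const APPS_SCRIPT_URL = "https://script.google.com/macros/s/XXXX/exec";

fetch(APPS_SCRIPT_URL)
  .then((response) => response.json())
  .then((rows) => {
    // rows is the 2D array returned by doGet(); render it into the page here
    console.log(rows);
  })
  .catch((error) => console.error("Failed to load menu data:", error));
```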
Webpage Implementation
To simplify our automation process on GitHub, we will continue to generate our HTML code using Python. Our website has two pages: index.html, which contains today’s menu, and menu.html, which contains the history of collected menus.
Static data – Triple Quotes
We can put all the static code in strings. In Python we can store multi-line strings using triple quotes.
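For illustration, here is a minimal sketch of how index.html could be assembled from triple-quoted templates; the template content and the (counter, meal, price) fields are assumptions, not the exact structure used in the repository:

```python
# A minimal sketch of generating index.html from Python triple-quoted strings.
# The template and the row fields are illustrative assumptions.
HEADER = """<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <title>MensaarLecker</title>
</head>
<body>
"""

FOOTER = """</body>
</html>
"""

def build_index(menu_rows):
    # menu_rows: list of (counter, meal, price) tuples fetched from the sheet
    body = "\n".join(
        f"<p>{counter}: {meal} ({price})</p>" for counter, meal, price in menu_rows
    )
    return HEADER + body + FOOTER

with open("index.html", "w", encoding="utf-8") as f:
    f.write(build_index([("Menü 1", "Spaghetti Bolognese", "4,05 €")]))
```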
Enhance the design using DataTables
Since we collect the menu every weekday, the table in menu.html grows quickly and becomes slow to load and hard to browse. We can use DataTables, which provides table features like pagination, filtering, searching, and sorting. It is also very easy to integrate: simply include its JavaScript and CSS links in the HTML code and you get a basic but decent design.
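As a sketch, the generated menu.html only needs the library includes and a one-line initialization; the version numbers and the table id below are placeholders, not the exact ones from the repository:

```html
<!-- A sketch of the DataTables setup; versions and the table id are placeholders. -->
<link rel="stylesheet" href="https://cdn.datatables.net/1.13.8/css/jquery.dataTables.min.css">
<script src="https://code.jquery.com/jquery-3.7.1.min.js"></script>
<script src="https://cdn.datatables.net/1.13.8/js/jquery.dataTables.min.js"></script>

<table id="menu-table">
  <thead>
    <tr><th>Date</th><th>Counter</th><th>Meal</th></tr>
  </thead>
  <tbody><!-- rows generated by the Python script --></tbody>
</table>

<script>
  // Turn the plain table into a sortable, searchable, paginated DataTable
  $(document).ready(function () {
    $("#menu-table").DataTable();
  });
</script>
```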
Automation with GitHub Workflow
Finally, after we deployed the code to GitHub, remember our original goals:
Scrape the Mensa menu every weekday and store it to Google Sheets
Fetch the Data Collection from Google Sheets and update the website
In fact, we can run the Python scripts periodically using a GitHub Actions workflow; here are the steps:
- The workflow has to be written in .yml format and stored at .github/workflows/{workflow_name}.yml.
- Before running the Python script, make sure Python is set up with all the dependencies installed.
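A minimal sketch of such a workflow is shown below; the cron schedule, requirements.txt, and the main.py entry point are assumptions rather than the repository’s exact values:

```yaml
# A sketch of the scheduled workflow; paths and the cron time are assumptions.
name: update-menu

on:
  schedule:
    - cron: "0 9 * * 1-5"   # every weekday at 09:00 UTC
  workflow_dispatch:         # allow manual runs as well

jobs:
  update:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install dependencies
        run: pip install -r requirements.txt

      - name: Run the scraper and rebuild the site
        run: python main.py
```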
- For the full workflow, you can find the skeleton code template on GitHub, or you can check here.
A random summary
With all this work, we managed to build our web crawler and a static website using only Python, and the GitHub workflow handles the daily updates for us.
Continue Reading: MensaarLecker Development Log (3) – Telegram Bot Deployment and Integration