2021.09.07

When the web page that you want to get information from is not simply accessible through the URL, you can either:

  1. Find out if it’s a form submission and use curl to simulate the POST request and then use beautifulsoup to parse the html.
  2. Use selenium to simulate an actual click in a browser.
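
For option 1, here is a minimal sketch of what the simulated form submission looks like. It uses Python's built-in urllib in place of curl, and only builds the request without sending it. The URL path and the field names (keyword, page) are made up for illustration; the real ones come from the form's HTML.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_form_request(url, fields):
    """Build (but do not send) a POST request that mimics an HTML form submission."""
    body = urlencode(fields).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Hypothetical URL and field names; take the real ones from the form's HTML.
request = build_form_request(
    "https://someurl.com/search", {"keyword": "ledger", "page": "1"}
)
print(request.get_method(), request.data)

# To actually send it, pass the request to urllib.request.urlopen and
# hand the returned bytes to the html parser:
#     from urllib.request import urlopen
#     html = urlopen(request).read()
```

The form fields are sent url-encoded in the request body, which is the same thing curl does with its -d option.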

Regardless of what rendered it, a web page displayed in a web browser will always be in html format (or maybe I’m wrong?). As with beautifulsoup, we can trace through the html elements and find which button we should click.

To do this, we need selenium and a driver that can access your installed Chrome browser headlessly. Apparently python 3 requires beautifulsoup4 instead of beautifulsoup.

pip install selenium beautifulsoup4 lxml chromedriver-binary

The basic code is:

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
import chromedriver_binary

options = Options()
options.add_argument('--headless')
driver = webdriver.Chrome(options=options)

url = 'https://someurl.com'
wait = WebDriverWait(driver, 10)
driver.get(url)

# Do clicking stuff here

driver.quit()

We can wait for the page to load by waiting for an element with a certain id or class name:

element = wait.until(EC.element_to_be_clickable((By.ID, "name_of_element_id")))
element = wait.until(EC.element_to_be_clickable((By.CLASS_NAME, "name_of_class")))
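
To get a feel for what wait.until does, here is a rough sketch of the idea in plain Python: keep calling a condition until it returns something truthy or the timeout runs out. This is not selenium's actual implementation (its own class differs in details such as the exception it raises), and wait_until and fake_element_lookup are names I made up for illustration.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns something truthy or timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within {} seconds".format(timeout))
        time.sleep(poll_interval)


# Stand-in for a page element that only shows up on the third poll.
attempts = {"count": 0}


def fake_element_lookup():
    attempts["count"] += 1
    return "button" if attempts["count"] >= 3 else None


found = wait_until(fake_element_lookup, timeout=2.0, poll_interval=0.01)
print(found, attempts["count"])
```

The return value of the condition is passed back to the caller, which is why the result of wait.until can be used directly as the element.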

Then if we want to click it:

element.click()

After the click, the next page loads, so after another wait we can click again, i.e.:

element = wait.until(EC.element_to_be_clickable((By.ID, "name_of_element_id1")))
element.click()
element = wait.until(EC.element_to_be_clickable((By.CLASS_NAME, "name_of_class")))
element.click()

We can also check the status of a checkbox:

element = driver.find_element(By.ID, "checkbox_id")  # find_element_by_id was removed in newer selenium versions
element.get_attribute('checked')  # 'true' when checked, None otherwise

Or get the page source to parse it using beautifulsoup:

html = driver.page_source
soup = BeautifulSoup(html, 'lxml')
boxes = soup.find_all('div',{"class":'box'})

It’s much more straightforward than I thought, but tedious to set up if you really want to scrape a lot of data.

2021.09.03

When I hear CGI-Bin, I think of the early 2000s. I never actually knew how it worked. But I just found out today that it’s quite simple and even works with python. Basically, whatever the script prints out is shown on the web page. That’s quite neat!

Save this file in cgi-bin/test.py

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

a = "CGI-Bin"

print("Content-type: text/html\n") # This line is required to show the html content
print("<html>")
print("<h1>{}</h1>".format(a))
print("</html>")

Make the script executable and launch the server using

chmod +x cgi-bin/test.py
python -m http.server --cgi 8000

Open the web browser and access it from http://localhost:8000/cgi-bin/test.py.
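
Since the page is just whatever the script prints, it is easy to make it react to the URL as well. Here is a small sketch, assuming the server passes the part of the URL after "?" in the QUERY_STRING environment variable (the usual CGI convention). The name parameter and the render_page function are made up for illustration.

```python
#!/usr/bin/env python3
import html
import os
from urllib.parse import parse_qs


def render_page(query_string):
    """Return the full CGI response (header, blank line, body) as one string."""
    params = parse_qs(query_string)
    # "name" is a hypothetical parameter; fall back to a default when absent.
    name = params.get("name", ["CGI-Bin"])[0]
    # Escape the value so that user input cannot inject html.
    body = "<html><h1>{}</h1></html>".format(html.escape(name))
    return "Content-Type: text/html\n\n" + body


if __name__ == "__main__":
    # The server passes everything after "?" in the URL via QUERY_STRING.
    print(render_page(os.environ.get("QUERY_STRING", "")))
```

Saved next to test.py under a name of your choice, it should show a different heading when the URL ends with something like ?name=World.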

2021.08.31

Add this line to /boot/config.txt to disable the Raspberry Pi’s default Wireless LAN interface (e.g. to always use an external dongle instead):

dtoverlay=disable-wifi

2021.08.19

Introduction

  • Goal: Learn about Laravel database (Migration, Deletion)
  • System: Ubuntu 16.04 (Yes it’s old)

Installation

sudo apt-get install php5.6 # Yes it's old... Might need to install something else
sudo apt install mysql-server
curl -sS https://getcomposer.org/installer | sudo php -- --install-dir=/usr/local/bin --filename=composer
  • I forgot the exact installation and setup steps for mysql
  • Composer is used to create the laravel project

Start a basic Laravel

  • Install, download necessary stuff and create a new folder with the project content
composer create-project laravel/laravel php-laravel-example
cd php-laravel-example
php artisan serve # Yay!
  • artisan is a tool supplied with laravel; it helps to build up the application
  • Note that the command below is for use with Docker, so I’m not using it
curl -s "https://laravel.build/php-laravel-example" | bash

Connecting Laravel to MySQL

  • You can create a custom user to be used with the database
CREATE USER 'sammy'@'localhost' IDENTIFIED BY 'password';
GRANT ALL PRIVILEGES ON *.* TO 'sammy'@'localhost' WITH GRANT OPTION;
  • Edit the project folder’s .env file with the mysql credentials
  • Run the mysql server
sudo /etc/init.d/mysql start
  • Then log in with this; the password will be prompted
mysql -u sammy -p

Creating a Model

  • Laravel includes Eloquent (ORM) and we use Model to interact with the database
  • This will create app/Flight.php
php artisan make:model Flight
  • If we want the model to support soft delete (i.e. a deleted row doesn’t actually get removed from the table), we can add use Illuminate\Database\Eloquent\SoftDeletes; at the top of the model file and put use SoftDeletes; inside the model class

Create a new Table

php artisan make:migration create_flights_table
  • The above will create a file inside database/migrations
  • We can edit the file to add the table columns
      Schema::create('flights', function (Blueprint $table) {
          $table->increments('id');
          $table->string('name');
          $table->string('airline');
          $table->softDeletes(); // This is to enable usage of soft delete by creating the necessary column (deleted_at)
          $table->timestamps();
      });
    
  • After creating the migration file, we can check the migration status with php artisan migrate:status, then run it with php artisan migrate
  • In case something is not right, we can undo the migration with php artisan migrate:rollback (given that the migration file has a down() method that reverses it)

Add new data

  • We can use the supplied php artisan tinker to “tinker” with the application
  • However, we cannot add a new row like this by default, since the Model is protected against mass assignment by default
    $flightData = array('name' => 'JAL123', 'airline' => 'Japan Airlines');
    Flight::create($flightData);
    
  • By putting protected $guarded = []; inside the Model file, we can then successfully add a new row (Adding this line requires tinker to be restarted)
    >>> Flight::count()
    => 1
    

Deleting data

  • If we delete the row with Flight::find(1)->delete(), Flight::count() becomes 0
  • The row still exists, but now the deleted_at column is filled
  • We can restore it with Flight::withTrashed()->find(1)->restore()
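
To convince myself of what is going on in the table, here is the same idea sketched in Python with sqlite3. This is not how Laravel implements it, just the concept: deleting fills deleted_at, normal queries skip rows where it is set, and restoring clears it again. The helper names (count, soft_delete, restore) are made up for illustration.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flights ("
    "id INTEGER PRIMARY KEY, name TEXT, airline TEXT, deleted_at TEXT)"
)
conn.execute(
    "INSERT INTO flights (name, airline) VALUES (?, ?)",
    ("JAL123", "Japan Airlines"),
)


def count(include_trashed=False):
    """Count rows; by default skip the ones marked as deleted."""
    sql = "SELECT COUNT(*) FROM flights"
    if not include_trashed:
        sql += " WHERE deleted_at IS NULL"
    return conn.execute(sql).fetchone()[0]


def soft_delete(flight_id):
    """Mark the row as deleted by filling deleted_at instead of removing it."""
    now = datetime.now(timezone.utc).isoformat()
    conn.execute("UPDATE flights SET deleted_at = ? WHERE id = ?", (now, flight_id))


def restore(flight_id):
    """Clear deleted_at so the row is visible again."""
    conn.execute("UPDATE flights SET deleted_at = NULL WHERE id = ?", (flight_id,))


soft_delete(1)
after_delete = (count(), count(include_trashed=True))
restore(1)
after_restore = count()
print(after_delete, after_restore)
```

The include_trashed flag plays the role of withTrashed() here: the row never leaves the table, it is only filtered out.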

Maintenance View

php artisan down
php artisan up

Neat!

2021.08.13

I was googling for hours and found no answer. I was also not really sure where the error happened. I found the mistake only after I wrote up the problem.

The problem

I’m trying to call ledger-cli using subprocess, preferably without shell=True because it’s not good practice. However, I stumbled on an encoding error. The command may contain Japanese characters, but I think the problem happens regardless of this.

  1. Works
     res = subprocess.run(" ".join(command), capture_output=True, encoding="utf8", universal_newlines=True, shell=True)
    
  2. Works, but the stdout needs .decode("utf8")
     res = subprocess.run(" ".join(command), capture_output=True)
    
  3. ledger-cli fails to parse the argument: if it contains Japanese characters, they arrive at ledger-cli as bytes, and if it is alphanumeric only, the error is "While parsing value expression: ((account =~ /expr/) | (account =~ /comment=~/a//)) Error: Invalid token '<ident 'a'>' (wanted ')')"
     res = subprocess.run(command, capture_output=True)
    
  4. Same as (3) if the input is alphanumeric only, but with Japanese characters it raises this error
     res = subprocess.run(command, capture_output=True, encoding="utf8", universal_newlines=True)
    
     Traceback (most recent call last):
       File "ledger.py", line 32, in <module>
         res = subprocess.run(command, capture_output=True, encoding="utf8") #UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe3 in position 164: invalid continuation byte
       File "/Users/fransiska/.pyenv/versions/3.7.3/lib/python3.7/subprocess.py", line 474, in run
         stdout, stderr = process.communicate(input, timeout=timeout)
       File "/Users/fransiska/.pyenv/versions/3.7.3/lib/python3.7/subprocess.py", line 939, in communicate
         stdout, stderr = self._communicate(input, endtime, timeout)
       File "/Users/fransiska/.pyenv/versions/3.7.3/lib/python3.7/subprocess.py", line 1725, in _communicate
         self.stderr.errors)
       File "/Users/fransiska/.pyenv/versions/3.7.3/lib/python3.7/subprocess.py", line 816, in _translate_newlines
         data = data.decode(encoding, errors)
     UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe3 in position 158: invalid continuation byte	
    

The macOS encoding should be utf8, since (1) works well and the output in (2) can be decoded using utf8. But for some reason, even if I pass encoding="utf8" in (4), the argument is not passed correctly and the error message cannot be decoded either. Either something in shell=True fixes the encoding, or the difference is in how the argument is passed.
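
One way to at least see the error message instead of a traceback is to capture the raw bytes and decode them by hand. Below is a minimal sketch of that. The child process is a stand-in that writes a byte sequence which is not valid utf8 to stderr; it is not ledger-cli, and the message it writes is made up.

```python
import subprocess
import sys

# Stand-in for ledger-cli: a child process that writes bytes to stderr
# which are not valid UTF-8 (0xe3 followed by a non-continuation byte).
child = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.buffer.write(b'Error: \\xe3(')",
]

# Capture raw bytes (no encoding=...), so decoding cannot fail inside subprocess.
res = subprocess.run(child, capture_output=True)

# Decode by hand; errors="replace" turns undecodable bytes into U+FFFD.
message = res.stderr.decode("utf8", errors="replace")
print(repr(res.stderr))
print(message)
```

With the bytes in hand, it is easier to tell whether the program's output itself is broken or whether the argument was already wrong when it went in.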

The mistake

No quotes needed around multiple words in a single list item

I was trying to call ledger reg "expr" "comment =~ /keyword/". The double slash in the error message for an alphanumeric keyword made me suspicious: it suggested that the encoding doesn’t work even in this case. Maybe some of the symbols were causing it to get encoded weirdly.

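In the end, it was the quotes in the "expr" that were causing the problem. When the command is a list and shell=True is not used, subprocess passes each list item to the program as one argument exactly as written. An item that contains spaces needs no quoting, and any quotes I add end up as literal characters in the argument. I should’ve known how to use subprocess better.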
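
A small experiment that would have saved me those hours: run a tiny child process that prints the arguments it receives. The child here is just the python interpreter standing in for ledger-cli, and argv_seen_by_child is a helper name I made up.

```python
import shlex
import subprocess
import sys


def argv_seen_by_child(args):
    """Run a tiny child process that prints the arguments it received."""
    child = [sys.executable, "-c", "import sys; print(sys.argv[1:])"]
    res = subprocess.run(child + args, capture_output=True, encoding="utf8")
    return res.stdout.strip()


# One list item = one argument, spaces included, no quoting needed.
plain = argv_seen_by_child(["reg", "expr", "comment =~ /keyword/"])

# Extra quotes inside the items are passed through as literal characters.
quoted = argv_seen_by_child(["reg", '"expr"', '"comment =~ /keyword/"'])

# shlex.split turns a shell-style command line into the equivalent list.
split = shlex.split('ledger reg "expr" "comment =~ /keyword/"')

print(plain)
print(quoted)
print(split)
```

The second call shows the quotes arriving as part of the argument itself, which is what ledger-cli then tried to parse. If the command starts out as a shell-style string, shlex.split gives the list form with the quotes already removed.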

I guess I’ve had so many problems with encoding that it’s the first thing I suspect. :sweat: