Thinking Allowed

medical / technology / education / art / flub

About this blog

You don't have to have been on the Internet for long to realise that terms and conditions, available features, and pricing on corporate sites - like LinkedIn, Blogger, Facebook, Medium, or Twitter - can and do change.


I wrote this software so that the journals, news items, and articles I find can be the starting point for my learning - things I have read that I can find again, share, and reflect on. The software fetches the title, a short piece of text if available, and a picture from the URL of interest. I can then post a link to my blog entry on social media platforms instead of the original link - no more broken links for people clicking on my posts.

Technologies used:

  • URL fetching with PHP - I wrote a PHP class that follows redirects, expands any URL shortening, and manages cookies until it reaches the final URL for a link. It uses Open Graph or Twitter card information (or looks as best it can at the HTML) to find the title, description, and image representing the web page.
  • tf-idf information retrieval - to identify similar blog entries and keywords I have written information retrieval code in Python to analyse the term frequency - inverse document frequency of all the blog entries. This shouldn't work well on the blog entries as they are all quite short documents, but it does a reasonable job. I did this at OnExamination in the 2000s (pre BMJ days) for finding similar MCQs - it was very good at identifying accidental duplicates of questions. The Python code back then was quite slow so I did it in C++, but the current Python libraries are amazing at doing this sort of work. Don't do it in PHP or JavaScript just because you can - use the proper tools for the job.
  • PHP, MySQL, Apache stack - a traditional webserver stack for storing and displaying the blog.
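
To make the URL fetching step above more concrete, here is a rough sketch of the same idea. It is written in Python rather than PHP and is not the class used on this blog: the names fetch_html, CardParser and extract_card are made up for illustration. It follows redirects with a cookie jar, then prefers Open Graph tags, then Twitter card tags, then falls back to the plain HTML title and meta description.

```python
import http.cookiejar
import urllib.request
from html.parser import HTMLParser


def fetch_html(url, timeout=10):
    """Follow redirects (keeping cookies) and return (final_url, html)."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    with opener.open(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.geturl(), resp.read().decode(charset, errors="replace")


class CardParser(HTMLParser):
    """Collect <meta> tags keyed by property/name, plus the <title> text."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Open Graph uses property="og:..."; Twitter cards use name="twitter:..."
            key = attrs.get("property") or attrs.get("name")
            content = attrs.get("content")
            if key and content:
                # Keep the first value seen for each key
                self.meta.setdefault(key.lower(), content.strip())

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_card(html):
    """Return title, description and image, preferring Open Graph data."""
    parser = CardParser()
    parser.feed(html)
    parser.close()
    meta = parser.meta

    def pick(*keys):
        for key in keys:
            if meta.get(key):
                return meta[key]
        return None

    return {
        "title": pick("og:title", "twitter:title") or parser.title.strip() or None,
        "description": pick("og:description", "twitter:description", "description"),
        "image": pick("og:image", "twitter:image"),
    }
```

Something like final_url, html = fetch_html(link) followed by extract_card(html) would then give the three pieces needed for a blog entry. Real pages are messier than this (relative image URLs, odd encodings, sites that block scripted requests), so treat it as a starting point only.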
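
The tf-idf item above can be sketched in a few lines of plain Python. This is only an illustration of the technique, not the code behind this blog: the function names and the smoothed idf formula are my assumptions, and in practice a library such as scikit-learn (its TfidfVectorizer) would probably be the better tool. Each entry becomes a weighted, length-normalised term vector, and similar entries are those with the highest cosine similarity.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one L2-normalised {term: weight} dict per document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(docs)

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    # Smoothed idf (an assumed variant): rare terms get higher weights
    idf = {term: math.log((1 + n_docs) / (1 + count)) + 1.0
           for term, count in df.items()}

    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vec = {term: count * idf[term] for term, count in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({term: w / norm for term, w in vec.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two already-normalised sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(weight * b.get(term, 0.0) for term, weight in a.items())


def most_similar(docs, index):
    """Rank the other documents by similarity to docs[index], best first."""
    vectors = tfidf_vectors(docs)
    scores = [(cosine(vectors[index], vec), i)
              for i, vec in enumerate(vectors) if i != index]
    return sorted(scores, reverse=True)
```

Calling most_similar(entries, 0) on a list of entry texts returns (score, index) pairs for the other entries. A score very close to 1 suggests a near duplicate, which is the kind of check described above for MCQs.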