Differential Diagnosis

Medical / Technology / Education feed

About this blog

Own your own content! Perhaps don't talk about tech unless you've played with it.

You don't have to have been on the Internet for long to realise that terms and conditions, available features, and pricing on corporate sites - like LinkedIn, Blogger, Facebook, Medium, or Twitter - can and do change.

I want the links I find to be the starting point for my learning - stuff that I read that I can find again, share, and reflect on. The software has a mechanism for fetching the title, a short piece of text if available, and a picture from the URL of interest. I can then post a link to my blog entry on the social media platforms instead of the original link - no more broken links for people clicking on my posts.

Technologies used:

  • URL fetching with PHP - I wrote a PHP class that follows redirects, expands any shortened URLs, and manages cookies until it reaches the final URL behind a link. It uses Open Graph or Twitter card information (or looks as best it can at the HTML) to find the Title, Description, and Image for a web page. It works on most sites except those that are very nervous about DDoS attacks and use Cloudflare (e.g. all BMJ journals). [I have a plan to get around this that uses serverless code from Netlify.]
  • tf-idf information retrieval - to identify similar blog entries and keywords I have written information retrieval code in Python to analyse the term frequency - inverse document frequency of all the blog entries. In theory this shouldn't work well on blog entries, as they are all quite short documents, but it does a reasonable job. I did this at OnExamination in the 2000s (pre BMJ days) for finding similar MCQs - it was very good at identifying accidental duplicates of questions. The Python code then was quite slow so I did it in C++, but the current Python libraries are amazing at doing this sort of work. Don't do it in PHP or JavaScript just because you can - use the proper tools for the job.
  • PHP MySQL Apache stack - a traditional webserver stack for storing and displaying the blog. I'm old school. I'll go serverless with perhaps Netlify and FaunaDB at some point but not quite yet.
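
For the curious, here is a rough sketch of the metadata step from the URL fetching bullet. It is written in Python with only the standard library, so it is not the PHP class itself, and it skips the redirect following and cookie handling entirely: it assumes you already have the final page's HTML as a string. The names CardParser and extract_card are made up for this illustration, and the fallback order (Open Graph, then Twitter card, then plain HTML) is an assumption about what "looks as best it can" might mean.

```python
from html.parser import HTMLParser


class CardParser(HTMLParser):
    """Collect <meta> tags and the <title> text from a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Open Graph uses property="og:...", Twitter cards use name="twitter:..."
            key = attrs.get("property") or attrs.get("name")
            if key and attrs.get("content"):
                # Keep the first value seen for each key
                self.meta.setdefault(key.lower(), attrs["content"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_card(html):
    """Return title, description and image, preferring Open Graph over Twitter cards."""
    parser = CardParser()
    parser.feed(html)

    def first(*keys):
        for key in keys:
            if key in parser.meta:
                return parser.meta[key]
        return None

    return {
        "title": first("og:title", "twitter:title") or parser.title.strip() or None,
        "description": first("og:description", "twitter:description", "description"),
        "image": first("og:image", "twitter:image"),
    }
```

Calling extract_card(page_html) gives back a small dictionary, with None for anything the page does not declare.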
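
And a minimal sketch of the tf-idf idea from the second bullet: weight each term by how often it appears in an entry and how rare it is across all entries, then compare entries by cosine similarity. This is plain standard-library Python for illustration, not the code behind this blog; a real setup would more likely lean on one of the Python libraries mentioned above. The function names, the simple tokenizer, and the smoothed idf formula are all assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Very simple tokenizer (an assumption): lowercase runs of letters and digits
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one {term: weight} dictionary per document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    # Smoothed idf, so a term that appears everywhere still gets a small weight
    idf = {term: math.log((1 + n_docs) / (1 + count)) + 1 for term, count in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({term: (count / len(tokens)) * idf[term] for term, count in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def most_similar(docs, index):
    """Rank every other document by similarity to docs[index], best first."""
    vectors = tfidf_vectors(docs)
    scores = [(cosine(vectors[index], vec), i) for i, vec in enumerate(vectors) if i != index]
    return sorted(scores, reverse=True)
```

Given a list of entries, most_similar(entries, 0) returns (score, index) pairs for the other entries. A near-duplicate question or post should come out on top, and an entry sharing no words at all scores 0.0.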