Posts

Naive classification beats deep-learning

Overview: Mitani and co-authors present a deep-learning algorithm trained with retinal images and participants’ clinical data from the UK Biobank to estimate blood-haemoglobin levels and predict the presence or absence of anaemia (Mitani et al. 2020). A major limitation of the study is the inadequate evaluation of the algorithm. I will show how a naïve classification (i.e. classify everybody as healthy) performs much better than their deep-learning approach, despite their model having an AUC of around 80%.
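To illustrate why a naïve rule can win when the condition is rare, here is a minimal sketch. It assumes the comparison is made on overall accuracy, which the excerpt does not state. The prevalence, sensitivity, and specificity values are hypothetical and are not taken from Mitani et al. (2020) or from the post.

```python
def naive_accuracy(prevalence: float) -> float:
    """Accuracy of labelling everyone as healthy (the majority class)."""
    return 1.0 - prevalence


def model_accuracy(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Accuracy of a classifier with the given sensitivity and specificity."""
    return sensitivity * prevalence + specificity * (1.0 - prevalence)


# Hypothetical values chosen only for illustration (assumptions, not study results).
prevalence = 0.05   # share of participants with the condition
sensitivity = 0.75  # share of true cases the model flags
specificity = 0.75  # share of healthy participants the model clears

naive = naive_accuracy(prevalence)
model = model_accuracy(prevalence, sensitivity, specificity)
print(f"naive: {naive:.3f}, model: {model:.3f}")
```

With a rare condition, the naïve rule is wrong only for the few true cases, while a model with imperfect specificity misclassifies a share of the much larger healthy group. Accuracy alone therefore says little about clinical usefulness.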

Approximating Binomial with Poisson

It is usually taught in statistics classes that Binomial probabilities can be approximated by Poisson probabilities, which are generally easier to calculate. This approximation is valid “when $n$ is large and $np$ is small,” and rules of thumb are sometimes given. In this post I’ll walk through a simple proof showing that the Poisson distribution is really just the Binomial with $n$ (the number of trials) approaching infinity and $p$ (the probability of success in each trial) approaching zero.
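The limit argument can be sketched as follows. This is the standard derivation, not necessarily the one used in the post, and it assumes that $\lambda = np$ is held fixed as $n \to \infty$ (the symbols $\lambda$, $X$ and $k$ are introduced here for illustration):

$$
\begin{aligned}
P(X = k) &= \binom{n}{k} p^k (1-p)^{n-k} \\
&= \frac{n(n-1)\cdots(n-k+1)}{k!} \left(\frac{\lambda}{n}\right)^{k} \left(1-\frac{\lambda}{n}\right)^{n-k} \\
&= \frac{\lambda^k}{k!} \cdot \frac{n(n-1)\cdots(n-k+1)}{n^k} \cdot \left(1-\frac{\lambda}{n}\right)^{n} \left(1-\frac{\lambda}{n}\right)^{-k} \\
&\longrightarrow \frac{\lambda^k}{k!}\, e^{-\lambda} \qquad (n \to \infty).
\end{aligned}
$$

For fixed $k$, the middle fraction and the last factor both tend to 1, while $\left(1-\lambda/n\right)^{n}$ tends to $e^{-\lambda}$, which leaves the Poisson probability mass function.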

Another solution to the 'The Hardest Logic Puzzle Ever' using probability

I present a solution to a modification of the “hardest logic puzzle ever” using probability theory. Background: “The hardest logic puzzle” was originally presented by Boolos (1996) and has since been amended several times to make it harder (see B. Rabern and Rabern 2008; Novozhilov 2012). The puzzle: Three gods A, B, and C are called, in some order, True, False, and Random. True always speaks truly, False always speaks falsely, but whether Random speaks truly or falsely is a completely random matter.

Plastic waste and disease on coral reefs - Another misinterpretation of a statistical model

Recently, I came across this very interesting article published in Science about how plastic waste is associated with disease on coral reefs (J. B. Lamb et al. 2018). The main conclusions are: contact with plastic increases the probability of disease; the morphological structure of the reefs is associated with the probability of being in contact with plastic, with more complex reefs being more likely to be affected by plastic;

On statistical reporting in biomedical journals

Poor-quality statistical reporting in the biomedical literature is not uncommon. Here is another example by Cirio et al. (2016). The study itself is well planned, executed and reported. The aim was to assess whether heated and humidified high-flow gases delivered through nasal cannula (HFNC) improve exercise performance in severe chronic obstructive pulmonary disease (COPD) patients. It all started when I saw their Fig. 1. Here is my attempt to reproduce it

Importing Flat Files Into R

There are many tutorials for importing data into R focusing on a specific function/package. This one focuses on three different packages. You will learn how to import all common formats of flat file data with base R functions and the dedicated readr and data.table packages. I first present these three packages and finish with a comparison table between them. Task: import a flat file into R, that is, create an R object that contains the data from a flat file.