R

Naive classification beats deep-learning

Overview: Mitani and co-authors present a deep-learning algorithm, trained on retinal images and participants’ clinical data from the UK Biobank, to estimate blood-haemoglobin levels and predict the presence or absence of anaemia (Mitani et al. 2020). A major limitation of the study is the inadequate evaluation of the algorithm. I will show how a naïve classification (i.e. classifying everybody as healthy) performs much better than their deep-learning approach, despite their model having an AUC of around 80%.
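To make the idea concrete, here is a minimal sketch on simulated data. It is not the study's data or the analysis from the post. The prevalence (5%), the score distribution, the threshold (0.6) and the use of plain accuracy as the metric are all assumptions chosen for illustration. It shows how a score with an AUC near 0.8 can still lose to the "everybody is healthy" rule on accuracy when the condition is rare.

```r
set.seed(1)                      # reproducible simulation
n <- 10000
prevalence <- 0.05               # hypothetical share of anaemic participants
anaemic <- rbinom(n, 1, prevalence)

# Hypothetical model score: higher on average for anaemic participants.
score <- rnorm(n, mean = 1.2 * anaemic, sd = 1)

# AUC computed with the rank (Mann-Whitney) formula.
n_pos <- sum(anaemic == 1)
n_neg <- sum(anaemic == 0)
auc <- (sum(rank(score)[anaemic == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Accuracy of the score at an arbitrary threshold versus the naive rule.
model_pred <- as.integer(score > 0.6)
acc_model <- mean(model_pred == anaemic)
acc_naive <- mean(anaemic == 0)  # naive rule: classify everybody as healthy

c(auc = auc, acc_model = acc_model, acc_naive = acc_naive)
```

The comparison depends heavily on the threshold and on the metric, which is why AUC on its own says little about how useful a classifier is for a rare condition.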

Approximating Binomial with Poisson

It is usually taught in statistics classes that Binomial probabilities can be approximated by Poisson probabilities, which are generally easier to calculate. This approximation is valid “when n is large and np is small,” and rules of thumb are sometimes given. In this post I’ll walk through a simple proof showing that the Poisson distribution is really just the Binomial with n (the number of trials) approaching infinity and p (the probability of success in each trial) approaching zero, with the product np held fixed.
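A quick numerical check of this limit, as a sketch rather than a proof: hold np fixed at an assumed value (here 2, picked only for illustration) and watch the largest gap between the two probability mass functions shrink as n grows. The helper name `max_abs_diff` is mine, and the comparison is restricted to outcomes 0 to 10.

```r
# Compare Binomial(n, p) with Poisson(lambda = n * p) while n * p stays fixed.
lambda <- 2          # assumed value of n * p, for illustration only
k <- 0:10            # outcomes at which the two pmfs are compared

# Largest absolute difference between the two pmfs over the outcomes in k.
max_abs_diff <- function(n, lambda, k) {
  p <- lambda / n
  max(abs(dbinom(k, size = n, prob = p) - dpois(k, lambda = lambda)))
}

n_values <- c(10, 100, 1000, 10000)
diffs <- vapply(n_values, max_abs_diff, numeric(1), lambda = lambda, k = k)

data.frame(n = n_values, p = lambda / n_values, max_abs_diff = diffs)
```

Each tenfold increase in n cuts the gap by roughly a factor of ten here, which is consistent with the approximation improving as n grows and p shrinks.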

Importing Flat Files Into R

There are many tutorials for importing data into R that focus on a specific function or package. This one covers three options. You will learn how to import all common formats of flat-file data with base R functions and with the dedicated readr and data.table packages. I first present the three options and finish with a table comparing them. Task: import a flat file into R, that is, create an R object that contains the data from a flat file.
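As a minimal sketch of the task, the snippet below writes a small CSV to a temporary file and reads it back with each of the three options. The file contents and the object names (`df_base`, `df_readr`, `df_dt`) are made up for the example, and the readr and data.table calls only run if those packages are installed.

```r
# Write a small CSV to a temporary file so the example is self-contained.
path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, name = c("a", "b", "c")), path, row.names = FALSE)

# Base R: returns a data.frame.
df_base <- read.csv(path, stringsAsFactors = FALSE)

# readr: returns a tibble; suppressMessages() hides the column-type message.
if (requireNamespace("readr", quietly = TRUE)) {
  df_readr <- suppressMessages(readr::read_csv(path))
}

# data.table: returns a data.table; fread() detects the separator itself.
if (requireNamespace("data.table", quietly = TRUE)) {
  df_dt <- data.table::fread(path)
}

str(df_base)
```

All three calls read the same file; they differ mainly in the class of the returned object and in their defaults for type guessing.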