by Nicolae Caralicea

In this post I would like to talk about the MLE method, providing a simple use case and some basic code that might help you understand the MLE topic.

MLE is a very important topic in Statistics and is also heavily used in many Machine Learning and Data Mining concepts. It sometimes seems to be overlooked, but it is nonetheless worth our attention.

### Maximum Likelihood Estimation (MLE) Definition

First, let’s see what Wikipedia says:

In statistics, maximum likelihood estimation (MLE) is a method of estimating the parameters of a statistical model given observations, by finding the parameter values that maximize the likelihood of making the observations given the parameters.

### Settings

We are given a sample of n observations. We also know that this sample is derived from a Bernoulli distribution. Derived means that the observed values of the sample were observed, or, as you will see later, even randomly generated, from a certain distribution (Bernoulli in our case).

Here are the only things we have:

- a sample of n observations: **x1, x2, …, xn**
- the knowledge that the above sample is derived from a **Bernoulli distribution**

Wikipedia’s definition of a Bernoulli distribution is the following:

Bernoulli distribution is the probability distribution of a random variable which takes the value 1 with probability p and the value 0 with probability q = 1 − p — i.e., the probability distribution of any single experiment that asks a yes–no question; the question results in a boolean-valued outcome, a single bit of information whose value is success/yes/true/one with probability p and failure/no/false/zero with probability q. It can be used to represent a coin toss where 1 and 0 would represent “head” and “tail” (or vice versa), respectively. In particular, unfair coins would have p ≠ 1/2.

### Goal

Our goal is to estimate the probability p of the Bernoulli distribution that is the most likely to generate our sample.

We need to keep in mind that the only things we have at our disposal are our sample and the information that our sample is derived from a Bernoulli distribution.

### Solution & Intuition

Using the MLE method for estimating the value of p in the case of a Bernoulli distribution would give us **the estimated p** equal to the **mean** of the provided sample.

So, **p hat** is the estimated value that maximizes the likelihood of our sample.

As you can see further in the provided code, that estimated **p hat** is really close to the actual value p of the probability of the random variable our sample was derived from.

To understand how this is possible, please look at the following link: Maximum Likelihood (first section, for a Bernoulli distribution)

The above link should give you all the gory details.
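In case you want the short version, here is a sketch of that derivation. The likelihood of the sample under a Bernoulli(p) model is the product of the individual probabilities, and setting the derivative of its logarithm to zero yields the sample mean:

```latex
L(p) = \prod_{i=1}^{n} p^{x_i}(1-p)^{1-x_i}
     = p^{\sum_i x_i}\,(1-p)^{\,n-\sum_i x_i}

\log L(p) = \Big(\sum_{i=1}^{n} x_i\Big)\log p
          + \Big(n - \sum_{i=1}^{n} x_i\Big)\log(1-p)

\frac{d\,\log L}{dp}
  = \frac{\sum_i x_i}{p} - \frac{n - \sum_i x_i}{1-p} = 0
\quad\Rightarrow\quad
\hat{p} = \frac{1}{n}\sum_{i=1}^{n} x_i
```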

Hopefully, the following code will give you some intuition on the practical aspects of the MLE with regard to a Bernoulli distribution.

```python
import numpy as np
from scipy.stats import bernoulli

# Settings
p = 0.7
sample_size = 10000

# Generate a random sample from a Bernoulli distribution
sample = bernoulli.rvs(p, size=sample_size)

# Goal: calculate the estimated probability by using the result derived
# from MLE (the sample mean)
p_estimated = np.mean(sample)

# Results
print('Probability of the random variable: ', p)
print('Estimated probability: ', p_estimated)
print('The two values should be close enough {0} ~ {1}'.format(p_estimated, p))
```

Here is the output:

```
Probability of the random variable:  0.7
Estimated probability:  0.697
The two values should be close enough 0.697 ~ 0.7
```
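To build a bit more intuition, we can also evaluate the Bernoulli log-likelihood on a grid of candidate values of p and check that it peaks right next to the sample mean. This is a small illustrative sketch (not part of the original notebook); the grid resolution and `random_state` are arbitrary choices:

```python
import numpy as np
from scipy.stats import bernoulli

# Draw a sample, as in the example above
sample = bernoulli.rvs(0.7, size=10000, random_state=42)

# Evaluate log L(p) = sum_i log P(x_i | p) on a grid of candidate p values
p_grid = np.linspace(0.01, 0.99, 99)
log_likelihoods = [np.sum(bernoulli.logpmf(sample, p)) for p in p_grid]

# The grid point that maximizes the log-likelihood should sit next to the sample mean
p_best = p_grid[np.argmax(log_likelihoods)]
print('Grid maximizer:', p_best)
print('Sample mean   :', sample.mean())
```

The brute-force grid search agrees with the closed-form answer, which is exactly what the MLE derivation promises.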

I hope this post helped a little in understanding the role of MLE.

Remember that for any sample derived from a known distribution we can apply a similar approach to estimate its parameters via MLE.
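As a sketch of that idea (not from the original post), here is the same recipe applied to a Poisson distribution, whose MLE for the rate λ also turns out to be the sample mean; the specific rate and sample size below are arbitrary:

```python
import numpy as np
from scipy.stats import poisson

# Settings (arbitrary illustrative values)
lam = 3.5
sample_size = 10000

# Generate a random sample from a Poisson distribution
sample = poisson.rvs(lam, size=sample_size, random_state=0)

# For a Poisson(lambda) sample, maximizing the likelihood gives
# lambda_hat = sample mean
lam_estimated = np.mean(sample)
print('Rate of the random variable:', lam)
print('Estimated rate:', lam_estimated)
```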

If you want to experiment with this, you can find the Python notebook on GitHub at this location: MLE-Bernoulli-intuition.ipynb