
Hoeffding’s Inequality

April 15, 2018

# 1 Oral exam

## 1.1 Assignment

For the oral exam, students are asked to present 3 different Machine Learning techniques. They can be chosen from the problems given in Yaser S. Abu-Mostafa, Malik Magdon-Ismail and Hsuan-Tien Lin, "Learning from Data".

Example problems are:

• Problems 1.4 and 1.5 (pp. 34-35). These problems study the Perceptron Learning Algorithm (PLA) on artificial data sets. The first one asks to investigate the PLA on data sets of different size and dimension, and the second one asks to study a variant that includes an extra parameter, the learning rate η.

• Problem 3.3 (p. 109) studies the PLA and its pocket variant on an original non-linearly separable dataset and a third order polynomial feature transformation of that set.

• Problem 4.4 (p. 155) sets up an experimental framework to study overfitting.

• Problem 4.24 (p. 163) investigates linear regression with weight decay regularization.

• Also in the Problems Section of the e-Chapters on Neural Networks, Support Vector Machines, and Similarity-Based Methods you will find assignments that you can do.

Alternatively, you can also present a project of your own choice, but you should discuss this with me beforehand. That project should involve the same amount of work as the assignment described above.

## 1.2 Requirements

1. Jupyter notebook using the Python ecosystem, i.e. the libraries numpy and scipy to implement the algorithms and matplotlib (or seaborn) to display the results of the experiments.

2. The notebook should be self-contained: it contains text explaining the machine learning algorithm and the hypotheses investigated in the assignment. Relevant theory is also mentioned.

3. Results are visualized.

Cf. below

# 2 Introduction

In this notebook we study the Hoeffding inequality.

• We do the coin experiment described in Exercise 1.10 of "Learning from Data" (p. 23). Its purpose is to illustrate the difference between verifying a hypothesis and learning the best hypothesis.

• The Hoeffding inequality only holds when some conditions are met. We check the inequality on a number of distributions. For some of them these conditions hold, for others not.

• We compare the Hoeffding and Chebyshev inequalities. The second is valid for every distribution with finite variance but gives a looser bound than Hoeffding when the latter applies.

But first of all we implement some functions to classify and visualize the probability distributions of the module scipy.stats.

# 3 Imports

Import the module numpy, from scipy the stats module and from math the function ceil since it returns an integer as opposed to np.ceil that returns a floating point number.

In : import numpy as np
     import scipy.stats as ss
     from math import ceil
     %precision 10

Out: '%.10f'

Import the module numba. It provides an easy way to vectorize Python functions. The resulting vectorized function can then be applied to a whole range of values at once, cf. the cells below with the @vectorize decorator.

In : from numba import vectorize, boolean, float64, int64

Import whatever is needed to generate plots.

In : from matplotlib import pyplot as plt
     plt.style.use('seaborn-whitegrid')
     %matplotlib inline

# 4 Probability distributions

We use some probability distributions from the module scipy.stats. These can be classified as

• continuous or discrete, e.g. Gaussian and Beta distributions are continuous while the binomial and Poisson distributions are discrete

• whose values are bounded or not, e.g. binomial and Beta distributions are bounded while the Gaussian and Poisson distributions are unbounded.

The function plot_rv visualizes these distributions.

## 4.1 Some distributions with bounded range

In : B_30 = ss.bernoulli(p=0.3)
     B_50 = ss.bernoulli(p=0.5)
     U = ss.uniform()
     Beta_5_5 = ss.beta(5, 5)

## 4.2 Some distributions with unbounded range

In : G = ss.norm(loc=0, scale=1)
     P_1 = ss.poisson(mu=1)
     P_5 = ss.poisson(mu=5)
     Chi2_5 = ss.chi2(df=5, loc=0, scale=1)

In : RVs = [B_30, B_50, U, Beta_5_5, G, P_1, P_5, Chi2_5]

## 4.3 Classify distributions

In : def continuous_p(rv):
         rv_name = rv.dist.name
         return isinstance(getattr(ss, rv_name), ss.rv_continuous)

     def discrete_p(rv):
         rv_name = rv.dist.name
         return isinstance(getattr(ss, rv_name), ss.rv_discrete)

In : def bounded_by_unit_interval_p(rv):
         return rv.a == 0.0 and rv.b == 1.0

     def bounded_p(rv):
         return np.NINF < rv.a and rv.b < np.PINF

## 4.4 Visualize distributions

In : PPF_BOUNDS = {(False, False): {'ppf_min': 0.0,  'ppf_max': 1.0},
                   (False, True):  {'ppf_min': 0.0,  'ppf_max': 0.99},
                   (True,  False): {'ppf_min': 0.01, 'ppf_max': 1.0},
                   (True,  True):  {'ppf_min': 0.01, 'ppf_max': 0.99}}

     def get_ppf_bounds(rv):
         global PPF_BOUNDS
         bounds = PPF_BOUNDS[(rv.a == np.NINF, rv.b == np.PINF)]
         return bounds['ppf_min'], bounds['ppf_max']

     def get_domain_rv(rv):
         '''Returns the part of the domain that will be visualized.'''
         ppf_min, ppf_max = get_ppf_bounds(rv)
         start, end = rv.ppf([ppf_min, ppf_max]).astype(np.int64)
         kwargs = dict(num=end-start+1, dtype=int) if discrete_p(rv) else dict(endpoint=True)
         return np.linspace(start, end, **kwargs)

     def get_ticks(rv, max_ticks=10):
         '''Returns the ticks on the x-axis. The maximum number is given by max_ticks
         and the ticks correspond with integer values equally spaced between 0 and max_value.'''
         domain = get_domain_rv(rv)
         max_value = domain[-1]
         offset = 0 if max_value % max_ticks == 0 else 1
         step = max_value//max_ticks + offset
         nb_ticks = max_value//step + 1 + offset
         ticks = np.arange(step*nb_ticks, step=step)
         return ticks

In : def plot_rv(rv):
         '''Plot a distribution from scipy.stats'''
         fig, ax = plt.figure(), plt.axes()
         domain = get_domain_rv(rv)
         if discrete_p(rv):
             f, label, style = rv.pmf, 'pmf', 'ro'
             ax.vlines(domain, 0, f(domain), color='b', alpha=0.5, lw=10)
             ax.set(xticks=get_ticks(rv))
         else:
             f, label, style = rv.pdf, 'pdf', 'r-'
         ax.plot(domain, f(domain), style, label=label)
         rv_name = rv.dist.name
         title = '{}({})'.format(rv_name, ', '.join(['{}={:0.2f}'.format(k, v)
                                                     for k, v in rv.kwds.items()]))
         ax.set(xlabel=r'$x$', ylabel=r'$p(x)$', title=title)

In : plot_rv(P_5)

# 5 Coin Experiment

The purpose of the coin experiment is to illustrate the difference between verifying a single hypothesis $h_m$ in a finite hypothesis set $H = \{h_1, h_2, \cdots, h_M\}$, i.e.

$$\mathbb{P}[\,|E_{in}(h_m) - E_{out}(h_m)| > \epsilon\,] \le 2e^{-2\epsilon^2 N}$$

and learning the best hypothesis $g \in H$. In the latter case, $g$ depends on the dataset $D$ used during learning and the bound is looser, i.e. it now also contains $M$:

$$\mathbb{P}[\,|E_{in}(g) - E_{out}(g)| > \epsilon\,] \le 2Me^{-2\epsilon^2 N}$$

In the experiment we run a computer simulation where we flip 1,000 fair coins. Each coin is flipped independently 10 times. The experiment is repeated 1,000,000 times. Let’s focus on 3 coins as follows:

1. $c_1$ is the first coin flipped,

2. $c_{rand}$ is a coin you choose at random,

3. $c_{min}$ is the coin that had the minimum frequency of heads (pick the earlier one in case of a tie).

Let $\nu_1$, $\nu_{rand}$ and $\nu_{min}$ be the fraction of heads obtained for the above three coins. The first two coins play the role of fixed hypotheses, i.e. the first and a random one, while the last coin plays the role of the final hypothesis $g$.
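The bias introduced by selecting $c_{min}$ can already be seen in a minimal numpy sketch. The sizes below are illustrative (smaller than in the text, for speed), and the seed is only for reproducibility:

```python
import numpy as np

rng = np.random.default_rng(0)
nb_coins, nb_trials, nb_exps = 1000, 10, 2000   # illustrative sizes

# heads[c, e] = number of heads of coin c in experiment e
heads = rng.binomial(n=nb_trials, p=0.5, size=(nb_coins, nb_exps))

nu_1 = heads[0] / nb_trials                            # first coin
rnd = rng.integers(nb_coins, size=nb_exps)
nu_rand = heads[rnd, np.arange(nb_exps)] / nb_trials   # a random coin
nu_min = heads.min(axis=0) / nb_trials                 # coin with fewest heads

# nu_1 and nu_rand hover around 0.5; nu_min is heavily biased towards 0
print(nu_1.mean(), nu_rand.mean(), nu_min.mean())
```

Picking the minimum over 1,000 coins after seeing the data is exactly the situation the factor $M$ in the union bound accounts for.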

In : def get_coin(heads, idx_coin=0):
         '''Input: the number of heads for each coin in each experiment.
         Output: the number of heads of the coin with given index.'''
         return heads[idx_coin]

     def get_max(heads):
         '''Input: the number of heads for each coin per experiment.
         Output: the maximum number of heads per experiment.'''
         return np.max(heads, axis=0)

     def get_min(heads):
         '''Input: the number of heads for each coin per experiment.
         Output: the minimum number of heads per experiment.'''
         return np.min(heads, axis=0)

     def get_random(heads):
         '''Input: the number of heads for each coin per experiment.
         Output: the number of heads of a random coin per experiment.'''
         nb_coins, nb_exps = heads.shape
         rnd_indices = np.random.randint(low=0, high=nb_coins, size=nb_exps)
         return heads[rnd_indices, np.arange(nb_exps)]

In : def run_experiments(success_rate, nb_trials, nb_coins, nb_exps):
         # Generate the trials for each coin and repeat this for each experiment
         flips = np.random.binomial(n=1, p=success_rate, size=(nb_trials, nb_coins, nb_exps))
         # Count the number of heads for each coin and repeat this for each experiment
         heads = np.sum(flips, axis=0)
         coin_1, coin_rnd, coin_min = get_coin(heads, idx_coin=0), get_random(heads), get_min(heads)
         return success_rate, flips, heads, coin_1, coin_rnd, coin_min

# 6 Run and visualize an experiment

## 6.1 Inspect

In : def get_experiment_flips(flips, idx_exp):
         '''Input: the flips of each coin in each experiment.
         Output: the flips of each coin in the experiment with given index.'''
         return flips[:, :, idx_exp]

     def get_coin_flips(flips, idx_coin, idx_exp=0):
         '''Input: the flips of each coin in each experiment.
         Output: the flips of the coin in the experiment with given indices.'''
         nb_trials, nb_coins, nb_exps = flips.shape
         assert idx_coin < nb_coins and idx_exp < nb_exps
         exp_flips = get_experiment_flips(flips, idx_exp)
         coin_flips = exp_flips[:, idx_coin]
         return coin_flips

## 6.2 Visualize

In : # Get the data
     nb_trials, nb_coins, nb_exps = flips.shape
     max_value = nb_trials + 1

     # Plot the 3 histograms
     fig, ax = plt.figure(), plt.axes()
     colors, labels = ['yellow', 'red', 'blue'], [r'$H_1$', r'$H_{rnd}$', r'$H_{min}$']
     edges = np.linspace(start=-0.5, stop=max_value+0.5, endpoint=True, num=max_value+2)
     kwargs = dict(bins=edges, normed=1, histtype='stepfilled', alpha=0.5)
     ax.hist(data, label=labels, color=colors, **kwargs)

     # Plot the binomial
     x = np.arange(max_value)
     binom = ss.binom(n=nb_trials, p=mu)
     ax.plot(x, binom.pmf(x), 'ko--', label='Binom')

     # Give the information of the plot.
     title = 'Estimate of number of heads\n' + \
             r'for $N$ = {0:d} trials when the success rate $\mu$ = {1:1.1f}'.format(nb_trials, mu)
     ax.set(xlabel=r'Number of heads $n$', xticks=get_ticks(rv=binom),
            ylabel=r'Relative Frequency', title=title)
     ax.legend(loc='best')

## 6.3 Take notice

It takes some time to run the experiment as described above. Therefore, we use 100 coins and repeat the experiment 10,000 times instead.

In : run_experiments(success_rate=0.5, nb_trials=10, nb_coins=100, nb_exps=10000)

### 6.3.1 Inspect the outcome of the first N experiments

In : #FIRST_N = 5

In : #get_coin_flips(flips=FLIPS, idx_coin=0, idx_exp=0)

### 6.3.2 Visualize the outcome of all experiments

## 6.4 Conclusion

From the figure above, where $N = 10$, and taking for instance $\epsilon = 0.3$, we can conclude that the Hoeffding bound holds for the coins $c_1$ and $c_{rnd}$ but not for $c_{min}$. For the specified values the Hoeffding bound is 33%, while the probability of the event $\mathbb{P}[\,|\mu - \nu| > \epsilon\,]$ is approximated by the area of the bars 0, 1, 9, and 10. For $c_1$ and $c_{rnd}$ these areas are clearly less than 1/3 of the total area, while for $c_{min}$ that area is more than 90% and exceeds the Hoeffding bound.

More details are given in the sections below.
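The numbers quoted in this conclusion can be verified with a short stdlib computation; for a single fair coin, the bars 0, 1, 9 and 10 correspond to $|\mu - \nu| > 0.3$:

```python
from math import comb, exp

N, eps = 10, 0.3
hoeffding = 2 * exp(-2 * eps**2 * N)                        # ~0.33
# Probability mass of the bars 0, 1, 9 and 10 for a fair coin
p_single = sum(comb(N, n) for n in (0, 1, 9, 10)) / 2**N    # 22/1024 ~ 0.0215
print(round(hoeffding, 3), round(p_single, 4))
```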

# 7 Bernoulli and binomial distributions

Flipping a coin with success rate $\mu$ is represented by a Bernoulli random variable $X_n \sim \mathrm{Ber}(\mu)$, $n = 1, \dots, N$. Its distribution is given by $P(X = 1) = \mu$ and $P(X = 0) = 1 - \mu$. We interpret the outcome Head as 1 and Tail as 0. The mean $E(X)$ and variance $V(X)$ of this distribution are $\mu$ and $\mu(1-\mu)$, respectively.

If we have $N$ independent trials of this experiment then the number of successes in the sequence of $N$ outcomes has a binomial distribution $B(N, \mu)$ where the parameter $N$ is the number of trials and $\mu$ the success rate of the coin. Its mean and variance are $N\mu$ and $N\mu(1-\mu)$, respectively. The sample mean of a sequence of $N$ trials will be represented by $\nu = \#1/N$ where $\#1$ is the number of 1s in the sequence.

The probability that we find $n$ successes in the $N$ trials is given by

$$P(X = n) = \binom{N}{n} \mu^n (1 - \mu)^{N-n} \qquad \text{where} \qquad \binom{N}{n} = \frac{N!}{(N-n)!\,n!}$$
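As a quick sanity check of this formula, the pmf computed term by term should sum to 1 and have mean $N\mu$ (stdlib only; $N = 10$ and $\mu = 0.3$ are arbitrary illustrative values):

```python
from math import comb

N, mu = 10, 0.3
# P(X = n) = C(N, n) * mu^n * (1 - mu)^(N - n)
pmf = [comb(N, n) * mu**n * (1 - mu)**(N - n) for n in range(N + 1)]

total = sum(pmf)                                 # should be 1
mean = sum(n * p for n, p in enumerate(pmf))     # should be N * mu = 3.0
print(round(total, 12), round(mean, 12))
```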

## 7.1 Plot the binomial distribution for N trials and success rate µ

We plot the probability mass function or pmf and the cumulative distribution function or cdf of the binomial distribution for the success rates $\mu$ = 0.1, 0.2, 0.3, 0.4, 0.5 and the given number of trials $N$.

The function below plots either the probability mass function or the cumulative distribution function of the binomial distribution for a given success rate $\mu$ and number of trials $N$.

In : def plot_binomials(nb_trials, cdf=False, success_rates=[0.1, 0.2, 0.3, 0.4, 0.5]):
         fig, ax = plt.figure(), plt.axes()
         colors = ['r', 'g', 'b', 'y', 'k']
         x = np.arange(nb_trials+1)
         for success_rate, color in zip(success_rates, colors):
             binom = ss.binom(n=nb_trials, p=success_rate)
             f = binom.cdf if cdf else binom.pmf
             marker = '{0:s}o--'.format(color)
             ax.plot(x, f(x), marker, label=r'$\mu$ = {0:0.1f}'.format(success_rate))
         ax.set(xlabel=r'$N \nu$', xticks=get_ticks(rv=binom),
                ylabel=r'Probability $p(N \nu|\mu, N)$',
                title=r'Binomial distribution for $N$ = {0:d} trials'.format(nb_trials))
         ax.legend(loc='best')

In : plot_binomials(nb_trials=10, cdf=True)

## 7.2 Evaluating $\mathbb{P}[\,|\mu - \nu| > \epsilon\,]$ when ν has a binomial distribution

1. For a given $\mu$ and $\epsilon$ we evaluate $|\mu - \nu| > \epsilon$ for all $\nu$. Note that we compare $N\nu$ and $N\mu$ instead, and that $n = N\nu$ is the number of heads in the sample. This is always an integer between 0 and $N$.

2. Next we select the numbers of heads $n$ for which $|N\mu - n| > N\epsilon$ is True.

3. Finally we sum the probabilities of these $n$.

In : def make_event_p(mu, eps, N):
         @vectorize([boolean(int64)])
         def event_p(heads):
             return np.abs(N*mu - heads) > N*eps
         return event_p

In : def prob(pmf, event):
         return np.sum(pmf(event))

### 7.2.1 Example evaluation

First,

1. Specify the success rate µ, tolerance ϵ and number of trials N

2. Generate the domain of the binomial distribution: $0, 1, \dots, N$

3. Return the event with the specified parameters

In : mu, eps, N = 0.5, 0.1, 10
     domain = np.arange(N+1)
     event_p = make_event_p(mu, eps, N)

Next,

4. Determine for each element in the sample space whether it belongs to the specified event

5. Select the elements belonging to the event

6. Sum their probabilities using the probability mass function of the binomial distribution

7. This gives the probability of that event

In : selected = event_p(domain)
     bad_event = domain[selected]
     pmf = ss.binom(n=N, p=mu).pmf
     prob(pmf, bad_event)

Out: 0.3437500000
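The same value can be reproduced without scipy by summing the binomial probabilities directly, following steps 1-3 above (stdlib only):

```python
from math import comb

mu, eps, N = 0.5, 0.1, 10
# Sum P(X = n) over all n with |N*mu - n| > N*eps, i.e. n in {0, 1, 2, 3, 7, 8, 9, 10}
p_bad = sum(comb(N, n) * mu**n * (1 - mu)**(N - n)
            for n in range(N + 1)
            if abs(N * mu - n) > N * eps)
print(p_bad)  # 0.34375
```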

# 8 Inequalities

## 8.1 Chebyshev Inequality

The Chebyshev inequality applies to any random variable with finite variance $\sigma^2$, but it provides a much looser bound than Hoeffding when the latter is applicable.

Let $X_1, X_2, \cdots, X_N$ be independent and identically distributed random variables with finite variance $\sigma^2$ and let $\bar{X} = \frac{1}{N}\sum_{n=1}^{N} X_n$ be their sample mean. Then

$$\mathbb{P}[\,|\bar{X} - E(X)| > \epsilon\,] \le \frac{\sigma^2}{N\epsilon^2}$$

Note that finite variance implies finite mean.

## 8.2 Hoeffding Inequality

The Hoeffding inequality states that when the $X_n$ are bounded by the interval $[0, 1]$ then

$$\mathbb{P}[\,|\bar{X} - E(X)| > \epsilon\,] \le 2e^{-2\epsilon^2 N}$$

When the $X_n$ are bounded by the interval $[a, b]$ the inequality becomes

$$\mathbb{P}[\,|\bar{X} - E(X)| > \epsilon\,] \le 2e^{-\frac{2\epsilon^2 N}{(b-a)^2}}$$

For example, $a = 0$ and $b = K$ for $\mathrm{Binom}(K, \mu)$.

## 8.3 Relation between δ, ϵ and N

If we set $\delta$ equal to the upper bound we get

$$\mathbb{P}[\,|\bar{X} - E(X)| > \epsilon\,] \le \delta \quad \text{and} \quad \mathbb{P}[\,|\bar{X} - E(X)| \le \epsilon\,] \ge 1 - \delta$$

for the bad and the good event, respectively.

In case of Chebyshev

$$\delta_C = \frac{\sigma^2}{N\epsilon^2}$$

For i.i.d. Bernoulli random variables with success rate $\mu$ this becomes

$$\delta_C = \frac{\mu(1-\mu)}{N\epsilon^2}$$

since their variance is $\mu(1-\mu)$.

And, in case of the Hoeffding

$$\delta_H = 2e^{-2N\epsilon^2}$$

From these bounds we can derive expressions for $N$ as a function of $\delta$ and $\epsilon$, and for $\epsilon$ as a function of $\delta$ and $N$. For instance, in case of Hoeffding we get

$$N_H = \frac{1}{2\epsilon^2}\ln\frac{2}{\delta} \qquad \text{and} \qquad \epsilon_H = \sqrt{\frac{1}{2N}\ln\frac{2}{\delta}}$$
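A numeric sanity check of $\delta_C$ and $\delta_H$ against the exact deviation probability in the Bernoulli case; the values $\mu = 0.5$, $\epsilon = 0.1$, $N = 200$ are illustrative, chosen so that Hoeffding is the tighter bound:

```python
from math import comb, exp

mu, eps, N = 0.5, 0.1, 200

delta_C = mu * (1 - mu) / (N * eps**2)   # Chebyshev bound: 0.125
delta_H = 2 * exp(-2 * N * eps**2)       # Hoeffding bound: ~0.037

# Exact probability that the sample mean of N Bernoulli(mu) trials
# deviates from mu by more than eps
p_exact = sum(comb(N, n) * mu**n * (1 - mu)**(N - n)
              for n in range(N + 1) if abs(n / N - mu) > eps)

assert p_exact < delta_H < delta_C
```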

In : def get_delta(eps, N):
         """Returns the probability delta as a function of the tolerance epsilon
         and the number of trials N."""
         return 2*np.exp(-2*N*eps**2)

     def get_N(delta, eps):
         """Returns the number of trials N as a function of the probability delta
         and the tolerance epsilon."""
         N = np.log(2/delta)/(2*eps**2)
         return ceil(N)

     def get_eps(delta, N):
         """Returns the tolerance epsilon as a function of the probability delta
         and the number of trials N."""
         eps = np.sqrt(np.log(2/delta)/(2*N))
         return eps
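A short usage sketch of these formulas, using only the math module ($\delta = 0.05$ and $\epsilon = 0.1$ are illustrative values):

```python
from math import ceil, log, sqrt

delta, eps = 0.05, 0.1

# N_H = ceil( ln(2/delta) / (2*eps^2) ): trials needed so the bound is at most delta
N_H = ceil(log(2 / delta) / (2 * eps**2))
print(N_H)  # 185

# Round trip: with N_H trials the achievable tolerance is at most eps
eps_H = sqrt(log(2 / delta) / (2 * N_H))
assert eps_H <= eps
```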

## 8.4 Calculate the Chebyshev bound

For a specified success rate $\mu$, plot the Chebyshev bound $\delta_C(N, \epsilon) = \frac{\mu(1-\mu)}{N\epsilon^2}$ for several values of the tolerance $\epsilon$ and the sample size $N$.

• For given values of ϵ we return the Chebyshev bound as a function of N, and

• For given values of N we return the Chebyshev bound as a function of ϵ

In : def make_Chebyshev_of_N(mu, eps):
         @vectorize([float64(float64)])
         def Chebyshev(N):
             c = mu*(1-mu)/eps**2
             return c/N
         return Chebyshev

     def make_Chebyshev_of_eps(mu, N):
         @vectorize([float64(float64)])
         def Chebyshev(eps):
             c = mu*(1-mu)/N
             return c/eps**2
         return Chebyshev

## 8.5 Calculate the Hoeffding bound

Plot the Hoeffding bound $\delta_H(N, \epsilon) = 2e^{-2\epsilon^2 N}$ for several values of the tolerance $\epsilon$ and the sample size $N$.

• For given values of ϵ we return the Hoeffding bound as a function of N, and

• For given values of N we return the Hoeffding bound as a function of ϵ

In : def make_Hoeffding_of_N(eps):
         @vectorize([float64(float64)])
         def Hoeffding(N):
             return 2*np.exp(-2*N*eps**2)
         return Hoeffding

     def make_Hoeffding_of_eps(N):
         @vectorize([float64(float64)])
         def Hoeffding(eps):
             return 2*np.exp(-2*N*eps**2)
         return Hoeffding

## 8.6 Plot the bounds

Note that in the plots we use logarithmic scale for the x- and/or y-axis.

In : from matplotlib import colors as mcolors
     COLORS = dict(mcolors.BASE_COLORS, **mcolors.CSS4_COLORS)

In : def plot_bounds(mu=0.5,
                     max_N=10**5,
                     eps_vals=[0.1, 0.05, 0.025, 0.01, 0.005, 0.0025, 0.001]):
         global COLORS
         fig, (top, bottom) = plt.subplots(2)

         # Plot the bound as a function of N for a range of values of epsilon
         # using logarithmic scale.
         max_logN = ceil(np.log10(max_N))
         N_range = np.logspace(0, max_logN)
         for eps, color in zip(eps_vals, COLORS):
             eps_pct = 100*eps
             Hoeffding_N, Chebyshev_N = make_Hoeffding_of_N(eps), make_Chebyshev_of_N(mu, eps)
             top.plot(N_range, Chebyshev_N(N_range),
                      color=color, linestyle='--')
             top.plot(N_range, Hoeffding_N(N_range),
                      color=color, label=r'$\epsilon$ = {0:2.2f}%'.format(eps_pct))
         top.set(xscale='log', xlabel=r'Sample size $N$',
                 yscale='log', ylim=(0.1**5, 0.50), ylabel=r'Bound $\delta$',
                 title=r'Hoeffding and Chebyshev (dashed) bound as a function of $N$')
         top.legend(loc='center left', bbox_to_anchor=(1, 0.5))

         # Plot the bound as a function of epsilon for a range of values of N
         eps_range = np.linspace(10**-5, 0.5, 100, endpoint=True)
         exps = np.arange(1, max_logN+1)
         for exp, color in zip(exps, COLORS):
             N = 10**exp
             Hoeffding_eps = make_Hoeffding_of_eps(N=N)
             Chebyshev_eps = make_Chebyshev_of_eps(mu=mu, N=N)
             bottom.plot(eps_range, Chebyshev_eps(eps_range),
                         color=color, linestyle='--')
             bottom.plot(eps_range, Hoeffding_eps(eps_range),
                         color=color, label=r'$N$ = {0:1.1e}'.format(N))
         bottom.set(xlabel=r'Tolerance $\epsilon$',
                    yscale='log', ylim=(10**-5, 0.2), ylabel=r'Bound $\delta$',
                    title=r'Hoeffding and Chebyshev (dashed) bound as a function of $\epsilon$')
         bottom.legend(loc='center left', bbox_to_anchor=(1, 0.5))

         # To avoid that the text of the subplots overlaps.
         fig.tight_layout()

## 8.7 Conclusion

The figures above show that the Chebyshev bound is indeed looser than the Hoeffding bound for sufficiently large N, cf. the top figure.

# 9 Testing the Hoeffding bound

We test the bound on

• Distributions with bounded range for which the Hoeffding inequality holds. Without loss of generality we can assume that the range is the closed interval [0, 1],

• Other distributions, like the Gaussian and Poisson distributions, that don't meet these conditions, so the Hoeffding inequality does not necessarily hold.

The test is done as follows:

• We generate $N$ samples of a given distribution, calculate the sample mean $\bar{X} = \frac{1}{N}\sum_{n=1}^{N} X_n$, and compare it with the true mean $\mu$.

• We record the outcome of $|\bar{X} - \mu| > \epsilon$. Note that this is a Bernoulli random variable $Y$ with unknown mean $\mu_Y$ and unknown variance $\sigma_Y^2 = \mu_Y(1 - \mu_Y)$.

• We repeat this experiment $K$ times and collect the outcomes.

• The relative frequency of the outcome True is an estimate $\hat{\mu}_Y$ of the true probability $\mathbb{P}[\,|\bar{X} - \mu| > \epsilon\,]$.

• The standard error of this estimate is $se_Y = \hat{\sigma}_Y / \sqrt{K}$.

• We plot the estimate with its error bars and compare it to the Hoeffding bound.

In : def estimate_bad(distr, eps, N, nb_exps):
         '''Input: probability distribution, the tolerance eps, the number of trials N,
         and the number of times we repeat this experiment.
         Output: estimate of the probability with standard error based on the number
         of experiments.'''
         mean = distr.mean()
         delta = get_delta(eps, N)
         samples = distr.rvs(size=(N, nb_exps))
         avgs = np.average(samples, axis=0)
         outcomes = np.abs(avgs - mean) > eps
         avg_outcome = np.average(outcomes)
         se = np.sqrt(avg_outcome*(1-avg_outcome)/nb_exps)
         return avg_outcome, se, delta
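A stripped-down version of this estimator for a single distribution, using numpy's uniform generator in place of scipy's rvs (the sizes and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)
eps, N, nb_exps = 0.05, 200, 2000

samples = rng.uniform(size=(N, nb_exps))   # support bounded in [0, 1]
avgs = samples.mean(axis=0)
outcomes = np.abs(avgs - 0.5) > eps        # true mean of U(0, 1) is 1/2
estimate = outcomes.mean()
se = np.sqrt(estimate * (1 - estimate) / nb_exps)

hoeffding = 2 * np.exp(-2 * N * eps**2)    # = 2/e ~ 0.74
assert estimate <= hoeffding
```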

In : def calculate_estimates(distr, eps, nb_exps, N_range):
         size_domain = len(N_range)
         estimates = np.zeros(size_domain, dtype=np.float64)
         standard_errors = np.zeros(size_domain, dtype=np.float64)
         deltas = np.zeros(size_domain, dtype=np.float64)
         for n, N in enumerate(N_range):
             estimates[n], standard_errors[n], deltas[n] = estimate_bad(distr, eps, N, nb_exps)
         return estimates, standard_errors, deltas

## 9.1 Visualization of the bound

The default is to use logarithmic scale for both the x- and y-axis.

In : def plot_estimates(eps, nb_exps, N_range, logscale=True, bounded_range_p=True,
                        conf_factor=2):
         global RVs
         fig, ax = plt.figure(), plt.axes()
         if bounded_range_p:
             rvs = [rv for rv in RVs if bounded_p(rv)]
         else:
             rvs = [rv for rv in RVs if not bounded_p(rv)]
         for rv, color in zip(rvs, COLORS):
             # Plot the estimates for each distribution
             estimates, standard_errors, deltas = calculate_estimates(rv, eps, nb_exps, N_range)
             ax.plot(N_range, estimates, color=color, label=rv.dist.name)
             # Plot the error bars
             confidences = conf_factor*standard_errors
             ax.fill_between(N_range, estimates - confidences, estimates + confidences,
                             color='grey', alpha=0.2)
         # Plot the Hoeffding bound
         ax.plot(N_range, deltas, 'k--', label=r'Hoeffding bound $\delta$')
         if logscale:
             ax.set_xscale('log', basex=2)
             ax.set_yscale('log', basey=2)
             ax.set_ylim(2**-10, 2**0)
         s = 'Bounded' if bounded_range_p else 'Unbounded'
         ax.set(xlabel=r'$N$',
                ylabel=r'$\mathbb{P}\left[ (\mid \mu - \nu \mid ) > \epsilon \right]$',
                title=r'Hoeffding Test for Distributions with {0:s} Support'.format(s))
         ax.legend(loc='best')

In : def generate_N_range(exponential=False, num=10, max_logN=20):
         assert type(max_logN) is int, 'Argument max_logN has to be an integer between 0 and 20'
         if exponential:
             return np.array(2**(np.arange(max_logN+1)), dtype=np.int64)
         else:
             return np.linspace(start=1, stop=2**max_logN + 1, num=num, dtype=np.int64)

## 9.2 Visualize Hoeffding test for distributions with and without bounded range

In : for flag in (True, False):
         plot_estimates(eps=0.025,
                        nb_exps=100,
                        N_range=generate_N_range(exponential=True, max_logN=14),
                        bounded_range_p=flag)

## 9.3 Conclusion

The above two figures show that the Hoeffding bound always holds for distributions with bounded range, cf. the top figure. In contrast, the bottom figure shows that this bound does not always hold when a distribution has unbounded range.

Note that we have restricted ourselves to the range [0, 1] but the Hoeffding bound can be generalized to any bounded range [a, b], cf. the section on the Hoeffding inequality above.

# 10 Overall conclusion

To Be Added, repeat the structure of the Introduction.  