Class GaussianDistribution
- All Implemented Interfaces: Serializable, Distribution, ExponentialFamily
The family of normal distributions is closed under linear transformations. That is, if X is normally distributed, then a linear transform aX + b (for some real numbers a ≠ 0 and b) is also normally distributed. If X1 and X2 are two independent normal random variables, then any linear combination of them is also normally distributed. The converse is also true: if X1 and X2 are independent and their sum X1 + X2 is normally distributed, then both X1 and X2 must also be normal; this is known as Cramér's theorem. Of all probability distributions over the real domain with mean μ and variance σ², the normal distribution N(μ, σ²) is the one with the maximum entropy.
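As a worked illustration of the closure property, the sketch below computes the parameters of aX + b and of X1 + X2 for independent normal variables. The class NormalParams and its methods are hypothetical helpers written for this example; they are not part of this class.

```java
// Hypothetical helper, not part of the library: holds the (mean, sd) of a normal variable.
final class NormalParams {
    final double mu;
    final double sigma;

    NormalParams(double mu, double sigma) {
        this.mu = mu;
        this.sigma = sigma;
    }

    // aX + b ~ N(a*mu + b, a^2 * sigma^2), for a != 0.
    NormalParams linear(double a, double b) {
        if (a == 0.0) throw new IllegalArgumentException("a must be nonzero");
        return new NormalParams(a * mu + b, Math.abs(a) * sigma);
    }

    // X1 + X2 ~ N(mu1 + mu2, sigma1^2 + sigma2^2) when X1 and X2 are independent.
    NormalParams plus(NormalParams other) {
        return new NormalParams(mu + other.mu, Math.hypot(sigma, other.sigma));
    }
}
```

For example, with X ~ N(1, 2²), the transform -3X + 4 has mean 1 and standard deviation 6.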
The central limit theorem states that under certain, fairly common conditions, the sum of a large number of random variables is approximately normally distributed. For example, if X1, …, Xn is a sequence of iid random variables, each having mean μ and variance σ² but an otherwise arbitrary distribution, then the central limit theorem states that, as n → ∞,
√n ((1/n) Σ Xi - μ) → N(0, σ²) in distribution.
The theorem holds even if the summands Xi are not iid, although some constraints on the degree of dependence and the growth rate of moments still have to be imposed.
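The iid case of the central limit theorem can be checked numerically. The following is a minimal simulation sketch, assuming Uniform(0, 1) summands (mean 0.5, variance 1/12); the class name, method names, and parameters are illustrative and not part of this library.

```java
import java.util.Random;

// Illustrative simulation only; names and parameters are hypothetical.
final class CltSketch {
    // Returns replicates of sqrt(n) * (sample mean - mu) for Uniform(0, 1) summands,
    // where mu = 0.5 and sigma^2 = 1/12.
    static double[] scaledMeans(int n, int replicates, long seed) {
        Random rng = new Random(seed);
        double[] z = new double[replicates];
        for (int r = 0; r < replicates; r++) {
            double sum = 0.0;
            for (int i = 0; i < n; i++) sum += rng.nextDouble();
            z[r] = Math.sqrt(n) * (sum / n - 0.5);
        }
        return z;
    }

    // Variance of the values, dividing by the number of values.
    static double variance(double[] v) {
        double mean = 0.0;
        for (double x : v) mean += x;
        mean /= v.length;
        double ss = 0.0;
        for (double x : v) ss += (x - mean) * (x - mean);
        return ss / v.length;
    }
}
```

With a few thousand replicates, the variance of the scaled means should be close to σ² = 1/12, and a histogram of them should look bell-shaped even though the summands are uniform.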
Therefore, certain other distributions can be approximated by the normal distribution, for example:
- The binomial distribution B(n, p) is approximately normal N(np, np(1-p)) for large n and for p not too close to zero or one.
- The Poisson(λ) distribution is approximately normal N(λ, λ) for large values of λ.
- The chi-squared distribution χ²(k) is approximately normal N(k, 2k) for large k.
- The Student's t-distribution t(ν) is approximately normal N(0, 1) when ν is large.
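To make the first approximation concrete, the sketch below compares the exact binomial probability mass at k with the N(np, np(1-p)) density at the same point. It is an illustration only, assuming 0 < p < 1; the class and method names are hypothetical.

```java
// Illustrative comparison; class and method names are hypothetical.
final class BinomialNormalSketch {
    // Exact binomial pmf P(K = k) for B(n, p), computed in log space to avoid overflow.
    static double binomialPmf(int n, int k, double p) {
        if (p <= 0.0 || p >= 1.0) throw new IllegalArgumentException("p must be in (0, 1)");
        double logC = 0.0; // log of the binomial coefficient C(n, k)
        for (int i = 1; i <= k; i++) {
            logC += Math.log((double) (n - k + i) / i);
        }
        return Math.exp(logC + k * Math.log(p) + (n - k) * Math.log1p(-p));
    }

    // Density of N(np, np(1-p)) evaluated at k.
    static double normalApprox(int n, int k, double p) {
        double mean = n * p;
        double var = n * p * (1 - p);
        double d = k - mean;
        return Math.exp(-d * d / (2 * var)) / Math.sqrt(2 * Math.PI * var);
    }
}
```

For n = 1000 and p = 0.5 the two values at k = 500 agree to well within one percent; the agreement degrades as n shrinks or as p approaches zero or one.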
Field Summary
-
Constructor Summary
-
Method Summary
Modifier and Type, Method, Description:
- double cdf(double x): Cumulative distribution function.
- double entropy(): Returns the Shannon entropy of the distribution.
- static GaussianDistribution fit(double[] data): Estimates the distribution parameters by MLE.
- static GaussianDistribution getInstance(): Returns the standard normal distribution.
- double inverseCDF(): Generates a Gaussian random number with the inverse CDF method.
- int length(): Returns the number of parameters of the distribution.
- double logp(double x): The density at x in log scale, which may prevent underflow.
- M(double[] x, double[] posteriori): The M step in the EM algorithm, which depends on the specific distribution.
- double mean(): Returns the mean of the distribution.
- double p(double x): The probability density function (for a continuous distribution) or probability mass function (for a discrete distribution) at x.
- double quantile(double p): The quantile function: the probability to the left of quantile(p) is p.
- double rand(): Generates a Gaussian random number with the Box-Muller algorithm.
- double sd(): Returns the standard deviation of the distribution.
- toString()
- double variance(): Returns the variance of the distribution.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface smile.stat.distribution.Distribution
inverseTransformSampling, likelihood, logLikelihood, quantile, quantile, rand, rejectionSampling
-
Field Details
-
mu
public final double mu
The mean.
-
sigma
public final double sigma
The standard deviation.
-
-
Constructor Details
-
GaussianDistribution
public GaussianDistribution(double mu, double sigma)
Constructor.
- Parameters:
mu - the mean.
sigma - the standard deviation.
-
-
Method Details
-
fit
Estimates the distribution parameters by MLE.
- Parameters:
data - the training data.
- Returns:
the distribution.
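For reference, the textbook maximum-likelihood estimates for a normal sample are the sample mean and the square root of the mean squared deviation (dividing by n). The sketch below implements those formulas; it is not the library's code, and whether fit divides by n or n - 1 is not stated here. The class name and the array return value are hypothetical.

```java
// Textbook MLE for a normal sample; a sketch, not the library's implementation.
final class GaussianMleSketch {
    // Returns {mu, sigma}, where sigma uses the 1/n (maximum-likelihood) variance.
    static double[] fit(double[] data) {
        if (data.length == 0) throw new IllegalArgumentException("empty data");
        double mu = 0.0;
        for (double x : data) mu += x;
        mu /= data.length;
        double ss = 0.0; // sum of squared deviations from the mean
        for (double x : data) ss += (x - mu) * (x - mu);
        double sigma = Math.sqrt(ss / data.length);
        return new double[] {mu, sigma};
    }
}
```

For the sample {2, 4, 4, 4, 5, 5, 7, 9} this gives a mean of 5 and a standard deviation of 2.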
-
getInstance
Returns the standard normal distribution.
- Returns:
the standard normal distribution.
-
length
public int length()
Description copied from interface: Distribution
Returns the number of parameters of the distribution. The "length" is in the sense of the minimum description length principle.
- Specified by:
length in interface Distribution
- Returns:
the number of parameters.
-
mean
public double mean()
Description copied from interface: Distribution
Returns the mean of the distribution.
- Specified by:
mean in interface Distribution
- Returns:
the mean.
-
variance
public double variance()
Description copied from interface: Distribution
Returns the variance of the distribution.
- Specified by:
variance in interface Distribution
- Returns:
the variance.
-
sd
public double sd()
Description copied from interface: Distribution
Returns the standard deviation of the distribution.
- Specified by:
sd in interface Distribution
- Returns:
the standard deviation.
-
entropy
public double entropy()
Description copied from interface: Distribution
Returns the Shannon entropy of the distribution.
- Specified by:
entropy in interface Distribution
- Returns:
the Shannon entropy.
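For a normal distribution the (differential) entropy has the closed form ½ ln(2πeσ²), which depends only on σ. The sketch below evaluates that formula in nats; the choice of natural logarithm and the class name are assumptions made for this example, not details taken from the library.

```java
// Closed-form entropy of a normal distribution; a sketch, not the library's code.
final class GaussianEntropySketch {
    // Differential entropy of N(mu, sigma^2) in nats: 0.5 * ln(2 * pi * e * sigma^2).
    static double entropy(double sigma) {
        if (sigma <= 0.0) throw new IllegalArgumentException("sigma must be positive");
        return 0.5 * Math.log(2 * Math.PI * Math.E * sigma * sigma);
    }
}
```

Doubling σ adds ln 2 to the entropy, and the mean does not enter at all.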
-
toString
-
rand
public double rand()
Generates a Gaussian random number with the Box-Muller algorithm.
- Specified by:
rand in interface Distribution
- Returns:
a random number.
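The Box-Muller transform turns two independent uniform draws into a standard normal draw, which is then shifted and scaled. The sketch below uses the basic trigonometric form with java.util.Random; the library may well use a different variant (for example the polar form), so treat this as an illustration of the idea rather than its implementation. The class name is hypothetical.

```java
import java.util.Random;

// Basic Box-Muller sketch; not the library's implementation.
final class BoxMullerSketch {
    private final Random rng;

    BoxMullerSketch(long seed) {
        this.rng = new Random(seed);
    }

    // Draws one value from N(mu, sigma^2) using two uniform variates.
    double next(double mu, double sigma) {
        double u1 = 1.0 - rng.nextDouble(); // in (0, 1], so log(u1) is finite
        double u2 = rng.nextDouble();
        double z = Math.sqrt(-2.0 * Math.log(u1)) * Math.cos(2.0 * Math.PI * u2);
        return mu + sigma * z;
    }
}
```

Over many draws, the sample mean and standard deviation should approach mu and sigma.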
-
inverseCDF
public double inverseCDF()
Generates a Gaussian random number with the inverse CDF method.
- Returns:
a random number.
-
p
public double p(double x)
Description copied from interface: Distribution
The probability density function (for a continuous distribution) or probability mass function (for a discrete distribution) at x.
- Specified by:
p in interface Distribution
- Parameters:
x - a real number.
- Returns:
the density.
-
logp
public double logp(double x)
Description copied from interface: Distribution
The density at x in log scale, which may prevent underflow.
- Specified by:
logp in interface Distribution
- Parameters:
x - a real number.
- Returns:
the log density.
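The underflow problem is easy to reproduce: far in the tails the density is smaller than the smallest positive double, so it evaluates to exactly 0, while its logarithm remains an ordinary finite number. The sketch below shows this with the standard formulas; the class and method names are hypothetical and independent of the library.

```java
// Log-density versus density of a normal distribution; a sketch, not the library's code.
final class LogDensitySketch {
    private static final double LOG_SQRT_2PI = 0.5 * Math.log(2 * Math.PI);

    // Log of the N(mu, sigma^2) density at x.
    static double logp(double x, double mu, double sigma) {
        double z = (x - mu) / sigma;
        return -0.5 * z * z - Math.log(sigma) - LOG_SQRT_2PI;
    }

    // Density obtained by exponentiating; underflows to 0 far in the tails.
    static double p(double x, double mu, double sigma) {
        return Math.exp(logp(x, mu, sigma));
    }
}
```

At x = 40 under the standard normal, p returns 0.0 while logp is about -800.9, so sums of log densities (log-likelihoods) stay usable where products of densities would collapse to zero.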
-
cdf
public double cdf(double x)
Description copied from interface: Distribution
Cumulative distribution function. That is, the probability to the left of x.
- Specified by:
cdf in interface Distribution
- Parameters:
x - a real number.
- Returns:
the probability.
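The normal CDF has no elementary closed form; it is usually written as ½(1 + erf((x - μ) / (σ√2))). Since the Java standard library has no erf, the sketch below substitutes the Abramowitz-Stegun 7.1.26 polynomial approximation (absolute error around 1.5e-7). This is a stand-in chosen for the example and is not claimed to be how the library computes cdf; the class name is hypothetical.

```java
// Normal CDF via an erf approximation; a sketch, not the library's implementation.
final class NormalCdfSketch {
    // Abramowitz-Stegun 7.1.26 approximation of erf, absolute error about 1.5e-7.
    static double erf(double x) {
        double sign = x < 0 ? -1.0 : 1.0;
        double ax = Math.abs(x);
        double t = 1.0 / (1.0 + 0.3275911 * ax);
        double poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
                - 0.284496736) * t + 0.254829592) * t;
        return sign * (1.0 - poly * Math.exp(-ax * ax));
    }

    // P(X <= x) for X ~ N(mu, sigma^2).
    static double cdf(double x, double mu, double sigma) {
        return 0.5 * (1.0 + erf((x - mu) / (sigma * Math.sqrt(2.0))));
    }
}
```

For the standard normal this gives roughly 0.5 at x = 0 and roughly 0.975 at x = 1.96.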
-
quantile
public double quantile(double p)
The quantile function: the probability to the left of quantile(p) is p. This is the inverse of cdf.
The original algorithm and a Perl implementation can be found at this page.
- Specified by:
quantile in interface Distribution
- Parameters:
p - the probability.
- Returns:
the quantile.
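One widely circulated way to invert the standard normal CDF is a piecewise rational approximation commonly attributed to Peter J. Acklam. Whether that is the algorithm referred to above is an assumption, and the coefficients below should be checked against a trusted source before being relied on. The sketch returns the standard normal quantile; a general N(μ, σ²) quantile is μ + σ times that value. The class name is hypothetical.

```java
// Rational approximation of the standard normal quantile; a sketch under the
// assumptions stated above, not the library's implementation.
final class NormalQuantileSketch {
    private static final double[] A = {
        -3.969683028665376e+01, 2.209460984245205e+02, -2.759285104469687e+02,
        1.383577518672690e+02, -3.066479806614716e+01, 2.506628277459239e+00};
    private static final double[] B = {
        -5.447609879822406e+01, 1.615858368580409e+02, -1.556989798598866e+02,
        6.680131188771972e+01, -1.328068155288572e+01};
    private static final double[] C = {
        -7.784894002430293e-03, -3.223964580411365e-01, -2.400758277161838e+00,
        -2.549732539343734e+00, 4.374664141464968e+00, 2.938163982698783e+00};
    private static final double[] D = {
        7.784695709041462e-03, 3.224671290700398e-01, 2.445134137142996e+00,
        3.754408661907416e+00};
    private static final double P_LOW = 0.02425; // boundary between tail and central regions

    static double standardQuantile(double p) {
        if (p <= 0.0 || p >= 1.0) throw new IllegalArgumentException("p must be in (0, 1)");
        if (p < P_LOW) return tail(Math.sqrt(-2.0 * Math.log(p)));              // lower tail
        if (p > 1.0 - P_LOW) return -tail(Math.sqrt(-2.0 * Math.log(1.0 - p))); // upper tail
        double q = p - 0.5; // central region
        double r = q * q;
        double num = (((((A[0] * r + A[1]) * r + A[2]) * r + A[3]) * r + A[4]) * r + A[5]) * q;
        double den = ((((B[0] * r + B[1]) * r + B[2]) * r + B[3]) * r + B[4]) * r + 1.0;
        return num / den;
    }

    // Rational function used in both tails; negative for the lower tail.
    private static double tail(double q) {
        double num = ((((C[0] * q + C[1]) * q + C[2]) * q + C[3]) * q + C[4]) * q + C[5];
        double den = (((D[0] * q + D[1]) * q + D[2]) * q + D[3]) * q + 1.0;
        return num / den;
    }

    // Quantile of N(mu, sigma^2).
    static double quantile(double p, double mu, double sigma) {
        return mu + sigma * standardQuantile(p);
    }
}
```

Feeding uniform random numbers in (0, 1) through such a quantile function is the inverse CDF sampling method.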
-
M
Description copied from interface: ExponentialFamily
The M step in the EM algorithm, which depends on the specific distribution.
- Specified by:
M in interface ExponentialFamily
- Parameters:
x - the input data for estimation.
posteriori - the posteriori probability.
- Returns:
the (unnormalized) weight of this distribution in the mixture.
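For a Gaussian mixture component, the standard M step re-estimates the mean and standard deviation as posterior-weighted averages, and the sum of the posteriors serves as the unnormalized mixture weight. The sketch below shows that computation; the return type of M is not shown above, so the array layout {weight, mu, sigma} and the class name are assumptions made for this example.

```java
// Posterior-weighted M step for one Gaussian component; a sketch, not the library's code.
final class GaussianMStepSketch {
    // Returns {weight, mu, sigma}; weight is the sum of posteriors (unnormalized).
    static double[] mStep(double[] x, double[] posteriori) {
        if (x.length != posteriori.length) throw new IllegalArgumentException("length mismatch");
        double alpha = 0.0;
        double mean = 0.0;
        for (int i = 0; i < x.length; i++) {
            alpha += posteriori[i];
            mean += x[i] * posteriori[i];
        }
        mean /= alpha; // weighted mean
        double var = 0.0;
        for (int i = 0; i < x.length; i++) {
            double d = x[i] - mean;
            var += d * d * posteriori[i];
        }
        var /= alpha; // weighted variance
        return new double[] {alpha, mean, Math.sqrt(var)};
    }
}
```

With x = {0, 2, 4} and posteriors {0.5, 1.0, 0.5}, the weight is 2, the mean is 2, and the standard deviation is √2.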
-