Statistics with Euler Math Toolbox

Content

EMT contains many statistical distributions, tests, plots, and functions for reading and writing data. For examples and more functions, see the following introduction notebook.

12 - Statistics

Statistical functions.

Random Variables

Euler has a reliable random number generator. It can be used to create random variables for many distributions. If you need a fixed sequence, set a seed value with seed(x). Otherwise, the time (in seconds) at the start of the current Euler session is used as the seed.

The newer functions for creating random variables start with rand...(), replacing older functions with less uniform naming.

function comment seed (x)

  Set the seed for the random numbers
  
  After setting a seed, all random numbers will be determined from
  the seed.

function comment random (n,m)

  Uniformly distributed random variables in [0,1]
  
  random() : One random variable
  random(n,m) : Matrix of random variables
  random(n) : Row vector of random variables
  random([n,m]) : Matrix of random variables
  
  The function fastrandom is a quicker, but less reliable,
  alternative.
  
  See: 
  intrandom (Statistics with Euler Math Toolbox), 
  normal (Statistics with Euler Math Toolbox)

function randuniform (n : index, m : index, a : number, b : number)

  Uniformly distributed random samples in the interval (a,b)
  
  See: 
  random (Statistics with Euler Math Toolbox), 
  random (Maxima Documentation)

function comment intrandom (n,m,k)

  Integer random variables in {1,...,k}
  
  intrandom(k) : One random variable
  intrandom(n,m,k) : Matrix of random variables
  intrandom(n,k) : Row vector of random variables
  intrandom([n,m],k) : Matrix of random variables
  
  See: 
  random (Maxima Documentation)

function randint (n,m,k=none)

  Integer random variables in {1,...,k}
  
  randint(n,m,k) : Matrix of random variables
  randint(n,k) : Vector of random variables

function comment normal (n,m)

  Standard normal (mean 0, deviation 1) random variables
  
  normal() : One random variable
  normal(n,m) : Matrix of random variables
  normal(n) : Row vector of random variables
  normal([n,m]) : Matrix of random variables
  
  The function fastnormal() is a quicker, but less reliable,
  alternative.

function randnormal (n : index, m : index,
    mean : number = 0, stdev : nonnegative number = 1)

  Random samples from a normal (Gaussian) distribution

The following distributions are based on Julia code by John D. Cook.

function randmatrix (n:index, m:index=none, f$:string)

  Apply the random generator f$ to generate a matrix.

function randexponential (n : index, m : index=none,
    mean : positive number = 1)

  Random matrix from an exponential distribution
  
  randexponential(n,m) : mean=1
  randexponential(n,m,mean) : nxm matrix
  randexponential(n,mean=v) : vector with mean=v
  
  See: 
  randnormal (Statistics with Euler Math Toolbox), 
  randuniform (Statistics with Euler Math Toolbox)
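A common way to draw exponential samples is inverse transform sampling: if U is uniform on (0,1], then -mean*log(U) is exponentially distributed with the given mean. The Python sketch below illustrates this technique with the standard `random` module. It is an assumption that randexponential works this way, and the name `rand_exponential` is hypothetical.

```python
import math
import random

def rand_exponential(n, m=1, mean=1.0, rng=random):
    """Hypothetical stand-in: n x m matrix (list of rows) of exponential samples.

    Inverse transform sampling: for U uniform on (0, 1],
    -mean*log(U) has an exponential distribution with the given mean.
    """
    # 1 - rng.random() lies in (0, 1], so log() never receives 0
    return [[-mean * math.log(1.0 - rng.random()) for _ in range(m)]
            for _ in range(n)]

rng = random.Random(42)            # fixed seed, similar in spirit to seed(x)
sample = rand_exponential(1, 20000, mean=2.0, rng=rng)[0]
print(sum(sample) / len(sample))   # should be close to 2
```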

function randgamma (n : index, m : index = none,
    shape : nonnegative number=1, scale : nonnegative number=1)

  Random samples from a gamma distribution
  
  Implementation based on "A Simple Method for Generating Gamma Variables"
  by George Marsaglia and Wai Wan Tsang.
  ACM Transactions on Mathematical Software
  Vol 26, No 3, September 2000, pages 363-372.
  
  Example:
  
  >k=10; theta=2;
  >x=randgamma(10000,shape=k,scale=theta);
  >plot2d(x,>distribution); ...
  >plot2d("x^(k-1)*exp(-x/theta)/theta^k/gamma(k)", ...
  >   >add,color=blue,thickness=2):
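The Marsaglia-Tsang method cited above can be sketched in Python as follows. This illustrates the published algorithm and is not EMT's source; `rand_gamma` is a hypothetical name. For shape < 1 it uses the usual boost, drawing with shape+1 and multiplying by U^(1/shape).

```python
import math
import random

def rand_gamma(shape, scale=1.0, rng=random):
    """One Gamma(shape, scale) sample via Marsaglia and Tsang (2000).

    Hypothetical Python stand-in, not EMT's implementation.
    """
    if shape <= 0:
        raise ValueError("shape must be positive in this sketch")
    if shape < 1:
        # Boost: Gamma(a) = Gamma(a+1) * U^(1/a)
        u = 1.0 - rng.random()              # in (0, 1]
        return rand_gamma(shape + 1.0, scale, rng) * u ** (1.0 / shape)
    d = shape - 1.0 / 3.0
    c = 1.0 / math.sqrt(9.0 * d)
    while True:
        x = rng.gauss(0.0, 1.0)
        v = 1.0 + c * x
        if v <= 0:
            continue                        # reject and draw again
        v = v * v * v
        u = 1.0 - rng.random()              # in (0, 1], safe for log()
        # Fast squeeze test first, then the exact acceptance test
        if u < 1.0 - 0.0331 * x ** 4:
            return d * v * scale
        if math.log(u) < 0.5 * x * x + d * (1.0 - v + math.log(v)):
            return d * v * scale

rng = random.Random(1)
k, theta = 10, 2
xs = [rand_gamma(k, theta, rng) for _ in range(20000)]
print(sum(xs) / len(xs))   # the mean of Gamma(k, theta) is k*theta = 20
```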

function randchi (n : index, m : index, dof : index)

  Random samples from a chi-squared distribution

function randinversegamma (n : index, m : index,
    shape : positive number, scale : positive number)

  Random samples from an inverse gamma distribution

function randweibull (n : index, m : index,
    shape : positive number, scale : positive number)

  Random samples from a Weibull distribution

function randcauchy (n : index, m : index,
    mean : number=0, scale : positive number=1)

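  Random samples from a Cauchy distribution
  
  The Cauchy distribution has no finite mean; the parameter mean acts
  as the location parameter (the median).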

function randt (n : index, m : index,
    dof : positive integer)

  Random samples from a Student-t distribution
  
  See Seminumerical Algorithms by Knuth

function randlaplace (n : index, m : index,
    mean : number, scale : positive number)

  Random samples from a Laplace distribution
  The Laplace distribution is also known as the double exponential distribution.

function randlognormal (n : index, m : index,
    mu : number, sigma : positive number)

  Random samples from a log-normal distribution

function randbeta (n : index, m : index,
    a: positive number, b : positive number)

  Random samples from a Beta distribution
  
  There are more efficient methods for generating beta samples.
  However, such methods are only a little more efficient and much more complicated.
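One simple way to generate Beta(a,b) samples is the ratio X/(X+Y) of two independent gamma variables X ~ Gamma(a) and Y ~ Gamma(b). That randbeta uses this ratio is an assumption. The Python sketch below relies on `random.gammavariate`; `rand_beta` is a hypothetical name.

```python
import random

def rand_beta(n, m, a, b, rng=random):
    """Hypothetical stand-in: n x m Beta(a, b) samples via X/(X+Y)."""
    def one():
        x = rng.gammavariate(a, 1.0)   # X ~ Gamma(a, 1)
        y = rng.gammavariate(b, 1.0)   # Y ~ Gamma(b, 1)
        return x / (x + y)
    return [[one() for _ in range(m)] for _ in range(n)]

rng = random.Random(7)
sample = rand_beta(1, 20000, 2.0, 3.0, rng)[0]
print(sum(sample) / len(sample))   # the mean of Beta(2,3) is 2/5
```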

Statistical Distributions

Euler has a lot of routines to generate random numbers (named "rand..."), like the built-in functions random and normal. Moreover, Euler has functions for distributions ("...dis") and their densities ("q..."). For examples, see the following introduction notebook.

12 - Statistics

This file provides more distributions, random numbers, and tests.

function comment bindis (k:natural, n:natural, p:number)

  Cumulative binomial distribution
  
  Binomial distribution for i<=k out of n with probability p.
  
  From AlgLib.

function map binsum (k:natural, n:natural, p:number)

  Binomial sum for getting i<=k hits out of n runs with probability p.
  
  Uses an actual summation to compute the binomial sum. bindis() is
  faster.
  
  See: 
  bindis (Statistics with Euler Math Toolbox), 
  normalsum (Statistics with Euler Math Toolbox)
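The explicit summation that binsum performs can be written directly with binomial coefficients. The Python sketch below is a hypothetical stand-in with the same argument order; it reproduces the value of bindis(4,10,0.6) shown in the invbindis example.

```python
from math import comb

def binsum(k, n, p):
    """P(at most k hits in n independent runs with hit probability p).

    Hypothetical Python stand-in for EMT's binsum.
    """
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

print(binsum(4, 10, 0.6))   # about 0.1662386176
```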

function map invbindis (px:number, n:natural, p:number)

  Inverse cumulative binomial distribution
  
  Finds k such that the probability of i<=k out of n is just more
  than px. The result may not be an integer; in that case, use
  k=floor(result). A bisection method is used.
  
  >bindis(4,10,0.6), invbindis(%,10,0.6)
  0.1662386176
  4

function comment bincdis (k,n,p)

  Complementary cumulative binomial distribution
  
  Complement of the binomial distribution for i<=k out of n with
  probability p, i.e., the probability of i>k.
  
  From AlgLib.

function comment invpbindis (k,n,px)

  Inverse (for p) cumulative binomial distribution
  
  Solves px=bindis(k,n,p) for p. Assumes integer k and n.
  
  From AlgLib.

function overwrite normaldis (x : real, mean : real = 0, dev : real = 1)

  Cumulative normal distribution
  
  This function calls the built-in _normaldis(x) with adjusted mean
  and standard deviation.

function overwrite invnormaldis (p : real, mean : real = 0, dev : real = 1)

  Inverse of cumulative normal distribution
  
  This function calls the built-in _invnormaldis(p) and adjusts the
  mean and the standard deviation.

function comment erf (x)

  Gauss error function
  
  This is the integral of exp(-t^2)/sqrt(pi) from -x to x (from
  AlgLib). It is connected to normaldis() via
  2*normaldis(sqrt(2)*x)-1=erf(x).
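The identity between erf and the standard normal distribution can be checked numerically. The Python sketch uses `math.erf` and `statistics.NormalDist` (Python 3.8+) as stand-ins for erf and normaldis; `via_normaldis` is a hypothetical helper.

```python
import math
from statistics import NormalDist

std = NormalDist(0.0, 1.0)          # plays the role of normaldis(x)

def via_normaldis(x):
    """erf(x) computed as 2*normaldis(sqrt(2)*x)-1."""
    return 2 * std.cdf(math.sqrt(2) * x) - 1

for x in [-2.0, -0.5, 0.0, 0.3, 1.0, 2.5]:
    print(x, via_normaldis(x), math.erf(x))
```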

function comment erfc (x)

  Complementary Gauss error function
  
  1-erf(x)

function normalsum (i:natural, n:natural, p:number)

  Probability of getting i or fewer hits in n runs with probability p.
  
  Works like binsum, but is much faster for large n and medium p.
  
  See: 
  binsum (Statistics with Euler Math Toolbox)
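A standard fast approximation of a binomial sum for large n is the normal approximation with continuity correction, P(X <= i) ≈ Φ((i + 0.5 - np)/sqrt(np(1-p))). Whether normalsum uses exactly this correction is an assumption. The Python sketch below (hypothetical names) compares it with the exact sum.

```python
import math
from math import comb
from statistics import NormalDist

def normalsum_approx(i, n, p):
    """Normal approximation with continuity correction of P(X <= i), X ~ Bin(n, p)."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    return NormalDist(mu, sigma).cdf(i + 0.5)

def binsum_exact(i, n, p):
    """Exact binomial sum, for comparison."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(i + 1))

print(normalsum_approx(500, 1000, 0.5), binsum_exact(500, 1000, 0.5))
```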

function map hypergeomsum (i:natural, n:natural, itot:natural, ntot:natural)

  Hypergeometric sum.
  
  This is the probability of getting i or fewer hits if n objects are
  picked randomly from an urn containing ntot objects, itot of which
  are good.
  
  i : we want i or fewer hits among the n picked objects
  n : number of randomly picked objects
  itot : total number of good objects
  ntot : total number of objects
  
  Examples:
  >1-hypergeomsum(7,13,13,52) // 8 or more spades in Bridge
  0.00126372228099
  >columnsplot(hypergeomsum(0:20,20,20,40),lab=0:20):
  >hypergeomsum(4,20,20,40), 1-hypergeomsum(15,20,20,40)
  0.000179983683393
  0.000179983683393
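The hypergeometric sum adds the probabilities C(itot,k)*C(ntot-itot,n-k)/C(ntot,n) for k = 0,...,i. The Python sketch below is a hypothetical stand-in with the same argument order; it reproduces the bridge example and the symmetry of the second example.

```python
from math import comb

def hypergeomsum(i, n, itot, ntot):
    """P(at most i good objects among n drawn without replacement
    from ntot objects, itot of which are good).

    Hypothetical Python stand-in for EMT's hypergeomsum.
    """
    total = comb(ntot, n)
    # math.comb returns 0 when the lower index exceeds the upper one
    return sum(comb(itot, k) * comb(ntot - itot, n - k)
               for k in range(0, min(i, n, itot) + 1)) / total

print(1 - hypergeomsum(7, 13, 13, 52))   # 8 or more spades in a bridge hand
```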

function qnormal (x, m=0, d=1)

  Density (PDF) of the m-d-normal distribution
  
  This is the density function of the Gauss normal distribution with
  mean m and standard deviation d.
  
  See: 
  normaldis (Statistics with Euler Math Toolbox), 
  erf (Statistics with Euler Math Toolbox), 
  erf (Maxima Documentation)
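The density of the normal distribution with mean m and standard deviation d is exp(-((x-m)/d)^2/2)/(d*sqrt(2*pi)). A direct Python version (hypothetical stand-in) is checked against `statistics.NormalDist.pdf`.

```python
import math
from statistics import NormalDist

def qnormal(x, m=0.0, d=1.0):
    """Density of the normal distribution with mean m and deviation d."""
    z = (x - m) / d
    return math.exp(-0.5 * z * z) / (d * math.sqrt(2 * math.pi))

print(qnormal(0.0))   # 1/sqrt(2*pi), about 0.3989
```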

function map gammarestr (x)

  Special Gamma function; works only if 2x is a natural number
  
  See: 
  gamma (Mathematical Functions), 
  gamma (Maxima Documentation)
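When 2x is a natural number, Γ(x) follows exactly from Γ(1) = 1, Γ(1/2) = sqrt(pi) and the recursion Γ(x+1) = xΓ(x). The Python sketch below (hypothetical name `gamma_half_integer`) uses this idea and is compared with `math.gamma`; whether gammarestr is implemented this way is an assumption.

```python
import math

def gamma_half_integer(x):
    """Gamma(x) for x > 0 with 2x natural, via Gamma(x+1) = x*Gamma(x)."""
    if x <= 0 or (2 * x) != int(2 * x):
        raise ValueError("2x must be a natural number")
    # Start at Gamma(1) = 1 or Gamma(1/2) = sqrt(pi) and step up by 1
    t, g = (1.0, 1.0) if x == int(x) else (0.5, math.sqrt(math.pi))
    while t < x:
        g *= t
        t += 1.0
    return g

print(gamma_half_integer(2.5))   # 3/4*sqrt(pi), about 1.3293
```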

function qchidis (x, n)

  Density (PDF) of the chi-squared distribution
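The chi-squared density with n degrees of freedom is x^(n/2-1)*exp(-x/2)/(2^(n/2)*Γ(n/2)) for x > 0. The Python sketch (hypothetical stand-in) evaluates it in log space with `math.lgamma` to avoid overflow for larger n.

```python
import math

def qchidis(x, n):
    """Density of the chi-squared distribution with n degrees of freedom (x > 0)."""
    k = n / 2.0
    return math.exp((k - 1) * math.log(x) - x / 2 - k * math.log(2) - math.lgamma(k))

print(qchidis(3.0, 2))   # for n=2 the density is exp(-x/2)/2
```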

function comment chidis (x,n)

  Chi-squared distribution with n degrees of freedom
  
  Algorithm from AlgLib.

function comment chicdis (x,n)

  Complementary chi-squared distribution with n degrees of freedom
  
  Algorithm from AlgLib.
  
  See: 
  chidis (Statistics with Euler Math Toolbox), 
  invchidis (Statistics with Euler Math Toolbox), 
  invchicdis (Statistics with Euler Math Toolbox)

function invchidis (x, n)

  Inverse of the chi-squared distribution
  
  See: 
  invchicdis (Statistics with Euler Math Toolbox)

function comment invchicdis (x, n)

  Inverse of the complementary chi-squared distribution
  
  Algorithm from AlgLib.

function qtdis (t:real, n:nonnegative integer)

  Density (PDF) of the Student t distribution

function comment tdis (x:real, n:natural)

  Student T distribution with n degrees of freedom
  
  Algorithm from AlgLib.
  
  See: 
  invtdis (Statistics with Euler Math Toolbox)

function comment invtdis (x:nonnegative, n:natural)

  Inverse Student T distribution with n degrees of freedom
  
  Algorithm from AlgLib.

function qfdis (x, n, m)

  Density (PDF) of the F distribution

function overwrite map fdis (x, a, b)

  F distribution
  
  Vectorizes the built-in function _fdis(x,a,b).

function overwrite map fcdis (x, a, b)

  Complementary F distribution
  
  Vectorizes the built-in function _fcdis(x,a,b).

function overwrite map invfcdis (x, a, b)

  Inverse of the complementary F distribution
  
  Vectorizes the built-in function _invfcdis(x,a,b).

function map invfdis (x, a, b)

  Inverse of the F distribution

Descriptive Statistical Functions

function meandev (x:numerical, v=none)

  Mean value and statistical standard deviation of [x1,...,xn]
  
  An optional additional parameter v contains the multiplicities of
  x. m=mean(x) will assign the mean value only! If x is a matrix the
  function works on each row.
  
  x : data (1xm or nxm)
  v : multiplicities (1xm or nxm)
  
  See: 
  mean (Maxima Documentation)

function mean (x:numerical, v:real vector=none)

  Mean value of x.
  
  An optional additional parameter contains multiplicities.
  
  See: 
  meandev (Statistics with Euler Math Toolbox), 
  median (Statistics with Euler Math Toolbox), 
  median (Maxima Documentation)

function dev (x:numerical, v:real vector=none)

  Empirical (sample) standard deviation of x
  
  An additional parameter may contain multiplicities.
  
  See: 
  meandev (Statistics with Euler Math Toolbox)
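With multiplicities v, a natural reading is that each x[i] counts as if it occurred v[i] times: the mean is sum(v*x)/sum(v), and the sample standard deviation divides by sum(v)-1. That mean, dev and meandev handle multiplicities exactly this way is an assumption; the Python sketch with the hypothetical name `meandev` shows the computation for one row.

```python
import math

def meandev(x, v=None):
    """Mean and sample standard deviation of x, with optional multiplicities v.

    Hypothetical Python stand-in for one row of data.
    """
    if v is None:
        v = [1] * len(x)
    count = sum(v)                                   # total number of observations
    mean = sum(vi * xi for xi, vi in zip(x, v)) / count
    ss = sum(vi * (xi - mean) ** 2 for xi, vi in zip(x, v))
    return mean, math.sqrt(ss / (count - 1))

print(meandev([1, 2, 3], [1, 1, 2]))   # same as for the data 1, 2, 3, 3
```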

function median (x, v=none, p:real vector=0.5)

  Quantile such that a fraction p of the x[i] are less than or equal to it.
  
  v are optional multiplicities for the values. If x is a matrix, the
  function works on all rows of x.
  
  x : data (1xm or nxm)
  v : multiplicities (1xm or nxm)
  p : desired percentage (real or row vector)
  
  See: 
  mean (Statistics with Euler Math Toolbox), 
  mean (Maxima Documentation), 
  quartiles (Statistics with Euler Math Toolbox), 
  quantile (Statistics with Euler Math Toolbox), 
  quantile (Maxima Documentation)

function pfold (v: real vector, w: real vector)

  Distribution of the sum of two distributions
  
  v[i], w[i] contain the probabilities that each random variable is
  equal to i-1. result[i] contains the probability that the sum of
  the two random variables is i-1.
  
  See: 
  fold (Numerical Algorithms), 
  fftfold (Numerical Algorithms)
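The distribution of the sum of two independent discrete random variables is the discrete convolution of their probability vectors. In the Python sketch below (hypothetical stand-in), lists are 0-based, so v[k] is the probability of the value k; this matches the Euler convention v[i] = P(X = i-1).

```python
def pfold(v, w):
    """Distribution of X+Y for independent X, Y with v[k] = P(X=k), w[k] = P(Y=k)."""
    result = [0.0] * (len(v) + len(w) - 1)
    for i, pv in enumerate(v):
        for j, pw in enumerate(w):
            result[i + j] += pv * pw     # X = i and Y = j give the sum i + j
    return result

die = [1 / 6] * 6                # a die with faces 0..5
two = pfold(die, die)
print(two[5])                    # P(sum = 5) = 6/36
```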

function comment quantile (v:vector,p:real)

  Compute the p-quantile of the elements in v
  
  Function from AlgLib. This function takes care of multiplicities
  of the two values closest to the quantile. For the lower, upper or
  middle quantile, use the median function.
  
  >quantile([1,2],20%)
  1.2
  >quantile([1,2,2],20%)
  1.4
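Both examples above agree with linear interpolation between the sorted values at the 0-based position p*(n-1). The Python sketch below reproduces them; that AlgLib's quantile follows exactly this rule in every case is an assumption, and the name is a hypothetical stand-in.

```python
import math

def quantile(v, p):
    """p-quantile by linear interpolation between sorted values."""
    s = sorted(v)
    pos = p * (len(s) - 1)          # 0-based fractional position
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])

print(quantile([1, 2], 0.2), quantile([1, 2, 2], 0.2))   # approximately 1.2 and 1.4
```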

function covar (x:real vector, y:real vector)

  Empirical covariance of x and y
  
  The covariance is the scalar product of x and y after
  centralization (x-mean(x),y-mean(y)) divided by n-1, where n is
  the length of x and y.
  
  See: 
  covarmatrix (Statistics with Euler Math Toolbox)
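The definition above translates directly into code: center both vectors, take the scalar product, and divide by n-1. A hypothetical Python stand-in:

```python
def covar(x, y):
    """Empirical covariance: scalar product of the centered vectors divided by n-1."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

print(covar([1, 2, 3], [1, 2, 3]))   # 1.0, the sample variance of 1, 2, 3
```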

function covarmatrix (x:real)

  Empirical covariance matrix of the rows of x
  
  The covariance matrix contains the empirical covariances of the rows
  of x, i.e., the scalar products of the centralized rows divided by
  the number of columns of x minus 1.

function sphering (X)

  Sphering of the matrix X.
  
  The matrix X contains samples of random variables in its rows. The
  sphering of X is a linear transformation Y=T.(X-m), such that the
  rows of Y have mean 0 and an identity correlation matrix.
  
  Returns {Y,T,m}
  
  See: 
  covarmatrix (Statistics with Euler Math Toolbox)

function correl (x:real vector, y:real vector)

  Correlation of x and y
  
  The correlation is the scalar product of the centralized and
  normalized vectors x and y.
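Centering and normalizing both vectors and taking their scalar product gives the (Pearson) correlation. A hypothetical Python stand-in:

```python
import math

def correl(x, y):
    """Correlation: scalar product of the centered and normalized vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cx = [a - mx for a in x]
    cy = [b - my for b in y]
    dot = sum(a * b for a, b in zip(cx, cy))
    return dot / math.sqrt(sum(a * a for a in cx) * sum(b * b for b in cy))

print(correl([1, 2, 3, 4], [3, 5, 7, 9]))   # 1 for an exact increasing linear relation
```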

function correlmatrix (x:real)

  Correlation matrix of the rows of x
  
  See: 
  covar (Statistics with Euler Math Toolbox)

function ranks (x)

  Ranks of the elements of x in x.
  
  The rank of x[i] is its position i in the sorted vector x. With
  multiplicities (ties), the rank is the mean rank of the equal elements.
  
  Works for reals, real vectors, or string vectors x.
  
  See: 
  rankcorrel (Statistics with Euler Math Toolbox)

function rankcorrel (x:real vector, y:real vector)

  Rank correlation of x and y (the correlation of their ranks)
  
  See: 
  ranks (Statistics with Euler Math Toolbox)
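Tie-aware ranks and a rank correlation can be sketched as follows. The rank correlation here is the correlation of the ranks (Spearman's rank correlation); that rankcorrel is defined exactly this way is an assumption suggested by its reference to ranks, and both Python names are hypothetical stand-ins.

```python
import math

def ranks(x):
    """1-based ranks of x; tied elements get the mean of their ranks."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of equal values
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1      # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            r[order[k]] = mean_rank
        i = j + 1
    return r

def rankcorrel(x, y):
    """Correlation of the ranks of x and y."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    dot = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return dot / (sx * sy)

print(ranks([10, 20, 20, 30]))   # [1.0, 2.5, 2.5, 4.0]
```

Because only the order matters, any strictly increasing relation between x and y gives a rank correlation of 1.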

function empdist (x:real vector, vsorted:real vector)

  Empirical distribution
  
  The vector vsorted contains empirical data. Then we compute the
  empirical cumulative distribution (CPF) of the data at the points
  x[i].
  
  x : vector of values, usually sorted
  vsorted : sorted(!) vector of empirical values.
  
  >short empdist(1:6,sort(intrandom(1,6000,6)))
  [ 0.16283  0.33083  0.49317  0.662  0.832  1 ]
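For sorted data, the empirical CDF at a point t is the number of data values <= t divided by the number of data values, which a binary search finds directly. A hypothetical Python stand-in using `bisect.bisect_right`:

```python
from bisect import bisect_right

def empdist(x, vsorted):
    """Empirical CDF of the sorted data vsorted, evaluated at each x[i]."""
    n = len(vsorted)
    # bisect_right counts how many data values are <= t
    return [bisect_right(vsorted, t) / n for t in x]

print(empdist([1, 2, 3], [1, 1, 2, 3]))   # [0.5, 0.75, 1.0]
```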

function randpint (n:index, m:index, p:vector)

  nxm random numbers with probabilities in p
  
  Generates nxm random numbers from 1 to k based on the vector of
  probabilities p[1],...,p[k].
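Random integers with prescribed probabilities can be drawn by inverse transform sampling on the cumulative sums of p: draw U uniformly and return the first index whose cumulative probability exceeds U. The Python sketch below uses this technique; that randpint does the same, and the scaling by the total of p, are assumptions, and the name is a hypothetical stand-in.

```python
import random
from bisect import bisect_right
from itertools import accumulate

def randpint(n, m, p, rng=random):
    """n x m random integers in 1..k with P(value = i) = p[i-1]."""
    cum = list(accumulate(p))
    total = cum[-1]
    def one():
        u = rng.random() * total            # scale in case p does not sum exactly to 1
        return min(bisect_right(cum, u), len(p) - 1) + 1
    return [[one() for _ in range(m)] for _ in range(n)]

rng = random.Random(3)
sample = randpint(1, 10000, [0.2, 0.5, 0.3], rng)[0]
print([sample.count(i) / len(sample) for i in (1, 2, 3)])   # near 0.2, 0.5, 0.3
```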

function randmultinomial (n:index, m:index, p: vector)

  n multinomial random numbers based on a density
  
  This generates n outcomes of m throws with probabilities
  p[1],...,p[k]. The result is an nxk matrix.
  
  See: 
  randpint (Statistics with Euler Math Toolbox), 
  chitest (Statistics with Euler Math Toolbox)

Statistical Tests

function chitest (x:real vector, y:positive vector,
    montecarlo=false, nmontecarlo=1000, p=false)

  Perform a chi^2 test of whether x has the expected frequency y
  
  This function tests an observed frequency x against an expected
  frequency y. E.g., if 40 men are found when sampling 100 persons,
  then [40,60] has to be tested against [50,50]. The result of the
  test is about 0.046, which is below 5%, so we can reject the
  hypothesis that the sample obeys the expected frequency at an
  error level of less than 5%.
  
  For a meaningful test, sum(x) should be equal to sum(y), unless
  p=true. In this case, y is interpreted as a vector of probabilities,
  not a vector of counts.
  
  To get frequencies of data from the data, use "getfrequencies",
  "count", or "histo".
  
  montecarlo : If montecarlo is not zero, the method uses a
  Monte Carlo simulation. It generates nmontecarlo random events of
  sum(x) data with the distribution in y, and counts how often the
  statistic sum((x-y)^2/y) is larger than the observed statistic.
  
  x,y : two real row vectors (1xn)
  
  Returns the error level for rejecting the hypothesis that the
  observed frequency x has the expected frequency y.
  
  >load statistics;
  >x=[100,90]; y=[0.5,0.5]*sum(x); chitest(x,y)
  0.468159909854
  >chitest(x,y,>montecarlo)
  0.43
  >chitest(x,[0.5,0.5],>p)
  0.468159909854
  
  See: 
  getfrequencies (Statistics with Euler Math Toolbox), 
  count (Statistics with Euler Math Toolbox), 
  histo (Statistics with Euler Math Toolbox)
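The statistic sum((x-y)^2/y) is compared with the upper tail of the chi-squared distribution with length(x)-1 degrees of freedom. Python's standard library has no chi-squared distribution, so the sketch below computes the tail from the series of the regularized lower incomplete gamma function. It mirrors only the non-Monte-Carlo path with y given as frequencies, the names are hypothetical, and it reproduces the first example above.

```python
import math

def chi2_sf(x, dof):
    """Upper tail P(X > x) of a chi-squared variable with dof degrees of freedom.

    Series: gamma(s, z) = z^s e^-z * sum z^k / (s (s+1) ... (s+k)).
    Meant for moderate x; it loses relative accuracy for tiny tails.
    """
    if x <= 0:
        return 1.0
    s, z = dof / 2.0, x / 2.0
    term = 1.0 / s
    total = term
    k = 0
    while term > total * 1e-16:
        k += 1
        term *= z / (s + k)
        total += term
    lower = math.exp(s * math.log(z) - z - math.lgamma(s)) * total
    return max(0.0, 1.0 - lower)

def chitest(x, y):
    """Error level for rejecting that observed counts x follow expected counts y."""
    stat = sum((a - b) ** 2 / b for a, b in zip(x, y))
    return chi2_sf(stat, len(x) - 1)

x = [100, 90]
y = [0.5 * sum(x), 0.5 * sum(x)]
print(chitest(x, y))   # about 0.46816
```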

function testnormal (r:real vector, n:integer, v:real vector, ..
    m:number, d:number)

  Test an observed frequency for normal distribution.
  
  Test the number of data v[i] in the ranges r[i],r[i+1] against the
  normal distribution with mean m and deviation d, using the chi^2
  method.
  
  r : ranges (sorted 1xm vector)
  n : total number of data
  v : number of data in the ranges (1x(m-1) vector)
  m : expected mean value
  d : expected deviation
  
  Return the error we get, if we reject the normal distribution.

function ttest (m:number, d:real scalar, n:natural, mu:number)

  Student t-test
  
  Test whether the measured mean m with measured deviation d of n
  data values comes from a distribution with mean value mu.
  
  m : mean value of data
  d : standard deviation of data
  n : number of data
  mu : mean value to test for
  
  Returns the error alpha, if we reject that the data come from a
  distribution with mean mu.
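The one-sample t statistic is t = (m-mu)/(d/sqrt(n)) with n-1 degrees of freedom. The Python sketch below computes a two-sided error level by integrating the Student t density with Simpson's rule over [0,|t|]. The text above does not say whether ttest reports a one-sided or two-sided level, so the two-sided choice is an assumption, as are the function names.

```python
import math

def t_density(t, dof):
    """Density of the Student t distribution with dof degrees of freedom."""
    c = math.exp(math.lgamma((dof + 1) / 2) - math.lgamma(dof / 2))
    return c / math.sqrt(dof * math.pi) * (1 + t * t / dof) ** (-(dof + 1) / 2)

def ttest(m, d, n, mu, steps=2000):
    """Two-sided error level for rejecting that the data mean equals mu."""
    t = abs(m - mu) / (d / math.sqrt(n))
    dof = n - 1
    # Simpson's rule for the central area from 0 to |t| (steps must be even)
    h = t / steps
    area = t_density(0, dof) + t_density(t, dof)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_density(i * h, dof)
    area *= h / 3
    return max(0.0, 1.0 - 2.0 * area)

print(ttest(1.0, math.sqrt(2), 2, 0.0))   # t = 1 with 1 degree of freedom: about 0.5
```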

function tcompare (m1:number, d1:number, n1:natural, ..
    m2:number, d2:number, n2:natural)

  Test whether two measured data sets agree in mean.
  
  The data must be normally distributed. Returns the error you make,
  if you reject that both data are from the same normal distribution.
  
  m1,m2 : means of the data
  d1,d2 : standard deviation of the data
  n1,n2 : number of data
  
  Returns the error alpha, if we reject that the data come from a
  distribution with the same expected mean.

function tcomparedata (x:real vector, y:real vector)

  Compare x and y for same mean
  
  Calls "tcompare" to compare the two observations for the same mean.
  
  Returns the error we make, if we reject that both data come from a
  distribution with the same expected mean.

function tabletest (A:real)

  Chi^2 test of the table entries A[i,j] for independence of the rows from the columns.
  
  The table test tests for independence of the rows of the table
  from the columns. E.g., if some items are observed [40,50] times
  for men, and [50,30] times for women, we can ask whether the
  observations depend on the gender. In this case, we can reject
  independence with a 1.8% error level.
  
  This test should only be used for large table entries.
  
  Return the error you make, if you reject independence.
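Under independence, the expected count of cell (i,j) is (row sum i)*(column sum j)/(total), and sum((A-E)^2/E) is compared with a chi-squared distribution with (rows-1)*(columns-1) degrees of freedom. The Python sketch below follows this textbook procedure with hypothetical names (`expected_table` is not claimed to match EMT's expectedtable). For the example above it gives about 0.0186, consistent with the quoted level; whether EMT applies any correction is not shown here.

```python
import math

def chi2_sf(x, dof):
    """Upper tail of the chi-squared distribution (series for the incomplete gamma)."""
    if x <= 0:
        return 1.0
    s, z = dof / 2.0, x / 2.0
    term = total = 1.0 / s
    k = 0
    while term > total * 1e-16:
        k += 1
        term *= z / (s + k)
        total += term
    return max(0.0, 1.0 - math.exp(s * math.log(z) - z - math.lgamma(s)) * total)

def expected_table(A):
    """Expected counts row_sum[i] * col_sum[j] / total under independence."""
    rows = [sum(r) for r in A]
    cols = [sum(c) for c in zip(*A)]
    total = sum(rows)
    return [[ri * cj / total for cj in cols] for ri in rows]

def tabletest(A):
    """Error level for rejecting independence of the rows and columns of A."""
    E = expected_table(A)
    stat = sum((a - e) ** 2 / e
               for ra, re in zip(A, E) for a, e in zip(ra, re))
    dof = (len(A) - 1) * (len(A[0]) - 1)
    return chi2_sf(stat, dof)

print(tabletest([[40, 50], [50, 30]]))   # about 0.0186
```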

function expectedtable (A:real)

function contingency (A:real, correct=1)

  Contingency coefficient of a matrix A.
  
  If the coefficient is close to 0, we tend to say that the rows and
  the columns are independent.
  
  correct : Correct the coefficient, so that it is between 0 and 1
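A common definition is Pearson's contingency coefficient C = sqrt(chi2/(chi2+N)), where chi2 is the table test statistic and N the total count. Since C cannot reach 1, it is often corrected by dividing by its maximum sqrt((k-1)/k) with k = min(rows, columns). That contingency(A, correct) uses exactly this correction is an assumption; the Python sketch is a hypothetical stand-in.

```python
import math

def contingency(A, correct=True):
    """Pearson's contingency coefficient of the table A (optionally corrected)."""
    rows = [sum(r) for r in A]
    cols = [sum(c) for c in zip(*A)]
    total = sum(rows)
    chi2 = 0.0
    for i, r in enumerate(A):
        for j, a in enumerate(r):
            e = rows[i] * cols[j] / total   # expected count under independence
            chi2 += (a - e) ** 2 / e
    c = math.sqrt(chi2 / (chi2 + total))
    if correct:
        k = min(len(A), len(A[0]))
        c /= math.sqrt((k - 1) / k)         # maximum of C is sqrt((k-1)/k)
    return c

print(contingency([[10, 20], [20, 40]]))   # 0.0: rows and columns independent
```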

function varanalysis

  varanalysis(x1,x2,x3,...) tests for the same mean.
  
  Test the data sets for the same mean, assuming normally distributed
  data sets. This is the one-way analysis of variance (ANOVA).
  
  Returns the error we make, if we reject same mean.
  
  Example:
  >seed(0.5); v=normal(1,10)+1; w=normal(1,12)+2; u=normal(1,5);
  >varanalysis(v,w,u)
  0.000556414242764 // reject same mean!
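One-way ANOVA compares the spread between group means with the spread within groups: F = (SSB/(k-1))/(SSW/(N-k)) for k groups and N values in total. The error level is then the upper tail of the F distribution with (k-1, N-k) degrees of freedom, which in this file presumably corresponds to fcdis. The Python sketch computes only the F statistic, since the standard library has no F distribution; the name `anova_f` is hypothetical.

```python
def anova_f(*groups):
    """F statistic of one-way ANOVA and its degrees of freedom."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ssb = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))   # between groups
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)     # within groups
    df1, df2 = k - 1, n_total - k
    return (ssb / df1) / (ssw / df2), df1, df2

print(anova_f([1, 2, 3], [4, 5, 6]))   # (13.5, 1, 4)
```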

function mediantest (a:real vector, b:real vector)

  Median test for equal mean.
  
  Test the two samples a and b for equal mean value. For this, both
  samples are checked for values exceeding the median of the
  combined data.
  
  Returns the error we make, if we reject that a and b can have the
  same mean.

function ranktest (a:real vector, b:real vector, eps=epsilon())

  Mann-Whitney test: test a and b for the same distribution
  
  Returns the error we make, if we reject the same distribution.
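The Mann-Whitney statistic U counts, over all pairs (a[i], b[j]), how often a[i] exceeds b[j], with ties counted as 1/2. For large samples, U is approximately normal with mean n1*n2/2 and variance n1*n2*(n1+n2+1)/12. The Python sketch uses this normal approximation without tie correction, which may differ from how ranktest computes its error level; the two-sided form and the names are assumptions.

```python
import math
from statistics import NormalDist

def mann_whitney_u(a, b):
    """U statistic: number of pairs with a[i] > b[j], ties counted as 1/2."""
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

def ranktest_approx(a, b):
    """Two-sided error level from the normal approximation of U."""
    n1, n2 = len(a), len(b)
    u = mann_whitney_u(a, b)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = abs(u - mu) / sigma
    return 2 * (1 - NormalDist().cdf(z))

print(mann_whitney_u([1, 2], [3, 4]))   # 0.0: every a is smaller than every b
```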

function signtest (a:real vector, b:real vector)

  Test whether the expected mean of a is not better than that of b
  
  Assume a(i) and b(i) are results of a treatment. Then we can ask
  whether a is better than b.
  
  a,b : row vectors of the same size
  
  Returns the error we make, if we decide that a is better
  than b.
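A sign test counts the pairs with a[i] > b[i]. If a is not better than b, this count is binomial with probability 1/2 over the non-tied pairs, and the error level is its upper tail. The Python sketch drops ties and computes that tail exactly; both choices are assumptions about signtest, and the function is a hypothetical stand-in.

```python
from math import comb

def signtest(a, b):
    """Error level for deciding that a is better than b (one-sided sign test)."""
    wins = sum(1 for x, y in zip(a, b) if x > y)
    n = sum(1 for x, y in zip(a, b) if x != y)      # ties are dropped
    # P(at least `wins` successes in n fair coin flips)
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n

print(signtest([5, 6, 7, 8, 9], [1, 2, 3, 4, 5]))   # 1/32 = 0.03125
```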

function wilcoxon (a:real vector, b:real vector, eps=sqrt(epsilon()))

  Test whether the expected mean of a is not better than that of b
  
  This is a sharper test for the same problem as in "signtest".
  
  Returns the error you make, if you decide that a is better
  than b.
  
  See: 
  signtest (Statistics with Euler Math Toolbox)

Statistical Plots

function quartiles (x, outliers=1.5)

  Quartiles for each row of x.
  
  This computes [Min,Q1,M,Q2,Max], where M is the median, Q1
  the median of the lower half and Q2 the median of the upper half.
  
  outliers : If none, Min and Max are the minimal and maximal values
  of the data. Otherwise, Min is the smallest data value that is not
  smaller than Q1-outliers*range, where range=Q2-Q1. Similarly, Max
  is the largest data value that is not larger than Q2+outliers*range.
  
  See: 
  boxplot (Statistics with Euler Math Toolbox), 
  boxplot (Maxima Documentation)
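The computation described above can be sketched for one row of data. How the halves are formed for an odd number of values is not specified; the Python sketch below excludes the median element from both halves, which is an assumption, as are the helper names. It needs at least two values.

```python
def median_of(s):
    """Median of an already sorted list."""
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2

def quartiles(x, outliers=1.5):
    """[Min, Q1, M, Q2, Max] with whiskers limited to outliers*(Q2-Q1)."""
    s = sorted(x)
    n = len(s)
    m = median_of(s)
    lower, upper = s[: n // 2], s[(n + 1) // 2 :]   # median excluded if n is odd
    q1, q2 = median_of(lower), median_of(upper)
    if outliers is None:
        return [s[0], q1, m, q2, s[-1]]
    r = q2 - q1
    lo = min(v for v in s if v >= q1 - outliers * r)
    hi = max(v for v in s if v <= q2 + outliers * r)
    return [lo, q1, m, q2, hi]

print(quartiles([1, 2, 3, 4, 5, 6, 7, 8, 100]))   # [1, 2.5, 5, 7.5, 8]
```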

function boxplot (data:real, lab=none, style="0#",
    textcolor=none, outliers=1.5, pointstyle="o",
    range=none)

  Summary of the quartiles in graphical form.
  
  data : vector or matrix. In case of a matrix, the rows are used.
  style : If present, it is used as fill style, the default is "O#"
  lab : Labels for each row of the data (vector of strings)
  textcolor : Color of the labels (vector of colors)
  outliers : Factor for the maximal whisker length or none
  pointstyle : Point style for outliers
  range : 1x2 vector for the plot range (or none)
  
  >x=normal(1000)*10+1000; boxplot(x):
  >x=randnormal(5,1000,100,10); boxplot(x,outliers=none):
  
  See: 
  quartiles (Statistics with Euler Math Toolbox), 
  barstyle (Euler Core)

function columnsplot (x:vector, lab=none,
    style="O#", color=green, textcolor=none,
    width=0.4, frame=true, grid=true)

  Plot the elements of x as columns.
  
  x : vector of values
  lab : a string vector with one label for each element of x.
  style,color : fill style and color for the bars
  textcolor : color for the labels
  
  See: 
  style (Euler Core), 
  style (Maxima Documentation), 
  color (Euler Core), 
  color (Maxima Documentation), 
  plot2d (Plot Functions), 
  plot2d (Maxima Documentation)

function dataplot (x:real, y:real, style="[]w", color=1)

  Plot the data (x,y) with point and line plots.
  
  x : real row vector
  y : real row vector or matrix (one row for each data).
  style : a style or a vector of styles
  color : a color or a vector of colors
  
  You can use a vector of styles and a vector of colors. These
  vectors must contain as many elements as there are rows of y.
  
  See: 
  statplot (Statistics with Euler Math Toolbox)

function piechart (x:real vector, style="0#",
    color=green, lab:string vector=none, r=1.5, textcolor=red)

  Plot the data x in a pie chart.
  
  x : the vector of data
  color : a color or a vector of colors (same length as x)
  style : a style or a vector of styles
  lab : a vector of labels (same length as x)
  r : The pie chart has radius 1. To leave space, use r=1.5.

function starplot (v, style="/", color=green, lab:string=none,
    rays:integer=0, pstyle="[]w", textcolor=red, r=1.5)

  A star like plot with a filled star or with rays and dots only

function logimpulseplot (x, y=none, style="O#", color=green, d=0.1)

  Logarithmic impulse plot of y.

function columnsplot3d (z:real, srows=none, scols=none,
    angle=30°, height=40°, zoom=2.5, distance=5,
    crows:vector=none, ccols:vector=none, positive:integer=false)

  Plot 3D columns from the matrix z.
  
  This function shows a 3D plot of columns with heights z[i,j] in
  a rectangular array. z can be any real nxm matrix.
  
  z : the values to be displayed
  srows : labels for the rows
  scols : labels for the columns
  crows : colors of the rows
  ccols : colors of the columns (alternatively)
  positive : plot only positive columns
  
  Example
  >x=normal(1,1000); y=normal(1,1000);
  >v=-6:6; z=find2(x,y,v,v);
  >columnsplot3d(z,v,v,>positive):
  
  See: 
  find2 (Statistics with Euler Math Toolbox)

function mosaicplot (z: real, srows=none, scols=none,
    textcolor=red, color=green, style="O#")

  Mosaic plot of the data in z.
  
  z : matrix with values
  srows, scols : label strings for the rows and columns (string
  vectors)
  color : a color or a vector of colors for the columns of the plot.
  style : a style or a vector of styles.
  
  For an example see the introduction to statistics.

function scatterplots (M:real, lab=none,
    ticks=1, grid=4, style="..")

  Plot all rows of M against all rows of M.
  
  The labels are shown in the diagonal of the plot.
  
  lab : labels for the rows.

function statplot (x, y=none, plottype="b",
    pstyle="[]w", lstyle="-", fstyle="O#",
    xl="", yl="", color=none, vertical=0)

  Plots x against y.
  
  This is a simple form of using plot2d with point, line or bar
  options.
  
  The available plot types are
  
  'p' : point plot
  'l' : line plot
  'b' : both
  'h' : histogram plot
  's' : surface plot
  
  pstyle, lstyle, fstyle : Styles for the points, lines and bars
  
  color : color or color array
  vertical : vertical labels
  
  See: 
  style (Euler Core), 
  style (Maxima Documentation)

function getspectral (x)

  Get a spectral color for 0<=x<=1.
  
  The scheme runs from blue (0) to red (1)

function colormap (A, spectral=0, color=white)

  Plot a color map of the matrix A.
  
  The plot has a color scale on the right. The color is either a fixed
  color (white by default) or spectral colors.
  
  Example
  >colormap(randexponential(50,50),color=yellow); ...
  >title("Exponential distribution"); ...
  >xlabel("n"); ylabel("m"):

Data Tables in Statistics

function writetable (x,
    fixed:integer=0, wc:index=10, dc:nonnegative integer=2,
    labc=none, labr=none, wlabr=none, lablength=1,
    NA=".", NAval=NAN,
    ctok:index=none, tok:string vector=none,
    file=none, separator=none, comma=false,
    date=none, time=none)

  Write a table x of statistical values
  
  wc : default width for all columns or vector of widths. This is
  used only if the separator is not set.
  dc : default decimal precision for all columns or vector of
  precision values.
  fixed : use fixed number of decimal digits
  (boolean or vector of boolean).
  
  labc : labels for the columns (string or real vector)
  lablength : increase the width of the columns, if labels are wider.
  labr : labels for the rows (string or real vector)
  
  NA, NAval : Token string and value to represent "Not Available". By
  default "." and NAN is used.
  
  comma : write with decimal comma instead of dot.
  separator : use this separator string instead of the default
  blanks. Note that the number of blanks is determined
  by wc, if no separator is given.
  
  date : vector of columns, which should be written as dates.
  time : vector of columns, which should be written as times.
  
  Write a table with labels for the columns and rows and formats for
  each column. A typical table looks like this:
  
         A     B     C
  G   1.02     2     f
  H   3.05     5     m
  
  Each number in the table can be translated into a token string.
  This translation can be set with a global variable tok (string
  vector) which applies to all columns with indices in ctok (index
  vector). Or it can be set in each column with an assigned variable
  tok? (string vector), where ? is the number of the column. Note
  that these assigned variables need to be declared with :=, since
  they are not in the parameter list of writetable().
  
  See the introduction to statistics for an example.
  
  See: 
  readtable (Statistics with Euler Math Toolbox)

function readtable (filename:string, clabs=1, rlabs=0,
    NA=".", NAval=NAN,
    ctok:index=none, tokens=[none],
    separator=none, comma=false,
    date=none, list=false)

  Read a table from a file.
  
  filename: readtable(none,...) will use an open file.
  clabs : The table has a line with headings
  rlabs : Each line has a heading label.
  NA, NAval : Sets the string and the returned value for NA (not
  available).
  ctok : Indices of the columns, where tokens are to be collected.
  tok1=..., tok2=... : Individual string arrays for columns.
  separator : Optional separating characters.
  comma : Use decimal commas instead of dots.
  date : vector of columns which contain a date.
  
  The table can have a header line (clabs=1) and row labels
  (rlabs=1). The entries of the table can be numbers (by default with
  decimal dots) or strings. In case of strings, these tokens are
  translated to unique numbers. The translation can either be set for
  each column separately in string vectors with names tok1, tok2
  etc., or for the complete table in the tokens parameter.
  
  The tokens are collected from the columns with indices in the ctok
  vector. If a column has a tok? parameter (tok1, tok2, etc.), tokens
  are not collected automatically from that column but the
  translation in tok? is used.
  
  Note that you have to write tok1:=... since the token parameters
  are not pre-defined parameters in the parameter list.
  
  The table can also contain expressions with units or global
  variables.
  
  "Not Available" can be represented by a special string. The
  default is ".". In the numerical table, it is represented by
  default as NAN. If you do not like this, simply let NAN be
  represented by any other string and translate it into a numerical
  token.
  
  Dates are converted to a unique day number.
  
  See the introduction to statistics for an example.
  
  The default separator is a comma, semicolon, blank or tabulator. If
  you have a file with semicolons and decimal commas, just enable
  >comma. This will replace all commas with dots before the
  evaluation.
  
  Returns {table, heading string, token strings, rowlabel strings}
  
  See: 
  writetable (Statistics with Euler Math Toolbox), 
  date (Basic Utilities), 
  day (Basic Utilities), 
  day (Astronomical Functions)

function tablecol (M:real, j:nonnegative vector, NAval=NAN)

  The non-NAN values in the columns j of the table M.
  
  To access a table column, you could simply use M[,j], where j is a
  row vector of indices or a single index. But this function skips
  any NAN values in any of the columns j. It returns the columns
  as rows (transposed) and the indices of the rows.
  
  NAval : The value that should be treated as "Not Available"
  
  Returns {columns as rows, indices of non-NAN rows}

function selectrows (M:real, j:index, v:real vector, NAval=NAN)

  Select the row indices i with M[i,j] in v and M[i,j] not NAN.

function sortedrows (M:real, j:nonnegative integer vector)

  Index of rows for sorted table with respect to columns in j
  
  The table gets sorted in lexicographic order.
  
  Returns : {sorted table, index of sorted values}

Shuffle, Sort and Find

For statistical purposes and many other applications, Euler has efficient functions to find values in a vector.

function comment shuffle (v)

  Shuffle the vector v
  
  See: 
  sort (Statistics with Euler Math Toolbox), 
  sort (Maxima Documentation)

function comment sort (v)

  Sort the vector v
  
  The function returns {x,i}, where x is the sorted vector, and i is
  the vector of indices, which sort the vector.
  
  >v=shuffle(1:10)
  [6,  3,  1,  5,  10,  4,  9,  8,  2,  7]
  >{vx,i}=sort(v); vx,
  [1,  2,  3,  4,  5,  6,  7,  8,  9,  10]
  >v[i]
  [1,  2,  3,  4,  5,  6,  7,  8,  9,  10]
  
  See: 
  shuffle (Statistics with Euler Math Toolbox)

function comment lexsort (A)

  Lexicographic sort of the rows of A
  
  Returns {Asorted,i}, where i is the vector of indices, which sorts
  the rows of A.
  
  >A=intrandom(5,5,3)
  2       1       2       1       2
  1       3       3       1       2
  3       3       2       1       2
  3       1       3       2       2
  3       2       1       1       1
  >lexsort(A)
  1       3       3       1       2
  2       1       2       1       2
  3       1       3       2       2
  3       2       1       1       1
  3       3       2       1       2
  
  See: 
  sort (Maxima Documentation)
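
Lexicographic row sorting can be sketched in Python by sorting row numbers with the rows themselves as keys. This reproduces the example above; lexsort_sketch is a hypothetical name and the returned indices are 1-based.

```python
def lexsort_sketch(A):
    """Sort the rows of A lexicographically; return (sorted rows, 1-based index)."""
    order = sorted(range(len(A)), key=lambda k: A[k])  # lists compare lexicographically
    return [A[k] for k in order], [k + 1 for k in order]

A = [[2, 1, 2, 1, 2],
     [1, 3, 3, 1, 2],
     [3, 3, 2, 1, 2],
     [3, 1, 3, 2, 2],
     [3, 2, 1, 1, 1]]
As, i = lexsort_sketch(A)
for row in As:
    print(row)
print(i)  # [2, 1, 4, 5, 3]
```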

function overwrite unique (v)

  Unique elements in v
  
  >v=intrandom(10,12)
  [6,  2,  3,  9,  6,  5,  7,  7,  10,  2]
  >unique(v)
  [2,  3,  5,  6,  7,  9,  10]
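
As the example shows, the result is sorted. In Python the same result can be obtained from a set, as in this small sketch (unique_sketch is a hypothetical name).

```python
def unique_sketch(v):
    """Sorted distinct elements of v."""
    return sorted(set(v))

print(unique_sketch([6, 2, 3, 9, 6, 5, 7, 7, 10, 2]))  # [2, 3, 5, 6, 7, 9, 10]
```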

function comment find (v,x)

  Find x in the intervals of the sorted vector v
  
  Returns the index i such that v[i] <= x < v[i+1]. It returns 0 for
  elements smaller than v[1], and length(v) for elements larger than or
  equal to the last element of v. The function maps over the elements
  of x.
  
  The function works for sorted vectors of strings v, and strings or
  string vectors x using alphabetic (ASCII) string comparison.
  
  >s=random(10)
  [0.270906,  0.704419,  0.217693,  0.445363,  0.308411,  0.914541,
  0.193585,  0.463387,  0.095153,  0.595017]
  >v=0.2:0.2:0.8
  [0.2,  0.4,  0.6,  0.8]
  >find(v,s)
  [1,  3,  1,  2,  1,  4,  0,  2,  0,  2]
  
  See: 
  indexof (Statistics with Euler Math Toolbox), 
  indexofsorted (Statistics with Euler Math Toolbox)
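
The index described above equals the number of elements of v that are less than or equal to x. In Python this count is what bisect.bisect_right returns. The sketch below (hypothetical name find_sketch) reproduces the example and also works for sorted string lists.

```python
from bisect import bisect_right

def find_sketch(v, xs):
    """For each x: the index i with v[i] <= x < v[i+1] (1-based),
    0 if x < v[1], and len(v) if x >= the last element of v."""
    # bisect_right(v, x) counts the elements of v that are <= x.
    return [bisect_right(v, x) for x in xs]

s = [0.270906, 0.704419, 0.217693, 0.445363, 0.308411,
     0.914541, 0.193585, 0.463387, 0.095153, 0.595017]
v = [0.2, 0.4, 0.6, 0.8]
print(find_sketch(v, s))  # [1, 3, 1, 2, 1, 4, 0, 2, 0, 2]
```

Because the search is a binary search, each lookup takes O(log n) comparisons, which is why v has to be sorted.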

function comment count (v,n)

  Counts v[i] in the integer intervals [i-1,i[ up to n
  
  Returns a vector c, where c[i] is the number of elements of v in
  the interval [i-1,i[ for 1<=i<=n.
  
  >count([0,0.1,0.2,1,1.5,2],2)
  [3,  2]
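
The half-open intervals explain the example: 2 lies outside [0,1[ and [1,2[, so it is not counted. This Python sketch (hypothetical name count_sketch) assumes that elements outside [0,n[ are simply ignored.

```python
import math

def count_sketch(v, n):
    """c[i] = number of elements of v in [i-1, i[ for i = 1..n;
    elements outside [0, n[ are ignored (an assumption)."""
    c = [0] * n
    for x in v:
        k = math.floor(x)  # x lies in [k, k+1[, which is interval number k+1
        if 0 <= k < n:
            c[k] += 1
    return c

print(count_sketch([0, 0.1, 0.2, 1, 1.5, 2], 2))  # [3, 2]
```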

function comment indexof (v,x)

  Find x in the vector v
  
  Find the first occurrence of x in the vector v and return its index,
  or 0 if x does not occur. Maps over the elements of x.
  
  >v=intrandom(10,10)
  [6,  5,  2,  2,  3,  8,  5,  4,  4,  2]
  >indexof(v,1:10)
  [0,  3,  5,  8,  2,  1,  0,  6,  0,  0]
  >indexof(["This","is","a","test"],"a")
  3
  
  See: 
  indexofsorted (Statistics with Euler Math Toolbox), 
  find (Statistics with Euler Math Toolbox)
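
For an unsorted vector, one pass that records the first position of every value is sufficient. The following Python sketch (hypothetical name indexof_sketch, 1-based positions) reproduces both examples above.

```python
def indexof_sketch(v, xs):
    """1-based index of the first occurrence of each x in v, 0 if absent."""
    first = {}
    for k, value in enumerate(v, start=1):
        first.setdefault(value, k)  # keep only the first position
    return [first.get(x, 0) for x in xs]

v = [6, 5, 2, 2, 3, 8, 5, 4, 4, 2]
print(indexof_sketch(v, range(1, 11)))  # [0, 3, 5, 8, 2, 1, 0, 6, 0, 0]
print(indexof_sketch(["This", "is", "a", "test"], ["a"]))  # [3]
```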

function comment indexofsorted (v,x)

  Find x in the sorted vector v
  
  Find the last occurrence of x in the sorted vector v and return its
  index, or 0 if x does not occur. Note that indexof returns the first
  occurrence. Maps over the elements of x.
  
  >v=sort(intrandom(10,10))
  [3,  4,  5,  5,  5,  6,  8,  8,  9,  10]
  >indexofsorted(v,1:10)
  [0,  0,  1,  2,  5,  6,  0,  8,  9,  10]
  
  See: 
  find (Statistics with Euler Math Toolbox)
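
Because v is sorted, a binary search can locate the last occurrence. The Python sketch below (hypothetical name indexofsorted_sketch) uses bisect_right and then checks that the element found really equals x. It reproduces the example above.

```python
from bisect import bisect_right

def indexofsorted_sketch(v, xs):
    """1-based index of the last occurrence of each x in the sorted v, 0 if absent."""
    result = []
    for x in xs:
        k = bisect_right(v, x)  # number of elements <= x
        result.append(k if k > 0 and v[k - 1] == x else 0)
    return result

v = [3, 4, 5, 5, 5, 6, 8, 8, 9, 10]
print(indexofsorted_sketch(v, range(1, 11)))  # [0, 0, 1, 2, 5, 6, 0, 8, 9, 10]
```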

function comment multofsorted (v, x)

  Counts x in the sorted vector v
  
  Returns the number of occurrences of each element of x in v. The
  function maps over the elements of x.
  
  >v=intrandom(1000,10); multofsorted(sort(v),1:10), sum(%)
  [88,  84,  126,  86,  110,  104,  86,  103,  113,  100]
  1000
  
  See: 
  getmultiplicities (Statistics with Euler Math Toolbox), 
  getfrequencies (Statistics with Euler Math Toolbox)
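
In a sorted vector, the count of x is the distance between the leftmost and rightmost insertion points of x. A Python sketch of this idea follows (hypothetical name multofsorted_sketch; whether EMT uses exactly this method is not stated here).

```python
from bisect import bisect_left, bisect_right

def multofsorted_sketch(v, xs):
    """Number of occurrences of each x in the sorted list v."""
    return [bisect_right(v, x) - bisect_left(v, x) for x in xs]

v = sorted([3, 1, 2, 3, 3, 1])
print(multofsorted_sketch(v, [1, 2, 3, 4]))  # [2, 1, 3, 0]
```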

function getfrequencies (x:real vector, r: real vector)

  Count the elements of x in the intervals of the sorted vector r.
  
  The function returns the number of x[j] in the intervals r[i-1] to
  r[i].
  
  x : real row vector (1xn)
  r : real sorted row vector (1xm)
  
  Returns the frequencies f as a row vector (1x(m-1))
  
  See: 
  count (Statistics with Euler Math Toolbox), 
  histo (Statistics with Euler Math Toolbox), 
  multofsorted (Statistics with Euler Math Toolbox), 
  getmultiplicities (Statistics with Euler Math Toolbox)
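
The documentation does not say which interval endpoints are included. The Python sketch below assumes half-open intervals [r[i-1], r[i][ and ignores values outside [r[1], r[m][. Both choices are assumptions, and getfrequencies_sketch is a hypothetical name.

```python
from bisect import bisect_right

def getfrequencies_sketch(x, r):
    """f[i] = number of x[j] in [r[i], r[i+1]) (0-based), len(f) == len(r) - 1.
    Half-open intervals are an assumption; values outside [r[0], r[-1]) are dropped."""
    f = [0] * (len(r) - 1)
    for value in x:
        k = bisect_right(r, value) - 1  # interval whose left end is <= value
        if 0 <= k < len(f):
            f[k] += 1
    return f

print(getfrequencies_sketch([0.5, 1.5, 1.7, 2.0, 3.5], [0, 1, 2, 3]))  # [1, 2, 1]
```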

function getmultiplicities (x, y, sorted=0)

  Counts how often the elements of x appear in y.
  
  This works for string vectors and for real vectors.
  
  sorted : if true, then y is assumed to be sorted.
  
  See: 
  count (Statistics with Euler Math Toolbox), 
  getfrequencies (Statistics with Euler Math Toolbox), 
  multofsorted (Statistics with Euler Math Toolbox)
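
For unsorted data, a hash-based counter gives the same result in one pass over y. This is a Python sketch with the hypothetical name getmultiplicities_sketch. It does not model the sorted flag, for which a binary search similar to multofsorted would be the natural choice.

```python
from collections import Counter

def getmultiplicities_sketch(x, y):
    """For each element of x, how often it appears in y."""
    counts = Counter(y)
    return [counts[e] for e in x]  # Counter returns 0 for missing keys

print(getmultiplicities_sketch(["a", "b", "z"], ["a", "b", "a", "c"]))  # [2, 1, 0]
```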

function getstatistics (x:real vector, y:real vector=none)

  Return statistics of the values in the vector x.
  
  If y is none, the function returns {xu,mu}, where xu are the
  unique elements of x, and mu are the multiplicities of these
  values.
  
  Otherwise, the function returns {xu,yu,M}, where xu are the unique
  elements of x, yu the unique elements of y, and M is the table of
  multiplicities of the pairs (xu[i],yu[j]) among the pairs
  (x[k],y[k]), k=1,...,n.
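
Both return forms can be sketched in Python with a Counter. This sketch assumes that, as with unique, the unique values are returned in sorted order. The name getstatistics_sketch is hypothetical.

```python
from collections import Counter

def getstatistics_sketch(x, y=None):
    """Without y: (xu, mu), the sorted unique values of x and their multiplicities.
    With y: (xu, yu, M), where M[i][j] counts k with (x[k], y[k]) == (xu[i], yu[j])."""
    xu = sorted(set(x))
    if y is None:
        c = Counter(x)
        return xu, [c[u] for u in xu]
    yu = sorted(set(y))
    pairs = Counter(zip(x, y))  # multiplicity of each observed pair
    M = [[pairs[(a, b)] for b in yu] for a in xu]
    return xu, yu, M

print(getstatistics_sketch([1, 2, 2, 3]))          # ([1, 2, 3], [1, 2, 1])
print(getstatistics_sketch([1, 1, 2], [5, 6, 5]))  # ([1, 2], [5, 6], [[1, 1], [1, 0]])
```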

function args histo (d:real vector, n:index=10,
    integer:integer=0, even:integer=0, v:real vector=none,
    bar=1)

  Computes {x,y} for histogram plots.
  
  d : 1xm vector of data
  
  Returns {x,y} with
  
  x - End points of the intervals (equispaced n+1 points)
  y - The number of data in the subintervals (frequencies)
  
  integer : flag for distributions on integers
  even : flag for evenly spaced discrete distributions
  This is used by plot2d for bar styles.
  
  v : optional interval boundaries (ordered).
  
  bar : If true, the function returns two vectors for >bar in plot2d.
  If false, it returns a sawtooth function for plot2d.
  
  The plot function plot2d has parameters distribution=1, histogram=1
  to achieve the same effect.
  
  See: 
  plot2d (Plot Functions), 
  plot2d (Maxima Documentation)
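
The default case of histo can be sketched in Python, assuming that the n+1 equispaced end points run from min(d) to max(d) and that the maximum value is counted in the last interval. Both are assumptions. The integer, even, v and bar options are not modeled, and histo_sketch is a hypothetical name.

```python
def histo_sketch(d, n=10):
    """Sketch of histo(d, n): n equispaced intervals from min(d) to max(d).
    Returns (x, y): the n+1 end points and the n counts. Assumes max(d) > min(d)."""
    lo, hi = min(d), max(d)
    width = (hi - lo) / n
    x = [lo + k * width for k in range(n + 1)]
    y = [0] * n
    for value in d:
        k = int((value - lo) / width)
        y[min(k, n - 1)] += 1  # the maximum goes into the last interval
    return x, y

x, y = histo_sketch([1, 2, 2, 3, 3, 3, 4], n=3)
print(x, y)  # [1.0, 2.0, 3.0, 4.0] [1, 2, 4]
```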

function find2 (x:vector, y:vector,
    vx:vector=none, vy:vector=none, n:integer=none)

  Matrix of counts of the pairs (x[i],y[i]) in the cells defined by
  the bounds.
  
  x,y : Vectors of the same size.
  vx,vy : Sorted vectors of bounds, if present (they must enclose x and
  y, respectively)
  n : If vx or vy is not present, number of intervals between the bounds of x.
  
  Returns a matrix with counts.
  
  See: 
  columnsplot3d (Statistics with Euler Math Toolbox)
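
The case with explicit bounds vx and vy is a two-dimensional histogram. The Python sketch below locates each coordinate with a binary search. It assumes half-open cells and places a value equal to the last bound in the last cell; both choices are assumptions. The n option is not modeled, and find2_sketch is a hypothetical name.

```python
from bisect import bisect_right

def find2_sketch(x, y, vx, vy):
    """C[i][j] = number of k with x[k] in cell i of vx and y[k] in cell j of vy
    (0-based, half-open cells; the last bound belongs to the last cell)."""
    def cell(bounds, value):
        if value < bounds[0] or value > bounds[-1]:
            raise ValueError("bounds must enclose the data")
        k = bisect_right(bounds, value) - 1
        return min(k, len(bounds) - 2)  # clamp the upper end into the last cell

    C = [[0] * (len(vy) - 1) for _ in range(len(vx) - 1)]
    for a, b in zip(x, y):
        C[cell(vx, a)][cell(vy, b)] += 1
    return C

C = find2_sketch([0.5, 1.5, 1.5, 2.0], [0.2, 0.8, 0.9, 0.1], [0, 1, 2], [0, 0.5, 1])
print(C)  # [[1, 0], [1, 2]]
```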

Confidence Intervals

function cinormal (mean:numerical, sigma:numerical, alpha=0.05)

  Confidence interval for known mean and standard deviation.
  
  See: 
  cimean (Statistics with Euler Math Toolbox)

function cimean (data: real vector, alpha=0.05)

  Confidence interval for the mean of normally distributed data
  
  This is a symmetric interval around the mean value of the data
  containing the true mean of the random experiment in 95% (default
  alpha=0.05) of the cases. The data are assumed to come from
  independent, identically normally distributed random variables.
  
function clopperpearson (k, n, alpha=0.05)

  Clopper-Pearson confidence interval for k hits in n trials.
  
  The upper bound of the interval is such that P(X<=k,p)=alpha/2, the
  lower bound such that P(X>=k,p)=alpha/2. In other words, if p lies
  outside the interval, then observing k or a more extreme count has
  probability less than alpha/2 in one tail. This interval estimator
  yields an interval which contains the true p in at least 95% (default
  alpha=0.05) of the cases.
  
  >clopperpearson(20,400)
  [0.0308831,  0.076167]
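
The two defining equations can be solved directly with the exact binomial distribution function and bisection, because P(X<=k,p) decreases in p. This Python sketch uses only the standard library. It is not EMT's code, the names are hypothetical, and its output for clopper_pearson_sketch(20, 400) should be close to the example above.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def solve_decreasing(f, target, lo=0.0, hi=1.0, steps=100):
    """Bisection for f(p) = target, where f is decreasing in p."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # f is still too large, so the root lies to the right
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson_sketch(k, n, alpha=0.05):
    # Lower bound: P(X >= k; p) = alpha/2, i.e. P(X <= k-1; p) = 1 - alpha/2.
    lower = 0.0 if k == 0 else solve_decreasing(
        lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    # Upper bound: P(X <= k; p) = alpha/2.
    upper = 1.0 if k == n else solve_decreasing(
        lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper

print(clopper_pearson_sketch(20, 400))
```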
