# Binning is sinning

*You should never bin your data*, you hear people saying.

But what if you really, really want to?

## let’s bin!

First, we create some fake, noisy sinusoidal data:

```
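import numpy as np
# 1000 evenly spaced sample times between 0 and 10
t = np.linspace(0, 10, 1000)
# sine wave of amplitude 5 plus unit-variance Gaussian noise
y = 5*np.sin(t) + np.random.randn(t.size)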
```

which we now want to group into bins of width 0.5 along `t`.

```
width = 0.5
bins = np.arange(t.min(), t.max()+width, width)
```
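As a quick sanity check (a sketch, not part of the original recipe), we can look at the edges this produces. `binned_statistic` needs one more edge than there are bins, so 20 bins means 21 edges. The `edges_alt` name is just for illustration: building the edges with `np.linspace` is an alternative that avoids the floating-point surprises `np.arange` can have with widths like 0.1.

```python
import numpy as np

t = np.linspace(0, 10, 1000)
width = 0.5

# same edges as above: 0, 0.5, ..., 10
bins = np.arange(t.min(), t.max() + width, width)
n_bins = bins.size - 1
print(bins[0], bins[-1], n_bins)

# alternative (illustrative): ask for the number of edges explicitly
edges_alt = np.linspace(t.min(), t.max(), n_bins + 1)
print(np.allclose(bins, edges_alt))
```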

`scipy` has a handy function for this:

```
from scipy.stats import binned_statistic
```

which can be used as

```
binned_y = binned_statistic(t, y, bins=bins)[0]
```
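The `[0]` is there because `binned_statistic` returns three things: the statistic in each bin, the bin edges, and the bin number of every input point. Here is a small sketch that uses the named fields instead and cross-checks the result by hand. The seeded `rng` is my own addition so the numbers are reproducible.

```python
import numpy as np
from scipy.stats import binned_statistic

rng = np.random.default_rng(0)  # seeded only for reproducibility
t = np.linspace(0, 10, 1000)
y = 5*np.sin(t) + rng.standard_normal(t.size)

width = 0.5
bins = np.arange(t.min(), t.max() + width, width)

result = binned_statistic(t, y, bins=bins)
binned_y = result.statistic    # same as result[0]
edges = result.bin_edges       # the edges that were actually used
binnumber = result.binnumber   # bin index of each point, starting at 1

# recompute the per-bin means by hand from the bin numbers
manual = np.array([y[binnumber == i].mean() for i in range(1, bins.size)])
print(np.allclose(manual, binned_y))
```

Note that `binnumber` starts at 1, not 0: index 0 is reserved for points that fall below the first edge.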

Let’s plot the result:

```
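import matplotlib.pyplot as plt
# raw data, semi-transparent so the binned curve stands out
plt.plot(t, y, alpha=0.5)
# binned means, plotted at the center of each bin
plt.plot(bins[1:] - width/2., binned_y, 'o-')
plt.show()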
```

## more options

By default, `binned_statistic` calculates the *mean* of the data inside each bin.
But we can calculate other statistics, as explained in the documentation.

The optional `statistic` argument can be:

```
statistic : string or callable, optional
    The statistic to compute (default is 'mean').
    The following statistics are available:

      'mean' : compute the mean of values for points within each bin.
      'median' : compute the median of values for points within each bin.
      'count' : compute the count of points within each bin.
          This is identical to an unweighted histogram.
      'sum' : compute the sum of values for points within each bin.
          This is identical to a weighted histogram.
      'min' : compute the minimum of values for points within each bin.
      'max' : compute the maximum of values for point within each bin.
      function : a user-defined function which takes a 1D array of values,
          and outputs a single numerical statistic.
          This function will be called on the values in each bin.
          Empty bins will be represented by function([]), or NaN if this
          returns an error.
```
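To make this concrete, here is a sketch that tries a few of these options on the same kind of fake data. The `standard_error` helper is my own illustrative function, not something `scipy` provides, and the seeded `rng` is only there for reproducibility.

```python
import numpy as np
from scipy.stats import binned_statistic

rng = np.random.default_rng(0)  # seeded only for reproducibility
t = np.linspace(0, 10, 1000)
y = 5*np.sin(t) + rng.standard_normal(t.size)

width = 0.5
bins = np.arange(t.min(), t.max() + width, width)

# built-in statistics, selected by name
median_y = binned_statistic(t, y, statistic='median', bins=bins).statistic
counts = binned_statistic(t, y, statistic='count', bins=bins).statistic

# user-defined statistic: standard error of the mean in each bin
def standard_error(values):
    values = np.asarray(values)
    return np.std(values, ddof=1) / np.sqrt(values.size)

sem_y = binned_statistic(t, y, statistic=standard_error, bins=bins).statistic

print(counts.sum())  # every point lands in exactly one bin
print(sem_y.max())
```

The per-bin standard errors are a natural thing to pass as `yerr` to `plt.errorbar` when plotting the binned means.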

## wrap-up

That’s it, a simple handy function to bin data!

Let me know in the comments if this is helpful, wrong or super-duper cool.