Python code to split probabilities/numbers into multiple bins

There are several ways to split hundreds of probability values or numbers into 10 or 100 bins. Here I show a short Python snippet that does this with a few functions from the NumPy library. The code generates ten bins and divides 100 float values among them. The first bin contains values from 0 up to (but not including) 0.1; the second bin contains values from 0.1 up to 0.2; the third from 0.2 up to 0.3; and so on.
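To make the bin boundaries concrete, here is a minimal sketch on a handful of hand-picked values (the names sample, edges, and sample_idx are made up for this illustration and are not part of the code that follows). It shows that a value sitting exactly on a boundary, such as 0.1 or 0.2, lands in the bin that starts at that boundary.

```python
import numpy as np

# Hand-picked values (assumed for illustration), two of them exactly on an edge
sample = [0.05, 0.1, 0.19, 0.2, 0.55, 0.99]
edges = np.asarray([i / 10 for i in range(11)])  # 0.0, 0.1, ..., 1.0

# right=False means bin k covers edges[k - 1] <= x < edges[k]
sample_idx = np.digitize(sample, edges, right=False)
print(sample_idx.tolist())  # [1, 2, 2, 3, 6, 10]
```

Bin numbers here start at 1 because index 0 is reserved for values below the first edge.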

  • Generate 100 float numbers (probabilities) and sort them in ascending order:
import random
random.seed(100)
probs = sorted([random.uniform(0, 1) for _ in range(100)])
Output: [0.00564508589179924, 0.01252727724286462, 0.047887516459941715, 0.050720324978788645, 0.08015370917850195, 0.11402992513415988, 0.12100318609031835, 0.12409328058844371, 0.12661520856523822, 0.1456692551041303, 0.14718865646585277, 0.16042180997493383, 0.16112705356908175, 0.16470458399195842, 0.17846076295399127, 0.18859491417448548, 0.19264065598707336, 0.1928283484433999, 0.20386952877685705, 0.20914567294674224, 0.21083399208685016, 0.21914663080812313, 0.23722506292270407, 0.25253291606751593, 0.2898552990686951, 0.3094059291400342, 0.30985864642647365, 0.3337407295084719, 0.33535077594001006, 0.34326827930271964, 0.34700445361481724, 0.36299885808263155, 0.3898450339202282, 0.3955161806017494, 0.40060947595683594, 0.43351443489540376, 0.44819248318227456, 0.45492700451402135, 0.45594588118356716, 0.47671096723338546, 0.48356388808590334, 0.4934819258874641, 0.5071130373622724, 0.5135789722286814, 0.5252832108667903, 0.5281792333857197, 0.5329014146425713, 0.5450878369724225, 0.5539516192440809, 0.555399665801069, 0.566157707292886, 0.568213428238215, 0.5826845113501725, 0.5903745691382535, 0.5953635341405692, 0.6091002318791362, 0.6184843851752674, 0.6211733908687608, 0.6263216391927974, 0.6377998063140823, 0.648542908120573, 0.6508305207214754, 0.6515067326609668, 0.6597342762606756, 0.6759756847124709, 0.7008343372138875, 0.705513226934028, 0.7061820603607597, 0.7132090415917072, 0.7147436398076238, 0.7173040778848189, 0.7201063081299326, 0.7319589730332557, 0.7487490901889821, 0.7538770308729308, 0.7615016181911767, 0.7669579080335924, 0.7680181487450805, 0.7707838056590222, 0.773500702168781, 0.7749555999671643, 0.7987538611432622, 0.8000204571334277, 0.8138184927502465, 0.8180181933574304, 0.8411747144619733, 0.8473725082277769, 0.8541716556500671, 0.8613410616424554, 0.8904569543890624, 0.9011520429873923, 0.9023430227313984, 0.9092967860914605, 0.9137768422492283, 0.9329624000750505, 0.9364167906644235, 0.9470780060029439, 0.9561006461166511, 
0.9633157837008631, 0.9685239944790129]
  • Define the bin edges (0.0, 0.1, ..., 1.0) that mark the boundaries between bins:
import numpy as np
bins = np.asarray([i / 10 for i in range(0, 11, 1)])
  • Find the index of the bin to which each value in the input list (probs) belongs. With right=False, a value equal to a bin edge goes into the bin that starts at that edge:
bins_idx = np.digitize(probs, bins, right=False)
  • Count the number of items in each bin (index 0 counts values below the first edge, 0.0, so it is 0 here):
bin_count = np.bincount(bins_idx)
Output: [ 0  5 13  7  9  8 13 10 17  8 10]
  • Split the sorted numbers into bins, using the cumulative bin counts as slice boundaries:
idx = np.cumsum(bin_count)
prob_bins = [probs[idx[i]:idx[i + 1]] for i in range(len(idx) - 1)]
Output: [[0.00564508589179924, 0.01252727724286462, 0.047887516459941715, 0.050720324978788645, 0.08015370917850195], [0.11402992513415988, 0.12100318609031835, 0.12409328058844371, 0.12661520856523822, 0.1456692551041303, 0.14718865646585277, 0.16042180997493383, 0.16112705356908175, 0.16470458399195842, 0.17846076295399127, 0.18859491417448548, 0.19264065598707336, 0.1928283484433999], [0.20386952877685705, 0.20914567294674224, 0.21083399208685016, 0.21914663080812313, 0.23722506292270407, 0.25253291606751593, 0.2898552990686951], [0.3094059291400342, 0.30985864642647365, 0.3337407295084719, 0.33535077594001006, 0.34326827930271964, 0.34700445361481724, 0.36299885808263155, 0.3898450339202282, 0.3955161806017494], [0.40060947595683594, 0.43351443489540376, 0.44819248318227456, 0.45492700451402135, 0.45594588118356716, 0.47671096723338546, 0.48356388808590334, 0.4934819258874641], [0.5071130373622724, 0.5135789722286814, 0.5252832108667903, 0.5281792333857197, 0.5329014146425713, 0.5450878369724225, 0.5539516192440809, 0.555399665801069, 0.566157707292886, 0.568213428238215, 0.5826845113501725, 0.5903745691382535, 0.5953635341405692], [0.6091002318791362, 0.6184843851752674, 0.6211733908687608, 0.6263216391927974, 0.6377998063140823, 0.648542908120573, 0.6508305207214754, 0.6515067326609668, 0.6597342762606756, 0.6759756847124709], [0.7008343372138875, 0.705513226934028, 0.7061820603607597, 0.7132090415917072, 0.7147436398076238, 0.7173040778848189, 0.7201063081299326, 0.7319589730332557, 0.7487490901889821, 0.7538770308729308, 0.7615016181911767, 0.7669579080335924, 0.7680181487450805, 0.7707838056590222, 0.773500702168781, 0.7749555999671643, 0.7987538611432622], [0.8000204571334277, 0.8138184927502465, 0.8180181933574304, 0.8411747144619733, 0.8473725082277769, 0.8541716556500671, 0.8613410616424554, 0.8904569543890624], [0.9011520429873923, 0.9023430227313984, 0.9092967860914605, 0.9137768422492283, 0.9329624000750505, 0.9364167906644235, 0.9470780060029439, 
0.9561006461166511, 0.9633157837008631, 0.9685239944790129]]

From the output, you can see that the code split the 100 float numbers into ten bins, and that every value in a bin lies between that bin's lower and upper boundary.
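The boundary claim can also be checked in code rather than by eye. The following is a rough sketch that repeats the steps above and then tests every value against its bin's edges; the names within_bounds and total are introduced here only for the check.

```python
import random
import numpy as np

# Repeat the steps above
random.seed(100)
probs = sorted(random.uniform(0, 1) for _ in range(100))
bins = np.asarray([i / 10 for i in range(11)])
bin_count = np.bincount(np.digitize(probs, bins, right=False))
idx = np.cumsum(bin_count)
prob_bins = [probs[idx[i]:idx[i + 1]] for i in range(len(idx) - 1)]

# Bin k (0-based) should only hold values with bins[k] <= v < bins[k + 1]
within_bounds = all(
    bins[k] <= v < bins[k + 1]
    for k, chunk in enumerate(prob_bins)
    for v in chunk
)
total = sum(len(chunk) for chunk in prob_bins)
print(within_bounds, total)  # True 100
```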

The complete code is as follows:

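import random

import numpy as np

# generate 100 random float numbers and sort them in ascending order
random.seed(100)
probs = sorted([random.uniform(0, 1) for _ in range(100)])
# print(probs)
# start binning: bin edges are 0.0, 0.1, ..., 1.0
bins = np.asarray([i / 10 for i in range(0, 11, 1)])
# bin index of each value (1..10 for values in [0, 1))
bins_idx = np.digitize(probs, bins, right=False)
# number of values per bin index (index 0 = values below 0.0)
bin_count = np.bincount(bins_idx)
# print(bin_count)
# cumulative counts are the slice boundaries into the sorted list
idx = np.cumsum(bin_count)
# print(len(idx))
prob_bins = [probs[idx[i]:idx[i + 1]] for i in range(len(idx) - 1)]
print(prob_bins)
print(sum(len(b) for b in prob_bins))  # check if total elements in all bins = len(probs)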
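The slicing approach above depends on probs being sorted and on every bin receiving at least one value. As one possible variant, here is a sketch that groups values with boolean masks instead, so the input can be unsorted and empty bins are kept. The function name split_into_bins and its n_bins parameter are hypothetical, and placing an exact 1.0 in the last bin is my own assumption rather than something the code above does.

```python
import numpy as np

def split_into_bins(values, n_bins=10):
    """Group values in [0, 1] into n_bins equal-width bins (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    edges = np.arange(n_bins + 1) / n_bins  # 0.0, 0.1, ..., 1.0 for n_bins=10
    # Shift to 0-based labels; clip so an exact 1.0 falls into the last bin (assumption)
    labels = np.clip(np.digitize(values, edges, right=False) - 1, 0, n_bins - 1)
    # One list per bin, in input order; empty bins stay as empty lists
    return [values[labels == b].tolist() for b in range(n_bins)]

data = [0.95, 0.05, 0.1, 1.0, 0.55]  # unsorted, illustrative values
result = split_into_bins(data)
print(result[0], result[1], result[9])  # [0.05] [0.1] [0.95, 1.0]
```

The mask-based version scans the data once per bin, so it trades a little speed for not needing sorted input.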
