import dask.dataframe as dd
import pandas as pd
import numpy as np

20 Aggregate
df = pd.DataFrame([('bird', 'Falconiformes', 389.0),
                   ('bird', 'Psittaciformes', 24.0),
                   ('mammal', 'Carnivora', 80.2),
                   ('mammal', 'Primates', np.nan),
                   ('mammal', 'Carnivora', 58)],
                  index=['falcon', 'parrot', 'lion', 'monkey', 'leopard'],
                  columns=('class', 'order', 'max_speed'))
df = dd.from_pandas(df, npartitions=1)
df

Dask DataFrame Structure:
| | class | order | max_speed |
|---|---|---|---|
| npartitions=1 | | | |
| falcon | string | string | float64 |
| parrot | ... | ... | ... |
Dask Name: frompandas, 1 expression
20.0.1 groupby(): group by categorical
grouped = df.groupby('class')
grouped2 = df.groupby(['class', 'order'])
grouped.size().compute()

class
bird      2
mammal    3
dtype: int64
grouped2.mean().compute()

| class | order | max_speed |
|---|---|---|
| bird | Falconiformes | 389.0 |
| | Psittaciformes | 24.0 |
| mammal | Carnivora | 69.1 |
| | Primates | NaN |
Or chain the grouping and the aggregation in a single step:
df.groupby(['class', 'order']).mean().compute()

| class | order | max_speed |
|---|---|---|
| bird | Falconiformes | 389.0 |
| | Psittaciformes | 24.0 |
| mammal | Carnivora | 69.1 |
| | Primates | NaN |