Vous avez dans le fichier zip ci-joint un dossier contenant des données de ventes d'une entreprise sur l'année 2019. Vous avez un fichier csv par mois. Chaque fichier contient les colonnes suivantes:
Vous êtes appelé à analyser ces données pour répondre aux questions suivantes:
Accompagnez vos solutions de visualisations parlantes afin d'aider le business à améliorer ses chiffres de l'année suivante.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import os
import warnings
warnings.filterwarnings('ignore')
df = pd.read_csv("./Sales_Data/Sales_April_2019.csv")
df
Order ID | Product | Quantity Ordered | Price Each | Order Date | Purchase Address | |
---|---|---|---|---|---|---|
0 | 176558 | USB-C Charging Cable | 2 | 11.95 | 04/19/19 08:46 | 917 1st St, Dallas, TX 75001 |
1 | NaN | NaN | NaN | NaN | NaN | NaN |
2 | 176559 | Bose SoundSport Headphones | 1 | 99.99 | 04/07/19 22:30 | 682 Chestnut St, Boston, MA 02215 |
3 | 176560 | Google Phone | 1 | 600 | 04/12/19 14:38 | 669 Spruce St, Los Angeles, CA 90001 |
4 | 176560 | Wired Headphones | 1 | 11.99 | 04/12/19 14:38 | 669 Spruce St, Los Angeles, CA 90001 |
... | ... | ... | ... | ... | ... | ... |
18378 | 194090 | Google Phone | 1 | 600 | 04/08/19 17:11 | 177 Jackson St, Los Angeles, CA 90001 |
18379 | 194091 | AA Batteries (4-pack) | 1 | 3.84 | 04/15/19 16:02 | 311 Forest St, Austin, TX 73301 |
18380 | 194092 | AAA Batteries (4-pack) | 2 | 2.99 | 04/28/19 14:36 | 347 Sunset St, San Francisco, CA 94016 |
18381 | 194093 | AA Batteries (4-pack) | 1 | 3.84 | 04/14/19 15:09 | 835 Lake St, Portland, OR 97035 |
18382 | 194094 | Lightning Charging Cable | 1 | 14.95 | 04/18/19 11:08 | 354 North St, Boston, MA 02215 |
18383 rows × 6 columns
data = ("./Sales_Data/")
os.listdir(data)
['Sales_April_2019.csv', 'Sales_August_2019.csv', 'Sales_December_2019.csv', 'Sales_February_2019.csv', 'Sales_January_2019.csv', 'Sales_July_2019.csv', 'Sales_June_2019.csv', 'Sales_March_2019.csv', 'Sales_May_2019.csv', 'Sales_November_2019.csv', 'Sales_October_2019.csv', 'Sales_September_2019.csv']
janvier_sales = pd.read_csv(data + 'Sales_January_2019.csv')
janvier_sales.head()
Order ID | Product | Quantity Ordered | Price Each | Order Date | Purchase Address | |
---|---|---|---|---|---|---|
0 | 141234 | iPhone | 1 | 700 | 01/22/19 21:25 | 944 Walnut St, Boston, MA 02215 |
1 | 141235 | Lightning Charging Cable | 1 | 14.95 | 01/28/19 14:15 | 185 Maple St, Portland, OR 97035 |
2 | 141236 | Wired Headphones | 2 | 11.99 | 01/17/19 13:33 | 538 Adams St, San Francisco, CA 94016 |
3 | 141237 | 27in FHD Monitor | 1 | 149.99 | 01/05/19 20:33 | 738 10th St, Los Angeles, CA 90001 |
4 | 141238 | Wired Headphones | 1 | 11.99 | 01/25/19 11:59 | 387 10th St, Austin, TX 73301 |
janvier_sales.describe()
Order ID | Product | Quantity Ordered | Price Each | Order Date | Purchase Address | |
---|---|---|---|---|---|---|
count | 9697 | 9697 | 9697 | 9697 | 9697 | 9697 |
unique | 9269 | 20 | 8 | 19 | 8077 | 9161 |
top | Order ID | USB-C Charging Cable | 1 | 11.95 | Order Date | Purchase Address |
freq | 16 | 1171 | 8795 | 1171 | 16 | 16 |
janvier_sales.columns
Index(['Order ID', 'Product', 'Quantity Ordered', 'Price Each', 'Order Date', 'Purchase Address'], dtype='object')
files = [fichier for fichier in os.listdir(data) if fichier.endswith('.csv')]
files
['Sales_April_2019.csv', 'Sales_August_2019.csv', 'Sales_December_2019.csv', 'Sales_February_2019.csv', 'Sales_January_2019.csv', 'Sales_July_2019.csv', 'Sales_June_2019.csv', 'Sales_March_2019.csv', 'Sales_May_2019.csv', 'Sales_November_2019.csv', 'Sales_October_2019.csv', 'Sales_September_2019.csv']
df = pd.DataFrame()
for file in files:
data_interm = pd.read_csv(data + file)
df = pd.concat([df, data_interm])
df.shape
(186850, 6)
df.describe()
Order ID | Product | Quantity Ordered | Price Each | Order Date | Purchase Address | |
---|---|---|---|---|---|---|
count | 186305 | 186305 | 186305 | 186305 | 186305 | 186305 |
unique | 178438 | 20 | 10 | 24 | 142396 | 140788 |
top | Order ID | USB-C Charging Cable | 1 | 11.95 | Order Date | Purchase Address |
freq | 355 | 21903 | 168552 | 21903 | 355 | 355 |
df.info()
<class 'pandas.core.frame.DataFrame'> Int64Index: 186850 entries, 0 to 11685 Data columns (total 6 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Order ID 186305 non-null object 1 Product 186305 non-null object 2 Quantity Ordered 186305 non-null object 3 Price Each 186305 non-null object 4 Order Date 186305 non-null object 5 Purchase Address 186305 non-null object dtypes: object(6) memory usage: 10.0+ MB
df.isnull().sum(axis=0)
Order ID 545 Product 545 Quantity Ordered 545 Price Each 545 Order Date 545 Purchase Address 545 dtype: int64
val_manquantes = df[df.isnull().any(axis=1)]
val_manquantes
Order ID | Product | Quantity Ordered | Price Each | Order Date | Purchase Address | |
---|---|---|---|---|---|---|
1 | NaN | NaN | NaN | NaN | NaN | NaN |
356 | NaN | NaN | NaN | NaN | NaN | NaN |
735 | NaN | NaN | NaN | NaN | NaN | NaN |
1433 | NaN | NaN | NaN | NaN | NaN | NaN |
1553 | NaN | NaN | NaN | NaN | NaN | NaN |
... | ... | ... | ... | ... | ... | ... |
10012 | NaN | NaN | NaN | NaN | NaN | NaN |
10274 | NaN | NaN | NaN | NaN | NaN | NaN |
10878 | NaN | NaN | NaN | NaN | NaN | NaN |
11384 | NaN | NaN | NaN | NaN | NaN | NaN |
11662 | NaN | NaN | NaN | NaN | NaN | NaN |
545 rows × 6 columns
val_manquantes.isnull().all()
Order ID True Product True Quantity Ordered True Price Each True Order Date True Purchase Address True dtype: bool
df.shape
(186850, 6)
df.dropna(inplace=True)
df.shape
(186305, 6)
df.isnull().any()
Order ID False Product False Quantity Ordered False Price Each False Order Date False Purchase Address False dtype: bool
df[df['Order Date'] == "Order Date"]
Order ID | Product | Quantity Ordered | Price Each | Order Date | Purchase Address | |
---|---|---|---|---|---|---|
519 | Order ID | Product | Quantity Ordered | Price Each | Order Date | Purchase Address |
1149 | Order ID | Product | Quantity Ordered | Price Each | Order Date | Purchase Address |
1155 | Order ID | Product | Quantity Ordered | Price Each | Order Date | Purchase Address |
2878 | Order ID | Product | Quantity Ordered | Price Each | Order Date | Purchase Address |
2893 | Order ID | Product | Quantity Ordered | Price Each | Order Date | Purchase Address |
... | ... | ... | ... | ... | ... | ... |
10000 | Order ID | Product | Quantity Ordered | Price Each | Order Date | Purchase Address |
10387 | Order ID | Product | Quantity Ordered | Price Each | Order Date | Purchase Address |
11399 | Order ID | Product | Quantity Ordered | Price Each | Order Date | Purchase Address |
11468 | Order ID | Product | Quantity Ordered | Price Each | Order Date | Purchase Address |
11574 | Order ID | Product | Quantity Ordered | Price Each | Order Date | Purchase Address |
355 rows × 6 columns
"1624".isdigit()
True
df.loc[~df['Order ID'].str.isdigit(), : ]
Order ID | Product | Quantity Ordered | Price Each | Order Date | Purchase Address | |
---|---|---|---|---|---|---|
519 | Order ID | Product | Quantity Ordered | Price Each | Order Date | Purchase Address |
1149 | Order ID | Product | Quantity Ordered | Price Each | Order Date | Purchase Address |
1155 | Order ID | Product | Quantity Ordered | Price Each | Order Date | Purchase Address |
2878 | Order ID | Product | Quantity Ordered | Price Each | Order Date | Purchase Address |
2893 | Order ID | Product | Quantity Ordered | Price Each | Order Date | Purchase Address |
... | ... | ... | ... | ... | ... | ... |
10000 | Order ID | Product | Quantity Ordered | Price Each | Order Date | Purchase Address |
10387 | Order ID | Product | Quantity Ordered | Price Each | Order Date | Purchase Address |
11399 | Order ID | Product | Quantity Ordered | Price Each | Order Date | Purchase Address |
11468 | Order ID | Product | Quantity Ordered | Price Each | Order Date | Purchase Address |
11574 | Order ID | Product | Quantity Ordered | Price Each | Order Date | Purchase Address |
355 rows × 6 columns
df_clean = df.drop(df.loc[df['Order Date'] == "Order Date", :].index)
df_clean.loc[~df_clean['Order ID'].str.isdigit(), : ]
Order ID | Product | Quantity Ordered | Price Each | Order Date | Purchase Address |
---|
df_clean.head()
Order ID | Product | Quantity Ordered | Price Each | Order Date | Purchase Address | |
---|---|---|---|---|---|---|
0 | 176558 | USB-C Charging Cable | 2 | 11.95 | 04/19/19 08:46 | 917 1st St, Dallas, TX 75001 |
2 | 176559 | Bose SoundSport Headphones | 1 | 99.99 | 04/07/19 22:30 | 682 Chestnut St, Boston, MA 02215 |
3 | 176560 | Google Phone | 1 | 600 | 04/12/19 14:38 | 669 Spruce St, Los Angeles, CA 90001 |
4 | 176560 | Wired Headphones | 1 | 11.99 | 04/12/19 14:38 | 669 Spruce St, Los Angeles, CA 90001 |
5 | 176561 | Wired Headphones | 1 | 11.99 | 04/30/19 09:27 | 333 8th St, Los Angeles, CA 90001 |
import warnings
warnings.filterwarnings('ignore')
df_clean['Quantity Ordered'] = pd.to_numeric(df_clean['Quantity Ordered'])
# df_clean['Quantity Ordered'] = df_clean['Quantity Ordered'].astype('int')
df_clean['Price Each'] = pd.to_numeric(df_clean['Price Each'])
df_clean.info()
<class 'pandas.core.frame.DataFrame'> Int64Index: 182735 entries, 0 to 11685 Data columns (total 6 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Order ID 182735 non-null object 1 Product 182735 non-null object 2 Quantity Ordered 182735 non-null int64 3 Price Each 182735 non-null float64 4 Order Date 182735 non-null object 5 Purchase Address 182735 non-null object dtypes: float64(1), int64(1), object(4) memory usage: 9.8+ MB
df_clean['Order Date'] = pd.to_datetime(df_clean['Order Date'])
df_clean.info()
<class 'pandas.core.frame.DataFrame'> Int64Index: 182735 entries, 0 to 11685 Data columns (total 6 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Order ID 182735 non-null object 1 Product 182735 non-null object 2 Quantity Ordered 182735 non-null int64 3 Price Each 182735 non-null float64 4 Order Date 182735 non-null datetime64[ns] 5 Purchase Address 182735 non-null object dtypes: datetime64[ns](1), float64(1), int64(1), object(3) memory usage: 9.8+ MB
df['Order Date']
0 04/19/19 08:46 2 04/07/19 22:30 3 04/12/19 14:38 4 04/12/19 14:38 5 04/30/19 09:27 ... 11681 09/17/19 20:56 11682 09/01/19 16:00 11683 09/23/19 07:39 11684 09/19/19 17:30 11685 09/30/19 00:18 Name: Order Date, Length: 186305, dtype: object
df_clean['Order Date']
0 2019-04-19 08:46:00 2 2019-04-07 22:30:00 3 2019-04-12 14:38:00 4 2019-04-12 14:38:00 5 2019-04-30 09:27:00 ... 11681 2019-09-17 20:56:00 11682 2019-09-01 16:00:00 11683 2019-09-23 07:39:00 11684 2019-09-19 17:30:00 11685 2019-09-30 00:18:00 Name: Order Date, Length: 182735, dtype: datetime64[ns]
df_clean.head()
Order ID | Product | Quantity Ordered | Price Each | Order Date | Purchase Address | |
---|---|---|---|---|---|---|
0 | 176558 | USB-C Charging Cable | 2 | 11.95 | 2019-04-19 08:46:00 | 917 1st St, Dallas, TX 75001 |
2 | 176559 | Bose SoundSport Headphones | 1 | 99.99 | 2019-04-07 22:30:00 | 682 Chestnut St, Boston, MA 02215 |
3 | 176560 | Google Phone | 1 | 600.00 | 2019-04-12 14:38:00 | 669 Spruce St, Los Angeles, CA 90001 |
4 | 176560 | Wired Headphones | 1 | 11.99 | 2019-04-12 14:38:00 | 669 Spruce St, Los Angeles, CA 90001 |
5 | 176561 | Wired Headphones | 1 | 11.99 | 2019-04-30 09:27:00 | 333 8th St, Los Angeles, CA 90001 |
df_clean.index
Int64Index([ 0, 2, 3, 4, 5, 6, 7, 8, 9, 10, ... 11676, 11677, 11678, 11679, 11680, 11681, 11682, 11683, 11684, 11685], dtype='int64', length=182735)
df_clean = df_clean.set_index('Order Date')
df_clean.head()
Order ID | Product | Quantity Ordered | Price Each | Purchase Address | |
---|---|---|---|---|---|
Order Date | |||||
2019-04-19 08:46:00 | 176558 | USB-C Charging Cable | 2 | 11.95 | 917 1st St, Dallas, TX 75001 |
2019-04-07 22:30:00 | 176559 | Bose SoundSport Headphones | 1 | 99.99 | 682 Chestnut St, Boston, MA 02215 |
2019-04-12 14:38:00 | 176560 | Google Phone | 1 | 600.00 | 669 Spruce St, Los Angeles, CA 90001 |
2019-04-12 14:38:00 | 176560 | Wired Headphones | 1 | 11.99 | 669 Spruce St, Los Angeles, CA 90001 |
2019-04-30 09:27:00 | 176561 | Wired Headphones | 1 | 11.99 | 333 8th St, Los Angeles, CA 90001 |
df_clean.sort_index(inplace=True)
df_clean['Month'] = df_clean.index.month_name()
df_clean.head()
Order ID | Product | Quantity Ordered | Price Each | Purchase Address | Month | |
---|---|---|---|---|---|---|
Order Date | ||||||
2019-01-01 03:07:00 | 147268 | Wired Headphones | 1 | 11.99 | 9 Lake St, New York City, NY 10001 | January |
2019-01-01 03:40:00 | 148041 | USB-C Charging Cable | 1 | 11.95 | 760 Church St, San Francisco, CA 94016 | January |
2019-01-01 04:56:00 | 149343 | Apple Airpods Headphones | 1 | 150.00 | 735 5th St, New York City, NY 10001 | January |
2019-01-01 05:53:00 | 149964 | AAA Batteries (4-pack) | 1 | 2.99 | 75 Jackson St, Dallas, TX 75001 | January |
2019-01-01 06:03:00 | 149350 | USB-C Charging Cable | 2 | 11.95 | 943 2nd St, Atlanta, GA 30301 | January |
df_clean['chiffre_daffaire'] = df_clean['Quantity Ordered'] * df_clean['Price Each']
df_clean.sample(5)
Order ID | Product | Quantity Ordered | Price Each | Purchase Address | Month | chiffre_daffaire | |
---|---|---|---|---|---|---|---|
Order Date | |||||||
2019-05-03 22:01:00 | 198775 | AA Batteries (4-pack) | 1 | 3.84 | 516 Chestnut St, Boston, MA 02215 | May | 3.84 |
2019-02-20 13:17:00 | 152419 | 27in FHD Monitor | 1 | 149.99 | 66 Washington St, San Francisco, CA 94016 | February | 149.99 |
2019-01-13 13:09:00 | 142639 | 27in 4K Gaming Monitor | 1 | 389.99 | 345 South St, Seattle, WA 98101 | January | 389.99 |
2019-04-16 11:49:00 | 187268 | Wired Headphones | 1 | 11.99 | 811 Forest St, San Francisco, CA 94016 | April | 11.99 |
2019-01-23 18:55:00 | 144038 | iPhone | 1 | 700.00 | 751 Wilson St, Austin, TX 73301 | January | 700.00 |
df_clean.groupby('Month')["chiffre_daffaire"].sum().sort_values(ascending=False)
Month December 4.557905e+06 October 3.679254e+06 April 3.336376e+06 November 3.149785e+06 May 3.101881e+06 March 2.755969e+06 July 2.587445e+06 June 2.524465e+06 August 2.191698e+06 February 2.158127e+06 September 2.050361e+06 January 1.786511e+06 Name: chiffre_daffaire, dtype: float64
df_clean.groupby('Month').sum()["chiffre_daffaire"].sort_values(ascending=False).December
4557905.420001525
Décembre est le mois avec le plus de ventes et le CA est de 4557905.420001525
df_clean['Purchase Address']
Order Date 2019-01-01 03:07:00 9 Lake St, New York City, NY 10001 2019-01-01 03:40:00 760 Church St, San Francisco, CA 94016 2019-01-01 04:56:00 735 5th St, New York City, NY 10001 2019-01-01 05:53:00 75 Jackson St, Dallas, TX 75001 2019-01-01 06:03:00 943 2nd St, Atlanta, GA 30301 ... 2020-01-01 04:13:00 825 Adams St, Portland, OR 97035 2020-01-01 04:21:00 754 Hickory St, New York City, NY 10001 2020-01-01 04:54:00 784 River St, San Francisco, CA 94016 2020-01-01 05:13:00 657 Spruce St, New York City, NY 10001 2020-01-01 05:13:00 657 Spruce St, New York City, NY 10001 Name: Purchase Address, Length: 182735, dtype: object
def get_ville(addresse):
return addresse.split(',')[1].strip()
get_ville("760 Church St, San Francisco, CA 94016")
'San Francisco'
df_clean['ville'] = df_clean['Purchase Address'].apply(get_ville)
df_clean.sample(5)
Order ID | Product | Quantity Ordered | Price Each | Purchase Address | Month | chiffre_daffaire | ville | |
---|---|---|---|---|---|---|---|---|
Order Date | ||||||||
2019-11-14 11:41:00 | 284107 | AAA Batteries (4-pack) | 2 | 2.99 | 772 11th St, San Francisco, CA 94016 | November | 5.98 | San Francisco |
2019-12-22 22:20:00 | 305268 | iPhone | 1 | 700.00 | 516 Lincoln St, Los Angeles, CA 90001 | December | 700.00 | Los Angeles |
2019-02-17 03:50:00 | 154226 | Wired Headphones | 1 | 11.99 | 499 Cedar St, Los Angeles, CA 90001 | February | 11.99 | Los Angeles |
2019-09-30 15:30:00 | 252165 | 27in FHD Monitor | 1 | 149.99 | 605 6th St, Los Angeles, CA 90001 | September | 149.99 | Los Angeles |
2019-03-11 21:53:00 | 165902 | Lightning Charging Cable | 1 | 14.95 | 337 10th St, Portland, OR 97035 | March | 14.95 | Portland |
df_clean['ville'].unique()
array(['New York City', 'San Francisco', 'Dallas', 'Atlanta', 'Boston', 'Seattle', 'Los Angeles', 'Portland', 'Austin'], dtype=object)
df_clean.groupby('ville').sum()["chiffre_daffaire"].sort_values(ascending=False)
ville San Francisco 8.124121e+06 Los Angeles 5.354040e+06 New York City 4.581659e+06 Boston 3.604081e+06 Atlanta 2.741642e+06 Dallas 2.717794e+06 Seattle 2.693049e+06 Portland 2.276649e+06 Austin 1.786746e+06 Name: chiffre_daffaire, dtype: float64
On a fait un meilleur chiffre d'affaire à San Francisco
ca_by_ville = df_clean.groupby('ville').sum()["chiffre_daffaire"].sort_values(ascending=False)
ca_by_ville.plot(kind='bar', figsize=(8, 6))
plt.ylabel('Chiffre daffaire en Millions de dollars')
plt.title('Chiffre daffaire par Ville')
plt.show()
A quelle heure on vend le plus ?
df_clean['heure'] = df_clean.index.hour
df_clean['time'] = df_clean.index.time
df_clean.head()
Order ID | Product | Quantity Ordered | Price Each | Purchase Address | Month | chiffre_daffaire | ville | heure | time | |
---|---|---|---|---|---|---|---|---|---|---|
Order Date | ||||||||||
2019-01-01 03:07:00 | 147268 | Wired Headphones | 1 | 11.99 | 9 Lake St, New York City, NY 10001 | January | 11.99 | New York City | 3 | 03:07:00 |
2019-01-01 03:40:00 | 148041 | USB-C Charging Cable | 1 | 11.95 | 760 Church St, San Francisco, CA 94016 | January | 11.95 | San Francisco | 3 | 03:40:00 |
2019-01-01 04:56:00 | 149343 | Apple Airpods Headphones | 1 | 150.00 | 735 5th St, New York City, NY 10001 | January | 150.00 | New York City | 4 | 04:56:00 |
2019-01-01 05:53:00 | 149964 | AAA Batteries (4-pack) | 1 | 2.99 | 75 Jackson St, Dallas, TX 75001 | January | 2.99 | Dallas | 5 | 05:53:00 |
2019-01-01 06:03:00 | 149350 | USB-C Charging Cable | 2 | 11.95 | 943 2nd St, Atlanta, GA 30301 | January | 23.90 | Atlanta | 6 | 06:03:00 |
df_clean.time.sort_values(ascending=False)
Order Date 2019-11-02 23:59:00 23:59:00 2019-09-26 23:59:00 23:59:00 2019-11-04 23:59:00 23:59:00 2019-07-17 23:59:00 23:59:00 2019-07-01 23:59:00 23:59:00 ... 2019-12-31 00:00:00 00:00:00 2019-12-31 00:00:00 00:00:00 2019-11-15 00:00:00 00:00:00 2019-11-30 00:00:00 00:00:00 2019-05-03 00:00:00 00:00:00 Name: time, Length: 182735, dtype: object
df_clean.groupby('heure')['chiffre_daffaire'].sum().sort_values(ascending=False)
heure 19 2370005.59 12 2269676.91 11 2267169.73 20 2242856.37 18 2176209.65 13 2118394.83 17 2086082.23 14 2045085.42 21 2013288.02 10 1907179.29 15 1900151.56 16 1864530.92 9 1612498.90 22 1578173.21 8 1181363.76 23 1161803.96 7 734595.75 0 698694.76 1 447663.86 6 441523.14 2 234191.08 5 224313.72 4 161350.57 3 142976.54 Name: chiffre_daffaire, dtype: float64
ca_par_heure = pd.DataFrame(df_clean.groupby('heure')['chiffre_daffaire'].sum())
ca_par_heure
chiffre_daffaire | |
---|---|
heure | |
0 | 698694.76 |
1 | 447663.86 |
2 | 234191.08 |
3 | 142976.54 |
4 | 161350.57 |
5 | 224313.72 |
6 | 441523.14 |
7 | 734595.75 |
8 | 1181363.76 |
9 | 1612498.90 |
10 | 1907179.29 |
11 | 2267169.73 |
12 | 2269676.91 |
13 | 2118394.83 |
14 | 2045085.42 |
15 | 1900151.56 |
16 | 1864530.92 |
17 | 2086082.23 |
18 | 2176209.65 |
19 | 2370005.59 |
20 | 2242856.37 |
21 | 2013288.02 |
22 | 1578173.21 |
23 | 1161803.96 |
sns.lineplot(data=ca_par_heure['chiffre_daffaire'])
plt.xticks(ticks=range(0, 24))
plt.show()
On devrait afficher la publicité entre 10 et 11 et entre 18 et 19.
df_clean.head()
Order ID | Product | Quantity Ordered | Price Each | Purchase Address | Month | chiffre_daffaire | ville | heure | time | |
---|---|---|---|---|---|---|---|---|---|---|
Order Date | ||||||||||
2019-01-01 03:07:00 | 147268 | Wired Headphones | 1 | 11.99 | 9 Lake St, New York City, NY 10001 | January | 11.99 | New York City | 3 | 03:07:00 |
2019-01-01 03:40:00 | 148041 | USB-C Charging Cable | 1 | 11.95 | 760 Church St, San Francisco, CA 94016 | January | 11.95 | San Francisco | 3 | 03:40:00 |
2019-01-01 04:56:00 | 149343 | Apple Airpods Headphones | 1 | 150.00 | 735 5th St, New York City, NY 10001 | January | 150.00 | New York City | 4 | 04:56:00 |
2019-01-01 05:53:00 | 149964 | AAA Batteries (4-pack) | 1 | 2.99 | 75 Jackson St, Dallas, TX 75001 | January | 2.99 | Dallas | 5 | 05:53:00 |
2019-01-01 06:03:00 | 149350 | USB-C Charging Cable | 2 | 11.95 | 943 2nd St, Atlanta, GA 30301 | January | 23.90 | Atlanta | 6 | 06:03:00 |
def concat_produit_par_order_id(p):
return ";".join(p)
produit_ensemble = df_clean.drop_duplicates(['Order ID'])
len(produit_ensemble)
175449
df_clean['sales_together'] = df_clean.groupby('Order ID')['Product'].apply(concat_produit_par_order_id)
df_clean.head()
Order ID | Product | Quantity Ordered | Price Each | Purchase Address | Month | chiffre_daffaire | ville | heure | time | sales_together | |
---|---|---|---|---|---|---|---|---|---|---|---|
Order Date | |||||||||||
2019-01-01 03:07:00 | 147268 | Wired Headphones | 1 | 11.99 | 9 Lake St, New York City, NY 10001 | January | 11.99 | New York City | 3 | 03:07:00 | NaN |
2019-01-01 03:40:00 | 148041 | USB-C Charging Cable | 1 | 11.95 | 760 Church St, San Francisco, CA 94016 | January | 11.95 | San Francisco | 3 | 03:40:00 | NaN |
2019-01-01 04:56:00 | 149343 | Apple Airpods Headphones | 1 | 150.00 | 735 5th St, New York City, NY 10001 | January | 150.00 | New York City | 4 | 04:56:00 | NaN |
2019-01-01 05:53:00 | 149964 | AAA Batteries (4-pack) | 1 | 2.99 | 75 Jackson St, Dallas, TX 75001 | January | 2.99 | Dallas | 5 | 05:53:00 | NaN |
2019-01-01 06:03:00 | 149350 | USB-C Charging Cable | 2 | 11.95 | 943 2nd St, Atlanta, GA 30301 | January | 23.90 | Atlanta | 6 | 06:03:00 | NaN |
len(df_clean['sales_together'])
182735
produit_ensemble['groupby'] = df_clean.groupby('Order ID')['Product'].apply(concat_produit_par_order_id)
produit_ensemble.head()
Order ID | Product | Quantity Ordered | Price Each | Purchase Address | Month | chiffre_daffaire | ville | heure | time | groupby | |
---|---|---|---|---|---|---|---|---|---|---|---|
Order Date | |||||||||||
2019-01-01 03:07:00 | 147268 | Wired Headphones | 1 | 11.99 | 9 Lake St, New York City, NY 10001 | January | 11.99 | New York City | 3 | 03:07:00 | NaN |
2019-01-01 03:40:00 | 148041 | USB-C Charging Cable | 1 | 11.95 | 760 Church St, San Francisco, CA 94016 | January | 11.95 | San Francisco | 3 | 03:40:00 | NaN |
2019-01-01 04:56:00 | 149343 | Apple Airpods Headphones | 1 | 150.00 | 735 5th St, New York City, NY 10001 | January | 150.00 | New York City | 4 | 04:56:00 | NaN |
2019-01-01 05:53:00 | 149964 | AAA Batteries (4-pack) | 1 | 2.99 | 75 Jackson St, Dallas, TX 75001 | January | 2.99 | Dallas | 5 | 05:53:00 | NaN |
2019-01-01 06:03:00 | 149350 | USB-C Charging Cable | 2 | 11.95 | 943 2nd St, Atlanta, GA 30301 | January | 23.90 | Atlanta | 6 | 06:03:00 | NaN |
df_clean.groupby('Order ID')['Product'].apply(concat_produit_par_order_id)
Order ID 141234 iPhone 141235 Lightning Charging Cable 141236 Wired Headphones 141237 27in FHD Monitor 141238 Wired Headphones ... 319666 Lightning Charging Cable 319667 AA Batteries (4-pack) 319668 Vareebadd Phone 319669 Wired Headphones 319670 Bose SoundSport Headphones Name: Product, Length: 175449, dtype: object
df_clean.groupby('Order ID')['Product'].groups
{'141234': [2019-01-22 21:25:00], '141235': [2019-01-28 14:15:00], '141236': [2019-01-17 13:33:00], '141237': [2019-01-05 20:33:00], '141238': [2019-01-25 11:59:00], '141239': [2019-01-29 20:22:00], '141240': [2019-01-26 12:16:00], '141241': [2019-01-05 12:04:00], '141242': [2019-01-01 10:30:00], '141243': [2019-01-22 21:20:00], '141244': [2019-01-07 11:29:00], '141245': [2019-01-31 10:12:00], '141246': [2019-01-09 18:57:00], '141247': [2019-01-25 19:19:00], '141248': [2019-01-03 21:54:00], '141249': [2019-01-05 17:20:00], '141250': [2019-01-10 11:20:00], '141251': [2019-01-24 08:13:00], '141252': [2019-01-30 09:28:00], '141253': [2019-01-17 00:09:00], '141254': [2019-01-08 11:51:00], '141255': [2019-01-09 20:55:00], '141256': [2019-01-29 10:40:00], '141257': [2019-01-12 18:51:00], '141258': [2019-01-19 21:47:00], '141259': [2019-01-20 17:26:00], '141260': [2019-01-01 22:00:00], '141261': [2019-01-09 18:14:00], '141262': [2019-01-16 12:35:00], '141263': [2019-01-11 23:33:00], '141264': [2019-01-03 09:46:00], '141265': [2019-01-01 16:52:00], '141266': [2019-01-02 22:21:00], '141267': [2019-01-09 08:28:00], '141268': [2019-01-14 10:13:00], '141269': [2019-01-03 20:05:00], '141270': [2019-01-27 23:10:00], '141271': [2019-01-30 10:51:00], '141272': [2019-01-12 13:09:00], '141273': [2019-01-29 12:04:00], '141274': [2019-01-17 11:30:00], '141275': [2019-01-07 16:06:00, 2019-01-07 16:06:00], '141276': [2019-01-21 22:23:00], '141277': [2019-01-13 19:07:00], '141278': [2019-01-26 12:14:00], '141279': [2019-01-03 19:10:00], '141280': [2019-01-20 16:10:00], '141281': [2019-01-05 16:51:00], '141282': [2019-01-11 18:10:00], '141283': [2019-01-02 16:16:00], '141284': [2019-01-29 18:30:00], '141285': [2019-01-14 14:13:00], '141286': [2019-01-02 20:33:00], '141287': [2019-01-31 08:38:00], '141288': [2019-01-19 08:17:00], '141289': [2019-01-28 11:17:00], '141290': [2019-01-02 08:25:00, 2019-01-02 08:25:00], '141291': [2019-01-26 18:11:00], '141292': [2019-01-21 08:46:00], '141293': [2019-01-18 12:21:00], '141294': [2019-01-25 08:12:00], '141295': [2019-01-06 20:06:00], '141296': [2019-01-20 00:21:00], '141297': [2019-01-04 11:09:00], '141298': [2019-01-21 13:24:00], '141299': [2019-01-31 23:23:00], '141300': [2019-01-23 10:21:00], '141301': [2019-01-27 11:58:00], '141302': [2019-01-18 20:12:00], '141303': [2019-01-19 09:23:00], '141304': [2019-01-28 13:10:00], '141305': [2019-01-27 16:51:00], '141306': [2019-01-19 10:25:00], '141307': [2019-01-11 11:18:00], '141308': [2019-01-12 12:00:00], '141309': [2019-01-20 07:02:00], '141310': [2019-01-07 13:32:00], '141311': [2019-01-28 12:45:00], '141312': [2019-01-08 23:20:00], '141313': [2019-01-09 15:03:00], '141314': [2019-01-13 23:51:00], '141315': [2019-01-10 01:32:00], '141316': [2019-01-01 07:26:00], '141317': [2019-01-21 11:18:00], '141318': [2019-01-09 15:21:00], '141319': [2019-01-01 10:43:00], '141320': [2019-01-24 13:37:00], '141321': [2019-01-10 09:07:00], '141322': [2019-01-12 21:56:00], '141323': [2019-01-17 21:54:00], '141324': [2019-01-04 23:36:00], '141325': [2019-01-23 15:17:00], '141326': [2019-01-05 13:49:00], '141327': [2019-01-31 01:38:00], '141328': [2019-01-06 23:18:00], '141329': [2019-01-01 16:01:00], '141330': [2019-01-21 13:04:00], '141331': [2019-01-09 18:32:00], '141332': [2019-01-19 18:52:00], '141333': [2019-01-01 13:58:00], ...}
df_clean['sales_product_together'] = df_clean.groupby('Order ID')['Product'].transform(lambda x: '; '.join(set(x)))
df_clean.head()
Order ID | Product | Quantity Ordered | Price Each | Purchase Address | Month | chiffre_daffaire | ville | heure | time | sales_together | sales_product_together | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Order Date | ||||||||||||
2019-01-01 03:07:00 | 147268 | Wired Headphones | 1 | 11.99 | 9 Lake St, New York City, NY 10001 | January | 11.99 | New York City | 3 | 03:07:00 | NaN | Wired Headphones |
2019-01-01 03:40:00 | 148041 | USB-C Charging Cable | 1 | 11.95 | 760 Church St, San Francisco, CA 94016 | January | 11.95 | San Francisco | 3 | 03:40:00 | NaN | USB-C Charging Cable |
2019-01-01 04:56:00 | 149343 | Apple Airpods Headphones | 1 | 150.00 | 735 5th St, New York City, NY 10001 | January | 150.00 | New York City | 4 | 04:56:00 | NaN | Apple Airpods Headphones |
2019-01-01 05:53:00 | 149964 | AAA Batteries (4-pack) | 1 | 2.99 | 75 Jackson St, Dallas, TX 75001 | January | 2.99 | Dallas | 5 | 05:53:00 | NaN | AAA Batteries (4-pack) |
2019-01-01 06:03:00 | 149350 | USB-C Charging Cable | 2 | 11.95 | 943 2nd St, Atlanta, GA 30301 | January | 23.90 | Atlanta | 6 | 06:03:00 | NaN | USB-C Charging Cable |
pduit = pd.DataFrame(df_clean['sales_product_together'].value_counts())
pduit
sales_product_together | |
---|---|
Lightning Charging Cable | 19632 |
AAA Batteries (4-pack) | 19597 |
USB-C Charging Cable | 19563 |
AA Batteries (4-pack) | 19537 |
Wired Headphones | 16979 |
... | ... |
Google Phone; LG Washing Machine | 2 |
27in 4K Gaming Monitor; LG Washing Machine | 2 |
AA Batteries (4-pack); LG Washing Machine | 2 |
27in FHD Monitor; LG Washing Machine | 2 |
Vareebadd Phone; LG Dryer | 2 |
241 rows × 1 columns
pduit.index.to_list()
['Lightning Charging Cable', 'AAA Batteries (4-pack)', 'USB-C Charging Cable', 'AA Batteries (4-pack)', 'Wired Headphones', 'Apple Airpods Headphones', 'Bose SoundSport Headphones', '27in FHD Monitor', '27in 4K Gaming Monitor', '34in Ultrawide Monitor', 'iPhone', 'Flatscreen TV', 'Macbook Pro Laptop', 'ThinkPad Laptop', '20in Monitor', 'Google Phone', 'USB-C Charging Cable; Google Phone', 'Vareebadd Phone', 'iPhone; Lightning Charging Cable', 'Lightning Charging Cable; iPhone', 'iPhone; Wired Headphones', 'LG Washing Machine', 'LG Dryer', 'USB-C Charging Cable; Vareebadd Phone', 'Google Phone; Wired Headphones', 'iPhone; Apple Airpods Headphones', 'Bose SoundSport Headphones; Google Phone', 'USB-C Charging Cable; Google Phone; Wired Headphones', 'Vareebadd Phone; Wired Headphones', 'USB-C Charging Cable; Lightning Charging Cable', 'USB-C Charging Cable; AAA Batteries (4-pack)', 'AAA Batteries (4-pack); AA Batteries (4-pack)', 'AAA Batteries (4-pack); Wired Headphones', 'USB-C Charging Cable; Wired Headphones', 'AAA Batteries (4-pack); Lightning Charging Cable', 'AA Batteries (4-pack); Wired Headphones', 'AAA Batteries (4-pack); Apple Airpods Headphones', 'USB-C Charging Cable; AA Batteries (4-pack)', 'AA Batteries (4-pack); Apple Airpods Headphones', 'Lightning Charging Cable; Apple Airpods Headphones', 'Lightning Charging Cable; Wired Headphones', 'Bose SoundSport Headphones; Lightning Charging Cable', 'USB-C Charging Cable; Apple Airpods Headphones', 'Bose SoundSport Headphones; Vareebadd Phone', 'Bose SoundSport Headphones; AA Batteries (4-pack)', 'USB-C Charging Cable; Bose SoundSport Headphones', 'Lightning Charging Cable; iPhone; Wired Headphones', 'Lightning Charging Cable; AA Batteries (4-pack)', 'USB-C Charging Cable; Bose SoundSport Headphones; Google Phone', 'AA Batteries (4-pack); Lightning Charging Cable', 'Bose SoundSport Headphones; Apple Airpods Headphones', 'Apple Airpods Headphones; Wired Headphones', 'USB-C Charging Cable; Vareebadd Phone; Wired Headphones', 'Bose SoundSport Headphones; Wired Headphones', 'USB-C Charging Cable; 27in FHD Monitor', 'AAA Batteries (4-pack); 27in FHD Monitor', 'Lightning Charging Cable; 27in FHD Monitor', 'iPhone; Lightning Charging Cable; Apple Airpods Headphones', 'AA Batteries (4-pack); 34in Ultrawide Monitor', 'Bose SoundSport Headphones; Google Phone; Wired Headphones', 'Lightning Charging Cable; 34in Ultrawide Monitor', '27in 4K Gaming Monitor; Lightning Charging Cable', 'iPhone; Lightning Charging Cable; Wired Headphones', '27in 4K Gaming Monitor; AAA Batteries (4-pack)', '34in Ultrawide Monitor; Wired Headphones', 'AAA Batteries (4-pack); Bose SoundSport Headphones', 'Bose SoundSport Headphones; AAA Batteries (4-pack)', 'AA Batteries (4-pack); 27in FHD Monitor', '27in 4K Gaming Monitor; Wired Headphones', 'Lightning Charging Cable; iPhone; Apple Airpods Headphones', 'AAA Batteries (4-pack); iPhone', '27in 4K Gaming Monitor; AA Batteries (4-pack)', 'USB-C Charging Cable; 34in Ultrawide Monitor', 'Bose SoundSport Headphones; 27in FHD Monitor', 'USB-C Charging Cable; 20in Monitor', '20in Monitor; Lightning Charging Cable', '27in FHD Monitor; Apple Airpods Headphones', 'Lightning Charging Cable; Google Phone', 'AAA Batteries (4-pack); 34in Ultrawide Monitor', 'USB-C Charging Cable; iPhone', 'Wired Headphones; Apple Airpods Headphones', 'Macbook Pro Laptop; Lightning Charging Cable', '27in FHD Monitor; Wired Headphones', '27in 4K Gaming Monitor; Apple Airpods Headphones', 'AAA Batteries (4-pack); Flatscreen TV', 'AA Batteries (4-pack); Google Phone', 'USB-C Charging Cable; ThinkPad Laptop', 'AA Batteries (4-pack); Flatscreen TV', 'USB-C Charging Cable; Bose SoundSport Headphones; Vareebadd Phone', 'Lightning Charging Cable; Flatscreen TV', 'AAA Batteries (4-pack); Google Phone', 'USB-C Charging Cable; Macbook Pro Laptop', 'Macbook Pro Laptop; Wired Headphones', 'iPhone; Wired Headphones; Apple Airpods Headphones', 'ThinkPad Laptop; Lightning Charging Cable', '27in 4K Gaming Monitor; Bose SoundSport Headphones', 'AA Batteries (4-pack); iPhone', '20in Monitor; Wired Headphones', 'Macbook Pro Laptop; AA Batteries (4-pack)', 'ThinkPad Laptop; AA Batteries (4-pack)', 'AAA Batteries (4-pack); Macbook Pro Laptop', 'Google Phone; Apple Airpods Headphones', 'USB-C Charging Cable; Flatscreen TV', 'ThinkPad Laptop; Apple Airpods Headphones', 'Macbook Pro Laptop; Apple Airpods Headphones', 'Bose SoundSport Headphones; Macbook Pro Laptop', 'Bose SoundSport Headphones; 34in Ultrawide Monitor', 'iPhone; Apple Airpods Headphones; Wired Headphones', '20in Monitor; AA Batteries (4-pack)', '27in 4K Gaming Monitor; USB-C Charging Cable', 'AAA Batteries (4-pack); ThinkPad Laptop', '20in Monitor; Bose SoundSport Headphones', 'ThinkPad Laptop; Wired Headphones', '34in Ultrawide Monitor; Apple Airpods Headphones', 'Flatscreen TV; Apple Airpods Headphones', '20in Monitor; Apple Airpods Headphones', 'ThinkPad Laptop; AAA Batteries (4-pack)', 'iPhone; 34in Ultrawide Monitor', '27in 4K Gaming Monitor; Macbook Pro Laptop', 'Bose SoundSport Headphones; Flatscreen TV', 'Bose SoundSport Headphones; iPhone', 'AA Batteries (4-pack); Vareebadd Phone', 'iPhone; AA Batteries (4-pack)', 'Flatscreen TV; Wired Headphones', '34in Ultrawide Monitor; 27in FHD Monitor', '20in Monitor; AAA Batteries (4-pack)', 'Vareebadd Phone; Apple Airpods Headphones', 'iPhone; Flatscreen TV', '27in 4K Gaming Monitor; 27in FHD Monitor', '27in 4K Gaming Monitor; iPhone', 'Macbook Pro Laptop; 34in Ultrawide Monitor', '34in Ultrawide Monitor; Flatscreen TV', 'Bose SoundSport Headphones; ThinkPad Laptop', 'Macbook Pro Laptop; 27in FHD Monitor', 'USB-C Charging Cable; 27in 4K Gaming Monitor', 'Macbook Pro Laptop; iPhone', '20in Monitor; Macbook Pro Laptop', '27in 4K Gaming Monitor; 34in Ultrawide Monitor', 'ThinkPad Laptop; iPhone', 'ThinkPad Laptop; Flatscreen TV', 'Lightning Charging Cable; LG Washing Machine', 'iPhone; Google Phone', '27in 4K Gaming Monitor; Google Phone', 'ThinkPad Laptop; 27in FHD Monitor', 'AAA Batteries (4-pack); Vareebadd Phone', 'Macbook Pro Laptop; Google Phone', 'AA Batteries (4-pack); LG Dryer', '27in 4K Gaming Monitor; ThinkPad Laptop', 'ThinkPad Laptop; Bose SoundSport Headphones', '27in 4K Gaming Monitor; Flatscreen TV', 'ThinkPad Laptop; Google Phone', 'Google Phone; 27in FHD Monitor', '27in FHD Monitor; Flatscreen TV', 'USB-C Charging Cable; AAA Batteries (4-pack); Google Phone', 'AAA Batteries (4-pack); iPhone; Apple Airpods Headphones', '20in Monitor; 27in FHD Monitor', 'Vareebadd Phone; Google Phone', 'ThinkPad Laptop; 34in Ultrawide Monitor', 'Google Phone; Flatscreen TV', 'USB-C Charging Cable; Bose SoundSport Headphones; Google Phone; Wired Headphones', 'USB-C Charging Cable; Bose SoundSport Headphones; Vareebadd Phone; Wired Headphones', 'iPhone; Vareebadd Phone', 'Macbook Pro Laptop; Flatscreen TV', 'Vareebadd Phone; 34in Ultrawide Monitor', 'AAA Batteries (4-pack); LG Washing Machine', '20in Monitor; Google Phone', 'ThinkPad Laptop; Macbook Pro Laptop', 'Bose SoundSport Headphones; Vareebadd Phone; Wired Headphones', 'Lightning Charging Cable; Vareebadd Phone', 'USB-C Charging Cable; AA Batteries (4-pack); Google Phone', 'LG Dryer; Apple Airpods Headphones', 'Lightning Charging Cable; LG Dryer', 'LG Washing Machine; Wired Headphones', '34in Ultrawide Monitor; Google Phone', 'Wired Headphones; iPhone; Lightning Charging Cable; Google Phone; Apple Airpods Headphones', '20in Monitor; Flatscreen TV', 'AAA Batteries (4-pack); LG Dryer', 'AA Batteries (4-pack); Lightning Charging Cable; iPhone; Wired Headphones', 'USB-C Charging Cable; Apple Airpods Headphones; Google Phone; Wired Headphones', '20in Monitor; ThinkPad Laptop', 'iPhone; Apple Airpods Headphones; Lightning Charging Cable; Wired Headphones', 'Vareebadd Phone; 27in FHD Monitor', 'USB-C Charging Cable; Lightning Charging Cable; iPhone; Wired Headphones', '27in 4K Gaming Monitor; 20in Monitor', 'Bose SoundSport Headphones; LG Washing Machine', 'USB-C Charging Cable; iPhone; Google Phone; Wired Headphones', '20in Monitor; LG Washing Machine', 'Macbook Pro Laptop; LG Washing Machine', 'LG Dryer; 27in FHD Monitor', '27in 4K Gaming Monitor; Vareebadd Phone', '20in Monitor; iPhone', '27in 4K Gaming Monitor; LG Dryer', 'USB-C Charging Cable; Bose SoundSport Headphones; 34in Ultrawide Monitor; Google Phone', 'iPhone; LG Washing Machine', 'USB-C Charging Cable; AA Batteries (4-pack); Google Phone; Wired Headphones', 'USB-C Charging Cable; Google Phone; 27in FHD Monitor; Wired Headphones', 'Vareebadd Phone; Flatscreen TV', 'Lightning Charging Cable; Apple Airpods Headphones; iPhone; Wired Headphones', 'ThinkPad Laptop; Vareebadd Phone', 'iPhone; Wired Headphones; Lightning Charging Cable; Apple Airpods Headphones', '20in Monitor; 34in Ultrawide Monitor', 'USB-C Charging Cable; Vareebadd Phone; Apple Airpods Headphones', 'AA Batteries (4-pack); Lightning Charging Cable; iPhone', 'Bose SoundSport Headphones; iPhone; Apple Airpods Headphones', '27in 4K Gaming Monitor; iPhone; Lightning Charging Cable', 'USB-C Charging Cable; Vareebadd Phone; Google Phone', 'USB-C Charging Cable; Lightning Charging Cable; iPhone', 'iPhone; Lightning Charging Cable; Flatscreen TV', 'Bose SoundSport Headphones; Google Phone; Apple Airpods Headphones', 'iPhone; Vareebadd Phone; Wired Headphones', 'Bose SoundSport Headphones; Google Phone; 27in FHD Monitor', 'USB-C Charging Cable; iPhone; Vareebadd Phone', 'AA Batteries (4-pack); Google Phone; Wired Headphones', 'Bose SoundSport Headphones; Lightning Charging Cable; Google Phone', 'Lightning Charging Cable; Google Phone; iPhone', 'USB-C Charging Cable; Lightning Charging Cable; Google Phone', 'USB-C Charging Cable; Google Phone; 27in FHD Monitor', 'Macbook Pro Laptop; Google Phone; Wired Headphones', 'Apple Airpods Headphones; Vareebadd Phone; Wired Headphones', 'Lightning Charging Cable; 34in Ultrawide Monitor; iPhone', '27in 4K Gaming Monitor; Vareebadd Phone; Wired Headphones', 'Lightning Charging Cable; Vareebadd Phone; iPhone', 'USB-C Charging Cable; Google Phone; Apple Airpods Headphones', 'Bose SoundSport Headphones; Vareebadd Phone; Flatscreen TV', 'iPhone; Lightning Charging Cable; AA Batteries (4-pack)', 'USB-C Charging Cable; iPhone; Lightning Charging Cable', 'LG Dryer; Wired Headphones', 'LG Dryer; Flatscreen TV', 'Google Phone; LG Dryer', 'Apple Airpods Headphones; LG Washing Machine', 'ThinkPad Laptop; LG Dryer', 'Wired Headphones; LG Washing Machine', 'USB-C Charging Cable; LG Dryer', 'iPhone; 27in FHD Monitor', '34in Ultrawide Monitor; LG Washing Machine', 'Google Phone; 34in Ultrawide Monitor', 'Google Phone; LG Washing Machine', '27in 4K Gaming Monitor; LG Washing Machine', 'AA Batteries (4-pack); LG Washing Machine', '27in FHD Monitor; LG Washing Machine', 'Vareebadd Phone; LG Dryer']
df_clean.groupby('Product').sum()['Quantity Ordered'].sort_values(ascending=False)
Product AAA Batteries (4-pack) 30487 AA Batteries (4-pack) 27148 USB-C Charging Cable 23555 Lightning Charging Cable 22841 Wired Headphones 20201 Apple Airpods Headphones 15383 Bose SoundSport Headphones 13236 27in FHD Monitor 7429 iPhone 6732 27in 4K Gaming Monitor 6134 34in Ultrawide Monitor 6076 Google Phone 5440 Flatscreen TV 4724 Macbook Pro Laptop 4645 20in Monitor 4058 ThinkPad Laptop 4054 Vareebadd Phone 2023 LG Washing Machine 649 LG Dryer 640 Name: Quantity Ordered, dtype: int64
def filter_number(pdu):
return len(pdu.split(';')) > 1
pduit['name'] = pduit.index
pduit['n'] = pduit.name.apply(filter_number)
pduit
sales_product_together | name | n | |
---|---|---|---|
Lightning Charging Cable | 19632 | Lightning Charging Cable | False |
AAA Batteries (4-pack) | 19597 | AAA Batteries (4-pack) | False |
USB-C Charging Cable | 19563 | USB-C Charging Cable | False |
AA Batteries (4-pack) | 19537 | AA Batteries (4-pack) | False |
Wired Headphones | 16979 | Wired Headphones | False |
... | ... | ... | ... |
Google Phone; LG Washing Machine | 2 | Google Phone; LG Washing Machine | True |
27in 4K Gaming Monitor; LG Washing Machine | 2 | 27in 4K Gaming Monitor; LG Washing Machine | True |
AA Batteries (4-pack); LG Washing Machine | 2 | AA Batteries (4-pack); LG Washing Machine | True |
27in FHD Monitor; LG Washing Machine | 2 | 27in FHD Monitor; LG Washing Machine | True |
Vareebadd Phone; LG Dryer | 2 | Vareebadd Phone; LG Dryer | True |
241 rows × 3 columns
Lightning Charging Cable; iPhone
df_clean.groupby('Product').sum()['Price Each'].sort_values()
Product AAA Batteries (4-pack) 60676.07 AA Batteries (4-pack) 77621.76 Wired Headphones 222438.48 USB-C Charging Cable 257199.85 Lightning Charging Cable 318554.60 LG Dryer 384000.00 LG Washing Machine 389400.00 20in Monitor 443479.68 Vareebadd Phone 808000.00 27in FHD Monitor 1107826.14 Bose SoundSport Headphones 1310768.91 Flatscreen TV 1411500.00 Apple Airpods Headphones 2290950.00 34in Ultrawide Monitor 2301979.42 27in 4K Gaming Monitor 2387518.78 Google Phone 3259800.00 ThinkPad Laptop 4051959.48 iPhone 4707500.00 Macbook Pro Laptop 7889700.00 Name: Price Each, dtype: float64