# -*- coding: utf-8 -*-
"""Untitled0.ipynb
Automatically generated by Colab.
Original file is located at
https://colab.research.google.com/drive/1TYd5hXQwzKJjccE6U_I622Chv0YVDRZN
# **Aerofit**
Aerofit is a leading brand in the field of fitness equipment. Aerofit provides a product range
including machines such as treadmills, exercise bikes, gym equipment, and fitness
accessories to cater to the needs of all categories of people.
Business Problem
The market research team at AeroFit wants to identify the characteristics of the target
audience for each type of treadmill offered by the company, to provide better
treadmill recommendations to new customers. The team decides to investigate
whether there are differences across the products with respect to customer characteristics.
1. Perform descriptive analytics to create a customer profile for each AeroFit treadmill
product by developing appropriate tables and charts.
2. For each AeroFit treadmill product, construct two-way contingency tables and compute
all conditional and marginal probabilities along with their insights/impact on the
business.
Dataset
The company collected the data on individuals who purchased a treadmill from the AeroFit
stores during the prior three months. The dataset has the following features:
"""
# Colab shell command, optional here because pd.read_csv below reads the same URL directly:
# !gdown https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/001/125/original/aerofit_treadmill.csv?1639992749
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv('https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/001/125/original/aerofit_treadmill.csv?1639992749')
df
df.head(10)
df.describe(include='object').T
df.shape
"""This tells us the data is made of 180 rows and 9 columns
* Columns consist:
1. Products
2. Age
3. Gender
4. Education
5. MaritalStatus
6. Usage
7. Fitness
8. Income
9. Miles
"""
df.describe().T
"""This statistical summary gives us the
1. count: The number of non-null values in each column.
2. mean: The average value of each column.
3. std: The standard deviation, a measure of the dispersion of values around the mean.
4. min: The minimum value in each column.
5. 25%: The first quartile, also known as the lower quartile or 25th percentile.
6. 50%: The second quartile, also known as the median or 50th percentile.
7. 75%: The third quartile, also known as the upper quartile or 75th percentile.
8. max: The maximum value in each column.
"""
df.info()
"""This gives us
1. Index: It tells about the index type and number of entries.
2. Columns: It lists the column names and the number of non-null values in each column.
3. Data Type: For each column, it displays the data type (e.g., integer, float, object/string).
4. Memory Usage: It provides an estimate of the memory usage of the DataFrame.
"""
df['Product'].unique()
"""gives us 3 models (their data type is obejct)"""
df['Age'].unique()
age_count =df['Age'].value_counts(normalize=True)*100
age_count.round(0)
"""This gives us the mode of the data that is 25 as the age therefore telling us that most number of buyers are 25 years old."""
age_count[(age_count.index>=20)&(age_count.index<=35)].sum().round(0)
"""We can clearly see that 82% of all customers fall within the age range of 20 to 35"""
df['Gender'].value_counts(normalize=True)*100
"""58% customers are male while the other 42% are females"""
df['MaritalStatus'].value_counts(normalize=True)*100
"""60% customers have a marital partner"""
df['Income'].describe()
count_of_product = df['Product'].value_counts(normalize=True)*100.0
print(count_of_product.round(3))
"""## *We can clearly see that the KP281 is preferred by 44% of all customers while KP481 is preferred by 33% and KP781 22% respectively.*"""
plt.figure(figsize=(12, 8))
plt.subplot(2, 2, 1)
sns.countplot(data=df, x=df['Product'], palette='viridis')
plt.xlabel('Treadmill')
plt.ylabel('Number of Users')
plt.show()
"""It is Clearly visible that KP281 is the most sold model while KP781 does not perform that well in the market
Hence pushing for KP281 even more and positioning KP781 in the right markets
"""
df['Age Group']= pd.cut(df['Age'],bins=[17,29,39,50],labels=['young','middle aged','old'])
df['Age Group'].value_counts()
plt.figure(figsize=(12, 8))
plt.subplot(2, 2, 1)
sns.countplot(data=df, x=df['Age Group'], palette='viridis')
"""Majority of users are young hence more gen-z and millenial targeted marketing can benefit while also assisting the older individuals by educating them about the advantages of regualr walking in middle and old age"""
df['Income Group']= pd.cut(df['Income'],bins=[29000,50000,75000,105000],labels=['low','mid','high'])
df['Income Group'].value_counts()
plt.figure(figsize=(12, 8))
plt.subplot(2, 2, 1)
sns.countplot(data=df, x=df['Income Group'], palette='viridis')
"""We see low and mid income individuals buying the major chunk of Treadmills therefore more focused marketing towards these individuals alongside a premium offering for the high income individuals"""
plt.figure(figsize=(12, 8))
plt.subplot(2, 2, 1)
sns.countplot(data=df, x=df['Gender'], palette='viridis')
"""Majority are Male customers hence there is scope for attracting more female audience as well with proper marketing and education campaigns
Huge market of housewives who don't have much opportunities to exercise regularly
"""
plt.figure(figsize=(12, 8))
plt.subplot(2, 2, 1)
sns.countplot(data=df, x=df['MaritalStatus'], palette='viridis')
"""We can clearly notice that Partnered individuals tend to buy more treadmills Hence better targetting for singles can be done with quirky and innovative marketing campaigns"""
plt.figure(figsize=(20,10))
plt.subplot(2,3,1)
sns.boxplot(data=df, x='Age')
plt.subplot(2,3,2)
sns.kdeplot(data=df, x='Age')
"""We can clearly see there are 3 outliers in the Age category and the most customers belong to the age group around 25 years old"""
plt.figure(figsize=(20,10))
plt.subplot(2,3,1)
sns.boxplot(data=df, x='Income')
plt.subplot(2,3,2)
sns.kdeplot(data=df, x='Income')
"""We can clearly see that there are 11 outliers in the Income category and the most number of cutomers earn around 50000"""
plt.figure(figsize=(20,10))
plt.subplot(2,3,1)
sns.boxplot(data=df, x='Education')
plt.subplot(2,3,2)
sns.kdeplot(data=df, x='Education')
plt.figure(figsize=(20,10))
plt.subplot(2,3,1)
sns.histplot(data=df,x='Fitness')
plt.xlabel('Fitness Level')
plt.ylabel('Count of users')
plt.subplot(2,3,2)
sns.kdeplot(data=df,x='Fitness')
plt.xlabel('Fitness Level')
plt.ylabel('Density')
"""Most buyers rate their fitness at 3 or above. Because the dataset only contains treadmill buyers, this describes who buys rather than how likely a given person is to buy."""
plt.figure(figsize=(20,10))
plt.subplot(2,3,2)
sns.kdeplot(data=df,x='Usage')
plt.xlabel('Usage (times per week)')
plt.ylabel('Density')
plt.subplot(2,3,3)
sns.boxplot(data=df, x='Usage')
plt.xlabel('Usage (times per week)')
plt.suptitle('Usage Distribution')
"""Treadmill usage among buyers is concentrated at 3 to 4 times a week."""
plt.figure(figsize=(20,10))
plt.subplot(2,3,2)
sns.kdeplot(data=df,x='Miles')
plt.xlabel('Miles')
plt.ylabel('Density')
plt.subplot(2,3,3)
sns.boxplot(data=df,x='Miles')
plt.xlabel('Miles')
plt.suptitle('Miles Distribution')
"""Miles values cluster around 100, so customers who walk or run about 100 miles make up the largest group of buyers."""
# Print descriptive statistics and IQR-based outlier bounds for each numeric column
for col in df.select_dtypes(include=np.number):
    mean = df[col].mean().round(2)
    standard_deviation = df[col].std().round(2)
    median = df[col].median().round(2)
    minimum = df[col].min()
    maximum = df[col].max()
    q1 = np.percentile(df[col], 25)
    q3 = np.percentile(df[col], 75)
    IQR = q3 - q1
    # Tukey fences: points beyond these bounds appear as outliers in the box plots
    upper_bound = q3 + 1.5 * IQR
    lower_bound = q1 - 1.5 * IQR
    print(f'--- Descriptive Statistics of {col} column ---')
    print(f'Mean: {mean}')
    print(f'Standard Deviation: {standard_deviation}')
    print(f'Median: {median}')
    print(f'Minimum: {minimum}')
    print(f'Maximum: {maximum}')
    print(f'25th Percentile: {q1}')
    print(f'75th Percentile: {q3}')
    print(f'Inter Quartile Range: {IQR}')
    print(f'Upper bound: {upper_bound}')
    print(f'Lower bound: {lower_bound}')
    print()
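"""A sketch of a vectorized alternative to the loop above, on a hypothetical two-column frame standing in for df.select_dtypes(include=np.number). Series.quantile and np.percentile both default to linear interpolation, so the quartiles and bounds should match the loop's output.

```python
import pandas as pd

# Hypothetical numeric columns, not the Aerofit data
data = pd.DataFrame({'Age': [18, 24, 25, 28, 33, 47],
                     'Usage': [2, 3, 3, 4, 4, 6]})

# One row per column, one column per statistic
summary = data.agg(['mean', 'std', 'median', 'min', 'max']).T
summary['q1'] = data.quantile(0.25)
summary['q3'] = data.quantile(0.75)
summary['IQR'] = summary['q3'] - summary['q1']
summary['lower_bound'] = summary['q1'] - 1.5 * summary['IQR']
summary['upper_bound'] = summary['q3'] + 1.5 * summary['IQR']
print(summary.round(2))
```

On the real data, replace data with df.select_dtypes(include=np.number).
"""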
sns.countplot(data=df,x='Gender',hue='Product' ,palette='viridis')
plt.suptitle('Gender Distribution', fontsize=18)
plt.xlabel('Gender')
plt.show()
"""Both genders equally prefer KP281 but males tend to buy more KP781s as compared to females"""
sns.countplot(data=df,x='Age Group', hue='Product', palette='viridis')
plt.title('Age Group', fontsize=14)
plt.xlabel('Age group')
plt.show()
print()
print('18-29: Young')
print('30-39: Middle-aged')
print('40-50: Old')
"""The young, fitness- and health-conscious generation is the most inclined to buy treadmills."""
sns.countplot(data=df, x='MaritalStatus', hue='Product', palette='viridis')
plt.suptitle('Marital Status', fontsize=18)
plt.xlabel('Marital Status')
plt.show()
"""partnered people invest more in treadmills than singles"""
sns.countplot(data=df, x='Income Group', hue='Product', palette='viridis')
plt.title('Income Groups', fontsize=18)
plt.xlabel('Income Group')
plt.show()
"""It is clearly visible that low income individuals prefer KP281 in huge numbers while the high incoome individuals only buy KP781"""
sns.countplot(data=df, x='Education', hue='Product', palette='viridis')
plt.title('Education', fontsize=18)
plt.figure(figsize=(20,5))
plt.subplot(1,2,1)
sns.countplot(data=df, x='Product', hue='Usage' , palette='viridis')
plt.xlabel('Treadmill')
plt.ylabel('Count of customers')
plt.title('Distribution of Usage for each Treadmill')
plt.subplot(1,2,2)
sns.countplot(data=df, x='Product', hue='Fitness', palette='viridis')
plt.xlabel('Treadmill')
plt.ylabel('Count of customers')
plt.title('Distribution of Fitness for each Treadmill')
plt.show()
"""• KP281 and KP481 are popular among users exercising 3 times weekly, while KP781 is the choice for those using treadmills 4-5 times a week.
• KP281 and KP481 are preferred by customers with a fitness level of 3, whereas those at level 5 mainly opt for the advanced KP781.
"""
numeric_df = df.select_dtypes(include=[np.number])
plt.figure(figsize=(10,6))
sns.heatmap(numeric_df.corr(), annot=True, cmap='plasma', fmt='.2f')
plt.show()
"""This shows that age is related to education with a unit of 0.28
Usage and fitness relation of 0.67 shows that fitter individuals use the treadmill more
"""
sns.pairplot(df,hue ='Product')
"""##Probablities and how each aspect affects whether someone will buy a treadmill or not"""
pd.crosstab(index =df['Product'],columns = df['Age Group'],margins =True,normalize = True).round(2)
pd.crosstab(index =df['Product'],columns = df['Education'],margins =True,normalize = True).round(2)
pd.crosstab(index =df['Product'],columns = df['Gender'],margins =True,normalize = True).round(2)
pd.crosstab(index =df['Product'],columns = df['MaritalStatus'],margins =True,normalize = True).round(2)
pd.crosstab(index =df['Product'],columns = df['Usage'],margins =True,normalize = True).round(2)
pd.crosstab(index =df['Product'],columns = df['Income Group'],margins =True,normalize = True).round(2)
pd.crosstab(index =df['Product'],columns = df['Fitness'],margins =True,normalize = True).round(2)
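"""With normalize=True, every cell of the tables above is a joint probability P(Product and X), and the All row and column hold the marginals. A conditional probability such as P(Product | Income Group) divides a joint cell by its column marginal: P(Product and Income Group) / P(Income Group). The sketch below, on hypothetical purchases rather than the Aerofit data, checks that this division matches normalize='columns'.

```python
import pandas as pd

# Hypothetical purchases, not the Aerofit data
sales = pd.DataFrame({
    'Product': ['KP281', 'KP281', 'KP481', 'KP781', 'KP281', 'KP781'],
    'Income Group': ['low', 'low', 'mid', 'high', 'mid', 'high'],
})

# Joint probabilities P(Product and Income Group), as produced by normalize=True
joint = pd.crosstab(sales['Product'], sales['Income Group'], normalize=True)

# Marginal P(Income Group): column sums of the joint table
p_income = joint.sum(axis=0)

# Conditional P(Product | Income Group) = joint / marginal, column by column
conditional = joint.div(p_income, axis=1)

# pandas computes the same table directly with normalize='columns'
direct = pd.crosstab(sales['Product'], sales['Income Group'], normalize='columns')
print(conditional.round(2))
```
"""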
"""All these probabilities give us the following observations
* The Probability of a treadmill being purchased by a Young Adult(18-25) is 44%.
* The Probability of a treadmill being purchased by a customer with Higher Education(Above 15 Years) is 62%.
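* P(KP281 and Low income) = 0.27, the highest joint probability in these tables (the crosstabs above use normalize=True, so each cell is a joint probability rather than a conditional one).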
* The Probability of a treadmill being purchased by a Married Customer is 59%.
* The Probability of a treadmill being purchased by a customer with Usage 3 per week is 38%.
* The Probability of a treadmill being purchased by a customer with Average(3) Fitness is 54%.
* The Probability of a treadmill being purchased by a male is 58%.
# My Business Insights and Observations
1. Most potential customers for KP281 are:
* Young (18-29)
* Males and Females both
* Couples (Married)
* Education between 14 to 16 years
* Fitness level>=3
* Usage of 3 times a week
* running 60-100 miles
* Income between 29000 to 50000
2. Most potential customers for KP481 are:
* Young (18-29)
* Males and Females both
* Married or unmarried
* Education between 14 to 16 years
* Fitness level>=3
* Usage of 3 times a week
* running 80-120 miles
* Income between 29000 to 50000
3. Most potential customers for KP781 are:
* Young (18-25)
* Males
* Couples (Married)
* Education between 16 to 18 years
* Fitness level = 5
* Usage of 4-6 times a week
* running 120-200 miles
* Income between 75000 to 105000
### Concluding Observations
1. KP281 is the top choice for 44.44% of users, followed by KP481 with 33.33%, and KP781 appeals to 22.22%.
2. Young, educated, low-income, married customers make up the largest share of buyers
3. KP281 is the cheapest model, which likely explains why it sells the most
4. 33% of customers buy KP481, likely for its balance of features and price
5. Fitness enthusiasts and high-income customers go for the top-of-the-line KP781
6. Far fewer women than men buy KP781, possibly because of its higher price
# My Recommendations and Suggestions
**Integrated Business Analysis and Growth Strategy for Aerofit:**
**1. Market Analysis:**
- **Targeting Gen-Z and Millennials:** Focus marketing efforts on understanding preferences and lifestyle choices of Gen-Z and Millennials, utilizing platforms like Instagram, TikTok, and YouTube for product showcases and testimonials.
- **Women-Centric Approach:** Develop marketing strategies highlighting treadmill features tailored to women's needs and preferences, emphasizing comfort, design, and customizable workout programs.
- **Competitive Analysis:** Analyze competitors' strategies to identify differentiation opportunities and market gaps.
**2. Product Analysis:**
- **Decoy Effect in Pricing:** Use pricing strategies to position KP481 as the best value option between KP281 and KP781, appealing to value-conscious customers.
- **Product Performance Evaluation:** Continuously assess performance and gather customer feedback to drive innovation and improvement, particularly focusing on features that cater to women's needs and older adults' comfort.
- **Product Diversification:** Explore opportunities to introduce complementary fitness equipment or accessories targeting specific customer segments, such as low-impact options for older adults.
**3. Customer Analysis:**
- **Segmentation and Personalization:** Segment customers based on demographics and psychographics for targeted marketing and personalized recommendations, with specific emphasis on women and older adults.
- **Enhanced Customer Experience:** Implement efficient customer service and post-purchase support to improve satisfaction and retention, with tailored support for women and older adults.
**4. Pricing Strategy:**
- **Value-Based Pricing:** Align pricing with the perceived value of Aerofit products, offering strategic discounts and promotions to stimulate demand, including targeted offers for women and older adults.
- **Promotional Strategies:** Offer bundle deals and loyalty programs to incentivize purchases, especially targeting Gen-Z, Millennials, women, and older adults.
**5. Marketing and Promotion:**
- **Targeted Marketing Campaigns:** Develop campaigns resonating with Gen-Z, Millennials, women, and older adults, emphasizing Aerofit's benefits in achieving fitness goals and addressing specific needs.
- **Digital Marketing Channels:** Utilize social media and influencer partnerships for effective outreach and engagement, with content tailored to the interests and preferences of each demographic.
- **Brand Building:** Emphasize quality, innovation, and sustainability in branding to appeal to environmentally conscious consumers, including messaging focused on Aerofit's commitment to providing inclusive and age-friendly fitness solutions.
**6. Distribution and Sales Channels:**
- **Expanding Reach:** Forge partnerships with retailers and e-commerce platforms to expand distribution networks, ensuring accessibility to Aerofit products for women and older adults in various regions.
- **Streamlined Sales Processes:** Optimize online and offline sales processes for a seamless buying experience, with special attention to ease of use for older adults and first-time buyers.
**7. Innovation and Technology:**
- **Research and Development:** Invest in R&D for product innovation addressing evolving customer needs, including features that enhance comfort, safety, and ease of use for women and older adults.
- **Smart Features Integration:** Incorporate IoT connectivity and data analytics to enhance functionality and user experience, with intuitive interfaces suitable for users of all ages and fitness levels.
**8. Customer Retention and Loyalty:**
- **Loyalty Programs:** Implement programs to incentivize repeat purchases and foster brand loyalty, with rewards and benefits tailored to the preferences of women and older adults.
- **Feedback Mechanisms:** Regularly gather feedback to demonstrate responsiveness to customer needs, with dedicated channels for women and older adults to provide input on product preferences and experiences.
**9. Expansion and International Growth:**
- **Market Exploration:** Identify opportunities for expansion into new geographical markets, adapting strategies to suit local preferences, including offerings that cater to the specific fitness needs of women and older adults.
- **Localization Strategies:** Tailor marketing and product offerings to meet the needs of target markets, with cultural sensitivity and inclusivity as key considerations.
**10. Sustainability and CSR:**
- **Environmental Initiatives:** Integrate sustainable practices to appeal to environmentally conscious consumers, with transparency in sourcing and manufacturing processes, appealing to women and older adults concerned about the environment.
- **Community Engagement:** Participate in CSR initiatives to enhance brand reputation, especially among educated and affluent youth demographics, with initiatives focused on promoting health and fitness for women and older adults in local communities.
"""