From a416c48927420a35cfebfcc36c851befe2bb62b8 Mon Sep 17 00:00:00 2001 From: Yury Kashnitsky Date: Tue, 28 Feb 2017 11:42:07 +0300 Subject: [PATCH] add topic 1 pandas --- .../topic1_pandas.ipynb | 4562 +++++++++++++++++ 1 file changed, 4562 insertions(+) create mode 100644 jupyter_notebooks/topic1_pandas_data_analysis/topic1_pandas.ipynb diff --git a/jupyter_notebooks/topic1_pandas_data_analysis/topic1_pandas.ipynb b/jupyter_notebooks/topic1_pandas_data_analysis/topic1_pandas.ipynb new file mode 100644 index 0000000000..f50923c17c --- /dev/null +++ b/jupyter_notebooks/topic1_pandas_data_analysis/topic1_pandas.ipynb @@ -0,0 +1,4562 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "deletable": true, + "editable": true + }, + "source": [ + "
\n", + "\n", + "## Open Machine Learning Course\n", + "
\n", + "Author: Yury Kashnitsky, research programmer at Mail.ru Group and senior lecturer at the Faculty of Computer Science, HSE\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "deletable": true, + "editable": true + }, + "source": [ + "#
Topic 1. Exploratory data analysis with Pandas
\n", + "##
Part 2. Overview of the Pandas library
" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "deletable": true, + "editable": true + }, + "source": [ + "**[Pandas](http://pandas.pydata.org)** — это библиотека Python, предоставляющая широкие возможности для анализа данных. С ее помощью очень удобно загружать, обрабатывать и анализировать табличные данные с помощью SQL-подобных запросов. В связке с библиотеками Matplotlib и Seaborn появляется возможность удобного визуального анализа табличных данных." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "collapsed": false, + "deletable": true, + "editable": true, + "scrolled": true + }, + "outputs": [], + "source": [ + "# Python 2 and 3 compatibility\n", + "# pip install future\n", + "from __future__ import (absolute_import, division,\n", + " print_function, unicode_literals)\n", + "# отключим предупреждения Anaconda\n", + "import warnings\n", + "warnings.simplefilter('ignore')\n", + "import pandas as pd\n", + "import numpy as np" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "deletable": true, + "editable": true + }, + "source": [ + "Данные, с которыми работают дата саентисты и аналитики, обычно хранятся в виде табличек — например, в форматах .csv, .tsv или .xlsx. Для того, чтобы считать нужные данные из такого файла, отлично подходит библиотека Pandas.\n", + "\n", + "Основными структурами данных в Pandas являются классы Series и DataFrame. Первый из них представляет собой одномерный индексированный массив данных некоторого фиксированного типа. Второй - это двухмерная структура данных, представляющая собой таблицу, каждый столбец которой содержит данные одного типа. Можно представлять её как словарь объектов типа Series. Структура DataFrame отлично подходит для представления реальных данных: строки соответствуют признаковым описаниям отдельных объектов, а столбцы соответствуют признакам." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "deletable": true, + "editable": true + }, + "source": [ + "---------\n", + "\n", + "## Demonstration of the main Pandas methods \n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "deletable": true, + "editable": true + }, + "source": [ + "### Reading from a file and initial analysis" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "deletable": true, + "editable": true + }, + "source": [ + "Let's read the data and look at the first 5 rows using the `head()` method:" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "collapsed": false, + "deletable": true, + "editable": true, + "scrolled": true + }, + "outputs": [], + "source": [ + "df = pd.read_csv('../../data/telecom_churn.csv')" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "collapsed": false, + "deletable": true, + "editable": true, + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " State Account length Area code International plan Voice mail plan \\\n", + "0 KS 128 415 No Yes \n", + "1 OH 107 415 No Yes \n", + "2 NJ 137 415 No No \n", + "3 OH 84 408 Yes No \n", + "4 OK 75 415 Yes No \n", + "\n", + " Number vmail messages Total day minutes Total day calls \\\n", + "0 25 265.1 110 \n", + "1 26 161.6 123 \n", + "2 0 243.4 114 \n", + "3 0 299.4 71 \n", + "4 0 166.7 113 \n", + "\n", + " Total day charge Total eve minutes Total eve calls Total eve charge \\\n", + "0 45.07 197.4 99 16.78 \n", + "1 27.47 195.5 103 16.62 \n", + "2 41.38 121.2 110 10.30 \n", + "3 50.90 61.9 88 5.26 \n", + "4 28.34 148.3 122 12.61 \n", + "\n", + " Total night minutes Total night calls Total night charge \\\n", + "0 244.7 91 11.01 \n", + "1 254.4 103 11.45 \n", + "2 162.6 104 7.32 \n", + "3 196.9 89 8.86 \n", + "4 186.9 121 8.41 \n", + "\n", + " Total intl minutes Total intl calls Total intl charge \\\n", + "0 10.0 3 2.70 \n", + "1 13.7 3 3.70 \n", + "2 12.2 5 3.29 \n", + "3 6.6 7 1.78 \n", + "4 10.1 3 2.73 \n", + "\n", + " Customer service calls Churn \n", + "0 1 False \n", + "1 1 False \n", + "2 0 False \n", + "3 2 False \n", + "4 3 False " + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "deletable": true, + "editable": true + }, + "source": [ + "В Jupyter-ноутбуках датафреймы Pandas выводятся в виде вот таких красивых табличек, и `print(df.head())` выглядит хуже.\n", + "\n", + "Кстати, по умолчанию Pandas выводит всего 20 столбцов и 60 строк, поэтому если ваш датафрейм больше, воспользуйтесь функцией `set_option`:" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "collapsed": false, + "deletable": true, + "editable": true + }, + "outputs": [], + "source": [ + "pd.set_option('display.max_columns', 100)\n", + "pd.set_option('display.max_rows', 100)" + ] + }, + { + "cell_type": 
"markdown", + "metadata": { + "deletable": true, + "editable": true + }, + "source": [ + "А также укажем значение параметра `presicion` равным 2, чтобы отображать два знака после запятой (а не 6, как установлено по умолчанию." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": { + "collapsed": false, + "deletable": true, + "editable": true + }, + "outputs": [], + "source": [ + "pd.set_option('precision', 2)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "deletable": true, + "editable": true + }, + "source": [ + "**Посмотрим на размер данных, названия признаков и их типы**" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": { + "collapsed": false, + "deletable": true, + "editable": true + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "(3333, 20)\n" + ] + } + ], + "source": [ + "print(df.shape)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "deletable": true, + "editable": true + }, + "source": [ + "Видим, что в таблице 3333 строки и 20 столбцов. 
Let's print the column names:" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": { + "collapsed": false, + "deletable": true, + "editable": true + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Index(['State', 'Account length', 'Area code', 'International plan',\n", + " 'Voice mail plan', 'Number vmail messages', 'Total day minutes',\n", + " 'Total day calls', 'Total day charge', 'Total eve minutes',\n", + " 'Total eve calls', 'Total eve charge', 'Total night minutes',\n", + " 'Total night calls', 'Total night charge', 'Total intl minutes',\n", + " 'Total intl calls', 'Total intl charge', 'Customer service calls',\n", + " 'Churn'],\n", + " dtype='object')\n" + ] + } + ], + "source": [ + "print(df.columns)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "deletable": true, + "editable": true + }, + "source": [ + "To see general information about the DataFrame and all of its features, we use the **`info`** method:" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": { + "collapsed": false, + "deletable": true, + "editable": true, + "scrolled": false + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "<class 'pandas.core.frame.DataFrame'>\n", + "RangeIndex: 3333 entries, 0 to 3332\n", + "Data columns (total 20 columns):\n", + "State 3333 non-null object\n", + "Account length 3333 non-null int64\n", + "Area code 3333 non-null int64\n", + "International plan 3333 non-null object\n", + "Voice mail plan 3333 non-null object\n", + "Number vmail messages 3333 non-null int64\n", + "Total day minutes 3333 non-null float64\n", + "Total day calls 3333 non-null int64\n", + "Total day charge 3333 non-null float64\n", + "Total eve minutes 3333 non-null float64\n", + "Total eve calls 3333 non-null int64\n", + "Total eve charge 3333 non-null float64\n", + "Total night minutes 3333 non-null float64\n", + "Total night calls 3333 non-null int64\n", + "Total night charge 3333 non-null float64\n", + 
"Total intl minutes 3333 non-null float64\n", + "Total intl calls 3333 non-null int64\n", + "Total intl charge 3333 non-null float64\n", + "Customer service calls 3333 non-null int64\n", + "Churn 3333 non-null bool\n", + "dtypes: bool(1), float64(8), int64(8), object(3)\n", + "memory usage: 498.1+ KB\n", + "None\n" + ] + } + ], + "source": [ + "print(df.info())" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "deletable": true, + "editable": true + }, + "source": [ + "`bool`, `int64`, `float64` и `object` — это типы признаков. Видим, что 1 признак — логический (bool), 3 признака имеют тип object и 16 признаков — числовые.\n", + "\n", + "**Изменить тип колонки** можно с помощью метода `astype`. Применим этот метод к признаку `Churn` и переведём его в `int64`:" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": { + "collapsed": false, + "deletable": true, + "editable": true + }, + "outputs": [], + "source": [ + "df['Churn'] = df['Churn'].astype('int64')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "deletable": true, + "editable": true + }, + "source": [ + "Метод **`describe`** показывает основные статистические характеристики данных по каждому числовому признаку (типы `int64` и `float64`): число непропущенных значений, среднее, стандартное отклонение, диапазон, медиану, 0.25 и 0.75 квартили." + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": { + "collapsed": false, + "deletable": true, + "editable": true + }, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " Account length Area code Number vmail messages Total day minutes \\\n", + "count 3333.00 3333.00 3333.00 3333.00 \n", + "mean 101.06 437.18 8.10 179.78 \n", + "std 39.82 42.37 13.69 54.47 \n", + "min 1.00 408.00 0.00 0.00 \n", + "25% 74.00 408.00 0.00 143.70 \n", + "50% 101.00 415.00 0.00 179.40 \n", + "75% 127.00 510.00 20.00 216.40 \n", + "max 243.00 510.00 51.00 350.80 \n", + "\n", + " Total day calls Total day charge Total eve minutes Total eve calls \\\n", + "count 3333.00 3333.00 3333.00 3333.00 \n", + "mean 100.44 30.56 200.98 100.11 \n", + "std 20.07 9.26 50.71 19.92 \n", + "min 0.00 0.00 0.00 0.00 \n", + "25% 87.00 24.43 166.60 87.00 \n", + "50% 101.00 30.50 201.40 100.00 \n", + "75% 114.00 36.79 235.30 114.00 \n", + "max 165.00 59.64 363.70 170.00 \n", + "\n", + " Total eve charge Total night minutes Total night calls \\\n", + "count 3333.00 3333.00 3333.00 \n", + "mean 17.08 200.87 100.11 \n", + "std 4.31 50.57 19.57 \n", + "min 0.00 23.20 33.00 \n", + "25% 14.16 167.00 87.00 \n", + "50% 17.12 201.20 100.00 \n", + "75% 20.00 235.30 113.00 \n", + "max 30.91 395.00 175.00 \n", + "\n", + " Total night charge Total intl minutes Total intl calls \\\n", + "count 3333.00 3333.00 3333.00 \n", + "mean 9.04 10.24 4.48 \n", + "std 2.28 2.79 2.46 \n", + "min 1.04 0.00 0.00 \n", + "25% 7.52 8.50 3.00 \n", + "50% 9.05 10.30 4.00 \n", + "75% 10.59 12.10 6.00 \n", + "max 17.77 20.00 20.00 \n", + "\n", + " Total intl charge Customer service calls Churn \n", + "count 3333.00 3333.00 3333.00 \n", + "mean 2.76 1.56 0.14 \n", + "std 0.75 1.32 0.35 \n", + "min 0.00 0.00 0.00 \n", + "25% 2.30 1.00 0.00 \n", + "50% 2.78 1.00 0.00 \n", + "75% 3.27 2.00 0.00 \n", + "max 5.40 9.00 1.00 " + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.describe()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "deletable": true, + "editable": true + }, + "source": [ + "Чтобы посмотреть 
статистику по нечисловым признакам, нужно явно указать интересующие нас типы в параметре `include`." + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": { + "collapsed": false, + "deletable": true, + "editable": true, + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " State International plan Voice mail plan\n", + "count 3333 3333 3333\n", + "unique 51 2 2\n", + "top WV No No\n", + "freq 106 3010 2411" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.describe(include=['object', 'bool'])" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "deletable": true, + "editable": true + }, + "source": [ + "Для категориальных (тип `object`) и булевых (тип `bool`) признаков можно воспользоваться методом **`value_counts`**. Посмотрим на распределение данных по нашей целевой переменной — `Churn`:" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": { + "collapsed": false, + "deletable": true, + "editable": true + }, + "outputs": [ + { + "data": { + "text/plain": [ + "0 2850\n", + "1 483\n", + "Name: Churn, dtype: int64" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df['Churn'].value_counts()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "deletable": true, + "editable": true + }, + "source": [ + "2850 пользователей из 3333 — лояльные, значение переменной `Churn` у них — `0`.\n", + "\n", + "Посмотрим на распределение пользователей по переменной `Area code`. Укажем значение параметра `normalize=True`, чтобы посмотреть не абсолютные частоты, а относительные." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": { + "collapsed": false, + "deletable": true, + "editable": true + }, + "outputs": [ + { + "data": { + "text/plain": [ + "415 0.50\n", + "510 0.25\n", + "408 0.25\n", + "Name: Area code, dtype: float64" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df['Area code'].value_counts(normalize=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "deletable": true, + "editable": true + }, + "source": [ + "### Sorting\n", + "\n", + "A DataFrame can be sorted by the values of one of its features. In our case, for example, by `Total day charge` (`ascending=False` sorts in descending order):" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": { + "collapsed": false, + "deletable": true, + "editable": true + }, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " State Account length Area code International plan Voice mail plan \\\n", + "365 CO 154 415 No No \n", + "985 NY 64 415 Yes No \n", + "2594 OH 115 510 Yes No \n", + "156 OH 83 415 No No \n", + "605 MO 112 415 No No \n", + "\n", + " Number vmail messages Total day minutes Total day calls \\\n", + "365 0 350.8 75 \n", + "985 0 346.8 55 \n", + "2594 0 345.3 81 \n", + "156 0 337.4 120 \n", + "605 0 335.5 77 \n", + "\n", + " Total day charge Total eve minutes Total eve calls Total eve charge \\\n", + "365 59.64 216.5 94 18.40 \n", + "985 58.96 249.5 79 21.21 \n", + "2594 58.70 203.4 106 17.29 \n", + "156 57.36 227.4 116 19.33 \n", + "605 57.04 212.5 109 18.06 \n", + "\n", + " Total night minutes Total night calls Total night charge \\\n", + "365 253.9 100 11.43 \n", + "985 275.4 102 12.39 \n", + "2594 217.5 107 9.79 \n", + "156 153.9 114 6.93 \n", + "605 265.0 132 11.93 \n", + "\n", + " Total intl minutes Total intl calls Total intl charge \\\n", + "365 10.1 9 2.73 \n", + "985 13.3 9 3.59 \n", + "2594 11.8 8 3.19 \n", + "156 15.8 7 4.27 \n", + "605 12.7 8 3.43 \n", + "\n", + " Customer service calls Churn \n", + "365 1 1 \n", + "985 1 1 \n", + "2594 1 1 \n", + "156 0 1 \n", + "605 2 1 " + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.sort(columns='Total day charge', \n", + " ascending=False).head()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "deletable": true, + "editable": true + }, + "source": [ + "Сортировать можно и по группе столбцов:" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": { + "collapsed": false, + "deletable": true, + "editable": true + }, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " State Account length Area code International plan Voice mail plan \\\n", + "688 MN 13 510 No Yes \n", + "2259 NC 210 415 No Yes \n", + "534 LA 67 510 No No \n", + "575 SD 114 415 No Yes \n", + "2858 AL 141 510 No Yes \n", + "\n", + " Number vmail messages Total day minutes Total day calls \\\n", + "688 21 315.6 105 \n", + "2259 31 313.8 87 \n", + "534 0 310.4 97 \n", + "575 36 309.9 90 \n", + "2858 28 308.0 123 \n", + "\n", + " Total day charge Total eve minutes Total eve calls Total eve charge \\\n", + "688 53.65 208.9 71 17.76 \n", + "2259 53.35 147.7 103 12.55 \n", + "534 52.77 66.5 123 5.65 \n", + "575 52.68 200.3 89 17.03 \n", + "2858 52.36 247.8 128 21.06 \n", + "\n", + " Total night minutes Total night calls Total night charge \\\n", + "688 260.1 123 11.70 \n", + "2259 192.7 97 8.67 \n", + "534 246.5 99 11.09 \n", + "575 183.5 105 8.26 \n", + "2858 152.9 103 6.88 \n", + "\n", + " Total intl minutes Total intl calls Total intl charge \\\n", + "688 12.1 3 3.27 \n", + "2259 10.1 7 2.73 \n", + "534 9.2 10 2.48 \n", + "575 14.2 2 3.83 \n", + "2858 7.4 3 2.00 \n", + "\n", + " Customer service calls Churn \n", + "688 3 0 \n", + "2259 3 0 \n", + "534 4 0 \n", + "575 1 0 \n", + "2858 1 0 " + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.sort(columns=['Churn', 'Total day charge'],\n", + " ascending=[True, False]).head()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "deletable": true, + "editable": true + }, + "source": [ + "### Индексация и извлечение данных" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "deletable": true, + "editable": true + }, + "source": [ + "DataFrame можно индексировать по-разному. В связи с этим рассмотрим различные способы индексации и извлечения нужных нам данных из датафрейма на примере простых вопросов.\n", + "\n", + "Для извлечения отдельного столбца можно использовать конструкцию вида `DataFrame['Name']`. 
Let's use this to answer the question: **what is the share of churned (non-loyal) users in our DataFrame?**" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": { + "collapsed": false, + "deletable": true, + "editable": true + }, + "outputs": [ + { + "data": { + "text/plain": [ + "0.14491449144914492" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df['Churn'].mean()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "deletable": true, + "editable": true + }, + "source": [ + "14.5% is a rather poor figure for a company; with such a churn rate it could go bankrupt." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "deletable": true, + "editable": true + }, + "source": [ + "Boolean indexing of a DataFrame by a single column is very convenient. It looks like this: `df[P(df['Name'])]`, where `P` is some logical condition checked for each element of the `Name` column. The result of such indexing is a DataFrame consisting only of the rows that satisfy condition `P` on the `Name` column. 
\n", + "\n", + "Let's use this to answer the question: **what are the mean values of the numeric features among churned users?**" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": { + "collapsed": false, + "deletable": true, + "editable": true, + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/plain": [ + "Account length 102.66\n", + "Area code 437.82\n", + "Number vmail messages 5.12\n", + "Total day minutes 206.91\n", + "Total day calls 101.34\n", + "Total day charge 35.18\n", + "Total eve minutes 212.41\n", + "Total eve calls 100.56\n", + "Total eve charge 18.05\n", + "Total night minutes 205.23\n", + "Total night calls 100.40\n", + "Total night charge 9.24\n", + "Total intl minutes 10.70\n", + "Total intl calls 4.16\n", + "Total intl charge 2.89\n", + "Customer service calls 2.23\n", + "Churn 1.00\n", + "dtype: float64" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[df['Churn'] == 1].mean(numeric_only=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "deletable": true, + "editable": true + }, + "source": [ + "Combining the two previous kinds of indexing, let's answer the question: **how many minutes per day, on average, do churned users spend on the phone**?" 
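The combination of boolean indexing and column selection can be sketched on a toy table. The four rows below are invented for illustration; only the column names mirror the data set. The `loc` variant is an equivalent alternative, not the form used in this notebook.

```python
import pandas as pd

# Hypothetical toy data, not the telecom data set.
toy = pd.DataFrame({
    'Churn': [0, 1, 1, 0],
    'Total day minutes': [100.0, 200.0, 300.0, 150.0],
})

mask = toy['Churn'] == 1      # boolean Series, one value per row
churned = toy[mask]           # keeps only the rows where mask is True
avg = churned['Total day minutes'].mean()

# Same result in one step: rows selected by mask, column selected by name.
avg_loc = toy.loc[mask, 'Total day minutes'].mean()
print(avg, avg_loc)
```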
+ ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": { + "collapsed": false, + "deletable": true, + "editable": true + }, + "outputs": [ + { + "data": { + "text/plain": [ + "206.91407867494814" + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[df['Churn'] == 1]['Total day minutes'].mean()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "deletable": true, + "editable": true + }, + "source": [ + "**What is the maximum total length of international calls among loyal users (`Churn == 0`) who do not use the international plan (`'International plan' == 'No'`)?**" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": { + "collapsed": false, + "deletable": true, + "editable": true + }, + "outputs": [ + { + "data": { + "text/plain": [ + "18.899999999999999" + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[(df['Churn'] == 0) & (df['International plan'] == 'No')]['Total intl minutes'].max()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "deletable": true, + "editable": true + }, + "source": [ + "DataFrames can be indexed either by column or row label or by position. The **`loc`** indexer is used for indexing **by label**, and **`iloc`** for indexing **by position**.\n", + "\n", + "In the first case we say _“give us the values for the rows with index labels 0 through 5 (inclusive) in the columns from State to Area code”_, and in the second _“give us the values of the first five rows in the first three columns”_. Note that a `loc` slice includes its end label, while an `iloc` slice excludes its end position, as in ordinary Python slicing. " + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": { + "collapsed": false, + "deletable": true, + "editable": true, + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " State Account length Area code\n", + "0 KS 128 415\n", + "1 OH 107 415\n", + "2 NJ 137 415\n", + "3 OH 84 408\n", + "4 OK 75 415\n", + "5 AL 118 510" + ] + }, + "execution_count": 21, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.loc[0:5, 'State':'Area code']" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": { + "collapsed": false, + "deletable": true, + "editable": true, + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " State Account length Area code\n", + "0 KS 128 415\n", + "1 OH 107 415\n", + "2 NJ 137 415\n", + "3 OH 84 408\n", + "4 OK 75 415" + ] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.iloc[0:5, 0:3]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "deletable": true, + "editable": true + }, + "source": [ + "The `ix` indexer accepts both labels and positions, but it was deprecated in pandas 0.20 and removed in pandas 1.0, so `loc` and `iloc` are the safer choice." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "deletable": true, + "editable": true + }, + "source": [ + "If we need the first or the last row of the DataFrame, we can use `df[:1]` or `df[-1:]`:" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": { + "collapsed": false, + "deletable": true, + "editable": true, + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " State Account length Area code International plan Voice mail plan \\\n", + "3332 TN 74 415 No Yes \n", + "\n", + " Number vmail messages Total day minutes Total day calls \\\n", + "3332 25 234.4 113 \n", + "\n", + " Total day charge Total eve minutes Total eve calls Total eve charge \\\n", + "3332 39.85 265.9 82 22.6 \n", + "\n", + " Total night minutes Total night calls Total night charge \\\n", + "3332 241.4 77 10.86 \n", + "\n", + " Total intl minutes Total intl calls Total intl charge \\\n", + "3332 13.7 4 3.7 \n", + "\n", + " Customer service calls Churn \n", + "3332 0 0 " + ] + }, + "execution_count": 23, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[-1:]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "deletable": true, + "editable": true + }, + "source": [ + "### Applying functions: `apply`, `map`, etc." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "deletable": true, + "editable": true + }, + "source": [ + "**Applying a function to each column:**" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": { + "collapsed": false, + "deletable": true, + "editable": true + }, + "outputs": [ + { + "data": { + "text/plain": [ + "State WY\n", + "Account length 243\n", + "Area code 510\n", + "International plan Yes\n", + "Voice mail plan Yes\n", + "Number vmail messages 51\n", + "Total day minutes 3.5e+02\n", + "Total day calls 165\n", + "Total day charge 60\n", + "Total eve minutes 3.6e+02\n", + "Total eve calls 170\n", + "Total eve charge 31\n", + "Total night minutes 4e+02\n", + "Total night calls 175\n", + "Total night charge 18\n", + "Total intl minutes 20\n", + "Total intl calls 20\n", + "Total intl charge 5.4\n", + "Customer service calls 9\n", + "Churn 1\n", + "dtype: object" + ] + }, + "execution_count": 24, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.apply(np.max) " + ] + }, + { + "cell_type": "markdown", + 
"metadata": { + "deletable": true, + "editable": true + }, + "source": [ + "The `apply` method can also be used to apply a function to each row. To do so, pass `axis=1`." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "deletable": true, + "editable": true + }, + "source": [ + "**Applying a function to each cell of a column**\n", + "\n", + "Suppose we are interested in all rows of the DataFrame where the value of `\"Number vmail messages\"` is greater than 45:" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": { + "collapsed": false, + "deletable": true, + "editable": true, + "scrolled": false + }, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " State Account length Area code International plan Voice mail plan \\\n", + "71 MN 162 510 No Yes \n", + "268 MO 64 510 No Yes \n", + "277 SD 144 408 No Yes \n", + "599 OH 75 510 No Yes \n", + "845 FL 144 415 No Yes \n", + "1285 SC 109 415 No Yes \n", + "1441 NC 172 408 No Yes \n", + "1596 AR 63 510 No Yes \n", + "1797 WV 92 415 No Yes \n", + "2608 IN 81 408 No Yes \n", + "2716 WV 137 510 No Yes \n", + "2887 OR 134 415 No Yes \n", + "3154 CT 73 415 No Yes \n", + "\n", + " Number vmail messages Total day minutes Total day calls \\\n", + "71 46 224.9 97 \n", + "268 48 94.4 104 \n", + "277 48 189.8 96 \n", + "599 46 214.1 62 \n", + "845 51 283.9 98 \n", + "1285 46 217.5 123 \n", + "1441 47 274.9 102 \n", + "1596 49 214.9 86 \n", + "1797 47 141.6 95 \n", + "2608 46 168.3 124 \n", + "2716 50 186.5 94 \n", + "2887 50 208.8 130 \n", + "3154 47 173.7 117 \n", + "\n", + " Total day charge Total eve minutes Total eve calls Total eve charge \\\n", + "71 38.23 188.2 84 16.00 \n", + "268 16.05 136.2 101 11.58 \n", + "277 32.27 123.4 67 10.49 \n", + "599 36.40 200.9 111 17.08 \n", + "845 48.26 192.0 109 16.32 \n", + "1285 36.98 233.7 84 19.86 \n", + "1441 46.73 186.6 118 15.86 \n", + "1596 36.53 198.2 89 16.85 \n", + "1797 24.07 207.9 130 17.67 \n", + "2608 28.61 270.9 103 23.03 \n", + "2716 31.71 178.0 106 15.13 \n", + "2887 35.50 132.9 104 11.30 \n", + "3154 29.53 204.0 114 17.34 \n", + "\n", + " Total night minutes Total night calls Total night charge \\\n", + "71 254.6 61 11.46 \n", + "268 147.4 89 6.63 \n", + "277 214.2 106 9.64 \n", + "599 246.8 126 11.11 \n", + "845 196.3 85 8.83 \n", + "1285 163.9 99 7.38 \n", + "1441 245.0 123 11.03 \n", + "1596 170.8 139 7.69 \n", + "1797 203.6 95 9.16 \n", + "2608 222.5 98 10.01 \n", + "2716 215.6 100 9.70 \n", + "2887 136.7 107 6.15 \n", + "3154 174.6 94 7.86 \n", + "\n", + " Total intl minutes Total intl calls Total intl charge \\\n", + "71 12.1 2 3.27 \n", + "268 4.5 4 1.22 \n", + "277 6.5 2 1.76 \n", + 
"599 9.2 6 2.48 \n", + "845 10.0 4 2.70 \n", + "1285 9.0 3 2.43 \n", + "1441 8.8 2 2.38 \n", + "1596 8.2 5 2.21 \n", + "1797 10.2 11 2.75 \n", + "2608 6.7 2 1.81 \n", + "2716 12.1 4 3.27 \n", + "2887 11.1 4 3.00 \n", + "3154 6.3 3 1.70 \n", + "\n", + " Customer service calls Churn \n", + "71 0 0 \n", + "268 0 0 \n", + "277 2 1 \n", + "599 0 0 \n", + "845 1 0 \n", + "1285 4 0 \n", + "1441 1 0 \n", + "1596 0 0 \n", + "1797 0 0 \n", + "2608 4 0 \n", + "2716 2 0 \n", + "2887 2 0 \n", + "3154 2 0 " + ] + }, + "execution_count": 25, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[ df['Number vmail messages'].map(lambda x: x > 45) ]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "deletable": true, + "editable": true + }, + "source": [ + "The `map` method can also be used to **replace values in a column** by passing it a dictionary of the form `{old_value: new_value}`:" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": { + "collapsed": false, + "deletable": true, + "editable": true + }, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " State Account length Area code International plan Voice mail plan \\\n", + "0 KS 128 415 False Yes \n", + "1 OH 107 415 False Yes \n", + "2 NJ 137 415 False No \n", + "3 OH 84 408 True No \n", + "4 OK 75 415 True No \n", + "\n", + " Number vmail messages Total day minutes Total day calls \\\n", + "0 25 265.1 110 \n", + "1 26 161.6 123 \n", + "2 0 243.4 114 \n", + "3 0 299.4 71 \n", + "4 0 166.7 113 \n", + "\n", + " Total day charge Total eve minutes Total eve calls Total eve charge \\\n", + "0 45.07 197.4 99 16.78 \n", + "1 27.47 195.5 103 16.62 \n", + "2 41.38 121.2 110 10.30 \n", + "3 50.90 61.9 88 5.26 \n", + "4 28.34 148.3 122 12.61 \n", + "\n", + " Total night minutes Total night calls Total night charge \\\n", + "0 244.7 91 11.01 \n", + "1 254.4 103 11.45 \n", + "2 162.6 104 7.32 \n", + "3 196.9 89 8.86 \n", + "4 186.9 121 8.41 \n", + "\n", + " Total intl minutes Total intl calls Total intl charge \\\n", + "0 10.0 3 2.70 \n", + "1 13.7 3 3.70 \n", + "2 12.2 5 3.29 \n", + "3 6.6 7 1.78 \n", + "4 10.1 3 2.73 \n", + "\n", + " Customer service calls Churn \n", + "0 1 0 \n", + "1 1 0 \n", + "2 0 0 \n", + "3 2 0 \n", + "4 3 0 " + ] + }, + "execution_count": 26, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "d = {'No' : False, 'Yes' : True}\n", + "df['International plan'] = df['International plan'].map(d)\n", + "df.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "deletable": true, + "editable": true + }, + "source": [ + "A similar operation can be done with the `replace` method:" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": { + "collapsed": false, + "deletable": true, + "editable": true + }, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " State Account length Area code International plan Voice mail plan \\\n", + "0 KS 128 415 False True \n", + "1 OH 107 415 False True \n", + "2 NJ 137 415 False False \n", + "3 OH 84 408 True False \n", + "4 OK 75 415 True False \n", + "\n", + " Number vmail messages Total day minutes Total day calls \\\n", + "0 25 265.1 110 \n", + "1 26 161.6 123 \n", + "2 0 243.4 114 \n", + "3 0 299.4 71 \n", + "4 0 166.7 113 \n", + "\n", + " Total day charge Total eve minutes Total eve calls Total eve charge \\\n", + "0 45.07 197.4 99 16.78 \n", + "1 27.47 195.5 103 16.62 \n", + "2 41.38 121.2 110 10.30 \n", + "3 50.90 61.9 88 5.26 \n", + "4 28.34 148.3 122 12.61 \n", + "\n", + " Total night minutes Total night calls Total night charge \\\n", + "0 244.7 91 11.01 \n", + "1 254.4 103 11.45 \n", + "2 162.6 104 7.32 \n", + "3 196.9 89 8.86 \n", + "4 186.9 121 8.41 \n", + "\n", + " Total intl minutes Total intl calls Total intl charge \\\n", + "0 10.0 3 2.70 \n", + "1 13.7 3 3.70 \n", + "2 12.2 5 3.29 \n", + "3 6.6 7 1.78 \n", + "4 10.1 3 2.73 \n", + "\n", + " Customer service calls Churn \n", + "0 1 0 \n", + "1 1 0 \n", + "2 0 0 \n", + "3 2 0 \n", + "4 3 0 " + ] + }, + "execution_count": 27, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df = df.replace({'Voice mail plan': d})\n", + "df.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "deletable": true, + "editable": true + }, + "source": [ + "### Grouping data\n", + "\n", + "In general, grouping data in Pandas looks like this:\n", + "\n", + "```\n", + "df.groupby(by=grouping_columns)[columns_to_show].function()\n", + "```\n", + "\n", + "1. The **`groupby`** method is applied to the DataFrame; it splits the data by `grouping_columns`, a feature or a set of features.\n", + "2. We select the columns we need (`columns_to_show`). \n", + "3. A function or several functions are applied to the resulting groups."
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "deletable": true, + "editable": true + }, + "source": [ + "**Grouping the data by the value of the `Churn` feature and displaying statistics for three columns in each group.**" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": { + "collapsed": false, + "deletable": true, + "editable": true + }, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " Total day minutes Total eve minutes Total night minutes\n", + "Churn \n", + "0 count 2850.00 2850.00 2850.00\n", + " mean 175.18 199.04 200.13\n", + " std 50.18 50.29 51.11\n", + " min 0.00 0.00 23.20\n", + " 50% 177.20 199.60 200.25\n", + " max 315.60 361.80 395.00\n", + "1 count 483.00 483.00 483.00\n", + " mean 206.91 212.41 205.23\n", + " std 69.00 51.73 47.13\n", + " min 0.00 70.90 47.40\n", + " 50% 217.60 211.30 204.80\n", + " max 350.80 363.70 354.90" + ] + }, + "execution_count": 28, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "columns_to_show = ['Total day minutes', 'Total eve minutes', 'Total night minutes']\n", + "\n", + "df.groupby(['Churn'])[columns_to_show].describe(percentiles=[])" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "deletable": true, + "editable": true + }, + "source": [ + "Let's do the same thing slightly differently, by passing a list of functions to `agg`:" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": { + "collapsed": false, + "deletable": true, + "editable": true + }, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " Total day minutes Total eve minutes \\\n", + " mean std amin amax mean std amin \n", + "Churn \n", + "0 175.18 50.18 0.0 315.6 199.04 50.29 0.0 \n", + "1 206.91 69.00 0.0 350.8 212.41 51.73 70.9 \n", + "\n", + " Total night minutes \n", + " amax mean std amin amax \n", + "Churn \n", + "0 361.8 200.13 51.11 23.2 395.0 \n", + "1 363.7 205.23 47.13 47.4 354.9 " + ] + }, + "execution_count": 29, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "columns_to_show = ['Total day minutes', 'Total eve minutes', 'Total night minutes']\n", + "\n", + "df.groupby(['Churn'])[columns_to_show].agg([np.mean, np.std, np.min, np.max])" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "deletable": true, + "editable": true + }, + "source": [ + "### Pivot tables" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "deletable": true, + "editable": true + }, + "source": [ + "Suppose we want to see how the observations in our sample are distributed across two features, for example `Churn` and `International plan`. To do this, we can build a **contingency table** using the **`crosstab`** function:" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "metadata": { + "collapsed": false, + "deletable": true, + "editable": true + }, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + "International plan False True \n", + "Churn \n", + "0 2664 186\n", + "1 346 137" + ] + }, + "execution_count": 30, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "pd.crosstab(df['Churn'], df['International plan'])" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": { + "collapsed": false, + "deletable": true, + "editable": true, + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + "Voice mail plan False True \n", + "Churn \n", + "0 0.60 0.25\n", + "1 0.12 0.02" + ] + }, + "execution_count": 31, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "pd.crosstab(df['Churn'], df['Voice mail plan'], normalize=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "deletable": true, + "editable": true + }, + "source": [ + "We can see that most users are loyal and do not use the additional services (international plan / voice mail)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "deletable": true, + "editable": true + }, + "source": [ + "Advanced Excel users will surely recall a feature called **pivot tables**. In Pandas, pivot tables are built with the **`pivot_table`** method, which takes the following parameters:\n", + "\n", + "* `values`: a list of variables for which to compute the requested statistics,\n", + "* `index`: a list of variables by which to group the data,\n", + "* `aggfunc`: what we actually want to compute for each group: sum, mean, maximum, minimum, or something else.\n", + "\n", + "Let's look at the average number of day, evening, and night calls for each Area code:" + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": { + "collapsed": false, + "deletable": true, + "editable": true, + "scrolled": false + }, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " Total day calls Total eve calls Total night calls\n", + "Area code \n", + "408 100.50 99.79 99.04\n", + "415 100.58 100.50 100.40\n", + "510 100.10 99.67 100.60" + ] + }, + "execution_count": 32, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.pivot_table(['Total day calls', 'Total eve calls', 'Total night calls'], ['Area code'], aggfunc='mean').head(10)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "deletable": true, + "editable": true + }, + "source": [ + "### Transforming DataFrames\n", + "\n", + "Like many other things in Pandas, adding columns to a DataFrame can be done in several ways." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "deletable": true, + "editable": true + }, + "source": [ + "For example, suppose we want to compute the total number of calls for every user. Let's create a `total_calls` object of type Series and insert it into the DataFrame:" + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "metadata": { + "collapsed": false, + "deletable": true, + "editable": true + }, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " State Account length Area code International plan Voice mail plan \\\n", + "0 KS 128 415 False True \n", + "1 OH 107 415 False True \n", + "2 NJ 137 415 False False \n", + "3 OH 84 408 True False \n", + "4 OK 75 415 True False \n", + "\n", + " Number vmail messages Total day minutes Total day calls \\\n", + "0 25 265.1 110 \n", + "1 26 161.6 123 \n", + "2 0 243.4 114 \n", + "3 0 299.4 71 \n", + "4 0 166.7 113 \n", + "\n", + " Total day charge Total eve minutes Total eve calls Total eve charge \\\n", + "0 45.07 197.4 99 16.78 \n", + "1 27.47 195.5 103 16.62 \n", + "2 41.38 121.2 110 10.30 \n", + "3 50.90 61.9 88 5.26 \n", + "4 28.34 148.3 122 12.61 \n", + "\n", + " Total night minutes Total night calls Total night charge \\\n", + "0 244.7 91 11.01 \n", + "1 254.4 103 11.45 \n", + "2 162.6 104 7.32 \n", + "3 196.9 89 8.86 \n", + "4 186.9 121 8.41 \n", + "\n", + " Total intl minutes Total intl calls Total intl charge \\\n", + "0 10.0 3 2.70 \n", + "1 13.7 3 3.70 \n", + "2 12.2 5 3.29 \n", + "3 6.6 7 1.78 \n", + "4 10.1 3 2.73 \n", + "\n", + " Customer service calls Churn Total calls \n", + "0 1 0 303 \n", + "1 1 0 332 \n", + "2 0 0 333 \n", + "3 2 0 255 \n", + "4 3 0 359 " + ] + }, + "execution_count": 33, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "total_calls = df['Total day calls'] + df['Total eve calls'] + df['Total night calls'] + df['Total intl calls']\n", + "df.insert(loc=len(df.columns), column='Total calls', value=total_calls) \n", + "# loc - номер столбца, после которого нужно вставить данный Series\n", + "# мы указали len(df.columns), чтобы вставить его в самом конце\n", + "df.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "deletable": true, + "editable": true + }, + "source": [ + "Добавить столбец из имеющихся можно и проще, не создавая промежуточных Series:" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "metadata": { + "collapsed": false, + "deletable": 
true, + "editable": true + }, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " State Account length Area code International plan Voice mail plan \\\n", + "0 KS 128 415 False True \n", + "1 OH 107 415 False True \n", + "2 NJ 137 415 False False \n", + "3 OH 84 408 True False \n", + "4 OK 75 415 True False \n", + "\n", + " Number vmail messages Total day minutes Total day calls \\\n", + "0 25 265.1 110 \n", + "1 26 161.6 123 \n", + "2 0 243.4 114 \n", + "3 0 299.4 71 \n", + "4 0 166.7 113 \n", + "\n", + " Total day charge Total eve minutes Total eve calls Total eve charge \\\n", + "0 45.07 197.4 99 16.78 \n", + "1 27.47 195.5 103 16.62 \n", + "2 41.38 121.2 110 10.30 \n", + "3 50.90 61.9 88 5.26 \n", + "4 28.34 148.3 122 12.61 \n", + "\n", + " Total night minutes Total night calls Total night charge \\\n", + "0 244.7 91 11.01 \n", + "1 254.4 103 11.45 \n", + "2 162.6 104 7.32 \n", + "3 196.9 89 8.86 \n", + "4 186.9 121 8.41 \n", + "\n", + " Total intl minutes Total intl calls Total intl charge \\\n", + "0 10.0 3 2.70 \n", + "1 13.7 3 3.70 \n", + "2 12.2 5 3.29 \n", + "3 6.6 7 1.78 \n", + "4 10.1 3 2.73 \n", + "\n", + " Customer service calls Churn Total calls Total charge \n", + "0 1 0 303 75.56 \n", + "1 1 0 332 59.24 \n", + "2 0 0 333 62.29 \n", + "3 2 0 255 66.80 \n", + "4 3 0 359 52.09 " + ] + }, + "execution_count": 34, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df['Total charge'] = df['Total day charge'] + df['Total eve charge'] + df['Total night charge'] + df['Total intl charge']\n", + "\n", + "df.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "deletable": true, + "editable": true + }, + "source": [ + "Чтобы удалить столбцы или строки, воспользуйтесь методом `drop`, передавая в качестве аргумента нужные индексы и требуемое значение параметра `axis` (`1`, если удаляете столбцы, и ничего или `0`, если удаляете строки):" + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "metadata": { + "collapsed": false, + "deletable": true, + "editable": true, 
+ "scrolled": false + }, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " State Account length Area code International plan Voice mail plan \\\n", + "0 KS 128 415 False True \n", + "3 OH 84 408 True False \n", + "4 OK 75 415 True False \n", + "5 AL 118 510 True False \n", + "6 MA 121 510 False True \n", + "\n", + " Number vmail messages Total day minutes Total day calls \\\n", + "0 25 265.1 110 \n", + "3 0 299.4 71 \n", + "4 0 166.7 113 \n", + "5 0 223.4 98 \n", + "6 24 218.2 88 \n", + "\n", + " Total day charge Total eve minutes Total eve calls Total eve charge \\\n", + "0 45.07 197.4 99 16.78 \n", + "3 50.90 61.9 88 5.26 \n", + "4 28.34 148.3 122 12.61 \n", + "5 37.98 220.6 101 18.75 \n", + "6 37.09 348.5 108 29.62 \n", + "\n", + " Total night minutes Total night calls Total night charge \\\n", + "0 244.7 91 11.01 \n", + "3 196.9 89 8.86 \n", + "4 186.9 121 8.41 \n", + "5 203.9 118 9.18 \n", + "6 212.6 118 9.57 \n", + "\n", + " Total intl minutes Total intl calls Total intl charge \\\n", + "0 10.0 3 2.70 \n", + "3 6.6 7 1.78 \n", + "4 10.1 3 2.73 \n", + "5 6.3 6 1.70 \n", + "6 7.5 7 2.03 \n", + "\n", + " Customer service calls Churn \n", + "0 1 0 \n", + "3 2 0 \n", + "4 3 0 \n", + "5 0 0 \n", + "6 3 0 " + ] + }, + "execution_count": 35, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df = df.drop(['Total charge', 'Total calls'], axis=1) # избавляемся от созданных только что столбцов\n", + "\n", + "df.drop([1, 2]).head() # а вот так можно удалить строчки" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "deletable": true, + "editable": true + }, + "source": [ + "--------\n", + "\n", + "\n", + "\n", + "## Первые попытки прогнозирования оттока\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Посмотрим, как отток связан с признаком *\"Подключение международного роуминга\" (International plan)*. 
We'll do this with a *crosstab* contingency table and with a Seaborn plot (how to build such plots and use them for analysis is the subject of the next article)." + ] + }, + { + "cell_type": "code", + "execution_count": 44, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Populating the interactive namespace from numpy and matplotlib\n" + ] + } + ], + "source": [ + "# !conda install seaborn # seaborn has to be installed separately (a terminal command)\n", + "# render plots inline in the notebook\n", + "%pylab inline \n", + "import seaborn as sns\n", + "figsize(10,8)" + ] + }, + { + "cell_type": "code", + "execution_count": 47, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + "International plan False True All\n", + "Churn \n", + "0 2664 186 2850\n", + "1 346 137 483\n", + "All 3010 323 3333" + ] + }, + "execution_count": 47, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "pd.crosstab(df['Churn'], df['International plan'], margins=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 49, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAmcAAAHfCAYAAAAVw3+UAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAGxpJREFUeJzt3Xu03WV95/HPCUEjmMQwK4qtzFBv3zreBcELaFBbi8sB\na8eltVrxguKo4GhRCqFeBqsY0eGypCoiWOu0iouWpoK4FC0wKio6gpenYL10plWPIZCUlITImT/2\nDh7ShBzJ2Wc/nPN6reVyn2f/zm9/j3/E93p++7f3xNTUVAAA6MOicQ8AAMAviTMAgI6IMwCAjogz\nAICOiDMAgI6IMwCAjiwe9wCzaXJyo88FAQDuFlauXDqxo3U7ZwAAHRFnAAAdEWcAAB0RZwAAHRFn\nAAAdEWcAAB0RZwAAHRFnAAAdmVcfQgsAsCPf+c61+eAH35+tW7fm1ltvzctf/qp89rOX5Mgjfy+P\neMQjxz3eHYgzAGBeu+mmG3PaaadmzZr/mX32+Q+54YZ1OeaYl2X//R847tF2aGJqav5845GvbwIA\ntvd3f3dRfvazn+alLz369rUNGzbkzDPfm02bbs6GDRuyaNEe+dM/fXe++MXLsm7durz4xUfl6qu/\nls997tK86EVH5YQT3pi99947z3veC3LuuR/Mgx70kPzoRz/IU5/6tBx11Cvu0ly+vgkAWJBuuGFd\n9t33/ndYW7ZsWZLkcY97fM488wPZb7/9cvXVX9vpOTZtujlnnfXBHHbYM/Iv//LPecMb3pwPfOC8\nXHTRhbM+r8uaAMC8tnLlfTM5+bM7rF199deybt26VP1mkmTFin2yefPmnZ7jAQ/YL4sWLbr9fNvi\nbsmSJbM+r50zAGBee9KTDslll30u69ffkCT5+c8nc+qpp2TRookkd7yyeI973CM///kg5K67rt2+\nPjGxaNrjHV6NnDV2zgCAeW3ZsuU59tg3ZPXqN2diYiJbtmzJH/3RCbn00kv+3bEHHfSEfOpTn8hr\nX/vKsd0w4IYAAIAxcEMAAMDdgDgDAOiIOAMA6Ig4AwDoiLs1d8Nxay4a9wjcRacff8S4RwCAHbJz\nBgDQETtnAMC8MdtXtXZ1peW2227Laae9K9dff1323HPPnHDCyXnAA/bbrde0cwYAcBddfvkXsmXL\nlnzgAx/JMce8Lmed9b7dPqc4AwC4i771rW/m4IOfmCR5xCMeme9977u7fU5xBgBwF918883Ze+97\n3/7zokWLsnXr1t06pzgDALiL9t5772zatOn2n6emprJ48e69pV+cAQDcRY985KPz5S9fmSS59tpr\n8sAHPni3z+luTQCAu+gpTzksX/3qV3LMMS/L1NRUTjzxLbt9TnEGAMwbc/0h44sWLcrxx584u+ec\n1bMBALBbxBkAQEfEGQBAR8QZAEBHxBkAQEfEGQBAR3yUBgAwbxy/dvWsnm/Ns0+Z0XHf/va1Ofvs\nM3LWWR/c7dcUZwAAu+Ev/uL8fOYzn86SJfealfO5rAkA
sBt+/dcfkHe8Y82snU+cAQDshlWrnr7b\nX3Y+nTgDAOiIOAMA6MhIbgioqj2TnJtk/yT3THJKkn9KsjbJdcPDzm6t/VVVHZ3kVUm2Jjmltba2\nqu6V5GNJ7ptkY5KXtNYmRzErAEBPJqampmb9pFX10iSPbq29vqr2SfLNJG9Psry1dtq04/ZN8tkk\nByZZkuSK4ePXJFnWWntrVb0gyRNba8ft6nUnJzfO/h9zJ45bc9Fcvhyz6PTjjxj3CAAscCtXLp3Y\n0fqoPkrjk0kuGD6eyGBX7IAkVVVHZrB79vokByW5srW2Ocnmqro+yaOSHJLk3cPfvzjJySOaEwCg\nKyOJs9bavyZJVS3NINJWZ3B585zW2ter6qQkb8lgR+2mab+6McnyJMumrW9b26UVK/bK4sV7zMrf\nwPy2cuXScY8AADs0sg+hrar9klyY5P2ttY9X1X1aazcOn74wyZlJ/j7J9P+XXJrkxiQbpq1vW9ul\n9es3zcboLACTkxvHPQIAC9zONgpGcrdmVd0vyaVJ3txaO3e4/JmqOmj4+OlJvp7kqiSHVtWSqlqe\n5GFJrk1yZZJnDY89PMnlo5gTAKA3o9o5OzHJiiQnV9W294u9Icn7qurWJD9J8srW2oaqOiOD+FqU\n5KTW2i1VdXaS86vqiiRbkrxwRHMCAHRlJHdrjou7NZkpd2sCMG47u1vTh9ACAHREnAEAdEScAQB0\nRJwBAHREnAEAdEScAQB0RJwBAHREnAEAdEScAQB0RJwBAHREnAEAdEScAQB0RJwBAHREnAEAdESc\nAQB0RJwBAHREnAEAdEScAQB0RJwBAHREnAEAdEScAQB0RJwBAHREnAEAdEScAQB0RJwBAHREnAEA\ndEScAQB0RJwBAHREnAEAdEScAQB0RJwBAHREnAEAdEScAQB0RJwBAHREnAEAdEScAQB0RJwBAHRE\nnAEAdEScAQB0RJwBAHREnAEAdEScAQB0RJwBAHREnAEAdEScAQB0RJwBAHREnAEAdEScAQB0RJwB\nAHREnAEAdEScAQB0RJwBAHREnAEAdEScAQB0RJwBAHREnAEAdEScAQB0RJwBAHREnAEAdEScAQB0\nRJwBAHREnAEAdEScAQB0RJwBAHREnAEAdEScAQB0RJwBAHREnAEAdEScAQB0RJwBAHRk8ShOWlV7\nJjk3yf5J7pnklCTfSXJekqkk1yZ5TWvttqo6OsmrkmxNckprbW1V3SvJx5LcN8nGJC9prU2OYlYA\ngJ6MaufsRUnWtdYOTfI7Sc5K8t4kq4drE0mOrKp9kxyb5MlJnpnknVV1zySvTnLN8NiPJlk9ojkB\nALoyqjj7ZJKTh48nMtgVOyDJF4drFyd5RpKDklzZWtvcWrspyfVJHpXkkCSXbHcsAMC8N5LLmq21\nf02Sqlqa5IIMdr7e01qbGh6yMcnyJMuS3DTtV3e0vm1tl1as2CuLF++x2/Mz/61cuXTcIwDADo0k\nzpKkqvZLcmGS97fWPl5V75729NIkNybZMHx8Z+vb1nZp/fpNuzs2C8Tk5MZxjwDAArezjYKRXNas\nqvsluTTJm1tr5w6Xv1FVq4aPD09yeZKrkhxaVUuqanmSh2Vws8CVSZ613bEAAPPeqHbOTkyyIsnJ\nVbXtvWfHJTmjqu6R5LtJLmit/aKqzsggvhYlOam1dktVnZ3k/Kq6IsmWJC8c0ZwAAF2ZmJqa2vVR\ndxOTkxvn9I85bs1Fc/lyzKLTjz9i3CMAsMCtXLl0YkfrPoQWAKAj4gwAoCPiDACgI+IMAKAj4gwA\noCPiDACgI+IMAKAj4gwAoCPiDACgI+IMAKAj4gwAoCPiDACgI+IMAKAj4gwAoCPiDACgI+IMAKAj\n4gwAoCPiDACgI+IMAKAj4gwAoCPiDACgI+IMAKAj4gwAoCPiDACgI+IMAKAj4gwAoCPiDACgI+IM\nAKAj4gwAoCPiDACg
I+IMAKAj4gwAoCPiDACgI+IMAKAj4gwAoCPiDACgI+IMAKAj4gwAoCPiDACg\nI+IMAKAj4gwAoCPiDACgI+IMAKAj4gwAoCPiDACgI+IMAKAj4gwAoCPiDACgI+IMAKAj4gwAoCPi\nDACgI+IMAKAj4gwAoCPiDACgI+IMAKAj4gwAoCPiDACgI+IMAKAj4gwAoCPiDACgI+IMAKAj4gwA\noCPiDACgI+IMAKAj4gwAoCPiDACgI+IMAKAj4gwAoCPiDACgI4tHefKqOjjJqa21VVX12CRrk1w3\nfPrs1tpfVdXRSV6VZGuSU1pra6vqXkk+luS+STYmeUlrbXKUswIA9GBkcVZVb0ry4iQ3D5cOSPLe\n1tpp047ZN8mxSQ5MsiTJFVX12SSvTnJNa+2tVfWCJKuTHDeqWQEAejHKnbPvJ3lukj8f/nxAkqqq\nIzPYPXt9koOSXNla25xkc1Vdn+RRSQ5J8u7h712c5OQRzgkA0I2RveestfapJLdOW7oqyfGttack\n+cckb0myLMlN047ZmGT5duvb1gAA5r2RvudsOxe21m7c9jjJmUn+PsnSaccsTXJjkg3T1ret7dKK\nFXtl8eI9Zmda5rWVK5fu+iAAGIO5jLPPVNXrWmtXJXl6kq9nsJv2jqpakuSeSR6W5NokVyZ51vD5\nw5NcPpMXWL9+0yjmZh6anNw47hEAWOB2tlEwl3H26iRnVtWtSX6S5JWttQ1VdUYG8bUoyUmttVuq\n6uwk51fVFUm2JHnhHM4JADA2E1NTU+OeYdZMTm6c0z/muDUXzeXLMYtOP/6IcY8AwAK3cuXSiR2t\nz+iGgKo6cwdr5+/uUAAA3NGdXtasqnOSPDDJgVX18GlP7Rl3UAIAzLpdvefslCT7Jzk9ydumrW9N\n8t0RzQQAsGDdaZy11n6Y5IdJHl1VyzLYLdt2ffTeSW4Y5XAAAAvNjO7WrKo/TvLHSdZNW57K4JIn\nAACzZKYfpfGKJA/y5eMAAKM1069v+nFcwgQAGLmZ7pxdl+SKqrosyS3bFltrbx/JVAAAC9RM4+z/\nDf+T/PKGAAAAZtmM4qy19rZdHwUAwO6a6d2at2Vwd+Z0/9xa22/2RwIAWLhmunN2+40DVbVnkuck\neeKohgIAWKhmerfm7Vprt7bWPpnkaSOYBwBgQZvpZc0/nPbjRJKHJ9kykokAABawmd6tedi0x1NJ\nfp7k+bM/DgDAwjbT95y9dPhesxr+zrWtta0jnQwAYAGa0XvOquqADD6I9vwkH0ny46o6eJSDAQAs\nRDO9rHlGkue31r6SJFX1hCRnJjloVIMBACxEM71b897bwixJWmtfTrJkNCMBACxcM42zG6rqyG0/\nVNVzkqwbzUgAAAvXTC9rvjLJ2qr6cAYfpTGV5EkjmwoAYIGa6c7Z4Uk2JflPGXysxmSSVSOaCQBg\nwZppnL0yyZNbaze31r6V5IAkrxvdWAAAC9NM42zP3PEbAbbk338ROgAAu2mm7zn76ySfr6pPDH9+\nbpK/Gc1IAAAL14x2zlprb87gs84qyQOTnNFaO3mUgwEALEQz3TlLa+2CJBeMcBYAgAVvpu85AwBg\nDogzAICOiDMAgI6IMwCAjogzAICOiDMAgI6IMwCAjogzAICOiDMAgI6IMwCAjogzAICOiDMAgI6I\nMwCAjogzAICOiDMAgI6IMwCAjogzAICOiDMAgI6IMwCAjogzAICOiDMAgI6IMwCAjogzAICOiDMA\ngI6IMwCAjogzAICOiDMAgI6IMwCAjogzAICOiDMAgI6IMwCAjogzAICOiDMAgI6IMwCAjogzAICO\niDMAgI6IMwCAjogzAICOiDMAgI6IMwCAjogzAICOiDMAgI6IMwCAjiwe5cmr6uAkp7bWVlXVg5Oc\nl2QqybVJXtNau62qjk7yqiRbk5zSWltbVfdK8rEk902yMclLWmuTo5wVAKAHI9s5q6
o3JTknyZLh\n0nuTrG6tHZpkIsmRVbVvkmOTPDnJM5O8s6rumeTVSa4ZHvvRJKtHNScAQE9GeVnz+0meO+3nA5J8\ncfj44iTPSHJQkitba5tbazcluT7Jo5IckuSS7Y4FAJj3RnZZs7X2qaraf9rSRGttavh4Y5LlSZYl\nuWnaMTta37a2SytW7JXFi/fYnbFZIFauXDruEQBgh0b6nrPt3Dbt8dIkNybZMHx8Z+vb1nZp/fpN\nuz8lC8Lk5MZxjwDAArezjYK5vFvzG1W1avj48CSXJ7kqyaFVtaSqlid5WAY3C1yZ5FnbHQsAMO/N\nZZy9McnbqupLSe6R5ILW2k+SnJFBfH0+yUmttVuSnJ3k4VV1RZJXJnnbHM4JADA2E1NTU7s+6m5i\ncnLjnP4xx625aC5fjll0+vFHjHsEABa4lSuXTuxo3YfQAgB0RJwBAHREnAEAdEScAQB0RJwBAHRE\nnAEAdEScAQB0RJwBAHREnAEAdEScAQB0RJwBAHREnAEAdEScAQB0RJwBAHREnAEAdEScAQB0RJwB\nAHREnAEAdEScAQB0RJwBAHREnAEAdEScAQB0RJwBAHREnAEAdEScAQB0RJwBAHREnAEAdEScAQB0\nRJwBAHREnAEAdEScAQB0RJwBAHREnAEAdEScAQB0RJwBAHREnAEAdEScAQB0RJwBAHREnAEAdESc\nAQB0RJwBAHREnAEAdEScAQB0RJwBAHREnAEAdEScAQB0RJwBAHREnAEAdEScAQB0RJwBAHREnAEA\ndEScAQB0RJwBAHREnAEAdEScAQB0RJwBAHREnAEAdEScAQB0RJwBAHREnAEAdEScAQB0RJwBAHRE\nnAEAdEScAQB0RJwBAHREnAEAdEScAQB0RJwBAHREnAEAdEScAQB0ZPFcv2BVXZ1kw/DHHyR5R5Lz\nkkwluTbJa1prt1XV0UlelWRrklNaa2vnelYAgLk2p3FWVUuSTLTWVk1buyjJ6tbaF6rqz5IcWVVf\nSnJskgOTLElyRVV9trW2eS7nBQCYa3O9c/boJHtV1aXD1z4xyQFJvjh8/uIkv53kF0muHMbY5qq6\nPsmjknx1jucFAJhTcx1nm5K8J8k5SR6SQYxNtNamhs9vTLI8ybIkN037vW3rd2rFir2yePEeszow\n89PKlUvHPQIA7NBcx9k/JLl+GGP/UFXrMtg522ZpkhszeE/a0h2s36n16zfN4qjMZ5OTG8c9AgAL\n3M42Cub6bs2XJTktSarq1zLYIbu0qlYNnz88yeVJrkpyaFUtqarlSR6Wwc0CAADz2lzvnH04yXlV\ndUUGd2e+LMnPk3yoqu6R5LtJLmit/aKqzsgg1BYlOam1dssczwoAMOfmNM5aa1uSvHAHTz11B8d+\nKMmHRj4UAEBHfAgtAEBHxBkAQEfEGQBAR8QZAEBHxBkAQEfEGQBAR8QZAEBHxBkAQEfEGQBAR8QZ\nAEBHxBkAQEfEGQBAR8QZAEBHxBkAQEfEGQBAR8QZAEBHxBkAQEfEGQBAR8QZAEBHFo97ABiH49eu\nHvcI7IY1zz5l3CMAjIydMwCAjogzAICOiDMAgI6IMwCAjogzAICOiDMAgI6IMwCAjogzAICOiDMA\ngI6IMwCAjogzAICOiDMAgI6IMwCAjogzAICOiDMAgI6IMwCAjogzAICOiDMAgI6IMwCAjogzAICO\niDMAgI6IMwCAjogzAICOiDMAgI4sHvcAANzRcWsuGvcI3EWnH3/EuEdgHrBzBgDQEXEGANARcQYA\n0BFxBgDQEXEGANARcQYA0BEfpQEAs+T4tavHPQK7Yc2zTxn3CEnsnAEAdEWcAQB0RJwBAHREnAEA\ndEScAQB0RJwBAHREnAEAdEScAQB0RJwBAHREnAEAdEScAQB0RJwBAHREnAEAdEScAQB0RJwBAHRE\nnAEAdEScAQB0RJwBAHRk8bgH2JmqWpTk/UkenW
Rzkle01q4f71QAAKPV887Zc5Isaa09MckJSU4b\n8zwAACPXc5wdkuSSJGmtfTnJgeMdBwBg9CampqbGPcMOVdU5ST7VWrt4+POPkzywtbZ1vJMBAIxO\nzztnG5IsnfbzImEGAMx3PcfZlUmelSRV9YQk14x3HACA0ev2bs0kFyb5rar630kmkrx0zPMAAIxc\nt+85AwBYiHq+rAkAsOCIMwCAjvT8njP4lVTV/km+leTqacufb629fQfHnpfkL1trl8zNdMDdWVWd\nluSAJPsm2SvJPyaZbK09b6yDMS+JM+ab77TWVo17CGB+aa29MUmq6qgkv9laO2G8EzGfiTPmtara\nI8kHkuyX5P5JLmqtrZ72/EOTfCTJ1gwu87+wtfZPVfXOJIcm2SPJe1trn5zz4YGuVdWqJKcm2ZLk\ng0n+RwbhdktVvSvJ91pr5/n3hF+V95wx3/znqvrCtv8keUKSL7fWnpnkoCTHbHf8byW5Kskzkrwl\nyfKqOjzJb7TWDklyWJKTquo+c/YXAHcnS1prh7bW/nxHT/r3hLvCzhnzzR0ua1bVsiR/WFWHZfCt\nE/fc7vgPJ3lzBt/jelOSE5M8MskBw7hLkj2T7J/km6McHLhbajtZnxj+t39P+JXZOWO+OyrJja21\nP0hyWpK9qmpi2vNHJrm8tfb0JJ/MINS+l+SyYeQ9Lcknknx/LocG7jZum/b4liT3H/4b85jhmn9P\n+JXZOWO++1ySj1fVE5NsTnJdkl+b9vzXkpxfVaszeD/If0/yjSSrquryJPdOcmFrbePcjg3cDb07\nyaeT/DDJ+uHa38a/J/yKfEMAAEBHXNYEAOiIOAMA6Ig4AwDoiDgDAOiIOAMA6Ig4A8amqnZ5u3hV\nXTYHc7ytqg4dPj6nqg6c5fP/sKr2n+Gxq6Z9YCmwAPmcM6B3q+bgNZ6a5LIkaa29Yg5eD2CnxBkw\ndsMvkD4xyaYkD0tyTZIXJnnP8PmvtNYOrqrfSfL2DL4C5wdJjm6trauqHyb5Sgafyv7iDL7s/tok\nj03y0yTPa63dUFWvHT6/dwaf7P78JI9PcmCSc6rqd5OcmeStrbUvVNWJSV6U5BdJLk3ypiT7Jblw\npudvrX13J3/zUUmem2SfJPfL4MNK37jdMU9N8o4keyVZkeRNrbVPVtV5GXzd2AFJHpDkba21j8z4\nf3Cgay5rAr14UpLXZhBn/zHJM1trxybJMMxWJnnXcP2xST6T5NRpv39xa62S/CzJo5O8t7X2iCQ3\nJvmD4fesPifJquH6Xyf5b621j2bwTRGvaK1ds+1kVfWsJEdkEECPTfLgJMcMn57x+XfxNz8+ye8l\neXiSJyT53e2ef91wrscleXmSP5n23H5JDk3yXzKMWGB+EGdAL65trf3f1tptSb6bwY7SdAdnEG2X\nVdU3Mwi5h0x7/ivTHv+stfaNbedNsk9rbUMGu3EvqKp3ZhA1976TeZ6W5H+11v6ttbY1yblJnj6L\n50+Si1prP22tbUnyl8PXnO5FSR5RVSdnsKs2/XyXttamtr3+Ll4HuBsRZ0Avbpn2eCrJxHbP75Hk\nitbaY1prj8lg1+m/Tnv+3+7sXFW1X5IvJblPkouTnLeD15hu+38fJ/LLt4LMxvmTZOt2r7d1u+cv\nT3JQkq9ncHlz+vluSZJhoAHziDgDeveLqlqcwc7YE6vqocP1k5Os+RXO8/gk17fW3jc81+EZBF8y\niKLt34P7+SS/X1X3Gr7+SzO8aeAunH9nDq+q5VW1JMnvZxB1SZKq2ifJQ5P8SWvt00l+ewbnA+YB\ncQb07m+S/J8M3tv1siSfqKprkjwu272BfhcuTbKoqr6T5MtJfpjkN4bPXZLkz6rqSdsObq2tTbI2\ng/ejfTvJjzK4WeCunH9nfpbk0xn8fX/bWvvMtNe/Ick5Sb5dVd9Ict8ke1XV3jP5Y4G7r4mpKTvi\nAHNteLfmqt
baUWMeBeiMnTMAgI7YOQMA6IidMwCAjogzAICOiDMAgI6IMwCAjogzAICOiDMAgI78\nf3yflEJqbQ1yAAAAAElFTkSuQmCC\n", + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "sns.countplot(x='International plan', hue='Churn', data=df);\n", + "savefig('int_plan_and_churn.png', dpi=300);" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Видим, что когда роуминг подключен, доля оттока намного выше – интересное наблюдение! Возможно, большие и плохо контролируемые траты в роуминге очень конфликтогенны и приводят к недовольству клиентов телеком-оператора и, соответственно, к их оттоку. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Далее посмотрим на еще один важный признак – *\"Число обращений в сервисный центр\" (Customer service calls)*. Также построим сводную таблицу и картинку." + ] + }, + { + "cell_type": "code", + "execution_count": 48, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + "Customer service calls 0 1 2 3 4 5 6 7 8 9 All\n", + "Churn \n", + "0 605 1059 672 385 90 26 8 4 1 0 2850\n", + "1 92 122 87 44 76 40 14 5 1 2 483\n", + "All 697 1181 759 429 166 66 22 9 2 2 3333" + ] + }, + "execution_count": 48, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "pd.crosstab(df['Churn'], df['Customer service calls'], margins=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 50, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAmcAAAHfCAYAAAAVw3+UAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAHw9JREFUeJzt3Xu053Vd7/HXHgaYwIGwxkjlaFh9NBUvGCgIjB48iMsD\n1ql0dbTARDmhkBdSETIMNMJLXI4JKIJHa5V4KMTwklxCUlGxBC9voTTPSspxuI0Sl4F9/vj+BrY4\nDBvYv9/vM3s/HmvNWnv/Lt/f+zezZs9zPt/v7/udmZ2dDQAAfVg27QEAALibOAMA6Ig4AwDoiDgD\nAOiIOAMA6Ig4AwDoyPJpD7CQ1qxZ57wgAMBmYdWqlTMbu93KGQBAR8QZAEBHxBkAQEfEGQBAR8QZ\nAEBHxBkAQEfEGQBAR8QZAEBHFtVJaB+or33tqpx++ruzfv363H777fmd33lFPvWpj+fAA/9HnvCE\nJ057PABgCVnycXbjjTfkHe84ISee+Kd56EN/KtddtzaHHvrSPPrRO097NABgCZqZnV08Vzx6IJdv\n+tjHzsv3vvcfOfjgQ+667aabbsopp7wzN9/8w9x0001ZtmyLvPWtf5JLLrkoa9euzUteclCuuOKL\n+fSnP5kXv/igvOENr822226bX//1F+XMM0/PYx7zC/nXf/1W9tnn2TnooJct6HsEABYHl2+6F9dd\ntzY77vizP3LbdtttlyR56lN/Oaecclp22mmnXHHFF+91Gzff/MOceurpedaz9s211343r3nN63Pa\naWflvPPOHevsAMDis+R3a65a9bCsWfO9H7ntiiu+mLVr16a1xyZJdtjhobn11lvvdRuPfOROWbZs\n2V3b2xB3K1asGNPUAMBiteRXzvbY45m56KJP5/rrr0uSfP/7a3LCCcdl2bKZJD+62rjVVlvl+98f\nQu7qq+uu22dmls35eqMrlAAA87LkV8622277HH74a3L00a/PzMxMbrvttrzudW/IJz/58R977G67\nPT0f+chf5ZWvfLkPDAAAY7HkPxAAADANPhAAALAZEGcAAB0RZwAAHRFnAAAdWfKf1tzcHXHieWPZ\n7klHHjCW7QIAm2blDACgI0tu5WyhV5rua4XpzjvvzDve8ce55pqrs+WWW+YNbzgmj3zkTgs6AwCw\neFg5G7NLL704t912W0477f059NBX5dRT3zXtkQCAjomzMfvKV/4xu+/+jCTJE57wxHzjG1+f8kQA\nQM/E2Zj98Ic/zLbbPuSu75ctW5b169dPcSIAoGfibMy23Xbb3HzzzXd9Pzs7m+XLl9yhfgDAPImz\nMXviE5+Uz33usiTJVVddmZ13/vkpTwQA9MwSzpjtvfez8oUvfD6HHvrSzM7O5qij3jztkQCAji25\nOJv0yVWXLVuWI488aqKvCQBsvuzWBADoiDgDAOi
IOAMA6Ig4AwDoiDgDAOiIOAMA6MiSO5XGkecf\nvaDbO/H5x83rcV/96lX5sz87OaeeevqCvj4AsLgsuTibhg996Ox84hN/mxUrfmLaowAAnbNbcwIe\n8YhH5vjjT5z2GADAZmCsK2ettd2TnFBVq1trP5/krCSzSa5KclhV3dlaOyTJK5KsT3JcVZ3fWvuJ\nJB9M8rAk65L8dlWtGees47R69X/Ntdd+d9pjAACbgbGtnLXWfj/Je5OsGN30ziRHV9VeSWaSHNha\n2zHJ4Un2TLJfkre11rZO8r+SXDl67AeSLOyBYgAAnRrnbs1/TvKrc77fNcklo68vSLJvkt2SXFZV\nt1bVjUmuSbJLkmcm+fg9HgsAsOiNLc6q6iNJbp9z00xVzY6+Xpdk+yTbJblxzmM2dvuG2wAAFr1J\nflrzzjlfr0xyQ5KbRl9v6vYNt92nHXbYJsuXb7HJx5x18EnzHHdhrVrVcu65H5nKaz8Qq1atvO8H\nAQALbpJx9uXW2uqqujjJ/kkuSnJ5kuNbayuSbJ3kcRk+LHBZkueN7t8/yaXzeYHrr795DGMvTWvW\nrJv2CACwqN3bQsgkT6Xx2iTHttY+m2SrJOdU1b8nOTlDfF2Y5E1VdUuSP0vy+NbaZ5K8PMmxE5wT\nAGBqZmZnZ+/7UZuJNWvWLZ43M09HnHjeWLZ70pEHjGW7AMBg1aqVMxu73UloAQA6Is4AADoizgAA\nOiLOAAA6Is4AADoizgAAOiLOAAA6Is4AADoizgAAOiLOAAA6Is4AADoizgAAOiLOAAA6Is4AADoi\nzgAAOiLOAAA6Is4AADoizgAAOiLOAAA6Is4AADoizgAAOiLOAAA6Is4AADoizgAAOiLOAAA6Is4A\nADoizgAAOiLOAAA6Is4AADoizgAAOiLOAAA6Is4AADoizgAAOiLOAAA6Is4AADoizgAAOiLOAAA6\nIs4AADoizgAAOiLOAAA6Is4AADoizgAAOiLOAAA6Is4AADoizgAAOiLOAAA6Is4AADoizgAAOiLO\nAAA6Is4AADoizgAAOiLOAAA6Is4AADoizgAAOiLOAAA6Is4AADoizgAAOiLOAAA6Is4AADoizgAA\nOiLOAAA6Is4AADoizgAAOiLOAAA6Is4AADoizgAAOiLOAAA6Is4AADoizgAAOiLOAAA6Is4AADqy\nfJIv1lrbMsnZSR6d5I4khyRZn+SsJLNJrkpyWFXd2Vo7JMkrRvcfV1XnT3JWAIBpmPTK2fOSLK+q\nPZK8JcnxSd6Z5Oiq2ivJTJIDW2s7Jjk8yZ5J9kvyttba1hOeFQBg4iYdZ99Msry1tizJdkluT7Jr\nkktG91+QZN8kuyW5rKpuraobk1yTZJcJzwoAMHET3a2Z5AcZdml+I8lPJ3l+kr2ranZ0/7ok22cI\ntxvnPG/D7Zu0ww7bZPnyLRZy3iVr1aqV0x4BAJakScfZq5N8oqre2FrbKcmFSbaac//KJDckuWn0\n9T1v36Trr795AUdd2tasWTftEQBgUbu3hZBJ79a8PneviF2XZMskX26trR7dtn+SS5NcnmSv1tqK\n1tr2SR6X4cMCAACL2qRXzt6V5MzW2qUZVsyOSvLFJGe01rZK8vUk51TVHa21kzOE2rIkb6qqWyY8\nKwDAxE00zqrqB0l+YyN37bORx56R5IyxDwUA0BEnoQUA6Ig4AwDoiDgDAOiIOAMA6Ig4AwDoiDgD\nAOiIOAMA6Ig4AwDoiDgDAOiIOAMA6Ig4AwDoiDgDAOiIOAMA6Ig4AwDoiDgDAOiIOAMA6Ig4AwDo\niDgDAOiIOAMA6Ig4AwDoiDgDAOiIOAMA6Ig4AwDoiDgDAOiIOAMA6Ig4AwDoiDgDAOiIOAMA6Ig4\nAwDoiDgDAOiIOAMA6Ig4AwDoiDgDAOiIOAMA6Ig4AwDoiDgDAOiIOAMA6Ig4AwDoiDgDAOiIOAMA\n6Ig4AwDoiDg
DAOiIOAMA6Ig4AwDoiDgDAOjI8mkPAJtyxInnjWW7Jx15wFi2CwAPlpUzAICOiDMA\ngI6IMwCAjogzAICOiDMAgI6IMwCAjogzAICOiDMAgI6IMwCAjogzAICOiDMAgI6IMwCAjogzAICO\niDMAgI6IMwCAjogzAICOiDMAgI6IMwCAjogzAICOiDMAgI6IMwCAjogzAICOiDMAgI6IMwCAjiyf\n9Au21t6Y5IAkWyV5d5JLkpyVZDbJVUkOq6o7W2uHJHlFkvVJjquq8yc9KwDApE00zlprq5PskWTP\nJNskeV2SdyY5uqoubq29J8mBrbXPJjk8ydOSrEjymdbap6rq1vv7mkeceN5Cjf9jTjrygLFtGwBY\nmia9W3O/JFcmOTfJR5Ocn2TXDKtnSXJBkn2T7Jbksqq6tapuTHJNkl0mPCsAwMRNerfmTyd5VJLn\nJ/m5JOclWVZVs6P71yXZPsl2SW6c87wNt2/SDjtsk+XLt1jQgTdl1aqVE3utSVvM7y1Z/O8PgM3X\npONsbZJvVNVtSaq1dkuSnebcvzLJDUluGn19z9s36frrb17AUe/bmjXrJvp6k7SY31uy+N8fAP27\nt4WCSe/W/EyS57bWZlprD0+ybZJPj45FS5L9k1ya5PIke7XWVrTWtk/yuAwfFgAAWNQmunJWVee3\n1vbOEF/LkhyW5FtJzmitbZXk60nOqao7WmsnZwi1ZUneVFW3THJWAIBpmPipNKrq9zdy8z4bedwZ\nSc4Y/0QAAP1wEloAgI7MK85aa6ds5LazF34cAIClbZO7NVtr702yc5KntdYeP+euLTOPU1sAAHD/\n3NcxZ8cleXSSk5IcO+f29RkO3gcAYAFtMs6q6ttJvp3kSa217TKsls2M7n5IkuvGORwAwFIzr09r\nji5W/sYMJ5HdYDbDLk8AABbIfE+l8bIkj6mqNeMcBgBgqZvvqTS+E7swAQDGbr4rZ1cn+Uxr7aIk\nd52pv6reMpapAACWqPnG2b+NfiV3fyAAAIAFNq84q6pj7/tRAAA8WPP9tOadGT6dOdd3q2qnhR8J\nAGDpmu/K2V0fHGitbZnkBUmeMa6hAACWqvt94fOqur2qPpzk2WOYBwBgSZvvbs3fmvPtTJLHJ7lt\nLBMBACxh8/205rPmfD2b5PtJXrjw4wAALG3zPebs4NGxZm30nKuqav1YJwMAWILmdcxZa23XDCei\nPTvJ+5N8p7W2+zgHAwBYiua7W/PkJC+sqs8nSWvt6UlOSbLbuAYDAFiK5vtpzYdsCLMkqarPJVkx\nnpEAAJau+cbZda21Azd801p7QZK14xkJAGDpmu9uzZcnOb+19r4Mp9KYTbLH2KYCAFii5rtytn+S\nm5M8KsNpNdYkWT2mmQAAlqz5xtnLk+xZVT+sqq8k2TXJq8Y3FgDA0jTfONsyP3pFgNvy4xdCBwDg\nQZrvMWd/neTC1tpfjb7/1SR/M56RAACWrnmtnFXV6zOc66wl2TnJyVV1zDgHAwBYiua7cpaqOifJ\nOWOcBQBgyZvvMWcAAEyAOAMA6Ig4AwDoiDgDAOiIOAMA6Ig4AwDoiDgDAOiIOAMA6Ig4AwDoiDgD\nAOiIOAMA6Ig4AwDoiDgDAOiIOAMA6Ig4AwDoiDgDAOiIOAMA6Ig4AwDoiDgDAOiIOAMA6Ig4AwDo\niDgDAOiIOAMA6Ig4AwDoiDgDAOiIOAMA6Ig4AwDoiDgDAOiIOAMA6Ig4AwDoiDgDAOiIOAMA6Ig4\nAwDoiDgDAOiIOAMA6Ig4AwDoiDgDAOiIOAMA6Ig4AwDoiDgDAOiIOAMA6Ig4AwDoyPJpvGhr7WFJ\nvpTkOUnWJzkryWySq5IcVlV3ttYOSfKK0f3HVdX505gVAGCSJr5y1lrbMslpSf5zdNM7kxxdVXsl\nmUlyYGttxySHJ9kzyX5J3tZa23rSswIATNo0dmu+Pcl7knx39P2uSS4ZfX1Bk
n2T7Jbksqq6tapu\nTHJNkl0mPSgAwKRNdLdma+2gJGuq6hOttTeObp6pqtnR1+uSbJ9kuyQ3znnqhts3aYcdtsny5Vss\n4MSbtmrVyom91qQt5veWLP73B8Dma9LHnL00yWxrbd8kT07ygSQPm3P/yiQ3JLlp9PU9b9+k66+/\neeEmnYc1a9ZN9PUmaTG/t2Txvz8A+ndvCwUTjbOq2nvD1621i5McmuTE1trqqro4yf5JLkpyeZLj\nW2srkmyd5HEZPiwAALCoTeXTmvfw2iRntNa2SvL1JOdU1R2ttZOTXJrhuLg3VdUt0xwSAGASphZn\nVbV6zrf7bOT+M5KcMbGBAAA64CS0AAAd6WG3JixZR5x43li2e9KRB4xluwCMn5UzAICOiDMAgI6I\nMwCAjogzAICOiDMAgI6IMwCAjogzAICOiDMAgI6IMwCAjogzAICOiDMAgI6IMwCAjogzAICOiDMA\ngI6IMwCAjogzAICOiDMAgI6IMwCAjogzAICOiDMAgI6IMwCAjogzAICOiDMAgI6IMwCAjogzAICO\niDMAgI6IMwCAjogzAICOiDMAgI6IMwCAjogzAICOiDMAgI6IMwCAjogzAICOiDMAgI6IMwCAjogz\nAICOiDMAgI6IMwCAjogzAICOiDMAgI6IMwCAjogzAICOiDMAgI6IMwCAjogzAICOiDMAgI6IMwCA\njogzAICOiDMAgI6IMwCAjogzAICOiDMAgI6IMwCAjogzAICOiDMAgI6IMwCAjogzAICOiDMAgI6I\nMwCAjogzAICOiDMAgI6IMwCAjogzAICOiDMAgI6IMwCAjogzAICOiDMAgI4sn+SLtda2THJmkkcn\n2TrJcUm+luSsJLNJrkpyWFXd2Vo7JMkrkqxPclxVnT/JWQEApmHSK2cvTrK2qvZK8twkpyZ5Z5Kj\nR7fNJDmwtbZjksOT7JlkvyRva61tPeFZAQAmbqIrZ0k+nOSc0dczGVbFdk1yyei2C5L8tyR3JLms\nqm5Ncmtr7ZokuyT5wmTHBQCYrInGWVX9IElaayszRNrRSd5eVbOjh6xLsn2S7ZLcOOepG24HAFjU\nJr1yltbaTknOTfLuqvrz1tqfzLl7ZZIbktw0+vqet2/SDjtsk+XLt1jIcTdp1aqV9/2gzdRifm+J\n9wdAvyb9gYCfSfLJJK+sqk+Pbv5ya211VV2cZP8kFyW5PMnxrbUVGT448LgMHxbYpOuvv3ksc9+b\nNWvWTfT1Jmkxv7fE+wNg+u7tP9KTXjk7KskOSY5prR0zuu2IJCe31rZK8vUk51TVHa21k5NcmuFD\nC2+qqlsmPCsAwMRN+pizIzLE2D3ts5HHnpHkjLEPBQDQESehBQDoiDgDAOiIOAMA6Ig4AwDoiDgD\nAOiIOAMA6Ig4AwDoiDgDAOiIOAMA6Ig4AwDoiDgDAOiIOAMA6Ig4AwDoiDgDAOiIOAMA6Ig4AwDo\niDgDAOiIOAMA6Ig4AwDoiDgDAOiIOAMA6Ig4AwDoiDgDAOiIOAMA6Ig4AwDoiDgDAOjI8mkPQJ+O\nPP/osW37xOcfN7ZtA8DmzsoZAEBHxBkAQEfEGQBAR8QZAEBHxBkAQEfEGQBAR8QZAEBHxBkAQEfE\nGQBAR8QZAEBHxBkAQEdcW/NBGNf1J117EgCWLnEGjMURJ543tm2fdOQBY9s2wLSJM5akca16JlY+\nAXhwHHMGANARcQYA0BFxBgDQEcecAZsdn5QGFjMrZwAAHRFnAAAdEWcAAB0RZwAAHRFnAAAdEWcA\nAB0RZwAAHRFnAAAdEWcAAB0RZwAAHRFnAAAdEWcAAB1x4XNYhFwYHGDzJc4AOiOuYWmzWxMAoCPi\nDACgI+IMAKAj4gwAoCPiDACgIz6tCfAAHHHieWPb9laPG9umgc2AlTMAgI6IMwCAjogzAICOOOYM\ngIkZ19UPEldAYPGwcgYA0BFxBgDQkW53a
7bWliV5d5InJbk1ycuq6prpTgWwNIzrVCFOEwL3rds4\nS/KCJCuq6hmttacneUeSA6c8EwCLwPji8/KxbDdxTN1S0nOcPTPJx5Okqj7XWnvalOcBgM3C5haf\nvYTnuD6wcn/f38zs7OxYBnmwWmvvTfKRqrpg9P13kuxcVeunOxkAwPj0/IGAm5KsnPP9MmEGACx2\nPcfZZUmelySjY86unO44AADj1/MxZ+cmeU5r7R+SzCQ5eMrzAACMXbfHnAEALEU979YEAFhyxBkA\nQEd6PuasS0vlygWttd2TnFBVq6c9y0JqrW2Z5Mwkj06ydZLjqmo8JwSagtbaFknOSNKSzCY5tKqu\nmu5UC6u19rAkX0rynKr6xrTnWUittSsyfFI9Sb5VVYvqWNvW2huTHJBkqyTvrqr3TXmkBdNaOyjJ\nQaNvVyR5cpIdq+qGac20kEY/O8/O8LPzjiSHLJa/f621rZO8P8nOGf7+HVZVV09zJitn999dVy5I\n8oYMVy5YVFprv5/kvRl+wCw2L06ytqr2SvLcJKdOeZ6F9t+TpKr2THJ0kuOnO87CGv0DcVqS/5z2\nLAuttbYiyUxVrR79WmxhtjrJHkn2TLJPkp2mOtACq6qzNvzZZfjPw+GLJcxGnpdkeVXtkeQtWVw/\nWw5J8oOqenqSV6WDfxfE2f33I1cuSLIYr1zwz0l+ddpDjMmHkxwz+nomyaI6d15V/XWSl4++fVSS\nxfSPQ5K8Pcl7knx32oOMwZOSbNNa+2Rr7cLRKYQWk/0ynBLp3CQfTXL+dMcZj9HVbB5fVadPe5YF\n9s0ky0d7j7ZLcvuU51lIv5TkgiSpqkoy9SvAirP7b7skN875/o7W2qLaPVxVH8ni+ot3l6r6QVWt\na62tTHJOhtWlRaWq1rfWzk5ySpIPTXuehTLabbSmqj4x7VnG5OYM8blfkkOTfGiR/Wz56Qz/mf31\n3P3+ZqY70lgcleTYaQ8xBj/IsEvzGxkOnTh5qtMsrH9M8vzW2szoP0WPGB0iMjXi7P5z5YLNXGtt\npyQXJfk/VfXn055nHKrqt5P8YpIzWmvbTnueBfLSDOc+vDjD8TwfaK3tON2RFtQ3k3ywqmar6ptJ\n1ib52SnPtJDWJvlEVd02Wp24JcmqKc+0oFprP5mkVdVF055lDF6d4c/vFzOs8p492hW/GJyZ4d/2\nS5P8SpIvVdUd0xxInN1/rlywGWut/UySTyZ5fVWdOe15Flpr7SWjg66TYSXmztGvzV5V7V1V+4yO\n6fnHJL9VVf8+5bEW0kszOoa1tfbwDKv01051ooX1mSTPHa1OPDzJthmCbTHZO8mnpz3EmFyfu/ca\nXZdkyyRTXV1aQL+c5NNV9cwMh778y5Tn8WnNB8CVCzZvRyXZIckxrbUNx57tX1WL5QDz/5vk/a21\nv8/ww/P3FtF7W+zel+Ss1tpnMnzS9qWLaVW+qs5vre2d5PIMCwOHTXt1YgxaOviHfUzeleTM1tql\nGT5te1RV/XDKMy2Uq5P8UWvtTRmO0/2dKc/jCgEAAD2xWxMAoCPiDACgI+IMAKAj4gwAoCPiDACg\nI06lASyI1tp2Sd6W4bqJ6zOcF+m1VXXFA9jWy5Osq6q/WNgpp6O19pYkX6yq8yb4mt9OsnrDr6o6\naFKvDTw4Vs6AB210vb2/zXByyidX1ZMzXBz5gtbaTz2ATe6RZOsFHHGqquoPJhlmwObNyhmwEJ6V\n5OFJ3lxVdyZJVV3UWjs4yRattdVJ/nB0dv+01s5KcnGGk+b+RZINl2E6NsOVDQ5I8uzW2rUZrgbw\nviT/JcOK3FFV9fHW2h+ObntSkodluE7qs5PsnuSfkryoqmZba29I8hsZzmb+iSSvz3BR+I8n+X6S\nW6pq3w1vpLW2S5LTM/x8vCXJwVV1dWvtuRmCc8sk30pySFWtHa1QfT7DJaUuS/K1qnr7aFvnJPnz\n0fu5u
KrOaq29OsO1Je9I8tGqev3oyhWnJdkpwxUd3lhVfzf3N7i19tDR78Njk9ya5DVVdWFr7ZVJ\nXpLhjPt3JnlhVX19Y39IrbW3J3nO6LX/pqoW4zUgYbNn5QxYCE9J8oUNYbZBVf1tVX1vE8/7lSTf\nrqpdk7w4yV6jKDkvyR+MLnJ+SpILq2qXJL+W4SzlPzN6/hMzxNiLM1wf74QkT0jy1CS7jIJq1wyX\nZ3lKkkck+Z+j57YkL54bZiOvTvKOqnra6LWf3lpbleSPk+xXVU/JEHknzHnOBVXVRo9/UZK01lZm\nWAH82IYHtdZ2S/K7SXZLskuSXVtruyY5KcmZo9+HA5KcNnr+XH+U5JqqelyGGDt+tCv5BRl2Wz4h\nyV+Ptv9jWmuPynA1jCeN5vqFRXRtRFhUrJwBC+HODJczu7/+IclbW2uPyBAxf7SRxzw7ySFJUlX/\n0lr7fIYgS5JPVdX61tq/Jrm2qr6WJK21f8twma59R4/90ujxP5HkOxmu8/i9qvr2Rl7vY0n+9yjs\nzk9yTpL9M6zSXdRaS4ZVuOvmPOfzo/m+3Fpb0Vr7+QwBdH5V3Tp6TjJce/GjVbXhGoX7jubdN8lj\nR8emJcPq3GMyrBpusE+S3xy9zpVJnjF67m8meVFr7ReTPPcez5nr35L8Z2vtstH7OrqqbrmXxwJT\nZOUMWAhfTPLU1tqPBFpr7a2ttWdluFbk3Pu2TJKqujrDbroPJdkryeX33EZ+/OfUTO7+j+Vtc27f\n2HUot0jyp1W14Ti43ZMcP7pvo9ccrapzMqy8XZ7k95K8Z7Sdz8zZzi9nWMXbYO62PpjkhaNfH7zH\n5m+f+01r7eGttZ8cbf/Zc7b/9CRX3sdzHztaDftskp9MckGSs3IvkTy6TufuSY5J8lNJPjsKOqAz\n4gxYCJcm+V6SN7fWtkiS1tp+SQ5O8rUMx3btPFpVemiGEMvoeKljq+rDGXbHPSzJ9hlCa0OAXZjR\nhYhbazsn2TNDkMzHhUle0lp7SGtteYbdfr+2qSe01v4yyW5VdVqGkHlqhpWxZ8yJmWOSnHgvm/hQ\nhjD7hdHvy1yXJtl/zjx/keRpozl/d/T6v5TkK0m2ucdz/z537zJ9bIZj5p6WYVfnu0Yz7p8h9Db2\nvp6S5JIkf19Vr8vw59I29lhgusQZ8KBV1WyGY6Uek+Sq1tpXMhx4/7yq+o+q+mqG3YVfTfLh3B0t\nH0jSWmtXZoiPP6yqG5L8XZKjWmu/luTwDB8OuDJDXL2sqq6d51wfTfKRDOFyVYZdfmffx9PeOnrt\nK5K8PcOB9/+e5KVJ/mo0x1OTvPZeXvP/ZYjRc0a/L3PvuyLJqRni8p8yhNLfJXlVhmPbvpLkL5O8\npKrW3WPTb85wnNg/ZQjAl2Q49m1Za+1rST6X5NtJfu5e5vry6HWvGr23b2dYbQM6MzM7O3vfjwIA\nYCKsnAEAdEScAQB0RJwBAHREnAEAdEScAQB0RJwBAHREnAEAdEScAQB05P8DjaPNxE2cKnUAAAAA\nSUVORK5CYII=\n", + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "sns.countplot(x='Customer service calls', hue='Churn', data=df);\n", + "savefig('serv_calls__and_churn.png', dpi=300);" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Может быть, по сводной табличке это не так хорошо видно (или скучно ползать взглядом по строчкам с цифрами), а вот картинка 
красноречиво свидетельствует о том, что доля оттока сильно возрастает начиная с 4 звонков в сервисный центр. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now let's add a binary feature to our DataFrame: the result of the comparison `Customer service calls > 3`. Then we'll look again at how it relates to churn. " + ] + }, + { + "cell_type": "code", + "execution_count": 54, + "metadata": { + "collapsed": false, + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + "Churn 0 1 All\n", + "Many_service_calls \n", + "0 2721 345 3066\n", + "1 129 138 267\n", + "All 2850 483 3333" + ] + }, + "execution_count": 54, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df['Many_service_calls'] = (df['Customer service calls'] > 3).astype('int')\n", + "\n", + "pd.crosstab(df['Many_service_calls'], df['Churn'], margins=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 53, + "metadata": { + "collapsed": false, + "deletable": true, + "editable": true + }, + "outputs": [ + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAmcAAAHfCAYAAAAVw3+UAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAGz1JREFUeJzt3X+05XVd7/HXGQYYhRkcXUNksi4XrXeUqFcQrEDHH5W4\nXGB2u5pXQ00QK6FroQTDVbuDphOawJIkQjDqXhUuNVIolKhAISGakPYJump39fMwDMzIXGYYOfeP\nvYeOND+OM2ef/ZlzHo+1WOz92d/93e/DPzzX57t/TExNTQUAgD4sGvcAAAD8G3EGANARcQYA0BFx\nBgDQEXEGANARcQYA0JHF4x5gNk1ObvS9IADAXmHFiqUT21u3cwYA0BFxBgDQEXEGANARcQYA0BFx\nBgDQEXEGANARcQYA0BFxBgDQkXn1JbQAANvz1a/elUsu+VC2bt2ahx9+OD//82/KDTd8Kied9NN5\n+tOPHPd430GcAQDz2gMP3J/zz39v1qz5rTzxiU/Kffety2mnvSGHHXb4uEfbrompqfnzi0d+vgkA\neKw//uO1+dd//Ze8/vWnPLq2YcOGXHjh+7Np04PZsGFDFi3aJ+9+9/vyuc/dmHXr1uW1r31d7rjj\n9vzZn12f17zmdTnrrF/JAQcckJ/5mVflsssuyVOf+v355je/nuc//4V53eveuFtz+fkmAGBBuu++\ndTnkkO/9jrVly5YlSZ797Ofkwgs/nEMPPTR33HH7Ds+xadODueiiS/KCF7w4//RP/5i3vvXt+fCH\nL8/atdfM+rwuawIA89qKFQdncvJfv2Ptjjtuz7p161L1g0mS5cufmM2bN+/wHE95yqFZtGjRo+fb\nFndLliyZ9XntnAEA89qP/uhxufHGP8v69fclSe69dzLvfe/qLFo0keQ7ryzut99+uffeQcjdfXd7\ndH1iYtG029u9Gjlr7JwBAPPasmUH5fTT35pVq96eiYmJbNmyJb/6q2fl+us/9e+OPeaY5+bqqz+e\nX/qlU8f2gQEfCAAAGAMfCAAA2AuIMwCAjogzAICOiDMAgI74tOYeOGPN2nGPwG764JknjnsEANgu\nO2cAAB2xcwYAzBuzfVVrV1daHnnkkZx//m/knnvuzr777puzzjo3T3nKoXv0mnbOAAB20003fTZb\ntmzJhz/8kZx22lty0UUf2ONzijMAgN30la98Occe+yNJkqc//cj8zd98bY/PKc4AAHbTgw8+mAMO\nOPDR+4sWLcrWrVv36JziDABgNx1wwAHZtGnTo/enpqayePGevaVfnAEA7KYjj3xmbr31liTJXXfd\nmcMPf9oen9OnNQEAdtPznveC/OVffiGnnfaGTE1N5eyz37HH5xRnAMC8MddfMr5o0aKceebZs3v
O\nWT0bAAB7RJwBAHREnAEAdEScAQB0RJwBAHREnAEAdMRXaQAA88aZ166a1fOtednqGR331399Vy6+\n+IJcdNEle/ya4gwAYA/8/u9fkU9/+k+yZMnjZuV8LmsCAOyB7/u+p+S889bM2vnEGQDAHli58kV7\n/GPn04kzAICOiDMAgI6IMwCAjvi0JgAwb8z0qy9m2/d+75NzySWXz8q57JwBAHREnAEAdEScAQB0\nRJwBAHREnAEAdEScAQB0ZCRfpVFV+ya5LMlhSfZPsjrJ/01ybZK7h4dd3Fr7WFWdkuRNSbYmWd1a\nu7aqHpfkyiQHJ9mY5OTW2uQoZgUA6MmovufsNUnWtdZeW1VPTPLlJL+e5P2ttfO3HVRVhyQ5PcnR\nSZYkubmqbkjy5iR3ttbeWVWvSrIqyRkjmhUAoBujirNPJLlqeHsig12xo5JUVZ2Uwe7ZLyc5Jskt\nrbXNSTZX1T1JnpHkuCTvGz7/uiTnjmhOAICujCTOWmvfSpKqWppBpK3K4PLmpa21L1bVOUnekcGO\n2gPTnroxyUFJlk1b37a2S8uXPz6LF+8zK38D89uKFUvHPQIAbNfIfr6pqg5Nck2SD7XW/qCqntBa\nu3/48DVJLkzy+STT/y+5NMn9STZMW9+2tkvr12+ajdFZACYnN457BAAWuB1tFIzk05pV9T1Jrk/y\n9tbaZcPlT1fVMcPbL0ryxSS3JTm+qpZU1UFJjkhyV5Jbkrx0eOwJSW4axZwAAL0Z1c7Z2UmWJzm3\nqra9X+ytST5QVQ8n+eckp7bWNlTVBRnE16Ik57TWHqqqi5NcUVU3J9mS5NUjmhMAoCsTU1NT455h\n1kxObpzTP+aMNWvn8uWYRR8888RxjwDAArdixdKJ7a37EloAgI6IMwCAjogzAICOiDMAgI6IMwCA\njogzAICOiDMAgI6IMwCAjogzAICOiDMAgI6IMwCAjogzAICOiDMAgI6IMwCAjogzAICOiDMAgI6I\nMwCAjogzAICOiDMAgI6IMwCAjogzAICOiDMAgI6IMwCAjogzAICOiDMAgI6IMwCAjogzAICOiDMA\ngI6IMwCAjogzAICOiDMAgI6IMwCAjogzAICOiDMAgI6IMwCAjogzAICOiDMAgI6IMwCAjogzAICO\niDMAgI6IMwCAjogzAICOiDMAgI6IMwCAjogzAICOiDMAgI6IMwCAjogzAICOiDMAgI6IMwCAjogz\nAICOiDMAgI6IMwCAjogzAICOiDMAgI6IMwCAjogzAICOiDMAgI6IMwCAjogzAICOiDMAgI6IMwCA\njogzAICOiDMAgI6IMwCAjogzAICOiDMAgI6IMwCAjiwexUmrat8klyU5LMn+SVYn+WqSy5NMJbkr\nyS+21h6pqlOSvCnJ1iSrW2vXVtXjklyZ5OAkG5Oc3FqbHMWsAAA9GdXO2WuSrGutHZ/kJUkuSvL+\nJKuGaxNJTqqqQ5KcnuTHkvxkkvdU1f5J3pzkzuGxH02yakRzAgB0ZVRx9okk5w5vT2SwK3ZUks8N\n165L8uIkxyS5pbW2ubX2QJJ7kjwjyXFJPvWYYwEA5r2RXNZsrX0rSapqaZKrMtj5+s3W2tTwkI1J\nDkqyLMkD0566vfVta7u0fPnjs3jxPns8P/PfihVLxz0CAGzXSOIsSarq0CTXJPlQa+0Pqup90x5e\nmuT+JBuGt3e2vm1tl9av37SnY7NATE5uHPcIACxwO9ooGMllzar6niTXJ3l7a+2y4fKXqmrl8PYJ\nSW5KcluS46tqSVUdlOSIDD4scEuSlz7mWACAeW9UO2dnJ1me5Nyq2vbeszOSXFBV+yX5WpKrWmvf\nrqoLMoivRUnOaa09VFUXJ7miqm5OsiXJq0c0JwBAVyampqZ2fdReYnJy45z+MWesWTuXL8cs+uCZ\nJ457BAAWuBUrlk5sb92X0AIAdEScAQB0RJwBAHREnAEAdES
cAQB0RJwBAHREnAEAdEScAQB0RJwB\nAHREnAEAdEScAQB0RJwBAHREnAEAdEScAQB0RJwBAHREnAEAdEScAQB0RJwBAHREnAEAdEScAQB0\nRJwBAHREnAEAdEScAQB0RJwBAHREnAEAdEScAQB0RJwBAHREnAEAdEScAQB0RJwBAHREnAEAdESc\nAQB0RJwBAHREnAEAdEScAQB0RJwBAHREnAEAdEScAQB0RJwBAHREnAEAdEScAQB0RJwBAHREnAEA\ndEScAQB0RJwBAHREnAEAdEScAQB0RJwBAHREnAEAdEScAQB0RJwBAHREnAEAdEScAQB0RJwBAHRE\nnAEAdEScAQB0RJwBAHREnAEAdEScAQB0RJwBAHREnAEAdEScAQB0RJwBAHREnAEAdEScAQB0RJwB\nAHREnAEAdEScAQB0ZPEoT15VxyZ5b2ttZVX9pyTXJrl7+PDFrbWPVdUpSd6UZGuS1a21a6vqcUmu\nTHJwko1JTm6tTY5yVgCAHowszqrqbUlem+TB4dJRSd7fWjt/2jGHJDk9ydFJliS5uapuSPLmJHe2\n1t5ZVa9KsirJGaOaFQCgF6PcOfu7JK9I8nvD+0clqao6KYPds19OckySW1prm5Nsrqp7kjwjyXFJ\n3jd83nVJzh3hnAAA3ZhRnFXVha21tzxm7YrW2sk7ek5r7eqqOmza0m1JLm2tfbGqzknyjiRfTvLA\ntGM2JjkoybJp69vWdmn58sdn8eJ9ZnIoC9yKFUvHPQIAbNdO46yqLk1yeJKjq+qHpz20b2YYTNNc\n01q7f9vtJBcm+XyS6f+XXJrk/iQbpq1vW9ul9es3fZcjsVBNTm4c9wgALHA72ijY1c7Z6iSHJflg\nkndNW9+a5Gvf5Qyfrqq3tNZuS/KiJF/MYDftvKpakmT/JEckuSvJLUleOnz8hCQ3fZevBQCwV9pp\nnLXWvpHkG0meWVXLMtgtmxg+fGCS+76L13pzkgur6uEk/5zk1Nbahqq6IIP4WpTknNbaQ1V1cZIr\nqurmJFuSvPq7eB0AgL3WxNTU1C4PqqpfS/JrSdZNW55qrR0+qsF2x+Tkxl3/MbPojDVr5/LlmEUf\nPPPEcY8AwAK3YsXSie2tz/TTmm9M8lTfNQYAMFoz/YWAv893dwkTAIDdMNOds7sz+ILYG5M8tG2x\ntfbrI5kKAGCBmmmc/cPwn+TfPhAAAMAsm1GctdbeteujAADYUzP9hYBHkjz2k5D/2Fo7dPZHAgBY\nuGa6c/boBweqat8kL0/yI6MaCgBgoZrppzUf1Vp7uLX2iSQvHME8AAAL2kwva/7ctLsTSX44g2/u\nBwBgFs3005ovmHZ7Ksm9SV45++MAACxsM33P2euH7zWr4XPuaq1tHelkAAAL0Izec1ZVR2XwRbRX\nJPlIkr+vqmNHORgAwEI008uaFyR5ZWvtC0lSVc9NcmGSY0Y1GADAQjTTT2seuC3MkqS1dmuSJaMZ\nCQBg4ZppnN1XVSdtu1NVL0+ybjQjAQAsXDO9rHlqkmur6ncz+CqNqSQ/OrKpAAAWqJnunJ2QZFOS\n/5DB12pMJlk5opkAABasmcbZqUl+rLX2YGvtK0mOSvKW0Y0FALAwzTTO9s13/iLAlvz7H0IHAGAP\nzfQ9Z3+Y5DNV9fHh/Vck+aPRjAQAsHDNaOestfb2DL7rrJIcnuSC1tq5oxwMAGAhmunOWVprVyW5\naoSzAAAseDN9zxkAAHNAnAEAdEScAQB0RJwBAHREnAEAdEScAQB0RJwBAHREnAEAdEScAQB0RJwB\nAHREnAEAdEScAQB0RJwBAHREnAEAdEScAQB0RJwBAHREnAEAdEScAQB0RJwBAHREnAEAdEScAQB0\nRJwBAHREnAEAdEScAQB0RJwBAHREnAEAdEScAQB0RJwBAHREnAEAdEScAQB0RJwBAHREnAEAdESc\nAQB0RJwBAHREnAEAdES
cAQB0RJwBAHREnAEAdEScAQB0RJwBAHREnAEAdEScAQB0RJwBAHREnAEA\ndEScAQB0RJwBAHRk8ShPXlXHJnlva21lVT0tyeVJppLcleQXW2uPVNUpSd6UZGuS1a21a6vqcUmu\nTHJwko1JTm6tTY5yVgCAHoxs56yq3pbk0iRLhkvvT7KqtXZ8kokkJ1XVIUlOT/JjSX4yyXuqav8k\nb05y5/DYjyZZNao5AQB6MsrLmn+X5BXT7h+V5HPD29cleXGSY5Lc0lrb3Fp7IMk9SZ6R5Lgkn3rM\nsQAA897I4qy1dnWSh6ctTbTWpoa3NyY5KMmyJA9MO2Z769vWAADmvZG+5+wxHpl2e2mS+5NsGN7e\n2fq2tV1avvzxWbx4nz2flHlvxYqluz4IAMZgLuPsS1W1srX22SQnJLkxyW1JzquqJUn2T3JEBh8W\nuCXJS4ePn5Dkppm8wPr1m0YwNvPR5OTGcY8AwAK3o42CufwqjV9J8q6q+osk+yW5qrX2z0kuyCC+\nPpPknNbaQ0kuTvLDVXVzklOTvGsO5wQAGJuJqampXR+1l5ic3Dinf8wZa9bO5csxiz545onjHgGA\nBW7FiqUT21v3JbQAAB0RZwAAHRFnAAAdEWcAAB0RZwAAHRFnAAAdEWcAAB0RZwAAHRFnAAAdEWcA\nAB0RZwAAHRFnAAAdEWcAAB0RZwAAHRFnAAAdEWcAAB0RZwAAHRFnAAAdEWcAAB0RZwAAHRFnAAAd\nEWcAAB0RZwAAHRFnAAAdEWcAAB0RZwAAHRFnAAAdEWcAAB0RZwAAHRFnAAAdEWcAAB0RZwAAHRFn\nAAAdEWcAAB0RZwAAHRFnAAAdEWcAAB0RZwAAHRFnAAAdEWcAAB0RZwAAHRFnAAAdEWcAAB0RZwAA\nHRFnAAAdEWcAAB0RZwAAHRFnAAAdEWcAAB0RZwAAHRFnAAAdEWcAAB0RZwAAHRFnAAAdEWcAAB0R\nZwAAHRFnAAAdEWcAAB0RZwAAHRFnAAAdEWcAAB0RZwAAHRFnAAAdEWcAAB0RZwAAHRFnAAAdEWcA\nAB0RZwAAHVk81y9YVXck2TC8+/Uk5yW5PMlUkruS/GJr7ZGqOiXJm5JsTbK6tXbtXM8KADDX5jTO\nqmpJkonW2sppa2uTrGqtfbaqfjvJSVX1F0lOT3J0kiVJbq6qG1prm+dyXgCAuTbXO2fPTPL4qrp+\n+NpnJzkqyeeGj1+X5CeSfDvJLcMY21xV9yR5RpK/nON5AQDm1FzH2aYkv5nk0iTfn0GMTbTWpoaP\nb0xyUJJlSR6Y9rxt6wAA89pcx9nfJrlnGGN/W1XrMtg522ZpkvszeE/a0u2s79Ty5Y/P4sX7zOK4\nzFcrVizd9UEAMAZzHWdvSHJkkl+oqidnsEN2fVWtbK19NskJSW5McluS84bvUds/yREZfFhgp9av\n3zSquZlnJic3jnsEABa4HW0UzHWc/W6Sy6vq5gw+nfmGJPcm+Z2q2i/J15Jc1Vr7dlVdkOSmDL7u\n45zW2kNzPCsAwJyb0zhrrW1J8urtPPT87Rz7O0l+Z+RDAQB0xJfQAgB0RJwBAHREnAEAdEScAQB0\nRJwBAHREnAEAdEScAQB0RJwBAHREnAEAdEScAQB0RJwBAHREnAEAdEScAQB0RJwBAHREnAEAdESc\nAQB0RJwBAHREnAEAdEScAQB0ZPG4B4BxOPPaVeMegT2w5mWrxz0CwMjYOQMA6Ig4AwDoiDgDAOiI\nOAMA6Ig4AwDoiDgDAOiIOAMA6Ig4AwDoiDgDAOiIOAMA6Ig4AwDoiDgDAOiIOAMA6Ig4AwDoiDgD\nAOiIOAMA6Ig4AwDoiDgDAOiIOAMA6Ig4AwDoiDgDAOiIOAMA6Ig4AwDoiDgDAOiIOAMA6Ig4AwDo\niDgDAOiIOAMA6Ig4AwDoiDgDAOiIOAMA6MjicQ8AAPPFmdeuGvcI7IE1L1s97hGSiDOA7
pyxZu24\nR2A37XfEuCdgPnBZEwCgI+IMAKAj4gwAoCPiDACgI+IMAKAj4gwAoCPiDACgI+IMAKAj4gwAoCPi\nDACgI+IMAKAj4gwAoCPiDACgI+IMAKAj4gwAoCPiDACgI4vHPcCOVNWiJB9K8swkm5O8sbV2z3in\nAgAYrZ53zl6eZElr7UeSnJXk/DHPAwAwcj3H2XFJPpUkrbVbkxw93nEAAEZvYmpqatwzbFdVXZrk\n6tbadcP7f5/k8Nba1vFOBgAwOj3vnG1IsnTa/UXCDACY73qOs1uSvDRJquq5Se4c7zgAAKPX7ac1\nk1yT5Mer6s+TTCR5/ZjnAQAYuW7fcwYAsBD1fFkTAGDBEWcAAB3p+T1nMOv88gQwSlV1bJL3ttZW\njnsW9l52zlho/PIEMBJV9bYklyZZMu5Z2LuJMxYavzwBjMrfJXnFuIdg7yfOWGiWJXlg2v1vV5XL\n+8Aea61dneThcc/B3k+csdD45QkAuibOWGj88gQAXXM5h4XGL08A0DW/EAAA0BGXNQEAOiLOAAA6\nIs4AADoizgAAOiLOAAA6Is6AOVFVh1XVVFV9+DHrzxquv25Mo+22qjqxqn59jK//zqp65/C2j97D\nPOF7zoC5tC7JS6pqn9bat4drr0wyOcaZdltrbW2SteOeA5hfxBkwl76V5MtJnpfkxuHaTyT50ySp\nql9K8tokByR5JMkrW2tfq6pvJPm9JD85fOznMviN1M8kOay19khVPT/JWa21E7b3wlW1LMn/THLI\ncOldrbW1VfW0JBcneVKSTUne0lr7UlVdPlx7WpKzkpzSWnvZtDl/IMkdSVa21l5XVS9Ocn4GVyS+\nmeTVSR5MsibJyiT7JLm8tfaBnf0Hqqr/luS0JN9O8snW2tur6ulJLkxyYJKDk5zfWrtgB89/UZL3\nJZlKsj7Jz7bW7t3ZawJ9cVkTmGsfT/Kfk6SqnpPkK0m2ZPCj9C/PIHaenuQPk/zCtOeta60dk+S3\nk5zdWrsnydczCJ8kOTnJ5Tt53Z9K8o3W2lFJXpPk+OH6FUne1lp7dpJTk/yvx7zmEUn+JMmzq2r5\ncP1nk1y57aCq2j/J7yc5ubV25PBvOjnJKUkyPPcxSU6qquOzA1V1zPBvPibJM5IcVVVHJXljktWt\nteckeUGS83byd65Kclpr7egkn0zy7J0cC3TIzhkw1z6ZZHVVLcrgkubHkrwqgx+lf3WSV1XVDyR5\nSQa7bNt8avjvu5K8Ynj7siSvrapbk7woyZt38rp/nuTdVfV9Sf44yf+oqgOTPCfJR6pq23EHVtWT\nhre/kCSttYer6n8n+emquiHJk1prt1XVDw2POzLJP7TWvjw8/uwkqaqrkjyrql647dzDY2/awYzP\ny2C37IHh/RcPz/PlDC4H/1oG0XbgTv7OtUmuqao/TPJHrbUbdnIs0CE7Z8Ccaq1tTPJXSY5L8sIM\nL2kmOTTJXyR5QpLrMtgFm5j21IeG/56atv6JJD+ewU7cn7TWNu/kde9O8oMZ7HAdn+S2DC41PtRa\ne9a2f5Icm+S+4dP+37RTXJlBTP6XJH/wmNM/PP1OVR1UVU8Znv9t08793CQf2dGM2znPk6vqCRns\nNv5Ukq8mOXsnz8/wsunKJPckeV9VnbOz44H+iDNgHD6e5DeS3N5a2zpcezDJPcO4+EKSEzKImx1q\nrW3KIOTenZ1f0tz2PrF3tdY+kcGlw4MziLy7q+o1w2N+PMnnd/BatyZ5cgbvibvysQ8nWTFtJ+1t\nGbxv7DNJTqmqfYe7dDdnEH87clOSE6rqwKpanMF75I7OIED/e2vtj5I8fzjrdv/bVNUXkixtrf1W\nkg/EZU3Y64gzYBw+meRZGVzS3GZLkkVV9dUktyb5RpL/OINzfSzJhtbaF3Zx3EeTVFXdmUGAvbO1\ndn+S/5rkjVX1lSTvyeBDCDv6WoqPJflWa+3/TF9sr
T2UwfvYPjo8zw9lEJ+/neTuJF9KcnuSj7TW\nPrujAVtrdyS5KIMdxL9K8vnW2p8meWeSm6vqjgw+FPGN7Pi/zdlJLq+qL2bwHrp37Oj1gD5NTE35\nahxg7zTcPXp3kn9prb1/3PMAzAYfCAD2ZrcnuTfJiUlSVU9NcvUOjn1ja+32uRpsR/aGGYHxsnMG\nANAR7zkDAOiIOAMA6Ig4AwDoiDgDAOiIOAMA6Ig4AwDoyP8HEzCLJkZAyG8AAAAASUVORK5CYII=\n", + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "sns.countplot(x='Many_service_calls', hue='Churn', data=df);\n", + "savefig('many_serv_calls__and_churn.png', dpi=300);" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Объединим рассмотренные выше условия и построим сводную табличку для этого объединения и оттока." + ] + }, + { + "cell_type": "code", + "execution_count": 80, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + "Churn 0 1\n", + "row_0 \n", + "False 2841 464\n", + "True 9 19" + ] + }, + "execution_count": 80, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "pd.crosstab(df['Many_service_calls'] & df['International plan'], df['Churn'])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "So if we predict churn only for customers who have made more than 3 calls to customer service and also have the international plan (roaming), and predict loyalty for everyone else, we can expect an accuracy of about 85.8%: we are wrong only 464 + 9 = 473 times out of 3333. This 85.8%, obtained with very simple reasoning, is a decent starting point (*baseline*) for the machine learning models we will build later. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "On the whole, this is roughly what data analysis looked like before machine learning came along. To summarize:\n", + " \n", + "- The share of loyal customers in the sample is 85.5%. The most naive model, which always answers \"the customer is loyal\", will be right about 85.5% of the time on data like this. So the share of correct answers (*accuracy*) of later models should be at least this high, and preferably much higher;\n", + "- A simple rule, which can be written roughly as \"Customer service calls > 3 & International plan = True => Churn = 1, else Churn = 0\", gives an expected accuracy of 85.8%, slightly above 85.5%;\n", + "- We obtained both baselines without any machine learning, and they serve as the starting point for our later models. If it turns out that enormous effort raises the share of correct answers by only, say, 0.5%, then we may be doing something wrong, and the simple two-condition model may be enough;\n", + "- Before training complex models, it is worth exploring the data a little and checking simple hypotheses. 
Moreover, in business applications of machine learning, people most often start with simple solutions and only then experiment with making them more complex. " + ] + } + ], + "metadata": { + "anaconda-cloud": {}, + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.0" + }, + "name": "seminar02_part2_pandas.ipynb" + }, + "nbformat": 4, + "nbformat_minor": 0 +}