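Drawing on the author's years of engineering and practical experience, this book starts from basic Python syntax and systematically introduces the knowledge and techniques needed to process, analyze, and visualize data with Python. Readers need no special background in mathematics or statistics; by understanding the reasoning behind data analysis, they can follow the examples to learn the steps and methods for analyzing real-world problems effectively. The book is organized into 4 parts and 20 chapters, covering basic Python syntax, program control structures, functions, object-oriented basics, file operations, the standard library, regular expressions, the numpy library, the pandas library, data preprocessing, visualization with matplotlib, seaborn, and pyecharts, scientific computing with SciPy, a bike-sharing case study, and an online sales case study.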
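Tang Yi (唐藝), assistant research fellow, has long been engaged in teaching and research in computer applications and information systems, and has applied as principal investigator for more than 10 provincial- and ministerial-level research projects.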
Part 1 Python Fundamentals
Chapter 1 Python Overview
1.1 Introduction to Python
1.1.1 Origins of Python
1.1.2 Development of Python
1.2 The Python Interpreter
1.2.1 Installing the Python Interpreter
1.2.2 Interactive Mode
1.2.3 Command-Line Mode
1.3 The PyCharm Integrated Development Environment
1.3.1 Installing PyCharm
1.3.2 Creating a Project
1.3.3 Creating and Running a Python File
Chapter 2 Python Programming Basics
2.1 Constants and Variables
2.1.1 Defining Constants and Variables
2.1.2 Variable Naming Rules
2.2 Simple Data Types
2.2.1 Numeric Types
2.2.2 Strings
2.2.3 None
2.2.4 Boolean Type
2.2.5 Data Type Conversion
2.3 Arithmetic Operations
2.4 Assignment Operators
2.5 String Operations
2.5.1 String Concatenation
2.5.2 String Slicing
2.6 Output
2.6.1 Basic Usage of the print Function
2.6.2 Formatted Output with the print Function
2.7 Input
2.8 Program Comments
Chapter 3 Program Control Structures
3.1 Selection Structures
3.1.1 Conditional Expressions
3.1.2 Single-Branch Structure: the if Statement
3.1.3 Two-Branch Structure: the if-else Statement
3.1.4 Multi-Branch Structure: the if-elif-else Statement
3.2 Loop Structures
3.2.1 Iteration Loops with the for Statement
3.2.2 Conditional Loops with the while Statement
3.2.3 The else Clause in Loop Structures
3.2.4 The break and continue Statements
Chapter 4 Composite Data Types
4.1 Lists
4.1.1 Representing Lists and Accessing List Elements
4.1.2 Traversing a List
4.1.3 Adding List Elements
4.1.4 Removing List Elements
4.1.5 Sorting a List
4.2 Tuples
4.3 Dictionaries
4.3.1 Creating a Dictionary
4.3.2 Adding and Removing Key-Value Pairs
4.3.4 Traversing a Dictionary
4.3.5 Nested Dictionaries
Chapter 5 Functions
5.1 Defining and Calling Functions
5.2 Passing Function Arguments
5.3 Lists as Function Arguments
5.3.1 Simple-Type Arguments Are Passed by Value
5.3.2 Composite-Type Arguments Share Storage
5.3.3 Using Composite-Type Data as Function Arguments
5.4 Modules
5.4.1 Creating a Module
5.4.2 Importing a Module
Chapter 6 Classes and Objects
6.1 Concepts of Classes and Objects
6.2 Defining Classes and Objects That Have Only Methods
6.2.1 Defining a Class
6.2.2 Instantiating an Object
6.3 Object Initialization Method and Attributes
6.3.1 The Object Initialization Method __init__()
6.3.2 Defining Class Attributes
6.3.3 Accessing Object Attributes
6.3.4 Printing an Object's Description
6.3.5 Encapsulation
6.4 Application Example of Classes and Objects
6.5 Class Inheritance
6.5.1 Defining Inheritance
6.5.2 Inheriting the __init__() Method
6.5.3 Overriding Parent-Class Methods
Chapter 7 File Operations
7.1 Basic Operations
7.2 Opening Files
7.2.1 File Pointer
7.2.2 Open Modes
7.3 Reading Files
7.4 Writing Files
7.4.1 Writing Content to a File with the write() Method
7.4.2 Appending Content to a File with the write() Method
7.5 Reading and Writing CSV Files
7.5.1 Reading Data
7.5.2 Writing Data
Chapter 8 Common Python Standard Libraries
8.1 The datetime Module
8.1.1 The date Class
8.1.2 The time Class
8.1.3 The datetime Class
8.1.4 The timedelta Class
8.1.5 Time Conversion
8.1.6 Formatting Dates and Times
8.2 The math Module
8.3 The random Module
8.4 The os Module
Part 2 Data Analysis
Chapter 9 Regular Expressions
9.1 Metacharacters in Regular Expressions
9.1.1 Main Metacharacters
9.1.2 Escaping Characters
9.1.3 Marking the Start and End
9.2 Matching a Set of Characters
9.2.1 Defining a Character Set
9.2.2 Negating a Character Set
9.2.3 Using Ranges to Simplify Character Set Definitions
9.3 Matching Multiple Times with Quantifiers
9.3.1 Common Quantifiers
9.3.2 Greedy and Non-Greedy Matching
9.3.3 Grouping
9.4 Processing Regular Expressions with the re Module
9.4.1 Python Regular Expression Syntax
9.4.2 Matching Strings
9.4.3 Replacing Strings
9.4.4 Splitting Strings
Chapter 10 Numerical Computing with numpy
10.1 Creating Arrays with numpy
10.1.1 Common Array Creation Functions
10.1.2 ndarray Object Attributes
10.1.3 Array Transformations
10.1.4 numpy Random Number Functions
10.2 Array Indexing and Slicing
10.2.1 Array Indexing
10.2.2 Array Slicing
10.3 Array Operations
10.3.1 Operations Between Arrays and Scalars
10.3.2 Universal Functions
10.3.3 Statistical Operations
10.4 Saving and Loading Arrays
10.4.1 Saving Arrays
10.4.2 Loading Arrays
Chapter 11 The pandas Data Analysis Module
11.1 pandas Data Structures
11.1.1 Creating Series Data
11.1.2 Creating DataFrame Data
11.2 Adding, Modifying, and Deleting Data
11.2.1 Adding Data
11.2.2 Modifying Data
11.2.3 Deleting Data
11.3 Index Operations
11.3.1 Resetting the Index
11.3.2 Setting an Existing Column as the Index
11.3.3 Renaming the Index
11.3.4 Hierarchical Indexing
11.4 Selecting Data
11.4.1 Selecting Series Data
11.4.2 Selecting DataFrame Data
11.5 Data Operations
11.5.1 Arithmetic Operations
11.5.2 Function Application and Mapping
11.5.3 Summarization and Statistics
11.5.4 Unique Values and Value Counts
Chapter 12 Reading and Writing Data with pandas
12.1 Reading and Storing Text Data
12.1.1 Reading CSV Files
12.1.2 Reading TXT Files
12.1.3 Storing Text Data
12.2 Excel and JSON Data
12.2.1 Excel Data
12.2.2 JSON Data
12.3 Reading from and Writing to Databases
12.3.1 Installing the SQLAlchemy Package and Connecting to a Database
12.3.2 Writing Data to and Reading Data from an SQLite Database
Chapter 13 Data Preprocessing
13.1 Data Cleaning
13.1.1 Handling Missing Values
13.1.2 Removing Duplicate Data
13.1.3 Replacing Values
13.1.4 Transforming Data with Functions or Mappings
13.2 Sorting and Ranking Data
13.2.1 Sorting Data
13.2.2 Ranking Data
13.3 Merging and Reshaping Data
13.3.1 Merging Data
13.3.2 Joining Data
13.3.3 Transposing Data
13.4 String Processing
13.4.1 String Methods
13.4.2 Using Regular Expressions
Chapter 14 Data Grouping and Aggregation
14.1 Grouping Data
14.1.1 Understanding GroupBy
14.1.2 Grouping by Column Name
14.1.3 Grouping by Series Data
14.2 Aggregating Data
14.2.1 Aggregation Functions
14.2.2 Aggregating Data with the aggregate() Method
14.3 Converting Long Tables to Wide Tables
14.3.1 What Are Long and Wide Tables
14.3.2 Converting a Long Table to a Wide Table with the pivot Function
14.3.3 Pivot Analysis with the pivot_table Function
Part 3 Data Visualization
Chapter 15 Visualizing Data with matplotlib
15.1 Basic Methods for Creating Charts
15.1.1 Basic Components of a Chart
15.1.2 Creating a Canvas and Coordinate System
15.1.3 Configuring the Axes
15.1.4 Configuring Grid Lines
15.1.5 Configuring the Legend
15.1.6 Setting the Chart Title
15.1.7 Setting Data Labels
15.1.8 Setting a Data Table
15.1.9 Drawing Common Geometric Shapes
15.2 Creating Common Charts
15.2.1 Line Charts
15.2.2 Bar Charts
15.2.3 Pie and Donut Charts
15.2.4 Scatter and Bubble Charts
15.2.5 Histograms
15.2.6 Box Plots
15.2.7 Contour Plots
15.2.8 Step Charts
Chapter 16 Visualizing Data with seaborn
16.1 seaborn Styles
16.1.1 Basic Styles
16.1.2 Custom Styles
16.2 Plotting Distributions
16.2.1 Univariate Distribution Plots
16.2.2 Multivariate Distribution Plots
16.3 Plotting Categorical Charts
16.3.1 Categorical Scatter Plots
16.3.2 Box Plots and Violin Plots
16.3.3 Regression Plots
Chapter 17 Dynamic Data Visualization with pyecharts
17.1 pyecharts Versions and Features
17.2 pyecharts Visualization Workflow and Option Settings
17.2.1 General Workflow for pyecharts Visualization
17.2.2 pyecharts Option Settings
17.2.3 Common Chart Configuration Methods in pyecharts
17.3 Creating Charts with pyecharts
17.3.1 Pie and Donut Charts
17.3.2 Line and Area Charts
17.3.3 Scatter and Bubble Charts
17.3.4 Histograms and Box Plots
17.3.5 Word Clouds
17.3.6 Data Maps
17.3.7 Radar Charts
17.3.8 Gauges and Liquid Fill Charts
Chapter 18 Scientific Computing and Statistical Analysis with SciPy
18.1 Scientific Computing with SciPy
18.1.1 Getting Basic Scientific Constants
18.1.2 Linear Algebra and Calculus
18.1.3 Interpolation and Fitting
18.2 Statistical Analysis with SciPy
18.2.1 Normal Distribution Calculations
18.2.2 Inferring Population Parameters from Samples
18.2.3 Testing Means
18.2.4 Testing Differences in Means
18.2.5 Chi-Square Tests
18.2.6 Regression Analysis
Part 4 Practical Applications
Chapter 19 Big Data Analysis of Bike Sharing
19.1 Data Preprocessing
19.1.1 Reading Data
19.1.2 Data Cleaning and Transformation
19.2 Exploring Patterns in the Data
19.2.1 Comparing Data by Year
19.2.2 Comparing Monthly Trends
19.2.3 Analyzing Daily Peak Hours
19.2.4 Analyzing Differences Between Quarters
19.2.5 Analyzing Differences Between Weekends and Weekdays
Chapter 20 Analyzing and Modeling Online Sales Data
20.1 Obtaining and Cleaning Data
20.1.1 Obtaining Data
20.1.2 Understanding the Basic Characteristics of the Data
20.1.3 Cleaning and Organizing Data
20.2 Analyzing and Visualizing Sales Data
20.2.1 Viewing Descriptive Statistics of Sales Volume
20.2.2 Summarizing Sales Volume by Product
20.2.3 Summarizing Products by City
20.2.4 Cross-Analysis of Products and Cities
20.3 Sales Trend Analysis
20.3.1 Date Format Conversion
20.3.2 Time and Seasonal Trend Analysis
20.3.3 Comparing Differences in Seasonal Trends Across Cities