
Source Code for PySpark Algorithms Book

Unlock the power of big data with the PySpark Algorithms book

Buy PySpark Algorithms Book → PDF Version (.pdf)

Buy PySpark Algorithms Book → Kindle Version (.kpf)


PySpark Algorithms Book:

Author: Mahmoud Parsian (mahmoud.parsian@yahoo.com)

Purchase PySpark Algorithms Book from amazon.com

Publication date: August 2019


About PySpark Algorithms Book

  • This book is about PySpark (the Python API for Spark)
  • An introductory book on how to solve data problems using PySpark
  • Learn how to use mappers, filters, and reducers
  • Learn how to partition data for fast queries
  • Learn how to use the mapPartitions() transformation
  • Learn how to use reduceByKey(), groupByKey(), and combineByKey() transformations
  • Learn how to use Spark's transformations and actions for solving real problems
  • Learn how to use RDDs and DataFrames
  • Learn how to read/write data from many data sources
  • Learn how to use logistic regression
  • Learn how to use Spark's reduction transformations
  • Learn how to use GraphFrames
  • Learn how to use motifs in GraphFrames
  • Learn how to use monoids in MapReduce algorithms
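
The last item, monoids in MapReduce algorithms, can be illustrated without a cluster. The following is a minimal pure-Python sketch, not code taken from the book: the partition data and all function names are hypothetical, and `functools.reduce` stands in for the role that a reduction such as reduceByKey() would play in PySpark. It shows why a mean is usually computed by reducing (sum, count) pairs: pairwise addition of such pairs is associative and has an identity element, (0, 0), so partial results can be combined in any grouping, whereas averaging per-partition averages generally gives a different answer.

```python
from functools import reduce

# Hypothetical data split across three partitions (illustrative values only).
partitions = [[1, 2], [3, 4, 5], [6]]

# Identity element of the (sum, count) monoid.
IDENTITY = (0, 0)

def to_pair(x):
    """Map a single value into the monoid as (sum, count)."""
    return (x, 1)

def combine(a, b):
    """Associative combine step: add sums and counts component-wise."""
    return (a[0] + b[0], a[1] + b[1])

def partition_summary(values):
    """Reduce one partition to a single (sum, count) pair."""
    return reduce(combine, map(to_pair, values), IDENTITY)

def mean_via_monoid(parts):
    """Combine the per-partition pairs, then divide once at the end."""
    total, count = reduce(combine, map(partition_summary, parts), IDENTITY)
    return total / count

def mean_of_means(parts):
    """Naive approach: average the per-partition averages."""
    means = [sum(p) / len(p) for p in parts]
    return sum(means) / len(means)

correct = mean_via_monoid(partitions)
naive = mean_of_means(partitions)
print(correct, naive)
```

With these values the monoid version yields the true mean of the six numbers, while the naive version differs because the partitions have unequal sizes. Regrouping the same values into different partitions leaves the monoid result unchanged.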



Software


Table of Contents

chap01: Introduction to PySpark
chap02: Hello World
chap03: Data Abstractions
chap04: Getting Started -- Sample Chapter
chap05: Transformations in Spark
chap06: Reductions in Spark
chap07: DataFrames and SQL
chap08: Spark DataSources
chap09: Logistic Regression
chap10: Movie Recommendations
chap11: Graph Algorithms
chap12: Design Patterns and Monoids

Appendix A: How to Install Spark
Appendix B: How to Use Lambda Expressions
Appendix C: Questions and Answers (50+ Q&A)


Future chapters:

chap13: FP-Growth
chap14: LDA
chap15: Linear Regression



Copyright [2019] [Mahmoud Parsian]

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

About

Hadoop/Spark data algorithms: PySpark examples and source code