List of videos

Peter Hoffmann - PySpark - Data processing in Python on top of Apache Spark.

Peter Hoffmann - PySpark - Data processing in Python on top of Apache Spark. [EuroPython 2015] [22 July 2015] [Bilbao, Euskadi, Spain] [Apache Spark][1] is a computational engine for large-scale data processing. It is responsible for scheduling, distributing and monitoring applications which consist of many computational tasks across many worker machines on a computing cluster.

This talk will give an overview of PySpark with a focus on Resilient Distributed Datasets and the DataFrame API.

While Spark Core itself is written in Scala and runs on the JVM, PySpark exposes the Spark programming model to Python. It defines an API for Resilient Distributed Datasets (RDDs). RDDs are a distributed memory abstraction that lets programmers perform in-memory computations on large clusters in a fault-tolerant manner. RDDs are immutable, partitioned collections of objects. Transformations construct a new RDD from a previous one. Actions compute a result based on an RDD. Multiple computation steps are expressed as a directed acyclic graph (DAG). The DAG execution model is a generalization of the Hadoop MapReduce computation model.

The Spark DataFrame API was introduced in Spark 1.3. DataFrames evolve Spark's RDD model and are inspired by Pandas and R data frames. The API provides simplified operators for filtering, aggregating, and projecting over large datasets. The DataFrame API supports different data sources such as JSON data sources, Parquet files, Hive tables and JDBC database connections.

Resources:

- [An Architecture for Fast and General Data Processing on Large Clusters][2] - Matei Zaharia
- [Spark][6] Cluster Computing with Working Sets - Matei Zaharia et al.
- [Resilient Distributed Datasets][5] A Fault-Tolerant Abstraction for In-Memory Cluster Computing - Matei Zaharia et al.
- [Learning Spark][3] Lightning Fast Big Data Analysis - O'Reilly
- [Advanced Analytics with Spark][4] Patterns for Learning from Data at Scale - O'Reilly

[1]: https://spark.apache.org
[2]: http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-12.pdf
[3]: http://shop.oreilly.com/product/0636920028512.do
[4]: http://shop.oreilly.com/product/0636920035091.do
[5]: https://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf
[6]: http://www.cs.berkeley.edu/~matei/papers/2010/hotcloud_spark.pdf
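The abstract's distinction between transformations and actions can be illustrated without a cluster. The following is a minimal pure-Python sketch and not PySpark itself: `ToyRDD` is a hypothetical class that only mimics the shape of a few RDD methods (`map`, `filter`, `collect`, `count`) on a single machine. Each transformation returns a new object that records the step, and nothing is computed until an action is called.

```python
from typing import Any, Callable, Iterable, Iterator, List, Tuple


class ToyRDD:
    """Single-machine stand-in for an RDD (hypothetical, not the PySpark class)."""

    def __init__(self, source: Iterable[Any], steps: Tuple[Callable, ...] = ()):
        self._source = list(source)  # stands in for the distributed input data
        self._steps = steps          # recorded chain of transformations

    # Transformations: return a new ToyRDD and compute nothing yet.
    def map(self, fn: Callable[[Any], Any]) -> "ToyRDD":
        step = lambda items: (fn(x) for x in items)
        return ToyRDD(self._source, self._steps + (step,))

    def filter(self, pred: Callable[[Any], bool]) -> "ToyRDD":
        step = lambda items: (x for x in items if pred(x))
        return ToyRDD(self._source, self._steps + (step,))

    def _evaluate(self) -> Iterator[Any]:
        # Replay the recorded steps over the source data.
        items: Iterator[Any] = iter(self._source)
        for step in self._steps:
            items = step(items)
        return items

    # Actions: trigger evaluation and return a plain Python result.
    def collect(self) -> List[Any]:
        return list(self._evaluate())

    def count(self) -> int:
        return sum(1 for _ in self._evaluate())


numbers = ToyRDD(range(10))
even_squares = numbers.filter(lambda n: n % 2 == 0).map(lambda n: n * n)

result = even_squares.collect()  # action: the two steps run here
total = even_squares.count()     # action: the steps are replayed
```

The original `numbers` object is left unchanged by `filter` and `map`, which mirrors the immutability described in the abstract. Partitioning, fault tolerance and DAG scheduling are deliberately left out of this sketch.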

Watch

Brecht Machiels - RinohType, a document processor inspired by LaTeX

Brecht Machiels - RinohType, a document processor inspired by LaTeX [EuroPython 2015] [24 July 2015] [Bilbao, Euskadi, Spain] RinohType is a document processor inspired by [LaTeX][1] and written in Python. It renders [reStructuredText][2] and [Sphinx][3] documents to PDF based on a document template and a style sheet.

RinohType already implements many of the features that make LaTeX so great. Not stopping there, RinohType also tries to fix LaTeX's weaknesses; it should not only be easy to use, but easy to _customize_ and _extend_ as well. To minimize frustration when things go wrong, care is taken to provide descriptive warning and error messages. The powerful layout engine makes it easy to define custom page layouts. And the CSS-inspired stylesheets simplify the styling of document elements. At a lower level, Python makes the writing of extensions much more accessible when compared to TeX's rather arcane macro language.

In the talk, I would like to introduce RinohType to the Python community. No special prerequisite knowledge is required. I will start off by discussing my motivation for starting RinohType development, its design goals and the currently available features. This will be followed by an example of how you can use RinohType to render a reStructuredText document to a neat PDF document, highlighting some of the features along the way. Next, we'll explore some of RinohType's internals such as the page layout engine and the style sheet system. We will explore how these can be used in a Python application to create a document from scratch.

A first RinohType release was recently created. While this preview release is of alpha quality, it should be able to render most reStructuredText documents. It also includes a preliminary Sphinx builder. Please find more details in the package's description at [PyPI][4].

[1]: http://en.wikipedia.org/wiki/LaTeX
[2]: http://docutils.sourceforge.net/rst.html
[3]: http://sphinx-doc.org
[4]: https://pypi.python.org/pypi/RinohType

Watch

Raphael Pierzina - Come to the Dark Side! We have a whole bunch of Cookiecutters!

Raphael Pierzina - Come to the Dark Side! We have a whole bunch of Cookiecutters! [EuroPython 2015] [23 July 2015] [Bilbao, Euskadi, Spain] *(This talk is intended for intermediate-level participants who have a basic understanding of the Python language and contains quotes from Darth Vader that some attendees may find hilarious)*

Writing a Python script from scratch is fairly easy and you generally get by with very little boilerplate code. However, starting a new Python project can be tiring if you decide to stick to best practices and plan on submitting it to PyPI. It requires great diligence and occasionally gets pretty cumbersome if you start new tools on a regular basis.

You underestimate the power of a good template
----------------------------------------------

Why not just use a template for it? Cookiecutter is a CLI tool written in pure Python that enables you to do so. It works not only for Python code, but also for markdown formats and even other programming languages.

We will talk about the ideas behind Cookiecutter and go over how you can create your very own template, so you and others can benefit from your experience. I would like to briefly go into the technologies used and how you can get involved in the Cookiecutter GitHub project.

There are already plenty of Cookiecutter templates, or Cookiecutters as we call them, available online. Most of them target Python projects, but others can be used to create C++, LaTeX or JavaScript projects.

The ability to destroy a planet is insignificant next to the power of Cookiecutter.
-----------------------------------------------------------------------------------

I will show you how to use Cookiecutter and highlight some of the amazing templates created by the community. More importantly, we will create a Cookiecutter template from scratch using the example of a simple Kivy app and make use of advanced features such as post-gen hooks, copy-without-render and templates in context values.

Finally, I will recommend resources on how to follow up on this talk and how to get in touch in case of any queries.

GitHub: [https://github.com/hackebrot][1]
Twitter: [https://twitter.com/hackebrot][2]
Blog: [http://www.hackebrot.de/][3]

[1]: https://github.com/hackebrot
[2]: https://twitter.com/hackebrot
[3]: http://www.hackebrot.de/

Watch

Patrick Mühlbauer - Building nice command line interfaces - a look beyond the stdlib

Patrick Mühlbauer - Building nice command line interfaces - a look beyond the stdlib [EuroPython 2015] [22 July 2015] [Bilbao, Euskadi, Spain] One of the problems programmers are most often faced with is the parsing and validation of command-line arguments. If you're new to Python or programming in general, you might start by parsing sys.argv. Or perhaps you might've already come across standard library solutions such as getopt, optparse or argparse in the official documentation. While these modules are probably preferable to parsing sys.argv yourself, you might wonder if there are more satisfactory solutions outside of the standard library. Well, yes there are! This talk will give you an overview of some popular alternatives to the standard library solutions (e.g. click, docopt and cliff), explain their basic concepts and differences and show how you can test your CLIs.
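As a baseline for the standard library approach the abstract mentions, here is a minimal `argparse` sketch. The program name `greet` and its options are hypothetical and chosen only for illustration. Passing the argument list explicitly instead of reading `sys.argv` directly is one simple way to make such a CLI testable.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Declare the arguments; argparse handles parsing, validation and --help."""
    parser = argparse.ArgumentParser(prog="greet", description="Print a greeting.")
    parser.add_argument("name", help="who to greet")
    parser.add_argument("--count", type=int, default=1, help="number of greetings")
    parser.add_argument("--shout", action="store_true", help="upper-case the output")
    return parser


def run(argv: List[str]) -> List[str]:
    """Parse an explicit argument list and return the lines to print."""
    args = build_parser().parse_args(argv)
    line = f"Hello, {args.name}!"
    if args.shout:
        line = line.upper()
    return [line] * args.count


# In a real script this would be run(sys.argv[1:]).
lines = run(["World", "--count", "2", "--shout"])
for line in lines:
    print(line)
```

Invalid input such as `--count x` makes `argparse` print a usage message and raise `SystemExit`, which a test can assert on.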

Watch

Ekaterina Tuzova - NumPy: vectorize your brain

Ekaterina Tuzova - NumPy: vectorize your brain [EuroPython 2015] [23 July 2015] [Bilbao, Euskadi, Spain] NumPy is the fundamental Python package for scientific computing. However, being efficient with NumPy might require slightly changing how you write Python code. I’m going to show you the basic idioms essential for fast numerical computations in Python with NumPy. We'll see why Python loops are slow and why vectorizing these operations with NumPy can often be good. Topics covered in this talk will be array creation, broadcasting, universal functions, aggregations, slicing and indexing. Even if you're not using NumPy you'll benefit from this talk.
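The topics listed in the abstract can be sketched in a few lines. This is an illustrative example with made-up data rather than material from the talk. It touches array creation, a universal-function expression replacing a Python loop, broadcasting, aggregation, and slicing and indexing.

```python
import numpy as np

# Array creation: five floats, 1.0 through 5.0.
values = np.arange(1, 6, dtype=np.float64)

# Element-by-element Python loop ...
loop_result = [v * 2.0 + 1.0 for v in values]

# ... and the vectorized equivalent: the arithmetic is applied to the
# whole array at once by NumPy's universal functions.
vector_result = values * 2.0 + 1.0

# Broadcasting: a (3, 1) column combined with a (4,) row gives a (3, 4) table.
column = np.arange(3).reshape(3, 1)
row = np.arange(4)
table = column * 10 + row

# Aggregation along an axis: one sum per row.
row_sums = table.sum(axis=1)

# Slicing and boolean indexing.
first_column = table[:, 0]
evens = table[table % 2 == 0]
```

The vectorized expression produces the same numbers as the loop, but the iteration happens inside NumPy's compiled code instead of the Python interpreter.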

Watch

Jean-Philippe Caissy - Static type-checking is dead, long live static type-checking in Python!

Jean-Philippe Caissy - Static type-checking is dead, long live static type-checking in Python! [EuroPython 2015] [24 July 2015] [Bilbao, Euskadi, Spain] A few months ago, Guido unveiled PEP 484, which was highlighted at PyCon 2015 as a keynote presentation. This proposal would introduce type hints for Python 3.5. While the debate is still roaring and without taking a side, I believe that there is much to learn from static type-checking systems. The purpose of this talk is to introduce ways to harness the power that comes with static types inside a dynamically typed language such as Python. The talk will go over what exactly a static type system is, and what kind of problem it tries to solve. We will also review Guido's proposal of type hinting, and what it could mean to you. Finally, I will present a few libraries that are available, such as Hypothesis and various QuickCheck-inspired libraries that try to build more robust tests, how they achieve it and their limitations. Throughout the talk, a lot of examples will be used to fully illustrate the ideas being explained. At the end of this talk, you should have a better understanding of the wonderful world of type systems, and what it really means to you. It should help you decide whether using type hints will be helpful to you and also whether an external library trying to fuzz your tests has its place inside your project.
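To make the PEP 484 part concrete, here is a small sketch of type hints using the `typing` module. The functions `mean` and `word_lengths` are hypothetical examples and not taken from the talk. The hints are stored as metadata and are not enforced when the code runs. Catching mismatches is the job of a separate static checker such as mypy.

```python
from typing import Dict, List, Optional


def mean(values: List[float]) -> Optional[float]:
    """Return the arithmetic mean, or None for an empty list."""
    if not values:
        return None
    return sum(values) / len(values)


def word_lengths(words: List[str]) -> Dict[str, int]:
    """Map each word to its length."""
    return {word: len(word) for word in words}


average = mean([1.0, 2.0, 3.0])
lengths = word_lengths(["type", "hints"])

# The annotations are ordinary metadata attached to the function.
hints = mean.__annotations__

# Passing a str where List[str] is declared still runs: the string is
# iterated character by character. A static checker would flag this call.
unchecked = word_lengths("ab")
```

The last call shows the trade-off: the program runs and returns a result, but probably not the intended one, and only static analysis would report it before execution.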

Watch

Alejandro Garcia - Python Gamedev MLG

Alejandro Garcia - Python Gamedev MLG [EuroPython 2015] [22 July 2015] [Bilbao, Euskadi, Spain] An overview of the currently available Python game development libraries and frameworks and how Python is currently being used in the videogame industry. Presentation of Kobra, a modern open source Python game development framework with ECS (Entity Component System) architecture and C++ bindings.

Watch

Fabrizio Romano - TDD is not about tests!

Fabrizio Romano - TDD is not about tests! [EuroPython 2015] [21 July 2015] [Bilbao, Euskadi, Spain] TDD is not about tests! Well, actually, it’s not about writing tests, or writing them before the code. This talk will show you how to use tests to really drive development by transforming business requirements into tests, and allowing your code to come as their natural consequence. Too often this key aspect is neglected and the result is that tests and code are somehow “disconnected”. The code is not as short and efficient as it could be, and the tests are not as effective. Refactoring is not always easy, and over time all sorts of issues start to come to the surface. However, we will show that when TDD is done properly, tests and code merge beautifully into an organic whole that fulfills the business requirements, and provides all sorts of advantages: your code is minimal, easy to amend and extend, readable, clean. Your tests will be effective, short and focused, and allow for light-hearted refactoring and excellent coverage. We will provide enough information and examples to spark the curiosity of the novice, and satisfy the need of a deeper insight for the intermediate, and help you immediately benefit from this transformative technique that is still often underestimated and misunderstood. Slides: [http://slides.com/gianchub/ep2015-tdd#/][1]

[1]: http://slides.com/gianchub/ep2015-tdd#/
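The idea of turning a business requirement into tests and letting the code follow can be sketched with `unittest`. The requirement used here ("orders of 100 or more get 10% off") and the function `order_total` are hypothetical and not taken from the talk or its slides. The tests appear first because they would be written first; the function below them is the smallest implementation that satisfies them.

```python
import unittest


# Step 1: the requirement, expressed as tests before any implementation exists.
class OrderTotalTests(unittest.TestCase):
    def test_small_orders_pay_full_price(self):
        self.assertAlmostEqual(order_total(99), 99)

    def test_orders_of_100_or_more_get_ten_percent_off(self):
        self.assertAlmostEqual(order_total(100), 90.0)
        self.assertAlmostEqual(order_total(200), 180.0)


# Step 2: the minimal code that makes the tests pass, and nothing more.
def order_total(amount: float) -> float:
    if amount >= 100:
        return amount * 0.9
    return amount


# Run the suite programmatically so the outcome can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(OrderTotalTests)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

Each test name reads as a statement of the requirement, so a failing test points directly at the business rule that is no longer met.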

Watch

Vincent Warmerdam - PySpark and Warcraft Data

Vincent Warmerdam - PySpark and Warcraft Data [EuroPython 2015] [21 July 2015] [Bilbao, Euskadi, Spain] In this talk I will describe how to use Apache Spark (PySpark) with some data from the World of Warcraft API from an IPython notebook. Spark is interesting because it speeds up iterative processes on your Hadoop cluster as well as your local machine. I will give basic benchmarks (comparing it to numpy/pandas/scikit), explain the architecture/performance behind the technology and will give a live demo on how I used Spark to analyse an interesting dataset. I'll explain why you might want to use Spark and I'll also explain when you don't want to use it. The dataset I will be using is a 22 GB JSON blob containing auction house data from all World of Warcraft servers over a period of time. The goal of the analysis will be to determine when and if basic economics still applies in a massively online game. I will assume that everyone knows what the IPython notebook is and I will assume a basic knowledge of numpy/pandas but nothing fancy. The dataset has been chosen such that people who are less interested in Spark can still enjoy the analysis part of the talk. If you know very little about data science but love video games, then you should like this talk.

Watch