List of videos

Chesnay Schepler - Big Data Analytics with Python using Stratosphere

Chesnay Schepler - Big Data Analytics with Python using Stratosphere [EuroPython 2014] [25 July 2014] Stratosphere is a distributed platform for advanced big data analytics. It features a rich set of operators, advanced, iterative data flows, an efficient runtime, and automatic program optimization. We present Stratosphere's new Python programming interface. It allows Python developers to easily get their hands on Big Data. ----- [Stratosphere](http://stratosphere.eu/) is implemented in Java. In 2013 we introduced support for writing Stratosphere programs in Scala. Since Scala also runs on the JVM, the language integration was easy. In late 2013, we started to develop a generic language binding framework for Stratosphere to support non-JVM languages such as Python, JavaScript, and Ruby, but also compiled languages such as C++. The language binding framework uses [Google’s Protocol Buffers](https://code.google.com/p/protobuf/) for efficient data serialization and transportation between the languages. Since many “Data Scientists” and machine learning experts use Python on a daily basis, we decided to use Python as the reference implementation for Stratosphere’s language binding feature. Our talk at EuroPython 2014 will present how Python developers can leverage the Stratosphere platform to solve their big data problems. We introduce the most important concepts of Stratosphere, such as the operators, connectors to data sources, data flows, the compiler, iterative algorithms and more. Stratosphere is a mature, next-generation big-data analytics platform developed by a vibrant [open-source community](https://github.com/stratosphere/stratosphere). The system is available under the Apache 2.0 license. The project started in 2009 as a joint research project of multiple universities in the Berlin area (Technische Universität, Humboldt Universität and Hasso-Plattner Institut).
Nowadays it is an award-winning system that has gained worldwide attention in both research and industry. A note to the program committee: As mentioned, the development of the Python language binding of Stratosphere started a few months ago; therefore, the code is not yet in the main development branch. However, we are already able to execute the “Hello World” of big data, the “Word Count” example, using the Python interface. See this example in the development branch: https://github.com/filiphaase/stratosphere/blob/langbinding/stratosphere-addons/stratosphere-language-binding/src/main/python/eu/stratosphere/language/binding/wordcountexample/WordCountPlan.py Please contact us if you have any questions!
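
The abstract calls Word Count the "Hello World" of big data. As a rough illustration of the data flow behind it, here is a minimal single-process sketch in plain Python. It does not use the Stratosphere Python interface; the function names (`flat_map`, `group_by_key`, `reduce_counts`, `word_count`) are hypothetical and only mirror the usual map, group, reduce stages.

```python
from collections import defaultdict


def flat_map(lines):
    """Map stage: emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.lower().split():
            yield (word, 1)


def group_by_key(pairs):
    """Group stage: collect all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups):
    """Reduce stage: sum the ones collected for each word."""
    return {word: sum(ones) for word, ones in groups.items()}


def word_count(lines):
    """Chain the three stages into one small data flow."""
    return reduce_counts(group_by_key(flat_map(lines)))


if __name__ == "__main__":
    print(word_count(["big data", "Big analytics"]))
```

In a distributed system each stage would run in parallel across machines; here everything runs in one process purely to show the shape of the computation.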

Watch

Piotr Przymus - Everything You Always Wanted to Know About Memory in Python But Were Afraid to Ask

Piotr Przymus - Everything You Always Wanted to Know About Memory in Python But Were Afraid to Ask [EuroPython 2014] [25 July 2014] Have you ever wondered what happens to all the precious RAM after running your 'simple' CPython code? Prepare yourself for a short introduction to CPython memory management! This presentation will try to answer some memory-related questions you always wondered about. It will also discuss basic memory profiling tools and techniques. ----- This talk will cover the basics of CPython memory usage. It will start with fundamentals such as the representation of objects and data structures. Then advanced memory management aspects, such as sharing, segmentation, preallocation or caching, will be discussed. Finally, memory profiling tools will be presented.
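
Two of the topics named above, preallocation and caching, can be observed from the standard library alone. The sketch below assumes CPython (both behaviours are implementation details, not language guarantees); `list_growth_steps` and `small_ints_are_shared` are hypothetical helper names, not taken from the talk.

```python
import sys


def list_growth_steps(n):
    """Return the distinct byte sizes a list reports while growing to n items.

    CPython over-allocates on append, so the size changes far less often
    than once per appended element.
    """
    items = []
    sizes = [sys.getsizeof(items)]
    for i in range(n):
        items.append(i)
        size = sys.getsizeof(items)
        if size != sizes[-1]:
            sizes.append(size)
    return sizes


def small_ints_are_shared(value):
    """Check whether two separately created ints are the same object.

    CPython caches small integers, so both conversions return the
    same cached object for small values.
    """
    return int(str(value)) is int(str(value))


if __name__ == "__main__":
    print(list_growth_steps(100))
    print(small_ints_are_shared(7))
```

Note that `sys.getsizeof` reports only the list object itself, not the objects it refers to.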

Watch

Nicholas Tollervey/holger krekel - The Return of "The Return of Peer to Peer Computing".

Nicholas Tollervey/holger krekel - The Return of "The Return of Peer to Peer Computing". [EuroPython 2014] [24 July 2014] At last year's EuroPython Holger Krekel gave a keynote called "The Return of Peer to Peer Computing". He described how developers, in light of the Snowden surveillance revelations, ought to learn about and build decentralized peer-to-peer systems with strong cryptography. This talk introduces, describes and demonstrates ideas, concepts and code that a group of Pythonistas have been working on since Holger's keynote. We asked ourselves two questions: what are the fundamental elements / abstractions of a peer-to-peer application and, given a reasonable answer to the first question, what can we build? We will present work done so far, discuss the sorts of application that might be written and explore how peer-to-peer technology could be both attractive and viable from an economic point of view. ----- This talk introduces, describes and demonstrates concepts and code created during sprints and via online collaboration by a distributed group of Pythonistas under the working title p4p2p (http://p4p2p.net). We asked ourselves: as frameworks such as Zope/Plone, Django, Pyramid or Flask are to web development, what would the equivalent sort of framework look like for peer-to-peer application development? We've tackled several different technical issues: remote execution of code among peers, distributed hash tables as a mechanism for peer discovery and data storage, various cryptographic requirements and the nuts and bolts of punching holes in firewalls. Work is ongoing (we have another sprint at the end of March) and the final content of the talk will depend on progress made. However, we expect to touch upon the following (subject to the caveat above):

* What is the problem we're trying to solve?
* Why P2P?
* The story of how we ended up asking the questions outlined in the abstract.
* What we've done to address these questions.
* An exploration of the sorts of application that could be built using P2P.
* A call for helpers and collaboration.

Happy to answer any questions!
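
To make "distributed hash tables as a mechanism for peer discovery and data storage" more concrete, here is a toy, in-process sketch. It is not p4p2p code. The XOR distance metric (in the style of Kademlia), the SHA-1 based IDs, and the names `node_id`, `closest_peers` and `ToyDHT` are all assumptions made for illustration; a real DHT would also need networking, routing tables and handling of peers that join or leave.

```python
import hashlib


def node_id(name):
    """Derive a 160-bit integer ID from a string (SHA-1 is an illustrative choice)."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")


def closest_peers(key, peers, k=2):
    """Return the k peer names whose IDs are nearest to the key by XOR distance."""
    target = node_id(key)
    return sorted(peers, key=lambda peer: node_id(peer) ^ target)[:k]


class ToyDHT:
    """In-process stand-in for a DHT: every peer is just a dict."""

    def __init__(self, peer_names, replicas=2):
        self.stores = {name: {} for name in peer_names}
        self.replicas = replicas

    def put(self, key, value):
        """Store the value on the peers closest to the key."""
        for peer in closest_peers(key, self.stores, self.replicas):
            self.stores[peer][key] = value

    def get(self, key):
        """Ask the peers closest to the key; return None if none has it."""
        for peer in closest_peers(key, self.stores, self.replicas):
            if key in self.stores[peer]:
                return self.stores[peer][key]
        return None


if __name__ == "__main__":
    dht = ToyDHT(["alice", "bob", "carol", "dave"])
    dht.put("greeting", "hello")
    print(dht.get("greeting"))
```

Because every participant computes the same distances, any peer can work out which peers are responsible for a key without a central directory.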

Watch

Austin Bingham - Python refactoring with Rope and Traad

Austin Bingham - Python refactoring with Rope and Traad [EuroPython 2014] [23 July 2014] Rope is a powerful Python refactoring library. Traad (Norwegian for “thread”) is a tool which makes it simpler to integrate rope into IDEs via a simple HTTP API. In this session we’ll look at how traad and rope work together and how traad integrates with at least one popular editor. ----- Python is a modern, dynamic language which is growing in popularity, but tool support for it is sometimes lacking or only available in specific environments. For refactoring and other common IDE functions, however, the powerful open-source rope library provides a set of tools which are designed to be integrated into almost any programming environment. Rope supports most common refactorings, such as renaming and method extraction, but also more Python-specific refactorings, such as import organization. Rope’s underlying code analysis engine also allows it to do things like locating method definitions and generating auto-completion suggestions. While rope is designed to be used from many environments, it’s not always easy or ideal to integrate rope directly into other programs. Traad (Norwegian for “thread”) is another open-source project that addresses this problem by wrapping rope into a simple client-server model so that client programs (IDEs, editors, etc.) can perform refactorings without needing to embed rope directly. This simplifies dependencies, makes clients more robust in the face of errors, eases traad client development, and even allows clients to do things like switch between Python 2 and 3 refactoring in the same session. In this session we’ll look at how rope operates, and we’ll see how traad wraps it to provide an easier integration interface. The audience will get enough information to start using rope themselves, either directly or via traad, and they’ll see how to use traad for integrating rope into their own environments.
More generally, we’ll look at why client-server refactoring tools might be preferable to the more standard approach of direct embedding.
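
The client-server split can be illustrated without rope or traad. The sketch below is neither rope's algorithm nor traad's HTTP protocol: `rename_identifier` and `handle_request` are hypothetical names, the JSON message shape is invented, and the rename is purely token-based and scope-unaware (a real refactoring library analyses scopes). It only shows why an editor can stay thin: it sends a request and applies the returned text.

```python
import io
import json
import tokenize


def rename_identifier(source, old, new):
    """Rename NAME tokens equal to `old`, leaving strings and comments alone."""
    lines = source.splitlines(keepends=True)
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    # Record the (row, column) of every matching identifier token.
    hits = [tok.start for tok in tokens
            if tok.type == tokenize.NAME and tok.string == old]
    # Patch from the end so earlier columns on the same line stay valid.
    for row, col in sorted(hits, reverse=True):
        line = lines[row - 1]
        lines[row - 1] = line[:col] + new + line[col + len(old):]
    return "".join(lines)


def handle_request(body):
    """Hypothetical server-side handler: JSON request in, JSON response out."""
    request = json.loads(body)
    result = rename_identifier(request["source"], request["old"], request["new"])
    return json.dumps({"source": result})


if __name__ == "__main__":
    # What an editor plugin (the client) might send and receive.
    payload = json.dumps({"source": 'x = 1\nprint("x", x)  # x\n',
                          "old": "x", "new": "y"})
    print(json.loads(handle_request(payload))["source"])
```

Since the handler only exchanges JSON text, the same server could in principle be reached from any editor, whatever language its plugins are written in.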

Watch

Nicola Iarocci - Eve - REST APIs for Humans™

Nicola Iarocci - Eve - REST APIs for Humans™ [EuroPython 2014] [24 July 2014] Powered by Flask, Redis, MongoDB and good intentions, the Eve REST API framework lets you effortlessly build and deploy highly customizable, fully featured RESTful Web Services. The talk will introduce the project and its community, recount why and how it's being developed, and show the road ahead. ----- Nowadays everyone has data stored somewhere and needs to expose it through a Web API, possibly a RESTful one. [Eve](http://python-eve.org) is the BSD-licensed, Flask-powered RESTful application and framework that lets you effortlessly build and deploy highly customizable, fully featured RESTful Web Services. Eve features a robust, feature-rich, REST-centered API implementation. MongoDB support comes out of the box and community-driven efforts to deliver ElasticSearch and SQLAlchemy data layers are ongoing. Eve's approach is such that you only need to configure your API settings and behaviour, plug in your datasource, and you’re good to go. Features such as Pagination, Sorting, Conditional Requests, Concurrency Control, Validation, HATEOAS, JSON and XML rendering, Projections, Customisable Endpoints, and Rate Limiting are all included. Advanced features such as custom Authentication and Authorisation, Custom Validation, and Embedded Resource Serialisation are also easily available. In my talk I will introduce the project and its community, recount why and how it's being developed, show the source code, illustrate key concepts and show the road ahead.
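
The "configure your API settings" step can be pictured as a plain Python settings module. The sketch below follows the general shape of Eve's settings file as documented in its quickstart; the resource name `people`, the field names, and the database name `apitest` are invented for illustration, and the exact set of supported keys should be checked against the Eve documentation for the version in use.

```python
# settings.py: a hypothetical Eve configuration sketch.

# Where the MongoDB data layer should connect (illustrative values).
MONGO_HOST = "localhost"
MONGO_PORT = 27017
MONGO_DBNAME = "apitest"

# HTTP methods allowed on collections and on single items.
RESOURCE_METHODS = ["GET", "POST"]
ITEM_METHODS = ["GET", "PATCH", "DELETE"]

# Validation rules for one resource; field names are made up for this example.
people_schema = {
    "firstname": {"type": "string", "minlength": 1, "maxlength": 10},
    "lastname": {
        "type": "string",
        "minlength": 1,
        "maxlength": 15,
        "required": True,
        "unique": True,
    },
}

# Each key in DOMAIN becomes an API endpoint, here /people.
DOMAIN = {"people": {"schema": people_schema}}
```

With a module like this next to it, a launcher script would typically just create the application with `from eve import Eve` and `Eve().run()`, assuming Eve and a reachable MongoDB instance are available.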

Watch

Erik van Zijst - The inner guts of Bitbucket

Erik van Zijst - The inner guts of Bitbucket [EuroPython 2014] [24 July 2014] Today Bitbucket is more than 30 times bigger than at the time of acquisition almost 4 years ago and serves repositories to over a million developers. This talk lays out its current architecture in great detail, from Gunicorn and Django to Celery and HAProxy to NFS. ----- This talk is about Bitbucket's architecture. Leaving no stone unturned, I'll be covering the entire infrastructure: every component, from web servers to message brokers and load balancing to managing hundreds of terabytes of data. Since its inception in 2008, Bitbucket has grown from a standard, modest Django app into a large, complex stack that, while still based around Django, has expanded into many more components. Today Bitbucket is more than 30 times bigger than at the time of acquisition almost 4 years ago. It serves Git and Mercurial repos to over a million users and is growing faster now than ever before. Our current architecture and infrastructure were shaped by rapid growth, which has resulted in a large, mostly horizontally scalable system. What has not changed is that it's still nearly all Python-based and could serve as inspiration or validation for other community members responsible for rapidly scaling their apps. This talk will lay out the entire architecture and motivate our technology choices, from Gunicorn to Celery and HAProxy to NFS.

Watch

Dmitry Trofimov - Python Debugger Uncovered

Dmitry Trofimov - Python Debugger Uncovered [EuroPython 2014] [24 July 2014] This talk will explain how to implement a debugger for Python. We'll start with setting a simple trace function, then look at how it is implemented in modern IDEs like PyCharm and PyDev. Then we go further into the details and uncover the tricks used to implement some cool features like exception handling and multiprocess debugging. ----- The presentation describes how to implement a debugger for Python and has 4 parts:

* Tracing Python code: explains how to use a trace function.
* Debugger Architecture: explains which parts a modern full-fledged debugger consists of.
* A Bit of Details: explains how to make the code work for all Python versions and implementations, survive gevent monkey-patching, etc.
* Cool Features: explains how to implement exception handling and multiprocess debugging.
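
The "simple trace function" the abstract starts from is installed with the standard library's `sys.settrace`. The sketch below is a minimal illustration, not code from PyCharm or PyDev; `trace_lines` and `sample` are hypothetical names. It records which lines of one function execute, which is the raw material a debugger uses for breakpoints and stepping.

```python
import sys


def trace_lines(func, *args):
    """Run func(*args) and return (result, executed line offsets).

    Offsets are relative to the function's `def` line, so 1 is the
    first line of the body.
    """
    executed = []
    code = func.__code__

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # do not trace frames of other functions
        if event == "line":
            executed.append(frame.f_lineno - code.co_firstlineno)
        return tracer  # keep receiving events for this frame

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)  # always restore the earlier trace function
    return result, executed


def sample(n):
    total = 0
    for i in range(n):
        total += i
    return total


if __name__ == "__main__":
    print(trace_lines(sample, 2))
```

A real debugger would compare each reported line against its breakpoint table and pause there instead of just appending to a list.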

Watch

Roberto Polli - Statistics 101 for System Administrators

Roberto Polli - Statistics 101 for System Administrators [EuroPython 2014] [22 July 2014] Python allows every sysadmin to run (and learn) basic statistics on system data, replacing sed, awk, bc and gnuplot with a unique, reusable and interactive framework. The talk is a case study where Python allowed us to highlight some network performance points in minutes using itertools, scipy and matplotlib. The presentation includes code snippets and a brief plot discussion.

-----

# Statistics 101 for System Administrators

## Agenda

* A latency issue
* Data distribution
* 30 seconds correlation with pearsonr
* Combining data
* Plotting and the power of color

## A use case

- Network latency issues
- Correlate latency with other events

## First statistics

- we created our parsing library
- [using various recipes](http://chimera.labs.oreilly.com/books/1230000000393/ch06.html)
- Having the data in a dict like

> table = {
>     'time': [ 1,2,3, ..],
>     'elapsed': [ 0.12, 12.43, ..],
>     'error': [ 2, 0, ..],
>     'size': [123,3223, ..],
>     'peers': [2313, 2303, ..],
> }

- It's easy to get max, min and mean

> # stats: any module providing mean(), e.g. "import statistics as stats" (an assumption; the original does not say)
> print([(k, max(v), min(v), stats.mean(v)) for k, v in table.items()])

## Distribution

- A distribution shows event frequency

> from matplotlib import pyplot
> pyplot.hist(table['elapsed'])

- Time and Size distributions

## (Linear) Correlation

- What's correlation
- What's not correlation
- pearsonr and probability
- catch for linear correlation

> from random import randint
> from scipy.stats import pearsonr
> a, b = range(0, 10), range(0, 20, 2)
> c = [randint(0, 10) for x in a]
> pearsonr(a, b), pearsonr(a, c)
> # output shown in the original: (1.0, 0.0), (0.43, 0.2); the second pair varies because c is random

## Combinations

- using itertools.combinations
- netfishing correlation

> from itertools import combinations
> for f1, f2 in combinations(table, 2):
>     r, p_value = pearsonr(table[f1], table[f2])
>     print("the correlation between %s and %s is: %s" % (f1, f2, r))
>     print("the probability of a given distribution (see manual) is: %s" % p_value)

## Plot always

- pearsonr finds *only* linear correlation
- our eyes work better :P
- so...plot always!
- color is the third dimension of a plot!

> from matplotlib.pyplot import scatter, title, xlabel, ylabel, legend
> from matplotlib.pyplot import savefig, close as closefig
>
> for f1, f2 in combinations(table, 2):
>     scatter(table[f1], table[f2], label="%s_%s" % (f1, f2))
>     # add legend and other labels
>     r, p = pearsonr(table[f1], table[f2])
>     title("Correlation: %s v %s, %s" % (f1, f2, r))
>     xlabel(f1), ylabel(f2)
>     legend(loc='upper left')  # show the legend in a suitable corner
>     savefig(f1 + "_" + f2 + ".png")
>     closefig()

## Wrap Up!

- do not use pearsonr to *exclude* relation between events
- plots may serve better
- a scatter plot can show a system's throughput and exclude correlation between fields A and fields B
- continue collecting results
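
The warning that pearsonr finds only linear correlation can be checked with a few lines of standard-library Python. The sketch below reimplements the textbook Pearson coefficient rather than calling scipy; `pearson_r` is a hypothetical helper and the data is synthetic, chosen so that one series depends on `xs` linearly and the other depends on it completely but not linearly.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Synthetic data: both series are fully determined by xs.
xs = list(range(-5, 6))
linear = [2 * x + 1 for x in xs]   # straight line
parabola = [x * x for x in xs]     # symmetric curve

if __name__ == "__main__":
    print(pearson_r(xs, linear))    # close to 1: linear dependence is detected
    print(pearson_r(xs, parabola))  # close to 0: the dependence goes unnoticed
```

The parabola is a perfect function of `xs`, yet its coefficient is zero, which is why a scatter plot remains the safer first check.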

Watch

Fredrik Håård - Jython in practice

Fredrik Håård - Jython in practice [EuroPython 2014] [24 July 2014] A lot of people have heard of Jython, some have tried it, but it seems few have actually deployed it in a corporate environment. In this talk I'll share my experiences in using Jython as a testbed for Java applications, for rapid prototyping in Java desktop and web environments, and for embedding scripting capabilities in Java products. ----- Not everyone gets paid to work with Python all the time, but if you find yourself in a Java project, there is a good chance you could benefit from Python without throwing out the Java stack. Using Jython, you can do rapid prototyping without the long edit-compile-test cycles normally associated with large Java projects, whether on the web or the desktop. And when testing an application might become a nightmare of scaffolding in Java, a little Jython may be just what you need to run your tests smoothly. At the end of this talk, I will put on my politician's hat and bring up the best - and worst - arguments to use to get permission to use Jython in a corporate environment.

Watch