List of videos

Thomas Wiecki - Probabilistic Programming in Python

Thomas Wiecki - Probabilistic Programming in Python [EuroPython 2014] [24 July 2014] Probabilistic Programming allows flexible specification of statistical models to gain insight from data. The high interpretability and the ease with which different sources can be combined have huge value for Data Science. PyMC3 features next generation sampling algorithms, an intuitive model specification syntax, and just-in-time compilation for speed, to allow estimation of large-scale probabilistic models. ----- Probabilistic Programming allows flexible specification of statistical models to gain insight from data. Estimation of best fitting parameter values, as well as uncertainty in these estimations, can be automated by sampling algorithms like Markov chain Monte Carlo (MCMC). The high interpretability and flexibility of this approach has led to a huge paradigm shift in scientific fields ranging from Cognitive Science to Data Science and Quantitative Finance. PyMC3 is a new Python module that features next generation sampling algorithms and an intuitive model specification syntax. The whole code base is written in pure Python and just-in-time compiled via Theano for speed. In this talk I will provide an intuitive introduction to Bayesian statistics and how probabilistic models can be specified and estimated using PyMC3.
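
To make the idea of sampling-based estimation concrete, here is a minimal sketch of a random-walk Metropolis sampler (one simple MCMC algorithm) in plain Python. It is not PyMC3 code and does not use PyMC3's API or its samplers. The model (a Normal prior on an unknown mean `mu`, Normal noise with known spread), the data, and all function names are assumptions chosen for illustration.

```python
import math
import random


def log_posterior(mu, data, prior_sd=10.0, noise_sd=1.0):
    """Unnormalised log posterior of mu.

    Assumed model: mu ~ Normal(0, prior_sd), each x ~ Normal(mu, noise_sd).
    """
    log_prior = -0.5 * (mu / prior_sd) ** 2
    log_likelihood = sum(-0.5 * ((x - mu) / noise_sd) ** 2 for x in data)
    return log_prior + log_likelihood


def metropolis(data, n_samples=5000, step=0.5, seed=0):
    """Draw samples of mu with a random-walk Metropolis sampler."""
    rng = random.Random(seed)
    mu = 0.0
    current = log_posterior(mu, data)
    samples = []
    for _ in range(n_samples):
        proposal = mu + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, data)
        # Accept with probability min(1, exp(candidate - current)).
        # 1.0 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < candidate - current:
            mu, current = proposal, candidate
        samples.append(mu)
    return samples


def summarize(samples, burn_in=1000):
    """Return the mean and standard deviation of the samples after burn-in."""
    kept = samples[burn_in:]
    mean = sum(kept) / len(kept)
    variance = sum((s - mean) ** 2 for s in kept) / (len(kept) - 1)
    return mean, math.sqrt(variance)
```

Calling `summarize(metropolis([4.8, 5.1, 5.3, 4.9, 5.2]))` yields both a best estimate of `mu` (close to the data average) and a spread that quantifies the uncertainty in that estimate, which is the pair of outputs the abstract describes.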

Watch

Simon Cross - Conversing with people living in poverty

Simon Cross - Conversing with people living in poverty [EuroPython 2014] [24 July 2014] Vumi is a text messaging system designed to reach out to those in poverty on a massive scale via their mobile phones. It's written in Python using Twisted. This talk is about how and why we built it and how you can join us in making the world a better place.

-----

43% of the world's population live on less than €1.5 per day. The United Nations defines poverty as a "lack of basic capacity to participate effectively in society". While we often think of the poor as lacking primarily food and shelter, the UN definition highlights their isolation. They have the least access to society's knowledge and services and the most difficulty making themselves and their needs heard in our democracies.

While smart phones and an exploding ability to collect and process information are transforming our access to knowledge and the way we organize and participate in our societies, those living in poverty have largely been left out. This has to change.

Basic mobile phones present an opportunity to effect this change [3]. Only three countries in the world have fewer than 65 mobile phones per 100 people [4]. The majority of these phones are not Android phones or iPhones, but they do nevertheless provide a means of communication -- via voice calls, SMSes [6], USSD [7] and instant messaging. By comparison, 25 countries have less than 5% internet penetration [5].

Vumi [1] is an open source text messaging system designed to reach out to those in poverty on a massive scale via their mobile phones. It's written in Python using Twisted.

Vumi is already used to:

* provide Wikipedia access over USSD and SMS in Kenya [8].
* register a million voters in Libya [10].
* deliver health information to mothers in South Africa [9].
* prevent election violence in Kenya [11].

This talk will cover:

* a brief overview of mobile networking and cellphone use in Africa
* why we built Vumi
* the challenges of operating in unreliable environments
* an overview of Vumi's features and architecture
* how you can help!

Vumi features some cutting edge design choices:

* horizontally scalable Twisted processes communicating using RabbitMQ.
* declarative data models backed by Riak.
* sharing common data models between Django and Twisted.
* sandboxing hosted Javascript code from Python.

Overview of challenges Vumi addresses:

*Scalability*: Vumi needs to support both small scale applications (demos, pilot projects, applications tailored for a particular community) and large ones (things that everyone within a country might use). We address this using Twisted workers that exchange messages via RabbitMQ and store data in Riak. Having projects share RabbitMQ and Riak instances significantly reduces the overhead for small projects (e.g. it's not cost effective to launch the recommended minimum of 5 Riak servers for a small project).

*Barriers to entry*: Often the people with good ideas lack one of the many things needed to run a production system themselves, e.g. capital, time, stable infrastructure. We address this by providing a hosted Vumi instance that runs sandboxed Javascript applications. All the application author needs is their idea and the ability to write Javascript and upload it to our servers. The target audience here is African entrepreneurs at incubator spaces like iHub (Nairobi), kLab (Kigali), BongoHive (Lusaka) and JoziHub (Johannesburg).

*Unreliable third-party systems*: It's one thing for parts of one's own system to go down; it's another for crucial third-party systems to go down. Vumi takes an SMTP-like approach to solving this and uses persistent queues so that messages can back up in the queue while third-party systems are down and be processed when they become available again. We also feed back information on whether third-party messaging systems have accepted or rejected messages to the application that initiated them.

Vumi is developed by the Praekelt Foundation [2] (and individual contributors!).

[1]: <http://vumi.org/> "Vumi"
[2]: <http://praekeltfoundation.org/> "Praekelt Foundation"
[3]: <http://www.youtube.com/watch?v=0bXjgx4J0C4#t=20> "Spotlight on Africa"
[4]: <http://en.wikipedia.org/wiki/List_of_countries_by_number_of_mobile_phones_in_use>
[5]: <http://en.wikipedia.org/wiki/List_of_countries_by_number_of_Internet_users>
[6]: <http://en.wikipedia.org/wiki/Short_Message_Service>
[7]: <http://en.wikipedia.org/wiki/Unstructured_Supplementary_Service_Data>
[8]: <http://blog.praekeltfoundation.org/post/65981723628/wikipedia-zero-over-text-with-praekelt-foundation>
[9]: <http://blog.praekeltfoundation.org/post/65042080515/mama-launches-healthy-family-nutrition-programme>
[10]: <http://www.libyaherald.com/2014/01/01/over-one-million-register-for-constitutional-elections-on-final-sms-registration-day
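
The store-and-forward idea behind the "unreliable third-party systems" point can be sketched in a few lines of Python. This is not Vumi's code: Vumi uses persistent RabbitMQ queues and Twisted workers, whereas the in-memory `deque` below is only a stand-in, and the class and exception names are hypothetical.

```python
from collections import deque


class GatewayDown(Exception):
    """Raised by a (hypothetical) third-party gateway that is unavailable."""


class FlakyGateway(object):
    """Toy gateway that only accepts messages while `up` is True."""

    def __init__(self):
        self.up = False
        self.received = []

    def __call__(self, message):
        if not self.up:
            raise GatewayDown()
        self.received.append(message)


class StoreAndForward(object):
    """Queue messages and deliver them in order once the gateway accepts them."""

    def __init__(self, send):
        self._send = send        # callable that delivers one message
        self._pending = deque()  # stand-in for a persistent queue

    def submit(self, message):
        self._pending.append(message)

    def backlog(self):
        return len(self._pending)

    def flush(self):
        """Try to deliver queued messages; stop at the first failure.

        Returns the messages delivered during this call, so the caller
        can report acceptance back to the originating application.
        """
        delivered = []
        while self._pending:
            message = self._pending[0]
            try:
                self._send(message)
            except GatewayDown:
                break  # leave the message queued for a later attempt
            self._pending.popleft()
            delivered.append(message)
        return delivered
```

While the gateway is down, `flush()` delivers nothing and the backlog grows; once it is back, a later `flush()` drains the queue in the original order.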

Watch

Magdalena Rother - How to become a software developer in science?

Magdalena Rother - How to become a software developer in science? [EuroPython 2014] [24 July 2014] My path from 'Hello world' to software development was long and hard. The approach I learned during my research may help you to create high quality software and improve as a developer. The talk covers how you can benefit from your non-IT knowledge, atomize your project and how collaboration accelerates your learning.

-----

**Goal**: give practical tools for improving skills and software quality to people with a background other than IT.

Eight years ago, as a plant biologist, I knew almost nothing about programming. When I took a course in Python programming, I found myself so fascinated that it altered my entire career. I became a scientific software developer. It was long and hard work to get from the level of 'Hello world' to the world of software development. The talk will cover how to embrace a non-IT education as a strength, how and why to atomize programming tasks and the importance of doing side projects.

### 1. Embrace your background

Having domain specific knowledge from a field other than IT helps you to communicate with the team, the users and the group leader. It prevents misunderstandings and helps to define features better. A key step you can take is to systematically apply the precise domain specific language to the code, e.g. when naming objects, methods or functions. Another is to describe the underlying scientific process step by step as a Use Case and write it down in pseudocode.

### 2. Atomisation

Having a set of building blocks in your software helps to define responsibilities clearly. Smaller parts are easier to test, release and change. Modular design makes the software more flexible and avoids the Blob and Lava Flow Anti-Patterns. When using object oriented programming, a rule of thumb is that an object (in Python also a method) does only one thing. You can express this Single Responsibility Principle as a short sentence for each module. Another practical action is to introduce Design Patterns that help to decouple data and its internal representation. As a result, your software becomes more flexible.

### 3. Participating in side projects

Learning from others is a great opportunity to grow. Through side projects you gain a fresh perspective and learn about best practices in project management. You gain new ideas for improvement and become aware of difficulties in your own project. You can easily participate in a scientific project by adding a small feature, writing a test suite or providing a code review on a part of a program.

Summarizing, in scientific software development, using domain-specific knowledge, atomisation of software, and participation in side projects are three things that help to create high quality software and to continuously improve as a developer. The talk will address challenges in areas where science differs from the business world. It will present general solutions one might use for software developed in a scientific environment for research projects rather than discussing particular scientific packages.

### Qualifications

During my PhD I developed software for 3D RNA modeling (www.genesilico.pl/moderna/) that resulted in 7 published articles. I am coauthor on a paper on bioinformatic software development. Currently I am actively developing systems biology software in Python at the Humboldt University Berlin (www.rxncon.org).
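
As a small illustration of the first two points (domain language in names, one responsibility per function), here is a hedged Python sketch. The functions are invented for this example and are not taken from the speaker's software.

```python
def transcribe(dna):
    """Return the RNA transcript of a DNA coding strand (T becomes U)."""
    return dna.upper().replace("T", "U")


def gc_content(sequence):
    """Return the fraction of G and C bases in a nucleotide sequence."""
    sequence = sequence.upper()
    if not sequence:
        return 0.0
    gc_count = sequence.count("G") + sequence.count("C")
    return gc_count / float(len(sequence))
```

Each function does one thing that can be stated in a single sentence, and a biologist can read the names without knowing how the code works.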

Watch

Ralph Heinkel - Combining the powerful worlds of Python and R

Ralph Heinkel - Combining the powerful worlds of Python and R [EuroPython 2014] [25 July 2014] Although it is perhaps not very well known in the Python community, there exists a powerful statistical open-source ecosystem called R. Mostly used in scientific contexts, it provides lots of functionality for doing statistical analysis, generation of various kinds of plots and graphs, and much, much more. The triplet R, Rserve, and pyRserve allows a network bridge to be built from Python to R: R functions can then be called from Python as if they were implemented in Python, and even complete R scripts can be executed through this connection. ----- pyRserve is a small open source project originally developed to fulfill the needs of a German biotech company to do statistical analysis in a large Python-based Lab Information Management System (LIMS). In contrast to other R-related libraries like RPy, where Python and R run on the same host, pyRserve allows the distribution of complex operations and calculations over multiple R servers across the network. The aim of this talk is to show how easily Python can be connected to R, and to present a number of selected (simple) code examples which demonstrate the power of this setup.
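
A rough sketch of what such a bridge looks like from the Python side is shown below. It assumes the third-party `pyRserve` package is installed and that an Rserve process is listening on the given host and port (6311 is Rserve's default); the R function `doubleit` and the Python helper names are made up for illustration, and this is not code from the talk.

```python
def double_in_r(conn, x):
    """Define a small R function on the server, then call it like a Python one.

    `conn` is expected to be an open pyRserve connection.
    """
    # voidEval runs R code without transferring a result back to Python.
    conn.voidEval("doubleit <- function(x) { x * 2 }")
    # Functions defined in R are reachable as attributes of conn.r.
    return conn.r.doubleit(x)


def demo(host="localhost", port=6311):
    """Connect to a running Rserve and double a number inside R."""
    import pyRserve  # third-party; imported lazily so the module loads without it

    conn = pyRserve.connect(host=host, port=port)
    try:
        return double_in_r(conn, 21)
    finally:
        conn.close()
```

Because the connection is a network socket, `host` can point at a different machine, which is how work could be spread over several R servers.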

Watch

Ashikaga - Python for Zombies: 15.000 enrolled in the first Brazilian MOOC to teach Python

Ashikaga - Python for Zombies: 15.000 enrolled in the first Brazilian MOOC to teach Python [EuroPython 2014] [24 July 2014] Experiences of how we spread the Python community in Brazil with a non-English MOOC (Massive Open Online Course) to teach programming. Hacking basic modules and classes to obtain the "Answer to the Ultimate Question of Life, the Universe, and Everything". A funny way to teach programming.

Watch

Carl Crowder - Automatic code reviews

Carl Crowder - Automatic code reviews [EuroPython 2014] [23 July 2014] A lot of great Python tools exist to analyse and report on your codebase, but they can require a lot of initial setup to be useful. Done right, they can be like an automatic code review. This talk will explain how to set up and get the best out of these tools, especially for an existing, mature codebase. ----- Static analysis tools are a great idea in theory, but are not often really used in practice. These tools usually require quite a lot of initial effort to get set up in a way which produces meaningful output for your or your organisation's particular coding style and values. As a result, it's common to see initial enthusiasm give way to ignoring the tools. Such tools can be incredibly beneficial, however, and can even go so far as to provide an automatic code review. This talk will explain what kind of benefits you can get from the tools, as well as what you can and cannot expect. It is aimed at experienced developers who are interested in improving their coding practices but who have either never tried static analysis tools, or who have not seen the upsides. It will hopefully also be useful to people who do use the tools, perhaps introducing them to new tools or concepts they were not aware of yet.
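
For readers who have never used such tools, the following sketch shows the basic mechanism of a static check: parse the source without running it and report suspicious constructs. It uses only the standard library `ast` module and flags bare `except:` clauses; it is a toy illustration and does not represent how any particular tool from the talk is implemented.

```python
import ast


def find_bare_excepts(source):
    """Return the line numbers of `except:` clauses that name no exception type."""
    tree = ast.parse(source)
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.ExceptHandler) and node.type is None
    )


SAMPLE = (
    "try:\n"
    "    risky()\n"
    "except:\n"
    "    pass\n"
    "try:\n"
    "    risky()\n"
    "except ValueError:\n"
    "    pass\n"
)
```

Running `find_bare_excepts(SAMPLE)` reports only the first handler, since the second one names `ValueError`. Real tools bundle many such checks and let you switch individual ones on or off, which is where the setup effort mentioned above comes in.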

Watch

Stefan Schwarzer - Support Python 2 and 3 with the same code

Stefan Schwarzer - Support Python 2 and 3 with the same code [EuroPython 2014] [24 July 2014] Your library supports only Python 2, but your users keep nagging you about Python 3 support? As Python 3 gets adopted more and more, users ask for Python 3 support in existing libraries for Python 2. Although there are several approaches, this talk focuses on using the very same code for a Python 2 and a Python 3 version. The talk discusses the main problems when supporting Python 3 and best practices to apply for compatibility with Python 2 and 3.

-----

Your library supports only Python 2, but your users keep nagging you about Python 3 support? As Python 3 gets adopted more and more, users ask for Python 3 support in existing libraries for Python 2.

This talk mentions some approaches for giving users a Python 3 version, but will quickly focus on using the very same code for a Python 2 and a Python 3 version. This is much easier if you require Python 2.6 and up, and easier still if you require Python 3.3 as the minimum Python 3 version.

The talk discusses the main problems when supporting Python 3 (some are easily solved):

* `print` is a function.
* More Python APIs return iterators that used to return lists.
* There's now a clear distinction between bytes and unicode (text) strings.
* Files are opened as text by default, requiring an encoding to apply on reading and writing.

The talk also explains some best practices:

* Start with good automated test coverage.
* Deal with many automatic conversions with a one-time 2to3 run.
* Think about how your library should handle bytes and unicode strings. (Rule of thumb: Decode bytes as early as possible; encode unicode text as late as possible.)
* Should you break compatibility with your existing Python 2 API? (Yes, if there's no other way to design a sane API for Python 2 and 3. If you do it, raise the first part of the version number.)
* Try to keep code that's different for Python 2 and 3 minimal. Put code that needs to be different for Python 2 and 3 into a `compat` module. Or use third-party libraries like `six` or `future`.

Finally, the talk will mention some helpful resources on the web.
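
A minimal sketch of what a small `compat`-style module might contain is shown below. It runs unchanged on Python 2.6+ and Python 3; the helper names (`PY2`, `text_type`, `to_text`, `to_bytes`, `sorted_keys`, `read_text`) are assumptions for this example, not the speaker's code or the API of `six` or `future`. In real single-source code you would typically also put `from __future__ import print_function, unicode_literals` at the top of each module.

```python
import io
import sys

PY2 = sys.version_info[0] == 2

if PY2:
    text_type = unicode  # noqa: F821 (this name exists only on Python 2)
else:
    text_type = str


def to_text(value, encoding="utf-8"):
    """Decode bytes to text as early as possible; pass text through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def to_bytes(value, encoding="utf-8"):
    """Encode text to bytes as late as possible; pass bytes through unchanged."""
    if isinstance(value, text_type):
        return value.encode(encoding)
    return value


def sorted_keys(mapping):
    """Return the keys as a real list; on Python 3, keys() is a view, not a list."""
    return sorted(mapping.keys())


def read_text(path, encoding="utf-8"):
    """Read a file as text with an explicit encoding on both Python versions."""
    with io.open(path, "r", encoding=encoding) as handle:
        return handle.read()
```

The rest of the library then imports from this one module instead of branching on the Python version in many places, which keeps the version-specific code minimal.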

Watch

Jyrki Pulliainen - Packaging in packaging: dh-virtualenv

Jyrki Pulliainen - Packaging in packaging: dh-virtualenv [EuroPython 2014] [24 July 2014] Deploying your software can become a tricky task, regardless of the language. In the spirit of the Python conferences, every conference needs at least one packaging talk. This talk is about dh-virtualenv. It's a Python packaging tool aimed at Debian-based systems and at deployment flows that already take advantage of Debian packaging with Python virtualenvs.

-----

[dh-virtualenv][1] is an open source tool developed at Spotify. We use it to ease deploying our Python software to production.

We built dh-virtualenv as a tool that fits our existing continuous integration flow with a dedicated sbuild server. As we were already packaging software in Debian packages, the aim of dh-virtualenv was to make the transition to virtualenv based installations as smooth as possible.

This talk covers how you can use dh-virtualenv to help you deploy your software to production, where you are already running a Debian-based system such as Ubuntu, and what the advantages and disadvantages of the approach are compared with other existing and popular techniques. We will discuss deployment as a problem in general, look into building a dh-virtualenv-backed package, and in the end, look into how dh-virtualenv was actually made.

The goal is that after this presentation you know how to make your Debian/Ubuntu deployments easier!

[dh-virtualenv][1] is fully open sourced, production tested software, licensed under GPLv2+ and available in Debian testing and unstable. More information about it is also available in our [blogpost][2].

Talk outline:

1. Introduction & overview (3 min)
    * Who am I?
    * Why am I fiddling with Python packaging?
    * What do you get out of this talk?
2. Different shortcomings of Python deployments (5 min)
    * Native system packages
    * Virtualenv based installations
    * Containers, virtual machine images
3. dh-virtualenv (10 min)
    * What is dh-virtualenv?
    * Thought behind dh-virtualenv
    * Advantages over others
    * Requirements for your deployment flow
    * Short intro to packaging Sentry with dh-virtualenv
4. How is it built? (10 min)
    * Debian package building flow primer
    * How dh-virtualenv fits that flow
    * What does it do at build time and why?

[1]: http://github.com/spotify/dh-virtualenv
[2]: http://labs.spotify.com/2013/10/10/packaging-in-your-packaging-dh-virtualenv/

Watch

Francisco Fernández Castaño - Graph Databases, a little connected tour

Francisco Fernández Castaño - Graph Databases, a little connected tour [EuroPython 2014] [23 July 2014] There are many kinds of NoSQL databases, such as document, key-value, column and graph databases. In some scenarios it is more convenient to store our data as a graph, because we want to extract and study information about the connections in it. In these scenarios graph databases are ideal: they are designed and implemented to deal with connected information efficiently. ----- There are many kinds of NoSQL databases, such as document, key-value, column and graph databases. In some scenarios it is more convenient to store our data as a graph, because we want to extract and study information about the connections in it. In these scenarios graph databases are ideal: they are designed and implemented to deal with connected information efficiently. In this talk I'll explain why NoSQL is necessary in some contexts as an alternative to traditional relational databases, how graph databases allow developers to model their domains in a natural way, without translating these domain models to a relational model with artificial data like foreign keys, and why a graph database is more efficient than a relational one, or even a document database, in a highly connected environment. Then I'll explain specific characteristics of Neo4j as well as how to use Cypher, the Neo4j query language, through Python.
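
To illustrate why connected data reads naturally as a graph, here is a small sketch that answers a "friends of friends" question by following relationships directly, with no join tables or foreign keys. The in-memory dictionary only stands in for a graph database; the data, the `Person`/`KNOWS` schema in the Cypher string, and the function name are hypothetical, and the query is not executed here because that would need a running Neo4j server and a driver.

```python
# Adjacency list: each person maps to the people they know.
KNOWS = {
    "Ann": ["Bob", "Cid"],
    "Bob": ["Dan"],
    "Cid": ["Dan", "Eve"],
    "Dan": [],
    "Eve": ["Ann"],
}

# Roughly the same question in Cypher, assuming Person nodes with a
# `name` property connected by KNOWS relationships (hypothetical schema).
FRIENDS_OF_FRIENDS_QUERY = (
    "MATCH (p:Person {name: $name})-[:KNOWS]->()-[:KNOWS]->(fof) "
    "WHERE fof <> p "
    "RETURN DISTINCT fof.name"
)


def friends_of_friends(graph, person):
    """Return, sorted, everyone reachable in exactly two KNOWS hops."""
    found = set()
    for friend in graph.get(person, []):
        for friend_of_friend in graph.get(friend, []):
            if friend_of_friend != person:
                found.add(friend_of_friend)
    return sorted(found)
```

Both the Python loop and the Cypher pattern describe the path itself (person, friend, friend of friend), whereas a relational schema would express the same question as two self-joins on a link table.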

Watch