List of videos

Alex Brasetvik - Elasticsearch from the bottom up

Alex Brasetvik - Elasticsearch from the bottom up [EuroPython 2014] [24 July 2014]

This talk will teach you about Elasticsearch and Lucene's architecture. The key data structure in search is the powerful inverted index, which is actually simple to understand. We start there, then ascend through abstraction layers to get an overview of how a distributed search cluster processes searches and changes.

-----

## Who I am and motivation

I work with hosted Elasticsearch and have interacted with lots of developers. We see what many struggle with. Some relevant theory helps a lot. What follows has already led to many "Aha!" moments and to developers piecing things together themselves.

## The inverted index

The most important index structure is actually very simple. It is essentially a sorted dictionary of terms, with a list of postings per term. We show three simple sample documents and the resulting inverted index.

## The index term

The index term is the "unit of search", and the terms we make decide how we can search. With the inverted index and its sorted dictionary, we can quickly search for terms given their prefix.

## Importance of text analysis

Thus, we need to transform our search problems into string prefix problems. This is done with text analysis, which is the process of making index terms. It is highly important when implementing search.

## Building indexes

The way indexes are built must balance how compact an index is, how easily we can search in it, how fast we can index documents - and the time it takes for changes to be visible. Lucene, and thus Elasticsearch, builds them in segments.

## Index segments

A Lucene index consists of index segments, i.e. immutable mini-indexes. A search on an index is done by running the search on all segments and merging the results. Because segments are immutable, important compression techniques become possible. Deletes are not immediate; a delete only sets a marker. Segments are occasionally merged into larger segments, and only then are documents finally deleted.

New segments are made by buffering changes in memory, and are written when flushing happens. Flushes are largely caused by refreshing every second, due to real-time needs.

## Caches

Caches like filter and field caches are managed per segment. They are essential for performance. Immutable segments make for simple reasoning about caches. New segments only cause partial cache invalidations.

## Elasticsearch indexes

Much like a Lucene index is made up of many segments, an Elasticsearch index is made up of many Lucene indexes. Two Elasticsearch indexes with 1 shard each are essentially the same as one Elasticsearch index with 2 shards. Search all shards and merge: much like segments, but this time possibly across machines.

Shard and index routing enables various partitioning strategies. It is simpler than it sounds, so here is one important example: partitioning is essential for time-based data like logs, because we can efficiently skip searching entire indexes and roll out old data by deleting an entire index.

## Common pitfalls

We must design our indexing for how we search - not the searches for how things are indexed. Be careful with wildcards and regexes. Since segments are immutable, deleting documents is expensive while deleting an entire index is cheap. Updating documents is essentially a delete and re-index, so heavy updating might cause problems. Have enough memory and then some: Elasticsearch is very reliant on its caches.

## Summary

We've seen how index structures are used, and why proper text processing is essential for performant searches. Also, you now know what index segments are, and how they affect both indexing and searching strategies.

## Questions
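
To make the inverted index from the abstract above concrete, here is a minimal sketch in Python. It is not how Lucene implements it: the sample documents, the function names `build_inverted_index` and `prefix_search`, and the naive "lowercase and split on whitespace" analysis are all assumptions made for illustration. It only shows the idea of a sorted dictionary of terms with a postings list per term, and why a sorted dictionary makes prefix lookups cheap.

```python
import bisect
from collections import defaultdict


def build_inverted_index(docs):
    """Return (sorted term list, {term: sorted list of doc ids})."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive text analysis (an assumption): lowercase, split on whitespace.
        for term in text.lower().split():
            postings[term].add(doc_id)
    terms = sorted(postings)
    return terms, {term: sorted(postings[term]) for term in terms}


def prefix_search(terms, postings, prefix):
    """Return sorted ids of documents containing a term that starts with prefix."""
    # Binary search finds the first term >= prefix in the sorted dictionary.
    start = bisect.bisect_left(terms, prefix)
    hits = set()
    for term in terms[start:]:
        if not term.startswith(prefix):
            break  # terms are sorted, so no later term can match
        hits.update(postings[term])
    return sorted(hits)


# Three made-up sample documents (not the ones used in the talk).
docs = {
    1: "Winter is coming",
    2: "Ours is the fury",
    3: "The choice is yours",
}
terms, postings = build_inverted_index(docs)
```

With this index, `postings["is"]` lists all three documents, and `prefix_search(terms, postings, "c")` walks only the neighbouring terms `choice` and `coming` instead of scanning every term.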

Domen Kožar - Rethinking packaging, development and deployment

Domen Kožar - Rethinking packaging, development and deployment [EuroPython 2014] [22 July 2014]

In Python, we're trying to solve packaging problems in our own domain, but maybe someone else has already solved most of our problems. In the talk I'll show how I develop and deploy Python projects that can be easily mixed with non-Python dependencies. Nix (http://nixos.org/nix/) will be demonstrated as a replacement for technologies in our stack: pip, virtualenv, buildout, ansible, jenkins.

-----

Python is often mixed with other languages in a development stack; nowadays it's hard to escape JavaScript dependencies. If you add some C dependencies such as GStreamer to the stack, packaging becomes a burden. While tweaking our packaging infrastructure will make things better, it's hard to fix the fundamental problem of packaging with the current ad-hoc solutions in the Python domain.

Using Nix (http://nixos.org/nix/) for about a year gave me the insight that solving the packaging problem at the operating system level (bottom-up) is a better approach. For example, wouldn't it be cool to have "virtualenv" implemented inside your package manager, so you could isolate non-Python dependencies as well as Python packages for your project, and not worry if the system was updated?

We'll also show what benefits we get by using the same tool for development and deployment, and how little we have to do to deploy our application. To see how the Haskell community is approaching the same subject, see the blog post http://ocharles.org.uk/blog/posts/2014-02-04-how-i-develop-with-nixos.html

Felix Wick/Florian Wilhelm - How to Setup a new Python Project

Felix Wick/Florian Wilhelm - How to Setup a new Python Project [EuroPython 2014] [23 July 2014]

Setting up a new Python project from scratch can be quite hard. How should you structure your files and directories? Where should your packages, modules, documentation and unit tests go? How do you configure setup.py, Sphinx and so on? We provide proven answers!

-----

Whenever a Python beginner starts their own project, they are confronted with the same technical questions: what a well-thought-out directory structure for all the files looks like, how setup.py needs to be configured, and what it is capable of, such as specifying entry_points and other goodies. Drawing on our years of work with Python, we show how to structure your Python project in terms of folders, files, modules and packages, and how to configure setup.py to specify your requirements and to use it with nosetests, Sphinx and so on. We also elaborate on the usage of Git and Versioneer (https://github.com/warner/python-versioneer) to help you version your package.

Prashant Agrawal - Jigna: a seamless Python-JS bridge to create rich HTML UIs for Python apps

Prashant Agrawal - Jigna: a seamless Python-JS bridge to create rich HTML UIs for Python apps [EuroPython 2014] [22 July 2014]

Jigna aims to provide an easy way to create rich user interfaces for Python applications using web technologies like HTML, CSS and JavaScript, as opposed to widget-based toolkits like Qt/wx or native toolkits. It provides seamless two-way data binding between the Python model and the HTML view by creating a Python-JS communication bridge. This ensures that the view is always live: it automatically updates itself when the model changes, and updates the model when user actions take place in the UI. The Jigna view can be rendered in an in-process Qt widget or over the web in a browser.

Mark Smith - Writing Awesome Command-Line Programs in Python

Mark Smith - Writing Awesome Command-Line Programs in Python [EuroPython 2014] [24 July 2014]

Command-line programs can have a lot to them - usually more than you think - yet they often suffer from a lack of thought. This is a tour through how to structure your code, tools in the standard library and some 3rd party libraries. Take your command-line programs to the next level!

-----

Python is a great language for writing command-line tools - which is why so much of Linux is secretly written in Python these days. Unfortunately, what starts as a simple script can quickly get out of hand as more features are added and more people start using it!

The talk will consist of a tour through various useful libraries and practical code showing how each can be used, and include advice on how to best structure simple and complex command-line tools.

Things to consider when writing command-line apps:

* Single-file vs Multiple-file
* Standard library only vs. 3rd party requirements
* Installation - setup.py vs. native packaging

The different parts of a command-line program:

* Option Parsing:
    * Libraries: getopt, optparse, argparse, docopt
    * Sub-commands
* Configuration:
    * Formats: Ini file, JSON, YAML
    * Where should it be stored (cross-platform)
    * Having multiple configuration files, and allowing user config to override global config
* Output:
    * Colour - colorama
    * Formatting output for the user
    * Formatting output for other programs
    * How do you know when your output is being piped to another program?
* Managing logging and verbosity
* Managing streamed input
* Exit values: What are the conventions?
* Interactive apps - REPL
* Structuring a bunch of programs/commands around a shared codebase.
* Command-line frameworks: clint, compago & cliff
* Testing command-line apps
* Writing command-line tools in Python 3 vs Python 2
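
Several of the topics listed above (option parsing with sub-commands, verbosity, detecting piped output, exit values) fit into one small sketch using only the standard library. This is not code from the talk: the tool name `demo`, the `greet` sub-command and the `stdout` parameter are hypothetical, and the exit codes follow the common convention of 0 for success and 2 for a usage error.

```python
import argparse
import sys


def build_parser():
    """Build a parser with one global flag and one sub-command."""
    parser = argparse.ArgumentParser(prog="demo", description="Hypothetical demo tool.")
    parser.add_argument("-v", "--verbose", action="count", default=0,
                        help="increase verbosity (may be repeated)")
    subparsers = parser.add_subparsers(dest="command")
    greet = subparsers.add_parser("greet", help="print a greeting")
    greet.add_argument("name")
    return parser


def main(argv=None, stdout=None):
    """Run the tool and return its exit status instead of calling sys.exit()."""
    stdout = sys.stdout if stdout is None else stdout
    parser = build_parser()
    args = parser.parse_args(argv)

    if args.command is None:
        parser.print_usage(sys.stderr)
        return 2  # conventional status for a usage error

    if args.verbose:
        # Diagnostics go to stderr so they never pollute piped output.
        print("verbosity level: {}".format(args.verbose), file=sys.stderr)

    message = "Hello, {}!".format(args.name)
    # isatty() is False when output is piped or redirected, so only
    # decorate the text with ANSI bold codes for an interactive terminal.
    if stdout.isatty():
        message = "\033[1m" + message + "\033[0m"
    print(message, file=stdout)
    return 0
```

Returning the status from `main` keeps the function easy to test; a script would typically end with `sys.exit(main())` under an `if __name__ == "__main__":` guard.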

Julie Pichon - I want to help! How to make your first contribution to open-source.

Julie Pichon - I want to help! How to make your first contribution to open-source. [EuroPython 2014] [23 July 2014]

Do you like open-source? Would you like to give back somehow but are not sure what to do or where to start? Together we will look at the usual workflow for making any kind of contribution, using a real patch as an example.

-----

This talk aims to show, at a high level, what the process for contributing to most open-source projects looks like. We will go from discovering a project to finding the contributor guidelines, preparing your contribution for submission, and what happens next. The general principles will be illustrated with an example from the speaker's first contribution to OpenStack.

The target audience for the talk is people who have never contributed to open-source but would like to. Although the example will be a code contribution, the process as described applies to all kinds of contributions.

Michael König - Embedding Python: Charming the Snake with C++

Michael König - Embedding Python: Charming the Snake with C++ [EuroPython 2014] [23 July 2014]

Using the example of our in-house distributed scheduling system, we discuss the challenges of embedding the Python interpreter in a C++ program. Besides the actual integration of the interpreter, efficient data exchange between both languages is discussed. In addition, this presentation demonstrates how higher-level abstractions further diminish the language barrier.

-----

Python, with its huge standard library and sophisticated packages developed by its thriving community, has become an incredibly useful tool for data scientists. At Blue Yonder, we value Python for the ease with which we can access and combine machine learning algorithms to build accurate prediction models. To get the most business value out of the use of Python, we strive to relieve our model developers of all burdens outside their core expertise, i.e., developing statistical models.

To leverage our existing infrastructure, essentially a distributed scheduling system written in C++, we decided to embed a Python interpreter in our application. The goal was to let developers use the language best suited for their problem, and to let them incorporate code created by others even if it is not written in the same language.

In this presentation, I will talk about a few obstacles which we had to overcome in integrating the (C)Python interpreter in our C++ program, e.g., clean resource management, error handling, and broken features in the interpreter's API. I will show how we employed features from the [Boost Python C++ library](http://www.boost.org/doc/libs/1_55_0/libs/python/) not only for simple data exchange, but also for more powerful concepts such as data sources. Finally, I will demonstrate how C++ objects can be used to seamlessly interact with Python, for example to use Python's logging package as usual while the actual logging is handled by our C++ application.

With this combination of both worlds, we achieved a desirable mix of virtues: safe, reliable operations; good run-time performance; fast development; and highly expressive, unit testable core domain logic.
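
The logging integration mentioned above might look roughly like this when seen from the Python side. This is only a sketch under stated assumptions, not Blue Yonder's implementation: `HostForwardingHandler` and `sink` are hypothetical names, and the sink is a plain Python callable standing in for whatever function the C++ host would expose to the embedded interpreter. Only the standard `logging.Handler` API is used.

```python
import logging


class HostForwardingHandler(logging.Handler):
    """Forward every log record to a callable provided by the host application."""

    def __init__(self, sink):
        super().__init__()
        # In an embedded setup, sink would be a function exposed by the
        # C++ host; here it is any Python callable (an assumption).
        self._sink = sink

    def emit(self, record):
        try:
            self._sink(record.levelno, record.name, self.format(record))
        except Exception:
            self.handleError(record)


# Stand-in for the host side: collect what would be passed to C++.
received = []

logger = logging.getLogger("model")
logger.setLevel(logging.INFO)
logger.propagate = False  # keep records away from the root logger's handlers
logger.addHandler(
    HostForwardingHandler(lambda level, name, message: received.append((level, name, message)))
)

# Model code logs as usual and never sees where the records end up.
logger.info("training started")
logger.debug("filtered out by the INFO level")
```

The point of the design is that model developers keep calling `logging` as they always do, while the host decides where the messages go.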

juliass - Multiplatform binary packaging and distribution of your client apps

juliass - Multiplatform binary packaging and distribution of your client apps [EuroPython 2014] [24 July 2014]

Distributing your Python app to clients is a common task that can become hard when “stand alone” and “obfuscated code” come as requirements. Common answers in forums are along the lines of “Python is not the language you’re looking for” or “What are you trying to hide?”, but another answer is possible.

Deni Bertovic - Supercharge your development environment using Docker

Deni Bertovic - Supercharge your development environment using Docker [EuroPython 2014] [23 July 2014]

These days applications are getting more and more complex. It's becoming quite difficult to keep track of all the different components an application needs to function (a database, an AMQP broker, a web server, a document store...). It keeps getting harder and harder to set up new development environments and to bring new developers into the team. Stuff works on one dev machine but doesn't on others? Code breaks often when deployed to production even though all tests were passing and it worked on the dev machine? The idea of this talk is to convey how important it is that we have our development environment as close to production as possible. That means setting up all those various services on your laptop/workstation.

-----

In this talk I am going to show how to utilize lightweight LXC containers using Docker to make your development process much more straightforward: how to share container images among your development team and be sure that everyone is running the exact same stack, and how to do all this without hogging too many resources and without the need for complex provisioning scripts and management systems. And above all else, how to do it fast!

Rough Guidelines:

1. Describe what LXC (Linux containers) is
2. Benefits of using containers instead of traditional VMs
3. Explain where Docker comes in
4. Show how to build simple containers using Dockerfile syntax
5. What are container images and how to share them
6. How to share private container images
7. Tips and tricks on how to automate
