List of videos

M.Yasoob Khalid - Web Scraping in Python 101
[EuroPython 2014] [22 July 2014]

This talk is about web scraping in Python: why web scraping is useful and which Python libraries are available to help you. I will also look at proprietary alternatives, discuss how they work, and explain why they are not useful. I will show you different libraries used in web scraping, with some example code, so that you can choose your own personal favourite. I will also explain why writing your own scraper in Scrapy gives you more control over the scraping process.

-----

Who am I?
=========

* a programmer
* a high school student
* a blogger
* Pythonista
* and tea lover

- Creator of freepythontips.wordpress.com
- I made soundcloud-dl.appspot.com
- I am a main contributor to youtube-dl.
- I teach programming to my friends at my school.
- It's my first programming-related conference.
- The life of a Python programmer in Pakistan

What is this talk about?
========================

- What web scraping is and why it is useful
- Which libraries are available for the job
- Open source vs proprietary alternatives
- Which library is best for which job
- When and when not to use Scrapy

What is web scraping?
=====================

Web scraping (web harvesting or web data extraction) is a computer software technique of extracting information from websites. - Wikipedia

### In simple words:

It is a method to extract data from a website that does not have an API, or to extract a LOT of data that we cannot get through an API because of rate limiting. Through web scraping we can extract any data that we can see while browsing the web.

Usage of web scraping in real life
==================================

- to extract product information
- to extract job postings and internships
- to extract offers and discounts from deal-of-the-day websites
- to crawl forums and social websites
- to extract data to build a search engine
- to gather weather data, etc.

Advantages of web scraping over using an API
============================================

- Web scraping is not rate limited
- Anonymously access the website and gather data
- Some websites do not have an API
- Some data is not accessible through an API, etc.

Which libraries are available for the job?
==========================================

There are numerous libraries available for web scraping in Python. Each library has its own strengths and weaknesses. Some of the most widely known libraries used for web scraping are:

- BeautifulSoup
- html5lib
- lxml
- re (not really for web scraping, I will explain later)
- Scrapy (a complete framework)

A comparison between these libraries
====================================

- speed
- ease of use
- what I prefer
- which library is best for which purpose

Proprietary alternatives
========================

- a list of proprietary scrapers
- their price
- are they really useful for you?

How proprietary alternatives work
=================================

- how they work (rendering JavaScript)
- why they are not suitable for you
- how custom scrapers beat proprietary alternatives

Scrapy
======

- what is it?
- why is it useful?
- asynchronous support
- an example scraper

Questions
=========

- Questions from the viewers
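The talk compares libraries such as BeautifulSoup, lxml and Scrapy. As a dependency-free illustration of the basic idea (take HTML, pull structured data out of it), here is a minimal sketch using only the standard library's `html.parser`. The `LinkExtractor` class, `extract_links` helper and the sample HTML are hypothetical; a real scraper would normally use one of the libraries named above, which are more forgiving of broken markup.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Hypothetical helper: collect (href, text) pairs for every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> we are currently inside, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    # In practice the HTML would be downloaded, e.g. with urllib.request.urlopen(url).read()
    sample = """
    <ul>
      <li><a href="/jobs/1">Python developer</a></li>
      <li><a href="/jobs/2">Data <b>engineer</b></a></li>
    </ul>
    """
    for href, text in extract_links(sample):
        print(href, "->", text)
```

Text inside nested tags (the `<b>` above) is still collected because `handle_data` keeps appending until the closing `</a>`.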
Martijn Faassen - Morepath: a Python Web Framework with Super Powers
[EuroPython 2014] [24 July 2014]

Morepath is a server web framework written with modern, rich client web development in mind. Why another new Python web framework in 2014? Because it can be done better: Morepath understands how to construct hyperlinks from models. Writing a generic view in Morepath is like writing any other view. With Morepath, you can reuse, extend and override apps as easily as you can construct them. Even if you don't end up using Morepath, you will learn something about the nature of web frameworks.

-----

[Morepath](http://morepath.readthedocs.org) is a new server web framework written with modern, rich client web development in mind. In the talk I will be discussing some core features of Morepath that make it different:

* Its different take on routing and linking. Morepath helps you construct hyperlinks to models.
* Its view system: plain views, generic views, view composition.
* Its approach to application construction, which allows applications to be extended, overridden and composed.

This talk will attempt to convince people to try Morepath. For those unable or unwilling to try, I will communicate some design principles behind Morepath that can help any web developer.
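The key idea behind "constructing hyperlinks from models" is that the framework, not the view author, knows how to turn a model object back into a URL. The toy sketch below illustrates that idea in plain Python. It is not Morepath's API: `PathRegistry`, its `path` decorator and `link` method are hypothetical names chosen for the example.

```python
import string


class PathRegistry:
    """Hypothetical toy registry mapping model classes to URL path templates."""

    def __init__(self):
        self._paths = {}

    def path(self, template):
        """Class decorator registering a template such as '/documents/{doc_id}'."""
        def register(cls):
            self._paths[cls] = template
            return cls
        return register

    def link(self, obj):
        """Build a URL for obj by filling its class's template from its attributes."""
        template = self._paths[type(obj)]
        # string.Formatter().parse yields (literal, field_name, spec, conversion)
        names = [name for _, name, _, _ in string.Formatter().parse(template) if name]
        return template.format(**{name: getattr(obj, name) for name in names})


registry = PathRegistry()


@registry.path("/documents/{doc_id}")
class Document:
    def __init__(self, doc_id, title):
        self.doc_id = doc_id
        self.title = title


if __name__ == "__main__":
    print(registry.link(Document(42, "Morepath notes")))  # /documents/42
```

Because code asks for `link(obj)` instead of formatting URL strings by hand, changing a path template in one place updates every generated link. Morepath offers this through its own request API (`request.link`); see its documentation for the real details.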
Niv/tomr - Learning Chess from data
[EuroPython 2014] [24 July 2014]

Is watching a chess game enough to figure out the rules? What is the common denominator between different plays and game endings? In this presentation, we will show how machine learning and Hadoop can help us rediscover the rules of chess and gain a new understanding of the game.

-----

Can empirical samples unveil the big picture? Do descriptions of chess games expose enough data to understand the rules of chess: legal piece moves, castling, check versus checkmate, etc.? Which features are important in describing a chess game, and which are not? What is a good representation of a chess game for this purpose? What is the minimal sample size required to learn this well enough, and where can this learning go wrong?

**Ne3 => E=mc2**

Looking at the bigger picture: can we understand big systems based on empirical samples? Can we reverse-engineer physics and discover how physical systems work with no external knowledge besides empirical samples?
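As a tiny illustration of "rediscovering rules from data", the sketch below infers which displacements a knight can make purely from observed moves. The move list and the `(from_square, to_square)` format in algebraic notation are assumptions for the example; the talk itself applies machine learning on Hadoop to real game records, which this does not attempt to reproduce.

```python
from collections import Counter


def square_to_coords(square):
    """Convert algebraic notation such as 'g1' to zero-based (file, rank) = (6, 0)."""
    return ord(square[0]) - ord("a"), int(square[1]) - 1


def learn_displacements(moves):
    """Count the (dx, dy) displacements seen in observed (from, to) moves."""
    counts = Counter()
    for src, dst in moves:
        (x0, y0), (x1, y1) = square_to_coords(src), square_to_coords(dst)
        counts[(x1 - x0, y1 - y0)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical observed knight moves extracted from game records
    knight_moves = [("g1", "f3"), ("b1", "c3"), ("f3", "e5"), ("c3", "d5"),
                    ("e5", "f7"), ("d5", "e7"), ("f3", "g5"), ("g5", "e6")]
    learned = learn_displacements(knight_moves)
    print(sorted(learned))
    # Check the learned displacements against the knight rule |dx| * |dy| == 2
    print(all(abs(dx) * abs(dy) == 2 for dx, dy in learned))
```

With only a few games, just some of the eight legal knight offsets appear; how many samples are "enough", and how to cope with rare moves, are exactly the questions the talk raises.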
Mauri - VPython goes to School
[EuroPython 2014] [22 July 2014]

Using VPython in high school is an interesting way to introduce students to computer programming concepts and to link computer science with other disciplines like math, geometry, physics and chemistry.

-----

My presentation focuses mainly on my experience teaching with VPython in a high school. I've posed some problems for my students to solve with VPython: from basic static representations of buildings, like a castle, to more complex dynamic models, like bouncing balls. This approach seems a good way to introduce students to computer programming concepts and to link computer science with other disciplines like math, geometry, physics and chemistry.
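The bouncing-ball exercise boils down to a small physics loop. The sketch below shows that loop in plain Python, without graphics, so it runs anywhere; in VPython each step would additionally update a `sphere` object's `pos` and call `rate()` to pace the animation. The time step, gravity and restitution values are arbitrary choices for the example.

```python
def simulate_bounce(height=5.0, g=9.81, dt=0.01, restitution=0.8, steps=500):
    """Drop a ball from `height` onto a floor at y = 0 and record its height.

    Uses a semi-implicit Euler step (update velocity, then position with the
    new velocity). `restitution` is the fraction of speed kept per bounce.
    """
    y, v = height, 0.0
    heights = []
    for _ in range(steps):
        v -= g * dt               # gravity accelerates the ball downwards
        y += v * dt               # move with the updated velocity
        if y < 0:                 # passed through the floor: bounce
            y = -y                # reflect the position back above the floor
            v = -v * restitution  # reverse direction, losing some speed
        heights.append(y)
    return heights


if __name__ == "__main__":
    trace = simulate_bounce()
    print(f"{len(trace)} steps, highest point {max(trace):.2f}, "
          f"highest point in the last second {max(trace[-100:]):.2f}")
```

Students can then vary `restitution` or `g` and watch how the bounces change, which is where the link to physics lessons comes in.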
Flavio Percoco - Systems Integration: The OpenStack success story
[EuroPython 2014] [23 July 2014]

OpenStack is a huge open-source cloud platform. One of the main tenets of OpenStack is the Shared Nothing Architecture (SNA), to which all modules stick very closely. To achieve that, services within OpenStack have adopted different strategies to integrate with each other and share data without sacrificing performance or moving away from SNA. These strategies are applicable not just to OpenStack but to any distributed system. Sharing data, regardless of what that data is, is a must-have requirement of any successful cloud service. This talk will present some of the existing integration strategies that are applicable to cloud infrastructures and enterprise services. The talk will be based on the strategies that have helped OpenStack to be successful and, most importantly, scalable.

-----

Details
=======

Along the lines of what I've described in the abstract, the presentation will walk the audience through the state of the art of existing system integration solutions, the ones that have been adopted by OpenStack, and the benefits of those solutions. At the end of the talk, a set of solutions under development, ideas, and improvements to the existing ones will be presented.

The presentation is oriented to distributed services, fault tolerance and replica determinism. It is based on software written entirely in Python and running successfully in several production environments.

The presentation will be split into 3 main topics:

Distributed system integration
------------------------------

* What is it?
* Why is it essential for cloud infrastructures?
* Existing methods and strategies

OpenStack success story
-----------------------

* Which methods did OpenStack adopt?
* How / why do they work?
* What else could be done?

Coming next
-----------

* Some issues of existing solutions
* What are we doing to improve that?
* Other solutions coming up
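One common way for shared-nothing services to integrate is to exchange messages instead of sharing state. The sketch below uses the standard library's `queue` and `threading` modules to show a "service" that keeps its state private and communicates only through request and reply queues. The service, message fields and `run_demo` helper are hypothetical; OpenStack services communicate over a networked message broker (such as RabbitMQ) rather than in-process queues.

```python
import queue
import threading


def worker_service(inbox, outbox):
    """A 'service' that owns its own state and talks only via messages."""
    processed = 0                      # private state, never shared directly
    while True:
        msg = inbox.get()
        if msg is None:                # sentinel: shut down cleanly
            break
        processed += 1
        outbox.put({"request_id": msg["request_id"],
                    "result": msg["payload"].upper(),
                    "processed_so_far": processed})


def run_demo(payloads):
    requests, replies = queue.Queue(), queue.Queue()
    service = threading.Thread(target=worker_service, args=(requests, replies))
    service.start()
    for i, payload in enumerate(payloads):
        requests.put({"request_id": i, "payload": payload})
    requests.put(None)                 # ask the service to stop
    service.join()
    return [replies.get() for _ in payloads]


if __name__ == "__main__":
    for reply in run_demo(["boot vm", "attach volume"]):
        print(reply)
```

Since the caller only sees messages, the worker could be moved to another process or machine without the caller reaching into its state, which is the property SNA is after.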
ykaplan - Marconi - OpenStack Queuing and Notification Service
[EuroPython 2014] [22 July 2014]

Marconi is a multi-tenant cloud queuing system written in Python as part of the OpenStack project. Marconi aims to ease the design of distributed systems and allow for asynchronous work distribution without creating yet another message broker. This talk aims to give the audience a broad look at Marconi's design and the technologies it uses.

-----

Similar to other message bus frameworks, Marconi's main goals are performance, availability, durability, fault tolerance and scalability. Besides providing queuing and notification services through OpenStack, Marconi aims to ease the design of distributed systems and allow for asynchronous work distribution without creating yet another message broker. This talk aims to give the audience a broad look at Marconi's architecture, design, technologies used and development process, and to discuss the issues it addresses.
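Cloud queuing services commonly let a worker *claim* messages for a limited time, so that if the worker dies the messages become visible again to other workers. The in-memory sketch below illustrates that claim/delete pattern. The `ClaimQueue` class and its methods are hypothetical and are not Marconi's actual API, which is exposed over HTTP; the injectable `clock` exists only to make expiry easy to demonstrate.

```python
import itertools
import time


class ClaimQueue:
    """Hypothetical in-memory queue with claim/delete semantics.

    A claimed message is hidden from other consumers until its claim
    expires; a consumer deletes the message once it has been processed.
    """

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._ids = itertools.count()
        self._messages = {}        # message id -> body (insertion-ordered)
        self._claimed_until = {}   # message id -> claim expiry timestamp

    def post(self, body):
        msg_id = next(self._ids)
        self._messages[msg_id] = body
        return msg_id

    def claim(self, limit=1, ttl=30.0):
        """Claim up to `limit` unclaimed (or expired) messages for `ttl` seconds."""
        now = self._clock()
        claimed = []
        for msg_id, body in self._messages.items():
            if len(claimed) == limit:
                break
            expiry = self._claimed_until.get(msg_id)
            if expiry is None or expiry <= now:
                self._claimed_until[msg_id] = now + ttl
                claimed.append((msg_id, body))
        return claimed

    def delete(self, msg_id):
        """Acknowledge a processed message so it is never delivered again."""
        self._messages.pop(msg_id, None)
        self._claimed_until.pop(msg_id, None)


if __name__ == "__main__":
    fake_now = [0.0]
    q = ClaimQueue(clock=lambda: fake_now[0])
    q.post({"task": "resize image"})
    print(q.claim(ttl=10))   # worker A gets the message
    print(q.claim(ttl=10))   # nothing left to claim
    fake_now[0] = 11.0       # worker A crashed; its claim has expired
    print(q.claim(ttl=10))   # the message is visible again
```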
Jair Trejo - Non Sequitur: An exploration of Python's random module
[EuroPython 2014] [24 July 2014]

An exploration of Python's random module for the curious programmer. This talk will give a little background in statistics and pseudorandom number generation, explain the properties of Python's choice of pseudorandom generator, and explore through visualizations the different distributions provided by the module.

-----

# Audience

Non-mathematical people who want a better understanding of Python's random module.

# Objectives

The audience will understand pseudorandom number generators, the properties of Python's Mersenne Twister, and the differences and possible use cases of the distributions provided by the `random` module.

# The talk

I will start by talking about what randomness means and then about how we try to achieve it in computing through pseudorandom number generators. (5 min.)

I will give a brief overview of pseudorandom number generation techniques, show how their quality can be assessed, and finally talk about Python's Mersenne Twister and why it is a fairly good choice. (10 min.)

Finally, I will talk about how, starting from randomness, we can build generators with interesting probability distributions. I'll compare through visualizations those provided in Python's `random` module and show examples of when they can be useful in real life. (10 min.)
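A couple of the points above can be checked directly in the interpreter. The sketch below shows that seeding the Mersenne Twister makes a sequence reproducible (it is a deterministic generator), that a `random.Random` instance keeps its own independent state, and draws samples from a few of the module's distributions. The helper names, seed values and sample sizes are arbitrary choices for the example.

```python
import random
import statistics


def reproducible(seed, n=5):
    """Same seed -> same sequence: the generator is deterministic."""
    rng = random.Random(seed)   # independent Mersenne Twister instance
    return [rng.random() for _ in range(n)]


def sample_distributions(seed=2014, n=10_000):
    """Draw n samples from a few of the distributions in the random module."""
    rng = random.Random(seed)
    return {
        "uniform(0, 10)": [rng.uniform(0, 10) for _ in range(n)],
        "gauss(mu=0, sigma=1)": [rng.gauss(0, 1) for _ in range(n)],
        "expovariate(lambd=0.5)": [rng.expovariate(0.5) for _ in range(n)],
    }


if __name__ == "__main__":
    print(reproducible(42) == reproducible(42))
    for name, values in sample_distributions().items():
        print(f"{name:25} mean={statistics.mean(values):6.3f} "
              f"stdev={statistics.stdev(values):6.3f}")
```

The sample means should land close to the theoretical values (5, 0 and 2 respectively), a quick sanity check before plotting histograms of each distribution.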
Katarzyna Jachim - Python in system testing
[EuroPython 2014] [23 July 2014]

When you think about Python and testing, you usually think about testing your code: unit tests, mostly. But that is not the only case! When you have a big system, you need to test it at a much higher level, if only to check that all the components are wired together correctly. You may do it manually, but it is tedious and time-consuming, so you want to automate it. And here comes Python, the language of choice in many QA departments.

-----

I will talk about the differences between unit testing and system testing, which result in totally different requirements for test management and test running systems. I will explain how we use Python (and a little about why) to automate our work. Finally, I will talk a little about my "idée fixe": a framework for system testing written in Python.
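To make the unit/system distinction concrete: a system test exercises the running system through its external interface instead of calling its functions directly. The sketch below starts a tiny HTTP service in a background thread (standing in for the "big system") and checks it end to end over a real socket using `unittest` and `urllib`. The `/health` endpoint, its JSON payload and the handler are hypothetical.

```python
import json
import threading
import unittest
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Hypothetical system under test: reports component status as JSON."""

    def do_GET(self):
        if self.path == "/health":
            body = json.dumps({"db": "ok", "queue": "ok"}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, *args):
        pass  # keep test output quiet


class SystemTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Port 0 lets the OS pick a free port
        cls.server = HTTPServer(("127.0.0.1", 0), HealthHandler)
        cls.thread = threading.Thread(target=cls.server.serve_forever, daemon=True)
        cls.thread.start()
        cls.base_url = f"http://127.0.0.1:{cls.server.server_port}"

    @classmethod
    def tearDownClass(cls):
        cls.server.shutdown()
        cls.server.server_close()

    def test_all_components_report_ok(self):
        # Talk to the system only through its public HTTP interface
        with urllib.request.urlopen(self.base_url + "/health", timeout=5) as resp:
            self.assertEqual(resp.status, 200)
            status = json.load(resp)
        self.assertEqual(status, {"db": "ok", "queue": "ok"})
```

Saved as a file, it can be run with `python -m unittest` (e.g. `python -m unittest test_system` if saved as `test_system.py`). In a real setup the server would be a separately deployed system, and the test would only know its address.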
Julian Berman - Design Your Tests
[EuroPython 2014] [23 July 2014]

While getting started with testing often provides a noticeable, immediate improvement for any developer, it is often not until developers realize that tests need design to provide maximal benefit that they begin to appreciate, or even enjoy, them. We'll investigate how building shallow, transparent layers for your tests makes for better failures, clearer tests, and quicker diagnoses.

-----

* Life span of a test
    * 5 minutes - why does this fail?
    * 5 days - what is this missing?
    * 5 weeks - do I have coverage for this?
    * 5 months - what's *not* causing this bug?
* Transparent simplicity
    * one or two "iceberg" layers for meaning
    * Higher-order assertions - build collections of state that have meaning for the domain in the tests
    * bulk of the details are in the code itself
    * show an example
* Grouping for organization
    * Mixins
    * show an example
* unittest issues
    * assertion/mixin clutter
    * setUp/tearDown tie grouping to the class layer or to inheritance via super
    * addCleanup
    * weak association / lookup-ability between code and its tests
        * package layout
        * other conventions
* Alternative approaches
    * testtools' matchers
    * py.test `assert` magic
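The outline mentions higher-order assertions, mixins and `addCleanup`. The sketch below combines them in plain `unittest`: a mixin provides one domain-level assertion (`assertValidOrder`, hypothetical) that hides several low-level checks behind a meaningful name, and `addCleanup` registers teardown right next to the setup it undoes. The `Order` class, its rules and the file format are invented for the example.

```python
import os
import shutil
import tempfile
import unittest
from dataclasses import dataclass, field


@dataclass
class Order:
    """Hypothetical domain object under test."""
    items: list = field(default_factory=list)   # (name, price) pairs
    total: float = 0.0


class OrderAssertionsMixin:
    """Higher-order assertions: one domain-meaningful check per method."""

    def assertValidOrder(self, order):
        self.assertTrue(order.items, "order has no items")
        self.assertTrue(all(price >= 0 for _, price in order.items),
                        f"negative price in {order.items!r}")
        self.assertAlmostEqual(order.total, sum(p for _, p in order.items),
                               msg="total does not match the item prices")


class OrderTest(OrderAssertionsMixin, unittest.TestCase):
    def setUp(self):
        # addCleanup keeps teardown next to the setup it undoes; cleanups
        # still run even if a later part of setUp raises.
        self.workdir = tempfile.mkdtemp()
        self.addCleanup(shutil.rmtree, self.workdir)

    def test_order_from_file(self):
        path = os.path.join(self.workdir, "order.txt")
        with open(path, "w") as f:
            f.write("tea,2.50\nbook,12.00\n")
        with open(path) as f:
            items = [(name, float(price))
                     for name, price in (line.strip().split(",") for line in f)]
        self.assertValidOrder(Order(items=items, total=14.50))
```

When the check fails, the message names the domain problem ("total does not match the item prices") rather than only showing two mismatched floats, which is the "better failures" the talk aims for.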