List of videos

Thomas Aglassinger - Introduction to sentiment analysis with spaCy

Introduction to sentiment analysis with spaCy [EuroPython 2018 - Talk - 2018-07-26 - PyCharm [PyData]] [Edinburgh, UK] By Thomas Aglassinger

Sentiment analysis aims at extracting opinions from texts written in natural language, typically reviews or comments on social sites and forums. spaCy already provides mechanisms for dealing with natural languages in general but does not offer means for sentiment analysis. This talk gives a short introduction to sentiment analysis in general and shows how to extract topics and ratings by utilizing spaCy's basic tools and extending them with a lexicon-based approach and simple Python code to consolidate sentiments spread over multiple words.

Topics covered are:
- What is sentiment analysis?
- Levels of sentiment detection
- Representing opinions
- Splitting texts into sentences and words
- Finding the base word (lemma)
- Extending spaCy's pipeline and tokens
- Matching words to topics and ratings
- Combining multiple words to a rating

Code examples are introduced and explained using a Jupyter notebook that can be used as a basis for your own analysis. As an additional twist, the analyzed texts are not in English but in German, to show that this approach can be used for multiple languages. No knowledge of German is required, though, because translations of the short example sentences are provided.

Author's note: This is an extended version of a talk I gave at the PyDays Vienna 2018. The original slides and Jupyter notebook are available at https://github.com/roskakori/talks/tree/master/pydays/analyzingnaturallanguagefeedbackusing_python.

License: This video is licensed under the CC BY-NC-SA 3.0 license: https://creativecommons.org/licenses/by-nc-sa/3.0/ Please see our speaker release agreement for details: https://ep2018.europython.eu/en/speaker-release-agreement/
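The idea of combining several words into one rating with a lexicon can be illustrated with a minimal sketch. This is not the speaker's notebook code: it uses plain whitespace tokenization instead of spaCy, and the lexicon, the intensifier weights, and the function name `rate_sentence` are all hypothetical choices made for illustration.

```python
# Hypothetical mini lexicon: word -> rating (positive or negative).
RATING_LEXICON = {"gut": 1.0, "lecker": 1.0, "schlecht": -1.0}
# Hypothetical modifiers: intensifiers scale the next rating, negators flip it.
INTENSIFIERS = {"sehr": 1.5, "etwas": 0.5}
NEGATORS = {"nicht", "kein"}


def rate_sentence(sentence):
    """Return one consolidated rating per rated word in the sentence."""
    ratings = []
    factor = 1.0
    for raw_token in sentence.lower().split():
        token = raw_token.strip(".,!?")
        if token in NEGATORS:
            factor *= -1.0
        elif token in INTENSIFIERS:
            factor *= INTENSIFIERS[token]
        elif token in RATING_LEXICON:
            ratings.append(factor * RATING_LEXICON[token])
            factor = 1.0  # modifiers only apply to the next rated word
        else:
            factor = 1.0  # any other word breaks the modifier chain
    return ratings


# "The food was very good." / "The service was not good."
print(rate_sentence("Das Essen war sehr gut."))
print(rate_sentence("Der Service war nicht gut."))
```

A real pipeline would work on lemmas rather than surface forms and would also map words to topics, as the list of covered topics suggests.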


Sarah Bird - The Web is Terrifying! Using the PyData stack to spy on the spies

The Web is Terrifying! Using the PyData stack to spy on the spies. [EuroPython 2018 - Talk - 2018-07-26 - PyCharm [PyData]] [Edinburgh, UK] By Sarah Bird

We all know the internet can be a scary place. In this talk I’ll focus on two ways I’ve found it positively terrifying. First, digging into tracking technologies, I have learned about the breadth and depth of ways our online activity is monitored, stored, and repackaged. Second, when starting out to learn a new skill, the tidal wave of information available online can be overwhelming.

Using the PyData stack to explore and visualize different data sources, including a new dataset from Mozilla, we’ll examine some of the many types of online tracking. My goal is to leave the audience with:
1) A sense of the breadth of tools in the PyData toolbox that can be applied to a real-world analysis
2) An understanding of a few methods of online tracking so they can be more informed internet citizens

In particular, now that the EU’s General Data Protection Regulation (GDPR) has come into force, we can explore the data in light of EU citizens’ new rights, and the new responsibilities of companies worldwide.

Along the way, I’ll also talk about becoming a software engineer, then a builder of data science tools, and my new journey into data science. Being self-taught can be lonely, scary, and full of embarrassing pitfalls. I’ll share some stories about my learning journey, and the people and resources that have supported me.


Obiamaka Agbaneje - Building a Naive Bayes Text Classifier with scikit-learn

Building a Naive Bayes Text Classifier with scikit-learn [EuroPython 2018 - Talk - 2018-07-26 - PyCharm [PyData]] [Edinburgh, UK] By Obiamaka Agbaneje

Machine learning algorithms used to classify text include Support Vector Machines and k-Nearest Neighbors, but the most popular algorithm to implement is Naive Bayes because of its simplicity; it is based on Bayes' theorem. The Naive Bayes classifier memorises the relationships between the training attributes and the outcome, and predicts by multiplying the conditional probabilities of the attributes under the assumption that the attributes are independent of each other given the outcome. It is popularly used to classify data sets that have a large number of sparse or nearly independent features, such as text documents.

In this talk, I will describe how to build a model using the Naive Bayes algorithm with the scikit-learn library, using the spam/ham YouTube comment dataset from the UCI repository. Preprocessing techniques such as text normalisation and feature extraction will also be discussed.
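To make the "multiplying conditional probabilities" step concrete, here is a minimal sketch of a multinomial Naive Bayes text classifier. It is written in plain Python rather than with scikit-learn so that it runs without extra dependencies, so it is not the code from the talk. The function names, the add-one smoothing, and the four training comments are assumptions made up for illustration; they are not taken from the UCI dataset.

```python
import math
from collections import Counter, defaultdict


def train(documents):
    """Fit word and label counts from a list of (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in documents:
        words = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(words)
        vocabulary.update(words)
    return label_counts, word_counts, vocabulary


def predict(model, text):
    """Return the label with the highest posterior score for the text."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in label_counts.items():
        # Work in log space: summing logs equals multiplying probabilities.
        score = math.log(doc_count / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up training comments, only for illustration.
model = train([
    ("check out my channel", "spam"),
    ("subscribe to my channel please", "spam"),
    ("great song love it", "ham"),
    ("this song is great", "ham"),
])
print(predict(model, "please subscribe"))
print(predict(model, "love this song"))
```

The word counting done here by `str.split` and `Counter` corresponds to the feature extraction step mentioned in the abstract.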


Matthew Honnibal - Building new NLP solutions with spaCy and Prodigy

Building new NLP solutions with spaCy and Prodigy [EuroPython 2018 - Talk - 2018-07-26 - PyCharm [PyData]] [Edinburgh, UK] By Matthew Honnibal

Commercial machine learning projects are currently like start-ups: many projects fail, but some are extremely successful, justifying the total investment. While some people will tell you to "embrace failure", I say failure sucks --- so what can we do to fight it? In this talk, I will discuss how to address some of the most likely causes of failure for new Natural Language Processing (NLP) projects. My main recommendation is to take an iterative approach: don't assume you know what your pipeline should look like, let alone your annotation schemes or model architectures. I will also discuss a few tips for figuring out what's likely to work, along with a few common mistakes. To keep the advice well-grounded, I will refer specifically to our open-source library spaCy, and our commercial annotation tool Prodigy.


Kamila Stepniowska - How can you use Open Source materials to learn Python & data science?

How can you use Open Source materials to learn Python & data science? [EuroPython 2018 - Talk - 2018-07-26 - PyCharm [PyData]] [Edinburgh, UK] By Kamila Stepniowska

Python is very often recommended as the language of choice in programming education. I can see at least two cases where this applies:
- introduction to programming, regardless of age and any previous educational experience,
- data science, where it is simply a standard,
and actually both at once: you can teach future data scientists by starting with Python.

During this talk, I would like to briefly present the Open Source Python educational materials that are available and how they are, and can be, used to teach Python and data science. PyLadies, Django Girls, Django Carrots, and the Python Software Foundation create many readily available materials. From the data science side, you have Open Source materials created by Kaggle, Minerva, GitHub repos, and many other organizations and individuals.

During this talk you will learn:
- where to find Open Source Python and data science tutorials and educational materials
- how the Python community can support your learning process
- why learning data science with Python is a good idea.

I'm also interested in learning about your educational experience with Open Source materials and with the Python community supporting your learning. If you would like to share a link or your educational story, please feel free to send me an email at kamila@stepniowski.com. If I find it interesting for the audience and you give me your permission to share it, I might use it in the talk.


Anna Veronika Dorogush - CatBoost - the new generation of Gradient Boosting

CatBoost - the new generation of Gradient Boosting [EuroPython 2018 - Talk - 2018-07-26 - PyCharm [PyData]] [Edinburgh, UK] By Anna Veronika Dorogush

Gradient boosting is a powerful machine-learning technique that achieves state-of-the-art results in a variety of practical tasks. For a number of years, it has remained the primary method for learning problems with heterogeneous features, noisy data, and complex dependencies: web search, recommendation systems, weather forecasting, and many others.

CatBoost (http://catboost.yandex) is a new open-source gradient boosting library that outperforms existing publicly available implementations of gradient boosting in terms of quality. It has a set of additional advantages:
- CatBoost is able to incorporate categorical features in your data (like music genre, URL, search query, etc.) in predictive models with no additional preprocessing. For more details on our approach please refer to our NIPS 2017 ML Systems Workshop paper (http://learningsys.org/nips17/assets/papers/paper_11.pdf).
- CatBoost inference is 20-60 times faster than in other open-source gradient boosting libraries, which makes it possible to use CatBoost for latency-critical tasks.
- CatBoost has the fastest GPU and multi-GPU training implementations of all the openly available gradient boosting libraries.
- CatBoost requires no hyperparameter tuning in order to get a model with good quality.
- CatBoost is highly scalable and can be efficiently trained using hundreds of machines.

The talk will cover a broad description of gradient boosting and its areas of usage and the differences between CatBoost and other gradient boosting libraries. We will also briefly explain the details of the proprietary algorithm that leads to a boost in quality.


Alisa Dammer - Data is not flat

Data is not flat [EuroPython 2018 - Talk - 2018-07-26 - PyCharm [PyData]] [Edinburgh, UK] By Alisa Dammer

Feature engineering and model training often go hand in hand. Some tasks have an overwhelming amount of high-dimensional data; others have little data or very low-dimensional data. This talk targets the latter problem: what can be done with the data itself to significantly improve model performance, and when manual feature engineering does make sense. A sample case of a classification problem with a neural network (NN) will be presented. The goal of the talk is to remind the audience of something every person working with data thinks about and probably uses.

The slides, a Jupyter notebook with the example, the test and train sets, and the NN configuration file are available at: https://github.com/Alisa-lisa/conferences


Vinicius Pacheco - Understanding and Applying CQRS

Understanding and Applying CQRS [EuroPython 2018 - Talk - 2018-07-26 - Smarkets] [Edinburgh, UK] By Vinicius Pacheco

Creating scalable applications involves a number of complex variables, one of which is scalability and performance in the database layer. Command Query Responsibility Segregation (CQRS) is a design pattern that helps improve performance and resilience in applications where data access is intense. In this talk, we will look at when to use CQRS and the problems it solves. We will also apply CQRS in a Python application using the Nameko framework.

The talk outline is:
○ (4 minutes) - Present a real problem of a web application where creating new instances is not a solution, because the database receives an overload of writes and reading the data becomes infeasible, collapsing the application.
○ (6 minutes) - Present the CQRS pattern conceptually and how this design pattern solves this type of problem using the structure of Command Stack and Query Stack.
○ (3 minutes) - Show Nameko as an interesting tool for applying CQRS. It will demonstrate the use of HTTP, RPC and the possibility of applying pub/sub.
○ (6 minutes) - Create (live code) the Command Stack layer using Nameko on a PostgreSQL database.
○ (5 minutes) - Create (live code) the Query Stack layer using Nameko over a MongoDB database.
○ (3 minutes) - Explain common myths and mistakes about CQRS.
○ (3 minutes) - Q & A Session
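The split into a Command Stack and a Query Stack can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the Nameko code from the talk: in-memory dictionaries stand in for the write and read databases, a direct function call stands in for pub/sub, and all class and field names (`CreateArticle`, `CommandStack`, `QueryStack`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CreateArticle:
    """A command: expresses the intent to change state."""
    article_id: int
    title: str
    author: str


class CommandStack:
    """Write side: validates commands, stores data, and emits events."""

    def __init__(self, publish):
        self._articles = {}  # stands in for the write database
        self._publish = publish  # callback that delivers events to subscribers

    def handle(self, command):
        if command.article_id in self._articles:
            raise ValueError("article already exists")
        self._articles[command.article_id] = command
        self._publish({
            "type": "article_created",
            "id": command.article_id,
            "title": command.title,
            "author": command.author,
        })


class QueryStack:
    """Read side: keeps a denormalized view shaped for one query."""

    def __init__(self):
        self._titles_by_author = {}  # stands in for the read database

    def on_event(self, event):
        if event["type"] == "article_created":
            titles = self._titles_by_author.setdefault(event["author"], [])
            titles.append(event["title"])

    def titles_by(self, author):
        return list(self._titles_by_author.get(author, []))


queries = QueryStack()
commands = CommandStack(publish=queries.on_event)
commands.handle(CreateArticle(1, "Intro to CQRS", "ana"))
commands.handle(CreateArticle(2, "Nameko basics", "ana"))
print(queries.titles_by("ana"))
```

In this sketch the read model is updated synchronously. With a real message broker the update would arrive asynchronously, so queries may briefly return stale data.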


Marc-Andre Lemburg - How to make money using Python - Unused potential in the Enterprise World

How to make money using Python - Unused potential in the Enterprise World [EuroPython 2018 - Talk - 2018-07-26 - Smarkets] [Edinburgh, UK] By Marc-Andre Lemburg

Python has gained quite some traction in the web development world and more recently as the go-to language for anything that has to do with data science. However, its use in the enterprise world of applications is rather limited. Based on the author's many years of experience working in enterprise environments, the talk will demonstrate areas in the business application space where Python has significant advantages over other languages, but which are currently dominated by applications written in Java, C++ or C#. There are huge opportunities out there for companies to excel at and use the Python advantage to their benefit. If you are looking for a lead idea to kick-start your Python business (and you have the resources to invest in marketing), this talk is for you.
