Sketching Algorithms: Making Sense of Big Data in a Single Stroke | Tun Shwe | Conf42 Python 2024
Read the abstract ➤ https://www.conf42.com/Python_2024_Tun_Shwe_sketching_algorithms_big_data
Other sessions at this event ➤ https://www.conf42.com/python2024
Join Discord ➤ https://discord.gg/DnyHgrC7jC
Support our mission ➤ https://www.conf42.com/support

Chapters
0:00 intro
0:20 preamble
0:30 hello
1:12 quix
1:52 quix streams
2:06 quix cloud
2:44 what is a sketch?
3:20 approximate answers
3:42 sketch characteristics
4:24 sketch components
5:37 why exact == slow
5:40 distributed processing
6:17 unique word count
7:19 massively parallel processing (mpp)
7:31 shuffling is slow
8:00 latency numbers every programmer should know
8:28 why sketches == fast
9:08 sketch design
9:51 sublinear data structure growth
10:21 mergability
10:30 non-additive challenges are everywhere
11:03 unique counts are non-additive
11:29 non-additive challenges solved
12:40 types of sketches
13:57 count min sketch
19:10 open source sketches
19:24 apache datasketches (java, c++, python)
22:26 datasketch extensions
23:14 thank you