Literate Statistical Programming
Abstract:
Science is facing a crisis around reproducibility, and data science is not immune. Literate Statistical Programming is a workflow that binds the code used in an analysis to the interpretation of the results. While this creates reproducibility, it also addresses issues around auditing and reusability, and allows for rapid iteration and experimentation. This talk will describe a workflow that I have successfully used on small-scale datasets in start-ups and on massive-scale problems in my work at Oracle and Amazon Alexa. The talk will cover the tooling, workflow, and philosophy you need to master Literate Statistical Programming.
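As a rough illustration of what "binding the code to the interpretation" can mean, here is a minimal sketch in plain Python. It is not taken from the talk: the function name render_report and the sample data are hypothetical, and real projects would more likely use tools such as knitr or JupyterLab (both listed in the timestamps). The point is only that the sentence reporting the result is generated from the computed values, so prose and analysis cannot drift apart.

```python
import statistics


def render_report(samples):
    """Compute summary statistics and weave them into the narrative text.

    Hypothetical helper for illustration; not an API from the talk.
    """
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)  # sample standard deviation
    # The prose is built from the computed values, so re-running the
    # analysis on new data also regenerates the interpretation.
    return (
        f"We analysed {len(samples)} observations. "
        f"The mean was {mean:.2f} with a standard deviation of {stdev:.2f}."
    )


if __name__ == "__main__":
    # Made-up measurements, used only to show the mechanism.
    data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
    print(render_report(data))
```

Changing the data and re-running the script updates every number in the sentence, which is the property a literate document relies on for reproducibility and auditing.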
Speaker bio:
John Peach - Principal Data Scientist, Oracle
"A modern polymath, John possesses a unique and diverse set of skills, knowledge, and experience. Having earned advanced degrees in Mechanical Engineering, Kinesiology and Data Science, his expertise focuses on machine learning and solutions to novel and ambiguous problems. He has a proven history of taking a problem from ideation to production by using a logical, but creative, data-driven approach. As a highly skilled Data Scientist, he has developed new techniques, led teams, developed innovative data products, and is a trusted advisor to decision-makers.
John is a natural leader who is customer-focused, an excellent communicator, and a problem-solver. He loves new challenges and opportunities. His extensive background in software development and modeling serves him well. His curiosity, creativity, focus and attention to detail have resulted in a track record of discovering hidden secrets in data.
As a Sr. Applied Data Scientist at Amazon, John led the Alexa Skill Store Science team. He worked closely with engineering to build systems that enabled Alexa customers to engage with third-party applications (skills). He built machine learning models for skill arbitration, entity resolution, search, and personalization.
Currently, John is a Principal Data Scientist at Oracle. He works on the Data Science service as part of the Oracle Cloud Infrastructure team. Leveraging his extensive hands-on experience building machine learning models, he is now defining the tooling to improve the data science workflow. This interest grew out of the challenges that he and his team members have faced working with data at scale in a logical, rigorous and reproducible way.
John has fostered the growth of scientists by starting the Amazon Machine Learning University in Irvine and the Alexa-wide Data Science Excellence program. He frequently gives talks at universities and conferences. He is working to improve upon and formalize data science best practices. His focus has been on reproducible research. To that end, he has developed an approach to improve data validation and reliability by using data unit tests. He has also developed the Data Science Design Thinking concept to formalize and increase the efficiency of the analysis process. He also coordinates the largest R meetup group in Southern California (OCRUG)."
If you enjoyed this talk, visit us at https://mlopsworld.com/ and come participate in our next gathering!
Would you like to receive email summaries of these talks? Join our newsletter FREE here: http://bit.ly/MLOps_Summaries
Timestamps:
0:00 Intro
0:11 Introduction of the host
2:25 Introduction of the speaker
4:40 Overview
8:28 The Tooling Problem
9:02 Solution
9:16 Literate Statistical Programming
12:04 Literate Programming - R
13:10 Literate Programming - Python
14:36 Differences between LP and LSP
14:49 What are the end Products
17:51 State of the Art
19:09 Literate Stat Programming - Knitr
23:54 Literate Stat Programming - JupyterLab
24:57 Key Issues
25:18 The Fragmentation Problem
25:58 Fragmentation - LSP Tooling
28:02 Non-linear Workflow
28:36 Non-linear - LSP Tooling
30:44
31:07 Rework of Analysis
33:29 Rework - LSP Tooling
34:36 Collaboration Challenges
34:58 Collaboration - LSP Tooling
35:27 Pipelining Activity
37:31 Pipelining - LSP Tooling
38:42 Summary
❓ Q&A ❓
40:11 Could you elaborate on institutional memory style?
41:09 Is there an automated method or option to tangle code with Rmd?
42:10 What do you recommend when you are working with models which have different input formats?
43:02 How do unit tests integrate with LP and LSP?
44:51 When referencing a database, should your Rmd reference the parameters of the query?
46:00 Closing remarks