Python Tutorial : Filter and Visualize

Uploaded: 2023-12-02T06:46:24+03:00

Want to learn more? Take the full course at https://learn.datacamp.com/courses/exploratory-data-analysis-in-python at your own pace. More than a video, you'll learn hands-on coding & quickly apply skills to your daily work.

---

Let's get back to the motivating question for this chapter: what is the average birth weight for babies in the U.S.?

In the previous lesson, we used data from the NSFG to compute birth weight in pounds and we stored the result in a Series called birth_weight.

Let's see what the distribution of those values looks like. We'll use the pyplot submodule from the matplotlib visualization library, which we import as plt.

Pyplot provides hist(), which takes a Series and plots a histogram; that is, it shows the values and how often they appear.

However, hist() doesn't handle NaNs, so we first call dropna(), which returns a new Series containing only the valid values.
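Here is a minimal sketch of what dropna() does. It uses a small made-up Series in place of the real NSFG birth_weight values; the numbers are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for birth_weight: weights in pounds, two of them missing
birth_weight = pd.Series([7.5, np.nan, 8.1, 6.9, np.nan, 5.2])

# dropna() returns a NEW Series with only the valid (non-NaN) values
valid = birth_weight.dropna()

print(len(birth_weight))      # 6: the original Series is left unchanged
print(len(valid))             # 4
print(valid.index.tolist())   # [0, 2, 3, 5]: the surviving rows keep their index labels
```

Because dropna() returns a copy rather than modifying birth_weight, it can be passed straight into hist() without changing the data we use later.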

The second argument, bins=30, tells hist() to divide the range of weights into 30 equal-width intervals, called "bins", and count how many values fall in each bin.
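To see the counting step without drawing anything, the sketch below uses NumPy's np.histogram(), which, as far as we know, is what matplotlib's hist() uses internally to bin the data. The weights are randomly generated stand-ins, not NSFG data.

```python
import numpy as np

# Hypothetical weights in pounds (not NSFG data), drawn from a normal distribution
rng = np.random.default_rng(0)
weights = rng.normal(loc=7.3, scale=1.3, size=1000)

# bins=30 splits the range [min, max] into 30 equal-width intervals
counts, edges = np.histogram(weights, bins=30)

print(len(counts))    # 30: one count per bin
print(len(edges))     # 31: 30 bins need 31 boundaries
print(counts.sum())   # 1000: every value falls in exactly one bin
```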

hist() takes other arguments that specify the type and appearance of the histogram; you will have a chance to explore these options in the next exercise.

To label the axes we'll use xlabel() and ylabel(), and finally, to display the plot, we'll use plt.show().
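Putting those steps together, a sketch of the plotting code might look like this. Since the NSFG data isn't loaded here, birth_weight is a hypothetical Series of random weights with some NaNs mixed in, and the axis label strings are our own choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the NSFG birth_weight Series (pounds)
rng = np.random.default_rng(1)
values = rng.normal(loc=7.3, scale=1.3, size=500)
values[::50] = np.nan  # mark every 50th entry as missing (10 NaNs)
birth_weight = pd.Series(values)

# Drop the NaNs, then count the values in 30 bins and draw the histogram
counts, edges, patches = plt.hist(birth_weight.dropna(), bins=30)

plt.xlabel("Birth weight (lb)")
plt.ylabel("Number of births")
plt.show()
```

plt.hist() also returns the bin counts and edges, which is handy for checking that no values were lost: here the counts add up to the 490 non-missing values.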

And here's what the results look like.

The x-axis is birth weight in pounds, divided into 30 bins. The y-axis is the number of births in each bin.

The distribution looks a little like a bell curve, but the tail is longer on the left than on the right; that is, there are more light babies than heavy babies.

That makes sense because the distribution includes some babies that were born preterm. The most common duration for pregnancy is 39 weeks, which is "full term"; "preterm" is usually defined to be less than 37 weeks.

To see which babies are preterm, we can use the prglngth column, which records pregnancy length in weeks.

When you compare a Series to a value, the result is a Boolean Series; that is, each element is a Boolean value, True or False. In this case, it's True for each preterm baby and False otherwise. We can use head() to see the first 5 elements.
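A small sketch of the comparison, using made-up pregnancy lengths in place of the real prglngth column:

```python
import pandas as pd

# Hypothetical pregnancy lengths in weeks (stand-in for the prglngth column)
prglngth = pd.Series([39, 36, 40, 33, 38, 39, 28])

# Comparing a Series to a value is done element by element and yields a Boolean Series
preterm = prglngth < 37

print(preterm.dtype)             # bool
print(preterm.head().tolist())   # [False, True, False, True, False]
```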

If you compute the sum of a Boolean Series, it treats True as 1 and False as 0, so the sum is the number of Trues, which is the number of preterm babies, about 3700.

If you compute the mean, you get the fraction of Trues; in this case, it's close to 0.4; that is, about 40% of the births in this dataset are preterm.
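With a toy Series, the sum and mean can be checked by hand. These eight pregnancy lengths are invented for illustration; three of them are under 37 weeks.

```python
import pandas as pd

# Hypothetical pregnancy lengths in weeks (stand-in for the prglngth column)
prglngth = pd.Series([39, 36, 40, 33, 38, 39, 28, 41])
preterm = prglngth < 37  # True for 36, 33 and 28

# sum() treats True as 1 and False as 0, so it counts the Trues
print(preterm.sum())    # 3

# mean() divides that count by the length, giving the fraction of Trues
print(preterm.mean())   # 0.375, i.e. 3 / 8
```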

We can use a Boolean Series as a filter; that is, we can select only rows that satisfy a condition or meet some criterion.

For example, we can use preterm and the bracket operator to select values from birth_weight, so preterm_weight contains birth weights for preterm babies.
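A sketch of filtering with the bracket operator. The two Series are hypothetical and share the same index, one row per birth, as the NSFG columns would.

```python
import pandas as pd

# Hypothetical aligned data: same index in both Series, one entry per birth
prglngth = pd.Series([39, 36, 40, 33, 38])
birth_weight = pd.Series([7.6, 5.9, 8.2, 4.8, 7.1])

preterm = prglngth < 37

# The bracket operator keeps only the rows where preterm is True
preterm_weight = birth_weight[preterm]

print(preterm_weight.tolist())         # [5.9, 4.8]
print(preterm_weight.index.tolist())   # [1, 3]
```

pandas matches the Boolean Series to birth_weight by index label, which is why both Series need to describe the same rows.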

To select full-term babies, we can use the tilde operator, ~, which is "logical NOT" or inverse; it turns every True into False and every False into True.

Not surprisingly, full-term babies are heavier, on average, than preterm babies.
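Continuing with the same kind of made-up data, ~ selects the complementary rows, and we can compare the two means. With these invented numbers the full-term mean happens to be higher too, but that is a property of the toy data, not evidence about the NSFG.

```python
import pandas as pd

# Hypothetical aligned data (same index in both Series)
prglngth = pd.Series([39, 36, 40, 33, 38])
birth_weight = pd.Series([7.6, 5.9, 8.2, 4.8, 7.1])
preterm = prglngth < 37

# ~ flips each element of a Boolean Series: True becomes False and vice versa
full_term_weight = birth_weight[~preterm]
preterm_weight = birth_weight[preterm]

print(full_term_weight.tolist())   # [7.6, 8.2, 7.1]
print(full_term_weight.mean() > preterm_weight.mean())  # True for this toy data
```

Note that ~ acts as logical NOT only on Boolean Series; on an integer Series it is bitwise NOT.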

If you have two Boolean Series, you can use logical operators to combine them; the ampersand, &, is the logical AND operator, and the vertical bar or pipe, |, is logical OR.
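A sketch of combining two conditions. The "light" threshold of 5.5 pounds is an arbitrary value chosen for illustration, and the data is made up.

```python
import pandas as pd

# Hypothetical aligned data (same index in both Series)
prglngth = pd.Series([39, 36, 40, 33, 38])
birth_weight = pd.Series([7.6, 5.9, 8.2, 4.8, 7.1])

preterm = prglngth < 37
light = birth_weight < 5.5  # arbitrary threshold, for illustration only

# & is element-wise AND, | is element-wise OR
both = preterm & light
either = preterm | light

# When written inline, each comparison needs parentheses,
# because & and | bind more tightly than <
inline = (prglngth < 37) & (birth_weight < 5.5)

print(both.tolist())     # [False, False, False, True, False]
print(either.tolist())   # [False, True, False, True, False]
```

Python's own `and` and `or` keywords don't work here: applied to a Series, they raise a ValueError because the truth value of a whole Series is ambiguous.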

There's one more thing we have to do before we can answer our question: resampling.

The NSFG is not exactly representative of the U.S. population; by design, some groups are more likely to appear in the sample than others; they are "oversampled". Oversampling helps to ensure that you have enough people in every subgroup to get reliable statistics, but it makes the analysis a little more complicated.

However, we can correct for oversampling by resampling. I won't get into the details here, but I have provided a function called resample_rows_weighted() that you can use for the exercises. If you are interested in learning more about resampling, check out DataCamp's statistics courses.
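The course's resample_rows_weighted() isn't shown here, so the following is only a guess at the general idea, not its actual implementation. It draws rows with replacement, picking each row with probability proportional to a weight column, using pandas' DataFrame.sample(). The function name resample_rows_weighted_sketch, the column name wgt, and the data are all hypothetical.

```python
import pandas as pd

def resample_rows_weighted_sketch(df, column="wgt", seed=None):
    """Hypothetical sketch: resample the rows of df with replacement,
    choosing each row with probability proportional to df[column]."""
    return df.sample(n=len(df), replace=True, weights=column, random_state=seed)

# Made-up sample: group "b" is oversampled, so each of its rows gets a smaller
# weight (stands for fewer people in the population) than the row for group "a"
df = pd.DataFrame({
    "group": ["a", "b", "b", "b"],
    "wgt": [3.0, 1.0, 1.0, 1.0],
})

sample = resample_rows_weighted_sketch(df, seed=17)
print(len(sample))  # 4: same number of rows as the input
```

In the resampled data, group "a" rows show up about as often as group "b" rows combined, which undoes the oversampling in this toy example.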

Now we have everything we need to answer the motivating question. Let's get to it.

#DataCamp #PythonTutorial #ExploratoryDataAnalysisinPython

The video was taken from publicly available Rutube sources. If you are the copyright holder, please contact the original source.