Statistical Analysis with R for Public Health Specialization
Master Statistics for Public Health and Learn R. Develop your statistical thinking skills and learn key data analysis methods through R
About this Specialization
Statistics are everywhere. The probability that it will rain today. Trends over time in unemployment rates. The odds that India will win the next cricket world cup. In sports like football, they started out as a bit of fun but have grown into big business. Statistical analysis also has a key role in medicine, not least in the broad and core discipline of public health.
In this specialization, you’ll take a peek at what medical research is and how – and indeed why – you turn a vague notion into a scientifically testable hypothesis. You’ll learn about key statistical concepts like sampling, uncertainty, variation, missing values, and distributions. Then you’ll get your hands dirty with analyzing data sets covering some big public health challenges – fruit and vegetable consumption and cancer, risk factors for diabetes, and predictors of death following heart failure hospitalization – using R, one of the most widely used and versatile free software packages around.
This specialization consists of four courses – statistical thinking, linear regression, logistic regression, and survival analysis – and is part of our upcoming Global Master in Public Health (GMPH) degree, which is due to start in September 2019.
The specialization can be taken independently of the GMPH and will assume no knowledge of statistics or R software. You just need an interest in medical matters and quantitative data.
Welcome to Introduction to Statistics & Data Analysis in Public Health! This course will teach you the core building blocks of statistical analysis – types of variables, common distributions, hypothesis testing – but, more than that, it will enable you to take a data set you’ve never seen before, describe its key features, get to know its strengths and quirks, run some vital basic analyses, and then formulate and test hypotheses based on means and proportions. You’ll then have a solid grounding to move on to more sophisticated analysis and take the other courses in the series. You’ll learn the popular, flexible and completely free software R, used by statistics and machine learning practitioners everywhere. It’s hands-on, so you’ll first learn how to phrase a testable hypothesis via examples of medical research as reported by the media. Then you’ll work through a data set on fruit and vegetable eating habits: data that are realistically
Welcome to Linear Regression in R for Public Health! Public Health has been defined as “the art and science of preventing disease, prolonging life and promoting health through the organized efforts of society”. Knowing what causes disease and what makes it worse are clearly vital parts of this. This requires the development of statistical models that describe how patient and environmental factors affect our chances of getting ill. This course will show you how to create such models from scratch, beginning by introducing you to the concepts of correlation and linear regression before walking you through importing and examining your data, and then showing you how to fit models. Using the example of respiratory disease, these models will describe how patient and other factors affect outcomes such as lung function. Linear regression is one of a family of regression models, and the other courses in this series will cover two further members. Regression models have many things in common with each other, though the mathematical details differ. This course will show you how to prepare the data, assess how well the model fits the data, and test its underlying assumptions – vital tasks with any type of regression. You will use the free and versatile software package R, used by statisticians and data scientists in academia, governments, and industry worldwide.
Welcome to Logistic Regression in R for Public Health! Why logistic regression for public health rather than just logistic regression? Well, there are some particular considerations for every data set, and public health data sets have particular features that need special attention. In a word, they’re messy. Like the others in the series, this is a hands-on course, giving you plenty of practice with R on real-life, messy data, with predicting who has diabetes from a set of patient characteristics as the worked example for this course. Additionally, the interpretation of the outputs from the regression model can differ depending on the perspective that you take, and public health doesn’t just take the perspective of an individual patient but must also consider the population angle. That said, much of what is covered in this course is true for logistic regression when applied to any data set, so you will be able to apply the principles of this course to logistic regression more broadly too. By the end of this course, you will be able to: explain when it is valid to use logistic regression; define odds and odds ratios; run simple and multiple logistic regression analysis in R and interpret the output; evaluate the model assumptions for multiple logistic regression in R; and describe and compare some common ways to choose a multiple regression model. This course builds on skills such as hypothesis testing, p values, and how to use R, which are covered in the first two courses of the Statistics for Public Health specialization. If you are unfamiliar with these skills, we suggest you review Statistical Thinking for Public Health and Linear Regression for Public Health before beginning this course. If you are already familiar with these skills, we are confident that you will enjoy furthering your knowledge and skills in Statistics for Public Health: Logistic Regression for Public Health. We hope you enjoy the course!
Welcome to Survival Analysis in R for Public Health! The three earlier courses in this series covered statistical thinking, correlation, linear regression