An Introduction to Data Science is an easy-to-read, gentle introduction to data science for advanced undergraduate, certificate, and graduate students from a wide range of backgrounds. After introducing the basic concepts of data science, the book builds on these foundations to explain data science techniques from the ground up, using the R programming language and RStudio®. Short chapters allow instructors to group concepts together for a semester course and give students a manageable amount of information for each concept. By taking students systematically through the R programming environment, the book takes the fear out of data science and familiarizes students with the environment so they can succeed when performing advanced functions.

 

The authors cover statistics from a conceptual standpoint, focusing on how to use and interpret statistics, rather than the math behind the statistics. This text then demonstrates how to use data effectively and efficiently to construct models, predict outcomes, visualize data, and make decisions. Accompanying digital resources provide code and datasets for instructors and learners to perform a wide range of data science tasks.  



 
Preface
 
About the Authors
 
Introduction: Data Science, Many Skills
What Is Data Science?

 
The Steps in Doing Data Science

 
The Skills Needed to Do Data Science

 
 
Chapter 1 • About Data
Storing Data—Using Bits and Bytes

 
Combining Bytes Into Larger Structures

 
Creating a Data Set in R

 
 
Chapter 2 • Identifying Data Problems
Talking to Subject Matter Experts

 
Looking for the Exception

 
Exploring Risk and Uncertainty

 
 
Chapter 3 • Getting Started With R
Installing R

 
Using R

 
Creating and Using Vectors

 
 
Chapter 4 • Follow the Data
Understand Existing Data Sources

 
Exploring Data Models

 
 
Chapter 5 • Rows and Columns
Creating Dataframes

 
Exploring Dataframes

 
Accessing Columns in a Dataframe

 
 
Chapter 6 • Data Munging
Reading a CSV Text File

 
Removing Rows and Columns

 
Renaming Rows and Columns

 
Cleaning Up the Elements

 
Sorting Dataframes

 
 
Chapter 7 • Onward With RStudio®
Using an Integrated Development Environment

 
Installing RStudio

 
Creating R Scripts

 
 
Chapter 8 • What’s My Function?
Why Create and Use Functions?

 
Creating Functions in R

 
Testing Functions

 
Installing a Package to Access a Function

 
 
Chapter 9 • Beer, Farms, and Peas and the Use of Statistics
Historical Perspective

 
Sampling a Population

 
Understanding Descriptive Statistics

 
Using Descriptive Statistics

 
Using Histograms to Understand a Distribution

 
Normal Distributions

 
 
Chapter 10 • Sample in a Jar
Sampling in R

 
Repeating Our Sampling

 
Law of Large Numbers and the Central Limit Theorem

 
Comparing Two Samples

 
 
Chapter 11 • Storage Wars
Importing Data Using RStudio

 
Accessing Excel Data

 
Accessing a Database

 
Comparing SQL and R for Accessing a Data Set

 
Accessing JSON Data

 
 
Chapter 12 • Pictures Versus Numbers
A Visualization Overview

 
Basic Plots in R

 
Using ggplot2

 
More Advanced ggplot2 Visualizations

 
 
Chapter 13 • Map Mashup
Creating Map Visualizations With ggplot2

 
Showing Points on a Map

 
A Map Visualization Example

 
 
Chapter 14 • Word Perfect
Reading in Text Files

 
Using the Text Mining Package

 
Creating Word Clouds

 
 
Chapter 15 • Happy Words?
Sentiment Analysis

 
Other Uses of Text Mining

 
 
Chapter 16 • Lining Up Our Models
What Is a Model?

 
Linear Modeling

 
An Example—Car Maintenance

 
 
Chapter 17 • Hi Ho, Hi Ho—Data Mining We Go
Data Mining Overview

 
Association Rules Data

 
Association Rules Mining

 
Exploring How the Association Rules Algorithm Works

 
 
Chapter 18 • What’s Your Vector, Victor?
Supervised and Unsupervised Learning

 
Supervised Learning via Support Vector Machines

 
Support Vector Machines in R

 
 
Chapter 19 • Shiny® Web Apps
Creating Web Applications in R

 
Deploying the Application

 
 
Chapter 20 • Big Data? Big Deal!
What Is Big Data?

 
The Tools for Big Data

 
 
Index

Supplements

Student Study Site
    • Lab and homework assignments accompany chapter material and are downloadable as R source code.
    • R Code from the book, available as an R source file.
    • Multimedia content includes links to YouTube videos showing demos of using R, audio, data, and web resources.
Instructor Resource Site

Password-protected Instructor Resources include the following:

 

    • Editable, chapter-specific Microsoft® PowerPoint® slides offer you complete flexibility in easily creating a multimedia presentation for your course. Highlight essential content and features.
    • Lab and homework assignments and their solutions accompany chapter material and are downloadable as R source code.
    • R Code from the book, available as an R source file
    • Multimedia content includes links to YouTube videos showing demos of using R, audio, data, and web resources that appeal to students with different learning styles and prompt classroom discussion.
Key features

  • Students cement their knowledge of data science with learning objectives, chapter challenge exercises, R code examples throughout, basic summaries of statistics, and a companion site with digital resources and code.
  • The book uses free and open-source R and RStudio® with real data examples to illustrate both the challenges of data science and the techniques used to address those challenges.
  • Examples with real data make the book meaningful for readers.

Sample Materials & Chapters

Introduction

Chapter 6

