Advanced Data Analysis

Course #OS4106

Starts: not available

Est. completion: 3 months

Offered through Distance Learning

Avg. tuition cost per course: For the specific tuition costs of each program, or for contact information, please contact the NPS Tuition Office at tuition@nps.edu.

Service obligation: Officers accepting orders to a Graduate Education Program (GEP) are obligated to serve on active duty after completion.

Questions? Reach out directly:

NPS Online Student Support

online@nps.edu

Overview

This course moves beyond the ordinary linear model to other types of statistical models that are appropriate in different circumstances. Students are first introduced to supervised models, including logistic regression and generalized linear models (GLMs). The importance of complexity control and of dividing data into a training set and a test set is emphasized. Non-parametric models are introduced through classification and regression trees, and methods for assessing classification performance are discussed. Unsupervised models, including clustering and principal components analysis, are then presented. Throughout the course, examples are drawn from practical experience conducting research and solving problems for Navy and DoD customers.
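The training-set/test-set division and classification performance assessment can be illustrated with a minimal sketch in plain Python (no third-party libraries). The sketch fits a logistic regression by batch gradient descent on one portion of the data and scores it only on rows it never saw during fitting. The synthetic data, function names, 70/30 split, learning rate, and optional `l2` ridge penalty (a simple form of complexity control) are all hypothetical choices made for illustration.

```python
import math
import random


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_data(n: int, seed: int = 0):
    # Hypothetical synthetic data: two features; the label is 1 when x1 + x2 > 0.
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = (rng.uniform(-3, 3), rng.uniform(-3, 3))
        data.append((x, 1 if x[0] + x[1] > 0 else 0))
    return data


def train_test_split(data, test_frac=0.3, seed=0):
    # Shuffle a copy, then hold out the last test_frac of the rows for evaluation.
    rows = data[:]
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_frac))
    return rows[:cut], rows[cut:]


def fit_logistic(train, lr=0.1, epochs=200, l2=0.0):
    # Batch gradient descent on the mean log-loss; l2 > 0 adds a ridge penalty
    # on the weights (not the intercept).
    w = [0.0, 0.0]
    b = 0.0
    n = len(train)
    for _ in range(epochs):
        gw = [0.0, 0.0]
        gb = 0.0
        for (x1, x2), y in train:
            err = sigmoid(w[0] * x1 + w[1] * x2 + b) - y
            gw[0] += err * x1
            gw[1] += err * x2
            gb += err
        w[0] -= lr * (gw[0] / n + l2 * w[0])
        w[1] -= lr * (gw[1] / n + l2 * w[1])
        b -= lr * gb / n
    return w, b


def confusion(rows, w, b, threshold=0.5):
    # Count true/false positives and negatives at the given probability threshold.
    tp = fp = tn = fn = 0
    for (x1, x2), y in rows:
        pred = 1 if sigmoid(w[0] * x1 + w[1] * x2 + b) >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


train, test = train_test_split(make_data(300))
w, b = fit_logistic(train)
# Metrics are computed on the held-out test rows only.
tp, fp, tn, fn = confusion(test, w, b)
accuracy = (tp + tn) / len(test)
precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0
print(f"test accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```

Scoring on the held-out rows, rather than on the training rows, gives an estimate of how the model performs on new data. Raising `l2` shrinks the weights, which trades some fit on the training data for a simpler model.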

Included in Degrees & Certificates

  • 268
  • 367

Prerequisites

  • OA3103

Learning Outcomes

Upon successful completion of this course, you should be able to:
• Distinguish between supervised and unsupervised methods
• Implement linear regression models
• Implement logistic regression models
• Implement random forest models
• Utilize regularization (Ridge, Lasso, ElasticNet)
• Distinguish between classification and regression
• Define and distinguish various classification metrics
• Utilize validation techniques to assess model performance and avoid overfitting
• Implement clustering models
• Reduce the dimensionality of your data by using principal component analysis
• Utilize exponential smoothing and ARIMA models for time series data
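Of the time-series methods listed above, simple exponential smoothing is the easiest to show in a few lines. The minimal plain-Python sketch below applies the level recursion $s_t = \alpha x_t + (1 - \alpha) s_{t-1}$ and uses the final level as a flat forecast for every future horizon. The function name, the initialisation $s_0 = x_0$, and the example series are assumptions made for illustration. Trend and seasonal variants (Holt, Holt-Winters) and ARIMA models are not shown.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level at each time step.

    Level recursion: s_t = alpha * x_t + (1 - alpha) * s_{t-1},
    initialised here (an assumption) with s_0 = x_0.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    if not series:
        raise ValueError("series must be non-empty")
    level = series[0]
    fitted = [level]
    for x in series[1:]:
        # Blend the new observation with the previous level.
        level = alpha * x + (1 - alpha) * level
        fitted.append(level)
    return fitted


series = [10, 12, 11, 13]  # hypothetical example data
fitted = simple_exponential_smoothing(series, alpha=0.5)
forecast = fitted[-1]  # flat forecast: every future horizon gets the last level
print(fitted, forecast)
```

A larger `alpha` makes the level track recent observations more closely. A smaller one smooths out more of the noise.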