In Week 7, we will have one more lecture and then the exam.
W7-1 Regression Model Diagnostics: PPT
- DA Worksheet #15: Regression Model Diagnostics (Solution; Python Solution)
- ROC & AUC (Wilber, Amazon Machine Learning University)
- Equality of Odds (Wilber, Amazon Machine Learning University)
- Reading: Transformation & Interaction
- Reading: Influential Points (Outliers)
- Reading: Regression Pitfalls
- Reading: Categorical Predictors
- Reading: Nonlinear Regression
- Khan Academy module on Non-Linear Regression
- Reading: 18 Types of Regression and When to Use Them
DA Exam Review (Optional)
In Week 8 we will have our final team project presentations (schedule).
Team Project (detailed instructions for the final report available here)
Each team will also give a 10-minute final project presentation. Please confirm your presentation schedule with the instructor. You will also complete the final team project report.
Click here for a sample report of descriptive analytics
Click here for a sample research study on prescriptive analytics
Click here for a sample student paper on predictive analytics
Click here for a sample student paper (it's a research paper)
This is a former student's research paper so you can see what a proper APA style paper looks like. His hypothesis statements follow the format that I expect from you. You should also see how he constructed his literature review, presented analysis results, drew conclusions, discussed limitations, etc.
None of these papers is exactly what you should do for the Team Project, but they are good models for parts of your project. We won't necessarily use the exact analyses presented in some of the papers (e.g., we won't be running advanced machine learning models as presented in the sample prescriptive analytics paper by two Stanford students). Nevertheless, these papers give you a more concrete idea of what a good analytics report should look like and what you are expected to produce.
The Mason School's business library offers a number of databases that contain datasets. See the librarian's presentation for details.
All of these (and more) can be found on the W&M Business Library A-Z Databases page. Feel free to contact the library for support.
You can also use datasets you have collected on your own, or from your prior or current workplace(s) (with permission).
Your team will submit TP01, which includes an Introduction (i.e., the team project topic) and a dataset for instructor approval. The dataset should consist of rows of raw records, as opposed to summary statistics. Each row should represent an individual record, such as a person, a property, a product, a brand, etc. Your final dataset should consist of at least 200 cases, 3 numeric variables and 3 categorical variables from the last 10 years. See team project instructions here. See team project presentation instructions here.
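Before submitting a dataset for approval, a quick self-check against the size requirements above may save a round trip. The sketch below is one possible way to do that with the Python standard library; it is not part of the official instructions. The function names (`check_dataset`, `load_rows`) and the sample column names are hypothetical, and it assumes that "at least" applies to all three thresholds and that a column counts as numeric when every non-empty value parses as a number.

```python
import csv
import io

# Thresholds from the TP01 description; treating each one as a minimum is an assumption.
MIN_ROWS = 200
MIN_NUMERIC = 3
MIN_CATEGORICAL = 3


def is_number(text):
    """Return True if the text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def check_dataset(rows):
    """Summarize a list of dict records against the minimum size requirements."""
    columns = [c for c in rows[0] if c is not None] if rows else []
    numeric, categorical = [], []
    for col in columns:
        # Keep only non-empty cell values for this column.
        values = [(row.get(col) or "").strip() for row in rows]
        values = [v for v in values if v]
        if not values:
            continue  # an entirely empty column counts as neither type
        if all(is_number(v) for v in values):
            numeric.append(col)
        else:
            categorical.append(col)
    return {
        "n_rows": len(rows),
        "numeric": numeric,
        "categorical": categorical,
        "meets_minimums": (
            len(rows) >= MIN_ROWS
            and len(numeric) >= MIN_NUMERIC
            and len(categorical) >= MIN_CATEGORICAL
        ),
    }


def load_rows(csv_text):
    """Parse CSV text (with a header row) into a list of dict records."""
    return list(csv.DictReader(io.StringIO(csv_text)))


# Usage with a small synthetic sample (made-up values, for illustration only).
header = "price,sqft,age,city,type,condition\n"
body = "\n".join(
    f"{100000 + i},{900 + i},{i % 40},Town{i % 5},House,Good" for i in range(200)
)
report = check_dataset(load_rows(header + body))
print(report["n_rows"], report["numeric"], report["categorical"], report["meets_minimums"])
```

This check is deliberately crude. Coded identifiers such as ZIP codes will be counted as numeric even though they are really categorical, and the "last 10 years" requirement is not checked at all, so you should still review the variable list by hand.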
Government/Economics
OpenSanctions: An international database of persons and companies of political, criminal, or economic interest
Food & Drinks
Automobile Dataset (kaggle.com)
FRED Producer Price Index by Commodity: Real Estate Services (Partial): Office Buildings, Gross Rents
Transportation
Aviation Safety (National Transportation Safety Board) (Python tutorial by Bamboo Weekly)
AirBnB in New York City from Tableau Resources: Sample Data
Zillow Prize: Zillow's Home Value Prediction (Zestimate) (Kaggle)
Government/Economics
Household Debt & Credit by Federal Reserve Bank of New York (Python tutorial by Bamboo Weekly)
US Census: Business & Economy
States, counties and cities now often have datasets publicly available (e.g., Transparent California)
Government datasets from Tableau Resources: Sample Data
Energy
Technology
Venture Investment
Global Entrepreneurship Monitor (Python tutorial by Bamboo Weekly)
Startup venture funding dataset from Tableau Resources: Sample Data
Kickstarter.com
Indiegogo.com
Yahoo! Finance
Retail
Retail datasets from Sam's Club, Dyson Foods, etc. (request for access is required)
Telework or work at home for pay
Finance
Simulated dataset from JP Morgan AI Research (request for access is required)
Fannie Mae Single-Family Loan Performance Data