Week 8: Project & Exam

DA Exam Review (Optional)

In Week 8, we will have our final team project presentations (schedule).

Team Project 
Each team will give a 10-minute final project presentation. Please confirm your presentation schedule with the instructor. Each team will also complete the final team project report.
  • Detailed instructions for the final report are available here.
  • Detailed team project presentation instructions will be available here.
Click here for a sample report of descriptive analytics.
Click here for a sample research study on prescriptive analytics.
Click here for a sample student paper on predictive analytics.
Click here for a sample student paper (a research paper).

This is a former student's research paper, so you can see what a proper APA-style paper looks like. His hypothesis statements follow the format that I expect from you. You should also see how he constructed his literature review, presented analysis results, drew conclusions, discussed limitations, etc.

None of these papers is exactly what you should do for the Team Project, but they are good models for parts of your project. We won't necessarily use the exact analyses presented in some of the papers. For example, we won't be running advanced machine learning models as presented in the sample prescriptive analytics paper by two Stanford students. Nevertheless, these papers give you a more concrete idea of what a good analytics report should look like and what you are expected to produce.

Below is a sample list of potential data sources for your team project, organized by 14 industry sectors. If you're getting data from Kaggle, use the train (not test) dataset. You should also feel free to search for datasets using Google's dataset search engine.

Mason School's business library offers a number of databases that contain datasets; see the librarian's presentation for details. All of these (and more) can be found on the W&M Business Library A-Z Databases page. Feel free to contact the library for support.

You can also use datasets you have collected on your own, or from your prior/current workplace(s) (with permission).

You can use datasets that are synthetically generated. You can even create your own synthetic datasets with tools like gretel.ai, or any of the general-purpose genAI models. Just be sure to properly disclose how the dataset was created in your writing and presentation.

Your team will submit TP01, which includes an Introduction (i.e., your team project topic) and a dataset for instructor approval. The dataset should consist of rows of raw records, as opposed to summary statistics. Each row should represent an individual record, such as a person, a property, a product, a brand, etc. Your final dataset should consist of at least 200 cases, 3 numeric variables, and 3 categorical variables, all from the last 10 years.
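
If it helps, the size requirements above can be checked quickly before you submit. The following is only a minimal sketch in Python (standard library only), not an official grading tool. It assumes your dataset is a CSV with a header row. The function names, the `max_levels` cutoff, and the rule used to decide whether a column is numeric or categorical are all illustrative assumptions, and the sketch does not check the 10-year requirement.

```python
import csv
import io


def is_number(value):
    """Return True if the string parses as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def summarize_dataset(csv_text, max_levels=20):
    """Count rows, numeric columns, and categorical columns in CSV text.

    Heuristic (an assumption, not a course rule): a column is numeric if
    every non-empty value parses as a number; otherwise it is categorical
    if it has at most `max_levels` distinct non-empty values.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    columns = reader.fieldnames or []
    numeric, categorical = [], []
    for col in columns:
        # Keep only non-empty cells; short rows yield None for missing cells.
        values = [r[col].strip() for r in rows if r[col] and r[col].strip()]
        if not values:
            continue
        if all(is_number(v) for v in values):
            numeric.append(col)
        elif len(set(values)) <= max_levels:
            categorical.append(col)
    return {"rows": len(rows), "numeric": numeric, "categorical": categorical}


def meets_tp01_minimums(summary):
    """Check the stated minimums: 200 cases, 3 numeric, 3 categorical."""
    return (
        summary["rows"] >= 200
        and len(summary["numeric"]) >= 3
        and len(summary["categorical"]) >= 3
    )
```

To use it, read your file into a string (for example, `open("my_data.csv", encoding="utf-8").read()`, where the filename is a placeholder) and pass it to `summarize_dataset`. Keep in mind that the heuristic will count number-coded categories (such as ZIP codes) as numeric and will skip free-text columns with many distinct values, so review the result by hand.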

Food & Drinks
FRED Producer Price Index by Commodity: Real Estate Services (Partial): Office Buildings, Gross Rents

Government/Economics
OpenSanctions: An international database of persons and companies of political, criminal, or economic interest
Household Debt & Credit by Federal Reserve Bank of New York (Python tutorial by Bamboo Weekly)

Energy

Technology