Supporting Documentation Page
DB Schemas and File Formats
The schema is still a work in progress as we look for a happy medium between supporting the recommender system and supporting user interactions. Currently, however, we have three simple tables: a user table, a recipe table and a reviews table. The user table stores email, name, password (salted and hashed), liked recipes and id. The recipe table stores ingredients, title, id, nutrition and servings. The reviews table stores rating, id, user id and text review. The examples below also include fields not listed here (foodAllergies and mealPlan for users; prepMinutes, cookMinutes and readyMinutes for recipes).
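As a rough illustration, the three tables can be modelled as plain Python dataclasses. This is a sketch only: the field names follow the list above and the examples below, while the Python types and the class names (`UserRecord`, `RecipeRecord`, `ReviewRecord`) are our assumptions, not the actual DB schema.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class UserRecord:
    """One row of the user table (types are assumed)."""
    id: str
    email: str
    name: str
    hashed_password: str  # salted and hashed, never the plain password
    liked_recipes: List[str] = field(default_factory=list)  # recipe ids as strings, as in the example


@dataclass
class RecipeRecord:
    """One row of the recipe table; ingredients and nutrition kept as raw dicts."""
    id: int
    title: str
    servings: int
    ingredients: List[Dict[str, Any]] = field(default_factory=list)
    nutrition: Dict[str, Dict[str, Any]] = field(default_factory=dict)


@dataclass
class ReviewRecord:
    """One row of the reviews table."""
    id: str
    user_id: str
    rating: float
    text: str


# The sample recipe, trimmed to one ingredient and no nutrition data.
squash = RecipeRecord(
    id=152211,
    title="Spaghetti Squash with Pine Nuts, Sage, and Romano",
    servings=4,
    ingredients=[{"ingredientID": 4544, "grams": 700.0}],
)
```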
Example of recipe data stored in DB:
[
{
"id":152211,
"ingredients":[
{
"ingredientID":4544,
"displayValue":"1 spaghetti squash, halved lengthwise and seeded",
"grams":700.0,
"displayType":"Normal"
},
{
"ingredientID":3814,
"displayValue":"1/4 cup toasted pine nuts",
"grams":34.0,
"displayType":"Normal"
}...
],
"title":"Spaghetti Squash with Pine Nuts, Sage, and Romano",
"nutrition":{
"calories":{
"name":"Calories",
"amount":150.2581,
"unit":"kcal",
"displayValue":"150",
"percentDailyValue":"8",
"hasCompleteData":true
},
"fat":{...},
"cholesterol":{...},
"sodium":{...},
"carbohydrates":{...},
"protein":{...},
"folate":{...},
"magnesium":{...}...
},
"servings":4,
"prepMinutes":10,
"cookMinutes":50,
"readyMinutes":0
} ...
]
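Before any clustering, each recipe document has to be reduced to its id and the ingredient IDs it contains. A minimal sketch, assuming recipes are exported as a JSON array shaped like the example above (with the "..." elisions removed so it parses); the helper name `ingredient_ids_by_recipe` is ours, not the project's.

```python
import json
from typing import Dict, List

# Trimmed, valid-JSON version of the sample recipe above.
SAMPLE = """
[
  {
    "id": 152211,
    "title": "Spaghetti Squash with Pine Nuts, Sage, and Romano",
    "ingredients": [
      {"ingredientID": 4544, "displayValue": "1 spaghetti squash, halved lengthwise and seeded",
       "grams": 700.0, "displayType": "Normal"},
      {"ingredientID": 3814, "displayValue": "1/4 cup toasted pine nuts",
       "grams": 34.0, "displayType": "Normal"}
    ],
    "servings": 4
  }
]
"""


def ingredient_ids_by_recipe(recipes: List[dict]) -> Dict[int, List[int]]:
    """Map each recipe id to the ingredient IDs it uses, in listed order."""
    return {r["id"]: [ing["ingredientID"] for ing in r["ingredients"]] for r in recipes}


recipes = json.loads(SAMPLE)
print(ingredient_ids_by_recipe(recipes))  # {152211: [4544, 3814]}
```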
Example of user data stored in DB:
{
"_id" : ObjectId("5bca72879d869d4240c5207b"),
"email" : "user@example.com",
"hashedPassword" : "$2b$10$XvtM7r0A0Y7d2ldQAXBhhOKiZRpbJBwhTTXCiXgSv/hPsPVAKXGYa",
"name" : "User Name",
"foodAllergies" : [ "Peanuts", "Shrimp" ],
"likedRecipes" : [ "1234354", "1234154" ],
"mealPlan" : [ "12345", "123459" ]
}
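The `$2b$10$` prefix on hashedPassword marks a bcrypt hash: variant 2b, cost factor 10 (2^10 key-expansion rounds), then a 22-character salt followed by a 31-character hash. Because the salt travels inside the stored string, no separate salt column is needed. The sketch below (the `split_bcrypt` helper is hypothetical) only pulls the stored example apart to show that layout; checking a login should go through a bcrypt library, not hand-rolled code.

```python
import re
from typing import NamedTuple


class BcryptParts(NamedTuple):
    variant: str  # e.g. "2b"
    cost: int     # log2 of the number of key-expansion rounds
    salt: str     # 22 chars in bcrypt's base64 alphabet
    digest: str   # 31 chars in bcrypt's base64 alphabet


# bcrypt's modular crypt format: $<variant>$<cost>$<22-char salt><31-char hash>
_BCRYPT_RE = re.compile(r"^\$(2[abxy]?)\$(\d{2})\$([./A-Za-z0-9]{22})([./A-Za-z0-9]{31})$")


def split_bcrypt(stored: str) -> BcryptParts:
    """Split a stored bcrypt string into its parts; raises ValueError if malformed."""
    m = _BCRYPT_RE.match(stored)
    if m is None:
        raise ValueError("not a bcrypt hash")
    variant, cost, salt, digest = m.groups()
    return BcryptParts(variant, int(cost), salt, digest)


parts = split_bcrypt("$2b$10$XvtM7r0A0Y7d2ldQAXBhhOKiZRpbJBwhTTXCiXgSv/hPsPVAKXGYa")
print(parts.variant, parts.cost, parts.salt)  # 2b 10 XvtM7r0A0Y7d2ldQAXBhhO
```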
Algorithms
As one function of the recommender, we will use k-means clustering, and possibly k-nearest neighbours, to suggest other recipes with similar ingredients. We take a master list of all ingredients and use each ingredient as a dimension; each recipe then has a value of 1 or 0 in each column (it contains, or does not contain, that ingredient).
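A minimal standard-library sketch of both steps: encoding recipes as 0/1 vectors over a master ingredient list, then clustering them with plain k-means (Lloyd's algorithm, squared Euclidean distance). The ingredient names, recipes, k = 2 and seeding the centroids from the first k recipes are all illustrative assumptions, not the project's settings; on real data a library implementation such as scikit-learn's `KMeans` would be the more likely choice.

```python
from typing import Dict, List, Sequence, Set

# Hypothetical toy data: a 7-ingredient master list and four recipes.
MASTER = ["pasta", "tomato", "basil", "flour", "sugar", "butter", "egg"]
RECIPES = {
    "marinara": {"pasta", "tomato", "basil"},
    "cookies": {"flour", "sugar", "butter"},
    "bruschetta": {"tomato", "basil"},
    "cake": {"flour", "sugar", "butter", "egg"},
}


def encode(recipes: Dict[str, Set[str]], master: Sequence[str]) -> Dict[str, List[float]]:
    """One 0/1 vector per recipe: entry j is 1.0 iff the recipe contains master[j]."""
    return {rid: [1.0 if ing in ings else 0.0 for ing in master] for rid, ings in recipes.items()}


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors: Dict[str, List[float]], k: int, max_iters: int = 20) -> Dict[str, int]:
    """Lloyd's k-means; centroids are seeded from the first k recipes for determinism."""
    ids = list(vectors)
    centroids = [list(vectors[rid]) for rid in ids[:k]]
    labels: Dict[str, int] = {}
    for _ in range(max_iters):
        # Assignment step: each recipe joins its nearest centroid.
        new_labels = {rid: min(range(k), key=lambda c: sq_dist(vectors[rid], centroids[c]))
                      for rid in ids}
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [vectors[rid] for rid in ids if labels[rid] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


labels = kmeans(encode(RECIPES, MASTER), k=2)
print(labels)  # {'marinara': 0, 'cookies': 1, 'bruschetta': 0, 'cake': 1}
```

Recipes that land in the same cluster are candidates to suggest alongside each other; in this toy case the two savoury recipes and the two baked ones separate cleanly.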
Data Scraping Algorithm: The scraper runs its worker threads inside a thread pool (bounded, so as not to overload the Allrecipes server), with all recipe categories predetermined. For each category, it walks through the category's pages one by one, which keeps the traversal simple, and retrieves each recipe asynchronously. Whenever a recipe is retrieved, the scraper uses a channel to pass its recipeID to the reviews scraper, so the reviews scraper knows which set of reviews to fetch next.
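The pipeline can be sketched in Python with a bounded `ThreadPoolExecutor` for the category pages and a `queue.Queue` playing the role of the channel. This is an illustrative sketch, not the project's scraper: the category names, page counts and the `fetch_recipe_ids` / `fetch_reviews` functions are hypothetical stubs standing in for real HTTP requests to Allrecipes.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

# Hypothetical category -> page-count table; the real list is predetermined by the project.
CATEGORIES = {"desserts": 2, "salads": 1}


def fetch_recipe_ids(category: str, page: int) -> List[int]:
    """Stub for scraping one category page; returns fake recipe IDs (no network)."""
    base = 1000 if category == "desserts" else 2000
    return [base + page * 10 + i for i in range(3)]


def fetch_reviews(recipe_id: int) -> List[str]:
    """Stub for scraping the reviews of one recipe (no network)."""
    return [f"review of {recipe_id}"]


def scrape(max_workers: int = 4) -> Dict[int, List[str]]:
    # The "channel": page scrapers put recipe IDs in, the reviews scraper takes them out.
    channel: queue.Queue = queue.Queue()
    reviews: Dict[int, List[str]] = {}

    def reviews_worker() -> None:
        # Fetch reviews for each recipe ID until the None sentinel arrives.
        while True:
            recipe_id = channel.get()
            if recipe_id is None:
                break
            reviews[recipe_id] = fetch_reviews(recipe_id)

    def scrape_page(category: str, page: int) -> None:
        for recipe_id in fetch_recipe_ids(category, page):
            channel.put(recipe_id)  # tells the reviews scraper what to fetch next

    consumer = threading.Thread(target=reviews_worker)
    consumer.start()
    try:
        # A bounded pool caps the number of concurrent requests to the site.
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            jobs = [pool.submit(scrape_page, cat, page)
                    for cat, pages in CATEGORIES.items()
                    for page in range(1, pages + 1)]
            for job in jobs:
                job.result()  # re-raise any error from a page scraper
    finally:
        channel.put(None)  # stop the reviews scraper even if a page failed
        consumer.join()
    return reviews


result = scrape()
print(len(result))  # 9: three recipes on each of the three pages
```

A single consumer thread owns the `reviews` dict, so no extra locking is needed; the pool size is the knob that limits load on the server.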
Worked example using the sample data:
This table includes the unique id of each recipe as well as a list of ingredient ids used for the recipe:
Because ingredients are categorical, we need to reshape the data into a (number of recipes) x (number of distinct ingredients) matrix. This matrix records the presence of each ingredient in each recipe (1.0 = present; 0.0 = absent):
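A sketch of this reshaping in Python, assuming the input is a dict from recipe id to ingredient-ID list like the table above. Columns are the sorted set of every ingredient ID seen (the master list), and each recipe's position in `row_ids` is its row index. The function name and the toy IDs are illustrative; a full-size matrix is very sparse, so a real implementation would likely use a sparse format (for example SciPy's CSR matrices) instead of nested lists.

```python
from typing import Dict, List, Tuple


def build_matrix(table: Dict[int, List[int]]) -> Tuple[List[int], List[int], List[List[float]]]:
    """Turn {recipe id: [ingredient ids]} into a recipes x ingredients 0/1 matrix.

    Returns (row_ids, col_ids, matrix) where matrix[i][j] == 1.0 iff recipe
    row_ids[i] contains ingredient col_ids[j]. Repeated IDs collapse to a single 1.0.
    """
    row_ids = list(table)
    col_ids = sorted({ing for ings in table.values() for ing in ings})  # master ingredient list
    col_index = {ing: j for j, ing in enumerate(col_ids)}
    matrix = [[0.0] * len(col_ids) for _ in row_ids]
    for i, rid in enumerate(row_ids):
        for ing in table[rid]:
            matrix[i][col_index[ing]] = 1.0
    return row_ids, col_ids, matrix


# Toy table (hypothetical recipe ids); 4397 is listed twice in recipe 102.
rows, cols, m = build_matrix({101: [4544, 3814], 102: [3814, 4397, 4342, 4397]})
print(cols)  # [3814, 4342, 4397, 4544]
print(m)     # [[1.0, 0.0, 0.0, 1.0], [1.0, 1.0, 1.0, 0.0]]
```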
We then run a k-nearest-neighbours algorithm over this matrix. This picture shows that, for each recipe, there is a corresponding list of the most similar recipes based on the ingredients they contain:
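A brute-force sketch of the neighbour lookup in Python. The page does not say which similarity measure the project uses; cosine similarity is our assumption here (for 0/1 rows it reduces to shared ingredients divided by the geometric mean of the two recipes' ingredient counts). `k_nearest` and the toy matrix are illustrative only. Brute force scores every other recipe for each query, so at full scale a library implementation such as scikit-learn's `NearestNeighbors` would be the more likely choice.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; for 0/1 rows this is shared / sqrt(|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def k_nearest(matrix: List[List[float]], row: int, k: int) -> List[int]:
    """Row indices of the k recipes most similar to `row`, excluding itself."""
    scores = [(cosine(matrix[row], other), j) for j, other in enumerate(matrix) if j != row]
    scores.sort(key=lambda s: (-s[0], s[1]))  # highest similarity first, ties by row index
    return [j for _, j in scores[:k]]


# Toy 0/1 matrix: 4 recipes x 5 ingredients (hypothetical).
matrix = [
    [1.0, 1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 1.0],
    [0.0, 1.0, 1.0, 0.0, 0.0],
]
print(k_nearest(matrix, 0, 2))  # [1, 3]
```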
Example of three similar recipes:
[5838, 4147, 4397, 4342, 4664, 3640, 4582, 2496, 1526, 18681, 16394, 20245, 16421, 16406, 4409, 17660, 16243, 16317, 16421, 16234, 16238]
id: 23600
index: 1
[6307, 4397, 4537, 4342, 16421, 2496, 16317, 16261, 16238, 16159, 16406, 16234]
id: 229491
index: 794
[20312, 5838, 16243, 16317, 4397, 4342, 16403, 16421, 16406, 5803, 20611, 18681]
id: 165350
index: 229
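As a sanity check on the example above, the overlap between the three lists can be computed directly. This sketch assumes each list belongs to the id printed after it, and treats each list as a set: 16421 appears twice in the first list, so it counts once, just as it would in a 0/1 matrix row. Jaccard similarity (shared ingredients divided by distinct ingredients across both recipes) is used purely for illustration; it is not necessarily the measure behind the neighbour lists.

```python
from itertools import combinations

# Ingredient-ID lists from the example above, keyed by recipe id
# (assuming each list belongs to the id printed after it).
recipes = {
    23600: [5838, 4147, 4397, 4342, 4664, 3640, 4582, 2496, 1526, 18681, 16394,
            20245, 16421, 16406, 4409, 17660, 16243, 16317, 16421, 16234, 16238],
    229491: [6307, 4397, 4537, 4342, 16421, 2496, 16317, 16261, 16238, 16159, 16406, 16234],
    165350: [20312, 5838, 16243, 16317, 4397, 4342, 16403, 16421, 16406, 5803, 20611, 18681],
}


def jaccard(a: set, b: set) -> float:
    """Shared ingredients divided by distinct ingredients across both recipes."""
    return len(a & b) / len(a | b)


sets = {rid: set(ings) for rid, ings in recipes.items()}
for x, y in combinations(sets, 2):
    print(x, y, len(sets[x] & sets[y]), round(jaccard(sets[x], sets[y]), 3))
# 23600 229491 8 0.333
# 23600 165350 8 0.333
# 229491 165350 5 0.263

shared_by_all = set.intersection(*sets.values())
print(sorted(shared_by_all))  # [4342, 4397, 16317, 16406, 16421]
```

All three recipes share the same five ingredient IDs, which is consistent with them appearing together as similar recipes.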