I have a database where each row is a user, and it has two types of columns: features of the user and genre grading from 0 to 6. For example:
User | Age | Sex | ... | Genre 1 | Genre 2 | ... |
---|---|---|---|---|---|---|
1 | 18 | M | | 4 | 5 | |
2 | 75 | F | | 3 | 2 | |
3 | 45 | M | | 0 | 1 | |
Let us say then that we have features $F_1, F_2, \dots, F_m$ and genre gradings $G_1, G_2, \dots, G_d$, where each $G_i \in \{0, 1, \dots, 5\}$. The basic idea is to build a function of the form $F(F_1, F_2, \dots, F_m) = (G_1, G_2, \dots, G_d)$. That is, given the characteristics of the user, I want to estimate the genre scores in order to recommend the highest-ranked genre, then the next one, and so on.
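To make the setup concrete, here is a minimal sketch of how I picture each row being split into the inputs and outputs of the function. It uses only the three sample rows from the table, and I am assuming the two numbers shown in each row belong to Genre 1 and Genre 2. The names `rows`, `FEATURE_COLS`, `TARGET_COLS` and `split_row` are made up for illustration, and the elided columns are simply left out.

```python
# Illustrative only: the three sample rows from the table,
# restricted to the columns that are actually shown.
# Assumption: the two scores in each row are Genre 1 and Genre 2.
rows = [
    {"user": 1, "age": 18, "sex": "M", "genre_1": 4, "genre_2": 5},
    {"user": 2, "age": 75, "sex": "F", "genre_1": 3, "genre_2": 2},
    {"user": 3, "age": 45, "sex": "M", "genre_1": 0, "genre_2": 1},
]

FEATURE_COLS = ["age", "sex"]          # plays the role of F_1, ..., F_m
TARGET_COLS = ["genre_1", "genre_2"]   # plays the role of G_1, ..., G_d


def split_row(row):
    """Return (features, targets) for a single user row."""
    features = tuple(row[col] for col in FEATURE_COLS)
    targets = tuple(row[col] for col in TARGET_COLS)
    return features, targets


# X holds one feature tuple per user, Y the matching genre scores.
X, Y = zip(*(split_row(row) for row in rows))
print(X)
print(Y)
```

Whatever model ends up being used would then be trained to map each entry of `X` to the corresponding entry of `Y`.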
For example, suppose we have only age and gender as features, and the genres romance, action, and historical. The idea is to fit $F$ and then, given a new user who is a 35-year-old male, produce the estimate $F(35, \text{male}) = (1, 3, 2)$, where 1 is the rating for romance, 3 for action, and 2 for historical. Thus, we would recommend action movies first.
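The recommendation step itself is the easy part. As a rough sketch, once some fitted model returns the estimated scores, the ordering could be obtained like this. The function `rank_genres` is a hypothetical helper, not part of any library, and the scores are just the ones from the example.

```python
def rank_genres(scores, genre_names):
    """Return genre names ordered from highest to lowest estimated score."""
    paired = zip(genre_names, scores)
    # Sort by score, highest first; Python's sort is stable, so tied
    # genres keep the order in which they were passed in.
    ranked = sorted(paired, key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked]


# Estimated scores for the 35-year-old male in the example.
estimate = (1, 3, 2)
genres = ("romance", "action", "historical")

print(rank_genres(estimate, genres))  # ['action', 'historical', 'romance']
```

So the open question is only how to obtain the estimate, not how to turn it into a ranking.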
There are no missing values, so no worries about that.
I thought of using multivariate multinomial regression, but I couldn't make it work in Python. I would like any pointers on how to tackle this problem, especially if it can be implemented in Python. I am not sure where to start, nor what algorithms I should use. I have seen collaborative filtering, but that seems to be related to how the user interacts with the system, not the user characteristics themselves. I believe my problem is much simpler.