Mixmosa was a dating service that organized 2,961 in-person dates. For each date, it observed:
- the outcomes (e.g. did both people like each other?)
- participant attributes (e.g. physical attractiveness, Big 5 Personality, intelligence, income, drug use, height, values, etc.)
I am sharing the (anonymized) dataset here because it may be of value to people interested in evolutionary psychology, machine learning, mate choice, dating & relationships, etc.
Here is an example of some analysis done using this dataset:
- Is it possible to predict the outcomes of first dates in advance?
The dataset is available in two formats:
- CSV files
- Postgres SQL
There are two tables I'd recommend you start with:
- user_features: describes the participants (age, gender, sexual orientation, IQ, personality, physical attractiveness, etc.)
- interactions: describes the outcomes of each date (did the male like the female? did the female like the male?)
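Here is a minimal sketch for getting started with these two tables. It attaches each participant's user_features to every date in interactions. The column names male_id, female_id, male_liked_female and female_liked_male, and the user_id key in user_features, are assumptions for illustration only, so check the real schema first. The toy rows are made up.

```python
import pandas as pd


def add_partner_features(
    interactions: pd.DataFrame, user_features: pd.DataFrame, feature_cols: list
) -> pd.DataFrame:
    """Attach both participants' features to each date row.

    Hypothetical schema: interactions has male_id / female_id columns and
    user_features has a user_id column; adjust to the real column names.
    """
    feats = user_features[["user_id", *feature_cols]]
    # Prefix the feature columns so the male and female copies don't collide.
    out = interactions.merge(
        feats.add_prefix("male_"), left_on="male_id", right_on="male_user_id", how="left"
    )
    out = out.merge(
        feats.add_prefix("female_"), left_on="female_id", right_on="female_user_id", how="left"
    )
    # The prefixed join keys duplicate male_id / female_id, so drop them.
    return out.drop(columns=["male_user_id", "female_user_id"])


# Toy rows for illustration only (not taken from the dataset).
interactions = pd.DataFrame({
    "male_id": [1, 1],
    "female_id": [2, 3],
    "male_liked_female": [True, True],
    "female_liked_male": [True, False],
})
user_features = pd.DataFrame({"user_id": [1, 2, 3], "age": [30, 28, 25]})

dates = add_partner_features(interactions, user_features, ["age"])
# A date is a mutual match when both people liked each other.
dates["mutual_match"] = dates["male_liked_female"] & dates["female_liked_male"]
```

From here, `dates["mutual_match"].mean()` gives the share of dates that ended in a mutual like.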
Data was collected by running in-person speed dating events. Each date lasted 8 minutes.
- users: people who created an account in the Mixmosa app
- events: speed dating event metadata (note that age ranges were not strictly enforced)
- attendance: which users went to which events
- event_feedback: attendees were asked to rate their experience after each event on a 5-star scale
- questions: the set of questions users were required to complete before attending an event
- responses: users' answers to questions
- swiping: after their 8-minute dates, speed dating attendees were asked which of their dates they liked and disliked
- picture_ratings: users' photos were scored on physical attractiveness by members of the opposite sex on a 1-10 scale, with 1 being least attractive, 5 being average, and 10 being most attractive. Independent raters were used (Mixmosa users did not rate other Mixmosa users).
- user_features: derived table that aggregates all user attributes (e.g. the user's average physical attractiveness score from picture_ratings, their IQ score given their responses to intelligence questions, their neuroticism score given their responses to Big 5 questions). See below for definitions of these features. You can also generate your own features from the raw data.
- interactions: each row represents a date. Derived from swiping, it contains the date outcomes (e.g. did the man like the woman? did the woman like the man?).
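Because interactions is derived from swiping, it may help to see how two directional swipes can be collapsed into one row per date. This is a sketch that assumes the column names event_id, swiper_id, swipee_id and liked. It is not necessarily how interactions was actually built.

```python
import pandas as pd


def swipes_to_dates(swiping: pd.DataFrame) -> pd.DataFrame:
    """Pair each A->B swipe with the matching B->A swipe from the same event.

    Assumed (hypothetical) columns: event_id, swiper_id, swipee_id, liked (bool).
    """
    # Swap the two id columns so that B's swipe on A lines up with A's swipe on B.
    reverse = swiping.rename(columns={
        "swiper_id": "swipee_id",
        "swipee_id": "swiper_id",
        "liked": "liked_back",
    })
    pairs = swiping.merge(reverse, on=["event_id", "swiper_id", "swipee_id"], how="inner")
    # Every date now appears twice (A->B and B->A); keep one orientation.
    pairs = pairs[pairs["swiper_id"] < pairs["swipee_id"]].copy()
    pairs["mutual_like"] = pairs["liked"] & pairs["liked_back"]
    return pairs.rename(columns={
        "swiper_id": "user_a",
        "swipee_id": "user_b",
        "liked": "a_liked_b",
        "liked_back": "b_liked_a",
    }).reset_index(drop=True)


# Toy swipes from one event: 10 and 20 liked each other; 30 did not like 10 back.
swiping = pd.DataFrame({
    "event_id": [1, 1, 1, 1],
    "swiper_id": [10, 20, 10, 30],
    "swipee_id": [20, 10, 30, 10],
    "liked": [True, True, True, False],
})
dates = swipes_to_dates(swiping)
```

The inner join drops any date where only one of the two people recorded a swipe. If you would rather keep those dates, use a left join and treat the missing swipe as unknown.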
Self-Explanatory
- age
- gender
- attracted_to
- height_inches
Intelligence
iq_share_correct: share of intelligence questions the user answered correctly
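The text does not say how correct answers are stored. The sketch below therefore assumes a hypothetical answer_key dict that maps each intelligence question's id to its correct text_response. The user_id, question_id and text_response columns match those used by the personality query further below. This sketch divides by the number of intelligence questions each user actually answered, which may differ from how the original feature handles skipped questions.

```python
import pandas as pd


def iq_share_correct(responses: pd.DataFrame, answer_key: dict) -> pd.Series:
    """Share of answered intelligence questions that each user got right.

    answer_key is a hypothetical {question_id: correct text_response} mapping.
    """
    iq = responses[responses["question_id"].isin(list(answer_key))].copy()
    iq["correct"] = iq["text_response"] == iq["question_id"].map(answer_key)
    # Mean of a boolean column = fraction of True values.
    return iq.groupby("user_id")["correct"].mean().rename("iq_share_correct")


# Toy data: question 99 stands in for any non-intelligence question.
responses = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "question_id": [1, 2, 1, 2, 99],
    "text_response": ["B", "A", "B", "C", "Yes"],
})
scores = iq_share_correct(responses, answer_key={1: "B", 2: "C"})
```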
Big 5 Personality Traits:
- Each trait score is the sum of the individual question scores, with each question scored from 1 to 3. Note that I decided to score "strongly agree" and "agree" the same, and likewise "strongly disagree" and "disagree", which is why each question is on a 3-point scale. If you'd like to take these differences into account and move to a 5-point scale, you can regenerate these features yourself. See code below.
- Specific features:
  - agreeableness
  - conscientiousness
  - extraversion
  - neuroticism
  - openness
```python
import pandas as pd


# Written as a method of a class that holds a database connection in
# self.engine (e.g. a SQLAlchemy engine), which pd.read_sql accepts.
def get_personality_by_user(self):
    query = """
        select
            user_id,
            q.id as question_id,
            category,
            lower(misc->>'big_5') as big_5_category,
            subcategory,
            misc->>'key' as key,
            prompt,
            responses.id as responses_id,
            responses.text_response
        from responses
        inner join questions q on responses.question_id = q.id and q.category = 'Personality'
        inner join users u on responses.user_id = u.id;
    """
    personality_responses = pd.read_sql(query, self.engine)

    def score_personality_response(row):
        # TODO currently not differentiating between "strongly {agree, disagree}" and just {agree, disagree}
        # key "1": agreeing scores high; key "-1": reverse-keyed, agreeing scores low.
        response = row["text_response"]
        if row["key"] == "1":
            if response in ["Strongly Disagree", "Disagree"]:
                return 1
            elif response == "Neutral":
                return 2
            elif response in ["Strongly Agree", "Agree"]:
                return 3
            else:
                return None
        elif row["key"] == "-1":
            if response in ["Strongly Disagree", "Disagree"]:
                return 3
            elif response == "Neutral":
                return 2
            elif response in ["Strongly Agree", "Agree"]:
                return 1
            else:
                return None
        else:
            return None

    personality_responses["score"] = personality_responses.apply(
        score_personality_response, axis=1
    )

    # Number of personality questions each user answered.
    personality_users = personality_responses.groupby("user_id").responses_id.count()
    # currently 65 personality questions. filter for where users have answered all personality questions.
    users_completing_all_personality_questions = list(
        personality_users[personality_users >= 65].index
    )
    filtered_personality_responses = personality_responses[
        personality_responses.user_id.isin(users_completing_all_personality_questions)
    ]
    # One row per user, one column per Big 5 trait, holding the summed item scores.
    return pd.pivot_table(
        filtered_personality_responses,
        values="score",
        index=["user_id"],
        columns=["big_5_category"],
        aggfunc="sum",  # same result as np.sum, without newer pandas' FutureWarning
    )
```
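Below is a minimal sketch of a replacement scorer for the 5-point variant mentioned above. It keeps the same "1" / "-1" key convention as score_personality_response. It maps the five responses to 1 through 5 and reverses that order for "-1" items. The function name is hypothetical. Trait sums built with it will span a wider range than the 3-point features in user_features.

```python
from typing import Optional

# 1 = strongest disagreement, 5 = strongest agreement.
LIKERT_5 = {
    "Strongly Disagree": 1,
    "Disagree": 2,
    "Neutral": 3,
    "Agree": 4,
    "Strongly Agree": 5,
}


def score_personality_response_5pt(key: str, response: str) -> Optional[int]:
    """Score one Big 5 item on a 5-point scale.

    key "1" marks a positively keyed item and "-1" a reverse-keyed one,
    mirroring score_personality_response; anything else returns None.
    """
    score = LIKERT_5.get(response)
    if score is None or key not in ("1", "-1"):
        return None
    # Reverse-keyed items flip the scale: 1 <-> 5, 2 <-> 4, 3 stays 3.
    return score if key == "1" else 6 - score
```

To use it, swap the scoring step inside get_personality_by_user for `personality_responses["score"] = personality_responses.apply(lambda row: score_personality_response_5pt(row["key"], row["text_response"]), axis=1)`.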
Physical Attractiveness
avg_physical_attractiveness_rating: user's average physical attractiveness score from picture_ratings (1-10 scale)
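Here is a minimal sketch of how this average could be recomputed from picture_ratings. It assumes hypothetical columns user_id (the person in the photo) and rating (1-10). If ratings are keyed by photo rather than by user, map each photo to its owner first. The toy numbers are invented.

```python
import pandas as pd


def avg_physical_attractiveness(picture_ratings: pd.DataFrame) -> pd.Series:
    """Average all ratings received by each user's photos.

    Assumes hypothetical columns: user_id (whose photo was rated) and rating (1-10).
    """
    return (
        picture_ratings.groupby("user_id")["rating"]
        .mean()
        .rename("avg_physical_attractiveness_rating")
    )


# Toy ratings: user 1 got a 4 and a 6, user 2 got a single 7.
picture_ratings = pd.DataFrame({"user_id": [1, 1, 2], "rating": [4, 6, 7]})
avg = avg_physical_attractiveness(picture_ratings)
```

Averaging across independent raters smooths out individual raters' tastes. A user with only a few ratings will still have a noisier average.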
Income:
income_bucket
```python
def score_income(income_bucket: str):
    if income_bucket == "Under $15,000":
        return 1
    elif income_bucket == "$15,000 - $24,999":
        return 2
    elif income_bucket == "$25,000 - $34,999":
        return 3
    elif income_bucket == "$35,000 - $49,999":
        return 4
    elif income_bucket == "$50,000 - $74,999":
        return 5
    elif income_bucket == "$75,000 - $99,999":
        return 6
    elif income_bucket == "$100,000 - $149,999":
        return 7
    elif income_bucket == "$150,000 - $199,999":
        return 8
    elif income_bucket == "$200,000 and over":
        return 9
    else:
        return None
```
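Because the buckets are ordered, the same mapping can also be written as a lookup table built from the ordered list of bucket labels. The sketch below is equivalent to score_income and is renamed to avoid confusion. It is not the code that was used to build user_features.

```python
from typing import Optional

# Income buckets in ascending order; the score is the 1-based position.
INCOME_BUCKETS = [
    "Under $15,000",
    "$15,000 - $24,999",
    "$25,000 - $34,999",
    "$35,000 - $49,999",
    "$50,000 - $74,999",
    "$75,000 - $99,999",
    "$100,000 - $149,999",
    "$150,000 - $199,999",
    "$200,000 and over",
]
INCOME_SCORES = {bucket: rank for rank, bucket in enumerate(INCOME_BUCKETS, start=1)}


def score_income_lookup(income_bucket: str) -> Optional[int]:
    # Returns 1-9 for a known bucket and None for anything else, like score_income.
    return INCOME_SCORES.get(income_bucket)
```

With pandas, you can score a whole column of bucket labels in one call with `.map(INCOME_SCORES)`. Unrecognized labels come out as NaN.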
Drug Use
nicotine_score:
```python
def score_nicotine(text_response):
    # Binary: 0 = never, 1 = any use (daily, weekly, or monthly).
    if text_response == "Never":
        return 0
    elif text_response in ["Daily", "Weekly", "Monthly"]:
        return 1
    else:
        return None
```
alcohol_score
```python
def score_alcohol(text_response):
    if text_response == "Never":
        return 0
    elif text_response == "Monthly":
        return 1
    elif text_response == "Weekly":
        return 2
    elif text_response == "Daily":
        return 3
    else:
        return None
```
marijuana_score
```python
def score_marijuana(text_response):
    if text_response == "Never":
        return 0
    elif text_response == "Monthly":
        return 1
    elif text_response == "Weekly":
        return 2
    elif text_response == "Daily":
        return 3
    else:
        return None
```
has_used_psychedelics
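```python
# Assumes psychedelics is a DataFrame of users' answers to the psychedelics
# question, with the answer in a text_response column.
# "Yes" -> 1; anything else (including a missing answer) -> 0.
psychedelics["has_used_psychedelics"] = psychedelics.text_response.apply(
    lambda r: 1 if r == "Yes" else 0
)
```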
Health
behavioral_health_score: how strongly the person agrees with the statement "I eat healthy, exercise, and am not overweight"
```python
def score_health(text_response):
    if text_response == "Strongly Disagree":
        return 0
    elif text_response == "Disagree":
        return 1
    elif text_response == "Neutral":
        return 2
    elif text_response == "Agree":
        return 3
    elif text_response == "Strongly Agree":
        return 4
    else:
        return None
```
Sex
sex_partners_score
```python
def score_sex(text_response):
    if text_response == "0":
        return 0
    elif text_response == "1":
        return 1
    elif text_response == "2-4":
        return 2
    elif text_response == "5-9":
        return 3
    elif text_response == "10+":
        return 4
    else:
        return None
```
casual_sex_score: this is question 116: "How strongly do you agree with the following: I do NOT want to have sex with a person until I am sure that we will have a long-term, serious relationship". The score is reversed, so higher values indicate more openness to casual sex.
```python
def score_casual_sex(text_response):
    if text_response == "Strongly Disagree":
        return 4
    elif text_response == "Disagree":
        return 3
    elif text_response == "Neutral":
        return 2
    elif text_response == "Agree":
        return 1
    elif text_response == "Strongly Agree":
        return 0
    else:
        return None
```
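Each scoring function here is applied to the responses for a single question. The sketch below shows one way to turn one question into a per-user feature. It uses the user_id, question_id and text_response columns that appear in the personality query above. The helper name build_feature is hypothetical, and treating 116 as the question's id in the questions table is an assumption.

```python
from typing import Callable, Optional

import pandas as pd


def build_feature(
    responses: pd.DataFrame,
    question_id: int,
    scorer: Callable[[str], Optional[int]],
    name: str,
) -> pd.Series:
    """Score every user's answer to one question; returns a Series indexed by user_id."""
    answers = responses[responses["question_id"] == question_id]
    # Assumption: if a user answered twice, keep the later row (rows in chronological order).
    answers = answers.drop_duplicates(subset="user_id", keep="last")
    return answers.set_index("user_id")["text_response"].map(scorer).rename(name)


# Toy responses; question 179 is included only to show it gets filtered out.
responses = pd.DataFrame({
    "user_id": [1, 2, 2, 3],
    "question_id": [116, 116, 179, 116],
    "text_response": ["Strongly Agree", "Disagree", "Agree", "Neutral"],
})
# Stand-in for score_casual_sex above (same reverse-scored mapping).
reverse_likert = {"Strongly Disagree": 4, "Disagree": 3, "Neutral": 2, "Agree": 1, "Strongly Agree": 0}
casual = build_feature(responses, 116, reverse_likert.get, "casual_sex_score")
```

Several such Series can then be combined into a user-level feature table with `pd.concat([...], axis=1)`.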
Kids
num_children_wanted_score
```python
def score_num_children_wanted(text_response):
    if text_response == "0":
        return 0
    elif text_response == "1":
        return 1
    elif text_response == "2-3":
        return 2
    elif text_response == "4+":
        return 3
    else:
        return None
```
already_has_kids
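```python
# Assumes has_kids is a DataFrame of users' answers to the "has kids"
# question, with the answer in a text_response column.
# "Yes" -> 1; anything else (including a missing answer) -> 0.
has_kids["already_has_kids"] = has_kids.text_response.apply(
    lambda r: 1 if r == "Yes" else 0
)
```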
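The has_used_psychedelics and already_has_kids snippets apply the same Yes/No rule row by row. Below is a vectorized equivalent, offered as a sketch. The helper name yes_no_flag is hypothetical.

```python
import pandas as pd


def yes_no_flag(text_response: pd.Series) -> pd.Series:
    """1 where the answer is exactly "Yes", else 0.

    Like the lambda version, missing answers and any other text become 0.
    """
    return text_response.eq("Yes").astype(int)


# Toy responses: "Yes", "No", and an unanswered question.
has_kids = pd.DataFrame({"text_response": ["Yes", "No", None]})
has_kids["already_has_kids"] = yes_no_flag(has_kids.text_response)
```

If you'd rather keep unanswered questions distinct from "No", map only "Yes" and "No" and leave everything else as NaN.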
Importance of {ethnicity, religion, politics} in a Romantic Partner: users were asked, "How important is it that your romantic partner shares your {ethnicity, religion, political beliefs}?"
- ethnicity_importance_score
- religion_importance_score
- politics_importance_score
```python
def score_importance(text_response):
    if text_response == "Not Important":
        return 0
    elif text_response == "Somewhat Important":
        return 1
    elif text_response == "Very Important":
        return 2
    else:
        return None
```
Misc
political_tolerance_score: this is question 179: "To what degree does the following phrase describe you? Comfortable being friends with someone that disagrees with me on important political topics."
```python
def score_political_tolerance(text_response):
    if text_response == "Strongly Disagree":
        return 0
    elif text_response == "Disagree":
        return 1
    elif text_response == "Neutral":
        return 2
    elif text_response == "Agree":
        return 3
    elif text_response == "Strongly Agree":
        return 4
    else:
        return None
```
get_along_well_with_family_score
```python
def score_family(text_response):
    if text_response in ["Strongly Disagree", "Disagree", "Neutral"]:
        return 0
    elif text_response in ["Agree", "Strongly Agree"]:
        return 1
    else:
        return None
```
hard_work_and_success_belief_score
```python
def score_does_hard_work_lead_to_success(text_response):
    if text_response == "Strongly Disagree":
        return 0
    elif text_response == "Disagree":
        return 1
    elif text_response == "Neutral":
        return 2
    elif text_response == "Agree":
        return 3
    elif text_response == "Strongly Agree":
        return 4
    else:
        return None
```
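score_health, score_political_tolerance and score_does_hard_work_lead_to_success share one 0-4 agreement mapping. score_casual_sex uses the same mapping in reverse. Below is a compact sketch that expresses all four with one table. The name score_likert is hypothetical.

```python
from typing import Optional

LIKERT_0_TO_4 = {
    "Strongly Disagree": 0,
    "Disagree": 1,
    "Neutral": 2,
    "Agree": 3,
    "Strongly Agree": 4,
}


def score_likert(text_response: str, reverse: bool = False) -> Optional[int]:
    """Map an agreement response to 0-4, or 4-0 when reverse=True; None if unrecognized."""
    score = LIKERT_0_TO_4.get(text_response)
    if score is None:
        return None
    return 4 - score if reverse else score
```

`score_likert(r)` matches score_health, score_political_tolerance and score_does_hard_work_lead_to_success. `score_likert(r, reverse=True)` matches score_casual_sex. score_family is not covered because it collapses the scale to 0/1.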
I'd be happy to answer any questions: first dot last on gmail.