Getting Started with ySights

This tutorial introduces the basics of ySights, a Python library for analyzing data from YSocial simulations.

What You’ll Learn

  • Initializing the YDataHandler

  • Loading and exploring simulation data

  • Working with Agents and Posts

  • Basic data queries

Prerequisites

You need:

  • ySights installed (pip install ysights)

  • A YSocial simulation database file (.db format)


1. Importing ySights

First, let’s import the main components we’ll be using:

[1]:
from ysights import YDataHandler
import matplotlib.pyplot as plt
import numpy as np

# Set up matplotlib for nice plots
# (the 'seaborn-v0_8-*' style names require matplotlib 3.6 or later)
plt.style.use('seaborn-v0_8-darkgrid')
%matplotlib inline

2. Initializing the Data Handler

The YDataHandler is your main interface to the simulation database. It provides methods to query and analyze the data.

Note: Replace 'ysocial_db.db' in the cell below with the path to your own YSocial database file.

[ ]:
# Initialize the data handler
# Replace this path with your actual database path
db_path = 'ysocial_db.db'

try:
    ydh = YDataHandler(db_path)
    print("✓ Successfully connected to the database!")
except FileNotFoundError:
    print("✗ Database file not found. Please check the path.")
    print("  The cells below need a working connection, so fix the path and rerun this cell.")
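
A wrong path can also be caught before it reaches the handler by checking it with the standard library. The sketch below is only an illustration: `resolve_db_path` is a hypothetical helper, not part of ySights, and it makes no assumption about which exception `YDataHandler` itself raises for a missing file.

```python
from pathlib import Path


def resolve_db_path(raw_path):
    """Return an absolute Path to an existing file, or raise FileNotFoundError.

    Hypothetical helper for illustration; it is not part of ySights.
    """
    path = Path(raw_path).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(f"No database file at {path}")
    return path


# Example: validate the path used in this tutorial before opening it
try:
    checked = resolve_db_path('ysocial_db.db')
    print(f"Found database: {checked}")
except FileNotFoundError as err:
    print(f"Not found: {err}")
```

The returned path can then be passed to `YDataHandler` as a string, for example `YDataHandler(str(checked))`.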

3. Exploring the Simulation

Let’s get some basic information about the simulation.

[ ]:
# Get simulation time range
time_range = ydh.time_range()
print("Simulation Time Range:")
print(f"  Min Round: {time_range['min_round']}")
print(f"  Max Round: {time_range['max_round']}")
print(f"  Duration: {time_range['max_round'] - time_range['min_round']} rounds")

[ ]:
# Get total number of agents
num_agents = ydh.number_of_agents()
print(f"Total Agents in Simulation: {num_agents}")
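
The time range above is read as a dictionary with 'min_round' and 'max_round' keys. Note that max minus min gives the span between the two rounds; if both endpoints are rounds in which the simulation ran, the number of rounds is one more than that. The sketch below makes this explicit. `summarize_rounds` is a hypothetical helper, the input values are made up, and `rounds_per_day=24` is an assumption that should be checked against your own simulation configuration.

```python
def summarize_rounds(time_range, rounds_per_day=24):
    """Summarize a {'min_round': ..., 'max_round': ...} dictionary.

    rounds_per_day is an assumption for illustration only; check your
    simulation configuration for the real value.
    """
    first = time_range['min_round']
    last = time_range['max_round']
    n_rounds = last - first + 1  # both endpoints are counted
    return {
        'first_round': first,
        'last_round': last,
        'n_rounds': n_rounds,
        'approx_days': n_rounds / rounds_per_day,
    }


# Made-up values standing in for the result of ydh.time_range()
example_range = {'min_round': 0, 'max_round': 239}
summary = summarize_rounds(example_range)
print(summary)
```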

4. Working with Agents

Agents represent the users in the simulation. Let’s explore their properties.

[ ]:
# Get all agents
agents = ydh.agents()
print(f"Retrieved {len(agents.get_agents())} agents")

# Look at the first agent
first_agent = agents.get_agents()[0]
print("\nFirst Agent Properties:")
print(f"  ID: {first_agent.id}")
print(f"  Age: {first_agent.age}")
print(f"  Gender: {first_agent.gender}")
print(f"  Education: {first_agent.education}")

Filtering Agents by Feature

You can filter agents based on specific attributes:

[ ]:
# Get agents with age 25
young_agents = ydh.agents_by_feature('age', 25)
print(f"Agents aged 25: {len(young_agents.get_agents())}")

# Get agents by gender
female_agents = ydh.agents_by_feature('gender', 'F')
print(f"Female agents: {len(female_agents.get_agents())}")
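
The calls above match a single exact value. For a range of values, one option is to filter the agent list in plain Python. The sketch below uses a stand-in `ToyAgent` class (hypothetical, carrying only the attributes shown earlier in this tutorial) so that it runs without a database; `filter_by_age_range` is likewise a hypothetical helper, not a ySights function.

```python
from dataclasses import dataclass


@dataclass
class ToyAgent:
    """Stand-in for a ySights agent; only the fields used here."""
    id: int
    age: int
    gender: str


def filter_by_age_range(agent_list, min_age, max_age):
    """Keep agents whose age lies in the closed interval [min_age, max_age]."""
    return [a for a in agent_list if min_age <= a.age <= max_age]


toy_agents = [
    ToyAgent(1, 19, 'F'),
    ToyAgent(2, 25, 'M'),
    ToyAgent(3, 31, 'F'),
    ToyAgent(4, 47, 'M'),
]

young = filter_by_age_range(toy_agents, 18, 30)
print([a.id for a in young])  # [1, 2]
```

With real data, the same helper should work on `agents.get_agents()`, assuming each agent exposes a numeric `age` attribute as in the cells above.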

Age Distribution Visualization

[ ]:
# Collect ages
ages = [agent.age for agent in agents.get_agents()]

# Plot age distribution
plt.figure(figsize=(10, 6))
plt.hist(ages, bins=20, edgecolor='black', alpha=0.7)
plt.xlabel('Age', fontsize=12)
plt.ylabel('Number of Agents', fontsize=12)
plt.title('Age Distribution of Agents', fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3)
plt.show()
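
A histogram is easier to interpret next to a few summary numbers. The sketch below computes the mean, the median, and counts per ten-year bucket using only the standard library. The ages are made-up stand-ins for the `ages` list built above, and `age_bucket` is a hypothetical helper.

```python
from collections import Counter
from statistics import mean, median

# Made-up ages standing in for [agent.age for agent in agents.get_agents()]
toy_ages = [19, 22, 25, 25, 31, 38, 44, 47, 52, 63]


def age_bucket(age, width=10):
    """Map an age to a bucket label such as '20-29'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


bucket_counts = Counter(age_bucket(a) for a in toy_ages)

print(f"Mean age: {mean(toy_ages):.1f}")
print(f"Median age: {median(toy_ages)}")
for label in sorted(bucket_counts):
    print(f"  {label}: {bucket_counts[label]}")
```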

5. Working with Posts

Posts represent the content created by agents in the simulation.

[ ]:
# Get posts by a specific agent
agent_id = 1  # Change this to any valid agent ID
agent_posts = ydh.posts_by_agent(agent_id)

print(f"Agent {agent_id} created {len(agent_posts.get_posts())} posts")

# Examine the first post
if agent_posts.get_posts():
    first_post = agent_posts.get_posts()[0]
    print("\nFirst Post Details:")
    print(f"  Post ID: {first_post.id}")
    print(f"  Author: {first_post.user_id}")
    print(f"  Round: {first_post.round}")
    print(f"  Topics: {first_post.topics}")
    print(f"  Emotions: {first_post.emotions}")
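
Once the posts are in a list, a common next step is to see how an agent's activity is spread over time. The sketch below counts posts per round with `collections.Counter`. It uses `SimpleNamespace` objects as stand-ins for posts, carrying only the `id`, `user_id`, and `round` attributes printed above; the values are made up.

```python
from collections import Counter
from types import SimpleNamespace

# Made-up posts standing in for agent_posts.get_posts()
toy_posts = [
    SimpleNamespace(id=10, user_id=1, round=0),
    SimpleNamespace(id=11, user_id=1, round=0),
    SimpleNamespace(id=12, user_id=1, round=3),
    SimpleNamespace(id=13, user_id=1, round=5),
]

# Count how many posts fall in each round
posts_per_round = Counter(p.round for p in toy_posts)

for rnd in sorted(posts_per_round):
    print(f"Round {rnd}: {posts_per_round[rnd]} post(s)")

# The round with the most posts
busiest_round, n_posts = posts_per_round.most_common(1)[0]
print(f"Busiest round: {busiest_round} ({n_posts} posts)")
```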

6. Agent Interest Profiles

Each agent has an interest profile showing their engagement with different topics.

[ ]:
# Get interest profile for an agent
agent_id = 1
profile = ydh.agent_interests(agent_id)

print(f"Interest Profile for Agent {agent_id}:")
for topic, score in list(profile.items())[:5]:  # Show the first 5 topics (not sorted by score)
    print(f"  Topic {topic}: {score:.3f}")
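
Raw scores are hard to compare across agents, so it can help to convert a profile into shares that sum to 1 and then rank the topics. The sketch below assumes the profile behaves like a dictionary mapping each topic to a non-negative number, which is how the cell above iterates over it. The topic names and values are made up, and `normalize_profile` and `top_k` are hypothetical helpers.

```python
def normalize_profile(profile):
    """Rescale scores so they sum to 1; an all-zero profile stays all zero."""
    total = sum(profile.values())
    if total == 0:
        return {topic: 0.0 for topic in profile}
    return {topic: score / total for topic, score in profile.items()}


def top_k(profile, k=3):
    """Return the k (topic, score) pairs with the highest scores."""
    return sorted(profile.items(), key=lambda item: item[1], reverse=True)[:k]


# Made-up profile standing in for ydh.agent_interests(agent_id)
toy_profile = {'politics': 4, 'sports': 1, 'music': 3, 'science': 2}

shares = normalize_profile(toy_profile)
for topic, share in top_k(shares, k=3):
    print(f"  {topic}: {share:.1%}")
```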

Visualizing Interest Profile

[ ]:
# Get top topics for visualization
sorted_topics = sorted(profile.items(), key=lambda x: x[1], reverse=True)[:10]
topics = [f"Topic {t[0]}" for t in sorted_topics]
scores = [t[1] for t in sorted_topics]

# Create bar plot
plt.figure(figsize=(12, 6))
plt.barh(topics, scores, color='steelblue', alpha=0.8)
plt.xlabel('Interest Score', fontsize=12)
plt.ylabel('Topics', fontsize=12)
plt.title(f'Top 10 Topics for Agent {agent_id}', fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3, axis='x')
plt.tight_layout()
plt.show()

7. Custom Queries

For more complex analysis, you can execute custom SQL queries:

[ ]:
# Example: Get top 5 most active agents
query = """
    SELECT user_id, COUNT(*) as post_count
    FROM post
    GROUP BY user_id
    ORDER BY post_count DESC
    LIMIT 5
"""

results = ydh.custom_query(query)
print("Top 5 Most Active Agents:")
for i, row in enumerate(results, 1):
    print(f"  {i}. Agent {row[0]}: {row[1]} posts")
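
To try out a query before running it against a real simulation, the same SQL can be executed on a small in-memory SQLite database. The sketch below uses Python's built-in `sqlite3` module. The toy `post` table holds only the columns the query needs and made-up rows; the real YSocial schema is not reproduced here.

```python
import sqlite3

# Build a throwaway in-memory database with a minimal, made-up post table
conn = sqlite3.connect(':memory:')
conn.execute(
    "CREATE TABLE post (id INTEGER PRIMARY KEY, user_id INTEGER, round INTEGER)"
)
conn.executemany(
    "INSERT INTO post (user_id, round) VALUES (?, ?)",
    [(1, 0), (1, 1), (1, 2), (2, 0), (2, 3), (3, 1)],
)

toy_query = """
    SELECT user_id, COUNT(*) AS post_count
    FROM post
    GROUP BY user_id
    ORDER BY post_count DESC
    LIMIT 5
"""

rows = conn.execute(toy_query).fetchall()
conn.close()

for rank, (user_id, post_count) in enumerate(rows, 1):
    print(f"  {rank}. Agent {user_id}: {post_count} posts")
```

Agents with the same post count come back in an unspecified order; adding a second sort key such as `user_id` to the `ORDER BY` clause makes the ranking reproducible.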

Summary

In this tutorial, you learned:

✓ How to initialize YDataHandler with your simulation database
✓ Basic exploration of simulation time range and agent count
✓ Working with Agents and filtering by features
✓ Retrieving and examining Posts
✓ Analyzing agent interest profiles
✓ Creating visualizations of simulation data
✓ Executing custom SQL queries for advanced analysis

Next Steps

Continue with:

  • Network Analysis Tutorial: Learn how to extract and analyze social networks

  • Algorithms Tutorial: Explore profile similarity and recommendation metrics

  • Visualization Tutorial: Create advanced visualizations of simulation data
