Certainly! Below is a Python program that generates three pandas DataFrames: location data, house types, and housing data. The housing DataFrame references the other two through the `location_id` and `house_type_id` foreign keys.
```python
import pandas as pd
import numpy as np
# Set random seed for reproducibility
np.random.seed(0)
# Function to generate location DataFrame
def generate_location_data(num_locations):
    # Pair each city with its country so the two columns stay consistent
    city_country = {
        'New York': 'USA', 'Toronto': 'Canada', 'London': 'UK',
        'Vancouver': 'Canada', 'Manchester': 'UK',
    }
    cities = np.random.choice(list(city_country), num_locations)
    locations = {
        "id": range(1, num_locations + 1),
        "country": [city_country[city] for city in cities],
        "city": cities,
        "population": np.random.randint(50000, 1000000, num_locations),
        "area": np.random.randint(10000, 500000, num_locations)
    }
    return pd.DataFrame(locations)
# Function to generate house types DataFrame
def generate_house_type_data(num_house_types):
    type_names = ['Detached', 'Semi-Detached', 'Terraced', 'Flat']
    house_types = {
        "id": range(1, num_house_types + 1),
        # Sample without replacement so each type appears at most once
        # (requires num_house_types <= len(type_names))
        "house_type": np.random.choice(type_names, num_house_types, replace=False),
        "average_house_type_price": np.random.randint(100000, 1000000, num_house_types),
        "number_of_houses": np.random.randint(10, 1000, num_house_types)
    }
    return pd.DataFrame(house_types)
# Function to generate housing data DataFrame
def generate_housing_data(num_houses, location_df, house_type_df):
    house_sizes = np.random.randint(50, 300, num_houses)  # size in m^2
    # Foreign keys: sample ids from the two lookup DataFrames
    location_ids = np.random.choice(location_df['id'], num_houses)
    house_type_ids = np.random.choice(house_type_df['id'], num_houses)
    # Look up each house type's average price by id rather than by row position
    type_prices = (
        house_type_df.set_index('id')
        .loc[house_type_ids, 'average_house_type_price']
        .to_numpy()
    )
    # Generate prices based on size, location, and house type
    house_prices = (
        house_sizes * np.random.randint(2000, 5000, num_houses) // 10
        + location_ids * 1000
        + type_prices // 4
    )
    housing_data = {
        "id": range(1, num_houses + 1),
        "house_size": house_sizes,
        "house_price": house_prices,
        "location_id": location_ids,
        "bedrooms": np.random.randint(1, 6, num_houses),
        "house_type_id": house_type_ids
    }
    return pd.DataFrame(housing_data)
# Generate DataFrames
num_locations = 10
num_house_types = 4
num_houses = 100
location_df = generate_location_data(num_locations)
house_type_df = generate_house_type_data(num_house_types)
housing_df = generate_housing_data(num_houses, location_df, house_type_df)
# Display the generated DataFrames
print("Location DataFrame:")
print(location_df.head(), "\n")
print("House Types DataFrame:")
print(house_type_df.head(), "\n")
print("Housing DataFrame:")
print(housing_df.head(), "\n")
# Printing the DataFrame shapes
print(f"Shapes: \nLocation: {location_df.shape}, House Types: {house_type_df.shape}, Housing: {housing_df.shape}")
```
### Explanation of the Code:
1. **Location DataFrame:**
   - Generates random locations with attributes such as country, city, population, and area.
2. **House Types DataFrame:**
   - Generates house-type rows, each with an average price and a number of houses of that type.
3. **Housing DataFrame:**
   - Generates one row per house. The price is the sum of three terms: a size term (house size times a random rate), a location term (`location_id * 1000`), and a quarter of the house type's average price. The `location_id` and `house_type_id` columns are foreign keys sampled from the `id` columns of the other two DataFrames.
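To make the price formula concrete, here is a minimal sketch that evaluates its three terms for a single house. The miniature `house_type_df`, the helper name `price_components`, and the input values are all illustrative assumptions, not part of the program above.

```python
import pandas as pd

# Hypothetical miniature house-type table (values chosen for illustration only)
house_type_df = pd.DataFrame({
    "id": [1, 2],
    "house_type": ["Detached", "Flat"],
    "average_house_type_price": [400000, 200000],
})

def price_components(house_size, rate, location_id, house_type_id):
    """Return the three terms of the price formula followed by their sum."""
    size_term = house_size * rate // 10
    location_term = location_id * 1000
    avg_price = house_type_df.set_index("id").loc[house_type_id, "average_house_type_price"]
    type_term = int(avg_price) // 4
    return size_term, location_term, type_term, size_term + location_term + type_term

# A 120 m^2 flat (type 2) in location 3, with an assumed sampled rate of 3000
print(price_components(120, 3000, 3, 2))
# (36000, 3000, 50000, 89000)
```

In the full program the rate is drawn at random per house, so the size term varies even between houses of identical size.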
### Output:
Every `location_id` and `house_type_id` in the housing DataFrame matches an `id` in the corresponding lookup DataFrame, so the three can be joined into one coherent housing dataset. The script prints the first five rows of each DataFrame, followed by their shapes for verification: with the settings above, (10, 5) for locations, (4, 4) for house types, and (100, 6) for housing.
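One way to check the key relationships is to test that every foreign key exists in the referenced `id` column and then join the tables. The sketch below uses small hand-written DataFrames (hypothetical values, same column names as the generated ones) so that it runs on its own; the same calls should apply to the generated DataFrames.

```python
import pandas as pd

# Hypothetical miniature tables with the same column names as the generated ones
location_df = pd.DataFrame({"id": [1, 2], "city": ["London", "Toronto"]})
house_type_df = pd.DataFrame({"id": [1, 2], "house_type": ["Flat", "Detached"]})
housing_df = pd.DataFrame({
    "id": [1, 2, 3],
    "house_price": [250000, 410000, 300000],
    "location_id": [1, 2, 1],
    "house_type_id": [2, 1, 1],
})

# Every foreign key should appear in the referenced table's id column
fk_ok = bool(
    housing_df["location_id"].isin(location_df["id"]).all()
    and housing_df["house_type_id"].isin(house_type_df["id"]).all()
)

# Join the lookups onto the housing rows; rename the lookup ids so they
# match the foreign-key columns and do not clash with the housing id
joined = (
    housing_df
    .merge(location_df.rename(columns={"id": "location_id"}), on="location_id", how="left")
    .merge(house_type_df.rename(columns={"id": "house_type_id"}), on="house_type_id", how="left")
)

print(fk_ok)
print(joined[["id", "city", "house_type", "house_price"]])
```

A left join keeps one row per house, so a missing lookup row would show up as `NaN` in the joined columns rather than silently dropping the house.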