Data Format

Learn how to structure your CSV files for fairness analysis with EthixAI.

CSV Structure

Your data must be a CSV file with a header row and columns for at least one demographic attribute, the actual decision, and the model prediction.

Example Structure
applicant_id,age,gender,race,income,credit_score,decision,prediction
1001,32,Female,White,55000,720,Approved,Approved
1002,45,Male,Black,62000,680,Approved,Rejected
1003,28,Female,Hispanic,48000,695,Rejected,Rejected
1004,51,Male,Asian,78000,740,Approved,Approved
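
As a quick sanity check before upload, the example above can be parsed with Python's standard csv module. This is a minimal sketch, not part of EthixAI: the helper name read_rows and the inlined SAMPLE_CSV string are illustrative assumptions.

```python
import csv
import io

# The example rows from above, inlined so the sketch runs without a file on disk.
SAMPLE_CSV = """applicant_id,age,gender,race,income,credit_score,decision,prediction
1001,32,Female,White,55000,720,Approved,Approved
1002,45,Male,Black,62000,680,Approved,Rejected
1003,28,Female,Hispanic,48000,695,Rejected,Rejected
1004,51,Male,Asian,78000,740,Approved,Approved
"""

def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))

rows = read_rows(SAMPLE_CSV)

# Applicants whose model prediction differs from the actual decision.
mismatches = [r["applicant_id"] for r in rows if r["decision"] != r["prediction"]]
```

Here mismatches holds the IDs of the rows where decision and prediction disagree, which is a fast way to confirm both columns were read as intended.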

Required Fields

Protected Attributes (Required)

At least one demographic attribute for fairness analysis:

  • gender
    Binary (Male/Female) or multiple categories (for example Male/Female/Other)
  • race
    White, Black, Hispanic, Asian, etc.
  • age
    Numeric or age groups
  • ethnicity
    Alternative to race

Outcome Fields (Required)
  • decision
    Actual decision (Approved/Rejected, True/False, 1/0)
  • prediction
    Model prediction (same format as decision)

Feature Columns (Optional)

Additional columns used by your model for predictions (income, credit_score, etc.). These enable SHAP explainability analysis.
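
The required-field rules above can be checked against a CSV header before upload. The sketch below is a local pre-check under stated assumptions, not EthixAI's own validation: the function name check_required_columns is hypothetical, column names are matched exactly, and only the four protected attributes listed above are recognized (the tool itself may accept others).

```python
# Column names taken from the lists above; exact, case-sensitive matching is an assumption.
PROTECTED_ATTRIBUTES = {"gender", "race", "age", "ethnicity"}
OUTCOME_FIELDS = {"decision", "prediction"}

def check_required_columns(header):
    """Return a list of problems found in a CSV header (empty list if none)."""
    columns = set(header)
    problems = []
    missing = sorted(OUTCOME_FIELDS - columns)
    if missing:
        problems.append("missing outcome column(s): " + ", ".join(missing))
    if not (PROTECTED_ATTRIBUTES & columns):
        problems.append("no protected attribute column found")
    return problems
```

For the example header shown earlier, check_required_columns returns an empty list; a header without prediction yields a single "missing outcome column(s)" message.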

Data Types

Categorical

Text labels for discrete categories:

gender: "Male", "Female", "Other"
race: "White", "Black", "Asian"
decision: "Approved", "Rejected"

Numeric

Numbers for continuous values:

age: 32, 45, 28
income: 55000, 62000, 48000
credit_score: 720, 680, 695
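
Because outcome columns may use Approved/Rejected, True/False, or 1/0, it can help to confirm locally that every value maps cleanly to a binary label and that numeric columns really contain numbers. The following is a rough sketch; to_binary and is_numeric are hypothetical helper names, and treating labels case-insensitively is an assumption about your own preprocessing, not about how EthixAI reads the file.

```python
# Accepted outcome labels, lowercased; taken from the formats listed for decision/prediction.
POSITIVE = {"approved", "true", "1"}
NEGATIVE = {"rejected", "false", "0"}

def to_binary(value):
    """Map an outcome label to 1 (positive) or 0 (negative); raise on anything else."""
    key = str(value).strip().lower()
    if key in POSITIVE:
        return 1
    if key in NEGATIVE:
        return 0
    raise ValueError(f"unrecognized outcome value: {value!r}")

def is_numeric(value):
    """True if the cell parses as a number."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False
```

Running to_binary over both decision and prediction surfaces stray labels early, and is_numeric can be applied to columns such as age, income, and credit_score.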

Best Practices

✓ Data Quality
  • Use consistent category labels (avoid variants such as "Male" and "male")
  • Handle missing values before upload
  • Define binary outcomes clearly: use the same pair of labels (for example Approved/Rejected) in both decision and prediction
  • Include at least 100 records for meaningful metrics
  • Balance protected attribute groups when possible
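
The record-count and group-balance guidance above can be checked with a short tally. This is an illustrative sketch only: group_summary is a hypothetical helper, the four rows are made-up stand-ins for a parsed CSV, and the 100-record threshold is simply the figure quoted in the list.

```python
from collections import Counter

MIN_RECORDS = 100  # threshold taken from the guidance above

def group_summary(rows, attribute):
    """Return (total row count, per-group share of rows) for one protected attribute."""
    counts = Counter(r[attribute] for r in rows)
    total = sum(counts.values())
    shares = {group: n / total for group, n in counts.items()}
    return total, shares

# Tiny illustrative input; in practice rows would come from your CSV.
rows = [{"gender": "Female"}, {"gender": "Male"}, {"gender": "Female"}, {"gender": "Female"}]
total, shares = group_summary(rows, "gender")
enough_records = total >= MIN_RECORDS
```

A group with a very small share is a hint that metrics for that group may be unstable, even when the overall row count is above the threshold.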
⚠️ Common Issues
  • Missing required columns (decision or prediction)
  • Inconsistent category names (Male/male/M)
  • Empty cells in protected attributes
  • Mixed data types in same column
  • Non-CSV file formats
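
Several of the issues above can be caught with a few small checks before upload. The helpers below (find_label_variants, count_empty, has_mixed_types) are hypothetical names for a local sketch, not EthixAI functions. Note the limits: the variant check only catches labels that differ by case or surrounding whitespace, so an abbreviation such as "M" for "Male" still needs a manual review.

```python
from collections import defaultdict

def find_label_variants(values):
    """Group labels that differ only by case or surrounding whitespace."""
    groups = defaultdict(set)
    for v in values:
        groups[v.strip().lower()].add(v)
    return {key: sorted(labels) for key, labels in groups.items() if len(labels) > 1}

def count_empty(rows, column):
    """Count rows whose cell in `column` is missing or blank."""
    return sum(1 for r in rows if not (r.get(column) or "").strip())

def has_mixed_types(values):
    """True if some non-blank cells parse as numbers and others do not."""
    def numeric(v):
        try:
            float(v)
            return True
        except ValueError:
            return False
    flags = {numeric(v) for v in values if v.strip()}
    return len(flags) == 2
```

For example, find_label_variants applied to a gender column reports each normalized label that appears in more than one spelling, and has_mixed_types flags a column like age when it mixes numbers with text such as "unknown".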

Example Datasets

Download sample datasets to test EthixAI:

Loan Applications

50 rows

Loan approval decisions with demographics

Download CSV →

Hiring Decisions

100 rows

Job candidate screening results

Download CSV →