Synthetic Data Generator

Synthetic Data Generator is a tool for generating synthetic datasets for testing, analysis, and machine learning. Built with Streamlit, the app provides an interactive interface where users define data columns by type, distribution, and constraints, then generate the dataset.

Column properties such as numeric range, date frequency, and text distribution are configurable, so the generated data can be tailored to a specific testing or analytical purpose.

Tech Stack:

Pandas

NumPy

Streamlit

Jupyter Notebook

Git

GitHub

Key Features:

  • Customizable Data Generation:

    Easily create synthetic datasets by defining columns with various types, including numeric, date, and text. Tailor data generation based on specific ranges, frequencies, and distributions.

  • Intuitive User Interface:

    The app features an interactive interface built with Streamlit, allowing users to define and generate data without writing code.

  • Flexible Column Properties:

    Configure column properties such as numeric range, date frequency, and text distribution to generate diverse datasets that meet your requirements.

  • Real-Time Data Visualization:

    View the generated data as it is created, which makes it easier to fine-tune column settings and inspect the resulting dataset.

  • Seamless Data Customization:

    Control how text values are distributed using count-based or ratio-based methods, which supports more complex data generation scenarios.

  • Ideal for Testing and Machine Learning:

    Suited to generating synthetic data for testing, analysis, and machine learning model development.
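
The column types listed above can be illustrated with a short Pandas and NumPy sketch. This is not the app's actual code: the function names (`numeric_column`, `date_column`, `text_column_by_ratio`, `text_column_by_count`) are hypothetical, and the reading of "ratio-based" as sampling with given probabilities and "count-based" as an exact number of occurrences per value is an assumption.

```python
import numpy as np
import pandas as pd


def numeric_column(n_rows, low, high, rng):
    """Draw n_rows values uniformly from the range [low, high)."""
    return rng.uniform(low, high, size=n_rows)


def date_column(n_rows, start, freq):
    """Build n_rows evenly spaced timestamps starting at `start`."""
    return pd.date_range(start=start, periods=n_rows, freq=freq)


def text_column_by_ratio(n_rows, ratios, rng):
    """Assumed ratio-based mode: sample each label with its given probability."""
    labels = list(ratios)
    weights = np.array([ratios[label] for label in labels], dtype=float)
    weights = weights / weights.sum()  # normalise so the ratios sum to 1
    return rng.choice(labels, size=n_rows, p=weights)


def text_column_by_count(counts, rng):
    """Assumed count-based mode: repeat each label exactly, then shuffle."""
    values = np.repeat(list(counts), list(counts.values()))
    return rng.permutation(values)


# A fixed seed keeps the generated dataset reproducible between runs.
rng = np.random.default_rng(seed=0)
n_rows = 10

df = pd.DataFrame({
    "price": numeric_column(n_rows, 5.0, 50.0, rng),
    "day": date_column(n_rows, "2024-01-01", "D"),
    "tier": text_column_by_ratio(n_rows, {"free": 0.7, "paid": 0.3}, rng),
    "region": text_column_by_count({"north": 6, "south": 4}, rng),
})
print(df)
```

In this sketch the count-based column fixes the row total (the counts must sum to `n_rows`), while the ratio-based column only approximates the requested proportions on small datasets.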