Inventory v2.4

Explore Our Datasets

Every dataset is built to your spec, not pulled from a shelf. The core languages are listed below; multilingual code-mixed combinations and other languages are available on request. All data is 100% human-annotated and ethically sourced.

Hindi Conversational

Noise: Medium | Overlap: Yes | Built to Order

Transcript Snippet

spk_01: kal deployment slot free hoga kya?
spk_02: haan lekin evening window better rahegi

Metadata Schema

{
  "language": "hindi",
  "accent": "standard_northern"
}

File Structure

session_021/
├── audio.wav
├── transcript.json
└── metadata.json
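
A minimal sketch of how a delivered session folder could be checked before training, assuming the three file names shown in the listing above. The helper names `missing_files` and `load_metadata` are hypothetical, and the internal layout of `transcript.json` is not specified here, so the sketch does not parse it.

```python
import json
from pathlib import Path

# File names taken from the session_021/ listing above.
EXPECTED_FILES = ("audio.wav", "transcript.json", "metadata.json")


def missing_files(session_dir):
    """Return the expected file names that are absent from session_dir."""
    root = Path(session_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


def load_metadata(session_dir):
    """Parse metadata.json in session_dir and return it as a dict."""
    with open(Path(session_dir) / "metadata.json", encoding="utf-8") as fh:
        return json.load(fh)
```

Calling `missing_files("session_021")` returns an empty list when the folder is complete, which makes it easy to skip or flag incomplete sessions in a batch.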

Primary Use Cases

High-fidelity training for ASR, customer support voicebots, onboarding flows, sales conversation AI, and dialect-specific NLU models.

Hinglish Code-Mixed

Noise: High | Overlap: Frequent | Built to Order

Transcript Snippet

spk_01: flight cancel ho gayi, what should I do?
spk_02: I am sorry sir, let me check the status

Metadata Schema

{
  "mix_ratio": "60:40",
  "domain": "ecommerce"
}
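
As a rough illustration, the `mix_ratio` string can be turned into numeric shares for filtering or sampling. The function name `parse_mix_ratio` is hypothetical, and which language each side of the ratio refers to is not stated here, so the sketch simply returns the two shares in the order they appear.

```python
def parse_mix_ratio(value):
    """Turn a ratio string such as "60:40" into two fractions that sum to 1."""
    # Unpacking raises ValueError if the string does not have exactly two parts.
    first, second = (float(part) for part in value.split(":"))
    total = first + second
    if total <= 0:
        raise ValueError(f"ratio must have a positive total: {value!r}")
    return first / total, second / total


shares = parse_mix_ratio("60:40")  # (0.6, 0.4)
```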

File Structure

hg_train_set_v1/
├── batch_001.zip
├── manifest.csv
└── segments.json

Primary Use Cases

Ideal for urban Indian AI assistants, e-commerce support bots, and multilingual sentiment analysis.

Punjabi Regional

Noise: Low | Overlap: Minimal | Built to Order

Transcript Snippet

spk_01: ssa ji, ki haal chal hai?
spk_02: vadiya vadiya, tusi daso kidda aana hoya?

Metadata Schema

{
  "dialect": "majhi",
  "setting": "indoor_quiet"
}

File Structure

punjabi_v2_core/
├── raw_wavs/
├── trans_vtt/
└── session_logs.xml

Primary Use Cases

Regional voice search engines, agricultural advisory bots, and government service accessibility.

Marwadi Commerce

Noise: Medium-High | Overlap: Yes | Built to Order

Transcript Snippet

spk_01: mhaare thode paise baaki hai.
spk_02: arey bhai, kal pakka bhej dyun.

Metadata Schema

{
  "domain": "trade_finance",
  "verified_consent": true
}

File Structure

marwadi_trade_v1/
├── audio_processed/
├── master_meta.json
└── consent_proofs/

Primary Use Cases

Hyper-local commerce bots, financial inclusion initiatives, and specialized dialect translation.

Indian English (Enterprise)

Noise: Medium | Overlap: Moderate | Built to Order

Transcript Snippet

spk_01: Please provide the invoice now.
spk_02: Sending it over via email right away, sir.

Metadata Schema

{
  "accent_profile": "pan_india_professional",
  "role": "support_agent"
}

File Structure

ie_enterprise_cor/
├── wav_16k/
├── ortho_transcripts/
└── sentiment_tags.json

Primary Use Cases

Global support automation, enterprise meeting transcription, and Indian accent-aware LLM evaluation.

Multilingual Code-Mixed

Custom Pairs | On Request | Same Schema

Example Language Pairs

Hindi–English, Marwadi–Hindi, Haryanvi–Hindi, Marwadi–English, Tamil–English, Marathi–Hindi, plus any pair your model needs

Metadata Schema

{
  "lang_pair": "marwadi-hindi",
  "per_token_lang_id": true
}
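
To show what per-token language IDs make possible, here is a small sketch that counts code-switch points in a tagged utterance. The `(token, language)` pair representation and the function names are assumptions for illustration only, not the delivered transcript format, and the tags on the sample tokens were assigned by hand.

```python
from collections import Counter


def switch_points(tagged_tokens):
    """Count positions where the language tag differs from the previous token's."""
    langs = [lang for _, lang in tagged_tokens]
    return sum(1 for prev, cur in zip(langs, langs[1:]) if prev != cur)


def language_counts(tagged_tokens):
    """Count how many tokens carry each language tag."""
    return Counter(lang for _, lang in tagged_tokens)


# Hand-tagged illustration based on a Hinglish utterance; tags are assumptions.
utterance = [
    ("flight", "en"), ("cancel", "en"), ("ho", "hi"), ("gayi", "hi"),
    ("what", "en"), ("should", "en"), ("I", "en"), ("do", "en"),
]

n_switches = switch_points(utterance)   # 2
per_lang = language_counts(utterance)   # en: 6, hi: 2
```

Statistics like these can help balance a training set by switch density or by the share of each language.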

File Structure

custom_multilingual_v1/
├── audio.wav
├── transcript.json
└── metadata.json

Primary Use Cases

Code-switching ASR, multilingual LLM fine-tuning, cross-dialect NLU, and regional voice assistants that reflect how India actually speaks.

Request Dataset Specs

Tell us about your requirements and receive full schema details within 24 hours.

Secure inquiry handling. No spam guaranteed.