Deep dives into Indian language datasets, ASR research, speech technology, and the real cost of getting training data right.