Models
LLMs on Tabular Data with Limited Semantics: Evidence from Industrial Car Retrofit Prediction
The study compares LLM-based strategies with traditional tabular machine learning models for industrial car retrofit prediction, using a dataset of 284,271 vehicles linked to retrofit management. Classical tree ensembles outperform LLMs in standalone use. Features derived from LLM embeddings (e.g., Amazon Titan) remain useful, reaching a binary AUC of 0.982, whereas direct prompting and hashing significantly degrade performance. The results suggest that LLMs are best treated as complementary tools in privacy-sensitive environments rather than as replacements for established tabular methods.
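For readers less familiar with the metric quoted above, binary AUC can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The following is a minimal sketch of that definition in plain Python. The helper name `binary_auc` and the toy labels and scores are illustrative assumptions, not code or data from the study.

```python
def binary_auc(labels, scores):
    """Pairwise definition of binary AUC.

    Returns the fraction of (positive, negative) pairs in which the
    positive example has the higher score; tied pairs count as 0.5.
    This is an illustrative helper, not the study's evaluation code.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (made-up values): 3 positives, 3 negatives.
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.7, 0.8, 0.6, 0.1]
auc = binary_auc(labels, scores)
print(f"binary AUC = {auc:.3f}")
```

The pairwise loop is quadratic in the number of examples, which is fine for a toy illustration. On a dataset of the size described here, a rank-based implementation such as the one in a standard metrics library would be the practical choice.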
tabular data, llm, industrial prediction