LLaSA: A Sensor-Aware LLM for Natural Language Reasoning of Human Activity from IMU Data

BASH Lab, Worcester Polytechnic Institute

Abstract

Wearable systems can recognize activities from IMU data but often fail to explain their underlying causes or contextual significance. To address this limitation, we introduce two large-scale resources: SensorCap, comprising 35,960 IMU–caption pairs, and OpenSQA, with 199,701 question–answer pairs designed for causal and explanatory reasoning. OpenSQA includes a curated tuning split (Tune-OpenSQA) optimized for scientific accuracy, narrative clarity, and diagnostic insight. Leveraging these datasets, we develop LLaSA (Large Language and Sensor Assistant), a family of compact sensor-aware language models (7B and 13B) that generate interpretable, context-rich responses to open-ended questions grounded in raw IMU data. LLaSA outperforms commercial LLMs, including GPT-3.5 and GPT-4o-mini, on benchmark and real-world tasks, demonstrating the effectiveness of domain supervision and model alignment for sensor reasoning.

Illustration

Architecture

Schematic overview of LLaSA. Raw IMU signals are encoded by a frozen sensor encoder (𝑓𝜃), then projected into a shared embedding space (H𝑠) aligned with the query embedding (H𝑞). The language model (𝑓Φ) processes both embeddings to generate the response (X𝑎). For instance, given a query about a workout, LLaSA identifies the activity (running) and justifies it using periodic accelerometer patterns.
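To make the data flow concrete, here is a minimal PyTorch-style sketch of the pipeline described above. The class name, hidden dimensions, the single linear projector, and the HuggingFace-style `inputs_embeds` call are illustrative assumptions, not the released implementation.

    import torch
    import torch.nn as nn

    class LLaSASketch(nn.Module):
        """Sketch of the LLaSA pipeline: frozen IMU encoder -> projection -> LLM.

        Module choices and dimensions are assumptions for illustration only.
        """

        def __init__(self, sensor_encoder: nn.Module, language_model: nn.Module,
                     sensor_dim: int = 512, lm_dim: int = 4096):
            super().__init__()
            self.sensor_encoder = sensor_encoder            # f_theta, kept frozen
            for p in self.sensor_encoder.parameters():
                p.requires_grad = False
            # Projects sensor features into the language model's embedding space (H_s)
            self.projector = nn.Linear(sensor_dim, lm_dim)
            self.language_model = language_model            # f_Phi, decoder-only LLM

        def forward(self, imu: torch.Tensor, query_embeds: torch.Tensor):
            # imu: (batch, time, channels) raw accelerometer/gyroscope windows
            # query_embeds: (batch, seq_len, lm_dim) embedded question tokens (H_q)
            with torch.no_grad():
                sensor_feats = self.sensor_encoder(imu)     # frozen f_theta
            h_s = self.projector(sensor_feats)              # H_s: sensor tokens in LM space
            inputs = torch.cat([h_s, query_embeds], dim=1)  # prepend sensor tokens to the query
            # Assumes a HuggingFace-style LM interface that accepts inputs_embeds
            return self.language_model(inputs_embeds=inputs)  # logits for the response X_a

During tuning, only the projector (and optionally the LM) would receive gradients, while the sensor encoder stays frozen, mirroring the frozen-encoder design stated in the caption.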

Qualitative Results

BibTeX


    @article{imran2024llasa,
        title={LLaSA: A Sensor-Aware LLM for Natural Language Reasoning of Human Activity from IMU Data},
        author={Imran, Sheikh Asif and Khan, Mohammad Nur Hossain and Biswas, Subrata and Islam, Bashima},
        journal={arXiv preprint arXiv:2406.14498},
        year={2024}
    }