LegalEval Large Language Model: Understanding Legal Texts
Stack
Technologies used in this project: PyTorch, LLaMA-2, Hugging Face Transformers and Accelerate, DeepSpeed ZeRO-3, and Weights & Biases, running on A100 GPUs.
Client
- SemEval (International Workshop on Semantic Evaluation), ACL
Key Contributions
Data Management and Model Optimization
- Developed a custom PyTorch data loader to efficiently process and batch large-scale legal text datasets.
- Optimized LLaMA-2 for log-probability analysis, improving accuracy by 15% (see the sketch after this list).
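The snippet below is a minimal sketch of what the data pipeline and log-probability scoring could look like. The file path, JSONL field names, and model checkpoint are illustrative assumptions, not the project's actual code.

```python
# Sketch: custom PyTorch dataset/loader plus log-probability scoring with a
# causal LM. "train.jsonl", the "text"/"label" fields, and the Llama-2
# checkpoint are assumptions for illustration only.
import json

import torch
from torch.utils.data import Dataset, DataLoader
from transformers import AutoModelForCausalLM, AutoTokenizer


class LegalTextDataset(Dataset):
    """Loads JSONL records of the form {"text": ..., "label": ...}."""

    def __init__(self, path):
        with open(path) as f:
            self.records = [json.loads(line) for line in f]

    def __len__(self):
        return len(self.records)

    def __getitem__(self, idx):
        return self.records[idx]


@torch.no_grad()
def sequence_log_prob(model, tokenizer, text, device="cuda"):
    """Sum of token log-probabilities the model assigns to `text`."""
    enc = tokenizer(text, return_tensors="pt").to(device)
    out = model(**enc)
    # Shift so that logits at position t predict token t+1.
    logits = out.logits[:, :-1, :]
    targets = enc["input_ids"][:, 1:]
    log_probs = torch.log_softmax(logits, dim=-1)
    token_lp = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    return token_lp.sum().item()


tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.float16
).to("cuda")

loader = DataLoader(LegalTextDataset("train.jsonl"), batch_size=1, shuffle=False)
for batch in loader:
    score = sequence_log_prob(model, tokenizer, batch["text"][0])
```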
Advanced Model Training
- Implemented model parallelism techniques using:
- Hugging Face's Accelerate.
- DeepSpeed ZeRO-3.
- Achieved a 30% reduction in training time on A100 GPUs (a training-loop sketch follows this list).
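The following is a minimal sketch of how Accelerate with a DeepSpeed ZeRO-3 plugin can drive a sharded training step. The model checkpoint, hyperparameters, and toy corpus are assumptions; the real project would feed its LegalEval training data here.

```python
# Sketch: training step sharded with Hugging Face Accelerate + DeepSpeed
# ZeRO-3. Typically launched with `accelerate launch --num_processes <N>`.
import torch
from accelerate import Accelerator, DeepSpeedPlugin
from torch.utils.data import DataLoader
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          default_data_collator)

plugin = DeepSpeedPlugin(zero_stage=3, gradient_clipping=1.0)
accelerator = Accelerator(mixed_precision="bf16", deepspeed_plugin=plugin)

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Tiny toy corpus so the sketch is self-contained.
texts = ["The appellant filed a petition ...", "The court held that ..."]
encodings = tokenizer(texts, truncation=True, padding="max_length",
                      max_length=128, return_tensors="pt")
encodings["labels"] = encodings["input_ids"].clone()
dataset = [{k: v[i] for k, v in encodings.items()} for i in range(len(texts))]
train_loader = DataLoader(dataset, batch_size=1, collate_fn=default_data_collator)

# prepare() wraps the model/optimizer/loader for ZeRO-3 sharding.
model, optimizer, train_loader = accelerator.prepare(model, optimizer, train_loader)

model.train()
for batch in train_loader:
    outputs = model(**batch)            # causal-LM loss from shifted labels
    accelerator.backward(outputs.loss)  # handles partitioned gradients
    optimizer.step()
    optimizer.zero_grad()
```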
Model Fine-Tuning and Experiment Tracking
- Engineered and fine-tuned a Hugging Face decoder-only model for the LegalEval NLP tasks.
- Tracked experiments with Weights & Biases to guide hyperparameter tuning, contributing to:
- A 20% improvement in precision, recall, and F1 scores (a fine-tuning sketch with W&B logging follows this list).
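Below is a minimal sketch of fine-tuning a decoder-only model with Weights & Biases logging through the Trainer API. The W&B project name, the GPT-2 stand-in checkpoint, the toy dataset, and the hyperparameters are illustrative assumptions.

```python
# Sketch: fine-tune a decoder-only model while streaming metrics to W&B.
import wandb
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

wandb.init(project="legaleval-llm")  # hypothetical project name

model_name = "gpt2"  # small stand-in; the project used a larger decoder
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

raw = Dataset.from_dict({"text": ["The court held that ...",
                                  "The appellant argued that ..."]})
tokenized = raw.map(lambda ex: tokenizer(ex["text"], truncation=True),
                    remove_columns=["text"])

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=2,
    num_train_epochs=1,
    logging_steps=1,
    report_to="wandb",  # send loss and metrics to Weights & Biases
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
wandb.finish()
```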
Distributed Computing and Deployment
- Set up and configured a distributed computing cluster, which:
- Improved data-processing throughput.
- Enabled seamless model deployment across multiple nodes (see the initialization sketch after this list).
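As a rough illustration of the multi-node setup, the sketch below initializes a PyTorch distributed process group; it assumes the rank and address environment variables are set by a launcher such as torchrun and is not the project's actual cluster configuration.

```python
# Sketch: per-process distributed initialization on a multi-node cluster.
# Assumes MASTER_ADDR/MASTER_PORT/RANK/WORLD_SIZE are set by the launcher,
# e.g. `torchrun --nnodes <N> --nproc_per_node <G> script.py`.
import os

import torch
import torch.distributed as dist


def init_distributed():
    dist.init_process_group(backend="nccl")  # NCCL for GPU collectives
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    torch.cuda.set_device(local_rank)
    return local_rank


if __name__ == "__main__":
    local_rank = init_distributed()
    # Each process now participates in collective ops, e.g. an all-reduce:
    t = torch.ones(1, device=f"cuda:{local_rank}") * dist.get_rank()
    dist.all_reduce(t)  # sums the rank values across the whole cluster
    if dist.get_rank() == 0:
        print("all-reduce result:", t.item())
    dist.destroy_process_group()
```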
Legal Text Classification
- Executed SemEval 2023 LegalEval tasks, including:
- Rhetorical Roles Labeling.
- Court Judgment Prediction.
- Contributed to a top-10% finish among 26 participating teams (a labeling sketch follows this list).
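One way the log-probability analysis above can be applied to rhetorical-role labeling is sketched below: each candidate role is appended to a prompt and the highest-scoring continuation wins. The prompt template and the shortened role list are illustrative assumptions; the official LegalEval label set is larger.

```python
# Sketch: rhetorical-role labeling by scoring candidate labels with a causal
# LM. Role list and prompt are illustrative, not the project's actual setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

ROLES = ["Facts", "Argument", "Precedent", "Ratio of the decision", "Ruling"]

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.float16
).to("cuda").eval()


@torch.no_grad()
def role_score(sentence, role):
    """Log-probability of `role` as the continuation of a labeling prompt."""
    prompt = f"Sentence: {sentence}\nRhetorical role:"
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")
    full_ids = tokenizer(prompt + " " + role, return_tensors="pt").input_ids.to("cuda")
    logits = model(full_ids).logits[:, :-1, :]
    targets = full_ids[:, 1:]
    log_probs = torch.log_softmax(logits, dim=-1)
    token_lp = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    # Score only the continuation tokens; assumes the prompt tokenization is
    # a prefix of the full tokenization (approximately true for Llama-2).
    return token_lp[:, prompt_ids.shape[1] - 1:].sum().item()


sentence = "The appeal is accordingly dismissed with costs."
predicted = max(ROLES, key=lambda r: role_score(sentence, r))
print(predicted)
```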
Impact
- Demonstrated measurable gains in model accuracy (15% on log-probability analysis), training efficiency (30% faster on A100 GPUs), and multi-node deployment capability.
- Improved performance on complex legal text classification tasks, placing the project in the top 10% of participating teams.