LegalEval Large Language Model: Understanding Legal Texts

Visit the GitHub Repository

Stack

Here are the technologies used in this project:

AWS GCP Git Firebase Python GitHub PyTorch TensorFlow Hugging Face

Client

  • Client: SemEval (International Workshop on Semantic Evaluation), ACL

Key Contributions

Data Management and Model Optimization

  • Developed a custom PyTorch data loader to efficiently process and manage large-scale legal-text datasets.
  • Optimized LLaMA 2 model performance for log-probability analysis, achieving a 15% increase in accuracy (see the sketch after this list).
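A minimal sketch of what this combination can look like: a custom PyTorch Dataset/DataLoader over a JSONL corpus, plus sequence log-probability scoring with a causal LM. The file name, the `{"text", "label"}` schema, the class and function names, and the LLaMA 2 checkpoint ID are illustrative assumptions, not the project's actual code.

```python
# Sketch only: custom data loading + log-probability scoring with a causal LM.
# Paths, field names, and the model ID are assumptions for illustration.
import json

import torch
from torch.utils.data import Dataset, DataLoader
from transformers import AutoModelForCausalLM, AutoTokenizer


class LegalTextDataset(Dataset):
    """Reads one JSON record per line: {"text": ..., "label": ...} (assumed schema)."""

    def __init__(self, path, tokenizer, max_length=512):
        with open(path, encoding="utf-8") as f:
            self.records = [json.loads(line) for line in f]
        self.tokenizer = tokenizer
        self.max_length = max_length

    def __len__(self):
        return len(self.records)

    def __getitem__(self, idx):
        rec = self.records[idx]
        enc = self.tokenizer(
            rec["text"],
            truncation=True,
            max_length=self.max_length,
            padding="max_length",
            return_tensors="pt",
        )
        return {
            "input_ids": enc["input_ids"].squeeze(0),
            "attention_mask": enc["attention_mask"].squeeze(0),
            "label": rec.get("label", -1),
        }


@torch.no_grad()
def sequence_log_prob(model, input_ids, attention_mask):
    """Sum of token log-probabilities under a causal LM, used to score candidates."""
    logits = model(input_ids=input_ids, attention_mask=attention_mask).logits
    log_probs = torch.log_softmax(logits[:, :-1, :], dim=-1)
    targets = input_ids[:, 1:].unsqueeze(-1)
    token_lp = log_probs.gather(-1, targets).squeeze(-1)
    mask = attention_mask[:, 1:]
    return (token_lp * mask).sum(dim=-1)


if __name__ == "__main__":
    model_id = "meta-llama/Llama-2-7b-hf"  # assumed checkpoint; requires gated access
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    tokenizer.pad_token = tokenizer.eos_token  # LLaMA 2 ships without a pad token
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.float16, device_map="auto"  # needs `accelerate` installed
    )

    dataset = LegalTextDataset("train.jsonl", tokenizer)  # assumed file name
    loader = DataLoader(dataset, batch_size=4, shuffle=True, num_workers=4, pin_memory=True)

    batch = next(iter(loader))
    scores = sequence_log_prob(
        model,
        batch["input_ids"].to(model.device),
        batch["attention_mask"].to(model.device),
    )
    print(scores)
```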

Advanced Model Training

  • Implemented model parallelism techniques (see the sketch after this list) using:
    • Hugging Face Accelerate.
    • DeepSpeed ZeRO Stage 3.
  • Achieved a 30% reduction in training time on A100 GPUs.
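The sketch below shows the general shape of an Accelerate-wrapped training step; ZeRO Stage 3 itself is selected through the launch configuration (e.g. `accelerate config` pointing at a DeepSpeed config) rather than in the script. The placeholder model, toy data, and hyperparameters are assumptions, not the project's settings.

```python
# Sketch only: a training step wrapped with Hugging Face Accelerate.
# ZeRO-3 is enabled via the accelerate/DeepSpeed launch config, not in this file.
import torch
from accelerate import Accelerator
from torch.utils.data import DataLoader, TensorDataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer


def main():
    accelerator = Accelerator()  # picks up ZeRO-3 / mixed precision from the launch config

    model_name = "distilbert-base-uncased"  # placeholder model for the sketch
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

    # Tiny toy batch so the sketch is self-contained.
    enc = tokenizer(["toy legal sentence"] * 8, padding=True, return_tensors="pt")
    labels = torch.zeros(8, dtype=torch.long)
    loader = DataLoader(
        TensorDataset(enc["input_ids"], enc["attention_mask"], labels), batch_size=4
    )

    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

    # prepare() shards/places the model, optimizer, and data loader for the current setup.
    model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

    model.train()
    for input_ids, attention_mask, y in loader:
        out = model(input_ids=input_ids, attention_mask=attention_mask, labels=y)
        accelerator.backward(out.loss)  # replaces loss.backward() so ZeRO/AMP hooks run
        optimizer.step()
        optimizer.zero_grad()
    accelerator.print("done")


if __name__ == "__main__":
    main()
```

Launched with `accelerate launch train.py` (hypothetical file name) after choosing DeepSpeed ZeRO-3 in `accelerate config`, the same script scales from a single GPU to multiple A100 nodes without code changes.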

Model Fine-Tuning and Experiment Tracking

  • Engineered and fine-tuned a Hugging Face decoder model for the legal NLP tasks.
  • Utilized Weights & Biases for experiment tracking, leading to:
    • A 20% improvement in precision, recall, and F1 scores (see the sketch after this list).
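A minimal sketch of this setup: a Hugging Face Trainer run that reports to Weights & Biases and computes precision/recall/F1 in `compute_metrics`. The GPT-2 checkpoint, toy dataset, label count, and run name are placeholders, not the project's actual configuration, and logging assumes `wandb` is installed and authenticated.

```python
# Sketch only: fine-tuning with the Trainer while logging precision/recall/F1 to W&B.
import numpy as np
from datasets import Dataset
from sklearn.metrics import precision_recall_fscore_support
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)


def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    p, r, f1, _ = precision_recall_fscore_support(
        labels, preds, average="macro", zero_division=0
    )
    return {"precision": p, "recall": r, "f1": f1}


model_name = "gpt2"  # placeholder decoder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
model.config.pad_token_id = tokenizer.pad_token_id

# Toy dataset so the sketch runs end to end.
raw = Dataset.from_dict({"text": ["sample clause"] * 16, "label": [0, 1] * 8})
ds = raw.map(lambda x: tokenizer(x["text"], truncation=True, padding="max_length", max_length=64))

args = TrainingArguments(
    output_dir="out",
    report_to="wandb",            # streams metrics to Weights & Biases (requires wandb)
    run_name="legaleval-sketch",  # hypothetical run name
    num_train_epochs=1,
    per_device_train_batch_size=4,
    logging_steps=1,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=ds,
    eval_dataset=ds,
    compute_metrics=compute_metrics,
)
trainer.train()
print(trainer.evaluate())
```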

Distributed Computing and Deployment

  • Set up and configured a distributed computing cluster, improving:
    • Data-processing throughput.
    • Seamless model deployment across multiple nodes.
  • Executed the SemEval-2023 LegalEval tasks, including:
    • Rhetorical Role Labeling (an inference sketch follows this list).
    • Court Judgment Prediction.
  • Contributed to a top-10% finish among the 26 participating teams.
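For illustration, rhetorical role labeling can be framed as per-sentence classification at inference time. The checkpoint path and the label subset below are hypothetical placeholders (the actual LegalEval scheme defines a larger set of roles), so this is a sketch of the inference pattern rather than the submitted system.

```python
# Sketch only: rhetorical-role labeling as per-sentence classification.
# CHECKPOINT and LABELS are hypothetical placeholders, not the submitted model.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

CHECKPOINT = "path/to/fine-tuned-rhetorical-role-model"  # assumed local checkpoint
LABELS = ["PREAMBLE", "FACTS", "ARGUMENT", "RATIO", "RULING"]  # illustrative subset

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForSequenceClassification.from_pretrained(CHECKPOINT, num_labels=len(LABELS))
model.eval()

sentences = [
    "The appellant filed the present appeal against the order dated 12.03.2019.",
    "In view of the above, the appeal is allowed and the impugned order is set aside.",
]

with torch.no_grad():
    enc = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    preds = model(**enc).logits.argmax(dim=-1)

for sent, idx in zip(sentences, preds.tolist()):
    print(f"{LABELS[idx]:>8}: {sent}")
```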

Impact

  • Demonstrated measurable improvements in model accuracy, training efficiency, and deployment capability.
  • Enhanced performance on complex legal-text classification tasks, keeping the project competitive within the shared task.