HomeResources

Resources

TECHNICAL RESOURCES

This resource page is a starting point for the technical development of your solution, designed to enable you to start building quickly. The tools, models, and frameworks outlined below are recommended, not mandatory. Feel free to use the technology of your choice.

The scope of your solution

In general, we’re looking for solutions from the legal domain which utilise large language models (LLMs). Closer to the Hackathon.

Need inspiration? Check out the problem statements provided by co-organising institutions and our sponsors. Most of them offer non-monetary prizes to the team that best addresses their problem.

Building the solution

From the technical standpoint, LLM-based web applications usually consist of the following three architectural layers:

  • Front-end – the user interface of the application. This is what your users see and interact with. The front-end is rendered and executed in the user’s web browser, collects user input and presents the LLM responses in a user-friendly manner.
  • Back-end – the application’s logic which exists on a remote server. Typically, the back-end handles user sessions and connects the user with the underlying LLM model.
  • The LLM model – a computer programme based on artificial neural networks to generate, summarise, analyse, and comprehend text. It also can perform tasks such as sorting data into categories (i.e., statistical classification).

The following sections outline how to select suitable technologies for each of the architectural layers outlined above. If you prefer to use a “no code” solution, we have a section on that as well.

Let’s start with the LLM model, as this is they key building block of your application.

I. Choosing the LLM model

In general, you have two options: You can choose a (1) general-purpose LLM or (2) an LLM specifically trained for the legal domain. While general purpose models are heavily researched and developed by top-tier AI companies, and are superior in the number of parameters and the size of the context window (we explain both of these terms later in the text), the domain specific models are trained with specialised data and can perform better on the domain-specific (such as legal) tasks.

Understanding the LLM jargon

Before we deep dive into the list of models, let’s introduce some of the common technical terms. Below, we explain the basic principles of LLMs and and introduce the jargon (in bold).

When somebody designs a new LLM, it is at first an “empty” neural network. Such network consists of millions of artificial neurons (i.e., perceptrons), which are interconnected. The connections between perceptrons differ in terms of how strong they are. When we change the strengths of the connections, we change the behaviour of the model. At the moment of creation, the connections have randomly assigned strengths. The goal is to set up the connections such that the model gives us optimal performance.

Due to the number of connections in LLMs, we wouldn’t be able to set them up manually. However, we can use a large set of example questions and answers to determine, how to change the strengths of the connections in the model to obtain the desired results. The process which changes the strengths of the connections using example questions and answers is called model training. The example questions and answers themselves are called training data.

Training of state-of-the-art LLMs is a long, expensive, and complicated process, which is why only a few companies around the world can afford to do it. Trained general purpose LLMs are sometimes referred to as foundation models. It would be virtually impossible for you to train your own foundation model from scratch for the Hackathon, but the good news is you don’t have to do it.

Often, we would like a bit more than just a foundation model. To improve the performance of the given task, the foundation model can be trained with additional data. Typically, this new training data comes from a specific field in the industry. The resulting models are referred to as domain-specific LLMs. The practice of additional model training is called model fine-tuning. Due to the technical complexity, it would be almost impossible for you to fine-tune your own model from a foundation model at the Hackathon. However, at this stage, you don’t have to worry about model fine-tuning. Should your team be successful, you might want to explore model fine-tuning after the incorporation of your company.

Training is not the only way an LLM can gain knew knowledge. There is a popular technique for “injecting” information into an existing LLM: Retrieval-Augmented Generation (RAG). Using this technique, when you start a new chat session with the user, you can first programmatically insert a pre-existing set of documents into the context of the conversation – otherwise known as context window. LLM models significantly differ when it comes to the size of the context window, or in other words, in the ability to consume additional data just before the start of the conversation. The bigger the size of the context window is, the more text you can put in.

👀 To learn more, you can refer to Groq’s blog post and video explaining RAG as well as their RAG implementation example.

The performance of LLM models can be measured using benchmark tests (i.e., benchmarks). LawBench is a popular domain-specific benchmark for Law LLMs. The current LawBench leaderboard of selected LLMs can be found on their website.

Finally, once you know which model you want to use, it needs to placed on some server connected to the internet, so that the logic of your application (i.e., the back-end) can connect to it. You have two options here:

  • You can use an off-the-shelf service (i.e., managed service) and connect to it via API
  • You can manually deploy an open-source LLM to a cloud-based service

Available LLM models

At the Hackathon, there are no restrictions in terms of the model you use in your application.

Depending on how adventurous you are, we recommend to choose from the options below.

No code options

The following providers offer no-code solutions, if you would like to completely avoid programming your own application. Each of them is a one-stop-shop, particularly suitable for non-technical participants:

Convenient options available via API

The following models are all general-purpose and available as a managed service:

Advanced options

  • SaulLM-7B: a pioneering open-source LLM for legal applications based on Mistral 7B. The model is available on Hugging Face, where you can deploy it from to a cloud platform such as AWS SageMaker, Azure ML, or Google Cloud.
  • Cambridge Legal LLM: To be confirmed

II. Back-end

If you intend to use one of the managed LLMs listed above (or SaulLM-7B), we recommend to use:

Video tutorial: LangChain Explained in 13 minutes

Cloud services

All participants will have free access to selected Google Cloud and AWS services during the day of the Hackathon (only).

Credentials are available at the Info Point.

Databases

To store data in your application such as user profiles or chat history, we recommend to use a managed database on the cloud platform of your choice. Each of the providers has a free tier, which should be sufficient for the event.

Important: Hackathon participants will get free credits for the services on the Google Cloud platform.

III. Front-end

We recommend the following building blocks:

Video tutorial: TypeScript for React in one hour

Video tutorial: React for Beginners in one hour

Legal data for the context window (RAG)

If you intend to utilise the context window of your LLM, we recommend exploring the following resources:

After the Hackathon

If you would like to continue working on your solution after June 23rd, you might want to consider fine-tuning your own LLM model. You can use resources like the Mistral fine-tuning codebase.

We’re eagerly waiting to hear about your progress. If you have something interesting to share, we’re more than willing to introduce you to the event’s sponsors who might be able to assist you further. Stay in touch at [email protected]!

STAY TUNED

© 2024 Hack_the_Law Cambridge

[email protected]

Designed by RH

King’s Entrepreneurship Lab

Stanford CodeX

Liquid Legal Institute