Jurists and the culture of the document

Jurists consume information within a culture of the document, and its fullest expression is the vade mecum. Although the format is in decline, it cannot be called entirely obsolete.

This same culture leads courts to publish their internal regulations as PDF files rather than as web pages. So we remain stuck in the documentary metaphor, with no easy way out of it. The fact is that few information systems have been built within the universe of law, and we remain attached to documents as the starting point of our work.

At the height of the RAG fever, I even tested the feasibility of transposing the data from this type of document into a vector database, with the aim of consuming the information through a chatbot. But, of course, this is an unfortunate transposition, since, in the case of the vade mecum, all information about the structure of the compendium is lost. In practice, the AI answers knowing the content, but not knowing which law it comes from.
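To make the loss concrete, here is a minimal sketch of naive fixed-size chunking, the kind of splitting typically done before embedding. The two-law compendium, the chunk size and the function name are all invented for illustration; real loaders differ in detail, but the effect is the same: some chunks carry an article's text with no trace of the law it belongs to.

```python
def fixed_size_chunks(text: str, size: int) -> list[str]:
    """Cut the text every `size` characters, ignoring any structure."""
    return [text[i:i + size] for i in range(0, len(text), size)]


# Hypothetical miniature "vade mecum": two laws, each with an Art. 1.
compendium = (
    "CIVIL CODE\n"
    "Art. 1. Every person is capable of rights and duties.\n"
    "PENAL CODE\n"
    "Art. 1. There is no crime without a prior law defining it.\n"
)

chunks = fixed_size_chunks(compendium, 40)

# Chunks that contain article text but no law heading at all.
orphans = [c for c in chunks if "CODE" not in c]

for chunk in orphans:
    print(repr(chunk))
```

Any chunk printed here could be retrieved and quoted by the chatbot, yet nothing in it says whether it came from the civil or the penal code.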

In the case of regulations (and small documents in general), ChatGPT itself already provides the solution, allowing small documents to be uploaded so that the bot can consult them. Alternatively, you can pick one of the many Chat2PDF solutions on the market and get the job done quickly.

On the other hand, a satisfactory solution for more complex cases would require the organization of documents (and their parts) in order to reflect their structure. To my knowledge, there is still no commercial solution to this.
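As a rough idea of what organizing a document by its parts could look like, the sketch below splits a compendium per article and keeps the name of the law next to each one. It assumes a simplified, hypothetical layout (law titles on their own line in capitals, articles starting with "Art. N."); a real vade mecum has books, titles, chapters and paragraphs that would need their own rules. The names `Chunk`, `split_by_structure` and `to_embedding_text` are mine, not part of any library.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    law: str      # heading of the law the article belongs to
    article: str  # article number as written in the text
    text: str     # body of the article


# Assumed article format: "Art. 12. Text of the article"
ARTICLE_RE = re.compile(r"^Art\.\s*(\d+)\.?\s*(.*)$")


def split_by_structure(compendium: str) -> list[Chunk]:
    """Produce one chunk per article, tagged with its law."""
    chunks: list[Chunk] = []
    current_law = "UNKNOWN"
    for raw in compendium.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = ARTICLE_RE.match(line)
        if match:
            chunks.append(Chunk(current_law, match.group(1), match.group(2)))
        elif line.isupper():
            # Assumption: an all-caps line is the title of a new law.
            current_law = line
        elif chunks:
            # Any other line continues the previous article.
            chunks[-1].text += " " + line
        # Lines before the first article that are not titles are skipped.
    return chunks


def to_embedding_text(chunk: Chunk) -> str:
    """Prefix the structure so it is embedded along with the content."""
    return f"{chunk.law}, Art. {chunk.article}: {chunk.text}"


sample = (
    "CIVIL CODE\n"
    "Art. 1. Every person is capable of rights\n"
    "and duties.\n"
    "PENAL CODE\n"
    "Art. 1. There is no crime without a prior law defining it.\n"
)

structured = split_by_structure(sample)
for chunk in structured:
    print(to_embedding_text(chunk))
```

The `law` and `article` fields could also be stored as metadata alongside each vector, so that answers can cite their source and searches can be filtered by law. Whether this is enough to make a full compendium usable is something I have not verified.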

Either way, if you want to make your own attempt at conversing with texts in general, I recommend hosting an instance of Weaviate as your vector database, if paying USD 25 per month is not a problem. The chat interface can be prototyped in Flowise, an open source tool for building flows based on LangChain.

Since loading large documents into the database can be a problem, I recommend VectorAdmin to manage the ingestion of these entries. And to complete the open source stack, Weaviate-UI helps you inspect the information stored in your database.

I rule out other databases, such as Pinecone (for its price), Qdrant (for the limitations of the endpoint offered in the open source version) and Chroma (because it has no interface). Hence my recommendation of the Weaviate-centric stack.

If you succeed in indexing this type of content in a vector database, please tell me. Personally, I found the result unsatisfactory, except when the indexed content takes the form of entries (as is the case with summaries and newsletters). Thus, I do not consider talking to a large PDF, such as a vade mecum, to be a productive approach. And for small texts, any application (like ChatGPT) already does the job.