Combining Data, Tech and Social Science to understand the Indian Judiciary


The judicial system in India is an interconnected web of court complexes and establishments: a multi-tier system of 674 district courts, 25 high courts, and the Supreme Court, all working together to deliver justice to the country's 1.3 billion citizens. Records of new case registrations and of pending and disposed cases together form a massive pool of legal data.


Data management, standardization and accessibility remain huge challenges, and these platforms are rarely cited as a source for legal research on important topics such as case pendency and case-law analysis. This, coupled with media stories of judicial corruption, has fueled low trust in the Collegium. Current research is highly fragmented and relies on data from closed-source tools, which makes it extremely difficult to validate findings and conduct reproducible research.


We envisage an ‘Open Judicial Data Platform’ that gives researchers easy access to a range of information, from the oldest cases to the latest court judgements, and takes the burden of data cleaning off their shoulders so that they can spend their time building the narrative. By building data tools on top of this data, we close the information loop: other stakeholders can more easily digest these research pieces, which eventually increases their scope to participate in the legal process.


I would like to share some insights on:

  1. The process of creating this platform with our partners, including legal researchers, lawyers and data scientists
  2. Overcoming the barrier of understanding the legal space
  3. Handling data and tech challenges using open tools and Frictionless Data packages
  4. Making the platform available to a diverse set of user classes
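A Frictionless Data package pairs a data file with a `datapackage.json` descriptor that records its schema. As a minimal sketch, assuming case records are kept in a CSV file, such a descriptor can be built with only the standard library. The package name, file path and field names below are hypothetical illustrations, not the platform's actual schema.

```python
import json


def build_case_datapackage(csv_path: str) -> dict:
    """Build a minimal Data Package descriptor for a CSV of case records.

    All names here (package name, field names, status values) are
    hypothetical and only illustrate the descriptor's structure.
    """
    return {
        "name": "district-court-cases",  # hypothetical package name
        "resources": [
            {
                "name": "cases",
                "path": csv_path,
                "format": "csv",
                "schema": {
                    # Table Schema: one entry per CSV column, with a type.
                    "fields": [
                        {"name": "case_id", "type": "string"},
                        {"name": "court", "type": "string"},
                        {"name": "filing_date", "type": "date"},
                        {
                            "name": "status",
                            "type": "string",
                            # Restrict the column to known values.
                            "constraints": {"enum": ["pending", "disposed"]},
                        },
                    ],
                    "primaryKey": "case_id",
                },
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_case_datapackage("data/cases.csv"), indent=2))
```

Saving this output as `datapackage.json` next to the CSV lets any tool that follows the Data Package specification check column types and the allowed `status` values before the data reaches a researcher.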

As one use case of the platform, I would also like to demonstrate a case study in which we applied open-source entity recognition tools such as spaCy to the text of legal judgements to understand juvenile justice activity in the country.
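The case study relied on open-source NER tools such as spaCy. Purely as a dependency-free illustration of rule-based entity extraction on judgement text (not the pipeline used in the study), the sketch below uses a regular expression to count which POCSO sections a judgement cites. The phrasings it matches are an assumption; in spaCy the same idea would typically be expressed as `EntityRuler` patterns.

```python
import re
from collections import Counter

# Matches phrases such as "Section 6 of the POCSO Act" or
# "Sections 4 and 6 of POCSO Act". The pattern is an illustrative
# assumption; real judgements use many more variants (e.g. "u/s 6").
POCSO_SECTION = re.compile(
    r"\bSections?\s+"
    r"(?P<nums>\d+[A-Z]?(?:\s*(?:,|and)\s*\d+[A-Z]?)*)"
    r"\s+of\s+(?:the\s+)?POCSO\s+Act",
    re.IGNORECASE,
)


def pocso_sections(text: str) -> Counter:
    """Count how often each POCSO section number is cited in a text."""
    counts: Counter = Counter()
    for match in POCSO_SECTION.finditer(text):
        # Split "4 and 6" or "4, 6" into individual section numbers.
        for num in re.split(r"\s*(?:,|and)\s*", match.group("nums"), flags=re.IGNORECASE):
            counts[num.upper()] += 1
    return counts
```

Aggregating these counts across thousands of judgements gives a first, rough view of which offences dominate the case load; a statistical NER model would be needed to catch mentions that do not follow fixed phrasings.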


Outline/Structure of the Case Study

  1. How the judiciary works in the country
  2. The official eCourts data platform
  3. Challenges faced by the Indian judiciary
  4. How data (ML/AI) can help solve some of these challenges
  5. Challenges faced by the legal tech ecosystem
  6. Co-creating the Open Judicial Data Platform
    1. Objective
    2. How we chalked out a path to understand the legal space with our partners
    3. Observations from the pan-India scoping study on lower courts and high courts
    4. Designing the data pipeline (data architecture)
    5. Stakeholders
    6. Citizen participation
    7. Creating data tools (NLP/ML/AI)
    8. Reproducible legal research
    9. The way ahead (how we plan to involve the community)
  7. Case Study (Demo)
    1. Juvenile justice in the country
      1. Analyzing case law under the Protection of Children from Sexual Offences (POCSO) Act, 2012
    2. Using natural language processing tools such as spaCy to understand legal judgements
    3. How data science and social science can be used to understand the low conviction rates in POCSO cases
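For the conviction-rate question, one simple measure is the share of decided cases that ended in conviction. A minimal sketch, assuming each disposed case has already been labeled with an outcome: the labels (`convicted`, `acquitted`) and record fields (`outcome`, `district`) are hypothetical, and excluding non-verdict disposals from the denominator is only one of several possible definitions.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Mapping, Optional

# Hypothetical outcome labels; only these two count as a verdict.
VERDICTS = ("convicted", "acquitted")


def conviction_rate(outcomes: Iterable[str]) -> Optional[float]:
    """Share of decided cases that ended in conviction.

    Disposals without a verdict (e.g. 'transferred') are excluded from
    the denominator. Returns None when no case was decided.
    """
    counts = Counter(o.strip().lower() for o in outcomes)
    decided = sum(counts[v] for v in VERDICTS)
    if decided == 0:
        return None
    return counts["convicted"] / decided


def conviction_rate_by(
    records: Iterable[Mapping[str, str]], key: str
) -> Dict[str, Optional[float]]:
    """Group records by a field (e.g. a hypothetical 'district') and
    compute the conviction rate within each group."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for record in records:
        groups[record[key]].append(record["outcome"])
    return {group: conviction_rate(outs) for group, outs in groups.items()}
```

Breaking the rate down by court, district or year is where the social-science questions begin: the code only surfaces where rates are low, not why.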

Learning Outcome

  1. Understand the technicalities and challenges behind a massive pool of judicial data that grows every day
  2. The philosophy of co-creation: how data scientists can partner with experts to co-create important data science tools
  3. Community contribution: how we, as citizens, can contribute to this legal-data and tech ecosystem
  4. Our tech and data stack, and strategies for creating a near-real-time data platform
  5. Natural language processing use cases: how spaCy can be used out of the box and tuned to context-specific use cases such as legal tech
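Tuning spaCy to a domain often starts with its rule-based components, such as `EntityRuler` or `PhraseMatcher`, which tag known phrases alongside the statistical model. The plain-Python sketch below only mimics that idea with a longest-match lookup over a small gazetteer; the terms and labels (`COURT`, `STATUTE`, `PROCEDURE`) are hypothetical examples, not an actual legal ontology.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical domain gazetteer: multi-word legal terms mapped to labels.
LEGAL_TERMS: Dict[str, str] = {
    "juvenile justice board": "COURT",
    "special court": "COURT",
    "pocso act": "STATUTE",
    "juvenile justice act": "STATUTE",
    "bail": "PROCEDURE",
}


def tokenize(text: str) -> List[str]:
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"\w+", text.lower())


def tag_terms(text: str, gazetteer: Dict[str, str] = LEGAL_TERMS) -> List[Tuple[str, str]]:
    """Return (term, label) pairs, preferring the longest match at each position."""
    patterns = {tuple(tokenize(term)): label for term, label in gazetteer.items()}
    max_len = max(len(p) for p in patterns)
    tokens = tokenize(text)
    found: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = tuple(tokens[i:i + n])
            if span in patterns:
                found.append((" ".join(span), patterns[span]))
                i += n
                break
        else:
            i += 1  # no term starts here; move to the next token
    return found
```

For example, `tag_terms("The Juvenile Justice Board denied bail under the Juvenile Justice Act.")` returns `[("juvenile justice board", "COURT"), ("bail", "PROCEDURE"), ("juvenile justice act", "STATUTE")]`. Preferring the longest match keeps "juvenile justice board" from being split into a shorter overlapping term.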

Target Audience

Data Scientists, Civic Tech enthusiasts, NLP Practitioners, Legal Tech Researchers, HCI enthusiasts

Submitted to ODSC India 2019, 6 months ago

Public Feedback

  • By Dr. Vikas Agrawal ~ 5 months ago (May 6, 2019)

    Dear Apoorv: Can you please share a video of yourself presenting or introducing the topic?

    It might be interesting to expand the machine learning component of the talk and perhaps summarize the other areas, to make the talk even more interesting for the ODSC audience.

    Warm Regards,


    • By Apoorv ~ 5 months ago
      Hey Vikas,

      Sorry for the late reply; I missed your comment. Could you please confirm whether the review is still in progress and whether there is still time for me to upload the video?

      Apoorv Anand
      • By Deepti Tomar ~ 5 months ago

        Hello Apoorv,

        No problem.

        Yes, the review is still in progress. It would be great to have the video from you at the earliest, though.

        Also, we request that you respond to Vikas's suggestion on expanding the technical component.



  • Gaurav Godhwani / Swati Jaiswal - Fantastic Indian Open Datasets and Where to Find Them

    45 Mins
    Case Study

    With the big boom in the Data Science and Analytics industry in India, many data scientists are keen on learning a variety of learning algorithms and data manipulation techniques. At the same time, there is growing interest among data scientists in giving back to society and harnessing their skills to help fix some of the nation's most pressing problems. But how does one go about finding meaningful datasets connected to societal problems and planning data-for-good projects? This session will summarize our experience of working in the Data-for-Good sector over the last 5 years, sharing a few interesting datasets and associated use cases of machine learning and artificial intelligence in the social sector. The Indian social sector has a good volume of open data on attributes such as annotated images, geospatial information, time series, Indic languages, satellite imagery, etc. We will walk through the journey of a Data-for-Good project, from obtaining essential open datasets to understanding insights from selected data projects in the development sector. Lastly, we will explore how we can work with various communities and scale our algorithmic experiments into meaningful contributions.

  • Akash Tandon - Traversing the graph computing and database ecosystem

    Akash Tandon, Data Engineer
    45 Mins

    Graphs have long held a special place in computer science’s history (and codebases). We're seeing the advent of a new wave of the information age, one characterized by a great emphasis on linked data. As a result, graph computing and databases have risen rapidly to prominence over the last few years. Be it enterprise knowledge graphs, fraud detection or graph-based social media analytics, there are a great number of potential applications.

    To reap the benefits of graph databases and computing, one needs to understand the basics as well as the current technical landscape and offerings. Equally important is understanding whether a graph-based approach suits your problem.
    These realizations are a result of my involvement in an effort to build an enterprise knowledge graph platform. I also believe that graph computing is more than a niche technology and has potential for organizations of varying scale.
    Now, I want to share my learning with you.

    This talk will touch upon the above points with the general premise being that data structured as graph(s) can lead to improved data workflows.
    During our journey, you will learn the fundamentals of graph technology and witness a live demo using Neo4j, a popular property graph database. We will walk through a day in the life of data workers (engineers, scientists, analysts), the challenges that they face and how graph-based approaches result in elegant solutions.
    We'll end our journey with a peek into the current graph ecosystem and high-level concepts that need to be kept in mind while adopting an offering.

  • Deepthi Chand / Shreya Agrawal - Samantar, an open assistive translation framework for Indic Languages

    45 Mins
    Case Study

    India is a land of many languages. There are 23 official languages and many more unofficial ones used in day-to-day conversation. Unfortunately, disseminating information in low-resource languages is difficult because of geo-spatial distances. Popular translation platforms have helped fill this gap for major languages, but their effectiveness is limited by the lack of proper datasets and by their generic nature. The problem becomes especially evident when domain-specific content is involved.

    We present Samantar, an open translation suggestion framework targeted at Indian languages. Samantar is built with open parallel corpora and open-source technologies. Its translation suggestions can be tuned to different target domains.