A Deep Data Dive into China’s Judicial System

With support from the Parker School Global Innovation Awards, Ben Liebman and a team of students have constructed an online database of more than 1 million public documents.

Lawsuits by individuals seeking assistance from the police to block forced takings of property. Letters from Chinese courts to other government agencies to refer suspected criminal conduct. Documents reducing the sentences of incarcerated individuals. Hundreds of thousands of routine civil cases.

These are some of the Chinese court decisions examined in an ongoing research project led by Columbia Law School Professor Benjamin L. Liebman that offers an unprecedented look into the country’s rapidly changing judicial system. Over the past 18 months, Liebman—a renowned scholar of Chinese law—and a team of students from Columbia Law School and the Columbia Department of Computer Science have constructed a database of more than 1 million public documents posted online by courts in Henan, a central Chinese province that is home to nearly 100 million people. Now, as a recipient of the Parker School of Foreign and Comparative Law Global Innovation Award, which supports innovative research and teaching on foreign, international, or transnational legal issues, Liebman will be able to expand his team’s inquiry to millions of additional opinions.

Liebman’s project takes advantage of recent efforts by the Chinese government and judiciary to make judicial opinions available online. The project is being conducted together with postdoctoral fellow Alice Z. Wang ’16, Professor Rachel Stern of the University of California, Berkeley, School of Law, and Professor Margaret Roberts of the University of California, San Diego. The germ of the idea came about during an informal conversation between Liebman and Wang when she was a second-year student at the Law School. The full research team (pictured above, left to right) includes Michael Jia ’19; Tiffany Young ’19; Ying Wang ’20 CC; Alice Wang ’16; Zicheng Xu ’18 SEAS; Chuan Tian ’18 SEAS; and Yingting Fu ’19 (not pictured).

“What happens in China matters to the world,” said Liebman, the Robert L. Lieff Professor of Law and director of the Law School’s Center for Chinese Legal Studies. “The world has long been focused on the question of whether Chinese courts are able to deliver justice. So this is important to policymakers in Washington; it’s important to business people; and it’s important to people in China.”

Although scholars in China are also studying the online court decisions, Liebman’s project is unique. The project examines not just cases made public, but also what is and what is not made public. Unlike other attempts to use the large volume of data now being made public by China’s courts, the Henan Database project will combine analysis of large amounts of data with on-the-ground qualitative research. The project also brings together scholars from political science, law, and computational social science to use computational tools to study Chinese court judgments. For example, an initial forthcoming paper uses a computer science method known as “topic modeling” to analyze more than 30,000 administrative decisions (lawsuits against the government). The technique uses unsupervised machine learning to identify patterns of text that are likely to appear together. Liebman said that the technique is already paying dividends. “Our topic model identifies trends in administrative litigation that in the past were largely below the surface or unobserved,” he explained. “The topic model also excels at generating future questions for on-the-ground research.”

“The use of computational tools to study large volumes of court opinions is novel everywhere,” Liebman continued. “Our hope is that the upsides from this project won’t just be for those studying China, but also will move forward the field of computational tools to study court opinions.”

The real value of the database is its sheer size, although there are gaps—not all courts put everything online, an issue Liebman called the “missingness” problem. Still, “there’s no limit to the academic questions you could ask,” he said.

In addition to Wang—who majored in electrical engineering as an undergraduate and worked as a coder putting U.S. court documents online before starting at the Law School—Liebman has benefitted from the contributions of several other students who are fluent in Chinese and have strong computer science skills.

That he has been able to easily find such smart and skilled students is “a real testament to the strength of the Columbia Law School student body,” he said.

Posted April 26, 2017