A common VAT evasion strategy in low-compliance environments involves fraudulent “paper” firms that allow other firms to reduce their tax liabilities. Identifying these firms could help recover lost tax revenue, but locating them is difficult for tax administrators with weak legal and enforcement capacity. Machine learning methods applied to available tax return data can help find these fraudulent firms cost-effectively.
- Does better targeting of inspections lead to increased tax revenue for the state?
- What is the bottleneck – locating these firms, collection efforts, or corruption?
- In the long term, will improved audits deter evasion, creating an environment of better tax compliance?
Improving the state’s ability to tax effectively is increasingly seen as central to the development process, and value added tax (VAT) is often proposed as a key tool for accomplishing this goal. However, VAT implementation in many low-compliance environments is plagued by firms generating false paper trails. This demand for false paper trails has led to the creation of fraudulent firms (referred to as “bogus” firms by tax authorities), which issue fake receipts to genuine firms that allow the latter to lower their tax liability.
This pilot study will initiate the first stage of a long-term intervention to improve tax collection in Delhi, India. In this stage, we plan to apply machine learning methods to a large network data set (the universe of all tax returns from Delhi for five years) to identify fraudulent firms, and then use field verification of these predictions to further improve the machine learning algorithm. In the second stage, we plan to implement a randomized controlled trial (RCT) with the tax authority that compares the authority’s current method with our data-driven approach to identifying fraudulent firms.
Updates and related resources
June 2018 – “Who is Bogus?: Using One-Sided Labels to Identify Fraudulent Firms from Tax Returns” by Shekhar Mittal, Ofir Reich and Aprajit Mahajan was presented at the ACM SIGCAS Conference on Computing and Sustainable Societies (COMPASS). View the public version of the paper.