Unified virtual memory greatly simplifies GPU programming, but it introduces substantial address translation overhead. To reduce this overhead, modern GPUs use translation lookaside buffers (TLBs) to accelerate address translation. However, TLBs still fall far short of optimal performance. In this work, we find that this performance deficiency stems mainly from the private nature of L1 TLBs. First, many page table entries are duplicated across L1 TLBs, which leads to poor utilization of TLB capacity. Second, the massive number of requests generated by L1 TLB misses results in a high L2 TLB miss rate, which significantly degrades GPU performance. To reduce L1 TLB misses and improve GPU address translation performance, we propose IGS-TLB, a hardware scheme that exploits Intra-Group Sharing. In IGS-TLB, L1 TLBs are decoupled from the compute units and aggregated into groups. Within each group, L1 TLB entries are shared, and the TLBs are responsible for non-overlapping address ranges. This largely eliminates duplicate page table entries in L1 TLBs and significantly reduces the number of requests caused by L1 TLB misses. Our evaluation on a wide set of GPU workloads shows that IGS-TLB effectively reduces both the L1 TLB miss rate and L2 TLB traffic, improving GPU performance by 20.5% on average.
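The effect of grouping can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the IGS-TLB hardware design. The abstract does not say how address ranges are divided among the TLBs in a group, so the model assumes that pages are interleaved across slices by virtual page number modulo the slice count, with 4 KiB pages and LRU replacement. All class and function names (`LRUTLB`, `SharedTLBGroup`, `duplicate_entries`, `simulate`) are hypothetical. In the toy trace, four compute units all sweep the same eight pages. Private 4-entry TLBs thrash and hold duplicate entries, while a group of four 4-entry slices holds each translation exactly once.

```python
from collections import OrderedDict

PAGE_SHIFT = 12  # assumption: 4 KiB pages


class LRUTLB:
    """A fully associative TLB (or TLB slice) with LRU replacement (simplified)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # vpn -> pfn, oldest first

    def lookup(self, vpn):
        if vpn in self.entries:
            self.entries.move_to_end(vpn)  # mark as most recently used
            return True
        return False

    def fill(self, vpn, pfn):
        self.entries[vpn] = pfn
        self.entries.move_to_end(vpn)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used


class SharedTLBGroup:
    """Hypothetical model of one TLB group: each slice owns a disjoint set of pages."""

    def __init__(self, num_slices, entries_per_slice):
        self.slices = [LRUTLB(entries_per_slice) for _ in range(num_slices)]
        self.misses = 0  # misses that would be forwarded to the L2 TLB

    def home_slice(self, vpn):
        # Assumption: interleave pages across slices by low-order VPN bits,
        # so no two slices in the group can hold the same translation.
        return vpn % len(self.slices)

    def translate(self, vaddr, page_table):
        vpn = vaddr >> PAGE_SHIFT
        tlb = self.slices[self.home_slice(vpn)]
        if not tlb.lookup(vpn):
            self.misses += 1
            tlb.fill(vpn, page_table[vpn])
        offset = vaddr & ((1 << PAGE_SHIFT) - 1)
        return (page_table[vpn] << PAGE_SHIFT) | offset


def duplicate_entries(tlbs):
    """Count cached entries that are redundant copies of another TLB's entry."""
    total = sum(len(t.entries) for t in tlbs)
    distinct = set()
    for t in tlbs:
        distinct.update(t.entries)  # iterates over the cached VPNs
    return total - len(distinct)


def simulate(num_cus=4, entries=4, pages=8, reps=2):
    """Compare private per-CU TLBs with one shared group on a toy trace."""
    page_table = {vpn: vpn + 0x100 for vpn in range(pages)}  # toy VPN -> PFN map
    private = [LRUTLB(entries) for _ in range(num_cus)]
    private_misses = 0
    group = SharedTLBGroup(num_cus, entries)  # same total capacity as private TLBs
    for _ in range(reps):
        for vpn in range(pages):
            vaddr = vpn << PAGE_SHIFT
            for cu in range(num_cus):
                # Private design: each CU looks up only its own TLB.
                if not private[cu].lookup(vpn):
                    private_misses += 1
                    private[cu].fill(vpn, page_table[vpn])
                # Grouped design: any CU is served by the page's home slice.
                group.translate(vaddr, page_table)
    return (private_misses, group.misses,
            duplicate_entries(private), duplicate_entries(group.slices))
```

For this trace, `simulate()` returns `(64, 8, 12, 0)`: 64 misses with private TLBs versus 8 with the shared group, and 12 duplicated entries versus none. These counts come only from the toy trace and are not results from the evaluation.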