To meet the requirements of real-time decoding of high-definition (HD) video, an efficient very-large-scale integration (VLSI) architecture for the de-blocking loop filter in the audio video coding standard (AVS) was presented. Each 8×8 block was divided into 4×4 blocks for the filtering operations. The 4×4 block boundaries were processed centrally, and an improved filtering order allowed data operations and filtering operations to be performed at the same time. This architecture increases the efficiency of pipelining and of data operations in the SRAM, thereby greatly reducing the total number of clock cycles of the filtering process. Experimental results show that only 196 clock cycles are needed to filter one macro-block with the AVS de-blocking filter, and that the processing speed increases by 50%. At a maximum frequency of 100 MHz, the architecture achieves real-time decoding of HD video.
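The real-time claim can be sanity-checked with simple arithmetic. The sketch below is illustrative only and is not taken from the paper: it assumes a 1920×1080 frame, 16×16 macro-blocks, and that every macro-block costs exactly the reported 196 cycles with no additional memory or control overhead. The function names are hypothetical.

```python
import math

MB_SIZE = 16               # assumed macro-block edge length in pixels
CYCLES_PER_MB = 196        # filtering cycles per macro-block, from the abstract
CLOCK_HZ = 100_000_000     # 100 MHz maximum frequency, from the abstract


def macroblocks_per_frame(width: int, height: int) -> int:
    """Count macro-blocks, rounding partial rows and columns up."""
    return math.ceil(width / MB_SIZE) * math.ceil(height / MB_SIZE)


def filter_frames_per_second(width: int, height: int) -> float:
    """Upper bound on frames per second if filtering were the only cost."""
    cycles_per_frame = macroblocks_per_frame(width, height) * CYCLES_PER_MB
    return CLOCK_HZ / cycles_per_frame


if __name__ == "__main__":
    # Assumed HD resolution; the abstract does not state the frame size.
    mbs = macroblocks_per_frame(1920, 1080)
    fps = filter_frames_per_second(1920, 1080)
    print(f"{mbs} macro-blocks per frame, about {fps:.1f} frames/s")
```

Under these assumptions a 1920×1080 frame contains 8160 macro-blocks, which needs about 1.6 million cycles and gives roughly 62 frames per second at 100 MHz, above the 25 to 30 frames per second usually required for real-time playback.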