A year ago, the Shanghai-based Chinese company Biren Technology announced graphics chips aimed at artificial intelligence and high-performance computing. The processors were supposed to come off the production line in the third quarter of 2021 and go on sale in the first half of 2022.

Despite a slight delay, in early August the company announced that sales of its BR100 general-purpose GPU were about to begin. The chip is claimed to set a world record for computing power: at its peak it reaches the petascale level (on 8-bit integer operations), and in its other characteristics the processor is comparable to competing solutions and even surpasses them.
The BR100 uses the original "Bi Liren" hardware architecture developed in-house by Biren Technology, and it is the first Chinese GPU to combine a chiplet design with PCIe 5.0 and the CXL interconnect protocol. The processor is manufactured on a 7 nm process, contains 77 billion transistors and is packaged using TSMC's 2.5D CoWoS technology. It also carries 300 MB of on-chip cache and can be equipped with up to 64 GB of HBM2e memory with a bandwidth of 2.3 TB/s.
The official release of the BR100 marks the first time that a Chinese company has broken the world record for general-purpose GPU processing power previously held by international giants like NVIDIA (on paper, at least, until independent tests are available). A noteworthy detail: the company was founded only in 2019, and by March 2021 it had already raised more than 730 million dollars, making it a Chinese "unicorn". In other words, only three years passed between the company's founding and the release of its first product.
There is nothing surprising in this, though. One of the company's leaders is Allen Lee, the former head of AMD's Chinese research center and a former AMD vice president of development, and before that a development director at S3. Biren Technology co-founder and GPU product line CEO Jiao Guofan is a well-known technology leader in the industry: as head of the GPU group at Qualcomm, he developed 5 generations of the classic Adreno architecture. His colleagues at the company also have impressive experience.
In addition to the chip, Biren Technology introduced the Haixuan OAM (OCP Accelerator Module) server, its OAM module WallWait 100, and the BIRENSUPA software platform. With up to 8 OAM modules installed, the server provides up to 8 PFLOPS of peak computing power. BIRENSUPA, a complete stack spanning drivers, compilers, acceleration libraries and tools, is meant to unlock the GPU's full potential, and it supports the main deep learning frameworks.
A cut-down version, the BR104, is aimed at the mass market (the chiplet design makes it possible to produce both higher-end and lower-end models on the same line); its performance is about half that of the BR100. According to the declared specifications, it too surpasses the flagships of international manufacturers, and the power consumption of solutions based on this chip should not exceed 300 W.

For example, the NVIDIA RTX A5500 delivers about 35 TFLOPS in single-precision (FP32) operations, while the BR104 should reach 128 TFLOPS. NVIDIA's new solutions based on the Ada Lovelace architecture will, according to preliminary tests, be about twice as fast, but will also consume up to 450 W.
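To put the declared figures side by side, here is a minimal Python sketch that computes the FP32 ratio between the two cards and the BR104's throughput per watt at its stated power ceiling. The numbers are the vendor-declared peaks quoted in this article, not measured results, and the function and constant names are purely illustrative.

```python
# Declared peak figures quoted in the article (not independent measurements).
BR104_FP32_TFLOPS = 128.0   # declared BR104 peak, FP32
A5500_FP32_TFLOPS = 35.0    # approximate NVIDIA RTX A5500 peak, FP32
BR104_MAX_POWER_W = 300.0   # stated power ceiling for BR104-based solutions


def speedup(candidate_tflops: float, baseline_tflops: float) -> float:
    """Return how many times faster the candidate is than the baseline."""
    if baseline_tflops <= 0:
        raise ValueError("baseline throughput must be positive")
    return candidate_tflops / baseline_tflops


def tflops_per_watt(tflops: float, watts: float) -> float:
    """Return peak throughput per watt of power draw."""
    if watts <= 0:
        raise ValueError("power must be positive")
    return tflops / watts


ratio = speedup(BR104_FP32_TFLOPS, A5500_FP32_TFLOPS)
efficiency = tflops_per_watt(BR104_FP32_TFLOPS, BR104_MAX_POWER_W)

print(f"BR104 vs RTX A5500 (FP32, declared): {ratio:.2f}x")
print(f"BR104 at its 300 W ceiling: {efficiency:.3f} TFLOPS/W")
```

On paper this puts the BR104 at roughly 3.7 times the A5500 in FP32. Peak figures rarely translate directly into real workloads, so the ratio is best read as an upper bound on the advantage until independent benchmarks appear.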
As for time to market, the company said the following: the BR104 is already available to some Chinese manufacturers for testing, and solutions based on it will go into mass production in the coming months. For example, Wallen Technology announced its BR104-based product in PCIe format, the Wallace 104. The card is equipped with 32 GB of HBM2e memory with a bandwidth of 819 GB/s and a PCIe 5.0 x16 interface with CXL support. It will also support 32 HEVC/H.264 encoding channels and 256 HEVC/H.264 decoding channels.
The Haixuan OAM server is undergoing extensive internal testing and is scheduled to become available to partners in the fourth quarter of this year. Companies such as Ping An Technology and China Mobile, as well as leading Chinese universities, are also planning to use Biren's solutions.
Let's anticipate the obvious question: it is still unknown whether these solutions will appear in Russia or how much they might cost us. China introduces its latest developments in the domestic market first, primarily in government agencies, and treats entry into foreign markets as an additional opportunity for commercialization.
But if you are interested, we will cover the history of Chinese GPUs, try to make sense of the current state of the industry, and try to predict together with you what awaits us in a couple of years.