Methods, apparatuses, and devices for generating word vectors

    Publication No.: US11030411B2

    Publication date: 2021-06-08

    Application No.: US16883184

    Filing date: 2020-05-26

    IPC classes: G06F40/295 G06N3/04 G06N3/08

    Abstract: Implementations of the present specification disclose a method, an apparatus, and a device for generating word vectors. The method includes: obtaining words by segmenting a corpus; establishing a feature vector for each obtained word based on n-ary characters; training a convolutional neural network based on the feature vectors of the obtained words and the feature vectors of the context words associated with each obtained word in the corpus; and generating a word vector for each obtained word based on the feature vector of that word and the trained convolutional neural network.
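    The feature-vector step in the abstract above ("based on n-ary characters") can be illustrated with character n-grams. The following is a minimal sketch, not the patented implementation: it assumes n-gram sizes of 2 and 3, a "#" word-boundary marker, whitespace splitting in place of a real segmenter, and multi-hot encoding. All of these, along with the function names, are hypothetical choices. The convolutional network training step is not shown.

```python
# Hypothetical sketch: multi-hot word feature vectors built from character n-grams.

def char_ngrams(word, n_values=(2, 3), boundary="#"):
    """Return the character n-grams of `word`, padded with a boundary marker."""
    padded = f"{boundary}{word}{boundary}"
    grams = []
    for n in n_values:
        for i in range(len(padded) - n + 1):
            grams.append(padded[i:i + n])
    return grams


def build_ngram_index(words, n_values=(2, 3)):
    """Assign a column index to every distinct n-gram seen in the vocabulary."""
    index = {}
    for word in words:
        for gram in char_ngrams(word, n_values):
            index.setdefault(gram, len(index))
    return index


def feature_vector(word, index, n_values=(2, 3)):
    """Multi-hot vector: 1 in each column whose n-gram occurs in `word`."""
    vec = [0] * len(index)
    for gram in char_ngrams(word, n_values):
        if gram in index:
            vec[index[gram]] = 1
    return vec


corpus = "the cat sat on the mat"
words = sorted(set(corpus.split()))  # whitespace split stands in for a real segmenter
index = build_ngram_index(words)
vectors = {w: feature_vector(w, index) for w in words}
```

    Under these assumptions, "cat" and "mat" share the n-grams "at", "t#", and "at#". Their feature vectors therefore overlap, which is the kind of sub-word signal a network trained on such inputs could exploit.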

    Logistic regression gradient calculation method and apparatus

    Publication No.: US10970596B2

    Publication date: 2021-04-06

    Application No.: US16043043

    Filing date: 2018-07-23

    Inventor: Jun Zhou

    Abstract: The present disclosure provides logistic regression gradient calculation methods and apparatuses. One exemplary calculation method comprises: acquiring training data, the training data including X-row user data and Y-row click-through data corresponding to the X-row user data; converting the X-row user data into X-column data; segmenting the X-column data and a weight vector to form N X-column data segmentation blocks and N weight vector segmentation blocks; starting N threads to generate N sub-logistic regression gradients from the N X-column data segmentation blocks, the N weight vector segmentation blocks, and the corresponding Y-row click-through data, respectively; and splicing the N sub-logistic regression gradients to form a full logistic regression gradient. With embodiments of the present disclosure, a computing machine can support training of a super-large-scale logistic regression model, which increases the calculation speed, shortens the training time, and greatly reduces the memory usage of the computing machine.
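    The column-blocked scheme in the abstract above can be sketched in Python as follows. The sketch transposes row data into columns, splits the columns and the weight vector into N blocks, runs one thread per block, and splices the per-block sub-gradients. The abstract does not say how each thread obtains a sample's full margin x·w, which depends on every block. This sketch assumes a two-phase approach: the threads first compute partial margins, the main thread sums them, and then the threads compute their sub-gradients. The function names are hypothetical. Because of the CPython GIL, the threads here illustrate the data layout rather than a real speed-up.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def to_columns(rows):
    """Transpose row-major user data (one list per sample) into column data (one list per feature)."""
    return [list(col) for col in zip(*rows)]


def split_blocks(items, n):
    """Split `items` into n contiguous blocks whose sizes differ by at most one."""
    size, extra = divmod(len(items), n)
    blocks, start = [], 0
    for k in range(n):
        end = start + size + (1 if k < extra else 0)
        blocks.append(items[start:end])
        start = end
    return blocks


def lr_gradient(rows, labels, weights, n_threads=2):
    """Logistic-loss gradient sum_i x_i * (sigmoid(x_i . w) - y_i), computed per column block."""
    col_blocks = split_blocks(to_columns(rows), n_threads)
    w_blocks = split_blocks(weights, n_threads)
    n_samples = len(rows)

    def partial_margin(k):
        # Contribution of column block k to every sample's margin x_i . w.
        out = [0.0] * n_samples
        for col, w in zip(col_blocks[k], w_blocks[k]):
            for i, x in enumerate(col):
                out[i] += x * w
        return out

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        partials = list(pool.map(partial_margin, range(n_threads)))
        # Each sample's full margin needs every block, so the partial margins are summed first.
        residuals = [sigmoid(sum(p[i] for p in partials)) - y
                     for i, y in enumerate(labels)]

        def sub_gradient(k):
            # Gradient entries for the features in block k.
            return [sum(x * r for x, r in zip(col, residuals)) for col in col_blocks[k]]

        sub_gradients = list(pool.map(sub_gradient, range(n_threads)))

    # Splice the N sub-gradients back into one full gradient, in feature order.
    return [g for block in sub_gradients for g in block]
```

    For any block count, the spliced result matches the gradient computed directly on the full rows. Each thread touches only its own columns and weights, which is where the memory savings described in the abstract would come from.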

    Method and device for processing short link, and short link server

    Publication No.: US10798056B2

    Publication date: 2020-10-06

    Application No.: US16019897

    Filing date: 2018-06-27

    Inventor: Jun Zhou

    Abstract: Techniques for navigating webpages requested through short links are provided. In some implementations, a short link uniform resource locator (URL) is received, the short link URL is processed to extract a simplified short link and an address code, and a determination is made as to whether the simplified short link is associated with a long link URL representing an address of a webpage. In response to determining that the simplified short link is associated with a long link URL, the associated long link URL is provided. In response to determining that the simplified short link is not associated with a long link URL, a common long link URL associated with the address code is provided.
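    The resolution logic in the abstract above, with a fallback to a common long link keyed by the address code, can be sketched with Python's `urllib.parse`. The abstract does not define how the simplified short link and the address code are encoded in the URL. This sketch assumes a hypothetical path layout `/<simplified>/<address_code>`. The lookup tables and example.com URLs are placeholders.

```python
from urllib.parse import urlsplit

# Hypothetical lookup tables; the abstract does not specify how they are stored.
SHORT_TO_LONG = {"Ab3xY9": "https://www.example.com/promo/spring-sale"}
ADDRESS_CODE_TO_COMMON = {"42": "https://www.example.com/home"}


def parse_short_link(url):
    """Split a short link URL into (simplified short link, address code).

    Assumes the hypothetical path layout /<simplified>/<address_code>.
    """
    parts = [p for p in urlsplit(url).path.split("/") if p]
    if len(parts) != 2:
        raise ValueError(f"unexpected short link format: {url}")
    return parts[0], parts[1]


def resolve(url, short_to_long=SHORT_TO_LONG, common=ADDRESS_CODE_TO_COMMON):
    """Return the long link for a short link, falling back to the address code's common link."""
    simplified, address_code = parse_short_link(url)
    long_url = short_to_long.get(simplified)
    if long_url is not None:
        return long_url
    # No long link is associated with the simplified short link: use the common one.
    return common.get(address_code)
```

    For example, `resolve("http://s.example.com/Ab3xY9/42")` returns the promotion page. An unknown simplified link carrying the same address code resolves to the common page instead of failing.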

    Short link processing to improve service efficiency

    Publication No.: US10789321B2

    Publication date: 2020-09-29

    Application No.: US16019228

    Filing date: 2018-06-26

    Inventor: Jun Zhou

    Abstract: A server receives a short link application from a requester. The short link application includes a long link uniform resource locator (URL). The server obtains a database identifier based on the long link URL. The server determines whether a database associated with the database identifier is accessible by the server. In response to a determination that the database associated with the database identifier is accessible by the server, the server obtains a short link URL associated with the long link URL from the database, and transmits the short link URL to the requester.
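    The request flow in the abstract above can be sketched with in-memory dictionaries standing in for the databases. The sketch derives a database identifier from the long URL, checks accessibility, and then returns the stored short link. Several details are assumptions not stated in the abstract. The sharding rule (an MD5 hash modulo the shard count), the allocation of a new short code on first request, and the `None` result for an inaccessible database are all hypothetical. So are the class name and base URL.

```python
import hashlib


class ShortLinkServer:
    """Hypothetical server; each 'database' is an in-memory dict: long URL -> short URL."""

    def __init__(self, num_databases, accessible_ids, base="https://s.example.com/"):
        self.databases = {i: {} for i in range(num_databases)}
        self.accessible = set(accessible_ids)
        self.base = base

    def database_id(self, long_url):
        # Assumed sharding rule: stable hash of the long URL modulo the shard count.
        digest = hashlib.md5(long_url.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.databases)

    def handle_application(self, long_url):
        db_id = self.database_id(long_url)
        if db_id not in self.accessible:
            return None  # the abstract leaves the inaccessible case unspecified
        db = self.databases[db_id]
        if long_url not in db:
            # Assumption: allocate an 8-character short code on the first request.
            code = hashlib.sha256(long_url.encode("utf-8")).hexdigest()[:8]
            db[long_url] = self.base + code
        return db[long_url]
```

    Routing by a stable hash means every request for the same long URL reaches the same shard. As a result, repeated applications return the same short link.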

    METHOD AND DEVICE FOR VIRTUAL RESOURCE ALLOCATION, MODELING, AND DATA PREDICTION

    Publication No.: US20200097329A1

    Publication date: 2020-03-26

    Application No.: US16697913

    Filing date: 2019-11-27

    Inventors: Jun Zhou, Xiaolong Li

    IPC classes: G06F9/50 G06N20/00 G06F17/16

    Abstract: Evaluation results of a plurality of users are received from a plurality of data providers. The evaluation results are obtained by the plurality of data providers evaluating the plurality of users based on evaluation models of the plurality of data providers. A plurality of training samples is constructed by using the evaluation results. Each training sample includes a respective subset of the evaluation results corresponding to a same user of the plurality of users. A label for each training sample is generated based on an actual service execution status of the same user. A model is trained based on the plurality of training samples and the plurality of labels, including setting a plurality of variable coefficients, each variable coefficient specifying a contribution level of a corresponding data provider. Virtual resources are allocated to each data provider based on the plurality of variable coefficients.
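    The abstract above does not name the model type. One way to read "variable coefficients" as per-provider contribution levels is a logistic regression whose inputs are the providers' scores for each user. This sketch makes that assumption and fits the model by plain batch gradient descent. It then splits a budget in proportion to the non-negative coefficients, clipping negative coefficients to zero and falling back to an equal split; both rules are also assumptions. The label encoding (1 if the user actually performed the service) and all names are hypothetical.

```python
import math


def train_provider_weights(samples, labels, lr=0.5, epochs=2000):
    """Fit logistic regression on provider scores by batch gradient descent.

    samples[u][k] is provider k's evaluation of user u; labels[u] is 1 if the
    user actually performed the service, else 0 (assumed encoding).
    Returns one coefficient per provider plus an intercept.
    """
    n_providers = len(samples[0])
    w = [0.0] * n_providers
    b = 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = [0.0] * n_providers
        grad_b = 0.0
        for x, y in zip(samples, labels):
            z = b + sum(wk * xk for wk, xk in zip(w, x))
            r = 1.0 / (1.0 + math.exp(-z)) - y  # prediction error for this user
            for k in range(n_providers):
                grad_w[k] += r * x[k]
            grad_b += r
        w = [wk - lr * g / n for wk, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def allocate(budget, coefficients):
    """Split `budget` in proportion to each provider's coefficient (negatives clipped to 0)."""
    positive = [max(c, 0.0) for c in coefficients]
    total = sum(positive)
    if total == 0:
        return [budget / len(coefficients)] * len(coefficients)
    return [budget * c / total for c in positive]


# Provider 0's scores separate the outcomes; provider 1's scores look alike in both classes.
samples = [[0.9, 0.5], [0.8, 0.4], [0.85, 0.6], [0.1, 0.5], [0.2, 0.4], [0.15, 0.6]]
labels = [1, 1, 1, 0, 0, 0]
w, b = train_provider_weights(samples, labels)
shares = allocate(100.0, w)
```

    On this toy data, the informative provider receives the larger coefficient and therefore the larger share of the budget.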

    Automatic Multi-Threshold Feature Filtering Method and Apparatus

    Publication No.: US20190042982A1

    Publication date: 2019-02-07

    Application No.: US16132264

    Filing date: 2018-09-14

    IPC classes: G06N99/00 G06F17/30

    Abstract: An automatic multi-threshold feature filtering method and apparatus are provided. In an iterative process of training a machine learning model, the method calculates a feature filtering threshold and feature correlation values for the current round of iteration based on the result of the previous iteration, and performs feature filtering on samples based on the calculated threshold and correlation values. The feature filtering apparatus of the present disclosure includes a calculation module and a feature filtering module. Because they automatically generate a different feature filtering threshold at each iteration, the method and apparatus greatly improve the accuracy of the filtering threshold and, compared with the fixed, single thresholds in current use, can greatly increase both the training speed of automatic machine learning and the accuracy of the machine learning model.
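    The abstract above says the threshold and correlation values are recomputed each round from the previous iteration's result, but it does not give the formulas. This sketch assumes two hypothetical rules. A feature's correlation value is the absolute value of its weight from the previous round, and the round's threshold is `ratio` times the mean correlation value. The `train_round` callable stands in for one round of model training. All names are illustrative.

```python
def feature_correlations(prev_weights):
    # Assumption: a feature's correlation value is the magnitude of its previous-round weight.
    return {f: abs(w) for f, w in prev_weights.items()}


def filtering_threshold(correlations, ratio=0.5):
    # Assumption: the threshold is `ratio` times the mean correlation value of this round.
    if not correlations:
        return 0.0
    return ratio * sum(correlations.values()) / len(correlations)


def filter_samples(samples, correlations, threshold):
    """Keep only features whose correlation value reaches the threshold."""
    keep = {f for f, c in correlations.items() if c >= threshold}
    return [{f: v for f, v in s.items() if f in keep} for s in samples]


def iterative_filtering(samples, train_round, rounds=3, ratio=0.5):
    """Recompute the threshold every round, so each iteration can use a different one."""
    thresholds = []
    for _ in range(rounds):
        weights = train_round(samples)  # hypothetical: one round of model training
        corr = feature_correlations(weights)
        t = filtering_threshold(corr, ratio)
        thresholds.append(t)
        samples = filter_samples(samples, corr, t)
    return samples, thresholds
```

    Unlike a single fixed cutoff, the threshold here moves with the surviving features. In a toy run where the weights equal the feature values {a: 4, b: 1, c: 0.5}, the thresholds rise from about 0.92 to 1.25 to 2.0 as weaker features drop out.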

    Method and system for training machine learning system

    Publication No.: US12026618B2

    Publication date: 2024-07-02

    Application No.: US18342204

    Filing date: 2023-06-27

    Inventor: Jun Zhou

    Abstract: The present disclosure provides a method and a system for training a machine learning system. Multiple pieces of sample data are used for training the machine learning system. The method includes acquiring multiple sample sets, each sample set including sample data in a corresponding sampling time period; setting a sampling rate for each sample set according to the corresponding sampling time period; acquiring multiple sample sets sampled according to set sampling rates; determining importance values of the multiple sampled sample sets; correcting each piece of sample data in the multiple sampled sample sets by using a corresponding importance value to obtain corrected sample data; and inputting the corrected sample data into the machine learning system to train the machine learning system.

    Method and system for training machine learning system

    Publication No.: US11720787B2

    Publication date: 2023-08-08

    Application No.: US16114078

    Filing date: 2018-08-27

    Inventor: Jun Zhou

    Abstract: The present disclosure provides a method and a system for training a machine learning system. Multiple pieces of sample data are used for training the machine learning system. The method includes acquiring multiple sample sets, each sample set including sample data in a corresponding sampling time period; setting a sampling rate for each sample set according to the corresponding sampling time period; acquiring multiple sample sets sampled according to set sampling rates; determining importance values of the multiple sampled sample sets; correcting each piece of sample data in the multiple sampled sample sets by using a corresponding importance value to obtain corrected sample data; and inputting the corrected sample data into the machine learning system to train the machine learning system.
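    The two patents above share the same abstract, so one sketch covers both. The abstract says sampling rates depend on the sampling time period, but it does not say how the rates or the importance values are chosen. This sketch assumes two rules. The sampling rate decays exponentially with the age of the period, down to a floor. The importance value is the inverse of the sampling rate, i.e. an inverse-probability weight, so that down-sampled older periods are not under-represented. Each kept sample is "corrected" by pairing it with its importance weight, which a weighted learner could then consume. The half-life, floor, and function names are hypothetical.

```python
import random


def sampling_rate(age_days, half_life=7.0, floor=0.05):
    # Assumption: newer periods are kept at higher rates (exponential decay by age).
    return max(floor, 0.5 ** (age_days / half_life))


def sample_and_correct(sample_sets, seed=0):
    """Sample each period's set at its rate and attach an importance weight.

    sample_sets: list of (age_days, [sample, ...]) pairs, one per sampling time period.
    Returns (sample, importance) pairs ready for a learner that accepts sample weights.
    """
    rng = random.Random(seed)
    corrected = []
    for age_days, samples in sample_sets:
        rate = sampling_rate(age_days)
        kept = [s for s in samples if rng.random() < rate]
        importance = 1.0 / rate  # assumption: inverse-probability weight
        corrected.extend((s, importance) for s in kept)
    return corrected
```

    With these rules, a period that is 7 days old is sampled at rate 0.5 and each surviving sample carries weight 2.0. The expected total weight contributed by each period therefore matches its original size.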